Options

kevin-montrose edited this page Apr 10, 2021 · 10 revisions

Options & Options Builder

Introduction

Options is an immutable class, instances of which describe how to read and write a specific Delimiter-Separated Value (DSV) format. As an immutable class, Options instances are thread safe and equatable.

Options instances are created using the OptionsBuilder class. OptionsBuilder instances are mutable, and therefore neither thread safe nor equatable.

Available Options

The available options are as follows.

Value Separator

The string used to separate different cells in a row.

For comma separated values (CSV) this is a comma (,), while for tab separated values (TSV) this is a tab (\t).

This option is required, and must not start with the same character as Escaped Value Start And End or Comment Character.

Read Row Ending

The character sequence that ends a row when reading, as a ReadRowEnding - one of:

  • ReadRowEnding.CarriageReturn (\r)
  • ReadRowEnding.LineFeed (\n)
  • ReadRowEnding.CarriageReturnLineFeed (\r\n)
  • ReadRowEnding.Detect

Typically this value is ReadRowEnding.Detect.

This option is required.

Write Row Ending

The character sequence that ends a row when writing, as a WriteRowEnding - one of:

  • WriteRowEnding.CarriageReturn (\r)
  • WriteRowEnding.LineFeed (\n)
  • WriteRowEnding.CarriageReturnLineFeed (\r\n)

Typically this value is WriteRowEnding.CarriageReturnLineFeed.

This option is required.

Escaped Value Start And End

The character that starts and ends an escaped value. A cell needs escaping when its value contains a special character, typically the row ending or value separator.

For CSV this is typically ", so a value of Hello, World would be rendered as "Hello, World".

This option is not required, but if set it must not be the same as Value Separator or Comment Character.

If not set, attempting to write a value that requires escaping will throw an exception.

Escaped Value Escape Character

If a cell is escaped but its value contains Escaped Value Start And End, this is the character written before Escaped Value Start And End to escape it.

For CSV this is typically also ", so a value of Billy "Bob" Bobson would be rendered as "Billy ""Bob"" Bobson".

This option is not required, but if set then Escaped Value Start And End must also be set.

If not set, attempting to write a value that contains Escaped Value Start And End will throw an exception.
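
The escaping rules above can be sketched in a few lines. This is a language-neutral illustration written in Python, not Cesil's implementation or API: escape_value and its parameters are hypothetical names, and the defaults assume the typical CSV settings described above.

```python
def escape_value(value, separator=",", quote='"', escape='"'):
    """Render one cell, wrapping it in `quote` only when it needs escaping.

    Hypothetical helper: `quote` stands in for Escaped Value Start And End,
    and `escape` for the character written before an embedded `quote`.
    Pass None for an option that is not set.
    """
    specials = [separator, "\r", "\n"]
    if quote is not None:
        specials.append(quote)
    if not any(s in value for s in specials):
        return value  # nothing special in the value, so it is written as-is
    if quote is None:
        raise ValueError("value requires escaping, but no start/end character is set")
    if quote in value:
        if escape is None:
            raise ValueError("value contains the start/end character, but no escape character is set")
        value = value.replace(quote, escape + quote)
    return quote + value + quote


print(escape_value("Hello, World"))        # "Hello, World"
print(escape_value('Billy "Bob" Bobson'))  # "Billy ""Bob"" Bobson"
```

The sketch deliberately ignores one case: when the escape character differs from the start/end character and itself appears in the value.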

Read Header

When reading, whether or not a header is expected to be present, as a ReadHeader - one of:

  • ReadHeader.Never
  • ReadHeader.Always
  • ReadHeader.Detect

If ReadHeader.Never is set and a header is present, Cesil will raise an exception.

If ReadHeader.Always is set and a header is not present, Cesil will raise an exception.

If ReadHeader.Detect is set, Cesil will automatically detect whether a header is present.

For ReadHeader.Always and ReadHeader.Detect the order of columns in the header will influence the members set when reading, while with ReadHeader.Never whatever is returned by the configured Type Describer will be used unaltered.

This option is required.
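
To illustrate how a header influences which member each cell sets, here is a rough Python sketch. It is not Cesil code: read_rows is a hypothetical helper, describer_members stands in for the member order returned by a type describer, and the parsing naively splits on commas without handling escaping.

```python
def read_rows(lines, describer_members, header_present):
    """Pair each cell with a member name (naive: splits on ',' only).

    With a header, the header's column order decides the pairing.
    Without one, the describer's member order is used unaltered.
    """
    rows = [line.split(",") for line in lines]
    if header_present:
        columns, rows = rows[0], rows[1:]
    else:
        columns = describer_members
    return [dict(zip(columns, row)) for row in rows]


members = ["Name", "Age"]  # hypothetical order reported by a type describer

# The header lists Age first, so the first cell of each row is treated as Age.
print(read_rows(["Age,Name", "42,Ann"], members, header_present=True))

# No header: cells are matched to members in the describer's order.
print(read_rows(["Ann,42"], members, header_present=False))
```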

Write Header

Whether or not to write headers, as a WriteHeader - one of:

  • WriteHeader.Always
  • WriteHeader.Never

The order of columns in a header is determined by the configured Type Describer.

This option is required.

Type Describer

Which ITypeDescriber to use to control what is read or written for a given type or dynamic instance.

DefaultTypeDescriber implements "normal" .NET behavior, and should accommodate most needs.

This option is required.

Write Trailing Row Ending

Whether or not to end the last written row with a row ending, as a WriteTrailingRowEnding - one of:

  • WriteTrailingRowEnding.Always
  • WriteTrailingRowEnding.Never

This option is required.
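
The difference between the two values is easiest to see in the produced text. The following Python sketch is illustrative only: write_rows is a hypothetical helper, and its trailing flag plays the role of WriteTrailingRowEnding.Always (True) or WriteTrailingRowEnding.Never (False).

```python
def write_rows(rows, row_ending="\r\n", trailing=True):
    """Join already-formatted rows, optionally ending the last one too."""
    text = row_ending.join(rows)
    if trailing and rows:
        text += row_ending
    return text


print(repr(write_rows(["a,b", "c,d"], trailing=True)))   # 'a,b\r\nc,d\r\n'
print(repr(write_rows(["a,b", "c,d"], trailing=False)))  # 'a,b\r\nc,d'
```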

Memory Pool Provider

The IMemoryPoolProvider Cesil will use during normal operations.

While there are cases where Cesil still has to allocate "normally" (via new or similar), considerable effort is taken to avoid allocations. For cases where temporary scratch space (including the buffers documented below) is needed, it is rented from MemoryPool<T>s obtained from the configured IMemoryPoolProvider.

This option is required.

Write Buffer Size Hint

Cesil can buffer writes before flushing to the underlying stream, pipe writer, buffer writer, etc. This can result in increased performance, but requires some amount of memory be allocated per-writer.

Setting the write buffer size hint to 0 will disable write buffering; setting it to null will make Cesil use a "reasonable" default value.

This option is not required.

Comment Character

Comments are a non-standard extension to CSV, and are present in some other DSV formats.

Cesil supports whole line comments which are prefixed by a specific character, typically #.

This option is not required, but if set it must not be the same as Value Separator or Escaped Value Start And End.
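
The constraints stated so far for the separator, escaping, and comment options can be gathered into a single check. The Python sketch below only restates the rules from this page; validate and its parameter names are hypothetical, and how Cesil itself performs or reports this validation is not shown here.

```python
def validate(value_separator, escaped_start_end=None, escape_char=None, comment_char=None):
    """Raise ValueError if the combination breaks a rule described above.

    None means the corresponding optional setting is not set.
    """
    if not value_separator:
        raise ValueError("Value Separator is required")
    first = value_separator[0]
    if escaped_start_end is not None and first == escaped_start_end:
        raise ValueError("Value Separator must not start with Escaped Value Start And End")
    if comment_char is not None and first == comment_char:
        raise ValueError("Value Separator must not start with Comment Character")
    if escaped_start_end is not None and escaped_start_end == comment_char:
        raise ValueError("Escaped Value Start And End must differ from Comment Character")
    if escape_char is not None and escaped_start_end is None:
        raise ValueError("an escape character requires Escaped Value Start And End")


validate(",", escaped_start_end='"', escape_char='"', comment_char="#")  # typical CSV, passes
validate("\t")                                                            # bare TSV, passes
```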

Read Buffer Size Hint

Cesil reads from the underlying stream, pipe reader, read only sequence, etc. into read buffers.

This setting is used to influence the size of the read buffers. If set to 0, a reasonable default size is used instead.

This option is not required.

Dynamic Row Disposal

When reading dynamic rows, resources may be allocated to back those rows. This option controls when those resources are released with a DynamicRowDisposal.

For DynamicRowDisposal.OnReaderDispose all rows returned by an IReader<dynamic> or IAsyncReader<dynamic> are automatically disposed when the reader is disposed.

For DynamicRowDisposal.OnExplicitDispose each row must be explicitly disposed with a call to Dispose().

This option is required.

Whitespace Treatment

How to handle whitespace encountered while reading rows, with a WhitespaceTreatments. Whitespace is defined as any char for which Char.IsWhiteSpace(char) returns true.

This can be divided into two categories:

  • What to do with whitespace "outside" of values
  • What to do with whitespace "inside" values
    • Note that if a cell isn't escaped (i.e. it's not wrapped in quotes, for CSV) then there's no distinction between "inside" and "outside"

When there's whitespace "outside" of values, it is possible that the data cannot be parsed without specifying a WhitespaceTreatments. For example hello, "world" is often considered malformed CSV as " should be the first character in an escaped cell.

When there is whitespace inside of values, it is possible that converting from text to a more specific data type can fail. For example, mapping the second cell in foo," 123" to an int would fail under many circumstances because of the leading whitespace. Specifying a custom Parser can handle this case, but Cesil can also do this automatically.

Set whitespace treatment to some combination of WhitespaceTreatments.TrimBeforeValues, WhitespaceTreatments.TrimAfterValues, WhitespaceTreatments.TrimLeadingInValues, and WhitespaceTreatments.TrimTrailingInValues to handle cases like those.

WhitespaceTreatments.Preserve is used to indicate that no special treatment of whitespace is desired.

WhitespaceTreatments.TrimInValues trims both leading and trailing whitespace in values.

WhitespaceTreatments.TrimBetweenValues trims leading and trailing whitespace outside of values.

WhitespaceTreatments.Trim trims leading and trailing whitespace both inside and outside of values.

This option is not required, which results in the equivalent of WhitespaceTreatments.Preserve.
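
The way the individual flags combine can be modeled with a small Python sketch. It mirrors the names on this page, but the numeric values, the read_cell helper, and the simplified quote handling are assumptions made for illustration, and Python's notion of whitespace in lstrip and rstrip is only an approximation of Char.IsWhiteSpace(char).

```python
from enum import Flag


class WhitespaceTreatments(Flag):
    # Numeric values are illustrative assumptions, not Cesil's.
    Preserve = 0
    TrimLeadingInValues = 1
    TrimTrailingInValues = 2
    TrimInValues = TrimLeadingInValues | TrimTrailingInValues
    TrimBeforeValues = 4
    TrimAfterValues = 8
    TrimBetweenValues = TrimBeforeValues | TrimAfterValues
    Trim = TrimInValues | TrimBetweenValues


def read_cell(raw, treatment, quote='"'):
    """Apply a treatment to one raw cell (ignores escaped quotes inside it)."""
    if treatment & WhitespaceTreatments.TrimBeforeValues:
        raw = raw.lstrip()   # whitespace "outside", before the value
    if treatment & WhitespaceTreatments.TrimAfterValues:
        raw = raw.rstrip()   # whitespace "outside", after the value
    if len(raw) >= 2 and raw[0] == quote and raw[-1] == quote:
        value = raw[1:-1]    # escaped cell: strip the wrapping quotes
    else:
        value = raw          # unescaped cell: no inside/outside distinction
    if treatment & WhitespaceTreatments.TrimLeadingInValues:
        value = value.lstrip()
    if treatment & WhitespaceTreatments.TrimTrailingInValues:
        value = value.rstrip()
    return value


print(repr(read_cell(' "world"', WhitespaceTreatments.TrimBeforeValues)))  # 'world'
print(int(read_cell('" 123"', WhitespaceTreatments.TrimLeadingInValues)))  # 123
```

In this model, Preserve leaves a cell such as ' "world"' untouched, which is why it would not be recognized as an escaped cell.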

Extra Column Treatment

When reading CSVs it is possible for there to be "extra" columns, and Cesil can be configured to respond to this in different ways.

Cesil expects each row to have a number of columns equal to or less than the number of columns in the header row (if present), the number of columns returned from the configured ITypeDescriber's EnumerateMembersToDeserialize(TypeInfo) method (if no header row is present and reading a static type), or the number of columns in the first row (if no header row is present and reading a dynamic type). Any additional columns past that are considered "extra".

Cesil can respond to extra columns in three ways, indicated with an ExtraColumnTreatment:

  • ExtraColumnTreatment.Ignore
    • The column will be silently ignored as if it were not present. It will not be parsed or passed to any user code.
    • Although ignored, the extra columns must still be validly formatted.
  • ExtraColumnTreatment.IncludeDynamic
    • If reading into a dynamic, the column will be included in the row. Because the column was unexpected, it will not be accessible by name but can be accessed by index and will be present if the dynamic row is treated as an enumerable.
    • For static types, this is equivalent to ExtraColumnTreatment.Ignore.
  • ExtraColumnTreatment.ThrowException
    • An exception will be thrown if any extra column is encountered.

This option is required.
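
The three responses can be summarized with a short Python model. This is a sketch of the behavior described above, not Cesil's code: apply_extra_columns, expected, and is_dynamic are hypothetical names, and a row is represented as a plain list of cells.

```python
from enum import Enum


class ExtraColumnTreatment(Enum):
    Ignore = "Ignore"
    IncludeDynamic = "IncludeDynamic"
    ThrowException = "ThrowException"


def apply_extra_columns(cells, expected, treatment, is_dynamic):
    """Return the cells a row would expose, given `expected` known columns."""
    if len(cells) <= expected:
        return list(cells)                # no extra columns present
    if treatment is ExtraColumnTreatment.ThrowException:
        raise ValueError(f"expected at most {expected} columns, found {len(cells)}")
    if treatment is ExtraColumnTreatment.IncludeDynamic and is_dynamic:
        return list(cells)                # extras kept, reachable by index only
    return list(cells[:expected])         # Ignore, or IncludeDynamic on a static type


row = ["a", "b", "c"]
print(apply_extra_columns(row, 2, ExtraColumnTreatment.Ignore, is_dynamic=True))          # ['a', 'b']
print(apply_extra_columns(row, 2, ExtraColumnTreatment.IncludeDynamic, is_dynamic=True))  # ['a', 'b', 'c']
```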

Building Options

OptionsBuilder follows the builder pattern, and in particular is patterned after .NET's Immutable Collections.

Creating An OptionsBuilder

You can obtain an OptionsBuilder using one of these static methods:

Setting Options

Each public property on Options and OptionsBuilder has at least one corresponding WithXXX(...) method on OptionsBuilder which takes the new option value, and returns the now-modified OptionsBuilder.

For example, to set the expected row ending when reading to \r, call OptionsBuilder.WithReadRowEnding(ReadRowEnding.CarriageReturn).

Creating Options

Call OptionsBuilder.ToOptions() to create a new Options instance with the current values on the OptionsBuilder.

After creating an Options, the OptionsBuilder can continue to be used - changes on the OptionsBuilder will not modify any previously created Options.
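
The relationship between a mutable builder and the immutable values it produces can be shown with a minimal Python stand-in. The classes below only borrow the names Options and OptionsBuilder for readability; they are not Cesil's types, they model just two settings, and the snake_case method names are Python analogues chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Options:
    """Immutable snapshot: frozen, and equatable by value."""
    value_separator: str
    read_row_ending: str


class OptionsBuilder:
    """Mutable builder: each with_* method changes it and returns it."""

    def __init__(self):
        self.value_separator = ","
        self.read_row_ending = "Detect"

    def with_value_separator(self, value):
        self.value_separator = value
        return self

    def with_read_row_ending(self, value):
        self.read_row_ending = value
        return self

    def to_options(self):
        # Copies the current values, so later builder changes are not seen.
        return Options(self.value_separator, self.read_row_ending)


builder = OptionsBuilder()
csv_options = builder.to_options()
tsv_options = builder.with_value_separator("\t").to_options()

print(csv_options.value_separator == ",")       # True: unaffected by the later change
print(csv_options == Options(",", "Detect"))    # True: compared by value
```

Copying values in to_options is what lets one builder be reused to derive several related configurations.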

Thread Safety

Options is immutable, while OptionsBuilder is mutable. Accordingly, all members on Options are always thread safe and no members on OptionsBuilder can be safely accessed from multiple threads simultaneously.