Options
- Introduction
- Available Options
  - Value Separator
  - Read Row Ending
  - Write Row Ending
  - Escaped Value Start And End
  - Escaped Value Escape Character
  - Read Header
  - Write Header
  - Type Describer
  - Write Trailing Row Ending
  - Memory Pool Provider
  - Write Buffer Size Hint
  - Comment Character
  - Read Buffer Size Hint
  - Dynamic Row Disposal
  - Whitespace Treatment
  - Extra Column Treatment
- Building Options
- Thread Safety
Introduction
`Options` is an immutable class, instances of which describe how to read and write a specific Delimiter-Separated Value (DSV) format. As an immutable class, `Options` instances are thread safe and equatable.
They are created using the `OptionsBuilder` class. `OptionsBuilder` instances are mutable, and therefore neither thread safe nor equatable.
Available Options
The available options are as follows.
Value Separator
The string used to separate different cells in a row.
For comma separated values (CSV) this is `,`, while for tab separated values (TSV) this is `\t`.
This option is required, and must not start with the same character as Escaped Value Start And End or Comment Character.
Read Row Ending
What character sequence ends a row when reading, as a `ReadRowEnding` - one of:
- `ReadRowEnding.CarriageReturn` (`\r`)
- `ReadRowEnding.LineFeed` (`\n`)
- `ReadRowEnding.CarriageReturnLineFeed` (`\r\n`)
- `ReadRowEnding.Detect`
Typically this value is `ReadRowEnding.Detect`.
This option is required.
Write Row Ending
What character sequence ends a row when writing, as a `WriteRowEnding` - one of:
- `WriteRowEnding.CarriageReturn` (`\r`)
- `WriteRowEnding.LineFeed` (`\n`)
- `WriteRowEnding.CarriageReturnLineFeed` (`\r\n`)
Typically this value is `WriteRowEnding.CarriageReturnLineFeed`.
This option is required.
Escaped Value Start And End
The character that starts and ends an escaped value. A cell's value needs escaping when it contains a special character, such as the row ending or the value separator.
For CSV this is typically `"`, so a value of `Hello, World` would be rendered as `"Hello, World"`.
This option is not required, but if set it must not be the same as Value Separator or Comment Character.
If not set, attempting to write a value that requires escaping will throw an exception.
Escaped Value Escape Character
If a cell is escaped but its value contains Escaped Value Start And End, this is the character written before each such occurrence to escape it.
For CSV this is typically also `"`, so a value of `Billy "Bob" Bobson` would be rendered as `"Billy ""Bob"" Bobson"`.
This option is not required, but if set then Escaped Value Start And End must also be set.
If not set, attempting to write a value that contains Escaped Value Start And End will throw an exception.
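
The interaction between the two escaping options can be sketched in a few lines. The Python below is a language-neutral illustration, not Cesil's implementation or API: `escape_value` and its parameters are hypothetical names, single-character settings are assumed, and `\r` and `\n` stand in for the configured row ending.

```python
def escape_value(value, separator=",", quote='"', quote_escape='"'):
    """Render one cell value the way the two escaping options describe.

    Illustrative only: `quote` models Escaped Value Start And End and
    `quote_escape` models Escaped Value Escape Character (None = not set).
    """
    specials = [separator, "\r", "\n"]
    if quote is not None:
        specials.append(quote)

    # Values without special characters are written unchanged.
    if not any(s in value for s in specials):
        return value

    if quote is None:
        raise ValueError("value requires escaping, but no start/end character is set")

    if quote in value:
        if quote_escape is None:
            raise ValueError("value contains the start/end character, but no escape character is set")
        # Prefix every start/end character inside the value with the escape character.
        value = value.replace(quote, quote_escape + quote)

    return quote + value + quote


print(escape_value("Hello, World"))
print(escape_value('Billy "Bob" Bobson'))
```

With the CSV-style defaults, the two calls reproduce the renderings given in the text above. The sketch does not handle an escape character that differs from the start/end character and also appears in the value.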
Read Header
When reading, whether or not a header is expected to be present, as a `ReadHeader` - one of:
- `ReadHeader.Never`
- `ReadHeader.Always`
- `ReadHeader.Detect`
With `ReadHeader.Never`, an exception is raised if a header is present.
With `ReadHeader.Always`, an exception is raised if a header is not present.
With `ReadHeader.Detect`, Cesil will automatically detect whether a header is present.
For `ReadHeader.Always` and `ReadHeader.Detect`, the order of columns in the header will influence the members set when reading, while with `ReadHeader.Never` whatever is returned by the configured Type Describer will be used unaltered.
This option is required.
Write Header
Whether or not to write headers, as a `WriteHeader` - one of:
- `WriteHeader.Always`
- `WriteHeader.Never`
The order of columns in a header is determined by the configured Type Describer.
This option is required.
Type Describer
Which `ITypeDescriber` to use to control what is read or written for a given type or `dynamic` instance.
`DefaultTypeDescriber` implements "normal" .NET behavior, and should accommodate most needs.
This option is required.
Write Trailing Row Ending
Whether or not to end the last written row with a row ending, as a `WriteTrailingRowEnding` - one of:
- `WriteTrailingRowEnding.Always`
- `WriteTrailingRowEnding.Never`
This option is required.
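
The effect of the write row ending and the trailing row ending on the final output can be shown with a short sketch. This is plain Python for illustration, not Cesil code: `join_rows` and its parameters are hypothetical, and rows are assumed to be already formatted as text.

```python
def join_rows(rows, row_ending="\r\n", trailing=True):
    """Join already-formatted rows into one document (illustrative only).

    `row_ending` models Write Row Ending; `trailing` models
    WriteTrailingRowEnding.Always (True) versus Never (False).
    """
    text = row_ending.join(rows)
    if trailing and rows:
        # Always: the last row is also terminated by a row ending.
        text += row_ending
    return text


print(repr(join_rows(["a,b", "1,2"])))
print(repr(join_rows(["a,b", "1,2"], trailing=False)))
```

The only difference between the two results is whether the document ends with the configured row ending, which matters to consumers that treat a final row ending as the start of an empty row.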
Memory Pool Provider
The `IMemoryPoolProvider` Cesil will use during normal operations.
While there are cases where Cesil still has to allocate "normally" (via `new` or similar), considerable effort is taken to avoid allocations. For cases where temporary scratch space (including the buffers documented below) is needed, it is rented from `MemoryPool<T>`s obtained from the configured `IMemoryPoolProvider`.
This option is required.
Write Buffer Size Hint
Cesil can buffer writes before flushing to the underlying stream, pipe writer, buffer writer, etc. This can result in increased performance, but requires some amount of memory to be allocated per writer.
Setting the write buffer size hint to `0` will disable write buffering; setting it to `null` will make Cesil use a "reasonable" default value.
This option is not required.
Comment Character
Comments are a non-standard extension to CSV, and are present in some other DSV formats.
Cesil supports whole line comments which are prefixed by a specific character, typically `#`.
This option is not required, but if set it must not be the same as Value Separator or Escaped Value Start And End.
Read Buffer Size Hint
Cesil reads from the underlying stream, pipe reader, read only sequence, etc. into read buffers.
This setting is used to influence the size of the read buffers. If set to `0`, a reasonable default size is used instead.
This option is not required.
Dynamic Row Disposal
When reading `dynamic` rows, resources may be allocated to back those rows. This option controls when those resources are released, as a `DynamicRowDisposal`.
For `DynamicRowDisposal.OnReaderDispose`, all rows returned by an `IReader<dynamic>` or `IAsyncReader<dynamic>` are automatically disposed when the reader is disposed.
For `DynamicRowDisposal.OnExplicitDispose`, each row must be explicitly disposed with a call to `Dispose()`.
This option is required.
Whitespace Treatment
How to handle whitespace encountered while reading rows, as a `WhitespaceTreatments`. Whitespace is defined as any `char` for which `Char.IsWhiteSpace(char)` returns `true`.
This can be divided into two categories:
- What to do with whitespace "outside" of values
- What to do with whitespace "inside" values
Note that if a cell isn't escaped (i.e. it's not wrapped in quotes, for CSV) then there's no distinction between "inside" and "outside".
When there's whitespace "outside" of values, it is possible that the data cannot be parsed without specifying a `WhitespaceTreatments`. For example, `hello, "world"` is often considered malformed CSV, as `"` should be the first character in an escaped cell.
When there is whitespace inside of values, it is possible that converting from text to a more specific data type can fail. For example, mapping the second cell in `foo," 123"` to an `int` would fail under many circumstances because of the leading whitespace. Specifying a custom `Parser` can handle this case, but Cesil can also do this automatically.
Set whitespace treatment to some combination of `WhitespaceTreatments.TrimBeforeValues`, `WhitespaceTreatments.TrimAfterValues`, `WhitespaceTreatments.TrimLeadingInValues`, and `WhitespaceTreatments.TrimTrailingInValues` to handle cases like those.
- `WhitespaceTreatments.Preserve` is used to indicate that no special treatment of whitespace is desired.
- `WhitespaceTreatments.TrimInValues` trims both leading and trailing whitespace in values.
- `WhitespaceTreatments.TrimBetweenValues` trims leading and trailing whitespace outside of values.
- `WhitespaceTreatments.Trim` trims leading and trailing whitespace both inside and outside of values.
This option is not required; leaving it unset is equivalent to `WhitespaceTreatments.Preserve`.
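
The combinable treatments can be modeled as bit flags. The following Python sketch is an illustration of the behavior described above, not Cesil's parser: the `Whitespace` enum and `read_cell` are hypothetical names, each cell is assumed to contain no escaped quotes, and Python's `str.strip` definition of whitespace is only an approximation of `Char.IsWhiteSpace(char)`.

```python
from enum import Flag


class Whitespace(Flag):
    """Hypothetical stand-in for the treatments named in the text."""
    PRESERVE = 0
    TRIM_BEFORE_VALUES = 1
    TRIM_AFTER_VALUES = 2
    TRIM_LEADING_IN_VALUES = 4
    TRIM_TRAILING_IN_VALUES = 8
    TRIM_BETWEEN_VALUES = TRIM_BEFORE_VALUES | TRIM_AFTER_VALUES
    TRIM_IN_VALUES = TRIM_LEADING_IN_VALUES | TRIM_TRAILING_IN_VALUES
    TRIM = TRIM_BETWEEN_VALUES | TRIM_IN_VALUES


def read_cell(raw, treatment, quote='"'):
    """Turn the raw text of one cell into its value under `treatment`."""
    # "Outside" whitespace sits between the separator and the quote.
    if treatment & Whitespace.TRIM_BEFORE_VALUES:
        raw = raw.lstrip()
    if treatment & Whitespace.TRIM_AFTER_VALUES:
        raw = raw.rstrip()

    escaped = len(raw) >= 2 and raw[0] == quote and raw[-1] == quote
    if not escaped and quote in raw:
        # e.g. ` "world"` with whitespace preserved: the quote is not first.
        raise ValueError("quote must be the first character of an escaped cell")
    value = raw[1:-1] if escaped else raw

    # "Inside" whitespace is part of the value itself.
    if treatment & Whitespace.TRIM_LEADING_IN_VALUES:
        value = value.lstrip()
    if treatment & Whitespace.TRIM_TRAILING_IN_VALUES:
        value = value.rstrip()
    return value


# Second cells of `hello, "world"` and `foo," 123"` respectively.
print(read_cell(' "world"', Whitespace.TRIM_BEFORE_VALUES))
print(read_cell('" 123"', Whitespace.TRIM_LEADING_IN_VALUES))
```

For an unescaped cell the sketch applies both groups of flags to the same text, which mirrors the note that "inside" and "outside" are indistinguishable in that case.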
Extra Column Treatment
When reading CSVs it is possible for there to be "extra" columns, and Cesil can be configured to respond to this in different ways.
Cesil expects each row to have a number of columns equal to or less than:
- the number of columns in the header row (if present),
- the number of columns returned from the configured `ITypeDescriber`'s `EnumerateMembersToDeserialize(TypeInfo)` method (if no header row is present and reading a static type), or
- the number of columns in the first row (if no header row is present and reading a dynamic type).
Any additional columns past that are considered "extra".
Cesil can respond to extra columns in three ways, indicated with an `ExtraColumnTreatment`:
- `ExtraColumnTreatment.Ignore`
  - The column will be silently ignored as if it were not present. It will not be parsed or passed to any user code.
  - Although ignored, the extra columns must still be validly formatted.
- `ExtraColumnTreatment.IncludeDynamic`
  - If reading into a `dynamic`, the column will be included in the row. Because the column was unexpected, it will not be accessible by name but can be accessed by index and will be present if the dynamic row is treated as an enumerable.
  - For static types, this is equivalent to `ExtraColumnTreatment.Ignore`.
- `ExtraColumnTreatment.ThrowException`
  - An exception will be thrown if any extra column is encountered.
This option is required.
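
The three treatments can be summarized as a small decision function. This Python sketch only mirrors the rules listed above and is not Cesil's implementation: `apply_extra_columns`, the string treatment names, and the `dynamic` flag are hypothetical, and `expected` is the column count determined as described earlier in this section.

```python
def apply_extra_columns(cells, expected, treatment, dynamic=False):
    """Return the cells a reader would surface for one row (illustrative only)."""
    if treatment not in ("Ignore", "IncludeDynamic", "ThrowException"):
        raise ValueError(f"unknown treatment: {treatment}")

    # Rows with the expected number of columns, or fewer, are unaffected.
    if len(cells) <= expected:
        return list(cells)

    if treatment == "ThrowException":
        raise ValueError(f"row has {len(cells) - expected} extra column(s)")

    if treatment == "IncludeDynamic" and dynamic:
        # Extra cells stay in the row, reachable by index only.
        return list(cells)

    # Ignore, or IncludeDynamic while reading a static type.
    return list(cells[:expected])


row = ["1", "2", "surprise"]
print(apply_extra_columns(row, 2, "Ignore"))
print(apply_extra_columns(row, 2, "IncludeDynamic", dynamic=True))
```

The sketch takes already-split cells, so it does not model the requirement that ignored columns must still be validly formatted.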
Building Options
`OptionsBuilder` follows the builder pattern, and in particular is patterned after .NET's Immutable Collections.
You can obtain an `OptionsBuilder` using one of these static methods:
- `OptionsBuilder.CreateBuilder()`
  - Creates an empty `OptionsBuilder`; you will need to set all required options.
- `OptionsBuilder.CreateBuilder(Options)`
  - Creates an `OptionsBuilder` that has copied its options from the given `Options`.
- `Options.CreateBuilder()`
  - A convenience alias for `OptionsBuilder.CreateBuilder()`.
- `Options.CreateBuilder(Options)`
  - A convenience alias for `OptionsBuilder.CreateBuilder(Options)`.
Each public property on `Options` and `OptionsBuilder` has at least one corresponding `WithXXX(...)` method on `OptionsBuilder` which takes the new option value, and returns the now-modified `OptionsBuilder`.
For example, to set the row ending expected when reading to `\r`, call `OptionsBuilder.WithReadRowEnding(ReadRowEnding.CarriageReturn)`.
Call `OptionsBuilder.ToOptions()` to create a new `Options` instance with the current values on the `OptionsBuilder`.
After creating an `Options`, the `OptionsBuilder` can continue to be used - changes on the `OptionsBuilder` will not modify any previously created `Options`.
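
The relationship between a mutable builder and the immutable, equatable instances it produces can be sketched outside of .NET. The Python below is a simplified model under stated assumptions, not Cesil's API: the classes carry only two of the options, the snake_case method names are hypothetical analogues of `WithXXX(...)` and `ToOptions()`, and the validation shown covers only the Value Separator rules described earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Options:
    """Immutable snapshot; frozen dataclasses compare by value."""
    value_separator: str
    comment_character: Optional[str] = None


class OptionsBuilder:
    """Mutable builder; each with_* method returns the builder for chaining."""

    def __init__(self, source=None):
        # Optionally copy starting values from an existing Options.
        self.value_separator = source.value_separator if source else None
        self.comment_character = source.comment_character if source else None

    def with_value_separator(self, value):
        self.value_separator = value
        return self

    def with_comment_character(self, value):
        self.comment_character = value
        return self

    def to_options(self):
        if not self.value_separator:
            raise ValueError("value separator is required")
        if (self.comment_character is not None
                and self.value_separator.startswith(self.comment_character)):
            raise ValueError("value separator must not start with the comment character")
        return Options(self.value_separator, self.comment_character)


builder = OptionsBuilder().with_value_separator(",").with_comment_character("#")
csv_options = builder.to_options()

# Reusing the builder does not affect the Options created earlier.
tsv_options = builder.with_value_separator("\t").to_options()
print(csv_options)
print(tsv_options)
```

Because `to_options` copies the current values into a frozen object, the earlier instance keeps its comma separator after the builder is changed.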
Thread Safety
`Options` is immutable, while `OptionsBuilder` is mutable. Accordingly, all members on `Options` are always thread safe, and no members on `OptionsBuilder` can be safely accessed from multiple threads simultaneously.