kevin-montrose edited this page Apr 10, 2021 · 20 revisions

Cesil

Introduction

Cesil is a modern .NET library for reading and writing Delimiter-Separated Values (DSVs), the most common of which is Comma-Separated Values (CSVs).

Cesil supports reading and writing both static and dynamic data types, synchronously and asynchronously. Cesil requires .NET Core 3.0+.

Quick Start

  1. Install the latest Cesil from NuGet.
  2. Add using Cesil; to your C# file.
  3. Use one of the EnumerateXXX(...) or WriteXXX(...) methods on CesilUtils to read or write.

Continue reading for the more configurable ways to use Cesil.

Example: Reading Synchronously

Using a convenience method:

using Cesil;

// ...

using(TextReader reader = /* some TextReader */)
{
  IEnumerable<MyType> rows = CesilUtils.Enumerate<MyType>(reader);
}

More explicitly, and more configurably, using a configuration and options:

using Cesil;

// ...

Options myOptions = /* ... */
IBoundConfiguration<MyType> myConfig = Configuration.For<MyType>(myOptions);

using(TextReader reader = /* ... */)
using(IReader<MyType> csv = myConfig.CreateReader(reader))
{
  IEnumerable<MyType> rows = csv.EnumerateAll();
}

For more detail, see Reading.

Example: Reading Asynchronously

Using a convenience method:

using Cesil;

// ...

using(TextReader reader = /* some TextReader */)
{
  IAsyncEnumerable<MyType> rows = CesilUtils.EnumerateAsync<MyType>(reader);
}

More explicitly, and more configurably, using a configuration and options:

using Cesil;

// ...

Options myOptions = /* ... */
IBoundConfiguration<MyType> myConfig = Configuration.For<MyType>(myOptions);

using(TextReader reader = /* ... */)
await using(IAsyncReader<MyType> csv = myConfig.CreateAsyncReader(reader))
{
  IAsyncEnumerable<MyType> rows = csv.EnumerateAllAsync();
}

For more detail, see Reading.

Example: Writing Synchronously

Using a convenience method:

using Cesil;

// ...

IEnumerable<MyType> myRows = /* ... */

using(TextWriter writer = /* .. */)
{
  CesilUtils.Write(myRows, writer);
}

More explicitly, and more configurably, using a configuration and options:

using Cesil;

// ...

IEnumerable<MyType> myRows = /* ... */

Options myOptions = /* ... */
IBoundConfiguration<MyType> myConfig = Configuration.For<MyType>(myOptions);

using(TextWriter writer = /* ... */)
using(IWriter<MyType> csv = myConfig.CreateWriter(writer))
{
  csv.WriteAll(myRows);
}

For more detail, see Writing.

Example: Writing Asynchronously

Using a convenience method:

using Cesil;

// ...

// IAsyncEnumerable<MyType> will also work
IEnumerable<MyType> myRows = /* ... */

using(TextWriter writer = /* .. */)
{
  await CesilUtils.WriteAsync(myRows, writer);
}

More explicitly, and more configurably, using a configuration and options:

using Cesil;

// ...

// IAsyncEnumerable<MyType> will also work
IEnumerable<MyType> myRows = /* ... */

Options myOptions = /* ... */
IBoundConfiguration<MyType> myConfig = Configuration.For<MyType>(myOptions);

using(TextWriter writer = /* ... */)
await using(IAsyncWriter<MyType> csv = myConfig.CreateAsyncWriter(writer))
{
  await csv.WriteAllAsync(myRows);
}

For more detail, see Writing.

Supported Data Streams

Cesil can read and write to a number of different "stream" types, classes or interfaces that conceptually model streams of data.

For synchronously reading with the IReader<TRow> interface, Cesil supports:

For asynchronously reading with the IAsyncReader<TRow> interface, Cesil supports:

For synchronously writing with the IWriter<TRow> interface, Cesil supports:

For asynchronously writing with the IAsyncWriter<TRow> interface, Cesil supports:

Customization

By default when dealing with concrete types, Cesil will use Options.Default which assumes:

  • You're working with CSVs
    • The value delimiter is a ,
  • Rows end in \r\n
  • Cells can be escaped with "
  • Within a cell, " can be escaped with another "
  • Headers are optional, and thus automatically detected, when reading
  • Headers are always written
  • Types are (de)serialized in keeping with "normal" .NET conventions
  • The final row will NOT be terminated with a new line when writing
  • Allocations come out of MemoryPool<char>.Shared
  • Comments are not supported
  • Write buffering is enabled, but no size hint is given
  • Read buffering (which is always enabled) is not given a size hint
  • Dynamically read rows are disposed when the IReader<TRow> or IAsyncReader<TRow> that last returned them is disposed
  • Whitespace is preserved
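
To make the default delimiter and escaping rules above concrete, here is a minimal sketch of how a single row would be laid out under them. It does not use Cesil, and EscapeCell and JoinRow are hypothetical helpers written for illustration only; Cesil's own decision about when to wrap a cell in quotes may differ in detail.

```csharp
using System;

public static class CsvEscapingSketch
{
    // Hypothetical helper (not part of Cesil): wraps a cell in " when it
    // contains the delimiter, a quote, or a line break, and doubles any
    // embedded " so it can be told apart from the closing quote.
    public static string EscapeCell(string cell)
    {
        bool needsEscape =
            cell.Contains(',') || cell.Contains('"') ||
            cell.Contains('\r') || cell.Contains('\n');

        if (!needsEscape)
        {
            return cell;
        }

        return "\"" + cell.Replace("\"", "\"\"") + "\"";
    }

    // Joins escaped cells with the default value delimiter, a comma.
    public static string JoinRow(params string[] cells)
        => string.Join(",", Array.ConvertAll(cells, c => EscapeCell(c)));
}
```

For example, JoinRow("a", "b,c", "say \"hi\"") produces a,"b,c","say ""hi""".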

When dealing with dynamic (i.e. using configurations obtained via Configuration.ForDynamic), Cesil will use Options.DynamicDefault, which is identical to Options.Default except that headers are not optional when reading and are assumed to be present.
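
A dynamic read might look like the following sketch. It assumes that Configuration.ForDynamic() can be called without arguments, that dynamic rows expose columns by header name, and that cells convert on an explicit cast; the Count column and the SumCounts helper are made up for illustration. Rows are consumed inside the using block because, as noted in the defaults above, dynamic rows are disposed along with the reader that returned them.

```csharp
using System;
using System.IO;
using Cesil;

public static class DynamicReadSketch
{
    // Hypothetical helper: sums a "Count" column from CSV text with a header row.
    public static int SumCounts(string csvText)
    {
        // Options.DynamicDefault is used here, so a header row is assumed to be present.
        IBoundConfiguration<dynamic> config = Configuration.ForDynamic();

        var total = 0;
        using (TextReader reader = new StringReader(csvText))
        using (IReader<dynamic> csv = config.CreateReader(reader))
        {
            foreach (dynamic row in csv.EnumerateAll())
            {
                // Assumption: member access by header name, converted by the cast.
                total += (int)row.Count;
            }
        }

        return total;
    }
}
```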

Every method on Configuration accepts an optional Options, which will be used instead of the defaults documented above if set. Custom Options can be built with an OptionsBuilder. Options are immutable, thread-safe, and can be safely used by many readers or writers at the same time.
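
As a sketch of building custom Options, the example below reads tab-separated values instead of CSV. The builder calls (Options.CreateBuilder, WithValueSeparator, ToOptions) are written from memory of the OptionsBuilder API and should be checked against its documentation; Widget and ReadTsv are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Cesil;

public class Widget
{
    public string Name { get; set; } = "";
    public int Count { get; set; }
}

public static class TsvOptionsSketch
{
    public static List<Widget> ReadTsv(string tsvText)
    {
        // Assumption: start from the defaults and change only the value delimiter.
        Options tsvOptions =
            Options.CreateBuilder(Options.Default)
                .WithValueSeparator("\t")
                .ToOptions();

        // Options are immutable, so this configuration can be cached and shared.
        IBoundConfiguration<Widget> config = Configuration.For<Widget>(tsvOptions);

        using (TextReader reader = new StringReader(tsvText))
        using (IReader<Widget> csv = config.CreateReader(reader))
        {
            // Materialize the rows before the reader is disposed.
            return csv.EnumerateAll().ToList();
        }
    }
}
```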

Type Describers

Cesil's default Options will use TypeDescribers.Default, as noted above. This ITypeDescriber is a shared instance of DefaultTypeDescriber, which implements "normal" .NET conventions around (de)serializing.

You can read more about what Cesil considers "normal" here, but in brief:

  • Any constructed types must have a parameterless constructor
  • Public properties are (de)serialized, provided they have public setters (for reading) and getters (for writing), and their type has a default formatter and a default parser.
  • Any properties or fields with DataMemberAttribute are (de)serialized, subject to the same setter/getter and type rules as above
  • Name, Order, and IsRequired on DataMemberAttribute are respected
  • Any properties or fields with IgnoreDataMemberAttribute are ignored
  • ShouldSerializeXXX() and ResetXXX() methods are discovered and used for (de)serialized properties
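
A row type shaped to fit the conventions listed above might look like this sketch. Purchase and its members are hypothetical; the attributes come from System.Runtime.Serialization, and the comments restate the rules above rather than anything verified against Cesil's implementation.

```csharp
using System;
using System.Runtime.Serialization;

public class Purchase
{
    // Public property with a public getter and setter: (de)serialized by default.
    public int Id { get; set; }

    // Name, Order, and IsRequired on DataMemberAttribute are respected.
    [DataMember(Name = "customer_name", Order = 1, IsRequired = true)]
    public string Customer { get; set; } = "";

    // Explicitly excluded from reading and writing.
    [IgnoreDataMember]
    public string InternalNote { get; set; } = "";

    public string Coupon { get; set; } = "";

    // ShouldSerializeXXX() convention: discovered by name for the Coupon property.
    public bool ShouldSerializeCoupon() => !string.IsNullOrEmpty(Coupon);

    // A parameterless constructor is required for constructed types.
    public Purchase() { }
}
```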

The above behavior applies to concrete types. When (de)serializing in a dynamic context, Cesil falls back to that behavior for types that don't participate in the Dynamic Language Runtime.

DefaultTypeDescriber is extensible, with a number of virtual methods documented here. Cesil actually works in terms of the ITypeDescriber interface, which allows for behaviors that are completely unrelated to the "normal" .NET way.

Source Generators

Cesil can be used without runtime code generation using Source Generators, which were added in C# 9.

To use them, you must add the Cesil.SourceGenerator NuGet package and attach various attributes to the types you want to read and write. Most of the customization options provided by Cesil can be achieved with Cesil.SourceGenerator, although there are some additional restrictions (primarily not being able to use delegates, and not being able to ignore member accessibility) due to the nature of source generators.

A simple example for reading a type:

using System;
using Cesil;

namespace Foo 
{   
    [GenerateDeserializer]
    public class ReadMe
    {
        [DeserializerMember(ParserType=typeof(ReadMe), ParserMethodName=nameof(ForInt))]
        public int Bar { get; set; }
        [DeserializerMember(ParserType=typeof(ReadMe), ParserMethodName=nameof(ForString))]
        public string Fizz = "";
        
        private DateTime _Hello;
        [DeserializerMember(Name="Hello", ParserType=typeof(ReadMe), ParserMethodName=nameof(ForDateTime))]
        public void SomeMtd(DateTime dt) 
        { 
            _Hello = dt;
        }

        public DateTime GetHello()
        => _Hello;

        public ReadMe() { }

        public static bool ForInt(ReadOnlySpan<char> data, in ReadContext ctx, out int val)
        => int.TryParse(data, out val);

        public static bool ForString(ReadOnlySpan<char> data, in ReadContext ctx, out string val)
        {
            val = new string(data);
            return true;
        }

        public static bool ForDateTime(ReadOnlySpan<char> data, in ReadContext ctx, out DateTime val)
        => DateTime.TryParse(data, out val);
    }
}

And one for writing a type:

using System;
using System.Buffers;
using Cesil;

namespace Foo 
{   
    [GenerateSerializer]
    public class WriteMe
    {
        [SerializerMember(FormatterType=typeof(WriteMe), FormatterMethodName=nameof(ForInt))]
        public int Bar { get; set; }
        [SerializerMember(FormatterType=typeof(WriteMe), FormatterMethodName=nameof(ForString))]
        public string Fizz = "";
        [SerializerMember(Name="Hello", FormatterType=typeof(WriteMe), FormatterMethodName=nameof(ForDateTime))]
        public DateTime SomeMtd() => new DateTime(2020, 11, 15, 0, 0, 0);

        public WriteMe() { }

        public static bool ForInt(int val, in WriteContext ctx, IBufferWriter<char> buffer)
        {
            var span = buffer.GetSpan(100);
            if(!val.TryFormat(span, out var written))
            {
                return false;
            }

            buffer.Advance(written);
            return true;
        }

        public static bool ForString(string val, in WriteContext ctx, IBufferWriter<char> buffer)
        {
            var span = buffer.GetSpan(val.Length);
            val.AsSpan().CopyTo(span);

            buffer.Advance(val.Length);
            return true;
        }

        public static bool ForDateTime(DateTime val, in WriteContext ctx, IBufferWriter<char> buffer)
        {
            var span = buffer.GetSpan(4);
            if(!val.Year.TryFormat(span, out var written))
            {
                return false;
            }

            buffer.Advance(written);
            return true;
        }
    }
}

Both examples include custom parsers or formatters, but this is just for demonstration's sake. If you do not specify parsers or formatters, a default parser or default formatter will be used if one exists.

More details can be read in Source Generators.

Performance

Cesil focuses on ease of use and flexibility, but performance is important, especially as .NET is increasingly adopted for more performance-critical code. A set of benchmarks can be found in Cesil's repo that cover the following cases:

The benchmarks for static operations use CsvHelper as a baseline comparison and the dynamic benchmarks compare to Cesil's static equivalent. Each benchmark has an InitializeAndTest() method that assures, in DEBUG builds, that the output of the compared variants is identical.

A sample run of all benchmarks (including some for internal implementation details) can be found in the repo. A select subset of those results for static operations is shown below.

Benchmarking reading and writing for different row types and row counts

Dynamic operations aim to be no more than 3x slower than their static equivalent, but this naturally varies based on exactly what is done with the returned dynamic rows.

Contributing

Cesil is open source, under the MIT license.

Cesil intentionally exploits new functionality found in C# 8 and .NET Core 3.0+. Earlier frameworks are not supported, by design.

Further reading: