rixed/dessser

Code generator tailored for data manipulation

Can generate (de)serializers, converters, filters...

Supports several back-ends and various external data formats.

As of today:

Backends:

  • OCaml
  • C++

External formats:

  • ClickHouse's RowBinary
  • Ramen's ringbuffers
  • CSV
  • S-Expressions
  • JSON

Suggested reading order

  1. DessserTypes.ml: defines the supported data types and operators to manipulate them

Dessser supports most compound data types, up to and including sum and product types (a.k.a. tagged unions and tuples). Recursive types are supported to some extent. There is also limited support for user-defined types. There is no support for type parameters, though. In other words, users cannot define polymorphic types.

Types are organized in two abstraction layers:

  • The types that can store user-manipulable values belong to the type named typ. Most of those types can be (de)serialized and manipulated in many ways (the exceptions being the types used to implement the serializers themselves, such as the pointer types).

  • Values (of some value type) are often optional (a.k.a. null or unknown), so the maybe_nullable type extends the typ type with a boolean indicating whether these values can be null.

Note that NULL in dessser behaves like SQL's NULL rather than like the option types of ML-family languages (Haskell's Maybe or OCaml's option), in that any combination of NULLs collapses into one; for instance, NotNull (NULL) is not a valid value.
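To make the two layers and the NULL rule concrete, here is a minimal OCaml sketch. The constructor names, the record fields and the helpers required, optional and to_nullable are assumptions made for illustration and are not taken from DessserTypes.ml; only the idea of pairing a type with a nullability boolean comes from the description above.

```ocaml
(* Hypothetical, simplified model: NOT dessser's actual definitions. *)
type typ =
  | Bool
  | I32
  | String
  | Tup of maybe_nullable list             (* product type *)
  | Sum of (string * maybe_nullable) list  (* tagged union *)

(* Second layer: a type plus a flag telling whether NULL is allowed. *)
and maybe_nullable = { typ : typ ; nullable : bool }

let required typ = { typ ; nullable = false }
let optional typ = { typ ; nullable = true }

(* Nullability is a single boolean, so "nullable of nullable" cannot be
   expressed: making an already nullable type nullable changes nothing. *)
let to_nullable mn = { mn with nullable = true }

(* A non-nullable pair of a nullable string and a non-nullable boolean. *)
let pair = required (Tup [ optional String ; required Bool ])

let () =
  (* SQL-like behavior: NULLs collapse into one. *)
  assert (to_nullable (optional I32) = optional I32) ;
  (* By contrast, OCaml's option nests: [Some None] differs from [None]. *)
  assert ((Some None : int option option) <> None)
```

With a boolean flag there is simply no way to write the equivalent of Some None, which is the sense in which NULLs collapse in this sketch.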

  2. DessserExpressions.ml: although, for technical reasons, the type of expressions expr is defined in DessserTypes.ml, most functions on expressions are defined in DessserExpressions.

The level of abstraction offered by the expression language tries to maintain a good balance between simplicity for the user and for the back-end.

  3. DessserBackEndOCaml.ml implements the OCaml back-end (the simplest)

  4. Dessser.ml implements the (de)serializers and converters (parameterized with encodings)

Given a data type and an encoding, Dessser.ml can generate a serializer, a deserializer, a converter between two encodings, etc.

Note that converters generated by Dessser do not store intermediary values in memory: instead of deserializing the whole value into the heap and then serializing it, they perform the conversion piecewise, so that the full value is never actually materialized. This saves both time and memory. See DessserHeapValue to build such a fully-fledged value in memory.
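The difference between piecewise conversion and conversion through a heap value can be illustrated with a toy example. The sketch below does not use Dessser's API: the function names and the two toy formats (comma-separated fields in, a parenthesized space-separated list out) are assumptions chosen for brevity.

```ocaml
(* Hypothetical sketch, not dessser's API. *)

(* Conversion through the heap: the whole list of fields is built in
   memory before anything is written out. *)
let convert_via_heap (input : string) : string =
  let fields = String.split_on_char ',' input in
  "(" ^ String.concat " " fields ^ ")"

(* Piecewise conversion: each field is copied to the output as soon as
   its end is found; no intermediate list of fields is ever built. *)
let convert_piecewise (input : string) (out : Buffer.t) : unit =
  let len = String.length input in
  Buffer.add_char out '(' ;
  let rec loop start first =
    if start <= len then begin
      (* End of the current field: next comma, or end of input. *)
      let stop =
        match String.index_from_opt input start ',' with
        | Some i -> i
        | None -> len in
      if not first then Buffer.add_char out ' ' ;
      Buffer.add_substring out input start (stop - start) ;
      if stop < len then loop (stop + 1) false
    end in
  loop 0 true ;
  Buffer.add_char out ')'

let () =
  let out = Buffer.create 16 in
  convert_piecewise "1,2,3" out ;
  (* Both strategies produce the same output. *)
  assert (Buffer.contents out = convert_via_heap "1,2,3")
```

Both functions yield the same result; the piecewise one only ever looks at one field at a time, which is the property the paragraph above describes for the generated converters.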

  5. DessserSExpr.ml implements the simplest encoding: s-expressions

  6. DessserHeapValue.ml implements the special encoding for values stored in memory

Deserializing a value consists of converting from a serial buffer into a "reified" value in memory. DessserHeapValue constructs an expression that will build that value (in any chosen back-end).

Likewise, serializing a value consists of building an expression that iterates through a value in memory and writes it into a buffer.

Additionally, DessserHeapValue can build an expression that computes the size of the serialized value without serializing it (which comes in handy if the buffer must be preallocated).
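As an illustration of computing a serialized size without serializing, here is a small OCaml sketch. It is an assumption-laden toy: the value type, the text format and the functions serialize and sersize are hypothetical, and they operate directly on values, whereas DessserHeapValue builds expressions that a back-end later compiles.

```ocaml
(* Hypothetical sketch, not dessser's API. *)
type value =
  | VInt of int
  | VStr of string          (* assumed to contain no double quotes *)
  | VTup of value list

(* Write a value into a buffer using a toy s-expression-like syntax. *)
let rec serialize buf = function
  | VInt i -> Buffer.add_string buf (string_of_int i)
  | VStr s ->
      Buffer.add_char buf '"' ;
      Buffer.add_string buf s ;
      Buffer.add_char buf '"'
  | VTup vs ->
      Buffer.add_char buf '(' ;
      List.iteri (fun i v ->
        if i > 0 then Buffer.add_char buf ' ' ;
        serialize buf v) vs ;
      Buffer.add_char buf ')'

(* Number of decimal digits of [n], ignoring the sign. *)
let rec digits n =
  if n > -10 && n < 10 then 1 else 1 + digits (n / 10)

(* Size in bytes of what [serialize] would write, without writing it. *)
let rec sersize = function
  | VInt i -> digits i + (if i < 0 then 1 else 0)
  | VStr s -> String.length s + 2
  | VTup vs ->
      let separators = max 0 (List.length vs - 1) in
      2 + separators + List.fold_left (fun acc v -> acc + sersize v) 0 vs

let () =
  let v = VTup [ VInt (-42) ; VStr "hi" ; VTup [] ] in
  (* Preallocate exactly the needed size, then serialize. *)
  let buf = Buffer.create (sersize v) in
  serialize buf v ;
  assert (Buffer.length buf = sersize v)
```

The two functions walk the value in the same way, which is why the size can be known up front and used to allocate the buffer once.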

  7. DessserStdLib.ml implements various meta-functions, generating expressions from expressions and acting like a library for Dessser's intermediary language.
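The notion of a meta-function, that is, an ordinary OCaml function that takes expressions and returns a new expression, can be sketched as follows. The tiny expr type, the meta-functions sum and dot, and the to_ocaml printer standing in for a back-end are all hypothetical and far simpler than Dessser's real intermediary language.

```ocaml
(* Hypothetical sketch, not dessser's expression language. *)
type expr =
  | Int of int              (* assumed non-negative in this toy *)
  | Add of expr * expr
  | Mul of expr * expr

(* Meta-functions: they do not compute numbers, they build expressions. *)
let sum (es : expr list) : expr =
  List.fold_left (fun acc e -> Add (acc, e)) (Int 0) es

let dot (xs : expr list) (ys : expr list) : expr =
  sum (List.map2 (fun x y -> Mul (x, y)) xs ys)

(* A toy "back-end": print an expression as OCaml source code. *)
let rec to_ocaml = function
  | Int i -> string_of_int i
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (to_ocaml a) (to_ocaml b)
  | Mul (a, b) -> Printf.sprintf "(%s * %s)" (to_ocaml a) (to_ocaml b)

(* A reference interpreter, handy to check what the expression means. *)
let rec eval = function
  | Int i -> i
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

let () =
  let e = dot [ Int 1 ; Int 2 ] [ Int 3 ; Int 4 ] in
  print_endline (to_ocaml e) ;
  assert (eval e = 11)
```

Because sum and dot only manipulate the expression tree, the same meta-functions could feed any back-end that knows how to print or compile expr.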