Understanding Serialization and Compression

Peter Goldstein edited this page Dec 6, 2021 · 4 revisions

When Dalli persists a value to a memcached server there are two steps in transforming the value from Ruby to a set of bytes stored in memcached. In order, these are serialization and compression. And by the same token, when a value is retrieved by Dalli from memcached it is transformed via decompression and deserialization, in that order. Information about the serialization and compression state of a stored value is contained in the bitflags (the "extra" bytes) that are stored with a value, so when retrieving a value you don't generally need to know how it was serialized and compressed on storage.
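The ordering described above can be sketched with Ruby's standard library alone. This is a minimal illustration, not Dalli's implementation: the helper names to_stored and from_stored are hypothetical, and Marshal plus Zlib DEFLATE stand in for the serializer and compressor.

```ruby
require 'zlib'

# Storage direction: serialize first, then compress.
# (Hypothetical helper, not part of Dalli.)
def to_stored(value)
  serialized = Marshal.dump(value)
  Zlib::Deflate.deflate(serialized)
end

# Retrieval direction: decompress first, then deserialize.
# (Hypothetical helper, not part of Dalli.)
def from_stored(bytes)
  decompressed = Zlib::Inflate.inflate(bytes)
  Marshal.load(decompressed)
end

value = { 'user' => 'alice', 'roles' => %w[admin dev] }
stored = to_stored(value)
restored = from_stored(stored)

puts restored == value
```

The two helpers are mirror images: whatever step is applied last on storage has to be undone first on retrieval.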

As a Dalli user, you can control both of these steps on a per-client and, sometimes, on a per-request basis. This allows more fine-grained control over the values that are stored in memcached, as well as supporting cross-compatibility with other libraries and languages.

In addition, serialization and compression options may impact what commands are available from the Dalli client. Some commands can only be used with values that are stored in an uncompressed, "raw" form. More on that below.

Serialization

By default, all values that are stored by Dalli in memcached are serialized using Ruby language serialization. Dalli uses the Marshal class to serialize and deserialize values by default. The output bytes from this process include data about the serialized class, which can be used to reconstitute the class instance when the data is deserialized.

As an example, the string "Hello" is serialized by Marshal as "\x04\bI\"\nHello\x06:\x06ET". So if you stored the value "Hello" under some key via Dalli, and then looked at the bytes on the wire, or retrieved the corresponding key from memcached, by default you'd see the serialized bytes "\x04\bI\"\nHello\x06:\x06ET".
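The bytes above can be reproduced directly with Marshal, without a memcached server. In this short sketch the string is explicitly encoded as UTF-8, because Marshal records the string's encoding in its output (the trailing E and T bytes).

```ruby
# Explicitly use UTF-8 so the output does not depend on the environment.
value = 'Hello'.encode(Encoding::UTF_8)

bytes = Marshal.dump(value)

# Show the serialized form and its size in bytes.
puts bytes.inspect
puts bytes.bytesize

# Deserializing restores an equal string.
puts Marshal.load(bytes) == value
```

The five-character string takes 15 bytes once serialized; the extra bytes are the Marshal version header, the type and length markers, and the encoding flag.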

There are two ways to change this behavior.

Raw Values

The first is to set the raw option to true on either the Dalli client or the specific set invocation. This bypasses Ruby language level serialization, and simply calls to_s on the passed-in value when it is serialized. All values deserialized from memcached by Dalli in this mode are treated as strings.

As an example, the value "Hello" will be stored as the five bytes [72, 101, 108, 108, 111] (note that there is no terminating null byte).

Similarly, both the integer value 500 and the string "500" will be stored as a sequence of three bytes - [53, 48, 48]. Note that this means that there is no way to distinguish between an integer and its corresponding string.
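The byte sequences above are simply what to_s produces. The sketch below models that with a hypothetical helper, raw_bytes, which is not part of Dalli; it only mimics the to_s conversion described for raw mode.

```ruby
# Hypothetical helper: mimic raw mode by calling to_s and
# treating the result as plain bytes.
def raw_bytes(value)
  value.to_s.b
end

p raw_bytes('Hello').bytes
p raw_bytes(500).bytes
p raw_bytes('500').bytes

# The integer and the string are indistinguishable once stored.
puts raw_bytes(500) == raw_bytes('500')
```

Because the type information is gone, code that reads a raw value back has to convert it itself, for example with to_i.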

In addition to enabling cross-compatibility with non-Ruby clients, raw storage is required by certain commands: append, prepend, incr, and decr can only be used with raw values. These commands require memcached to transform the stored value and, as such, memcached must be able to interpret the value as a string or an integer. A value that is stored using Ruby language serialization is, by definition, opaque to memcached.

If you need to perform similar functionality on non-raw values, you will need to execute it client-side. That is, you will need to get the value from memcached, update the value in your Ruby code, and replace the existing value with the new value.
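The get, update, and replace cycle described above might look like the following. FakeClient is a hypothetical in-memory stand-in so the sketch runs without a memcached server, and update_client_side is an illustrative helper, not a Dalli method.

```ruby
# Hypothetical in-memory stand-in for a client; only get and set
# are modeled, with no TTLs, flags, or networking.
class FakeClient
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Illustrative helper: fetch the value, let the block compute the
# new value in Ruby, then write the result back under the same key.
def update_client_side(client, key)
  current = client.get(key)
  updated = yield(current)
  client.set(key, updated)
  updated
end

client = FakeClient.new
client.set('tags', %w[ruby])
update_client_side(client, 'tags') { |tags| tags + ['memcached'] }

p client.get('tags')
```

Unlike the server-side commands, this cycle is not atomic: another client can write to the key between the get and the set. Where that matters, a compare-and-swap approach such as Dalli's cas method may be a better fit.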

Custom Serializer

The second is to specify a custom serializer. This custom serializer needs to respond to the methods dump and load, which respectively serialize a value to bytes for storage in memcached and deserialize those bytes back into a value. This alternate serializer can be passed in via the serializer option at the client level.

Please note that if you use a custom serializer, you need to ensure that you always retrieve values that you stored with the custom serializer with a client that is configured with the same (or a compatible) serializer. While memcached stores whether or not a serializer was used with the stored value, it does not store WHICH serializer was used. So if you store a value with a custom serializer, the serializer used cannot be determined from the data in memcached. It must be configured in the Dalli client.
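As a sketch of the dump and load contract, here is a hypothetical JSON-based serializer built on Ruby's standard json library. The module name JsonSerializer is an assumption made for illustration; it is not something Dalli ships.

```ruby
require 'json'

# Hypothetical custom serializer: responds to dump and load,
# the two methods a custom serializer is expected to provide.
module JsonSerializer
  # Turn a Ruby value into a JSON string for storage.
  def self.dump(value)
    JSON.generate(value)
  end

  # Turn a stored JSON string back into a Ruby value.
  def self.load(bytes)
    JSON.parse(bytes)
  end
end

original = { 'name' => 'dalli', 'count' => 3 }
stored = JsonSerializer.dump(original)

puts stored
puts JsonSerializer.load(stored) == original
```

An object like this would be supplied through the serializer client option. Note that JSON does not preserve every Ruby type (symbol keys come back as strings, for example), which is one reason the reading client needs a compatible serializer configured.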

Compression

Dalli supports compression of values stored in memcached. This is on by default, but can be disabled for storage operations on either a per-client or per-request basis using the compress option. Setting this option to false will disable compression for a storage operation. Note that this option is ignored on retrieval operations, as the compression status of the previously stored value is recorded in the bitflags for the value.

Compressing values consisting of a small number of bytes is usually counterproductive - the notionally "compressed" value may even wind up larger than the uncompressed value. So Dalli only compresses values whose serialized size (see above) is greater than a user-configurable threshold. This threshold can be adjusted using the Dalli client option compression_min_size, which has a default value of 4096 bytes (4 KiB). Values whose serialized size in bytes (or just the raw number of bytes, if the raw option is being used) exceeds this threshold will be compressed. Values whose size is under this threshold will not be compressed.
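The effect of the threshold can be sketched with Zlib from the standard library. The helper maybe_compress and the constant MIN_SIZE are hypothetical names, and the handling of a value exactly at the threshold is an assumption made for this sketch rather than a statement about Dalli's internals.

```ruby
require 'zlib'

# Mirrors the documented default for compression_min_size.
MIN_SIZE = 4096

# Hypothetical helper: compress only when the value is larger than
# the threshold. Returns the bytes plus a flag saying whether
# compression was applied.
def maybe_compress(bytes, min_size = MIN_SIZE)
  return [bytes, false] unless bytes.bytesize > min_size

  [Zlib::Deflate.deflate(bytes), true]
end

small = 'Hello'
large = 'a' * 10_000

# A tiny value grows when deflated, so skipping it is the better choice.
puts Zlib::Deflate.deflate(small).bytesize > small.bytesize

_, small_compressed = maybe_compress(small)
large_bytes, large_compressed = maybe_compress(large)

puts small_compressed
puts large_compressed
puts large_bytes.bytesize < large.bytesize
```

The first check shows why the threshold exists: the compression format adds fixed overhead that a five-byte value cannot recover.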

Compression is done by the Compressor object for the client. This is an object that implements the compress and decompress methods, which respectively compress a set of bytes and decompress a set of bytes. The Compressor class can be customized by setting the compressor option on the Dalli client.

The default Compressor for Dalli is Dalli::Compressor and uses Zlib DEFLATE compression. Dalli also includes a compressor class Dalli::GzipCompressor that uses gzip compression. Custom compressors are very straightforward to write.
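To illustrate how small a custom compressor can be, here is a hypothetical one that uses Zlib at its highest compression level. The module name BestDeflateCompressor is an assumption for this sketch; the only requirement taken from the text above is that it implements compress and decompress.

```ruby
require 'zlib'

# Hypothetical custom compressor: implements the compress and
# decompress methods expected of a compressor object.
module BestDeflateCompressor
  # Compress bytes with DEFLATE at the highest compression level.
  def self.compress(data)
    Zlib::Deflate.deflate(data, Zlib::BEST_COMPRESSION)
  end

  # Reverse the compression applied by compress.
  def self.decompress(data)
    Zlib::Inflate.inflate(data)
  end
end

original = 'dalli ' * 1_000
compressed = BestDeflateCompressor.compress(original)

puts compressed.bytesize < original.bytesize
puts BestDeflateCompressor.decompress(compressed) == original
```

A module like this would be supplied through the compressor client option. As with serializers, values written with one compressor need to be read by a client configured with a compatible one.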