Compressing and decompressing with dictionaries, between different zstd versions #3802

Closed

ktsaou opened this issue Oct 27, 2023 · 3 comments

ktsaou commented Oct 27, 2023

Hi,

Thank you for ZSTD! Amazing work!

I am the founder of Netdata (https://github.com/netdata/netdata).

We have added ZSTD support to Netdata's streaming feature, to get better compression than the LZ4 we were using so far. Everything looks great!

To improve the scalability of Netdata, we want the best compression and speed, and ZSTD dictionaries seem very appealing.

But unfortunately, Netdata agents may be built with different versions of ZSTD. So I have the following questions about using dictionaries:

  1. Will different versions of ZSTD still be compatible (compress with any version, decompress with any other version)?

  2. Since decompression requires the dictionary, I see two options for sharing the dictionaries between the compressor and the decompressor (see the sketch after this list):

    • share the samples, so that the compressor and the decompressor train the dictionaries from the same samples.
      With different versions of ZSTD on the compressor and the decompressor, I assume the dictionaries will still be compatible as long as the source samples are exactly the same.
    • train the dictionary on the compressor and share the binary dictionary with the decompressor.
      For this to work, the binary dictionary needs to be compatible across different versions of the compressor and decompressor, and even across different architectures (big/little endian).
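
For the training step itself, a minimal sketch using zstd's dictionary builder (ZDICT_trainFromBuffer from zdict.h); the helper name and the error handling are illustrative assumptions:

```c
#include <zdict.h>   /* ZDICT_trainFromBuffer, ZDICT_isError */
#include <stdio.h>

/* Train a dictionary from collected message samples. The resulting
 * dictBuf bytes are what would be shipped to the decompressor side. */
size_t train_dict(void* dictBuf, size_t dictCapacity,
                  const void* samplesBuf,     /* all samples, concatenated */
                  const size_t* sampleSizes,  /* size of each sample       */
                  unsigned nbSamples)
{
    size_t const dictSize = ZDICT_trainFromBuffer(dictBuf, dictCapacity,
                                                  samplesBuf, sampleSizes,
                                                  nbSamples);
    if (ZDICT_isError(dictSize)) {
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
        return 0;
    }
    return dictSize;  /* actual dictionary size, <= dictCapacity */
}
```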

Keep in mind that the way we will use dictionaries is somewhat dynamic. For example, every few minutes the compressor will take a sample of the data transferred, retrain the dictionary, and switch to the new one. Of course, this will be synchronized with the decompressor (running any other version of ZSTD, older or newer), either as samples or as a binary trained dictionary.

What do you think?


ktsaou commented Oct 27, 2023

Copied from #3610 (comment) (I accidentally commented on the wrong thread).

Here is our situation at scale: we do about 1 million compressions/decompressions per second to move data between Netdata agents, using a few hundred compressors and decompressors. It is not one pipe, it is not one stream. It is hundreds of streams, each sending thousands of very small messages (usually less than 100 bytes) per second, which need to be transmitted on time (low latency).

We are using ZSTD in streaming mode, which we never reset, but we flush the output buffer of the compressor after every compressed message, since we need to send it as soon as possible.
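
For concreteness, the per-message flush looks roughly like this with zstd's streaming API (a sketch; the helper name is illustrative, and dstCapacity is assumed large enough for one flushed message):

```c
#include <zstd.h>

/* Compress one small message on a long-lived stream and flush it, so the
 * compressed bytes can be sent immediately (error handling shortened). */
size_t compress_and_flush(ZSTD_CCtx* cctx,
                          const void* msg, size_t msgSize,
                          void* dst, size_t dstCapacity)
{
    ZSTD_inBuffer  in  = { msg, msgSize, 0 };
    ZSTD_outBuffer out = { dst, dstCapacity, 0 };
    size_t remaining;
    do {
        /* ZSTD_e_flush: consume the input and flush all compressed data.
         * Assumes dstCapacity suffices; otherwise send and reset out.pos. */
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_flush);
        if (ZSTD_isError(remaining)) return 0;  /* handle properly in real code */
    } while (remaining != 0);  /* 0 means everything was flushed */
    return out.pos;  /* number of compressed bytes ready to send */
}
```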

The content itself has many variable components (values), but also a lot of fixed components (keywords). It is text.

We can take care of the transport required to move dictionaries or samples around, and also synchronize the compressors and decompressors to always use the same dictionaries (the versioning part you mentioned), so this is not a problem for us.

I don't know whether dictionaries would actually provide their benefits in our case. We see a noticeable decrease in CPU consumption and an increase in compression ratio over time (e.g. after a day, CPU consumption is about 10% lower, reached almost gradually), but I can't be sure whether this is due to some change in the data, or ZSTD learning to compress it more efficiently over time.

If dictionaries can provide a benefit in streaming mode, the compatibility of dictionaries across versions of ZSTD is crucial for us.

@Cyan4973 Cyan4973 self-assigned this Oct 31, 2023
@Cyan4973 (Contributor) commented

> Different versions of ZSTD will still be compatible (compress with any version, decompress with any other version)?

Yes. The format is frozen in RFC 8878.

> share the samples, so that the compressor and the decompressor will train the dictionaries using the same samples.

Nope. This assumes that, given the same sample sets, the same dictionary will be generated. That is not guaranteed. It may turn out to be accidentally true for a limited set of versions and platforms used in testing, but that's not a safe guarantee to build upon.

> train the dictionary on the compressor and share the binary dictionary with the decompressor.

Yes, that's the more common approach.

> the binary dictionary needs to be compatible among different versions of the compressor and decompressor and even between different architectures (big/little endian).

Yes, the dictionary format is also defined in RFC 8878, so it's guaranteed to be interoperable.
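
In practice, both sides would digest the same shared bytes once and reuse them across many messages; a sketch (the helper name and compression level 3 are illustrative choices):

```c
#include <zstd.h>
#include <stddef.h>

/* Digest the shared dictionary bytes once on each side, then reuse the
 * digested forms. Free later with ZSTD_freeCDict / ZSTD_freeDDict. */
void setup_dicts(const void* dictBuf, size_t dictSize,
                 ZSTD_CDict** cdict, ZSTD_DDict** ddict)
{
    *cdict = ZSTD_createCDict(dictBuf, dictSize, 3);  /* compressor side   */
    *ddict = ZSTD_createDDict(dictBuf, dictSize);     /* decompressor side */
}
```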

> Here is our situation at scale:
> We are using ZSTD in streaming mode, which we never reset
> It is hundreds of streams, each sending thousands of very small messages

OK, so you are currently using Streaming compression.
And you are considering Dictionary instead of, or on top of, Streaming.

Rule of thumb: except in specific corner cases, Streaming compression is expected to always compress better than Dictionary compression.
Dictionary compression bridges the gap between independent frames and streaming.

So, if compression ratio is the only parameter that matters, it will be difficult to beat Streaming compression with a Dictionary. You could imagine a combination of Streaming + Dictionary, which should compress even better, but depending on the total size of each stream (the sum of all these little messages), the final benefits might be too underwhelming to be worth the complexity, since the dictionary is mostly effective during the first kilobytes of the stream.
There might be an additional saving in this scenario though, thanks to the header statistics computed and saved into the Dictionary: because individual messages are so small (~100 bytes) and constantly flushed, they may never allow transmission of more optimal header statistics. So who knows, maybe the final savings could end up being significant in this case.
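
If one wanted to try that combination, referencing a pre-digested dictionary on the streaming contexts would look roughly like this (a sketch; the dictionary must be set before the first message of the stream):

```c
#include <zstd.h>

/* Attach a pre-digested dictionary to a long-lived streaming context;
 * subsequent ZSTD_compressStream2() calls on this cctx will use it. */
int start_stream_with_dict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
{
    size_t const r = ZSTD_CCtx_refCDict(cctx, cdict);
    return ZSTD_isError(r) ? -1 : 0;
    /* decompressor side, symmetrically: ZSTD_DCtx_refDDict(dctx, ddict); */
}
```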

The downside of Streaming is the need to maintain a state, on both sides, for each stream.
If there are many streams (hundreds are mentioned), this can result in a significant memory budget.
If this memory budget is a problem, then Dictionary compression (without Streaming) is a pretty good alternative.
It will probably not compress as well as Streaming, but it may be close enough, while slashing the memory cost: the system then only needs as many states as there are active compression/decompression sessions in parallel, which is typically orders of magnitude lower (depending on the application).
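
In that alternative, each message becomes an independent frame compressed against the shared dictionary, so no per-stream state is kept between messages; a sketch (the contexts could come from a small pool sized to concurrent operations, not streams):

```c
#include <zstd.h>

/* One independent frame per message, using the shared dictionary. */
size_t compress_msg(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict,
                    void* dst, size_t dstCap,
                    const void* msg, size_t msgSize)
{
    return ZSTD_compress_usingCDict(cctx, dst, dstCap, msg, msgSize, cdict);
}

size_t decompress_msg(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict,
                      void* dst, size_t dstCap,
                      const void* frame, size_t frameSize)
{
    return ZSTD_decompress_usingDDict(dctx, dst, dstCap, frame, frameSize, ddict);
}
```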


ktsaou commented Nov 1, 2023

Great! Thank you very much for answering this. We will do some experiments to see if streaming with dictionaries improves the situation in compression and memory.

ktsaou closed this as completed Nov 1, 2023