
Idea: adaptive compression #75

Closed
annulen opened this issue Nov 19, 2015 · 22 comments
Comments

@annulen

annulen commented Nov 19, 2015

I need to send large core dumps from an embedded device to a server, and do it in minimal time. I've found zstd to be the optimal solution because it is very fast and has a good compression ratio. I'm using it like this:

zstd  -c | curl -T - $url

where the kernel fills zstd's stdin.

However, if the user has narrow upload bandwidth, it could be beneficial to switch from fast compression to high compression, which in my case is about 30% more effective. For example, if during the first 10 seconds zstd detects that compression throughput is N times larger than the write speed (to stdout in this case), it would automatically switch to high compression and keep using it until the end of the input stream.

Would such feature make sense?
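
As a rough illustration of the rule described above, the decision could look like the following C sketch; the names (sample_window, pick_level) and the warm-up measurement are hypothetical, not part of zstd.

    #include <stddef.h>

    /* Throughput observed during a warm-up window (e.g. the first 10 seconds). */
    typedef struct {
        size_t bytes_compressed;   /* input consumed by the compressor */
        size_t bytes_written;      /* compressed output flushed to stdout */
        double seconds;            /* wall-clock length of the window */
    } sample_window;

    static int pick_level(const sample_window *w, int fast_level, int high_level, double n)
    {
        double comp_speed  = (double)w->bytes_compressed / w->seconds;
        double write_speed = (double)w->bytes_written    / w->seconds;
        /* If compression outruns the output pipe by a factor of n, the pipe is
         * the bottleneck: spend the idle CPU on a stronger compression level. */
        return (comp_speed > n * write_speed) ? high_level : fast_level;
    }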

@Cyan4973
Contributor

Yes. I have had this idea in mind for quite some time.

I believe it is first necessary to implement multi-threading, in order to perform I/O and compression in parallel. Then it seems possible to measure and compare both speeds.

That's not an easy feature, so it will take some time to get it right.

@stati64

stati64 commented Dec 1, 2015

We do this with:

  1. a monotone list of Pareto convex frontier compressor+parameter pairs,
     e.g. raw, lz4+256, lz4+32, lz4+2, zstd, lzlib+0, lzlib+3
  2. an idle-time goal, e.g. 5%.
  3. 16-step pulse-width modulation between adjacent members of 1 (a usage sketch follows after this list), e.g.

     typedef unsigned char  u08;
     typedef unsigned short u16;
     typedef unsigned int   u32;

     typedef union {
         u32 level;
         struct {
             u08 smooth2:4;    /* smoothing */
             u08 blend2:4;     /* blending adjacent compressors */
             u08 basec2:8;     /* base compressor */
             u16 fill2:16;     /* unused */
         } bit2;
     } blender;
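
As a usage sketch of the PWM idea (assumed semantics, not stati64's actual code): out of every 16 blocks, blend2 of them use the next compressor on the frontier and the rest use the base compressor basec2.

    /* Frontier ordered from fastest/weakest to slowest/strongest. */
    enum { C_RAW, C_LZ4_256, C_LZ4_32, C_LZ4_2, C_ZSTD, C_LZLIB_0, C_LZLIB_3 };

    /* basec2 and blend2 correspond to the fields of the blender union above;
     * assumes basec2 is not the last member of the frontier. */
    static int pick_compressor(unsigned basec2, unsigned blend2, unsigned block_index)
    {
        unsigned phase = block_index % 16;          /* position in the 16-step PWM cycle */
        return (phase < blend2) ? (int)basec2 + 1   /* duty cycle: next (stronger) member */
                                : (int)basec2;      /* remainder: base member */
    }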

@renevdzee

I've been thinking about the same thing for about a year and came up with the following algorithm to optimize for the slowest resource in the chain:
Create n+2 threads: n compression threads, 1 input thread, and 1 output thread.

  • the input thread reads data blocks from the input and puts them in an input buffer
  • the output thread reads compressed blocks from an output buffer and writes them to the output
  • the compression threads claim and compress blocks from the input buffer, where the compression level selection is based on how full the input and output buffers are.

When the input buffer is below a certain level or the output buffer is near full, a higher compression level is selected. With a full input buffer or an empty output buffer, a faster compression level is chosen, et cetera.

When used over a network connection, whenever there is congestion in the network or on the receiving side (CPU or I/O), the output buffer will fill up and the compression threads will switch to a higher compression ratio, thus optimizing for the highest end-to-end throughput.

regards, René
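
A minimal sketch of the selection rule described above; the thresholds and names are illustrative, not taken from an actual implementation. Fill levels are fractions from 0.0 (empty) to 1.0 (full).

    static int adapt_level(int level, double input_fill, double output_fill,
                           int min_level, int max_level)
    {
        /* Input starving or output backing up: compression is not the
         * bottleneck, so spend more CPU for a better ratio. */
        if (input_fill < 0.25 || output_fill > 0.75) {
            if (level < max_level) level++;
        }
        /* Input piling up or output draining: compression is the bottleneck,
         * so drop to a faster level. */
        else if (input_fill > 0.90 || output_fill < 0.10) {
            if (level > min_level) level--;
        }
        return level;
    }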

@Cyan4973
Contributor

@renevdzee's algorithm is very good, and basically what I had in mind.

That being said, in a streaming scenario, the number of changes that can be made to compression parameters while preserving the compression context is very small. Currently, among the multiple parameters accessible through the _advanced() functions, only one, searchLog, can be safely modified between blocks while preserving the frame compression context. Another one, strategy, can sometimes be changed, but not always, which makes it more difficult to adapt. This still gives some room for adapting speed, but it cannot cover the full range from level 1 to 20.

Once these restrictions are accepted, it becomes possible to adapt @renevdzee's methodology.
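
To make the restriction concrete, here is a conceptual sketch: within a frame, only searchLog is freely adjustable, so adaptation nudges that single knob rather than jumping across full compression levels. The function below is purely illustrative; it is not a zstd API.

    static unsigned adapt_search_log(unsigned search_log, int output_backed_up,
                                     unsigned min_log, unsigned max_log)
    {
        if (output_backed_up && search_log < max_log)
            return search_log + 1;   /* more search effort: slower, better ratio */
        if (!output_backed_up && search_log > min_log)
            return search_log - 1;   /* less search effort: faster */
        return search_log;
    }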

@annulen
Author

annulen commented Dec 23, 2015

@Cyan4973: What do you mean by "strategy can sometimes be changed, but not always"? Are there any specific conditions?

@Cyan4973
Contributor

Oh yes, it's a bit complex though.

"fast" and "btlazy2" have their own specific table layout, incompatible with other strategies. So it's not possible to swap them "on the fly".
The 3 remaining strategies have compatible table layouts, so could be swapped.

Now that I'm thinking again about it, "btlazy2" has additional requirements : it's possible to reduce "searchLog" on the fly without breaking the tables, but increasing this value is less clear.
I believe it's possible, because the search ends by cleaning the tail of the tree, effectively reducing search length for future searches and inserts, so it should be okay, but without a test, there are remaining risks...

@annulen
Author

annulen commented Dec 23, 2015

I don't care much about btlazy2, but I'm going to start from fast by default.

@annulen
Author

annulen commented Dec 23, 2015

In my particular case I think it may be possible to proceed without preserving the compression context: just choose the "best" params for the next 100MB chunk (edit: 5MB of output) and start from scratch.

@Cyan4973
Contributor

Yes, 100 MB is a fairly large size; it's pretty reasonable to use that value to cut the input stream into independent chunks.

@annulen
Author

annulen commented Dec 23, 2015

I wrote 100 MB at first, but then realized that my data (a core dump) is heterogeneous, so it's more reasonable to split by output size instead.
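
A minimal sketch of the "independent chunks" approach, using zstd's one-shot API. For simplicity it cuts by input size; cutting by output size, as suggested above, would need the streaming API instead. next_level() stands in for whatever adaptive policy is used; the concatenated frames produced this way remain decodable by zstd -d.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    #define CHUNK_SIZE (100 * 1024 * 1024)     /* 100 MB of input per frame */

    static int next_level(void) { return 3; }  /* placeholder adaptive policy */

    static int compress_in_chunks(FILE *in, FILE *out)
    {
        char  *src    = malloc(CHUNK_SIZE);
        size_t dstCap = ZSTD_compressBound(CHUNK_SIZE);
        char  *dst    = malloc(dstCap);
        if (!src || !dst) { free(src); free(dst); return -1; }

        size_t readBytes;
        while ((readBytes = fread(src, 1, CHUNK_SIZE, in)) > 0) {
            /* Each chunk becomes an independent frame: no context is preserved,
             * so the compression level can change freely between chunks. */
            size_t cSize = ZSTD_compress(dst, dstCap, src, readBytes, next_level());
            if (ZSTD_isError(cSize)) { free(src); free(dst); return -1; }
            fwrite(dst, 1, cSize, out);
        }
        free(src); free(dst);
        return 0;
    }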

@annulen
Author

annulen commented Dec 26, 2015

Is it possible to reuse only the dictionary instead of the full compression context?

@Cyan4973
Contributor

Cyan4973 commented Dec 28, 2015

@annulen : in theory, yes, it is.

The problem is that loading a dictionary is, in itself, a costly operation.
The higher the compression ratio, the more it costs.

In the end, most of the time might be spent just loading dictionaries, so it does not seem interesting as a speed optimisation.

@chadnickbok

I'd argue that dynamic compression should remain outside the scope of this library.

Instead, an approach similar to that of realtime video compression libraries (like x264) would be more appropriate: focus on providing an API that has calls for changing things like compression complexity on the fly, and let others build tools that implement algorithms for specific scenarios.

@X-Ryl669

Another use case would be a backup project where some statistics are computed from the buffer to be compressed, and depending on those statistics, one would use either the low or the high compression ratio (maybe 2 instances?). In that case, you don't select based on time spent or FIFO pressure but rather on the data themselves. If the entropy is high, it's very likely the input data is already compressed (or not compressible), and it's not worth burning CPU on the high compression algorithm. Conversely, if the data is redundant, then it makes sense to try to compress it harder.

Said differently: spend your effort where it matters most.
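
A small sketch of that data-driven selection: estimate the byte entropy of a sample and skip expensive compression when the data already looks incompressible. The 7.5 bits/byte threshold is an arbitrary illustration.

    #include <math.h>
    #include <stddef.h>

    static double byte_entropy(const unsigned char *buf, size_t len)
    {
        size_t counts[256] = {0};
        if (len == 0) return 0.0;
        for (size_t i = 0; i < len; i++) counts[buf[i]]++;
        double h = 0.0;
        for (int s = 0; s < 256; s++) {
            if (counts[s] == 0) continue;
            double p = (double)counts[s] / (double)len;
            h -= p * log2(p);
        }
        return h;                    /* bits per byte, in [0, 8] */
    }

    static int level_for_block(const unsigned char *buf, size_t len,
                               int fast_level, int high_level)
    {
        return (byte_entropy(buf, len) > 7.5) ? fast_level   /* likely already compressed */
                                              : high_level;  /* redundant: worth the CPU */
    }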

@paulcruz74
Contributor

I've recently been working on a new utility that acts as a wrapper for ZSTD and provides an algorithm for adapting the compression level based on data flow. The utility is currently included in the repository under the contrib directory as adaptive-compression.

The algorithm works by breaking the data into 40MB chunks and creating three threads (reading, compressing, writing). Each 40MB chunk becomes a compression “job”. For each job, a set of two buffers is used: the reader thread fills the first buffer with 40MB worth of data, the compression thread compresses it into the second buffer, and finally the writer thread writes from the second buffer. Because the reader and writer threads operate on separate buffers, they run concurrently and only have to wait on the compression thread. Conversely, the compression thread must wait on both the reader and the writer threads.

The compression level is determined in the compression thread right before actually compressing the data. At a high level, if the compression thread was previously waiting, compression is too fast and we can increase the compression level. Similarly, if the writer or reader thread is waiting, we are compressing too slowly and should decrease the compression level. I have also implemented a few mechanisms to determine how much of its job each thread had completed before another had to wait. This can be helpful in a situation where, say, the compression thread was waiting on the writer thread, but the writer thread was already 99% finished.
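
The decision rule described above can be summarized with the following sketch (illustrative only, not the actual contrib/adaptive-compression source); "completion" is how far along (0.0 to 1.0) the other side was when a thread had to wait.

    static int adjust_level(int level,
                            int compressor_waited, double io_completion,
                            int io_waited,         double comp_completion,
                            int min_level, int max_level)
    {
        /* Compression finished early and waited on I/O that was still far from
         * done: compression is too fast, so raise the level. */
        if (compressor_waited && io_completion < 0.95 && level < max_level)
            return level + 1;
        /* Reader/writer waited on compression that was still far from done:
         * compression is too slow, so lower the level. */
        if (io_waited && comp_completion < 0.95 && level > min_level)
            return level - 1;
        return level;   /* e.g. the waited-on thread was already ~99% finished */
    }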

So far, I have tested the tool with artificial tests (limited pipe speeds), as well as with some ssh tests that copy files over a network. In scenarios where the static compression level is too low (compression faster than the pipeline speed), the algorithm improves the compression ratio. Furthermore, the tool does not bottleneck the pipeline (which would occur when the static compression level is too high).

Since this is a long-standing request, it would be interesting to see if anyone has any feedback, questions, or thoughts on the tool.

@annulen
Author

annulen commented Aug 8, 2017

Do you have any source code publicly available?

In my use case (compression of core dumps), data is read from a pipe. I wonder whether it's still a good idea to use a separate reading thread in this case. Also, the chunk size needs to be tunable.

@paulcruz74
Contributor

The code is available in contrib/adaptive-compression with an included Makefile for building it.

@data-man
Contributor

data-man commented Dec 17, 2017

> The code is available in contrib/adaptive-compression with an included Makefile for building it.

It does not compile; there are several errors (on the latest dev branch).

@Cyan4973
Contributor

Fair enough.
We haven't integrated /contrib projects into CI, and as a consequence, do not regularly check if they still compile. That could be improved.

@annulen
Author

annulen commented Jan 10, 2018

It would be nice to have the contrib/adaptive-compression program combined with contrib/pzstd.

@Cyan4973
Contributor

Cyan4973 commented Sep 25, 2018

The feature will be integrated into the regular zstd command line utility, as the --adapt command.
See #1327.

@Cyan4973
Contributor

Cyan4973 commented Oct 5, 2018

The --adapt feature is now present in zstd v1.3.6.
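
For example, the pipeline from the opening comment can now be written as zstd --adapt -c | curl -T - $url; the adaptation range can optionally be bounded with --adapt=min=#,max=#.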
