Performance comparison with rust-snappy #1
Comments
Yeah! I added snappy to the benchmarks. The short answer is that this still appears to be much faster than snappy. Granted, this is only using compression level 3 for gzip, which is definitely on the faster side. When using the
Raw criterion output.
So just offloading to another thread improves speed ~2x, and adding more threads helps to a point. So I think it's safe to say that this is faster. However, I don't know if snappy's format allows for similar abuse to gzip's in concatenating blocks together. If it does, it could easily be added / receive the same treatment as gzip here.
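For context, the gzip "abuse" mentioned above works because the gzip format permits multiple members in one stream: blocks compressed independently can simply be concatenated and the result is still a valid file. A minimal sketch (in Python, since its standard library ships a `gzip` module; the thread itself is about Rust crates):

```python
import gzip

# Compress two chunks independently, as separate gzip members.
part_a = gzip.compress(b"hello ")
part_b = gzip.compress(b"world")

# Concatenating the members yields a valid multi-member gzip stream.
combined = part_a + part_b

# A conforming decompressor reads all members back-to-back.
assert gzip.decompress(combined) == b"hello world"
```

This is exactly what makes it safe to hand each block to a different thread: no shared state is needed between members, at the cost of each block starting with a fresh dictionary.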
Right --- so with Snappy, if you use the "FrameEncoder" feature, you can concatenate compressed chunks from multiple frames. See the issue we raised over in that repo. This was what allowed us to use Snappy in the first place; if we had to serialize the compression, that would have been a non-starter for our use case. However, I think the details of the multi-threaded implementation are a bit different here. In that case, each thread has its own frame and compresses with respect to it, and the order in the output file is simply whatever order those threads finish and happen to write in. Here, however, you have a strict FIFO order on things being passed to the compressor for compression / writing.
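The strict-FIFO design described above can be sketched in a few lines: submit chunks to a worker pool in order, then collect and write the results in that same submission order, so output order never depends on which worker finishes first. This is a hypothetical illustration (again in Python with stdlib `gzip`), not the library's actual implementation:

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def compress_fifo(chunks, workers=4):
    """Compress chunks on a thread pool, writing results in FIFO order.

    Each chunk becomes its own gzip member, so the concatenated output
    is itself a valid multi-member gzip stream.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit in order; futures remember their submission order.
        futures = [pool.submit(gzip.compress, chunk) for chunk in chunks]
        # Joining in submission order enforces strict FIFO output,
        # regardless of which worker thread finishes first.
        return b"".join(f.result() for f in futures)

out = compress_fifo([b"alpha ", b"beta ", b"gamma"])
assert gzip.decompress(out) == b"alpha beta gamma"
```

The contrast with the frame-per-thread scheme is that here ordering is decided once, at submission time, rather than by a race between writers.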
Ah, I see. I skimmed over the bit about multi-threading in your initial comment because I was excited to try a new format! I see what you mean now, though. I've pushed a branch (all
I want to add more encoding options to this anyway, so I'll make snap the first non-gzip option. The decompression speeds for snap are awesome.
Awesome, @sstadick! The use case that we have is sort of "special" in that we have a chunk-oriented file where the chunks are independent and can appear in any order. So we use the However, I imagine that the much more common case would be much better suited to your design here, where you pass off the uncompressed data to the compressor, which uses multiple threads to do the compression behind the stream and writes the results to file. It'll be very useful to have different compression backends, and to have a little table where users can look up their speed / size tradeoffs. This is a very cool library, and I look forward to using it!
@rob-p see the latest release / updated README for some crude comparisons between Snappy and Gzip compression sizes and times with a variety of thread counts. It basically shows that Snappy is faster, especially with fewer threads, but is a little less efficient at compression. What was most surprising to me is that single-threaded gzip compression is not substantially more space-efficient than the parallel version, where I don't keep a streaming dictionary around like pigz does.
This looks really interesting, @sstadick! We (optionally) use compression to reduce the size of collated RAD files in our alevin-fry project. Specifically, we make use of the Snappy frame encoding to allow multiple different threads to compress the data in parallel. It's not quite as fast as writing the raw data to disk, but the overhead is small and the compression benefits are pretty large.
Do you have any idea how this compares to snap? Specifically, in the multi-threaded case, I'd be curious about the time / size tradeoffs.
Thanks!
Rob