Performance comparison with rust-snappy #1

Closed
rob-p opened this issue Aug 5, 2021 · 6 comments
Comments

@rob-p

rob-p commented Aug 5, 2021

This looks really interesting, @sstadick! We (optionally) use compression to reduce the size of collated RAD files in our alevin-fry project. Specifically, we use the Snappy frame encoding so that multiple threads can compress the data in parallel. It's not quite as fast as writing the raw data to disk, but the overhead is small and the compression benefits are pretty large.

Do you have any idea how this compares to snap? Specifically, in the multithreaded case, I'd be curious about the time / size tradeoffs.

Thanks!
Rob

rob-p changed the title from "Performance comparison with [rust-snappy](https://github.com/BurntSushi/rust-snappy)" to "Performance comparison with rust-snappy" Aug 5, 2021
@sstadick
Owner

sstadick commented Aug 5, 2021

Yeah! I added snappy to the benchmarks. The short answer is that this still appears to be much faster than snappy. Granted, this is only using compression level 3 for gzip, which is on the faster side for sure.

When using the zlib-ng-compat backend for flate2, the times for snappy and flate2 are within 20% of each other.

cargo bench --features zlib-ng-compat

Raw criterion output follows. In Compression/<num>, <num> is the number of threads used by gzp.

Compression/1           time:   [42.183 ms 42.428 ms 42.691 ms]                          
                        change: [-28.261% -27.294% -26.316%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Compression/4           time:   [26.850 ms 27.361 ms 27.885 ms]                          
                        change: [-28.842% -26.925% -24.728%] (p = 0.00 < 0.05)
                        Performance has improved.
Compression/8           time:   [23.468 ms 23.802 ms 24.187 ms]                          
                        change: [-37.463% -35.959% -34.311%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Compression/12          time:   [21.985 ms 22.332 ms 22.686 ms]                           
                        change: [-31.289% -29.592% -27.973%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking Compression/Flate2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.6s, or reduce sample count to 30.
Compression/Flate2      time:   [120.06 ms 121.95 ms 123.81 ms]                               
                        change: [-36.405% -34.791% -33.120%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking Compression/Snap: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.1s, or reduce sample count to 40.
Compression/Snap        time:   [99.807 ms 100.80 ms 101.89 ms]                             
                        change: [-36.241% -35.447% -34.604%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

@sstadick
Owner

sstadick commented Aug 5, 2021

So just offloading to another thread improves speed ~2x, and adding more threads also helps, up to a point. So I think it's safe to say that this is faster. However, idk if snappy's format allows for the same kind of abuse as gzip's, where independently compressed blocks are catted together. If it does, it could easily be added / receive the same treatment as gzip here.
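To put numbers on that, the ratios can be read straight off the median times in the criterion output above. A minimal sketch (the `speedup` helper is illustrative, not part of gzp); with a single gzp thread it works out to roughly 2.4x over Snap and 2.9x over Flate2:

```rust
/// Speedup of `candidate_ms` relative to `baseline_ms`.
/// Values above 1.0 mean the candidate is faster.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // Median times in milliseconds, copied from the criterion output above.
    let flate2_ms = 121.95;
    let snap_ms = 100.80;
    let gzp_ms = [(1, 42.428), (4, 27.361), (8, 23.802), (12, 22.332)];

    for (threads, ms) in gzp_ms {
        println!(
            "gzp with {:>2} thread(s): {:.2}x vs Flate2, {:.2}x vs Snap",
            threads,
            speedup(flate2_ms, ms),
            speedup(snap_ms, ms)
        );
    }
}
```

These are single-machine medians from one run, so the ratios are only a rough guide.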

@rob-p
Author

rob-p commented Aug 5, 2021

Right, so with Snappy, if you use the "FrameEncoder" feature, you can concatenate compressed chunks from multiple frames. See the issue we raised over in that repo. This was what allowed us to use Snappy in the first place. If we had to serialize the compression, that would have been a non-starter for our use-case. However, I think the details of the multi-threaded implementation are a bit different here. In that case, each thread has its own frame and compresses with respect to it, and the order in the output file is whatever order those threads happen to finish and write in. Here, however, you have a strict FIFO order on things being passed to the compressor for compression / writing.
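The strict-FIFO variant can be sketched with nothing but the standard library. This is only an illustration under stated assumptions, not gzp's actual implementation: `compress_block` is a hypothetical run-length stand-in for a real gzip or Snappy encoder, and one thread is spawned per block for brevity where a real implementation would use a bounded pool. The point is that blocks are compressed independently in parallel, then concatenated in input order.

```rust
use std::thread;

/// Hypothetical stand-in for a real block compressor: run-length encodes
/// the block as (count, byte) pairs so the example needs no external crates.
fn compress_block(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = block.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut count: u8 = 1;
        while count < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            count += 1;
        }
        out.push(count);
        out.push(byte);
    }
    out
}

/// Decodes a concatenation of independently compressed blocks.
fn decompress_stream(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in data.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Compresses blocks in parallel, then concatenates the results in input
/// (FIFO) order. `block_size` must be greater than zero.
fn compress_fifo(input: &[u8], block_size: usize) -> Vec<u8> {
    thread::scope(|scope| {
        // Spawn one worker per block; handles are kept in input order.
        let handles: Vec<_> = input
            .chunks(block_size)
            .map(|block| scope.spawn(move || compress_block(block)))
            .collect();

        // Joining in handle order restores FIFO order no matter which
        // worker finishes first.
        let mut out = Vec::new();
        for handle in handles {
            out.extend(handle.join().expect("worker panicked"));
        }
        out
    })
}

fn main() {
    let input = b"aaaabbbbccccddddeeeeffff".to_vec();
    let compressed = compress_fifo(&input, 5);
    assert_eq!(decompress_stream(&compressed), input);
    println!("{} bytes -> {} bytes", input.len(), compressed.len());
}
```

Because every block decodes on its own, the concatenated output round-trips without the decoder knowing where the block boundaries were.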

@sstadick
Owner

sstadick commented Aug 5, 2021

Ah, I see. I skimmed over the bit about multi-threading in your initial comment because I was excited to try a new format!

I see what you mean now though. I've pushed a branch with_snap that replaces GzEncoder with FrameEncoder and it is very fast, roughly 3-4x faster. I'm sure that with enough CPUs to throw at it the difference would go away.

(all Compression/<num> are using snap)

Compression/1           time:   [13.281 ms 13.465 ms 13.697 ms]                          
                        change: [-68.714% -68.264% -67.669%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Compression/4           time:   [9.3764 ms 9.5344 ms 9.7098 ms]                          
                        change: [-66.012% -65.154% -64.250%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Compression/8           time:   [8.8137 ms 8.9939 ms 9.1984 ms]                          
                        change: [-63.149% -62.213% -61.182%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
Compression/12          time:   [8.3453 ms 8.5298 ms 8.7454 ms]                           
                        change: [-62.847% -61.805% -60.765%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
Benchmarking Compression/Flate2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.2s, or reduce sample count to 20.
Compression/Flate2      time:   [172.77 ms 174.37 ms 176.09 ms]                               
                        change: [+40.498% +42.994% +45.614%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking Compression/Snap: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.7s, or reduce sample count to 20.
Compression/Snap        time:   [159.52 ms 161.03 ms 162.61 ms]                             
                        change: [+57.521% +59.757% +61.972%] (p = 0.00 < 0.05)
                        Performance has regressed.

I want to add more encoding options to this anyway, so I'll make snap the first non-gzip option. The decompression speeds for snap are awesome.

@rob-p
Author

rob-p commented Aug 6, 2021

Awesome, @sstadick! The use-case that we have is sort of "special" in that we have a chunk-oriented file where the chunks are independent and can appear in any order. So we use the FrameEncoder to have each parsing and processing thread compress its own work chunk, and then write it to file and grab some more work.

However, I imagine that the much more common case would be much better suited to your design here, where you pass off the uncompressed data to the compressor, which uses multiple threads to do the compression behind the stream and write the results to file. It'll be very useful to have different compression backends, and to have a little table where users can look up the speed / size tradeoffs between them. This is a very cool library, and I look forward to using it!
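A rough std-only sketch of that pattern, under loud assumptions: `compress_chunk` is a hypothetical placeholder (an identity copy standing in for a Snappy frame encoder), and the length-prefixed record layout is invented for illustration and is not the RAD format. Each worker grabs a chunk, compresses it, and appends it to the shared output in whatever order it finishes.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Hypothetical placeholder compressor: an identity copy. A real worker
/// would run its chunk through an actual encoder here.
fn compress_chunk(chunk: &[u8]) -> Vec<u8> {
    chunk.to_vec()
}

/// Workers pull chunks from a shared queue, compress them, and append
/// length-prefixed records to a shared sink in completion order.
fn write_unordered(chunks: Vec<Vec<u8>>, workers: usize) -> Vec<u8> {
    let queue = Mutex::new(VecDeque::from(chunks));
    let sink = Mutex::new(Vec::<u8>::new());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // The queue lock is released at the end of this statement,
                // so compression below runs without holding it.
                let next = queue.lock().unwrap().pop_front();
                let chunk = match next {
                    Some(chunk) => chunk,
                    None => break,
                };
                let record = compress_chunk(&chunk);

                // Hold the sink lock only while writing one whole record,
                // so records from different workers never interleave.
                let mut out = sink.lock().unwrap();
                out.extend_from_slice(&(record.len() as u32).to_le_bytes());
                out.extend_from_slice(&record);
            });
        }
    });

    sink.into_inner().unwrap()
}

/// Splits the output back into records. Assumes well-formed input.
fn read_records(mut data: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while data.len() >= 4 {
        let len = u32::from_le_bytes([data[0], data[1], data[2], data[3]]) as usize;
        records.push(data[4..4 + len].to_vec());
        data = &data[4 + len..];
    }
    records
}

fn main() {
    let chunks: Vec<Vec<u8>> = (0u8..8).map(|i| vec![i; 16]).collect();
    let file = write_unordered(chunks.clone(), 4);

    // Record order depends on thread scheduling, so compare as a set.
    let mut recovered = read_records(&file);
    recovered.sort();
    assert_eq!(recovered, chunks);
    println!("recovered {} records", recovered.len());
}
```

This only works because the reader does not care about chunk order; a stream that must be reproduced byte for byte needs the ordered approach instead.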

@rob-p rob-p closed this as completed Aug 6, 2021
@sstadick
Owner

@rob-p see the latest release / updated README for some crude comparisons between Snappy and Gzip compression sizes and times across a variety of thread counts. It basically shows that Snappy is faster, especially with fewer threads, but compresses a little less efficiently.

What was most surprising to me is that gzip compression without parallelism is not substantially more space-efficient than the parallel version, where I don't keep a streaming dictionary around like pigz does.
