Add support for hardware compression on s390x for zlib-ng #72

Merged: 4 commits, Mar 6, 2022

@Erk- Erk- commented Dec 16, 2020

@joshtriplett
Member

Please don't duplicate the common settings. You can declare a variable for the builder and then call some of the methods conditionally.
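
A minimal sketch of that pattern follows. The `Builder` type and the `configure` function are hypothetical stand-ins written for illustration, not the crate's actual build script, and the two common settings are placeholders. The s390x options shown (`WITH_DFLTCC_DEFLATE`, `WITH_DFLTCC_INFLATE`, `DFLTCC_LEVEL_MASK`) are zlib-ng build options, but whether this PR sets exactly these values is an assumption.

```rust
// Hypothetical stand-in for a CMake-style build configuration.
// A real build script would use its own builder type instead.
#[derive(Debug, Default)]
struct Builder {
    defines: Vec<(String, String)>,
    cflags: Vec<String>,
}

impl Builder {
    // Record a key/value definition and return the builder for chaining.
    fn define(&mut self, key: &str, value: &str) -> &mut Self {
        self.defines.push((key.to_string(), value.to_string()));
        self
    }

    // Record an extra C compiler flag and return the builder for chaining.
    fn cflag(&mut self, flag: &str) -> &mut Self {
        self.cflags.push(flag.to_string());
        self
    }
}

/// Build one configuration for any target. Only the s390x-specific
/// options are applied conditionally; nothing is duplicated.
fn configure(target: &str) -> Builder {
    let mut builder = Builder::default();

    // Common settings, written once for every target (placeholder values).
    builder
        .define("BUILD_SHARED_LIBS", "OFF")
        .define("ZLIB_COMPAT", "ON");

    // Target-specific settings, added on top of the common ones.
    // Assumed values; the mask 0x7e selects levels 1 through 6.
    if target.contains("s390x") {
        builder
            .define("WITH_DFLTCC_DEFLATE", "1")
            .define("WITH_DFLTCC_INFLATE", "1")
            .cflag("-DDFLTCC_LEVEL_MASK=0x7e");
    }

    builder
}
```

Calling `configure("s390x-unknown-linux-gnu")` yields the common settings plus the hardware options, while any other target string yields only the common settings.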

Signed-off-by: Valdemar Erk <valdemar@erk.io>

Erk- commented Aug 23, 2021

Sorry for the wait. I have made the changes you requested and also run a few benchmarks, which show up to a 20x speedup in decompression and up to a 220x speedup in compression (at levels 1-7) when used through flate2.

Flate2

Flate2 default
   Running unittests (target/release/deps/bench-58c514034e2bbc05)
Gnuplot not found, using plotters backend
uncompressed: 3266560 bytes
compression/flate2-1.pack
                      time:   [26.246 ms 26.259 ms 26.271 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
flate2-1: 1425349 bytes
compression/flate2-1.unpack
                      time:   [16.255 ms 16.284 ms 16.349 ms]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
compression/flate2-2.pack
                      time:   [38.978 ms 38.990 ms 39.005 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
flate2-2: 1194487 bytes
compression/flate2-2.unpack
                      time:   [14.195 ms 14.203 ms 14.209 ms]
compression/flate2-3.pack
                      time:   [58.104 ms 58.130 ms 58.163 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
flate2-3: 1111333 bytes
compression/flate2-3.unpack
                      time:   [13.354 ms 13.402 ms 13.463 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
compression/flate2-4.pack
                      time:   [69.797 ms 70.081 ms 70.445 ms]
flate2-4: 1099059 bytes
compression/flate2-4.unpack
                      time:   [13.618 ms 13.645 ms 13.704 ms]
compression/flate2-5.pack
                      time:   [84.634 ms 84.793 ms 84.965 ms]
flate2-5: 1082945 bytes
compression/flate2-5.unpack
                      time:   [13.397 ms 13.430 ms 13.498 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
Benchmarking compression/flate2-6.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.7sor enable flat sampling.
compression/flate2-6.pack
                      time:   [121.21 ms 121.34 ms 121.48 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
flate2-6: 1071897 bytes
compression/flate2-6.unpack
                      time:   [13.205 ms 13.213 ms 13.218 ms]
Benchmarking compression/flate2-7.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.8sor enable flat sampling.
compression/flate2-7.pack
                      time:   [141.34 ms 141.54 ms 142.03 ms]
flate2-7: 1068897 bytes
compression/flate2-7.unpack
                      time:   [13.118 ms 13.163 ms 13.225 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
Benchmarking compression/flate2-8.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.9sor enable flat sampling.
compression/flate2-8.pack
                      time:   [161.49 ms 161.63 ms 161.77 ms]
flate2-8: 1066961 bytes
compression/flate2-8.unpack
                      time:   [13.051 ms 13.063 ms 13.074 ms]
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high mild
Zlib-ng
   Running unittests (target/release/deps/bench-75831b7ab974fa31)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
uncompressed: 3266560 bytes
compression/flate2-1.pack
                      time:   [61.815 ms 62.016 ms 62.230 ms]
flate2-1: 1658487 bytes
compression/flate2-1.unpack
                      time:   [9.6884 ms 9.7319 ms 9.7932 ms]
compression/flate2-2.pack
                      time:   [51.364 ms 51.525 ms 51.695 ms]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
flate2-2: 1179064 bytes
compression/flate2-2.unpack
                      time:   [9.0035 ms 9.0360 ms 9.0715 ms]
compression/flate2-3.pack
                      time:   [61.177 ms 61.415 ms 61.680 ms]
flate2-3: 1132397 bytes
compression/flate2-3.unpack
                      time:   [8.5760 ms 8.5807 ms 8.5847 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
compression/flate2-4.pack
                      time:   [70.395 ms 71.033 ms 71.700 ms]
flate2-4: 1086307 bytes
compression/flate2-4.unpack
                      time:   [8.6578 ms 8.6793 ms 8.7178 ms]
Found 4 outliers among 10 measurements (40.00%)
1 (10.00%) low severe
1 (10.00%) low mild
2 (20.00%) high severe
compression/flate2-5.pack
                      time:   [76.064 ms 76.686 ms 77.311 ms]
flate2-5: 1078796 bytes
compression/flate2-5.unpack
                      time:   [8.4941 ms 8.5150 ms 8.5521 ms]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
compression/flate2-6.pack
                      time:   [80.999 ms 81.256 ms 81.548 ms]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low mild
1 (10.00%) high mild
flate2-6: 1075222 bytes
compression/flate2-6.unpack
                      time:   [8.3507 ms 8.3664 ms 8.3981 ms]
Benchmarking compression/flate2-7.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.7s or enable flat sampling.
compression/flate2-7.pack
                      time:   [101.49 ms 102.26 ms 103.42 ms]
flate2-7: 1065662 bytes
compression/flate2-7.unpack
                      time:   [8.7837 ms 8.8895 ms 9.0759 ms]
Benchmarking compression/flate2-8.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.3s or enable flat sampling.
compression/flate2-8.pack
                      time:   [113.64 ms 114.89 ms 116.08 ms]
flate2-8: 1063153 bytes
compression/flate2-8.unpack
                      time:   [8.8862 ms 8.9613 ms 9.0268 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Zlib-ng + my patch without hardware-accelerated compression (level 1 compression is still hardware accelerated)
   Running unittests (target/release/deps/bench-8c415de9634e6f9f)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
uncompressed: 3266560 bytes
compression/flate2-1.pack
                      time:   [235.49 us 235.93 us 236.29 us]
                      change: [-99.621% -99.619% -99.618%] (p = 0.00 < 0.05)
                      Performance has improved.
flate2-1: 1470182 bytes
compression/flate2-1.unpack
                      time:   [411.82 us 412.07 us 412.25 us]
                      change: [-95.782% -95.762% -95.745%] (p = 0.00 < 0.05)
                      Performance has improved.
compression/flate2-2.pack
                      time:   [51.435 ms 51.526 ms 51.677 ms]
                      change: [-0.1061% +0.2418% +0.5505%] (p = 0.19 > 0.05)
                      No change in performance detected.
flate2-2: 1179064 bytes
compression/flate2-2.unpack
                      time:   [454.76 us 454.91 us 455.04 us]
                      change: [-94.987% -94.969% -94.953%] (p = 0.00 < 0.05)
                      Performance has improved.
compression/flate2-3.pack
                      time:   [61.548 ms 61.742 ms 61.872 ms]
                      change: [+0.0001% +0.3675% +0.7466%] (p = 0.08 > 0.05)
                      No change in performance detected.
flate2-3: 1132397 bytes
compression/flate2-3.unpack
                      time:   [440.53 us 440.96 us 441.98 us]
                      change: [-94.872% -94.860% -94.842%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
compression/flate2-4.pack
                      time:   [69.412 ms 69.624 ms 69.774 ms]
                      change: [-2.5619% -1.7645% -0.9637%] (p = 0.00 < 0.05)
                      Change within noise threshold.
flate2-4: 1086307 bytes
compression/flate2-4.unpack
                      time:   [430.13 us 430.38 us 430.67 us]
                      change: [-95.074% -95.034% -94.994%] (p = 0.00 < 0.05)
                      Performance has improved.
compression/flate2-5.pack
                      time:   [73.149 ms 73.270 ms 73.480 ms]
                      change: [-5.9472% -4.9114% -3.9668%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
flate2-5: 1078796 bytes
compression/flate2-5.unpack
                      time:   [428.64 us 429.26 us 429.85 us]
                      change: [-94.993% -94.966% -94.943%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high mild
compression/flate2-6.pack
                      time:   [78.893 ms 78.966 ms 79.063 ms]
                      change: [-3.3955% -2.7469% -2.0639%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
flate2-6: 1075222 bytes
compression/flate2-6.unpack
                      time:   [426.40 us 426.51 us 426.62 us]
                      change: [-94.951% -94.926% -94.905%] (p = 0.00 < 0.05)
                      Performance has improved.
Benchmarking compression/flate2-7.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.3s or enable flat sampling.
compression/flate2-7.pack
                      time:   [96.995 ms 97.125 ms 97.243 ms]
                      change: [-6.2515% -5.5200% -4.7046%] (p = 0.00 < 0.05)
                      Performance has improved.
flate2-7: 1065662 bytes
compression/flate2-7.unpack
                      time:   [423.29 us 423.43 us 423.71 us]
                      change: [-95.305% -95.242% -95.183%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compression/flate2-8.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.1s or enable flat sampling.
compression/flate2-8.pack
                      time:   [110.04 ms 110.24 ms 110.36 ms]
                      change: [-4.9203% -4.2248% -3.5290%] (p = 0.00 < 0.05)
                      Performance has improved.
flate2-8: 1063153 bytes
compression/flate2-8.unpack
                      time:   [422.59 us 422.76 us 422.93 us]
                      change: [-95.308% -95.267% -95.220%] (p = 0.00 < 0.05)
                      Performance has improved.
Zlib-ng + my patch + hardware accelerated compression
     Running unittests (target/release/deps/bench-8c415de9634e6f9f)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
uncompressed: 3266560 bytes
compression/flate2-1.pack
                        time:   [234.29 us 234.36 us 234.42 us]
                        change: [-0.4435% -0.3905% -0.3425%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
flate2-1: 1470182 bytes
compression/flate2-1.unpack
                        time:   [413.68 us 414.00 us 414.21 us]
                        change: [+0.2569% +0.4367% +0.5602%] (p = 0.00 < 0.05)
                        Change within noise threshold.
compression/flate2-2.pack
                        time:   [234.32 us 234.41 us 234.52 us]
                        change: [-99.560% -99.559% -99.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
flate2-2: 1470182 bytes
compression/flate2-2.unpack
                        time:   [413.79 us 413.95 us 414.14 us]
                        change: [-9.5157% -9.4611% -9.4157%] (p = 0.00 < 0.05)
                        Performance has improved.
compression/flate2-3.pack
                        time:   [234.39 us 234.46 us 234.56 us]
                        change: [-99.624% -99.622% -99.621%] (p = 0.00 < 0.05)
                        Performance has improved.
flate2-3: 1470182 bytes
compression/flate2-3.unpack
                        time:   [413.51 us 413.75 us 413.96 us]
                        change: [-6.1808% -6.0194% -5.7633%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
compression/flate2-4.pack
                        time:   [234.36 us 234.45 us 234.55 us]
                        change: [-99.661% -99.661% -99.660%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
flate2-4: 1470182 bytes
compression/flate2-4.unpack
                        time:   [411.54 us 411.80 us 412.04 us]
                        change: [-4.3757% -4.3002% -4.2225%] (p = 0.00 < 0.05)
                        Performance has improved.
compression/flate2-5.pack
                        time:   [234.28 us 234.32 us 234.40 us]
                        change: [-99.681% -99.680% -99.680%] (p = 0.00 < 0.05)
                        Performance has improved.
flate2-5: 1470182 bytes
compression/flate2-5.unpack
                        time:   [411.24 us 411.41 us 411.58 us]
                        change: [-4.4804% -4.2792% -4.1013%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
compression/flate2-6.pack
                        time:   [234.23 us 234.40 us 234.56 us]
                        change: [-99.704% -99.703% -99.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
flate2-6: 1470182 bytes
compression/flate2-6.unpack
                        time:   [411.42 us 411.49 us 411.57 us]
                        change: [-3.5549% -3.5303% -3.5062%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking compression/flate2-7.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.3s or enable flat sampling.
compression/flate2-7.pack
                        time:   [95.855 ms 95.921 ms 96.015 ms]
                        change: [-1.2030% -0.8535% -0.3794%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
flate2-7: 1065662 bytes
compression/flate2-7.unpack
                        time:   [425.18 us 425.33 us 425.44 us]
                        change: [+0.1861% +0.3293% +0.4405%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Benchmarking compression/flate2-8.pack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.0s or enable flat sampling.
compression/flate2-8.pack
                        time:   [108.34 ms 108.43 ms 108.59 ms]
                        change: [-1.6135% -1.3806% -1.1612%] (p = 0.00 < 0.05)
                        Performance has improved.
flate2-8: 1063153 bytes
compression/flate2-8.unpack
                        time:   [424.46 us 424.74 us 425.09 us]
                        change: [+0.4369% +0.5072% +0.5841%] (p = 0.00 < 0.05)
                        Change within noise threshold.
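
For anyone who wants to check the speedup figures against the logs, here is a small sketch of the arithmetic. The function names are made up for illustration. The inputs in the comments are median times copied from the runs above (plain zlib-ng versus the first patched run, level 1), converted to microseconds.

```rust
/// Speedup factor from two timings expressed in the same unit.
fn speedup(baseline: f64, patched: f64) -> f64 {
    baseline / patched
}

/// Speedup implied by a Criterion "change" percentage.
/// A change of -99.619% means the new time is 0.381% of the old one,
/// so the speedup is 1 / 0.00381.
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

/// Level 1 factors taken from the median times in the logs.
fn level_1_factors() -> (f64, f64) {
    // pack: 62.016 ms (zlib-ng) vs 235.93 us (patched)
    let pack = speedup(62_016.0, 235.93);
    // unpack: 9.7319 ms (zlib-ng) vs 412.07 us (patched)
    let unpack = speedup(9_731.9, 412.07);
    (pack, unpack)
}
```

The two functions should roughly agree for the same pair of runs, since Criterion's change percentage is derived from the same timings.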

GZP

I also tested it with https://github.com/sstadick/gzp, and the results were surprising. Using the same 100x Shakespeare file as the other benchmarks on the site (and 2 threads, as I am limited to that on my VPS), I ran the benchmarks for gzip.

GZP Zlib-ng
Benchmarking Compression/Gzip/2: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 129.0s.
Compression/Gzip/2      time:   [12.898 s 12.913 s 12.936 s]
                        change: [+9943.5% +9963.3% +9984.9%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
Benchmarking Compression/Gzip Only: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 128.4s.
Compression/Gzip Only   time:   [12.802 s 12.875 s 12.975 s]
                        change: [+10080% +10140% +10227%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
GZP Zlib-ng + my patch
Compression/Gzip/2      time:   [890.24 ms 928.41 ms 969.07 ms]
                        change: [-93.093% -92.810% -92.544%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking Compression/Gzip Only: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.6s.
Compression/Gzip Only   time:   [558.82 ms 559.74 ms 560.81 ms]
                        change: [-95.686% -95.652% -95.627%] (p = 0.00 < 0.05)
                        Performance has improved.

They show that GZP with 2 threads takes nearly twice as long as flate2 on 1 core for the same input (about 928 ms versus 560 ms), so there must be some dependency that breaks down when used from more than one thread at a time.

Misc

Related: It would be nice to get zlib-ng updated to include this commit: zlib-ng/zlib-ng@0573840

@joshtriplett joshtriplett merged commit e7910b5 into rust-lang:main Mar 6, 2022