Zstd optimize small blocks #265

Merged (16 commits) on Jun 5, 2020

@klauspost (owner) commented Jun 2, 2020

Single threaded:

```
benchmark                                                    old MB/s     new MB/s     speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32               299.66       348.55       1.16x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32           824.68       971.14       1.18x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32            211.31       233.53       1.11x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32              248.88       274.21       1.10x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32            248.88       287.44       1.15x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32             240.27       274.48       1.14x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-32                1481.90      1442.57      0.97x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32          3848.34      4570.00      1.19x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32          12196.13     12295.82     1.01x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-32                374.76       422.87       1.13x
BenchmarkDecoder_DecoderSmall/html.zst-32                    641.72       767.35       1.20x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32           394.59       425.95       1.08x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32                  302.35       351.40       1.16x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32              823.43       970.26       1.18x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32               253.10       291.87       1.15x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32                 305.60       342.69       1.12x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32               250.99       292.41       1.17x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32                242.98       273.30       1.12x
BenchmarkDecoder_DecodeAll/html_x_4.zst-32                   1492.46      1448.13      0.97x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32             3953.49      4726.96      1.20x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32             13011.31     13076.62     1.01x
BenchmarkDecoder_DecodeAll/urls.10K.zst-32                   410.19       493.80       1.20x
BenchmarkDecoder_DecodeAll/html.zst-32                       641.87       765.77       1.19x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32              379.34       425.51       1.12x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32          5786.45      6353.87      1.10x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32      15827.85     17395.66     1.10x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32       4726.87      5203.13      1.10x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32         5660.74      6190.10      1.09x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32       4781.65      5233.33      1.09x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32        4465.43      4834.69      1.08x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32           28007.52     23775.05     0.85x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32     70726.30     75137.45     1.06x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32     58807.06     67592.15     1.15x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32           8025.76      9043.30      1.13x
BenchmarkDecoder_DecodeAllParallel/html.zst-32               12243.78     13733.45     1.12x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32      2925.99      4023.69      1.38x
```
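
For reference, the speedup column is new MB/s divided by old MB/s, so anything below 1.00x is a regression (html_x_4 in all three groups). A minimal Go sketch, with illustrative helper names that are not part of the repository, reproduces three rows from the table above:

```go
package main

import (
	"fmt"
	"math"
)

// speedup is new throughput over old throughput; above 1.0 means faster.
func speedup(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

// round2 rounds to two decimals, matching how the table prints ratios.
func round2(x float64) float64 {
	return math.Round(x*100) / 100
}

func main() {
	// Old and new MB/s copied from the DecodeAll and DecodeAllParallel rows.
	rows := []struct {
		name           string
		oldMBs, newMBs float64
	}{
		{"DecodeAll/html.zst", 641.87, 765.77},
		{"DecodeAllParallel/html_x_4.zst", 28007.52, 23775.05},
		{"DecodeAllParallel/comp-data.bin.zst", 2925.99, 4023.69},
	}
	// Prints 1.19x, 0.85x and 1.38x, as in the table.
	for _, r := range rows {
		fmt.Printf("%-38s %.2fx\n", r.name, round2(speedup(r.oldMBs, r.newMBs)))
	}
}
```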

huff0 alone:

```
benchmark                                            old MB/s     new MB/s     speedup
BenchmarkDecompress1XTable/digits-32                 230.69       351.20       1.52x
BenchmarkDecompress1XTable/gettysburg-32             202.22       302.48       1.50x
BenchmarkDecompress1XTable/twain-32                  212.84       320.18       1.50x
BenchmarkDecompress1XTable/low-ent.10k-32            250.10       384.25       1.54x
BenchmarkDecompress1XTable/superlow-ent-10k-32       245.01       381.06       1.56x
BenchmarkDecompress1XTable/crash2-32                 21.42        24.67        1.15x
BenchmarkDecompress1XTable/endzerobits-32            69.13        74.07        1.07x
BenchmarkDecompress1XTable/endnonzero-32             14.81        15.55        1.05x
BenchmarkDecompress1XTable/case1-32                  27.26        30.61        1.12x
BenchmarkDecompress1XTable/case2-32                  22.63        25.42        1.12x
BenchmarkDecompress1XTable/case3-32                  24.02        26.89        1.12x
BenchmarkDecompress1XTable/pngdata.001-32            242.79       407.23       1.68x
BenchmarkDecompress1XTable/normcount2-32             62.93        70.96        1.13x
BenchmarkDecompress1XNoTable/digits-32               229.28       350.49       1.53x
BenchmarkDecompress1XNoTable/gettysburg-32           235.56       383.63       1.63x
BenchmarkDecompress1XNoTable/twain-32                211.48       322.60       1.53x
BenchmarkDecompress1XNoTable/low-ent.10k-32          248.74       387.60       1.56x
BenchmarkDecompress1XNoTable/superlow-ent-10k-32     248.19       388.28       1.56x
BenchmarkDecompress1XNoTable/crash2-32               166.94       220.59       1.32x
BenchmarkDecompress1XNoTable/endzerobits-32          112.91       124.23       1.10x
BenchmarkDecompress1XNoTable/endnonzero-32           132.30       153.07       1.16x
BenchmarkDecompress1XNoTable/case1-32                214.54       314.86       1.47x
BenchmarkDecompress1XNoTable/case2-32                208.43       317.56       1.52x
BenchmarkDecompress1XNoTable/case3-32                208.96       303.29       1.45x
BenchmarkDecompress1XNoTable/pngdata.001-32          246.06       415.27       1.69x
BenchmarkDecompress1XNoTable/normcount2-32           222.14       322.08       1.45x
BenchmarkDecompress4XNoTable/digits-32               454.06       589.41       1.30x
BenchmarkDecompress4XNoTable/gettysburg-32           519.04       549.23       1.06x
BenchmarkDecompress4XNoTable/twain-32                377.67       455.67       1.21x
BenchmarkDecompress4XNoTable/low-ent.10k-32          606.85       692.22       1.14x
BenchmarkDecompress4XNoTable/superlow-ent-10k-32     587.74       677.59       1.15x
BenchmarkDecompress4XNoTable/case1-32                170.95       229.36       1.34x
BenchmarkDecompress4XNoTable/case2-32                165.58       229.85       1.39x
BenchmarkDecompress4XNoTable/case3-32                174.01       238.52       1.37x
BenchmarkDecompress4XNoTable/pngdata.001-32          585.15       655.99       1.12x
BenchmarkDecompress4XNoTable/normcount2-32           193.93       289.04       1.49x
BenchmarkDecompress4XTable/digits-32                 452.40       587.78       1.30x
BenchmarkDecompress4XTable/gettysburg-32             370.66       395.90       1.07x
BenchmarkDecompress4XTable/twain-32                  379.40       449.42       1.18x
BenchmarkDecompress4XTable/low-ent.10k-32            609.16       687.14       1.13x
BenchmarkDecompress4XTable/superlow-ent-10k-32       572.14       656.00       1.15x
BenchmarkDecompress4XTable/case1-32                  26.26        28.90        1.10x
BenchmarkDecompress4XTable/case2-32                  21.77        24.31        1.12x
BenchmarkDecompress4XTable/case3-32                  23.13        25.91        1.12x
BenchmarkDecompress4XTable/pngdata.001-32            564.01       635.71       1.13x
BenchmarkDecompress4XTable/normcount2-32             59.49        68.19        1.15x
```
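
The 1X and 4X benchmark names presumably refer to huff0's single-stream and four-stream Huffman formats; this is an assumption based on the zstd format, not something the PR states. In the four-stream layout described by the zstd format specification (RFC 8878), a 6-byte jump table of three little-endian 16-bit sizes precedes the streams, the fourth stream takes the remaining bytes, and each of the first three streams regenerates (n+3)/4 bytes of the n-byte output. Independent streams give a decoder separate work to interleave, which fits the 4X rows reporting higher MB/s than the 1X rows on inputs such as digits, twain and pngdata.001. The sketch below only splits the input and computes the per-stream output sizes; the function names are hypothetical and nothing is decoded:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitFourStreams splits a four-stream payload using its 6-byte jump table:
// three little-endian uint16 sizes for streams 1-3, with stream 4 taking
// whatever bytes remain.
func splitFourStreams(src []byte) ([4][]byte, error) {
	var streams [4][]byte
	if len(src) < 6 {
		return streams, errors.New("input shorter than jump table")
	}
	s1 := int(binary.LittleEndian.Uint16(src[0:2]))
	s2 := int(binary.LittleEndian.Uint16(src[2:4]))
	s3 := int(binary.LittleEndian.Uint16(src[4:6]))
	payload := src[6:]
	if s1+s2+s3 > len(payload) {
		return streams, errors.New("jump table sizes exceed input")
	}
	streams[0] = payload[:s1]
	streams[1] = payload[s1 : s1+s2]
	streams[2] = payload[s1+s2 : s1+s2+s3]
	streams[3] = payload[s1+s2+s3:]
	return streams, nil
}

// regeneratedSizes returns how many bytes each stream decodes to: the first
// three produce (total+3)/4 bytes each, the last produces the remainder.
func regeneratedSizes(total int) ([4]int, error) {
	seg := (total + 3) / 4
	last := total - 3*seg
	if last < 0 {
		return [4]int{}, errors.New("regenerated size too small for four streams")
	}
	return [4]int{seg, seg, seg, last}, nil
}

func main() {
	// Hypothetical payload: jump table {2, 1, 3}, then 2+1+3+2 stream bytes.
	src := []byte{2, 0, 1, 0, 3, 0, 'a', 'a', 'b', 'c', 'c', 'c', 'd', 'd'}
	streams, err := splitFourStreams(src)
	if err != nil {
		panic(err)
	}
	for i, s := range streams {
		fmt.Printf("stream %d: %q\n", i+1, s)
	}
	sizes, err := regeneratedSizes(10)
	if err != nil {
		panic(err)
	}
	fmt.Println(sizes) // [3 3 3 1]
}
```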

Comment on `fse/bitreader.go` (outdated, resolved):

~1% faster:

```
benchmark                                            old MB/s     new MB/s     speedup
BenchmarkDecompress4XNoTable/digits-32               539.58       539.10       1.00x
BenchmarkDecompress4XNoTable/gettysburg-32           599.42       602.45       1.01x
BenchmarkDecompress4XNoTable/twain-32                494.24       492.16       1.00x
BenchmarkDecompress4XNoTable/low-ent.10k-32          721.10       723.53       1.00x
BenchmarkDecompress4XNoTable/superlow-ent-10k-32     693.80       697.17       1.00x
BenchmarkDecompress4XNoTable/case1-32                214.86       217.89       1.01x
BenchmarkDecompress4XNoTable/case2-32                216.82       221.86       1.02x
BenchmarkDecompress4XNoTable/case3-32                217.85       223.18       1.02x
BenchmarkDecompress4XNoTable/pngdata.001-32          709.13       711.84       1.00x
BenchmarkDecompress4XNoTable/normcount2-32           231.85       238.10       1.03x
BenchmarkDecompress4XTable/digits-32                 535.73       537.17       1.00x
BenchmarkDecompress4XTable/gettysburg-32             418.03       420.13       1.01x
BenchmarkDecompress4XTable/twain-32                  492.16       494.24       1.00x
BenchmarkDecompress4XTable/low-ent.10k-32            711.49       715.16       1.01x
BenchmarkDecompress4XTable/superlow-ent-10k-32       670.99       673.28       1.00x
BenchmarkDecompress4XTable/case1-32                  28.75        28.90        1.01x
BenchmarkDecompress4XTable/case2-32                  24.69        24.61        1.00x
BenchmarkDecompress4XTable/case3-32                  26.02        25.95        1.00x
BenchmarkDecompress4XTable/pngdata.001-32            685.85       689.59       1.01x
BenchmarkDecompress4XTable/normcount2-32             65.53        66.29        1.01x
```
```
benchmark                                            old ns/op     new ns/op     delta
BenchmarkDecompress1XTable/digits-32                 438004        436984        -0.23%
BenchmarkDecompress1XTable/gettysburg-32             7610          7594          -0.21%
BenchmarkDecompress1XTable/twain-32                  1238883       1233712       -0.42%
BenchmarkDecompress1XTable/low-ent.10k-32            161110        161066        -0.03%
BenchmarkDecompress1XTable/superlow-ent-10k-32       42624         42459         -0.39%
BenchmarkDecompress1XTable/crash2-32                 636           626           -1.57%
BenchmarkDecompress1XTable/endzerobits-32            78.0          70.8          -9.23%
BenchmarkDecompress1XTable/endnonzero-32             438           426           -2.74%
BenchmarkDecompress1XTable/case1-32                  1911          1878          -1.73%
BenchmarkDecompress1XTable/case2-32                  1845          1834          -0.60%
BenchmarkDecompress1XTable/case3-32                  1860          1848          -0.65%
BenchmarkDecompress1XTable/pngdata.001-32            209872        209872        +0.00%
BenchmarkDecompress1XTable/normcount2-32             1330          1322          -0.60%
BenchmarkDecompress1XNoTable/digits-32               437276        436904        -0.09%
BenchmarkDecompress1XNoTable/gettysburg-32           6572          6550          -0.33%
BenchmarkDecompress1XNoTable/twain-32                1235781       1231793       -0.32%
BenchmarkDecompress1XNoTable/low-ent.10k-32          160400        160666        +0.17%
BenchmarkDecompress1XNoTable/superlow-ent-10k-32     42090         42067         -0.05%
BenchmarkDecompress1XNoTable/crash2-32               89.4          85.4          -4.47%
BenchmarkDecompress1XNoTable/endzerobits-32          44.9          42.8          -4.68%
BenchmarkDecompress1XNoTable/endnonzero-32           53.2          51.3          -3.57%
BenchmarkDecompress1XNoTable/case1-32                252           247           -1.98%
BenchmarkDecompress1XNoTable/case2-32                213           202           -5.16%
BenchmarkDecompress1XNoTable/case3-32                225           222           -1.33%
BenchmarkDecompress1XNoTable/pngdata.001-32          207247        207004        -0.12%
BenchmarkDecompress1XNoTable/normcount2-32           384           383           -0.26%
```
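
For context on what `fse/bitreader.go` reads: FSE and Huffman streams in zstd are backward bitstreams. The encoder appends bits LSB-first, ends the stream with a single 1 marker bit, and the decoder starts just below that marker and reads toward the start of the buffer, so fields come back in reverse order. The sketch below is a bit-at-a-time illustration of that layout, not the code this PR changes; a fast reader would normally buffer bits in a 64-bit register and refill several bytes at a time instead of looping per bit. The type and function names are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// bitWriter appends bits LSB-first (hypothetical helper for the demo).
type bitWriter struct {
	out   []byte
	nbits uint // total bits written so far
}

func (w *bitWriter) addBits(v uint64, n uint) {
	for i := uint(0); i < n; i++ {
		if w.nbits%8 == 0 {
			w.out = append(w.out, 0)
		}
		w.out[len(w.out)-1] |= byte((v>>i)&1) << (w.nbits % 8)
		w.nbits++
	}
}

// close appends the 1-bit end marker; the unused high bits stay zero.
func (w *bitWriter) close() []byte {
	w.addBits(1, 1)
	return w.out
}

// backwardReader reads from just below the end marker toward bit 0.
type backwardReader struct {
	in  []byte
	pos uint // number of unread bits; bit pos-1 is read next
}

func newBackwardReader(in []byte) (*backwardReader, error) {
	if len(in) == 0 || in[len(in)-1] == 0 {
		return nil, errors.New("missing end marker")
	}
	// The highest set bit of the last byte is the marker.
	markerBit := uint(bits.Len8(in[len(in)-1])) - 1
	return &backwardReader{in: in, pos: uint(len(in)-1)*8 + markerBit}, nil
}

// readBits returns the next n bits; the lowest-positioned bit becomes the LSB.
func (r *backwardReader) readBits(n uint) (uint64, error) {
	if n > r.pos {
		return 0, errors.New("read past start of stream")
	}
	var v uint64
	for i := uint(0); i < n; i++ {
		r.pos--
		bit := (r.in[r.pos/8] >> (r.pos % 8)) & 1
		v = v<<1 | uint64(bit)
	}
	return v, nil
}

func main() {
	var w bitWriter
	w.addBits(5, 3)
	w.addBits(100, 7)
	w.addBits(1, 1)
	buf := w.close()

	r, err := newBackwardReader(buf)
	if err != nil {
		panic(err)
	}
	// Fields come back last-written first: prints 1, 100, 5.
	for _, n := range []uint{1, 7, 3} {
		v, err := r.readBits(n)
		if err != nil {
			panic(err)
		}
		fmt.Println(v)
	}
}
```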
@klauspost merged commit 31108c0 into master on Jun 5, 2020
@klauspost deleted the zstd-optimize-small-blocks branch on June 5, 2020 08:03