zstd: Asm decoder tweaks #537

Merged
merged 3 commits into from
Mar 18, 2022

klauspost
Owner

@klauspost klauspost commented Mar 18, 2022

  • Add non-BMI amd64 tests.
  • Use BEXTRQ for extracting shifted values.
  • Move the zero check into getBits.
  • Remove the ctx allocation.
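
For readers unfamiliar with the instruction, here is a minimal Go sketch of what a BEXTR-style bit-field extract computes, and of why a zero check matters for a shift-based getBits. It is only an illustration: `bextr`, the `bitReader` type and its fields are hypothetical names chosen for this example, not code taken from this PR, and the real change lives in the amd64 assembly.

```go
package main

import "fmt"

// bextr emulates the semantics of the x86 BMI1 BEXTR instruction on 64 bits:
// it returns `length` bits of src starting at bit `start`.
// The real instruction packs start (bits 0-7) and length (bits 8-15) into one
// control register; they are separate parameters here for readability.
func bextr(src uint64, start, length uint8) uint64 {
	if start >= 64 {
		return 0
	}
	v := src >> start
	if length >= 64 {
		return v
	}
	// A length of 0 yields an empty mask, so the result is 0 without a branch.
	return v & (uint64(1)<<length - 1)
}

// bitReader is a hypothetical, simplified reader that consumes bits from the
// most significant end of a 64-bit word.
type bitReader struct {
	value    uint64 // buffered bits
	bitsRead uint8  // bits already consumed from value
}

// getBits returns the next n bits (1 <= n <= 32 assumed, or n == 0).
// The n == 0 check lives inside getBits: in assembly, SHRQ masks its count
// to 6 bits, so a shift by 64-0 would act as a shift by 0 and return the
// whole word instead of 0. (Go itself defines a shift by 64 as 0.)
func (b *bitReader) getBits(n uint8) uint64 {
	if n == 0 {
		return 0
	}
	v := (b.value << (b.bitsRead & 63)) >> (64 - n)
	b.bitsRead += n
	return v
}

func main() {
	b := bitReader{value: 0xABCD000000000000, bitsRead: 4}
	// The same 8 bits, once via shifts and once via a BEXTR-style extract.
	viaBextr := bextr(b.value, 64-b.bitsRead-8, 8)
	viaShift := b.getBits(8)
	fmt.Printf("%#x %#x\n", viaShift, viaBextr)
}
```

Both calls in `main` read the same field, which is the property that lets a single extract instruction stand in for a shift pair on CPUs with BMI support.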

Sequences only, BMI:

benchmark                                                                                          old ns/op     new ns/op     delta
Benchmark_seqdec_decode/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32            91657         91114         -0.59%
Benchmark_seqdec_decode/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32           92392         90416         -2.14%
Benchmark_seqdec_decode/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32                   83022         79745         -3.95%
Benchmark_seqdec_decode/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32            9149          8856          -3.20%
Benchmark_seqdec_decode/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32            22402         22102         -1.34%
Benchmark_seqdec_decode/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32             60844         60114         -1.20%
Benchmark_seqdec_decode/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                      5785          5879          +1.62%
Benchmark_seqdec_decode/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32         118030        115597        -2.06%
Benchmark_seqdec_decode/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                       135           64.3          -52.35%
Benchmark_seqdec_decode/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                        648           589           -9.03%
Benchmark_seqdec_decode/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                    5555          5467          -1.58%
Benchmark_seqdec_decode/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                       17896         17605         -1.63%
Benchmark_seqdec_decode/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32                  27457         27232         -0.82%
Benchmark_seqdec_decode/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                        59341         58158         -1.99%

No BMI:

benchmark                                                                                           old ns/op     new ns/op     delta
Benchmark_seqdec_decodeNoBMI/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        114889        113333        -1.35%
Benchmark_seqdec_decodeNoBMI/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       121269        119500        -1.46%
Benchmark_seqdec_decodeNoBMI/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               106986        102585        -4.11%
Benchmark_seqdec_decodeNoBMI/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        10910         10304         -5.55%
Benchmark_seqdec_decodeNoBMI/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        25965         24642         -5.10%
Benchmark_seqdec_decodeNoBMI/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         80183         77980         -2.75%
Benchmark_seqdec_decodeNoBMI/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  6702          6369          -4.97%
Benchmark_seqdec_decodeNoBMI/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     151867        148752        -2.05%
Benchmark_seqdec_decodeNoBMI/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   139           46.8          -66.31%
Benchmark_seqdec_decodeNoBMI/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    744           609           -18.13%
Benchmark_seqdec_decodeNoBMI/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                6570          6083          -7.41%
Benchmark_seqdec_decodeNoBMI/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   20448         19955         -2.41%
Benchmark_seqdec_decodeNoBMI/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              34177         32790         -4.06%
Benchmark_seqdec_decodeNoBMI/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    77864         75628         -2.87%
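
As a reading aid for the tables above, the delta column is the relative change in ns/op between the old and new runs. A minimal sketch of that arithmetic follows; `deltaPercent` is a hypothetical helper written for this note, not part of any benchmarking tool.

```go
package main

import "fmt"

// deltaPercent returns the relative change from oldNs to newNs in percent.
// Negative values mean the new code needs fewer ns/op, i.e. it is faster.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// First row of the BMI table: 91657 ns/op before, 91114 ns/op after.
	fmt.Printf("%+.2f%%\n", deltaPercent(91657, 91114))
	// First row of the non-BMI table: 114889 ns/op before, 113333 after.
	fmt.Printf("%+.2f%%\n", deltaPercent(114889, 113333))
}
```

Rows with very small timings (such as the n-2 case) can differ from this formula in the last digit, because the printed ns/op values are themselves rounded.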

@WojciechMula
Contributor

I ran it on an Ice Lake machine with a hacked decodeSync.

benchmark                                                                 old ns/op     new ns/op     delta
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            5509496       5527925       +0.33%
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        976587        976897        +0.03%
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         17477189      17467257      -0.06%
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           12984602      13100822      +0.90%
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         4104249       4116200       +0.29%
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          5672330       5696725       +0.43%
BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             1521827       1540812       +1.25%
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       230917        229915        -0.43%
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       133560        125095        -6.34%
BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             13941048      14109452      +1.21%
BenchmarkDecoder_DecoderSmall/html.zst-16                                 1084921       1087247       +0.21%
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        81267         81400         +0.16%
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               696704        694461        -0.32%
BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           120473        119892        -0.48%
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            2129030       2145450       +0.77%
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              1589307       1596482       +0.45%
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            531478        539082        +1.43%
BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             713246        720601        +1.03%
BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                252560        254116        +0.62%
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          24937         25219         +1.13%
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          11289         11296         +0.06%
BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                1651235       1656384       +0.31%
BenchmarkDecoder_DecodeAll/html.zst-16                                    140311        140738        +0.30%
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           10419         10432         +0.12%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      1588209       1595238       +0.44%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      1695205       1715132       +1.18%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       1567204       1584200       +1.08%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         1672189       1700214       +1.68%
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          9216          9184          -0.35%
BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          371543        373075        +0.41%
BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           259378        259967        +0.23%
BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             154587        154254        -0.22%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              3350          3360          +0.30%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              3158          3164          +0.19%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               3659          3680          +0.57%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 9400          9390          -0.11%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 5025          5071          +0.92%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 8119          8116          -0.04%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  8174          8151          -0.28%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    8990          8816          -1.94%
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       101491        94080         -7.30%
BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       100827        101378        +0.55%
BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        91302         91258         -0.05%
BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          110644        102223        -7.61%
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         9185          9184          -0.01%
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         379203        378454        -0.20%
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          254367        254601        +0.09%
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            153654        154090        +0.28%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    28691         28517         -0.61%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    31180         31352         +0.55%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     25448         25496         +0.19%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       30224         30196         -0.09%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     9179          9176          -0.03%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     9177          9170          -0.08%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      9169          9168          -0.01%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        9175          9179          +0.04%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     221665        223757        +0.94%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     256708        254197        -0.98%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      226857        231308        +1.96%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        240448        240278        -0.07%
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         1033          1031          -0.19%
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         62172         62366         +0.31%
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          40623         40157         -1.15%
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            19939         19985         +0.23%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             684           649           -5.16%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             700           700           -0.10%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              666           748           +12.32%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                1043          1052          +0.86%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                1034          1021          -1.26%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                1571          1565          -0.38%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 1543          1534          -0.58%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   1867          1923          +3.00%
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      21367         21252         -0.54%
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      23006         22775         -1.00%
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       21048         21021         -0.13%
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         22364         23247         +3.95%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        1056          1034          -2.08%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        61669         61869         +0.32%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         39404         39558         +0.39%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           19843         19985         +0.72%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   5637          5571          -1.17%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   5767          5628          -2.41%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    5007          5042          +0.70%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      4280          4224          -1.31%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    1054          1033          -1.99%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    1046          1039          -0.67%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     1044          1040          -0.38%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       1064          1028          -3.38%
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       104978        105888        +0.87%
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   19890         20054         +0.82%
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    317536        320238        +0.85%
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      238798        237143        -0.69%
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    81620         81709         +0.11%
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     114520        115462        +0.82%
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        33892         33623         -0.79%
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  4320          4313          -0.16%
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  1267          1257          -0.79%
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        208811        206488        -1.11%
BenchmarkDecoder_DecodeAllParallel/html.zst-16                            22129         21965         -0.74%
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   1842          1812          -1.63%

benchmark                                                                 old MB/s     new MB/s     speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                            267.64       266.75       1.00x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                        971.45       971.14       1.00x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                         220.57       220.69       1.00x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                           262.93       260.60       0.99x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                         244.00       243.29       1.00x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                          214.50       213.58       1.00x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                             2153.20      2126.67      0.99x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                       3547.59      3563.06      1.00x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                       7373.06      7871.96      1.07x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                             402.89       398.08       0.99x
BenchmarkDecoder_DecoderSmall/html.zst-16                                 755.08       753.46       1.00x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                        401.25       400.59       1.00x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                               264.56       265.41       1.00x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                           984.35       989.12       1.00x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                            226.33       224.60       0.99x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                              268.52       267.31       1.00x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                            235.53       232.21       0.99x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                             213.24       211.06       0.99x
BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                1621.80      1611.87      0.99x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                          4106.36      4060.45      0.99x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                          10903.82     10896.64     1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                425.19       423.87       1.00x
BenchmarkDecoder_DecodeAll/html.zst-16                                    729.81       727.59       1.00x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                           391.20       390.71       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16      244.28       243.20       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16      228.86       226.20       0.99x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16       247.55       244.90       0.99x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16         232.01       228.19       0.98x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                          10851.38     10888.91     1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                          269.16       268.05       1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                           385.55       384.68       1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                             646.90       648.30       1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16              1228.51      1225.04      1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16              1303.54      1301.08      1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16               1124.91      1118.40      0.99x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                 437.85       438.32       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                 308.08       305.28       0.99x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                 190.66       190.73       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                  189.37       189.92       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                    172.19       175.59       1.02x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                       438.23       472.76       1.08x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                       441.12       438.73       0.99x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                        487.14       487.38       1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                          401.98       435.10       1.08x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                         10887.74     10888.90     1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                         263.72       264.24       1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                          393.14       392.78       1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                            650.83       648.99       1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                    1784.51      1795.42      1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                    1642.08      1633.06      0.99x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                     2011.97      2008.18      1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                       1694.00      1695.57      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                     10895.05     10898.67     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                     10897.49     10906.08     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                      10906.20     10907.89     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                        10899.99     10895.32     1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16     1750.23      1733.87      0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16     1511.31      1526.23      1.01x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16      1710.17      1677.26      0.98x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16        1613.51      1614.65      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                         96781.45     96953.41     1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                         1608.50      1603.48      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                          2461.75      2490.30      1.01x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                            5015.35      5003.88      1.00x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16             6013.47      6339.85      1.05x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16             5877.06      5883.32      1.00x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16              6178.23      5500.80      0.89x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                3945.84      3912.72      0.99x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                1497.34      1516.09      1.01x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                985.22       989.08       1.00x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                 1003.09      1009.24      1.01x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                   828.93       804.90       0.97x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                      2081.55      2092.83      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                      1933.31      1952.92      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                       2113.08      2115.87      1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                         1988.81      1913.25      0.96x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                        94717.97     96695.04     1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                        1621.61      1616.38      1.00x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                         2537.88      2528.02      1.00x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                           5039.72      5003.94      0.99x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                   9082.41      9190.08      1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                   8877.60      9097.84      1.02x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                    10225.36     10155.63     0.99x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                      11963.79     12121.31     1.01x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                    94920.28     96765.59     1.02x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                    95567.11     96258.40     1.01x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                     95820.60     96168.41     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                       93993.80     97233.25     1.03x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                       1755.80      1740.71      0.99x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                   5962.22      5913.45      0.99x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                    1517.50      1504.70      0.99x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                      1787.09      1799.56      1.01x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                    1533.68      1532.00      1.00x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                     1328.06      1317.22      0.99x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                        12085.57     12182.27     1.01x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                  23705.03     23741.30     1.00x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                  97170.24     97941.62     1.01x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                        3362.31      3400.14      1.01x
BenchmarkDecoder_DecodeAllParallel/html.zst-16                            4627.37      4661.88      1.01x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                   2213.35      2248.91      1.02x
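
The speedup column in the MB/s table is the ratio of new to old throughput, which should agree with the inverse ratio of the ns/op figures for the same benchmark. A small sketch that cross-checks one row of the two tables; the helper names are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// speedupFromThroughput returns new/old throughput; values above 1 are faster.
func speedupFromThroughput(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

// speedupFromTime returns old/new time per operation; values above 1 are faster.
func speedupFromTime(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16 from both tables.
	fmt.Printf("%.2fx\n", speedupFromThroughput(7373.06, 7871.96))
	fmt.Printf("%.2fx\n", speedupFromTime(133560, 125095))
}
```

Both lines round to the 1.07x reported for that benchmark.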

@klauspost
Owner Author

@WojciechMula Any objections to this getting merged?

@WojciechMula
Contributor

> @WojciechMula Any objections to this getting merged?

Not really. Looks good, except for the two regressions.

@klauspost
Owner Author

@WojciechMula The extremely small encodes are very temperamental. The alloc elimination made those cases significantly better.
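
To illustrate why removing a per-call allocation matters most for tiny inputs, here is a hedged Go sketch that compares a heap-allocated context with one reused in place. It does not show how this PR removes the ctx allocation; `decodeCtx`, `decoder`, `decodeAlloc` and `decodeReuse` are hypothetical names, and only `testing.AllocsPerRun` is a real standard-library call.

```go
package main

import (
	"fmt"
	"testing"
)

// decodeCtx stands in for a per-call decoding context (hypothetical layout).
type decodeCtx struct {
	buf [64]byte
	n   int
}

// sink forces the context in decodeAlloc to escape to the heap.
var sink *decodeCtx

// decodeAlloc creates a fresh heap-allocated context on every call.
func decodeAlloc() int {
	c := &decodeCtx{n: 2}
	sink = c
	return c.n
}

// decoder keeps the context inline so calls can reset and reuse it.
type decoder struct{ ctx decodeCtx }

// decodeReuse overwrites the embedded context instead of allocating.
func (d *decoder) decodeReuse() int {
	d.ctx = decodeCtx{n: 2}
	return d.ctx.n
}

func main() {
	d := &decoder{}
	perCall := testing.AllocsPerRun(1000, func() { decodeAlloc() })
	reused := testing.AllocsPerRun(1000, func() { d.decodeReuse() })
	fmt.Printf("allocs/op: per-call ctx=%v, reused ctx=%v\n", perCall, reused)
}
```

When a block holds only a couple of sequences, a fixed cost like one allocation is a large share of the total time, which is consistent with the n-2 benchmark showing the biggest relative change.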

@klauspost klauspost merged commit 6c809ac into master Mar 18, 2022
@klauspost klauspost deleted the zstd-decoder-tweaks branch March 18, 2022 14:06
@klauspost klauspost restored the zstd-decoder-tweaks branch March 18, 2022 14:06
@klauspost klauspost deleted the zstd-decoder-tweaks branch March 11, 2023 10:10