zstd: Big speedup on small dictionary encodes #345

Merged
merged 5 commits into master from dict-experiments
Mar 24, 2021

@klauspost klauspost commented Mar 23, 2021

All credit goes to @tony2001

As shown in #344, the speed of small dictionary compression tasks (< 32K) can be improved significantly by keeping track of the state of the hash table.

This effectively implements #344 but avoids a penalty for non-dictionary encodes and extends the functionality to the "better" compression mode as well.

This change will also make it easier to stop copying the literal dictionary every time an encode starts, and to handle that case with specialized code instead.

benchmark                                                                     old ns/op     new ns/op     delta
BenchmarkEncodeAllDict0_1024/length-19-level-fastest-dict-1-32                5729          870           -84.82%
BenchmarkEncodeAllDict0_1024/length-19-level-default-dict-1-32                59694         2115          -96.46%
BenchmarkEncodeAllDict0_1024/length-19-level-better-dict-1-32                 197183        2454          -98.76%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1-32                 5596          600           -89.28%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1-32                 59342         1222          -97.94%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1-32                  194466        1958          -98.99%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1-32               13343         13132         -1.58%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1-32               72651         33988         -53.22%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1-32                211509        22635         -89.30%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1-32               12190         10318         -15.36%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1-32               71443         28580         -60.00%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1-32                213304        17914         -91.60%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#01-32              5582          595           -89.33%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#01-32              58721         1221          -97.92%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#01-32               196875        1963          -99.00%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#01-32            13260         13132         -0.97%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#01-32            71944         33896         -52.89%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#01-32             207200        22533         -89.12%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#01-32            12218         10295         -15.74%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#01-32            69490         28531         -58.94%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#01-32             205039        18020         -91.21%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#02-32              5579          599           -89.26%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#02-32              60810         1228          -97.98%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#02-32               198740        1953          -99.02%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#02-32            13352         13128         -1.68%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#02-32            72544         33887         -53.29%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#02-32             213331        22516         -89.45%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#02-32            12204         10299         -15.61%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#02-32            69317         28421         -59.00%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#02-32             207613        17917         -91.37%
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#03-32              5542          600           -89.17%
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#03-32              59132         1218          -97.94%
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#03-32               196451        1952          -99.01%
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#03-32            13319         13112         -1.55%
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#03-32            70234         33843         -51.81%
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#03-32             209384        22447         -89.28%
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#03-32            12285         10297         -16.18%
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#03-32            71972         28585         -60.28%
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#03-32             215483        17902         -91.69%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1-32           16508         16221         -1.74%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1-32           83569         41344         -50.53%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1-32            220306        39384         -82.12%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1-32           41125         40975         -0.36%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1-32           163203        77122         -52.74%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1-32            318789        137116        -56.99%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#01-32        16586         16294         -1.76%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#01-32        82607         41120         -50.22%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#01-32         219278        39179         -82.13%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#01-32        42267         41093         -2.78%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#01-32        164353        76905         -53.21%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#01-32         327857        136501        -58.37%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#02-32        16554         16177         -2.28%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#02-32        83337         41239         -50.52%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#02-32         226392        39385         -82.60%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#02-32        41175         40834         -0.83%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#02-32        160614        77318         -51.86%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#02-32         313359        136739        -56.36%
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#03-32        16413         16274         -0.85%
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#03-32        81907         41151         -49.76%
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#03-32         222585        39181         -82.40%
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#03-32        41232         40978         -0.62%
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#03-32        159086        77235         -51.45%
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#03-32         309822        136600        -55.91%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1-32         55120         55056         -0.12%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1-32         291966        132353        -54.67%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1-32          467914        206802        -55.80%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#01-32      53770         54785         +1.89%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#01-32      291053        130230        -55.26%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#01-32       476829        205292        -56.95%
BenchmarkEncodeAllDict8192_16384/length-9024-level-fastest-dict-1-32          31805         31891         +0.27%
BenchmarkEncodeAllDict8192_16384/length-9024-level-default-dict-1-32          116904        61027         -47.80%
BenchmarkEncodeAllDict8192_16384/length-9024-level-better-dict-1-32           260057        95128         -63.42%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#02-32      54833         54341         -0.90%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#02-32      291523        131595        -54.86%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#02-32       467178        206408        -55.82%
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#03-32      54431         54289         -0.26%
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#03-32      291092        130441        -55.19%
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#03-32       476490        205606        -56.85%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1-32        245211        243965        -0.51%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1-32        817566        822310        +0.58%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1-32         1258889       590281        -53.11%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#01-32     242203        241662        -0.22%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#01-32     812895        818005        +0.63%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#01-32      1265187       590826        -53.30%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#02-32     242602        241849        -0.31%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#02-32     828540        819250        -1.12%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#02-32      1286233       586918        -54.37%
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#03-32     245593        244559        -0.42%
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#03-32     813931        819203        +0.65%
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#03-32      1272813       581714        -54.30%
BenchmarkEncodeAllDict16384_65536/length-20000-level-fastest-dict-1-32        18972         18733         -1.26%
BenchmarkEncodeAllDict16384_65536/length-20000-level-default-dict-1-32        75984         39850         -47.55%
BenchmarkEncodeAllDict16384_65536/length-20000-level-better-dict-1-32         213173        27825         -86.95%
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1-32           1070089       1055243       -1.39%
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1-32           1780011       1819554       +2.22%
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1-32            2785437       1631976       -41.41%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1-32           500568        499781        -0.16%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1-32           1036024       1076927       +3.95%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1-32            1740181       859317        -50.62%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1-32            410671        405122        -1.35%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1-32            1025429       1025611       +0.02%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1-32             1584230       739134        -53.34%
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1#01-32        1054258       1048012       -0.59%
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1#01-32        1756825       1810346       +3.05%
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1#01-32         2816869       1659755       -41.08%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#01-32        498201        500382        +0.44%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#01-32        1045296       1075033       +2.84%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#01-32         1772563       855280        -51.75%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#01-32         411487        404032        -1.81%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#01-32         1009682       1023147       +1.33%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#01-32          1588776       728182        -54.17%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#02-32        501487        498564        -0.58%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#02-32        1037744       1074253       +3.52%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#02-32         1753509       859959        -50.96%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#02-32         407233        403579        -0.90%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#02-32         1013906       1026835       +1.28%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#02-32          1591512       731027        -54.07%
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#03-32        500983        495842        -1.03%
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#03-32        1046435       1075070       +2.74%
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#03-32         1760434       860257        -51.13%
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#03-32         409099        405108        -0.98%
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#03-32         1011372       1021036       +0.96%
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#03-32          1572944       731780        -53.48%

benchmark                                                                     old MB/s     new MB/s     speedup
BenchmarkEncodeAllDict0_1024/length-19-level-fastest-dict-1-32                3.32         21.84        6.58x
BenchmarkEncodeAllDict0_1024/length-19-level-default-dict-1-32                0.32         8.98         28.06x
BenchmarkEncodeAllDict0_1024/length-19-level-better-dict-1-32                 0.10         7.74         77.40x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1-32                 0.89         8.34         9.37x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1-32                 0.08         4.09         51.12x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1-32                  0.03         2.55         85.00x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1-32               49.39        50.18        1.02x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1-32               9.07         19.39        2.14x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1-32                3.12         29.11        9.33x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1-32               14.27        16.86        1.18x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1-32               2.44         6.09         2.50x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1-32                0.82         9.71         11.84x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#01-32              0.90         8.40         9.33x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#01-32              0.09         4.10         45.56x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#01-32               0.03         2.55         85.00x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#01-32            49.70        50.18        1.01x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#01-32            9.16         19.44        2.12x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#01-32             3.18         29.25        9.20x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#01-32            14.24        16.90        1.19x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#01-32            2.50         6.10         2.44x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#01-32             0.85         9.66         11.36x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#02-32              0.90         8.35         9.28x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#02-32              0.08         4.07         50.88x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#02-32               0.03         2.56         85.33x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#02-32            49.36        50.20        1.02x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#02-32            9.08         19.45        2.14x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#02-32             3.09         29.27        9.47x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#02-32            14.26        16.90        1.19x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#02-32            2.51         6.12         2.44x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#02-32             0.84         9.71         11.56x
BenchmarkEncodeAllDict0_1024/length-5-level-fastest-dict-1#03-32              0.90         8.33         9.26x
BenchmarkEncodeAllDict0_1024/length-5-level-default-dict-1#03-32              0.08         4.11         51.38x
BenchmarkEncodeAllDict0_1024/length-5-level-better-dict-1#03-32               0.03         2.56         85.33x
BenchmarkEncodeAllDict0_1024/length-659-level-fastest-dict-1#03-32            49.48        50.26        1.02x
BenchmarkEncodeAllDict0_1024/length-659-level-default-dict-1#03-32            9.38         19.47        2.08x
BenchmarkEncodeAllDict0_1024/length-659-level-better-dict-1#03-32             3.15         29.36        9.32x
BenchmarkEncodeAllDict0_1024/length-174-level-fastest-dict-1#03-32            14.16        16.90        1.19x
BenchmarkEncodeAllDict0_1024/length-174-level-default-dict-1#03-32            2.42         6.09         2.52x
BenchmarkEncodeAllDict0_1024/length-174-level-better-dict-1#03-32             0.81         9.72         12.00x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1-32           65.18        66.33        1.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1-32           12.88        26.03        2.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1-32            4.88         27.32        5.60x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1-32           142.78       143.31       1.00x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1-32           35.98        76.14        2.12x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1-32            18.42        42.82        2.32x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#01-32        64.88        66.04        1.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#01-32        13.03        26.17        2.01x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#01-32         4.91         27.46        5.59x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#01-32        138.93       142.90       1.03x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#01-32        35.73        76.35        2.14x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#01-32         17.91        43.02        2.40x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#02-32        65.00        66.51        1.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#02-32        12.91        26.09        2.02x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#02-32         4.75         27.32        5.75x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#02-32        142.61       143.80       1.01x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#02-32        36.56        75.95        2.08x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#02-32         18.74        42.94        2.29x
BenchmarkEncodeAllDict1024_8192/length-1076-level-fastest-dict-1#03-32        65.56        66.12        1.01x
BenchmarkEncodeAllDict1024_8192/length-1076-level-default-dict-1#03-32        13.14        26.15        1.99x
BenchmarkEncodeAllDict1024_8192/length-1076-level-better-dict-1#03-32         4.83         27.46        5.69x
BenchmarkEncodeAllDict1024_8192/length-5872-level-fastest-dict-1#03-32        142.41       143.30       1.01x
BenchmarkEncodeAllDict1024_8192/length-5872-level-default-dict-1#03-32        36.91        76.03        2.06x
BenchmarkEncodeAllDict1024_8192/length-5872-level-better-dict-1#03-32         18.95        42.99        2.27x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1-32         220.08       220.34       1.00x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1-32         41.55        91.66        2.21x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1-32          25.93        58.66        2.26x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#01-32      225.61       221.43       0.98x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#01-32      41.68        93.15        2.23x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#01-32       25.44        59.09        2.32x
BenchmarkEncodeAllDict8192_16384/length-9024-level-fastest-dict-1-32          283.73       282.97       1.00x
BenchmarkEncodeAllDict8192_16384/length-9024-level-default-dict-1-32          77.19        147.87       1.92x
BenchmarkEncodeAllDict8192_16384/length-9024-level-better-dict-1-32           34.70        94.86        2.73x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#02-32      221.23       223.24       1.01x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#02-32      41.61        92.18        2.22x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#02-32       25.97        58.77        2.26x
BenchmarkEncodeAllDict8192_16384/length-12131-level-fastest-dict-1#03-32      222.87       223.45       1.00x
BenchmarkEncodeAllDict8192_16384/length-12131-level-default-dict-1#03-32      41.67        93.00        2.23x
BenchmarkEncodeAllDict8192_16384/length-12131-level-better-dict-1#03-32       25.46        59.00        2.32x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1-32        243.44       244.69       1.01x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1-32        73.02        72.59        0.99x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1-32         47.42        101.13       2.13x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#01-32     246.47       247.02       1.00x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#01-32     73.44        72.98        0.99x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#01-32      47.18        101.04       2.14x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#02-32     246.06       246.83       1.00x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#02-32     72.05        72.87        1.01x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#02-32      46.41        101.71       2.19x
BenchmarkEncodeAllDict16384_65536/length-59695-level-fastest-dict-1#03-32     243.06       244.09       1.00x
BenchmarkEncodeAllDict16384_65536/length-59695-level-default-dict-1#03-32     73.34        72.87        0.99x
BenchmarkEncodeAllDict16384_65536/length-59695-level-better-dict-1#03-32      46.90        102.62       2.19x
BenchmarkEncodeAllDict16384_65536/length-20000-level-fastest-dict-1-32        1054.19      1067.64      1.01x
BenchmarkEncodeAllDict16384_65536/length-20000-level-default-dict-1-32        263.21       501.88       1.91x
BenchmarkEncodeAllDict16384_65536/length-20000-level-better-dict-1-32         93.82        718.77       7.66x
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1-32           196.78       199.55       1.01x
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1-32           118.30       115.73       0.98x
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1-32            75.60        129.03       1.71x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1-32           204.98       205.30       1.00x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1-32           99.04        95.28        0.96x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1-32            58.96        119.40       2.03x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1-32            165.61       167.88       1.01x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1-32            66.33        66.31        1.00x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1-32             42.93        92.02        2.14x
BenchmarkEncodeAllDict65536_0/length-210569-level-fastest-dict-1#01-32        199.73       200.92       1.01x
BenchmarkEncodeAllDict65536_0/length-210569-level-default-dict-1#01-32        119.86       116.31       0.97x
BenchmarkEncodeAllDict65536_0/length-210569-level-better-dict-1#01-32         74.75        126.87       1.70x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#01-32        205.95       205.05       1.00x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#01-32        98.16        95.44        0.97x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#01-32         57.89        119.97       2.07x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#01-32         165.29       168.34       1.02x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#01-32         67.36        66.47        0.99x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#01-32          42.81        93.40        2.18x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#02-32        204.60       205.80       1.01x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#02-32        98.87        95.51        0.97x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#02-32         58.51        119.31       2.04x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#02-32         167.01       168.52       1.01x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#02-32         67.08        66.24        0.99x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#02-32          42.73        93.04        2.18x
BenchmarkEncodeAllDict65536_0/length-102605-level-fastest-dict-1#03-32        204.81       206.93       1.01x
BenchmarkEncodeAllDict65536_0/length-102605-level-default-dict-1#03-32        98.05        95.44        0.97x
BenchmarkEncodeAllDict65536_0/length-102605-level-better-dict-1#03-32         58.28        119.27       2.05x
BenchmarkEncodeAllDict65536_0/length-68013-level-fastest-dict-1#03-32         166.25       167.89       1.01x
BenchmarkEncodeAllDict65536_0/length-68013-level-default-dict-1#03-32         67.25        66.61        0.99x
BenchmarkEncodeAllDict65536_0/length-68013-level-better-dict-1#03-32          43.24        92.94        2.15x 

@tony2001

According to my benchmarks, the best results are achieved with tableShardSize = 64, (that would mean tableShardCnt = 1 << (tableBits - 6) in enc_fast.go).
This number is empirical, but I believe the reason is that sizeof(tableEntry) == 64, which means that each "shard" is 64*64=4096 bytes, standard page size on Linux.
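
As a quick check of the shard arithmetic in the suggestion above: with 64 entries per shard, the shard count is the table size shifted right by 6, since 6 = log2(64). The value of `tableBits` below is an assumption picked for illustration, not necessarily what `enc_fast.go` uses.

```go
package main

import "fmt"

const (
	tableBits      = 15 // assumed value, for illustration only
	tableSize      = 1 << tableBits
	tableShardSize = 64
	tableShardCnt  = 1 << (tableBits - 6) // 6 == log2(tableShardSize)
)

func main() {
	// The shards must cover the table exactly.
	fmt.Println(tableShardCnt, tableShardCnt*tableShardSize == tableSize)
}
```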

Other than that, it seems your version is just my patch on steroids :), so I still get the same results in the benchmarks (after adjusting the shard size, that is).

RE: code beauty
Do you think a table struct with getters/setters doing the "dirty" stuff behind the scenes would affect performance negatively? It just asks for some kind of general solution for all these tables, but my attempts resulted in like -10%-20% of speed =(

@klauspost

@tony2001

> tableShardSize = 64

I will make it a tweakable constant and benchmark the numbers. It will mainly be a tradeoff between the cost of branching vs the cost of copying memory. So a system with less memory bandwidth will prefer smaller shards.

> Do you think a table struct with getters/setters doing the "dirty" stuff behind the scenes would affect performance negatively? It just asks for some kind of general solution for all these tables, but my attempts resulted in like -10%-20% of speed =(

I looked into that. The main issue is that the e.tableShardDirty[entryNum/tableShardSize] division must be resolvable at compile time, otherwise performance will be horrible.

Less critical, but still significant, is the i*shardSize multiplication, which also benefits greatly from being constant. If the shard size is the same constant for all tables, though, that can be achieved with a generic implementation.
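
A small sketch of the compile-time point, under the assumption that the shard size is a power of two. The function names `markConst` and `markVar` are hypothetical. With a constant divisor the compiler can lower the unsigned division to a shift; with a divisor that is only known at run time it generally has to emit a real division in the hot loop.

```go
package main

import "fmt"

const shardSize = 64 // compile-time constant, power of two

// markConst divides by a constant, so the compiler can turn
// entry/shardSize into a shift.
func markConst(dirty []bool, entry uint32) {
	dirty[entry/shardSize] = true
}

// markVar divides by a value supplied by the caller. When that value is
// not known at compile time, this needs an actual division instruction.
func markVar(dirty []bool, entry, size uint32) {
	dirty[entry/size] = true
}

func main() {
	a := make([]bool, 16)
	b := make([]bool, 16)
	markConst(a, 200)   // 200/64 == 3
	markVar(b, 200, 64) // same result, different generated code
	fmt.Println(a[3], b[3])
}
```

Both functions compute the same index; the difference is only in what the compiler can do with the divisor, which is why a generic table type that stores its shard size in a field tends to be slower.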

@klauspost

I would have liked to be able to do the same for all encoders as for "fast", where it falls back to the regular encode when the input exceeds a certain size:

	if e.allDirty || len(src) > 32<<10 {
		e.fastEncoder.Encode(blk, src)
		e.allDirty = true
		return
	}

... but I couldn't find a neat way to do that.
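
The fallback in the snippet above can be illustrated in isolation. This is a sketch with hypothetical names (`enc`, `encode`, `reset`, `maxTrackedSize`), not the library's types: once an input is too large to be worth tracking, the encoder takes the regular path and sets a sticky `allDirty` flag, so later calls also skip tracking until the table has been fully restored. That `reset` clears the flag is an assumption of this sketch.

```go
package main

import "fmt"

// maxTrackedSize mirrors the 32<<10 threshold from the snippet above.
const maxTrackedSize = 32 << 10

type enc struct {
	allDirty bool // true once the whole table must be treated as modified
}

// encode reports which path would be taken for src.
func (e *enc) encode(src []byte) string {
	if e.allDirty || len(src) > maxTrackedSize {
		e.allDirty = true
		return "regular" // no per-shard tracking
	}
	return "tracked" // per-shard dirty tracking
}

// reset stands in for a full table restore, after which tracking can resume.
func (e *enc) reset() {
	e.allDirty = false
}

func main() {
	e := &enc{}
	fmt.Println(e.encode(make([]byte, 100)))              // small input
	fmt.Println(e.encode(make([]byte, maxTrackedSize+1))) // too large
	fmt.Println(e.encode(make([]byte, 100)))              // flag is sticky
	e.reset()
	fmt.Println(e.encode(make([]byte, 100))) // tracking resumes
}
```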

@klauspost klauspost merged commit 2bb69be into master Mar 24, 2021
@klauspost klauspost deleted the dict-experiments branch March 24, 2021 16:22
@tony2001

Very much appreciated!

mostynb added a commit to mostynb/zstdpool-syncpool that referenced this pull request Mar 27, 2021
There have been a few zstd improvements since v1.11.6:

* zstd: Big speedup on small dictionary encodes
  klauspost/compress#345
* zstd: Add WithLowerEncoderMem
  klauspost/compress#336
* Faster "compression" of incompressible data
  klauspost/compress#314