Align allocations #230

Merged
merged 3 commits into from
Jan 12, 2023

klauspost
Owner

On AMD64, aligned inputs can make a big speed difference.

Here is an example of the speed difference between unaligned (first line) and aligned (second line) inputs:

BenchmarkEncode100x20x10000-32    	    7058	    172648 ns/op	6950.57 MB/s
BenchmarkEncode100x20x10000-32    	    8406	    137911 ns/op	8701.24 MB/s
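The MB/s column follows from ns/op. The two figures above are consistent with the benchmark counting all 120 shards (100 data + 20 parity) of 10000 bytes each, i.e. 1.2 MB per operation; that byte count is inferred from the numbers, not stated in this PR. A small sketch of the arithmetic (mbPerSec is a hypothetical helper):

```go
package main

import "fmt"

// mbPerSec converts a ns/op figure to MB/s the way Go's testing package
// reports it (1 MB = 1e6 bytes). Assumption: every data and parity shard
// counts as processed bytes.
func mbPerSec(dataShards, parityShards, shardSize int, nsPerOp float64) float64 {
	bytes := float64((dataShards + parityShards) * shardSize)
	return bytes / nsPerOp * 1e3 // bytes per ns -> MB per second
}

func main() {
	unaligned := mbPerSec(100, 20, 10000, 172648)
	aligned := mbPerSec(100, 20, 10000, 137911)
	fmt.Printf("%.0f MB/s -> %.0f MB/s (%.2fx)\n", unaligned, aligned, aligned/unaligned)
	// prints "6951 MB/s -> 8701 MB/s (1.25x)"
}
```

So in this run, alignment alone makes the encode roughly 25% faster.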

This is mostly the case when dealing with odd-sized shards.

To facilitate this, the package provides AllocAligned(shards, each int) [][]byte.
It allocates `shards` slices, each of length `each`.
Each shard is aligned to a 64-byte boundary.
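As a rough sketch of what a 64-byte aligned allocation can look like in Go (not the package's implementation; allocAligned and isAligned are hypothetical names), one approach is to over-allocate a single backing buffer, shift its start forward to the next 64-byte boundary, and round each shard's stride up to a multiple of 64 so every shard, not only the first, starts aligned. It relies on Go's garbage collector not moving heap objects, which holds for the standard gc runtime.

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignment is the boundary each shard should start on.
const alignment = 64

// allocAligned is a hypothetical helper: it returns `shards` slices of
// length `each`, each starting on a 64-byte boundary.
func allocAligned(shards, each int) [][]byte {
	// Round the per-shard stride up to a multiple of the alignment so that
	// every shard, not only the first, starts on a boundary.
	stride := (each + alignment - 1) &^ (alignment - 1)
	// Over-allocate one alignment unit so the start can be shifted forward.
	buf := make([]byte, stride*shards+alignment)
	// Bytes from buf's start to the next 64-byte boundary (0 if already aligned).
	off := int((alignment - uintptr(unsafe.Pointer(&buf[0]))%alignment) % alignment)
	buf = buf[off:]
	res := make([][]byte, shards)
	for i := range res {
		start := i * stride
		// The three-index slice caps capacity, so an append cannot
		// overwrite the following shard.
		res[i] = buf[start : start+each : start+each]
	}
	return res
}

// isAligned reports whether the first byte of b sits on a 64-byte boundary.
func isAligned(b []byte) bool {
	return len(b) > 0 && uintptr(unsafe.Pointer(&b[0]))%alignment == 0
}

func main() {
	for i, s := range allocAligned(5, 1000) {
		fmt.Println(i, len(s), isAligned(s)) // each line ends in "1000 true"
	}
}
```

Rounding the stride up wastes at most 63 bytes per shard, and keeping all shards in one backing buffer means a single allocation regardless of the shard count.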

Each encoder also provides AllocAligned(each int) [][]byte through an extended interface. It returns the same result,
but uses the shard count configured in the encoder.
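The "extended interface" wording suggests the usual Go pattern of an optional interface discovered with a type assertion. The sketch below models that pattern with hypothetical stand-ins (Encoder, aligner, configuredEncoder and basicEncoder are not the package's types, and the stand-in does not actually align memory); it only shows how a caller could prefer the encoder's own AllocAligned and fall back to plain allocation.

```go
package main

import "fmt"

// Encoder is a minimal stand-in for an encoder interface; only the method
// this sketch needs is declared (hypothetical).
type Encoder interface {
	Encode(shards [][]byte) error
}

// aligner models the optional, extended interface: an encoder that knows
// its own shard count allocates the shards for the caller.
type aligner interface {
	AllocAligned(each int) [][]byte
}

// makeShards is a plain, possibly unaligned, allocation of n shards.
func makeShards(n, each int) [][]byte {
	s := make([][]byte, n)
	for i := range s {
		s[i] = make([]byte, each)
	}
	return s
}

// configuredEncoder is a hypothetical encoder configured with data+parity
// shards that implements the extension. This stand-in does not align;
// a real implementation would.
type configuredEncoder struct{ data, parity int }

func (e *configuredEncoder) Encode(shards [][]byte) error { return nil }

func (e *configuredEncoder) AllocAligned(each int) [][]byte {
	return makeShards(e.data+e.parity, each)
}

// basicEncoder is a hypothetical encoder without the extension.
type basicEncoder struct{}

func (basicEncoder) Encode(shards [][]byte) error { return nil }

// allocShards uses the extension when available, so the shard count comes
// from the encoder; otherwise it falls back to plain allocation of total shards.
func allocShards(enc Encoder, total, each int) [][]byte {
	if a, ok := enc.(aligner); ok {
		return a.AllocAligned(each)
	}
	return makeShards(total, each)
}

func main() {
	a := allocShards(&configuredEncoder{data: 10, parity: 4}, 14, 1000)
	b := allocShards(basicEncoder{}, 14, 1000)
	fmt.Println(len(a), len(a[0]), len(b), len(b[0])) // prints "14 1000 14 1000"
}
```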

It is not possible to re-align already allocated slices, for example those returned by Split.
If you cannot write your data directly into aligned shards, do not copy it into them just to get alignment; the extra copy is likely to cost more than the alignment saves.
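Why existing slices cannot simply be re-aligned can be illustrated with a hypothetical Split-style helper. Assuming Split hands back subslices of the caller's buffer rather than copies (splitInPlace below only mimics that; it is not the package's Split), each shard's address is fixed at base + i*shardSize. Even with an aligned base, an odd shard size leaves most shards unaligned, and the only way to change that is to move the data.

```go
package main

import (
	"fmt"
	"unsafe"
)

const alignment = 64

// isAligned reports whether the first byte of b sits on a 64-byte boundary.
func isAligned(b []byte) bool {
	return len(b) > 0 && uintptr(unsafe.Pointer(&b[0]))%alignment == 0
}

// alignedBuffer returns a length-n slice whose first byte is 64-byte aligned.
func alignedBuffer(n int) []byte {
	buf := make([]byte, n+alignment)
	off := int((alignment - uintptr(unsafe.Pointer(&buf[0]))%alignment) % alignment)
	return buf[off : off+n : off+n]
}

// splitInPlace is a hypothetical Split-style helper: it carves one
// contiguous buffer into consecutive shards without copying.
func splitInPlace(buf []byte, shards, per int) [][]byte {
	out := make([][]byte, shards)
	for i := range out {
		out[i] = buf[i*per : (i+1)*per]
	}
	return out
}

// countAligned splits an aligned buffer into `shards` pieces of `per`
// bytes and counts how many pieces start on a 64-byte boundary.
func countAligned(shards, per int) int {
	n := 0
	for _, s := range splitInPlace(alignedBuffer(shards*per), shards, per) {
		if isAligned(s) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countAligned(8, 1000)) // prints 1: only shard 0 is aligned
	fmt.Println(countAligned(8, 1024)) // prints 8: 1024 is a multiple of 64
}
```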

Full (but rather noisy) benchmark:

benchmark                                                old ns/op      new ns/op      delta
BenchmarkGalois128K-32                                   2284           2254           -1.31%
BenchmarkGalois1M-32                                     21925          19042          -13.15%
BenchmarkGaloisXor128K-32                                2810           2782           -1.00%
BenchmarkGaloisXor1M-32                                  24223          22716          -6.22%
BenchmarkEncode2x1x1M-32                                 38969          33115          -15.02%
BenchmarkEncode800x200/64-32                             29007          28090          -3.16%
BenchmarkEncode800x200/256-32                            65858          64747          -1.69%
BenchmarkEncode800x200/1024-32                           207661         203905         -1.81%
BenchmarkEncode800x200/4096-32                           806579         789913         -2.07%
BenchmarkEncode800x200/16384-32                          4088967        3688426        -9.80%
BenchmarkEncode800x200/65536-32                          27241951       24104804       -11.52%
BenchmarkEncode800x200/262144-32                         120608789      113648633      -5.77%
BenchmarkEncode800x200/1048576-32                        451364367      420720500      -6.79%
BenchmarkEncode1K/4+4/cauchy-32                          335            345            +2.96%
BenchmarkEncode1K/4+4/leopard-gf8-32                     640            632            -1.33%
BenchmarkEncode1K/4+4/leopard-gf16-32                    455            436            -4.16%
BenchmarkEncode1K/8+8/cauchy-32                          1099           1081           -1.64%
BenchmarkEncode1K/8+8/leopard-gf8-32                     1831           1792           -2.13%
BenchmarkEncode1K/8+8/leopard-gf16-32                    1608           1586           -1.37%
BenchmarkEncode1K/16+16/cauchy-32                        4340           4372           +0.74%
BenchmarkEncode1K/16+16/leopard-gf8-32                   3330           3280           -1.50%
BenchmarkEncode1K/16+16/leopard-gf16-32                  2637           2614           -0.87%
BenchmarkEncode1K/32+32/cauchy-32                        17257          17397          +0.81%
BenchmarkEncode1K/32+32/leopard-gf8-32                   9849           9623           -2.29%
BenchmarkEncode1K/32+32/leopard-gf16-32                  8903           8806           -1.09%
BenchmarkEncode1K/64+64/cauchy-32                        68672          68374          -0.43%
BenchmarkEncode1K/64+64/leopard-gf8-32                   18283          17992          -1.59%
BenchmarkEncode1K/64+64/leopard-gf16-32                  15558          15541          -0.11%
BenchmarkEncode1K/128+128/cauchy-32                      270881         270547         -0.12%
BenchmarkEncode1K/128+128/leopard-gf8-32                 49601          48871          -1.47%
BenchmarkEncode1K/128+128/leopard-gf16-32                46158          45735          -0.92%
BenchmarkEncode1K/256+256/leopard-gf16-32                84268          83318          -1.13%
BenchmarkEncode1K/512+512/leopard-gf16-32                235278         231775         -1.49%
BenchmarkEncode1K/1024+1024/leopard-gf16-32              436245         430979         -1.21%
BenchmarkEncode1K/2048+2048/leopard-gf16-32              1227665        1108337        -9.72%
BenchmarkEncode1K/4096+4096/leopard-gf16-32              2573166        2273580        -11.64%
BenchmarkEncode1K/8192+8192/leopard-gf16-32              7377235        6443287        -12.66%
BenchmarkEncode1K/16384+16384/leopard-gf16-32            20045895       17426286       -13.07%
BenchmarkEncode1K/32768+32768/leopard-gf16-32            53570005       50222577       -6.25%
BenchmarkDecode1K/4+4/cauchy-32                          2160           2262           +4.72%
BenchmarkDecode1K/4+4/cauchy-inv-32                      1369           1461           +6.72%
BenchmarkDecode1K/4+4/cauchy-single-32                   1300           1304           +0.31%
BenchmarkDecode1K/4+4/cauchy-single-inv-32               604            633            +4.71%
BenchmarkDecode1K/4+4/leopard-gf8-32                     4277           4176           -2.36%
BenchmarkDecode1K/4+4/leopard-gf8-inv-32                 2455           2330           -5.09%
BenchmarkDecode1K/4+4/leopard-gf8-single-32              3891           3750           -3.62%
BenchmarkDecode1K/4+4/leopard-gf8-single-inv-32          1974           1914           -3.04%
BenchmarkDecode1K/4+4/leopard-gf16-32                    794838         792366         -0.31%
BenchmarkDecode1K/4+4/leopard-gf16-single-32             791991         793335         +0.17%
BenchmarkDecode1K/8+8/cauchy-32                          5445           5651           +3.78%
BenchmarkDecode1K/8+8/cauchy-inv-32                      2920           3039           +4.08%
BenchmarkDecode1K/8+8/cauchy-single-32                   2290           2285           -0.22%
BenchmarkDecode1K/8+8/cauchy-single-inv-32               788            814            +3.30%
BenchmarkDecode1K/8+8/leopard-gf8-32                     7270           7134           -1.87%
BenchmarkDecode1K/8+8/leopard-gf8-inv-32                 5433           5156           -5.10%
BenchmarkDecode1K/8+8/leopard-gf8-single-32              6168           6123           -0.73%
BenchmarkDecode1K/8+8/leopard-gf8-single-inv-32          4302           4365           +1.46%
BenchmarkDecode1K/8+8/leopard-gf16-32                    792261         787290         -0.63%
BenchmarkDecode1K/8+8/leopard-gf16-single-32             800258         793031         -0.90%
BenchmarkDecode1K/16+16/cauchy-32                        21492          21664          +0.80%
BenchmarkDecode1K/16+16/cauchy-inv-32                    7805           8140           +4.29%
BenchmarkDecode1K/16+16/cauchy-single-32                 5176           5105           -1.37%
BenchmarkDecode1K/16+16/cauchy-single-inv-32             1211           1194           -1.40%
BenchmarkDecode1K/16+16/leopard-gf8-32                   16062          15813          -1.55%
BenchmarkDecode1K/16+16/leopard-gf8-inv-32               14737          13697          -7.06%
BenchmarkDecode1K/16+16/leopard-gf8-single-32            13899          13393          -3.64%
BenchmarkDecode1K/16+16/leopard-gf8-single-inv-32        11865          11487          -3.19%
BenchmarkDecode1K/16+16/leopard-gf16-32                  805833         794984         -1.35%
BenchmarkDecode1K/16+16/leopard-gf16-single-32           808125         799816         -1.03%
BenchmarkDecode1K/32+32/cauchy-32                        117646         117088         -0.47%
BenchmarkDecode1K/32+32/cauchy-inv-32                    23975          24454          +2.00%
BenchmarkDecode1K/32+32/cauchy-single-32                 14250          13861          -2.73%
BenchmarkDecode1K/32+32/cauchy-single-inv-32             2018           1994           -1.19%
BenchmarkDecode1K/32+32/leopard-gf8-32                   31886          32203          +0.99%
BenchmarkDecode1K/32+32/leopard-gf8-inv-32               30959          30270          -2.23%
BenchmarkDecode1K/32+32/leopard-gf8-single-32            20726          22461          +8.37%
BenchmarkDecode1K/32+32/leopard-gf8-single-inv-32        20717          22592          +9.05%
BenchmarkDecode1K/32+32/leopard-gf16-32                  823015         808697         -1.74%
BenchmarkDecode1K/32+32/leopard-gf16-single-32           816765         809475         -0.89%
BenchmarkDecode1K/64+64/cauchy-32                        813174         804646         -1.05%
BenchmarkDecode1K/64+64/cauchy-inv-32                    80979          82487          +1.86%
BenchmarkDecode1K/64+64/cauchy-single-32                 45979          44580          -3.04%
BenchmarkDecode1K/64+64/cauchy-single-inv-32             3432           3286           -4.25%
BenchmarkDecode1K/64+64/leopard-gf8-32                   78343          74364          -5.08%
BenchmarkDecode1K/64+64/leopard-gf8-inv-32               73370          63120          -13.97%
BenchmarkDecode1K/64+64/leopard-gf8-single-32            50130          43959          -12.31%
BenchmarkDecode1K/64+64/leopard-gf8-single-inv-32        51308          41208          -19.69%
BenchmarkDecode1K/64+64/leopard-gf16-32                  864012         846280         -2.05%
BenchmarkDecode1K/64+64/leopard-gf16-single-32           850149         830551         -2.31%
BenchmarkDecode1K/128+128/cauchy-32                      5929095        5900026        -0.49%
BenchmarkDecode1K/128+128/cauchy-inv-32                  304087         306298         +0.73%
BenchmarkDecode1K/128+128/cauchy-single-32               164090         160974         -1.90%
BenchmarkDecode1K/128+128/cauchy-single-inv-32           5850           5625           -3.85%
BenchmarkDecode1K/128+128/leopard-gf8-32                 158429         145910         -7.90%
BenchmarkDecode1K/128+128/leopard-gf8-inv-32             152309         141621         -7.02%
BenchmarkDecode1K/128+128/leopard-gf8-single-32          112267         94170          -16.12%
BenchmarkDecode1K/128+128/leopard-gf8-single-inv-32      104646         96275          -8.00%
BenchmarkDecode1K/128+128/leopard-gf16-32                927823         920083         -0.83%
BenchmarkDecode1K/128+128/leopard-gf16-single-32         893019         885971         -0.79%
BenchmarkDecode1K/256+256/leopard-gf16-32                1132479        1105774        -2.36%
BenchmarkDecode1K/256+256/leopard-gf16-single-32         1017945        1003342        -1.43%
BenchmarkDecode1K/512+512/leopard-gf16-32                1495247        1457558        -2.52%
BenchmarkDecode1K/512+512/leopard-gf16-single-32         1276089        1239965        -2.83%
BenchmarkDecode1K/1024+1024/leopard-gf16-32              2511310        2355124        -6.22%
BenchmarkDecode1K/1024+1024/leopard-gf16-single-32       1926875        1786114        -7.31%
BenchmarkDecode1K/2048+2048/leopard-gf16-32              4574758        4051357        -11.44%
BenchmarkDecode1K/2048+2048/leopard-gf16-single-32       3404487        2936912        -13.73%
BenchmarkDecode1K/4096+4096/leopard-gf16-32              9917650        9381317        -5.41%
BenchmarkDecode1K/4096+4096/leopard-gf16-single-32       7439868        6237255        -16.16%
BenchmarkDecode1K/8192+8192/leopard-gf16-32              27173871       22125130       -18.58%
BenchmarkDecode1K/8192+8192/leopard-gf16-single-32       19590423       15888578       -18.90%
BenchmarkDecode1K/16384+16384/leopard-gf16-32            65490106       60630937       -7.42%
BenchmarkDecode1K/16384+16384/leopard-gf16-single-32     43015162       40455732       -5.95%
BenchmarkDecode1K/32768+32768/leopard-gf16-32            137665712      121400489      -11.82%
BenchmarkDecode1K/32768+32768/leopard-gf16-single-32     89620746       84439785       -5.78%
BenchmarkEncodeLeopard/83840-32                          38754790       34248175       -11.63%
BenchmarkEncode10x2x10000-32                             3146           3104           -1.34%
BenchmarkEncode100x20x10000-32                           156083         132442         -15.15%
BenchmarkEncode17x3x1M-32                                266619         549537         +106.11%
BenchmarkEncode10x4x16M-32                               8589080        10118010       +17.80%
BenchmarkEncode5x2x1M-32                                 69710          67893          -2.61%
BenchmarkEncode10x2x1M-32                                105670         107245         +1.49%
BenchmarkEncode10x4x1M-32                                163118         176928         +8.47%
BenchmarkEncode50x20x1M-32                               3118381        14567976       +367.16%
BenchmarkEncodeLeopard50x20x1M-32                        10595922       11908254       +12.39%
BenchmarkEncode17x3x16M-32                               10172963       11955679       +17.52%
BenchmarkEncode_8x4x8M-32                                3728191        3439514        -7.74%
BenchmarkEncode_12x4x12M-32                              6803056        6497932        -4.49%
BenchmarkEncode_16x4x16M-32                              10882620       10934306       +0.47%
BenchmarkEncode_16x4x32M-32                              21758418       21431559       -1.50%
BenchmarkEncode_16x4x64M-32                              45258777       43288619       -4.35%
BenchmarkEncode_8x5x8M-32                                4385065        4089020        -6.75%
BenchmarkEncode_8x6x8M-32                                4826290        4502941        -6.70%
BenchmarkEncode_8x7x8M-32                                5403098        4968727        -8.04%
BenchmarkEncode_8x9x8M-32                                6302487        6002242        -4.76%
BenchmarkEncode_8x10x8M-32                               6913882        6637816        -3.99%
BenchmarkEncode_8x11x8M-32                               7326503        7059232        -3.65%
BenchmarkEncode_8x8x05M-32                               144080         311435         +116.15%
BenchmarkEncode_8x8x1M-32                                291361         274498         -5.79%
BenchmarkEncode_8x8x8M-32                                5802282        5384482        -7.20%
BenchmarkEncode_8x8x32M-32                               24160839       23571850       -2.44%
BenchmarkEncode_24x8x24M-32                              36720647       30780145       -16.18%
BenchmarkEncode_24x8x48M-32                              63394650       63509811       +0.18%
BenchmarkVerify800x200/64-32                             40375          38159          -5.49%
BenchmarkVerify800x200/256-32                            83771          78225          -6.62%
BenchmarkVerify800x200/1024-32                           271752         240426         -11.53%
BenchmarkVerify800x200/4096-32                           1082853        929348         -14.18%
BenchmarkVerify800x200/16384-32                          5725732        4615986        -19.38%
BenchmarkVerify800x200/65536-32                          32934571       26172560       -20.53%
BenchmarkVerify800x200/262144-32                         155253386      126563067      -18.48%
BenchmarkVerify800x200/1048576-32                        561659750      490361633      -12.69%
BenchmarkVerify10x2x10000-32                             5904           5500           -6.84%
BenchmarkVerify50x5x100000-32                            211479         171168         -19.06%
BenchmarkVerify10x2x1M-32                                504104         425526         -15.59%
BenchmarkVerify5x2x1M-32                                 395883         335088         -15.36%
BenchmarkVerify10x4x1M-32                                1089718        938154         -13.91%
BenchmarkVerify50x20x1M-32                               7889230        8431484        +6.87%
BenchmarkVerify10x4x16M-32                               23219342       17756713       -23.53%
BenchmarkReconstruct10x2x10000-32                        3222           3117           -3.26%
BenchmarkReconstruct800x200/64-32                        1179363        1100846        -6.66%
BenchmarkReconstruct800x200/256-32                       1294499        1214204        -6.20%
BenchmarkReconstruct800x200/1024-32                      1881303        1737606        -7.64%
BenchmarkReconstruct800x200/4096-32                      4420506        3860332        -12.67%
BenchmarkReconstruct800x200/16384-32                     27171535       22810506       -16.05%
BenchmarkReconstruct800x200/65536-32                     119789411      108652470      -9.30%
BenchmarkReconstruct800x200/262144-32                    531271300      505694550      -4.81%
BenchmarkReconstruct800x200/1048576-32                   2918642700     2408563700     -17.48%
BenchmarkReconstruct50x5x50000-32                        148109         134759         -9.01%
BenchmarkReconstruct10x2x1M-32                           185262         186595         +0.72%
BenchmarkReconstruct5x2x1M-32                            129164         126902         -1.75%
BenchmarkReconstruct10x4x1M-32                           276915         268351         -3.09%
BenchmarkReconstruct50x20x1M-32                          3541594        6128024        +73.03%
BenchmarkReconstructLeopard50x20x1M-32                   30241625       29009541       -4.07%
BenchmarkReconstruct10x4x16M-32                          9437829        8883464        -5.87%
BenchmarkReconstructData10x2x10000-32                    3060           2988           -2.35%
BenchmarkReconstructData800x200/64-32                    1147537        1091256        -4.90%
BenchmarkReconstructData800x200/256-32                   1263484        1212899        -4.00%
BenchmarkReconstructData800x200/1024-32                  1839242        1725180        -6.20%
BenchmarkReconstructData800x200/4096-32                  4362936        3849980        -11.76%
BenchmarkReconstructData800x200/16384-32                 26168836       22671938       -13.36%
BenchmarkReconstructData800x200/65536-32                 120528889      107130140      -11.12%
BenchmarkReconstructData800x200/262144-32                563569000      475725700      -15.59%
BenchmarkReconstructData800x200/1048576-32               2615405500     2429356800     -7.11%
BenchmarkReconstructData50x5x50000-32                    143226         129456         -9.61%
BenchmarkReconstructData10x2x1M-32                       172211         237621         +37.98%
BenchmarkReconstructData5x2x1M-32                        112454         100520         -10.61%
BenchmarkReconstructData10x4x1M-32                       228400         304907         +33.50%
BenchmarkReconstructData50x20x1M-32                      2314132        4094795        +76.95%
BenchmarkReconstructData10x4x16M-32                      7298997        6668680        -8.64%
BenchmarkReconstructP10x2x10000-32                       826            829            +0.35%
BenchmarkReconstructP10x5x20000-32                       1474           1574           +6.78%
BenchmarkSplit10x4x160M-32                               5724818        5133214        -10.33%
BenchmarkSplit5x2x5M-32                                  185120         158360         -14.46%
BenchmarkSplit10x2x1M-32                                 30804          26726          -13.24%
BenchmarkSplit10x4x10M-32                                362056         329514         -8.99%
BenchmarkSplit50x20x50M-32                               1822737        1658791        -8.99%
BenchmarkSplit17x3x272M-32                               4211272        3692685        -12.31%
BenchmarkParallel_8x8x64K-32                             6069           9427           +55.33%
BenchmarkParallel_8x8x05M-32                             363357         355407         -2.19%
BenchmarkParallel_20x10x05M-32                           597304         595410         -0.32%
BenchmarkParallel_8x8x1M-32                              714469         709469         -0.70%
BenchmarkParallel_8x8x8M-32                              5690194        5753258        +1.11%
BenchmarkParallel_8x8x32M-32                             22776537       22795771       +0.08%
BenchmarkParallel_8x3x1M-32                              404962         405967         +0.25%
BenchmarkParallel_8x4x1M-32                              466195         467154         +0.21%
BenchmarkParallel_8x5x1M-32                              528804         529565         +0.14%
BenchmarkStreamEncode10x2x10000-32                       5614           5579           -0.62%
BenchmarkStreamEncode100x20x10000-32                     270235         254176         -5.94%
BenchmarkStreamEncode17x3x1M-32                          1517849        1472684        -2.98%
BenchmarkStreamEncode10x4x16M-32                         19262797       18293434       -5.03%
BenchmarkStreamEncode5x2x1M-32                           417544         403906         -3.27%
BenchmarkStreamEncode10x2x1M-32                          823367         821781         -0.19%
BenchmarkStreamEncode10x4x1M-32                          884722         856973         -3.14%
BenchmarkStreamEncode50x20x1M-32                         6553097        12518249       +91.03%
BenchmarkStreamEncode17x3x16M-32                         28583679       27318121       -4.43%
BenchmarkStreamVerify10x2x10000-32                       8255           8086           -2.05%
BenchmarkStreamVerify50x5x50000-32                       659940         636418         -3.56%
BenchmarkStreamVerify10x2x1M-32                          1239043        1195332        -3.53%
BenchmarkStreamVerify5x2x1M-32                           765149         737183         -3.65%
BenchmarkStreamVerify10x4x1M-32                          1617370        1584146        -2.05%
BenchmarkStreamVerify50x20x1M-32                         9435350        11808476       +25.15%
BenchmarkStreamVerify10x4x16M-32                         29359025       27062443       -7.82%
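The layout appears to be benchcmp-style output: "delta" is the relative change in ns/op, and "speedup" in the MB/s table is new throughput over old. With a fixed number of bytes per operation, speedup equals old ns/op divided by new ns/op. A sketch checking one row (delta and speedup are hypothetical helper names):

```go
package main

import "fmt"

// delta is the percentage change in time per op, as in the "delta" column.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup is new throughput over old throughput, as in the "speedup"
// column. With a fixed byte count per op it reduces to oldNs/newNs.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// BenchmarkEncode100x20x10000-32: 156083 -> 132442 ns/op.
	fmt.Printf("delta %.2f%%, speedup %.2fx\n", delta(156083, 132442), speedup(156083, 132442))
	// prints "delta -15.15%, speedup 1.18x"
}
```

Both values match that benchmark's rows in the two tables.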

benchmark                                                old MB/s      new MB/s      speedup
BenchmarkGalois128K-32                                   57379.29      58153.95      1.01x
BenchmarkGalois1M-32                                     47824.71      55066.66      1.15x
BenchmarkGaloisXor128K-32                                46639.10      47116.53      1.01x
BenchmarkGaloisXor1M-32                                  43287.93      46160.20      1.07x
BenchmarkEncode2x1x1M-32                                 80724.26      94994.70      1.18x
BenchmarkEncode800x200/64-32                             2206.35       2278.42       1.03x
BenchmarkEncode800x200/256-32                            3887.13       3953.86       1.02x
BenchmarkEncode800x200/1024-32                           4931.12       5021.94       1.02x
BenchmarkEncode800x200/4096-32                           5078.24       5185.38       1.02x
BenchmarkEncode800x200/16384-32                          4006.88       4442.00       1.11x
BenchmarkEncode800x200/65536-32                          2405.70       2718.79       1.13x
BenchmarkEncode800x200/262144-32                         2173.51       2306.62       1.06x
BenchmarkEncode800x200/1048576-32                        2323.13       2492.33       1.07x
BenchmarkEncode1K/4+4/cauchy-32                          24475.48      23772.24      0.97x
BenchmarkEncode1K/4+4/leopard-gf8-32                     12789.95      12961.00      1.01x
BenchmarkEncode1K/4+4/leopard-gf16-32                    18014.46      18797.28      1.04x
BenchmarkEncode1K/8+8/cauchy-32                          14909.02      15154.24      1.02x
BenchmarkEncode1K/8+8/leopard-gf8-32                     8946.12       9142.99       1.02x
BenchmarkEncode1K/8+8/leopard-gf16-32                    10190.76      10330.89      1.01x
BenchmarkEncode1K/16+16/cauchy-32                        7549.49       7494.83       0.99x
BenchmarkEncode1K/16+16/leopard-gf8-32                   9840.18       9990.30       1.02x
BenchmarkEncode1K/16+16/leopard-gf16-32                  12423.98      12533.28      1.01x
BenchmarkEncode1K/32+32/cauchy-32                        3797.58       3767.08       0.99x
BenchmarkEncode1K/32+32/leopard-gf8-32                   6654.34       6810.61       1.02x
BenchmarkEncode1K/32+32/leopard-gf16-32                  7361.49       7442.01       1.01x
BenchmarkEncode1K/64+64/cauchy-32                        1908.67       1916.98       1.00x
BenchmarkEncode1K/64+64/leopard-gf8-32                   7169.14       7285.10       1.02x
BenchmarkEncode1K/64+64/leopard-gf16-32                  8424.57       8433.80       1.00x
BenchmarkEncode1K/128+128/cauchy-32                      967.74        968.94        1.00x
BenchmarkEncode1K/128+128/leopard-gf8-32                 5285.04       5364.05       1.01x
BenchmarkEncode1K/128+128/leopard-gf16-32                5679.29       5731.83       1.01x
BenchmarkEncode1K/256+256/leopard-gf16-32                6221.66       6292.60       1.01x
BenchmarkEncode1K/512+512/leopard-gf16-32                4456.75       4524.11       1.02x
BenchmarkEncode1K/1024+1024/leopard-gf16-32              4807.28       4866.02       1.01x
BenchmarkEncode1K/2048+2048/leopard-gf16-32              3416.49       3784.32       1.11x
BenchmarkEncode1K/4096+4096/leopard-gf16-32              3260.03       3689.60       1.13x
BenchmarkEncode1K/8192+8192/leopard-gf16-32              2274.19       2603.83       1.14x
BenchmarkEncode1K/16384+16384/leopard-gf16-32            1673.88       1925.51       1.15x
BenchmarkEncode1K/32768+32768/leopard-gf16-32            1252.73       1336.23       1.07x
BenchmarkDecode1K/4+4/cauchy-32                          3792.93       3622.19       0.95x
BenchmarkDecode1K/4+4/cauchy-inv-32                      5983.86       5606.62       0.94x
BenchmarkDecode1K/4+4/cauchy-single-32                   6300.83       6280.54       1.00x
BenchmarkDecode1K/4+4/cauchy-single-inv-32               13551.36      12940.92      0.95x
BenchmarkDecode1K/4+4/leopard-gf8-32                     1915.14       1961.54       1.02x
BenchmarkDecode1K/4+4/leopard-gf8-inv-32                 3337.34       3516.48       1.05x
BenchmarkDecode1K/4+4/leopard-gf8-single-32              2105.18       2184.65       1.04x
BenchmarkDecode1K/4+4/leopard-gf8-single-inv-32          4150.58       4281.05       1.03x
BenchmarkDecode1K/4+4/leopard-gf16-32                    10.31         10.34         1.00x
BenchmarkDecode1K/4+4/leopard-gf16-single-32             10.34         10.33         1.00x
BenchmarkDecode1K/8+8/cauchy-32                          3009.11       2899.32       0.96x
BenchmarkDecode1K/8+8/cauchy-inv-32                      5611.04       5390.93       0.96x
BenchmarkDecode1K/8+8/cauchy-single-32                   7155.24       7171.21       1.00x
BenchmarkDecode1K/8+8/cauchy-single-inv-32               20782.54      20118.64      0.97x
BenchmarkDecode1K/8+8/leopard-gf8-32                     2253.61       2296.49       1.02x
BenchmarkDecode1K/8+8/leopard-gf8-inv-32                 3015.54       3177.67       1.05x
BenchmarkDecode1K/8+8/leopard-gf8-single-32              2656.40       2675.83       1.01x
BenchmarkDecode1K/8+8/leopard-gf8-single-inv-32          3808.73       3753.53       0.99x
BenchmarkDecode1K/8+8/leopard-gf16-32                    20.68         20.81         1.01x
BenchmarkDecode1K/8+8/leopard-gf16-single-32             20.47         20.66         1.01x
BenchmarkDecode1K/16+16/cauchy-32                        1524.69       1512.59       0.99x
BenchmarkDecode1K/16+16/cauchy-inv-32                    4198.25       4025.31       0.96x
BenchmarkDecode1K/16+16/cauchy-single-32                 6330.39       6419.31       1.01x
BenchmarkDecode1K/16+16/cauchy-single-inv-32             27068.69      27452.92      1.01x
BenchmarkDecode1K/16+16/leopard-gf8-32                   2040.15       2072.28       1.02x
BenchmarkDecode1K/16+16/leopard-gf8-inv-32               2223.54       2392.34       1.08x
BenchmarkDecode1K/16+16/leopard-gf8-single-32            2357.57       2446.68       1.04x
BenchmarkDecode1K/16+16/leopard-gf8-single-inv-32        2761.64       2852.71       1.03x
BenchmarkDecode1K/16+16/leopard-gf16-32                  40.66         41.22         1.01x
BenchmarkDecode1K/16+16/leopard-gf16-single-32           40.55         40.97         1.01x
BenchmarkDecode1K/32+32/cauchy-32                        557.06        559.72        1.00x
BenchmarkDecode1K/32+32/cauchy-inv-32                    2733.51       2679.96       0.98x
BenchmarkDecode1K/32+32/cauchy-single-32                 4599.08       4728.23       1.03x
BenchmarkDecode1K/32+32/cauchy-single-inv-32             32476.37      32874.63      1.01x
BenchmarkDecode1K/32+32/leopard-gf8-32                   2055.34       2035.06       0.99x
BenchmarkDecode1K/32+32/leopard-gf8-inv-32               2116.85       2165.02       1.02x
BenchmarkDecode1K/32+32/leopard-gf8-single-32            3162.09       2917.71       0.92x
BenchmarkDecode1K/32+32/leopard-gf8-single-inv-32        3163.38       2900.87       0.92x
BenchmarkDecode1K/32+32/leopard-gf16-32                  79.63         81.04         1.02x
BenchmarkDecode1K/32+32/leopard-gf16-single-32           80.24         80.96         1.01x
BenchmarkDecode1K/64+64/cauchy-32                        161.19        162.89        1.01x
BenchmarkDecode1K/64+64/cauchy-inv-32                    1618.58       1589.00       0.98x
BenchmarkDecode1K/64+64/cauchy-single-32                 2850.67       2940.14       1.03x
BenchmarkDecode1K/64+64/cauchy-single-inv-32             38190.74      39891.62      1.04x
BenchmarkDecode1K/64+64/leopard-gf8-32                   1673.05       1762.57       1.05x
BenchmarkDecode1K/64+64/leopard-gf8-inv-32               1786.44       2076.54       1.16x
BenchmarkDecode1K/64+64/leopard-gf8-single-32            2614.63       2981.72       1.14x
BenchmarkDecode1K/64+64/leopard-gf8-single-inv-32        2554.62       3180.75       1.25x
BenchmarkDecode1K/64+64/leopard-gf16-32                  151.70        154.88        1.02x
BenchmarkDecode1K/64+64/leopard-gf16-single-32           154.18        157.81        1.02x
BenchmarkDecode1K/128+128/cauchy-32                      44.21         44.43         1.00x
BenchmarkDecode1K/128+128/cauchy-inv-32                  862.07        855.85        0.99x
BenchmarkDecode1K/128+128/cauchy-single-32               1597.56       1628.49       1.02x
BenchmarkDecode1K/128+128/cauchy-single-inv-32           44812.17      46602.36      1.04x
BenchmarkDecode1K/128+128/leopard-gf8-32                 1654.65       1796.62       1.09x
BenchmarkDecode1K/128+128/leopard-gf8-inv-32             1721.13       1851.02       1.08x
BenchmarkDecode1K/128+128/leopard-gf8-single-32          2335.01       2783.74       1.19x
BenchmarkDecode1K/128+128/leopard-gf8-single-inv-32      2505.05       2722.86       1.09x
BenchmarkDecode1K/128+128/leopard-gf16-32                282.54        284.91        1.01x
BenchmarkDecode1K/128+128/leopard-gf16-single-32         293.55        295.88        1.01x
BenchmarkDecode1K/256+256/leopard-gf16-32                462.96        474.14        1.02x
BenchmarkDecode1K/256+256/leopard-gf16-single-32         515.05        522.54        1.01x
BenchmarkDecode1K/512+512/leopard-gf16-32                701.27        719.41        1.03x
BenchmarkDecode1K/512+512/leopard-gf16-single-32         821.71        845.65        1.03x
BenchmarkDecode1K/1024+1024/leopard-gf16-32              835.08        890.46        1.07x
BenchmarkDecode1K/1024+1024/leopard-gf16-single-32       1088.37       1174.14       1.08x
BenchmarkDecode1K/2048+2048/leopard-gf16-32              916.84        1035.28       1.13x
BenchmarkDecode1K/2048+2048/leopard-gf16-single-32       1231.99       1428.13       1.16x
BenchmarkDecode1K/4096+4096/leopard-gf16-32              845.83        894.18        1.06x
BenchmarkDecode1K/4096+4096/leopard-gf16-single-32       1127.52       1344.92       1.19x
BenchmarkDecode1K/8192+8192/leopard-gf16-32              617.40        758.29        1.23x
BenchmarkDecode1K/8192+8192/leopard-gf16-single-32       856.40        1055.93       1.23x
BenchmarkDecode1K/16384+16384/leopard-gf16-32            512.36        553.42        1.08x
BenchmarkDecode1K/16384+16384/leopard-gf16-single-32     780.06        829.41        1.06x
BenchmarkDecode1K/32768+32768/leopard-gf16-32            487.48        552.79        1.13x
BenchmarkDecode1K/32768+32768/leopard-gf16-single-32     748.81        794.75        1.06x
BenchmarkEncodeLeopard/83840-32                          2163.35       2448.01       1.13x
BenchmarkEncode10x2x10000-32                             38138.59      38660.50      1.01x
BenchmarkEncode100x20x10000-32                           7688.19       9060.59       1.18x
BenchmarkEncode17x3x1M-32                                78657.27      38162.16      0.49x
BenchmarkEncode10x4x16M-32                               27346.47      23214.15      0.85x
BenchmarkEncode5x2x1M-32                                 105293.15     108112.25     1.03x
BenchmarkEncode10x2x1M-32                                119077.13     117328.97     0.99x
BenchmarkEncode10x4x1M-32                                89996.44      82971.98      0.92x
BenchmarkEncode50x20x1M-32                               23537.96      5038.47       0.21x
BenchmarkEncodeLeopard50x20x1M-32                        6927.22       6163.82       0.89x
BenchmarkEncode17x3x16M-32                               32983.93      28065.68      0.85x
BenchmarkEncode_8x4x8M-32                                27000.57      29266.72      1.08x
BenchmarkEncode_12x4x12M-32                              29593.55      30983.18      1.05x
BenchmarkEncode_16x4x16M-32                              30833.05      30687.30      1.00x
BenchmarkEncode_16x4x32M-32                              30842.71      31313.10      1.02x
BenchmarkEncode_16x4x64M-32                              29655.62      31005.32      1.05x
BenchmarkEncode_8x5x8M-32                                24868.94      26669.44      1.07x
BenchmarkEncode_8x6x8M-32                                24333.50      26080.85      1.07x
BenchmarkEncode_8x7x8M-32                                23288.33      25324.21      1.09x
BenchmarkEncode_8x9x8M-32                                22626.99      23758.85      1.05x
BenchmarkEncode_8x10x8M-32                               21839.39      22747.68      1.04x
BenchmarkEncode_8x11x8M-32                               21754.38      22578.03      1.04x
BenchmarkEncode_8x8x05M-32                               58222.07      26935.36      0.46x
BenchmarkEncode_8x8x1M-32                                57582.31      61119.59      1.06x
BenchmarkEncode_8x8x8M-32                                23131.88      24926.77      1.08x
BenchmarkEncode_8x8x32M-32                               22220.71      22775.93      1.02x
BenchmarkEncode_24x8x24M-32                              21930.61      26163.18      1.19x
BenchmarkEncode_24x8x48M-32                              25406.13      25360.06      1.00x
BenchmarkVerify800x200/64-32                             1585.15       1677.20       1.06x
BenchmarkVerify800x200/256-32                            3055.95       3272.60       1.07x
BenchmarkVerify800x200/1024-32                           3768.14       4259.11       1.13x
BenchmarkVerify800x200/4096-32                           3782.60       4407.39       1.17x
BenchmarkVerify800x200/16384-32                          2861.47       3549.40       1.24x
BenchmarkVerify800x200/65536-32                          1989.88       2504.00       1.26x
BenchmarkVerify800x200/262144-32                         1688.49       2071.25       1.23x
BenchmarkVerify800x200/1048576-32                        1866.92       2138.37       1.15x
BenchmarkVerify10x2x10000-32                             20326.26      21818.19      1.07x
BenchmarkVerify50x5x100000-32                            26007.30      32132.24      1.24x
BenchmarkVerify10x2x1M-32                                24960.93      29570.24      1.18x
BenchmarkVerify5x2x1M-32                                 18540.92      21904.78      1.18x
BenchmarkVerify10x4x1M-32                                13471.44      15647.82      1.16x
BenchmarkVerify50x20x1M-32                               9303.86       8705.50       0.94x
BenchmarkVerify10x4x16M-32                               10115.75      13227.73      1.31x
BenchmarkReconstruct10x2x10000-32                        37249.17      38497.52      1.03x
BenchmarkReconstruct800x200/64-32                        54.27         58.14         1.07x
BenchmarkReconstruct800x200/256-32                       197.76        210.84        1.07x
BenchmarkReconstruct800x200/1024-32                      544.30        589.32        1.08x
BenchmarkReconstruct800x200/4096-32                      926.59        1061.05       1.15x
BenchmarkReconstruct800x200/16384-32                     602.98        718.27        1.19x
BenchmarkReconstruct800x200/65536-32                     547.09        603.17        1.10x
BenchmarkReconstruct800x200/262144-32                    493.43        518.38        1.05x
BenchmarkReconstruct800x200/1048576-32                   359.27        435.35        1.21x
BenchmarkReconstruct50x5x50000-32                        37134.76      40813.72      1.10x
BenchmarkReconstruct10x2x1M-32                           67919.63      67434.39      0.99x
BenchmarkReconstruct5x2x1M-32                            56827.25      57840.28      1.02x
BenchmarkReconstruct10x4x1M-32                           53012.91      54704.77      1.03x
BenchmarkReconstruct50x20x1M-32                          20725.22      11977.81      0.58x
BenchmarkReconstructLeopard50x20x1M-32                   2427.13       2530.21       1.04x
BenchmarkReconstruct10x4x16M-32                          24887.19      26440.25      1.06x
BenchmarkReconstructData10x2x10000-32                    39218.24      40158.36      1.02x
BenchmarkReconstructData800x200/64-32                    55.77         58.65         1.05x
BenchmarkReconstructData800x200/256-32                   202.61        211.06        1.04x
BenchmarkReconstructData800x200/1024-32                  556.75        593.56        1.07x
BenchmarkReconstructData800x200/4096-32                  938.82        1063.90       1.13x
BenchmarkReconstructData800x200/16384-32                 626.09        722.66        1.15x
BenchmarkReconstructData800x200/65536-32                 543.74        611.74        1.13x
BenchmarkReconstructData800x200/262144-32                465.15        551.04        1.18x
BenchmarkReconstructData800x200/1048576-32               400.92        431.63        1.08x
BenchmarkReconstructData50x5x50000-32                    38400.98      42485.55      1.11x
BenchmarkReconstructData10x2x1M-32                       73066.67      52953.80      0.72x
BenchmarkReconstructData5x2x1M-32                        65271.49      73020.37      1.12x
BenchmarkReconstructData10x4x1M-32                       64273.61      48146.03      0.75x
BenchmarkReconstructData50x20x1M-32                      31718.30      17925.27      0.57x
BenchmarkReconstructData10x4x16M-32                      32179.90      35221.52      1.09x
BenchmarkReconstructP10x2x10000-32                       145267.14     144760.51     1.00x
BenchmarkReconstructP10x5x20000-32                       203564.07     190537.34     0.94x
BenchmarkParallel_8x8x64K-32                             172764.77     111229.29     0.64x
BenchmarkParallel_8x8x05M-32                             23086.39      23602.81      1.02x
BenchmarkParallel_20x10x05M-32                           26332.73      26416.51      1.00x
BenchmarkParallel_8x8x1M-32                              23482.07      23647.57      1.01x
BenchmarkParallel_8x8x8M-32                              23587.55      23329.00      0.99x
BenchmarkParallel_8x8x32M-32                             23571.23      23551.34      1.00x
BenchmarkParallel_8x3x1M-32                              28482.50      28411.98      1.00x
BenchmarkParallel_8x4x1M-32                              26990.67      26935.25      1.00x
BenchmarkParallel_8x5x1M-32                              25777.97      25740.90      1.00x
BenchmarkStreamEncode10x2x10000-32                       17811.48      17925.46      1.01x
BenchmarkStreamEncode100x20x10000-32                     3700.48       3934.28       1.06x
BenchmarkStreamEncode17x3x1M-32                          11744.12      12104.29      1.03x
BenchmarkStreamEncode10x4x16M-32                         8709.65       9171.17       1.05x
BenchmarkStreamEncode5x2x1M-32                           12556.48      12980.44      1.03x
BenchmarkStreamEncode10x2x1M-32                          12735.21      12759.79      1.00x
BenchmarkStreamEncode10x4x1M-32                          11852.03      12235.81      1.03x
BenchmarkStreamEncode50x20x1M-32                         8000.61       4188.19       0.52x
BenchmarkStreamEncode17x3x16M-32                         9978.17       10440.42      1.05x
BenchmarkStreamVerify10x2x10000-32                       12114.06      12367.66      1.02x
BenchmarkStreamVerify50x5x50000-32                       7576.44       7856.47       1.04x
BenchmarkStreamVerify10x2x1M-32                          8462.79       8772.25       1.04x
BenchmarkStreamVerify5x2x1M-32                           6852.10       7112.05       1.04x
BenchmarkStreamVerify10x4x1M-32                          6483.22       6619.19       1.02x
BenchmarkStreamVerify50x20x1M-32                         5556.64       4439.93       0.80x
BenchmarkStreamVerify10x4x16M-32                         5714.50       6199.45       1.08x 

On AMD64, aligned inputs can make a significant difference in speed.

This is an example of the speed difference between unaligned (first line) and aligned (second line) inputs:

```
BenchmarkEncode100x20x10000-32    	    7058	    172648 ns/op	6950.57 MB/s
BenchmarkEncode100x20x10000-32    	    8406	    137911 ns/op	8701.24 MB/s
```
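The two lines above work out as follows. Time per operation and throughput change by the same factor, about 1.25. In this run, aligned inputs therefore gave roughly 25% more throughput, which is about 20% less time per operation.

```math
\frac{172648\ \text{ns/op}}{137911\ \text{ns/op}} \approx 1.252,
\qquad
\frac{8701.24\ \text{MB/s}}{6950.57\ \text{MB/s}} \approx 1.252,
\qquad
1 - \frac{137911}{172648} \approx 0.201
```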

To facilitate this, the package provides `AllocAligned(shards, each int) [][]byte`.
It allocates `shards` slices of `each` bytes and aligns each slice to a 64-byte boundary.

Each encoder also provides `AllocAligned(each int) [][]byte` through an extended interface.
It returns the same result, but uses the shard count configured in the encoder.
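In Go, a method exposed through an extended interface is usually reached with a type assertion. The sketch below takes only the `AllocAligned(each int) [][]byte` signature from the text. `aligner`, `demoEncoder`, `plainAlloc` and `allocShards` are hypothetical names, and `demoEncoder` does not really align memory. Check the package documentation for the actual name of the interface that carries the method.

```go
package main

import "fmt"

// aligner matches the extended AllocAligned(each int) [][]byte method
// described above. The interface name is hypothetical.
type aligner interface {
	AllocAligned(each int) [][]byte
}

// demoEncoder is a hypothetical encoder configured with a total shard
// count (data + parity). For brevity it does not actually align memory.
type demoEncoder struct{ totalShards int }

func (e demoEncoder) AllocAligned(each int) [][]byte {
	return plainAlloc(e.totalShards, each)
}

// plainAlloc allocates total independent shards of each bytes.
func plainAlloc(total, each int) [][]byte {
	out := make([][]byte, total)
	for i := range out {
		out[i] = make([]byte, each)
	}
	return out
}

// allocShards uses the encoder's AllocAligned when the extended method is
// available, letting the encoder decide the shard count, and falls back to
// plain allocations of total shards otherwise.
func allocShards(enc any, total, each int) [][]byte {
	if a, ok := enc.(aligner); ok {
		return a.AllocAligned(each)
	}
	return plainAlloc(total, each)
}

func main() {
	shards := allocShards(demoEncoder{totalShards: 14}, 14, 4096) // 10 data + 4 parity
	fmt.Println(len(shards), len(shards[0]))

	fallback := allocShards(struct{}{}, 6, 512) // no AllocAligned method
	fmt.Println(len(fallback), len(fallback[0]))
}
```

The fallback keeps calling code working with any encoder value that lacks the extended method.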

Slices that are already allocated cannot be re-aligned, for example when using `Split`.
If you cannot write directly into aligned shards, do not copy your data into them; the copy is most likely much slower than using the unaligned shards as-is.
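The sketch below shows why slices cut from one existing buffer cannot be re-aligned afterwards. Each shard's address is fixed by where the buffer starts plus a multiple of the shard size. `splitContiguous` is a hypothetical, simplified `Split`-style helper, not the package's `Split`. `isAligned` is also an illustrative name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned reports whether the first byte of b sits on an n-byte boundary.
func isAligned(b []byte, n uintptr) bool {
	return len(b) > 0 && uintptr(unsafe.Pointer(&b[0]))%n == 0
}

// splitContiguous is a hypothetical, simplified Split-like helper: it cuts
// one contiguous buffer into consecutive shards of perShard bytes without
// copying, so each shard's start address is fixed by the original buffer.
func splitContiguous(data []byte, shards, perShard int) [][]byte {
	out := make([][]byte, shards)
	for i := range out {
		out[i] = data[i*perShard : (i+1)*perShard]
	}
	return out
}

func main() {
	data := make([]byte, 10*1000)
	for i, s := range splitContiguous(data, 10, 1000) {
		// 1000 is not a multiple of 64, so shards 0 and 1 can never both be aligned.
		fmt.Println(i, isAligned(s, 64))
	}
}
```

The text notes that copying such shards into aligned ones is most likely slower than encoding them unaligned. A check like `isAligned` is therefore only useful for reporting, not as a reason to copy.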
klauspost merged commit e4bf561 into master on Jan 12, 2023.
klauspost deleted the align-allocations branch on January 12, 2023 at 09:07.