Align allocations #230

klauspost · 2023-01-08T19:27:47Z

On AMD64, aligned inputs can make a big speed difference.

This is an example of the speed difference between unaligned inputs (first line) and aligned inputs (second line):

BenchmarkEncode100x20x10000-32    	    7058	    172648 ns/op	6950.57 MB/s
BenchmarkEncode100x20x10000-32    	    8406	    137911 ns/op	8701.24 MB/s

This is mostly the case when dealing with odd-sized shards.

To facilitate this, the package provides AllocAligned(shards, each int) [][]byte.
It allocates the given number of shards, each of size each bytes.
Every shard is aligned to a 64-byte boundary.

Each encoder also has an AllocAligned(each int) [][]byte as an extended interface, which returns the same,
but with the shard count configured in the encoder.

It is not possible to re-align already allocated slices, for example when using Split.
If you cannot write directly into aligned shards, do not copy data into them just to gain alignment.

Full (but rather noisy) benchmark:

benchmark                                                old ns/op      new ns/op      delta
BenchmarkGalois128K-32                                   2284           2254           -1.31%
BenchmarkGalois1M-32                                     21925          19042          -13.15%
BenchmarkGaloisXor128K-32                                2810           2782           -1.00%
BenchmarkGaloisXor1M-32                                  24223          22716          -6.22%
BenchmarkEncode2x1x1M-32                                 38969          33115          -15.02%
BenchmarkEncode800x200/64-32                             29007          28090          -3.16%
BenchmarkEncode800x200/256-32                            65858          64747          -1.69%
BenchmarkEncode800x200/1024-32                           207661         203905         -1.81%
BenchmarkEncode800x200/4096-32                           806579         789913         -2.07%
BenchmarkEncode800x200/16384-32                          4088967        3688426        -9.80%
BenchmarkEncode800x200/65536-32                          27241951       24104804       -11.52%
BenchmarkEncode800x200/262144-32                         120608789      113648633      -5.77%
BenchmarkEncode800x200/1048576-32                        451364367      420720500      -6.79%
BenchmarkEncode1K/4+4/cauchy-32                          335            345            +2.96%
BenchmarkEncode1K/4+4/leopard-gf8-32                     640            632            -1.33%
BenchmarkEncode1K/4+4/leopard-gf16-32                    455            436            -4.16%
BenchmarkEncode1K/8+8/cauchy-32                          1099           1081           -1.64%
BenchmarkEncode1K/8+8/leopard-gf8-32                     1831           1792           -2.13%
BenchmarkEncode1K/8+8/leopard-gf16-32                    1608           1586           -1.37%
BenchmarkEncode1K/16+16/cauchy-32                        4340           4372           +0.74%
BenchmarkEncode1K/16+16/leopard-gf8-32                   3330           3280           -1.50%
BenchmarkEncode1K/16+16/leopard-gf16-32                  2637           2614           -0.87%
BenchmarkEncode1K/32+32/cauchy-32                        17257          17397          +0.81%
BenchmarkEncode1K/32+32/leopard-gf8-32                   9849           9623           -2.29%
BenchmarkEncode1K/32+32/leopard-gf16-32                  8903           8806           -1.09%
BenchmarkEncode1K/64+64/cauchy-32                        68672          68374          -0.43%
BenchmarkEncode1K/64+64/leopard-gf8-32                   18283          17992          -1.59%
BenchmarkEncode1K/64+64/leopard-gf16-32                  15558          15541          -0.11%
BenchmarkEncode1K/128+128/cauchy-32                      270881         270547         -0.12%
BenchmarkEncode1K/128+128/leopard-gf8-32                 49601          48871          -1.47%
BenchmarkEncode1K/128+128/leopard-gf16-32                46158          45735          -0.92%
BenchmarkEncode1K/256+256/leopard-gf16-32                84268          83318          -1.13%
BenchmarkEncode1K/512+512/leopard-gf16-32                235278         231775         -1.49%
BenchmarkEncode1K/1024+1024/leopard-gf16-32              436245         430979         -1.21%
BenchmarkEncode1K/2048+2048/leopard-gf16-32              1227665        1108337        -9.72%
BenchmarkEncode1K/4096+4096/leopard-gf16-32              2573166        2273580        -11.64%
BenchmarkEncode1K/8192+8192/leopard-gf16-32              7377235        6443287        -12.66%
BenchmarkEncode1K/16384+16384/leopard-gf16-32            20045895       17426286       -13.07%
BenchmarkEncode1K/32768+32768/leopard-gf16-32            53570005       50222577       -6.25%
BenchmarkDecode1K/4+4/cauchy-32                          2160           2262           +4.72%
BenchmarkDecode1K/4+4/cauchy-inv-32                      1369           1461           +6.72%
BenchmarkDecode1K/4+4/cauchy-single-32                   1300           1304           +0.31%
BenchmarkDecode1K/4+4/cauchy-single-inv-32               604            633            +4.71%
BenchmarkDecode1K/4+4/leopard-gf8-32                     4277           4176           -2.36%
BenchmarkDecode1K/4+4/leopard-gf8-inv-32                 2455           2330           -5.09%
BenchmarkDecode1K/4+4/leopard-gf8-single-32              3891           3750           -3.62%
BenchmarkDecode1K/4+4/leopard-gf8-single-inv-32          1974           1914           -3.04%
BenchmarkDecode1K/4+4/leopard-gf16-32                    794838         792366         -0.31%
BenchmarkDecode1K/4+4/leopard-gf16-single-32             791991         793335         +0.17%
BenchmarkDecode1K/8+8/cauchy-32                          5445           5651           +3.78%
BenchmarkDecode1K/8+8/cauchy-inv-32                      2920           3039           +4.08%
BenchmarkDecode1K/8+8/cauchy-single-32                   2290           2285           -0.22%
BenchmarkDecode1K/8+8/cauchy-single-inv-32               788            814            +3.30%
BenchmarkDecode1K/8+8/leopard-gf8-32                     7270           7134           -1.87%
BenchmarkDecode1K/8+8/leopard-gf8-inv-32                 5433           5156           -5.10%
BenchmarkDecode1K/8+8/leopard-gf8-single-32              6168           6123           -0.73%
BenchmarkDecode1K/8+8/leopard-gf8-single-inv-32          4302           4365           +1.46%
BenchmarkDecode1K/8+8/leopard-gf16-32                    792261         787290         -0.63%
BenchmarkDecode1K/8+8/leopard-gf16-single-32             800258         793031         -0.90%
BenchmarkDecode1K/16+16/cauchy-32                        21492          21664          +0.80%
BenchmarkDecode1K/16+16/cauchy-inv-32                    7805           8140           +4.29%
BenchmarkDecode1K/16+16/cauchy-single-32                 5176           5105           -1.37%
BenchmarkDecode1K/16+16/cauchy-single-inv-32             1211           1194           -1.40%
BenchmarkDecode1K/16+16/leopard-gf8-32                   16062          15813          -1.55%
BenchmarkDecode1K/16+16/leopard-gf8-inv-32               14737          13697          -7.06%
BenchmarkDecode1K/16+16/leopard-gf8-single-32            13899          13393          -3.64%
BenchmarkDecode1K/16+16/leopard-gf8-single-inv-32        11865          11487          -3.19%
BenchmarkDecode1K/16+16/leopard-gf16-32                  805833         794984         -1.35%
BenchmarkDecode1K/16+16/leopard-gf16-single-32           808125         799816         -1.03%
BenchmarkDecode1K/32+32/cauchy-32                        117646         117088         -0.47%
BenchmarkDecode1K/32+32/cauchy-inv-32                    23975          24454          +2.00%
BenchmarkDecode1K/32+32/cauchy-single-32                 14250          13861          -2.73%
BenchmarkDecode1K/32+32/cauchy-single-inv-32             2018           1994           -1.19%
BenchmarkDecode1K/32+32/leopard-gf8-32                   31886          32203          +0.99%
BenchmarkDecode1K/32+32/leopard-gf8-inv-32               30959          30270          -2.23%
BenchmarkDecode1K/32+32/leopard-gf8-single-32            20726          22461          +8.37%
BenchmarkDecode1K/32+32/leopard-gf8-single-inv-32        20717          22592          +9.05%
BenchmarkDecode1K/32+32/leopard-gf16-32                  823015         808697         -1.74%
BenchmarkDecode1K/32+32/leopard-gf16-single-32           816765         809475         -0.89%
BenchmarkDecode1K/64+64/cauchy-32                        813174         804646         -1.05%
BenchmarkDecode1K/64+64/cauchy-inv-32                    80979          82487          +1.86%
BenchmarkDecode1K/64+64/cauchy-single-32                 45979          44580          -3.04%
BenchmarkDecode1K/64+64/cauchy-single-inv-32             3432           3286           -4.25%
BenchmarkDecode1K/64+64/leopard-gf8-32                   78343          74364          -5.08%
BenchmarkDecode1K/64+64/leopard-gf8-inv-32               73370          63120          -13.97%
BenchmarkDecode1K/64+64/leopard-gf8-single-32            50130          43959          -12.31%
BenchmarkDecode1K/64+64/leopard-gf8-single-inv-32        51308          41208          -19.69%
BenchmarkDecode1K/64+64/leopard-gf16-32                  864012         846280         -2.05%
BenchmarkDecode1K/64+64/leopard-gf16-single-32           850149         830551         -2.31%
BenchmarkDecode1K/128+128/cauchy-32                      5929095        5900026        -0.49%
BenchmarkDecode1K/128+128/cauchy-inv-32                  304087         306298         +0.73%
BenchmarkDecode1K/128+128/cauchy-single-32               164090         160974         -1.90%
BenchmarkDecode1K/128+128/cauchy-single-inv-32           5850           5625           -3.85%
BenchmarkDecode1K/128+128/leopard-gf8-32                 158429         145910         -7.90%
BenchmarkDecode1K/128+128/leopard-gf8-inv-32             152309         141621         -7.02%
BenchmarkDecode1K/128+128/leopard-gf8-single-32          112267         94170          -16.12%
BenchmarkDecode1K/128+128/leopard-gf8-single-inv-32      104646         96275          -8.00%
BenchmarkDecode1K/128+128/leopard-gf16-32                927823         920083         -0.83%
BenchmarkDecode1K/128+128/leopard-gf16-single-32         893019         885971         -0.79%
BenchmarkDecode1K/256+256/leopard-gf16-32                1132479        1105774        -2.36%
BenchmarkDecode1K/256+256/leopard-gf16-single-32         1017945        1003342        -1.43%
BenchmarkDecode1K/512+512/leopard-gf16-32                1495247        1457558        -2.52%
BenchmarkDecode1K/512+512/leopard-gf16-single-32         1276089        1239965        -2.83%
BenchmarkDecode1K/1024+1024/leopard-gf16-32              2511310        2355124        -6.22%
BenchmarkDecode1K/1024+1024/leopard-gf16-single-32       1926875        1786114        -7.31%
BenchmarkDecode1K/2048+2048/leopard-gf16-32              4574758        4051357        -11.44%
BenchmarkDecode1K/2048+2048/leopard-gf16-single-32       3404487        2936912        -13.73%
BenchmarkDecode1K/4096+4096/leopard-gf16-32              9917650        9381317        -5.41%
BenchmarkDecode1K/4096+4096/leopard-gf16-single-32       7439868        6237255        -16.16%
BenchmarkDecode1K/8192+8192/leopard-gf16-32              27173871       22125130       -18.58%
BenchmarkDecode1K/8192+8192/leopard-gf16-single-32       19590423       15888578       -18.90%
BenchmarkDecode1K/16384+16384/leopard-gf16-32            65490106       60630937       -7.42%
BenchmarkDecode1K/16384+16384/leopard-gf16-single-32     43015162       40455732       -5.95%
BenchmarkDecode1K/32768+32768/leopard-gf16-32            137665712      121400489      -11.82%
BenchmarkDecode1K/32768+32768/leopard-gf16-single-32     89620746       84439785       -5.78%
BenchmarkEncodeLeopard/83840-32                          38754790       34248175       -11.63%
BenchmarkEncode10x2x10000-32                             3146           3104           -1.34%
BenchmarkEncode100x20x10000-32                           156083         132442         -15.15%
BenchmarkEncode17x3x1M-32                                266619         549537         +106.11%
BenchmarkEncode10x4x16M-32                               8589080        10118010       +17.80%
BenchmarkEncode5x2x1M-32                                 69710          67893          -2.61%
BenchmarkEncode10x2x1M-32                                105670         107245         +1.49%
BenchmarkEncode10x4x1M-32                                163118         176928         +8.47%
BenchmarkEncode50x20x1M-32                               3118381        14567976       +367.16%
BenchmarkEncodeLeopard50x20x1M-32                        10595922       11908254       +12.39%
BenchmarkEncode17x3x16M-32                               10172963       11955679       +17.52%
BenchmarkEncode_8x4x8M-32                                3728191        3439514        -7.74%
BenchmarkEncode_12x4x12M-32                              6803056        6497932        -4.49%
BenchmarkEncode_16x4x16M-32                              10882620       10934306       +0.47%
BenchmarkEncode_16x4x32M-32                              21758418       21431559       -1.50%
BenchmarkEncode_16x4x64M-32                              45258777       43288619       -4.35%
BenchmarkEncode_8x5x8M-32                                4385065        4089020        -6.75%
BenchmarkEncode_8x6x8M-32                                4826290        4502941        -6.70%
BenchmarkEncode_8x7x8M-32                                5403098        4968727        -8.04%
BenchmarkEncode_8x9x8M-32                                6302487        6002242        -4.76%
BenchmarkEncode_8x10x8M-32                               6913882        6637816        -3.99%
BenchmarkEncode_8x11x8M-32                               7326503        7059232        -3.65%
BenchmarkEncode_8x8x05M-32                               144080         311435         +116.15%
BenchmarkEncode_8x8x1M-32                                291361         274498         -5.79%
BenchmarkEncode_8x8x8M-32                                5802282        5384482        -7.20%
BenchmarkEncode_8x8x32M-32                               24160839       23571850       -2.44%
BenchmarkEncode_24x8x24M-32                              36720647       30780145       -16.18%
BenchmarkEncode_24x8x48M-32                              63394650       63509811       +0.18%
BenchmarkVerify800x200/64-32                             40375          38159          -5.49%
BenchmarkVerify800x200/256-32                            83771          78225          -6.62%
BenchmarkVerify800x200/1024-32                           271752         240426         -11.53%
BenchmarkVerify800x200/4096-32                           1082853        929348         -14.18%
BenchmarkVerify800x200/16384-32                          5725732        4615986        -19.38%
BenchmarkVerify800x200/65536-32                          32934571       26172560       -20.53%
BenchmarkVerify800x200/262144-32                         155253386      126563067      -18.48%
BenchmarkVerify800x200/1048576-32                        561659750      490361633      -12.69%
BenchmarkVerify10x2x10000-32                             5904           5500           -6.84%
BenchmarkVerify50x5x100000-32                            211479         171168         -19.06%
BenchmarkVerify10x2x1M-32                                504104         425526         -15.59%
BenchmarkVerify5x2x1M-32                                 395883         335088         -15.36%
BenchmarkVerify10x4x1M-32                                1089718        938154         -13.91%
BenchmarkVerify50x20x1M-32                               7889230        8431484        +6.87%
BenchmarkVerify10x4x16M-32                               23219342       17756713       -23.53%
BenchmarkReconstruct10x2x10000-32                        3222           3117           -3.26%
BenchmarkReconstruct800x200/64-32                        1179363        1100846        -6.66%
BenchmarkReconstruct800x200/256-32                       1294499        1214204        -6.20%
BenchmarkReconstruct800x200/1024-32                      1881303        1737606        -7.64%
BenchmarkReconstruct800x200/4096-32                      4420506        3860332        -12.67%
BenchmarkReconstruct800x200/16384-32                     27171535       22810506       -16.05%
BenchmarkReconstruct800x200/65536-32                     119789411      108652470      -9.30%
BenchmarkReconstruct800x200/262144-32                    531271300      505694550      -4.81%
BenchmarkReconstruct800x200/1048576-32                   2918642700     2408563700     -17.48%
BenchmarkReconstruct50x5x50000-32                        148109         134759         -9.01%
BenchmarkReconstruct10x2x1M-32                           185262         186595         +0.72%
BenchmarkReconstruct5x2x1M-32                            129164         126902         -1.75%
BenchmarkReconstruct10x4x1M-32                           276915         268351         -3.09%
BenchmarkReconstruct50x20x1M-32                          3541594        6128024        +73.03%
BenchmarkReconstructLeopard50x20x1M-32                   30241625       29009541       -4.07%
BenchmarkReconstruct10x4x16M-32                          9437829        8883464        -5.87%
BenchmarkReconstructData10x2x10000-32                    3060           2988           -2.35%
BenchmarkReconstructData800x200/64-32                    1147537        1091256        -4.90%
BenchmarkReconstructData800x200/256-32                   1263484        1212899        -4.00%
BenchmarkReconstructData800x200/1024-32                  1839242        1725180        -6.20%
BenchmarkReconstructData800x200/4096-32                  4362936        3849980        -11.76%
BenchmarkReconstructData800x200/16384-32                 26168836       22671938       -13.36%
BenchmarkReconstructData800x200/65536-32                 120528889      107130140      -11.12%
BenchmarkReconstructData800x200/262144-32                563569000      475725700      -15.59%
BenchmarkReconstructData800x200/1048576-32               2615405500     2429356800     -7.11%
BenchmarkReconstructData50x5x50000-32                    143226         129456         -9.61%
BenchmarkReconstructData10x2x1M-32                       172211         237621         +37.98%
BenchmarkReconstructData5x2x1M-32                        112454         100520         -10.61%
BenchmarkReconstructData10x4x1M-32                       228400         304907         +33.50%
BenchmarkReconstructData50x20x1M-32                      2314132        4094795        +76.95%
BenchmarkReconstructData10x4x16M-32                      7298997        6668680        -8.64%
BenchmarkReconstructP10x2x10000-32                       826            829            +0.35%
BenchmarkReconstructP10x5x20000-32                       1474           1574           +6.78%
BenchmarkSplit10x4x160M-32                               5724818        5133214        -10.33%
BenchmarkSplit5x2x5M-32                                  185120         158360         -14.46%
BenchmarkSplit10x2x1M-32                                 30804          26726          -13.24%
BenchmarkSplit10x4x10M-32                                362056         329514         -8.99%
BenchmarkSplit50x20x50M-32                               1822737        1658791        -8.99%
BenchmarkSplit17x3x272M-32                               4211272        3692685        -12.31%
BenchmarkParallel_8x8x64K-32                             6069           9427           +55.33%
BenchmarkParallel_8x8x05M-32                             363357         355407         -2.19%
BenchmarkParallel_20x10x05M-32                           597304         595410         -0.32%
BenchmarkParallel_8x8x1M-32                              714469         709469         -0.70%
BenchmarkParallel_8x8x8M-32                              5690194        5753258        +1.11%
BenchmarkParallel_8x8x32M-32                             22776537       22795771       +0.08%
BenchmarkParallel_8x3x1M-32                              404962         405967         +0.25%
BenchmarkParallel_8x4x1M-32                              466195         467154         +0.21%
BenchmarkParallel_8x5x1M-32                              528804         529565         +0.14%
BenchmarkStreamEncode10x2x10000-32                       5614           5579           -0.62%
BenchmarkStreamEncode100x20x10000-32                     270235         254176         -5.94%
BenchmarkStreamEncode17x3x1M-32                          1517849        1472684        -2.98%
BenchmarkStreamEncode10x4x16M-32                         19262797       18293434       -5.03%
BenchmarkStreamEncode5x2x1M-32                           417544         403906         -3.27%
BenchmarkStreamEncode10x2x1M-32                          823367         821781         -0.19%
BenchmarkStreamEncode10x4x1M-32                          884722         856973         -3.14%
BenchmarkStreamEncode50x20x1M-32                         6553097        12518249       +91.03%
BenchmarkStreamEncode17x3x16M-32                         28583679       27318121       -4.43%
BenchmarkStreamVerify10x2x10000-32                       8255           8086           -2.05%
BenchmarkStreamVerify50x5x50000-32                       659940         636418         -3.56%
BenchmarkStreamVerify10x2x1M-32                          1239043        1195332        -3.53%
BenchmarkStreamVerify5x2x1M-32                           765149         737183         -3.65%
BenchmarkStreamVerify10x4x1M-32                          1617370        1584146        -2.05%
BenchmarkStreamVerify50x20x1M-32                         9435350        11808476       +25.15%
BenchmarkStreamVerify10x4x16M-32                         29359025       27062443       -7.82%

benchmark                                                old MB/s      new MB/s      speedup
BenchmarkGalois128K-32                                   57379.29      58153.95      1.01x
BenchmarkGalois1M-32                                     47824.71      55066.66      1.15x
BenchmarkGaloisXor128K-32                                46639.10      47116.53      1.01x
BenchmarkGaloisXor1M-32                                  43287.93      46160.20      1.07x
BenchmarkEncode2x1x1M-32                                 80724.26      94994.70      1.18x
BenchmarkEncode800x200/64-32                             2206.35       2278.42       1.03x
BenchmarkEncode800x200/256-32                            3887.13       3953.86       1.02x
BenchmarkEncode800x200/1024-32                           4931.12       5021.94       1.02x
BenchmarkEncode800x200/4096-32                           5078.24       5185.38       1.02x
BenchmarkEncode800x200/16384-32                          4006.88       4442.00       1.11x
BenchmarkEncode800x200/65536-32                          2405.70       2718.79       1.13x
BenchmarkEncode800x200/262144-32                         2173.51       2306.62       1.06x
BenchmarkEncode800x200/1048576-32                        2323.13       2492.33       1.07x
BenchmarkEncode1K/4+4/cauchy-32                          24475.48      23772.24      0.97x
BenchmarkEncode1K/4+4/leopard-gf8-32                     12789.95      12961.00      1.01x
BenchmarkEncode1K/4+4/leopard-gf16-32                    18014.46      18797.28      1.04x
BenchmarkEncode1K/8+8/cauchy-32                          14909.02      15154.24      1.02x
BenchmarkEncode1K/8+8/leopard-gf8-32                     8946.12       9142.99       1.02x
BenchmarkEncode1K/8+8/leopard-gf16-32                    10190.76      10330.89      1.01x
BenchmarkEncode1K/16+16/cauchy-32                        7549.49       7494.83       0.99x
BenchmarkEncode1K/16+16/leopard-gf8-32                   9840.18       9990.30       1.02x
BenchmarkEncode1K/16+16/leopard-gf16-32                  12423.98      12533.28      1.01x
BenchmarkEncode1K/32+32/cauchy-32                        3797.58       3767.08       0.99x
BenchmarkEncode1K/32+32/leopard-gf8-32                   6654.34       6810.61       1.02x
BenchmarkEncode1K/32+32/leopard-gf16-32                  7361.49       7442.01       1.01x
BenchmarkEncode1K/64+64/cauchy-32                        1908.67       1916.98       1.00x
BenchmarkEncode1K/64+64/leopard-gf8-32                   7169.14       7285.10       1.02x
BenchmarkEncode1K/64+64/leopard-gf16-32                  8424.57       8433.80       1.00x
BenchmarkEncode1K/128+128/cauchy-32                      967.74        968.94        1.00x
BenchmarkEncode1K/128+128/leopard-gf8-32                 5285.04       5364.05       1.01x
BenchmarkEncode1K/128+128/leopard-gf16-32                5679.29       5731.83       1.01x
BenchmarkEncode1K/256+256/leopard-gf16-32                6221.66       6292.60       1.01x
BenchmarkEncode1K/512+512/leopard-gf16-32                4456.75       4524.11       1.02x
BenchmarkEncode1K/1024+1024/leopard-gf16-32              4807.28       4866.02       1.01x
BenchmarkEncode1K/2048+2048/leopard-gf16-32              3416.49       3784.32       1.11x
BenchmarkEncode1K/4096+4096/leopard-gf16-32              3260.03       3689.60       1.13x
BenchmarkEncode1K/8192+8192/leopard-gf16-32              2274.19       2603.83       1.14x
BenchmarkEncode1K/16384+16384/leopard-gf16-32            1673.88       1925.51       1.15x
BenchmarkEncode1K/32768+32768/leopard-gf16-32            1252.73       1336.23       1.07x
BenchmarkDecode1K/4+4/cauchy-32                          3792.93       3622.19       0.95x
BenchmarkDecode1K/4+4/cauchy-inv-32                      5983.86       5606.62       0.94x
BenchmarkDecode1K/4+4/cauchy-single-32                   6300.83       6280.54       1.00x
BenchmarkDecode1K/4+4/cauchy-single-inv-32               13551.36      12940.92      0.95x
BenchmarkDecode1K/4+4/leopard-gf8-32                     1915.14       1961.54       1.02x
BenchmarkDecode1K/4+4/leopard-gf8-inv-32                 3337.34       3516.48       1.05x
BenchmarkDecode1K/4+4/leopard-gf8-single-32              2105.18       2184.65       1.04x
BenchmarkDecode1K/4+4/leopard-gf8-single-inv-32          4150.58       4281.05       1.03x
BenchmarkDecode1K/4+4/leopard-gf16-32                    10.31         10.34         1.00x
BenchmarkDecode1K/4+4/leopard-gf16-single-32             10.34         10.33         1.00x
BenchmarkDecode1K/8+8/cauchy-32                          3009.11       2899.32       0.96x
BenchmarkDecode1K/8+8/cauchy-inv-32                      5611.04       5390.93       0.96x
BenchmarkDecode1K/8+8/cauchy-single-32                   7155.24       7171.21       1.00x
BenchmarkDecode1K/8+8/cauchy-single-inv-32               20782.54      20118.64      0.97x
BenchmarkDecode1K/8+8/leopard-gf8-32                     2253.61       2296.49       1.02x
BenchmarkDecode1K/8+8/leopard-gf8-inv-32                 3015.54       3177.67       1.05x
BenchmarkDecode1K/8+8/leopard-gf8-single-32              2656.40       2675.83       1.01x
BenchmarkDecode1K/8+8/leopard-gf8-single-inv-32          3808.73       3753.53       0.99x
BenchmarkDecode1K/8+8/leopard-gf16-32                    20.68         20.81         1.01x
BenchmarkDecode1K/8+8/leopard-gf16-single-32             20.47         20.66         1.01x
BenchmarkDecode1K/16+16/cauchy-32                        1524.69       1512.59       0.99x
BenchmarkDecode1K/16+16/cauchy-inv-32                    4198.25       4025.31       0.96x
BenchmarkDecode1K/16+16/cauchy-single-32                 6330.39       6419.31       1.01x
BenchmarkDecode1K/16+16/cauchy-single-inv-32             27068.69      27452.92      1.01x
BenchmarkDecode1K/16+16/leopard-gf8-32                   2040.15       2072.28       1.02x
BenchmarkDecode1K/16+16/leopard-gf8-inv-32               2223.54       2392.34       1.08x
BenchmarkDecode1K/16+16/leopard-gf8-single-32            2357.57       2446.68       1.04x
BenchmarkDecode1K/16+16/leopard-gf8-single-inv-32        2761.64       2852.71       1.03x
BenchmarkDecode1K/16+16/leopard-gf16-32                  40.66         41.22         1.01x
BenchmarkDecode1K/16+16/leopard-gf16-single-32           40.55         40.97         1.01x
BenchmarkDecode1K/32+32/cauchy-32                        557.06        559.72        1.00x
BenchmarkDecode1K/32+32/cauchy-inv-32                    2733.51       2679.96       0.98x
BenchmarkDecode1K/32+32/cauchy-single-32                 4599.08       4728.23       1.03x
BenchmarkDecode1K/32+32/cauchy-single-inv-32             32476.37      32874.63      1.01x
BenchmarkDecode1K/32+32/leopard-gf8-32                   2055.34       2035.06       0.99x
BenchmarkDecode1K/32+32/leopard-gf8-inv-32               2116.85       2165.02       1.02x
BenchmarkDecode1K/32+32/leopard-gf8-single-32            3162.09       2917.71       0.92x
BenchmarkDecode1K/32+32/leopard-gf8-single-inv-32        3163.38       2900.87       0.92x
BenchmarkDecode1K/32+32/leopard-gf16-32                  79.63         81.04         1.02x
BenchmarkDecode1K/32+32/leopard-gf16-single-32           80.24         80.96         1.01x
BenchmarkDecode1K/64+64/cauchy-32                        161.19        162.89        1.01x
BenchmarkDecode1K/64+64/cauchy-inv-32                    1618.58       1589.00       0.98x
BenchmarkDecode1K/64+64/cauchy-single-32                 2850.67       2940.14       1.03x
BenchmarkDecode1K/64+64/cauchy-single-inv-32             38190.74      39891.62      1.04x
BenchmarkDecode1K/64+64/leopard-gf8-32                   1673.05       1762.57       1.05x
BenchmarkDecode1K/64+64/leopard-gf8-inv-32               1786.44       2076.54       1.16x
BenchmarkDecode1K/64+64/leopard-gf8-single-32            2614.63       2981.72       1.14x
BenchmarkDecode1K/64+64/leopard-gf8-single-inv-32        2554.62       3180.75       1.25x
BenchmarkDecode1K/64+64/leopard-gf16-32                  151.70        154.88        1.02x
BenchmarkDecode1K/64+64/leopard-gf16-single-32           154.18        157.81        1.02x
BenchmarkDecode1K/128+128/cauchy-32                      44.21         44.43         1.00x
BenchmarkDecode1K/128+128/cauchy-inv-32                  862.07        855.85        0.99x
BenchmarkDecode1K/128+128/cauchy-single-32               1597.56       1628.49       1.02x
BenchmarkDecode1K/128+128/cauchy-single-inv-32           44812.17      46602.36      1.04x
BenchmarkDecode1K/128+128/leopard-gf8-32                 1654.65       1796.62       1.09x
BenchmarkDecode1K/128+128/leopard-gf8-inv-32             1721.13       1851.02       1.08x
BenchmarkDecode1K/128+128/leopard-gf8-single-32          2335.01       2783.74       1.19x
BenchmarkDecode1K/128+128/leopard-gf8-single-inv-32      2505.05       2722.86       1.09x
BenchmarkDecode1K/128+128/leopard-gf16-32                282.54        284.91        1.01x
BenchmarkDecode1K/128+128/leopard-gf16-single-32         293.55        295.88        1.01x
BenchmarkDecode1K/256+256/leopard-gf16-32                462.96        474.14        1.02x
BenchmarkDecode1K/256+256/leopard-gf16-single-32         515.05        522.54        1.01x
BenchmarkDecode1K/512+512/leopard-gf16-32                701.27        719.41        1.03x
BenchmarkDecode1K/512+512/leopard-gf16-single-32         821.71        845.65        1.03x
BenchmarkDecode1K/1024+1024/leopard-gf16-32              835.08        890.46        1.07x
BenchmarkDecode1K/1024+1024/leopard-gf16-single-32       1088.37       1174.14       1.08x
BenchmarkDecode1K/2048+2048/leopard-gf16-32              916.84        1035.28       1.13x
BenchmarkDecode1K/2048+2048/leopard-gf16-single-32       1231.99       1428.13       1.16x
BenchmarkDecode1K/4096+4096/leopard-gf16-32              845.83        894.18        1.06x
BenchmarkDecode1K/4096+4096/leopard-gf16-single-32       1127.52       1344.92       1.19x
BenchmarkDecode1K/8192+8192/leopard-gf16-32              617.40        758.29        1.23x
BenchmarkDecode1K/8192+8192/leopard-gf16-single-32       856.40        1055.93       1.23x
BenchmarkDecode1K/16384+16384/leopard-gf16-32            512.36        553.42        1.08x
BenchmarkDecode1K/16384+16384/leopard-gf16-single-32     780.06        829.41        1.06x
BenchmarkDecode1K/32768+32768/leopard-gf16-32            487.48        552.79        1.13x
BenchmarkDecode1K/32768+32768/leopard-gf16-single-32     748.81        794.75        1.06x
BenchmarkEncodeLeopard/83840-32                          2163.35       2448.01       1.13x
BenchmarkEncode10x2x10000-32                             38138.59      38660.50      1.01x
BenchmarkEncode100x20x10000-32                           7688.19       9060.59       1.18x
BenchmarkEncode17x3x1M-32                                78657.27      38162.16      0.49x
BenchmarkEncode10x4x16M-32                               27346.47      23214.15      0.85x
BenchmarkEncode5x2x1M-32                                 105293.15     108112.25     1.03x
BenchmarkEncode10x2x1M-32                                119077.13     117328.97     0.99x
BenchmarkEncode10x4x1M-32                                89996.44      82971.98      0.92x
BenchmarkEncode50x20x1M-32                               23537.96      5038.47       0.21x
BenchmarkEncodeLeopard50x20x1M-32                        6927.22       6163.82       0.89x
BenchmarkEncode17x3x16M-32                               32983.93      28065.68      0.85x
BenchmarkEncode_8x4x8M-32                                27000.57      29266.72      1.08x
BenchmarkEncode_12x4x12M-32                              29593.55      30983.18      1.05x
BenchmarkEncode_16x4x16M-32                              30833.05      30687.30      1.00x
BenchmarkEncode_16x4x32M-32                              30842.71      31313.10      1.02x
BenchmarkEncode_16x4x64M-32                              29655.62      31005.32      1.05x
BenchmarkEncode_8x5x8M-32                                24868.94      26669.44      1.07x
BenchmarkEncode_8x6x8M-32                                24333.50      26080.85      1.07x
BenchmarkEncode_8x7x8M-32                                23288.33      25324.21      1.09x
BenchmarkEncode_8x9x8M-32                                22626.99      23758.85      1.05x
BenchmarkEncode_8x10x8M-32                               21839.39      22747.68      1.04x
BenchmarkEncode_8x11x8M-32                               21754.38      22578.03      1.04x
BenchmarkEncode_8x8x05M-32                               58222.07      26935.36      0.46x
BenchmarkEncode_8x8x1M-32                                57582.31      61119.59      1.06x
BenchmarkEncode_8x8x8M-32                                23131.88      24926.77      1.08x
BenchmarkEncode_8x8x32M-32                               22220.71      22775.93      1.02x
BenchmarkEncode_24x8x24M-32                              21930.61      26163.18      1.19x
BenchmarkEncode_24x8x48M-32                              25406.13      25360.06      1.00x
BenchmarkVerify800x200/64-32                             1585.15       1677.20       1.06x
BenchmarkVerify800x200/256-32                            3055.95       3272.60       1.07x
BenchmarkVerify800x200/1024-32                           3768.14       4259.11       1.13x
BenchmarkVerify800x200/4096-32                           3782.60       4407.39       1.17x
BenchmarkVerify800x200/16384-32                          2861.47       3549.40       1.24x
BenchmarkVerify800x200/65536-32                          1989.88       2504.00       1.26x
BenchmarkVerify800x200/262144-32                         1688.49       2071.25       1.23x
BenchmarkVerify800x200/1048576-32                        1866.92       2138.37       1.15x
BenchmarkVerify10x2x10000-32                             20326.26      21818.19      1.07x
BenchmarkVerify50x5x100000-32                            26007.30      32132.24      1.24x
BenchmarkVerify10x2x1M-32                                24960.93      29570.24      1.18x
BenchmarkVerify5x2x1M-32                                 18540.92      21904.78      1.18x
BenchmarkVerify10x4x1M-32                                13471.44      15647.82      1.16x
BenchmarkVerify50x20x1M-32                               9303.86       8705.50       0.94x
BenchmarkVerify10x4x16M-32                               10115.75      13227.73      1.31x
BenchmarkReconstruct10x2x10000-32                        37249.17      38497.52      1.03x
BenchmarkReconstruct800x200/64-32                        54.27         58.14         1.07x
BenchmarkReconstruct800x200/256-32                       197.76        210.84        1.07x
BenchmarkReconstruct800x200/1024-32                      544.30        589.32        1.08x
BenchmarkReconstruct800x200/4096-32                      926.59        1061.05       1.15x
BenchmarkReconstruct800x200/16384-32                     602.98        718.27        1.19x
BenchmarkReconstruct800x200/65536-32                     547.09        603.17        1.10x
BenchmarkReconstruct800x200/262144-32                    493.43        518.38        1.05x
BenchmarkReconstruct800x200/1048576-32                   359.27        435.35        1.21x
BenchmarkReconstruct50x5x50000-32                        37134.76      40813.72      1.10x
BenchmarkReconstruct10x2x1M-32                           67919.63      67434.39      0.99x
BenchmarkReconstruct5x2x1M-32                            56827.25      57840.28      1.02x
BenchmarkReconstruct10x4x1M-32                           53012.91      54704.77      1.03x
BenchmarkReconstruct50x20x1M-32                          20725.22      11977.81      0.58x
BenchmarkReconstructLeopard50x20x1M-32                   2427.13       2530.21       1.04x
BenchmarkReconstruct10x4x16M-32                          24887.19      26440.25      1.06x
BenchmarkReconstructData10x2x10000-32                    39218.24      40158.36      1.02x
BenchmarkReconstructData800x200/64-32                    55.77         58.65         1.05x
BenchmarkReconstructData800x200/256-32                   202.61        211.06        1.04x
BenchmarkReconstructData800x200/1024-32                  556.75        593.56        1.07x
BenchmarkReconstructData800x200/4096-32                  938.82        1063.90       1.13x
BenchmarkReconstructData800x200/16384-32                 626.09        722.66        1.15x
BenchmarkReconstructData800x200/65536-32                 543.74        611.74        1.13x
BenchmarkReconstructData800x200/262144-32                465.15        551.04        1.18x
BenchmarkReconstructData800x200/1048576-32               400.92        431.63        1.08x
BenchmarkReconstructData50x5x50000-32                    38400.98      42485.55      1.11x
BenchmarkReconstructData10x2x1M-32                       73066.67      52953.80      0.72x
BenchmarkReconstructData5x2x1M-32                        65271.49      73020.37      1.12x
BenchmarkReconstructData10x4x1M-32                       64273.61      48146.03      0.75x
BenchmarkReconstructData50x20x1M-32                      31718.30      17925.27      0.57x
BenchmarkReconstructData10x4x16M-32                      32179.90      35221.52      1.09x
BenchmarkReconstructP10x2x10000-32                       145267.14     144760.51     1.00x
BenchmarkReconstructP10x5x20000-32                       203564.07     190537.34     0.94x
BenchmarkParallel_8x8x64K-32                             172764.77     111229.29     0.64x
BenchmarkParallel_8x8x05M-32                             23086.39      23602.81      1.02x
BenchmarkParallel_20x10x05M-32                           26332.73      26416.51      1.00x
BenchmarkParallel_8x8x1M-32                              23482.07      23647.57      1.01x
BenchmarkParallel_8x8x8M-32                              23587.55      23329.00      0.99x
BenchmarkParallel_8x8x32M-32                             23571.23      23551.34      1.00x
BenchmarkParallel_8x3x1M-32                              28482.50      28411.98      1.00x
BenchmarkParallel_8x4x1M-32                              26990.67      26935.25      1.00x
BenchmarkParallel_8x5x1M-32                              25777.97      25740.90      1.00x
BenchmarkStreamEncode10x2x10000-32                       17811.48      17925.46      1.01x
BenchmarkStreamEncode100x20x10000-32                     3700.48       3934.28       1.06x
BenchmarkStreamEncode17x3x1M-32                          11744.12      12104.29      1.03x
BenchmarkStreamEncode10x4x16M-32                         8709.65       9171.17       1.05x
BenchmarkStreamEncode5x2x1M-32                           12556.48      12980.44      1.03x
BenchmarkStreamEncode10x2x1M-32                          12735.21      12759.79      1.00x
BenchmarkStreamEncode10x4x1M-32                          11852.03      12235.81      1.03x
BenchmarkStreamEncode50x20x1M-32                         8000.61       4188.19       0.52x
BenchmarkStreamEncode17x3x16M-32                         9978.17       10440.42      1.05x
BenchmarkStreamVerify10x2x10000-32                       12114.06      12367.66      1.02x
BenchmarkStreamVerify50x5x50000-32                       7576.44       7856.47       1.04x
BenchmarkStreamVerify10x2x1M-32                          8462.79       8772.25       1.04x
BenchmarkStreamVerify5x2x1M-32                           6852.10       7112.05       1.04x
BenchmarkStreamVerify10x4x1M-32                          6483.22       6619.19       1.02x
BenchmarkStreamVerify50x20x1M-32                         5556.64       4439.93       0.80x
BenchmarkStreamVerify10x4x16M-32                         5714.50       6199.45       1.08x

On AMD64, aligned inputs can make a difference in speed.

This is an example of the speed difference when inputs are unaligned/aligned:

```
BenchmarkEncode100x20x10000-32    7058    172648 ns/op    6950.57 MB/s
BenchmarkEncode100x20x10000-32    8406    137911 ns/op    8701.24 MB/s
```

To facilitate this, the package provides `AllocAligned(shards, each int) [][]byte`.
This allocates `shards` shards, each of size `each`.
Each shard is aligned to a 64 byte boundary.

Each encoder also has an `AllocAligned(each int) [][]byte` as an extended interface, which returns the same,
but with the shard count configured in the encoder.

It is not possible to re-align already allocated slices, for example when using `Split`.
When you cannot write directly into aligned shards, you should not copy existing data into them, since the copy is most likely much slower than using the unaligned shards as-is.
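As an illustration of the idea behind 64 byte aligned shards, here is a minimal standard-library sketch. It is not taken from this PR and should not be read as the package's implementation: `allocAligned` and `isAligned` are hypothetical names, and the approach shown (over-allocate one buffer, skip to the first aligned byte, then carve out shards at a stride rounded up to 64) is just one plausible way to get this layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// allocAligned is a hypothetical helper (not the library function).
// It returns `shards` slices of length `each`, each starting on a
// 64 byte boundary, all backed by one allocation.
func allocAligned(shards, each int) [][]byte {
	const align = 64
	if shards <= 0 || each <= 0 {
		return nil
	}
	// Round the per-shard stride up to a multiple of align so that
	// every shard start stays aligned, even for odd-sized shards.
	stride := (each + align - 1) / align * align
	// Over-allocate by align-1 bytes so an aligned start always exists.
	buf := make([]byte, stride*shards+align-1)
	// Skip forward to the first aligned byte, if not already there.
	if off := int(uintptr(unsafe.Pointer(&buf[0])) & (align - 1)); off != 0 {
		buf = buf[align-off:]
	}
	out := make([][]byte, shards)
	for i := range out {
		// Cap each shard at its stride so appends cannot spill
		// into the next shard.
		out[i] = buf[i*stride : i*stride+each : (i+1)*stride]
	}
	return out
}

// isAligned reports whether b starts on a 64 byte boundary.
// b must be non-empty.
func isAligned(b []byte) bool {
	return uintptr(unsafe.Pointer(&b[0]))%64 == 0
}

func main() {
	// 1001 is deliberately odd-sized; the stride becomes 1024.
	for i, s := range allocAligned(4, 1001) {
		fmt.Printf("shard %d: len=%d cap=%d aligned=%v\n", i, len(s), cap(s), isAligned(s))
	}
}
```

Because all shards share one backing buffer, the padding cost is at most 63 bytes per shard plus 63 bytes at the front. Code that fills shards by reading directly into them (for example from a file or network stream) benefits from the alignment without any extra copy.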

klauspost added 3 commits January 8, 2023 15:16

Add fallback and constants for testing.

a571386

Fix no unsafe.

8ff7132

klauspost merged commit e4bf561 into master Jan 12, 2023

klauspost deleted the align-allocations branch January 12, 2023 09:07
