Fix shared write buffers 2 #14

klauspost · 2020-05-04T13:51:22Z

Fixes #12

Fix shared buffers:

benchmark                          old MB/s     new MB/s     speedup
BenchmarkAvx2/32KB-32              2232.44      2039.22      0.91x
BenchmarkAvx2/64KB-32              2935.47      2707.90      0.92x
BenchmarkAvx2/128KB-32             3428.63      2839.23      0.83x
BenchmarkAvx2/256KB-32             3628.36      3145.90      0.87x
BenchmarkAvx2/512KB-32             3576.96      3370.22      0.94x
BenchmarkAvx2/1MB-32               3534.08      3417.84      0.97x
BenchmarkAvx2/2MB-32               3459.18      3363.09      0.97x
BenchmarkAvx2/4MB-32               3484.55      3348.91      0.96x
BenchmarkAvx2/8MB-32               3497.50      3400.22      0.97x
BenchmarkAvx2Parallel/32KB-32      30512.99     20568.38     0.67x
BenchmarkAvx2Parallel/64KB-32      37090.64     21099.39     0.57x
BenchmarkAvx2Parallel/128KB-32     41318.22     20926.21     0.51x
BenchmarkAvx2Parallel/256KB-32     43143.56     24411.63     0.57x
BenchmarkAvx2Parallel/512KB-32     43985.58     29105.24     0.66x
BenchmarkAvx2Parallel/1MB-32       44011.91     29499.57     0.67x
BenchmarkAvx2Parallel/2MB-32       44756.98     29765.74     0.67x
BenchmarkAvx2Parallel/4MB-32       44581.99     27552.38     0.62x
BenchmarkAvx2Parallel/8MB-32       44145.26     25791.88     0.58x

It also adds a 3x16x32KB allocation when creating a server.

Still looks pretty bleak:

```
benchmark                          old MB/s     new MB/s     speedup
BenchmarkBlock8-32                 2509.41      2523.00      1.01x
BenchmarkAvx2/32KB-32              2199.15      1765.44      0.80x
BenchmarkAvx2/64KB-32              2936.65      2344.89      0.80x
BenchmarkAvx2/128KB-32             3338.90      2774.66      0.83x
BenchmarkAvx2/256KB-32             3558.00      3022.57      0.85x
BenchmarkAvx2/512KB-32             3513.83      3104.90      0.88x
BenchmarkAvx2/1MB-32               3433.49      3079.63      0.90x
BenchmarkAvx2/2MB-32               3416.81      3061.47      0.90x
BenchmarkAvx2/4MB-32               3425.56      3099.28      0.90x
BenchmarkAvx2/8MB-32               3415.68      3102.60      0.91x
BenchmarkAvx2Parallel/32KB-32      31816.47     8515.98      0.27x
BenchmarkAvx2Parallel/64KB-32      38000.07     9303.00      0.24x
BenchmarkAvx2Parallel/128KB-32     41164.65     11504.55     0.28x
BenchmarkAvx2Parallel/256KB-32     43592.91     15237.84     0.35x
BenchmarkAvx2Parallel/512KB-32     44030.27     15926.92     0.36x
BenchmarkAvx2Parallel/1MB-32       44192.77     16602.93     0.38x
BenchmarkAvx2Parallel/2MB-32       44830.55     16490.63     0.37x
BenchmarkAvx2Parallel/4MB-32       44405.23     15724.88     0.35x
BenchmarkAvx2Parallel/8MB-32       43470.73     15546.07     0.36x
```

klauspost · 2020-05-04T14:05:30Z

If this is the best we can do, the project is close to useless. This tests under optimal conditions.

Total server throughput is less than crypto/md5. Even with completely optimal packing, two cores are < 2x faster than 2 cores of crypto/md5.

crypto/md5 speeds for reference:

BenchmarkCryptoMd5/32KB-32              826.64       816.76       0.99x
BenchmarkCryptoMd5/64KB-32              829.01       825.49       1.00x
BenchmarkCryptoMd5/128KB-32             825.57       829.75       1.01x
BenchmarkCryptoMd5/256KB-32             831.03       830.69       1.00x
BenchmarkCryptoMd5/512KB-32             830.21       825.46       0.99x
BenchmarkCryptoMd5/1MB-32               830.93       825.38       0.99x
BenchmarkCryptoMd5/2MB-32               829.94       825.27       0.99x
BenchmarkCryptoMd5Parallel/32KB-32      21039.14     20523.20     0.98x
BenchmarkCryptoMd5Parallel/64KB-32      20942.51     20800.14     0.99x
BenchmarkCryptoMd5Parallel/128KB-32     20887.57     20818.76     1.00x
BenchmarkCryptoMd5Parallel/256KB-32     21027.41     20808.08     0.99x
BenchmarkCryptoMd5Parallel/512KB-32     20840.18     20881.23     1.00x
BenchmarkCryptoMd5Parallel/1MB-32       20743.23     20676.26     1.00x
BenchmarkCryptoMd5Parallel/2MB-32       20684.87     20521.84     0.99x
BenchmarkCryptoMd5Parallel/4MB-32       20581.08     20325.99     0.99x
BenchmarkCryptoMd5Parallel/8MB-32       20291.55     20108.06     0.99x

So assuming much less optimal packing in the real world, these numbers are not worth it.

Gives quite a performance boost:

```
benchmark                          old MB/s     new MB/s     speedup
BenchmarkAvx2/32KB-8               3316.19      3798.53      1.15x
BenchmarkAvx2/64KB-8               4153.43      4653.64      1.12x
BenchmarkAvx2/128KB-8              4927.97      5611.43      1.14x
BenchmarkAvx2/256KB-8              5359.29      6207.72      1.16x
BenchmarkAvx2/512KB-8              5085.95      6112.55      1.20x
BenchmarkAvx2/1MB-8                5027.90      6196.06      1.23x
BenchmarkAvx2/2MB-8                4978.25      6076.93      1.22x
BenchmarkAvx2/4MB-8                4988.47      6305.10      1.26x
BenchmarkAvx2/8MB-8                4999.42      6413.97      1.28x
BenchmarkAvx2Parallel/32KB-8       14604.34     17229.63     1.18x
BenchmarkAvx2Parallel/64KB-8       16039.15     19440.62     1.21x
BenchmarkAvx2Parallel/128KB-8      17518.13     20677.54     1.18x
BenchmarkAvx2Parallel/256KB-8      17870.89     21456.42     1.20x
BenchmarkAvx2Parallel/512KB-8      17945.12     21590.50     1.20x
BenchmarkAvx2Parallel/1MB-8        17925.02     21673.21     1.21x
BenchmarkAvx2Parallel/2MB-8        17872.97     21352.82     1.19x
BenchmarkAvx2Parallel/4MB-8        16141.88     21325.05     1.32x
BenchmarkAvx2Parallel/8MB-8        17327.55     21210.17     1.22x
```

This claws back most of the lost performance for a single server.
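The speedup column in these tables is simply new MB/s divided by old MB/s. The sketch below recomputes a few rows; `speedup` is a hypothetical helper, and the inputs are copied from the tables in this thread. The last call reuses the same ratio to set the 1MB `BenchmarkAvx2Parallel` result from the "pretty bleak" run against the `BenchmarkCryptoMd5Parallel/1MB-32` figure, which is one way to read the "less than crypto/md5" remark.

```go
package main

import "fmt"

// speedup formats newMBs/oldMBs to two decimals, in the same style as the
// speedup column of the benchmark tables (for example "0.91x").
func speedup(oldMBs, newMBs float64) string {
	return fmt.Sprintf("%.2fx", newMBs/oldMBs)
}

func main() {
	// BenchmarkAvx2/32KB-32 from the first table.
	fmt.Println(speedup(2232.44, 2039.22))
	// BenchmarkAvx2/32KB-8 from the table just above.
	fmt.Println(speedup(3316.19, 3798.53))
	// Cross-check: crypto/md5 parallel at 1MB (20676.26 MB/s) against the
	// new Avx2Parallel/1MB-32 figure from the "pretty bleak" run (16602.93 MB/s).
	fmt.Println(speedup(20676.26, 16602.93))
}
```

A value below 1.00x means the second figure is slower than the first, so the cross-check line lands below 1.00x for that run.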

fwessels

LGTM

fwessels mentioned this pull request May 4, 2020

Fix shared write buffers #13

Closed

fwessels and others added 7 commits May 6, 2020 15:39

Merge branch 'master' into shared-write-buffers2

79400f0

Merge branch 'test-elim-copy-avx2' into shared-write-buffers2

f785b2a

Use a single shared buffer for the server.

410f0ae

This claws back most of the lost performance for single server.

Fix tests.

2a0cf90

Add race detection to hash test.

6172fd7

Re-add parallel simulators+ balance.

1805309

fwessels approved these changes May 7, 2020

fwessels merged commit cb281b5 into minio:master May 7, 2020