Fix shared write buffers 2 #14

Merged
merged 8 commits into from May 7, 2020

@klauspost klauspost commented May 4, 2020

Fixes #12

Fix shared buffers:

benchmark                          old MB/s     new MB/s     speedup
BenchmarkAvx2/32KB-32              2232.44      2039.22      0.91x
BenchmarkAvx2/64KB-32              2935.47      2707.90      0.92x
BenchmarkAvx2/128KB-32             3428.63      2839.23      0.83x
BenchmarkAvx2/256KB-32             3628.36      3145.90      0.87x
BenchmarkAvx2/512KB-32             3576.96      3370.22      0.94x
BenchmarkAvx2/1MB-32               3534.08      3417.84      0.97x
BenchmarkAvx2/2MB-32               3459.18      3363.09      0.97x
BenchmarkAvx2/4MB-32               3484.55      3348.91      0.96x
BenchmarkAvx2/8MB-32               3497.50      3400.22      0.97x
BenchmarkAvx2Parallel/32KB-32      30512.99     20568.38     0.67x
BenchmarkAvx2Parallel/64KB-32      37090.64     21099.39     0.57x
BenchmarkAvx2Parallel/128KB-32     41318.22     20926.21     0.51x
BenchmarkAvx2Parallel/256KB-32     43143.56     24411.63     0.57x
BenchmarkAvx2Parallel/512KB-32     43985.58     29105.24     0.66x
BenchmarkAvx2Parallel/1MB-32       44011.91     29499.57     0.67x
BenchmarkAvx2Parallel/2MB-32       44756.98     29765.74     0.67x
BenchmarkAvx2Parallel/4MB-32       44581.99     27552.38     0.62x
BenchmarkAvx2Parallel/8MB-32       44145.26     25791.88     0.58x

It also adds a 3x16x32KB allocation (1.5MB in total) when creating a server.
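To make the allocation cost concrete, here is a minimal sketch of a server that owns its own buffers instead of sharing them. Reading "3x16x32KB" as three sets of 16 buffers of 32KB each is an assumption, and the `server` type, `newServer`, and `allocatedBytes` are hypothetical names for illustration, not the code in this PR.

```go
package main

import "fmt"

const (
	bufferSets = 3        // assumed meaning of the "3" in 3x16x32KB
	lanes      = 16       // assumed meaning of the "16"
	bufferSize = 32 << 10 // 32KB per buffer
)

// server is a hypothetical stand-in for the real server type.
// Each server owns its buffers, so two servers never write
// into the same memory concurrently.
type server struct {
	buffers [bufferSets][lanes][]byte
}

// newServer allocates all per-server buffers up front.
func newServer() *server {
	s := &server{}
	for i := range s.buffers {
		for j := range s.buffers[i] {
			s.buffers[i][j] = make([]byte, bufferSize)
		}
	}
	return s
}

// allocatedBytes returns the total size of the buffers owned by s.
func (s *server) allocatedBytes() int {
	total := 0
	for i := range s.buffers {
		for j := range s.buffers[i] {
			total += len(s.buffers[i][j])
		}
	}
	return total
}

func main() {
	s := newServer()
	n := s.allocatedBytes()
	fmt.Printf("%d bytes (%.1f MiB) per server\n", n, float64(n)/(1<<20))
}
```

Under these assumptions every server pays 1,572,864 bytes up front, which is the price of not sharing buffers between servers.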

Still looks pretty bleak:

```
benchmark                               old MB/s     new MB/s     speedup
BenchmarkBlock8-32                      2509.41      2523.00      1.01x
BenchmarkAvx2/32KB-32                   2199.15      1765.44      0.80x
BenchmarkAvx2/64KB-32                   2936.65      2344.89      0.80x
BenchmarkAvx2/128KB-32                  3338.90      2774.66      0.83x
BenchmarkAvx2/256KB-32                  3558.00      3022.57      0.85x
BenchmarkAvx2/512KB-32                  3513.83      3104.90      0.88x
BenchmarkAvx2/1MB-32                    3433.49      3079.63      0.90x
BenchmarkAvx2/2MB-32                    3416.81      3061.47      0.90x
BenchmarkAvx2/4MB-32                    3425.56      3099.28      0.90x
BenchmarkAvx2/8MB-32                    3415.68      3102.60      0.91x
BenchmarkAvx2Parallel/32KB-32           31816.47     8515.98      0.27x
BenchmarkAvx2Parallel/64KB-32           38000.07     9303.00      0.24x
BenchmarkAvx2Parallel/128KB-32          41164.65     11504.55     0.28x
BenchmarkAvx2Parallel/256KB-32          43592.91     15237.84     0.35x
BenchmarkAvx2Parallel/512KB-32          44030.27     15926.92     0.36x
BenchmarkAvx2Parallel/1MB-32            44192.77     16602.93     0.38x
BenchmarkAvx2Parallel/2MB-32            44830.55     16490.63     0.37x
BenchmarkAvx2Parallel/4MB-32            44405.23     15724.88     0.35x
BenchmarkAvx2Parallel/8MB-32            43470.73     15546.07     0.36x
```

klauspost commented May 4, 2020

If this is the best we can do, the project is close to useless. The benchmark tests under optimal conditions.

Total server throughput is lower than crypto/md5. Completely optimal packing on two cores is less than 2x faster than two cores of crypto/md5.

crypto/md5 speeds for reference:

BenchmarkCryptoMd5/32KB-32              826.64       816.76       0.99x
BenchmarkCryptoMd5/64KB-32              829.01       825.49       1.00x
BenchmarkCryptoMd5/128KB-32             825.57       829.75       1.01x
BenchmarkCryptoMd5/256KB-32             831.03       830.69       1.00x
BenchmarkCryptoMd5/512KB-32             830.21       825.46       0.99x
BenchmarkCryptoMd5/1MB-32               830.93       825.38       0.99x
BenchmarkCryptoMd5/2MB-32               829.94       825.27       0.99x
BenchmarkCryptoMd5Parallel/32KB-32      21039.14     20523.20     0.98x
BenchmarkCryptoMd5Parallel/64KB-32      20942.51     20800.14     0.99x
BenchmarkCryptoMd5Parallel/128KB-32     20887.57     20818.76     1.00x
BenchmarkCryptoMd5Parallel/256KB-32     21027.41     20808.08     0.99x
BenchmarkCryptoMd5Parallel/512KB-32     20840.18     20881.23     1.00x
BenchmarkCryptoMd5Parallel/1MB-32       20743.23     20676.26     1.00x
BenchmarkCryptoMd5Parallel/2MB-32       20684.87     20521.84     0.99x
BenchmarkCryptoMd5Parallel/4MB-32       20581.08     20325.99     0.99x
BenchmarkCryptoMd5Parallel/8MB-32       20291.55     20108.06     0.99x
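For readers who want a comparable baseline on their own hardware, the following is a rough sketch of how single-goroutine and parallel crypto/md5 throughput can be measured with the standard `testing` package. It is not the benchmark code from this repository; the helper names `benchMD5`, `benchMD5Parallel`, and `throughputMBps` are made up for illustration.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"testing"
)

// benchMD5 hashes a buffer of the given size repeatedly on one goroutine.
func benchMD5(size int) func(b *testing.B) {
	return func(b *testing.B) {
		data := make([]byte, size)
		b.SetBytes(int64(size))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			md5.Sum(data)
		}
	}
}

// benchMD5Parallel does the same work spread over GOMAXPROCS goroutines.
func benchMD5Parallel(size int) func(b *testing.B) {
	return func(b *testing.B) {
		b.SetBytes(int64(size))
		b.RunParallel(func(pb *testing.PB) {
			data := make([]byte, size) // one private buffer per goroutine
			for pb.Next() {
				md5.Sum(data)
			}
		})
	}
}

// throughputMBps converts bytes per op, op count, and elapsed seconds
// into MB/s (1 MB = 1e6 bytes, the unit `go test` reports).
func throughputMBps(bytesPerOp int64, ops int, seconds float64) float64 {
	if seconds <= 0 {
		return 0
	}
	return float64(bytesPerOp) * float64(ops) / 1e6 / seconds
}

func main() {
	testing.Init() // register testing flags when running outside `go test`
	for _, size := range []int{32 << 10, 1 << 20} {
		single := testing.Benchmark(benchMD5(size))
		parallel := testing.Benchmark(benchMD5Parallel(size))
		fmt.Printf("%7d bytes: single %.2f MB/s, parallel %.2f MB/s\n",
			size,
			throughputMBps(single.Bytes, single.N, single.T.Seconds()),
			throughputMBps(parallel.Bytes, parallel.N, parallel.T.Seconds()))
	}
}
```

The `-32` suffix on the benchmark names above is the GOMAXPROCS value, so the parallel rows reflect 32 goroutines hashing at once.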

So, assuming much less optimal packing in real-world use, these numbers are not worth it.
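The two claims above can be checked against the tables with a little arithmetic. This sketch uses the "new" 2MB rows from the second AVX2 table and from the crypto/md5 table; picking those rows, and treating the single-stream AVX2 figure as the two-core case, are assumptions made for the comparison.

```go
package main

import "fmt"

// ratio reports how many times faster a is than b.
func ratio(a, b float64) float64 {
	return a / b
}

func main() {
	// MB/s figures copied from the 2MB rows ("new" column) above.
	const (
		avx2Single     = 3061.47  // BenchmarkAvx2/2MB-32
		avx2Parallel   = 16490.63 // BenchmarkAvx2Parallel/2MB-32
		cryptoSingle   = 825.27   // BenchmarkCryptoMd5/2MB-32
		cryptoParallel = 20521.84 // BenchmarkCryptoMd5Parallel/2MB-32
	)

	// Total server throughput versus crypto/md5 on all cores.
	fmt.Printf("parallel: %.2fx of crypto/md5\n",
		ratio(avx2Parallel, cryptoParallel))

	// Optimally packed AVX2 versus two cores of crypto/md5.
	fmt.Printf("two cores: %.2fx of 2 x crypto/md5\n",
		ratio(avx2Single, 2*cryptoSingle))
}
```

With these rows the parallel ratio comes out at roughly 0.80x and the two-core ratio at roughly 1.85x, consistent with "less than crypto/md5" and "less than 2x".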

@fwessels fwessels mentioned this pull request May 4, 2020
fwessels and others added 7 commits May 6, 2020 15:39
Gives quite a performance boost:
```
benchmark                         old MB/s     new MB/s     speedup
BenchmarkAvx2/32KB-8              3316.19      3798.53      1.15x
BenchmarkAvx2/64KB-8              4153.43      4653.64      1.12x
BenchmarkAvx2/128KB-8             4927.97      5611.43      1.14x
BenchmarkAvx2/256KB-8             5359.29      6207.72      1.16x
BenchmarkAvx2/512KB-8             5085.95      6112.55      1.20x
BenchmarkAvx2/1MB-8               5027.90      6196.06      1.23x
BenchmarkAvx2/2MB-8               4978.25      6076.93      1.22x
BenchmarkAvx2/4MB-8               4988.47      6305.10      1.26x
BenchmarkAvx2/8MB-8               4999.42      6413.97      1.28x
BenchmarkAvx2Parallel/32KB-8      14604.34     17229.63     1.18x
BenchmarkAvx2Parallel/64KB-8      16039.15     19440.62     1.21x
BenchmarkAvx2Parallel/128KB-8     17518.13     20677.54     1.18x
BenchmarkAvx2Parallel/256KB-8     17870.89     21456.42     1.20x
BenchmarkAvx2Parallel/512KB-8     17945.12     21590.50     1.20x
BenchmarkAvx2Parallel/1MB-8       17925.02     21673.21     1.21x
BenchmarkAvx2Parallel/2MB-8       17872.97     21352.82     1.19x
BenchmarkAvx2Parallel/4MB-8       16141.88     21325.05     1.32x
BenchmarkAvx2Parallel/8MB-8       17327.55     21210.17     1.22x
```
This claws back most of the lost performance for a single server.

@fwessels fwessels left a comment

LGTM

@fwessels fwessels merged commit cb281b5 into minio:master May 7, 2020
Development

Successfully merging this pull request may close these issues.

Data race
2 participants