Fix shared write buffers 2 #14
Still looks pretty bleak:
```
benchmark                           old MB/s     new MB/s     speedup
BenchmarkBlock8-32                  2509.41      2523.00      1.01x
BenchmarkAvx2/32KB-32               2199.15      1765.44      0.80x
BenchmarkAvx2/64KB-32               2936.65      2344.89      0.80x
BenchmarkAvx2/128KB-32              3338.90      2774.66      0.83x
BenchmarkAvx2/256KB-32              3558.00      3022.57      0.85x
BenchmarkAvx2/512KB-32              3513.83      3104.90      0.88x
BenchmarkAvx2/1MB-32                3433.49      3079.63      0.90x
BenchmarkAvx2/2MB-32                3416.81      3061.47      0.90x
BenchmarkAvx2/4MB-32                3425.56      3099.28      0.90x
BenchmarkAvx2/8MB-32                3415.68      3102.60      0.91x
BenchmarkAvx2Parallel/32KB-32       31816.47     8515.98      0.27x
BenchmarkAvx2Parallel/64KB-32       38000.07     9303.00      0.24x
BenchmarkAvx2Parallel/128KB-32      41164.65     11504.55     0.28x
BenchmarkAvx2Parallel/256KB-32      43592.91     15237.84     0.35x
BenchmarkAvx2Parallel/512KB-32      44030.27     15926.92     0.36x
BenchmarkAvx2Parallel/1MB-32        44192.77     16602.93     0.38x
BenchmarkAvx2Parallel/2MB-32        44830.55     16490.63     0.37x
BenchmarkAvx2Parallel/4MB-32        44405.23     15724.88     0.35x
BenchmarkAvx2Parallel/8MB-32        43470.73     15546.07     0.36x
```
If this is the best we can do, the project is close to useless. This tests under optimal conditions. Total server throughput is less than Crypto speeds for reference:
So assuming much less optimal packing in real-world use, these numbers are not worth it.
Gives quite a performance boost:
```
benchmark                          old MB/s     new MB/s     speedup
BenchmarkAvx2/32KB-8               3316.19      3798.53      1.15x
BenchmarkAvx2/64KB-8               4153.43      4653.64      1.12x
BenchmarkAvx2/128KB-8              4927.97      5611.43      1.14x
BenchmarkAvx2/256KB-8              5359.29      6207.72      1.16x
BenchmarkAvx2/512KB-8              5085.95      6112.55      1.20x
BenchmarkAvx2/1MB-8                5027.90      6196.06      1.23x
BenchmarkAvx2/2MB-8                4978.25      6076.93      1.22x
BenchmarkAvx2/4MB-8                4988.47      6305.10      1.26x
BenchmarkAvx2/8MB-8                4999.42      6413.97      1.28x
BenchmarkAvx2Parallel/32KB-8       14604.34     17229.63     1.18x
BenchmarkAvx2Parallel/64KB-8       16039.15     19440.62     1.21x
BenchmarkAvx2Parallel/128KB-8      17518.13     20677.54     1.18x
BenchmarkAvx2Parallel/256KB-8      17870.89     21456.42     1.20x
BenchmarkAvx2Parallel/512KB-8      17945.12     21590.50     1.20x
BenchmarkAvx2Parallel/1MB-8        17925.02     21673.21     1.21x
BenchmarkAvx2Parallel/2MB-8        17872.97     21352.82     1.19x
BenchmarkAvx2Parallel/4MB-8        16141.88     21325.05     1.32x
BenchmarkAvx2Parallel/8MB-8        17327.55     21210.17     1.22x
```
This claws back most of the lost performance for a single server.
LGTM
Fixes #12
Fix shared buffers:
Also adds a 3x16x32KB (1.5MB total) allocation when creating a server.
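To make the allocation cost concrete, here is a rough sketch of what a per-server 3x16x32KB buffer layout could look like in Go. This is not the code from this PR: the `server` type, the `newServer` constructor, and the reading of the factors as 3 buffer sets of 16 buffers each are assumptions made for illustration. The point it shows is that each server owns its buffers, so nothing is shared between servers.

```go
package main

import "fmt"

const (
	bufferSets    = 3        // assumed meaning of the "3x" factor
	buffersPerSet = 16       // assumed meaning of the "16x" factor
	bufferSize    = 32 << 10 // 32KB per buffer
)

// server is a hypothetical type; the project's real struct is not shown here.
type server struct {
	buffers [bufferSets][buffersPerSet][]byte
}

// newServer allocates one backing array per server and slices it into
// non-overlapping 32KB buffers, so no buffer is shared across servers.
func newServer() *server {
	s := &server{}
	backing := make([]byte, bufferSets*buffersPerSet*bufferSize)
	for i := 0; i < bufferSets; i++ {
		for j := 0; j < buffersPerSet; j++ {
			off := (i*buffersPerSet + j) * bufferSize
			// The three-index slice caps capacity so an append cannot
			// spill into the neighbouring buffer.
			s.buffers[i][j] = backing[off : off+bufferSize : off+bufferSize]
		}
	}
	return s
}

// totalBytes sums the length of every buffer owned by the server.
func (s *server) totalBytes() int {
	n := 0
	for i := range s.buffers {
		for j := range s.buffers[i] {
			n += len(s.buffers[i][j])
		}
	}
	return n
}

func main() {
	s := newServer()
	fmt.Println(s.totalBytes()) // prints 1572864 (1.5MB)
}
```

Under these assumptions the cost is 1.5MB per server, paid once at creation.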