x/crypto/blake2b: very low performance for AVX and AVX2 code #18563
rakyll changed the title from "blake2b: very low performance for AVX and AVX2 code" to "x/crypto/blake2b: very low performance for AVX and AVX2 code" on Jan 7, 2017
CL https://golang.org/cl/34993 mentions this issue.
harshavardhana added a commit to minio/minio that referenced this issue on Jan 25, 2017
c-expert-zigbee pushed a commit to c-expert-zigbee/crypto_go that referenced this issue on Mar 28, 2022
On some amd64 CPUs (Xeon E5-2680v4 / E5-2620v3), mixing SSE and AVX instructions leads to very low performance. On an i7-6500U the mixed SSE/AVX code performs as follows:

AVX2:
name        time/op
Write128-4   165ns ± 0%
Write1K-4   1.20µs ± 0%
Sum128-4     189ns ± 1%
Sum1K-4     1.22µs ± 0%

name        speed
Write128-4  773MB/s ± 1%
Write1K-4   855MB/s ± 0%
Sum128-4    675MB/s ± 1%
Sum1K-4     838MB/s ± 0%

while the same code achieves less than 65MB/s on a Xeon E5-2620v3.

Replacing the `MOVQ` and `PINSRQ` instructions with the AVX instructions `VMOVQ` and `VPINSRQ` brings the performance of the AVX/AVX2 code up to the expected values:

name         old time/op   new time/op   delta
Write128-12  2.20µs ±10%   0.22µs ± 9%   -90.00%  (p=0.029 n=4+4)
Write1K-12   16.2µs ± 0%    1.1µs ± 0%   -93.07%  (p=0.029 n=4+4)
Sum128-12    2.10µs ± 0%   0.22µs ± 0%   -89.47%  (p=0.029 n=4+4)
Sum1K-12     16.3µs ± 0%    1.2µs ± 0%   -92.65%  (p=0.029 n=4+4)

name         old speed      new speed       delta
Write128-12  58.5MB/s ±10%  582.8MB/s ±10%   +897.08%  (p=0.029 n=4+4)
Write1K-12   63.1MB/s ± 0%  909.8MB/s ± 0%  +1341.40%  (p=0.029 n=4+4)
Sum128-12    60.8MB/s ± 0%  576.3MB/s ± 0%   +847.84%  (p=0.029 n=4+4)
Sum1K-12     62.8MB/s ± 0%  855.2MB/s ± 0%  +1260.78%  (p=0.029 n=4+4)

The AVX/AVX2 code now uses only AVX (no SSE) instructions.

Fixes golang/go#18563.

Change-Id: I1961dd8fa02014642587523b7f099816a263c9f5
Reviewed-on: https://go-review.googlesource.com/34993
Reviewed-by: Adam Langley <agl@golang.org>
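The speed columns in the benchmark output follow directly from the time/op columns and the input size. The sketch below reproduces that arithmetic for the Write1K rows. It assumes that Write1K processes 1024 bytes per operation and uses the `testing` package's convention of 1 MB = 1e6 bytes. The helper names `throughputMBps` and `speedup` are hypothetical and are not part of x/crypto/blake2b.

```go
package main

import "fmt"

// throughputMBps converts time per operation into throughput.
// It uses 1 MB = 1e6 bytes, the convention of Go's testing package.
// (Hypothetical helper for illustration only.)
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	// bytes/ns * 1e9 ns/s / 1e6 bytes/MB = bytes/ns * 1e3
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup returns how many times faster the new time is than the old one.
// (Hypothetical helper for illustration only.)
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// Inputs are the rounded time/op figures quoted in the commit message.
	// Assumption: Write1K hashes 1024 bytes per operation.
	fmt.Printf("i7-6500U Write1K:     %.0f MB/s\n", throughputMBps(1024, 1200))
	fmt.Printf("Xeon Write1K before:  %.1f MB/s\n", throughputMBps(1024, 16200))
	fmt.Printf("Xeon Write1K after:   %.0f MB/s\n", throughputMBps(1024, 1100))
	fmt.Printf("Xeon Write1K speedup: %.1fx\n", speedup(16200, 1100))
}
```

Because the inputs are the rounded time/op values, the results only approximate the reported speeds. For example, 1024 bytes in 1.20µs gives roughly 853 MB/s against the reported 855MB/s.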
c-expert-zigbee pushed a commit to c-expert-zigbee/crypto_go that referenced this issue on Mar 29, 2022
LewiGoddard pushed a commit to LewiGoddard/crypto that referenced this issue on Feb 16, 2023
BiiChris pushed a commit to BiiChris/crypto that referenced this issue on Sep 15, 2023
desdeel2d0m added a commit to desdeel2d0m/crypto that referenced this issue on Jul 1, 2024
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (`go version`)?
1.7.*
What operating system and processor architecture are you using (`go env`)?
amd64/linux on Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
What did you do?
`go test -bench=Benchmark` for x/crypto/blake2b
What did you expect to see?
Performance of about 800 MB/s, as on an i7-6500U
What did you see instead?
Further info