crypto/cipher: A performance optimization idea of “crypto” lib for ARM-arch #42010
Labels
NeedsInvestigation
Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
kkoogqw changed the title from "A performance optimization idea of “crypto” lib for ARM-arch" to "crypto/cipher: A performance optimization idea of “crypto” lib for ARM-arch" on Oct 16, 2020
toothrot added the NeedsInvestigation label on Oct 16, 2020
Change https://golang.org/cl/142537 mentions this issue: |
gopherbot pushed a commit that referenced this issue on Nov 7, 2020
cpu: HiSilicon(R) Kirin 970 2.4GHz

name                 old time/op    new time/op    delta
XORBytes/8Bytes      39.8ns ± 0%    17.3ns ± 0%    -56.53%  (p=0.000 n=10+10)
XORBytes/128Bytes    376ns ± 0%     28ns ± 0%      -92.63%  (p=0.000 n=10+8)
XORBytes/2048Bytes   5.67µs ± 0%    0.22µs ± 0%    -96.03%  (p=0.000 n=10+10)
XORBytes/32768Bytes  90.3µs ± 0%    3.5µs ± 0%     -96.12%  (p=0.000 n=10+10)
AESGCMSeal1K         853ns ± 0%     853ns ± 0%     ~        (all equal)
AESGCMOpen1K         876ns ± 0%     874ns ± 0%     -0.23%   (p=0.000 n=10+10)
AESGCMSign8K         3.09µs ± 0%    3.08µs ± 0%    -0.34%   (p=0.000 n=10+9)
AESGCMSeal8K         5.87µs ± 0%    5.87µs ± 0%    +0.01%   (p=0.008 n=10+8)
AESGCMOpen8K         5.82µs ± 0%    5.82µs ± 0%    +0.02%   (p=0.037 n=10+10)
AESCFBEncrypt1K      7.05µs ± 0%    4.27µs ± 0%    -39.38%  (p=0.000 n=10+10)
AESCFBDecrypt1K      7.12µs ± 0%    4.30µs ± 0%    -39.54%  (p=0.000 n=10+9)
AESCFBDecrypt8K      56.7µs ± 0%    34.1µs ± 0%    -39.82%  (p=0.000 n=10+10)
AESOFB1K             5.20µs ± 0%    2.54µs ± 0%    -51.07%  (p=0.000 n=10+10)
AESCTR1K             4.96µs ± 0%    2.30µs ± 0%    -53.62%  (p=0.000 n=9+10)
AESCTR8K             39.5µs ± 0%    18.2µs ± 0%    -53.98%  (p=0.000 n=8+10)
AESCBCEncrypt1K      5.81µs ± 0%    3.07µs ± 0%    -47.13%  (p=0.000 n=10+8)
AESCBCDecrypt1K      5.83µs ± 0%    3.10µs ± 0%    -46.84%  (p=0.000 n=10+8)

name                 old speed      new speed      delta
XORBytes/8Bytes      201MB/s ± 0%   461MB/s ± 0%   +129.80%   (p=0.000 n=6+10)
XORBytes/128Bytes    340MB/s ± 0%   4625MB/s ± 0%  +1259.91%  (p=0.000 n=8+10)
XORBytes/2048Bytes   361MB/s ± 0%   9088MB/s ± 0%  +2414.23%  (p=0.000 n=8+10)
XORBytes/32768Bytes  363MB/s ± 0%   9350MB/s ± 0%  +2477.44%  (p=0.000 n=10+10)
AESGCMSeal1K         1.20GB/s ± 0%  1.20GB/s ± 0%  -0.02%     (p=0.041 n=10+10)
AESGCMOpen1K         1.17GB/s ± 0%  1.17GB/s ± 0%  +0.20%     (p=0.000 n=10+10)
AESGCMSign8K         2.65GB/s ± 0%  2.66GB/s ± 0%  +0.35%     (p=0.000 n=10+9)
AESGCMSeal8K         1.40GB/s ± 0%  1.40GB/s ± 0%  -0.01%     (p=0.000 n=10+7)
AESGCMOpen8K         1.41GB/s ± 0%  1.41GB/s ± 0%  -0.03%     (p=0.022 n=10+10)
AESCFBEncrypt1K      145MB/s ± 0%   238MB/s ± 0%   +64.95%    (p=0.000 n=10+10)
AESCFBDecrypt1K      143MB/s ± 0%   237MB/s ± 0%   +65.39%    (p=0.000 n=10+9)
AESCFBDecrypt8K      144MB/s ± 0%   240MB/s ± 0%   +66.15%    (p=0.000 n=10+10)
AESOFB1K             196MB/s ± 0%   401MB/s ± 0%   +104.35%   (p=0.000 n=9+10)
AESCTR1K             205MB/s ± 0%   443MB/s ± 0%   +115.57%   (p=0.000 n=7+10)
AESCTR8K             207MB/s ± 0%   450MB/s ± 0%   +117.27%   (p=0.000 n=10+10)
AESCBCEncrypt1K      176MB/s ± 0%   334MB/s ± 0%   +89.15%    (p=0.000 n=10+8)
AESCBCDecrypt1K      176MB/s ± 0%   330MB/s ± 0%   +88.08%    (p=0.000 n=10+9)

Updates #42010

Change-Id: I75e6d66fd0070e184d93b020c55a7580c713647c
Reviewed-on: https://go-review.googlesource.com/c/go/+/142537
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Meng Zhuo <mzh@golangcn.org>
See my PR #53154, which adds non-NEON and NEON implementations of xorBytes for ARM. This bridges the gap with ARM64.
What version of Go are you using (go version)?

While profiling AES-CBC on amd64 and arm64, I found that on arm64 the functions
func xorBytes(dst, a, b []byte) int
and
func safeXORBytes(dst, a, b []byte, n int)
(in crypto/cipher/xor_generic.go) always appear in the top 15 of the pprof list. On amd64, by contrast, the same operation is implemented with SSE2 SIMD instructions in
func xorBytesSSE2(dst, a, b *byte, n int)

Could we use arm64 SIMD instructions to optimize this function in the same way?
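To make the cost difference concrete, here is a minimal sketch in portable Go, not the assembly an actual fix would use. The names xorBytewise and xorWordwise are hypothetical and are not the functions in crypto/cipher. The first mirrors the one-byte-per-iteration shape of a generic fallback; the second handles 8 bytes per iteration, which is the same idea a SIMD implementation takes further with wider registers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBytewise (hypothetical name) XORs a and b one byte at a time into dst.
// It returns the number of bytes written: the shorter of len(a) and len(b).
func xorBytewise(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// xorWordwise (hypothetical name) computes the same result, but processes
// 8 bytes per loop iteration and finishes the tail byte by byte.
func xorWordwise(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:])
		y := binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x^y)
	}
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := []byte("0123456789abcdefXYZ") // 19 bytes: two full words plus a tail
	b := []byte("ABCDEFGHIJKLMNOPQRS")

	d1 := make([]byte, len(a))
	d2 := make([]byte, len(a))
	n1 := xorBytewise(d1, a, b)
	n2 := xorWordwise(d2, a, b)

	fmt.Println(n1, n2, string(d1) == string(d2))
}
```

Both functions assume dst is at least as long as the shorter input and will panic otherwise. An assembly version would replace the 8-byte loop with 16-byte (or wider) vector loads, XORs, and stores.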