x/crypto/chacha20poly1305: linux/arm64 Go 1.9 performance is 3X slower than OpenSSL #22809

Open
williamweixiao opened this issue Nov 19, 2017 · 15 comments

@williamweixiao
Member

commented Nov 19, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.9.2 linux/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="arm64"
GOBIN=""
GOEXE=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/lib/go-1.6"
GOTOOLDIR="/usr/lib/go-1.6/pkg/tool/linux_arm64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

What did you do?

go test vendor/golang_org/x/crypto/chacha20poly1305 -bench .

What did you expect to see?

Performance can be on par with OpenSSL (https://blog.cloudflare.com/content/images/2017/11/sym_key_1_core.png)

What did you see instead?

3X slower than OpenSSL (https://blog.cloudflare.com/content/images/2017/11/go_sym_key_1_core.png)

@titanous titanous added this to the Unplanned milestone Nov 21, 2017
@dmitshur dmitshur changed the title crypto/chacha20poly1305: linux/arm64 Go 1.9 performance is 3X slower than OpenSSL x/crypto/chacha20poly1305: linux/arm64 Go 1.9 performance is 3X slower than OpenSSL Mar 20, 2018
@gopherbot

commented Apr 9, 2018

Change https://golang.org/cl/105895 mentions this issue: crypto/poly1305: arm64 implementation using multiword arithmetic

@gopherbot

commented Apr 9, 2018

Change https://golang.org/cl/105896 mentions this issue: crypto/poly1305: arm64 implementation using multiword arithmetic

@gopherbot

commented Apr 19, 2018

Change https://golang.org/cl/107628 mentions this issue: internal/chacha20: add arm64 SIMD implementation

@vielmetti

commented Jun 26, 2018

Substantial perf improvements on Cavium ThunderX going from go 1.10.2 to go1.11beta1, but not 3x faster.

ed@ed-2a-bcc-llvm:~$ go version
go version go1.10.2 linux/arm64
ed@ed-2a-bcc-llvm:~$ go test vendor/golang_org/x/crypto/chacha20poly1305 -bench .
goos: linux
goarch: arm64
pkg: vendor/golang_org/x/crypto/chacha20poly1305
BenchmarkChacha20Poly1305Open_64-96               500000              3047 ns/op          21.00 MB/s
BenchmarkChacha20Poly1305Seal_64-96               500000              2920 ns/op          21.91 MB/s
BenchmarkChacha20Poly1305Open_1350-96              50000             30990 ns/op          43.56 MB/s
BenchmarkChacha20Poly1305Seal_1350-96              50000             30890 ns/op          43.70 MB/s
BenchmarkChacha20Poly1305Open_8K-96                10000            173794 ns/op          47.14 MB/s
BenchmarkChacha20Poly1305Seal_8K-96                10000            173907 ns/op          47.11 MB/s
PASS
ok      vendor/golang_org/x/crypto/chacha20poly1305     10.538s
ed@ed-2a-bcc-llvm:~$ 
ed@ed-2a-bcc-llvm:~$ ~/go/bin/go1.11beta1 test vendor/golang_org/x/crypto/chacha20poly1305 -bench .
goos: linux
goarch: arm64
pkg: vendor/golang_org/x/crypto/chacha20poly1305
BenchmarkChacha20Poly1305Open_64-96              1000000              2249 ns/op          28.45 MB/s
BenchmarkChacha20Poly1305Seal_64-96              1000000              2245 ns/op          28.50 MB/s
BenchmarkChacha20Poly1305Open_1350-96             100000             19541 ns/op          69.08 MB/s
BenchmarkChacha20Poly1305Seal_1350-96             100000             19439 ns/op          69.45 MB/s
BenchmarkChacha20Poly1305Open_8K-96                10000            105547 ns/op          77.61 MB/s
BenchmarkChacha20Poly1305Seal_8K-96                10000            105938 ns/op          77.33 MB/s
PASS
ok      vendor/golang_org/x/crypto/chacha20poly1305     11.173s
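
For a rough sense of how far these numbers are from 3x, the ns/op figures above can be turned into speedup ratios. The following is a minimal sketch and not part of the original report: the `result` type and the helpers `throughputMBps` and `speedup` are hypothetical names made up for illustration, only the Open rows are included, and the 8K payload is assumed to be 8192 bytes (which matches the reported MB/s).

```go
package main

import "fmt"

// result holds one benchmark row from the two runs above.
// The type and field names are illustrative, not from any Go tool.
type result struct {
	name  string
	bytes int     // payload size per operation
	oldNs float64 // ns/op under go1.10.2
	newNs float64 // ns/op under go1.11beta1
}

// throughputMBps converts a payload size and ns/op into MB/s,
// using 1 MB = 1e6 bytes as `go test -bench` does.
func throughputMBps(bytes int, nsPerOp float64) float64 {
	return float64(bytes) / nsPerOp * 1e3
}

// speedup is the ratio of old time to new time per operation.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	results := []result{
		{"Open_64", 64, 3047, 2249},
		{"Open_1350", 1350, 30990, 19541},
		{"Open_8K", 8192, 173794, 105547},
	}
	for _, r := range results {
		fmt.Printf("%-10s %6.2f MB/s -> %6.2f MB/s  (%.2fx)\n",
			r.name,
			throughputMBps(r.bytes, r.oldNs),
			throughputMBps(r.bytes, r.newNs),
			speedup(r.oldNs, r.newNs))
	}
}
```

On these figures the improvement is roughly 1.35x at 64 bytes and about 1.6x at 1350 bytes and 8K, which is consistent with "substantial, but not 3x".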

@mengzhuo

Contributor

commented Jun 29, 2018

@vielmetti internal/chacha20 won't be merged into go1.11 since it's frozen.

@vielmetti

commented Jun 29, 2018

Thanks @mengzhuo . Can we get this onto the go1.12 roster then? It's currently marked as "Unplanned".

@ianlancetaylor ianlancetaylor modified the milestones: Unplanned, Go1.12 Jun 29, 2018
@ianlancetaylor

Contributor

commented Jun 29, 2018

I changed the milestone, but note that that doesn't cause the work to be done. This is an open source project so the best way to get something done is to volunteer to do it. Thanks.

@vielmetti

commented Jun 29, 2018

Noted at https://go-review.googlesource.com/c/crypto/+/107628

"If you prioritize arm64 chacha and arm64 poly, it will see production use super soon after."

It appears from comments on that patch that the coding work has largely been done but there are constraints on the availability of reviewers for arm64 assembly.

@vielmetti

commented Aug 23, 2018

Who is reviewing arm64 assembly these days, @ianlancetaylor , and can they use qualified help? I'm happy to help recruit qualified reviewers from the arm64 Go community if I know the qualifications.

@ianlancetaylor

Contributor

commented Aug 23, 2018

At present arm64 assembly is typically reviewed by the tireless @cherrymui . I'm sure @benshi001 would also have good input.

@zx2c4

Contributor

commented Jan 3, 2019

@FiloSottile This was marked for 1.12. Things still on target for that?

@vielmetti

commented Feb 5, 2019

This issue is marked currently as "help wanted". What is the nature of the help desired?

@ianlancetaylor

Contributor

commented Feb 6, 2019

@vielmetti Figuring out how to make the code run faster.

One of the meanings of the "help wanted" label is "we would like this to happen but nobody is working on it."

@zx2c4

Contributor

commented Feb 6, 2019

"we would like this to happen but nobody is working on it."

Pretty sure somebody was working on it, but then the CL just didn't get much of a review. Until today, that is.

@ianlancetaylor

Contributor

commented Feb 6, 2019

@zx2c4 The "help wanted" label was added before any of the CLs were sent. But I should have been clearer in my response; my apologies.

gopherbot pushed a commit to golang/crypto that referenced this issue Feb 11, 2019
Inspired by Vectorization of ChaCha Stream Cipher
https://eprint.iacr.org/2013/759.pdf

name            old time/op    new time/op    delta
ChaCha20/32        690ns ± 0%     872ns ± 0%   +26.38%  (p=0.000 n=10+10)
ChaCha20/63        750ns ± 0%     987ns ± 0%   +31.53%  (p=0.000 n=10+10)
ChaCha20/64        674ns ± 0%     879ns ± 0%   +30.42%  (p=0.000 n=8+10)
ChaCha20/256      2.28µs ± 0%    0.82µs ± 0%   -64.13%  (p=0.000 n=10+10)
ChaCha20/1024     8.64µs ± 0%    2.92µs ± 0%   -66.15%  (p=0.000 n=9+9)
ChaCha20/1350     11.9µs ± 0%     4.5µs ± 0%   -62.51%  (p=0.000 n=10+8)
ChaCha20/65536     554µs ± 0%     181µs ± 0%   -67.33%  (p=0.000 n=10+10)

name            old speed      new speed      delta
ChaCha20/32     46.3MB/s ± 0%  36.7MB/s ± 0%   -20.87%  (p=0.000 n=10+9)
ChaCha20/63     83.9MB/s ± 0%  63.8MB/s ± 0%   -23.97%  (p=0.000 n=10+10)
ChaCha20/64     94.9MB/s ± 0%  72.8MB/s ± 0%   -23.31%  (p=0.000 n=10+10)
ChaCha20/256     112MB/s ± 0%   312MB/s ± 0%  +178.74%  (p=0.000 n=10+10)
ChaCha20/1024    119MB/s ± 0%   350MB/s ± 0%  +195.31%  (p=0.000 n=10+9)
ChaCha20/1350    114MB/s ± 0%   303MB/s ± 0%  +166.73%  (p=0.000 n=8+8)
ChaCha20/65536   118MB/s ± 0%   362MB/s ± 0%  +206.12%  (p=0.000 n=10+10)

Updates golang/go#22809
Change-Id: I487487faa2ae4ff29de6fd8eb1317740c2939c10
Reviewed-on: https://go-review.googlesource.com/c/107628
Reviewed-by: Filippo Valsorda <filippo@golang.org>
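
For readers unfamiliar with why ChaCha20 lends itself to SIMD, here is a minimal scalar sketch of the ChaCha quarter round and double round as specified in RFC 8439. This is not the arm64 assembly from the CL above; the function names `quarterRound` and `doubleRound` are illustrative. The point is that the four quarter rounds within a column round (and within a diagonal round) touch disjoint state words, so they are independent of each other, and vectorized implementations generally exploit that kind of independence, within one block or across several blocks.

```go
package main

import (
	"fmt"
	"math/bits"
)

// quarterRound is the ChaCha quarter round (RFC 8439, section 2.1):
// only 32-bit additions, XORs and fixed rotations.
func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
	a += b
	d ^= a
	d = bits.RotateLeft32(d, 16)
	c += d
	b ^= c
	b = bits.RotateLeft32(b, 12)
	a += b
	d ^= a
	d = bits.RotateLeft32(d, 8)
	c += d
	b ^= c
	b = bits.RotateLeft32(b, 7)
	return a, b, c, d
}

// doubleRound applies one column round followed by one diagonal round
// to the 16-word ChaCha state. ChaCha20 runs this 10 times per block.
func doubleRound(s *[16]uint32) {
	// Column round: each call uses a disjoint set of four words.
	s[0], s[4], s[8], s[12] = quarterRound(s[0], s[4], s[8], s[12])
	s[1], s[5], s[9], s[13] = quarterRound(s[1], s[5], s[9], s[13])
	s[2], s[6], s[10], s[14] = quarterRound(s[2], s[6], s[10], s[14])
	s[3], s[7], s[11], s[15] = quarterRound(s[3], s[7], s[11], s[15])
	// Diagonal round: again four independent quarter rounds.
	s[0], s[5], s[10], s[15] = quarterRound(s[0], s[5], s[10], s[15])
	s[1], s[6], s[11], s[12] = quarterRound(s[1], s[6], s[11], s[12])
	s[2], s[7], s[8], s[13] = quarterRound(s[2], s[7], s[8], s[13])
	s[3], s[4], s[9], s[14] = quarterRound(s[3], s[4], s[9], s[14])
}

func main() {
	// Quarter-round test vector from RFC 8439, section 2.1.1.
	a, b, c, d := quarterRound(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
	fmt.Printf("%08x %08x %08x %08x\n", a, b, c, d)
	// prints: ea2a92f4 cb1cf8ce 4581472e 5881c4bb
}
```

Note that the benchmark table above shows the new code slower for inputs of 64 bytes and below and faster from 256 bytes up, so the benefit depends on message size.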
@andybons andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019
@andybons andybons removed this from the Go1.13 milestone Jul 8, 2019
@andybons andybons added this to the Go1.14 milestone Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
bored-engineer pushed a commit to bored-engineer/ssh that referenced this issue Oct 13, 2019