Description
Both Go 1.8 and Go tip show very slow server-side TLS handshake performance with RSA certificates when the client doesn't use a TLS session cache:
$ go get -u github.com/valyala/fasthttp/fasthttputil
$ GOMAXPROCS=1 go test github.com/valyala/fasthttp/fasthttputil -bench=TLSHandshake
goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp/fasthttputil
BenchmarkPlainHandshake 300000 3953 ns/op
BenchmarkTLSHandshakeWithClientSessionCache 20000 81960 ns/op
BenchmarkTLSHandshakeWithoutClientSessionCache 500 3493016 ns/op
BenchmarkTLSHandshakeWithCurvesWithClientSessionCache 20000 80307 ns/op
BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache 500 3518508 ns/op
PASS
ok github.com/valyala/fasthttp/fasthttputil 11.683s
The results show that a single amd64 core can perform only about 300 handshakes per second for new clients without session tickets. This is very discouraging performance compared to openssl, as described on https://istlsfastyet.com/ :
$ openssl version
OpenSSL 1.0.2g 1 Mar 2016
$ openssl speed ecdh
...
op op/s
256 bit ecdh (nistp256) 0.0001s 12797.0
384 bit ecdh (nistp384) 0.0007s 1416.8
521 bit ecdh (nistp521) 0.0005s 1968.0
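To put the two sets of numbers on the same scale, the ns/op figure reported by `go test -bench` can be converted into operations per second. The snippet below is only a rough sketch of that arithmetic using the figures quoted in this report; the helper name `opsPerSecond` is made up for illustration.

```go
package main

import "fmt"

// opsPerSecond converts a `go test -bench` ns/op figure into
// operations per second (1 second = 1e9 nanoseconds).
// Hypothetical helper, not part of any library.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// Figures copied from the outputs quoted in this report.
	const handshakeNsPerOp = 3518508.0 // BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache
	const opensslECDHPerSec = 12797.0  // openssl speed ecdh, nistp256

	hs := opsPerSecond(handshakeNsPerOp)
	fmt.Printf("Go full handshakes/s: %.0f\n", hs)
	fmt.Printf("openssl ecdh ops/s:   %.0f\n", opensslECDHPerSec)
	fmt.Printf("ratio:                %.1fx\n", opensslECDHPerSec/hs)
}
```

Keep in mind that this compares a single ECDH operation in openssl with a full Go handshake that also involves the RSA certificate, so the ratio is an indication of scale rather than a like-for-like measurement.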
Note that openssl performs 12797 256-bit ecdh operations per second on a single CPU core. This is more than 40x higher than the result of the comparable BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache above. Below are CPU profiles for this benchmark:
Mixed client and server profile:
(pprof) top20
Showing nodes accounting for 154.20ms, 87.56% of 176.10ms total
Dropped 200 nodes (cum <= 0.88ms)
Showing top 20 nodes out of 103
flat flat% sum% cum cum%
82.50ms 46.85% 46.85% 82.50ms 46.85% math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
19.50ms 11.07% 57.92% 113.20ms 64.28% math/big.nat.montgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
13.70ms 7.78% 65.70% 13.70ms 7.78% runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
7.40ms 4.20% 69.90% 21.10ms 11.98% math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
5ms 2.84% 72.74% 5ms 2.84% math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
3.40ms 1.93% 74.67% 3.40ms 1.93% crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
2.70ms 1.53% 76.21% 2.70ms 1.53% math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.70ms 1.53% 77.74% 2.70ms 1.53% p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
2.60ms 1.48% 79.22% 2.60ms 1.48% math/big.addVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.60ms 1.48% 80.69% 2.60ms 1.48% p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
2.10ms 1.19% 81.89% 2.10ms 1.19% math/big.shlVU /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
1.80ms 1.02% 82.91% 1.80ms 1.02% syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
1.30ms 0.74% 83.65% 1.30ms 0.74% crypto/elliptic.p256Sqr /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.10ms 0.62% 84.27% 2.30ms 1.31% sync.(*Pool).Put /home/aliaksandr/work/go-tip/src/sync/pool.go
1ms 0.57% 84.84% 4.20ms 2.39% math/big.nat.add /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 85.41% 3.70ms 2.10% math/big.nat.mulAddWW /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 85.97% 1ms 0.57% sync.(*Mutex).Lock /home/aliaksandr/work/go-tip/src/sync/mutex.go
1ms 0.57% 86.54% 1ms 0.57% sync.(*Mutex).Unlock /home/aliaksandr/work/go-tip/src/sync/mutex.go
0.90ms 0.51% 87.05% 24.90ms 14.14% math/big.(*Int).GCD /home/aliaksandr/work/go-tip/src/math/big/int.go
0.90ms 0.51% 87.56% 5.40ms 3.07% math/big.(*Int).Mul /home/aliaksandr/work/go-tip/src/math/big/int.go
Server profile:
(pprof) top20 Server
Showing nodes accounting for 149.20ms, 84.72% of 176.10ms total
Dropped 110 nodes (cum <= 0.88ms)
Showing top 20 nodes out of 86
flat flat% sum% cum cum%
82.50ms 46.85% 46.85% 82.50ms 46.85% math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
19.50ms 11.07% 57.92% 113.20ms 64.28% math/big.nat.montgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
13.40ms 7.61% 65.53% 13.40ms 7.61% runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
7.40ms 4.20% 69.73% 21.10ms 11.98% math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
5ms 2.84% 72.57% 5ms 2.84% math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.70ms 1.53% 74.11% 2.70ms 1.53% math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.60ms 1.48% 75.58% 2.60ms 1.48% math/big.addVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.30ms 1.31% 76.89% 2.30ms 1.31% crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
2.10ms 1.19% 78.08% 2.10ms 1.19% math/big.shlVU /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
1.40ms 0.8% 78.88% 1.40ms 0.8% p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.30ms 0.74% 79.61% 1.30ms 0.74% syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
1.20ms 0.68% 80.30% 1.20ms 0.68% p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.10ms 0.62% 80.92% 2.30ms 1.31% sync.(*Pool).Put /home/aliaksandr/work/go-tip/src/sync/pool.go
1ms 0.57% 81.49% 4.20ms 2.39% math/big.nat.add /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 82.06% 3.70ms 2.10% math/big.nat.mulAddWW /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 82.62% 1ms 0.57% sync.(*Mutex).Lock /home/aliaksandr/work/go-tip/src/sync/mutex.go
1ms 0.57% 83.19% 1ms 0.57% sync.(*Mutex).Unlock /home/aliaksandr/work/go-tip/src/sync/mutex.go
0.90ms 0.51% 83.70% 24.90ms 14.14% math/big.(*Int).GCD /home/aliaksandr/work/go-tip/src/math/big/int.go
0.90ms 0.51% 84.21% 5.40ms 3.07% math/big.(*Int).Mul /home/aliaksandr/work/go-tip/src/math/big/int.go
0.90ms 0.51% 84.72% 114.30ms 64.91% math/big.nat.expNNMontgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
Client profile:
(pprof) top20 Client
Showing nodes accounting for 14.10ms, 8.01% of 176.10ms total
Showing top 20 nodes out of 202
flat flat% sum% cum cum%
2.60ms 1.48% 1.48% 2.60ms 1.48% p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
2.10ms 1.19% 2.67% 2.10ms 1.19% p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.60ms 0.91% 3.58% 2.50ms 1.42% math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
1.10ms 0.62% 4.20% 1.10ms 0.62% crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
0.90ms 0.51% 4.71% 0.90ms 0.51% crypto/elliptic.p256Sqr /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.80ms 0.45% 5.17% 4.50ms 2.56% crypto/elliptic.p256PointDoubleAsm /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.80ms 0.45% 5.62% 0.80ms 0.45% math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
0.70ms 0.4% 6.02% 0.70ms 0.4% syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
0.50ms 0.28% 6.30% 1.30ms 0.74% math/big.basicMul /home/aliaksandr/work/go-tip/src/math/big/nat.go
0.40ms 0.23% 6.53% 0.40ms 0.23% math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
0.30ms 0.17% 6.70% 0.30ms 0.17% crypto/elliptic.p256Select /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.30ms 0.17% 6.87% 0.30ms 0.17% crypto/hmac.New /home/aliaksandr/work/go-tip/src/crypto/hmac/hmac.go
0.30ms 0.17% 7.04% 0.30ms 0.17% math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
0.30ms 0.17% 7.21% 0.30ms 0.17% p256SubInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.30ms 0.17% 7.38% 0.30ms 0.17% runtime.mallocgc /home/aliaksandr/work/go-tip/src/runtime/mbitmap.go
0.30ms 0.17% 7.55% 0.30ms 0.17% runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
0.20ms 0.11% 7.67% 0.60ms 0.34% crypto/elliptic.p256PointAddAffineAsm /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.20ms 0.11% 7.78% 1.70ms 0.97% encoding/asn1.parseField /home/aliaksandr/work/go-tip/src/encoding/asn1/asn1.go
0.20ms 0.11% 7.89% 0.20ms 0.11% math/big.nat.setBytes /home/aliaksandr/work/go-tip/src/math/big/nat.go
0.20ms 0.11% 8.01% 0.20ms 0.11% runtime.heapBitsSetType /home/aliaksandr/work/go-tip/src/runtime/mbitmap.go
As you can see, the client side takes about 1/10 of the CPU time of the server side.