crypto/tls: slow server-side handshake performance for RSA certificates without client session cache #20058

@valyala

Description

Both Go 1.8 and Go tip show very slow server-side handshake performance for RSA certificates when the client doesn't use a TLS session cache:

$ go get -u github.com/valyala/fasthttp/fasthttputil
$ GOMAXPROCS=1 go test github.com/valyala/fasthttp/fasthttputil -bench=TLSHandshake
goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp/fasthttputil
BenchmarkPlainHandshake                                  	  300000	      3953 ns/op
BenchmarkTLSHandshakeWithClientSessionCache              	   20000	     81960 ns/op
BenchmarkTLSHandshakeWithoutClientSessionCache           	     500	   3493016 ns/op
BenchmarkTLSHandshakeWithCurvesWithClientSessionCache    	   20000	     80307 ns/op
BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache 	     500	   3518508 ns/op
PASS
ok  	github.com/valyala/fasthttp/fasthttputil	11.683s

The results show that a single amd64 core performs fewer than 300 full handshakes per second (about 286 at 3493016 ns/op) for new clients without session tickets. This is very discouraging compared to OpenSSL performance, as described on https://istlsfastyet.com/ :

$ openssl version
OpenSSL 1.0.2g  1 Mar 2016
$ openssl speed ecdh
...
                              op      op/s
 256 bit ecdh (nistp256)   0.0001s  12797.0
 384 bit ecdh (nistp384)   0.0007s   1416.8
 521 bit ecdh (nistp521)   0.0005s   1968.0
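The figures being compared here are in different units (ns/op from `go test -bench`, op/s from `openssl speed`), so a quick conversion helps. The following is a minimal sketch of that arithmetic; `opsPerSecond` is a hypothetical helper name, and the constants are copied from the benchmark and `openssl speed ecdh` output above.

```go
package main

import "fmt"

// opsPerSecond converts a `go test -bench` ns/op figure into operations
// per second. The name is illustrative and not part of any library.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// Values copied from the output quoted in this issue.
	const fullHandshakeNs = 3493016.0 // BenchmarkTLSHandshakeWithoutClientSessionCache
	const resumedHandshakeNs = 81960.0 // BenchmarkTLSHandshakeWithClientSessionCache
	const opensslECDHPerSec = 12797.0  // openssl speed ecdh, nistp256

	full := opsPerSecond(fullHandshakeNs)
	resumed := opsPerSecond(resumedHandshakeNs)

	fmt.Printf("full handshakes/s:        %.0f\n", full)
	fmt.Printf("resumed handshakes/s:     %.0f\n", resumed)
	fmt.Printf("openssl ecdh vs full:     %.1fx\n", opensslECDHPerSec/full)
	fmt.Printf("resumed vs full handshake: %.1fx\n", resumed/full)
}
```

Keep in mind that a full handshake does more than one ECDH operation, so the ratio is only a rough indicator of the gap.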

Note that openssl performs 12797 256-bit ECDH operations per second on a single CPU core. This is more than 40x higher than the result of the comparable BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache above. Below are CPU profiles for this benchmark:

Mixed client and server profile:

(pprof) top20
Showing nodes accounting for 154.20ms, 87.56% of 176.10ms total
Dropped 200 nodes (cum <= 0.88ms)
Showing top 20 nodes out of 103
      flat  flat%   sum%        cum   cum%
   82.50ms 46.85% 46.85%    82.50ms 46.85%  math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
   19.50ms 11.07% 57.92%   113.20ms 64.28%  math/big.nat.montgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
   13.70ms  7.78% 65.70%    13.70ms  7.78%  runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
    7.40ms  4.20% 69.90%    21.10ms 11.98%  math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
       5ms  2.84% 72.74%        5ms  2.84%  math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    3.40ms  1.93% 74.67%     3.40ms  1.93%  crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
    2.70ms  1.53% 76.21%     2.70ms  1.53%  math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    2.70ms  1.53% 77.74%     2.70ms  1.53%  p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    2.60ms  1.48% 79.22%     2.60ms  1.48%  math/big.addVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    2.60ms  1.48% 80.69%     2.60ms  1.48%  p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    2.10ms  1.19% 81.89%     2.10ms  1.19%  math/big.shlVU /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    1.80ms  1.02% 82.91%     1.80ms  1.02%  syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
    1.30ms  0.74% 83.65%     1.30ms  0.74%  crypto/elliptic.p256Sqr /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    1.10ms  0.62% 84.27%     2.30ms  1.31%  sync.(*Pool).Put /home/aliaksandr/work/go-tip/src/sync/pool.go
       1ms  0.57% 84.84%     4.20ms  2.39%  math/big.nat.add /home/aliaksandr/work/go-tip/src/math/big/nat.go
       1ms  0.57% 85.41%     3.70ms  2.10%  math/big.nat.mulAddWW /home/aliaksandr/work/go-tip/src/math/big/nat.go
       1ms  0.57% 85.97%        1ms  0.57%  sync.(*Mutex).Lock /home/aliaksandr/work/go-tip/src/sync/mutex.go
       1ms  0.57% 86.54%        1ms  0.57%  sync.(*Mutex).Unlock /home/aliaksandr/work/go-tip/src/sync/mutex.go
    0.90ms  0.51% 87.05%    24.90ms 14.14%  math/big.(*Int).GCD /home/aliaksandr/work/go-tip/src/math/big/int.go
    0.90ms  0.51% 87.56%     5.40ms  3.07%  math/big.(*Int).Mul /home/aliaksandr/work/go-tip/src/math/big/int.go

Server profile:

(pprof) top20 Server
Showing nodes accounting for 149.20ms, 84.72% of 176.10ms total
Dropped 110 nodes (cum <= 0.88ms)
Showing top 20 nodes out of 86
      flat  flat%   sum%        cum   cum%
   82.50ms 46.85% 46.85%    82.50ms 46.85%  math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
   19.50ms 11.07% 57.92%   113.20ms 64.28%  math/big.nat.montgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
   13.40ms  7.61% 65.53%    13.40ms  7.61%  runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
    7.40ms  4.20% 69.73%    21.10ms 11.98%  math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
       5ms  2.84% 72.57%        5ms  2.84%  math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    2.70ms  1.53% 74.11%     2.70ms  1.53%  math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    2.60ms  1.48% 75.58%     2.60ms  1.48%  math/big.addVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    2.30ms  1.31% 76.89%     2.30ms  1.31%  crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
    2.10ms  1.19% 78.08%     2.10ms  1.19%  math/big.shlVU /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    1.40ms   0.8% 78.88%     1.40ms   0.8%  p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    1.30ms  0.74% 79.61%     1.30ms  0.74%  syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
    1.20ms  0.68% 80.30%     1.20ms  0.68%  p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    1.10ms  0.62% 80.92%     2.30ms  1.31%  sync.(*Pool).Put /home/aliaksandr/work/go-tip/src/sync/pool.go
       1ms  0.57% 81.49%     4.20ms  2.39%  math/big.nat.add /home/aliaksandr/work/go-tip/src/math/big/nat.go
       1ms  0.57% 82.06%     3.70ms  2.10%  math/big.nat.mulAddWW /home/aliaksandr/work/go-tip/src/math/big/nat.go
       1ms  0.57% 82.62%        1ms  0.57%  sync.(*Mutex).Lock /home/aliaksandr/work/go-tip/src/sync/mutex.go
       1ms  0.57% 83.19%        1ms  0.57%  sync.(*Mutex).Unlock /home/aliaksandr/work/go-tip/src/sync/mutex.go
    0.90ms  0.51% 83.70%    24.90ms 14.14%  math/big.(*Int).GCD /home/aliaksandr/work/go-tip/src/math/big/int.go
    0.90ms  0.51% 84.21%     5.40ms  3.07%  math/big.(*Int).Mul /home/aliaksandr/work/go-tip/src/math/big/int.go
    0.90ms  0.51% 84.72%   114.30ms 64.91%  math/big.nat.expNNMontgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go

Client profile:

(pprof) top20 Client
Showing nodes accounting for 14.10ms, 8.01% of 176.10ms total
Showing top 20 nodes out of 202
      flat  flat%   sum%        cum   cum%
    2.60ms  1.48%  1.48%     2.60ms  1.48%  p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    2.10ms  1.19%  2.67%     2.10ms  1.19%  p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    1.60ms  0.91%  3.58%     2.50ms  1.42%  math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
    1.10ms  0.62%  4.20%     1.10ms  0.62%  crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
    0.90ms  0.51%  4.71%     0.90ms  0.51%  crypto/elliptic.p256Sqr /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    0.80ms  0.45%  5.17%     4.50ms  2.56%  crypto/elliptic.p256PointDoubleAsm /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    0.80ms  0.45%  5.62%     0.80ms  0.45%  math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    0.70ms   0.4%  6.02%     0.70ms   0.4%  syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
    0.50ms  0.28%  6.30%     1.30ms  0.74%  math/big.basicMul /home/aliaksandr/work/go-tip/src/math/big/nat.go
    0.40ms  0.23%  6.53%     0.40ms  0.23%  math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    0.30ms  0.17%  6.70%     0.30ms  0.17%  crypto/elliptic.p256Select /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    0.30ms  0.17%  6.87%     0.30ms  0.17%  crypto/hmac.New /home/aliaksandr/work/go-tip/src/crypto/hmac/hmac.go
    0.30ms  0.17%  7.04%     0.30ms  0.17%  math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
    0.30ms  0.17%  7.21%     0.30ms  0.17%  p256SubInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    0.30ms  0.17%  7.38%     0.30ms  0.17%  runtime.mallocgc /home/aliaksandr/work/go-tip/src/runtime/mbitmap.go
    0.30ms  0.17%  7.55%     0.30ms  0.17%  runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
    0.20ms  0.11%  7.67%     0.60ms  0.34%  crypto/elliptic.p256PointAddAffineAsm /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
    0.20ms  0.11%  7.78%     1.70ms  0.97%  encoding/asn1.parseField /home/aliaksandr/work/go-tip/src/encoding/asn1/asn1.go
    0.20ms  0.11%  7.89%     0.20ms  0.11%  math/big.nat.setBytes /home/aliaksandr/work/go-tip/src/math/big/nat.go
    0.20ms  0.11%  8.01%     0.20ms  0.11%  runtime.heapBitsSetType /home/aliaksandr/work/go-tip/src/runtime/mbitmap.go

As you can see, the client side takes roughly one tenth of the CPU time of the server side (14.10ms versus 149.20ms in the top-20 nodes).
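The server profile is dominated by `math/big.nat.expNNMontgomery` and `math/big.addMulVVW`, which is the modular exponentiation that `crypto/rsa` performs for a private-key operation, so the RSA signature is the likely hot spot rather than the key exchange. A minimal sketch for isolating that cost is shown below. It assumes a 2048-bit RSA key (the issue does not state the key size) and uses ECDSA P-256 purely as a point of comparison. `newRSASigner`, `newP256Signer`, and `avgSignTime` are hypothetical helper names, and wall-clock timing like this is much cruder than a real `testing.B` benchmark.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
	"log"
	"time"
)

// newRSASigner generates an RSA key of the given size (assumed: 2048 bits).
func newRSASigner(bits int) (crypto.Signer, error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return nil, err
	}
	return key, nil
}

// newP256Signer generates an ECDSA P-256 key for comparison.
func newP256Signer() (crypto.Signer, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	return key, nil
}

// avgSignTime signs the SHA-256 digest of msg n times and returns the
// average wall-clock time per signature.
func avgSignTime(s crypto.Signer, msg []byte, n int) (time.Duration, error) {
	if n <= 0 {
		return 0, errors.New("n must be positive")
	}
	digest := sha256.Sum256(msg)
	start := time.Now()
	for i := 0; i < n; i++ {
		// For *rsa.PrivateKey, crypto.SHA256 as opts selects PKCS #1 v1.5.
		if _, err := s.Sign(rand.Reader, digest[:], crypto.SHA256); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}

func main() {
	msg := []byte("placeholder for the signed handshake data")

	rsaKey, err := newRSASigner(2048)
	if err != nil {
		log.Fatal(err)
	}
	ecKey, err := newP256Signer()
	if err != nil {
		log.Fatal(err)
	}

	rsaAvg, err := avgSignTime(rsaKey, msg, 100)
	if err != nil {
		log.Fatal(err)
	}
	ecAvg, err := avgSignTime(ecKey, msg, 100)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("RSA-2048 sign:    %v per op\n", rsaAvg)
	fmt.Printf("ECDSA P-256 sign: %v per op\n", ecAvg)
}
```

If the RSA figure is close to the per-handshake time reported above, the natural OpenSSL comparison would be `openssl speed rsa2048` rather than `openssl speed ecdh`.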

@agl, @vkrasnov
