Description
What version of Go are you using (go version)?
$ go version
go version go1.16 darwin/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
```
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/richard.artoul/Library/Caches/go-build"
GOENV="/Users/richard.artoul/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/richard.artoul/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/richard.artoul/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/t2/02qzh_vs4cn57ctvc7dwcsc80000gn/T/go-build075316343=/tmp/go-build -gno-record-gcc-switches -fno-common"
```
What did you do?
I'm using this library: https://github.com/google/go-cloud to read files from S3 in a streaming fashion.
I noticed that a lot of time was being spent in syscalls:

I was aware of this issue: #22618
So I tuned my HTTP client transport's read/write buffer sizes to 256KiB instead of 64KiB, but this had no impact on the time spent in syscalls. That made me suspicious that the reads were not actually being buffered the way I expected.
I wrote a small program to download a file from S3 in a streaming fashion using large 1MiB reads, like this:
```go
stream, err := store.GetStream(ctx, *bucket, *path)
if err != nil {
	log.Fatalf("error getting: %s, err: %v", *path, err)
}
defer stream.Close()

// Wrap the stream once, outside the loop; constructing a new
// bufio.Reader on every iteration would discard any buffered data.
reader := bufio.NewReaderSize(stream, 1<<20)
buf := make([]byte, 1<<20)
for {
	n, err := reader.Read(buf)
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("n:", n)
}
```

I couldn't get dtrace to work properly on macOS, but luckily my application uses a custom dialer that sets read/write deadlines on every socket read, so I was able to instrument the actual socket read sizes like this:
```go
func (d *deadlineConn) Read(p []byte) (int, error) {
	d.Conn.SetReadDeadline(time.Now().Add(d.readDeadline))
	fmt.Println("read size", len(p))
	n, err := d.Conn.Read(p)
	err = maybeWrapErr(err)
	return n, err
}
```

What did you expect to see?
Large syscall reads (in the range of 256KiB)
What did you see instead?
Extremely small syscall reads:
read size 52398
n: 16384
n: 1024
n: 16384
n: 1024
n: 16384
n: 1024
read size 52398
n: 16384
n: 1024
read size 28361
read size 26929
n: 16384
n: 1024
n: 16384
read size 5449
n: 1024
read size 56415
read size 47823
n: 16384
n: 1024
n: 16384
read size 23479
What did you do after?
I made a small change in tls.go to instantiate the TLS client with a much larger rawInput buffer:
```go
// Client returns a new TLS client side connection
// using conn as the underlying transport.
// The config cannot be nil: users must set either ServerName or
// InsecureSkipVerify in the config.
func Client(conn net.Conn, config *Config) *Conn {
	c := &Conn{
		rawInput: *bytes.NewBuffer(make([]byte, 0, 1<<20)),
		conn:     conn,
		config:   config,
		isClient: true,
	}
	c.handshakeFn = c.clientHandshake
	return c
}
```

As expected, I began to observe much larger syscall reads:
read size 1034019
read size 1024035
read size 1022603
read size 1003987
read size 993963
read size 991099
read size 985371
read size 982507
read size 981075
read size 965323
read size 963891
read size 955299
read size 945275
read size 935251
I haven't tried deploying my fork to production, and measuring performance on my laptop isn't meaningful since I have a terrible connection to S3, but I think it's well understood that a 10x increase in the number of syscalls (especially with read sizes as small as 64KiB) has a dramatic impact on performance.
Proposal
I'm not 100% sure what the best approach is here, but I think we should do something: this issue means that streaming large amounts of data over TLS is much more CPU-intensive than it needs to be, which is a big deal for applications that process large volumes of data over the network, like distributed databases.
The tls package already has a Config struct. It seems like it would be straightforward to add buffer size configuration there, as has already been done for the HTTP transport. In addition, it seems reasonable for the HTTP client transport's buffer sizes to be propagated automatically as the TLS buffer sizes when the user doesn't specify an explicit override.