Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: interleaving sendfile and read calls are very slow #45256

Open
erikdubbelboer opened this issue Mar 26, 2021 · 13 comments
Open

net: interleaving sendfile and read calls are very slow #45256

erikdubbelboer opened this issue Mar 26, 2021 · 13 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@erikdubbelboer
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.16.2 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.2"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/gocode/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2301853134=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Calling io.Copy triggering sendfile multiple times on a socket with the same data size.

What did you expect to see?

Each io.Copy call to take the same amount of time.

What did you see instead?

The first io.Copy call is fast at around 500µs. All following calls to io.Copy are slow at around 43ms.

I have written a simple program to demonstrate the issue: https://gist.github.com/erikdubbelboer/3eebecc551245a3b1d5f06a9ef433654#file-bench-go

I have also included a C version of the server to show that implementing similar code in C doesn't have the issue: https://gist.github.com/erikdubbelboer/3eebecc551245a3b1d5f06a9ef433654#file-server-c

Output:

$ go run bench.go
501.52µs       <-- fast
211.711399ms   <-- always super slow for some reason
48.073968ms    <-- still slow
43.86743ms
43.951452ms
47.955709ms
43.997812ms
43.916976ms
44.037359ms
48.032407ms
$ gcc -o server server.c
$ ./server &
[1] 44860
$ go run bench.go noserver
214.586µs
108.971µs
276.024µs
278.569µs
91.277µs
102.528µs
102.278µs
87.94µs
90.094µs
92.318µs
[1]+  Done  ./server

Interesting is also that removing the 1 byte that is being send to the server first, and just having the server send all 10 files at once doesn't trigger the performance impact. Also not when replacing the reading of the 1 byte with a time.Sleep(time.Millisecond). So it seems that something about the data coming from the other direction causes the followup sendfile calls to be slow.

The reason why I'm thinking this is related to sendfile is because when I use io.Copy(writerOnly{conn}, fd) with writerOnly from net/net.go it doesn't affect performance. writerOnly as documented in the Go source doesn't implement io.ReaderFrom (like net.TCPConn does) and doesn't trigger sendfile.

Not using sendfile at all is also fast:

$ go run bench-nosendfile.go 
324.237µs
165.15µs
154.409µs
173.415µs
158.407µs
146.023µs
144.038µs
140.452µs
159.669µs
156.783µs

This mostly impacts net/http file serving on keep-alive connections. This is why I included the 1 byte being send from the client to the server first as it imitates http behavior.

@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 26, 2021
@ianlancetaylor
Copy link
Contributor

The C code that I can see doesn't seem to be comparable, because it using blocking I/O, but Go always nonblocking I/O.

What I see from strace is that the sendfile consistently sends 102400 bytes. But for some reason the read in the client only reads 65536 bytes. So it makes another read system call to complete the io.ReadFull. The first time, that read completes immediately, returning the remaining 36864 bytes. The subsequent times, that second read returns EAGAIN. So the program adds the descriptor to epoll, and waits for it to become readable. When that happens, the read returns 36864.

If I change the file size to 65536, every loop iteration takes roughly the same amount of time.

I don't know why the kernel is only returning 65536 bytes for the first read, and I don't know why a second following read returns EAGAIN for the subsequent loop iterations.

@erikdubbelboer
Copy link
Contributor Author

@ianlancetaylor indeed it seems like the client is the issue look at strace. But it's still weird that changing the server to not use sendfile or the blocking IO C implementation makes it fast. It seems to be some timing issue. When I put a time.Sleep(time.Millisecond*100) in the client for loop everything becomes fast again.

But the timing issue always seems present when using net/http. The following program demonstrate this: https://gist.github.com/erikdubbelboer/3eebecc551245a3b1d5f06a9ef433654#file-benchhttp-go

The output I get from this program is:

$ go run benchhttp.go
2.107016ms
624.168µs
668.172µs
670.807µs
632.343µs
554.222µs
529.344µs
625.901µs
606.854µs
598.888µs

But when I remove client.CloseIdleConnections() to use a keep-alive connection the output is:

$ go run benchhttp.go
2.637642ms
447.034µs
205.563976ms
206.852669ms
207.015865ms
211.073544ms
211.069326ms
207.004693ms
211.057584ms
207.053117ms

As you can see even with the time.Sleep(time.Second) still in place the response is much slower. Removing the time.Sleep doesn't affect the response time at all.

Again wrapping the http.ResponseWriter in a writerOnly{} (to not use sendfile) fixes the issue and makes the responses slightly faster as not using a keep-alive connection.

I'm wondering if this is a deeper issue we can fix or if we should at least make it so net/http doesn't use sendfile on keep-alive connections?

@qingyunha
Copy link
Contributor

As I test, the C code also behaves similarly to go, If the count param of sendfile larger than the file size (go default is 4194304).

qingyunha added a commit to qingyunha/go that referenced this issue Mar 28, 2021
if the count param of sendfile larger than the file size to be sent,
the TCP will cause a 200ms delay somehow.

Fixes golang#41513
Fixes golang#45256
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/305229 mentions this issue: net: use the size be sent as the count param of sendfile

@erikdubbelboer
Copy link
Contributor Author

This indeed seems to fix the problem. You can test this by passing an io.LimitReader(fd, size) instead of just fd to trigger remain to be set to the correct size instead of 4194304.

@ianlancetaylor
Copy link
Contributor

Running a server and a client on the same system is not the common case, and is not the case for which we should optimize. Can you do some measurements where the two programs are on different systems? Thanks.

@erikdubbelboer
Copy link
Contributor Author

I have done some tests between servers with <1ms latency and with ~6ms latency and in both cases it's not noticeable. So I'm guessing its indeed some weird behavior with nonblocking I/O and sendfile where all streams are on the same machine.

@OneOfOne
Copy link
Contributor

OneOfOne commented Mar 30, 2021

Running a server and a client on the same system is not the common case, and is not the case for which we should optimize. Can you do some measurements where the two programs are on different systems? Thanks.

It's semi common when you run a https server that proxies to multiple microservices on the same machine.

@lesismal
Copy link

I write a non-blocking IO framework, and tried the sendfile test, it seems to work fine on the non-blocking fd:
https://gist.github.com/lesismal/d63098add20868d2b71a2168483ef891

Output:

go run bench.go 
277.042µs
237.266µs
288.234µs
219.801µs
238.872µs
260.966µs
274.589µs
265.921µs
205.281µs
250.087µs

erikdubbelboer's std version:
https://gist.github.com/erikdubbelboer/3eebecc551245a3b1d5f06a9ef433654#file-bench-go

go run ./std.go 
486.578µs
209.483756ms
207.547278ms
208.009642ms
43.189515ms
208.18629ms
208.294767ms
43.925097ms
211.092839ms
208.350165ms

both were run on my laptop, same ubuntu VM.

@qingyunha
Copy link
Contributor

I write a non-blocking IO framework, and tried the sendfile test, it seems to work fine on the non-blocking fd:
https://gist.github.com/lesismal/d63098add20868d2b71a2168483ef891

https://github.com/lesismal/nbio/blob/517442a262d0e26fecf1a380cddd403ae70bdb14/sendfile_linux.go#L32, your sendfile use the size of the file, this is the thing.

@lesismal
Copy link

lesismal commented Apr 25, 2021

I write a non-blocking IO framework, and tried the sendfile test, it seems to work fine on the non-blocking fd:
https://gist.github.com/lesismal/d63098add20868d2b71a2168483ef891

https://github.com/lesismal/nbio/blob/517442a262d0e26fecf1a380cddd403ae70bdb14/sendfile_linux.go#L32, your sendfile use the size of the file, this is the thing.

You are right, I tried a larger size, then get a slower result, same as the io.Copy and c code.

I tried another test: set the read buffer size for the new connection to be large enough before the data interaction started. The new test performs better than the original one:
https://gist.github.com/lesismal/d63098add20868d2b71a2168483ef891#file-bench_readbuffer-go

Output:

default read buffer:
309.454µs
209.564193ms
207.495256ms
206.871764ms
46.101953ms
205.103742ms
210.380848ms
44.039169ms
210.697342ms
209.346464ms

read buffer 1024*1000:
218.201µs
142.148µs
138.761µs
102.964µs
86.283µs
72.657µs
72.307µs
120.246µs
257.105µs
153.59µs

The kernel implementation of sendfile relies on splice, I read the kernel code but not familiar and not sure about it:
https://github.com/torvalds/linux/blob/master/fs/read_write.c#L1254
I guess the zero-copy process of splice may wait for several RTTs to confirm that there is enough read buffer for the other side to receive the incoming size from the application layer of the server which is larger than the file size.

@lesismal
Copy link

As I test, the C code also behaves similarly to go, If the count param of sendfile larger than the file size (go default is 4194304).

it seems better to use the size of the file:
https://github.com/golang/go/blob/master/src/net/sendfile_linux.go#L21

as https://github.com/golang/go/blob/master/src/net/sendfile_unix_alt.go#L43 did

@AlexanderYastrebov
Copy link
Contributor

Rediscovered as #61530

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants