Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: ioutil.ReadAll(req.Body) gets stuck on HTTP/2 #40816

jkowalski opened this issue Aug 15, 2020 · 7 comments

x/net/http2: ioutil.ReadAll(req.Body) gets stuck on HTTP/2 #40816

jkowalski opened this issue Aug 15, 2020 · 7 comments


Copy link

@jkowalski jkowalski commented Aug 15, 2020

What version of Go are you using (go version)?

$ go version
go 1.14.7

Does this issue reproduce with the latest release?


What operating system and processor architecture are you using (go env)?

Debian/Buster (server)
Linux Mint (client)

What did you do?

This was reported by users of Kopia (backup tool). I can't personally reproduce this, but two users have reported it independently.

There's a client and server (HTTP+TLS), both compiled using golang 1.14.7. The client sends large blobs (~20 MB) to the server using PUT method on 1-4 parallel goroutines.

Sometimes the server gets stuck handling requests for no apparent reason. We have captured stack traces when this happens, both client and server-side:

(server) kopia/kopia#538 (comment)
(client) kopia/kopia#538 (comment)

In the stack traces you see that the server is stuck on ioutil.ReadAll() when reading from the request body for 3 minutes:

goroutine 23781 [sync.Cond.Wait, 3 minutes]:
sync.runtime_notifyListWait(0xc000326528, 0x16)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/runtime/sema.go:513 +0xf8
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/sync/cond.go:56 +0x9d
net/http.(*http2pipe).Read(0xc000326510, 0xc005779b50, 0x7e2b0, 0x7e2b0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:3515 +0x8f
net/http.(*http2requestBody).Read(0xc00117e7e0, 0xc005779b50, 0x7e2b0, 0x7e2b0, 0x12, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:5876 +0xa0
bytes.(*Buffer).ReadFrom(0xc0001136c0, 0x7fb0a5e23020, 0xc00117e7e0, 0x7fb0a5e23020, 0x9, 0xf8bc01)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/bytes/buffer.go:204 +0xb1
io/ioutil.readAll(0x7fb0a5e23020, 0xc00117e7e0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/io/ioutil/ioutil.go:36 +0xe3
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/io/ioutil/ioutil.go:45*Server).handleContentPut(0xc0000cc4d0, 0x12e4380, 0xc00117e8a0, 0xc002a2b900, 0x7fb0a5c80658, 0x1030ea0, 0xf22842676f7cbc01)

The client:

goroutine 66172 [select]:
net/http.(*http2ClientConn).roundTrip(0xc000187980, 0xc004356700, 0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:7596 +0x977
net/http.(*http2Transport).RoundTripOpt(0xc00069cb40, 0xc004356700, 0xf8ba00, 0xc0002d3f50, 0xc00432d740, 0x5)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:6948 +0x16f
net/http.http2noDialH2RoundTripper.RoundTrip(0xc00069cb40, 0xc004356700, 0x12cb520, 0xc00069cb40, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/h2_bundle.go:9103 +0x3e
net/http.(*Transport).roundTrip(0x1b130a0, 0xc004356700, 0xc0043277f0, 0x1114ace, 0xc004368630)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/transport.go:515 +0xd94
net/http.(*Transport).RoundTrip(0x1b130a0, 0xc004356700, 0x13, 0xc00019f0a0, 0x14)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/roundtrip.go:17 +0x35, 0x1b130a0, 0xc00019f220, 0x13, 0xc00019f0a0, 0x14, 0xc004356700, 0x2a1e40e9e74a, 0x3b7b8839, 0x3b7b883900000008)
	/home/travis/gopath/src/ +0x75, 0xc0002d3e00, 0xc004356700, 0xc0043277e0, 0xc0044712d0, 0x40cc48)
	/home/travis/gopath/src/ +0x7b
net/http.send(0xc004356700, 0x12cac20, 0xc00018a130, 0x0, 0x0, 0x0, 0xc004a3cbd8, 0x203001, 0x1, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:252 +0x43e
net/http.(*Client).send(0xc0002d3e30, 0xc004356700, 0x0, 0x0, 0x0, 0xc004a3cbd8, 0x0, 0x1, 0xc004368660)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:176 +0xfa
net/http.(*Client).do(0xc0002d3e30, 0xc004356700, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:699 +0x44a
	/home/travis/.gimme/versions/go1.14.7.linux.amd64/src/net/http/client.go:567*KopiaAPIClient).runRequest(0xc00001a800, 0x12e42c0, 0xc004496100, 0x110b701, 0x3, 0xc00432d740, 0x54, 0x0, 0x0, 0xf1a040, ...)
	/home/travis/gopath/src/ +0x135*KopiaAPIClient).Put(...)

Server-side code is here:

Client-side code is here:

According to the bug report, disabling HTTP/2 using GODEBUG resolves the issue.

Full disclosure: I'm a Googler, but Kopia is my personal project and not affiliated with Google.

Copy link

@jkowalski jkowalski commented Aug 16, 2020

I did some more digging through the stack traces and the server is handling 5 requests at this time:

goroutine 23751 - holding rlock - inside ioutil.ReadAll()
goroutine 23781 - holding rlock - inside ioutil.ReadAll()

goroutine 23888 - waiting for rlock - did not do ioutil.ReadAll() yet
goroutine 23939 - waiting for rlock - did not do ioutil.ReadAll() yet
goroutine 24018 - waiting for rlock - did not do ioutil.ReadAll() yet

goroutine 34 - waiting for r/w lock

My understanding is that since RWMutex is writer-preferring, 34 will not acquire the r/w lock until 23751 and 23781 finish, and will also prevent 24018, 23939 and 23888 from acquiring the read lock.

Now I'm wondering whether having 3 requests that are not reading request bodies immediately (because they have not acquired the shared lock) could prevent the other 2 requests that already have the shared lock from making progress, in particular due to the fact that request bodies are rather large ~20MB in size each.

In other words - does HTTP/2 require all request handlers to read request bodies to ensure liveness? It appears that HTTP/1.1 server does not have this behavior (I'm guessing because there is no request pipelining).

Copy link

@networkimprov networkimprov commented Aug 16, 2020

cc @fraenkel

Copy link

@jkowalski jkowalski commented Aug 17, 2020

FWIW - we changed the code in Kopia to read all request payload before acquiring shared lock and got confirmation that it fixed the issue.

If timely reading of request bodies is indeed required for liveness, perhaps it should be documented.

@dmitshur dmitshur added this to the Backlog milestone Aug 18, 2020
Copy link

@dmitshur dmitshur commented Aug 18, 2020

/cc @bradfitz @tombergan per owners.

@dmitshur dmitshur added the NeedsInvestigation label Aug 18, 2020
Copy link

@oguzyildiz1991 oguzyildiz1991 commented Nov 4, 2020

hi @jkowalski , were you able to find any workarounds for this problem? I'm having it pretty consistently too with http2

Copy link

@networkimprov networkimprov commented Nov 4, 2020

See his last comment above.

Copy link

@oguzyildiz1991 oguzyildiz1991 commented Nov 4, 2020

oh I misunderstood the problem then, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

4 participants