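1. One request/stream has its context cancelled, causing the transport to abort the stream (abortStream).
2. The same request/stream then blocks in requestBody.Close, which abortStreamLocked calls while still holding the ClientConn mutex.
3. Other requests on the same connection try to acquire that mutex and block behind it.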
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, this issue reproduces with golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4 (see reproduction below).
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
We enabled HTTP/2 for an internal proxy written in Go, and noticed that after 10 minutes, every connection into the proxy seemed to be hanging. A goroutine dump revealed that many goroutines were blocked acquiring the ClientConn mutex (e.g. see https://github.com/golang/net/blob/2871e0cb64e47e47ba7466fad6e11562caf99563/http2/transport.go#L790).
One goroutine, however, had acquired the mutex and was blocked in abortStreamLocked trying to close the request body: https://github.com/golang/net/blob/2871e0cb64e47e47ba7466fad6e11562caf99563/http2/transport.go#L379. In our case the Close call appeared to be blocked indefinitely reading from a network socket. We haven't figured out why yet, but that's a separate issue.
I've knocked up a very crude and simple reproduction here (I'll try to come back and tidy it up later): https://github.com/milesbxf/go-http2-request-close-deadlock/blob/main/deadlock_test.go
For this to occur, all three of the conditions listed above must hold at the same time.
Is this actually expected/intended behaviour? I.e. do we require that closing the request body always be non-blocking? If so, I guess we could mitigate by increasing the concurrent connection count, but it feels like we could still hit the deadlock across multiple connections.
What did you expect to see?
For other requests to be able to make progress whilst the first request is blocked.
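One way a transport could meet this expectation is to update the shared stream state under the mutex but call the potentially blocking Close only after releasing it. The sketch below uses hypothetical, simplified conn and stream types and is not a description of x/net's actual code or any change made to it. The bodyClosed flag is claimed under the lock, so Close still runs at most once even though it now runs unlocked.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// stream and conn are hypothetical, simplified stand-ins for per-stream and
// per-connection transport state; they are not x/net's real types.
type stream struct {
	body       io.Closer
	bodyClosed bool // guarded by conn.mu
}

type conn struct {
	mu sync.Mutex
}

// abortStream claims the body under the mutex, then closes it after the
// mutex has been released so other streams are not blocked by Close.
func (c *conn) abortStream(s *stream) {
	c.mu.Lock()
	var toClose io.Closer
	if s.body != nil && !s.bodyClosed {
		s.bodyClosed = true // claim the Close while holding the lock
		toClose = s.body
	}
	c.mu.Unlock()

	if toClose != nil {
		_ = toClose.Close() // may block, but c.mu is free meanwhile
	}
}

// slowCloser is a hypothetical body whose Close blocks until release is closed.
type slowCloser struct {
	started chan struct{}
	release chan struct{}
}

func (s *slowCloser) Close() error {
	close(s.started)
	<-s.release
	return nil
}

func main() {
	c := &conn{}
	body := &slowCloser{started: make(chan struct{}), release: make(chan struct{})}
	s := &stream{body: body}

	go c.abortStream(s)
	<-body.started // Close is now blocked, but the mutex was already released

	if c.mu.TryLock() { // TryLock requires Go 1.18+
		fmt.Println("another stream acquired the mutex while Close was blocked")
		c.mu.Unlock()
	}
	close(body.release)
}
```

A second abortStream call on the same stream sees bodyClosed and skips Close, which is why the flag must still be set while the lock is held.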
What did you see instead?
All requests on the same client connection were blocked.
Other things
I'm very happy to attempt a fix, but I'm a bit of a novice in this part of the codebase!