
x/net/http2: connection-level flow control not returned if stream errors, causes server hang #40423

Open
jared2501 opened this issue Jul 27, 2020 · 6 comments

@jared2501 jared2501 commented Jul 27, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What's the issue

A hang was reported between a gRPC client (grpc-go v1.27.0) and a gRPC server in one of our production environments; both run on the same host. I captured core dumps of the client and server to analyze with delve. In the client, google.golang.org/grpc/internal/transport.loopyWriter.cbuf.sendQuota was 0, which indicates that the client's connection-level send window had run out. In the server's core dump, I tracked down the corresponding http2.serverConn and noticed that its serverConn.inflow.n was also 0. I then examined the two places in http2/server.go that call inflow.take and found what I believe is the issue in processData:

func (sc *serverConn) processData(f *DataFrame) error {
	...

	if f.Length > 0 {
		// Check whether the client has flow control quota.
		if st.inflow.available() < int32(f.Length) {
			return streamError(id, ErrCodeFlowControl)
		}
		st.inflow.take(int32(f.Length))

		if len(data) > 0 {
			wrote, err := st.body.Write(data)
			if err != nil {
				return streamError(id, ErrCodeStreamClosed)
			}
			if wrote != len(data) {
				panic("internal error: bad Writer")
			}
			st.bodyBytes += int64(len(data))
		}

		// Return any padded flow control now, since we won't
		// refund it later on body reads.
		if pad := int32(f.Length) - int32(len(data)); pad > 0 {
			sc.sendWindowUpdate32(nil, pad)
			sc.sendWindowUpdate32(st, pad)
		}
	}
...

In this code, st.inflow.take is called, but if st.body.Write returns an error, the flow control is never refunded to the client: the code bails out and returns a streamError. Nor is the data added to st.body's pipeBuffer, since pipe.Write returns immediately once it has an error to return.
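To illustrate the leak, here is a toy model of the accounting described above (the flow struct is a hypothetical simplification; the real type in x/net/http2 is internal and richer than this). Each DATA frame whose body write fails takes quota that is never returned, so repeated stream errors drain the window until the peer can no longer send:

```go
package main

import "fmt"

// flow is a hypothetical stand-in for http2's internal inflow accounting;
// it only tracks the remaining receive window.
type flow struct{ n int32 }

func (f *flow) available() int32 { return f.n }
func (f *flow) take(n int32)     { f.n -= n }

func main() {
	conn := flow{n: 65535} // default initial window size

	const frameLen = 16384 // default max frame size
	for conn.available() >= frameLen {
		conn.take(frameLen) // like st.inflow.take(int32(f.Length))
		// st.body.Write fails -> streamError returned, no refund
	}
	fmt.Println(conn.available()) // 16383: too small for another full frame
}
```

After three failed frames the remaining window (16383) is smaller than a full frame, and since nothing ever refunds the taken quota, the sender is stuck.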

Side note: st.body.Write may return an error if st.body.Close is called. The server that hit this issue uses grpc-go's serverHandlerTransport, which does, in fact, call req.Body.Close (see here). A gRPC bidirectional streaming endpoint runs between the client and server, and what I suspect is happening is that the client is sending data over the bidi stream when an error in the gRPC server ends the request, causing req.Body.Close to be called while data is still in flight.

Here's what I think a possible fix to net/http2 could look like:

diff --git a/http2/server.go b/http2/server.go
index 01f4ecc..ba3ebd1 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1650,6 +1650,7 @@ func (sc *serverConn) processData(f *DataFrame) error {
                if len(data) > 0 {
                        wrote, err := st.body.Write(data)
                        if err != nil {
+                               sc.sendWindowUpdate32(nil, int32(f.Length))
                                return streamError(id, ErrCodeStreamClosed)
                        }
                        if wrote != len(data) {
@networkimprov networkimprov commented Jul 27, 2020

@gopherbot gopherbot commented Jul 28, 2020

Change https://golang.org/cl/245158 mentions this issue: net/http2: send WINDOW_UPDATE on a body's write failure

@cagedmantis cagedmantis added this to the Backlog milestone Jul 28, 2020
@cagedmantis cagedmantis commented Jul 28, 2020

@jared2501 jared2501 commented Jul 28, 2020

Thanks for looking into this so quickly! Once the fix is merged, is there any way we could get it backported to 1.14 (or have it make its way into a 1.15 release)?

@wader wader commented Aug 5, 2020

Hello, I suspect I might have run into this issue when talking to GCS using the google-cloud-go package. It's a service that more or less proxies HTTP to GCS, and under quite high load it usually stops talking to GCS after ~15 min.

When running with GODEBUG=http2debug=2, this is what I see just before it stops doing requests:

...
2020/08/05 15:13:06 http2: Framer 0xc0000c8460: wrote HEADERS flags=END_STREAM|END_HEADERS stream=3067 len=121
2020/08/05 15:13:11 http2: Framer 0xc0000c8460: wrote WINDOW_UPDATE stream=3061 len=4 incr=5120
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=21 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=21 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=23 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=23 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=2725 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=2725 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=2971 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=2971 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3037 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3037 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3053 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3053 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3067 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3067 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3061 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3061 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=2717 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=2717 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=547 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=547 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport encoding header ":authority" = "storage.googleapis.com"
2020/08/05 15:19:06 http2: Transport encoding header ":method" = "GET"
2020/08/05 15:19:06 RoundTrip failure: stream error: stream ID 21; INTERNAL_ERROR
...

Or is this something else, or am I doing something wrong?

@networkimprov networkimprov commented Aug 5, 2020
