Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: misbehaved streams can cause connections to exhaust flow control #28204

Open
jared2501 opened this Issue Oct 15, 2018 · 14 comments

Comments

Projects
None yet
8 participants
@jared2501
Copy link

jared2501 commented Oct 15, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.11 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jnewman/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jnewman/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_t/hg9f_j4x1q743pqh00kdv6m00000gn/T/go-build803039253=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do? / What did you expect to see? / What did you see instead?

Use httputil.ReverseProxy with an http2.Transport. I'm still trying to find a minimal reproducible example.

Basically, the http2.Transport is closing an HTTP/2 stream and then sending a number of DATA frames on the connection. The http2.Server is detecting that the stream has been half-closed by the remote and is following RFC7540 and responding with an ErrCodeStreamClosed (see http2/server.go).

However, since the http2.Server does not send a window update the remote http2.Transport does not add back the sent DATA frames to its flow control. This causes a re-used http2.Transport to eventually expend all connection flow control and halt sending.

A proposed fix that I've applied to my fork is to make the following patch

diff --git a/http2/server.go b/http2/server.go
index 56859d1..7a39456 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1586,6 +1586,8 @@ func (sc *serverConn) processData(f *DataFrame) error {
        // "open" or "half-closed (local)" state, the recipient MUST respond with a
        // stream error (Section 5.4.2) of type STREAM_CLOSED.
        if state == stateClosed {
+               sc.inflow.take(int32(f.Length))
+               sc.sendWindowUpdate(nil, int(f.Length))
                return streamError(id, ErrCodeStreamClosed)
        }
        if st == nil || state != stateOpen || st.gotTrailerHeader || st.resetQueued {

Although, one could argue that http2.Transport/httputil.Reverse proxy should be fixed to make sure that data frames aren't sent after a close.

@jared2501

This comment has been minimized.

Copy link
Author

jared2501 commented Oct 15, 2018

I think this might be related to 8f38f28#diff-d863507a61be206d112f6e00e6d812a2 since it appears that the http.Server marks the stream as stateClosed if the handler panics, even though the remote side hasn't sent a RESET_STREAM frame. Therefore, the stream should be in "half-closed (local)" and so should send back WINDOW_UPDATE frames but it's not.

@FiloSottile

This comment has been minimized.

Copy link
Member

FiloSottile commented Oct 15, 2018

/cc @bradfitz

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Oct 17, 2018

Can you capture the logs (from setting GODEBUG=http2debug=2) while this happens?

@jared2501

This comment has been minimized.

Copy link
Author

jared2501 commented Oct 17, 2018

@prashantv

This comment has been minimized.

Copy link
Contributor

prashantv commented Dec 8, 2018

We ran into the same issue, and I've put together a small repro using the gRPC client, and a gRPC server hosted via the x/net HTTP/2 server:
https://github.com/prashantv/grpc-http2-repro

This example doesn't contain use the reverse proxy, and the issue is exactly as @jared2501 pointed out -- when the server sends RST_STREAM with stream closed, it's not updating flow control, and so the client stops sending data.

When this test is run, the client reports a "deadline exceeded" (since it's waiting to send data, but never gets to it since the window is full):

--- FAIL: TestGRPCLargePayload (1.02s)
        require.go:794:
                        Error Trace:    repro_test.go:58
                                                                repro_test.go:59
                        Error:          Received unexpected error:
                                        rpc error: code = DeadlineExceeded desc = context deadline exceeded
                        Test:           TestGRPCLargePayload
                        Messages:       Expect following requests to succeed

I have the GODEBUG=http2debug=2 logs:
https://gist.github.com/prashantv/e181206b39b030ffea44b05f16596ac1

The client doesn't stop sending frames, and so the server keeps responding with:
RST_STREAM stream=1 len=4 ErrCode=STREAM_CLOSED. However, there's no corresponding WINDOW_UPDATE (which it did before the stream was closed).

This seems like a regression introduced in golang/net@039a425 -- before that change, the code would send a window update.

I also have the "fix" (update flow-control) on a branch called "fix", where the test mostly passes. However, around 1-2% of the time, it still fails with the same error -- I think there might be other paths where the server sends RST_STREAM without taking flow control into account.

@jared2501

This comment has been minimized.

Copy link
Author

jared2501 commented Dec 8, 2018

Hey @prashantv - glad to see you can repro my issue! I looked over your example and it's possible you're also being hit by #28634. Although, even with patches for this issue and #28634, I was still seeing some flow control issues when using grpc with a context with a timeout...

@fraenkel

This comment has been minimized.

Copy link
Contributor

fraenkel commented Dec 11, 2018

@prashantv You are correct. I did introduce a bug, but there is another one there too. Once we receive a data frame that does not cause a connection error, we must account for the data. The take() logic should be before all the additional state checks regardless of the path taken.

I believe there may also be a similar bug on the client side.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Dec 13, 2018

Change https://golang.org/cl/153977 mentions this issue: http2: Revert a closed stream cannot receive data

@jared2501

This comment has been minimized.

Copy link
Author

jared2501 commented Jan 18, 2019

@bcmills @fraenkel - quick ping on this issue, the http/2 flow control stalls are still biting us a fair bit. (Sorry, I'm sure you guys probably have a lot of other stuff going on with go 1.12 release date upcoming)

gopherbot pushed a commit to golang/net that referenced this issue Jan 19, 2019

http2: Revert a closed stream cannot receive data
This reverts CL 111676 for golang/go#25023.

Reason for revert: The code change no longer issued a WindowUpdate which
is required when processing data. The original issue found in golang/go#25023
is not present after the revert.

Updates golang/go#28204

Change-Id: Iadbb63d50ca06df1281e699b9ef13181d0593f80
Reviewed-on: https://go-review.googlesource.com/c/153977
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 23, 2019

Change https://golang.org/cl/159179 mentions this issue: net/http: update bundled http2

gopherbot pushed a commit that referenced this issue Jan 23, 2019

net/http: update bundled http2
Updates bundled http2 to x/net git rev ed066c81e7 for:

    http2: Revert a closed stream cannot receive data
    https://golang.org/cl/153977

Updates #28204

Change-Id: I0a489e4e8a581a107970199f64f0fa9281982efe
Reviewed-on: https://go-review.googlesource.com/c/159179
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

@andybons andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019

@grahamking

This comment has been minimized.

Copy link

grahamking commented Feb 13, 2019

You don't need a misbehaved stream to trigger this. If the server doesn't read the request body, flow control gets exhausted.

Here's a client/server where the server ignores the body on the first few POST (it just WriteHeader). The client blocks because it's available() flow is 0.

It's broken (server-side) from 1.11.0 through 1.12beta2, and fixed in 1.12rc1 by golang/net@ed066c8 . We see this in production when the client is Chrome, as well as Go, so I don't think it's a misbehaving client.

Repro unit test: https://play.golang.org/p/EOUShlw2SNk (replace cert.pem, cert.key, and myhost).

Client

package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	for {
		body := bytes.Repeat([]byte{0}, 128*1024*1024)
		_, err := http.Post("https://myhost:8443", "", bytes.NewReader(body))
		if err != nil {
			log.Fatal(err)
		}
	}
}

Server

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	var i int
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Println(i)
		i++
		if i <= 5 {
			w.WriteHeader(http.StatusAccepted)
			return
		}
		if _, err := ioutil.ReadAll(r.Body); err != nil {
			log.Fatal(err)
		}
		w.Write([]byte("OK"))
	})
	log.Fatal(http.ListenAndServeTLS(":8443","cert.pem","cert.key",nil))
}

If the above doesn't reproduce it for you try increasing if i <= 5 to 10 or 20.

To make this easier for others to find, could the issue's title be updated to something like "Not reading request body can cause connections to exhaust flow control"?

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Feb 14, 2019

To make this easier for others to find, could the issue's title be updated to something like "Not reading request body can cause connections to exhaust flow control"?

The title should make it clear whether this is a Server or Transport issue. Which is it? I see discussion of both in here. (I haven't looked into this yet.)

@grahamking

This comment has been minimized.

Copy link

grahamking commented Feb 14, 2019

I don't know enough to say if it's Server or Transport, but I guess Server. It's an accounting error in http2serverConn.flow, in the connection-level flow control.

What I think is happening is that when the HTTP handler returns without reading the body (say it returns a 404), any DATA frames in flight are counted towards the connection-level flow control by the client, but not by the server, so they get out of sync.

Once the HTTP handler returns the server closes the stream like this:

http2: Framer 0xc00013c1c0: wrote HEADERS flags=END_STREAM|END_HEADERS stream=5 len=26
http2: Framer 0xc00013c1c0: wrote RST_STREAM stream=5 len=4 ErrCode=NO_ERROR

The in-flight DATA frames are considered invalid:

http2: Framer 0xc00013c1c0: wrote RST_STREAM stream=5 len=4 ErrCode=STREAM_CLOSED

If I print out the flow control values (http2flow.n) at the connection level, and use the client/server from my previous comment, when it blocks the server has 1,073,807,359 bytes but the client has 0.

This is the same as @prashantv described above.

They added a clarification to the spec for this:

A receiver that receives a flow-controlled frame MUST always account
for its contribution against the connection flow-control window,
unless the receiver treats this as a connection error
(Section 5.4.1). This is necessary even if the frame is in error.
The sender counts the frame toward the flow-control window, but if
the receiver does not, the flow-control window at the sender and
receiver can become different.

@grahamking

This comment has been minimized.

Copy link

grahamking commented Feb 26, 2019

In http2 the flow control values are always relative, which makes problems like this very difficult to catch (it survived all of Go 1.11.X). Personally I think the inability to re-sync flow control window sizes between client and server makes the protocol brittle.

I started a discussion about that on the IETF HTTP Working Group's mailing list, here https://lists.w3.org/Archives/Public/ietf-http-wg/2019JanMar/0155.html .

The emerging consensus seems to be that the protocol is fine but better debugging support would be welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.