Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: the first write error on a connection will cause all subsequent write requests to fail blindly #39337

Open
jaricftw opened this issue Jun 1, 2020 · 6 comments
Milestone

Comments

@jaricftw
Copy link

@jaricftw jaricftw commented Jun 1, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jzhan/.cache/go-build"
GOENV="/home/jzhan/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jzhan/gocode"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jzhan/bin/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jzhan/bin/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build713868028=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Here is a test that reproduces the issue: https://github.com/jaricftw/go-http2-stuck-transport/blob/master/conn_write_timeout_test.go#L19

It tests the TCP write deadline behavior with Go HTTP2 library.

Server setup: a blocking TCP server, i.e. connection can be established, but never read data from the connection.
Client setup: An http2 client makes calls to the blocking TCP server. TCP write timeout (using https://golang.org/pkg/net/#TCPConn.SetWriteDeadline) is set for each Write operation of a connection. TCP write buffer is also set to be small to quickly fill up the write buffer of the client.

What did you expect to see?

Each request will get stuck and eventually time out after the specified TCP write timeout (i.e. 2 seconds)

What did you see instead?

First request times out after the specified 2 seconds, as expected

--- FAIL: TestConnWriteTimeout (2.04s)
    conn_write_timeout_test.go:37: got err: Post http://127.0.0.1:44417: write tcp 127.0.0.1:47152->127.0.0.1:44417: i/o timeout

However, any subsequent request returns immediately with the same error:

 conn_write_timeout_test.go:43: got err: Post http://127.0.0.1:44417: write tcp 127.0.0.1:47152->127.0.0.1:44417: i/o timeout
    conn_write_timeout_test.go:45:
                Error Trace:    conn_write_timeout_test.go:45
                Error:          Should be true
                Test:           TestConnWriteTimeout
                Messages:       expect response time to be larger than 2s, actual: 52.696µs

This indicates:

  1. The same connection (source port doesn't change from the above logs across different requests) is reused for requests to the server, even after the first connection write time out.
  2. Even though a new TCP write deadline is set for the connection before each write, the deadline is not respected after the first time out.

Dug into the http2 transport implementation, this is likely because the first write error on the connection will make all future writeHeaders to fail with the same error immediately:

type ClientConn struct {
	// other fields

	werr error      // first write error that has occurred
}

func (cc *ClientConn) writeHeaders(streamID uint32, endStream bool, maxFrameSize int, hdrs []byte) error {
	first := true // first frame written (HEADERS is first, then CONTINUATION)
	for len(hdrs) > 0 && cc.werr == nil {
		// actual work 
	}
	cc.bw.Flush()
	return cc.werr
}

ref: https://github.com/golang/net/blob/master/http2/transport.go#L1245

@dmitshur
Copy link
Member

@dmitshur dmitshur commented Jun 2, 2020

@dmitshur dmitshur added this to the Backlog milestone Jun 2, 2020
@networkimprov
Copy link

@networkimprov networkimprov commented Jun 2, 2020

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Jun 7, 2020

There is a missing MarkDead on the connection. The current code only does cleanup if we start the readLoop which never happens.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Jun 7, 2020

Sorry, I did some further digging. It seems the readLoop is entered but never exits which is why the cleanup never occurs as expected.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Jun 7, 2020

@bradfitz It doesn't seem like the http2 tranport handles write errors properly. I see that the error is recorded and it flows up to the client, but I don't see any place that would tear the connection down. Is the assumption that a read failure must occur? It seems a bit difficult if the server side is mishandling requests like this test case does.

@gopherbot
Copy link

@gopherbot gopherbot commented Jun 28, 2020

Change https://golang.org/cl/240337 mentions this issue: x/net/http2: Close connection on write errors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.