Skip to content

x/net/http2: the first write error on a connection will cause all subsequent write requests to fail blindly  #39337

@jaricftw

Description

@jaricftw

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jzhan/.cache/go-build"
GOENV="/home/jzhan/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jzhan/gocode"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jzhan/bin/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jzhan/bin/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build713868028=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Here is a test that reproduces the issue: https://github.com/jaricftw/go-http2-stuck-transport/blob/master/conn_write_timeout_test.go#L19

It tests the TCP write deadline behavior with Go HTTP2 library.

Server setup: a blocking TCP server, i.e. connection can be established, but never read data from the connection.
Client setup: An http2 client makes calls to the blocking TCP server. TCP write timeout (using https://golang.org/pkg/net/#TCPConn.SetWriteDeadline) is set for each Write operation of a connection. TCP write buffer is also set to be small to quickly fill up the write buffer of the client.

What did you expect to see?

Each request will get stuck and eventually time out after the specified TCP write timeout (i.e. 2 seconds)

What did you see instead?

First request times out after the specified 2 seconds, as expected

--- FAIL: TestConnWriteTimeout (2.04s)
    conn_write_timeout_test.go:37: got err: Post http://127.0.0.1:44417: write tcp 127.0.0.1:47152->127.0.0.1:44417: i/o timeout

However, any subsequent request returns immediately with the same error:

 conn_write_timeout_test.go:43: got err: Post http://127.0.0.1:44417: write tcp 127.0.0.1:47152->127.0.0.1:44417: i/o timeout
    conn_write_timeout_test.go:45:
                Error Trace:    conn_write_timeout_test.go:45
                Error:          Should be true
                Test:           TestConnWriteTimeout
                Messages:       expect response time to be larger than 2s, actual: 52.696µs

This indicates:

  1. The same connection (source port doesn't change from the above logs across different requests) is reused for requests to the server, even after the first connection write time out.
  2. Even though a new TCP write deadline is set for the connection before each write, the deadline is not respected after the first time out.

Dug into the http2 transport implementation, this is likely because the first write error on the connection will make all future writeHeaders to fail with the same error immediately:

type ClientConn struct {
	// other fields

	werr error      // first write error that has occurred
}

func (cc *ClientConn) writeHeaders(streamID uint32, endStream bool, maxFrameSize int, hdrs []byte) error {
	first := true // first frame written (HEADERS is first, then CONTINUATION)
	for len(hdrs) > 0 && cc.werr == nil {
		// actual work 
	}
	cc.bw.Flush()
	return cc.werr
}

ref: https://github.com/golang/net/blob/master/http2/transport.go#L1245

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions