Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: SetKeepAlivesEnabled(false) closes all HTTP/2 connections older than 5 seconds #36946

Open
nwidger opened this issue Jan 31, 2020 · 7 comments
Milestone

Comments

@nwidger
Copy link
Contributor

@nwidger nwidger commented Jan 31, 2020

What version of Go are you using (go version)?

$ go version 1.13.7
go version go1.13.7 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niels/.cache/go-build"
GOENV="/home/niels/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/niels/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/niels/sdk/go1.13.7"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/niels/sdk/go1.13.7/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build412221026=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Here is a play.golang.org link for a program which reproduces the issue:

https://play.golang.org/p/7mPbWhl7QeA

it may be necessary to save it to a main.go file and run it manually with

$ go run main.go

Here is an example output from running the program:

2020/01/31 15:04:28 server: listening on 127.0.0.1:8443
2020/01/31 15:04:30 client: request 1: sent
2020/01/31 15:04:30 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state new
2020/01/31 15:04:30 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:30 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:30 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:33 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:33 client: request 1: response: 200: received request 1 over http2
2020/01/31 15:04:33 client: request 2: sent
2020/01/31 15:04:33 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:36 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:36 client: request 2: response: 200: received request 2 over http2
2020/01/31 15:04:36 client: request 3: sent
2020/01/31 15:04:36 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:39 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:39 client: request 3: response: 200: received request 3 over http2
2020/01/31 15:04:39 client: request 4: sent
2020/01/31 15:04:39 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state active
2020/01/31 15:04:39 ========== disabling keep alives ==========
2020/01/31 15:04:39 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state idle
2020/01/31 15:04:39 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46770: state closed
2020/01/31 15:04:39 Get https://127.0.0.1:8443/: unexpected EOF
exit status 1

The program starts up an HTTPS server and has a client send it requests in a loop. Importantly, both the client and server are configured to use HTTP/2 over TLS and the client reuses a single TCP connection for each request. The handler for the HTTPS server blocks for 3 seconds and then sends back its response. On the fourth request, once we can be sure that the client connection has been open for at least 5 seconds, SetKeepAlivesEnabled(false) is called on the server within the handler of the request, before the response has been written out.

What did you expect to see?

After the call to SetKeepAlivesEnabled, I expected the requests sent by the client to continue working as normal. The documentation for SetKeepAlivesEnabled claims only to disable HTTP keep-alives, however by looking at the source code one can see it also apparently calls a method called closeIdleConns. However, because we are calling SetKeepAlivesEnabled within the handler of a request that has not yet been responded to yet, clearly this connection is not idle. The documentation for StateIdle is defined to be:

    // StateIdle represents a connection that has finished
    // handling a request and is in the keep-alive state, waiting
    // for a new request. Connections transition from StateIdle
    // to either StateActive or StateClosed.

What did you see instead?

The connection is closed by SetKeepAlivesEnabled via closeIdleConns and the client request fails, coming back with an unexpected EOF error. I believe this is ultimately due to the fact that the HTTP/2 code keeps its own connection state bookkeeping which is not shared with the HTTP/1 server. Using the http.Server.ConnState hook and some sneaky use of the runtime package, the logs in the example output above show that, according to the HTTP/1 server, the connection never transitions out of the StateNew state while the HTTP/2 code transitions it between StateActive and StateIdle with each new request on the connection.

Add to that this heuristic within closeIdleConns:

		// Issue 22682: treat StateNew connections as if
		// they're idle if we haven't read the first request's
		// header in over 5 seconds.
		if st == StateNew && unixSec < time.Now().Unix()-5 {
			st = StateIdle
		}

and you get the outcome that any HTTP/2 connection that has been open for longer than 5 seconds will be closed by a call to SetKeepAlivesEnabled(false). This heuristic was apparently introduced by #22682.

You can run the example program with both the client and the server configured to use HTTP/1 by passing -http=false:

$ go run main.go -http2=false

which shows that the client requests continue working after the call to SetKeepAlivesEnabled(false). Here is an example output running with HTTP/1:

2020/01/31 15:10:23 server: listening on 127.0.0.1:8443
2020/01/31 15:10:25 client: request 1: sent
2020/01/31 15:10:25 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state new
2020/01/31 15:10:25 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:28 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state idle
2020/01/31 15:10:28 client: request 1: response: 200: received request 1 over http1
2020/01/31 15:10:28 client: request 2: sent
2020/01/31 15:10:28 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:31 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state idle
2020/01/31 15:10:31 client: request 2: response: 200: received request 2 over http1
2020/01/31 15:10:31 client: request 3: sent
2020/01/31 15:10:31 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:34 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state idle
2020/01/31 15:10:34 client: request 3: response: 200: received request 3 over http1
2020/01/31 15:10:34 client: request 4: sent
2020/01/31 15:10:34 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state active
2020/01/31 15:10:34 ========== disabling keep alives ==========
2020/01/31 15:10:34 ==========  disabled keep alives ==========
2020/01/31 15:10:37 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46950: state closed
2020/01/31 15:10:37 client: request 4: response: 200: received request 4 over http1
2020/01/31 15:10:37 client: request 5: sent
2020/01/31 15:10:37 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46960: state new
2020/01/31 15:10:37 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46960: state active
2020/01/31 15:10:40 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46960: state closed
2020/01/31 15:10:40 client: request 5: response: 200: received request 5 over http1
2020/01/31 15:10:40 client: request 6: sent
2020/01/31 15:10:40 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46962: state new
2020/01/31 15:10:40 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46962: state active
2020/01/31 15:10:43 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46962: state closed
2020/01/31 15:10:43 client: request 6: response: 200: received request 6 over http1
2020/01/31 15:10:43 client: request 7: sent
2020/01/31 15:10:43 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46966: state new
2020/01/31 15:10:43 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46966: state active
2020/01/31 15:10:46 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46966: state closed
2020/01/31 15:10:46 client: request 7: response: 200: received request 7 over http1
2020/01/31 15:10:46 client: request 8: sent
2020/01/31 15:10:46 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46970: state new
2020/01/31 15:10:46 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46970: state active
2020/01/31 15:10:49 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:46970: state closed
2020/01/31 15:10:49 client: request 8: response: 200: received request 8 over http1

Please let me know if I can provide any additional information about the issue.

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Feb 4, 2020

@bradfitz It seems the connection should be transitioned to StateHijacked. I haven't checked but I believe we are just collecting http/2 connections.

@nwidger

This comment has been minimized.

Copy link
Contributor Author

@nwidger nwidger commented Feb 5, 2020

@fraenkel @bradfitz It looks like http.Server's graceful shutdown method Shutdown is also affected since it too calls closeIdleConns. Here's lightly modified version of the previous program which calls Shutdown instead of SetKeepAlivesEnabled within the handler:

https://play.golang.org/p/UtgaaHEJjYw

Here is example output running with HTTP/2, you can see that request 4 comes back with an unexpected EOF error since the connection is closed before a response is written:

2020/02/05 15:12:33 server: listening on 127.0.0.1:8443
2020/02/05 15:12:35 client: request 1: sent
2020/02/05 15:12:35 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state new
2020/02/05 15:12:35 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:35 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:35 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:38 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:38 client: request 1: response: 200: received request 1 over http2
2020/02/05 15:12:38 client: request 2: sent
2020/02/05 15:12:38 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:41 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:41 client: request 2: response: 200: received request 2 over http2
2020/02/05 15:12:41 client: request 3: sent
2020/02/05 15:12:41 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:44 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:44 client: request 3: response: 200: received request 3 over http2
2020/02/05 15:12:44 client: request 4: sent
2020/02/05 15:12:44 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state active
2020/02/05 15:12:44 ========== calling shutdown ==========
2020/02/05 15:12:44 http2 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state idle
2020/02/05 15:12:44 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37818: state closed
2020/02/05 15:12:44 client: request 4: error: Get https://127.0.0.1:8443/: unexpected EOF
2020/02/05 15:12:44 ==========  called shutdown ==========
2020/02/05 15:12:44 servertls error: http: Server closed
2020/02/05 15:12:44 server: no longer serving, waiting 5s seconds before exiting...
2020/02/05 15:12:49 done.

and with -http2=false to show the behavior with HTTP/1:

2020/02/05 15:11:27 server: listening on 127.0.0.1:8443
2020/02/05 15:11:29 client: request 1: sent
2020/02/05 15:11:29 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state new
2020/02/05 15:11:29 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:32 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state idle
2020/02/05 15:11:32 client: request 1: response: 200: received request 1 over http1
2020/02/05 15:11:32 client: request 2: sent
2020/02/05 15:11:32 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:35 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state idle
2020/02/05 15:11:35 client: request 2: response: 200: received request 2 over http1
2020/02/05 15:11:35 client: request 3: sent
2020/02/05 15:11:35 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:38 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state idle
2020/02/05 15:11:38 client: request 3: response: 200: received request 3 over http1
2020/02/05 15:11:38 client: request 4: sent
2020/02/05 15:11:38 http1 connstate: 127.0.0.1:8443 -> 127.0.0.1:37814: state active
2020/02/05 15:11:38 ========== calling shutdown ==========
2020/02/05 15:11:38 servertls error: http: Server closed
2020/02/05 15:11:38 server: no longer serving, waiting 5s seconds before exiting...
2020/02/05 15:11:43 done.
@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Feb 6, 2020

@nwidger Its all related on how connection management works and when idle connections are closed.
There is an outstanding issue #26303, which I need more details on since its unclear to me why http2 is being managed this way.

@nwidger

This comment has been minimized.

Copy link
Contributor Author

@nwidger nwidger commented Feb 6, 2020

@fraenkel Agreed, I too am very curious to learn more.

@cagedmantis cagedmantis added this to the Backlog milestone Feb 7, 2020
@cagedmantis

This comment has been minimized.

Copy link
Contributor

@cagedmantis cagedmantis commented Feb 7, 2020

@nwidger

This comment has been minimized.

Copy link
Contributor Author

@nwidger nwidger commented Feb 13, 2020

@fraenkel Is there anything I can do to help make progress? I am very interested in addressing this issue as it is causing us upgrade issues in production. Right now my only fix is to disable HTTP/2.

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Feb 13, 2020

Not at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.