New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Transport.MaxConnsPerHost doesn't work well with HTTP/2 #27753

Open
Gaboose opened this Issue Sep 19, 2018 · 12 comments

Comments

Projects
None yet
6 participants
@Gaboose

Gaboose commented Sep 19, 2018

What version of Go are you using (go version)?

go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOHOSTARCH="amd64"
GOHOSTOS="linux"

What did you do?

I ran a program that quickly and concurrently executes 10 http requests to the same url from a single http.Client, also httptraceing TLSHandshakeStart and GotConn events.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptrace"
	"sync"
)

func checkError(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://golang.org/doc/", nil)
	checkError(err)

	trace := &httptrace.ClientTrace{
		GotConn: func(connInfo httptrace.GotConnInfo) {
			fmt.Printf("Got Conn: %+v\n", connInfo)
		},
		TLSHandshakeStart: func() {
			fmt.Println("TLSHandshakeStart")
		},
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	wg := sync.WaitGroup{}
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			_, err := client.Do(req)
			checkError(err)
			wg.Done()
		}()
	}
	wg.Wait()
}

What did you expect to see?

TLSHandshakeStart being printed as many times as GotConn: {[...] Reused:false [...]} - only once.

What did you see instead?

TLSHandhshakeStart is printed 10 times, but GotConn reports that it reused the connection 9 times out of 10.

This is the full output:

TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
TLSHandshakeStart
Got Conn: {Conn:0xc4202b6000 Reused:false WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc4202b6000 Reused:true WasIdle:false IdleTime:0s}

The large number of TLS handshakes is a big performance hit for us. This doesn't happen if I specify an http2 transport explicitly:

- client := &http.Client{}
+ client := &http.Client{Transport: &http2.Transport{}}

Additionally strace -f -e trace=network -s 10000 ./mytestprogram prints ~10 times less connect syscalls if I specify the http2 transport explicitly.

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 22, 2018

@bcmills bcmills added this to the Go1.12 milestone Sep 22, 2018

@fraenkel

This comment has been minimized.

Contributor

fraenkel commented Sep 22, 2018

@Gaboose Set your MaxConnsPerHost. You are using the default which is unlimited.

@Gaboose

This comment has been minimized.

Gaboose commented Sep 24, 2018

@fraenkel That didn't fix it.

- client := &http.Client{}
+ client := &http.Client{Transport: &http.Transport{MaxConnsPerHost: 2}}

gives

TLSHandshakeStart
TLSHandshakeStart
Got Conn: {Conn:0xc00020a380 Reused:false WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
TLSHandshakeStart
TLSHandshakeStart
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
TLSHandshakeStart
TLSHandshakeStart
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
TLSHandshakeStart
TLSHandshakeStart
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
TLSHandshakeStart
TLSHandshakeStart
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
Got Conn: {Conn:0xc00020a380 Reused:true WasIdle:false IdleTime:0s}
@meirf

This comment has been minimized.

Member

meirf commented Sep 24, 2018

I'm not sure if it's a problem that GotConn is being called a bunch of time since the tracing shows that the connection is being reused. Seems like the bigger problem is that the TLS handshake happens more than once. And @Gaboose 's strace comment shows that this isn't a tracing bug.

@Gaboose is this right?

@Gaboose

This comment has been minimized.

Gaboose commented Sep 24, 2018

@meirf Yes, I think TLS handshake should happen only once, especially because the connection seems to be successfully reused. I first discovered this by seeing too much of cpu time being spent in crypto/tls.(*Conn).clientHandshake while profiling. That's definitely the problem I want to report here. Sorry for not being clearer.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Sep 24, 2018

As @fraenkel said, the answer is supposed to be MaxConnsPerHost (which was #13957) but that appeared late-ish in the Go 1.11 cycle and didn't get enough attention from me & other testers: it appears that it doesn't work that well with HTTP/2.

What's happening for HTTP/2 is that before the dial it acquires a token from the semaphore, dials, handshakes, realizes it's HTTP/2, and then immediately releases the token, before adding the connection to the connection pool. The second problem is that connection acquistion code's MaxConnsPerHost path doesn't re-check for idle connections after doing the semaphore wait, which only matters for HTTP/2 (for HTTP/1, it gets idle connections from the t.getIdleConnCh select case).

@meirf, do you want to work on this more?

@bradfitz bradfitz changed the title from net/http: transport creates new short-lived conns before deciding to reuse to net/http: Transport.MaxConnsPerHost doesn't work well with HTTP/2 Sep 24, 2018

@meirf

This comment has been minimized.

Member

meirf commented Sep 25, 2018

Yes I’d like to. I clearly have more to learn here.

bradfitz, thanks for investigating. gaboose, thanks for posting.

@meirf

This comment has been minimized.

Member

meirf commented Oct 5, 2018

@fraenkel, any interest in taking this? (I don't have much bandwidth right now.)

@fraenkel

This comment has been minimized.

Contributor

fraenkel commented Oct 5, 2018

@meirf Sure, I'll take a crack at it.

@fraenkel

This comment has been minimized.

Contributor

fraenkel commented Oct 6, 2018

@bradfitz Do you have a preference on how we manage the http2 connection on the http side? The problem right now is that we attempt a RoundTrip and if that does not exist or fails we dial. No where do we track the pconn.
I can try to treat this pconn as a persistent idle connection, however we cannot decrement the host count until we know the pconn/pconn.conn is closed unless we want to allow for a race and invalidate the MaxConnHosts contract.
The tricky part is knowing when the pconn.conn is closed. Because we pass a tls.Conn to h2, I cannot wrap it. The only way I see around this is to just decrement when we fail trying to reuse the connection from h2. I am just wondering if I am missing something more obvious or easier.

@gopherbot

This comment has been minimized.

gopherbot commented Oct 6, 2018

Change https://golang.org/cl/140357 mentions this issue: net/http: HTTP/2 respects Transport.MaxConnsPerHost

@bradfitz bradfitz self-assigned this Nov 3, 2018

@gopherbot

This comment has been minimized.

gopherbot commented Nov 30, 2018

Change https://golang.org/cl/151857 mentions this issue: http2: revert Transport's strict interpretation of MAX_CONCURRENT_STREAMS

gopherbot pushed a commit to golang/net that referenced this issue Dec 1, 2018

http2: revert Transport's strict interpretation of MAX_CONCURRENT_STR…
…EAMS

And add the http2.Transport.StrictMaxConcurrentStreams bool knob to
behavior being reverted.

In CL 53250 for golang/go#13774 (for Go 1.10) we changed the HTTP/2
Transport's policy such that a server's advertisement of a
MAX_CONCURRENT_STREAMS value meant that it was a maximum for the
entire process, instead of just a single connection.

We thought that was a reasonable interpretation of the spec and
provided nice safety against slamming a server from a bunch of
goroutines doing concurrent requests, but it's been largely
unpopular (see golang/go#27044). It's also different behavior from
HTTP/1 and because you're usually not sure which protocol version
you're going to get, you need to limit your outbound HTTP requests
anyway in case you're hitting an HTTP/1 server.

And nowadays we have the Go 1.11 Transport.MaxConnsPerHost knob too
(CL 71272 for golang/go#13957). It doesn't yet work for HTTP/2, but it
will in either Go 1.12 or Go 1.13 (golang/go#27753)

After this is bundled into net/http's, the default HTTP client will
have this knob set false, restoring the old Go 1.9 behavior where new
TCP connections are created as necessary. Users wanting the strict
behavior and import golang.org/x/net/http2 themselves and make a
Transport with StrictMaxConcurrentStreams set to true. Or they can set
Transport.MaxConnsPerHost, once that works for HTTP/2.

Updates golang/go#27044 (fixes after bundle into std)

Change-Id: I4efdad7698feaf674ee8e01032d2dfa5c2f8a3a8
Reviewed-on: https://go-review.googlesource.com/c/151857
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment