
x/net/http2: max connections overflow #39389

Open
stevenh opened this issue Jun 3, 2020 · 5 comments · May be fixed by golang/net#73

stevenh (Contributor) commented Jun 3, 2020

What version of Go are you using (go version)?

go version go1.14.4 freebsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN="/data/go/bin"
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="freebsd"
GOPATH="/data/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/data/go/src/github.com/multiplay/go/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build080411694=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Highly concurrent requests to an HTTP/2 endpoint.

Sample reproduction code:

package main

import (
        "flag"
        "log"
        "net"
        "net/http"
        "sync"
        "sync/atomic"
        "time"
)

var (
        client    *http.Client
        successes int32
        errors    int32
        max       int
        url       string
)

// worker issues a single GET request once run is closed and records the result.
func worker(wg *sync.WaitGroup, run <-chan struct{}) {
        defer wg.Done()

        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
                log.Println(err)
                atomic.AddInt32(&errors, 1)
                return
        }

        <-run
        resp, err := client.Do(req)
        if err != nil {
                log.Println(err)
                atomic.AddInt32(&errors, 1)
                return
        }
        // Close the body so the stream is released.
        resp.Body.Close()
        atomic.AddInt32(&successes, 1)
}

func main() {
        flag.IntVar(&max, "max", 10, "maximum concurrent requests")
        flag.StringVar(&url, "url", "https://example.com", "url to test against")
        flag.Parse()

        client = &http.Client{
                Transport: &http.Transport{
                        Proxy: http.ProxyFromEnvironment,
                        DialContext: (&net.Dialer{
                                Timeout:   30 * time.Second,
                                KeepAlive: 30 * time.Second,
                                DualStack: true,
                        }).DialContext,
                        ForceAttemptHTTP2:     true,
                        MaxIdleConns:          max,
                        IdleConnTimeout:       90 * time.Second,
                        TLSHandshakeTimeout:   10 * time.Second,
                        ExpectContinueTimeout: 1 * time.Second,
                },
        }

        var wg sync.WaitGroup
        wg.Add(max)
        run := make(chan struct{})
        for i := 0; i < max; i++ {
                go worker(&wg, run)
        }
        // Release all workers at once to maximise concurrency.
        close(run)
        wg.Wait()
        log.Printf("successes: %d errors: %d\n", successes, errors)
}

In our tests we ran the above with -max=1000 to trigger the issue, as the server has MAX_CONCURRENT_STREAMS=256.

What did you expect to see?

All requests succeed.

What did you see instead?

http2: server sent GOAWAY and closed the connection; LastStreamID=533, ErrCode=PROTOCOL_ERROR

Debugging so far

After some debugging, the issue comes down to the fact that the transport configures maxConcurrentStreams to 1000, and with lots of very quick requests, as generated by the above, a connection ends up with more than the maximum allowed streams before it processes the SETTINGS frame from the server, which includes MAX_CONCURRENT_STREAMS=256.

The result is that all streams beyond the first 256 get refused with a mixture of:

  1. RST_STREAM ErrCode=REFUSED_STREAM
  2. GOAWAY ErrCode=PROTOCOL_ERROR

The former is handled by http2canRetryError, but GOAWAY isn't, as there are two distinct types of GOAWAY error, http2errClientConnGotGoAway and http2GoAwayError, and it's http2GoAwayError that I'm seeing.

This raises the question of whether http2GoAwayError should also be handled, in which case the fix is simply:

@@ -7028,10 +7047,14 @@ func http2canRetryError(err error) bool {
        if err == http2errClientConnUnusable || err == http2errClientConnGotGoAway {
                return true
        }
-       if se, ok := err.(http2StreamError); ok {
-               return se.Code == http2ErrCodeRefusedStream
+       switch err := err.(type) {
+       case http2GoAwayError:
+               return err.ErrCode == http2ErrCodeProtocol
+       case http2StreamError:
+               return err.Code == http2ErrCodeRefusedStream
+       default:
+               return false
        }
-       return false
 }

 func (t *http2Transport) dialClientConn(addr string, singleUse bool) (*http2ClientConn, error) {

With this in place, retries no longer get GOAWAY errors but instead start getting unexpected EOF.

Finally, if I change maxConcurrentStreams to 256, as desired by the server, everything works perfectly.
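
For comparison, until the transport itself behaves this way, the same effect can be approximated from the application side by capping in-flight requests at the server's advertised limit. This is only a workaround sketch, not part of the proposed fix: the cappedDo helper below is hypothetical, and the hard-coded 256 is simply the limit observed from this server. Dropped into the reproduction program above (which already imports net/http), the workers would call cappedDo(client, req) instead of client.Do(req).

// Workaround sketch (hypothetical helper, not part of net/http): cap
// in-flight requests at the server's known MAX_CONCURRENT_STREAMS so the
// transport never opens more streams than the server will accept.
var streamSem = make(chan struct{}, 256) // 256 = the limit advertised by this server

func cappedDo(c *http.Client, req *http.Request) (*http.Response, error) {
        streamSem <- struct{}{}        // acquire a slot; blocks once 256 requests are in flight
        defer func() { <-streamSem }() // release the slot when the request completes
        return c.Do(req)
}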

Reading the HTTP/2 spec, it seems valid for the client to send additional frames after its connection preface, and in particular before it has received the server's connection preface, so setting up additional streams is valid.

Given all this, I wonder whether a lower SETTINGS_MAX_CONCURRENT_STREAMS, such as the minimum of 100 recommended by the spec, should be assumed until a SETTINGS frame has been received, at which point it could be raised to either the value the server sent, or the current default if no SETTINGS_MAX_CONCURRENT_STREAMS is present. This should have minimal impact on normal operation, but in highly concurrent situations it avoids the significant extra overhead of trying to set up streams which will never succeed.
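
To make that concrete, here is a rough standalone sketch of the idea. To be clear, this is not the real x/net/http2 code: the clientConn type, its fields, and the applyServerSettings/canTakeNewStream helpers are purely illustrative, with 100 as the pre-SETTINGS cap and 1000 as the fallback when the server advertises no limit.

package main

import (
        "fmt"
        "sync"
)

const (
        initialMaxConcurrentStreams = 100  // spec-recommended minimum, used until SETTINGS arrives
        defaultMaxConcurrentStreams = 1000 // fallback if the server never advertises a limit
)

// clientConn is a stand-in for the transport's per-connection state; the real
// http2ClientConn looks different, this only sketches the proposed behaviour.
type clientConn struct {
        mu                   sync.Mutex
        seenSettings         bool   // true once the server's SETTINGS frame has been processed
        maxConcurrentStreams uint32 // stream limit currently in effect for new streams
}

func newClientConn() *clientConn {
        // Be conservative until we know what the server will accept.
        return &clientConn{maxConcurrentStreams: initialMaxConcurrentStreams}
}

// applyServerSettings would run when the SETTINGS frame is processed; ok
// reports whether SETTINGS_MAX_CONCURRENT_STREAMS was present in the frame.
func (cc *clientConn) applyServerSettings(limit uint32, ok bool) {
        cc.mu.Lock()
        defer cc.mu.Unlock()
        cc.seenSettings = true
        if ok {
                cc.maxConcurrentStreams = limit
        } else {
                cc.maxConcurrentStreams = defaultMaxConcurrentStreams
        }
}

// canTakeNewStream reports whether another stream may be opened right now.
func (cc *clientConn) canTakeNewStream(inFlight int) bool {
        cc.mu.Lock()
        defer cc.mu.Unlock()
        return inFlight >= 0 && uint32(inFlight) < cc.maxConcurrentStreams
}

func main() {
        cc := newClientConn()
        fmt.Println(cc.canTakeNewStream(150)) // false: capped at 100 before SETTINGS
        cc.applyServerSettings(256, true)     // server advertises MAX_CONCURRENT_STREAMS=256
        fmt.Println(cc.canTakeNewStream(150)) // true: limit raised to 256
}

The key point is only the ordering: new streams are gated by the conservative value until the server's SETTINGS frame has actually been processed.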

To put this into context, I've seen the Go client hit 500 streams before it processes the SETTINGS frame, so in our case, which is talking to a Cloudflare endpoint, up to half the streams need to be retried on different connections.

Thoughts?

dmitshur added this to the Backlog milestone Jun 4, 2020

dmitshur (Member) commented Jun 4, 2020

stevenh added a commit to stevenh/net that referenced this issue Jun 4, 2020:
Prevent the client from trying to establish more streams than the server is
willing to accept during the initial lifetime of a connection by limiting
maxConcurrentStreams to 100, the HTTP/2 specification's recommended minimum,
until we've received the initial SETTINGS frame from the server.

After a SETTINGS frame has been received, use the server's
MAX_CONCURRENT_STREAMS if present, otherwise use 1000 as a reasonable default.

For normal consumers this will have very little impact, allowing a decent
level of concurrency from the start, while for highly concurrent consumers or
large bursts it will prevent a significant number of rejected streams from
being attempted, actually increasing performance.
Fixes golang/go#39389
gopherbot commented Jun 4, 2020

Change https://golang.org/cl/236497 mentions this issue: http2: limit client initial MAX_CONCURRENT_STREAMS

networkimprov commented Jun 7, 2020

cc @fraenkel

@dmitshur, note that Michael is a good reference for http2 issues.

fraenkel (Contributor) commented Jun 7, 2020

My one concern is the number of connections opened prior to upgrading the MAX_CONCURRENT_STREAMS. You described a situation whereby 250+ requests were processed before the initial SETTINGS. This would create at least 4 additional connections prior to adjusting the count.

I am looking at other implementations, both client and server, to get a better idea of what defaults are being used. Envoy uses 1 << 29, nginx uses 128, and Java went from 100 to Integer.MAX_VALUE.

There are more examples of using a larger default over a smaller default.

stevenh (Contributor, Author) commented Jun 7, 2020

I think our example is very much a worst-case scenario: a load test specifically designed to issue as many requests at startup as configured, which was 10k, resulting in 100-2000 GOAWAYs from Cloudflare due to this. Enforcing a limit may even allow the client to process SETTINGS sooner: with debugging enabled it was clear the SETTINGS frame was received well before it was processed, likely due to lock contention from the new streams being opened.

Given this, ensuring we stay within the limit seems more important in real-world scenarios.

If you compare this to HTTP/1, all of these would be individual connections, so in the grand scheme of things 4 connections is not a huge number.

So I guess the question is: keep the minimum at 100, or go slightly higher to 128 to match nginx? Thoughts?
