Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: performance collapse when http/2 requests wait for connection #34944

Open
rhysh opened this issue Oct 17, 2019 · 0 comments

Comments

@rhysh
Copy link
Contributor

@rhysh rhysh commented Oct 17, 2019

Does this issue reproduce with the latest release? / What version of Go are you using (go version)?

The code that causes this behavior is present in Go 1.13 and in tip, but #34941 shadows the bug in releases newer than Go 1.11—so for today, I'll demonstrate it with go1.11.13.

$ go1.11 version
go version go1.11.13 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go1.11 env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPROXY=""
GORACE=""
GOROOT="/Users/rhys/go/version/go1.11"
GOTMPDIR=""
GOTOOLDIR="/Users/rhys/go/version/go1.11/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build546431991=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

In the test, I set up an http/2 server/client pair, limited the number of TCP connections that could carry HTTP requests and started a large number of HTTP requests. I then measured how long it took to create and cancel one additional request.

What did you expect to see?

I expected the speed of creating and canceling additional requests to not vary based on the number of other HTTP requests waiting on the TCP connection.

What did you see instead?

Creating and canceling an HTTP request gets slow as a function of how many requests are waiting for the same TCP connection when HTTP/2 is active.

The code in net/http/h2_bundle.go that bridges between
the channel for canceling a single request and the sync.Cond that guards the TCP connection (net/http.http2ClientConn.awaitOpenSlotForRequest) responds to request cancelation by waking up every goroutine that's waiting to use the TCP connection (with sync.Cond.Broadcast). Each of those goroutines in sequence will acquire the lock on the *http2ClientConn to check if there's room to send another request.

On top of that, the contention on the sync.Mutex protecting a single connection results in a slowdown on the sync.Mutex protecting the Transport's HTTP/2 connection pool when *http2clientConnPool.getClientConn calls cc.idleState while holding the pool's lock.


In the reproducer, the baseline speed of creating and canceling an HTTP/2 request is 1ms since the test waits that long before canceling to give the RoundTrip goroutine time to find the TCP connection in the pool and start waiting for an available slot.

When there are a small number of outstanding requests (100 or 200), creating and canceling an additional request takes about 1.3ms: that baseline of 1ms plus 300µs of actual time.

As the number of outstanding requests grows past the capacity of the single TCP connection (the default Go http/2 server sets that to 250 HTTP requests), creating and canceling a request wakes up more and more goroutines. The cost of this is still small with 1600 idle requests (1.1ms over the 1ms baseline), but with 6400 idle requests it's grown to 5.9ms over the 1ms baseline. With 100k idle requests, the cost to cancel one is nearly one second of work.

With N idle requests that all time out / are canceled, the total cost is O(N^2). The cost should be O(N).

name                          time/op
QueuedRequests/idle=100-8     1.35ms ± 4%
QueuedRequests/idle=200-8     1.32ms ± 6%
QueuedRequests/idle=400-8     1.41ms ± 5%
QueuedRequests/idle=800-8     1.62ms ±12%
QueuedRequests/idle=1600-8    2.08ms ±15%
QueuedRequests/idle=3200-8    2.92ms ±43%
QueuedRequests/idle=6400-8    6.88ms ±61%
QueuedRequests/idle=12800-8   13.4ms ±19%
QueuedRequests/idle=25600-8   31.1ms ±18%
QueuedRequests/idle=51200-8   87.6ms ±16%
QueuedRequests/idle=102400-8   764ms ±71%
package repro

import (
	"context"
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

func withHTTP2Server(h http.Handler, do func(s *httptest.Server)) {
	s := httptest.NewUnstartedServer(h)
	s.TLS = &tls.Config{
		NextProtos: []string{"h2"},
	}
	s.Config.ErrorLog = log.New(ioutil.Discard, "", 0) // swallow the "bad certificate" log lines
	s.StartTLS()
	defer s.Close()

	transport := s.Client().Transport.(*http.Transport)
	clientConfig := transport.TLSClientConfig
	transport.TLSClientConfig = nil

	// make a request to trigger HTTP/2 autoconfiguration
	resp, err := s.Client().Get(s.URL)
	if err == nil {
		resp.Body.Close()
	}
	// now allow the client to connect to the ad-hoc test server
	transport.TLSClientConfig.RootCAs = clientConfig.RootCAs

	do(s)
}

func BenchmarkQueuedRequests(b *testing.B) {
	testcase := func(idleRequestCount int) func(b *testing.B) {
		return func(b *testing.B) {
			allow := make(chan struct{})
			var starts int64
			h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
				if !r.ProtoAtLeast(2, 0) {
					b.Errorf("Request is not http/2: %q", r.Proto)
					return
				}
				atomic.AddInt64(&starts, 1)
				<-allow
			})

			withHTTP2Server(h, func(s *httptest.Server) {
				// We're looking at how http/2 requests get queued on a single
				// connection. Force use of a single connection so we can examine
				// that queuing.
				s.Client().Transport.(*http.Transport).MaxConnsPerHost = 1

				ctx := context.Background()
				ctx, cancel := context.WithCancel(ctx)
				defer cancel()

				// Set up some (variable) number of idle/outstanding requests

				var wg sync.WaitGroup
				for i := 0; i < idleRequestCount; i++ {
					req, err := http.NewRequest("GET", s.URL, nil)
					if err != nil {
						b.Fatalf("NewRequest: %s", err)
					}
					wg.Add(1)
					go func() {
						defer wg.Done()
						ctx, cancel := context.WithCancel(ctx)
						defer cancel()
						req = req.WithContext(ctx)
						resp, err := s.Client().Do(req)
						if err != nil {
							return
						}
						resp.Body.Close()
					}()
				}

				// Allow requests to settle
				time.Sleep(time.Second)

				// Measure how quickly we can create and cancel a marginal request
				// on the contended http/2 connection.

				b.ResetTimer()
				for i := 0; i < b.N; i++ {

					req, err := http.NewRequest("GET", s.URL, nil)
					if err != nil {
						b.Fatalf("NewRequest: %s", err)
					}

					ctx, cancel := context.WithTimeout(ctx, time.Millisecond)
					req = req.WithContext(ctx)

					resp, err := s.Client().Do(req)
					if err == nil {
						resp.Body.Close()
					}
					cancel()
				}
				b.StopTimer()

				close(allow)
				cancel()
				wg.Wait()
			})
		}
	}

	for idle := 100; idle < 120000; idle *= 2 {
		b.Run(fmt.Sprintf("idle=%d", idle), testcase(idle))
	}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.