Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: metastability in Transport pool for TLS+HTTP/1.1 #63404

Open
rhysh opened this issue Oct 5, 2023 · 0 comments
Open

net/http: metastability in Transport pool for TLS+HTTP/1.1 #63404

rhysh opened this issue Oct 5, 2023 · 0 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Oct 5, 2023

What version of Go are you using (go version)?

$ go1.21 version
go version go1.21.2 darwin/arm64

Does this issue reproduce with the latest release?

Yes, this is present in the Go 1.21 series and in the development branch.

What operating system and processor architecture are you using (go env)?

I see this on linux/amd64, and it's easiest to reproduce there. The reproducer isn't very reliable on my darwin/arm64 (M1) laptop.

What did you do?

The complete application includes several layers of Go programs, connected over the network with TLS-encrypted HTTP/1.1 calls. The Go programs that make up each layer use net/http.Server to receive HTTP/1.1 requests over the network, use the context.Context value associated with the http.Request for function calls within the app, and eventually attach a context.Context value derived from the inbound request to make outbound HTTP/1.1 requests with a http.Transport.

The reproducer I've written includes three layers: a client, a middle (both a client and server), and a backend (server only).

Calls to dependencies take time for network and processing, and sometimes that total duration is within the expectations of the immediate caller but is slower than the top-level client would like. The top-level client can indicate that they no longer want the results of the call by canceling the Context value they used for the http.Request. For requests made with HTTP/1.1, http.Transport will communicate that cancellation to its peer by closing the connection.

When an http.Server sees that the peer has closed the connection on which it received an HTTP/1.1 request, the http.Server will cancel the Context that it attached to the inbound http.Request, indicating that the http.Handler may stop working on the request. For the applications in question, the work done in the http.Handler includes outbound HTTP calls to sub-dependencies.

At this point, we get to the http.Transport connection pool.

The pool is a cache of TCP+TLS connections. If a connection is available in the pool, the program can use it to make an HTTP request at a cost of very little on-CPU time. If there's no connection available in the pool, making that same HTTP request involves much more on-CPU time since it triggers establishment of a new connection. For HTTPS requests, that's a TCP handshake (mostly network latency) followed by a TLS handshake (similar network latency, but also substantial on-CPU time). The on-CPU time cost of making a request is bimodal.

Without admission control to limit the amount of work a middle-layer app instance accepts, or to limit the concurrency of the http.Transport's TLS handshaking, the app can end up spending so much of its available on-CPU time in TLS handshakes that it creates its own timeouts through the exhaustion of its processing power. Dialing a new TCP+TLS+HTTP/1.1 connection involves several steps on several goroutines, and that work is scheduled onto software threads in a fair / round-robin sort of way, rather than prioritizing (among the handshaking goroutines) the handshakes that have the least amount of work left to complete.

Each handshake ends up taking a larger amount of wall-clock time because scarcity in on-CPU time means the involved goroutines need to wait in the runnable state for their turn to execute. That wall-clock time can end up being larger than the app's own expectations for how long its outbound calls should take, leading to the app's internal timeouts triggering and canceling the in-flight handshakes. (Or the app's remote caller can decide the timeout; the effect is the same.)

When the http.Transport's connection pool is full, the app works well and so it stays working well. When the pool is empty, the app works poorly (spending a huge amount of time on TLS handshakes) and so it stays working poorly. With two stable states, the system is metastable.

What did you expect to see?

I expected that basic use of net/http would not have such sharp edges. The key ingredients are:

  • Use HTTP/1.1 for outbound calls
  • Encrypt those calls with TLS
  • Thread through Context values
  • At some level of the system, have a timeout / cancellation

These are apps that disable HTTP/2 for outbound calls (a separate challenge), and which contact some dependencies that do not enable HTTP/2, and which usually set MaxIdleConnsPerHost to something much larger than the default of 2. But otherwise, it looks to me like the apps are built with the encouraged best practices (including threading through Context values).

Admission control (don't accept additional work when the app is already overloaded) is part of the complete response to this problem, but that involves work far beyond what's provided in the core Go release. (The other half of the solution for us is to remove the blockers that have kept us from using HTTP/2, so we can move away from HTTP/1.1 -- but that also only works for places where we control the dependency and its load balancer.)

What did you see instead?

Go apps that use HTTP/1.1 for outbound calls, which encrypt those calls with TLS, and which either have timeouts on those calls or thread through Context values that can be canceled as a result of a timeout experience metastability, where a large fraction of their on-CPU time ends up going to re-establishing TLS connections which are immediately discarded because CPU overload in the process causes timeouts.

I've included a reproducer below. It allows sweeping through two major dimensions -- the speed at which requests enter the system and the fraction of requests that will experience timeouts at the backend -- and enabling/disabling HTTP/2.

The first step to using the reproducer is to sweep through the request arrival rate with 3% injected timeouts to find one where less than 50% of calls succeed. Repeat with 0.3% injected timeouts to confirm that only around 0.3% of calls fail.

Then use that arrival rate as fixed while varying the injected failure rate and (if you like) whether the TLS-encrypted requests use HTTP/2.

The test framework passes the -test.cpuprofile and -test.trace flags through to the middle and backend processes.

On my hardware (recent Intel, 8 hardware threads), the tipping point is around 60µs between requests. With that arrival interval (see collapsed output below for details), the "middle" application can keep its pool full of TLS connections if 0% or 0.3% of its calls time out. But if 1% or more of the calls time out, it's no longer able to keep up with demand and more than half of the calls the frontend makes to that middle layer end up timing out. This situation is better with HTTP/2, since canceling an HTTP/2 request does not require destroying the TLS connection.

CC @neild

use of reproducer
$ go version ./pool-121 
./pool-121: go1.21.2

$ ./pool-121 -test.v -test.run='/h2=false,fail=0.03.*d=' | egrep '(RUN.*/|success)'
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=250µs
    pool_test.go:380: success 9692
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=210µs
    pool_test.go:380: success 9696
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=180µs
    pool_test.go:380: success 9698
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=150µs
    pool_test.go:380: success 9691
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=120µs
    pool_test.go:380: success 9702
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=100µs
    pool_test.go:380: success 9687
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=90µs
    pool_test.go:380: success 9677
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=80µs
    pool_test.go:380: success 5858
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=70µs
    pool_test.go:380: success 4156
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=60µs
    pool_test.go:380: success 182
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=50µs
    pool_test.go:380: success 2510

$ ./pool-121 -test.v -test.run='/h2=false,fail=.*d=60' | egrep '(RUN.*/|success)'
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.000000,d=60µs
    pool_test.go:380: success 10000
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.003000,d=60µs
    pool_test.go:380: success 9977
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.010000,d=60µs
    pool_test.go:380: success 4086
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.030000,d=60µs
    pool_test.go:380: success 674
=== RUN   TestHTTPPoolMetastability/h2=false,fail=0.100000,d=60µs
    pool_test.go:380: success 66

$ ./pool-121 -test.v -test.run='/h2=true,fail=.*d=60' | egrep '(RUN.*/|success)'
=== RUN   TestHTTPPoolMetastability/h2=true,fail=0.000000,d=60µs
    pool_test.go:380: success 10000
=== RUN   TestHTTPPoolMetastability/h2=true,fail=0.003000,d=60µs
    pool_test.go:380: success 9968
=== RUN   TestHTTPPoolMetastability/h2=true,fail=0.010000,d=60µs
    pool_test.go:380: success 9911
=== RUN   TestHTTPPoolMetastability/h2=true,fail=0.030000,d=60µs
    pool_test.go:380: success 9708
=== RUN   TestHTTPPoolMetastability/h2=true,fail=0.100000,d=60µs
    pool_test.go:380: success 9022
./pool_test.go
// This is a demonstration of metastability in Go's HTTP/1.1 client connection
// pool, especially when used for HTTPS requests. When the pool is full of
// usable connections, it's cheap to make a request. When that cache is empty,
// dialing a new connection and then adding TLS is expensive.
//
// If for some reason requests get canceled, such as when the backend server is
// slow, Go is eager to pass that cancellation signal on to the remote peer.
// With HTTP/1.1, the only was to communicate that message is to close the TCP
// connection. That creates a hole in the connection pool that soon needs to be
// filled with a costly dial operation.
//
// Furthermore, when a process spends its time on TLS handshakes, that CPU time
// is not available for the program's other work. That can slow down responses
// even more, leading to more timeouts, more canceled requests and closed
// connections, and even more time spend establishing new TLS connections. This
// is metastability.
//
// How to use this demo:
//   - Fiddle with whether HTTP/2 is enabled or not
//   - Fiddle with the fraction of requests that time out (slow server)
//   - Fiddle with the arrival rate of requests (so we operate in the realm of overload)
//
// The subtests help with that. Try controlling all variables except for one:
//   - go test -v -run='TestHTTPPoolMetastability/h2=false,fail=0.003.*,d=.*'
//   - go test -v -run='TestHTTPPoolMetastability/h2=false,fail=.*,d=600.*'
package main

import (
	"context"
	"crypto/tls"
	"flag"
	"fmt"
	"io"
	"log"
	"math/rand"
	"net"
	"net/http"
	"net/http/httptest"
	"net/url"
	"os"
	"os/exec"
	"runtime"
	"strings"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

var (
	backend = flag.String("backend", "", "Run as the backend server listing on the specified address")
	middle  = flag.String("middle", "", "Run as the middle server listening on the specified address")
	http2   = flag.Bool("http2", false, "Attempt HTTP2")
	fail    = flag.Float64("fail", 0.1, "Fraction of requests to fail")
)

// TestRunMiddle starts a server that receives incoming plaintext HTTP requests
// (TCP+HTTP), passing them along to a backend using HTTPS (TCP+TLS+HTTP).
func TestRunMiddle(t *testing.T) {
	if *middle == "" {
		t.Skip()
	}

	ctx := context.Background()
	addr := *middle
	backend := *backend
	forceAttemptHTTP2 := *http2

	runtime.GOMAXPROCS(1)

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	l, err := net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}
	defer l.Close()

	transport := &http.Transport{
		MaxIdleConnsPerHost: 1000,
		TLSClientConfig:     &tls.Config{InsecureSkipVerify: true},
		ForceAttemptHTTP2:   forceAttemptHTTP2,
	}
	client := &http.Client{Transport: transport}

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		if r.URL.Path == "/shutdown" {
			cancel()
			return
		}
		if r.URL.Path == "/health" {
			return
		}

		req, err := http.NewRequestWithContext(ctx, "GET", "https://"+backend, nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		resp, err := client.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer resp.Body.Close()

		_, err = io.Copy(io.Discard, resp.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		if (resp.ProtoMajor == 2) != forceAttemptHTTP2 {
			http.Error(w, "desire for HTTP2 did not match outcome", http.StatusInternalServerError)
			return
		}

		w.WriteHeader(http.StatusOK)
	}))
	srv.Listener = l
	srv.Config.ErrorLog = log.New(io.Discard, "", 0)
	srv.Start()
	<-ctx.Done()
	srv.Close()
}

// TestRunBackend starts a server that receives incoming HTTPS requests
// (TCP+TLS+HTTP) and spends some amount of time "processing" them. A fraction
// of requests take much longer than the others -- long enough to exceed the
// caller's desired timeout.
func TestRunBackend(t *testing.T) {
	if *backend == "" || *middle != "" {
		t.Skip()
	}

	ctx := context.Background()
	addr := *backend
	failFraction := *fail

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	l, err := net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}
	defer l.Close()

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/shutdown" {
			cancel()
			return
		}
		if r.URL.Path == "/health" {
			return
		}
		// All requests take some "processing time"
		time.Sleep(100*time.Microsecond + time.Duration(randFloat64()*float64(1*time.Millisecond)))
		if randFloat64() < failFraction {
			// Some take "too long" to process, exceeding the caller's timeout
			time.Sleep(50 * time.Millisecond)
		}
	}))
	srv.Listener = l
	srv.EnableHTTP2 = true
	srv.Config.ErrorLog = log.New(io.Discard, "", 0)
	srv.StartTLS()
	<-ctx.Done()
	srv.Close()
}

var randFloat64 = func() func() float64 {
	var (
		mu sync.Mutex
		// use a private re-seeded pseudo-random number source. As of Go 1.20,
		// math/rand's global source is always re-seeded. Doing it explicitly
		// here decreases the differences between running this test with Go 1.19
		// versus 1.20.
		rng = rand.New(rand.NewSource(time.Now().UnixNano()))
	)

	return func() float64 {
		mu.Lock()
		defer mu.Unlock()
		return rng.Float64()
	}
}()

func init() {
	log.SetFlags(log.LUTC | log.Ldate | log.Ltime | log.Lmicroseconds)
}

const (
	wholeTestTimeout = 100 * time.Second

	workerCount = 1000
	jobCount    = 10000
)

func TestHTTPPoolMetastability(t *testing.T) {
	if *backend != "" || *middle != "" {
		t.Skip()
	}

	testcase := func(interval time.Duration, fail float64, http2 bool) func(t *testing.T) {
		getAddr := func() string {
			l, err := net.Listen("tcp", "localhost:0")
			if err != nil {
				t.Fatalf("net.Listen: %v", err)
			}
			defer l.Close()
			return l.Addr().String()
		}

		var (
			backendAddr = getAddr()
			middleAddr  = getAddr()
		)

		return func(t *testing.T) {
			exe, err := os.Executable()
			if err != nil {
				t.Fatalf("os.Executable; err = %v", err)
			}

			ctx := context.Background()
			ctx, cancel := context.WithTimeout(ctx, wholeTestTimeout)
			defer cancel()

			withProfileArgs := func(suffix string, args []string) []string {
				allArgs := append([]string(nil), args...)
				flag.Visit(func(f *flag.Flag) {
					if strings.HasSuffix(f.Name, "profile") || f.Name == "trace" || f.Name == "test.trace" {
						allArgs = append(allArgs, fmt.Sprintf("-%s=%s%s", f.Name, f.Value, suffix))
					} else if strings.Contains(f.Name, "profile") || f.Name == "outputdir" || f.Name == "test.outputdir" {
						allArgs = append(allArgs, fmt.Sprintf("-%s=%s", f.Name, f.Value))
					}
				})
				return allArgs
			}

			backend := exec.CommandContext(ctx, exe,
				withProfileArgs(fmt.Sprintf("-backend-%s", url.PathEscape(t.Name())),
					[]string{
						fmt.Sprintf("-backend=%s", backendAddr),
						fmt.Sprintf("-fail=%f", fail),
					},
				)...)
			backend.Stdout = os.Stdout
			backend.Stderr = os.Stderr

			middle := exec.CommandContext(ctx, exe,
				withProfileArgs(fmt.Sprintf("-middle-%s", url.PathEscape(t.Name())),
					[]string{
						fmt.Sprintf("-middle=%s", middleAddr),
						fmt.Sprintf("-http2=%t", http2),
						fmt.Sprintf("-backend=%s", backendAddr),
					},
				)...)
			middle.Stdout = os.Stdout
			middle.Stderr = os.Stderr

			backend.Start()
			middle.Start()

			transport := &http.Transport{
				MaxIdleConnsPerHost: workerCount,
				TLSClientConfig:     &tls.Config{InsecureSkipVerify: true},
			}
			client := &http.Client{Transport: transport}

			get := func(ctx context.Context, url string) error {
				req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
				if err != nil {
					return fmt.Errorf("NewRequestWithContext: %w", err)
				}
				resp, err := client.Do(req)
				if err != nil {
					return fmt.Errorf("client.Do: %w", err)
				}
				defer resp.Body.Close()

				_, err = io.Copy(io.Discard, resp.Body)
				if err != nil {
					return fmt.Errorf("io.Copy: %w", err)
				}

				if resp.StatusCode != http.StatusOK {
					return fmt.Errorf("status code %d", resp.StatusCode)
				}

				return nil
			}

			waitUp := func(ctx context.Context, url string) error {
				var err error
				for i := 0; i < 10 && ctx.Err() == nil; i++ {
					callCtx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
					err = get(callCtx, url)
					cancel()
					if err == nil {
						return nil
					}
					time.Sleep((2 << i) * time.Millisecond)
				}
				return err
			}

			err = waitUp(ctx, fmt.Sprintf("http://%s/health", middleAddr))
			if err != nil {
				t.Fatalf("middle did not come up: %v", err)
			}
			err = waitUp(ctx, fmt.Sprintf("https://%s/health", backendAddr))
			if err != nil {
				t.Fatalf("backend did not come up: %v", err)
			}

			// start by making several requests with a generous timeout to
			// establish the connection pool

			var wg sync.WaitGroup
			for i := 0; i < 10; i++ {
				wg.Add(1)
				go func() {
					defer wg.Done()
					callCtx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
					get(callCtx, fmt.Sprintf("http://%s/", middleAddr))
					cancel()
				}()
			}
			wg.Wait()

			// ready to go!

			work := make(chan struct{})
			var success int64
			for i := 0; i < workerCount; i++ {
				wg.Add(1)
				go func() {
					defer wg.Done()
					for range work {
						callCtx, cancel := context.WithTimeout(ctx, 30*time.Millisecond)
						err := get(callCtx, fmt.Sprintf("http://%s/", middleAddr))
						cancel()
						if err == nil {
							atomic.AddInt64(&success, 1)
						}
					}
				}()
			}

			for i := 0; i < jobCount; i++ {
				work <- struct{}{}
				time.Sleep(interval)
			}
			close(work)
			wg.Wait()
			t.Logf("success %d", atomic.LoadInt64(&success))

			// cleanup time

			get(ctx, fmt.Sprintf("http://%s/shutdown", middleAddr))
			get(ctx, fmt.Sprintf("https://%s/shutdown", backendAddr))

			backend.Wait()
			middle.Wait()
		}
	}

	for _, d := range []time.Duration{
		250 * time.Microsecond,
		210 * time.Microsecond,
		180 * time.Microsecond,
		150 * time.Microsecond,
		120 * time.Microsecond,
		100 * time.Microsecond,
		90 * time.Microsecond,
		80 * time.Microsecond,
		70 * time.Microsecond,
		60 * time.Microsecond,
		50 * time.Microsecond} {
		for _, http2 := range []bool{false, true} {
			for _, fail := range []float64{0.000, 0.003, 0.010, 0.030, 0.100} {
				t.Run(fmt.Sprintf("h2=%t,fail=%f,d=%s", http2, fail, d), testcase(d, fail, http2))
			}
		}
	}
}
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 6, 2023
@seankhliao seankhliao added this to the Unplanned milestone Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

2 participants