Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Transport race condition by Content-Length == 0 response #41600

Open
akihiro opened this issue Sep 24, 2020 · 9 comments
Open

net/http: Transport race condition by Content-Length == 0 response #41600

akihiro opened this issue Sep 24, 2020 · 9 comments
Labels
Milestone

Comments

@akihiro
Copy link

@akihiro akihiro commented Sep 24, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/hiroaki-m/.cache/go-build"
GOENV="/home/hiroaki-m/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/hiroaki-m/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/hiroaki-m/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build121864325=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type handler struct{}

func (h handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Content-Length", "0")
}

func requestAndCancel(url string) {
	for {
		wait := make(chan struct{})
		done := make(chan struct{})
		ctx, cancel := context.WithCancel(context.Background())
		req, _ := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		go func() {
			close(wait)
			http.DefaultTransport.RoundTrip(req)
			close(done)
		}()
		<-wait
		cancel()
		<-done
	}
}

func main() {
	ts := httptest.NewServer(&handler{})
	defer ts.Close()

	http.DefaultTransport.(*http.Transport).MaxConnsPerHost = 1

	// request and cancel
	for i := 0; i < 8; i++ {
		go requestAndCancel(ts.URL)
	}

	// request only
	for {
		req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
		if _, err := http.DefaultTransport.RoundTrip(req); err != nil {
			fmt.Printf("An unexpected error has occurred: %#v\n", err)
			break
		}
	}
}

I'm gussing that the keep-alived tcp connection is used simultaneously by multiple goroutine.
Early return to connection pool https://go.googlesource.com/go/+/refs/tags/go1.15.2/src/net/http/transport.go#2089

What did you expect to see?

No error response and loop infinity.

What did you see instead?

$ date -u; go run context_cancel_race.go ; date -u
Thu 24 Sep 2020 03:28:55 AM UTC
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
Thu 24 Sep 2020 03:30:00 AM UTC
@akihiro akihiro changed the title net/http: Transport race condition by Context-Length == 0 response net/http: Transport race condition by Content-Length == 0 response Sep 24, 2020
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Sep 25, 2020

Take a look at how many connections you have in TIME_WAIT when it fails.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Sep 25, 2020

Looking at your logic there is nothing to prevent this sequence

                // step 1
		go func() {
                         // step 4
			close(wait)
                         // step 5, req is already cancelled.
			http.DefaultTransport.RoundTrip(req)
			close(done)
		}()
                // step 2
		<-wait
                // step 3
		cancel()
		<-done
@akihiro
Copy link
Author

@akihiro akihiro commented Sep 25, 2020

Take a look at how many connections you have in TIME_WAIT when it fails.

$ go run context-test.go; ss -atn > case1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case1.log | wc -l
4096

I set tcp_max_tw_buckets to 65536 for relax limitation of a number of TIME-WAIT sockets.

# echo 65536  >  /proc/sys/net/ipv4/tcp_max_tw_buckets

I test again.

$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60
$ sleep 120 # wait for transition TIME-WAIT to CLOSED 
$ go run context-test.go; ss -atn > case2-1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-1.log | wc -l
8243
$ sleep 120
$ go run context-test.go; ss -atn > case2-2.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-2.log | wc -l
1495
$ sleep 120
$ go run context-test.go; ss -atn > case2-3.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-3.log | wc -l
14615
$ sleep 120
$ go run context-test.go; ss -atn > case2-4.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-4.log | wc -l
9449
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Sep 26, 2020

Sorry about the wild goose chase. I have narrowed down the issue. Still trying to understand how its happening but I do understand why you get the error.
The connection handed to a request, not just the non-cancelling one is already broken, closed and cancelled. Now the question is why this connection actually exists in the pool.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Sep 26, 2020

I am going to have to think about this. There doesn't seem to be an easy fix. The goroutine which reads the response, puts the connection back into the pool and then sends the response back via a channel to roundTrip(). However, the roundTrip() is also watching for contexts that are Done() and will cancel the connection. The connection is already in the idleConn pool. Turns out a request will get this connection before the previous one is actually done so the connection is sometimes broken out of the pool and sometimes healthy but then breaks later.

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Sep 26, 2020

Here is a minimal testcase to reproduce the issue:

package http_test

import (
	"context"
	"net/http"
	"net/http/httptest"
	"sync"
	"testing"
)

func Test41600(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		w.Header().Add("Content-Length", "0")
	}))
	defer ts.Close()

	client := ts.Client()
	transport := client.Transport.(*http.Transport)
	transport.MaxIdleConns = 1
	transport.MaxConnsPerHost = 1

	var wg sync.WaitGroup

	ctx, cancel := context.WithCancel(context.Background())

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ctx.Err() == nil {
				reqctx, reqcancel := context.WithCancel(context.Background())
				go reqcancel()
				req, _ := http.NewRequestWithContext(reqctx, http.MethodGet, ts.URL, nil)
				rsp, err := client.Do(req)
				if err == nil {
					defer rsp.Body.Close()
				}
			}
		}()
	}

	for {
		req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
		if rsp, err := client.Do(req); err != nil {
			t.Errorf("unexpected: %p %v", req, err)
			break
		} else {
			rsp.Body.Close()
		}
	}

	cancel()
	wg.Wait()
}

@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Sep 26, 2020

The problem is easier to solve than I thought. The race is close to what is described but slightly incorrect. This problem can occur with all request but its easiest to reproduce with a HEAD/no response body request.
The readLoop for request 1 will put the connection back into the idle pool. Request 2 will pick it up and cancel it. Request 1's roundTrip will be notified that the connection has closed because it hasn't received the response. Once it puts the connection back, it should no longer care about any connection issue since a response will be coming.

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 26, 2020

Change https://golang.org/cl/257818 mentions this issue: net/http: ignore connection closes once done with the connection

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 26, 2020

cc @odeke-em as possible reviewer :-)

@cagedmantis cagedmantis added this to the Backlog milestone Sep 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.