Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: request.Body.Read() can be called after client.Do() returns and resp.Body is drained/closed in go1.18 #51907

Open
liggitt opened this issue Mar 24, 2022 · 12 comments
Labels
NeedsInvestigation
Milestone

Comments

@liggitt
Copy link
Contributor

@liggitt liggitt commented Mar 24, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/liggitt/Library/Caches/go-build"
GOENV="/Users/liggitt/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/liggitt/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/liggitt/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/liggitt/.gvm/gos/go1.18"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/liggitt/.gvm/gos/go1.18/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/liggitt/go/src/k8s.io/kubernetes/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/7f/9xt_73f12xlby0w362rgk0s400kjgb/T/go-build2769859399=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Make a request with a resettable body, if the response is a 429, reset the body and repeat the request.

Reported in kubernetes/kubernetes#108906, standalone reproducer here:

package mytest

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

func TestSeek(t *testing.T) {
	testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		req.Body.Close()
		http.Error(w, "try again", http.StatusTooManyRequests)
	}))
	defer testServer.Close()

	client := &http.Client{}

	body := bytes.NewReader([]byte(strings.Repeat("abcd", 1000)))
	req, err := http.NewRequest("POST", testServer.URL, body)
	if err != nil {
		t.Fatal(err)
	}

	for i := 0; i < 1000; i++ {
		resp, err := client.Do(req)
		if err != nil {
			t.Fatal(err)
		}
		// drain and close the response body to complete the request before reusing the request body
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		body.Seek(0, 0)
	}
}

What did you expect to see?

Success, as in go1.17 and earlier releases

$ go version && go test -c -race my_test.go && stress ./mytest.test 

go version go1.17.8 darwin/amd64
72 runs so far, 0 failures
151 runs so far, 0 failures
214 runs so far, 0 failures
...
3084 runs so far, 0 failures
^C

What did you see instead?

Data race between the body reset and reads from the request body called from net/http.(*transferWriter).writeBody()

$ go version && go test -c -race my_test.go && stress ./mytest.test 

go version go1.18 darwin/amd64
73 runs so far, 0 failures
152 runs so far, 0 failures

/var/folders/7f/9xt_73f12xlby0w362rgk0s400kjgb/T/go-stress-20220325T130052-929070282
==================
WARNING: DATA RACE
Read at 0x00c00007ef78 by goroutine 13:
  bytes.(*Reader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/bytes/reader.go:41 +0x58
  io.(*nopCloser).Read()
      <autogenerated>:1 +0x76
  io.discard.ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:610 +0x91
  io.(*discard).ReadFrom()
      <autogenerated>:1 +0x5d
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:412 +0x1c2
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x64
  net/http.(*transferWriter).doBodyCopy()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transfer.go:411 +0x3e
  net/http.(*transferWriter).writeBody()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transfer.go:374 +0x7ef
  net/http.(*Request).write()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/request.go:698 +0x1197
  net/http.(*persistConn).writeLoop()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:2395 +0x2e4
  net/http.(*Transport).dialConn.func6()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:1751 +0x39

Previous write at 0x00c00007ef78 by goroutine 91:
  bytes.(*Reader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/bytes/reader.go:46 +0x11c
  io.(*nopCloser).Read()
      <autogenerated>:1 +0x76
  io.(*LimitedReader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:476 +0xc5
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:426 +0x28a
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x8d
  net.genericReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/net.go:662 +0x28
  net.(*TCPConn).readFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/tcpsock_posix.go:54 +0xac
  net.(*TCPConn).ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/tcpsock.go:130 +0x68
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:412 +0x1c2
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x86
  net/http.persistConnWriter.ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:17
…

/var/folders/7f/9xt_73f12xlby0w362rgk0s400kjgb/T/go-stress-20220325T130052-124085153
==================
WARNING: DATA RACE
Read at 0x00c0000a0f78 by goroutine 13:
  bytes.(*Reader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/bytes/reader.go:41 +0x58
  io.(*nopCloser).Read()
      <autogenerated>:1 +0x76
  io.discard.ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:610 +0x91
  io.(*discard).ReadFrom()
      <autogenerated>:1 +0x5d
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:412 +0x1c2
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x64
  net/http.(*transferWriter).doBodyCopy()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transfer.go:411 +0x3e
  net/http.(*transferWriter).writeBody()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transfer.go:374 +0x7ef
  net/http.(*Request).write()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/request.go:698 +0x1197
  net/http.(*persistConn).writeLoop()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:2395 +0x2e4
  net/http.(*Transport).dialConn.func6()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:1751 +0x39

Previous write at 0x00c0000a0f78 by goroutine 92:
  bytes.(*Reader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/bytes/reader.go:46 +0x11c
  io.(*nopCloser).Read()
      <autogenerated>:1 +0x76
  io.(*LimitedReader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:476 +0xc5
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:426 +0x28a
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x8d
  net.genericReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/net.go:662 +0x28
  net.(*TCPConn).readFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/tcpsock_posix.go:54 +0xac
  net.(*TCPConn).ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/tcpsock.go:130 +0x68
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:412 +0x1c2
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x86
  net/http.persistConnWriter.ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:17
…
213 runs so far, 2 failures
@aojea
Copy link

@aojea aojea commented Mar 24, 2022

shorter reproducer , EDIT, it seems different but it happens only in 1.18 too

func TestRequestWriteSeek(t *testing.T) {
	testServer := httptest.NewServer(HandlerFunc(func(w ResponseWriter, req *Request) {
		w.Write([]byte("200"))
	}))
	defer testServer.Close()

	client := &Client{}

	body := bytes.NewReader([]byte(strings.Repeat("abcd", 1000)))
	req, err := NewRequest("POST", testServer.URL, body)
	if err != nil {
		t.Fatal(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	body.Seek(0, 0)
	io.Copy(io.Discard, resp.Body)
}

race detected

../../../bin/go test . -run TestRequestWriteSeek -race -c
[aojea@juanan http]$ stress ./http.test -test.run TestRequestWriteSeek

/tmp/go-stress-20220324T161734-2258076405
==================
WARNING: DATA RACE
Write at 0x00c00011e468 by goroutine 31:
  bytes.(*Reader).Seek()
      /home/aojea/src/golang/go/src/bytes/reader.go:133 +0x456
  net/http_test.TestRequestWriteSeek()

removing w.Write([]byte("200")) , there is no race anymore

func TestRequestWriteSeek(t *testing.T) {
	testServer := httptest.NewServer(HandlerFunc(func(w ResponseWriter, req *Request) {
		// w.Write([]byte("200"))
	}))
	defer testServer.Close()

	client := &Client{}

	body := bytes.NewReader([]byte(strings.Repeat("abcd", 1000)))
	req, err := NewRequest("POST", testServer.URL, body)
	if err != nil {
		t.Fatal(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	body.Seek(0, 0)
	io.Copy(io.Discard, resp.Body)
}

$ ../../../bin/go test . -run TestRequestWriteSeek -race -c
$ stress ./http.test -test.run TestRequestWriteSeek
5s: 1894 runs so far, 0 failures
10s: 3803 runs so far, 0 failures
15s: 5723 runs so far, 0 failures
20s: 7591 runs so far, 0 failures

@mknyszek mknyszek added this to the Backlog milestone Mar 24, 2022
@mknyszek mknyszek added the NeedsInvestigation label Mar 24, 2022
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Mar 24, 2022

CC @neild

@liggitt
Copy link
Contributor Author

@liggitt liggitt commented Mar 24, 2022

shorter reproducer

TestRequestWriteSeek is not the same race we're seeing in kubernetes/kubernetes#108906. I don't see how the response writer would race with a Seek on the request body from the client side

@neild
Copy link
Contributor

@neild neild commented Mar 24, 2022

The HTTP client transport can continue to read from a request body after Client.Do returns, generally when the server returns response headers while still reading the request. The client will not read from the request body after the response body has been closed.

This is probably underdocumented in the net/http package documentation, but it's always been the case that the client may not be done with the request body when Do returns. I don't know what changed in go1.18 to affect the specific case here, but it's probably not a bug.

The fix here is to close the response body before reusing the request body; e.g.:

	if resp.StatusCode == http.StatusTooManyRequests {
		resp.Body.Close() // close the response body to complete the request before reusing the request body
		body.Seek(0, 0)
		return true
	}

@liggitt
Copy link
Contributor Author

@liggitt liggitt commented Mar 24, 2022

hrmm... that took longer to fail but still hit the same race between Read and Seek

package mytest

import (
	"bytes"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

func TestSeek(t *testing.T) {
	testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		defer req.Body.Close()
		w.Write([]byte("ok"))
	}))
	defer testServer.Close()

	client := &http.Client{}

	body := bytes.NewReader([]byte(strings.Repeat("abcd", 1000)))
	req, err := http.NewRequest("POST", testServer.URL, body)
	if err != nil {
		t.Fatal(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close() // close the response body to complete the request before reusing the request body
	body.Seek(0, 0)
}
go version && go test -c -race my_test.go && stress ./mytest.test 
go version go1.18 darwin/amd64
726 runs so far, 0 failures
1631 runs so far, 0 failures

/var/folders/7f/9xt_73f12xlby0w362rgk0s400kjgb/T/go-stress-20220324T151410-135951252
PASS
==================
WARNING: DATA RACE
Read at 0x00c00009ef78 by goroutine 13:
  bytes.(*Reader).Read()
      /Users/liggitt/.gvm/gos/go1.18/src/bytes/reader.go:41 +0x58
  io.(*nopCloser).Read()
      <autogenerated>:1 +0x76
  io.discard.ReadFrom()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:610 +0x91
  io.(*discard).ReadFrom()
      <autogenerated>:1 +0x5d
  io.copyBuffer()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:412 +0x1c2
  io.Copy()
      /Users/liggitt/.gvm/gos/go1.18/src/io/io.go:385 +0x64
  net/http.(*transferWriter).doBodyCopy()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transfer.go:411 +0x3e
  net/http.(*transferWriter).writeBody()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transfer.go:374 +0x7ef
  net/http.(*Request).write()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/request.go:698 +0x1197
  net/http.(*persistConn).writeLoop()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:2395 +0x2e4
  net/http.(*Transport).dialConn.func6()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:1751 +0x39

Previous write at 0x00c00009ef78 by goroutine 7:
  bytes.(*Reader).Seek()
      /Users/liggitt/.gvm/gos/go1.18/src/bytes/reader.go:133 +0x3c5
  command-line-arguments.TestSeek()
      /Users/liggitt/projects/go118race/my_test.go:31 +0x39b
  testing.tRunner()
      /Users/liggitt/.gvm/gos/go1.18/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /Users/liggitt/.gvm/gos/go1.18/src/testing/testing.go:1486 +0x47

Goroutine 13 (running) created at:
  net/http.(*Transport).dialConn()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:1751 +0x2555
  net/http.(*Transport).dialConnFor()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:1449 +0x13a
  net/http.(*Transport).queueForDial.func1()
      /Users/liggitt/.gvm/gos/go1.18/src/net/http/transport.go:1418 +0x47

Goroutine 7 (running) created at:
  testing.(*T).Run()
      /Users/liggitt/.gvm/gos/go1.18/src/testing/testing.go:1486 +0x724
  testing.runTests.func1()

@neild
Copy link
Contributor

@neild neild commented Mar 24, 2022

Interesting. That does look like a bug; nothing should continue reading from the request body after the response body has been closed.

@liggitt liggitt changed the title net/http: request.Body.Read() can be called after client.Do() returns in go1.18 net/http: request.Body.Read() can be called after client.Do() returns and resp.Body is drained/closed in go1.18 Mar 25, 2022
@liggitt
Copy link
Contributor Author

@liggitt liggitt commented Mar 25, 2022

Updated the issue description with a reproducer for the race occurring after client.Do() returns and resp.Body is drained/closed. It ~quickly fails with race issues when run with stress on go1.18 and I ran a few million requests on go1.17.8 without failures before I stopped it

@aojea
Copy link

@aojea aojea commented Mar 26, 2022

This is interesting, enabling the debugRoundTrip in the code

diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index e41b20a15b..e24a023322 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -2591,7 +2591,7 @@ func (pc *persistConn) roundTrip(req *transportRequest) (resp *Response, err err
                }
        }()
 
-       const debugRoundTrip = false
+       const debugRoundTrip = true
 
        // Write the request concurrently with waiting for a response,
        // in case the server decides to reply before reading our full
@@ -2678,6 +2678,7 @@ func (tr *transportRequest) logf(format string, args ...any) {
        if logf, ok := tr.Request.Context().Value(tLogKey{}).(func(string, ...any)); ok {
                logf(time.Now().Format(time.RFC3339Nano)+": "+format, args...)
        }
+       fmt.Printf(time.Now().Format(time.RFC3339Nano)+": "+format+"\n", args...)
 }
 
 // markReused marks this connection as having been successfully used for a

we can see that the write finish before the recv in a working test

../../../bin/go test . -run TestRequestWriteSeek -race -v
=== RUN   TestRequestWriteSeek
2022-03-26T00:31:44.459028363+01:00: writeErrCh resv: <nil>/<nil>
2022-03-26T00:31:44.459401112+01:00: resc recv: 0xc000124000, <nil>/<nil>
Seek 2022-03-26 00:31:44.459426908 +0100 CET m=+0.012935412
--- PASS: TestRequestWriteSeek (0.00s)

however, in a test with a race, the writer returns later than the reader, and races with the Seek ... adding a lot of printfs to the code we can see it

2022-03-26T01:09:20.19703448+01:00 doBodyCopy else
2022-03-26T01:09:20.197045168+01:00 doBodyCopy
2022-03-26T01:09:20.197128351+01:00 doBodyCopy error:<nil>
2022-03-26T01:09:20.197144657+01:00 doBodyCopy else2
2022-03-26T01:09:20.197485482+01:00 doBodyCopy
2022-03-26T01:09:20.198247622+01:00: resc recv: 0xc00019a900, <nil>/<nil>  <---- return recv
Seek 2022-03-26 01:09:20.198295419 +0100 CET m=+0.015794059                  <---- seek
2022-03-26T01:09:20.197873433+01:00 doBodyCopy error:<nil>                        <---- is still writing
2022-03-26T01:09:20.198424047+01:00 write body and trailer error <nil>
2022-03-26T01:09:20.198445516+01:00 write writer flush after write body
2022-03-26T01:09:20.198477622+01:00 write writer exit closed true
2022-03-26T01:09:20.198499581+01:00 write loop after write 4119

@neild can this be related to any changes with the buffers or with some of the optimizations like splice() ?

or is a legit race?

@aojea
Copy link

@aojea aojea commented Mar 26, 2022

this race happens in 1.17 too

$ go version
go version go1.17.6 linux/amd64
$ go test http_test.go -v -race
=== RUN   TestSeek
==================
WARNING: DATA RACE
Write at 0x00c0001a2020 by goroutine 7:

, if we increase the request body

package mytest

import (
	"bytes"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

func TestSeek(t *testing.T) {
	testServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		defer req.Body.Close()
		w.Write([]byte("ok"))
	}))
	defer testServer.Close()

	client := &http.Client{}

	body := bytes.NewReader([]byte(strings.Repeat("abcd", 10000000))) // <----
	req, err := http.NewRequest("POST", testServer.URL, body)
	if err != nil {
		t.Fatal(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close() // close the response body to complete the request before reusing the request body
	body.Seek(0, 0)
}

it seems that the difference is go1.18 is faster to read the response and exit

@liggitt
Copy link
Contributor Author

@liggitt liggitt commented Mar 26, 2022

this race happens in 1.17 too ...

ah, that's helpful to know (though disconcerting)

@aojea
Copy link

@aojea aojea commented Mar 26, 2022

the question is if the writers, to the body of the request or the response, should be cancellable.

@aojea
Copy link

@aojea aojea commented Mar 29, 2022

Interesting. That does look like a bug; nothing should continue reading from the request body after the response body has been closed.

@neild that is not what is happening from my understanding of the code, I can see that there are 2 independent goroutines for processing the request and the response, and the one that reads the request body can not be cancelled by closing the response body, actually I can't see how it can be cancelled by any means.

It is easy to test by providing a big buffer in the request #51907 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

4 participants