Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: investigate data race in Transport Request.Body reuse when GetBody is not defined #25009

Closed
ictar opened this issue Apr 23, 2018 · 69 comments

Comments

@ictar
Copy link

@ictar ictar commented Apr 23, 2018

What version of Go are you using (go version)?

$ go version
go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/test/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GORACE=""
GOROOT="/home/test/go"
GOTMPDIR=""
GOTOOLDIR="/home/test/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build614182611=/tmp/go-build -gno-record-gcc-switches"

and nginx version: nginx/1.13.3

What did you do?

Here is my client:

package main

import (
    "fmt"
    "net/http"
    "sync"
    "time"
    "errors"
    "bytes"
    "io/ioutil"
)

type Client struct {
    urlBase    string
    httpClient *http.Client
}

func NewClient(url string) *Client {
    cc := &Client{
        urlBase:    url,
        httpClient: &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: 100, MaxIdleConns: 100, IdleConnTimeout: 90 * time.Second, ExpectContinueTimeout: 5 * time.Second}},
    }
    cc.httpClient.Timeout = 5 * time.Second
    return cc
}

func (self *Client) Send(content []byte) error {
    if content == nil {
        return errors.New("nil content")
    }

    body := &bytes.Buffer{}
    body.Write(content)

    begin := time.Now()
    req, err := http.NewRequest("POST", self.urlBase, body)
    if err != nil {
        return fmt.Errorf("Send %+v (from %v to %v)", err, begin, time.Now())
    }
    req.Header = make(http.Header)
    req.Header.Set("Content-Type", "application/json")
    begin = time.Now()
    resp, err := self.httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("Send %+v (from %v to %v)", err, begin, time.Now())
    }
    defer resp.Body.Close()

    if resp.StatusCode >= http.StatusBadRequest {
        bs, _ := ioutil.ReadAll(resp.Body)
        return fmt.Errorf("Send : HTTP : %d , BODY : %s , REQ : %s", resp.StatusCode, string(bs), string(content))
    }

    bs, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return fmt.Errorf("Send : ERR : %v", err)
    }
    fmt.Println("Send : Resp : ", string(bs))
    return nil
}

func main() {
    client := NewClient(
        "https://test.cn", // I use /etc/hosts to bind the domain
    )
    var content = []byte(`{"test":1}`)
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            client.Send(content)
        }()
    }
    wg.Wait()
}

And here is my nginx configure of http2:

    http2_max_concurrent_streams 40960;
    http2_max_requests 3;
    http2_max_field_size  8k;
    http2_max_header_size  32k;
    http2_recv_buffer_size 1024k;
    http2_recv_timeout  120s;

What did you expect to see?

I expect all requests are successfully.

What did you see instead?

But I got 400 occasionally and found client prematurely closed stream: only 0 out of 10 bytes of request body received line in nginx log.

I used GODEBUG to try to figure out, and found data not sent after END_HEADERS

2018/04/23 18:32:25 http2: Transport encoding header ":method" = "POST"
2018/04/23 18:32:25 http2: Transport encoding header ":path" = "/"
2018/04/23 18:32:25 http2: Transport encoding header ":scheme" = "https"
2018/04/23 18:32:25 http2: Transport encoding header "content-type" = "application/json"
2018/04/23 18:32:25 http2: Transport encoding header "content-length" = "10"
2018/04/23 18:32:25 http2: Transport encoding header "accept-encoding" = "gzip"
2018/04/23 18:32:25 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2018/04/23 18:32:25 http2: Framer 0xc4205480e0: wrote HEADERS flags=END_HEADERS stream=5 len=8
2018/04/23 18:32:25 http2: Framer 0xc4205480e0: wrote DATA flags=END_STREAM stream=5 len=0 data=""

That's the reason why nginx responsed 400 directly.

@agnivade agnivade changed the title net/http2 doesn't send body while content-length is not zero which cause 400 in nginx net/http: http2 client doesn't send body while content-length is not zero which causes 400 in nginx Apr 23, 2018
@agnivade agnivade added this to the Go1.11 milestone Apr 23, 2018
@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented Apr 23, 2018

Does this happen only with http2 ?

Is it possible to for you to write a quick http2 server in Go and make this entire issue to be self-contained and fully reproducible ?

@ictar

This comment has been minimized.

Copy link
Author

@ictar ictar commented Apr 24, 2018

@agnivade Yes, it's fine with http1.1. Here is a quite simple server:

package main

import (
    "fmt"
    "log"
    "net/http"
)

func myHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello there!\n")
}

func main() {
    http.HandleFunc("/", myHandler)
    log.Fatal(http.ListenAndServe(":18800", nil))
}

And nginx is needed.

@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented Apr 24, 2018

And nginx is needed.

I don't understand. The whole point of having a Go server was to have the repro self-contained. If this is a client issue, why do we need nginx ? Is the Go client-server not enough to cause the repro ?

@ictar

This comment has been minimized.

Copy link
Author

@ictar ictar commented Apr 24, 2018

@agnivade Nginx is needed because it will check the body length against the header content-length. That's why nginx reponse 400 Bad Request directly. I don't think this is nginx issue because according to the tcpdump and GODEBUG, no body is sent indeed

And please note that in order to make this issue reproducible, http2_max_requests is set to 3 for nginx.

@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented Apr 24, 2018

The point of having a Go server is to allow us to reproduce the issue without nginx. You can simply add the equivalent check in the Go server so that it's behavior is same as nginx.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented May 12, 2018

Thank you @ictar for the report, we'll investigate it.

@agnivade thank you for handling this. I just wanted to add that I can relate to @ictar's need to have nginx for a repro(many companies that I know heavily use it) and that the question of absolving the issue only because Go's servers can't reproduce it might be tricky. This is because many companies will usually put nginx or some other web server as their frontend to their web apps, and from there proxy the content to polyglot services on their backend e.g Go, Node.js, Python(etc) web server. Interoperability between protocols and servers matters a lot and discrepancies between Go clients talking to various servers creep in :)

@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented May 12, 2018

Yes I understand. I was trying to isolate the issue.

I did not mean to say to try to reproduce the issue with a plain Go HTTP server. Rather I meant to write equivalent code in Go to mimic nginx behavior so that we hit the bug. If that is hard to do, then we obviously need nginx. But if a simple check of verifying the content-length header and the body shows the issue, then we theoretically don't need nginx. Kind of like the test case that one will write if this indeed is a bug.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented May 31, 2018

Seems like just verifying content-length to body length does not show the issue and we need nginx for that.

Also does not reproduce if http.Client is used just by single goroutine (in a non-concurrent way).

@agnivade agnivade changed the title net/http: http2 client doesn't send body while content-length is not zero which causes 400 in nginx x/net/http2: client doesn't send body while content-length is not zero which causes 400 in nginx May 31, 2018
@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented May 31, 2018

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 15, 2018

@ictar I just got sometime this evening to try to reproduce this:

  1. I have installed nginx on an Ubuntu server
  2. I created a server in Go to put behind the nginx reverse proxy
  3. I generated a certificate and key using https://github.com/odeke-em/ssg ssg --host localhost
  4. I fired up nginx, the Go backend server and the Go client
    with my nginx file looking like this
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
        worker_connections 768;
        # multi_accept on;
}

http {
        http2_max_concurrent_streams 40960;
        http2_max_requests 3;
        http2_max_field_size 8k;
        http2_max_header_size 32k;
        http2_recv_buffer_size 1024k;
        http2_recv_timeout 120s;

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # SSL Settings
        ##
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;

        # gzip_vary on;
        # gzip_proxied any;
        # gzip_comp_level 6;
        # gzip_buffers 16 8k;
        # gzip_http_version 1.1;
        # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

        ##
        # Virtual Host Configs
        ##

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;

        server {
            listen 9889;
            ssl_certificate     /home/emmanuel/Desktop/opensrc/bugs/golang/24763/cert.pem;
            ssl_certificate_key /home/emmanuel/Desktop/opensrc/bugs/golang/24763/key.pem;
            ssl on;
            location / {
                proxy_pass    https://127.0.0.1:9789;
            }
        }
}

and I ran it 50 times in a row for ((i=0; i<50; i++)) do go run client.go >> output;done
and even with curl as a control

$ curl -i -k -X POST https://localhost:9889
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Fri, 15 Jun 2018 06:45:44 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 12
Connection: keep-alive

Hello world!
$ curl -i -k -X POST https://localhost:9889
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Fri, 15 Jun 2018 06:45:46 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 12
Connection: keep-alive

Hello world!

but no failures and the client and server look like this

Client

package main

import (
        "bytes"
        "crypto/tls"
        "errors"
        "fmt"
        "io/ioutil"
        "net/http"
        "sync"
        "time"
)

type client struct {
        base string
        hc   *http.Client
}

func newClient(baseURL string) *client {
        return &client{
                base: baseURL,
                hc: &http.Client{
                        Timeout: 5 * time.Second,
                        Transport: &http.Transport{
                                MaxIdleConnsPerHost:   100,
                                MaxIdleConns:          100,
                                IdleConnTimeout:       90 * time.Second,
                                ExpectContinueTimeout: 5 * time.Second,
                                TLSClientConfig: &tls.Config{
                                        InsecureSkipVerify: true,
                                },
                        },
                },
        }
}

func (c *client) send(content []byte) error {
        if content == nil {
                return errors.New("nil content")
        }

        body := new(bytes.Buffer)
        body.Write(content)

        start := time.Now()
        req, err := http.NewRequest("POST", c.base, body)
        if err != nil {
                return fmt.Errorf("Send %+v (from %s)", err, time.Since(start))
        }
        req.Header = make(http.Header)
        req.Header.Set("Content-Type", "application/json")
        resp, err := c.hc.Do(req)
        if err != nil {
                return fmt.Errorf("Send: HTTPClient.Do: %+v (from %s)", err, time.Since(start))
        }
        defer resp.Body.Close()

        blob, err := ioutil.ReadAll(resp.Body)
        if err != nil {
                return fmt.Errorf("Send: Read: %+v (from %s)", err, time.Since(start))
        }
        fmt.Printf("Send: Resp: %s\n", blob)
        return nil
}

func main() {
        client := newClient("https://localhost:9889/")
        content := []byte(`{"test":1}`)
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
                wg.Add(1)
                go func(i int) {
                        defer wg.Done()
                        if err := client.send(content); err != nil {
                                fmt.Printf("#%d: err: %v\n", i, err)
                        }
                }(i)
        }
        wg.Wait()
}

server.go

package main

import (
        "io/ioutil"
        "log"
        "net/http"
)

func main() {
        addr := ":9789"
        log.Printf("Serving at: %s", addr)
        http.HandleFunc("/", helloWorld)
        if err := http.ListenAndServeTLS(addr, "cert.pem", "key.pem", nil); err != nil {
                log.Fatalf("Failed to serve: %v", err)
        }
}

func helloWorld(w http.ResponseWriter, r *http.Request) {
        _, _ = ioutil.ReadAll(r.Body)
        defer r.Body.Close()
        w.Write([]byte("Hello world!"))
}

@ictar with the setup that I've provided above, could you please try it locally too and see if this can reproduce the bug? Thank you.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 15, 2018

@odeke-em
Can you try a http server instead of https?
Also, you can take a look at #25652 where there are client and server code.
It is likely that 50 requests are not enough.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 15, 2018

With a minor edit to client.go to ensure we've configured the transport for HTTP2 with

--- old.go	2018-06-14 23:55:32.030299882 -0700
+++ client.go	2018-06-14 23:55:10.526187144 -0700
@@ -9,6 +9,8 @@
 	"net/http"
 	"sync"
 	"time"
+
+	"golang.org/x/net/http2"
 )
 
 type client struct {
@@ -17,19 +19,22 @@
 }
 
 func newClient(baseURL string) *client {
+	tr := &http.Transport{
+		MaxIdleConnsPerHost:   100,
+		MaxIdleConns:          100,
+		IdleConnTimeout:       90 * time.Second,
+		ExpectContinueTimeout: 5 * time.Second,
+		TLSClientConfig: &tls.Config{
+			InsecureSkipVerify: true,
+		},
+	}
+
+	http2.ConfigureTransport(tr)
 	return &client{
 		base: baseURL,
 		hc: &http.Client{
-			Timeout: 5 * time.Second,
-			Transport: &http.Transport{
-				MaxIdleConnsPerHost:   100,
-				MaxIdleConns:          100,
-				IdleConnTimeout:       90 * time.Second,
-				ExpectContinueTimeout: 5 * time.Second,
-				TLSClientConfig: &tls.Config{
-					InsecureSkipVerify: true,
-				},
-			},
+			Timeout:   5 * time.Second,
+			Transport: tr,
 		},
 	}
 }
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 15, 2018

@ictar would you mind sharing your nginx.conf adapted from the one I posted above?

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 15, 2018

*rather @ernado

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 15, 2018

@odeke-em
I will try to create something like docker-compose so we can have fully reproducible environment.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 15, 2018

@ernado roger that

It is likely that 50 requests are not enough.

@ernado I used 100 requests and ran it 50 times, so 5000 requests. I've done 4 concurrent for ((i=0; i<1000; i++)) do go run client.go;done so 4 * 1000 * 100 = 400,000 requests

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 15, 2018

@odeke-em
I've created an example via docker-compose.
Please take a look at https://github.com/ernado/go-issue-25009

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 16, 2018

Awesome, thank you @ernado for the repro via docker-compose
screen shot 2018-06-16 at 3 08 02 pm

I am going to update the title a little though as this seems to happen a few times and not always
and I'll defer for some help from @bradfitz @rs @tombergan

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 16, 2018

The http2_max_requests nginx directive affects the error rate. If I set it high enough (more than total requests is done by the client) there are no errors.

Somehow relates to #18112.

I've updated my go-issue-25009 with test.sh so one can play with nginx config or client/server and quickly do the repro.

@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jun 16, 2018

Yeah, I wondered and noticed that too that the max_requests if high enough, depreciates the error rate.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 17, 2018

Interesting point: if go1.7 is used, I'm getting an error on http request:

http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error

(tried setting request.GetBody with no effect).
And with 1.8 I get http2: no cached connection was available.

Maybe that could help.

@rs

This comment has been minimized.

Copy link
Contributor

@rs rs commented Jun 17, 2018

With such a low max_request, you have a high chance of getting a GOAWAY frame from the server before the body is sent. The h2 client will decide that that the request can be retried, but because it is a POST, the client needs a way to get a new version of the body buffer. To let the client retry POST you need to define Request.GetBody on your client as describe in the error message. I'm not sure why it would not work if defined though.

@rs

This comment has been minimized.

Copy link
Contributor

@rs rs commented Jun 17, 2018

The h2 retry logic is here for reference: https://github.com/golang/net/blob/master/http2/transport.go#L422

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 17, 2018

@rs , I'm passing *bytes.Reader to http.NewRequest which is already setting req.GetBody: https://github.com/golang/go/blob/master/src/net/http/request.go#L816

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jun 17, 2018

Also if we use separate http.Client for each goroutine (so we don't call it concurrently) there are no errors.
I've updated my repo so you can try it via environment variables. E.g.

$ CONCURRENT=0 ./test.sh 
# works fine
$ CONCURRENT=1 ./test.sh 
# same 400 Bad Request

Also you can set BODY=1 to setreq.GetBody explicitly (no effect) and pass REQUESTS and JOBS count.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

I've implemented a test that does not require nginx:
https://github.com/ernado/go-issue-25009/blob/9f8369b77105a097b36f583c37f66391b31f85d5/main_test.go#L85-L214

$ git clone https://github.com/ernado/go-issue-25009.git && cd go-issue-25009 && go test
     [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
     [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
     [recv] HEADERS Frame (length:39, flags:0x04, stream_id:1)
     [recv] DATA Frame (length:1, flags:0x00, stream_id:1)
     [send] HEADERS Frame (length:4, flags:0x04, stream_id:1)
     [send] DATA Frame (length:7, flags:0x01, stream_id:1)
     [send] GOAWAY Frame (length:20, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
     [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
     [recv] HEADERS Frame (length:39, flags:0x04, stream_id:1)
     [send] GOAWAY Frame (length:12, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
     [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
     [recv] HEADERS Frame (length:39, flags:0x04, stream_id:1)
     [recv] DATA Frame (length:0, flags:0x01, stream_id:1)
     [send] HEADERS Frame (length:1, flags:0x05, stream_id:1)
     [send] DATA Frame (length:5, flags:0x01, stream_id:1)
     [send] GOAWAY Frame (length:20, flags:0x00, stream_id:0)
--- FAIL: TestIssue25009 (0.05s)
    main_test.go:119: server: read request: [header field ":authority" = "127.0.0.1:39629" header field ":method" = "POST" header field ":path" = "/" header field ":scheme" = "https" header field "content-length" = "1" header field "accept-encoding" = "gzip" header field "user-agent" = "Go-http-client/2.0"]
    main_test.go:140: server: got length: 1
    main_test.go:208: client: ok: 002
    main_test.go:119: server: read request: [header field ":authority" = "127.0.0.1:39629" header field ":method" = "POST" header field ":path" = "/" header field ":scheme" = "https" header field "content-length" = "1" header field "accept-encoding" = "gzip" header field "user-agent" = "Go-http-client/2.0"]
    main_test.go:119: server: read request: [header field ":authority" = "127.0.0.1:39629" header field ":method" = "POST" header field ":path" = "/" header field ":scheme" = "https" header field "content-length" = "1" header field "accept-encoding" = "gzip" header field "user-agent" = "Go-http-client/2.0"]
    main_test.go:140: server: got length: 0
    main_test.go:206: client: failed: code=400 001
FAIL
exit status 1
FAIL	github.com/ernado/go-issue-25009	0.057s

So, as mentioned earlier, we are getting zero length data frame:

[recv] DATA Frame (length:0, flags:0x01, stream_id:1)

If I remove the pconn.alt == nil condition, I'm getting the following:

go test -v
=== RUN   TestIssue25009
     [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
     [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
     [recv] HEADERS Frame (length:39, flags:0x04, stream_id:1)
     [recv] DATA Frame (length:1, flags:0x00, stream_id:1)
     [send] HEADERS Frame (length:4, flags:0x04, stream_id:1)
     [send] DATA Frame (length:7, flags:0x01, stream_id:1)
     [send] GOAWAY Frame (length:20, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
     [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
     [recv] HEADERS Frame (length:39, flags:0x04, stream_id:1)
     [send] GOAWAY Frame (length:12, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:6, flags:0x00, stream_id:0)
     [send] SETTINGS Frame (length:0, flags:0x01, stream_id:0)
     [recv] WINDOW_UPDATE Frame (length:4, flags:0x00, stream_id:0)
     [recv] HEADERS Frame (length:39, flags:0x04, stream_id:1)
     [recv] DATA Frame (length:1, flags:0x00, stream_id:1)
     [send] HEADERS Frame (length:4, flags:0x04, stream_id:1)
     [send] DATA Frame (length:7, flags:0x01, stream_id:1)
     [send] GOAWAY Frame (length:20, flags:0x00, stream_id:0)
--- PASS: TestIssue25009 (0.05s)
    main_test.go:119: server: read request: [header field ":authority" = "127.0.0.1:37237" header field ":method" = "POST" header field ":path" = "/" header field ":scheme" = "https" header field "content-length" = "1" header field "accept-encoding" = "gzip" header field "user-agent" = "Go-http-client/2.0"]
    main_test.go:140: server: got length: 1
    main_test.go:208: client: ok: 002
    main_test.go:119: server: read request: [header field ":authority" = "127.0.0.1:37237" header field ":method" = "POST" header field ":path" = "/" header field ":scheme" = "https" header field "content-length" = "1" header field "accept-encoding" = "gzip" header field "user-agent" = "Go-http-client/2.0"]
    main_test.go:119: server: read request: [header field ":authority" = "127.0.0.1:37237" header field ":method" = "POST" header field ":path" = "/" header field ":scheme" = "https" header field "content-length" = "1" header field "accept-encoding" = "gzip" header field "user-agent" = "Go-http-client/2.0"]
    main_test.go:140: server: got length: 1
    main_test.go:208: client: ok: 001
PASS
ok  	github.com/ernado/go-issue-25009	0.056s

@bradfitz Do you need more info?

CC @rs

UPD: Can we consider fixing it in go1.11.1 instead of go1.12?
UPD2: Re-checked on b7d3e14, no changes, still failing without patch

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 26, 2018

I think you have inferred a behavior that is incorrect. The GOAWAY Frame returns the stream id that was last seen. Given the test is sending back stream id 0, stream 1 is guaranteed not to succeed. See GOAWAY

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

@fraenkel I'm 100% sure that my test server behavior is incorrect, but http client should not retry the request without body rewind even in that case. I'm sure that server behavior triggers the condition that is very close to bug that I've discovered. I can try to do the full repro, but I think it is not necessary.

So the logs for nginx as a server with http2debug=2:

=== RUN   TestWithNginx
2018/08/26 18:48:54 http2: Transport failed to get client conn for go-issue-25009:443: http2: no cached connection was available
2018/08/26 18:48:54 http2: Transport failed to get client conn for go-issue-25009:443: http2: no cached connection was available
2018/08/26 18:48:54 http2: Transport creating client conn 0xc0003a4380 to 127.0.0.1:443
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote SETTINGS len=18, settings: ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=4194304, MAX_HEADER_LIST_SIZE=10485760
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote WINDOW_UPDATE len=4 (conn) incr=1073741824
2018/08/26 18:48:54 http2: Transport encoding header ":authority" = "go-issue-25009"
2018/08/26 18:48:54 http2: Transport encoding header ":method" = "POST"
2018/08/26 18:48:54 http2: Transport encoding header ":path" = "/00/002"
2018/08/26 18:48:54 http2: Transport encoding header ":scheme" = "https"
2018/08/26 18:48:54 http2: Transport encoding header "content-length" = "1"
2018/08/26 18:48:54 http2: Transport encoding header "accept-encoding" = "gzip"
2018/08/26 18:48:54 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote HEADERS flags=END_HEADERS stream=1 len=44
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read SETTINGS len=18, settings: MAX_CONCURRENT_STREAMS=128, INITIAL_WINDOW_SIZE=65536, MAX_FRAME_SIZE=16777215
2018/08/26 18:48:54 http2: Transport encoding header ":authority" = "go-issue-25009"
2018/08/26 18:48:54 http2: Transport received SETTINGS len=18, settings: MAX_CONCURRENT_STREAMS=128, INITIAL_WINDOW_SIZE=65536, MAX_FRAME_SIZE=16777215
2018/08/26 18:48:54 http2: Transport encoding header ":method" = "POST"
2018/08/26 18:48:54 http2: Transport encoding header ":path" = "/01/001"
2018/08/26 18:48:54 http2: Transport encoding header ":scheme" = "https"
2018/08/26 18:48:54 http2: Transport encoding header "content-length" = "1"
2018/08/26 18:48:54 http2: Transport encoding header "accept-encoding" = "gzip"
2018/08/26 18:48:54 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote HEADERS flags=END_HEADERS stream=3 len=13
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote DATA stream=3 len=1 data="1"
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote DATA stream=1 len=1 data="2"
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote SETTINGS flags=ACK len=0
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read WINDOW_UPDATE len=4 (conn) incr=2147418112
2018/08/26 18:48:54 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=2147418112
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read SETTINGS flags=ACK len=0
2018/08/26 18:48:54 http2: Transport received SETTINGS flags=ACK len=0
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read GOAWAY len=8 LastStreamID=1 ErrCode=NO_ERROR Debug=""
2018/08/26 18:48:54 http2: Transport received GOAWAY len=8 LastStreamID=1 ErrCode=NO_ERROR Debug=""
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read WINDOW_UPDATE stream=1 len=4 incr=1
2018/08/26 18:48:54 http2: Transport received WINDOW_UPDATE stream=1 len=4 incr=1
2018/08/26 18:48:54 http2: Transport failed to get client conn for go-issue-25009:443: http2: no cached connection was available
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote DATA flags=END_STREAM stream=3 len=0 data=""
2018/08/26 18:48:54 http2: Framer 0xc00035e000: wrote DATA flags=END_STREAM stream=1 len=0 data=""
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read HEADERS flags=END_STREAM|END_HEADERS stream=1 len=41
2018/08/26 18:48:54 http2: decoded hpack field header field ":status" = "200"
2018/08/26 18:48:54 http2: decoded hpack field header field "server" = "nginx/1.15.2"
2018/08/26 18:48:54 http2: decoded hpack field header field "date" = "Sun, 26 Aug 2018 15:48:54 GMT"
2018/08/26 18:48:54 http2: decoded hpack field header field "content-length" = "0"
2018/08/26 18:48:54 http2: Transport received HEADERS flags=END_STREAM|END_HEADERS stream=1 len=41
2018/08/26 18:48:54 http2: Framer 0xc00035e000: read WINDOW_UPDATE stream=1 len=4 incr=2147418111
2018/08/26 18:48:54 http2: Transport received WINDOW_UPDATE stream=1 len=4 incr=2147418111
2018/08/26 18:48:54 http2: Transport readFrame error on conn 0xc0003a4380: (*errors.errorString) EOF
2018/08/26 18:48:54 http2: Transport creating client conn 0xc0001c81c0 to 127.0.0.1:443
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: wrote SETTINGS len=18, settings: ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=4194304, MAX_HEADER_LIST_SIZE=10485760
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: wrote WINDOW_UPDATE len=4 (conn) incr=1073741824
2018/08/26 18:48:54 http2: Transport encoding header ":authority" = "go-issue-25009"
2018/08/26 18:48:54 http2: Transport encoding header ":method" = "POST"
2018/08/26 18:48:54 http2: Transport encoding header ":path" = "/01/001"
2018/08/26 18:48:54 http2: Transport encoding header ":scheme" = "https"
2018/08/26 18:48:54 http2: Transport encoding header "content-length" = "1"
2018/08/26 18:48:54 http2: Transport encoding header "accept-encoding" = "gzip"
2018/08/26 18:48:54 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: wrote HEADERS flags=END_HEADERS stream=1 len=44
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: wrote DATA flags=END_STREAM stream=1 len=0 data=""
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: read SETTINGS len=18, settings: MAX_CONCURRENT_STREAMS=128, INITIAL_WINDOW_SIZE=65536, MAX_FRAME_SIZE=16777215
2018/08/26 18:48:54 http2: Transport received SETTINGS len=18, settings: MAX_CONCURRENT_STREAMS=128, INITIAL_WINDOW_SIZE=65536, MAX_FRAME_SIZE=16777215
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: wrote SETTINGS flags=ACK len=0
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: read WINDOW_UPDATE len=4 (conn) incr=2147418112
2018/08/26 18:48:54 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=2147418112
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: read SETTINGS flags=ACK len=0
2018/08/26 18:48:54 http2: Transport received SETTINGS flags=ACK len=0
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: read GOAWAY len=8 LastStreamID=1 ErrCode=NO_ERROR Debug=""
2018/08/26 18:48:54 http2: Transport received GOAWAY len=8 LastStreamID=1 ErrCode=NO_ERROR Debug=""
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: read HEADERS flags=END_HEADERS stream=1 len=52
2018/08/26 18:48:54 http2: decoded hpack field header field ":status" = "400"
2018/08/26 18:48:54 http2: decoded hpack field header field "server" = "nginx/1.15.2"
2018/08/26 18:48:54 http2: decoded hpack field header field "date" = "Sun, 26 Aug 2018 15:48:54 GMT"
2018/08/26 18:48:54 http2: decoded hpack field header field "content-type" = "text/html"
2018/08/26 18:48:54 http2: decoded hpack field header field "content-length" = "173"
2018/08/26 18:48:54 http2: Transport received HEADERS flags=END_HEADERS stream=1 len=52
2018/08/26 18:48:54 http2: Framer 0xc0003d41c0: read DATA flags=END_STREAM stream=1 len=173 data="<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>400 Bad Request</h1></center>\r\n<hr><center>nginx/1.15.2</center>\r\n</body>\r\n</html>\r\n"
2018/08/26 18:48:54 http2: Transport received DATA flags=END_STREAM stream=1 len=173 data="<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>400 Bad Request</h1></center>\r\n<hr><center>nginx/1.15.2</center>\r\n</body>\r\n</html>\r\n"
2018/08/26 18:48:54 http2: Transport readFrame error on conn 0xc0001c81c0: (*errors.errorString) EOF
--- FAIL: TestWithNginx (0.04s)
    main_test.go:262: client: ok: 002
    main_test.go:260: client: failed: code=400 001
FAIL

And the corresponding client code:

func TestWithNginx(t *testing.T) {
	const (
		requests = 2
		addr     = "https://go-issue-25009"
	)
	go func() {
		http.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
			io.Copy(ioutil.Discard, request.Body)
			request.Body.Close()
			writer.WriteHeader(http.StatusOK)
		})
		if err := http.ListenAndServe("localhost:31001", nil); err != nil {
			t.Fatal(err)
		}
	}()
	var (
		wg     = new(sync.WaitGroup)
		client = newClient()
		count  int64
	)
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				current := atomic.AddInt64(&count, 1)
				if current > requests {
					break
				}
				buf := new(bytes.Buffer)
				fmt.Fprint(buf, current)
				req, err := http.NewRequest(http.MethodPost, addr+fmt.Sprintf("/%02d/%03d", id, current), buf)
				if err != nil {
					t.Error(err)
					continue
				}
				res, err := client.Do(req)
				if err != nil {
					t.Errorf("client: %v", err)
					continue
				}
				io.Copy(ioutil.Discard, res.Body)
				res.Body.Close()
				if res.StatusCode != http.StatusOK {
					t.Errorf("client: failed: code=%d %03d", res.StatusCode, current)
				} else {
					t.Logf("client: ok: %03d", current)
				}
			}
		}(i)
	}
	wg.Wait()
}

nginx config:

server {
        listen 443 ssl http2;
	http2_max_requests 1;
	error_log  /var/log/nginx/error.log debug;
        location / {
                proxy_http_version 1.1;
                proxy_request_buffering off;
                proxy_buffering off;
                proxy_pass_request_headers on;
                proxy_set_header Connection "";
                proxy_pass http://localhost:31001;
        }
        server_name go-issue-25009;
        ssl_certificate     /src/go-issue-25009/certs/go-issue-25009.pem;
        ssl_certificate_key /src/go-issue-25009/certs/go-issue-25009-key.pem;
}
@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 26, 2018

There is something funny with that trace.
You send out 2 requests, although the order gets a bit flipped due to racy goroutines which is fine.
stream=3 is 00 and stream=1 is 01.
We get a GOAWAY on stream 1. And we receive a 200 for stream 1. The connection is then closed.
There is a retry for 00, a GOAWAY comes back for it with a 400.

The question is what was nginx doing.

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 26, 2018

@ernado You are correct that the retry does not reset the body. If you look at the retry, it sends a 0 byte body instead of a 1 byte body.
I will have to try your testcase and see if it reproduces that issue.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

From https://trac.nginx.org/nginx/ticket/1250#comment:4

When nginx reaches http2_max_requests, it sends GOAWAY and closes the socket using close() (observed in strace).
But packets from the other side can still be in flight.
When an in-flight packet arrives at the kernel of the machine running nginx, after close(), the kernel will reply with TCP RST.
When the browser side receives the RST, its kernel will discard any receive buffers, even for packets that it already ACKed.
These receive buffers may contain the GOAWAY (or the last headers/data after it in the same stream).

Can be related to that issue.

@fraenkel
My tetstcase seems to reproduce the issue:

server: got length: 0

Also I've described why we the body is not reset in
#25009 (comment)

You can try the following patch to force the correct body reset:

--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -477,9 +477,8 @@ func (t *Transport) roundTrip(req *Request) (*Response, error) {
                }
                testHookRoundTripRetried()
 
-               // Rewind the body if we're able to.  (HTTP/2 does this itself so we only
-               // need to do it for HTTP/1.1 connections.)
-               if req.GetBody != nil && pconn.alt == nil {
+               // Rewind the body if we're able to.
+               if req.GetBody != nil {
                        newReq := *req
                        var err error
                        newReq.Body, err = req.GetBody()
@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 26, 2018

Your testcase does not replicate the issue. You are trying to mimic the behavior but you aren't. The problem is that the request is retried and the body isn't reset so nginx sends back a 400 since it expected 1 byte of data and got 0.

The fix should be in the http2 code and not http which is what you have done.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

Your testcase does not replicate the issue. You are trying to mimic the behavior but you aren't.

You are absolutely right, but the test case still triggers the invalid retry, isn't it? The client retries with request which body is not reset properly.

The problem is that the request is retried and the body isn't reset so nginx sends back a 400 since it expected 1 byte of data and got 0.

Yes, it is exactly what I was trying to describe.
And I've already described why it is not actually resetting the body in my #25009 (comment)

The fix should be in the http2 code and not http which is what you have done

I've provided both possible http2 and http "fixes", and they are described in #25009 (comment)

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 26, 2018

Why aren't you using the req.GetBody func? The is how you provide a body on retries.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

Because it is already used under the hood when body is *bytes.Buffer

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 26, 2018

So what is currently there should work because the newReq is returned and retried in the loop. Setting it back on the original request is racy, but the outcome should be the same. Let me see if I can reproduce this and dig as to why the newReq isn't working which is what should be working.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

Also I've tried both of my "patches", they are forcing the correct body reset which results in 200 OK from nginx (and passed test case).

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 26, 2018

NewReq is not working because we are not using it, we are failing to get connection from the pool and returning from http2 retry loop.

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 27, 2018

I agree that you found the bug. The fix while it works doesn't seem right.
@bradfitz : The bug is when http2 fails to obtain a connection and leaves http2/transport.go(RoundTripOpt:389), the new request is not returned to be retried from http/transport.go (roundtrip:463) instead the original one is used with a body that has already been read.

@fraenkel

This comment has been minimized.

Copy link
Contributor

@fraenkel fraenkel commented Aug 27, 2018

@ernado Your fix in http/transport.go looks correct. We still would need a test case to go along with the change.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Aug 28, 2018

I've implemented a test case for transport and currently preparing the patch.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Aug 28, 2018

Change https://golang.org/cl/131755 mentions this issue: net/http: rewind request body unconditionally

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Sep 6, 2018

Gentle ping for @bradfitz @ianlancetaylor
Please take a look, seems like we are close to resolving that issue.

@kaspergyselinck

This comment has been minimized.

Copy link

@kaspergyselinck kaspergyselinck commented May 27, 2019

Have you found the solution to this issue? I encountered the same problem where I'd get back a statuscode 400 from nginx sometimes when making a lot of concurrent calls. At some point I'd even get a 'broken pipe' error after which I'd have to restart the client. Using an environment variable (GODEBUG="http2client=0") fixes the issue, but I don't like this solution.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented May 27, 2019

@kaspergyselinck what version of go are you using?

This problem should be fixed in go1.12.

@kwestein

This comment has been minimized.

Copy link

@kwestein kwestein commented Sep 25, 2019

Hi @ernado,

I am still seeing this issue. I have tested on both Go 1.12.9 and 1.13

It seems that others are as well in #32441

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Sep 25, 2019

I can confirm that my go-issue-25009 repro still fails on go1.12 and go1.13 for some reason.
Seems like issue is not fixed (or there is a regression), I will investigate it further.

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Sep 26, 2019

Thank you @kwestein, I've found the cause and commented in referenced issue with possible solution.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 8, 2019

Change https://golang.org/cl/210123 mentions this issue: net/http: rewind request body on ErrSkipAltProtocol

@Centhero

This comment has been minimized.

Copy link

@Centhero Centhero commented Jan 16, 2020

@ernado hello, i have meet the same solution, have you fix it ?

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Jan 20, 2020

Hey @Centhero, I'm trying to fix in https://golang.org/cl/210123, but it seems like fix is not as trivial as I expected, sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.