Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: client requests fail if TCP connection becomes stuck (due to NAT timeout or such) #30702

Open
dop251 opened this issue Mar 9, 2019 · 1 comment
Milestone

Comments

@dop251
Copy link

@dop251 dop251 commented Mar 9, 2019

What did you do?

I've come across this problem when using one of the DNS-to-HTTPS proxies that use the Cloudflare's 1.1.1.1 service. Every time the PC came out of hibernation the DNS requests kept timing out until I restarted the proxy.

To illustrate the issue I wrote a simple client:

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 1 * time.Second,
	}

	for {
		resp, err := client.Get("https://1.1.1.1")
		if err != nil {
			fmt.Printf("Error: %v\n", err)
		} else {
			io.Copy(ioutil.Discard, resp.Body)
			resp.Body.Close()
			fmt.Println("Success")
			time.Sleep(1 * time.Second)
		}
	}
}

I started the client and then after a few seconds I simulated a NAT timeout by creating a firewall rule that blocked outbound traffic on that connection:

# netstat -anp | grep 1.1.1.1
...
tcp        0      0 192.168.0.39:47782      1.1.1.1:443             ESTABLISHED 23236/t

# iptables -I OUTPUT -p tcp -d 1.1.1.1 --dport 443 --sport 47782 -j DROP

What did you expect to see?

If I run it with GODEBUG=http2client=0 I get one failed request after which the connection gets killed and subsequent requests cause a new one to be opened after which everything continues to work:

...
Success
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Success
Success
Success
...

What did you see instead?

Success
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
...

System details

go version go1.12 linux/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/dop/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/dop/workspace"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOROOT/bin/go version: go version go1.12 linux/amd64
GOROOT/bin/go tool compile -V: compile version go1.12
uname -sr: Linux 4.4.0-142-generic
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty
/lib/x86_64-linux-gnu/libc.so.6: GNU C Library (Ubuntu EGLIBC 2.19-0ubuntu6.14) stable release version 2.19, by Roland McGrath et al.
gdb --version: GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
@ALTree ALTree changed the title http/2 Client requests fail if TCP connection becomes stuck (due to NAT timeout or such) net/http: client requests fail if TCP connection becomes stuck (due to NAT timeout or such) Mar 10, 2019
@ALTree ALTree added this to the Go1.13 milestone Mar 10, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@terinjokes
Copy link
Contributor

@terinjokes terinjokes commented Dec 6, 2019

I've been tracking down a similar looking issue in production tonight, where a server in a load balancer pool went away. Many Go HTTP/2 clients did not reconnect to the pool, continuously reporting instead:

net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I can reproduce the errors described by @dop251 and that I observed locally by running the testcase and disrupting the network connection to 1.1.1.1, such as:

  • adding an iptables sport DROP rule similar to the given in the issue
  • connecting to a full tunnel VPN
  • changing network connection types (wifi -> ethernet, ethernet -> wifi)

I've expanded the testcase code to reduce the Dialer and Transport timeouts to below the Client timeout, expecting Go to recover from the situation, but this resulted in the same behavior.

Running with GODEBUG=http2debug=2:

2019/12/05 23:53:01 http2: Transport encoding header ":authority" = "1.1.1.1"
2019/12/05 23:53:01 http2: Transport encoding header ":method" = "GET"
2019/12/05 23:53:01 http2: Transport encoding header ":path" = "/"
2019/12/05 23:53:01 http2: Transport encoding header ":scheme" = "https"
2019/12/05 23:53:01 http2: Transport encoding header "accept-encoding" = "gzip"
2019/12/05 23:53:01 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2019/12/05 23:53:01 http2: Framer 0xc000316540: wrote HEADERS flags=END_STREAM|END_HEADERS stream=93 len=6
2019/12/05 23:53:11 http2: Framer 0xc000316540: wrote RST_STREAM stream=93 len=4 ErrCode=CANCEL
2019/12/05 23:53:11 RoundTrip failure: net/http: request canceled
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

The only timeout that seemed to change any behavior was ResponseHeaderTimeout, changing which error is received.

The client does eventually recover when the kernel marks the TCP connection as timed-out (that is, when tcp_retries2 expires), of which the defaults on Linux are very long. I'd expect a similar level and speed of recovery with HTTP/2 as Go provides with HTTP/1.

/cc @bradfitz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.