Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: All outgoing http requests are canceled if 1 dial times out #24194

Closed
jsalterwhisper opened this issue Mar 1, 2018 · 9 comments

Comments

Projects
None yet
4 participants
@jsalterwhisper
Copy link

commented Mar 1, 2018

What version of Go are you using (go version)?

go version go1.9.4 darwin/amd64
also go 1.8

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jasonsalter/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0v/c3_7c31s20j_l96jbslwg9gh0000gn/T/go-build630477543=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

package main

import (
  "fmt"
  "net/http"
  "net"
  "time"
  "context"
)

func main() {
  // channel to not exit main before requests finish
  d := make(chan bool)

  // make a RoundTripper and client so we dont have to use the default objects.
  // gurantees that requests made with this client trigger a dial error
  var DefaultOneMobileTransport http.RoundTripper = &http.Transport{
    Proxy: http.ProxyFromEnvironment,
    DialContext: (&net.Dialer{
      Timeout:   1 * time.Millisecond,
    }).DialContext,
  }

  badClient := &http.Client{
    Transport: DefaultOneMobileTransport,
  }

  // make a request and use a context from one of the 2 options in context.go
  r, _ :=   http.NewRequest("GET", "https://httpbin.org/delay/1", nil)
  c,_ := context.WithCancel(context.TODO())
  r = r.WithContext(c)

  // increasing 1 to higher numbers makes the error more repeatable
  for range(make([]int, 1)) {
    go func() {
      // request will have a dial error basically immediately
      _, err := badClient.Do(r)
      fmt.Printf("%s\n", err)
    }()
  }

  // start background request with default client/transports
  go func() {
    // make a request and try the other context
    // this endpoint waits at least 1 second and then returns
    // but the dial error from the other client kills it even though
    // it shouldnt share any dependencies
    r, _ :=   http.NewRequest("GET", "https://httpbin.org/delay/1", nil)
    c,_ := context.WithCancel(context.Background())
    r = r.WithContext(c)

    _, err := http.DefaultClient.Do(r)
    // should print nil but gets canceled by another bad request
    fmt.Printf("\ndefault %s\n", err)
    d <- true
  }()
  <- d
}

What did you expect to see?

The program uses the default http client and transport and a completely separate client/transport but the requests made from both share the same context so when one request fails all requests fail. Even if you remove the context additions to the requests they will use the same defaults and the error still happens. I only added in the calls because every context is either a child of todo or background and was trying to get it to work. The requests should not affect each other since they are made from different clients.

What did you see instead?

A request gets canceled because of another request from a different client getting a dial error.

I am making a program that makes a LOT of short lived requests and noticed that this was happening so I read through the source code some and can't seem to find a way to make the requests in the example not dependent in any way. I think the root of the problem is that the contexts are shared between requests.

If someone could look at this or provide any feedback on things to try I will gladly do whatever is necessary to help. Thanks.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 1, 2018

What makes you think they share the same context?

I'm having a little trouble following your repro code.

Can you write a standalone repro that doesn't depend on hitting the external network?

Or can you point to where in the net/http we're sharing contexts? Contexts shouldn't be shared because they shouldn't be ever stored.

@bradfitz bradfitz changed the title All outgoing http requests are canceled if 1 dial times out net/http: All outgoing http requests are canceled if 1 dial times out Mar 1, 2018

@jsalterwhisper

This comment has been minimized.

Copy link
Author

commented Mar 1, 2018

Thanks for looking into this.

I forgot to post the output I am seeing.

Jasons-MacBook-Pro:test $ go run t2.go
Get https://httpbin.org/delay/1: dial tcp: i/o timeout

default Get https://httpbin.org/delay/1: dial tcp: lookup httpbin.org on 172.19.0.1:53: dial udp 172.19.0.1:53: i/o timeout

I believe they are getting shared inside of net/http/request.go. If you don't add a context to the request with WithContext then when any requests Context() function gets called it just returns context.Background(). So I looked at context/context.go and Background() is just an accessor for a single variable background. So it seems that by default they all share the same context.

The context stored in the request is accessed from net/http/transport.go in getConn. It gets the http request from the transportRequest passed in and then that request's context is passed to the dialConn() in the goroutine that tries to create a new connection.

If you change the line
for range(make([]int, 1)) {
from 1 -> 100 it happens way more consistently. I also reproduced this on an Ubuntu machine with the specs below

go version go1.7.4 linux/amd64
ubuntu@header-jason-10:~/src/stress$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/ubuntu"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build404349086=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

I'm not sure off the top of my head how I would reproduce this without doing external calls. I will look at the dial tests and try to modify them to reproduce and get back to you.

@Cidan

This comment has been minimized.

Copy link

commented Mar 1, 2018

Super interesting -- there's definitely something weird going on here, but I'm not sure what. It seems like there's something up with the resolver, but I don't think it's context related. It almost feels like a resolver cache issue -- the first request times out and races/gets cache somewhere, then the following request fails with a UDP resolver timeout.

@Cidan

This comment has been minimized.

Copy link

commented Mar 1, 2018

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {

	// Create N fire and forget requests, in this case, 1
	for i := 0; i < 1; i++ {
		go func() {
			err := doRequest(1*time.Millisecond, 1)
			// We are expecting an error here, this is working as expected
			fmt.Printf("SHOULD ERROR %s\n", err)
		}()
	}

	err := doRequest(2*time.Second, 1)
	// This should not error, but it does randomly
	// fail with resolver errors
	fmt.Printf("SHOULD NOT ERROR %s\n", err)
}

// timeout is the context timeout for the request
// reqTime is the param passed to the web server for how long this request should wait
func doRequest(timeout time.Duration, reqTime int) error {
	// Create a context with 1 ms timeout
	ctx, cancel := context.WithTimeout(context.Background(), timeout)

	// Create our request and set the context
	r, _ := http.NewRequest("GET", fmt.Sprintf("https://httpbin.org/delay/%d", reqTime), nil)
	r = r.WithContext(ctx)

	// Do our request
	_, err := http.DefaultClient.Do(r)
	cancel()
	return err
}

randomly yields:

SHOULD ERROR Get https://httpbin.org/delay/1: context deadline exceeded
SHOULD NOT ERROR %!s(<nil>)

and:

SHOULD ERROR Get https://httpbin.org/delay/1: context deadline exceeded
SHOULD NOT ERROR Get https://httpbin.org/delay/1: dial tcp: lookup httpbin.org on [2401:fa00:fa::53]:53: dial udp [2401:fa00:fa::53]:53: i/o timeout
@Cidan

This comment has been minimized.

Copy link

commented Mar 2, 2018

Okay, found the bug, though I'm not sure how to go about fixing it:

https://github.com/golang/go/blob/master/src/net/lookup.go#L198
https://github.com/golang/go/blob/master/src/net/lookup.go#L214

There is a small race when a request times out via context and a non timed out request come in back to back. The results get merged back together depending on which result comes back first, thus the random failures. I modified this code locally to create a new group for every request and the problem went away.

@bradfitz should we have an option to disable this feature? I think it obviously has its' uses, but I'm at a loss on how to make this work cleanly with rapid fire requests.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 2, 2018

should we have an option to disable this feature

New options to work around problems are not typically Go's preferred solution.

I feel like there's an open bug for something in this area already. I'll look for it.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 2, 2018

I was thinking of 6a3d4be for #22724 and its revert, ac8a70d

Yeah, this seems like #22724. Thoughts?

@Cidan

This comment has been minimized.

Copy link

commented Mar 2, 2018

Yup, that's it. We can close this as dup.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 2, 2018

Sounds good, thanks.

Duplicate of #22724 <--- magic GitHub phrase I think

@bradfitz bradfitz closed this Mar 2, 2018

@golang golang locked and limited conversation to collaborators Mar 2, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.