net/http: infinite hang on dial #16060

Closed
stevenh opened this issue Jun 14, 2016 · 11 comments

@stevenh
Contributor

commented Jun 14, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    1.6.2
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN="/data/go/bin"
    GOEXE=".exe"
    GOHOSTARCH="amd64"
    GOHOSTOS="freebsd"
    GOOS="freebsd"
    GOPATH="/data/go"
    GORACE=""
    GOROOT="/usr/local/go"
    GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
    GO15VENDOREXPERIMENT="1"
    CC="clang"
    GOGCCFLAGS="-m64 -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0"
    CXX="clang++"
    CGO_ENABLED="0"
  3. What did you do?
    Attempt an http request using http.Client
  4. What did you expect to see?
    Success or failure of the request
  5. What did you see instead?
    The request hung indefinitely, still attempting to "dial", but with no active OS socket.

Here are traces from relevant goroutines:

goroutine 8 [select, 31 minutes]:
net/http.(*persistConn).roundTrip(0xc8202f0410, 0xc82055ad30, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1473 +0xf1f
net/http.(*Transport).RoundTrip(0xc8200dc000, 0xc8203f9c00, 0xc8200dc000, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:324 +0x9bb
net/http.send(0xc8203f9c00, 0x800c48528, 0xc8200dc000, 0x0, 0x0, 0x0, 0xc8204e4480, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:260 +0x6b7
net/http.(*Client).send(0xbd3380, 0xc8203f9c00, 0x0, 0x0, 0x0, 0xd8, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:155 +0x185
net/http.(*Client).doFollowingRedirects(0xbd3380, 0xc8203f9c00, 0xa25ee0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:475 +0x8a4
net/http.(*Client).Get(0xbd3380, 0xc8204e4420, 0x21, 0xc8202f0000, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:422 +0xaa
net/http.Get(0xc8204e4420, 0x21, 0xc82022f6b8, 0x0, 0x0)
    /usr/local/go/src/net/http/client.go:397 +0x45
...

goroutine 58 [IO wait, 33 minutes]:
net.runtime_pollWait(0x800c49cd0, 0x72, 0xc8202e1000)
    /usr/local/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc8203ae760, 0x72, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8203ae760, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc8203ae700, 0xc8202e1000, 0x1000, 0x1000, 0x0, 0x800c44050, 0xc82000e130)
    /usr/local/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc8200de030, 0xc8202e1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:172 +0xe4
net/http.noteEOFReader.Read(0x800cc80f8, 0xc8200de030, 0xc8202f0478, 0xc8202e1000, 0x1000, 0x1000, 0x408333, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1687 +0x67
net/http.(*noteEOFReader).Read(0xc820428520, 0xc8202e1000, 0x1000, 0x1000, 0xc8201a4d1d, 0x0, 0x0)
    <autogenerated>:287 +0xd0
bufio.(*Reader).fill(0xc82017e360)
    /usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Peek(0xc82017e360, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(*persistConn).readLoop(0xc8202f0410)
    /usr/local/go/src/net/http/transport.go:1073 +0x177
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:857 +0x10a6

goroutine 59 [select, 31 minutes]:
net/http.(*persistConn).writeLoop(0xc8202f0410)
    /usr/local/go/src/net/http/transport.go:1277 +0x472
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:858 +0x10cb
@davecheney

Contributor

commented Jun 14, 2016

I think there is something wrong with your Go install. GOOS is set to Windows but all the paths are Linux-like.


@leepa

Contributor

commented Jun 14, 2016

GOHOSTOS != GOOS: it's cross-compiled, so the embedded source paths will be Linux-like? EDIT: Ah, see the comment below; it looks like the OP didn't paste a clean go env.

@darkliquid


commented Jun 14, 2016

I think that is just an artefact of the local environment from when we were cross-compiling for windows - however the binary in question was compiled for freebsd.

@davecheney

Contributor

commented Jun 14, 2016

Can you please provide the source code to a program that demonstrates the issue?


@ianlancetaylor ianlancetaylor changed the title http infinite hang on dial net/http: infinite hang on dial Jun 14, 2016

@ianlancetaylor ianlancetaylor added this to the Go1.7Maybe milestone Jun 14, 2016

@stevenh

Contributor Author

commented Jun 20, 2016

We've not been able to create a simple test case for this as yet, unfortunately.

It seems like there's either a missing error check or a race condition during dial.

I'm not sure how easy it will be to create a test case as I believe it needs a network failure to trigger.

@stevenh

Contributor Author

commented Jun 21, 2016

It's possible we're seeing something related to #14548.

For reference the connections will be TLS connections.

@bradfitz

Member

commented Jun 28, 2016

The bug report says:

Request hung indefinitely still attempting to "dial" but no active OS socket.

But that's not what the provided goroutines say.

Those three goroutines could just be an idle TCP connection owned by an http.Transport. It's been 33 minutes since that connection was used.

I don't see enough to work on here. There's not enough code (no code) or goroutine context to help debug.

I'm going to close this until there's something more we can help with.

@bradfitz bradfitz closed this Jun 28, 2016

@stevenh

Contributor Author

commented Jun 28, 2016

Unfortunately that's exactly the problem: I could have added the sockstat output for the app at the time, but it was empty. There was no OS-level socket present, hence Go was waiting for something it was never going to get.

I totally admit that without a reproduction case this is likely going to be impossible to narrow down, so if we do manage to come up with one I'll reopen.

@bradfitz

Member

commented Jun 28, 2016

Can you reproduce it yourself? Only on FreeBSD? Only on Go 1.6.2, or also Go 1.7?

If possible, post a system call trace of it happening. Probably best to open a new bug, since comments on closed bugs are often lost.

@stevenh

Contributor Author

commented Jun 29, 2016

We've only seen it on FreeBSD, and we run the app on FreeBSD, Linux (a small subset) and Windows.

We've only seen it once, and that was on 1.6.2. We'll raise a new bug when we reproduce it.

@magicwang-cn


commented Sep 28, 2016

@stevenh @bradfitz
I've also hit the same issue in Kubernetes: the kubelet hangs on its heartbeat to kube-apiserver (in its get-node request):

goroutine 158 [select, 242 minutes]:
net/http.(*persistConn).roundTrip(0xc820d780d0, 0xc820d01ba0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/transport.go:1473 +0xf1f
net/http.(*Transport).RoundTrip(0xc82000a180, 0xc820ec37a0, 0xa, 0x0, 0x0)
        /usr/local/go/src/net/http/transport.go:324 +0x9bb
k8s.io/kubernetes/pkg/client/transport.(*userAgentRoundTripper).RoundTrip(0xc8207afd40, 0xc820ec37a0, 0xc8207afd40, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/transport/round_trippers.go:92 +0x130
net/http.send(0xc820ec36c0, 0x7f8c69381058, 0xc8207afd40, 0x0, 0x0, 0x0, 0xc8207acdc0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:260 +0x6b7
net/http.(*Client).send(0xc8207ab3b0, 0xc820ec36c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:155 +0x185
net/http.(*Client).doFollowingRedirects(0xc8207ab3b0, 0xc820ec36c0, 0x388acf8, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:475 +0x8a4
net/http.(*Client).Do(0xc8207ab3b0, 0xc820ec36c0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:188 +0xff
k8s.io/kubernetes/pkg/client/restclient.(*Request).request(0xc820848d00, 0xc820c1da50, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/restclient/request.go:811 +0x5e5
k8s.io/kubernetes/pkg/client/restclient.(*Request).Do(0xc820848d00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/restclient/request.go:863 +0x87
k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/unversioned.(*nodes).Get(0xc820e72940, 0x7fff453d4d2c, 0xb, 0xc8201a1180, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/unversioned/node.go:124 +0x110
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).tryUpdateNodeStatus(0xc8200f4b00, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:3731 +0x120
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).updateNodeStatus(0xc8200f4b00, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:3073 +0x5b
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncNodeStatus(0xc8200f4b00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:1354 +0x4e
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).(k8s.io/kubernetes/pkg/kubelet.syncNodeStatus)-fm()
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:1116 +0x20
k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc8203c2e00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:86 +0x4b
k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc8203c2e00, 0x2540be400, 0x0, 0x1, 0xc820060f60)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:87 +0xb4
k8s.io/kubernetes/pkg/util/wait.Until(0xc8203c2e00, 0x2540be400, 0xc820060f60)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:49 +0x43
created by k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:1116 +0x61f

tbg added a commit to tbg/cockroach that referenced this issue Aug 22, 2017

acceptance: make docker more resilient to timeout in ContainerStart
Docker likes to never respond to us, and we do not usually have cancellations
on the context (which would not help, after all, that would just fail the test
right there). Instead, try a few times.

The problem looks similar to

golang/go#16060
golang/go#5103

Another possibility mentioned in usergroups is that some file descriptor limit
is hit. Since I've never seen this locally, perhaps that's the case on our
agent machines. Unfortunately, those are hard to SSH into.

This may not be a good idea (after all, perhaps `Start()` succeeded) and we'd
have to do something similar for `ContainerWait`. But, at least it should
give us an additional data point: do the retries also just block? Is the
container actually started when we retry?

tbg added a commit to tbg/cockroach that referenced this issue Aug 22, 2017

acceptance: make docker more resilient to timeout in ContainerStart

tbg added a commit to tbg/cockroach that referenced this issue Aug 23, 2017

acceptance: make docker more resilient to timeout in ContainerStart

tbg added a commit to tbg/cockroach that referenced this issue Aug 23, 2017

acceptance: make docker more resilient to timeout in ContainerStart

tbg added a commit to tbg/cockroach that referenced this issue Aug 23, 2017

acceptance: make docker more resilient to timeout in ContainerStart

@golang golang locked and limited conversation to collaborators Sep 28, 2017
