New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: source of errors is unclear #14203

Open
jingweno opened this Issue Feb 2, 2016 · 20 comments

Comments

Projects
None yet
9 participants
@jingweno

jingweno commented Feb 2, 2016

What version of Go are you using (go version)?

1.5.3.

What operating system and processor architecture are you using?

Linux. AMD 64.

What did you do?

I'm using httputil.ReverseProxy to proxy to a host behind ELB, and I'm getting the errors "http: proxy error: net/http: request canceled" & "http: proxy error: net/http: transport closed before response was received" occationally. The line that printed the error is here. And the error came from http.Transport.RoundTrip.

After more digging, the errors could come from a couple places in http.Transport.RoundTrip. I was trying to find a way to print the callstacks of the error from http.Transport.RoundTrip to understand why it was returned but there's currently no way to do so.

Details can be found in this golang-nuts thread.

What did you expect to see?

I would expect I could instrument a method of either application code or Go's standard library and find out the callstacks at runtime.

What did you see instead?

No way to find out the callstack of a method from Go's stdlib. This is a feature request.

@davecheney

This comment has been minimized.

Contributor

davecheney commented Feb 2, 2016

@jingweno would it be easier to return a more descriptive error rather than invent a mechanism to instrument call stacks at runtime ?

@randall77

This comment has been minimized.

Contributor

randall77 commented Feb 2, 2016

Can you run your code under a debugger? Setting a breakpoint at the offending error generating locations should get you the stack in question.

@jingweno

This comment has been minimized.

jingweno commented Feb 2, 2016

would it be easier to return a more descriptive error rather than invent a mechanism to instrument call stacks at runtime

It may be a yes in this case. But that would require waiting for the next Go version (if it's fixed), or I patch Go and create a customer build for myself. IMO a mechanism to instrument call stacks would be necessary.

Can you run your code under a debugger?

Unfortunately not. It's a critical app in production and the best we could do is to log it and fix it later.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 2, 2016

http.RoundTripper passes stuff between goroutines. A single stack trace won't yield the answers you seek.

@jingweno

This comment has been minimized.

jingweno commented Feb 2, 2016

http.RoundTripper passes stuff between goroutines. A single stack trace won't yield the answers you seek.

I may not fully understand how goroutines work internally. But it's a matter of traceability, right? Can we not trace the calls from one goroutine to another?

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 2, 2016

@jingweno, the error is sent over a channel. It's unclear which stack(s) at which time you want. There is no automatic answer here short of recording a full trace of the program and inspecting it later, probably.

@davecheney

This comment has been minimized.

Contributor

davecheney commented Feb 2, 2016

I think this is not as complicated as you make out. As I understand it the
problem is a generic error with the same text is returned from several
places. If the error is made more specific, won't this solve the problem ?

On Wed, 3 Feb 2016, 10:34 Brad Fitzpatrick notifications@github.com wrote:

@jingweno https://github.com/jingweno, the error is sent over a
channel. It's unclear which stack(s) at which time you want. There is no
automatic answer here short of recording a full trace of the program and
inspecting it later, probably.


Reply to this email directly or view it on GitHub
#14203 (comment).

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 2, 2016

To be clear, I agree with @davecheney. If there's an unstructured or useless error that needs to disambiguated, we can do that.

@jingweno

This comment has been minimized.

jingweno commented Feb 3, 2016

@bradfitz:

the error is sent over a channel. It's unclear which stack(s) at which time you want. There is no automatic answer here short of recording a full trace of the program and inspecting it later, probably.

Hmm...I may be oversimplifying the problem. These kinds of things happen for a distributed system as well: we keep track of callstacks for different microservices by carrying over a request id: https://brandur.org/request-ids. Can we do something similar for Goroutines? A unique "Goroutine ID" is generated before a piece of code is executed (the ID could come from an argument, as long as it's unique); The "Goroutine ID" is carried over for different goroutines; Stacks are correlated and aggregated by the "Goroutine ID". Maybe a deeper thought is needed though.

@davecheney:

I think this is not as complicated as you make out. As I understand it the
problem is a generic error with the same text is returned from several
places. If the error is made more specific, won't this solve the problem ?

Yup, it would solve the problem. But I couldn't get away of waiting for the next Go release or making a custom build for Go if I want to use it right away as mentioned in #14203 (comment). I literally want to know what caused the errors now.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 3, 2016

@jingweno, yes, we do the same thing with request tracing inside Google. See https://godoc.org/golang.org/x/net/trace and https://godoc.org/golang.org/x/net/context . Contexts need to be passed explicitly. Automatic contexts leads to tons of problems of confusingly-annotated traces (as witnessed in C++ and Java code trying to use TLS), which is why goroutines don't have accessible ID numbers. This won't be added to Go as an automatic thing.

I literally want to know what caused the errors now.

Modify the Go code and see. The error can only come from a few places.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 3, 2016

I'm getting the errors "http: proxy error: net/http: request canceled" & "http: proxy error: net/http: transport closed before response was received"

The first is the when the client hung up on you. (the ReverseProxy CloseNotifier fired, and canceled the outbound request).

The second is when the peer server you're making the request hung up on you.

@minux

This comment has been minimized.

Member

minux commented Feb 3, 2016

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 3, 2016

@minux, that bug was never shipped. That was only during the Go 1.6 dev cycle.

@ianlancetaylor ianlancetaylor changed the title from Tracing callstack of a method from Go's standard library to net/http: source of errors is unclear Feb 3, 2016

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Feb 3, 2016

@jingweno

This comment has been minimized.

jingweno commented Feb 17, 2016

@bradfitz Alright, I'm coming back to this. Would it be possible to add different error messages for https://github.com/golang/go/blob/release-branch.go1.5/src/net/http/transport.go#L1093-L1094? I could help with that but I need to figure out how to contribute first...

Hopefully this could be fixed in time for 1.6.

@jingweno

This comment has been minimized.

jingweno commented Feb 17, 2016

And we're starting to get a couple "EOF" errors. The error type is *errors.errorString. The errors were from /usr/local/go/src/net/http/httputil/reverseproxy.go:194. I printed out the runtime stack, and the EOF may be coming from:

goroutine 7089108 [IO wait]:
net.runtime_pollWait(0x7f4174178f70, 0x72, 0xc82000a240)
/usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc8207a2760, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8207a2760, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc8207a2700, 0xc8204cc000, 0x800, 0x800, 0x0, 0x7f41768e8050, 0xc82000a240)
/usr/local/go/src/net/fd_unix.go:232 +0x23a
net.(*conn).Read(0xc82002a3b8, 0xc8204cc000, 0x800, 0x800, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0xe4
crypto/tls.(*block).readFromUntil(0xc8228e88a0, 0x7f41768f2208, 0xc82002a3b8, 0x5, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:455 +0xcc
crypto/tls.(*Conn).readRecord(0xc822c54b00, 0x8f4517, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:540 +0x2d1
crypto/tls.(*Conn).Read(0xc822c54b00, 0xc820c8e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:901 +0x167
net/http.noteEOFReader.Read(0x7f41768f2118, 0xc822c54b00, 0xc8209581b8, 0xc820c8e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1370 +0x67
net/http.(*noteEOFReader).Read(0xc820d1b180, 0xc820c8e000, 0x1000, 0x1000, 0xc82002a688, 0x0, 0x0)
<autogenerated>:126 +0xd0
bufio.(*Reader).fill(0xc8228dac00)
/usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Peek(0xc8228dac00, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(*persistConn).readLoop(0xc820958160)
/usr/local/go/src/net/http/transport.go:876 +0xf7
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:685 +0xc78
@jingweno

This comment has been minimized.

jingweno commented Feb 19, 2016

Also could @bradfitz explain a bit of b) of https://github.com/golang/go/blob/release-branch.go1.5/src/net/http/transport.go#L1194-L1199? We tracked down the downstream server did respond with 2xx but somehow the Proxy is responding with 5xx.

@glasser

This comment has been minimized.

Contributor

glasser commented Dec 10, 2016

As described in #18272, it would also be nice if you could easily tell if an error returned by http.Transport.RoundTrip came from communicating with the target server or from reading request.Body (or a third place I'm not considering).

@bradfitz

This comment has been minimized.

Member

bradfitz commented Feb 1, 2017

Related to #9424 / #15935

@gopherbot

This comment has been minimized.

gopherbot commented Feb 27, 2017

CL https://golang.org/cl/37495 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 2, 2017

net/http: clean up Transport.RoundTrip error handling
If I put a 10 millisecond sleep at testHookWaitResLoop, before the big
select in (*persistConn).roundTrip, two flakes immediately started
happening, TestTransportBodyReadError (#19231) and
TestTransportPersistConnReadLoopEOF.

The problem was that there are many ways for a RoundTrip call to fail
(errors reading from Request.Body while writing the response, errors
writing the response, errors reading the response due to server
closes, errors due to servers sending malformed responses,
cancelations, timeouts, etc.), and many of those failures then tear
down the TCP connection, causing more failures, since there are always
at least three goroutines involved (reading, writing, RoundTripping).

Because the errors were communicated over buffered channels to a giant
select, the error returned to the caller was a function of which
random select case was called, which was why a 10ms delay before the
select brought out so many bugs. (several fixed in my previous CLs the past
few days).

Instead, track the error explicitly in the transportRequest, guarded
by a mutex.

In addition, this CL now:

* differentiates between the two ways writing a request can fail: the
  io.Copy reading from the Request.Body or the io.Copy writing to the
  network. A new io.Reader type notes read errors from the
  Request.Body. The read-from-body vs write-to-network errors are now
  prioritized differently.

* unifies the two mapRoundTripErrorFromXXX methods into one
  mapRoundTripError method since their logic is now the same.

* adds a (*Request).WithT(*testing.T) method in export_test.go, usable
  by tests, to call t.Logf at points during RoundTrip. This is disabled
  behind a constant except when debugging.

* documents and deflakes TestClientRedirectContext

I've tested this CL with high -count values, with/without -race,
with/without delays before the select, etc. So far it seems robust.

Fixes #19231 (TestTransportBodyReadError flake)
Updates #14203 (source of errors unclear; they're now tracked more)
Updates #15935 (document Transport errors more; at least understood more now)

Change-Id: I3cccc3607f369724b5344763e35ad2b7ea415738
Reviewed-on: https://go-review.googlesource.com/37495
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 Jul 6, 2017

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jul 6, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment