Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: http.Client not properly handling broken connections #8946

Closed
gopherbot opened this issue Oct 16, 2014 · 5 comments

Comments

Projects
None yet
5 participants
@gopherbot
Copy link

commented Oct 16, 2014

by eyakubovich:

What does 'go version' print?
go version devel +5c372df9654a Wed Oct 15 13:33:00 2014 -0400 linux/amd64
 
What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.
 
1. Run the http server: http://play.golang.org/p/J-DOxv7l8P
The server accepts connections and abruptly closes them via timeout.
2. Run the client program designed to illustrate the bug:
http://play.golang.org/p/J-DOxv7l8P
 
What happened?
Here's an output of a sample run:
[]
resp err= Get http://localhost/foo: EOF
[]
resp err= Get http://localhost/foo: EOF
[]
resp err= Get http://localhost/foo: EOF
[]
resp err= Get http://localhost/foo: EOF
[]
resp err= Get http://localhost/foo: EOF
[]
resp err= Get http://localhost/foo: read tcp 127.0.0.1:80: connection reset by peer
[]
resp err= Get http://localhost/foo: http: can't write HTTP request on broken connection
[]
resp err= Get http://localhost/foo: EOF
 
What should have happened instead?
The client makes independent http requests in a loop and each line is the result of the
request: either an error or body. All requests should produce the same behavior -- it
can be an empty body or EOF error when performing ioutil.ReadAll(resp.Body). The request
itself should not fail. Furthermore, at least some of the requests currently fail
immediately as they are attempted to be made on already broken connections.
 
Please provide any additional information below.
I would like to emphasize that the problem is with the HTTP client, not the server. The
demo server is there to exercise the client into handling broken connections. From doing
preliminary investigation, it looks like the first time the http client properly returns
empty body but the transport ends up returning the broken connection back to the pool.
This results in the next request failing and transport "cleaning itself out",
causing the pattern to repeat.
@bradfitz

This comment has been minimized.

Copy link
Member

commented Oct 16, 2014

Comment 1:

That is the same URL for the client and server.
@gopherbot

This comment has been minimized.

Copy link
Author

commented Oct 16, 2014

Comment 2 by eyakubovich:

My apologies. Here's the client code: http://play.golang.org/p/UELtke0oS4

@gopherbot gopherbot added the new label Oct 16, 2014

@bradfitz bradfitz removed the new label Dec 18, 2014

@bgentry

This comment has been minimized.

Copy link
Contributor

commented Jan 30, 2015

I think this is related to #4677, which I've made a rough attempt to fix here: https://go-review.googlesource.com/3210

@rsc rsc added this to the Go1.5Maybe milestone Apr 10, 2015

Dieterbe added a commit to vimeo/bosun that referenced this issue Apr 17, 2015

prevent "use of closed network connection" in graphite GET's
I would sometimes see errors like:
graphiteBand: graphite RequestError (http://....): Get failed: Get
http://... : read tcp 10.90.128.100:80: use of closed network connection

This kind of error is not something that should bubble up to the caller
of a http client library, but it does.
see also:
golang/go#8946
golang/go#9424

there's a bunch more issues about the broken state of error handling in
net/http.
So anyway the http client tries to reuse an existing connection which
has broken.  Somehow this is the caller's problem, so we address it by
not keeping any idle connection and opening a new connection each time.
This should get rid of these errors without adding much overhead.

Note that the used http transport is, other than the
MaxIdleConnsPerHost setting, the same as the default transport.
@jasdel

This comment has been minimized.

Copy link
Contributor

commented May 6, 2015

Should retrying a TCP RST connection be limited to HEAD/GET methods when it write fails because the server closed the connection? If not would it make sense for this specific error to be returned as a net.Error instead of url.Error? Especially since it is temporary and connection related. Or would that break the 1.0 contract?

Example of server read timeout for keep alive connections: https://gist.github.com/jasdel/e0346d861756e14399dc

@bradfitz

This comment has been minimized.

Copy link
Member

commented Dec 9, 2015

This appears to be fixed.

With a client of:

func main() {
        cli := http.Client{}

        for {
                url := "http://localhost:8080/foo"
                resp, rerr := cli.Get(url)
                fmt.Printf("Get = %#v, %v\n", resp, rerr)
                if rerr != nil {
                        continue
                }

                body, berr := ioutil.ReadAll(resp.Body)
                fmt.Printf("ReadAll = %q, %v\n", body, berr)
                resp.Body.Close()
        }
}

I get the same thing each time:

Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Date":[]string{"Wed, 09 Dec 2015 18:34:18 GMT"}, "Content-Type":[]string{"text/plain; charset=utf-8"}}, Body:(*http.bodyEOFSignal)(0xc8200e0040), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200c2000), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>
Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Date":[]string{"Wed, 09 Dec 2015 18:34:19 GMT"}, "Content-Type":[]string{"text/plain; charset=utf-8"}}, Body:(*http.bodyEOFSignal)(0xc8200e02c0), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200fa000), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>
Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Content-Type":[]string{"text/plain; charset=utf-8"}, "Date":[]string{"Wed, 09 Dec 2015 18:34:20 GMT"}}, Body:(*http.bodyEOFSignal)(0xc8200e0400), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200c20e0), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>
Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Content-Type":[]string{"text/plain; charset=utf-8"}, "Date":[]string{"Wed, 09 Dec 2015 18:34:21 GMT"}}, Body:(*http.bodyEOFSignal)(0xc820016800), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200fa0e0), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>
Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Date":[]string{"Wed, 09 Dec 2015 18:34:22 GMT"}, "Content-Type":[]string{"text/plain; charset=utf-8"}}, Body:(*http.bodyEOFSignal)(0xc8200e0640), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200fa1c0), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>
Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Date":[]string{"Wed, 09 Dec 2015 18:34:23 GMT"}, "Content-Type":[]string{"text/plain; charset=utf-8"}}, Body:(*http.bodyEOFSignal)(0xc8200e0740), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200fa2a0), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>
Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Date":[]string{"Wed, 09 Dec 2015 18:34:24 GMT"}, "Content-Type":[]string{"text/plain; charset=utf-8"}}, Body:(*http.bodyEOFSignal)(0xc820016b00), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200c21c0), TLS:(*tls.ConnectionState)(nil)}, <nil>
ReadAll = "", <nil>

At first I was expecting the ioutil.ReadAll to return an UnexpectedEOF from the chunked reader, but looking at strace, it does appear the server manages to send out the EOF chunk before the connection is closed:

[pid  5598] write(3, "GET /foo HTTP/1.1\r\nHost: localhost:8080\r\nUser-Agent: Go-http-client/1.1\r\nAccept-Encoding: gzip\r\n\r\n", 98 <unfinished ...>
[pid  5594] <... select resumed> )      = 0 (Timeout)
[pid  5598] <... write resumed> )       = 98
[pid  5597] <... epoll_wait resumed> {{EPOLLIN|EPOLLOUT, {u32=453330488, u64=140321329858104}}}, 128, 4294967295) = 1
[pid  5595] <... select resumed> )      = 0 (Timeout)
[pid  5597] clock_gettime(CLOCK_MONOTONIC, {4241936, 78474169}) = 0
[pid  5594] futex(0x915c28, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid  5598] futex(0x915c28, FUTEX_WAKE, 1 <unfinished ...>
[pid  5594] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid  5598] <... futex resumed> )       = 0
[pid  5594] select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid  5595] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid  5598] futex(0xc8200a8908, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid  5595] <... clock_gettime resumed> {4241936, 78537336}) = 0
[pid  5595] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid  5597] read(3, "HTTP/1.1 200 OK\r\nDate: Wed, 09 Dec 2015 18:30:35 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nTransfer-Encoding: chunked\r\n\r\n", 4096) = 125
[pid  5595] <... clock_gettime resumed> {1449685835, 408091488}) = 0
[pid  5594] <... select resumed> )      = 0 (Timeout)
[pid  5595] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid  5594] epoll_wait(4,  <unfinished ...>
[pid  5597] clock_gettime(CLOCK_REALTIME, {1449685835, 408276297}) = 0
[pid  5597] write(2, "2015/12/09 18:30:35 Get = &http.Response{Status:\"200 OK\", StatusCode:200, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Date\":[]string{\"Wed, 09 Dec 2015 18:30:35 GMT\"}, \"Content
-Type\":[]string{\"text/plain; charset=utf-8\"}}, Body:(*http.bodyEOFSignal)(0xc8200184c0), ContentLength:-1, TransferEncoding:[]string{\"chunked\"}, Close:false, Trailer:http.Header(nil), Request:(*http.Requ"..., 460 <unfinishe
d ...>
[pid  5595] <... select resumed> )      = 0 (Timeout)
[pid  5595] clock_gettime(CLOCK_MONOTONIC, {4241936, 78841051}) = 0
[pid  5595] clock_gettime(CLOCK_REALTIME, {1449685835, 408362768}) = 0
[pid  5595] select(0, NULL, NULL, NULL, {0, 20}2015/12/09 18:30:35 Get = &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Date":[]string{"Wed, 09 Dec 2015 18:30:35 GMT"}, "Content-Type":[]string{"text/plain; charset=utf-8"}}, Body:(*http.bodyEOFSignal)(0xc8200184c0), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc8200d6380), TLS:(*tls.ConnectionState)(nil)}, <nil>
 <unfinished ...>
[pid  5597] <... write resumed> )       = 460
[pid  5597] read(3, 0xc820116000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid  5597] futex(0xc820028d08, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid  5595] <... select resumed> )      = 0 (Timeout)
[pid  5595] clock_gettime(CLOCK_MONOTONIC, {4241936, 78964304}) = 0
[pid  5595] futex(0x915460, FUTEX_WAIT, 0, {60, 0} <unfinished ...>
[pid  5594] <... epoll_wait resumed> {{EPOLLIN|EPOLLOUT, {u32=453330488, u64=140321329858104}}}, 128, 4294967295) = 1
[pid  5594] clock_gettime(CLOCK_MONOTONIC, {4241937, 77657115}) = 0
[pid  5594] futex(0xc820028d08, FUTEX_WAKE, 1) = 1
[pid  5594] futex(0x915460, FUTEX_WAKE, 1) = 1
[pid  5595] <... futex resumed> )       = 0
[pid  5594] read(3,  <unfinished ...>
[pid  5595] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid  5594] <... read resumed> "0\r\n\r\n", 4096) = 5
[pid  5597] <... futex resumed> )       = 0
[pid  5594] read(3,  <unfinished ...>
[pid  5597] epoll_wait(4,  <unfinished ...>
[pid  5594] <... read resumed> "", 4096) = 0
[pid  5597] <... epoll_wait resumed> {{EPOLLIN|EPOLLOUT|0x2000, {u32=453330488, u64=140321329858104}}}, 128, 0) = 1
[pid  5594] epoll_ctl(4, EPOLL_CTL_DEL, 3, {0, {u32=0, u64=0}} <unfinished ...>
[pid  5595] <... clock_gettime resumed> {4241937, 77996995}) = 0
[pid  5594] <... epoll_ctl resumed> )   = 0
[pid  5594] close(3)                    = 0
[pid  5595] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid  5594] write(1, "ReadAll = \"\", <nil>\n", 20ReadAll = "", <nil>
) = 20
[pid  5597] epoll_wait(4,  <unfinished ...>

And if I add to the Server's Handler:

                println("Hello", r.RemoteAddr)
                defer println("Bye", r.RemoteAddr)

I see a new connection for each one:

Hello 127.0.0.1:45172
Bye 127.0.0.1:45172
Hello 127.0.0.1:45173
Bye 127.0.0.1:45173
Hello 127.0.0.1:45174
Bye 127.0.0.1:45174
Hello 127.0.0.1:45175
Bye 127.0.0.1:45175
Hello 127.0.0.1:45176
Bye 127.0.0.1:45176
Hello 127.0.0.1:45177
Bye 127.0.0.1:45177
Hello 127.0.0.1:45178

@bradfitz bradfitz closed this Dec 9, 2015

@golang golang locked and limited conversation to collaborators Dec 14, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.