net/http: Transport needs a limit on total connections (per host) #6785

Open
calmh opened this Issue Nov 18, 2013 · 27 comments

@calmh
Contributor

calmh commented Nov 18, 2013

(go1.2rc4)

We're working on a reverse HTTP proxy that does some "stuff" (not relevant here) and
proxies towards one or more backend HTTP servers. During testing I ran into connection
failures, apparently due to running out of file descriptors. In narrowing this down I
reduced the problem to:

The "null" reverse proxy (doing nothing but proxying, using the
httputil.ReverseProxy):

http://play.golang.org/p/p1g4bpTZ_g

A trivial HTTP server acting as the backend behind the above proxy; it simply counts the
number of connections and responds successfully:

http://play.golang.org/p/F7W-vbXEUt

Running both of these, I have the proxy on :8080 forwarding to the server at :8090. To
test it, I use wrk (https://github.com/wg/wrk):

jb@jborg-mbp:~ $ wrk -c 1 -t 1 http://localhost:8080/
...

Granted, this isn't a super realistic reproduction of the real world since the latency
on localhost is minimal. I can't prove the same thing _can't_ happen in production
though.

Using one connection (-c 1) up to about three, this works perfectly. The server side
sees a bunch of requests over one to three connections, i.e. the number of backend
connections from the proxy matches the number of incoming connections. At around -c 4
and upwards, it blows up. The proxy doesn't manage to recycle connections quickly enough
and starts doing regular Dials at a rate of thousands per second, resulting in

...
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
...

from the proxy code and of course HTTP errors as seen by wrk. 

My theory, after going through the http.Transport, is that when the number of requests/s
to the proxy goes up, the small amount of bookkeeping that is required to recycle a
connection (putIdleConn etc.) starts taking just long enough that the next request in
the pipeline gets in before a connection is idle. The Transport starts Dialing, adding
more connections to take care of, and it explodes chaotically. I would prefer it to
block and await an idle connection instead of Dialing at some point.

Line 92, "// TODO: tunable on global max cached connections" seems relevant,
although in my case the tunable should probably be max connections per host instead of
globally.

I'll probably take a stab at implementing something like that to fix this (i.e.
rewriting Transport to limit connections), since I couldn't figure out a way around it
using just the available APIs. Unless I've misunderstood something obvious...

//jb
@rsc


Contributor

rsc commented Nov 27, 2013

Comment 1:

Labels changed: added go1.3maybe.

@rsc


Contributor

rsc commented Dec 4, 2013

Comment 2:

Labels changed: added release-none, removed go1.3maybe.

@rsc


Contributor

rsc commented Dec 4, 2013

Comment 3:

Labels changed: added repo-main.

@davecheney


Contributor

davecheney commented Jul 16, 2014

Comment 4:

Labels changed: added suggested, removed priority-triage.

Status changed to LongTerm.

@bradfitz


Member

bradfitz commented Aug 24, 2014

Comment 5:

Issue #8536 has been merged into this issue.

@gopherbot


gopherbot commented Sep 11, 2014

Comment 6 by surajn.vnit:

Just wanted to check if this is planned anytime soon. I ran into a similar issue - at
scale it looks like there is high contention on the semaphore for putIdleConn and getIdleConn.
316 @ 0x415989 0x415a0b 0x429eee 0x42a130 0x49ad56 0x45056e 0x450ab8 0x44ef26 0x43390d
0x4330db 0x43456f 0x433bbf 0x400e4f 0x415c20
#   0x42a130    sync.runtime_Semacquire+0x30            /usr/local/go/src/pkg/runtime/sema.goc:199
#   0x49ad56    sync.(*Mutex).Lock+0xd6             /usr/local/go/src/pkg/sync/mutex.go:66
#   0x45056e    net/http.(*Transport).getIdleConn+0x9e      /usr/local/go/src/pkg/net/http/transport.go:402
#   0x450ab8    net/http.(*Transport).getConn+0xa8      /usr/local/go/src/pkg/net/http/transport.go:452
#   0x44ef26    net/http.(*Transport).RoundTrip+0x416       /usr/local/go/src/pkg/net/http/transport.go:201
#   0x43390d    net/http.send+0x43d             /usr/local/go/src/pkg/net/http/client.go:195
#   0x4330db    net/http.(*Client).send+0x15b           /usr/local/go/src/pkg/net/http/client.go:118
#   0x43456f    net/http.(*Client).doFollowingRedirects+0x97f   /usr/local/go/src/pkg/net/http/client.go:343
#   0x433bbf    net/http.(*Client).Get+0xaf         /usr/local/go/src/pkg/net/http/client.go:275
#   0x400e4f    main.startClient+0x16f              /home/adsteam/go_workspace/src/suraj.narkhede/stresstester/main.go:48
347 @ 0x415989 0x415a0b 0x429eee 0x42a130 0x49ad56 0x44fd5e 0x455b87 0x454c1b 0x454931
0x483291 0x480f6d 0x400f0b 0x415c20
#   0x42a130    sync.runtime_Semacquire+0x30        /usr/local/go/src/pkg/runtime/sema.goc:199
#   0x49ad56    sync.(*Mutex).Lock+0xd6         /usr/local/go/src/pkg/sync/mutex.go:66
#   0x44fd5e    net/http.(*Transport).putIdleConn+0xce  /usr/local/go/src/pkg/net/http/transport.go:342
#   0x455b87    net/http.func·023+0x97         /usr/local/go/src/pkg/net/http/transport.go:853
#   0x454c1b    net/http.(*bodyEOFSignal).condfn+0xab   /usr/local/go/src/pkg/net/http/transport.go:1161
#   0x454931    net/http.(*bodyEOFSignal).Read+0x321    /usr/local/go/src/pkg/net/http/transport.go:1133
#   0x483291    io/ioutil.devNull.ReadFrom+0xb1     /usr/local/go/src/pkg/io/ioutil/ioutil.go:151
#   0x480f6d    io.Copy+0x13d               /usr/local/go/src/pkg/io/io.go:349
#   0x400f0b    main.startClient+0x22b          /home/adsteam/go_workspace/src/suraj.narkhede/stresstester/main.go:55
I have written a simple load tool in Go (to evaluate the http client's performance):
http://play.golang.org/p/MA-a2ZAVNe.
@gopherbot


gopherbot commented Sep 29, 2014

Comment 7 by bgmerrell:

jb, it sounds like you were working on something very similar to what I am trying to do.
Did you ever come up with a workaround?
@marbemac


marbemac commented Dec 14, 2014

Wondering if any progress has been made on this? I'm seeing similar results when running synthetic benchmarks. Does this performance profile hold in production/real scenarios? If so, does using Go as a serious reverse proxy become impractical?

@bradfitz


Member

bradfitz commented Dec 14, 2014

@marbemac, are you asking about max connections per host (what this bug is about), or the separate issue of mutex contention brought up in Comment 6? If the latter, I'd prefer to move it to a new bug.

@marbemac


marbemac commented Dec 14, 2014

@bradfitz I'm definitely seeing the first issue; I'm not sure if the issue in comment 6 is related. Basically I'm relatively new to Go, and am trying to evaluate whether I can use it to build a decently fast reverse proxy. Running wrk against a Go http server that simply returns a hello-world string, I get the following (very unscientific) results:

With no proxy
wrk -t2 -c100 -d15s http://127.0.0.1:3000
29k requests/second, 0% failure

With nginx in front
wrk -t2 -c100 -d15s http://127.0.0.1:4205
28k requests/second, 0% failure

With NewSingleHostReverseProxy in front (https://play.golang.org/p/vk33Eg4QOS)
wrk -t2 -c100 -d15s http://127.0.0.1:4204
800 requests/second, 20% failure

Is this typical, or am I doing something wrong in the Go code / testing setup? I'm running this on a MacBook Pro; max file descriptors have been increased.

@mentat


mentat commented Dec 24, 2014

I'm seeing a similar issue with a go program on both OSX and Linux. With only one concurrent request the proxy is stable, but when adding more it starts to break. My implementation is very simple.

@marbemac


marbemac commented Dec 24, 2014

@mentat Try playing with these two settings. I'm seeing some improvement with:

runtime.GOMAXPROCS(runtime.NumCPU())
http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 100

I'm curious to hear if it helps in your case, let me know.

@mentat


mentat commented Dec 25, 2014

Thanks @marbemac I've pushed out a new version with these settings and will let you know how it does with real-world traffic.

@tj


tj commented Jan 27, 2015

Hit this issue in prod, blocking would definitely be preferable to exhausting sockets

@cmdrkeene


cmdrkeene commented Apr 28, 2015

Hitting this issue as well in production. Was there a better solution here than cranking up MaxIdleConns to fit your workload?

@buth


buth commented Oct 8, 2015

The persistConn objects used by the Transport object define a closech to listen for the closing of the underlying connections.

https://github.com/golang/go/blob/master/src/net/http/transport.go#L821

If this is reliable, it seems like you could use it as the basis for a release on a (per-host) semaphore that dialConn would need to acquire in order to create a new connection. As a start on a proposal for limiting total per-host connections, does this sound way off?
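A standalone sketch of the mechanism being proposed (sem and closech here stand in for per-host state inside Transport; this is not the actual transport.go code):

```go
package main

import "fmt"

// acquireOrBlock models dialConn taking a per-host slot before dialing.
// It blocks when the semaphore is full, i.e. at the connection limit.
func acquireOrBlock(sem chan struct{}) {
	sem <- struct{}{}
}

// watchClose models the proposed release: when a persistConn's closech
// fires, its slot goes back to the semaphore. The returned channel is
// closed once the release has happened (for observation only).
func watchClose(sem chan struct{}, closech <-chan struct{}) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		<-closech // underlying connection closed
		<-sem     // release the per-host slot
		close(done)
	}()
	return done
}

func main() {
	sem := make(chan struct{}, 2) // at most 2 conns per host
	closech := make(chan struct{})

	acquireOrBlock(sem)
	done := watchClose(sem, closech)
	fmt.Println("in use:", len(sem)) // in use: 1

	close(closech) // simulate the connection closing
	<-done
	fmt.Println("in use:", len(sem)) // in use: 0
}
```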

@quentinmit quentinmit added the NeedsFix label Oct 7, 2016

@bradfitz


Member

bradfitz commented Nov 1, 2016

Sorry, didn't happen for Go 1.8.

@bradfitz bradfitz modified the milestones: Go1.9, Go1.8 Nov 1, 2016

@alexellis


alexellis commented Jan 27, 2017

I think I'm running into the same issue with a proxy; lsof is showing hundreds of "ESTABLISHED" connections even when a response has been sent back to the client:

https://github.com/alexellis/faas/blob/master/watchdog/main.go

Can anyone confirm? @bradfitz

@tleyden


tleyden commented Jan 27, 2017

@alexellis don't you just need to add an r.Body.Close() to close the io.ReadCloser? Sent you a PR: openfaas/faas#13

alexellis added a commit to openfaas/faas that referenced this issue Jan 27, 2017

@rogchap


rogchap commented Feb 2, 2017

@tleyden From the Go http docs:

https://golang.org/pkg/net/http/#Request

// The Server will close the request body. The ServeHTTP
// Handler does not need to.
Body io.ReadCloser
@alexellis


alexellis commented Feb 2, 2017

@tleyden @rogchap I added a Close call to both ends (client and server were both Golang) which fixed the issue.

@rogchap


rogchap commented Feb 2, 2017

@alexellis Yup, you have to call Close on the client. Glad you got it working.

@bradfitz bradfitz added FeatureRequest and removed LongTerm labels May 24, 2017

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 May 24, 2017

@ldemailly


ldemailly commented Jul 14, 2017

Should I file a separate issue or is this the place to mention: I have code that does:

	ht := http.DefaultTransport.(*http.Transport)
	ht.MaxIdleConns = numConnections
	ht.MaxIdleConnsPerHost = numConnections

and then establishes numConnections simultaneous back-to-back HTTP connections, one per goroutine (it's a load-testing tool).

If I check the number of sockets, it works as expected - numConnections established sockets for numConnections input - with smaller numbers (between 1 and 9 or 10), but for 11 and up I have more sockets than the limit:

# with up to 10 it works
$ fortio -qps 0 -c 10 http://localhost:8080
$ netstat -na |awk ' $4 == "::1.8080" { print $0}'|wc -l
10
# with 16 there are more sockets:
$ echo `netstat -na |awk ' $4 == "::1.8080" { print $5}'| sort`
::1.56603 ::1.56604 ::1.56605 ::1.56606 ::1.56607 ::1.56608 ::1.56609 ::1.56610 ::1.56611 ::1.56612 ::1.56613 ::1.56615 ::1.56616 ::1.56617 ::1.56618 ::1.56619 ::1.56621 ::1.56622 ::1.56623 ::1.56624
$ netstat -na |awk ' $4 == "::1.8080" { print $0}'|wc -l
      20

This happens both on go1.8.3 darwin/amd64 and go1.8.1 linux/amd64.
The full code to repro is go get istio.io/istio/devel/fortio/cmd/fortio (branch ldemailly-bench6).

Besides doing my own connection pooling, is there a way to guarantee a maximum/exact number of sockets used? (And why do I get a few more than I ask for?)

@rsc


Contributor

rsc commented Nov 22, 2017

Given that this has been like this since Go 1.2 and still not fixed it seems likely we should just close it. But moving to Go 1.11 at least.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@bradfitz


Member

bradfitz commented Nov 22, 2017

I don't want to close this. There are a number of related bugs I'd like to all address.

@luciferous


luciferous commented Sep 5, 2018

Looks like Transport.MaxConnsPerHost (new in Go 1.11) addresses this.
