net/http: http.Client calls can stall when a persistent connection is slow to close #30942

Open
aqua opened this issue Mar 20, 2019 · 2 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@aqua

aqua commented Mar 20, 2019

go1.12's http.Client supports supplying a customized RoundTripper as the transport. A custom Dial can be supplied, e.g., to implement tunneling or proxying through nonstandard protocols; it returns objects implementing net.Conn, which in turn includes a Close(). The native implementation of Close() on UNIX is a close(2) on the underlying socket, which is a local syscall and rarely blocks, especially for HTTP. However, if the supplied Dial returns a net.Conn with a slower shutdown (e.g. because it's tunneling over another protocol that requires a round trip on the underlying protocol to terminate gracefully), Close() can block for longer, potentially as long as the network RTT to a distant server plus that server's internal processing time.

The http implementation tries to keep persistent connections around, and when a new request is initiated to a "key" (scheme+protocol+host+port), it has an LRU cache of idle connections to choose from in getIdleConn() (https://golang.org/src/net/http/transport.go#L829). Before choosing one, it calls isBroken() on the corresponding http.persistConn (https://golang.org/src/net/http/transport.go#L848) in case the connection died an instant before. However, isBroken() must acquire the persistConn's mutex (https://golang.org/src/net/http/transport.go#L1533) before reporting whether persistConn.closed is non-nil. That same mutex is held by persistConn.readLoop() (https://golang.org/src/net/http/transport.go#L1642), which can call pc.readLoopPeekFailLocked(), which calls pc.closeLocked(), which finally calls Close() on the net.Conn. This isn't a deadlock, but it does block with the persistConn.mu lock held for as long as that Close() takes to finish, so any call to getIdleConn() that ends up picking that connection blocks as well. getIdleConn() removes the pconn from its idleLRU cache, but does not remove it from its list of idle connections, so it may still get picked.

Even without a custom transport, there might be a tail-case optimization here: clients that make repeated use of the same HTTP server endpoint, with finite reusability of the TCP connections, can currently block bystander goroutines unnecessarily on the close() syscall.

Reference Google-internal bug 128442309.

@bradfitz bradfitz self-assigned this Mar 20, 2019
@bradfitz bradfitz added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 20, 2019
@bradfitz bradfitz added this to the Go1.13 milestone Mar 20, 2019
@gopherbot
Contributor

Change https://golang.org/cl/168345 mentions this issue: net/http: make Transport call net.Conn.Close without mutex held

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@bradfitz bradfitz removed their assignment Oct 9, 2019
@martin-dos-santos

Any updates on this?

6 participants