net/http: possible race on persistConn.roundtrip vs. persistConn.readloop in transport when using httptrace #59310
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (`go env`)?
What did you do?
Given a simple TCP server that immediately closes an incoming connection:
Given a client that uses httptrace:
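A minimal sketch of such a client, not necessarily the exact code used for this report. The function name `postWithTrace`, the `gotConnDelay` parameter, the request body, and the placeholder URL are assumptions made for illustration; the delay is there so the timing of the `GotConn` hook can be varied:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"os"
	"strings"
	"time"
)

// postWithTrace (hypothetical helper) sends one POST to target with a GotConn
// trace hook attached. gotConnDelay is slept inside the hook to shift the
// timing between the transport's goroutines.
func postWithTrace(target string, gotConnDelay time.Duration) error {
	trace := &httptrace.ClientTrace{
		GotConn: func(_ httptrace.GotConnInfo) {
			time.Sleep(gotConnDelay)
		},
	}
	ctx := httptrace.WithClientTrace(context.Background(), trace)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, target, strings.NewReader("payload"))
	if err != nil {
		return err
	}

	// A fresh transport per call, so every request dials a new connection.
	tr := &http.Transport{}
	defer tr.CloseIdleConnections()
	client := &http.Client{Transport: tr}

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	target := "http://127.0.0.1:8080/" // placeholder address, pass the real one as an argument
	if len(os.Args) > 1 {
		target = os.Args[1]
	}
	for _, d := range []time.Duration{0, 100 * time.Microsecond} {
		fmt.Printf("delay=%v err=%v\n", d, postWithTrace(target, d))
	}
}
```

Which error is printed for a given delay depends on scheduling, so the output is not stable across runs or machines.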
What did you expect to see?
What did you see instead?
The use case for this scenario is that we want to know whether we can safely retry POST requests. The hypothesis is that if `WroteHeaders` on httptrace was not called during the request, we know that no data of the POST request has actually reached the backend. In our case, the backend is behind an L4 reverse proxy which always accepts connections first, but immediately closes them if it cannot forward the connection to its own backend. This has led to issues where the actual backend would not even accept connections, but we don't see that because, from our end, the connection got accepted by the reverse proxy. So we want some way of knowing whether we could not send HTTP headers. Unfortunately, the error `EOF` is very generic and we can't really make any assumptions based on this error alone, as it can happen anywhere during the lifecycle of a connection.
If the `GotConn` callback adds a minimal delay of 0.1ms, the behavior is different and `http: server closed idle connection` is returned. The race I observed is this:
Without a delay, `persistConn.roundtrip` is called first, increasing `numExpectedResponses` to 1, which prevents `persistConn.readLoopPeekFailLocked` from being called later. `readLoopPeekFailLocked` is used to peek into the connection to see if it is dead or alive. If `EOF` is encountered during the peek, `errServerClosedIdle` is returned. In this case the peek path is skipped, so that error is never produced.
With a delay in the `GotConn` callback function, `persistConn.readLoop` is called first, while `numExpectedResponses` is still 0 (from zero value initialization, NOT from getting reduced after actually reading a response). This leads to `readLoopPeekFailLocked` getting called and the final `errServerClosedIdle` being returned to the caller.
If the delay in `GotConn` is below 100 microseconds, the returned error starts flapping between `EOF` and `http: server closed idle connection`. Above 500 microseconds we consistently see `http: server closed idle connection`; below 10 microseconds we consistently see `EOF`.
The `EOF` in the no delay case is actually a `nothingWrittenError` which gets unwrapped to a `transportReadFromServerError`, which is then unwrapped to the `EOF` that is returned from `Transport.RoundTrip`, unfortunately stripping all helpful information.