Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestRetryRequestsOnError is flaky #30938

Open
bradfitz opened this issue Mar 19, 2019 · 18 comments

Comments

@bradfitz
Copy link
Member

@bradfitz bradfitz commented Mar 19, 2019

From https://storage.googleapis.com/go-build-log/c7862f5a/linux-amd64-race_4ed6f3c3.log

--- FAIL: TestRetryRequestsOnError (0.02s)
    --- FAIL: TestRetryRequestsOnError/NothingWrittenGetBody (0.00s)
        transport_test.go:3116: i=1: Do = Post http://fake.golang: read tcp 127.0.0.1:52650->127.0.0.1:42017: use of closed network connection; log:
            Dial
            Write("POST / HTTP/1.1\r\nHost: fake.golang\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 4\r\nAccept-Encoding: gzip\r\n\r\nfoo\n")
            Handler
            intentional write failure
2019/03/19 21:17:09 http: TLS handshake error from 127.0.0.1:38418: write tcp 127.0.0.1:43483->127.0.0.1:38418: use of closed network connection
@bradfitz bradfitz added this to the Go1.13 milestone Mar 19, 2019
@bradfitz bradfitz self-assigned this Mar 19, 2019
@cuonglm

This comment has been minimized.

Copy link
Contributor

@cuonglm cuonglm commented Mar 20, 2019

@bradfitz is there a way to reliably reproduce this?

I tried:

for _ in {1..100}; do go-tip test -count=1 -race .;done

with no error.

@bradfitz

This comment has been minimized.

Copy link
Member Author

@bradfitz bradfitz commented Mar 25, 2019

I haven't tried.

@bradfitz

This comment has been minimized.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented May 16, 2019

https://storage.googleapis.com/go-build-log/de0346c9/linux-amd64-race_86344abc.log

--- FAIL: TestRetryRequestsOnError (0.02s)
    --- FAIL: TestRetryRequestsOnError/NothingWrittenNoBody (0.00s)
        transport_test.go:3217: i=1: Do = Delete http://fake.golang: read tcp 127.0.0.1:56096->127.0.0.1:40507: use of closed network connection; log:
            Dial
            Write("DELETE / HTTP/1.1\r\nHost: fake.golang\r\nUser-Agent: Go-http-client/1.1\r\nAccept-Encoding: gzip\r\n\r\n")
            Handler
            intentional write failure
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49704: read tcp 127.0.0.1:33311->127.0.0.1:49704: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49712: write tcp 127.0.0.1:33311->127.0.0.1:49712: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49710: read tcp 127.0.0.1:33311->127.0.0.1:49710: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49716: read tcp 127.0.0.1:33311->127.0.0.1:49716: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49706: read tcp 127.0.0.1:33311->127.0.0.1:49706: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49708: write tcp 127.0.0.1:33311->127.0.0.1:49708: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49714: write tcp 127.0.0.1:33311->127.0.0.1:49714: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49720: read tcp 127.0.0.1:33311->127.0.0.1:49720: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49734: write tcp 127.0.0.1:33311->127.0.0.1:49734: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49722: read tcp 127.0.0.1:33311->127.0.0.1:49722: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49718: read tcp 127.0.0.1:33311->127.0.0.1:49718: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49726: read tcp 127.0.0.1:33311->127.0.0.1:49726: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49728: read tcp 127.0.0.1:33311->127.0.0.1:49728: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49724: read tcp 127.0.0.1:33311->127.0.0.1:49724: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49732: read tcp 127.0.0.1:33311->127.0.0.1:49732: use of closed network connection
2019/05/16 17:31:29 http: TLS handshake error from 127.0.0.1:49730: read tcp 127.0.0.1:33311->127.0.0.1:49730: use of closed network connection
FAIL
@cuonglm

This comment has been minimized.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Jul 31, 2019

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Aug 20, 2019

android-amd64-emu (https://build.golang.org/log/92ec6ce5e426c6e197dc01557d726a680dc5134a):

--- FAIL: TestRetryRequestsOnError (0.03s)
    --- FAIL: TestRetryRequestsOnError/NothingWrittenNoBody (0.00s)
        transport_test.go:3224: i=1: Do = Delete http://fake.golang: read tcp 127.0.0.1:36280->127.0.0.1:41698: use of closed network connection; log:
            Dial
            Write("DELETE / HTTP/1.1\r\nHost: fake.golang\r\nUser-Agent: Go-http-client/1.1\r\nAccept-Encoding: gzip\r\n\r\n")
            Handler
            intentional write failure
FAIL
exitcode=1FAIL	net/http	9.085s
@jtarchie

This comment has been minimized.

Copy link

@jtarchie jtarchie commented Sep 11, 2019

We may be experiencing this error, too.
We have a test that is testing retry logic when a connection (in <=go1.12) would receive a io.EOF when the server closed the client connection.
When we upgraded to go1.13, we no longer receive io.EOF on the closed connection. We receive tls.errClosed.
With retry logic, we were able to compare the type of io.EOF, but with tls.errClosed, we now have to inspect the error string (Error()). This doesn't feel right to do.
We don't have a reproducible simple test case at the moment, but hopefully I've provided enough information from our current test case.

jtarchie added a commit to pivotal-cf/om that referenced this issue Sep 11, 2019
Read for more info: golang/go#30938 (comment)

Signed-off-by: Jeremy Alvis <jalvis@pivotal.io>
@jtarchie

This comment has been minimized.

Copy link

@jtarchie jtarchie commented Sep 13, 2019

It appears that tls.errClosed may not be the only culprit.
We are experiencing also a bunch of write: broken pipe in CI.
🤔

@jtarchie

This comment has been minimized.

Copy link

@jtarchie jtarchie commented Sep 13, 2019

We've reverted back to go1.12.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Sep 16, 2019

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Oct 4, 2019

windows-amd64-2016: https://build.golang.org/log/bef07597bea372590c2de133c9ad1afeddd7a42f

Same failure, mode, but no TLS handshake error (consistent with @jtarchie's observation).

@jtarchie

This comment has been minimized.

Copy link

@jtarchie jtarchie commented Oct 4, 2019

I might be able to figure out a reproducible (flakey) test, but I don't know if that actually hunts down the issue.
I'm not sure what changed in the go 1.12 to 1.13 source for TLS. I've looked at the diffs via git.
The fact that the error types changed (io.EOF to tls.errClosed | write: broken pipe) is probably where to start.

This could also be due to changes in httptest TLS Server.

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Oct 21, 2019

dragonfly-amd64: https://build.golang.org/log/782c3e02489de8e3203dcd50f037538956bab465
...at least we know it's not platform-dependent?

Actually, there is a pattern here: the only two cases that seem to flake are NothingWrittenNoBody and NothingWrittenGetBody. The Idempotent cases have not yet been implicated.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Oct 21, 2019

This note in the test may provide a clue:

// NOTE: we resend a request only if:
// - we reused a keep-alive connection
// - we haven't yet received any header data
// - either we wrote no bytes to the server, or the request is idempotent

I think the TLS handshake error is a red herring: it isn't necessarily even coming from the same test function.

@bcmills bcmills changed the title net/http: TestRetryRequestsOnError flake net/http: TestRetryRequestsOnError is flaky Nov 21, 2019
@bradfitz

This comment has been minimized.

Copy link
Member Author

@bradfitz bradfitz commented Nov 22, 2019

I can reproduce this if I burn a bunch of CPU (https://gist.github.com/bradfitz/233894f4aec11b221cb06abd8489d2e4) and run

$ go test  -run=TestRetryRequestsOnError/NothingWrittenGetBody -race -count=500
--- FAIL: TestRetryRequestsOnError (0.02s) 
    --- FAIL: TestRetryRequestsOnError/NothingWrittenGetBody (0.02s)
        transport_test.go:3395: i=1: Do = Post "http://fake.golang": read tcp 127.0.0.1:50552->127.0.0.1:46615: use of closed network connection; log:
            Dial 
            Write("POST / HTTP/1.1\r\nHost: fake.golang\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 4\r\nAccept-Encoding: gzip\r\n\r\nfoo\n")
            Handler
            intentional write failure      
--- FAIL: TestRetryRequestsOnError (0.01s) 
    --- FAIL: TestRetryRequestsOnError/NothingWrittenGetBody (0.00s)
        transport_test.go:3395: i=1: Do = Post "http://fake.golang": read tcp 127.0.0.1:42122->127.0.0.1:38355: use of closed network connection; log:
            Dial
            Write("POST / HTTP/1.1\r\nHost: fake.golang\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 4\r\nAccept-Encoding: gzip\r\n\r\nfoo\n")
            Handler
            intentional write failure
FAIL
exit status 1   
FAIL    net/http        55.811s

This check involving checking for a failure under 25 milliseconds scares me:

if time.Since(t0) < MaxWriteWaitBeforeConnReuse/2 {

... because all test code involving time scares me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.