Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestClientRedirectContext is flaky #16049

Closed
eliasnaur opened this issue Jun 13, 2016 · 6 comments

Comments

Projects
None yet
7 participants
@eliasnaur
Copy link
Contributor

commented Jun 13, 2016

The TestClientRedirectContext test sometimes fail with this:

--- FAIL: TestClientRedirectContext (0.00s)
    client_test.go:317: url.Error.Err = dial tcp 127.0.0.1:60809: operation was canceled; want errRequestCanceled or errRequestCanceledConn
FAIL
FAIL    net/http    15.840s

Recent flake on the darwin/386 builder:

https://build.golang.org/log/618f21103437358fda63d77ffaf92dd8b581f6d2

@ianlancetaylor ianlancetaylor added this to the Go1.7Maybe milestone Jun 13, 2016

@adg adg assigned bradfitz and adg Jun 14, 2016

@adg adg removed their assignment Jun 27, 2016

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jun 27, 2016

I can't reproduce this on my Mac laptop even with thousands of runs, in normal mode or with -race (which randomizes goroutine scheduling).

The code also seems fine.

I wonder if the failure was some network flake?

In any case, punting to Go 1.8. Help wanted.

@bradfitz bradfitz modified the milestones: Go1.8, Go1.7Maybe Jun 27, 2016

@eliasnaur

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2016

Thank you for looking into this. FWIW, it happened recently on the darwin/arm builder:

https://build.golang.org/log/2a4f80a178c686db64e9bf563d8511ff3dfa0410

@gregory-m

This comment has been minimized.

Copy link
Contributor

commented Jun 30, 2016

I can constantly reproduce it on my Mac laptop (OS X 10.11.5). Using stress command

⇢ $ go test -gcflags= net/http -c                                                                                                                                      
⇢ $ stress  ./http.test -test.short -test.run TestClientRedirectContext
1514 runs so far, 0 failures
3007 runs so far, 0 failures

/var/folders/lw/kb0_gx6x43d6jv3mj4s4t6v00000gn/T/go-stress949703166
--- FAIL: TestClientRedirectContext (0.00s)
    client_test.go:317: url.Error.Err = dial tcp 127.0.0.1:53382: operation was canceled; want errRequestCanceled or errRequestCanceledConn
FAIL

But I can't produce any valuable stack trace.
If running with runtime.StartTrace() the test pass forever.

Any ideas how to get any valuable info?

@jaredborner

This comment has been minimized.

Copy link

commented Jul 6, 2016

i've found some things that may help here

there seems to be a race condition in Transport.getConn (transport.go:797), specifically between the go func wrapping t.dialConn (transport.go:839) and the select statement following it (transport.go:845)

in the test case, Transport.getConn is called twice with the context canceled between these calls

the first time:

  • t.dialConn returns a nil error
  • this nil error is returned from the select statement

the second time:

  • t.dialConn returns an operation was canceled error
  • usually the select statement doesn't see this and executes the <-req.Context().Done() case (returning errRequestCanceledConn)...
  • ... but sometimes the t.dialConn error makes it through the channel in time and is returned instead

the following added log statements highlight this.

passing:

client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: <nil>
transport.go:853: returning from '<-dialc' case. err: <nil>
client.go:496: checking redirect
client_test.go:305: canceling
client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: dial tcp 127.0.0.1:64091: operation was canceled
transport.go:871: returning from '<-req.Context().Done()' case
PASS

so even though errRequestCanceledConn is returned, the operation was canceled error is still present.

failing:

client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: <nil>
transport.go:853: returning from '<-dialc' case. err: <nil>
client.go:496: checking redirect
client_test.go:305: canceling
client.go:257: calling transport roundtrip
transport.go:841: error from t.dialConn: dial tcp 127.0.0.1:64097: operation was canceled
transport.go:853: returning from '<-dialc' case. err: dial tcp 127.0.0.1:64097: operation was canceled
--- FAIL: TestClientRedirectContext (0.21s)
    client_test.go:319: url.Error.Err = dial tcp 127.0.0.1:64097: operation was canceled; want errRequestCanceled or errRequestCanceledConn

also, adding a small time.Sleep call between the go func wrapping the t.dialConn and the select statement makes this behavior much more reproducible for me (~50% of the time just using regular go test)

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 6, 2016

@jaredborner, thanks! That let me reproduce it. I sent out https://golang.org/cl/24748

@bradfitz bradfitz modified the milestones: Go1.7Maybe, Go1.8 Jul 6, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Jul 6, 2016

CL https://golang.org/cl/24748 mentions this issue.

@gopherbot gopherbot closed this in d872201 Jul 7, 2016

@golang golang locked and limited conversation to collaborators Jul 7, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.