Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestTimeoutHandlerSuperfluousLogs is flaky #35051

Closed
bcmills opened this issue Oct 21, 2019 · 13 comments
Closed

net/http: TestTimeoutHandlerSuperfluousLogs is flaky #35051

bcmills opened this issue Oct 21, 2019 · 13 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Oct 21, 2019

The test added in CL 200518 appears to have flaked on the dragonfly-amd64 builder (https://build.golang.org/log/dbcda17a3616121b8fc8c2efc58cb42e82a6d373):

--- FAIL: TestTimeoutHandlerSuperfluousLogs (1.04s)
    --- FAIL: TestTimeoutHandlerSuperfluousLogs/return_after_timeout (1.04s)
        serve_test.go:6336: Response mismatch
            Got
            "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
            
            Want
            "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 15\r\n\r\ntimed out here!"
FAIL
FAIL	net/http	11.471s

I suspect that the problem relates to the use of a hard-coded timeout (here), although I haven't verified that.

CC @odeke-em @bradfitz

@bcmills bcmills added this to the Go1.14 milestone Oct 21, 2019
@odeke-em
Copy link
Member

@odeke-em odeke-em commented Oct 21, 2019

Thanks for the file @bcmills! Sheesh how does one then test a Timeout if 2X the requested sleep time still doesn't time out? I'll send a CL skipping that test to avoid blocking the builders.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Oct 21, 2019

The flake rate seems low enough that a Skip probably isn't necessary at this point.

But yeah: never underestimate the slowness of builders.

(It's a corollary of the “infinite monkeys” theorem: “Infinitely many monkeys simulating TryBot runs will eventually produce an antagonistic schedule.”)

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Oct 21, 2019

The flake rate seems low enough that a Skip probably isn't necessary at this point.

Cool, thanks! However, this would point to being a runtime problem if the timer for the goroutine that fires insider the handler, fires earlier than the timer of the overall handler. We should investigate that angle too.

But yeah: never underestimate the slowness of builders.

(It's a corollary of the “infinite monkeys” theorem: “Infinitely many monkeys simulating TryBot runs will eventually produce an antagonistic schedule.”)

Hehe, indeed!

@bcmills
Copy link
Member Author

@bcmills bcmills commented Oct 21, 2019

if the timer for the goroutine that fires insider the handler, fires earlier than the timer of the overall handler.

Note that with goroutine preemption (#24543), it is in general possible for the two timers to fire “in order”, but for the next statement(s) after the first timer to be preempted immediately and ultimately execute after the statements immediately after the second timer. (If the two goroutines don't have a “happens-before” edge, their execution order is unspecified regardless of wall time.)

@bcmills
Copy link
Member Author

@bcmills bcmills commented Nov 7, 2019

freebsd-amd64-race (https://build.golang.org/log/973f4071939084efe638fa859e6473379cdbe698):

--- FAIL: TestTimeoutHandlerSuperfluousLogs (0.55s)
    --- FAIL: TestTimeoutHandlerSuperfluousLogs/return_after_timeout (0.54s)
        serve_test.go:6338: Response mismatch
            Got
            "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
            
            Want
            "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 15\r\n\r\ntimed out here!"
FAIL
FAIL	net/http	14.330s
@bcmills bcmills changed the title net/http: TestTimeoutHandlerSuperfluousLogs flake on dragonfly-amd64 net/http: TestTimeoutHandlerSuperfluousLogs is flaky Nov 7, 2019
@bcmills
Copy link
Member Author

@bcmills bcmills commented Nov 21, 2019

@odeke-em, do you plan to address this yourself?

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Nov 21, 2019

Yes, I'll work on this tomorrow morning or later tonight. Thanks for the ping @bcmills!

@odeke-em odeke-em self-assigned this Nov 21, 2019
@odeke-em
Copy link
Member

@odeke-em odeke-em commented Nov 22, 2019

I think I have 2 options here:
a) Reduce the maxTimeout to say 100 microseconds, increase sleepTime to 200 X maxTimeout -- would be odd if the scheduler doesn't hit us back after 200X the maxTimeout
b) Use faketime which by virtue is bound to have well organized/synchronized "sleeps" and the test should always pass

Despite us just only needing to ensure that that 503 is returned after the goroutine for the timer in the TimeoutHandler expires,
a) seems like the best horse in the race as it'll properly ensure that we test with common runtimes, but b) will ensure we never get flakes but also that will require us putting the test in a special file guarded with the faketime build constraint and there currently isn't any other test that does that, so perhaps I am overreacting by looking at b).

a) it is! CL coming up shortly.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 22, 2019

Change https://golang.org/cl/208477 mentions this issue: net/http: radically change TestTimeoutHandlerSuperfluousLogs timeouts

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 5, 2019

Interestingly this hasn't happened since 2019-11-25.

@gopherbot gopherbot closed this in bff4ebb Dec 7, 2019
@golang golang locked and limited conversation to collaborators Dec 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.