time: TestZeroTimer failures #66006
Found new dashboard test flakes for:
2024-02-28 16:44 linux-amd64-longtest go@d1e8dc25 time.TestZeroTimer (log)
2024-02-28 16:44 windows-amd64-longtest go@1df6db8e time.TestZeroTimer (log)
Found new dashboard test flakes for:
2024-02-28 16:44 linux-amd64-longtest-race go@d1e8dc25 time.TestZeroTimer (log)
Found new dashboard test flakes for:
2024-02-28 16:44 gotip-linux-386-longtest go@d1e8dc25 time.TestZeroTimer (log)
2024-02-28 16:44 gotip-linux-amd64-longtest-race go@1df6db8e time.TestZeroTimer (log)
I looked into these, and I can reproduce the failure on my laptop. It is reliably introduced by one of my CLs (2fb5ef8) and then reliably fixed by a CL a few commits later (c6888d9) [transcript below].

The problem is that when I fixed modtimer, I corrected a deadlock (as noted in the CL 564125 description) by unlocking t before locking pp.timersLock. But that created a window during which the timer was in state timerWaiting without actually being in a heap. Something about that state caused the timer not to be noticed as things idled, even though it was inserted into the heap shortly afterward. Moving the unlock back down after the doaddtimer reintroduces the hypothetical deadlock but also closes the window and fixes the test failure. Of course, we don't want to do that.

In CL 564127 I closed that window by setting t.pp before unlocking t, so that a future modification of t would at least set t.pp.timerModifiedEarliest (minNextWhen). That avoids the deadlock but still closes the window and fixes the test failure. This window where t was in state timerWaiting but t.pp == nil is also the root cause of the ppc64le builder failure that I worried might be bad atomics. Phew.

In pending CL 564977, I reintroduce the window where the timer is not yet in the heap, but now the timer is not marked as being in a heap, so it's not a window of inconsistency like in the failures. I also wrote a func-based TestZeroTimer (since the channel-based one will be handled by the new channel special case in pending CL 512355), verified that it failed in the same failure window, and verified that it still passes with CL 564977. I will mail that out as a new CL and mark it as fixing this GitHub issue, but the flake is already gone.

/cc @aclements
Change https://go.dev/cl/568255 mentions this issue.
Found new dashboard test flakes for:
2024-02-28 16:44 gotip-darwin-amd64-longtest go@58911599 time.TestZeroTimer (log)
Not sure why this only just appeared, but 5891159 is one of the commits in the known bad range.
Many of the tests in package time are about proper manipulation of the timer heap. But now NewTimer bypasses the timer heap except when something is blocked on the associated channel. Make the tests test the heap again by using AfterFunc instead of NewTimer.

In particular, this adds a non-chan version of TestZeroTimer, which was flaky-broken and then fixed by CLs in the cleanup stack. The new test makes sure we notice if it breaks again.

Fixes #66006.

Change-Id: Ib59fc1b8b85ef5a21e72fe418c627c9b8b8a083a
Reviewed-on: https://go-review.googlesource.com/c/go/+/568255
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Found new dashboard test flakes for:
2024-03-08 21:09 windows-amd64-longtest go@85bbb121 time.TestZeroTimer (log)
Found new dashboard test flakes for:
2024-03-11 19:54 windows-amd64-longtest go@a18aa0e3 time.TestZeroTimer (log)
Found new dashboard test flakes for:
2024-03-12 16:49 linux-arm64-longtest go@f83102cf time.TestZeroTimer (log)
@rsc, it looks like this is still happening.
🤔
Unlike last time, I can't reproduce this with go test time -count=50 -run=Zero on my laptop. The timing suggests not so much that the original failure is still happening as that it recurred after a week of being fixed. The first new failure landing on the day I submitted more time CLs probably indicates it is one of these:
2024-03-08 85bbb12 runtime: fix timers.wakeTime inaccuracy race
Reproduced under stress; seems to be 85bbb12, ironically. Does not reproduce in my local client, though. I will bisect it later today to see if it is really fixed.
Found new dashboard test flakes for:
2024-03-13 13:58 linux-arm64-longtest go@6e5398ba time.TestZeroTimer (log)
Found new dashboard test flakes for:
2024-03-13 18:22 linux-amd64-longtest go@38723f2b time.TestZeroTimer (log)
Lots more reproduction.
The bug is clear from the diff of 85bbb12 if you read it carefully enough. Oops. Fix mailed.
Change https://go.dev/cl/571196 mentions this issue.
Issue created automatically to collect these failures.
Example (log):
— watchflakes