time: TestZeroTimer failures #66006

gopherbot · 2024-02-28T17:47:46Z

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestZeroTimer (120.77s)
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.009812517s

— watchflakes

The text was updated successfully, but these errors were encountered:

gopherbot · 2024-02-28T17:47:49Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-02-28 16:44 linux-amd64-longtest go@d1e8dc25 time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (120.77s)
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.009812517s

2024-02-28 16:44 windows-amd64-longtest go@1df6db8e time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (362.37s)
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.0098644s
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.0108545s
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.0010985s

— watchflakes

gopherbot · 2024-02-28T18:29:31Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-02-28 16:44 linux-amd64-longtest-race go@d1e8dc25 time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (243.44s)
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.004081005s
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.00057097s

— watchflakes

gopherbot · 2024-02-29T02:42:12Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-02-28 16:44 gotip-linux-386-longtest go@d1e8dc25 time.TestZeroTimer (log)

=== RUN   TestZeroTimer
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.010132448s
--- FAIL: TestZeroTimer (122.94s)

2024-02-28 16:44 gotip-linux-amd64-longtest-race go@1df6db8e time.TestZeroTimer (log)

=== RUN   TestZeroTimer
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.009149723s
--- FAIL: TestZeroTimer (127.47s)

— watchflakes

rsc · 2024-02-29T16:33:38Z

I looked into these. I can reproduce on my laptop using go test -count=50 -run=Zero -timeout=3m in go/src/time. The failure rate is about 1 per 10 runs, so count=50 gets about 5 failures. Each failure takes 2m, so the 3m timeout speeds up the check. Without a failure the command takes 0.5m.

The failure is reliably introduced by one of my CLs (2fb5ef8) but then reliably fixed by a CL a few commits later (c6888d9) [transcript below].

The problem is that when I fixed modtimer, I corrected a deadlock (as noted in the CL 564125 description) by unlocking t before locking pp.timersLock. But that created a window when the timer was in state timerWaiting without actually being in a heap. Something about that state caused the timer not to be noticed as things idled, even though it got inserted into the heap shortly afterward. Moving the unlock back down after the doaddtimer reintroduces the hypothetical deadlock but also closes the window and fixes the test failure. Of course, we don't want to do that.

In CL 564127 I closed that window by setting t.pp before unlocking t, so that a future modification of t would at least set t.pp.timerModifiedEarliest (minNextWhen). That avoids the deadlock but still closes the window and fixes the test failure.

This window where t was in state timerWaiting but t.pp == nil is also the root cause of the ppc64le builder failure that I worried might be bad atomics. Phew.

In pending CL 564977, I reintroduce the window where the timer is not yet in the heap, but now the timer is not marked as being in a heap, so it's not a window of inconsistency like in the failures. I also wrote a func-based TestZeroTimer (since the channel-based one will be handled by the new channel special case in pending CL 512355), verified that it failed in the same failure window, and verified that it still passes with CL 546977. I will mail that out as a new CL and mark it as fixing this GitHub issue, but the flake is already gone.

/cc @aclements

% git foreach ed07e16ffd..7649c9b212 go test -count=50 -run=Zero -timeout=5m
f8654408d6 go/build: add check for allowed vendor packages

0728e2b139 runtime: simplify, speed up adjusttimers

77f0bd01fb runtime: delete clearDeletedTimers

8a493a6672 runtime: move all timer-locking code into time.go

4a3c3ec996 runtime: delete addtimer

b2bd7ebafc runtime: merge timerModifiedEarlier and timerModifierLater

09bfea95cf runtime: merge timerRemoving into timerModifying

3be36e9b20 runtime: merge timerMoving into timerModifying

ed0ad591d3 runtime: merge timerNoStatus into timerRemoved

db8c6c8c7a runtime: merge timerDeleted into timerModified

51df232b12 runtime: use timer.lock in deltimer

b5a99eaa94 runtime: add timer lock to lock ranking

2fb5ef889b runtime: use timer.lock in modtimer
FAIL
58911599e8 runtime: use timer.lock in cleantimers
FAIL
d1e8dc25ff runtime: use timer.lock in runtimer
FAIL
1df6db8e4f runtime: use timer.lock in adjusttimers
FAIL
c6888d9264 runtime: use timer.lock in moveTimers

a155a2f8a1 runtime: change timer.status to timer.state

370c8e983b runtime: use methods for timer

b6753baaed cmd/compile: compile len(ch) as call to runtime.chanlen

022d16dece runtime: move per-P timers state into its own struct

ea15ebfb38 time: garbage collect unstopped Tickers and Timers

98981caf4e runtime: rename timers fields for clarity

dc5b02899d runtime: update timers.len with Store instead of Add

63a5dfc55c runtime: fix timers.wakeTime inaccuracy race

7649c9b212 make.bat: find bootstrap toolchain on LUCI gomotes

gopherbot · 2024-02-29T17:03:10Z

Change https://go.dev/cl/568255 mentions this issue: time: make sure tests avoid the special-case channel code

gopherbot · 2024-02-29T17:24:07Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-02-28 16:44 gotip-darwin-amd64-longtest go@58911599 time.TestZeroTimer (log)

=== RUN   TestZeroTimer
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.061488161s
--- FAIL: TestZeroTimer (122.37s)

— watchflakes

rsc · 2024-02-29T18:49:33Z

Not sure why this only just appeared, but 5891159 is one of the commits in the known bad range.

Many of the tests in package time are about proper manipulation of the timer heap. But now NewTimer bypasses the timer heap except when something is blocked on the associated channel. Make the tests test the heap again by using AfterFunc instead of NewTimer. In particular, adds a non-chan version of TestZeroTimer, which was flaky-broken and then fixed by CLs in the cleanup stack. This new tests makes sure we notice if it breaks again. Fixes #66006. Change-Id: Ib59fc1b8b85ef5a21e72fe418c627c9b8b8a083a Reviewed-on: https://go-review.googlesource.com/c/go/+/568255 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>

gopherbot · 2024-03-08T22:30:19Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-03-08 21:09 windows-amd64-longtest go@85bbb121 time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (122.22s)
    sleep_test.go:578: Expected time to get value from Timer channel in less than 2 sec, took 2m0.0054632s

— watchflakes

gopherbot · 2024-03-11T23:45:48Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-03-11 19:54 windows-amd64-longtest go@a18aa0e3 time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (245.81s)
    --- FAIL: TestZeroTimer/impl=chan (122.13s)
        sleep_test.go:636: Expected time to get value from Timer channel in less than 2 sec, took 2m0.0028434s
    --- FAIL: TestZeroTimer/impl=func (123.68s)
        sleep_test.go:636: Expected time to get value from Timer channel in less than 2 sec, took 2m0.0058075s

— watchflakes

gopherbot · 2024-03-12T18:04:36Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-03-12 16:49 linux-arm64-longtest go@f83102cf time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (122.61s)
    --- FAIL: TestZeroTimer/impl=func (121.43s)
        sleep_test.go:636: Expected time to get value from Timer channel in less than 2 sec, took 2m0.007207148s

— watchflakes

aclements · 2024-03-13T16:11:25Z

@rsc, it looks like this is still happening.

rsc · 2024-03-13T16:26:06Z

🤔

rsc · 2024-03-13T16:33:00Z

Unlike the last time, I can't reproduce this with go test time -count=50 -run=Zero on my laptop. The timing here doesn't look like it is still happening so much as it recurred after a week of being fixed. The first new failure being on the day I submitted more time CLs probably indicates it is one of these:

2024-03-08 85bbb12 runtime: fix timers.wakeTime inaccuracy race
2024-03-08 3ad5cd4 runtime: update timers.len with Store instead of Add
2024-03-08 2171e62 runtime: rename timers fields for clarity
2024-03-08 e38be31 time: gracefully handle ts.zombies underflow

rsc · 2024-03-13T16:59:41Z

Reproduced under stress; seems to be 85bbb12, ironically. Does not reproduce in my local client, though. I will bisect it later today to see if it is really fixed.

gopherbot · 2024-03-13T17:54:47Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-03-13 13:58 linux-arm64-longtest go@6e5398ba time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (122.57s)
    --- FAIL: TestZeroTimer/impl=chan (121.18s)
        sleep_test.go:636: Expected time to get value from Timer channel in less than 2 sec, took 2m0.005818595s

— watchflakes

gopherbot · 2024-03-13T19:17:50Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "time" && test == "TestZeroTimer"

2024-03-13 18:22 linux-amd64-longtest go@38723f2b time.TestZeroTimer (log)

--- FAIL: TestZeroTimer (121.77s)
    --- FAIL: TestZeroTimer/impl=func (121.02s)
        sleep_test.go:636: Expected time to get value from Timer channel in less than 2 sec, took 2m0.009710965s

— watchflakes

rsc · 2024-03-13T20:35:03Z

Lots more reproduction. stress ./time.test -test.run=Zero -test.timeout=3m -test.count=20 consistently fails at least a few runs out of a hundred, even with all my pending CLs applied. Definitely started at 85bbb12 although I don't see why. Maybe my fault, maybe a latent bug I uncovered.

rsc · 2024-03-14T00:37:44Z

The bug is clear from the diff of 85bbb12 if you read it carefully enough. Oops. Fix mailed.

gopherbot · 2024-03-14T00:41:38Z

Change https://go.dev/cl/571196 mentions this issue: runtime: fix lost sleep causing TestZeroTimer flakes

gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2024

bcmills added release-blocker compiler/runtime Issues related to the Go compiler and/or runtime. labels Feb 29, 2024

bcmills added this to the Go1.23 milestone Feb 29, 2024

mknyszek closed this as completed Mar 6, 2024

gopherbot reopened this Mar 8, 2024

aclements assigned rsc Mar 13, 2024

gopherbot closed this as completed in 2965dc9 Mar 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

time: TestZeroTimer failures #66006

time: TestZeroTimer failures #66006

gopherbot commented Feb 28, 2024

gopherbot commented Feb 28, 2024

gopherbot commented Feb 28, 2024

gopherbot commented Feb 29, 2024

rsc commented Feb 29, 2024

gopherbot commented Feb 29, 2024

gopherbot commented Feb 29, 2024

rsc commented Feb 29, 2024

gopherbot commented Mar 8, 2024

gopherbot commented Mar 11, 2024

gopherbot commented Mar 12, 2024

aclements commented Mar 13, 2024

rsc commented Mar 13, 2024

rsc commented Mar 13, 2024

rsc commented Mar 13, 2024

gopherbot commented Mar 13, 2024

gopherbot commented Mar 13, 2024

rsc commented Mar 13, 2024

rsc commented Mar 14, 2024

gopherbot commented Mar 14, 2024

time: TestZeroTimer failures #66006

time: TestZeroTimer failures #66006

Comments

gopherbot commented Feb 28, 2024

gopherbot commented Feb 28, 2024

gopherbot commented Feb 28, 2024

gopherbot commented Feb 29, 2024

rsc commented Feb 29, 2024

gopherbot commented Feb 29, 2024

gopherbot commented Feb 29, 2024

rsc commented Feb 29, 2024

gopherbot commented Mar 8, 2024

gopherbot commented Mar 11, 2024

gopherbot commented Mar 12, 2024

aclements commented Mar 13, 2024

rsc commented Mar 13, 2024

rsc commented Mar 13, 2024

rsc commented Mar 13, 2024

gopherbot commented Mar 13, 2024

gopherbot commented Mar 13, 2024

rsc commented Mar 13, 2024

rsc commented Mar 14, 2024

gopherbot commented Mar 14, 2024