Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestPingPongHog failures with ratios out of range #52207

Closed
bcmills opened this issue Apr 7, 2022 · 10 comments
Closed

runtime: TestPingPongHog failures with ratios out of range #52207

bcmills opened this issue Apr 7, 2022 · 10 comments
Assignees
Labels
NeedsInvestigation release-blocker Soon
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 7, 2022

--- FAIL: TestPingPongHog (0.10s)
    proc_test.go:486: want hogCount/lightCount in [0.2, 5]; got 24000000/133792000 = 0.1793829227457546
FAIL
FAIL	runtime	155.091s

See previously #38266, #20494.

greplogs --dashboard -md -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2021-01-01

2022-04-06T20:46:47-81ae993/linux-386-longtest
2022-03-10T21:10:30-914195c/windows-amd64-longtest
2021-11-12T22:20:50-9150c16/linux-386-longtest
2021-10-29T17:12:47-3aecb3a/linux-386-longtest

(CC @golang/runtime)

@bcmills
Copy link
Member Author

@bcmills bcmills commented Apr 7, 2022

Marking as release-blocker for Go 1.19 because windows/386 and windows/amd64 are first class ports.

(This at least needs to be triaged by the runtime team before the release, and they can either fix the failure or add a skip based on the outcome of that triage.)

@bcmills bcmills added OS-Windows NeedsInvestigation release-blocker labels Apr 7, 2022
@bcmills bcmills added this to the Go1.19 milestone Apr 7, 2022
@mknyszek mknyszek self-assigned this Apr 7, 2022
@heschi
Copy link
Contributor

@heschi heschi commented Apr 20, 2022

Ping: any progress on this?

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 2, 2022

greplogs --dashboard -md -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2022-04-07
2022-04-30T04:04:40-fd6c556/windows-amd64-longtest
2022-04-30T02:35:50-0ebe224/windows-amd64-longtest
2022-04-25T23:36:50-a5d61be/windows-amd64-longtest

@bcmills bcmills changed the title runtime: TestPingPongHog failures with ratios out of range on Windows runtime: TestPingPongHog failures with ratios out of range May 3, 2022
@bcmills bcmills removed the OS-Windows label May 3, 2022
@bcmills
Copy link
Member Author

@bcmills bcmills commented May 3, 2022

greplogs -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2022-05-01
2022-05-03T14:36:37-d0cda4d/linux-386-longtest

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 9, 2022

greplogs -l -e 'FAIL: TestPingPongHog .*(?:\n .*)*want hogCount/lightCount in ' --since=2022-05-04
2022-05-06T15:32:04-7f71e1f/windows-amd64-longtest
2022-05-06T04:22:04-3ea3bc0/windows-amd64-longtest
2022-05-06T02:40:50-7dd9884/windows-amd64-longtest
2022-05-04T04:21:29-7cf3268/windows-amd64-longtest

and https://storage.googleapis.com/go-build-log/84130882/linux-amd64-longtest_b0bacea2.log (a SlowBot)

@golang/runtime: can this test be skipped until someone has the bandwidth to fix it? It's gotten really noisy.

@bcmills bcmills added the Soon label May 9, 2022
@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented May 17, 2022

The code comment says

// We can use a
// fairly large factor here to make this robust: if the
// scheduler isn't working right, the gap should be ~1000X.

All failures above are within a factor of 10 (the current factor is 5). Perhaps we can just increase the factor, to (say) 20 or 50?

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented May 17, 2022

I agree with Cherry. It sounds like this just needs a lot more slack. However, I think this test might just be fundamentally flaky because it's not robust to OS scheduling delays (I'm thinking about #52433 here). I wonder if we could prove that this is the case with perf on the linux/386 builders.

This is definitely a test that, if we had, for instance, scheduler instrumentation for testing, we could construct a situation where we make sure that a goroutine gets scheduled within N calls to the scheduler on average (or something), instead of bounding it by real timings.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented May 17, 2022

Yeah, scheduler instrumentation would be much better for testing this kind of things.

Perhaps for short term we could increase the threshold to reduce the noise on builders.

@gopherbot
Copy link

@gopherbot gopherbot commented May 18, 2022

Change https://go.dev/cl/407134 mentions this issue: runtime: relax the threshold for TestPingPongHog

@gopherbot
Copy link

@gopherbot gopherbot commented May 19, 2022

Change https://go.dev/cl/407415 mentions this issue: runtime: fix overflow in PingPongHog test

gopherbot pushed a commit that referenced this issue May 19, 2022
On 32-bit systems the result of hogCount*factor can overflow.
Use division instead to do comparison.

Update #52207

Change-Id: I429fb9dc009af645acb535cee5c70887527ba207
Reviewed-on: https://go-review.googlesource.com/c/go/+/407415
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation release-blocker Soon
Projects
None yet
Development

No branches or pull requests

5 participants