-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os: Test{Read,Write}TimeoutFluctuation failures on BSD variants #50725
Comments
|
Any action to be taken on my side? |
I still think changing the test to |
Okay, let me know if we need to check anything OS specific on our side :) |
Rolling forward to 1.20. Please comment if you disagree. Thanks. |
Another failure on https://storage.googleapis.com/go-build-log/6db3b12e/openbsd-amd64-68_317a0f16.log (an If we're not going to prioritize a fix for these tests, they should be skipped to reduce noise when testing unrelated changes. |
An 0.9s delay during all.bash on a slow machine (running many different test binaries at once, which can be causing long rescheduling delays) does not seem out of the ordinary to me. I think this test is too picky. |
These timeout fluctuation tests were added (by me) in https://go.dev/cl/71770. They were directly copied from the net package. Since then, the net package versions of those tests have been updated, by @bcmills, in https://go.dev/cl/365334, for #36108. The first step here might be to copy CL 365334 into the os package. |
(Note that that fix also needed a followup in CL 372215.) |
I will prepare a CL. |
Change https://go.dev/cl/415234 mentions this issue: |
This applies the net package CL 365334, CL 366176, CL 372215 to the os package. CL 365334: These tests were checking for fairly narrow timing windows, but were running in parallel and heavily dependent on timer and goroutine scheduling. This change eliminates unnecessary goroutines, runs the tests sequentially (dramatically shortening the timeouts to reduce the penalty of doing so), and uses timestamp comparison instead of background timers to hopefully gain some robustness from monotonic timestamps. Many of the other tests from this package would benefit from similar simplifications, which we can apply if and when we notice flaky failures or want to improve the latency of running the test. CL 366176: It appears that at least the OpenBSD kernel gets sloppier the longer the timeout we give it, up to an observed overhead of around 25%. Let's give it a little more than that (33%) in the comparison, and also increase the growth curve to match the actual observed times instead of exponential initial growth. CL 372215: Decrease the slop everywhere else, since NetBSD and OpenBSD seem to be the only ones that miss by that much. For golang#36108 For golang#50189 Fixes golang#50725 (we hope) Change-Id: I0854d27af67ca9fcf0f9d9e4ff67acff4c2effc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/415234 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
greplogs --dashboard -md -l -e '(?ms)FAIL: Test(?:Read|Write)TimeoutFluctuation.*FAIL\s+os\s' --since=2021-01-01
2022-01-18T23:59:40-50869f3/dragonfly-amd64
2021-11-18T06:05:29-f6647f2/freebsd-arm-paulzhol
2021-11-03T00:07:03-d6f7203/dragonfly-amd64
2021-09-24T14:52:47-217507e/netbsd-arm-bsiegert
2021-09-21T20:28:50-2fc7df9/dragonfly-amd64
2021-04-16T01:54:27-f08c552/dragonfly-amd64-5_8
This may or may not be related in some way to #50189.
The most frequent recent failures are on
dragonfly-amd64
(CC @tuxillo). A good first step might be to change the test topanic
instead of callingt.Errorf
ort.Fatalf
: apanic
would at least dump any goroutines that are stuck so that we can see which goroutines and/or system calls are still in flight at the time of failure.The text was updated successfully, but these errors were encountered: