runtime: TestNetpollBreak failures with "did not interrupt netpoll" on plan9 builders #39437

Closed
bcmills opened this issue Jun 6, 2020 · 4 comments

@bcmills bcmills added this to the Unplanned milestone Jun 6, 2020

@9pi 9pi commented Jun 6, 2020

CL 235820 is just a coincidence. I can provoke the same failure on the release branch:

term% go version
go version go1.14.2 plan9/arm
term% go test -count 1000 -run TestNetpollBreak runtime
--- FAIL: TestNetpollBreak (5.80s)
    proc_test.go:1031: netpollBreak did not interrupt netpoll: slept for: 5.769816333s
FAIL

There's no network poller for Plan 9, just runtime/netpoll_stub.go which presumably exists to pretend to pass the tests. In essence this test is just one goroutine doing a notetsleep for 10 seconds, and another repeatedly doing a notewakeup with a 100 microsecond Usleep each time around the loop. How this can result in a >5 second delay is mysterious to me. I can see it's not swapping.
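The shape of the test as described above can be sketched in ordinary Go. This is only a user-level analogy, not the code in proc_test.go: a buffered channel stands in for the runtime note, `timedWait` stands in for notetsleep, the non-blocking send stands in for notewakeup, and the names `timedWait` and `runBreakTest` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// timedWait blocks until a wakeup arrives on wake or the timeout elapses.
// It reports whether it was woken and how long it waited.
// (Hypothetical stand-in for a timed note sleep.)
func timedWait(wake <-chan struct{}, timeout time.Duration) (bool, time.Duration) {
	start := time.Now()
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-wake:
		return true, time.Since(start)
	case <-timer.C:
		return false, time.Since(start)
	}
}

// runBreakTest starts a waker that posts a wakeup roughly every 100µs,
// then performs one timed wait and returns its result.
func runBreakTest(timeoutMillis int) (bool, time.Duration) {
	wake := make(chan struct{}, 1)
	done := make(chan struct{})
	defer close(done) // stop the waker once the wait has returned

	go func() {
		for {
			select {
			case <-done:
				return
			default:
			}
			select {
			case wake <- struct{}{}: // post a wakeup
			default: // a wakeup is already pending; nothing to do
			}
			time.Sleep(100 * time.Microsecond)
		}
	}()

	return timedWait(wake, time.Duration(timeoutMillis)*time.Millisecond)
}

func main() {
	woken, slept := runBreakTest(10000)
	fmt.Printf("woken=%v slept=%v\n", woken, slept)
}
```

In this sketch the wait should return almost immediately, which is why a sleep of more than 5 seconds in the real test is surprising.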

@millerresearch millerresearch commented Jun 6, 2020

> I can see it's not swapping.

Also, it's not a garbage collection delay: I tried with GOGC=off and still see failures.

@gopherbot gopherbot commented Jun 13, 2020

Change https://golang.org/cl/237698 mentions this issue: runtime: avoid lock starvation in TestNetpollBreak on Plan 9

@millerresearch millerresearch commented Jun 13, 2020

> There's no network poller for Plan 9, just runtime/netpoll_stub.go which presumably exists to pretend to pass the tests.

My presumption was wrong: the stub "implementation" of netpoll is also used (currently on Plan 9 only) to support runtime timers. So the 10 second netpoll calls done by the test are contending with 10 minute netpoll calls which come from the overall go test timeout. The problem is that the runtime.lock which mediates this contention is unfair. When the 10 minute netpoll call is interrupted by netpollBreak it can be restarted and seize the lock before the 10 second netpoll call gets a chance. Repeated enough times, this starves the 10 second call sufficiently to time out the test.

CL 237698 inserts an osyield call to give the two callers a more even chance. It won't guarantee fairness, but a few hours of running the test suggest that it does well enough.
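To illustrate where such a yield sits, here is a rough user-level sketch. It is not the code from CL 237698: `sync.Mutex` stands in for the runtime-internal runtime.lock, `runtime.Gosched` stands in for osyield, and `shortShare` with its 'L' and 'S' labels is hypothetical. `sync.Mutex` also has its own starvation handling, so this shows the placement of the yield rather than reproducing the failure.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// shortShare runs a "long" caller (L) and a "short" caller (S) that each
// take the same mutex `rounds` times. If yield is true, L yields the
// processor after every release so S gets a chance to take the lock.
// It returns how many of the first `rounds` acquisitions went to S.
func shortShare(rounds int, yield bool) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	order := make([]byte, 0, 2*rounds) // acquisition order, guarded by mu

	worker := func(id byte, yieldAfter bool) {
		defer wg.Done()
		for i := 0; i < rounds; i++ {
			mu.Lock()
			order = append(order, id)
			mu.Unlock()
			if yieldAfter {
				runtime.Gosched() // give the other caller a chance before re-locking
			}
		}
	}

	wg.Add(2)
	go worker('L', yield)
	go worker('S', false)
	wg.Wait()

	// Count S's share of the first `rounds` acquisitions.
	share := 0
	for _, id := range order[:rounds] {
		if id == 'S' {
			share++
		}
	}
	return share
}

func main() {
	fmt.Println("S share without yield:", shortShare(1000, false))
	fmt.Println("S share with yield:   ", shortShare(1000, true))
}
```

The numbers vary from run to run and from one scheduler to another. As with the osyield in the CL, the yield improves the odds for the other caller but does not guarantee fairness.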

@gopherbot gopherbot closed this in 9340bd6 Jun 14, 2020