sync: TestWaitGroupMisuse2 takes 45-90 seconds on netbsd, AIX #22944

Open
bradfitz opened this issue Nov 30, 2017 · 11 comments

Comments

@bradfitz
Member

commented Nov 30, 2017

TestWaitGroupMisuse2 on Linux with 8 cores takes 0.22s.

On NetBSD it does pass but takes 45-90 seconds. Why?

/cc @bsiegert

@bradfitz

Member Author

commented Nov 30, 2017

/cc @dvyukov for any theories.

@dvyukov

Member

commented Dec 1, 2017

This test requires physical parallelism, so my first bet would be on a problem in the NetBSD scheduler.

@dvyukov

Member

commented Dec 1, 2017

Perhaps an execution trace will shed some light.

@coypoop

Contributor

commented Dec 6, 2017

how can I run this specific test?

@bsiegert

Contributor

commented Dec 6, 2017

go test sync should work.

@coypoop

Contributor

commented Dec 6, 2017

NetBSD's nanosleep always schedules another process, even for really short sleeps. Making it spin on really short sleeps fixes this.

@jdolecek-zz


commented Dec 9, 2017

NetBSD nanosleep() always sleeps for at least one scheduler time slice when the specified time is below the scheduler resolution. With the default HZ value of 100 on i386 and amd64, the sleep is always at least 10ms, even when the specified time is smaller. I think other systems might work in a similar way.

@bradfitz

Member Author

commented Dec 9, 2017

@coypoop, or:

$ go test -v -run=TestWaitGroupMisuse2 sync

Or:

$ go test -v -run=TestWaitGroupMisuse2 -count=20 sync

@bradfitz bradfitz changed the title from "sync: TestWaitGroupMisuse2 takes 45-90 seconds on netbsd w/ 8 cores" to "sync: TestWaitGroupMisuse2 takes 45-90 seconds on netbsd, AIX" on May 29, 2019
@bradfitz bradfitz modified the milestones: Unplanned, Go1.13 May 29, 2019
@bradfitz

Member Author

commented May 29, 2019

It takes 45 seconds on AIX too, and never panics as it expects to:

https://build.golang.org/log/4521fa0230a42f6f3b70e8f108c3ab63d962d567

/cc @Helflym

@Helflym

Contributor

commented Jun 5, 2019

I was already aware of this failure, but I didn't find anything relevant.
As the test says, "The detection is opportunistic", and in some cases I don't get a panic until iteration 500000. So a few runs might not trigger it at all, but it's random.
Note that this also explains the slowness of the test. On a local Linux machine, almost all the panics occur within the first 1000 iterations. On the AIX builder it's far more random: it can happen at the 10th iteration as easily as at the 100000th.

Edit: as with NetBSD, it might be related to the AIX scheduler:

The suspension time may be longer than requested due to the scheduling of other activity by the system.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@bcmills

Member

commented Oct 4, 2019

TestWaitGroupMisuse2 has mostly been passing on AIX as far as I can tell, but the failure mode from #22944 (comment) cropped up again in one aix-ppc64 builder run today (https://build.golang.org/log/a38f4c89dcb8e8b62b37ccefeae0de03dfcfecb5):

--- FAIL: TestWaitGroupMisuse2 (44.26s)
    waitgroup_test.go:133: Should panic
    waitgroup_test.go:96: Unexpected panic: <nil>
FAIL
FAIL	sync	44.880s
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
9 participants