
sync: Pool tests flaky on arm builders #31422

Closed
bcmills opened this issue Apr 11, 2019 · 9 comments

@bcmills (Member) commented Apr 11, 2019

Possibly related to #24640.

Samples:
https://build.golang.org/log/10c155a9635967f5b3006b6a04b6d5442ff9713a:

--- FAIL: TestPoolDequeue (0.00s)
    pool_test.go:239: popHead never succeeded
FAIL
FAIL	sync	0.864s

https://build.golang.org/log/3fbb17b4083eca9629c97f7b71879c804ecf5d0d and
https://build.golang.org/log/9f98720ace008f8c98f74c0d14049cb67b3c56f5:

##### sync -cpu=10
--- FAIL: TestPoolChain (0.00s)
    pool_test.go:239: popHead never succeeded
FAIL
FAIL	sync	0.864s

@bcmills added this to the Go1.13 milestone on Apr 11, 2019

@ianlancetaylor (Contributor) commented Apr 11, 2019

@aclements (Member) commented Apr 16, 2019

I just got this once in 1,045 runs of all.bash on my linux/amd64 workstation.

--- FAIL: TestPoolChain (0.00s)
    pool_test.go:239: popHead never succeeded
FAIL
FAIL    sync    0.827s

This is certainly a theoretically possible failure, but when I wrote this test I thought the chance of hitting the bad schedule was infinitesimal. Maybe there's a more likely schedule that can cause this.

@josharian (Contributor) commented May 14, 2019

Lots of instances of this on arm and arm64 builders:

$ greplogs -dashboard -E popHead -l
2019-04-29T15:23:10-db1514c/linux-arm64-packet
2019-04-29T21:26:07-d5014ec/linux-arm64-packet
2019-04-29T22:17:05-ccbc9a3/linux-arm64-packet
2019-04-30T15:48:46-4ad1355/netbsd-arm-bsiegert
2019-04-30T16:59:13-f686a28/netbsd-arm-bsiegert
2019-04-30T18:40:06-62ddf7d/linux-arm64-packet
2019-04-30T19:13:43-8e4f1a7/linux-arm64-packet
2019-04-30T20:26:36-85387aa/netbsd-arm-bsiegert
2019-05-01T14:59:51-ab5cee5/netbsd-arm-bsiegert
2019-05-01T16:10:05-e56c73f/netbsd-arm-bsiegert
2019-05-01T16:53:19-f0c383b/netbsd-arm-bsiegert
2019-05-01T16:55:33-07f6894/netbsd-arm-bsiegert
2019-05-01T21:14:28-aaf40f8/netbsd-arm-bsiegert
2019-05-01T22:22:41-e5f0d14/netbsd-arm-bsiegert
2019-05-02T14:04:56-2316784/netbsd-arm-bsiegert
2019-05-02T14:44:05-19f5c23/netbsd-arm-bsiegert
2019-05-02T22:17:31-fe83731/netbsd-arm-bsiegert
2019-05-03T15:17:54-5e404b3/netbsd-arm-bsiegert
2019-05-03T15:20:15-f5c43b9/netbsd-arm-bsiegert
2019-05-03T15:20:41-2c67cdf/linux-arm64-packet
2019-05-03T18:42:04-7fcba81/linux-arm64-packet
2019-05-06T17:06:16-5003b62/netbsd-arm-bsiegert
2019-05-06T18:17:03-cc5eaf9/linux-arm64-packet
2019-05-06T20:09:58-e1f9e70/netbsd-arm-bsiegert
2019-05-06T20:57:39-a62b572/netbsd-arm-bsiegert
2019-05-06T20:59:20-f4a5ae5/netbsd-arm-bsiegert
2019-05-06T21:14:52-5c15ed6/linux-arm
2019-05-06T21:23:29-04845fe/linux-arm64-packet
2019-05-06T21:23:29-04845fe/netbsd-arm-bsiegert
2019-05-06T23:02:29-6b1ac82/netbsd-arm-bsiegert
2019-05-06T23:23:45-53374e7/linux-arm64-packet
2019-05-06T23:23:45-53374e7/netbsd-arm-bsiegert
2019-05-07T12:48:04-a88cb1d/netbsd-arm-bsiegert
2019-05-07T16:59:51-8280455/linux-arm64-packet
2019-05-07T16:59:51-8280455/netbsd-arm-bsiegert
2019-05-08T16:00:05-4cd6c3b/linux-arm64-packet
2019-05-08T16:00:05-4cd6c3b/netbsd-arm-bsiegert
2019-05-08T16:55:59-2625fef/netbsd-arm-bsiegert
2019-05-08T17:11:57-5a2da56/netbsd-arm-bsiegert
2019-05-09T00:02:34-f766b68/netbsd-arm-bsiegert
2019-05-09T16:10:22-d56199d/linux-arm64-packet
2019-05-09T17:11:16-a44c3ed/linux-arm64-packet
2019-05-09T17:49:12-50a1d89/netbsd-arm-bsiegert
2019-05-09T21:13:18-6ed2ec4/netbsd-arm-bsiegert
2019-05-09T21:13:21-1ea7644/netbsd-arm-bsiegert
2019-05-09T21:13:39-13723d4/netbsd-arm-bsiegert
2019-05-09T21:13:56-a4f5c9c/netbsd-arm-bsiegert
2019-05-10T00:14:40-4ae31dc/netbsd-arm-bsiegert
2019-05-10T14:24:43-2aa8971/netbsd-arm-bsiegert
2019-05-11T03:02:33-ce5ae2f/netbsd-arm-bsiegert
2019-05-11T23:19:40-0926701/netbsd-arm-bsiegert

@bcmills changed the title from "sync: Pool tests flaky on linux-arm64-packet builder" to "sync: Pool tests flaky on arm builders" on May 14, 2019

@dianhong01 commented Jun 3, 2019

When I ran the sync Pool tests as below about 2,000 times on an arm64 device, they all passed:
../golang/bin/go test sync -cpu=10 -c -o s1
./s1
But when I ran them like this:
../golang/bin/go test sync -cpu=10 -c -o s2
./s2 -test.short
1,521 runs passed and 1,378 failed.

When all.bash runs, the '-test.short' flag is set, which keeps the build fast. In that mode, '-test.short' controls the value of N. As the comment in the code says, "In theory it's possible in a valid schedule for popHead to never succeed", so I guess N may be too small for the test to pass reliably.

func testPoolDequeue(t *testing.T, d PoolDequeue) {
	const P = 10
	// In long mode, do enough pushes to wrap around the 21-bit
	// indexes.
	N := 1<<21 + 1000
	if testing.Short() {
		N = 1e3
	}
	...

@bcmills (Member, Author) commented Jun 26, 2019

@aclements, is this still on the radar for 1.13? Is this more likely a bug in the test, or in the Pool implementation?

@aclements (Member) commented Jun 26, 2019

Given that the long test doesn't flake, this is almost certainly a bug in the test. In the short test, there are only 100 expected PopHeads. On my linux/amd64 laptop, in 1000 runs, it gets as low as 50 successful PopHeads, but that seems to be a hard floor. It does give me pause that the failure rate is that high, since I would expect these schedules to be quite rare.
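As a rough check on the "100 expected PopHeads" figure, the sketch below counts PopHead attempts for both values of N in the test excerpt quoted earlier. It assumes (this thread does not confirm it) that the producer tries PopHead once every 10 pushes, which is what 1e3 pushes and 100 expected PopHeads imply. The helper name `popHeadAttempts` is hypothetical.

```go
package main

import "fmt"

// popHeadAttempts counts the indexes j in [0, n) with j%every == 0, i.e. how
// many PopHead attempts a producer makes if it tries once every `every`
// pushes. The every=10 cadence is an assumption inferred from the thread.
func popHeadAttempts(n, every int) int {
	if n <= 0 {
		return 0
	}
	return (n-1)/every + 1
}

func main() {
	shortN := 1000       // N = 1e3 in short mode
	longN := 1<<21 + 1000 // N in long mode
	fmt.Println(popHeadAttempts(shortN, 10)) // 100
	fmt.Println(popHeadAttempts(longN, 10))  // 209816
}
```

Under that assumption, long mode gives PopHead roughly 2,000 times as many chances to win at least once, which is consistent with only the short test flaking.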

@aclements (Member) commented Jun 26, 2019

I added some logging. It looks like the time between the PushHead committing and the PopHead committing is just long enough that the racing PopTail loop can regularly succeed and drain the queue.

This means it's just the test. I'm not sure why it's so flaky on arm64 specifically, but it may be that that window is just larger because of architectural details. I'm still thinking about how to make the test less flaky. We could of course just add retries, but it would be nice to do something better.
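The retry option mentioned here could look roughly like the following. `passesWithRetries` is a hypothetical helper, and `trial` stands for one run of the dequeue test that returns its nPopHead count.

```go
package main

import "fmt"

// passesWithRetries reruns trial up to maxAttempts times and reports whether
// any run saw at least one successful PopHead, plus how many runs it took.
func passesWithRetries(maxAttempts int, trial func() int) (ok bool, attempts int) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		if trial() > 0 {
			return true, attempts
		}
	}
	return false, maxAttempts
}

func main() {
	// A fake trial that reports zero PopHeads twice before succeeding,
	// standing in for a real run of the dequeue test.
	calls := 0
	flaky := func() int {
		calls++
		if calls < 3 {
			return 0
		}
		return 7
	}
	ok, attempts := passesWithRetries(5, flaky)
	fmt.Println(ok, attempts) // true 3
}
```

If runs fail independently with probability p, k attempts only lower the flake rate to p^k rather than removing it, which may be why retries were not the preferred fix.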

@aclements (Member) commented Jun 26, 2019

Or we just remove the nPopHead check.

@gopherbot commented Jun 26, 2019

Change https://golang.org/cl/183981 mentions this issue: sync: only check for successful PopHeads in long mode
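Judging from the CL title, the fix keeps the nPopHead check only in long mode. Below is a minimal sketch of that condition, written as a pure function so it runs outside `go test`; `popHeadCheckFails` is a hypothetical name, and inside the test itself the condition would presumably read `!testing.Short() && nPopHead == 0`. This is not the actual diff from CL 183981.

```go
package main

import "fmt"

// popHeadCheckFails reports whether a "popHead never succeeded" check that
// is gated on long mode should fail. In a real test, short would come from
// testing.Short().
func popHeadCheckFails(short bool, nPopHead int) bool {
	return !short && nPopHead == 0
}

func main() {
	fmt.Println(popHeadCheckFails(true, 0))   // false: short mode skips the check
	fmt.Println(popHeadCheckFails(false, 0))  // true: long mode still reports it
	fmt.Println(popHeadCheckFails(false, 42)) // false
}
```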

@gopherbot closed this in 9caaac2 on Jun 26, 2019

7 participants