
runtime: freebsd/386 flaky TestCgoSignalDeadlock #18598

Closed
broady opened this issue Jan 10, 2017 · 12 comments

@broady (Member) commented Jan 10, 2017

While preparing go1.8rc1:

--- FAIL: TestCgoSignalDeadlock (1.91s)
        crash_cgo_test.go:34: expected "OK\n", but got:
                HANG
FAIL
FAIL    runtime 46.016s

/cc @ianlancetaylor

@broady broady added the OS-FreeBSD label Jan 10, 2017

@broady (Member Author) commented Jan 10, 2017

Another one:

--- FAIL: TestStackGrowth (43.16s)
        stack_test.go:114: finalizer did not run
FAIL
FAIL    runtime 57.182s
@bradfitz (Member) commented Mar 3, 2017

Let's keep this bug about TestCgoSignalDeadlock.

I filed #19381 for the TestStackGrowth bug.

@bradfitz bradfitz added the Testing label Mar 3, 2017


@ianlancetaylor (Contributor) commented Jun 25, 2017

I haven't yet been able to recreate this using gomote with freebsd-386-110. When I run the test using gomote, it takes consistently less than 0.2s. Looking at the times from the failures listed above, I see these times:

279.16
382.73
1.92 (another complicated runtime package failure)
182.60
151.64
147.00
251.81
229.31
84.96
2.06 (only 386 failure, the others are ARM)

So it looks like, at least on ARM, something is making the test take much longer than expected. There is no FreeBSD ARM gomote. Since the test has a timeout, anything that slows it down substantially is expected to make it fail.

Since the test calls t.Parallel, it is at least possible that other testing work running concurrently is what causes the test to run much slower.

@gopherbot commented Jun 26, 2017

CL https://golang.org/cl/46723 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 27, 2017

runtime: get more info for TestCgoSignalDeadlock failures
Updates #18598

Change-Id: I13c60124714cf9d1537efa0a7dd1e6a0fed9ae5b
Reviewed-on: https://go-review.googlesource.com/46723
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@ianlancetaylor (Contributor) commented Jul 12, 2017

Thanks. It looks like while the test is running the system is getting steadily more and more overloaded, and the test blows through the timeout. I note that the test calls t.Parallel. I would guess that some other t.Parallel test is using too many resources.

Something I didn't notice before is that every failure on ARM is in the GOMAXPROCS=2 runtime -cpu=1,2,4 section, likely meaning that the failure occurs when GOMAXPROCS is set to a value larger than the number of hardware threads.

It would be nice if we could figure out a way to print what other tests are running when this one fails.

@aclements (Member) commented Jul 12, 2017

I recently dropped t.Parallel() from TestStackGrowth because it also had a built-in timeout that made it sensitive to load. Perhaps in general tests with timeouts should not be parallel.

> Something I didn't notice before is that every failure on ARM is in the GOMAXPROCS=2 runtime -cpu=1,2,4 section

Note that CgoSignalDeadlock itself sets GOMAXPROCS to 100. But maybe the rest of the system is just more loaded in this test section.

@gopherbot commented Jul 13, 2017

CL https://golang.org/cl/48233 mentions this issue.

gopherbot pushed a commit that referenced this issue Jul 13, 2017

runtime: don't call t.Parallel in TestCgoSignalDeadlock
It seems that when too much other code is running on the system,
the testprogcgo code can overrun its timeouts.

Updates #18598.

Not marking the issue as fixed until it doesn't recur for some time.

Change-Id: Ieaf106b41986fdda76b1d027bb9d5e3fb805cc3b
Reviewed-on: https://go-review.googlesource.com/48233
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@paulzhol (Member) commented Jul 13, 2017

I'm running the freebsd-arm-paulzhol builder with the following parameters:

        GOARM=7 CGO_ENABLED=1 GO_TEST_TIMEOUT_SCALE=16 $HOME/bin/builder -subrepos=false -v -buildTimeout 2h freebsd-arm-paulzhol

It is also doing automatic reboots after each build with some scripts.

It's an ARM Cortex-A7 (Allwinner A20) with only 1 GB of RAM, so it's using a swap partition on a magnetic disk. Could that be what is making the tests extra slow?

I can provide ssh access if it would help (or I can help debug).

@broady broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.10 Nov 22, 2017

@ianlancetaylor (Contributor) commented Jan 3, 2018

This has not failed since June 29; before then it was failing every few days. I think that removing t.Parallel worked around the problem. Closing.

@golang golang locked and limited conversation to collaborators Jan 3, 2019
