New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: re-enable TestCgoCallbackGC flaky on FreeBSD #16396

Closed
bradfitz opened this Issue Jul 17, 2016 · 29 comments

Comments

Projects
None yet
9 participants
@bradfitz
Member

bradfitz commented Jul 17, 2016

I see a number of recent TestCgoCallbackGC flakes on FreeBSD

amd64:
https://build.golang.org/log/c47d9b4fe69df9c91af66122e7633f77cb623794
https://build.golang.org/log/2c93bb5f172cf8082a80bd567628bcea4f437c20
https://build.golang.org/log/92ad760a01e814bb0021486856b4d3123a4a11b7

386:
https://build.golang.org/log/47a3e8462838b94a69459a3e56de5ea852b5e86e
https://build.golang.org/log/d6604ec508fd4166c46bd6fb3e04a4b0fb945959
https://build.golang.org/log/85e6aab2ff5692ee7372f03dd187fc555f8a685c

Most are:

--- FAIL: TestCgoCallbackGC (93.64s)
    crash_test.go:105: testprogcgo CgoCallbackGC exit status: signal: quit (core dumped)
    crash_cgo_test.go:61: expected "OK\n", but got:
FAIL
FAIL    runtime 147.968s

Some are:

--- FAIL: TestCgoCallbackGC (102.14s)
    crash_test.go:105: testprogcgo CgoCallbackGC exit status: signal: quit (core dumped)
    crash_cgo_test.go:61: expected "OK\n", but got:
panic: test timed out after 5m0s

goroutine 232809 [running]:
panic(0x81f2c80, 0x39220b20)
    /tmp/workdir/go/src/runtime/panic.go:500 +0x334 fp=0x392b1fa0 sp=0x392b1f5c
testing.startAlarm.func1()
    /tmp/workdir/go/src/testing/testing.go:918 +0xef fp=0x392b1fe0 sp=0x392b1fa0
runtime.goexit()
    /tmp/workdir/go/src/runtime/asm_386.s:1612 +0x1 fp=0x392b1fe4 sp=0x392b1fe0
created by time.goFunc
    /tmp/workdir/go/src/time/sleep.go:154 +0x30

goroutine 1 [chan receive]:
runtime.gopark(0x8246704, 0x39632d30, 0x8224051, 0xc, 0x17, 0x3)
    /tmp/workdir/go/src/runtime/proc.go:259 +0x12d fp=0x39233cd4 sp=0x39233cc0
runtime.goparkunlock(0x39632d30, 0x8224051, 0xc, 0x17, 0x3)
    /tmp/workdir/go/src/runtime/proc.go:265 +0x45 fp=0x39233cf0 sp=0x39233cd4
...

Why did these start recently?

@bradfitz bradfitz added this to the Go1.7Maybe milestone Jul 17, 2016

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 17, 2016

I think these are other versions of #15658 and #16136. There is something wrong on FreeBSD but I don't understand what it is and I have not been able to recreate it myself.

If anybody has a FreeBSD machine I can ssh into that would help.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jul 17, 2016

Our FreeBSD builders run on GCE. I can create you an instance there.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 18, 2016

Our FreeBSD builders run on GCE. I can create you an instance there.

That would help if you have time. Thanks.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jul 18, 2016

@ianlancetaylor, done. Emailed you details.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 19, 2016

I think I was incorrect in thinking that this was related to the other FreeBSD problems. The other FreeBSD problems are less frequent, and I have not been able to recreate them at all.

I now think this is the same as issue #13926: a problem that arises when there are many simultaneous callbacks from C to Go, which is exactly what the CgoCallbackGC test case does. The problem is that on FreeBSD you can surprisingly large delays during the callback. My theory is that when enough of those delays occur, the test times out. That is consistent with the failures reported on the builder, which are in all cases the program crashing from SIGQUIT; SIGQUIT is sent by the runtime test harness when the test program runs for more than 1 minute (runTestProg in runtime/crash_test.go).

Working on a patch.

@gopherbot

This comment has been minimized.

gopherbot commented Jul 19, 2016

CL https://golang.org/cl/25047 mentions this issue.

@gopherbot gopherbot closed this in 50048a4 Jul 20, 2016

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 21, 2016

TestCgoCallbackGC is still failing on FreeBSD on the builders. The change did not fix it, at least not completely.

@adg

This comment has been minimized.

Contributor

adg commented Jul 25, 2016

We may just have to punt on this for Go 1.7, which should be okay since IIUC this is not a regression in Go 1.7. (Or is it?)

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 26, 2016

To the best of my knowledge this is not a regression in 1.7.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jul 27, 2016

@adg, @broady, should we at least document this as a known issue in the release notes for Go 1.7? And then punt this to Go 1.8 for actually fixing.

It'd be a pity to ship something with known crashes for a "first class port". I'd rather warn FreeBSD users about this bug so they don't deploy something with cgo callbacks in production and see crashes.

@adg

This comment has been minimized.

Contributor

adg commented Jul 27, 2016

Agree, sounds like the right move.

On 27 July 2016 at 15:46, Brad Fitzpatrick notifications@github.com wrote:

@adg https://github.com/adg, @broady https://github.com/broady,
should we at least document this as a known issue in the release notes for
Go 1.7? And then punt this to Go 1.8 for actually fixing.

It'd be a pity to ship something with known crashes for a "first class
port". I'd rather warn FreeBSD users about this bug so they don't deploy
something with cgo callbacks in production and see crashes.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16396 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIDilbbWTN44lyUFsJ__eMiYKdLcEpZgks5qZvCsgaJpZM4JOHyK
.

@broady

This comment has been minimized.

Member

broady commented Jul 27, 2016

Maybe, but if it's not a regression then the release notes isn't really the place for this.

We could say something like "Reduced runtime crashes on FreeBSD, but the issue is still present. See Issue 16396."

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jul 27, 2016

I don't think it matters whether it's a regression. Had I known about this in Go 1.6 I would have also argued we need a dedicated "Known Issues" section in our release notes, especially for first class ports.

Or: we demote FreeBSD to no longer be a first-class port.

@broady

This comment has been minimized.

Member

broady commented Jul 27, 2016

I suppose it wasn't known at the time, so we couldn't document as a known issue. SGTM.

Perhaps we should also amend the 1.6 release notes (with a date for said addendum).

@gopherbot

This comment has been minimized.

gopherbot commented Jul 27, 2016

CL https://golang.org/cl/25283 mentions this issue.

@bradfitz bradfitz modified the milestones: Go1.8, Go1.7Maybe Jul 28, 2016

gopherbot pushed a commit that referenced this issue Jul 28, 2016

doc/go1.7.html: add known issues section for FreeBSD crashes
Updates #16396

Change-Id: I7b4f85610e66f2c77c17cf8898cc41d81b2efc8c
Reviewed-on: https://go-review.googlesource.com/25283
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
@gopherbot

This comment has been minimized.

gopherbot commented Aug 18, 2016

CL https://golang.org/cl/27313 mentions this issue.

gopherbot pushed a commit that referenced this issue Aug 18, 2016

runtime: disable TestCgoCallbackGC on FreeBSD
The trybot flakes are a nuisance.

Updates #16396

Change-Id: I8202adb554391676ba82bca44d784c6a81bf2085
Reviewed-on: https://go-review.googlesource.com/27313
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016

@stevenh

This comment has been minimized.

Contributor

stevenh commented Dec 1, 2016

Does anyone have any tips on reproducing this?
I'm currently running with the following and haven't seen the test take over 7 seconds.

setenv GOMAXPROCS 2
runtime> sh -c 'while true; do go test -run=TestCgoCallbackGC -cpu=1,2,4; sleep 1; done'

Test machine is dual E5-2640 @ 2.50GHz running FreeBSD 11.0-RELEASE-p3 go version go1.7.3 freebsd/amd64

@stevenh

This comment has been minimized.

Contributor

stevenh commented Dec 1, 2016

Also testing this on 10.1 (metal) and 10.2 (gce VM) currently nothing longer than 8 seconds, tests are still running.

@stevenh

This comment has been minimized.

Contributor

stevenh commented Dec 2, 2016

Test has been running in a loop on three different machines for over 28 hours not a single failure so far.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 2, 2016

@stevenh Which version of FreeBSD? It's possible that changes to the kernel scheduler have eliminated the problem. As it says in https://golang.org/cl/25047 the problem was much worse on FreeBSD than on GNU/Linux for some reason.

Or of course it's possible that something we've changed in the runtime has fixed the problem, though I don't know what that could be.

@stevenh

This comment has been minimized.

Contributor

stevenh commented Dec 3, 2016

I tested on 10.1 (metal), 10.2 (GCE) and 11 (metal) all under golang 1.7.3.

With all machines clearing well over 24 hours (coming up to 32 now) on tight loop running this test I think it may be good to remove the 604efe1 for 1.8 to see if we can confirm it is indeed no longer an issue.

@stevenh

This comment has been minimized.

Contributor

stevenh commented Dec 5, 2016

All three machines have now been running the test in loop for over 3 days no errors.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 5, 2016

Thanks for trying it so thoroughly. Sounds like this issue can be closed.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Dec 5, 2016

@ianlancetaylor, did you mean to close this? It sounds like we need to at least remove the t.Skip for this test on FreeBSD.

@bradfitz bradfitz reopened this Dec 5, 2016

@bradfitz bradfitz changed the title from runtime: TestCgoCallbackGC flaky on FreeBSD to runtime: re-enable TestCgoCallbackGC flaky on FreeBSD Dec 5, 2016

@gopherbot

This comment has been minimized.

gopherbot commented Dec 5, 2016

CL https://golang.org/cl/33903 mentions this issue.

@gopherbot gopherbot closed this in 8c31f40 Dec 5, 2016

@gopherbot

This comment has been minimized.

gopherbot commented Dec 6, 2016

CL https://golang.org/cl/33926 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 6, 2016

doc/go1.8: remove mention of #16396
Change-Id: I811e76c9f42505e974bea634d4ded2499e4893db
Reviewed-on: https://go-review.googlesource.com/33926
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@paulzhol

This comment has been minimized.

Member

paulzhol commented Jan 27, 2017

Hello,
I've starting running builds with CGO_ENABLED=1 on freebsd-arm-paulzhol and it fails consistently:

ok  	regexp/syntax	4.107s
--- FAIL: TestCgoCallbackGC (180.11s)
	crash_test.go:105: testprogcgo CgoCallbackGC exit status: exit status 2
	crash_cgo_test.go:61: expected "OK\n", but got:
		SIGQUIT: quit
		PC=0x20263cd0 m=0
		signal arrived during cgo execution
		
		goroutine 166 [syscall, locked to thread]:
		runtime.cgocall(0xc0c78, 0x30a7d7c4, 0x0)
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/cgocall.go:131 +0x130 fp=0x30a7d7ac sp=0x30a7d790
		main._Cfunc_foo()
			_/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/_obj/_cgo_gotypes.go:176 +0x38 fp=0x30a7d7c0 sp=0x30a7d7ac
		main.CgoCallbackGC.func2(0x30a5a040)
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/callback.go:81 +0x14 fp=0x30a7d7d4 sp=0x30a7d7c0
		runtime.goexit()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/asm_arm.s:998 +0x4 fp=0x30a7d7d4 sp=0x30a7d7d4
		created by main.CgoCallbackGC
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/callback.go:83 +0xd8
		
		goroutine 1 [chan receive]:
		main.CgoCallbackGC()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/callback.go:86 +0x114
		main.main()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/testdata/testprogcgo/main.go:34 +0x1ac
		
		goroutine 17 [runnable, locked to thread]:
		runtime.goexit()
			/tmp/gobuilder/freebsd-arm-paulzhol-6ceba5d5e85b/go/src/runtime/asm_arm.s:998 +0x4

System is an ALLWINNER A20 running FreeBSD 11.0-RELEASE-p2. I've noticed CGO_ENABLED was off by default because the bootstrap 1.4.3 compilers need to be built with CGO_ENABLED=0.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jan 27, 2017

@paulzhol This issue is closed and the problem you are reporting has different symptoms. Please open a new issue with the details. Thanks.

@paulzhol

This comment has been minimized.

Member

paulzhol commented Jan 27, 2017

@ianlancetaylor I've opened #18811 but I do have the following in my log which look like the "Most are:" example in the original issue:

ok      regexp/syntax   4.205s
--- FAIL: TestCgoCallbackGC (204.17s)
        crash_test.go:106: testprogcgo CgoCallbackGC exit status: signal: quit
        crash_cgo_test.go:66: expected "OK\n", but got:
FAIL
FAIL    runtime 427.836s
ok      runtime/debug   0.088s

But the builder failed to push that:

2017/01/27 02:25:42 Failed: exit status 1
Build complete, duration 41m53.837957541s. Result: error: exit status 1
 RunTime:41m53.837957541s Builder:freebsd-arm-paulzhol PackagePath:]
2017/01/27 02:25:45 Post https://build.golang.org/result?builder=freebsd-arm-paulzhol&key=XXXXXXXXXXXX&version=1: dial tcp: lookup build.golang.org on 192.168.XXX.XXX:53: server misbehaving

@golang golang locked and limited conversation to collaborators Jan 27, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.