Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: runtime.GC can return without finishing sweep #45315

Open
aclements opened this issue Mar 31, 2021 · 8 comments
Open

runtime: runtime.GC can return without finishing sweep #45315

aclements opened this issue Mar 31, 2021 · 8 comments
Assignees
Milestone

Comments

@aclements
Copy link
Member

@aclements aclements commented Mar 31, 2021

runtime.TestGCTestIsReachable has been flaky since I added it. The failures look like:

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
runtime: object 0 was not swept
fatal error: IsReachable failed

goroutine 176934 [running]:
runtime.throw(0x688d74, 0x12)
	/workdir/go/src/runtime/panic.go:1126 +0x74 fp=0xc000276628 sp=0xc0002765f8 pc=0x43e334
runtime.gcTestIsReachable(0xc00108e200, 0x10, 0x10, 0x4)
	/workdir/go/src/runtime/mgc.go:2397 +0x2ad fp=0xc000276690 sp=0xc000276628 pc=0x423a0d
runtime.GCTestIsReachable(...)
	/workdir/go/src/runtime/export_test.go:1256
runtime_test.TestGCTestIsReachable(0xc00090d380)
	/workdir/go/src/runtime/gc_test.go:249 +0x25c fp=0xc000276780 sp=0xc000276690 pc=0x58a8fc
testing.tRunner(0xc00090d380, 0x69c980)
	/workdir/go/src/testing/testing.go:1235 +0x102 fp=0xc0002767d0 sp=0xc000276780 pc=0x4eee02
runtime.goexit()
	/workdir/go/src/runtime/asm_amd64.s:1454 +0x1 fp=0xc0002767d8 sp=0xc0002767d0 pc=0x47b861
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:1280 +0x3ab

Some examples:

https://build.golang.org/log/6664ed5c503173b3f29eb85cb093baa61fb04838
https://build.golang.org/log/5608852a242f93bb2f36b6a56bebdefda3797e82
https://build.golang.org/log/231e6d68ccc2b858f8b002b45596356d8beb856c

I'm sure this has something to do with GC scheduling.

I haven't had any luck reproducing this locally. I'm now trying on the gomotes.

gopool create -setup 'gomote push $VM && gomote run $VM go/src/make.bash && gomote run $VM go/bin/go test -c -o ../src/runtime/runtime.test runtime' linux-amd64-sid
stress2 -p 5 gopool run 'gomote run $VM go/src/runtime/runtime.test -test.short -test.cpu=1,2,4'
@aclements aclements added this to the Go1.17 milestone Mar 31, 2021
@aclements aclements self-assigned this Mar 31, 2021
@aclements
Copy link
Member Author

@aclements aclements commented Apr 1, 2021

I've added some debuglogs and am running with

gopool create -setup 'gomote push $VM && gomote run $VM go/src/make.bash && gomote run $VM go/bin/go test -c -o ../src/runtime/runtime.test -tags debuglog runtime' linux-amd64-sid
stress2 -p 5 -timeout 20m gopool run 'gomote run $VM go/src/runtime/runtime.test -quick -test.cpu=1,2,4'

Here's a debug log from a test failure:

[405.756910307 P 0] GC STW
[405.756935357 P 0] GC cycle 29670
[405.758808137 P 1] mark term release gcsema
[405.760135621 P 1] gcTest:semacquire
[405.760137842 P 1] gcTest:addSpecial  0xc00028c000
[405.760139981 P 1] gcTest:addSpecial  0xc00028c028
[405.760140720 P 1] gcTest:addSpecial  0xc00028c030
[405.760141367 P 1] gcTest:addSpecial  0xc00028c038
[405.760141901 P 1] gcTest:addSpecial  0xc00028c040
[405.760142377 P 1] gcTest:addSpecial  0xc00028c048
[405.760142987 P 1] gcTest:addSpecial  0xc00028c050
[405.760143558 P 1] gcTest:addSpecial  0xc00028c058
[405.760144140 P 1] gcTest:addSpecial  0xc00028c060
[405.760144730 P 1] gcTest:addSpecial  0xc00028c068
[405.760145278 P 1] gcTest:addSpecial  0xc00028c070
[405.760145665 P 1] gcTest:addSpecial  0xc00028c078
[405.760146214 P 1] gcTest:addSpecial  0xc00028c080
[405.760146700 P 1] gcTest:addSpecial  0xc00028c088
[405.760147205 P 1] gcTest:addSpecial  0xc00028c090
[405.760147760 P 1] gcTest:addSpecial  0xc00028c098
[405.760148314 P 1] gcTest:semrelease
[405.760148930 P 1] GC n= 29670
[405.760149453 P 1] GC:gcStart  29671
[405.760150119 P 1] acquire gcsema
[405.760387061 P 1] GC STW
[405.760405580 P 1] GC cycle 29671
[405.760424513 P 1] GC:gcWaitOnMark  29671
[405.762090357 P 0] GC:sweepone  29671
[405.762322549 P 0] gcTest:check
[405.762383414 P 1] freeSpecial 0xc00028c000 29671
[405.762384649 P 1] freeSpecial 0xc00028c028 29671
[405.762385050 P 1] freeSpecial 0xc00028c030 29671
[405.762385481 P 1] freeSpecial 0xc00028c038 29671
[405.762385777 P 1] freeSpecial 0xc00028c040 29671
[405.762386091 P 1] freeSpecial 0xc00028c048 29671
[405.762386348 P 1] freeSpecial 0xc00028c050 29671
[405.762386697 P 1] freeSpecial 0xc00028c058 29671
[405.762387014 P 1] freeSpecial 0xc00028c060 29671
[405.762387346 P 1] freeSpecial 0xc00028c068 29671
[405.762387692 P 1] freeSpecial 0xc00028c070 29671
[405.762387966 P 1] freeSpecial 0xc00028c078 29671
[405.762388174 P 1] freeSpecial 0xc00028c080 29671
[405.762388718 P 1] freeSpecial 0xc00028c088 29671
[405.762389014 P 1] freeSpecial 0xc00028c090 29671
[405.762389380 P 1] freeSpecial 0xc00028c098 29671

So the test is triggering a GC that ultimately frees the objects, but runtime.GC() is returning before the GC cycle is complete.

@aclements
Copy link
Member Author

@aclements aclements commented Apr 1, 2021

Some progress. runtime.GC() is successfully waiting for the mark to complete, but returning before the sweep is complete. We could work around this in IsReachable, but this is actually a bug inruntime.GC().

@aclements
Copy link
Member Author

@aclements aclements commented Apr 1, 2021

I think I understand what's going on. Sweeping tracks two different types of completion: "we've swept or started sweeping all spans" (confusingly called mheap_.sweepdone) and "all spans have been swept". GC correctness depends on mheap_.sweepdone and we get that one right, but runtime.GC() and a few other minor things depend on the latter condition, and we get that wrong.

For "all spans have been swept", we track that we've exited all calls to sweepone and there are no more spans that haven't started sweeping, but that's not the only way that spans get swept. Hence, the final sweepone could exit with no more spans in the queue and declare that all spans have been swept, but there could still be concurrent sweeps from other sources working on the last few spans. In TestGCTestIsReachable, the result is that runtime.GC() returns while the span with all the test's objects is actively being swept.

@aclements aclements changed the title runtime: TestGCTestIsReachable flakes runtime: runtime.GC can return without finishing sweep Apr 1, 2021
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 6, 2021

Do you expect to have a fix soon, or can the test be disabled until it is fixed?
I'm concerned that the failure rate on the builders could mask regressions that lead to other low-frequency failures.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 6, 2021

Change https://golang.org/cl/307809 mentions this issue: runtime: deflake TestGCTestIsReachable

@aclements
Copy link
Member Author

@aclements aclements commented Apr 6, 2021

I just committed a workaround in TestGCTestIsReachable that should stop the test flakes.

gopherbot pushed a commit that referenced this issue Apr 6, 2021
This is a simple workaround for a bug where runtime.GC() can return
before finishing a full sweep, causing gcTestIsReachable to throw. The
right thing is to fix runtime.GC(), but this should get this test
passing reliably in the meantime.

Updates #45315.

Change-Id: Iae141e6dbb26a9c2649497c1feedd4aaeaf540c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/307809
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2021

Change https://golang.org/cl/307916 mentions this issue: runtime: consolidate "is sweep done" conditions

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2021

Change https://golang.org/cl/307915 mentions this issue: runtime: block sweep completion on all sweep paths

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants