Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: lock ordering problem: gscan, profBlock #66004

Open
rsc opened this issue Feb 28, 2024 · 7 comments
Open

runtime: lock ordering problem: gscan, profBlock #66004

rsc opened this issue Feb 28, 2024 · 7 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Feb 28, 2024

https://build.golang.org/log/b20065c3dd85e387e63949a5dcd73cf49f3ab7a5 shows a staticlockranking failure, reproduced below.

This looks real: gscan is held while profBlock is acquired. But the ranking order seems to say profBlock has to be less than STACKGROW, which has to be less than gscan. So it does not appear that adjusting the policy will fix the problem.

/cc @aclements

0 : gscan 39 0x0
1 : profBlock 36 0xa70520
fatal error: lock ordering problem

runtime stack:
runtime.throw({0x77977b?, 0x0?})
	/workdir/go/src/runtime/panic.go:1021 +0x5c fp=0x7f840108a780 sp=0x7f840108a750 pc=0x43d77c
runtime.checkRanks(0xc0000a8380, 0x434079?, 0xc0000a8380?)
	/workdir/go/src/runtime/lockrank_on.go:162 +0x236 fp=0x7f840108a7e0 sp=0x7f840108a780 pc=0x40f136
runtime.lockWithRank.func1()
	/workdir/go/src/runtime/lockrank_on.go:87 +0x85 fp=0x7f840108a810 sp=0x7f840108a7e0 pc=0x40ed85
runtime.lockWithRank(0x3?, 0x0?)
	/workdir/go/src/runtime/lockrank_on.go:76 +0x85 fp=0x7f840108a848 sp=0x7f840108a810 pc=0x40ecc5
runtime.lock(...)
	/workdir/go/src/runtime/lock_futex.go:52 pc=0x435394
runtime.saveBlockEventStack(0xff7, 0x1, {0xc0001f6280?, 0x80c00041bfff?, 0x2030001?}, 0x3)
	/workdir/go/src/runtime/mprof.go:749 +0x48 fp=0x7f840108a890 sp=0x7f840108a848 pc=0x435388
runtime.(*mLockProfile).store(0xc0001f6278)
	/workdir/go/src/runtime/mprof.go:733 +0xa5 fp=0x7f840108a8f0 sp=0x7f840108a890 pc=0x4352a5
runtime.(*mLockProfile).recordUnlock(0xc0001f6278?, 0x0?)
	/workdir/go/src/runtime/mprof.go:673 +0x47 fp=0x7f840108a908 sp=0x7f840108a8f0 pc=0x434fc7
runtime.unlock2(0x2030000?)
	/workdir/go/src/runtime/lock_futex.go:134 +0x5d fp=0x7f840108a930 sp=0x7f840108a908 pc=0x40e6dd
runtime.unlockWithRank.func1()
	/workdir/go/src/runtime/lockrank_on.go:188 +0xed fp=0x7f840108a988 sp=0x7f840108a930 pc=0x40f36d
runtime.unlockWithRank(0xa16f00?)
	/workdir/go/src/runtime/lockrank_on.go:174 +0x77 fp=0x7f840108a9b8 sp=0x7f840108a988 pc=0x40f257
runtime.unlock(...)
	/workdir/go/src/runtime/lock_futex.go:121 pc=0x42dd54
runtime.(*mheap).allocSpan(0xa16f00, 0x4, 0x3, 0x0)
	/workdir/go/src/runtime/mheap.go:1262 +0x227 fp=0x7f840108aa58 sp=0x7f840108a9b8 pc=0x42dd47
runtime.(*mheap).allocManual(0x40f7a0?, 0xc0000a8380?, 0x2c?)
	/workdir/go/src/runtime/mheap.go:990 +0x19 fp=0x7f840108aa80 sp=0x7f840108aa58 pc=0x42d6d9
runtime.getempty.func1()
	/workdir/go/src/runtime/mgcwork.go:378 +0x2d fp=0x7f840108aab0 sp=0x7f840108aa80 pc=0x42c40d
runtime.getempty()
	/workdir/go/src/runtime/mgcwork.go:377 +0xfc fp=0x7f840108ab10 sp=0x7f840108aab0 pc=0x42c29c
runtime.(*stackScanState).addObject(0x7f840108ac38, 0xc000503fb8, 0x7b6c28)
	/workdir/go/src/runtime/mgcstack.go:274 +0x2f fp=0x7f840108ab38 sp=0x7f840108ab10 pc=0x42976f
runtime.scanframeworker(0x7f840108abd8, 0x7f840108ac38, 0xc000038668)
	/workdir/go/src/runtime/mgcmark.go:1095 +0x1fd fp=0x7f840108ab98 sp=0x7f840108ab38 pc=0x4242fd
runtime.scanstack(0xc0004508c0, 0xc000038668)
	/workdir/go/src/runtime/mgcmark.go:898 +0x267 fp=0x7f840108acc8 sp=0x7f840108ab98 pc=0x423be7
runtime.markroot.func1()
	/workdir/go/src/runtime/mgcmark.go:239 +0xb5 fp=0x7f840108ad18 sp=0x7f840108acc8 pc=0x422895
runtime.markroot(0xc000038668, 0x72, 0x0)
	/workdir/go/src/runtime/mgcmark.go:213 +0x1a8 fp=0x7f840108adc0 sp=0x7f840108ad18 pc=0x422528
runtime.gcDrainN(0xc000038668, 0x10000)
	/workdir/go/src/runtime/mgcmark.go:1321 +0x15d fp=0x7f840108adf0 sp=0x7f840108adc0 pc=0x4248dd
runtime.gcAssistAlloc1(0xc000196e00, 0x10000)
	/workdir/go/src/runtime/mgcmark.go:663 +0x112 fp=0x7f840108ae50 sp=0x7f840108adf0 pc=0x423412
runtime.gcAssistAlloc.func1()
	/workdir/go/src/runtime/mgcmark.go:554 +0x1b fp=0x7f840108ae70 sp=0x7f840108ae50 pc=0x4232db
runtime.systemstack(0x800000)
	/workdir/go/src/runtime/asm_amd64.s:509 +0x4a fp=0x7f840108ae80 sp=0x7f840108ae70 pc=0x4750aa
@rsc rsc added this to the Go1.23 milestone Feb 28, 2024
@rsc rsc added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 28, 2024
@mknyszek
Copy link
Contributor

mknyszek commented Mar 6, 2024

This appears to be related to the addition of runtime lock profiling to the mutex profile.

CC @rhysh @prattmic

@rhysh
Copy link
Contributor

rhysh commented Mar 6, 2024

@mknyszek , is this the same problem described in #64706 (comment) ?

That was tabled in December because 1/ it was test-only and 2/ it was too late in the release cycle to change Gscan users to also note their non-preemptibility in gp.m.locks. Do those still hold?

@rsc
Copy link
Contributor Author

rsc commented Mar 9, 2024

I continue to see these on the dashboard at a low rate. Marking release-blocker. It sounds like we could remove runtime locks from the mutex profile at the very least.

@rsc
Copy link
Contributor Author

rsc commented Mar 9, 2024

Looks like more reports are accumulating on #58277.

rhysh added a commit to rhysh/go that referenced this issue Mar 12, 2024
Profiling of runtime-internal locks checks gp.m.locks to see if it's
safe to add a new record to the profile, but direct use of
acquireLockRank can change the list of the M's active lock ranks without
updating gp.m.locks to match. The runtime's internal rwmutex
implementation makes a point of calling acquirem/releasem when
manipulating the lock rank list, but the other user of acquireLockRank
(the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks.

Codify the rwmutex approach by having acquireLockRank (now called
acquireLockRankAndM) include a call to aquirem. Do the same for release.

For golang#64706
For golang#66004
@gopherbot
Copy link

Change https://go.dev/cl/571056 mentions this issue: runtime: always acquire M when acquiring locks by rank

@mknyszek mknyszek self-assigned this Mar 13, 2024
rhysh added a commit to rhysh/go that referenced this issue Mar 18, 2024
Profiling of runtime-internal locks checks gp.m.locks to see if it's
safe to add a new record to the profile, but direct use of
acquireLockRank can change the list of the M's active lock ranks without
updating gp.m.locks to match. The runtime's internal rwmutex
implementation makes a point of calling acquirem/releasem when
manipulating the lock rank list, but the other user of acquireLockRank
(the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks.

Codify the rwmutex approach by adding a variant of acquireLockRank,
acquireLockRankAndM, include a call to aquirem. Do the same for release.

Leave runtime/time.go's use of the old variants intact for the moment.

For golang#64706
For golang#66004

Change-Id: Id18e4d8de1036de743d2937fad002c6feebe2faf
rhysh added a commit to rhysh/go that referenced this issue Mar 18, 2024
Profiling of runtime-internal locks checks gp.m.locks to see if it's
safe to add a new record to the profile, but direct use of
acquireLockRank can change the list of the M's active lock ranks without
updating gp.m.locks to match. The runtime's internal rwmutex
implementation makes a point of calling acquirem/releasem when
manipulating the lock rank list, but the other user of acquireLockRank
(the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks.

Codify the rwmutex approach by adding a variant of acquireLockRank,
acquireLockRankAndM, include a call to aquirem. Do the same for release.

Leave runtime/time.go's use of the old variants intact for the moment.

For golang#64706
For golang#66004

Change-Id: Id18e4d8de1036de743d2937fad002c6feebe2faf
@cagedmantis
Copy link
Contributor

Is there any status update to report on this issue?

rhysh added a commit to rhysh/go that referenced this issue Apr 16, 2024
Profiling of runtime-internal locks checks gp.m.locks to see if it's
safe to add a new record to the profile, but direct use of
acquireLockRank can change the list of the M's active lock ranks without
updating gp.m.locks to match. The runtime's internal rwmutex
implementation makes a point of calling acquirem/releasem when
manipulating the lock rank list, but the other user of acquireLockRank
(the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks.

Codify the rwmutex approach by renaming acquireLockRank to
acquireLockRankAndM and having it include a call to aquirem. Do the same
for release.

For golang#64706
For golang#66004

Change-Id: Id18e4d8de1036de743d2937fad002c6feebe2faf
@rhysh
Copy link
Contributor

rhysh commented Apr 16, 2024

I had to step away from https://go.dev/cl/571056 for a few weeks, but it's once again ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
Development

No branches or pull requests

5 participants