As the title says, there's a runtime lock cycle between `mheap_.lock` and `gcSweepBuf.spineLock`. The cycle is effectively of the following form:
On one thread:
- `(*mheap).alloc_m` (which runs on the system stack) allocates a span and, while still holding `mheap_.lock`, calls into `(*gcSweepBuf).push`.
- `(*gcSweepBuf).push` acquires the spine lock.
Meanwhile, on another thread:
- `deductSweepCredit` (on a g's stack) calls into `sweepone`, then `(*mspan).sweep`.
- `(*mspan).sweep` calls into `(*gcSweepBuf).push`.
- `(*gcSweepBuf).push` acquires the spine lock.
- `(*gcSweepBuf).push` calls into either `persistentalloc` or `unlock`.
- In the prologue of either of these functions, a stack growth is triggered, which acquires `mheap_.lock`.
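For concreteness, here is a minimal, self-contained Go sketch of the ABBA ordering described by the two call stacks above. All names (`heapLock`, `spineLock`, `allocSpan`, `sweepSpan`) are stand-ins invented for illustration; the runtime uses its internal `mutex`, not `sync.Mutex`, and the real paths are the ones listed above.

```go
package main

import (
	"sync"
	"time"
)

var (
	heapLock  sync.Mutex // stand-in for mheap_.lock
	spineLock sync.Mutex // stand-in for gcSweepBuf.spineLock
)

// allocSpan models the alloc_m path: the heap lock is held when the
// spine lock is taken (heap -> spine ordering).
func allocSpan() {
	heapLock.Lock()
	defer heapLock.Unlock()
	time.Sleep(time.Millisecond) // widen the race window
	spineLock.Lock()
	spineLock.Unlock()
}

// sweepSpan models the sweeper path: the spine lock is held when a
// stand-in for the stack-growth prologue takes the heap lock
// (spine -> heap ordering, the reverse of allocSpan).
func sweepSpan() {
	spineLock.Lock()
	defer spineLock.Unlock()
	time.Sleep(time.Millisecond)
	heapLock.Lock()
	heapLock.Unlock()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); allocSpan() }()
	go func() { defer wg.Done(); sweepSpan() }()
	// With this timing both goroutines block on each other and the
	// program aborts with "fatal error: all goroutines are asleep - deadlock!".
	wg.Wait()
}
```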
Note that `(*gcSweepBuf).push` would have the potential for self-deadlock in the `alloc_m` case, but because it runs on the system stack, stack growths won't happen.
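The self-deadlock `push` avoids here is the usual non-reentrancy problem: if a stack growth inside `push` tried to take `mheap_.lock` while `alloc_m` already held it on the same thread, the thread would block on itself. A trivial illustration with a stand-in `sync.Mutex` (runtime locks are likewise non-reentrant):

```go
package main

import "sync"

func main() {
	var heapLock sync.Mutex // stand-in for mheap_.lock
	heapLock.Lock()
	// Re-acquiring a lock this goroutine already holds blocks forever;
	// Go aborts with "fatal error: all goroutines are asleep - deadlock!".
	heapLock.Lock()
}
```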
This must be an extremely rare deadlock, because git history indicates that it's been around since 2016 and we've never received a single bug report (AFAICT). With that being said, if we want any sort of automated lock cycle detection, we need to fix this.
It's unclear to me what the right thing to do here is. The "easy" thing would be to make `(*gcSweepBuf).push` run on the system stack, that way it will never trigger a stack growth, but this seems wrong. It feels better to instead only acquire `spineLock` after `mheap_.lock`, but this may not be possible. My concern is that the allocated span's `sweepgen` could end up skewed with respect to the `gcSweepBuf` it's in, but I haven't looked closely at the concurrency requirements of the relevant pieces.
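To sketch what the ordering fix might look like, assuming (hypothetically) that the push path can tell up front whether it may need the heap lock: if every such path takes `mheap_.lock` first, `spineLock` is always innermost and the cycle cannot form. `pushSpan` and both lock names are invented for illustration, not the runtime's API:

```go
package main

import "sync"

var (
	heapLock  sync.Mutex // stand-in for mheap_.lock
	spineLock sync.Mutex // stand-in for gcSweepBuf.spineLock
)

// pushSpan sketches the proposed discipline: any path that might need
// both locks acquires heapLock first, so spineLock is strictly inner
// and the reverse (spine -> heap) ordering never occurs.
func pushSpan(mayNeedHeapLock bool) {
	if mayNeedHeapLock {
		heapLock.Lock() // same heap -> spine order as the alloc_m path
		defer heapLock.Unlock()
	}
	spineLock.Lock()
	defer spineLock.Unlock()
	// ... record the span on the spine ...
}

func main() {
	pushSpan(true)  // path that may also touch the heap lock
	pushSpan(false) // fast path: spine lock only
}
```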
CC @aclements