runtime: lock cycle between mheap_.lock and gcSweepBuf.spineLock #34156

mknyszek opened this issue Sep 6, 2019 · 0 comments
@mknyszek mknyszek commented Sep 6, 2019

As the title says, there's a runtime lock cycle between mheap_.lock and gcSweepBuf.spineLock. The cycle is effectively of the following form:

On one thread:

  1. (*mheap).alloc_m (which runs on the system stack) allocates a span and calls into (*gcSweepBuf).push.
  2. (*gcSweepBuf).push acquires the spine lock.

Meanwhile on another thread:

  1. deductSweepCredit (on a g's stack) calls into sweepone, then (*mspan).sweep.
  2. (*mspan).sweep calls into (*gcSweepBuf).push.
  3. (*gcSweepBuf).push acquires the spine lock.
  4. (*gcSweepBuf).push calls into either persistentalloc or unlock.
  5. In the prologue of either of these functions, a stack growth may be triggered, which acquires mheap_.lock.

Note that (*gcSweepBuf).push would have the potential for self-deadlock in the alloc_m case, but because it runs on the system stack, stack growths won't happen.

This must be an extremely rare deadlock because git history indicates that it's been around since 2016 and we've never received a single bug report (AFAICT). With that being said, if we want any sort of automated lock cycle detection, we need to fix this.

It's unclear to me what the right thing to do here is. The "easy" thing would be to make (*gcSweepBuf).push run on the system stack, so that it never triggers a stack growth, but this seems wrong. It feels better to instead only acquire spineLock after mheap_.lock, but this may not be possible. My concern is that the allocated span's sweepgen could end up skewed with respect to the gcSweepBuf it's in, but I haven't looked closely at the concurrency requirements of the relevant pieces.

CC @aclements

@mknyszek mknyszek modified the milestones: Go1.14, Go1.15 Sep 6, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019