runtime: lock cycle between mheap_.lock and gcSweepBuf.spineLock #34156

Open
@mknyszek

Description

As the title says, there's a runtime lock cycle between mheap_.lock and gcSweepBuf.spineLock. The cycle is effectively of the following form:

On one thread:

  1. (*mheap).alloc_m (which runs on the system stack) allocates a span and calls into (*gcSweepBuf).push.
  2. (*gcSweepBuf).push acquires the spine lock.

Meanwhile on another thread:

  1. deductSweepCredit (on a g's stack) calls into sweepone, then (*mspan).sweep.
  2. (*mspan).sweep calls into (*gcSweepBuf).push.
  3. (*gcSweepBuf).push acquires the spine lock.
  4. (*gcSweepBuf).push calls into either persistentalloc or unlock.
  5. In the prologue of either of these functions, a stack growth is triggered, which acquires mheap_.lock.

Note that (*gcSweepBuf).push would have the potential for self-deadlock in the alloc_m case, but because it runs on the system stack, stack growths won't happen.
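The interleaving above can be replayed in a small, deterministic sketch. This is only an illustration, not runtime code: `heapLock` and `spineLock` are hypothetical `sync.Mutex` stand-ins for mheap_.lock and gcSweepBuf.spineLock (the runtime uses its own internal mutex type), and it assumes, as the cycle implies, that alloc_m holds mheap_.lock while it calls push. `TryLock` (Go 1.18+) replaces the blocking acquisitions so the program reports the deadlock instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for mheap_.lock and gcSweepBuf.spineLock.
// The real runtime does not use sync.Mutex for these.
var (
	heapLock  sync.Mutex
	spineLock sync.Mutex
)

// simulateCycle replays both threads' steps on a single goroutine.
// It returns whether each simulated thread could take its second lock.
func simulateCycle() (aGotSpine, bGotHeap bool) {
	// Thread A: alloc_m is assumed to hold the heap lock here.
	heapLock.Lock()
	// Thread B: push, called from sweep, takes the spine lock.
	spineLock.Lock()

	// Thread A: push now wants the spine lock (held by B).
	aGotSpine = spineLock.TryLock()
	// Thread B: a stack growth now wants the heap lock (held by A).
	bGotHeap = heapLock.TryLock()

	// Both attempts fail because each lock is already held,
	// so only the two initial acquisitions need releasing.
	spineLock.Unlock()
	heapLock.Unlock()
	return aGotSpine, bGotHeap
}

func main() {
	a, b := simulateCycle()
	fmt.Println("thread A acquired spineLock:", a)
	fmt.Println("thread B acquired heapLock:", b)
}
```

With real blocking `Lock` calls on two threads, neither attempt would ever return, which is the deadlock described in this issue.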

This must be an extremely rare deadlock because git history indicates that it's been around since 2016 and we've never received a single bug report (AFAICT). With that being said, if we want any sort of automated lock cycle detection, we need to fix this.
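To make the point about automated detection concrete, here is a minimal sketch of one common approach: record a "held before acquired" edge for every nested acquisition and look for a cycle in the resulting graph. The `lockGraph` type and its methods are hypothetical names invented for this example; nothing here reflects how the runtime does or would implement such a check.

```go
package main

import "fmt"

// lockGraph records edges "held -> acquired" between lock names.
// Hypothetical type for illustration only.
type lockGraph map[string]map[string]bool

// observe records that next was acquired while every lock in held was held.
func (g lockGraph) observe(held []string, next string) {
	for _, h := range held {
		if g[h] == nil {
			g[h] = map[string]bool{}
		}
		g[h][next] = true
	}
}

// hasCycle reports whether the recorded ordering contains a cycle,
// using a depth-first search that marks nodes on the current path.
func (g lockGraph) hasCycle() bool {
	const (
		unvisited = iota
		onPath
		done
	)
	state := map[string]int{}
	var visit func(n string) bool
	visit = func(n string) bool {
		switch state[n] {
		case onPath:
			return true // reached a node already on the current path
		case done:
			return false
		}
		state[n] = onPath
		for m := range g[n] {
			if visit(m) {
				return true
			}
		}
		state[n] = done
		return false
	}
	for n := range g {
		if visit(n) {
			return true
		}
	}
	return false
}

func main() {
	g := lockGraph{}
	// alloc_m path: spine lock taken while the heap lock is held.
	g.observe([]string{"mheap_.lock"}, "gcSweepBuf.spineLock")
	// sweep path: heap lock taken (via stack growth) while the spine lock is held.
	g.observe([]string{"gcSweepBuf.spineLock"}, "mheap_.lock")
	fmt.Println("cycle detected:", g.hasCycle())
}
```

A checker of this kind flags the ordering problem as soon as both paths have been observed once, even if the two threads never actually collide, which is why a rarely triggered cycle like this one still has to be fixed before such detection can be enabled.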

It's unclear to me what the right thing to do here is. The "easy" thing would be to make (*gcSweepBuf).push run on the system stack so that it never triggers a stack growth, but this seems wrong. It feels better to instead only acquire spineLock after mheap_.lock, but this may not be possible. My concern is that the allocated span's sweepgen could end up skewed with respect to the gcSweepBuf it's in, but I haven't looked closely at the concurrency requirements of the relevant pieces.

CC @aclements

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
    compiler/runtime: Issues related to the Go compiler and/or runtime.
