Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGQUIT crash loop + startm race condition deadlock #65138

Open
prattmic opened this issue Jan 17, 2024 · 2 comments
Open

runtime: SIGQUIT crash loop + startm race condition deadlock #65138

prattmic opened this issue Jan 17, 2024 · 2 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@prattmic
Copy link
Member

Discovered by @bcmills in #64752 (comment). Bryan's analysis copied here:

Some analysis of the test log reported by watchflakes.

The program ends up running six goroutines in total. When run on my workstation, at the time of the crash the goroutines are parked in the following locations:

In a successful run on my local workstation, I see SIGQUIT: quit logs for five background threads:

In the builder log, we see SIGQUIT: quit logs for only four background threads:

  • one in notetsleep via sysmon (as expected)
  • one in notesleep via templateThread (as expected)
  • one in allocm via schedule, resetspinning, and newm
  • one in stopm via schedule and startlockedm

The thread in allocm looks suspiciously like a race in the runtime. In particular:

  • startm calls mReserveID (incrementing mnext) before it calls newm
  • The newm thread was interrupted by SIGQUIT during allocm, which is before the new thread is actually started for the M (in newm1).

So this looks like a genuine watchdog failure: the kernel defers delivery of the final SIGQUIT signal because all of the threads that could receive it are already handling a SIGQUIT (with that signal presumably masked), and the main thread is spinning in the docrash loop waiting for acknowledgement from a fifth thread that had an M ID reserved but was never actually started.

Perhaps startm needs to block signals from at some point before calling mReserveID until newm has returned?

cc @golang/runtime

@prattmic prattmic added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 17, 2024
@prattmic prattmic added this to the Backlog milestone Jan 17, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 17, 2024
@prattmic
Copy link
Member Author

The new TestGdbCoreCrashThreadBacktrace test exercises the SIGQUIT crash loop and managed to trigger this race at least once, in #64752.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/556356 mentions this issue: runtime: mark TestGdbCoreCrashThreadBacktrace as flaky

gopherbot pushed a commit that referenced this issue Jan 17, 2024
This test exercises the SIGQUIT crash loop and managed to trigger the
race from #65138 at least once.

For #65138.
Fixes #64752.

Change-Id: I11091510aa7ae4f58b1d748e53df2e3e3dbfb323
Reviewed-on: https://go-review.googlesource.com/c/go/+/556356
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
This test exercises the SIGQUIT crash loop and managed to trigger the
race from golang#65138 at least once.

For golang#65138.
Fixes golang#64752.

Change-Id: I11091510aa7ae4f58b1d748e53df2e3e3dbfb323
Reviewed-on: https://go-review.googlesource.com/c/go/+/556356
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
Development

No branches or pull requests

2 participants