Discovered by @bcmills in #64752 (comment). Bryan's analysis copied here:
Some analysis of the test log reported by watchflakes.
The program ends up running six goroutines in total. When run on my workstation, at the time of the crash the goroutines are parked in the following locations:
- `main` in `C.trigger_crash`
- `forcegchelper` in `goparkunlock`
- `bgsweep` in `goparkunlock`
- `bgscavenge` in `goparkunlock`
- `runfinq` in `gopark`
- `sysmon` in `notetsleep`
  - (Note that the `sysmon` goroutine is not included in the output from `tracebackothers`, I guess because it doesn't have an associated P or G?)
In a successful run on my local workstation, I see SIGQUIT: quit logs for five background threads:
- one in `notetsleep` via `sysmon`
- one in `notesleep` via `templateThread`
- three in `stopm` via `schedule` and `findRunnable`
In the builder log, we see SIGQUIT: quit logs for only four background threads:
- one in `notetsleep` via `sysmon` (as expected)
- one in `notesleep` via `templateThread` (as expected)
- one in `allocm` via `schedule`, `resetspinning`, and `newm`
- one in `stopm` via `schedule` and `startlockedm`
The thread in allocm looks suspiciously like a race in the runtime. In particular:
- `startm` calls `mReserveID` (incrementing `mnext`) before it calls `newm`
- The `newm` thread was interrupted by `SIGQUIT` during `allocm`, which is before the new thread is actually started for the M (in `newm1`).
So this looks like a genuine watchdog failure: the kernel defers delivery of the final SIGQUIT signal because all of the threads that could receive it are already handling a SIGQUIT (with that signal presumably masked), and the main thread is spinning in the docrash loop waiting for acknowledgement from a fifth thread that had an M ID reserved but was never actually started.
Perhaps startm needs to block signals from at some point before calling mReserveID until newm has returned?
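As a rough illustration of that idea, again as a toy model rather than runtime code: the hypothetical `maskedSched` type below uses a mutex as an analogy for "signals blocked". Reservation and start happen inside one masked region, so the crash path (here `snapshot`) can never observe an ID that is reserved but not started. Whether the real runtime can do this safely around `mReserveID` and `newm` is exactly the open question.

```go
package main

import (
	"fmt"
	"sync"
)

// maskedSched is a hypothetical toy model, not runtime code. The mutex
// stands in for a blocked-signal region: the crash path cannot take its
// snapshot while a reservation is still waiting for its thread to start.
type maskedSched struct {
	sigMask  sync.Mutex
	reserved int
	started  int
}

// startM reserves an ID and starts its thread inside one masked region.
func (s *maskedSched) startM() {
	s.sigMask.Lock()
	defer s.sigMask.Unlock()
	s.reserved++ // analogous to the ID reservation
	s.started++  // analogous to the thread actually starting
}

// snapshot is what the crash path would see when the signal is delivered.
func (s *maskedSched) snapshot() (reserved, started int) {
	s.sigMask.Lock()
	defer s.sigMask.Unlock()
	return s.reserved, s.started
}

func main() {
	s := &maskedSched{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.startM()
		}()
	}

	// Snapshots taken while startM calls are in flight never see a gap.
	for i := 0; i < 1000; i++ {
		if r, st := s.snapshot(); r != st {
			fmt.Println("observed reserved != started:", r, st)
			return
		}
	}
	wg.Wait()

	r, st := s.snapshot()
	fmt.Println("reserved:", r, "started:", st)
}
```

The analogy is loose: a real signal mask defers delivery to one thread rather than excluding other threads, so this only shows the invariant the proposal is after, not how to implement it.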
cc @golang/runtime