Description
When a C-created thread running outside of Go receives a SIGPROF, we record the sample as coming from runtime._ExternalCode
(https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=443;drc=944df9a7516021f0405cd8adb1e6894ae9872cb5).
As of https://go.dev/cl/495855, these threads will have an M if they've ever run Go before. In this case, we use mp.isExtraInC
to track when this thread is running completely outside of Go.
Unfortunately, we have a bug in our handling of this flag that can result in running Go code with isExtraInC
set. This happens like so:
1. C-created thread calls into Go function 1 (via cgocallback).
2. Go function 1 calls into C function 1 (via cgocall).
3. C function 1 calls into Go function 2 (via cgocallback).
4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
6. Go function 1 is now running with mp.isExtraInC == true.
The problem occurs in step 4. On return, cgocallback unconditionally sets mp.isExtraInC, but it should actually only set the flag if there are no more Go frames higher up the stack.
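To make the sequence concrete, here is a toy model of the flag handling. It is only a sketch, not the runtime's code: the m struct, the goDepth counter, and flagInGoFunction1 are hypothetical names invented for illustration, and the model assumes that entering cgocallback clears the flag. It replays the six steps and reports what Go function 1 observes once C function 1 has returned.

```go
package main

import "fmt"

// m is a toy stand-in for the runtime's M. Only the flag under discussion is
// modeled. goDepth is a hypothetical counter, not a real runtime field.
type m struct {
	isExtraInC bool
	goDepth    int // cgocallback entries that have not yet returned
}

// cgocallback models C calling into Go. With fixed == false the return path
// sets isExtraInC unconditionally (the behavior described in this issue).
// With fixed == true it sets the flag only when no Go frames remain.
func cgocallback(mp *m, fixed bool, goFn func()) {
	mp.isExtraInC = false // assumption: entering Go clears the flag
	mp.goDepth++
	goFn()
	mp.goDepth--
	if !fixed || mp.goDepth == 0 {
		mp.isExtraInC = true
	}
}

// cgocall models Go calling into C. It does not touch the flag.
func cgocall(cFn func()) { cFn() }

// flagInGoFunction1 replays steps 1-6 and returns the value of isExtraInC
// that Go function 1 sees after C function 1 returns.
func flagInGoFunction1(fixed bool) bool {
	mp := &m{isExtraInC: true} // C-created thread starts outside Go
	var observed bool
	cgocallback(mp, fixed, func() { // step 1: C -> Go function 1
		cgocall(func() { // step 2: Go function 1 -> C function 1
			cgocallback(mp, fixed, func() {}) // steps 3-4: C -> Go function 2 and back
		}) // step 5: C function 1 returns
		observed = mp.isExtraInC // step 6: Go function 1 is running again
	})
	return observed
}

func main() {
	fmt.Println("unconditional set:", flagInGoFunction1(false)) // true: Go code runs with the flag set
	fmt.Println("conditional set:  ", flagInGoFunction1(true))  // false
}
```

A depth counter is just one way to express "no more Go frames higher up the stack"; an actual fix may use whatever state the runtime already tracks for nested cgo calls.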
I believe all of the effects of the incorrect flag are:
- SIGPROF will record a sample with frames [pc, runtime._ExternalCode] rather than the actual stack trace. Normally this is just some data loss in profiles, but if the PC corresponds to multiple inlined frames, this can result in a panic: runtime error: slice bounds out of range crash in runtime/pprof.(*profileBuilder).appendLocsForStack because runtime/pprof assumes that each PC always corresponds to a constant number of inlined frames. I believe this is the issue described in runtime/pprof: theoretical appendLocsForStack panic with SIGPROF between cgocallback and exitsyscall #70529 (comment).
- Async preemption signals to this thread will be ignored.
- I'm not certain (need to test), but I think that synchronous signals will be forwarded to the C handler or completely ignored. e.g., SIGILL might just be ignored and thus be triggered in a loop?
cc @mknyszek @cherrymui @nsrip-dd @golang/runtime