-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: cgo callback on extra M treated as external code after nested cgo callback returns #72870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@gopherbot Please backport to 1.23 and 1.24. This bug can cause arbitrary crashes when CPU profiling is enabled. |
Backport issue(s) opened: #72871 (for 1.23), #72872 (for 1.24). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
Change https://go.dev/cl/658035 mentions this issue: |
Change https://go.dev/cl/658055 mentions this issue: |
Change https://go.dev/cl/658056 mentions this issue: |
Change https://go.dev/cl/658415 mentions this issue: |
CL 658035 added TestCgoCallbackPprof, which is consistently failing on solaris. runtime/pprof maintains a list of platforms where CPU profiling does not work properly. Since this test requires CPU profiling, skip the this test on those platforms. For #72870. Fixes #72876. Change-Id: I6a6a636cbf6b16abcbba8771178fe1d001be9d9b Reviewed-on: https://go-review.googlesource.com/c/go/+/658415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change https://go.dev/cl/658416 mentions this issue: |
Change https://go.dev/cl/658435 mentions this issue: |
…Go frames left mp.isExtraInC is intended to indicate that this M has no Go frames at all; it is entirely executing in C. If there was a cgocallback to Go and then a cgocall to C, such that the leaf frames are C, that is fine. e.g., traceback can handle this fine with SetCgoTraceback (or by simply skipping the C frames). However, we currently mismanage isExtraInC, unconditionally setting it on return from cgocallback. This means that if there are two levels of cgocallback, we end up running Go code with isExtraInC set. 1. C-created thread calls into Go function 1 (via cgocallback). 2. Go function 1 calls into C function 1 (via cgocall). 3. C function 1 calls into Go function 2 (via cgocallback). 4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback). 5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall). 6. Go function 1 is now running with mp.isExtraInC == true. The fix is simple; only set isExtraInC on return from cgocallback if there are no more Go frames. There can't be more Go frames unless there is an active cgocall out of the Go frames. For #72870. Fixes #72872. Cq-Include-Trybots: luci.golang.try:go1.24-linux-amd64-longtest Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467 Reviewed-on: https://go-review.googlesource.com/c/go/+/658035 Reviewed-by: Cherry Mui <cherryyz@google.com> Commit-Queue: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> (cherry picked from commit 577bb3d) Reviewed-on: https://go-review.googlesource.com/c/go/+/658056 Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
…ms with broken profiling CL 658035 added TestCgoCallbackPprof, which is consistently failing on solaris. runtime/pprof maintains a list of platforms where CPU profiling does not work properly. Since this test requires CPU profiling, skip the this test on those platforms. For #72870. For #72876. For #72872. Change-Id: I6a6a636cbf6b16abcbba8771178fe1d001be9d9b Reviewed-on: https://go-review.googlesource.com/c/go/+/658415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/658416 Auto-Submit: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
…Go frames left mp.isExtraInC is intended to indicate that this M has no Go frames at all; it is entirely executing in C. If there was a cgocallback to Go and then a cgocall to C, such that the leaf frames are C, that is fine. e.g., traceback can handle this fine with SetCgoTraceback (or by simply skipping the C frames). However, we currently mismanage isExtraInC, unconditionally setting it on return from cgocallback. This means that if there are two levels of cgocallback, we end up running Go code with isExtraInC set. 1. C-created thread calls into Go function 1 (via cgocallback). 2. Go function 1 calls into C function 1 (via cgocall). 3. C function 1 calls into Go function 2 (via cgocallback). 4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback). 5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall). 6. Go function 1 is now running with mp.isExtraInC == true. The fix is simple; only set isExtraInC on return from cgocallback if there are no more Go frames. There can't be more Go frames unless there is an active cgocall out of the Go frames. For #72870. Fixes #72871. Cq-Include-Trybots: luci.golang.try:go1.23-linux-amd64-longtest Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467 Reviewed-on: https://go-review.googlesource.com/c/go/+/658035 Reviewed-by: Cherry Mui <cherryyz@google.com> Commit-Queue: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> (cherry picked from commit 577bb3d) Reviewed-on: https://go-review.googlesource.com/c/go/+/658055 Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
…ms with broken profiling CL 658035 added TestCgoCallbackPprof, which is consistently failing on solaris. runtime/pprof maintains a list of platforms where CPU profiling does not work properly. Since this test requires CPU profiling, skip the this test on those platforms. For #72870. For #72876. For #72871. Change-Id: I6a6a636cbf6b16abcbba8771178fe1d001be9d9b Reviewed-on: https://go-review.googlesource.com/c/go/+/658415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/658435 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@google.com> TryBot-Bypass: Dmitri Shuralyov <dmitshur@google.com>
When a C-created thread running outside of Go receives a SIGPROF, we record the sample as coming from
runtime._ExternalCode
(https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=443;drc=944df9a7516021f0405cd8adb1e6894ae9872cb5).As of https://go.dev/cl/495855, these threads will have an M if they've ever run Go before. In this case, we use
mp.isExtraInC
to track when this thread is running completely outside of Go.Unfortunately, we have a bug in our handling of this flag that can result in running Go code with
isExtraInC
set. This happens like so:mp.isExtraInC == true
.The problem occurs in step 4. On return,
cgocallback
unconditionally setsmp.isExtraInC
, but it should actually only set the flag if there are no more Go frames higher up the stack.I believe all of the effects of the incorrect flag are:
SIGPROF will record a sample with frames
[pc, runtime._ExternalCode]
rather than the actual stack trace. Normally this is just some data loss in profiles, but if the PC corresponds to multiple inlined frames, this can result in apanic: runtime error: slice bounds out of range
crash inruntime/pprof.(*profileBuilder).appendLocsForStack
becauseruntime/pprof
assumes that each PC always corresponds to a constant number of inlined frames. I believe this is the issue described in runtime/pprof: theoretical appendLocsForStack panic with SIGPROF between cgocallback and exitsyscall #70529 (comment).Async preemption signals to this thread will be ignored.
I'm not certain (need to test), but I think that synchronous signals will be forwarded to the C handler or completely ignored. e.g., SIGILL might just be ignored and thus be triggered in a loop?
cc @mknyszek @cherrymui @nsrip-dd @golang/runtime
The text was updated successfully, but these errors were encountered: