Skip to content

runtime: cgo callback on extra M treated as external code after nested cgo callback returns #72870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
prattmic opened this issue Mar 14, 2025 · 8 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Critical A critical problem that affects the availability or correctness of production systems built using Go NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

When a C-created thread running outside of Go receives a SIGPROF, we record the sample as coming from runtime._ExternalCode (https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=443;drc=944df9a7516021f0405cd8adb1e6894ae9872cb5).

As of https://go.dev/cl/495855, these threads will have an M if they've ever run Go before. In this case, we use mp.isExtraInC to track when this thread is running completely outside of Go.

Unfortunately, we have a bug in our handling of this flag that can result in running Go code with isExtraInC set. This happens like so:

  1. C-created thread calls into Go function 1 (via cgocallback).
  2. Go function 1 calls into C function 1 (via cgocall).
  3. C function 1 calls into Go function 2 (via cgocallback).
  4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
  5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
  6. Go function 1 is now running with mp.isExtraInC == true.

The problem occurs in step 4. On return, cgocallback unconditionally sets mp.isExtraInC, but it should actually only set the flag if there are no more Go frames higher up the stack.

I believe all of the effects of the incorrect flag are:

  1. SIGPROF will record a sample with frames [pc, runtime._ExternalCode] rather than the actual stack trace. Normally this is just some data loss in profiles, but if the PC corresponds to multiple inlined frames, this can result in a panic: runtime error: slice bounds out of range crash in runtime/pprof.(*profileBuilder).appendLocsForStack because runtime/pprof assumes that each PC always corresponds to a constant number of inlined frames. I believe this is the issue described in runtime/pprof: theoretical appendLocsForStack panic with SIGPROF between cgocallback and exitsyscall #70529 (comment).

  2. Async preemption signals to this thread will be ignored.

  3. I'm not certain (need to test), but I think that synchronous signals will be forwarded to the C handler or completely ignored. e.g., SIGILL might just be ignored and thus be triggered in a loop?

cc @mknyszek @cherrymui @nsrip-dd @golang/runtime

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 14, 2025
@prattmic prattmic self-assigned this Mar 14, 2025
@prattmic prattmic added NeedsFix The path to resolution is known, but the work has not been done. Critical A critical problem that affects the availability or correctness of production systems built using Go labels Mar 14, 2025
@prattmic prattmic added this to the Go1.25 milestone Mar 14, 2025
@prattmic
Copy link
Member Author

@gopherbot Please backport to 1.23 and 1.24. This bug can cause arbitrary crashes when CPU profiling is enabled.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #72871 (for 1.23), #72872 (for 1.24).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/658035 mentions this issue: runtime: only set isExtraInC if there are no Go frames left

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/658055 mentions this issue: [release-branch.go1.23] runtime: only set isExtraInC if there are no Go frames left

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/658056 mentions this issue: [release-branch.go1.24] runtime: only set isExtraInC if there are no Go frames left

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/658415 mentions this issue: runtime: skip TestCgoCallbackPprof on platforms with broken profiling

gopherbot pushed a commit that referenced this issue Mar 17, 2025
CL 658035 added TestCgoCallbackPprof, which is consistently failing on
solaris. runtime/pprof maintains a list of platforms where CPU profiling
does not work properly. Since this test requires CPU profiling, skip the
this test on those platforms.

For #72870.
Fixes #72876.

Change-Id: I6a6a636cbf6b16abcbba8771178fe1d001be9d9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/658415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/658416 mentions this issue: [release-branch.go1.24] runtime: skip TestCgoCallbackPprof on platforms with broken profiling

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/658435 mentions this issue: [release-branch.go1.23] runtime: skip TestCgoCallbackPprof on platforms with broken profiling

gopherbot pushed a commit that referenced this issue Mar 25, 2025
…Go frames left

mp.isExtraInC is intended to indicate that this M has no Go frames at
all; it is entirely executing in C.

If there was a cgocallback to Go and then a cgocall to C, such that the
leaf frames are C, that is fine. e.g., traceback can handle this fine
with SetCgoTraceback (or by simply skipping the C frames).

However, we currently mismanage isExtraInC, unconditionally setting it
on return from cgocallback. This means that if there are two levels of
cgocallback, we end up running Go code with isExtraInC set.

1. C-created thread calls into Go function 1 (via cgocallback).
2. Go function 1 calls into C function 1 (via cgocall).
3. C function 1 calls into Go function 2 (via cgocallback).
4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
6. Go function 1 is now running with mp.isExtraInC == true.

The fix is simple; only set isExtraInC on return from cgocallback if
there are no more Go frames. There can't be more Go frames unless there
is an active cgocall out of the Go frames.

For #72870.
Fixes #72872.

Cq-Include-Trybots: luci.golang.try:go1.24-linux-amd64-longtest
Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467
Reviewed-on: https://go-review.googlesource.com/c/go/+/658035
Reviewed-by: Cherry Mui <cherryyz@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 577bb3d)
Reviewed-on: https://go-review.googlesource.com/c/go/+/658056
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
gopherbot pushed a commit that referenced this issue Mar 25, 2025
…ms with broken profiling

CL 658035 added TestCgoCallbackPprof, which is consistently failing on
solaris. runtime/pprof maintains a list of platforms where CPU profiling
does not work properly. Since this test requires CPU profiling, skip the
this test on those platforms.

For #72870.
For #72876.
For #72872.

Change-Id: I6a6a636cbf6b16abcbba8771178fe1d001be9d9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/658415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/658416
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
gopherbot pushed a commit that referenced this issue Mar 25, 2025
…Go frames left

mp.isExtraInC is intended to indicate that this M has no Go frames at
all; it is entirely executing in C.

If there was a cgocallback to Go and then a cgocall to C, such that the
leaf frames are C, that is fine. e.g., traceback can handle this fine
with SetCgoTraceback (or by simply skipping the C frames).

However, we currently mismanage isExtraInC, unconditionally setting it
on return from cgocallback. This means that if there are two levels of
cgocallback, we end up running Go code with isExtraInC set.

1. C-created thread calls into Go function 1 (via cgocallback).
2. Go function 1 calls into C function 1 (via cgocall).
3. C function 1 calls into Go function 2 (via cgocallback).
4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
6. Go function 1 is now running with mp.isExtraInC == true.

The fix is simple; only set isExtraInC on return from cgocallback if
there are no more Go frames. There can't be more Go frames unless there
is an active cgocall out of the Go frames.

For #72870.
Fixes #72871.

Cq-Include-Trybots: luci.golang.try:go1.23-linux-amd64-longtest
Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467
Reviewed-on: https://go-review.googlesource.com/c/go/+/658035
Reviewed-by: Cherry Mui <cherryyz@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 577bb3d)
Reviewed-on: https://go-review.googlesource.com/c/go/+/658055
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
gopherbot pushed a commit that referenced this issue Mar 25, 2025
…ms with broken profiling

CL 658035 added TestCgoCallbackPprof, which is consistently failing on
solaris. runtime/pprof maintains a list of platforms where CPU profiling
does not work properly. Since this test requires CPU profiling, skip the
this test on those platforms.

For #72870.
For #72876.
For #72871.

Change-Id: I6a6a636cbf6b16abcbba8771178fe1d001be9d9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/658415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/658435
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Bypass: Dmitri Shuralyov <dmitshur@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Critical A critical problem that affects the availability or correctness of production systems built using Go NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

2 participants