proposal: runtime: (optionally) print M backtrace on crash #67929

Closed
aktau opened this issue Jun 11, 2024 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal
Comments

@aktau
Contributor

aktau commented Jun 11, 2024

Proposal Details

Currently, when crashing due to receiving a fatal signal (e.g.: SIGABRT), Go outputs backtraces for all Gs, and then the registers of the thread on which the signal hit:

	level, _, docrash := gotraceback()
	if level > 0 {
		goroutineheader(gp)
		tracebacktrap(c.sigpc(), c.sigsp(), c.siglr(), gp)
		if crashing.Load() > 0 && gp != mp.curg && mp.curg != nil && readgstatus(mp.curg)&^_Gscan == _Grunning {
			// tracebackothers on original m skipped this one; trace it now.
			goroutineheader(mp.curg)
			traceback(^uintptr(0), ^uintptr(0), 0, mp.curg)
		} else if crashing.Load() == 0 {
			tracebackothers(gp)
			print("\n")
		}
		dumpregs(c)
	}

We're observing an issue, so far seen only in production, where a Go program appears to stall (making little or no progress). We have not been able to reproduce it at will. The G backtraces indicate that there are many (thousands of) runnable goroutines when this happens, yet only 3 of the ~10 GOMAXPROCS Ms appear to be assigned to a goroutine. It is unclear what the other Ms are doing. Seeing where they are may provide a hint as to what's going wrong.

I believe this to be useful in general, although in an otherwise healthy process I'd expect at least the GOMAXPROCS Ms to be mentioned in goroutine backtraces like this:

goroutine 45890 gp=0xc00fa35340 m=15 mp=0xc007a35808 [running]:

But that would still leave the Ms that are in cgo-calls or syscalls.
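To make the "which Ms are accounted for" check concrete, here is a minimal sketch that scans a crash dump for goroutine headers of the form shown above and groups goroutine IDs by M ID. The function name `msInDump`, the regular expression, and the sample dump (including the `m=nil` line) are illustrative assumptions, not part of the runtime or of any existing tool.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// headerRE matches goroutine headers that name an M, for example:
//   goroutine 45890 gp=0xc00fa35340 m=15 mp=0xc007a35808 [running]:
// Headers without a numeric m= field are deliberately not matched.
var headerRE = regexp.MustCompile(`^goroutine (\d+) gp=0x[0-9a-f]+ m=(\d+) mp=0x[0-9a-f]+ \[`)

// sampleDump is hypothetical input; only the first header comes from the issue text.
const sampleDump = `goroutine 45890 gp=0xc00fa35340 m=15 mp=0xc007a35808 [running]:
main.work()
goroutine 7 gp=0xc000102000 m=nil [runnable]:
goroutine 12 gp=0xc000103000 m=3 mp=0xc000080008 [syscall]:
`

// msInDump maps each M ID found in the dump to the IDs of the
// goroutines whose header mentions that M.
func msInDump(dump string) map[int][]int64 {
	out := make(map[int][]int64)
	for _, line := range strings.Split(dump, "\n") {
		sub := headerRE.FindStringSubmatch(line)
		if sub == nil {
			continue // not a header, or a header without an M
		}
		goid, err := strconv.ParseInt(sub[1], 10, 64)
		if err != nil {
			continue
		}
		mid, err := strconv.Atoi(sub[2])
		if err != nil {
			continue
		}
		out[mid] = append(out[mid], goid)
	}
	return out
}

func main() {
	byM := msInDump(sampleDump)
	ids := make([]int, 0, len(byM))
	for id := range byM {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		fmt.Printf("m=%d is named by goroutines %v\n", id, byM[id])
	}
}
```

Comparing the number of distinct M IDs found this way against GOMAXPROCS gives a rough count of Ms that the dump says nothing about, which is the gap this proposal is about.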

Regrettably, we have not been able to extract a coredump from such a crash from our environment, which would be another way of getting at this useful information.

It's fine if this functionality is hidden behind a GODEBUG flag, but I'd say there's a case for enabling it by default. The format (header) would have to be somewhat different from regular goroutine stacks, as otherwise tools parsing this may get confused.
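Until something like this exists, the closest existing knob is the traceback level. The sketch below raises it from inside the program with `runtime/debug.SetTraceback`, which accepts the same values as the GOTRACEBACK environment variable. The helper name `raiseTraceback` is hypothetical, and this does not print M backtraces; it only makes the existing goroutine dump more verbose ("system" adds runtime frames and runtime-created goroutines).

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// raiseTraceback asks the runtime for "system"-level tracebacks on a
// fatal error and returns the level it requested. Per the SetTraceback
// documentation, a request lower than the GOTRACEBACK environment
// setting is ignored, so this can only add detail.
func raiseTraceback() string {
	const level = "system"
	debug.SetTraceback(level)
	return level
}

func main() {
	fmt.Println("requested traceback level:", raiseTraceback())
}
```

Using "crash" instead of "system" additionally makes the process crash in an OS-specific way (SIGABRT on Unix), which can produce a core dump where the environment allows one.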

cc @mknyszek @prattmic

@aktau aktau added the Proposal label Jun 11, 2024
@mauri870 mauri870 added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 11, 2024
@mauri870 mauri870 added this to the Backlog milestone Jun 11, 2024
@aktau
Contributor Author

aktau commented Jun 11, 2024

I agree that #13161 seems quite similar. It sounds like what I want is subsumed by @aclements's proposed GOTRACEBACK=all, given:

Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads.

Should the discussion be moved there?

@rsc
Contributor

rsc commented Jun 11, 2024

I agree that this needn't be a proposal and can be considered a bug fix in scope of #13161.

@mknyszek
Contributor

mknyszek commented Jun 12, 2024

In triage, based on the discussion, closing in favor of #13161. Please feel free to comment or reopen if that's wrong. Thanks.

@mknyszek mknyszek closed this as not planned Jun 12, 2024