Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: dump a summary of heap stats on OOM #52546

Open
bcmills opened this issue Apr 25, 2022 · 7 comments
Open

runtime: dump a summary of heap stats on OOM #52546

bcmills opened this issue Apr 25, 2022 · 7 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Apr 25, 2022

I'm seeing fairly frequent OOM failures in the Go build dashboard.

Some of them look like misconfigured or undersized builders, while others could plausibly be bugs in the runtime (such as #52433). In other cases (such as #49957), it's much less clear what led to the failure.

It would be very helpful to be able to distinguish those cases from the failure logs. Today, we get fatal error: runtime: out of memory, but generally the runtime doesn't actually tell us how much memory it was using when that happened. (We do get a goroutine dump, but a goroutine dump is only loosely correlated with the size of the heap.)

@golang/runtime: how difficult would it be to have the runtime dump a best-effort summary of heap stats when it hits the out of memory condition?

@bcmills
Copy link
Contributor Author

bcmills commented Apr 25, 2022

#52547 is another example of a failure mode where a dump on runtime OOM would be helpful.

@bcmills
Copy link
Contributor Author

bcmills commented Apr 25, 2022

This might also help with #51019.

@mknyszek
Copy link
Contributor

This is straightforward to do. There's an "unsafe read" operation on memstats.heapStats which gets you most of the way there (that may need to be modified so all the reads are atomic, just to make that effort a little better). We can also dump the rest of stats which are just the sysMemStat-typed fields in memstats (this may not be 100% true before my outstanding refactoring CLs, but will be after). It's all just atomics (and I think a little math), no locks necessary.

@mknyszek
Copy link
Contributor

I can write this CL on top of my stack.

@cagedmantis cagedmantis added this to the Backlog milestone Apr 28, 2022
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 28, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@mknyszek mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
@mknyszek mknyszek moved this from Triage Backlog to Todo in Go Compiler / Runtime Jul 18, 2022
@mknyszek mknyszek self-assigned this Jul 18, 2022
@mknyszek
Copy link
Contributor

Sorry, I never really got around to this. I'll triage it for myself.

@yeya24
Copy link

yeya24 commented Aug 19, 2024

Is this still planned?

@mknyszek
Copy link
Contributor

mknyszek commented Aug 20, 2024

No, sorry. This was never formally planned.

Keep in mind that this was only about the runtime's "out of memory" error which is does not apply to general out-of-memory errors on, say, Linux. Linux out-of-memory failures trigger the OOM killer, which begins killing programs based on heuristics with no opportunity to dump any information or recover. (Often, it's triggered by accessing memory for the first time, not explicitly allocating it; there is no way that I am aware of to hook into that, because even signals might require allocating memory.)

The specific class of "out of memory" error here shows up most commonly on 32-bit platforms where the address space is small, so mmap (and the like) actually have a solid chance of failing directly. I believe the root of this improvement was for debugging issues in the Go project itself -- it's generally less useful for other projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

5 participants