
proposal: runtime: add HeapMarked to MemStats #51966

Closed
doujiang24 opened this issue Mar 26, 2022 · 7 comments

Comments

@doujiang24
Contributor

Background:
We want to figure out the reason for memory spikes.

This seems like a workable approach:

  1. Install a GC finalizer.
  2. In the finalizer, monitor HeapMarked after every GC termination.

For example:

  1. HeapMarked usually stays around 100MB.
  2. If HeapMarked grows fast and reaches 150MB after one GC termination, dump a heap profile, e.g. a.profile.
  3. After the next GC termination, dump another heap profile, e.g. b.profile.

We can then see why the heap grew by 50MB using this command:

go tool pprof b.profile -base a.profile

This works because a heap profile may be up to two garbage collection cycles old:

// The returned profile may be up to two garbage collection cycles old.

We can roughly calculate HeapMarked as NextGC / (1 + GCPercent/100).
I think it would be useful to add HeapMarked to MemStats, so that we can get it exactly.

@gopherbot gopherbot added this to the Proposal milestone Mar 26, 2022
@ianlancetaylor
Contributor

CC @mknyszek @golang/runtime

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Mar 26, 2022
@mknyszek
Contributor

A few thoughts:

  1. We will not be adding more fields to MemStats. For all future metrics, runtime/metrics is the new package they'll be added to. It's more efficient, and gives us the option to properly deprecate and drop metrics in the future (certainly not often though, so we still want to be careful; it's not a dumping ground).
  2. With runtime/debug: soft memory limit #48409, which I still plan to land this cycle, the amount of memory marked live is a lot less meaningful if there's a soft memory limit in effect. As an example, NextGC / (1+GCPercent/100) will definitely be inaccurate (which means HeapMarked * (1+GCPercent/100) will also be inaccurate) once that lands.
  3. Finalizers may be run at an arbitrary time after their associated objects die, so there definitely isn't a 1:1 correspondence between finalizer execution and GC cycles.
  4. Live heap spikes can happen for a lot of reasons that may be otherwise benign or unavoidable (and a heap profile won't reveal or point out anything in particular). For example, if the application suddenly begins allocating more rapidly, the GC mark phase will begin earlier to give the GC more runway and time to catch up without assists. The result is more memory being allocated in an already-marked state, ultimately leading to a higher live heap for the next cycle.

More fundamentally, if (3) isn't a problem (and it seems like it would be OK in your use-case to be a little loose), why isn't HeapInUse enough? While HeapInUse doesn't reveal the sawtooth curve of the GC cycle, it does still scale with... well, how much of the heap is in use. :) I think that's a reasonable expectation to have going forward.

I'm not fundamentally opposed to something like HeapMarked being added to runtime/metrics, but I think it needs some thought as to why this isn't already exposed and what the implications are with respect to GC implementations. As a general rule, the more internal state of the GC you reveal, the more users come to rely on that state, making it harder and harder to change the implementation.

@doujiang24
Contributor Author

Thanks.

For (1), got it, thanks.
For (2), #48409 sounds like a really useful proposal.
For (3), yes, that is acceptable for us; it's fine as long as the finalizer is invoked before the next GC cycle, which is usually a large enough window.
But HeapInUse is not good enough, since it always grows, and we cannot assume the delay (from GC termination to finalizer invocation) is very small.

For (4), we just want to identify the spikes that are unexpected. We have hit this case before:
some bad code path allocates lots of GC objects and holds them for a while, ultimately leading to a large GC goal.
Sometimes that goal is large enough to cause an OOM before the next GC cycle happens.
We want to pinpoint the bad code path accurately, using a GC finalizer plus heap profiles.
(We used to look for it with a heap profile plus a time.Ticker, but could not capture the bad code path accurately.)
By the way, #48409 also helps in this case.

As a general rule, the more internal state of the GC you reveal, the more users come to rely on that state, making it harder and harder to change the implementation.

Yeah, totally understand.

@rsc
Contributor

rsc commented Sep 7, 2022

Adding to minutes but it seems like a likely decline for next week.

@rsc
Contributor

rsc commented Sep 7, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Sep 21, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Sep 28, 2022

No change in consensus, so declined.
— rsc for the proposal review group
