
proposal: runtime: add HeapMarked to MemStats #51966

Open · doujiang24 opened this issue Mar 26, 2022 · 3 comments

@doujiang24 (Contributor) commented Mar 26, 2022

Background:
We want to figure out the cause of memory spikes.

This seems like a workable approach:

  1. Install a finalizer that re-registers itself, so that it runs after every GC cycle.
  2. In the finalizer, monitor HeapMarked after each GC termination.

For example:

  1. HeapMarked usually stays around 100MB.
  2. If HeapMarked grows fast and reaches 150MB after some GC termination, dump a heap profile, e.g. a.profile.
  3. After the next GC termination, dump another heap profile, e.g. b.profile.

We can then see what accounts for the 50MB of heap growth with this command:

go tool pprof b.profile -base a.profile

This works because a heap profile may be up to two garbage collection cycles old:

// The returned profile may be up to two garbage collection cycles old.

We can roughly calculate HeapMarked as NextGC / (1 + GCPercent/100); for example, with GCPercent=100 and NextGC=200MB, the marked heap is roughly 100MB. I think it would be useful to add HeapMarked to MemStats so that we can get the exact value.
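
For illustration, here is a minimal sketch of the approach described above (not part of the original report; the sentinel type, the 150MB threshold, the profile file names, and the GOGC=100 assumption are all illustrative):

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    // gcSentinel carries the finalizer; it has non-zero size so each
    // allocation gets a distinct address.
    type gcSentinel struct{ _ byte }

    const spikeThreshold = 150 << 20 // illustrative 150MB threshold

    var secondDumpPending bool // finalizers run sequentially, so no lock needed

    func dumpHeapProfile(name string) {
        f, err := os.Create(name)
        if err != nil {
            return
        }
        defer f.Close()
        pprof.WriteHeapProfile(f)
    }

    func onGC(*gcSentinel) {
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)

        // HeapMarked is not exposed, so approximate it from the next GC goal:
        // marked ~= NextGC / (1 + GCPercent/100), assuming GOGC=100.
        const gcPercent = 100
        marked := ms.NextGC * 100 / (100 + gcPercent)

        switch {
        case secondDumpPending:
            dumpHeapProfile("b.profile") // second snapshot, one GC cycle later
            secondDumpPending = false
        case marked > spikeThreshold:
            dumpHeapProfile("a.profile") // first snapshot, taken at the spike
            secondDumpPending = true
        }

        // Re-arm: the old sentinel is dead, so register a fresh one to get
        // called again after a later collection.
        runtime.SetFinalizer(&gcSentinel{}, onGC)
    }

    func main() {
        runtime.SetFinalizer(&gcSentinel{}, onGC)
        // ... application work; the monitor fires roughly once per GC cycle ...
        select {}
    }

The two profiles can then be diffed with the go tool pprof command shown above.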

@gopherbot added this to the Proposal milestone Mar 26, 2022

@ianlancetaylor added this to Incoming in Proposals Mar 26, 2022
@mknyszek (Contributor) commented Mar 27, 2022

A few thoughts:

  1. We will not be adding more fields to MemStats. All future metrics will be added to the runtime/metrics package instead. It's more efficient, and it gives us the option to properly deprecate and drop metrics in the future (certainly not often, though, so we still want to be careful; it's not a dumping ground). (A short sketch of reading a metric through runtime/metrics follows this list.)
  2. With #48409, which I still plan to land this cycle, the amount of memory marked live is a lot less meaningful if there's a soft memory limit in effect. As an example, NextGC / (1+GCPercent/100) will definitely be inaccurate (which means HeapMarked * (1+GCPercent/100) will also be inaccurate) once that lands.
  3. Finalizers may be run at an arbitrary time after their associated objects die, so there definitely isn't a 1:1 correspondence between finalizer execution and GC cycles.
  4. Live heap spikes can happen for a lot of reasons that may be otherwise benign or unavoidable (and a heap profile won't reveal or point out anything in particular). For example, if the application suddenly begins allocating more rapidly, the GC mark phase will begin earlier to give the GC more runway and time to catch up without assists. The result is more memory being allocated in an already-marked state, ultimately leading to a higher live heap for the next cycle.
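
A minimal sketch of the runtime/metrics pattern referred to in (1), reading a metric that already exists (the metric name is real; the example itself is not from the thread):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        // Sample a metric by its name; new GC/heap statistics are exposed
        // this way rather than as new MemStats fields.
        s := []metrics.Sample{{Name: "/memory/classes/heap/objects:bytes"}}
        metrics.Read(s)
        if s[0].Value.Kind() == metrics.KindUint64 {
            fmt.Printf("heap objects: %d bytes\n", s[0].Value.Uint64())
        }
    }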

More fundamentally, if (3) isn't a problem (and it seems like it would be OK in your use-case to be a little loose), why isn't HeapInUse enough? While HeapInUse doesn't reveal the sawtooth curve of the GC cycle, it does still scale with... well, how much of the heap is in use. :) I think that's a reasonable expectation to have going forward.

I'm not fundamentally opposed to something like HeapMarked being added to runtime/metrics, but I think it needs some thought as to why this isn't already exposed and what the implications are with respect to GC implementations. As a general rule, the more internal state of the GC you reveal, the more users come to rely on that state, making it harder and harder to change the implementation.

@doujiang24 (Contributor, Author) commented Mar 28, 2022

Thanks.

For (1), got it, thanks.
For (2), #48409 sounds like a really useful proposal.
For (3), yes, that is acceptable for us: it is fine as long as the finalizer is invoked before the next GC cycle, which is usually plenty of time.
But HeapInUse is not good enough, since it keeps growing between collections, and we cannot assume the delay between GC termination and the finalizer being invoked is very small.

For (4), we just want to catch the spikes that are unexpected. We have hit this case before:
some bad code path allocates lots of GC objects and holds them for a while, ultimately leading to a large GC goal.
Sometimes that GC goal is large enough to cause an OOM, and the OOM happens before the next GC cycle does.
We want to pinpoint the bad code paths accurately, using a GC finalizer plus heap profiles.
(We used to track them down with heap profiles taken on a time.Ticker, but that could not capture the bad code path accurately.)
By the way, #48409 also helps in this case.

As a general rule, the more internal state of the GC you reveal, the more users come to rely on that state, making it harder and harder to change the implementation.

Yeah, totally understand.
