We will not be adding more fields to MemStats. For all future metrics, runtime/metrics is the new package they'll be added to. It's more efficient, and gives us the option to properly deprecate and drop metrics in the future (certainly not often though, so we still want to be careful; it's not a dumping ground).
With #48409, which I still plan to land this cycle, the amount of memory marked live is a lot less meaningful if there's a soft memory limit in effect. As an example, NextGC / (1+GCPercent/100) will definitely be inaccurate (which means HeapMarked * (1+GCPercent/100) will also be inaccurate) once that lands.
Finalizers may be run at an arbitrary time after their associated objects die, so there definitely isn't a 1:1 correspondence between finalizer execution and GC cycles.
Live heap spikes can happen for a lot of reasons that may be otherwise benign or unavoidable (and a heap profile won't reveal or point out anything in particular). For example, if the application suddenly begins allocating more rapidly, the GC mark phase will begin earlier to give the GC more runway and time to catch up without assists. The result is more memory being allocated in an already-marked state, ultimately leading to a higher live heap for the next cycle.
More fundamentally, if (3) isn't a problem (and it seems like it would be OK in your use-case to be a little loose), why isn't HeapInUse enough? While HeapInUse doesn't reveal the sawtooth curve of the GC cycle, it does still scale with... well, how much of the heap is in use. :) I think that's a reasonable expectation to have going forward.
I'm not fundamentally opposed to something like HeapMarked being added to runtime/metrics, but I think it needs some thought as to why this isn't already exposed and what the implications are with respect to GC implementations. As a general rule, the more internal state of the GC you reveal, the more users come to rely on that state, making it harder and harder to change the implementation.
For (1), got it, thanks.
For (2), #48409 sounds like a really useful proposal.
For (3), yeah, that's acceptable for us; it's fine as long as the finalizer runs before the next GC cycle, which is usually a long enough window.
But HeapInUse isn't good enough, since it keeps growing between cycles, and we can't assume the delay from GC termination to finalizer invocation is small.
For (4), we just want to identify the spikes that are unexpected. We have run into this case:
Some bad code paths allocate lots of GC-managed objects and hold them for a while, ultimately leading to a large GC goal.
Sometimes that goal is large enough to cause an OOM, and the OOM happens before the next GC cycle does.
We want to pinpoint the bad code paths accurately by combining a GC finalizer with a heap profile.
(We used to do this with a heap profile plus a time.Ticker, but we couldn't capture the bad code path accurately.)
By the way, #48409 also helps in this case.