runtime: GC pacer problems meta-issue #42430
Problems with the GC pacer
The Go GC's pacer has accumulated a number of problems in the years since it was unveiled in Go 1.5. The pacer itself has also changed somewhat since then. This issue lists each problem along with some details and history. The problems are not listed in any particular order.
Idle GC interferes with auto-scaling systems
The Go scheduler considers
A broader question is: do we need idle GC at all? If the application is mostly idle anyway, why not let the GC run slower? The idle GC conceptually appears to help mainly bursty applications, but we don't have good data to back that up. The idle GC may have also inadvertently become required for the GC to make progress. Idle GC can also have other negative effects, such as increased latency: once an idle GC goroutine grabs a P, it can run until a scheduler preemption (#37116).
Assist credit system issues
Hard heap goal isn't actually hard
When debugging issues with
Mark assists tend to be front-loaded in a GC cycle
Mark assists by allocating goroutines tend to happen frequently early on in a GC cycle, even if assists aren't generally needed, because no assist credit is available at the beginning of a GC cycle. As a result, allocating goroutines are forced to assist until they either earn enough credit or the background mark workers generate enough credit for them. Ideally mark assist work would be spread more evenly throughout the GC cycle to promote better whole-program latencies.
Assist credit system can leave credit unaccounted for
The assist credit system is somewhat ad-hoc in terms of credit/debt ownership. For instance, if a goroutine exits while in debt, that debt simply disappears, yet something must do that work before the GC is over. Similarly, credit will just disappear, potentially making the GC work harder than it needs to.
High GOGC values have some phase change issues
Consider a very high GOGC value such as 10000. Generally speaking, if the live heap is steady, then all is well and we're getting the RAM/CPU tradeoff we expect. However, if the application's phase changes suddenly and a significant portion of the allocated heap is found to be live during a GC, the pacer will be in trouble. Namely, the GC will have started far too late, and though it will push back on the application, it could take several GC cycles to recover.
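To put rough numbers on this, here is a small sketch using the simplified GOGC rule (goal = live heap * (1 + GOGC/100)). The heapGoal helper and the heap sizes are made up for illustration, and the rule ignores everything else the pacer accounts for.

```go
package main

import "fmt"

const MiB = 1 << 20

// heapGoal applies the simplified GOGC rule: the heap may grow by GOGC
// percent over the live heap before the next cycle should finish.
// This is an illustrative helper, not a runtime function.
func heapGoal(liveBytes, gogc uint64) uint64 {
	return liveBytes + liveBytes*gogc/100
}

func main() {
	const gogc = 10000

	// Steady state (illustrative numbers): 10 MiB of live heap.
	steadyLive := uint64(10 * MiB)
	goal := heapGoal(steadyLive, gogc)
	fmt.Printf("steady: live=%d MiB, goal=%d MiB\n", steadyLive/MiB, goal/MiB)

	// Phase change (assumed): the cycle that starts near that goal
	// discovers that 600 MiB of the allocated heap is live.
	newLive := uint64(600 * MiB)
	fmt.Printf("phase change: live=%d MiB, mark work is %dx the steady state\n",
		newLive/MiB, newLive/steadyLive)
}
```

With these numbers the goal sits at 1010 MiB, so a cycle paced for about 10 MiB of marking starts close to that goal and then finds 60 times as much work to do.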
Support for minimum and maximum heaps
Memory is relatively inflexible, yet the Go GC doesn't exactly treat it that way: it's not aware of actual memory availability. Today, we allow the programmer to make a CPU/RAM tradeoff via GOGC, but in practice, when memory is limited, we might want to trade off CPU to keep RAM use under a specific limit, and when memory is abundant, we may want to trade off RAM (up to some target) for CPU because that RAM is available anyway. These two situations may be dealt with by having maximum and minimum heap limits, respectively.
For the maximum heap limit, we've long considered such a solution (approx. 3.5 years at the time of writing) in the form of
For the minimum heap target, we've had a long-standing proposal on GitHub (#23044) for such a feature, as an alternative to using a heap ballast. In general there are some pacing-related issues here that prevent simply setting a minimum target heap size, mainly related to generating (potentially unintentionally, which is the critical bit here) a high 'effective' GOGC value. For an example of how that can happen see this comment.
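As a rough illustration of the 'effective' GOGC concern, the sketch below clamps the simplified GOGC heap goal to a minimum target and then computes the GOGC value that would have produced the same goal. Both helpers and the minTarget knob are hypothetical and exist only for this example; they are not runtime APIs.

```go
package main

import "fmt"

// goalWithMinimum computes the simplified GOGC heap goal and raises it to
// minTarget if it falls short. minTarget is a hypothetical knob used only
// for this illustration; it is not an existing runtime setting.
func goalWithMinimum(live, gogc, minTarget float64) float64 {
	goal := live * (1 + gogc/100)
	if goal < minTarget {
		return minTarget
	}
	return goal
}

// effectiveGOGC returns the GOGC value that would have produced goal for
// the given live heap: (goal/live - 1) * 100.
func effectiveGOGC(live, goal float64) float64 {
	return (goal/live - 1) * 100
}

func main() {
	const MiB = 1 << 20
	const gogc, minTarget = 100, 1024 * MiB // illustrative values

	for _, live := range []float64{16 * MiB, 256 * MiB, 2048 * MiB} {
		goal := goalWithMinimum(live, gogc, minTarget)
		fmt.Printf("live=%5.0f MiB goal=%5.0f MiB effective GOGC=%.0f\n",
			live/MiB, goal/MiB, effectiveGOGC(live, goal))
	}
}
```

With a 16 MiB live heap and a 1 GiB minimum target, the effective GOGC comes out at 6300 even though the user asked for 100, which lands the application in the high-GOGC regime without anyone intending it.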
Thinking more broadly, there may be an effective way to approach both of these problems by rethinking the pacer a bit. For instance, pacing decisions today implicitly involve the GC scan/mark rate. By making this value explicit in the process, we may be able to make better decisions about how "soft" these limits should be in the general case.
Finally, there's a philosophical question here on how these tradeoffs should be made visible to the user. Theoretically, with notifications about GC timings, live heap metrics, and
Failure to amortize GC costs for small heaps
Small Go heaps by definition have very little work for the GC to do in the heap itself, so in these cases other factors (such as globals, goroutine stacks, and the like) can dominate costs. However, the pacer doesn't consider these factors at all, so it tends to make bad predictions for future GCs. A long-standing issue (#19839) to include globals in pacing remains open. Furthermore, to cut off the worst of this bad behavior, the GC has a minimum heap size of 4 MiB.
This problem has been conflated with the minimum heap problem in the past, because heap ballasts may also be used to work around this problem in more severe cases (such as an application having an unusually large number of globals or goroutines relative to the size of the heap). The heap ballast effectively acts as a stand-in for all this GC work that is unaccounted for.
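For readers unfamiliar with the workaround, a heap ballast is just a large allocation that is kept reachable so the pacer counts it as live heap. A minimal sketch follows; the nextGCWithBallast helper and the 64 MiB size are assumptions made for this example, while runtime.GC, runtime.ReadMemStats, MemStats.NextGC, and runtime.KeepAlive are the standard runtime APIs.

```go
package main

import (
	"fmt"
	"runtime"
)

// nextGCWithBallast allocates a ballast of the given size, forces a GC
// cycle so the pacer sees the ballast as live, and reports the runtime's
// next heap goal. Illustrative helper only.
func nextGCWithBallast(ballastBytes int) uint64 {
	ballast := make([]byte, ballastBytes)

	// Run a full cycle so the heap goal is recomputed with the ballast live.
	runtime.GC()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Keep the ballast reachable until after the measurement.
	runtime.KeepAlive(ballast)
	return m.NextGC
}

func main() {
	const ballastBytes = 64 << 20 // illustrative size, an assumption
	fmt.Printf("heap goal with a %d MiB ballast: %d MiB\n",
		ballastBytes>>20, nextGCWithBallast(ballastBytes)>>20)
}
```

In a real program the ballast would need to stay reachable for the life of the process (for example, by calling runtime.KeepAlive at the end of main) rather than for the duration of one function call.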
Perhaps an economic model would help: basically, a function that takes, among other values, the cost of compute and the cost of memory and produces the total cost of GC. The pacer would use the model to create a target that minimizes the total cost of GC. Currently Go uses GOGC and live object size to determine heap size independent of compute costs. Likewise, it uses GOMAXPROCS and mutator utilization to determine the CPU available to the GC. This leads to overall imprecision as well as fragility when there is high allocation and low survivability. At some point, many years ago, the minimum heap size was set at 4 MiB, and GC compute was allowed to grow unrestrained to maintain that minimum. Similarly, there is imprecision when there is low allocation and high survivability. Every so often Go forces a GC even when it is not needed, which periodically takes 25% of compute away from the mutator. A more holistic approach starting with an economic model could resolve these and other issues.
While runtime behaviour does not technically fall under Go’s compatibility promise, providing a way to support current behaviour would be in the spirit of the compatibility promise. I suspect this will require adding at least one additional API that can focus on resolving co-tenancy problems such as the auto-scaler problem. If Go is omnipotent, the obvious default (insert smiley), then the runtime can simply ask the OS about RAM and CPU and consume all that is available. If there is a co-tenant, then the API can focus on informing the total-cost-of-GC function of its budget relative to being omnipotent. This avoids specifying a hard max or min heap while still addressing the goal. It will also help to future-proof changes to the runtime while allowing today’s applications to scale as HW scales.