Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: GC pacer problems meta-issue #42430

mknyszek opened this issue Nov 6, 2020 · 1 comment

runtime: GC pacer problems meta-issue #42430

mknyszek opened this issue Nov 6, 2020 · 1 comment
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance umbrella


Copy link

mknyszek commented Nov 6, 2020

Problems with the GC pacer

Updated: 2020-11-06

The Go GC's pacer has had a number of problems accumulate over the years since it was originally unveiled in Go 1.5. The pacer has also changed somewhat since then. This issue lists each problem along with some details and history. They are not listed in any particular order.

Idle GC interferes with auto-scaling systems

The Go scheduler considers GOMAXPROCS level of parallelism available, and if it's not in use, fills that time with GC mark workers of a special kind. In practice, by soaking up this CPU time, it interferes with systems that scale the amount of resources available to a Go application dependent on resource usage and latency. If an application is mostly idle, then it will experience occasional spikes in CPU usage as GCs, and thus the idle GC, gets triggered. Auto-scaling systems may then notice this and give the Go application more CPUs, which the GC then soaks up in turn, leading to these systems to provide ever more CPUs. In practice we don't see this spiral too often since it can generally be easily mitigated by increasing the time window used to make auto-scaling decisions, but it has occurred before. See #39983 for more details.

A broader question is: do we need idle GC at all? If the application is mostly idle anyway, why not let the GC run slower? The idle GC conceptually appears to help mainly bursty applications, but we don't have good data to back that up. The idle GC may have also inadvertently become required for the GC to make progress. Idle GC can also have other negative effects, such as latency, since once an idle GC goroutine grabs a P it could run until a scheduler preemption (#37116).

Assist credit system issues

Hard heap goal isn't actually hard

When debugging issues with SetMaxHeap, mknyszek@ discovered that the goroutine assist credit system allows for exceeding the "hard heap goal" which is set by the pacer to be 1.1x the soft heap goal. In practice this "hard goal" is "hard enough" and exceeding it is difficult, but still possible (#40460). It suggests more general problems in the goroutine assist credit system because assists are the GC's only way of pushing back on the application, and if there's a hole there, it's not covered by any other mechanisms.

Mark assists tend to be front-loaded in a GC cycle

Mark assists by allocating goroutines tend to happen frequently early on in a GC cycle, even if assists aren't generally needed, because no assist credit is available at the beginning of a GC cycle. As a result, allocating goroutines are forced to assist until they either earn enough credit or the background mark workers generate enough credit for them. Ideally mark assist work would be spread more evenly throughout the GC cycle to promote better whole-program latencies.

Assist credit system can leave credit unaccounted for

The assist credit system is somewhat ad-hoc in terms of credit/debt ownership. For instance, if a goroutine exits while in debt, that debt simply disappears, yet something must do that work before the GC is over. Similarly, credit will just disappear, potentially making the GC work harder than it needs to.

High GOGC values have some phase change issues

Consider a very high GOGC value such as 10000. Generally speaking, if the live heap is steady, then all is well and we're getting the RAM/CPU tradeoff we expect. However, if the application's phase changes suddenly and a significant portion of the allocated heap is found to be live during a GC, the pacer will be in trouble. Namely, it will have started far too late and though it will push back on the application, it could take several GC cycles to recover.

Support for minimum and maximum heaps

Memory is relatively inflexible, yet the Go GC doesn't exactly treat it that way: it's not aware of actual memory availability. Today, we allow the programmer to make a CPU/RAM tradeoff via GOGC, but in practice when memory is limited we might want to tradeoff CPU to limit RAM to a specific limit, and when memory is abundant we may want to tradeoff RAM (up to some target) for CPU because we have that RAM available anyway. These two situations may be dealt with by having maximum and minimum heap limits respectively.

For the maximum heap limit, we've long considered such a solution (approx. 3.5 years at the time of writing) in the form of SetMaxHeap (some discussion about it at #29696). This API has existed as a patch, though uptake and feedback has been limited. It notably included a GC feedback mechanism for memory-based load shedding, but this mechanism was often misused or ignored entirely. SetMaxHeap set a soft limit, but punted on some questions like "how do we use the fact that it's a soft limit to prevent a GC death spiral?" by picking some arbitrary answers like "don't let the 'effective' GOGC fall below 10."

For the minimum heap target, we've had a long-standing proposal on GitHub (#23044) for such a feature, as an alternative to using a heap ballast. In general there are some pacing-related issues here that prevent simply setting a minimum target heap size, mainly related to generating (potentially unintentionally, which is the critical bit here) a high 'effective' GOGC value. For an example of how that can happen see this comment.

Thinking more broadly, there may be an effective way to approach both of these problems by rethinking the pacer a bit. For instance, pacing decisions today implicitly involve the GC scan/mark rate. By making this value explicit in the process, we may be able to make better decisions about how "soft" these limits should be in the general case.

Finally, there's a philosophical question here on how these tradeoffs should be made visible to the user. Theoretically, with notifications about GC timings, live heap metrics, and GOGC, both a minimum and maximum heap could be efficiently set by the application itself. The benefit of this approach (over e.g. SetMaxHeap or SetMinHeap) is that there's still only one GC "knob," which means fewer GC configurations to test and maintain, and encourages us to try to make the GC and the pacer simply more robust. The cost of this approach is that it tends to expose some details about how the pacer works, and ends up having the application rely on GC implementation details to some limited extent. On the other hand, an API lets us hide these details more effectively, even if folks could still technically fiddle with GOGC on the fly, but that also depends on us being careful about exposing behaviors like GC timing to the application (which we may want to do anyway, directly or indirectly, for other reasons in the future).

Failure to amortize GC costs for small heaps

Small Go heaps by definition have very little work for the GC to do in the heap itself, so in these cases other factors (such as globals, goroutine stacks, and the like) can dominate costs. However, the pacer doesn't consider these factors at all. As a result, it tends to make bad predictions for future GCs. As a result, this long-standing issue (#19839) to include globals in pacing remains open. Furthermore, to cut off the worst of this bad behavior, the GC has a minimum heap size of 4 MiB.

This problem has been conflated with the minimum heap problem in the past, because heap ballasts may also be used to work around this problem in more severe cases (such as an application having an unusually large amount of globals or goroutines vs. the size of the heap). The heap ballast effectively acts like a stand-in for all this GC work that is unaccounted for.

@mknyszek mknyszek added this to the Unplanned milestone Nov 6, 2020
@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 6, 2020
Copy link

RLH commented Nov 17, 2020

Perhaps an economic model, basically a function that inputs, among other values, the cost of compute and the cost of memory and produces the total cost of GC. The Pacer would use the model to create a target that minimizes the total cost of GC. Currently Go uses GOGC and live object size to determine heap size independent of compute costs. Likewise it uses GOMAXPROCS and mutator utilization to determine the CPU available to the GC. This leads to overall imprecision as well as fragility when there is high allocation and low survivability. At some point, many years ago, the minimum heap size was set at 4MBytes and GC compute was allowed to grow unrestrained to maintain that minimum. Similarly there is imprecision when there is low allocation and high survivability. Every so often Go forces a GC even when it is not needed causing 25% of compute being periodically taken away from the mutator. A more holistic approach starting with an economic model could resolve these and other issues.

While runtime behaviour does not technically fall under Go’s compatibility promise, providing a way to support current behaviour would be in the spirit of the compatibility promise. I suspect this will require adding at least one additional API that can focus on resolving the co-tenancy problems such as the auto-scaler problem. If Go is omnipotent, the obvious default (insert smilee), then the runtime can simply ask the OS about RAM and CPU and consume all that is available. If there is a co-tenant then the API can focus on informing the total cost of GC function of its budget relative to being omnipotent. This avoids the specifying a hard max or min heap while addressing the goal. This will help to future proof changes to the runtime while allowing today’s applications to scale as HW scales.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance umbrella
None yet

No branches or pull requests

5 participants