runtime: GC pacer problems meta-issue #42430

# Problems with the GC pacer

Updated: 2020-11-06

The Go GC's pacer has had a number of problems accumulate over the years since it was originally unveiled in [Go 1.5](https://docs.google.com/document/d/1wmjrocXIWTr1JxU-3EQBI6BK6KgtiFArkG47XK73xIQ/edit#). The pacer has also changed somewhat since then. This issue lists each problem along with some details and history. They are not listed in any particular order.

## Idle GC interferes with auto-scaling systems

The Go scheduler assumes that `GOMAXPROCS` worth of parallelism is available and, if some of it is not in use, fills that time with GC mark workers of a special kind. By soaking up this CPU time, the idle GC interferes with systems that scale the resources available to a Go application based on resource usage and latency. A mostly idle application will experience occasional spikes in CPU usage as GCs, and thus the idle GC, get triggered. Auto-scaling systems may notice these spikes and give the Go application more CPUs, which the GC then soaks up in turn, leading these systems to provide ever more CPUs. In practice we don't see this spiral too often, since it can generally be mitigated by increasing the time window used to make auto-scaling decisions, but it has occurred before. See #39983 for more details.

A broader question is: do we need idle GC at all? If the application is mostly idle anyway, why not let the GC run slower? The idle GC conceptually appears to help mainly bursty applications, but we don't have good data to back that up. The idle GC may have also inadvertently become required for the GC to make progress. Idle GC can also have other negative effects, such as increased latency, since once an idle GC goroutine grabs a P it could run until a scheduler preemption (#37116).

## Assist credit system issues

### Hard heap goal isn't actually hard

When debugging issues with `SetMaxHeap`, mknyszek@ discovered that the goroutine assist credit system allows for exceeding the "hard heap goal" which is set by the pacer to be 1.1x the soft heap goal. In practice this "hard goal" is "hard enough" and exceeding it is difficult, but still possible (#40460). It suggests more general problems in the goroutine assist credit system because assists are the GC's only way of pushing back on the application, and if there's a hole there, it's not covered by any other mechanisms.
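To make the two goals concrete, here is a minimal sketch of the arithmetic. It assumes the soft heap goal is the live heap scaled by `1 + GOGC/100` (the usual description of `GOGC`) and uses the 1.1x factor stated above. The function names are illustrative and are not runtime identifiers.

```go
package main

import "fmt"

// softHeapGoal returns the heap size by which the pacer aims to finish
// marking, assuming goal = live * (1 + GOGC/100).
func softHeapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

// hardHeapGoal returns 1.1x the soft goal, the factor described in the text.
func hardHeapGoal(softGoal uint64) uint64 {
	return softGoal + softGoal/10
}

func main() {
	const MiB = 1 << 20
	soft := softHeapGoal(100*MiB, 100)
	hard := hardHeapGoal(soft)
	// Prints: soft goal: 200 MiB, hard goal: 220 MiB
	fmt.Printf("soft goal: %d MiB, hard goal: %d MiB\n", soft/MiB, hard/MiB)
}
```

In this example the application has only 20 MiB of headroom between the two goals, and the issue above is that assists do not strictly prevent the heap from growing past the second one.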

### Mark assists tend to be front-loaded in a GC cycle

Mark assists by allocating goroutines tend to happen frequently early on in a GC cycle, even if assists aren't generally needed, because no assist credit is available at the beginning of a GC cycle. As a result, allocating goroutines are forced to assist until they either earn enough credit or the background mark workers generate enough credit for them. Ideally mark assist work would be spread more evenly throughout the GC cycle to promote better whole-program latencies.
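The front-loading effect can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the runtime's actual accounting: time is split into ticks, the application needs a fixed amount of credit per tick to pay for its allocations, background workers deposit a fixed amount of credit at the end of each tick, and the shared pool starts empty. All names and numbers are hypothetical.

```go
package main

import "fmt"

// simulateAssists returns the amount of assist work the application performs
// in each tick. allocWork is the credit the application needs per tick and
// bgWork is the credit background workers add to the shared pool per tick.
func simulateAssists(ticks int, allocWork, bgWork int64) []int64 {
	assists := make([]int64, ticks)
	var pool int64 // shared background credit; empty when the cycle starts
	for t := 0; t < ticks; t++ {
		need := allocWork
		// Take as much credit from the pool as is available.
		if pool >= need {
			pool -= need
			need = 0
		} else {
			need -= pool
			pool = 0
		}
		// Whatever is left must be paid for by assisting.
		assists[t] = need
		// Background workers finish their work for this tick.
		pool += bgWork
	}
	return assists
}

func main() {
	// Background workers outpace allocation: only the first tick assists.
	fmt.Println(simulateAssists(5, 100, 150)) // [100 0 0 0 0]
	// Background workers fall short: assists continue, but the first tick
	// still pays the most.
	fmt.Println(simulateAssists(5, 100, 60)) // [100 40 40 40 40]
}
```

Even when background workers could cover all of the work, the first tick pays in full because no credit exists yet, which is the pattern described above.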

### Assist credit system can leave credit unaccounted for

The assist credit system is somewhat ad-hoc in terms of credit/debt ownership. For instance, if a goroutine exits while in debt, that debt simply disappears, yet something must do that work before the GC is over. Similarly, credit will just disappear, potentially making the GC work harder than it needs to.

## High GOGC values have some phase change issues

Consider a very high GOGC value such as 10000. Generally speaking, if the live heap is steady, then all is well and we're getting the RAM/CPU tradeoff we expect. However, if the application's phase changes suddenly and a significant portion of the allocated heap is found to be live during a GC, the pacer will be in trouble. Namely, it will have started far too late and though it will push back on the application, it could take several GC cycles to recover.
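A rough worked example shows how late the start can be. The sketch below assumes the heap goal is `live * (1 + GOGC/100)`, that the cycle starts a fixed distance before the goal, and that mark work is proportional to the live heap. The 50 MiB runway and the 800 MiB post-phase-change live heap are made-up figures for illustration only.

```go
package main

import "fmt"

// markPerAllocByte returns how many bytes must be marked per byte allocated
// for marking to finish exactly when the heap reaches goal, given that the
// cycle started when the heap was at trigger. All sizes are in MiB.
func markPerAllocByte(markWork, trigger, goal float64) float64 {
	return markWork / (goal - trigger)
}

func main() {
	const (
		gogc    = 10000.0
		oldLive = 10.0  // MiB live after the previous cycle
		newLive = 800.0 // MiB found live after the phase change (hypothetical)
	)
	goal := oldLive * (1 + gogc/100) // 1010 MiB
	trigger := goal - 50             // hypothetical: cycle starts 50 MiB before the goal

	// Prints: planned: 0.2, needed: 16.0
	fmt.Printf("planned: %.1f, needed: %.1f\n",
		markPerAllocByte(oldLive, trigger, goal),
		markPerAllocByte(newLive, trigger, goal))
}
```

Under these assumptions the GC must do 80 times more marking per allocated byte than it planned for, so either the application stalls in assists or the heap overshoots its goal.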

## Support for minimum and maximum heaps

Memory is relatively inflexible, yet the Go GC doesn't exactly treat it that way: it's not aware of actual memory availability. Today, we allow the programmer to make a CPU/RAM tradeoff via `GOGC`, but in practice, when memory is limited we might want to trade off CPU to hold RAM to a specific limit, and when memory is abundant we may want to trade off RAM (up to some target) for CPU because we have that RAM available anyway. These two situations may be dealt with by having maximum and minimum heap limits, respectively.

For the maximum heap limit, we've long considered such a solution (approx. 3.5 years at the time of writing) in the form of `SetMaxHeap` (some discussion about it at #29696). This API has existed as a patch, though uptake and feedback have been limited. It notably included a GC feedback mechanism for memory-based load shedding, but this mechanism was often misused or ignored entirely. `SetMaxHeap` set a soft limit, but punted on some questions like "how do we use the fact that it's a soft limit to prevent a GC death spiral?" by picking some arbitrary answers like "don't let the 'effective' GOGC fall below 10."
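The floor on the 'effective' GOGC can be sketched as follows. This is not the `SetMaxHeap` patch itself. It is a minimal illustration that assumes the effective GOGC is whatever value would place the heap goal at the limit, capped at the user's `GOGC` and floored at 10. The function name and parameters are hypothetical.

```go
package main

import "fmt"

// effectiveGOGC returns the GOGC value that would place the heap goal at
// maxHeapBytes for the given live heap. The result never exceeds userGOGC
// and never drops below floor.
func effectiveGOGC(liveBytes, maxHeapBytes uint64, userGOGC, floor int) int {
	if liveBytes == 0 {
		return userGOGC
	}
	if maxHeapBytes <= liveBytes {
		return floor
	}
	gogc := int((maxHeapBytes - liveBytes) * 100 / liveBytes)
	if gogc > userGOGC {
		gogc = userGOGC
	}
	if gogc < floor {
		gogc = floor
	}
	return gogc
}

func main() {
	const MiB = 1 << 20
	// With a 1024 MiB limit and GOGC=100 this prints 100, 28, and 10.
	for _, live := range []uint64{100 * MiB, 800 * MiB, 1000 * MiB} {
		fmt.Printf("live=%4d MiB -> effective GOGC %d\n",
			live/MiB, effectiveGOGC(live, 1024*MiB, 100, 10))
	}
}
```

In the last case the floor wins: a GOGC of 10 on a 1000 MiB live heap gives a goal of 1100 MiB, which exceeds the 1024 MiB limit. That is what makes the limit soft, and it keeps the GC from running continuously as the live heap approaches the limit.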

For the minimum heap target, we've had a long-standing proposal on GitHub (#23044) for such a feature, as an alternative to using a [heap ballast](https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/). In general there are some pacing-related issues here that prevent simply setting a minimum target heap size, mainly related to generating (potentially unintentionally, which is the critical bit here) a high 'effective' GOGC value. For an example of how that can happen see [this comment](https://github.com/golang/go/issues/23044#issuecomment-723262216).
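For readers unfamiliar with the workaround, the sketch below shows a heap ballast and the 'effective' GOGC it produces relative to the application's real live heap. It assumes the heap goal is `(live + ballast) * (1 + GOGC/100)`. The helper name is hypothetical, and this is not necessarily the scenario the linked comment describes.

```go
package main

import (
	"fmt"
	"runtime"
)

// effectiveGOGC returns the GOGC the application effectively runs at when a
// ballast inflates the live heap seen by the pacer: the allowed growth,
// (live+ballast)*GOGC/100, expressed as a percentage of the real live heap.
// liveBytes must be non-zero.
func effectiveGOGC(liveBytes, ballastBytes uint64, gogc int) uint64 {
	return (liveBytes + ballastBytes) * uint64(gogc) / liveBytes
}

func main() {
	const MiB = 1 << 20
	// A ballast is a large allocation that is kept reachable but never used.
	// The GC counts it as live heap, which raises the heap goal.
	ballast := make([]byte, 64*MiB)

	// With 10 MiB of real live heap and GOGC=100, prints 740.
	fmt.Println(effectiveGOGC(10*MiB, uint64(len(ballast)), 100))

	// Keep the ballast reachable until the program is done.
	runtime.KeepAlive(ballast)
}
```

A small live heap combined with a large ballast (or a large minimum heap target) therefore behaves like a very high `GOGC`, which is where the pacing problems described for high `GOGC` values come in.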

Thinking more broadly, there may be an effective way to approach both of these problems by rethinking the pacer a bit. For instance, pacing decisions today implicitly involve the GC scan/mark rate. By making this value explicit in the process, we may be able to make better decisions about how "soft" these limits should be in the general case.

Finally, there's a philosophical question here on how these tradeoffs should be made visible to the user. Theoretically, with notifications about GC timings, live heap metrics, and `GOGC`, both a minimum and maximum heap could be _efficiently_ set by the application itself. The benefit of this approach (over e.g. `SetMaxHeap` or `SetMinHeap`) is that there's still only one GC "knob," which means fewer GC configurations to test and maintain, and encourages us to try to make the GC and the pacer simply more robust. The cost of this approach is that it tends to expose some details about how the pacer works, and ends up having the application rely on GC implementation details to some limited extent. On the other hand, an API lets us hide these details more effectively, even if folks could still technically fiddle with `GOGC` on the fly, but that also depends on us being careful about exposing behaviors like GC timing to the application (which we may want to do anyway, directly or indirectly, for other reasons in the future).
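As a rough illustration of the single-knob approach, an application could recompute `GOGC` from its own heap measurements and apply it with `debug.SetGCPercent`. The sketch below uses only existing APIs (`runtime.ReadMemStats` and `debug.SetGCPercent`), but the policy function, its name, and the bounds are assumptions. It also uses `HeapAlloc` as a stand-in for the live heap, which overestimates it because `HeapAlloc` includes garbage that has not been swept yet.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// gogcForBounds returns a GOGC value that keeps the heap goal for liveBytes
// within [minHeap, maxHeap], starting from baseGOGC. If the bounds conflict,
// maxHeap wins.
func gogcForBounds(liveBytes, minHeap, maxHeap uint64, baseGOGC int) int {
	if liveBytes == 0 {
		return baseGOGC
	}
	goal := liveBytes + liveBytes*uint64(baseGOGC)/100
	if goal < minHeap {
		goal = minHeap
	}
	if goal > maxHeap {
		goal = maxHeap
	}
	if goal <= liveBytes {
		return 1 // already at or above the maximum: collect as eagerly as possible
	}
	gogc := int((goal - liveBytes) * 100 / liveBytes)
	if gogc < 1 {
		gogc = 1
	}
	return gogc
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// Hypothetical bounds: keep the heap goal between 64 MiB and 1 GiB.
	gogc := gogcForBounds(ms.HeapAlloc, 64<<20, 1<<30, 100)
	old := debug.SetGCPercent(gogc)
	fmt.Printf("GOGC changed from %d to %d\n", old, gogc)
}
```

An application would have to re-run this adjustment as its live heap changes, ideally once per GC cycle. That is where notifications about GC timings would come in, and it is also how the application ends up depending on pacer details.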

## Failure to amortize GC costs for small heaps

Small Go heaps by definition have very little work for the GC to do in the heap itself, so in these cases other factors (such as globals, goroutine stacks, and the like) can dominate costs. However, the pacer doesn't consider these factors at all, so it tends to make bad predictions for future GCs. A long-standing issue (#19839) to include globals in pacing remains open. Furthermore, to cut off the worst of this bad behavior, the GC has a minimum heap size of 4 MiB.
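A quick calculation shows how lopsided this can get. The sketch below approximates GC work by the number of bytes to scan, which is a simplification, and the sizes are made-up figures for a small-heap program with many globals and goroutines.

```go
package main

import "fmt"

// unaccountedShare returns the fraction of scan work that comes from sources
// the pacer does not consider (globals and goroutine stacks), approximating
// work by size in bytes.
func unaccountedShare(heapBytes, globalsBytes, stackBytes uint64) float64 {
	total := heapBytes + globalsBytes + stackBytes
	if total == 0 {
		return 0
	}
	return float64(globalsBytes+stackBytes) / float64(total)
}

func main() {
	const MiB = 1 << 20
	// 1 MiB of live heap, 20 MiB of globals, 5 MiB of stacks (hypothetical).
	// Prints: 96%
	fmt.Printf("%.0f%%\n", 100*unaccountedShare(1*MiB, 20*MiB, 5*MiB))
}
```

In this example the pacer plans around roughly 4% of the work that each cycle actually involves.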

This problem has been conflated with the minimum heap problem in the past, because heap ballasts may also be used to work around this problem in more severe cases (such as an application having an unusually large number of globals or goroutines relative to the size of the heap). The heap ballast effectively acts like a stand-in for all this GC work that is unaccounted for.
