Today the Go GC will spin up dedicated GC workers which use 25% of CPU resources during a GC cycle, and on top of that it will schedule "idle" GC workers on any Ps the scheduler has nothing else to run on, which can push the GC's CPU usage all the way to 100%.
The idea here is that those cores are idle anyway, so why not use them to speed up the GC cycle? There are negative performance implications to keeping the GC on longer than necessary: floating garbage accumulates in memory, and the enabled write barrier costs CPU time.
EDIT: To be clear, I'm not proposing removing dedicated GC workers, which use 25% of CPU resources during a GC cycle. Those are very, very important. Just the idle workers that can increase that number to 100% if the application is idle.
I believe today these idle GC workers are doing more harm than good.
Firstly, when a Go process is mostly idle, idle enough that the "every-2-minutes" GC kicks in, it's going to go full blast on all CPU cores, resulting in a CPU utilization spike. While this on its own isn't a big deal, it makes a difference in the face of monitoring systems and auto-scaling systems. The former is surprising: one might see these spikes in their monitoring and spend time investigating them, only to find there's nothing to be done about it (today). The latter is dangerous: if an auto-scaling system gives an application more resources because it notices this spike, the Go GC will soak up those resources too, resulting in a significant (if temporary, and generally bounded) increase in resource usage.
Secondly, the Go pacer fails to account for this extra CPU utilization, which significantly skews its function. The goal of the pacer is to reach some GC CPU utilization target, but that target is somewhat meaningless if the idle GC workers can kick in and blow past it in any application that has idle time. The result is that when idle GC workers run, the GC consistently undershoots its goal in the steady-state.
Worse still, this extra CPU utilization isn't easy to account for in the pacer. If we pace assuming the idle GC workers are running, it's possible for the GC to really panic and spike utilization just when you really need it. For instance, take an application with many goroutines blocked on syscalls. Let's say most of the time they're not doing a lot, but suddenly (by chance) many wake up at once during a GC cycle. Suddenly the GC CPU utilization will drop (no space for idle workers) and the GC will think it's extremely behind, so at that moment it'll start forcing all these goroutines to assist, basically up to the utilization that the idle workers were doing. This makes our notion of a steady-state fragile, since it now depends on the CPU resources available. (I have a new pacer design in the works and while it can deal with this, it ends up with consistent undershoot when the idle workers are running; I don't think you can win here.)
Thirdly, there's a latency cost. Consider again the case of goroutines waiting on, say, network requests. The Go scheduler is going to schedule idle GC workers, and while they yield readily (they're "low priority" in a sense), the work they do is primarily non-preemptible and likely will be for the foreseeable future. The cost of scheduling them away is small, but not free.
I'm honestly not sure. The only scenario I could construct where the idle GC workers come in handy is a bursty application, since you increase your chance that the GC fires outside of a burst, and you get a nice boost in throughput and latency since the GC conveniently tries to finish as fast as possible outside the burst.
But this argument is still somewhat dubious because there's no guarantee the GC won't just fire during the burst, so I'm not sure how much this actually helps. However, one reason I'm creating this issue is that maybe someone out there can think of another scenario, or has another use-case.
Why not now?
Removing the idle GC workers is unfortunately going to be risky from an implementation standpoint. @prattmic tried this out and it already uncovered at least one bug. It's possible that over time the idle GC workers have become critical to GC progress in some scenarios. Oops.
@prattmic concluded from his investigation that this is going to uncover a non-trivial amount of latent, subtle bugs (and I say bugs because, in theory, the idle GC workers should otherwise have no effect on the GC's function), and that it's going to take a while to squash all of them.
As a result, it's going to be critical to land this early in the development cycle and give it plenty of time to soak, with an easy roll-back option (perhaps a runtime-internal constant).
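For concreteness, the kind of roll-back switch meant here is a compile-time constant gating the scheduler's idle-worker path, so a revert is a one-line change and the compiler can delete the dead branch entirely. The following is a hypothetical, self-contained sketch, not the actual runtime code; the names are made up:

```go
package main

import "fmt"

// useIdleGCWorkers stands in for the runtime-internal constant suggested
// above (name hypothetical). Flipping it back to true is a one-line revert.
const useIdleGCWorkers = false

// findIdleGCWork stands in for the scheduler hook that would hand an
// idle P to the GC; here it just reports whether it would do so.
func findIdleGCWork() bool {
	if !useIdleGCWorkers {
		// Idle workers disabled: only dedicated workers (and assists) run.
		return false
	}
	return true
}

func main() {
	fmt.Println("would schedule idle GC worker:", findIdleGCWork())
}
```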
However, despite all this, I think this is worth doing sooner rather than later. The longer we wait, the greater chance we have that seemingly orthogonal changes to the GC tangle this more.
So who's doing the GC work then? Is it all handled during assists?
If so, then I worry about an application that stops allocating mid-cycle while the write barrier is active. Then is the write barrier active forever? Is all the floating garbage never collected (and resulting free pages returned to the OS)?
Or is some of the GC work done during write barriers also? That would eliminate the write-barrier-on-forever problem, and lessen, but maybe not eliminate, the floating garbage one (to still have a problem you'd need an application that runs but never writes a pointer).
@randall77 I only mean the idle GC workers (low-priority workers that soak up additional idle Ps, on top of the 25% dedicated workers), not the dedicated workers themselves. I think it's very important to keep the dedicated workers, for the reasons you mention (and others).