
runtime: remove idle GC workers #44163

Open
mknyszek opened this issue Feb 8, 2021 · 7 comments

@mknyszek (Contributor) commented Feb 8, 2021

Background

Today the Go GC spins up dedicated GC workers that use 25% of GOMAXPROCS worth of CPU resources. But it will also soak up any additional Ps that are idle by running "idle GC workers" on them.

The idea here is that those cores are idle anyway, so why not use them to finish the GC cycle sooner? After all, there are negative performance implications to having the GC on longer than necessary (floating garbage accumulates in memory, and the enabled write barrier costs CPU time).

EDIT: To be clear, I'm not proposing removing dedicated GC workers, which use 25% of CPU resources during a GC cycle. Those are very, very important. Just the idle workers that can increase that number to 100% if the application is idle.
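(For illustration only, here is a minimal sketch of the two worker modes. None of these identifiers are real runtime names, and the real scheduler logic is considerably more involved.)

```go
// Conceptual sketch of the priority order a P follows during a GC mark phase,
// contrasting dedicated mark workers (~25% of GOMAXPROCS for the whole cycle)
// with idle mark workers (any P that would otherwise have nothing to run).
// All identifiers here are invented for illustration; this is not runtime code.
package gcsketch

import "runtime"

// dedicatedMarkWorkers approximates the 25%-of-GOMAXPROCS dedicated worker
// count (the real runtime's rounding and fractional workers are more subtle).
func dedicatedMarkWorkers() int {
	n := runtime.GOMAXPROCS(0) / 4
	if n < 1 {
		n = 1
	}
	return n
}

// pickWork sketches what a P chooses to run while a mark phase is active.
func pickWork(markActive bool, dedicatedRunning int, hasRunnableG bool) string {
	switch {
	case markActive && dedicatedRunning < dedicatedMarkWorkers():
		return "dedicated mark worker" // always claimed first
	case hasRunnableG:
		return "user goroutine" // normal work beats idle marking
	case markActive:
		return "idle mark worker" // only when the P has nothing else to do
	default:
		return "idle"
	}
}
```

The only case this issue proposes removing is the third one: an otherwise-idle P taking on mark work, which is how a cycle paced at 25% utilization can end up using every P in the machine.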

Why?

I believe today these idle GC workers are doing more harm than good.

Firstly, when a Go process is mostly idle, idle enough that the "every-2-minutes" GC kicks in, it's going to go full blast on all CPU cores, resulting in a CPU utilization spike. While this on its own isn't a big deal, it makes a difference in the face of monitoring systems and auto-scaling systems. The former is surprising: one might see these spikes in their monitoring and spend time investigating them, only to find there's nothing to be done about them (today). The latter is dangerous: if an auto-scaling system decides to give an application more resources because it notices this spike, the Go GC will then soak up those resources too, resulting in a significant (if temporary, and generally bounded) increase in resource usage.
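(As a rough way to see this first point for yourself, here is a sketch of a mostly-idle program; the ~2-minute periodic GC and the exact shape of the spike are runtime details that may change.)

```go
// Sketch of the "mostly idle" scenario: the program does essentially nothing,
// so almost every P is idle. When the periodic (~2-minute) GC eventually runs,
// idle mark workers can briefly push CPU usage toward 100% of GOMAXPROCS,
// which shows up as a spike in external monitoring. Run it and watch CPU usage
// with top or your metrics system of choice.
package main

import (
	"runtime"
	"time"
)

func main() {
	// Build up a modest heap so the mark phase has something to scan.
	retained := make([][]byte, 0, 4096)
	for i := 0; i < 4096; i++ {
		retained = append(retained, make([]byte, 32<<10))
	}

	// Sit idle long enough for the forced periodic GC to fire at least once.
	time.Sleep(5 * time.Minute)

	// Keep the heap reachable for the whole sleep.
	runtime.KeepAlive(retained)
}
```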

Secondly, the Go pacer fails to account for this extra CPU utilization, which significantly skews its behavior. The goal of the pacer is to reach some GC CPU utilization target, but that target is somewhat meaningless if the idle GC workers can kick in and blow past it in any application that has idle time. The result is that whenever idle GC workers are running, the GC consistently undershoots its goal in the steady state.

Worse still, this extra CPU utilization isn't easy to account for in the pacer. If we pace assuming the idle GC workers are running, it's possible for the GC to overreact and spike utilization just when the application needs that CPU most. For instance, take an application with many goroutines blocked on syscalls. Let's say most of the time they're not doing a lot, but suddenly (by chance) many wake up at once during a GC cycle. The GC's CPU utilization will drop (there's no longer room for idle workers) and the GC will think it's far behind, so at that moment it will start forcing all these goroutines to assist, roughly up to the utilization the idle workers were providing. This makes our notion of a steady state fragile, since it now depends on the CPU resources available. (I have a new pacer design in the works, and while it can deal with this, it ends up with consistent undershoot when the idle workers are running; I don't think you can win here.)
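(To make the scale of the gap concrete, here is a back-of-the-envelope sketch. The 25% background target is the one described above; the real pacer's rounding and fractional workers are more involved.)

```go
// Back-of-the-envelope arithmetic, not the runtime's pacer: it only shows how
// far actual mark utilization can drift from the paced target when every
// otherwise-idle P picks up idle mark work.
package main

import "fmt"

func main() {
	const gomaxprocs = 8
	const backgroundUtilization = 0.25 // dedicated-worker target from the issue

	dedicated := int(float64(gomaxprocs)*backgroundUtilization + 0.5)
	fmt.Printf("paced: %d of %d Ps marking (%.0f%% utilization)\n",
		dedicated, gomaxprocs, 100*backgroundUtilization)

	// In a mostly-idle application, the remaining Ps can all run idle mark
	// workers, so the cycle is paced for 25% utilization but actually runs at
	// up to 100%, finishing far earlier than planned -- hence the undershoot.
	fmt.Printf("with idle workers: up to %d of %d Ps marking (100%% utilization)\n",
		gomaxprocs, gomaxprocs)
}
```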

Thirdly, there's a latency cost. Consider again the case of goroutines waiting on, say, network requests. The Go scheduler is going to schedule idle GC workers, and while they yield readily (they're "low priority" in a sense), their work is largely non-preemptible and likely will be for the foreseeable future. The cost of scheduling them away is small, but not free.

Why not?

I'm honestly not sure. The only scenario I could construct where the idle GC workers come in handy is a bursty application: you increase your chance that the GC runs outside of a burst, and you get a nice boost in throughput and latency because the GC conveniently tries to finish as fast as possible outside the burst.

But this argument is still somewhat dubious because there's no guarantee the GC won't just fire during the burst, so I'm not sure how much this actually helps. However, one reason I'm creating this issue is that maybe someone out there can think of another scenario, or has another use-case.

Why not now?

Removing the idle GC workers is unfortunately going to be risky from an implementation standpoint. @prattmic tried this out and it uncovered at least one bug. It's possible that over time the idle GC workers have become critical to GC progress in some scenarios. Oops.

@prattmic concluded from his investigation that this is going to uncover a non-trivial number of latent, subtle bugs (and I say bugs because, in theory, the idle GC workers should otherwise have no effect on the function of the GC), and that it's going to take a while to squash all of them.

As a result, it's going to be critical to land this early in the development cycle and give it plenty of time to soak, with an easy roll-back option (perhaps a runtime-internal constant).
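(Purely as a sketch of what such a roll-back switch could look like; the constant name and the hook below are invented, and the eventual change may well use a different mechanism.)

```go
// Hypothetical illustration of a runtime-internal kill switch for idle GC
// workers. Nothing here is real runtime code; the idea is only that a single
// constant could gate whether the scheduler ever hands an idle P to the GC,
// so the old behavior can be restored by flipping one line.
package gcsketch

// idleGCWorkersEnabled would be flipped back to true to roll back the change.
const idleGCWorkersEnabled = false

// canRunIdleMarkWorker is the single place an idle P would check before
// taking on idle mark work.
func canRunIdleMarkWorker(markPhaseActive bool) bool {
	return idleGCWorkersEnabled && markPhaseActive
}
```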

However, despite all this, I think this is worth doing sooner rather than later. The longer we wait, the greater chance we have that seemingly orthogonal changes to the GC tangle this more.

CC @aclements @prattmic

@randall77 (Contributor) commented Feb 8, 2021

So who's doing the GC work then? Is it all handled during assists?

If so, then I worry about an application that stops allocating mid-cycle while the write barrier is active. Is the write barrier then active forever? Is all the floating garbage never collected (and the resulting free pages never returned to the OS)?

Or is some of the GC work done during write barriers also? That would eliminate the write-barrier-on-forever problem, and lessen, but maybe not eliminate, the floating garbage one (to still have a problem you'd need an application that runs but never writes a pointer).

@mknyszek (Contributor, Author) commented Feb 8, 2021

@randall77 I only mean the idle GC workers (the low-priority workers that soak up additional idle Ps on top of the 25% dedicated), not the 25% dedicated workers. I think it's very important to keep the dedicated workers, for the reasons you mention (and others).

@mknyszek (Contributor, Author) commented Apr 15, 2021

Definitely not in this release, but I've found some other issues with dedicated worker scheduling while working on #44167. I would like to address all of these issues in the final implementation for #44167, which is currently in progress.

@cagedmantis (Contributor) commented Aug 12, 2021

This issue is currently labeled as early-in-cycle for Go 1.18.
That time is now, so this is a friendly ping to make sure the issue gets looked at again.

@mknyszek (Contributor, Author) commented Aug 12, 2021

thanks @cagedmantis, I'm preparing a change for this, it's on my radar.

@mknyszek (Contributor, Author) commented Aug 12, 2021

ah, though I will note I probably won't land it before the tree opens. I don't see a need.

@gopherbot commented Sep 20, 2021

Change https://golang.org/cl/351131 mentions this issue: runtime: add an experiment to disable idle GC mark workers
