
runtime: make the proportion of CPU the GC uses based on actual available CPU time and not GOMAXPROCS #59715

Open
mknyszek opened this issue Apr 19, 2023 · 2 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mknyszek
Contributor

mknyszek commented Apr 19, 2023

Currently the Go GC sets its parallelism to 25% of GOMAXPROCS. In doing so, it effectively assumes the Go runtime has access to GOMAXPROCS*N seconds of CPU time in N seconds of wall time, which is true in many circumstances, but not all.

For example, if a container is configured to have 100 ms of CPU time available for each 1 second window of wall time and GOMAXPROCS=4 (because that's how much parallelism is available on the machine), then the Go runtime is effectively assuming it can use 1 second of CPU time every 1 second of wall time on GC alone.
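The mismatch in that example can be made concrete with a little arithmetic. The sketch below just restates the numbers above (100 ms quota per 1 s period, GOMAXPROCS=4, 25% GC parallelism); it is an illustration, not runtime code.

```go
package main

import "fmt"

func main() {
	// Hypothetical numbers from the example above.
	const (
		gomaxprocs = 4.0  // parallelism available on the machine
		quotaSec   = 0.1  // 100 ms of CPU time allowed...
		periodSec  = 1.0  // ...per 1 s window of wall time
		gcFraction = 0.25 // GC parallelism as a fraction of GOMAXPROCS
	)

	// CPU-seconds per second of wall time the GC assumes it may use.
	assumed := gcFraction * gomaxprocs
	// CPU-seconds per second of wall time the container actually allows,
	// for the whole process.
	allowed := quotaSec / periodSec

	fmt.Printf("GC assumes %.2f CPU-s/s; container allows %.2f CPU-s/s total\n",
		assumed, allowed)
}
```

The GC's assumed budget (1 CPU-second per second) is ten times the container's entire allowance, so the quota is exhausted by GC work alone and the kernel throttles the process.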

In practice, this is not true, resulting in significant CPU throttling that can hurt latency-sensitive applications. To be clear, throttling in general is not a Go runtime issue; the issue is that the GC on its own uses more CPU over a given window of time than the container actually allows.

Another example where this can go awry is with GOMEMLIMIT. If a container with a small CPU reservation runs up against the memory limit and the GC starts to execute frequently enough to then subsequently hit the 50% GC CPU limit, that 50% GC CPU limit is going to be based on GOMAXPROCS. Once again, this results in significant throttling that can make an already bad latency situation worse.

One possible fix for this is to make the Go GC container-aware. Rather than deriving the GC CPU limit and the 25% mark parallelism from GOMAXPROCS, the Go GC could derive these numbers from the container's configuration.

On Linux, that means using the CPU cgroups parameters cpu.cfs_period_us and cpu.cfs_quota_us. More specifically, the GC CPU limit is set to 50% of cpu.cfs_quota_us in any given window of cpu.cfs_period_us, while the GC's default mark-phase parallelism is 25% of cpu.cfs_quota_us / cpu.cfs_period_us. (Disclaimer: this second part I'm less sure about.)
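As a rough sketch of that derivation, the snippet below reads the two cgroup v1 CFS files and computes the hypothetical limits described above. The file paths, the `gcBudget` helper, and the fallback behavior are all assumptions for illustration; this is not how the runtime would actually be structured, and a real implementation would also need to discover the cgroup mount point and handle cgroup v2's unified `cpu.max` file.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readCFSParam reads a single integer from a cgroup v1 CPU file.
func readCFSParam(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

// gcBudget derives the hypothetical GC settings sketched above from the
// CFS quota and period (both in microseconds). A quota of -1 means the
// container is unlimited, in which case GOMAXPROCS-based defaults apply.
func gcBudget(quotaUS, periodUS int64) (cpuLimit, markParallelism float64) {
	if quotaUS <= 0 || periodUS <= 0 {
		return 0, 0 // no quota: fall back to GOMAXPROCS-based defaults
	}
	effectiveCPUs := float64(quotaUS) / float64(periodUS)
	// 50% of the quota for the GC CPU limit, 25% for mark parallelism.
	return 0.5 * effectiveCPUs, 0.25 * effectiveCPUs
}

func main() {
	quota, err1 := readCFSParam("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	period, err2 := readCFSParam("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if err1 != nil || err2 != nil {
		fmt.Println("no cgroup v1 CPU controller found; using defaults")
		return
	}
	limit, par := gcBudget(quota, period)
	fmt.Printf("GC CPU limit: %.3f CPUs; mark parallelism: %.3f CPUs\n", limit, par)
}
```

With the earlier example's values (quota 100000 µs, period 1000000 µs), this yields a GC CPU limit of 0.05 CPUs and mark parallelism of 0.025 CPUs, i.e. fractions of the container's 0.1 effective CPUs rather than of GOMAXPROCS.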

The Go runtime would re-read these parameters regularly, for example every few seconds, to stay up-to-date with its current environment.

@mknyszek mknyszek added this to the Backlog milestone Apr 19, 2023
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 19, 2023
@mknyszek mknyszek self-assigned this Apr 19, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 19, 2023
@mknyszek
Contributor Author

I should note that on Linux we'd ignore cpu.shares because that seems to be the norm these days. The main issue is that the runtime can't really do anything useful with cpu.shares without knowing the total number of shares distributed across the same cgroup.

@mknyszek
Contributor Author

On the other hand, it might be somewhat uncharacteristic, and backwards-incompatible, for the runtime to adjust itself based on container configuration alone. That might suggest adding another knob, but we don't like knobs. There might be a better way to identify how much CPU time the GC should be using.

At the very least, this issue should serve to track any improvements we make in this area.
