
runtime: make `GOMAXPROCS` cfs-aware on `GOOS=linux` #33803

Open
jcorbin opened this issue Aug 23, 2019 · 8 comments

@jcorbin

commented Aug 23, 2019

Problem

The default setting of runtime.GOMAXPROCS() (the number of OS-apparent processors) can be greatly misaligned with a container's cpu quota (e.g. as implemented through cfs bandwidth control by docker).

This can lead to large latency artifacts in programs, especially under peak load, or when saturating all processors during background GC phases.

The smaller the container and the larger the machine, the worse this effect becomes: let's say you deploy a fleet of micro service workers, each container having a cpu quota of 4, on a fleet of 32-processor[1] machines.

To understand why, you really have to understand the CFS quota mechanism; this blog post explains it well (with pictures); this kubernetes issue further explores the topic (especially as it relates to a recently resolved kernel cpu accounting bug). But to summarize it briefly for this issue:

  • there is a quota period, say 100ms
  • and there is then a quota, say 400ms, to effect a 4-processor quota
  • within any period, once the process group exceeds its quota it is throttled
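
The mechanism above can be sketched in Go. This is a minimal illustration, not runtime code: the helper name `effectiveCPUs` and the treatment of -1 as "no limit" mirror the cgroup v1 `cpu.cfs_quota_us` / `cpu.cfs_period_us` conventions.

```go
package main

import "fmt"

// effectiveCPUs converts a CFS quota/period pair (the values found in
// cgroup v1's cpu.cfs_quota_us and cpu.cfs_period_us) into a fractional
// processor count. A quota of -1 means "no limit".
func effectiveCPUs(quotaUs, periodUs int64) float64 {
	if quotaUs < 0 || periodUs <= 0 {
		return -1 // unlimited
	}
	return float64(quotaUs) / float64(periodUs)
}

func main() {
	// e.g. docker run --cpus=4 sets quota=400000us against period=100000us
	fmt.Println(effectiveCPUs(400000, 100000)) // 4
	fmt.Println(effectiveCPUs(-1, 100000))     // -1 (no limit)
}
```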

Running an application workload at a reasonable level of cpu efficiency makes it quite likely that you'll spike up to your full quota and get throttled.

Background workloads, such as concurrent GC[2], are especially likely to cause quota exhaustion.

I hesitate to even call this a "tail latency" problem; the artifacts are visible in the main body of the latency distribution, and can shift the entire distribution.

Solution

If you care about latency, reliability, predictability (... insert more *ilities to taste), then the correct thing to do is to never exceed your cpu quota, by setting GOMAXPROCS=max(1, floor(cpu_quota)).
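
The rule can be sketched as follows; `quotaMaxProcs` is a hypothetical helper name, and `cpuQuota` is assumed to have been read from the container's CFS settings.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// quotaMaxProcs applies the rule above: GOMAXPROCS = max(1, floor(cpu_quota)).
func quotaMaxProcs(cpuQuota float64) int {
	n := int(math.Floor(cpuQuota))
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	cpuQuota := 2.5 // e.g. docker run --cpus=2.5
	prev := runtime.GOMAXPROCS(quotaMaxProcs(cpuQuota))
	fmt.Printf("GOMAXPROCS: %d -> %d\n", prev, quotaMaxProcs(cpuQuota))
}
```

Flooring (rather than rounding up) is what guarantees the process group never exceeds its quota within a period.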

Using this as a default for GOMAXPROCS makes the world safe again, which is why we use uber-go/automaxprocs in all of our microservices.

NOTEs

  1. intentionally avoiding use of the word "core"; the matter of hyper-threading and virtual-vs-physical cores is another topic
  2. /digression: can't not mention userspace scheduler pressure induced by background GC; where are we at with goroutine preemption again?

@gopherbot gopherbot added this to the Proposal milestone Aug 23, 2019

@gopherbot gopherbot added the Proposal label Aug 23, 2019

@jcorbin (Author)

commented Aug 23, 2019

I really have to disagree with some of the later suggestions in kubernetes/kubernetes#67577, like kubernetes/kubernetes#67577 (comment): using ceil(quota)+2, or in any way over-provisioning GOMAXPROCS vs cpu quota, is at best a statistical gamble to ameliorate current shortcomings in Go's userspace scheduler.

Some background on uber-go/automaxprocs#13 (changing from ceil(quota) to floor(quota)):

  • over-provisioning seemed reasonable at first when the guess was "if we provision fractional cores, use them"
  • but later on we ended up needing the fractional part as margin for other supporting processes injected into the container (e.g. by Mesos Aurora's Thermos executor)

I'll reprise (copied with some edits) my description from that issue here for easy reading:

  • as far as the Go scheduler is concerned, there's no such thing as a fractional CPU
  • so let's say you have your quota set to N + p for some integer N and some 0.0 < p < 1.0
  • the only safe assumption then is that you're using that p value as a hedge for something like "systems effects" or "c libraries"
  • in that latter case, what you really might want is to be able to give maxprocs some value K of CPUs reserved for some parasite like a C library or sidecar; but this will always need to be application-config specific
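
The last bullet could be sketched like this; `reservedMaxProcs` and its `k` parameter are illustrative names, and as noted above, the right k is application-specific.

```go
package main

import (
	"fmt"
	"math"
)

// reservedMaxProcs leaves headroom for k whole CPUs consumed by non-Go
// tenants of the container (a C library's threads, a sidecar process),
// while still flooring the quota and never going below 1.
func reservedMaxProcs(cpuQuota float64, k int) int {
	n := int(math.Floor(cpuQuota)) - k
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	// quota of 4.5 CPUs, reserving 1 for a sidecar
	fmt.Println(reservedMaxProcs(4.5, 1)) // 3
}
```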

@jcorbin (Author)

commented Aug 23, 2019

Noting: #19378 (comment) explores some GC-CFS relationship

@jcorbin (Author)

commented Aug 23, 2019

For comparative purposes: an Oracle blog post about Java adding similar support (especially for GC threads).

@ianlancetaylor (Contributor)

commented Aug 23, 2019

In the original post the links to "cfs bandwidth control" and "this blog post" and "kubernetes issue" do not resolve, so I'm having a hard time understanding this issue.

@jcorbin (Author)

commented Aug 23, 2019

> In the original post the links to "cfs bandwidth control" and "this blog post" and "kubernetes issue" do not resolve, so I'm having a hard time understanding this issue.

Oops; fixed

@ianlancetaylor (Contributor)

commented Aug 23, 2019

CC @aclements @mknyszek

(I'm not sure this has to be a proposal at all. This is more like a bug report. See https://golang.org/s/proposal.)

@ianlancetaylor (Contributor)

commented Sep 3, 2019

Changing this from a proposal into a feature request for the runtime package.

@ianlancetaylor ianlancetaylor changed the title proposal: make `GOMAXPROCS` cfs-aware on `GOOS=linux` runtime: make `GOMAXPROCS` cfs-aware on `GOOS=linux` Sep 3, 2019

@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Go1.14 Sep 3, 2019

@mknyszek (Contributor)

commented Sep 3, 2019

@jcorbin I'm certainly not opposed. After dissecting uber-go/automaxprocs it seems like it requires a bunch of string parsing to really get to the numbers.

This is possible to do from the runtime, but also a bit complex. Note that you can't allocate, and you need to use raw system calls to read files.

I previously did something similar to get the default huge page size, but later found you could just read an integer from a file in /sys at a path that doesn't change between Linux versions and environments.

I assume it's not quite so simple with cgroups (even though for bash on my machine that just ends up in /sys), and that you actually need to do this parsing because the paths to the files containing the quota and period can differ depending on your environment.

If it's possible to reach these values in a simpler way (i.e. a static path to a file that just contains an integer), that would be preferred.
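
For what it's worth, cgroup v2 gets close to that wish: on a unified-hierarchy host the limit lives in a single `cpu.max` file under /sys/fs/cgroup, containing "<quota> <period>" (or the literal "max" for no limit), though cgroup v1 still requires the mountinfo parsing. A hedged sketch of parsing that format (helper name is illustrative):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax parses the cgroup v2 "cpu.max" format: "<quota> <period>",
// where <quota> is microseconds or the literal "max" (no limit). It returns
// the fractional CPU count, or -1 for unlimited or unparseable input.
func parseCPUMax(s string) float64 {
	fields := strings.Fields(s)
	if len(fields) != 2 || fields[0] == "max" {
		return -1
	}
	quota, err1 := strconv.ParseInt(fields[0], 10, 64)
	period, err2 := strconv.ParseInt(fields[1], 10, 64)
	if err1 != nil || err2 != nil || period <= 0 {
		return -1
	}
	return float64(quota) / float64(period)
}

func main() {
	// These strings would come from reading /sys/fs/cgroup/cpu.max.
	fmt.Println(parseCPUMax("400000 100000")) // 4
	fmt.Println(parseCPUMax("max 100000"))    // -1
}
```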
