
resources.cpu.requests + GOMAXPROCS #501

Closed
julianvmodesto opened this issue Jul 23, 2017 · 16 comments

Comments

@julianvmodesto
Contributor

julianvmodesto commented Jul 23, 2017

The Prometheus documentation states that the number of OS threads is set by GOMAXPROCS. The default for GOMAXPROCS:

As of Go 1.5 the default value is the number of cores available.

The Prometheus Operator uses resources.requests.memory to set -storage.local.target-heap-size for efficient memory usage.

Similarly, setting resources.cpu.requests should set GOMAXPROCS for efficient CPU usage.

With Kubernetes + Docker, the default value of GOMAXPROCS is the number of cores available to the Kubernetes minion, not the number of cores available to the container as set by --cpu-quota or --cpu-shares in Docker. This is because runtime.NumCPU() is used to set the default for GOMAXPROCS.

For example, on a Kubernetes minion with 32 cores, GOMAXPROCS will be set to 32. Ideally, if we set resources.cpu.limits to 4, then GOMAXPROCS should be set to 4.
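For illustration, this is roughly the kind of container spec I'd expect the Operator to generate (a sketch only; the env entry is the hypothetical part, the Operator doesn't do this today):

spec:
  containers:
  - name: prometheus
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "4"
    env:
    # Derived from limits.cpu instead of the node's core count.
    - name: GOMAXPROCS
      value: "4"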

See discussion on prometheus-users.

@brancz
Contributor

brancz commented Jul 24, 2017

As mentioned on the mailing list, I'm not sure we're actually able to do this, as we would need to know that the machine the Prometheus pod is scheduled onto is going to have a certain number of cores. Aside from that, I'm not sure deriving GOMAXPROCS from the limit is the right way to go; my feeling is that parallelization and CPU shares should be treated distinctly.

@brancz
Contributor

brancz commented Jul 24, 2017

I wrote on the mailing list once more. I'm still questioning whether we should use the same field to configure CPU utilization and parallelization. I feel we might want to configure them separately. Wdyt?

Also, I understand the desire to set these kinds of things, but setting GOMAXPROCS is not really a general solution in the way that cgroups limiting memory/CPU utilization is; there is no guarantee that the software won't still exceed what you are trying to configure. Not trying to be the devil's advocate, just want to make sure we have a solid use case 🙂.

@julianvmodesto
Contributor Author

Hmm makes sense to me that

  1. GOMAXPROCS=1 is a bad default for users who request 500m CPU, and
  2. GOMAXPROCS doesn't make any guarantees for CPU utilization.

(As my strawman argument, Prometheus Operator uses storage.local.target-heap-size to set a soft "limit" on memory based on resources.requests.memory.)

I'm running two Prometheuses, and based on observed usage on 16 vCPU machines, I eventually set:

  • resources.requests.cpu=500m + resources.limits.cpu=1, and
  • resources.requests.cpu=2 + resources.limits.cpu=3.
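
For concreteness, the first of those expressed as a Prometheus CR resources block looks roughly like this (a sketch, showing just the resources field):

resources:
  requests:
    cpu: 500m
  limits:
    cpu: "1"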

I want my Prometheuses to run efficiently on these machines, but I also want to accurately request CPU for efficient scheduling + cluster utilization.

As a user, I'd like for Prometheus Operator to set GOMAXPROCS for me instead of configuring it myself.
I have the expectation that Prometheus Operator will choose defaults that are best for me, so I don't think adding a parameter for GOMAXPROCS to the spec would be meaningful in this case.

Is there a set of properties that includes resources.limits.cpu + resources.requests.cpu + runtime.NumCPU() that we could use to set GOMAXPROCS < runtime.NumCPU()?

Or is GOMAXPROCS=runtime.NumCPU() the recommended default for running Prometheus?

@fabxc
Contributor

fabxc commented Jul 27, 2017

What problem does this issue address?

While I see the theoretical points, it would be great to get some profiling data to determine that this added complexity has practical benefit. Did you run into any problems with not setting GOMAXPROCS?

@brancz
Contributor

brancz commented Oct 16, 2017

I'm pretty sure GOMAXPROCS is not the right tool to solve the problem. As this issue has gone pretty stale, I'd say we can re-discuss the actual problem at hand and see how we can solve it. Closing here, but feel free to re-open or continue the discussion.

@brancz brancz closed this as completed Oct 16, 2017
@embano1

embano1 commented Nov 7, 2017

Have a look at this thread from Monzo: golang/go#19378
Interesting read on a related matter (CPU quota).

An easy approach for this discussion here would be to use ResourceFieldRef based on limits.cpu in the manifest (https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#use-container-fields-as-values-for-environment-variables).
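
Roughly like this on the Prometheus container (a sketch; the env block is not something the operator exposes today):

env:
- name: GOMAXPROCS
  valueFrom:
    resourceFieldRef:
      containerName: prometheus
      resource: limits.cpu
      # With a divisor of 1, the exposed value is a whole number of cores.
      divisor: "1"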

What do you think?

@embano1

embano1 commented Jul 20, 2018

For those interested, and related to this issue: I did some benchmarking of what happens when the CFS quota is not aligned with GOMAXPROCS. Details here: uber-go/automaxprocs#12

Before this goes in the wrong direction: these benchmarks show that there can be a performance impact. But as always, these are focused and synthetic benchmarks. Stress testing against Prometheus has to be done to validate whether this is really required.

@brancz
Contributor

brancz commented Jul 20, 2018

@embano1 thanks for sharing! I passed this on to our scalability team; we will evaluate it and might run some tests to validate whether it could have a significant impact.

@embano1

embano1 commented Jul 20, 2018 via email

@redbaron
Contributor

redbaron commented Jul 5, 2019

Just wanted to add another data point. On a 12-CPU VM with the container CPU limit set to 4, the same query runs in:

  • 13 seconds without GOMAXPROCS
  • 4.3 seconds with GOMAXPROCS=4

I think the main slowdown comes from the CFS quota period, where the process is starved of CPU time for the remainder of each 100ms period every time it runs over its allocated quota.

@brancz
Contributor

brancz commented Jul 8, 2019

@redbaron Thanks for testing it! Out of curiosity, do you have the possibility to disable CFS quotas globally? That way we could validate whether that is what's causing it.

Also, are you setting just the limit to 4, or both request and limit? If both, then I'd actually expect the guaranteed QoS class to configure the container with its own cpuset. I'd be curious to see if the phenomenon still occurs on a dedicated cpuset. Could you try this if you haven't already? :)

@sslavic

sslavic commented Jun 8, 2022

I don't currently see a way to configure the Prometheus CR so that the operator sets env for the Prometheus main container in the StatefulSet. Am I missing something? If supported, this would allow the community to experiment, and depending on the results, a change of defaults might be easier to justify. WDYT, would it make sense to support configuring env in prometheusSpec for a start?

@sslavic

sslavic commented Jun 8, 2022

Found a workaround: scaled down the prometheus-operator, then edited the StatefulSet to add the env var. The runtime config got changed, but sadly, configuring GOMAXPROCS didn't make a difference.

@simonpasquier
Contributor

You could use a strategic merge patch to inject env variables into the prometheus container.

@jpiper
Contributor

jpiper commented Aug 4, 2022

In your Prometheus CRD:

spec:
  containers:
  - name: prometheus
    env:
    - name: "GOMAXPROCS"
      value: "4"

@simonpasquier
Contributor

Latest Prometheus releases can also adjust GOMAXPROCS to the CPU limits with the --enable-feature=auto-gomaxprocs CLI argument.
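
If the operator version in use supports the Prometheus CR's additionalArgs field (an assumption here, not something confirmed in this thread), that flag could be passed along these lines:

spec:
  # Hypothetical example; the operator prepends "--" to each argument name.
  additionalArgs:
  - name: enable-feature
    value: auto-gomaxprocs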
