
Disable cpu quota (use only cpuset) for Guaranteed pods #70585

Closed · Wenfeng-GAO opened this issue Nov 2, 2018 · 32 comments

@Wenfeng-GAO (Author) commented Nov 2, 2018

What would you like to be added:

Disable the CPU quota when we use cpuset for Guaranteed pods.

Why is this needed:

Currently, kubelet adds a CPU quota and CPU shares in every case when cpuCFSQuota is enabled, even though a cpuset will be assigned in the Guaranteed pod case.

However, this combination (cpuset plus quota) can sometimes cause CPU throttling that we don't expect.

My question is: why not add an if statement where we set the cpuQuota config, so that in the Guaranteed case we simply set cpuQuota = -1?

	if m.cpuCFSQuota {
		// if cpuLimit.Amount is nil, then the appropriate default value is returned
		// to allow full usage of cpu resource.
		cpuQuota, cpuPeriod := milliCPUToQuota(cpuLimit.MilliValue())
		lc.Resources.CpuQuota = cpuQuota
		lc.Resources.CpuPeriod = cpuPeriod
	}
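
A rough sketch of the proposed change, mirroring the fragment above (this is not the actual kubelet code): the helper isGuaranteedWithExclusiveCPUs is hypothetical and stands for "this container was given an exclusive cpuset by the static CPU manager", with pod and container assumed to be in scope in the surrounding function; setting the quota to -1 tells the CFS bandwidth controller to apply no limit.

	if m.cpuCFSQuota {
		cpuQuota, cpuPeriod := milliCPUToQuota(cpuLimit.MilliValue())
		// Hypothetical check: skip the quota when the container is pinned
		// to an exclusive cpuset, since the cpuset already bounds its usage.
		if isGuaranteedWithExclusiveCPUs(pod, container) {
			cpuQuota = -1 // -1 disables CFS bandwidth control for this cgroup
		}
		lc.Resources.CpuQuota = cpuQuota
		lc.Resources.CpuPeriod = cpuPeriod
	}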

/kind feature

@Wenfeng-GAO (Author) commented Nov 2, 2018

/sig kubelet

@Wenfeng-GAO (Author) commented Nov 2, 2018

/sig architecture

@clkao (Contributor) commented Nov 2, 2018

/sig node

@zaa commented Nov 28, 2018

Hello,

We are using cpu manager in k8s v1.10.9 and experiencing the same issue.

https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/ states:

As Guaranteed pods whose containers fit the requirements for being statically assigned are scheduled to the node, CPUs are removed from the shared pool and placed in the cpuset for the container. CFS quota is not used to bound the CPU usage of these containers as their usage is bound by the scheduling domain itself. In other words, the number of CPUs in the container cpuset is equal to the integer CPU limit specified in the pod spec. This static assignment increases CPU affinity and decreases context switches due to throttling for the CPU-bound workload.

However, the "CFS quota is not used to bound the CPU usage of these containers as their usage is bound by the scheduling domain itself" portion of the document does not hold in our case.

So, the setup looks as follows:

  1. We have cpu manager enabled and running with static policy
  2. I have a pod with two containers
  3. The first container has limits/requests set to 2 whole cpus
  4. The second container has limits/requests set to 0.25 cpu

k8s schedules the pod to a node. When I exec into the first container, I see that it has the cpuset defined correctly:

/sys/fs/cgroup/cpu,cpuacct # cat /proc/1/status | grep Cpu
Cpus_allowed:    00000000,00000000,00000000,00000202
Cpus_allowed_list:    1,9

However, for some reason (I presume that this is a bug), the container has CFS quota defined too, which should not be the case. The pod is restricted to the two cpus already and should be able to use 100% of them, without quota:

/sys/fs/cgroup/cpu,cpuacct # cat cpu.cfs_*
100000
200000

When I check cpu stats, I see that the container is severely throttled:

/sys/fs/cgroup/cpu,cpuacct # cat cpu.stat
nr_periods 2686183
nr_throttled 363111
throttled_time 27984636050628

Moreover, when I check the cgroup settings of the whole pod, I see that k8s applied a CFS quota to the whole pod too:

root@ip-10-33-99-26:/sys/fs/cgroup/cpu,cpuacct/kubepods/pod62d886b5-ee75-11e8-8f1c-0a4ce21644ea# cat cpu.cfs_*
100000
225000
root@ip-10-33-99-26:/sys/fs/cgroup/cpu,cpuacct/kubepods/pod62d886b5-ee75-11e8-8f1c-0a4ce21644ea# cat fafd0f38288710446b53402716d11449c4b046a09c26266d2725eae2ac7f0352/cpu.cfs_*
100000
200000

My understanding is that if the pod is in the Guaranteed QoS class and requests an integral number of cpus, it should not be subject to the CFS quota (as stated in the docs).
So in this case:

  1. the pod should not have CFS quota defined.
  2. The first container should have cpuset defined, but no cfs quota (as the docs imply)
  3. The second container should have a CFS quota of 25000 µs per 100000 µs period (see the arithmetic sketch below)
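
For reference, here is a small, hedged sketch (not the kubelet source) of the quota arithmetic that would produce the cfs_quota_us values above, assuming the default 100 ms CFS period:

	package main

	import "fmt"

	// quotaForMilliCPU converts a milli-CPU limit into a CFS quota in
	// microseconds: quota = milliCPU * period / 1000.
	func quotaForMilliCPU(milliCPU, periodUs int64) int64 {
		return milliCPU * periodUs / 1000
	}

	func main() {
		const periodUs = 100000                       // default CFS period: 100 ms
		fmt.Println(quotaForMilliCPU(250, periodUs))  // 25000: the 0.25-CPU container
		fmt.Println(quotaForMilliCPU(2000, periodUs)) // 200000: the 2-CPU container
	}
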
@zaa commented Nov 28, 2018

cc: @ipuustin @ConnorDoyle might be of interest to you

@mariusgrigoriu commented Feb 14, 2019

Same issue with CPU manager here. We're not realizing any performance benefits due to this issue. In fact we're getting far worse latency due to the CFS throttling bug. Was it intentional to add CFS quota to pods that are CPU pinned? The documentation makes it seem like having a CFS quota on those pods is a bug.

@dannyk81 commented Mar 25, 2019

@Wenfeng-GAO it seems that this should be labelled as a bug and not a feature 😄

As mentioned by @mariusgrigoriu and @zaa, CPUManager's documentation suggests that Pods with pinned CPUs should not be bound by CPU quotas; however, it is quite clear that they are, which seems wrong.

We keep getting hit by cpu throttling on Guaranteed QoS Pods with pinned CPUs, which is totally counter-productive 😞

@praseodym (Contributor) commented Mar 25, 2019

/kind bug
/remove-kind feature

@k8s-ci-robot added kind/bug and removed kind/feature labels Mar 25, 2019

@dannyk81 commented Mar 25, 2019

FYI @ConnorDoyle

I came across your comment here #51135 (comment) (from a while back) and it seems that this contradicts what you've mentioned in that post.

Figured you may be interested to weigh in.

@praseodym (Contributor) commented Mar 25, 2019

In theory, setting a CPU CFS quota for CPU-pinned pods should not affect performance, but in practice it does cause performance degradation due to the (buggy) way that CFS quotas are implemented in the Linux kernel.

I think the best fix would be to unset any CPU CFS quota from the pod in the CPU manager when reserved CPUs (CPU sets) are assigned.
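
A self-contained sketch of the eligibility check being discussed (an illustrative assumption, not the CPU manager's actual API): a container only receives an exclusive cpuset, and therefore should not need a CFS quota, when its pod is in the Guaranteed QoS class and it requests a whole number of CPUs.

	package main

	import "fmt"

	// shouldDropCFSQuota reports whether a container's CFS quota could be
	// dropped under the proposal above: Guaranteed QoS plus an integer CPU
	// request (expressed here in milli-CPUs).
	func shouldDropCFSQuota(guaranteedQoS bool, cpuRequestMilli int64) bool {
		return guaranteedQoS && cpuRequestMilli >= 1000 && cpuRequestMilli%1000 == 0
	}

	func main() {
		fmt.Println(shouldDropCFSQuota(true, 2000))  // true: 2 whole CPUs, eligible for pinning
		fmt.Println(shouldDropCFSQuota(true, 250))   // false: fractional request keeps its quota
		fmt.Println(shouldDropCFSQuota(false, 2000)) // false: non-Guaranteed pods are never pinned
	}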

@dannyk81 commented Mar 25, 2019

Thanks @praseodym!

Totally agree, unsetting the quota for Pods with cpusets seems like the best solution.

Regarding the Kernel, it seems that a recent patch (included in 4.18) should solve the issue --> #67577 (comment)

OTOH, this comment --> #67577 (comment) is a bit confusing ¯\_(ツ)_/¯

We are going to upgrade to latest CoreOS (2023.5.0) that comes with Kernel 4.19.25 and see how things look.

@chiluk commented Mar 25, 2019

512ac99 may resolve the issue in this thread, and assuming things are all correctly pinned and cgrouped, the issue I discuss in the other bug due to 512ac99 should not be in play.

@praseodym (Contributor) commented Mar 25, 2019

That's good to know!

This issue will still need some work to align the implemented behaviour with the documentation ("CFS quota is not used to bound the CPU usage of these containers as their usage is bound by the scheduling domain itself.") or vice versa.

@dannyk81 commented Mar 25, 2019

Thanks @chiluk, I just read your comment on the other thread (#67577 (comment)), and to be honest this seems a bit worrying, as we might actually hit this if we go for >4.18 kernels.

We are using 48-core bare-metal machines for our worker servers and also deploy Java (and PHP) workloads of various sizes.

So, although this might help in the case of the Pods with pinned CPUs it would actually make things worse for the rest of the workload?

@blakebarnett (Contributor) commented Mar 25, 2019

So, although this might help in the case of the Pods with pinned CPUs it would actually make things worse for the rest of the workload?

If you're pinning a container to a CPU with cpu-manager, it should have full use of that CPU, and the rest of the workload should be excluded from it. The rest of the system should behave normally, but CFS quota should be disabled for the cgroups that a cpuset has been applied to. That's my understanding anyway.

@dannyk81 commented Mar 25, 2019

@blakebarnett Yes, that's accurate; however, it's not the case now, and hopefully it will be fixed by #75682

What concerns me is that using the latest kernel (>4.18) may degrade performance for the pods that are not pinned to specific cpusets (the majority) as indicated by @chiluk's tests...

@liggitt (Member) commented May 4, 2019

/remove-sig architecture

@k8s-ci-robot (Contributor) commented May 4, 2019

@liggitt: Those labels are not set on the issue: sig/

In response to this:

/remove-sig architecture

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haosdent (Contributor) commented May 9, 2019

+1

@derekwaynecarr (Member) commented May 20, 2019

I agree we should be able to disable the use of cfs quota for pods in Guaranteed QoS class when static cpu manager is enabled.

/priority important-soon

@draveness (Member) commented May 27, 2019

@draveness (Member) commented Jun 14, 2019

/assign

@evansheng commented Jul 30, 2019

+1 @praseodym, are you still working on #75682? We (Airbnb) would like to test the fix on our clusters.

@praseodym (Contributor) commented Jul 30, 2019

@evansheng I am, but was stuck on a couple of things (see questions in PR comments) — any help would be greatly appreciated!

@fejta-bot commented Oct 28, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@warmchang (Contributor) commented Oct 29, 2019

/remove-lifecycle stale

@chiluk commented Oct 30, 2019

AFAICT this inadvertent throttling is likely resolved by fixes 512ac999 and de53fd7aedb in the kernel. Complain to your distros to get these fixes backported.

@draveness (Member) commented Oct 30, 2019

AFAICT this inadvertent throttling is likely resolved by fixes 512ac999 and de53fd7aedb in the kernel. Complain to your distros to get these fixes backported.

That would be great, could you help to link these commits to this issue?

@chiluk commented Nov 9, 2019

I think this can probably be closed. If people start hitting throttling again, this can probably be re-opened/revisited.

@draveness (Member) commented Nov 10, 2019

I think this can probably be closed. If people start hitting throttling again, this can probably be re-opened/revisited.

👍

/close

@k8s-ci-robot (Contributor) commented Nov 10, 2019

@draveness: Closing this issue.

In response to this:

I think this can probably be closed. If people start hitting throttling again, this can probably be re-opened/revisited.

👍

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
