
Unset CPU CFS quota when CPU sets are in use #75682

Conversation

praseodym (Contributor, Author) commented Mar 25, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Having CFS CPU quota enabled can cause performance degradation due to the (buggy) way current Linux kernels enforce this quota.

For containers that have CPU cores dedicated to them through CPU sets, any CFS CPU quota can be safely unset to prevent this issue.

This PR also aligns the implementation with the CPU Management documentation ("CFS quota is not used to bound the CPU usage of these containers as their usage is bound by the scheduling domain itself.").

Which issue(s) this PR fixes:
Fixes #70585

Does this PR introduce a user-facing change?:

Unset CPU CFS quota when CPU sets are in use
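
To make the intent concrete, here is a minimal Go sketch of the rule this PR aims for (the type and function names are hypothetical, not the actual kubelet code): when a container has cores dedicated to it through a cpuset, the CFS quota written for it is cleared (`-1` disables the quota) and the cpuset alone bounds CPU usage.

```go
package main

import "fmt"

// containerResources is a simplified, hypothetical view of the CPU-related
// values the kubelet ends up writing to a container's cgroup.
type containerResources struct {
	CpusetCpus string // e.g. "2-3": cores dedicated to this container
	CpuQuotaUs int64  // cpu.cfs_quota_us; -1 means "no quota"
}

// applyCpusetQuotaRule clears the CFS quota when the container has a
// dedicated cpuset, since the cpuset already bounds its CPU usage.
func applyCpusetQuotaRule(r *containerResources) {
	if r.CpusetCpus != "" {
		r.CpuQuotaUs = -1
	}
}

func main() {
	r := &containerResources{CpusetCpus: "2-3", CpuQuotaUs: 200000}
	applyCpusetQuotaRule(r)
	fmt.Printf("cpuset.cpus=%s cpu.cfs_quota_us=%d\n", r.CpusetCpus, r.CpuQuotaUs)
	// prints: cpuset.cpus=2-3 cpu.cfs_quota_us=-1
}
```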
k8s-ci-robot commented Mar 25, 2019

Hi @praseodym. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ConnorDoyle (Member) commented May 4, 2019

/ok-to-test

ConnorDoyle (Member) commented May 4, 2019

@praseodym did you collect any benchmarks showing performance for pinned containers with and without this patch? If so it would be helpful to add the results here. Thanks!

praseodym (Author) commented May 5, 2019

/retest

praseodym (Author) commented May 6, 2019

> @praseodym did you collect any benchmarks showing performance for pinned containers with and without this patch? If so it would be helpful to add the results here. Thanks!

I did not collect any benchmarks yet. Would you have any suggestions on how to execute a fair benchmark?

dims (Member) commented May 29, 2019

/milestone v1.15

k8s-ci-robot added this to the v1.15 milestone May 29, 2019

dims (Member) commented May 29, 2019

/priority important-soon

dims (Member) commented May 29, 2019 (comment minimized)

praseodym force-pushed the praseodym:unset-cfs-cpu-quota-when-cpuset-defined branch from 3a2d64c to 098e171 May 29, 2019
praseodym force-pushed the praseodym:unset-cfs-cpu-quota-when-cpuset-defined branch from 27d7490 to f42dbb7 May 30, 2019
praseodym (Author) commented May 30, 2019

Updated. The CPU CFS quota is now only unset when a real CPU manager policy is active and the pod is in the Guaranteed QoS class.

Unfortunately, this does require moving a bit of logic from the static CPU manager policy into the generic CPU manager code, which might be considered a layering violation. However, the generic CPU manager code already contains some other code that exists to support the static policy.
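
For illustration, a minimal sketch of that gating condition (the function and the string values are simplified stand-ins, not the actual cpumanager code):

```go
package main

import "fmt"

// shouldUnsetCFSQuota is an illustrative check, not the actual kubelet logic:
// the container's CFS quota is only cleared when a real (non-"none") CPU
// manager policy is in effect and the pod is in the Guaranteed QoS class,
// i.e. only for containers that can actually receive exclusive CPUs.
func shouldUnsetCFSQuota(cpuManagerPolicy, podQOSClass string) bool {
	return cpuManagerPolicy != "none" && podQOSClass == "Guaranteed"
}

func main() {
	fmt.Println(shouldUnsetCFSQuota("static", "Guaranteed")) // true
	fmt.Println(shouldUnsetCFSQuota("none", "Guaranteed"))   // false
	fmt.Println(shouldUnsetCFSQuota("static", "Burstable"))  // false
}
```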

/hold cancel

praseodym (Author) commented May 30, 2019

/retest

derekwaynecarr (Member) commented Jun 5, 2019

Can we enhance the cpu_manager_test e2e so we ensure the right outcome never regresses?

derekwaynecarr (Member) left a comment

Please add an e2e test that validates the desired container and pod cgroup settings so we know this works and does not regress. The pod cgroup will still have the CFS quota applied at that scope.

@ConnorDoyle can you confirm?

If it is set at the pod boundary, do we know this really has the performance benefit you were looking to achieve?

```diff
@@ -309,14 +310,35 @@ func findContainerIDByName(status *v1.PodStatus, name string) (string, error) {
 	return "", fmt.Errorf("unable to find ID for container with name %v in pod status (it may not be running)", name)
 }
 
-func (m *manager) updateContainerCPUSet(containerID string, cpus cpuset.CPUSet) error {
+func (m *manager) updateContainerCPUSet(pod *v1.Pod, container *v1.Container, containerID string, cpus cpuset.CPUSet) error {
```

praseodym (Author) commented Jun 15, 2019

Yes, unfortunately it is still set at the pod scope. Running in kind gives me:

```
> cat /sys/fs/cgroup/cpu,cpuacct/docker/8d89c6f3cf8e5aaace1beb1adad3f38b926420ff5490d53a80a18b4457a60834/kubepods/pod71ef1b9f-8373-4dd8-9d57-a47706112750/cpu.cfs_quota_us
100000
> cat /sys/fs/cgroup/cpu,cpuacct/docker/8d89c6f3cf8e5aaace1beb1adad3f38b926420ff5490d53a80a18b4457a60834/kubepods/pod71ef1b9f-8373-4dd8-9d57-a47706112750/7f21da8aaab220ea945cca3c29c8af1217a09d2b146721b2f98b051d063c74a5/cpu.cfs_quota_us
-1
```

I checked cpu.stat for the container and it had throttled_time 0, but cpu.stat for the pod (one level up in the hierarchy) did have a non-zero throttled_time, so there is some throttling going on.

Solving this is a little more complicated. Unsetting the CFS CPU quota at the pod level would always make sense if there is only one container running with statically assigned CPUs, but with multiple containers in a pod we would have to check whether that holds for all of them. Such a check could be placed in the CPU manager's reconcileState function, but I'm not sure whether the pod resources can easily be updated from there. Do you know whether that is the case?
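
To illustrate the pod-level rule being discussed, here is a small sketch under simplified assumptions (hypothetical types; the real reconcile loop works against the CPU manager's internal state):

```go
package main

import "fmt"

// podContainer is a simplified, hypothetical stand-in for a container's CPU
// assignment as the CPU manager sees it.
type podContainer struct {
	Name          string
	ExclusiveCPUs int // cores pinned via the static policy; 0 = none
}

// canUnsetPodQuota reports whether the pod-level cpu.cfs_quota_us could be
// cleared: only when every container in the pod is bound by a dedicated
// cpuset; otherwise the pod quota still has to cap the remaining containers.
func canUnsetPodQuota(containers []podContainer) bool {
	for _, c := range containers {
		if c.ExclusiveCPUs == 0 {
			return false
		}
	}
	return len(containers) > 0
}

func main() {
	pod := []podContainer{{"app", 2}, {"sidecar", 0}}
	fmt.Println(canUnsetPodQuota(pod)) // false: the sidecar is still quota-bound
}
```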

dims (Member) commented Jun 5, 2019

@praseodym it looks like, from the questions raised by @derekwaynecarr and @ConnorDoyle, that this is not yet ready, so let's move the milestone to 1.16.

/milestone v1.16

k8s-ci-robot modified the milestones: v1.15, v1.16 Jun 5, 2019
praseodym force-pushed the praseodym:unset-cfs-cpu-quota-when-cpuset-defined branch from f42dbb7 to bbbdd02 Jun 15, 2019

k8s-ci-robot commented Jun 15, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: praseodym
To complete the pull request process, please assign derekwaynecarr
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

praseodym (Author) commented Jun 15, 2019

> Can we enhance the cpu_manager_test e2e so we ensure the right outcome never regresses?

Would the best way to write such a test be to follow the same route as the pods_container_manager_test e2e tests, i.e. creating a new pod to check whether cpu.cfs_quota_us on sysfs contains the expected value?
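
As a rough illustration of what such a check might do (illustrative only, not the actual e2e framework code), the assertion would boil down to reading the container's cpu.cfs_quota_us from the cgroup filesystem and expecting -1:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// expectNoCFSQuota reads a cpu.cfs_quota_us file (e.g. the container's own
// cgroup, mounted at /sys/fs/cgroup inside the pod under test) and reports
// whether the quota is unset (-1). A real e2e test would run this check from
// within the test pod, similar to how pods_container_manager_test inspects
// cgroup values.
func expectNoCFSQuota(path string) (bool, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	quota, err := strconv.ParseInt(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		return false, err
	}
	return quota == -1, nil
}

func main() {
	ok, err := expectNoCFSQuota("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	if err != nil {
		fmt.Println("could not read quota:", err)
		return
	}
	fmt.Println("quota unset:", ok)
}
```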

praseodym force-pushed the praseodym:unset-cfs-cpu-quota-when-cpuset-defined branch from bbbdd02 to f4d0fa4 Jun 15, 2019
praseodym (Author) commented Jun 16, 2019

/retest

draveness (Member) commented Jun 16, 2019 (comment minimized)

yguo0905 (Contributor) commented Jun 27, 2019

/cc @yguo0905

k8s-ci-robot requested a review from yguo0905 Jun 27, 2019

dims (Member) commented Aug 1, 2019

@derekwaynecarr @yujuhong Can you please review this long-standing patch? The associated issue has been open since Aug 20, 2018 :(

xmudrii (Member) commented Aug 25, 2019

@derekwaynecarr @yujuhong Hello! I'm the bug triage lead for the 1.16 release cycle and, since this PR is tagged for 1.16, I'd like to check its status. Code freeze starts on August 29th EOD PST (about 5 days from now), and once it starts, only release-blocking PRs will be considered for merging. Is this PR still planned for the 1.16 release?

@praseodym Please note that the PR needs to be rebased before it can be merged.

praseodym (Author) commented Aug 25, 2019

@xmudrii Thanks for the heads-up! This PR needs more work, so it's not realistic for it to get merged before the 1.16 code freeze.

/milestone clear

k8s-ci-robot commented Aug 25, 2019

@praseodym: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact the kubernetes/milestone-maintainers team and have them propose you as an additional delegate for this responsibility.

In response to this:

> @xmudrii Thanks for the heads-up! This PR needs more work, so it's not realistic for it to get merged before the 1.16 code freeze.
>
> /milestone clear

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xmudrii (Member) commented Aug 25, 2019

@praseodym Thank you for letting us know!

/milestone clear

k8s-ci-robot removed this from the v1.16 milestone Aug 25, 2019

draveness (Member) left a comment

Can we close this since it is already fixed in the kernel, or do we still need this PR?

#70585 (comment)

praseodym (Author) commented Nov 9, 2019

Proper implementation of this feature (or bug workaround, rather) will get pretty complicated, so I’m fully supportive of closing this PR in favour of the kernel fix.

If anyone has any objections, please leave a comment!

praseodym closed this Nov 9, 2019

dannyk81 commented Nov 10, 2019

@praseodym @chiluk @draveness although the CFS fixes should eliminate any throttling here, I would still argue that enforcing CFS quotas for pods in the Guaranteed QoS class with cpusets (CPU manager enabled) seems wrong.

The pods in question are isolated into dedicated cpusets, where the limit on their usage is dictated by the number of cores allocated to the cpuset.

There doesn't seem to be any logical reason to have a CFS quota applied here as well, as it serves no purpose and only adds confusion.

This would also solve the issue for those unable to upgrade to the latest kernels where the fix is available, though I would say that this should not be a deciding factor.

In any case, @chiluk, mad respect for figuring out the fixes and working to ensure they made it into the kernel 🙏 👏
