Fix the issue in CPU static policy when containers shouldn't be sched… #118021
base: master
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Welcome @dastonzerg!
Hi @dastonzerg. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@dastonzerg can you sign off your commit and sign the EasyCLA per the kubernetes contributor guidelines?
Hey @haircommander I signed about 10 minutes ago, but I haven't received the "Look for an email indicating successful signup." email described in https://github.com/kubernetes/community/blob/master/CLA.md#5-look-for-an-email-indicating-successful-signup. Should I wait longer?
can you try running
/easycla
@haircommander Thx! After that I added
Signed-off-by: Weipeng Zheng <zweipeng@google.com>
/test pull-kubernetes-node-kubelet-serial-cpu-manager
@@ -255,7 +256,7 @@ func (p *staticPolicy) GetAllocatableCPUs(s state.State) cpuset.CPUSet {

 // GetAvailableCPUs returns the set of unassigned CPUs minus the reserved set.
 func (p *staticPolicy) GetAvailableCPUs(s state.State) cpuset.CPUSet {
-	return s.GetDefaultCPUSet().Difference(p.reservedCPUs)
+	return s.GetDefaultCPUSet()
The logic no longer matches the comment.
/approve cancel
as discussed at yesterday's SIG Node meeting
I reviewed the code and can see that the isFailed check happens at the allocation stage, and all pods that are not guaranteed, or are guaranteed with a non-integral CPU request, are ruled out before we get to this check; we would reach it only when trying to allocate exclusive CPUs. So I stand corrected on my earlier comment about best-effort pods (with no CPU request) in this check.

I can see how this would ensure that the default pool is never empty, but IIUC we get this by preventing all the CPUs from being used on the node? We essentially always keep a buffer of 1 CPU on the node. Correct?

If I understand the intention above correctly, I think we need to consider the node allocatable here too. Consider this scenario: we have a node with 8 CPUs and 2 CPUs reserved on the node => 6 allocatable CPUs as per https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L252. The node allocatable amount (6) is advertised on the node and subsequently used by the scheduler to make scheduling decisions. Now, if we have a guaranteed pod requesting 6 CPUs, the scheduler would consider this node a suitable candidate for placing the pod, but if the pod is scheduled on the node it would fail at admission time.
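To make that scheduler/kubelet disconnect concrete, here is a minimal, self-contained sketch; the numbers mirror the 8-CPU/2-reserved example above, and the "keep one CPU in the shared pool" rule is an assumption about the approach under discussion, not the actual static-policy code:

```go
package main

import "fmt"

func main() {
	const (
		totalCPUs    = 8
		reservedCPUs = 2
		sharedBuffer = 1 // assumed minimum size of the shared (default) pool
	)

	// What the kubelet advertises and the scheduler uses for placement.
	nodeAllocatable := totalCPUs - reservedCPUs // 6

	// What admission could actually grant exclusively if one CPU must
	// always remain in the shared pool.
	maxExclusive := totalCPUs - reservedCPUs - sharedBuffer // 5

	podRequest := 6 // guaranteed pod with an integral CPU request

	fmt.Printf("scheduler sees allocatable=%d, places pod requesting %d: %v\n",
		nodeAllocatable, podRequest, podRequest <= nodeAllocatable) // true
	fmt.Printf("kubelet admission can grant at most %d exclusive CPUs: %v\n",
		maxExclusive, podRequest <= maxExclusive) // false -> admission failure
}
```

With these numbers the scheduler would place the 6-CPU pod on the node, while admission could grant at most 5 exclusive CPUs, which is exactly the mismatch described above.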
I want to share a very compelling point made by @MarSik in our discussion yesterday about the default pool not being empty:
This is copied verbatim from the cgroup-v2 documentation and essentially means that if we try to make the default set empty it would inherit the cpuset from its nearest non-empty ancestor, or get all the CPUs, which in the worst case could jeopardize the exclusivity guarantees of all the pinned pods. I couldn't figure out whether the implications are the same for cgroups v1. But with cgroups v2 supported as a stable feature in Kubernetes, we can't ignore the fact that even if we try to make the default pool empty it could completely backfire. I wonder if this was the very reason for ensuring that the default pool is never empty from the get-go.
@SergeyKanzhelev Can you share the conclusion of the discussion? Why was this proposal canceled?
I hope we can find some time to discuss this at sig-node today; I will only be able to join for a few minutes due to some overlapping meetings. I appreciate the reference to the cgroups v2 documentation.

To me, if someone exhausts the shared pool with exclusive allocations, that is WAI and should be a supported thing. I don't think anyone is arguing against that point, but I am simply stating my assumption. Now, as exclusive allocations consume CPUs out of the shared pool and the shared pool becomes empty, I hear the argument about cgroups v2 behavior: the documentation seems to indicate that the parent cgroup's resources will be used instead. If that's the case, couldn't we simply have a specific check to see if the
@swatisehgal Thanks for the follow-up discussions and context sharing! I am now clear on, and agree with, the shared CPU pool never going empty, for the following reasons based on our discussions so far:
I think I can address it in this PR too. I was thinking of throwing an error at admission time, like in
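The reference above is cut off; purely for illustration, a minimal sketch of the kind of admission-time guard being discussed might look like the following. The function name and the plain integer counts are made up for this sketch; the real static policy works with cpuset.CPUSet values rather than counts.

```go
package main

import "fmt"

// validateExclusiveAllocation is a hypothetical admission-time guard,
// sketched for illustration only.
//   availableCPUs: size of the shared (default) pool excluding reserved CPUs
//   numRequested:  exclusive CPUs requested by the guaranteed container
func validateExclusiveAllocation(availableCPUs, numRequested int) error {
	// Reject the allocation if granting it would empty the shared pool,
	// so non-exclusive containers always keep at least one CPU to run on.
	if numRequested >= availableCPUs {
		return fmt.Errorf("requested %d exclusive CPUs but only %d available; at least one CPU must remain in the shared pool",
			numRequested, availableCPUs)
	}
	return nil
}

func main() {
	// With 6 CPUs left in the shared pool, a request for all 6 would empty it.
	fmt.Println(validateExclusiveAllocation(6, 6)) // error
	fmt.Println(validateExclusiveAllocation(6, 5)) // <nil>
}
```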
Hi, sorry, I was out sick yesterday so I couldn't join the discussion in the SIG Node meeting, and the recording hasn't been uploaded yet. Did we manage to discuss the implications of this approach from a scheduling and end-user perspective? I am referring to my point below:
Was the disconnect between the scheduler and the kubelet brought up in the meeting? With the approach you are suggesting we could end up in runaway pod creation scenarios similar to #84869.
I think this PR already fixes the issue.
Recording: https://www.youtube.com/watch?v=HdIURTQSm7Q
/priority important-longterm
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@ffromani removing stale assuming you still want to follow up here
Fix the issue in CPU static policy when containers shouldn't be scheduled on reserved CPUs.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #115994
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: