Exclusive CPUs not removed from deleted Pod and put back in the defaultCPUSet. #107074
Comments
/sig node
/cc @cynepco3hahue
/triage accepted
Hello, I am trying to find a sig/node-labeled issue I can contribute to. It doesn't have a "help wanted" or "good first issue" label, but can I try to contribute something for this issue? @bart0sh suggested this issue to me (and I believe he will help me a lot ;) ), and it seems interesting! I understand the issue fully, as well as some related parts of cpu_manager.go. But since I am not really experienced in the Kubernetes project yet, I may need help!
Feel free to assign it to yourself, but take into consideration that we depend on @smarterclayton's work that he mentioned here: #103979 (comment)
Ah, I see! Thanks!
/assign
I would like to know which code in which file makes a pod appear in the list returned by `activePods()`. And @smarterclayton, can you explain once more what would be included in `activePods()`? The comment @cynepco3hahue mentioned above is somewhat unclear to me, since I do not know what `activePods()` currently lists.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The issue has been marked as an important bug and triaged. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle frozen
I think this is a bug, and the problem also exists in 1.28.
What happened?

The `CPUManager` has logic to periodically clean up stale state and reclaim exclusive CPUs from pods that have recently terminated. It does this by querying the system for a list of `activePods()` and reclaiming CPUs from any pods it is tracking that are not in this list.

This works fine for most pods, but special care needs to be taken to ensure that CPUs are not accidentally reclaimed from pods that have not started yet. Allocation of CPUs to the containers of a pod happens during pod admission (i.e. before the pod is added to the `activePods()` list), so a simple state variable (`pendingAdmissionPod`) is used to indicate which pod is currently being admitted and exclude it from cleanup. Since pod admission is serialized, only one pod will ever be pending admission at a given time, and only a single variable is necessary to track this (i.e. whenever a new pod enters the admission loop to have exclusive CPUs granted to it, `pendingAdmissionPod` is overwritten to point to the new pod, clearing the way for the previous one to have its state cleaned up when appropriate).

Unfortunately, this simple procedure can cause problems because `pendingAdmissionPod` is never reset to `nil` after the last pod is admitted. This is usually fine, because the next time a pod comes in for admission, it will be overwritten to point to the new pod. But if no new pods come in, then it continues to point to the last pod that we attempted to admit (essentially forever), making it so that we can never clean up its state if it gets deleted at some point in the future (because it is always treated as an "active" pod as long as it is pointed to by `pendingAdmissionPod`).

I don't think this issue is critical, since in any practical setting pods will be started and stopped all the time, clearing the way for the state of previously admitted pods to be cleaned up. But we should consider a better method of tracking pods that are `notYetActiveButPendingAdmission()` so that we can eliminate this weird edge case.

What did you expect to happen?
With `--cpu-manager-policy=static` enabled on a node:

1. Look at the CPU set assigned to one of the system pods running on that node in a non-Guaranteed QoS class.
2. Create a pod requesting exclusive CPUs.
3. Look at the set of exclusive CPUs granted to it (we should see 4).
4. Look again at the CPU set assigned to the pod in the non-Guaranteed QoS class (the 4 CPUs from above are gone).
5. Delete the test pod.
6. Look again at the CPU set assigned to the pod in the non-Guaranteed QoS class (we are back to the original set).
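A pod spec along these lines could be used for the "Create a pod requesting exclusive CPUs" step — a sketch with hypothetical names; any pod in the Guaranteed QoS class with an integer CPU count works, since exclusive CPUs are only granted when requests equal limits and the CPU request is a whole number:

```yaml
# Hypothetical test pod: Guaranteed QoS with an integer CPU request,
# so the static CPU manager policy grants it 4 exclusive CPUs.
apiVersion: v1
kind: Pod
metadata:
  name: exclusive-cpu-test
spec:
  containers:
  - name: test
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "4"
        memory: "256Mi"
      limits:
        cpu: "4"
        memory: "256Mi"
```

The CPUs actually assigned to a running container can be inspected on the node with, e.g., `grep Cpus_allowed_list /proc/<pid>/status` for one of its processes.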
How can we reproduce it (as minimally and precisely as possible)?
Go through the steps above; the last step does not show the original CPU set being restored.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response