
Exclusive CPUs not removed from deleted Pod and put back in the defaultCPUSet. #107074

Open
klueska opened this issue Dec 16, 2021 · 15 comments · May be fixed by #120661
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@klueska
Contributor

klueska commented Dec 16, 2021

What happened?

The CPUManager has logic to periodically clean up stale state and reclaim exclusive CPUs from pods that have recently terminated. It does this by querying the system for the list returned by activePods() and reclaiming CPUs from any pods it is tracking that are not in this list.

This works fine for most pods, but special care needs to be taken to ensure that CPUs are not accidentally reclaimed from pods that have not started yet. Allocation of CPUs to the containers of a pod happens during pod admission (i.e. before the pod is added to the activePods() list), so a simple state variable (pendingAdmissionPod) is used to indicate which pod is currently being admitted and exclude it from cleanup. Since pod admission is serialized, only one pod will ever be pending admission at a given time, and only a single variable is necessary to track this (i.e. whenever a new pod enters the admission loop to have exclusive CPUs granted to it, pendingAdmissionPod is overwritten to point to the new pod, clearing the way for the previous one to have its state cleaned up when appropriate).

Unfortunately, this simple procedure can cause problems because pendingAdmissionPod is never reset to nil after the last pod is admitted. This is usually fine, because the next time a pod comes in for admission, it will be overwritten to point to the new pod. But if no new pods come in, it continues to point to the last pod we attempted to admit (essentially forever), making it so that we can never clean up its state if that pod gets deleted at some point in the future (because it is always treated as an "active" pod as long as it is pointed to by pendingAdmissionPod).
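
To make the mechanism concrete, here is a heavily simplified sketch of the cleanup described above. It is not the actual kubelet code; the names (manager, removeStaleState, pendingAdmissionPod, activePods) only loosely mirror cpu_manager.go, and the types are toy stand-ins:

package main

import "fmt"

type pod struct {
	UID string
}

type manager struct {
	// assignments maps podUID -> exclusively assigned CPU IDs.
	assignments map[string][]int
	// defaultCPUSet is the shared pool CPUs are returned to on reclaim.
	defaultCPUSet []int
	// pendingAdmissionPod points at the pod currently being admitted.
	// It is only ever overwritten by the next admission, never reset to nil.
	pendingAdmissionPod *pod
}

// removeStaleState reclaims CPUs from any tracked pod that is neither
// active nor currently pending admission.
func (m *manager) removeStaleState(activePods []*pod) {
	live := map[string]bool{}
	for _, p := range activePods {
		live[p.UID] = true
	}
	if m.pendingAdmissionPod != nil {
		// The pod being admitted is not yet in activePods(), so it must be
		// protected from cleanup. Because the pointer is never cleared, the
		// last admitted pod stays "live" here forever if no further pod is
		// admitted -- the edge case described in this issue.
		live[m.pendingAdmissionPod.UID] = true
	}
	for uid, cpus := range m.assignments {
		if !live[uid] {
			m.defaultCPUSet = append(m.defaultCPUSet, cpus...)
			delete(m.assignments, uid)
			fmt.Printf("reclaimed CPUs %v from pod %s\n", cpus, uid)
		}
	}
}

func main() {
	p := &pod{UID: "test-pod-uid"}
	m := &manager{
		assignments:   map[string][]int{p.UID: {1, 2, 129, 130}},
		defaultCPUSet: []int{0},
	}
	m.pendingAdmissionPod = p // set during admission, never cleared

	// The pod has since been deleted, so it no longer appears in activePods(),
	// but pendingAdmissionPod still shields it from cleanup.
	m.removeStaleState(nil)
	fmt.Println("still assigned:", m.assignments)
}

Running this reclaims nothing and prints the pod's CPUs as still assigned, which is exactly the state the reproduction below ends up in.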

I don't think this issue is critical, since in any practical setting pods will be started and stopped all the time, clearing the way for the state of previously admitted pods to be cleaned up. But we should consider a better method of tracking pods that are notYetActiveButPendingAdmission() so that we can eliminate this weird edge case.
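
For illustration only, one shape such a method could take, sketched against the toy types above (this is not the actual fix, and it deliberately ignores harder cases such as a pod that is deleted before it ever becomes active):

// clearPendingAdmissionIfActive drops the pendingAdmissionPod reference once
// the pod shows up in activePods(); from that point on activePods() alone is
// enough to protect it from cleanup, and it can later be reclaimed normally.
func (m *manager) clearPendingAdmissionIfActive(activePods []*pod) {
	if m.pendingAdmissionPod == nil {
		return
	}
	for _, p := range activePods {
		if p.UID == m.pendingAdmissionPod.UID {
			m.pendingAdmissionPod = nil
			return
		}
	}
}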

What did you expect to happen?

With --cpu-manager-policy=static enabled on a node.
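(For reference, the same thing expressed in a kubelet config file looks roughly like the snippet below; the reserved CPU value is only an example, and switching policies on an existing node typically also means removing the old /var/lib/kubelet/cpu_manager_state checkpoint and restarting the kubelet.)

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
# The static policy needs some CPUs reserved for system daemons, e.g. via
# reservedSystemCPUs (or kubeReserved/systemReserved with a cpu value).
reservedSystemCPUs: "0"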

Look at the CPU set assigned to one of the system pods running on that node in a non-Guaranteed QoS class:

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0-255

Create a pod requesting exclusive CPUs:

$ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: Never
  containers:
  - image: ubuntu:20.04
    name: test-ctr
    command: ["bash", "-c"]
    args: ["sleep 99999"]
    resources:
      limits:
        cpu: 4000m
        memory: 1Gi
$ kubectl apply -f pod.yaml
pod/test-pod created
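
Because only limits are set, the requests default to the same values, so the pod lands in the Guaranteed QoS class with an integer CPU request, which is what makes it eligible for exclusive CPUs under the static policy. This can be confirmed with:

$ kubectl get pod test-pod -o jsonpath='{.status.qosClass}'
Guaranteed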

Look at the set of exclusive CPUs granted to it (we should see 4):

$ kubectl exec -it test-pod -- taskset -cp 1
pid 1's current affinity list: 1,2,129,130
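
(The same affinity can be read without taskset, e.g. straight from /proc; note the kernel collapses the list into ranges:)

$ kubectl exec -it test-pod -- grep Cpus_allowed_list /proc/1/status
Cpus_allowed_list:	1-2,129-130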

Look again at the CPU set assigned to the pod in the non-Guaranteed QoS class (the 4 CPUs from above are gone):

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0,3-128,131-255

Delete the test pod:

$ kubectl delete pod test-pod
pod "test-pod" deleted

Look again at the CPU set assigned to the pod in the non-Guaranteed QoS class (we are back to the original set):

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0-255

How can we reproduce it (as minimally and precisely as possible)?

Go through the steps above. In practice, however, the last step does not show the original CPU set restored:

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0,3-128,131-255
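
The kubelet's CPUManager checkpoint on the node tells the same story: the deleted pod's entry is still present and its CPUs are missing from defaultCpuSet. The path below assumes the default kubelet root dir, and the UID and checksum are placeholders:

$ sudo cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0,3-128,131-255","entries":{"<test-pod-uid>":{"test-ctr":"1-2,129-130"}},"checksum":<...>}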

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:32:41Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Baremetal

OS version

No response

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@klueska klueska added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 16, 2021
@klueska
Contributor Author

klueska commented Dec 16, 2021

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 16, 2021
@cynepco3hahue

/cc @cynepco3hahue

@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Dec 22, 2021
@SergeyKanzhelev
Member

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 22, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Dec 22, 2021
@hj-johannes-lee
Contributor

Hello, I am trying to find a sig/node labeled issue I can contribute to. Though it doesn't have a "help wanted" or "good first issue" label, can I try to contribute something to this issue?

@bart0sh suggested this issue to me (and I believe he will help me a lot ;) ), and it seems interesting to me!

I understand the issue itself and some of the related parts in cpu_manager.go, but since I am not very experienced in the Kubernetes project yet, I may need help!

@cynepco3hahue

Feel free to assign it to yourself, but take into consideration that we are dependent on @smarterclayton's work that he mentioned here #103979 (comment)

@hj-johannes-lee
Contributor

Ah, I see! Thanks!

/assign

@hj-johannes-lee
Contributor

hj-johannes-lee commented Jan 19, 2022

I would like to know which code in which file makes a pod appear in the list returned by activePods().
I have tried many things, but I don't think I can make progress without knowing that. Can anyone help me?

And @smarterclayton, could you explain once more what would be included in activePods()? The comment @cynepco3hahue mentioned above is somewhat unclear to me (since I do not know what the current activePods() lists).

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2022

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2022

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 30, 2022
@k8s-triage-robot

The issue has been marked as an important bug and triaged.
Such issues are automatically marked as frozen when hitting the rotten state
to avoid missing important bugs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Nov 30, 2022
@likakuli
Contributor

I think it's a bug, and this problem also exists in 1.28.
