
Exclusive CPUs not removed from deleted Pod and put back in the defaultCPUSet. #107074

Open
klueska opened this issue Dec 16, 2021 · 15 comments · May be fixed by #120661
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@klueska
Contributor

klueska commented Dec 16, 2021

What happened?

The CPUManager has logic to periodically clean up stale state and reclaim exclusive CPUs from pods that have recently terminated. It does this by querying the system for the list returned by activePods() and reclaiming CPUs from any pods it is tracking that are not in this list.

This works fine for most pods, but special care needs to be taken to ensure that CPUs are not accidentally reclaimed from pods that have not started yet. Allocation of CPUs to the containers of a pod happens during pod admission (i.e. before the pod is added to the activePods() list), so a simple state variable (pendingAdmissionPod) is used to indicate which pod is currently being admitted and exclude it from cleanup. Since pod admission is serialized, only one pod will ever be pending admission at a given time, and only a single variable is necessary to track this (i.e. whenever a new pod enters the admission loop to have exclusive CPUs granted to it, pendingAdmissionPod is overwritten to point to the new pod, clearing the way for the previous one to have its state cleaned up when appropriate).

Unfortunately, this simple procedure can cause problems because pendingAdmissionPod is never reset to nil after the last pod is admitted. This is usually fine, because the next time a pod comes in for admission, it will be overwritten to point to the new pod. But if no new pods come in, it continues to point to the last pod we attempted to admit (essentially forever), making it so that we can never clean up its state if that pod gets deleted at some point in the future (because it is always treated as an "active" pod as long as it is pointed to by pendingAdmissionPod).
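
To make the mechanism concrete, here is a heavily simplified sketch of the cleanup described above. It is not the actual kubelet code; the names (manager, removeStaleState, pendingAdmissionPod, activePods) only loosely mirror cpu_manager.go, and the types are toy stand-ins:

package main

import "fmt"

type pod struct {
	UID string
}

type manager struct {
	// assignments maps podUID -> exclusively assigned CPU IDs.
	assignments map[string][]int
	// defaultCPUSet is the shared pool CPUs are returned to on reclaim.
	defaultCPUSet []int
	// pendingAdmissionPod points at the pod currently being admitted.
	// It is only ever overwritten by the next admission, never reset to nil.
	pendingAdmissionPod *pod
}

// removeStaleState reclaims CPUs from any tracked pod that is neither
// active nor currently pending admission.
func (m *manager) removeStaleState(activePods []*pod) {
	live := map[string]bool{}
	for _, p := range activePods {
		live[p.UID] = true
	}
	if m.pendingAdmissionPod != nil {
		// The pod being admitted is not yet in activePods(), so it must be
		// protected from cleanup. Because the pointer is never cleared, the
		// last admitted pod stays "live" here forever if no further pod is
		// admitted -- the edge case described in this issue.
		live[m.pendingAdmissionPod.UID] = true
	}
	for uid, cpus := range m.assignments {
		if !live[uid] {
			m.defaultCPUSet = append(m.defaultCPUSet, cpus...)
			delete(m.assignments, uid)
			fmt.Printf("reclaimed CPUs %v from pod %s\n", cpus, uid)
		}
	}
}

func main() {
	p := &pod{UID: "test-pod-uid"}
	m := &manager{
		assignments:   map[string][]int{p.UID: {1, 2, 129, 130}},
		defaultCPUSet: []int{0},
	}
	m.pendingAdmissionPod = p // set during admission, never cleared

	// The pod has since been deleted, so it no longer appears in activePods(),
	// but pendingAdmissionPod still shields it from cleanup.
	m.removeStaleState(nil)
	fmt.Println("still assigned:", m.assignments)
}

Running this reclaims nothing and prints the pod's CPUs as still assigned, which is exactly the state the reproduction below ends up in.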

I don't think this issue is critical, since in any practical setting pods will be started and stopped all the time, clearing the way for the state of previously admitted pods to be cleaned up. But we should consider a better method of tracking pods that are notYetActiveButPendingAdmission() so that we can eliminate this weird edge case.
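
For illustration only, one shape such a method could take, sketched against the toy types above (this is not the actual fix, and it deliberately ignores harder cases such as a pod that is deleted before it ever becomes active):

// clearPendingAdmissionIfActive drops the pendingAdmissionPod reference once
// the pod shows up in activePods(); from that point on activePods() alone is
// enough to protect it from cleanup, and it can later be reclaimed normally.
func (m *manager) clearPendingAdmissionIfActive(activePods []*pod) {
	if m.pendingAdmissionPod == nil {
		return
	}
	for _, p := range activePods {
		if p.UID == m.pendingAdmissionPod.UID {
			m.pendingAdmissionPod = nil
			return
		}
	}
}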

What did you expect to happen?

With --cpu-manager-policy=static enabled on a node.
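(For reference, the same thing expressed in a kubelet config file looks roughly like the snippet below; the reserved CPU value is only an example, and switching policies on an existing node typically also means removing the old /var/lib/kubelet/cpu_manager_state checkpoint and restarting the kubelet.)

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
# The static policy needs some CPUs reserved for system daemons, e.g. via
# reservedSystemCPUs (or kubeReserved/systemReserved with a cpu value).
reservedSystemCPUs: "0"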

Look at the CPU set assigned to one of the system pods running on that node in a non-Guaranteed QoS class:

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0-255

Create a pod requesting exclusive CPUs:

$ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: Never
  containers:
  - image: ubuntu:20.04
    name: test-ctr
    command: ["bash", "-c"]
    args: ["sleep 99999"]
    resources:
      limits:
        cpu: 4000m
        memory: 1Gi
$ kubectl apply -f pod.yaml
pod/test-pod created
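
Because only limits are set, the requests default to the same values, so the pod lands in the Guaranteed QoS class with an integer CPU request, which is what makes it eligible for exclusive CPUs under the static policy. This can be confirmed with:

$ kubectl get pod test-pod -o jsonpath='{.status.qosClass}'
Guaranteed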

Look at the set of exclusive CPUs granted to it (we should see 4):

$ kubectl exec -it test-pod -- taskset -cp 1
pid 1's current affinity list: 1,2,129,130
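
(The same affinity can be read without taskset, e.g. straight from /proc; note the kernel collapses the list into ranges:)

$ kubectl exec -it test-pod -- grep Cpus_allowed_list /proc/1/status
Cpus_allowed_list:	1-2,129-130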

Look again at the CPU set assigned to the pod in the non-Guaranteed QoS class (the 4 CPUs from above are gone):

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0,3-128,131-255

Delete the test pod:

$ kubectl delete pod test-pod
pod "test-pod" deleted

Look again at the CPU set assigned to the pod in the non-Guaranteed QoS class (we are back to the original set):

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0-255

How can we reproduce it (as minimally and precisely as possible)?

Go through the steps above. In practice, however, the last step does not show the original CPU set restored:

$ kubectl exec -it <pod> -- taskset -cp 1
pid 1's current affinity list: 0,3-128,131-255
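
The kubelet's CPUManager checkpoint on the node tells the same story: the deleted pod's entry is still present and its CPUs are missing from defaultCpuSet. The path below assumes the default kubelet root dir, and the UID and checksum are placeholders:

$ sudo cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0,3-128,131-255","entries":{"<test-pod-uid>":{"test-ctr":"1-2,129-130"}},"checksum":<...>}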

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:32:41Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Baremetal

OS version

No response

Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@klueska klueska added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 16, 2021
@klueska
Contributor Author

klueska commented Dec 16, 2021

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 16, 2021
@cynepco3hahue

/cc @cynepco3hahue

@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Dec 22, 2021
@SergeyKanzhelev
Member

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 22, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Dec 22, 2021
@hj-johannes-lee
Contributor

Hello, I am trying to find a sig/node labeled issue I can contribute to. Though it doesn't have a "help wanted" or "good first issue" label, can I try to contribute something to this issue?

@bart0sh suggested this issue to me (and I believe he will help me a lot ;) ), and it seems interesting to me!

I understand the issue itself and some of the related parts in cpu_manager.go, but since I am not very experienced in the Kubernetes project yet, I may need help!

@cynepco3hahue

Feel free to assign it to yourself, but take into consideration that we are dependent on @smarterclayton's work that he mentioned here #103979 (comment)

@hj-johannes-lee
Contributor

Ah, I see! Thanks!

/assign

@hj-johannes-lee
Contributor

hj-johannes-lee commented Jan 19, 2022

I would like to know which code in which file makes a pod appear in the list returned by activePods().
I have tried many things, but I don't think I can make progress without knowing that. Can anyone help me?

And @smarterclayton, could you explain once more what would be included in activePods()? The comment @cynepco3hahue mentioned above is somewhat unclear to me (since I do not know what the current activePods() lists).

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2022

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2022

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 30, 2022
@k8s-triage-robot

The issue has been marked as an important bug and triaged.
Such issues are automatically marked as frozen when hitting the rotten state
to avoid missing important bugs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Nov 30, 2022
@likakuli
Contributor

I think it's a bug, and this problem also exists in 1.28.
