
Pods that are terminating but force deleted must still be accounted for in admission #104824

Open
smarterclayton opened this issue Sep 7, 2021 · 13 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@smarterclayton
Contributor

A pod can be created, allocated unique resources on a node (such as a volume, an isolated CPU, or a device), and then force deleted. Force deletion, however, happens independently of the state of the Kubelet - the pod could still be running or may not have completed termination (termination takes a non-zero amount of time from request to completion). A pod that is force deleted should still be visible to admission checks and other Kubelet loops until the pod is actually terminated.

There are two main admission loops - hard admission (which rejects the pod from the node immediately) and soft admission (which happens each time the pod is synced and might result in the pod being stopped temporarily) - and both need to take into account resources that are still allocated to pods that no longer appear in the list of "all pods".

Depending on the admission plugin, there may be different requirements about what information is necessary. For instance, if an admission plugin allocates resources only for running pods, then the plugin only needs the pods that should or could have running containers (desired pods + terminating pods). If the admission plugin allocates resources for volumes or for the entire pod, then the plugin needs to know about (desired pods + non-terminal pods).

Currently both use cases are handled identically, which means admission plugins often make direct calls to Kubelet manager functions, and that can result in inconsistencies. Neither case includes pods that are force deleted, even though those pods may still have running containers or be non-terminal.
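As a rough illustration of those two views (a minimal sketch in Go; the type and function names below are hypothetical, not the kubelet's actual API):

```go
// Hypothetical sketch only: illustrates the two pod sets described above, not
// the kubelet's real data structures.
package main

import "fmt"

type podState struct {
	name                 string
	desiredByConfig      bool // still present in the kubelet's config source (not force deleted)
	containersRunning    bool // at least one container has not yet stopped
	reachedTerminalPhase bool // observed phase is Succeeded or Failed
}

// couldHaveRunningContainers is the set a plugin tracking container-bound
// resources (exclusive CPUs, devices) needs: desired pods plus pods that are
// still terminating, including force-deleted ones.
func couldHaveRunningContainers(p podState) bool {
	return p.desiredByConfig || p.containersRunning
}

// consumesPodScopedResources is the set a plugin tracking pod-bound resources
// (volumes, the whole pod) needs: every pod that has not reached a terminal
// phase, again including force-deleted ones.
func consumesPodScopedResources(p podState) bool {
	return p.desiredByConfig || !p.reachedTerminalPhase
}

func main() {
	// A force-deleted pod whose containers are still stopping: it must remain
	// visible to both kinds of plugins.
	forceDeleted := podState{name: "batch-worker", containersRunning: true}
	fmt.Println(couldHaveRunningContainers(forceDeleted), consumesPodScopedResources(forceDeleted))
}
```

The point of the sketch is only that both sets must keep force-deleted pods visible until the kubelet has finished terminating them.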

Probable change

The current list of "all admitted pods" is served by GetActivePods() on the kubelet. This list is the set of observed desired configuration pods (every pod passed to HandlePodUpdates is added to the pod manager, regardless of admission status), filtered by the checks below (sketched in code after this list):

  1. pod is in a terminal state from the config source (pod phase = failed or succeeded)
  2. pod is in a terminal state because it was rejected by admission (statusManager pod phase = failed or succeeded) - regressed in kubelet: Prevent runtime-only pods from going into terminated phase #103668, fixed in kubelet: Rejected pods should be filtered from admission #104817
  3. pod has been detected to complete termination by the kubelet pod worker (pending terminal phase) - fixed in kubelet: Prevent runtime-only pods from going into terminated phase #103668
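A simplified sketch of that filtering, assuming stand-in interfaces rather than the kubelet's real manager types (GetActivePods is the real method being described; the interface and helper names below are approximations):

```go
// Sketch of the current GetActivePods() behavior: pods come only from the pod
// manager, then the three filters above are applied. The interfaces here are
// stand-ins, not the kubelet's actual signatures.
package sketch

import v1 "k8s.io/api/core/v1"

type podManager interface {
	GetPods() []*v1.Pod // every pod handed to HandlePodUpdates, admitted or not
}

type statusManager interface {
	GetPodStatus(uid string) (v1.PodStatus, bool)
}

type podWorkers interface {
	TerminationComplete(uid string) bool // kubelet observed termination finish
}

func isTerminalPhase(phase v1.PodPhase) bool {
	return phase == v1.PodSucceeded || phase == v1.PodFailed
}

// activePods mirrors the three filters applied today.
func activePods(pm podManager, sm statusManager, pw podWorkers) []*v1.Pod {
	var active []*v1.Pod
	for _, pod := range pm.GetPods() {
		// 1. terminal according to the config source
		if isTerminalPhase(pod.Status.Phase) {
			continue
		}
		// 2. terminal because admission rejected the pod (recorded by the status manager)
		if status, ok := sm.GetPodStatus(string(pod.UID)); ok && isTerminalPhase(status.Phase) {
			continue
		}
		// 3. termination already observed as complete by the pod workers
		if pw.TerminationComplete(string(pod.UID)) {
			continue
		}
		active = append(active, pod)
	}
	return active
}
```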

However, this list needs to change to the union of all pods known to the pod manager and all pods known to the pod workers (which includes pods that have been force deleted), filtered by:

  1. pod is in a terminal state from the config source
  2. pod is in a terminal state from admission
  3. pod is in a terminal state from the pod workers

The OtherPods entry on the pod admission list would then correctly show pods still consuming resources. An admission plugin that depends only on running pods could then make calls to the pod worker via an appropriate method (or we could split OtherPods).
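Continuing the stand-in sketch above, the proposed list could look roughly like the following, where SyncedPods is a hypothetical accessor for the pods the pod workers are still tracking (which is where force-deleted pods would come from):

```go
// Sketch of the proposed behavior: the union of pod-manager pods and pod-worker
// pods (the latter retain force-deleted pods until termination completes),
// then the same three terminal filters as before. SyncedPods is hypothetical.
type podWorkersWithState interface {
	podWorkers
	SyncedPods() []*v1.Pod // includes force-deleted pods still terminating
}

func admittedPods(pm podManager, sm statusManager, pw podWorkersWithState) []*v1.Pod {
	seen := map[string]bool{}
	var result []*v1.Pod
	for _, pod := range append(pm.GetPods(), pw.SyncedPods()...) {
		uid := string(pod.UID)
		if seen[uid] {
			continue // pod known to both sources; count it once
		}
		seen[uid] = true
		// Same filters as before: terminal via config, terminal via admission,
		// termination completed per the pod workers.
		if isTerminalPhase(pod.Status.Phase) {
			continue
		}
		if status, ok := sm.GetPodStatus(uid); ok && isTerminalPhase(status.Phase) {
			continue
		}
		if pw.TerminationComplete(uid) {
			continue
		}
		result = append(result, pod)
	}
	return result
}
```

A force-deleted pod still being terminated by the pod workers would then remain in this list (and therefore in OtherPods), keeping its allocated resources visible to admission.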

@smarterclayton smarterclayton added the kind/bug Categorizes issue or PR as related to a bug. label Sep 7, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 7, 2021
@smarterclayton
Contributor Author

/sig node
/assign

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 7, 2021
@ehashman ehashman added this to Triage in SIG Node Bugs Sep 14, 2021
@SergeyKanzhelev
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 15, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Sep 15, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 13, 2022
@endocrimes
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 11, 2022
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 12, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 3, 2022
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2022
@k8s-triage-robot

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2022
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 31, 2022
SIG Node Bugs automation moved this from Triaged to Done Oct 31, 2022
@smarterclayton smarterclayton reopened this Mar 3, 2023
SIG Node Bugs automation moved this from Done to Triage Mar 3, 2023
@smarterclayton
Contributor Author

/lifecycle frozen

This is an issue that can occur at any time, and batch workloads are the most likely to trigger it. Leaving it open for now until we decide whether static pods need to hold exclusive resources during graceful termination and can block admission (force deleting a regular pod probably means you don't want to block admitting the replacement pod, but you still need to hold actual admission until the old pod is terminated).

There could be security issues with force terminating a pod that is holding an exclusive resource if the newly acquired device doesn't guarantee a reset on acquisition (i.e. if the terminating pod is responsible for returning the resource to the pool clean, and the acquirer naively expects termination to always run, the acquirer could see data from the previous user). Not a serious issue, but a likely source of reuse bugs for most users.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Mar 3, 2023
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Mar 15, 2023
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 2, 2024