[1.25] kubelet: devices: skip allocation for running pods #118635 #119707
Conversation
Force-pushed from 029f358 to d8b9165 (Compare)
/triage accepted
/assign
The recently added e2e device plugins test covering node reboot works fine when it runs each time on a fresh CI environment, but it does not correctly handle the partial setup left behind when run repeatedly on the same instance (developer setup). To accommodate both flows, we extend the error management, checking more error conditions in the flow. Signed-off-by: Francesco Romani <fromani@redhat.com>
Make sure orphaned pods (pods deleted while kubelet is down) are handled correctly. Outline:
1. create a pod (not a static pod)
2. stop kubelet
3. while kubelet is down, force delete the pod on the API server
4. restart kubelet
The pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups. There is a similar test already, but here we want to check device assignment. Signed-off-by: Francesco Romani <fromani@redhat.com>
One of the factors making issues kubernetes#118559 and kubernetes#109595 hard to debug and fix is that the devicemanager has very few logs in important flows, so it is unnecessarily hard to reconstruct its state from logs. We add minimal logs to improve troubleshooting, keeping the change backport-friendly and deferring a more comprehensive review of logging to later PRs. Signed-off-by: Francesco Romani <fromani@redhat.com>
Force-pushed from d8b9165 to 52c5fa5 (Compare)
rebased to fix a conflict
/test pull-kubernetes-e2e-capz-windows-containerd-1-25
/lgtm
Re-applying as the label was removed due to rebase.
LGTM label has been added. Git tree hash: 24633598f8a497165c30f06ce4f3eeb05002cb03
/cc @kubernetes/release-managers Release managers, can you please take a look at this cherry-pick? Thank you!
For RelEng:
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ffromani, mrunalp, xmudrii. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@ffromani: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/test pull-kubernetes-unit
unrelated failure in
What type of PR is this?
/kind bug
/kind regression
What this PR does / why we need it:
Cherry-pick of #118635 to branch release-1.25, through #119706. The cherry-pick itself was done using hack/cherry_pick_pull.sh.
Original description
When kubelet initializes, it runs admission for pods and possibly allocates the requested resources. We need to distinguish between a node reboot (no containers running) and a kubelet restart (containers potentially still running).
Running pods should always survive kubelet restart. This means that device allocation on admission should not be attempted, because if a container requires devices and is still running when kubelet is restarting, that container already has devices allocated and working.
Thus, we need to properly detect this scenario in the allocation step and handle it explicitly. We need to inform the devicemanager about which pods are already running.
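The skip condition described above can be sketched as follows. This is a minimal illustration with hypothetical names (devManager, allocate, the allocated and running maps), not the actual kubernetes/kubernetes types: allocation is bypassed only when the container is known to be running and already has devices assigned (which, in the real code, is restored from the devicemanager checkpoint files).

```go
package main

import "fmt"

// devManager is a hypothetical, simplified stand-in for the kubelet
// devicemanager state relevant to this PR.
type devManager struct {
	// allocated: containers that already passed admission and got
	// devices (restored from checkpoint files in the real code).
	allocated map[string][]string
	// running: containers observed as running when kubelet (re)started.
	running map[string]bool
}

// allocate reports whether device allocation was skipped for a container.
// A container that is still running with devices already assigned
// (kubelet restart, not node reboot) must keep its devices untouched.
func (m *devManager) allocate(container string) (skipped bool, err error) {
	if m.running[container] && len(m.allocated[container]) > 0 {
		return true, nil // devices already assigned and in use: skip
	}
	// ... the normal allocation path would run here ...
	return false, nil
}

func main() {
	m := &devManager{
		allocated: map[string][]string{"ctr-a": {"dev0"}},
		running:   map[string]bool{"ctr-a": true},
	}
	s1, _ := m.allocate("ctr-a") // running with devices: skipped
	s2, _ := m.allocate("ctr-b") // new container: allocate normally
	fmt.Println(s1, s2)
}
```

After a node reboot the running set is empty, so every container goes through the normal allocation path, which is exactly the behavior this PR preserves.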
Which issue(s) this PR fixes:
Fixes #118559
Special notes for your reviewer:
Implements the first approach proposed in the thread, making the devicemanager treat running pods differently.
This approach was chosen because it seems simpler to make self-contained and easier to backport.
The devicemanager already tracks (with the help of the checkpoint files) which containers got devices assigned to them, which by definition means those containers passed its admission. The missing bit is safely learning which containers are already running at initialization time, and for that we extend the existing buildContainerMapFromRuntime.
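The extension described above can be sketched like this. It is an illustrative sketch with hypothetical types (runtimeContainer, buildMaps), not the actual buildContainerMapFromRuntime signature: while walking the runtime's container list, we additionally collect the set of containers that are still running, so the devicemanager can later skip allocation for them.

```go
package main

import "fmt"

// runtimeContainer is a hypothetical record of what a container runtime
// reports; the real code queries the CRI runtime for this information.
type runtimeContainer struct {
	podUID  string
	name    string
	running bool
}

// buildMaps sketches the extended buildContainerMapFromRuntime flow:
// it returns both the containerID -> (pod, container) map that existed
// before, and a new set of container IDs observed as running.
func buildMaps(containers map[string]runtimeContainer) (containerMap map[string][2]string, runningSet map[string]bool) {
	containerMap = make(map[string][2]string)
	runningSet = make(map[string]bool)
	for id, c := range containers {
		containerMap[id] = [2]string{c.podUID, c.name}
		if c.running {
			runningSet[id] = true // still running across kubelet restart
		}
	}
	return containerMap, runningSet
}

func main() {
	cm, rs := buildMaps(map[string]runtimeContainer{
		"id1": {podUID: "pod-1", name: "ctr-a", running: true},
		"id2": {podUID: "pod-2", name: "ctr-b", running: false},
	})
	fmt.Println(len(cm), len(rs))
}
```

Collecting the running set during the same runtime listing keeps the change self-contained, which matches the PR's stated goal of being easy to backport.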
Does this PR introduce a user-facing change?