[1.27] kubelet: devices: skip allocation for running pods #118635 #119432
Conversation
/sig node
/test
@ffromani: The following commands are available to trigger optional jobs:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test all
/test pull-kubernetes-e2e-gce-device-plugin-gpu
/test pull-kubernetes-unit-go-compatibility (unrelated failure)
/test pull-kubernetes-e2e-gce-device-plugin-gpu (setup failure?)
/test pull-kubernetes-unit-go-compatibility
/test pull-kubernetes-e2e-capz-windows-1-27 (unrelated failure)
/retest-required (apiserver is not affected by this change)
No problem at all! It took me a while to get a clue about what was broken, but happy it was resolved and we have better Windows jobs.
/lgtm
Re-applying as the label was removed due to rebase.
LGTM label has been added. Git tree hash: 01900654b0c6f23d2ddc1ff343f0eb6fe97a9fcd
regarding #119432 (comment)
+1, it looks like the test is failing on presubmits on your empty PR (#119590), so I agree we should not block this PR on that. I believe I found the issue there with the config (see kubernetes/test-infra#30450), which we can address separately.
/lgtm
/test pull-kubernetes-e2e-gce-device-plugin-gpu |
Thanks @bobbypage! It seems your fix was merged and worked like a charm, awesome!
@ffromani any idea when this will make it into 1.27? We have been hitting this for a while now, and encounter UnexpectedAdmissionError.
Hi, please note that we (or, at the very least, I :) ) want to keep the existing behavior added in #109595, whose partial fix led to the bug I'm fixing here. In other words, some pods WILL still hit UnexpectedAdmissionError in some cases.
/cc @kubernetes/release-managers Release managers, can you please take a look at this cherrypick? Thank you! |
For RelEng:
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ffromani, mrunalp, xmudrii. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-kubernetes-node-e2e-containerd (failure unrelated to this PR)
/test pull-kubernetes-unit-go-compatibility (unrelated failure)
What type of PR is this?
/kind bug
/kind regression
What this PR does / why we need it:
Cherry-pick of #118635 to branch release-1.27. Cherry pick per se done using hack/cherry_pick_pull.sh.
Original description
When the kubelet initializes, it runs admission for pods and possibly allocates the requested resources. We need to distinguish between a node reboot (no containers running) and a kubelet restart (containers potentially running).
Running pods should always survive a kubelet restart. This means that device allocation on admission should not be attempted: if a container requires devices and is still running when the kubelet restarts, that container already has its devices allocated and working.
Thus, we need to properly detect this scenario in the allocation step and handle it explicitly, informing the devicemanager about which pods are already running.
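To make the intent concrete, here is a minimal sketch of that allocation decision. It is illustrative only: runningSet and allocate are hypothetical stand-ins, not the devicemanager's actual types or API.

```go
// Illustrative sketch only: hypothetical names, not the kubelet's real API.
package main

import "fmt"

// runningSet records containers known to be running at kubelet startup,
// keyed by "podUID/containerName". (Hypothetical type.)
type runningSet map[string]bool

// allocate sketches the admission-time allocation decision.
func allocate(podUID, container string, running runningSet) error {
	if running[podUID+"/"+container] {
		// Kubelet restart: the container survived and already has its
		// devices allocated and working, so skip re-allocation.
		fmt.Printf("skipping allocation for running container %s/%s\n", podUID, container)
		return nil
	}
	// Node reboot or genuinely new container: allocate as usual.
	fmt.Printf("allocating devices for %s/%s\n", podUID, container)
	return nil
}

func main() {
	running := runningSet{"pod-1/app": true}
	_ = allocate("pod-1", "app", running) // skipped: already running
	_ = allocate("pod-2", "app", running) // allocated: not running
}
```

The key point is the early return: re-running allocation for a container that survived the restart can spuriously fail (its devices look taken, by that very container) and surface as an UnexpectedAdmissionError.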
Which issue(s) this PR fixes:
Fixes #118559
Special notes for your reviewer:
Implements the first approach proposed in the thread, so we make the devicemanager treat running pods differently.
This approach was chosen because it seems simpler to make self-contained and easier to backport.
The devicemanager already tracks (with the help of the checkpoint files) which containers got devices assigned to them, which by definition means these containers passed its admission. The missing bit is safely learning which containers are already running when the kubelet initializes, and for that we extend the existing buildContainerMapFromRuntime helper (see the sketch below).
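A simplified sketch of what such an extension could look like, assuming the runtime reports per-container state; the types and the buildContainerMapAndRunningSet name below are illustrative stand-ins, not the real kubelet/CRI API.

```go
// Illustrative sketch only: simplified stand-ins for the runtime types.
package main

import "fmt"

type containerState int

const (
	stateCreated containerState = iota
	stateRunning
	stateExited
)

// runtimeContainer is a stand-in for what the container runtime reports.
type runtimeContainer struct {
	ID     string
	PodUID string
	Name   string
	State  containerState
}

// buildContainerMapAndRunningSet walks the runtime's containers and returns
// both the container map (ID -> pod/container) and the set of container IDs
// that are currently running, so the devicemanager can skip allocation for
// the latter on kubelet restart.
func buildContainerMapAndRunningSet(containers []runtimeContainer) (map[string]string, map[string]bool) {
	containerMap := make(map[string]string)
	running := make(map[string]bool)
	for _, c := range containers {
		containerMap[c.ID] = c.PodUID + "/" + c.Name
		if c.State == stateRunning {
			running[c.ID] = true
		}
	}
	return containerMap, running
}

func main() {
	containers := []runtimeContainer{
		{ID: "c1", PodUID: "pod-1", Name: "app", State: stateRunning},
		{ID: "c2", PodUID: "pod-2", Name: "sidecar", State: stateExited},
	}
	containerMap, running := buildContainerMapAndRunningSet(containers)
	fmt.Println(containerMap) // map[c1:pod-1/app c2:pod-2/sidecar]
	fmt.Println(running)      // map[c1:true]
}
```

Combined with the checkpointed device assignments, this gives the devicemanager enough information to tell "restarted kubelet, container still running" apart from "fresh boot, nothing running".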
Does this PR introduce a user-facing change?