dra: kubelet must skip NodePrepareResource if not used by any container #118786
Conversation
"dra: kubelet must skip NodePrepareResource if not used by any container"

If (for whatever reason) no container uses a claim, then there's no need to prepare it.
Skipping CI for Draft Pull Request.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a draft because it fails for me locally. kubelet calls …

/assign @bart0sh

/kind bug

/retest
```go
// If no container actually uses the claim, then we don't need
// to prepare it.
if !claimIsUsedByPod(podClaim, pod) {
	klog.V(5).InfoS("Skipping unused resource", "claim", claimName, "pod", pod.Name)
	continue
}
```
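For readers following along, here is a minimal sketch of what a helper like claimIsUsedByPod could look like. This is an assumed shape rather than the actual kubelet source; v1 refers to k8s.io/api/core/v1.

```go
import v1 "k8s.io/api/core/v1"

// claimIsUsedByPod reports whether any init container or regular container
// in the pod references the given pod-level resource claim by name.
// Sketch only; the real kubelet helper may differ.
func claimIsUsedByPod(podClaim *v1.PodResourceClaim, pod *v1.Pod) bool {
	return claimIsUsedByContainers(podClaim, pod.Spec.InitContainers) ||
		claimIsUsedByContainers(podClaim, pod.Spec.Containers)
}

func claimIsUsedByContainers(podClaim *v1.PodResourceClaim, containers []v1.Container) bool {
	for i := range containers {
		// container.Resources.Claims entries refer back to
		// pod.Spec.ResourceClaims by name.
		for _, c := range containers[i].Resources.Claims {
			if c.Name == podClaim.Name {
				return true
			}
		}
	}
	return false
}
```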
Something seems off about the placement of this check in this function. I believe it still works, but it may unnecessarily delay the underlying NodeUnprepareResources() call in certain scenarios.
The scenarios to consider are:
1. This is the only pod to reference the claim
2. This is the first pod to reference the claim (and more than one eventually reference it)
3. This is not the first pod to reference the claim
In scenario 1, NodePrepareResources() will never be called (because of the new check), and NodeUnprepareResources() will also never be called (because a claimInfo object for it will never be created).

In scenario 2, NodePrepareResources() will not be called when the first pod is started (because of the new check), and a reference to the pod will also not be added to the claimInfo object. Future calls to PrepareResources() will have their pods added to the claimInfo as appropriate, and future calls to UnprepareResources() will only trigger an underlying NodeUnprepareResources() once the last pod reference in the claimInfo has called it.

In scenario 3, NodePrepareResources() will have already been triggered by a previous pod, and since our new check for !claimIsUsedByPod() is further down in the code, a reference to this pod will be added to the claimInfo object. Future calls to UnprepareResources() will only trigger an underlying NodeUnprepareResources() once the last pod reference in the claimInfo has called it (even if the only pod we are waiting for is one that would fail the !claimIsUsedByPod() check).
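To make the reference counting in these scenarios concrete, here is a self-contained toy model. The names mirror the discussion (claimInfo, addPodReference) but are simplified stand-ins, not the actual kubelet implementation.

```go
package main

import "fmt"

// claimInfo tracks which pods currently reference a prepared claim
// (simplified stand-in for the kubelet's bookkeeping object).
type claimInfo struct {
	claimName string
	podUIDs   map[string]struct{}
}

func (c *claimInfo) addPodReference(uid string)    { c.podUIDs[uid] = struct{}{} }
func (c *claimInfo) deletePodReference(uid string) { delete(c.podUIDs, uid) }

// unprepare drops one pod reference and only "calls the driver" once the
// last reference is gone, mirroring the NodeUnprepareResources behavior
// described above.
func unprepare(c *claimInfo, uid string) {
	c.deletePodReference(uid)
	if len(c.podUIDs) == 0 {
		fmt.Println("NodeUnprepareResources called for", c.claimName)
	}
}

func main() {
	c := &claimInfo{claimName: "claim-a", podUIDs: map[string]struct{}{}}
	c.addPodReference("pod-1")
	c.addPodReference("pod-2")
	unprepare(c, "pod-1") // no driver call yet, pod-2 still holds a reference
	unprepare(c, "pod-2") // last reference gone, driver call happens now
}
```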
I placed it here because I wanted to do the "is reserved for" check also when the claim is not actually used. We still want to do proper scheduling despite that, which means the claim must be reserved for the pod to run.

What I missed is the

```go
claimInfo.addPodReference(pod.UID)
continue
```

above when the claim is already prepared. I think that's actually another bug in the code: if the claim was prepared for some other pod and this pod here doesn't have it reserved, it's allowed to run by kubelet because the "is reserved for" check gets skipped.
I fixed this by moving the "is already prepared" check.
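A hedged sketch of what the reordered loop might look like: resolveClaimName, isReservedForPod, and cache are hypothetical stand-ins, error handling is trimmed, so this is illustrative rather than the actual patch.

```go
for i := range pod.Spec.ResourceClaims {
	podClaim := &pod.Spec.ResourceClaims[i]
	claimName := resolveClaimName(pod, podClaim) // hypothetical helper

	// The "reserved for" check now runs unconditionally, before any early
	// exit, so a pod without a reservation can no longer slip through via
	// an already-prepared claim.
	claim, err := kubeClient.ResourceV1alpha2().ResourceClaims(pod.Namespace).Get(ctx, claimName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if !isReservedForPod(pod, claim) { // hypothetical: scans claim.Status.ReservedFor for pod.UID
		return fmt.Errorf("claim %s is not reserved for pod %s", claimName, pod.Name)
	}

	// New check from this PR: nothing to prepare if no container uses the claim.
	if !claimIsUsedByPod(podClaim, pod) {
		continue
	}

	// The early exit for already-prepared claims comes last.
	if info := cache.get(claimName, pod.Namespace); info != nil {
		info.addPodReference(pod.UID)
		continue
	}

	// ... call NodePrepareResources for the new claim ...
}
```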
Seems reasonable. I believe the initial thinking was to exit early to avoid a call to the API server with ResourceV1alpha2().ResourceClaims(pod.Namespace).Get(), but I agree that checking reservedFor is more important than this minor optimization.
When a second pod wanted to use a claim, the obligatory sanity check whether the pod is really allowed to use the claim ("reserved for") was skipped.
1aeec10 removed iterating over containers in favor of iterating over pod claims. This had the unintended consequence that NodePrepareResource gets called unnecessarily when no container needs the claim. The more natural behavior is to skip unused resources. This enables (theoretic, at this time) use cases where some DRA driver relies on the controller part to influence scheduling, but then doesn't use CDI with containers.
Force-pushed from b76af8c to bde66bf (compare)
/approve
LGTM label has been added.

Git tree hash: 6dc1e459b08ddd3aba75ffadc196accbcd5fb4c6
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska, pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
If (for whatever reason) no container uses a claim, then there's no need to prepare it. We also need to test this.
Does this PR introduce a user-facing change?