Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dra: kubelet must skip NodePrepareResource if not used by any container #118786

Merged
merged 3 commits into from Jun 27, 2023

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jun 21, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

If (for whatever reason) no container uses a claim, then there's no need to prepare it. We also need to test this.

Does this PR introduce a user-facing change?

As in Kubernetes 1.26 and 1.27, resource claims do not get prepared by kubelet when no container uses them. This was changed accidentally in [v1.28.0-alpha.1](https://github.com/kubernetes/kubernetes/releases/tag/v1.28.0-alpha.1).

…y container"

If (for whatever reason) no container uses a claim, then there's no need to
prepare it.
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 21, 2023
@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/testing Categorizes an issue or PR as relevant to SIG Testing. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 21, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 21, 2023
@pohly
Copy link
Contributor Author

pohly commented Jun 21, 2023

This is a draft because it fails for me locally. kubelet calls NodePrepareResource.

/assign @bart0sh

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 21, 2023
@pohly pohly marked this pull request as ready for review June 21, 2023 14:52
@k8s-ci-robot k8s-ci-robot added area/kubelet and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 21, 2023
@pohly pohly changed the title e2e dra: add "kubelet must skip NodePrepareResource if not used by any container" dra: kubelet must skip NodePrepareResource if not used by any container Jun 21, 2023
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 21, 2023
@k8s-ci-robot k8s-ci-robot removed the release-note-none Denotes a PR that doesn't merit a release note. label Jun 21, 2023
@pohly
Copy link
Contributor Author

pohly commented Jun 21, 2023

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 21, 2023
@pohly
Copy link
Contributor Author

pohly commented Jun 21, 2023

/unassign @bart0sh
/assign @klueska

@k8s-ci-robot k8s-ci-robot assigned klueska and unassigned bart0sh Jun 21, 2023
@pohly
Copy link
Contributor Author

pohly commented Jun 21, 2023

/retest

@pacoxu pacoxu added this to Triage in SIG Node PR Triage Jun 24, 2023
Comment on lines +103 to +95
// If no container actually uses the claim, then we don't need
// to prepare it.
if !claimIsUsedByPod(podClaim, pod) {
klog.V(5).InfoS("Skipping unused resource", "claim", claimName, "pod", pod.Name)
continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems off about the placement of this check in this function.

I believe it still works, but it may unnecessarily delay the underyling NodeUnprepareResources() in certain scenarios.

The scenarios to consider are:

  1. This is the only pod to reference the claim
  2. This is the first pod to reference the claim (and more than one eventually reference it)
  3. This is not the first pod to reference the claim

In scenario 1, NodePrepareResources() will never be called (because of the new check), and NodeUnprepareResources() will also never be called (because a claimInfo object for it will never be created).

In scenario 2, NodePrepareResources() will not be called when the first pod is started (because of the new check), a reference to the pod will also not be added to the claimInfo object. Future calls to PrepareResources() will have their pods added to the claimInfo as appropriate and future calls to UnprepareResources() will only trigger an underlying NodeUnprepareResources() once the last pod reference in the claimInfo has called it.

In scenario 3, NodePrepareResources() will have already been trigged by a previous pod, and since our new check for !claimIsUsedByPod() is further down in the code, a reference to this pod will be added to the claimInfo object. Future calls to UnprepareResources() will only trigger an underlying NodeUnprepareResources() once the last pod reference in the claimInfo has called it (even if the only pod we are waiting for is one that would fail the !claimIsUsedByPod() check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed it here because I wanted to do the "is reserved for" check also when the claim is not actually used. We still want to do proper scheduling despite that, which means the claim must be reserved for the pod to run.

What I missed is the

			claimInfo.addPodReference(pod.UID)
			continue

above when the claim is already prepared.

I think that's actually another bug in the code: if the claim was prepared for some other pod and then this pod here does't have it reserved, it's allowed to run by kubelet because the "is reserved for" check gets skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this by moving the "is already prepared" check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. I believe the initial thinking was to exit early to avoid a call to the API server with ResourceV1alpha2().ResourceClaims(pod.Namespace).Get(), but I agree that checking reservedFor is more important than this minor optimization.

When a second pod wanted to use a claim, the obligatory sanity check whether
the pod is really allowed to use the claim ("reserved for") was skipped.
1aeec10 removed iterating over containers in favor of iterating over pod
claims. This had the unintended consequence that NodePrepareResource gets
called unnecessarily when no container needs the claim. The more natural
behavior is to skip unused resources. This enables (theoretic, at this time)
use cases where some DRA driver relies on the controller part to influence
scheduling, but then doesn't use CDI with containers.
@klueska
Copy link
Contributor

klueska commented Jun 27, 2023

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 27, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 6dc1e459b08ddd3aba75ffadc196accbcd5fb4c6

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 27, 2023
@k8s-ci-robot k8s-ci-robot merged commit b3d94ae into kubernetes:master Jun 27, 2023
14 of 15 checks passed
SIG Node PR Triage automation moved this from Triage to Done Jun 27, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants