dra: kubelet must skip NodePrepareResource if not used by any container #118786
Conversation
"dra: kubelet must skip NodePrepareResource if not used by any container"

If (for whatever reason) no container uses a claim, then there's no need to prepare it.
Skipping CI for Draft Pull Request.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a draft because it fails for me locally. kubelet calls …

/assign @bart0sh

/kind bug

/retest
```go
// If no container actually uses the claim, then we don't need
// to prepare it.
if !claimIsUsedByPod(podClaim, pod) {
	klog.V(5).InfoS("Skipping unused resource", "claim", claimName, "pod", pod.Name)
	continue
}
```
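For readers following along, here is a minimal sketch of what a helper like claimIsUsedByPod could look like. This is an assumed shape rather than the actual kubelet source; v1 refers to k8s.io/api/core/v1.

```go
import v1 "k8s.io/api/core/v1"

// claimIsUsedByPod reports whether any init container or regular container
// in the pod references the given pod-level resource claim by name.
// Sketch only; the real kubelet helper may differ.
func claimIsUsedByPod(podClaim *v1.PodResourceClaim, pod *v1.Pod) bool {
	return claimIsUsedByContainers(podClaim, pod.Spec.InitContainers) ||
		claimIsUsedByContainers(podClaim, pod.Spec.Containers)
}

func claimIsUsedByContainers(podClaim *v1.PodResourceClaim, containers []v1.Container) bool {
	for i := range containers {
		// container.Resources.Claims entries refer back to
		// pod.Spec.ResourceClaims by name.
		for _, c := range containers[i].Resources.Claims {
			if c.Name == podClaim.Name {
				return true
			}
		}
	}
	return false
}
```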
Something seems off about the placement of this check in this function. I believe it still works, but it may unnecessarily delay the underlying NodeUnprepareResources() call in certain scenarios.
The scenarios to consider are:
1. This is the only pod to reference the claim
2. This is the first pod to reference the claim (and more than one eventually reference it)
3. This is not the first pod to reference the claim
In scenario 1, NodePrepareResources() will never be called (because of the new check), and NodeUnprepareResources() will also never be called (because a claimInfo object for it will never be created).

In scenario 2, NodePrepareResources() will not be called when the first pod is started (because of the new check), and a reference to the pod will also not be added to the claimInfo object. Future calls to PrepareResources() will have their pods added to the claimInfo as appropriate, and future calls to UnprepareResources() will only trigger an underlying NodeUnprepareResources() once the last pod reference in the claimInfo has called it.

In scenario 3, NodePrepareResources() will have already been triggered by a previous pod, and since our new check for !claimIsUsedByPod() is further down in the code, a reference to this pod will be added to the claimInfo object. Future calls to UnprepareResources() will only trigger an underlying NodeUnprepareResources() once the last pod reference in the claimInfo has called it (even if the only pod we are waiting for is one that would fail the !claimIsUsedByPod() check).
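To make the reference counting in these scenarios concrete, here is a self-contained toy model. The names mirror the discussion (claimInfo, addPodReference) but are simplified stand-ins, not the actual kubelet implementation.

```go
package main

import "fmt"

// claimInfo tracks which pods currently reference a prepared claim
// (simplified stand-in for the kubelet's bookkeeping object).
type claimInfo struct {
	claimName string
	podUIDs   map[string]struct{}
}

func (c *claimInfo) addPodReference(uid string)    { c.podUIDs[uid] = struct{}{} }
func (c *claimInfo) deletePodReference(uid string) { delete(c.podUIDs, uid) }

// unprepare drops one pod reference and only "calls the driver" once the
// last reference is gone, mirroring the NodeUnprepareResources behavior
// described above.
func unprepare(c *claimInfo, uid string) {
	c.deletePodReference(uid)
	if len(c.podUIDs) == 0 {
		fmt.Println("NodeUnprepareResources called for", c.claimName)
	}
}

func main() {
	c := &claimInfo{claimName: "claim-a", podUIDs: map[string]struct{}{}}
	c.addPodReference("pod-1")
	c.addPodReference("pod-2")
	unprepare(c, "pod-1") // no driver call yet, pod-2 still holds a reference
	unprepare(c, "pod-2") // last reference gone, driver call happens now
}
```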
I placed it here because I wanted to do the "is reserved for" check also when the claim is not actually used. We still want to do proper scheduling despite that, which means the claim must be reserved for the pod to run.

What I missed is the

```go
claimInfo.addPodReference(pod.UID)
continue
```

above when the claim is already prepared. I think that's actually another bug in the code: if the claim was prepared for some other pod and this pod here doesn't have it reserved, it's allowed to run by kubelet because the "is reserved for" check gets skipped.
I fixed this by moving the "is already prepared" check.
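A hedged sketch of what the reordered loop might look like: resolveClaimName, isReservedForPod, and cache are hypothetical stand-ins, error handling is trimmed, so this is illustrative rather than the actual patch.

```go
for i := range pod.Spec.ResourceClaims {
	podClaim := &pod.Spec.ResourceClaims[i]
	claimName := resolveClaimName(pod, podClaim) // hypothetical helper

	// The "reserved for" check now runs unconditionally, before any early
	// exit, so a pod without a reservation can no longer slip through via
	// an already-prepared claim.
	claim, err := kubeClient.ResourceV1alpha2().ResourceClaims(pod.Namespace).Get(ctx, claimName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if !isReservedForPod(pod, claim) { // hypothetical: scans claim.Status.ReservedFor for pod.UID
		return fmt.Errorf("claim %s is not reserved for pod %s", claimName, pod.Name)
	}

	// New check from this PR: nothing to prepare if no container uses the claim.
	if !claimIsUsedByPod(podClaim, pod) {
		continue
	}

	// The early exit for already-prepared claims comes last.
	if info := cache.get(claimName, pod.Namespace); info != nil {
		info.addPodReference(pod.UID)
		continue
	}

	// ... call NodePrepareResources for the new claim ...
}
```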
Seems reasonable. I believe the initial thinking was to exit early to avoid a call to the API server with ResourceV1alpha2().ResourceClaims(pod.Namespace).Get(), but I agree that checking reservedFor is more important than this minor optimization.
When a second pod wanted to use a claim, the obligatory sanity check whether the pod is really allowed to use the claim ("reserved for") was skipped.
1aeec10 removed iterating over containers in favor of iterating over pod claims. This had the unintended consequence that NodePrepareResource gets called unnecessarily when no container needs the claim. The more natural behavior is to skip unused resources. This enables (theoretic, at this time) use cases where some DRA driver relies on the controller part to influence scheduling, but then doesn't use CDI with containers.
Force-pushed from b76af8c to bde66bf (compare)
/approve
LGTM label has been added.

Git tree hash: 6dc1e459b08ddd3aba75ffadc196accbcd5fb4c6
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska, pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
If (for whatever reason) no container uses a claim, then there's no need to prepare it. We also need to test this.
Does this PR introduce a user-facing change?