Automated cherry pick of #120550: DRA: call plugins for claims even if exist in cache #121748

adrianchiris
Contributor

Cherry pick of #120550 on release-1.28.

#120550: DRA: call plugins for claims even if exist in cache

For details on the cherry pick process, see the cherry pick requests page.


Today, the DRA manager does not call the plugin's NodePrepareResource
for claims that it has already handled successfully, that is,
claims that are present in the cache (checkpoint), even if the
node has rebooted.

After a node reboot, the DRA plugin must be called
for resource claims so that plugins can prepare them
again in case the resources do not persist across a reboot.

To achieve that, once kubelet has started, we call DRA
plugins for each claim once, when a pod sandbox needs
to be created during PodSync.

Signed-off-by: adrianc <adrianc@nvidia.com>
adjust existing tests and add new test flows
to cover new DRA manager behaviour

Signed-off-by: adrianc <adrianc@nvidia.com>
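
The behaviour described in the commit message above can be illustrated with a minimal sketch. This is not the kubelet code from the PR: the `manager`, `claimInfo`, `plugin` and `countingPlugin` types and the in-memory `prepared` flag are hypothetical names chosen for illustration. The idea shown is that claims restored from the checkpoint start out as "not yet prepared in this kubelet run", so the plugin is called once more after a restart and then skipped on later syncs.

```go
package main

import "fmt"

// plugin is a hypothetical stand-in for a DRA plugin's NodePrepareResource call.
type plugin interface {
	NodePrepareResource(claimUID string) error
}

// claimInfo is a hypothetical cache entry. prepared lives in memory only,
// so it is false again after every (simulated) kubelet restart.
type claimInfo struct {
	claimUID string
	prepared bool
}

// manager is a hypothetical, heavily simplified DRA manager.
type manager struct {
	plugin plugin
	cache  map[string]*claimInfo
}

// newManager rebuilds the cache from checkpointed claim UIDs.
// Restored claims are deliberately marked as not prepared.
func newManager(p plugin, checkpointed []string) *manager {
	m := &manager{plugin: p, cache: map[string]*claimInfo{}}
	for _, uid := range checkpointed {
		m.cache[uid] = &claimInfo{claimUID: uid}
	}
	return m
}

// prepareResources is meant to run when a pod sandbox has to be created.
// It calls the plugin at most once per claim per manager lifetime.
func (m *manager) prepareResources(claimUIDs []string) error {
	for _, uid := range claimUIDs {
		info, ok := m.cache[uid]
		if !ok {
			info = &claimInfo{claimUID: uid}
			m.cache[uid] = info
		}
		if info.prepared {
			continue // already prepared since this kubelet start
		}
		if err := m.plugin.NodePrepareResource(uid); err != nil {
			return fmt.Errorf("prepare claim %s: %w", uid, err)
		}
		info.prepared = true
	}
	return nil
}

// countingPlugin records how often each claim was prepared.
type countingPlugin struct{ calls map[string]int }

func (c *countingPlugin) NodePrepareResource(uid string) error {
	c.calls[uid]++
	return nil
}

func main() {
	p := &countingPlugin{calls: map[string]int{}}

	m := newManager(p, []string{"claim-a"})
	_ = m.prepareResources([]string{"claim-a"})
	_ = m.prepareResources([]string{"claim-a"})
	fmt.Println(p.calls["claim-a"]) // prints 1

	// Simulated kubelet restart: the cache is rebuilt from the checkpoint.
	m = newManager(p, []string{"claim-a"})
	_ = m.prepareResources([]string{"claim-a"})
	fmt.Println(p.calls["claim-a"]) // prints 2
}
```

Keeping the flag out of the checkpoint is the design choice that matters in this sketch: persisting it would reproduce the old behaviour, where a claim found in the cache is never handed to the plugin again.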
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Nov 6, 2023
@k8s-ci-robot k8s-ci-robot added labels Nov 6, 2023:

  • do-not-merge/cherry-pick-not-approved: Indicates that a PR is not yet approved to merge into a release branch.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • do-not-merge/release-note-label-needed: Indicates that a PR should not merge because it's missing one of the release note labels.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
  • do-not-merge/needs-sig: Indicates an issue or PR lacks a `sig/foo` label and requires one.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • needs-priority: Indicates a PR lacks a `priority/foo` label and requires one.
@k8s-ci-robot k8s-ci-robot added and removed labels Nov 6, 2023:

  • added area/kubelet
  • added sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • removed do-not-merge/needs-sig: Indicates an issue or PR lacks a `sig/foo` label and requires one.
@adrianchiris
Contributor Author

@bart0sh @klueska PTAL

Going over the cherry-picks, I see that bugs in alpha features are not considered critical, so if this does not qualify for a backport, feel free to close this one.

Reasons to add it:

  • This fixes an important (IMO) issue with DRA, so we would like to have the fix in a released k8s version.
  • The fix is isolated to DRA, so the risk is low.

@adrianchiris
Contributor Author

/test pull-kubernetes-integration

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Nov 7, 2023
@bart0sh
Contributor

bart0sh commented Nov 7, 2023

/triage accepted
/priority important-soon
/kind bug
/release-note-none
/lgtm

@k8s-ci-robot k8s-ci-robot added and removed labels Nov 7, 2023:

  • added release-note-none: Denotes a PR that doesn't merit a release note.
  • added triage/accepted: Indicates an issue or PR is ready to be actively worked on.
  • added priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • added kind/bug: Categorizes issue or PR as related to a bug.
  • removed do-not-merge/release-note-label-needed: Indicates that a PR should not merge because it's missing one of the release note labels.
  • removed needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • removed needs-priority: Indicates a PR lacks a `priority/foo` label and requires one.
@k8s-ci-robot k8s-ci-robot added and removed labels Nov 7, 2023:

  • added lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • removed do-not-merge/needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 24592c7e926847b10141cfc75f98df854b4b0892

@bart0sh bart0sh moved this from Triage to Needs Approver in SIG Node PR Triage Nov 7, 2023
@bart0sh
Contributor

bart0sh commented Nov 7, 2023

/assign @klueska @mrunalp
for a SIG-node approval

@klueska
Contributor

klueska commented Nov 14, 2023

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2023
@adrianchiris
Contributor Author

/cc kubernetes/release-managers

@k8s-ci-robot k8s-ci-robot requested a review from a team December 3, 2023 12:35
@adrianchiris
Contributor Author

@bart0sh is there anything else required?

@bart0sh
Contributor

bart0sh commented Dec 13, 2023

@adrianchiris We're waiting for approval from the release managers, as far as I understand.

@saschagrunert
Member

How big is the risk that we introduce a regression here? It feels that we changed a non-trivial behavior, right?

@adrianchiris
Contributor Author

> How big is the risk that we introduce a regression here? It feels that we changed a non-trivial behavior, right?

Adding my POV:
I think the risk is acceptable: without this change, a pod will not have its DRA resources after a reboot.
The change is isolated to the DRA manager in kubelet, so it will not affect other flows.

The gist of the change:
instead of calling the DRA plugin once per resource ever (checkpointed, then never called again),
we call the DRA plugin once per resource after each kubelet start, and only if a pod that requests DRA resources needs to be prepared.

Note: DRA plugins are defined as idempotent, so calling them more than once with the same parameters should yield the same result.

@k8s-ci-robot k8s-ci-robot added and removed labels Dec 14, 2023:

  • added cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • removed do-not-merge/cherry-pick-not-approved: Indicates that a PR is not yet approved to merge into a release branch.
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianchiris, klueska, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a4527ec into kubernetes:release-1.28 Dec 14, 2023
14 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Dec 14, 2023
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/kubelet
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
6 participants