New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubelet: prepare DRA resources before CNI setup #114364
kubelet: prepare DRA resources before CNI setup #114364
Conversation
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Thu Dec 8 11:56:23 UTC 2022. |
/sig node |
995927f
to
ea40383
Compare
/retest |
1 similar comment
/retest |
/assign @klueska |
@bart0sh: GitHub didn't allow me to request PR reviews from the following users: moshe010. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @moshe010 |
@bart0sh: GitHub didn't allow me to request PR reviews from the following users: moshe010. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Before committing to this approach, I would like to explore the option of adding code to tolerate temporary failures in the Pod Admission loop. That way the dra manager could move into this loop and sit alongside all other related managers. The |
/retest |
/retest |
/priority important-longterm |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, this change looks good. Just a few comments.
That said, as mentioned previously, we should still spend some cycles looking into doing this in the pod admission loop (like all other resource managers), rather than in the container creation step. The blocking factor at the moment being that the pod admission loop doesn't support transient failures when calling out to the kubelet plugins, but the container creation step does.
if cm.draManager != nil { | ||
return cm.draManager.PrepareResources(pod, container) | ||
} | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we didn't need this check against nil
previously, or was it an oversight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cm.draManager can't be nil when DynamicResourceAllocation
feature is enabled ,so this check is the same as if utilfeature.DefaultFeatureGate.Enabled(kubefeatures.DynamicResourceAllocation) {
.
I decided to check for nil to be consistent with the code in this file, e.g. with https://github.com/kubernetes/kubernetes/blob/release-1.26/pkg/kubelet/cm/container_manager_linux.go#L714
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked if DynamicResourceAllocation
feature is enabled instead. This is better as we'll hopefully need to remove this check when DRA is GA. It will be easier to find it.
if cm.draManager != nil { | ||
return cm.draManager.UnprepareResources(pod) | ||
} | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here
Agree, this needs to be done and I'm going to investigate it. |
58b54e7
to
c25222d
Compare
/retest |
1 similar comment
/retest |
c25222d
to
ce06591
Compare
@klueska fixed CI failures & rebased, PTAL |
ce06591
to
b01a9d0
Compare
/retest |
b01a9d0
to
55059e6
Compare
pkg/kubelet/kubelet.go
Outdated
} | ||
|
||
// UnprepareDynamicResources calls container Manager UnprepareDynamicResources API | ||
// This method implements RuntimeHelper interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This method implements RuntimeHelper interface | |
// This method implements the RuntimeHelper interface |
55059e6
to
4f88332
Compare
@bart0sh: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
@klueska thank you for the review! I've updated the PR according to your suggestions. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick turn-around on the reviews. Looks great now.
/lgtm
/approve
LGTM label has been added. Git tree hash: 5cd8e0f886c19847029ece8324230e80e7b426ff
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bart0sh, klueska The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
It calls DRA PrepareResources API before CNI is initialized to enable DRA usage for network devices.
Which issue(s) this PR fixes:
Fixes #113785
Special notes for your reviewer:
This is a modified version of the pohly#43 that keeps most of the logic in the container manager and adds only one call to the Kubelet just to be able to call container manager PrepareResources API.
Does this PR introduce a user-facing change?