RFC: dra: support skipping NodePrepareResource and NodeUnprepareResource #118203

pohly · 2023-05-23T10:53:51Z

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This enables writing DRA drivers where the part local to a node isn't using CDI and CRI. For example, an NRI plugin might act upon the allocated resource.

Special notes for your reviewer:

Using a new special value for the DriverName is simple. However, like any alternative (for example, a separate field) it will cause problems when the kubelet handling the pod does not know about it: then the kubelet will try to find a non-existent "nop.k8s.io" plugin and fail to prepare.

Does this PR introduce a user-facing change?

DRA: when using the new kubelet on a node, allocated ResourceClaims can skip the local prepare/unprepare operations and thus run without a DRA driver on the node. This enables additional use cases for DRA beyond the traditional injection of resources into the container.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3063

This enables writing DRA drivers where the part local to a node isn't using CDI and CRI. For example, an NRI plugin might act upon the allocated resource. Using a new special value for the DriverName is simple. However, like any alternative (for example, a separate field) it will cause problems when the kubelet handling the pod does not know about it: then the kubelet will try to find a non-existent "nop.k8s.io" plugin and fail to prepare.

k8s-ci-robot · 2023-05-23T10:53:53Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot · 2023-05-23T10:54:00Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly · 2023-05-23T11:22:24Z

pkg/apis/resource/types.go

+	//
+	// The special "nop.k8s.io" constant may be used here. This tells the
+	// kubelet that it does no resource driver is required when starting or
+	// stopping containers that use this resource.


The alternative that I had considered was something like:

Usage []string

where the empty set is the default behavior (call plugin). The same could also be specified explicitly with usage: [ cdi ].

usage: [ nop ] could indicate "do nothing".

If we ever do something with CNI, usage: [ cni ] could be added, perhaps even in combination (usage: [ cdi, cni ]).

I got stuck a bit when wondering whether all strings need to be pre-defined. Probably, otherwise we have a problem when a future release adds a new value that an old kubelet doesn't support. We want that old kubelet to fail when encountering an unknown value instead of silently ignoring it.

kad · 2023-05-23T14:36:23Z

NRI doesn't know anything about the high level objects like claims or allocation. To pass information to NRI something on kubelet should get info from claim object and convert it to something that will be passed down to runtime.
So, even were the cdi/cni ways of using might be feasible (I'd prefer to call it more container/pod resources rather than cdi/cni), but for full skip of kubelet part it doesn't make sense.

kad · 2023-05-23T14:40:26Z

/cc

k8s-ci-robot · 2023-05-23T16:54:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mimani68, pohly
Once this PR has been reviewed and has the lgtm label, please assign liggitt for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alexzielenski · 2023-06-06T20:11:20Z

/remove-sig api-machinery
/sig scheduling

bart0sh · 2023-06-20T10:02:33Z

@pohly do we still need this PR? If not, please consider to close it, thanks.

pohly · 2023-06-20T14:32:18Z

There is an alternative that should work without any changes: just declaring pod.spec.resourceClaims without referencing them in pod.spec.containers[].resources.claims should also cause kubelet to not call NodePrepareResource. I had suggested that to some colleagues, but haven't heard back.

Before we close this PR, let me add an E2E test for this and a KEP update which documents this.

pohly · 2023-06-21T08:50:00Z

See #118786

pohly · 2023-06-29T09:18:19Z

/close

We agreed that just declaring pod.spec.resourceClaims without referencing them in pod.spec.containers[].resources.claims should work, and the pending PR restores that behavior for Kubernetes 1.28.

k8s-ci-robot · 2023-06-29T09:18:25Z

@pohly: Closed this PR.

In response to this:

/close

We agreed that just declaring pod.spec.resourceClaims without referencing them in pod.spec.containers[].resources.claims should work, and the pending PR restores that behavior for Kubernetes 1.28.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot requested review from bart0sh and feiskyer May 23, 2023 10:55

aojea mentioned this pull request May 23, 2023

Proposal / Discussion: gRPC Interface For CNI containernetworking/cni#785

Open

pohly commented May 23, 2023

View reviewed changes

k8s-ci-robot requested a review from kad May 23, 2023 14:40

mimani68 approved these changes May 23, 2023

View reviewed changes

SergeyKanzhelev added this to Triage in SIG Node PR Triage May 24, 2023

bart0sh moved this from Triage to WIP in SIG Node PR Triage May 24, 2023

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Jun 6, 2023

k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jun 6, 2023

k8s-ci-robot closed this Jun 29, 2023

SIG Node PR Triage automation moved this from WIP to Done Jun 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: dra: support skipping NodePrepareResource and NodeUnprepareResource #118203

RFC: dra: support skipping NodePrepareResource and NodeUnprepareResource #118203

pohly commented May 23, 2023

k8s-ci-robot commented May 23, 2023

k8s-ci-robot commented May 23, 2023

pohly May 23, 2023

kad commented May 23, 2023

kad commented May 23, 2023

k8s-ci-robot commented May 23, 2023

alexzielenski commented Jun 6, 2023

bart0sh commented Jun 20, 2023

pohly commented Jun 20, 2023

pohly commented Jun 21, 2023

pohly commented Jun 29, 2023

k8s-ci-robot commented Jun 29, 2023

RFC: dra: support skipping NodePrepareResource and NodeUnprepareResource #118203

RFC: dra: support skipping NodePrepareResource and NodeUnprepareResource #118203

Conversation

pohly commented May 23, 2023

What type of PR is this?

What this PR does / why we need it:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot commented May 23, 2023

k8s-ci-robot commented May 23, 2023

pohly May 23, 2023

Choose a reason for hiding this comment

kad commented May 23, 2023

kad commented May 23, 2023

k8s-ci-robot commented May 23, 2023

alexzielenski commented Jun 6, 2023

bart0sh commented Jun 20, 2023

pohly commented Jun 20, 2023

pohly commented Jun 21, 2023

pohly commented Jun 29, 2023

k8s-ci-robot commented Jun 29, 2023