Skip to content

Conversation

@johnbelamaric
Copy link
Member

@johnbelamaric johnbelamaric commented Sep 24, 2024

  • One-line PR description: adding new KEP for Prioritized Alternatives in Device Requests
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 24, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 24, 2024
@johnbelamaric
Copy link
Member Author

/hold for all approvals

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2024
@johnbelamaric
Copy link
Member Author

cc @klueska @pohly

/assign @mrunalp

@sftim
Copy link

sftim commented Sep 25, 2024

@johnbelamaric can you redo the one-line PR description? It's a bit unclear.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 26, 2024
The proposal adds a new type, called `RankedDeviceRequest`, which allows the
user to list `DeviceRequest`s, exactly one of which must be satisfied. The
`DeviceClaim` then gets a new field listing all of these such requests that must
be satisfied. There is no change to the existing `DeviceRequest` type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is the opposite of how we envisioned to add this feature. Adding a new field means that old consumers acting on a DeviceClaim will not see that new field. Instead of detecting that they are out-dated and cannot handle the claim, they will treat it like an empty claim.

Instead, the plan was to extend DeviceRequest with a OneOf slice and leave the rest of the fields unset, thus treating it like a "one-of": either the other fields are set or the new OneOf, but never both. This depends on DeviceClassName being required and current consumers checking that it is set, which they should.

We could reuse DeviceRequest and define that the OneOf slice must be empty when the DeviceRequest is used inside a OneOf (no nesting). The DeviceRequest.Name is meaningful the same way as you describe it in the KEP for RankedDeviceRequest.Name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I had a similar train of thought but wasn't sure how we could handle it because of the current request fields being required. Are you saying we can relax that in validation? Then I think we can nest the current DeviceRequest fields all under a DeviceRequestDetail which we inline. Or just duplicate the fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying we can relax that in validation?

I think we can. It's the "organic" evolution of the API that we discussed previously.

Then I think we can nest the current DeviceRequest fields all under a DeviceRequestDetail which we inline. Or just duplicate the fields.

Of just use DeviceRequest with the constraint that the nested instances must use the fields, not the OneOf slice:

type DeviceRequest {
    ...
    // +optional
    // +oneOf=deviceRequestType
    DeviceClassName string
    
    ...
    //
    // OneOf may only be set in the entries of DeviceClaim.Requests. It must not be set
    // in DeviceRequest instances that themselves are part of a OneOf.
    //
    // +optional
    // +oneOf=deviceRequestType
    OneOf []DeviceRequest
}

Reusing DeviceRequest might make some of the implementation simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual validation code will have to ensure that either DeviceClassName or OneOf are set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change, I like it. It makes the change even smaller. The one thing I did differently is use the field name FirstOf instead of OneOf, which I think is more descriptive and also won't be confused with our other use of the term "one of".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref my review at #4871 (comment)
This is sort of a (non-discriminated) union, but one option is "inlined" or "unpacked".

Things I don't like:

  • the declarative validation work so far does not support this
  • it's unclear how to express this if ever a 3rd option arises
  • admin-access should only be at the "parent" but is present in the child
  • it's recursive but not REALLY (cognitive load for everyone who consumes it)

Adding a new field means that old consumers acting on a DeviceClaim will not see that new field. Instead of detecting that they are out-dated and cannot handle the claim, they will treat it like an empty claim.

kubernetes/kubernetes#125488 (comment)

We had a lot of discussion, and I could not find it all, but this seems relevant. If we are ever going to have "gizmos" then we have this same problem, right? A back-rev scheduler might not know how to hande Gizmos and treat it like they don't exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that's different, is this original proposal was not a "one-of". The RankedRequests were in addition to the ordinary Requests. So, an older client would see a partial, valid request and wouldn't know that it doesn't have all the information. To make it a one-of, we would force people to use either regular Requests OR RankedRequests in a given claim. This would work, because a RankedRequest with one entry is equivalent to a regular Request. It's just a little awkward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*would work, if empty claims were not considered valid, so we could differentiate

`DeviceClassName` was required, were requested to include this logic, and the
in-tree components have been built in this way.

```go
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the diff, if it's useful during review:

diff --git a/pkg/apis/resource/types.go b/pkg/apis/resource/types.go
index 2c03a304fdf..93b96c3e4c8 100644
--- a/pkg/apis/resource/types.go
+++ b/pkg/apis/resource/types.go
@@ -397,7 +397,11 @@ type DeviceRequest struct {
        // additional configuration and selectors to be inherited by this
        // request.
        //
-       // A class is required. Which classes are available depends on the cluster.
+       // Either a class or FirstOf requests are required in DeviceClaim.Requests. When
+       // this request is part of the FirstOf list, a class is required. Nested FirstOf
+       // requests are not allowed
+       //
+       // Which classes are available depends on the cluster.
        //
        // Administrators may use this to restrict which devices may get
        // requested by only installing classes with selectors for permitted
@@ -405,9 +409,19 @@ type DeviceRequest struct {
        // then administrators can create an empty DeviceClass for users
        // to reference.
        //
-       // +required
+       // +optional
+       // +oneOf=deviceRequestType
        DeviceClassName string
 
+       // FirstOf contains subrequests, exactly one of which must be satisfied
+       // in order to satisfy this request. This field may only be set in the
+       // entries of DeviceClaim.Requests. It must not be set in DeviceRequest
+       // instances that themselves are part of a FirstOf.
+       //
+       // +optional
+       // +oneOf=deviceRequestType
+       FirstOf []DeviceRequest
+
        // Selectors define criteria which must be satisfied by a specific
        // device in order for that device to be considered for this
        // request. All selectors must be satisfied for a device to be
@@ -457,7 +471,8 @@ type DeviceRequest struct {
 }
 
 const (
-       DeviceSelectorsMaxSize = 32
+       DeviceSelectorsMaxSize      = 32
+       FirstOfDeviceRequestMaxSize = 8
 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some alternatives: FirstAvailableFrom, FirstMatchFrom (not feeling strongly for any particular one but leaving them here to see if folks have thoughts..)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version if FirstAvailableOf but happy to change...but I think it can also be done at implementation time, we don't need the exact API now.

//
// +optional
// +oneOf=deviceRequestType
FirstOf []DeviceRequest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this call for a dedicated type that doesn't have this field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're enforcing that in validation. We could do it as a separate type, if that's the consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested reusing the type because ... I am lazy. I expect that there will be code which needs to work with both DeviceRequest and such a new type (for example, validation). Go generics doesn't help here because it doesn't support accessing common fields.

But I agree that a separate type is better because then the OpenAPI spec will not include an inner FirstOf which can never be set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be achieved with composition using the json inline tag, though. But that's an implementation detail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this would change the protobuf encoding. We want to support clients sending v1alpha3 as defined in 1.31 to simplify the transition and to support version skew (kubelet from 1.31, control plane at 1.32).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the actual Go type details be sorted out at implementation time? I'd love to use a composition with json inlining, but we'll need to verify that that doesn't cause issues.

used to circumvent quota. This reduces the usefulness of the feature, as it
means it will not serve as a quota management feature. However, the primary goal
of the feature is about flexibility across clusters and obtainability of
underlying devices, not quota management.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully kueue can help with that.
cc @kannon92

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbelamaric could we call out Kueue as a beta requirement? I’d really like to Kueue usecase in the requirements for DRA features.

WDYT @alculquicondor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kueue is not part of the Kubernetes core, so it wouldn't be reasonable to make it a beta requirement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Kueue is a great way to get feedback on how useful this feature is for future users?

Copy link
Contributor

@kannon92 kannon92 Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of this as the k8s projects dogfooding their own APIs (via k8s.sigs) so we can see what is useful or not. But I do take your point that Kueue is not in tree.

Consider including folks who also work outside the SIG or subproject.
-->

## Design Details
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some implementation details for scheduler.

cc @dom4ha

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a little description, let me know if it's useful.

Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
For PRR

of ways to satisfy the claim, with a preference ranking.
* Enable schedulers to evaluate those preferences and allocate devices for the
claim based on them.
* Enable cluster autoscalers to evaluate those preferences and make scaling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one might be a major change, given that cluster-autoscaler doesn't currently run kube-scheduler's score functions. But @MaciekPytel and @towca can weigh in.

Should implementation be a non-goal? or a beta requirement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this a beta criteria

Copy link

@towca towca Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, major change might be underselling it. Cluster Autoscaler has never took scheduler scoring into account, I think due to how differently it works on a high level.

Scheduler scores all/most Nodes that passed filters one-by-one for a single Pod, and just picks the highest score. When CA has to choose which option to scale up, each option likely has multiple Nodes, and each of the Nodes likely has multiple Pods "scheduled" to it in the CA simulation.

IMO integrating CA with scheduler scoring would need major effort on both ends. Scheduler scoring would need to somehow be generalized so that it can handle scoring multiple Nodes and Pods together. Then Cluster Autoscaler would need to integrate with the new mechanism. Depending on the design there'd be more work in one or the other, but either way it seems like a lot.

And to reiterate on something that we've brought up before - the scheduler scores are essentially meaningless in autoscaled clusters. Most of the time, scheduler only has one Node to choose from in such clusters, the one that Cluster Autoscaler provided.

I just want to make sure that we don't underestimate how hard/time-consuming it will be to integrate Cluster Autoscaler support on top of the proposed scheduler scoring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kuba, that's helpful context. We will have to decide as we move towards beta whether the effort is worth it or not.

today, it [throws an
error](https://github.com/kubernetes/kubernetes/blob/03f134461462f86239067ec20ec17a0ba892db52/staging/src/k8s.io/dynamic-resource-allocation/structured/allocator.go#L164)
when it encounters a claim with a missing `DeviceClassName`. Instead, here we
would check for entries in `FirstAvailableOf`, and add an additional loop,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there is one ResourceSlice (associated to a node) that only has the 2nd device of firstAvailableOf available, and another node that has the 1st device of firstAvailableOf available. Will the scheduler be instructed to choose the second node? If so, how?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be via scoring, which we do not currently implement for DRA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made scoring a beta criteria; for alpha, it would not prefer one node over the other. Maybe we'll try to do that in alpha but for now it's a beta criteria.

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 9, 2024
Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
for SIG Scheduling

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, johnbelamaric, jpbetz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


- Gather feedback
- Implement node scoring
- Cluster auto scaler implementation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of the discussion in the other thread, I think this should be "Evaluate the feasibility of an implementation in cluster-autoscaler". Otherwise I would see it unlikely to graduate this to beta in v1.33 as the current proposal intends.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@alculquicondor
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024
@kannon92
Copy link
Contributor

kannon92 commented Oct 9, 2024

I tried to find the person that held this PR but unsure where it is.

If this is ready to go, please unhold.

@klueska
Copy link
Contributor

klueska commented Oct 9, 2024

@johnbelamaric held it himself to avoid prematurely merging without PRR approval from someone other than himself:
#4871 (comment)

Comment on lines 483 to 488
DRA today works on a "first match" basis for a given node. That would not change
with this KEP. However, in order for the scheduler to prefer a node that has the
initial prioritized device request, those requests would need a higher score.
This will be implemented for beta. For alpha, the scheduler may still pick a
node with a less preferred device, if there are nodes with each type of device
available.
Copy link
Contributor

@klueska klueska Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify this a bit. The way it reads it seems like you are saying that today the list isn't actually prioritized. Meaning the scheduler is free to pick any of the possible choices, even if a device higher in the list is available on some node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will. But given how the scheduler works (evaluate fit and score each node), and the fact that nodes do not usually have heterogeneous devices, the priority here is pretty weak. Maybe we need scoring for beta? Even with scoring, it will be an influence but not a strict priority.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some clarification. But I am also open to making scoring part of alpha.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not having scoring for alpha is fine -- we will just need to be aware of this limitation in setups where it might matter. Where I actually see it being used the most though is to write ClaimTemplates that can be used in different environments with different types of GPUs without modification.

This is especially relevant for NIMs which have different variants, optimized for specific hardware configurations (e.g. 1 H100, 2 A100s, 4 T4s, etc.). Depending on what hardware they have been given, they will pull / run with a different model optimized for that specific hardware configuration.

With this feature in place, our NIM Operator can build a ResourceClaimTemplate with FirstAvailableOf requests for the GPU requirements of each optimized model of a given NIM. Then (depending on the available hardware) the scheduler will have the ability to place the NIM one the "best" node possible without having to put any custom logic for this in the operator itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 9, 2024
@klueska
Copy link
Contributor

klueska commented Oct 9, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024
@k8s-ci-robot k8s-ci-robot merged commit f98f715 into kubernetes:master Oct 9, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 9, 2024
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should not have merged without API review. I think the idea is fine, but I don't love the API as described.

types of devices for the same claim, and apply constraints and configuration
across those different requests.

The proposal adds a new field to the `DeviceRequest`, called `FirstAvailableOf`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"First" could imply "in this list" or "in time". This could maybe be Prefer or Preference ?

// +oneOf=deviceRequestType
FirstAvailableOf []DeviceRequest

...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do the Selectors, AllocationMode+Count, and AdminAccessfields mean whenFirstAvailableOf` is specified?

It looks to me like we're making the same mistakes that Service makes - sets of fields which are partly overlapping. IIRC we talked about how we would add one-of in the 1.31 API discussion, but I will be darned if I can find it amongst the hundreds of other comments.

From the clues we left ourselves (Patrick, I regret asking you to strike some of those notes :) I think we intended a second list, parallel to Devices.

Edit: I found it: kubernetes/kubernetes#125488 (comment)

So I would expect something like:

diff --git a/staging/src/k8s.io/api/resource/v1alpha3/types.go b/staging/src/k8s.io/api/resource/v1alpha3/types.go
index 298d8d1078e..beecbac9728 100644
--- a/staging/src/k8s.io/api/resource/v1alpha3/types.go
+++ b/staging/src/k8s.io/api/resource/v1alpha3/types.go
@@ -354,6 +354,10 @@ type DeviceClaim struct {
 	// +listType=atomic
 	Requests []DeviceRequest `json:"requests" protobuf:"bytes,1,name=requests"`
 
+	// PrioritizedRequests represent requests for devices which may be
+	// satisfied by more than one kind of device.
+	PrioritizedRequests []PrioritizedDeviceRequest
+
 	// These constraints must be satisfied by the set of devices that get
 	// allocated for the claim.
 	//
@@ -456,6 +460,10 @@ type DeviceRequest struct {
 	AdminAccess bool `json:"adminAccess,omitempty" protobuf:"bytes,6,opt,name=adminAccess"`
 }
 
+type PrioritizedDeviceRequest struct {
+	Options []DeviceRequest
+}
+
 const (
 	DeviceSelectorsMaxSize = 32
 )

But that's not quite right because Name and AdminAccess seem like they should be up-leveled or down-leveled, but not both?. We could extract the core:

diff --git a/staging/src/k8s.io/api/resource/v1alpha3/types.go b/staging/src/k8s.io/api/resource/v1alpha3/types.go
index 298d8d1078e..6d00a9531f1 100644
--- a/staging/src/k8s.io/api/resource/v1alpha3/types.go
+++ b/staging/src/k8s.io/api/resource/v1alpha3/types.go
@@ -354,6 +354,10 @@ type DeviceClaim struct {
 	// +listType=atomic
 	Requests []DeviceRequest `json:"requests" protobuf:"bytes,1,name=requests"`
 
+	// PrioritizedRequests represent requests for devices which may be
+	// satisfied by more than one kind of device.
+	PrioritizedRequests []PrioritizedDeviceRequest
+
 	// These constraints must be satisfied by the set of devices that get
 	// allocated for the claim.
 	//
@@ -393,6 +397,20 @@ type DeviceRequest struct {
 	// +required
 	Name string `json:"name" protobuf:"bytes,1,name=name"`
 
+	DeviceRequestSpec
+
+	// AdminAccess indicates that this is a claim for administrative access
+	// to the device(s). Claims with AdminAccess are expected to be used for
+	// monitoring or other management services for a device.  They ignore
+	// all ordinary claims to the device with respect to access modes and
+	// any resource allocations.
+	//
+	// +optional
+	// +default=false
+	AdminAccess bool `json:"adminAccess,omitempty" protobuf:"bytes,6,opt,name=adminAccess"`
+}
+
+type DeviceRequestSpec struct {
 	// DeviceClassName references a specific DeviceClass, which can define
 	// additional configuration and selectors to be inherited by this
 	// request.
@@ -444,15 +462,13 @@ type DeviceRequest struct {
 	// +optional
 	// +oneOf=AllocationMode
 	Count int64 `json:"count,omitempty" protobuf:"bytes,5,opt,name=count"`
+}
+
+type PrioritizedDeviceRequest struct {
+	Name string `json:"name" protobuf:"bytes,1,name=name"`
+
+	Options []DeviceRequestSpec
 
-	// AdminAccess indicates that this is a claim for administrative access
-	// to the device(s). Claims with AdminAccess are expected to be used for
-	// monitoring or other management services for a device.  They ignore
-	// all ordinary claims to the device with respect to access modes and
-	// any resource allocations.
-	//
-	// +optional
-	// +default=false
 	AdminAccess bool `json:"adminAccess,omitempty" protobuf:"bytes,6,opt,name=adminAccess"`
 }

Or we could do something we discussed and declined to do originally - make single-devices be a special case of priorities:

diff --git a/staging/src/k8s.io/api/resource/v1alpha3/types.go b/staging/src/k8s.io/api/resource/v1alpha3/types.go
index 298d8d1078e..f7d1c63c193 100644
--- a/staging/src/k8s.io/api/resource/v1alpha3/types.go
+++ b/staging/src/k8s.io/api/resource/v1alpha3/types.go
@@ -393,6 +393,20 @@ type DeviceRequest struct {
 	// +required
 	Name string `json:"name" protobuf:"bytes,1,name=name"`
 
+	Options []DeviceRequestSpec
+
+	// AdminAccess indicates that this is a claim for administrative access
+	// to the device(s). Claims with AdminAccess are expected to be used for
+	// monitoring or other management services for a device.  They ignore
+	// all ordinary claims to the device with respect to access modes and
+	// any resource allocations.
+	//
+	// +optional
+	// +default=false
+	AdminAccess bool `json:"adminAccess,omitempty" protobuf:"bytes,6,opt,name=adminAccess"`
+}
+
+type DeviceRequestSpec struct {
 	// DeviceClassName references a specific DeviceClass, which can define
 	// additional configuration and selectors to be inherited by this
 	// request.
@@ -444,16 +458,6 @@ type DeviceRequest struct {
 	// +optional
 	// +oneOf=AllocationMode
 	Count int64 `json:"count,omitempty" protobuf:"bytes,5,opt,name=count"`
-
-	// AdminAccess indicates that this is a claim for administrative access
-	// to the device(s). Claims with AdminAccess are expected to be used for
-	// monitoring or other management services for a device.  They ignore
-	// all ordinary claims to the device with respect to access modes and
-	// any resource allocations.
-	//
-	// +optional
-	// +default=false
-	AdminAccess bool `json:"adminAccess,omitempty" protobuf:"bytes,6,opt,name=adminAccess"`
 }
 
 const (

This last has impact on every user, which is why we didn't do it before, but it is SIMPLER.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original API version I pushed had RankedDeviceRequests that was a sibling to Requests in DeviceClaim, but @pohly was concerned about older clients treating it as an empty claim.

I do think it's cleaner not to have the one "unrolled" one-of along with the embedded list of one-ofs, which we have now and is sort of weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this KEP is merged, I will shift focus elsewhere, but I do want to redvisit this before code deadline. We CAN move things around between alpha and beta, as long as it can round-trip.

will be attached to the allocation. Otherwise, it will not.

Similarly, for `Constraints`, the list of requests can include the main request
name ("gpu" in this case), in which case the constraint applies regardless of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that "name" must be unique transitively, which will be much harder to express declaratively. Maybe we treat nested names as <parent>/<child> in references, so names only need to be unique within a single list?

Alternately, those places that allow a sub-request could grow a second field, making it very clear that sub-requests are not valid in pod-specs.

### Higher Level Indirection

Rather than embedding a list of alternative request objects, we could use an
indirection at either the `ResourceClaim` level, or the `DeviceClaim` level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unclear to lump these two (ResourceClaim and DeviceClaim) in the same alternative, and then only shoot down ResourceClaim. I think it's clear that ResourceClaim is the wrong place for this, but, as above, DeviceClaim could be.

A DeviceClaim is an expression which evaluates to one or more Devices which have some common constrainsts and config. FirstOfDevices is an expression which evaluates to one or more Devices, and seems like a valid sub-expression of a DeviceClaim.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.