
KEP-3063: dynamic resource allocation updates for 1.26 #3502

Merged

Conversation

pohly
Contributor

@pohly pohly commented Sep 12, 2022

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 12, 2022
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 12, 2022
@alculquicondor
Member

/cc

Having the scheduler and drivers exchange availability information through the
PodScheduling object has several advantages:
- users don't need to see the information
- a selected node and potential nodes automatically apply to
  all pending claims
- drivers can make holistic decisions about
  resource availability, for example when a pod
  requests two distinct GPUs but only some nodes have more
  than one or when there are interdependencies with
  other drivers
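
For orientation, here is a rough Go sketch of the PodScheduling shape this implies, pieced together from the fields referenced later in this thread (PodSchedulingSpec.SelectedNode, spec.potentialNodes, status claims with unsuitableNodes). Names and types are illustrative; the KEP text itself is authoritative.

```go
// Sketch only: standard object metadata (TypeMeta/ObjectMeta) omitted.

// PodScheduling carries the negotiation state between the scheduler and the
// resource drivers for one pending Pod and is owned by that Pod.
type PodScheduling struct {
	Spec   PodSchedulingSpec
	Status PodSchedulingStatus
}

type PodSchedulingSpec struct {
	// SelectedNode, when set by the scheduler, asks the drivers to attempt
	// allocation of all pending delayed-allocation claims for this node.
	SelectedNode string

	// PotentialNodes lists nodes that fit the Pod's other requirements.
	// Drivers respond with per-claim unsuitable nodes. The list size is
	// capped by a constant.
	PotentialNodes []string
}

type PodSchedulingStatus struct {
	// Claims has one entry per delayed-allocation claim used by the Pod.
	Claims []ResourceClaimSchedulingStatus
}

type ResourceClaimSchedulingStatus struct {
	// Name of the claim as used in the Pod.
	Name string
	// UnsuitableNodes lists nodes on which the driver cannot allocate
	// this claim.
	UnsuitableNodes []string
}
```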

Deallocate gets renamed to DeallocationRequested to make it describe the state
of the claim, not an imperative. The reason why it needs to remain in
ResourceClaimStatus is explained better.

Because the scheduler extender API has no support for Reserve and Unreserve,
the previous proposal for replacing usage of PodScheduling with webhook calls
is no longer applicable and would have to be extended. This may be feasible,
but is more complicated and is left out for now.
The API shouldn't dictate how drivers receive ResourceClass parameters.  It
might make sense to use namespaced objects, perhaps because then cleaning up a
deployment is easier, or an existing object can be reused (like a ConfigMap for
the test driver).
@pohly pohly force-pushed the dynamic-resource-allocation-upstream branch from a60476e to e69ccae Compare September 22, 2022 16:55
…er references

Previously, empty string and nil both meant the same thing (core API group). We
don't need two different ways of expressing that.
Using APIVersion in the reference leads to ambiguities when the versions
change. Using just APIGroup is better. The examples accidentally used
"apiVersion".

The naming convention (see
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#naming-of-the-reference-field)
is to have Ref as the field suffix. In our case it also leaves a path towards
(perhaps) inlining the parameters at some point, because the inlined field could
then be called just "Parameters".
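
As a minimal sketch of what such a reference could look like after this change (the type and field names here are illustrative, not the final API):

```go
// ResourceClassParametersReference points to an arbitrary, possibly
// namespaced object that holds driver-specific class parameters. Only the
// API group is recorded, not the version, to avoid ambiguities when
// versions change; the field using it carries the "Ref" suffix per the
// API conventions linked above (e.g. ParametersRef).
type ResourceClassParametersReference struct {
	// APIGroup of the referenced object. Empty for the core API group.
	APIGroup string
	// Kind of the referenced object, e.g. "ConfigMap" for the test driver.
	Kind string
	// Name of the referenced object.
	Name string
	// Namespace, if the referenced object is namespaced.
	Namespace string
}
```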
A separate staging repo will be cleaner than reusing component-helpers.
@@ -985,7 +997,8 @@ a certain resource class, a node selector can be specified in that class. That
selector is static and typically will use labels that determine which nodes may
have resources available.

-To gather information about the current state of resource availability, the
+To gather information about the current state of resource availability and to
+trigger allocation of a claim, the
scheduler creates a PodScheduling object. That object is owned by the pod and
Member

clarify if it's one per pod or one per resource claim that the pod has?

Contributor Author

"one PodScheduling object for the pod" - will add.

Contributor Author

Done.

* **resource driver** sets `podScheduling.claims[name=name of claim in pod].unsuitableNodes`
* **resource driver** clears `claim.status.selectedNode` -> next attempt by scheduler has more information and is more likely to succeed
* **scheduler** creates or updates a `PodScheduling` object with `podScheduling.spec.potentialNodes=<nodes that fit the pod>`
* if *exactly one claim is pending* or *all drivers have provided information*:
Member

why exactly one claim?

Contributor Author

Because in that special case it is safe to trigger the allocation: if the node is suitable, the allocation will succeed and the pod can get scheduled without further delays. If the node is not suitable, allocation fails and the next attempt can do better because it has more information.

The same should not be done when there are multiple claims because allocation might succeed for some, but not all of them, which would force the scheduler to recover by asking for deallocation. It's better to wait for information in this case.

Contributor Author

Added to the KEP.
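
A purely illustrative sketch of that decision rule as the claim plugin might implement it (the helper name and its inputs are hypothetical):

```go
// shouldTriggerAllocation reports whether it is safe to set
// PodSchedulingSpec.SelectedNode and thereby trigger allocation.
func shouldTriggerAllocation(pendingClaims []string, driverHasAnswered map[string]bool) bool {
	// Safe case 1: exactly one claim is pending. If the node turns out to be
	// unsuitable, only that single allocation fails and the next scheduling
	// attempt has more information.
	if len(pendingClaims) == 1 {
		return true
	}
	// Safe case 2: all drivers have provided their UnsuitableNodes
	// information, so allocating all claims for the selected node is
	// unlikely to succeed only partially.
	for _, claim := range pendingClaims {
		if !driverHasAnswered[claim] {
			return false
		}
	}
	return true
}
```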

@@ -1635,7 +1641,7 @@ conditions apply:
Filter

One of the ResourceClaims satisfying these criteria is picked randomly and deallocation
-is requested by setting the Deallocate field. The scheduler then needs to wait
+is requested by setting the ResourceClaimStatus.DeallocationRequested field. The scheduler then needs to wait
Member

I still think it's better if the driver decides for itself when to deallocate.

For example, if it's a local resource, it might never deallocate from a node. Otherwise, it could start deallocating from a node when reservedFor is empty, perhaps even with a grace period.

Now, if we add a grace period, kube-scheduler might want to give a signal that it "thinks" that a deallocation could help. This signal could work this way:

  • the scheduler clears the selectedNode field.
  • the driver observes:
    • there is a Pod with an empty PodScheduling.spec.selectedNode that requires this allocated claim. Note that this is different from a Pod that was never attempted for scheduling because the PodScheduling object wouldn't exist at this time.
    • the claim is not reserved for any other pod

Then the driver could choose to trigger deallocation.

Two advantages:

  • The scheduler only updates one object.
  • The driver could be more pro-active.

Contributor Author

I still think it's better if the driver decides for itself when to deallocate.

For example, if it's a local resource, it might never deallocate from a node. Otherwise, it could start deallocating from a node when reservedFor is empty, perhaps even with a grace period.

A driver should never deallocate a resource that has a non-empty reservedFor. It cannot be sure that the resource is not in use or about to be used.

To make this work, the scheduler would first have to clear the reservedFor to indicate that deallocating the claim is okay. That's doable.

But that alone puts the claim into "allocated, available" state. That's not a state where the driver pro-actively frees the claim because it doesn't know whether the allocation is still needed. For example, claims with immediate allocation enter that state before a pod gets created which uses them.

So we need some additional indicator.

Now, if we add a grace period, kube-scheduler might want to give a signal that it "thinks" that a deallocation could help.
This signal could work this way:

* the scheduler clears the `selectedNode` field.

* the driver observes:
  
  * there is a Pod with an empty PodScheduling.spec.selectedNode that requires this allocated claim. Note that this is different from a Pod that was never attempted for scheduling because the PodScheduling object wouldn't exist at this time.
  * the claim is not reserved for any other pod

Then the driver could choose to trigger deallocation.

That would work, but there's a race condition:

  • The driver observes the above state and starts deallocation.
  • Another pod gets created which can use the same claim.
  • The pod gets added to reservedFor and gets scheduled to a node.
  • The driver finishes the deallocation and the resource is not available for the pod anymore.

This race can be fixed by adding a ClaimStatus.Deallocating boolean that the driver would have to set before starting the deallocation.

I think that would work. Setting reservedFor together with allocation becomes more important, but that's already done. Does it sound better to you?

Two advantages:

* The scheduler only updates one object.

Agreed.

* The driver could be more pro-active.

No, I think it still has to be reactive. But even that is an improvement.

Member

I can't tell if there was some compromise found here? How do we end up in this situation? Is it something like:

  • pod with 2 claims
  • driver-1 allocates and only available on nodes (a, b, c)
  • driver-2 allocates and only available on nodes (x, y, z)?

That shouldn't happen, right?

I guess that could play out as:

  • pod with 2 claims
  • scheduler suggests (a, b, c)
  • both drivers ACK
  • scheduler chooses a
  • driver-1 allocates and only available on nodes (a, b, c)
  • driver-2 allocates and only available on nodes (a, b, c)
  • Pod tries to run and resource 2 is no longer available on a?

Help me understand how we end up here?

Contributor Author

This discussion has been superseded by #3502 (comment) below. After thinking about this some more, I realized that @alculquicondor's proposal depended on not releasing a reservation (because doing so would trigger deallocation, according to the definition above), but not releasing a reservation after a failed pod scheduling attempt leads to deadlocks or poor resource utilization. I gave examples for both below.

To answer your specific example: you are right, both drivers should allocate so that the claims are usable on the same node. The problem is not with where claims are usable, but rather which pod is allowed to use a claim exclusively when claims cannot be shared.

Contributor Author

I think @alculquicondor accepted my line of argumentation because he added his LGTM after we discussed this.

Member

The problem is not with where claims are usable, but rather which pod is allowed to use a claim exclusively when claims cannot be shared

Not needed for this review, but can you illuminate an example time-sequence that leads to that?

@@ -1635,7 +1641,7 @@ conditions apply:
Filter

One of the ResourceClaims satisfying these criteria is picked randomly and deallocation
-is requested by setting the Deallocate field. The scheduler then needs to wait
+is requested by setting the ResourceClaimStatus.DeallocationRequested field. The scheduler then needs to wait
for the resource driver to react to that change and deallocate the resource.

This may make it possible to run the Pod
Member

What will be the order of this plugin compared to preemption?

Contributor Author

@pohly pohly Sep 30, 2022

This operation here is triggered by observing that it was some allocated claim which prevented pod scheduling. I think this deallocate+wait then should happen before preemption is attempted.

Would preemption be triggered at all in this case? As far as the scheduler knows, all nodes were rejected by the dynamic resource allocation plugin. Why should it evict pods from any of them when those pods were not the reason that the new pod cannot run?

Member

Why should it evict pods from any of them when those pods were not the reason that the new pod cannot run?

What if one of these pods is using the ResourceClaim that the new pod needs?

Contributor Author

@pohly pohly Oct 6, 2022

I missed this comment earlier. The answer is that taking away a claim from a pod is not supported yet, even if a more important pod is blocked by a less important pod. Support for this will have to be planned and added later.

-// A change of the PotentialNodes field in the PodScheduling object
-// triggers a check in the driver
+// A change of the PodSchedulingSpec.PotentialNodes field and/or a failed
+// allocation attempt trigger a check in the driver
Member

Add that UnsuitableNodes should be built from the union of PotentialNodes and the old list of UnsuitableNodes.

If we only take into account PotentialNodes, there could be some back-and-forth between the scheduler and the driver where the scheduler removes UnsuitableNodes from the next PotentialNodes, then in the next attempt the scheduler adds those nodes back because it thinks they are no longer unsuitable.

Contributor Author

Agreed. I had the same thought while working on the implementation.

Contributor Author

Updated. I also included an explicit constant for the maximum size of these lists.
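
A hedged sketch of how a driver could build that union while respecting the maximum list size (the constant name and its value are placeholders; the KEP defines the real ones):

```go
// PodSchedulingNodeListMaxSize caps PotentialNodes and UnsuitableNodes.
// Placeholder value; the KEP/API defines the actual constant.
const PodSchedulingNodeListMaxSize = 128

// mergeUnsuitableNodes returns the union of the previously reported
// unsuitable nodes and the ones detected for the current PotentialNodes,
// truncated to the maximum list size. Keeping the old entries avoids the
// scheduler/driver back-and-forth described above.
func mergeUnsuitableNodes(oldUnsuitable, newlyUnsuitable []string) []string {
	seen := make(map[string]bool, len(oldUnsuitable)+len(newlyUnsuitable))
	merged := make([]string, 0, len(oldUnsuitable)+len(newlyUnsuitable))
	for _, list := range [][]string{oldUnsuitable, newlyUnsuitable} {
		for _, node := range list {
			if !seen[node] {
				seen[node] = true
				merged = append(merged, node)
			}
		}
	}
	if len(merged) > PodSchedulingNodeListMaxSize {
		merged = merged[:PodSchedulingNodeListMaxSize]
	}
	return merged
}
```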

#### Reserve

A node has been chosen for the Pod.

If using delayed allocation and the resource has not been allocated yet,
-the SelectedNode field of the ResourceClaim
+the PodSchedulingSpec.SelectedNode field
Member

You could do a single API call with the data for PotentialNodes and SelectedNode.

You could store the necessary information in memory in PreScore.

Contributor Author

I think the implementation already does that. Let me double-check and update the KEP accordingly.

Contributor Author

Done.


#### Unreserve

-The scheduler removes the Pod from the ReservedFor field because it cannot be scheduled after
+The scheduler removes the Pod from the ResourceClaimStatus.ReservedFor field because it cannot be scheduled after
Member

Maybe remove the SelectedNode instead?

Contributor Author

Doing only that prevents other pods from using the claim. If one pod is potentially stuck while waiting for some other resource, it shouldn't hold onto claims that might be used by some other pod in the meantime.

This leads to a bigger question of how long those claims should be kept in the allocated state. The underlying resources might be needed for some other claim. Let's be optimistic and assume that most of the time, allocated claims will be put to good use quickly, okay?

We can still tune this once people have some real-world experience with this.

Contributor Author

Hmm, after re-reading what we said above about triggering deallocating by clearing the reservedFor field it becomes clear that the scheduler must keep the claim reserved here. Otherwise we'll have a lot of thrashing for a pod with several claims:

  • several drivers start allocating for the selectedNode
  • most allocations succeed, one doesn't
  • reservedFor gets cleared, selectedNode unset
  • all drivers start deallocating
  • once that is done, allocation is attempted anew for all claims

I think the same argument about being optimistic also applies to reservedFor: let's keep it set as you suggested and hope that the situation gets resolved quickly or (even better) doesn't occur often.

Contributor Author

After sleeping over this I realized that this approach with keeping claims reserved has a drawback.

Suppose there are two separate claims and two pods both referencing them. Both pods get scheduled concurrently. One pod manages to allocate and reserve one claim; the other pod does the same for the other claim. Now both pods are stuck because each only has one of the two claims it needs.

This can happen both for claims with delayed allocation and with immediate allocation.

To solve this, we would need additional logic somewhere that looks at multiple different pods to detect such a deadlock and then resolve it. I don't know where to put that.

Perhaps the original proposal with giving up reservations and deallocation triggered explicitly is better after all? It relies on chance and repeated attempts to get one of the two pods scheduled, i.e. there's no coordination either, but it's not needed because eventually one pod should succeed.

Contributor Author

I've added this to the KEP as explanation why Unreserve must remove the pod.

Contributor Author

It's not just the risk of deadlocks. Consider claims X and Y, pod A referencing X and Y, pod B referencing X, and pod C referencing Y. If A reserves X and C reserves Y, then only pod C can run. A waits for C to finish and release Y, and B waits for A.

If A releases X right after the failed scheduling attempt, then both B and C can run.

pohly and others added 2 commits September 30, 2022 19:39
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
Member

@alculquicondor alculquicondor left a comment

I think we are missing a step where the pod says "I want to reserve this resource for me" (probably somewhere in a spec). Then the driver responds by setting .status.reservedFor (instead of the scheduler doing it).

If there is such a mechanism, I think the race condition you describe wouldn't exist: the driver wouldn't deallocate and reserve for a pod at the same time.

because it cannot be scheduled after all.

This is necessary to prevent a deadlock: suppose there are two stand-alone
claims that only can be used by one pod at a time and two pods which both
Member

what if this resource can be shared by two pods (but not three)? What is validating that only two are reserving it?

Contributor Author

We no longer have a usage count in the API (it was there in an earlier draft), now a claim is either shared (= unlimited number of pods) or single-use (= one pod). But it doesn't matter, in all of these cases the apiserver checks this when validating a ResourceClaimStatus update.

Contributor Author

@pohly pohly Oct 3, 2022

Perhaps it's worth calling out a key design aspect of this KEP: all relevant information that is needed to ensure that a claim doesn't enter some invalid state is stored in the claim object itself. That is why the apiserver can validate it and why state updates are atomic.

Additional objects like PodScheduling can trigger operations like allocating a claim, but even then it is the claim object that records "allocation in progress" (through the driver's finalizer) and not that other object or some internal state of the driver.

"deallocation pending" is another such state that needs to be visible in the claim object. The obvious one is "claim has DeletionTimestamp", but for claims that need to go from "allocated" to "not allocated" instead of "removed" we need something else.

We cannot just reduce it to one case (= delete claim) because the scheduler does not control the lifecycle of the claim.
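
To make those states concrete, here is an illustrative subset of the claim status fields discussed in this thread; the placeholder types at the end only keep the sketch self-contained, and the KEP has the real definitions.

```go
// Illustrative subset of ResourceClaimStatus as discussed in this thread.
type ResourceClaimStatus struct {
	// DriverName of the resource driver handling this claim.
	DriverName string

	// Allocation is set by the driver once the resource is allocated.
	// "Allocation in progress" is not a field; it is implied by the
	// driver's finalizer on the claim object.
	Allocation *AllocationResult

	// ReservedFor lists the entities (typically Pods) that may use the
	// claim. The apiserver validates updates, e.g. that a claim which is
	// not shareable never has more than one entry. The list has a
	// constant maximum size.
	ReservedFor []ResourceClaimConsumerReference

	// DeallocationRequested tells the driver to deallocate the claim.
	// It describes a state of the claim rather than an imperative,
	// hence the rename from "Deallocate".
	DeallocationRequested bool
}

// Placeholders so this sketch is self-contained; see the KEP (and the
// commit message further down) for the real definitions.
type AllocationResult struct{}
type ResourceClaimConsumerReference struct{}
```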

@pohly
Contributor Author

pohly commented Oct 3, 2022

I think we are missing a step where the pod says "I want to reserve this resource for me" (probably somewhere in a spec). Then the driver responds by setting .status.reservedFor (instead of the scheduler doing it).

The driver only gets to see that a pod is interested in a claim when doing delayed allocation. In that case it sees the PodScheduling object. We already decided that it then can and should set the reservedFor together with doing the allocation.

But for an already allocated claim the driver is not involved anymore. Serializing all operations on a claim through the driver controller seems more complicated (we'll need additional APIs for it) and risky from a performance point of view to me.

@pohly
Contributor Author

pohly commented Oct 3, 2022

I'd also like to point out that conceptually, other entities can also be users of a claim. The main intention is for pods, but changing the design so that it only works for pods (for example, by relying on PodScheduling as the API for getting a pod added to reservedFor) might turn out to be a missed opportunity later on.

The implementation already moved to a separate type. OwnerReference had
several fields that were not needed. The maximum size needs to be a constant
because various consumers of the API will need to check it.
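
A sketch of what the slimmed-down reference type and its size constant might look like, based on this commit message; the names and the value 32 are assumptions, not the merged API:

```go
// ResourceClaimReservedForMaxSize caps ResourceClaimStatus.ReservedFor.
// It has to be a constant because the scheduler, the drivers, and apiserver
// validation all need to check it. The value here is a placeholder.
const ResourceClaimReservedForMaxSize = 32

// ResourceClaimConsumerReference identifies one user of a claim. Compared
// to metav1.OwnerReference it drops fields that are not needed here
// (APIVersion, Controller, BlockOwnerDeletion).
type ResourceClaimConsumerReference struct {
	// APIGroup of the consumer; empty for the core API group (e.g. Pods).
	APIGroup string
	// Resource is the lowercase plural resource of the consumer, e.g. "pods".
	Resource string
	// Name of the consumer.
	Name string
	// UID of the consumer, to guard against reuse of the same name by a
	// different object.
	UID string
}
```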
@pohly pohly force-pushed the dynamic-resource-allocation-upstream branch from 6a0937a to 2bfcb9e Compare October 4, 2022 09:23
It makes sense to start with scalability testing early, so for beta we should
already have a test that seems realistic. For GA we can then extend that based
on user feedback.
Member

@alculquicondor alculquicondor left a comment

I'm not fully convinced of the API fields used for scheduling. In particular, I'm not keen on the idea of kube-scheduler modifying the ResourceClaim status, competing with the driver. I believe that kube-scheduler should only be writing to the PodScheduling object.

However, since this KEP is relying on new alpha APIs, rather than modifying the stable PodSpec, we can still backtrack in a follow up release.

So, lgtm from sig-scheduling for alpha.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 5, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2022
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Needs Approver in SIG Node PR Triage Oct 5, 2022
@derekwaynecarr
Member

@thockin @dchen1107 do you want to take a pass at this too? At a high level it's fine. The original draft asserts changes may be needed for resource quota, but resource quota can do object count quota on generic CRDs, so I don't see that we need a specific `resourceclaims` token for use in quota.

Member

@thockin thockin left a comment

I took another read through this and flagged a few questions. Mostly, it remains the most anxiety-inducing KEP of all time, and I can't help but throw myself at the idea that maybe there's some fundamental invariant we can assert which will make this dramatically simpler. I have not found it yet.

@@ -1030,19 +1041,17 @@ else changes in the system, like for example deleting objects.
* **scheduler** filters nodes
Member

"...based on built-in resources" ?

Contributor Author

Yes, and other plugins (like the volume binder for storage capacity). I'll add that.

gets set here and the scheduling attempt gets stopped for now. It will be
retried when the ResourceClaim status changes.
If using delayed allocation and one or more claims have not been allocated yet,
the plugin now needs to decide whether it wants to trigger allocation by
Member

I thought only scheduler sets SelectedNode, but this says Plugin?

Contributor Author

The scheduler plugin. Elsewhere I have only said "the scheduler", but that meant the same thing because the core scheduler doesn't know anything about claims - everything related to those happens inside the scheduler plugin.

I'll make this more consistent and stick with "the scheduler".

Contributor Author

After looking at some other sections I found that "the plugin does xyz" was already used, so I changed my mind: in this chapter, let's use "the scheduler" for the core scheduler logic and "the claim plugin" for anything related to claims.

// needed.
// PodScheduling objects get created by a scheduler when it handles
// a pod which uses one or more unallocated ResourceClaims with delayed
// allocation.
type PodScheduling struct {
Member

Please name this more specifically, e.g. from the name I expected it to be an explanation of the scheduler's decision or something.

Contributor Author

Any suggestions? I know that the current usage is just for claims, but calling it PodClaimScheduling would become inappropriate as soon as we find other usages for this object, which is a possibility.

On the other hand this is only alpha. We can name it PodClaimScheduling now and still rename it later.

@thockin : you suggested that name. What do you think?

Member

My issue with the name and the current api type is that it looks hard to expand this into other plausible use cases for a pod scheduling object.

Contributor Author

Can we keep this as-is for now and make a final decision during the API review of the implementation?

I don't want to miss the KEP deadline because of a naming discussion.

Member

Yes, APIs are never final until then.

Is there an explanation in this KEP why the scheduler has to do this reach-out to drivers? I am reading from the top and haven't figured it out yet.

Contributor Author

@pohly pohly Oct 6, 2022

Because the scheduler and the generic claim plugin inside it have no idea what the claim parameters mean and where a claim might be allocated. That entire logic is provided by the drivers.

The "Coordinating resource allocation through the scheduler" section is meant to explain this, but I probably lack this particular detail where it says "a node is selected tentatively by the scheduler".

I've updated this section to:

For delayed allocation, a node is selected tentatively by the scheduler
in an iterative process where the scheduler suggests some potential nodes
that fit the other resource requirements of a Pod and resource drivers
respond with information about whether they can allocate claims for those
nodes. This exchange of information happens through the `PodScheduling`
object for a Pod. The scheduler has to involve the drivers because it
doesn't know what claim parameters mean and where suitable resources are
currently available.

Once the scheduler is confident that it has enough information to select
a node that will probably work for all claims, it asks the driver(s) to
allocate their resources for that node. If that
succeeds, the Pod can get scheduled. If it fails, the scheduler must
determine whether some other node fits the requirements and if so,
request allocation again. If no node fits because some resources were
already allocated for a node and are only usable there, then those
resources must be released and then get allocated elsewhere.
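
As an illustrative sketch of the node-selection step described in this updated text (the function and its inputs are hypothetical; the real scheduler plugin logic is more involved):

```go
// pickNode returns a node from potentialNodes that no driver has marked as
// unsuitable for any of the pod's claims, or "" if there is none yet.
// unsuitableByClaim maps claim name -> the unsuitableNodes reported in the
// PodScheduling status.
func pickNode(potentialNodes []string, unsuitableByClaim map[string][]string) string {
	unsuitable := make(map[string]bool)
	for _, nodes := range unsuitableByClaim {
		for _, node := range nodes {
			unsuitable[node] = true
		}
	}
	for _, node := range potentialNodes {
		if !unsuitable[node] {
			return node
		}
	}
	// No candidate left: an already allocated claim may be pinning the pod
	// to nodes that do not work for the other claims; in that case the
	// scheduler has to request deallocation (DeallocationRequested).
	return ""
}
```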

"The scheduler" is now the core scheduler logic and "the claim plugin" is the
new scheduler plugin which handles ResourceClaims.
The "Coordinating resource allocation through the scheduler" section needs to
explain some of the basic design decisions better.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
@pohly
Contributor Author

pohly commented Oct 6, 2022

@lavalamp, @thockin: I have pushed an update which may address some of the points you raised. Please take another look.

@thockin
Member

thockin commented Oct 6, 2022

I am LGTMing based on @alculquicondor's previous LGTM

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
@pohly
Contributor Author

pohly commented Oct 6, 2022

/hold cancel

Because Tim had a look.

@derekwaynecarr: I think you can approve now. My understanding is that @lavalamp is okay with not settling all API-related questions before the KEP merge.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2022
@lavalamp
Member

lavalamp commented Oct 6, 2022

My understanding is that lavalamp is ok with not settling all API-related questions before the KEP merge.

Yeah, that's what API reviews are for.

(As for the KEP, I don't understand it well enough to either approve or block it)

@thockin
Member

thockin commented Oct 6, 2022

Also, this PR is a refinement on an already merged KEP. Not that we can't go back over it with the finest of fine-toothed combs, but I don't think that should block this refinement.

@thockin
Member

thockin commented Oct 6, 2022

oh, it needs a sig node owner.

@dchen1107
Member

/lgtm
/approve

Even though there are still some open questions, we can continue those discussions during the alpha implementation. I approved it to move forward. Thanks.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, pohly, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 7, 2022
@k8s-ci-robot k8s-ci-robot merged commit af902dc into kubernetes:master Oct 7, 2022
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 7, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Oct 7, 2022
ahmedtd pushed a commit to ahmedtd/enhancements that referenced this pull request Feb 2, 2023