
KEP: Support for preemption based on flavor order #810

Merged

Conversation

KunWuLuan
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #582

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 25, 2023

@k8s-ci-robot
Contributor

Hi @KunWuLuan. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 25, 2023
@tenzen-y
Member

/release-note-none
/ok-to-test

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 25, 2023
@alculquicondor
Contributor

cc @trasc for comment


We will first try `default-flavor1` for cpu and memory resources. If `default-flavor1` doesn't fit, we try to preempt in `default-flavor1`. If the preemptor reports that preemption failed, we try to allocate resources in `default-flavor2`.

We will pass the preemptor to the flavorAssigner and let the flavorAssigner call a public interface of the preemptor to get preemption candidates for a given podSet in the current flavor whenever the podSet cannot find enough resources in that flavor.
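To make that flow concrete, here is a minimal, self-contained sketch of the intended loop; every name in it (`tryFlavors`, `Requests`, the injected predicates) is illustrative and not the PR's actual code:

```go
package main

import "fmt"

// Requests is a simplified stand-in for a podSet's resource requests.
type Requests map[string]int64

// tryFlavors walks the flavors in order: first try to fit, then try to
// preempt within the flavor, and only then move on to the next flavor.
// The fits and canPreempt predicates are injected to keep the sketch
// self-contained.
func tryFlavors(flavors []string, req Requests, fits, canPreempt func(flavor string, req Requests) bool) (string, bool) {
	for _, flv := range flavors {
		if fits(flv, req) {
			return flv, true // enough free quota in this flavor
		}
		if canPreempt(flv, req) {
			return flv, true // admit by preempting within this flavor
		}
		// Neither fit nor preemption worked; fall through to the next flavor.
	}
	return "", false
}

func main() {
	order := []string{"default-flavor1", "default-flavor2"}
	req := Requests{"cpu": 2}
	flv, ok := tryFlavors(order, req,
		func(f string, _ Requests) bool { return f == "default-flavor2" }, // only flavor2 has room
		func(string, Requests) bool { return false },                      // preemption always fails here
	)
	fmt.Println(flv, ok) // prints: default-flavor2 true
}
```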
Contributor

How will the preemptor know which flavors are wanted for each call?

Just as a note: currently, the preemptor depends on the flavorassigner, not the other way around.

Contributor Author
@KunWuLuan May 26, 2023

Currently, the preemptor finds the resources that need preemption in the `Do` func:

```go
func (p *Preemptor) Do(ctx context.Context, wl workload.Info, assignment flavorassigner.Assignment, snapshot *cache.Snapshot) (int, error) {
	log := ctrl.LoggerFrom(ctx)
	resPerFlv := resourcesRequiringPreemption(assignment)
	cq := snapshot.ClusterQueues[wl.ClusterQueue]
	// ...
}
```

resPerFlv contains the flavor info and the resource names. We can modify the preemptor to accept resPerFlv as an argument directly; then the preemptor will know the flavor.

We can achieve this by moving `resourcesRequiringPreemption` and `totalRequestsForAssignment` out of the preemptor.
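For illustration only, such a refactor could look roughly like the sketch below; the stub types and the exact signature are assumptions, not the actual change:

```go
package preemption

import "context"

// Stub types standing in for kueue's real workload.Info and
// cache.Snapshot, so this sketch is self-contained.
type workloadInfo struct{ ClusterQueue string }
type snapshot struct{}

// resourcesPerFlavor maps a flavor name to the set of resource names
// that need preemption in that flavor.
type resourcesPerFlavor map[string]map[string]struct{}

type Preemptor struct{}

// Do receives resPerFlv directly (computed by the caller via the
// moved-out resourcesRequiringPreemption), so the preemptor knows which
// flavors the flavor assigner wants preemption in.
func (p *Preemptor) Do(ctx context.Context, wl workloadInfo, resPerFlv resourcesPerFlavor, snap *snapshot) (int, error) {
	// ... select candidates only among workloads using the flavors in resPerFlv ...
	return 0, nil
}
```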

Contributor

Please update based on #806


We will pass the preemptor to the flavorAssigner and let the flavorAssigner call a public interface of the preemptor to get preemption candidates for a given podSet in the current flavor whenever the podSet cannot find enough resources in that flavor.

### Test Plan
Contributor

Once we have some confidence over the API and algorithm design, please fill this up.

Contributor Author

Okay, I'll do that.

gpu: 1
```

We will first try `default-flavor1` for cpu and memory resources. If `default-flavor1` doesn't fit, we try to preempt in `default-flavor1`. If the preemptor reports that preemption failed, we try to allocate resources in `default-flavor2`.
Member
@tenzen-y May 30, 2023

If `default-flavor1` doesn't fit, we try to preempt in `default-flavor1`.

In this situation, what happens if we can preempt in `default-flavor1` but doing so only frees up resources for 2 pods of the higher-priority podSets?

Do we preempt in `default-flavor1` only if we can free up resources for all pods? Or do we preempt even if we can only free up resources for some of the pods?

Contributor Author

Currently we only allow one kind of resource to use one flavor (because right now there is only one flavor assignment per resource).
So I think in this PR we should preempt in `default-flavor1` only if we can free up resources for all pods.

Member

because right now there is only one flavor assignment per resource

You're right. We should probably treat this situation as a separate feature.

@alculquicondor @kerthcet Do you think creating an issue for this feature (AssignMultipleFlavorsToWorkload?) is worth it?

Contributor

The preemption logic should not change.
With #806, we can check whether preemption could find enough candidates.
If it's not possible, then we go on to the next flavor.

Do you think creating an issue for this feature (AssignMultipleFlavorsToWorkload?) is worth it?

I don't think we should do this. In most scenarios, this could cause pods to land in very different places, with potentially higher network latency. Also, we can't enforce it, as we wouldn't be able to set node selectors for a subset of pods.

Member

I don't think we should do this. In most scenarios, this could cause pods to land in very different places, with potentially higher network latency. Also, we can't enforce it, as we wouldn't be able to set node selectors for a subset of pods.

That makes sense.

@alculquicondor
Contributor

@KunWuLuan would you mind adding an extension interface proposal for this KEP?

}
```

The length of FlavorFungibility should be either 0 or 3; this means we don't allow users to omit any of the three actions. If FlavorFungibility is empty, we set it to `Borrow,TryNextFlavor,Preempt`. Actions after `TryNextFlavor` are executed after all flavors have been considered, and actions before `TryNextFlavor` are executed before the next flavor is considered.
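As an illustration of this ordered-action shape, here is a sketch of the API and its validation rule; the type and constant names are assumptions, and the review comments below show the API was still under discussion:

```go
package v1beta1

import "fmt"

// FlavorFungibilityAction is one step of the per-flavor admission strategy.
type FlavorFungibilityAction string

const (
	Borrow        FlavorFungibilityAction = "Borrow"
	Preempt       FlavorFungibilityAction = "Preempt"
	TryNextFlavor FlavorFungibilityAction = "TryNextFlavor"
)

// FlavorFungibility lists all three actions in execution order. Actions
// before TryNextFlavor run within the current flavor; actions after it
// run once every flavor has been considered.
type FlavorFungibility []FlavorFungibilityAction

// validateFlavorFungibility enforces the "length 0 or 3" rule; an empty
// list is later defaulted to Borrow,TryNextFlavor,Preempt.
func validateFlavorFungibility(ff FlavorFungibility) error {
	if l := len(ff); l != 0 && l != 3 {
		return fmt.Errorf("flavorFungibility must be empty or list all three actions, got %d entries", l)
	}
	return nil
}
```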
Contributor

I like the idea that all the actions should be listed. However, I'm not aware of any k8s API that looks like this.

I started a thread on slack https://kubernetes.slack.com/archives/CJUQN3E4T/p1686161408595329

Are you sure all permutations are valid? For example, if we try to Borrow before Preempting, there could be priority inversion, similar to this: #763 (comment)

Somehow TryNextFlavor is special. Should it be something like the following?

```yaml
beforeTryingNextFlavor: ["Preempt", "Borrow"]
```

Contributor Author

Currently we will borrow unused quota from other cluster queues before we try preempting.
Do you mean we will change this behavior?
If we can say that preemption always takes precedence over borrowing, I think plan A is better.

Contributor
@alculquicondor left a comment

We were initially targeting the release of 0.4 by the end of June. I think we might be too late for a feature like this for which we don't have a clear API yet.

We could still make progress and target 0.5 for this feature. WDYT?

@KunWuLuan
Contributor Author

would you mind adding an extension interface proposal for this KEP?

Do you mean

We were initially targeting the release of 0.4 by the end of June. I think we might be too late for a feature like this for which we don't have a clear API yet.

We could still make progress and target 0.5 for this feature. WDYT?

No problem. I think more discussion leads to a better design.

@alculquicondor
Contributor

cc @mwielgus

instances. If high priority jobs can preempt jobs in standard instances before trying spot instances,
stability can be achieved.

My use case can be supported by setting `flavorFungibility` to `BeforeNextFlavor` in the Kueue configuration.
Contributor

This sentence looks outdated. The current proposal is doing it in the ClusterQueue spec, right?

Contributor

Still outdated

}
```

If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.
Contributor

looks outdated

Contributor

Still outdated

```

If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.

Contributor

Explain what the default policy is; it should match the current behavior.

IIUC, the current behavior is:

```yaml
whenCanBorrow: Borrow
whenCanPreempt: TryNextFlavor
```
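A minimal sketch of that defaulting, assuming the two-field API described in this comment (field names and values mirror the comment above, not necessarily the merged API):

```go
package config

// FlavorFungibility mirrors the struct-style API sketched in this
// comment; the field types are assumptions for illustration.
type FlavorFungibility struct {
	WhenCanBorrow  string
	WhenCanPreempt string
}

// setDefaults makes an unset config match the current behavior: borrow
// within the current flavor when possible, but try the next flavor
// before resorting to preemption.
func setDefaults(ff *FlavorFungibility) {
	if ff.WhenCanBorrow == "" {
		ff.WhenCanBorrow = "Borrow"
	}
	if ff.WhenCanPreempt == "" {
		ff.WhenCanPreempt = "TryNextFlavor"
	}
}
```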


If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.

### Behavior Changes
Contributor

What are the semantics with respect to Partial Admission?

}
```

We will store the scheduling context in the workload info so that we can start from where we stopped in previous scheduling attempts. This is useful to avoid wasting time on a flavor in which we repeatedly try to preempt and fail. The scheduling context will contain the `LastScheduledFlavorIdx`, the `ResourceFlavors` attached to the CQ, the `ClusterQueueUsage` of the CQ, and the `CohortUsage`. Any change to these properties will lead to scheduling from the first flavor again.
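For illustration, the stored context could look roughly like this; the struct name comes from the signature quoted later in this thread, while the stub field types are assumptions:

```go
package workload

// Stub types standing in for kueue's real flavor references and usage maps.
type (
	ResourceFlavorReference  string
	FlavorResourceQuantities map[ResourceFlavorReference]map[string]int64
)

// LastScheduleClusterQueueState remembers where the previous scheduling
// attempt stopped so the next attempt can resume at that flavor instead
// of repeatedly retrying flavors whose preemption already failed. Any
// change to ResourceFlavors, ClusterQueueUsage, or CohortUsage
// invalidates LastScheduledFlavorIdx, and scheduling restarts from the
// first flavor.
type LastScheduleClusterQueueState struct {
	LastScheduledFlavorIdx int
	ResourceFlavors        []ResourceFlavorReference
	ClusterQueueUsage      FlavorResourceQuantities
	CohortUsage            FlavorResourceQuantities
}
```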
Contributor

Some implementation detail: if the flavors in the CQ change, we should wipe the stored context. Maybe we can keep a hash to detect the changes.
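A hedged sketch of that hashing idea, reusing the stub `ResourceFlavorReference` type from the sketch above (the helper name is hypothetical):

```go
package workload

import "hash/fnv"

// flavorsHash returns a stable hash of the CQ's flavor list. Comparing
// it with the hash stored alongside the scheduling context is a cheap
// way to detect that the flavors changed and the context must be wiped.
func flavorsHash(flavors []ResourceFlavorReference) uint64 {
	h := fnv.New64a()
	for _, f := range flavors {
		h.Write([]byte(f))
		h.Write([]byte{0}) // separator so ["a","bc"] differs from ["ab","c"]
	}
	return h.Sum64()
}
```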

### Implementation

```go
func assignFlavors(log logr.Logger, requests []workload.PodSetResources, podSets []kueue.PodSet, resourceFlavors map[kueue.ResourceFlavorReference]*kueue.ResourceFlavor, cq *cache.ClusterQueue, lastSchedule *workload.LastScheduleClusterQueueState) Assignment {
	// ...
}
```
Contributor

Remove the code here.

@KunWuLuan
Contributor Author

Work continues in #849.

@KunWuLuan KunWuLuan closed this Aug 31, 2023
@alculquicondor
Contributor

We should still get the accompanying KEP merged. Could you bring it up to date?

@KunWuLuan
Contributor Author

We should still get the accompanying KEP merged. Could you bring it up to date?

Sorry, I closed this by mistake. I have updated the KEP.

@alculquicondor
Contributor

Can you run `make toc-update`?

instances. If high priority jobs can preempt jobs in standard instances before trying spot instances,
stability can be achieved.

My use case can be supported by setting `flavorFungibility` to `BeforeNextFlavor` in the Kueue configuration.
Contributor

Still outdated

}
```

If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.
Contributor

Still outdated

@KunWuLuan
Contributor Author

The README is updated and all comments have been resolved.

@alculquicondor
Contributor

/approve
Please run `make toc-update` again.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, KunWuLuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2023
@alculquicondor
Contributor

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 6, 2023
@KunWuLuan
Contributor Author

I have run `make toc-update`.

@alculquicondor
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2023
@alculquicondor
Contributor

/retest
due to #1090

@k8s-ci-robot k8s-ci-robot merged commit 1a7e294 into kubernetes-sigs:main Sep 8, 2023
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.5 milestone Sep 8, 2023
@KunWuLuan KunWuLuan deleted the kep/preemptionFlavorOrder branch September 12, 2023 03:00