KEP-693: MultiKueue #1380

mwielgus · 2023-11-29T11:49:54Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Introduce MultiCluster support in Kueue.

Which issue(s) this PR fixes:

Fixes #693

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

netlify · 2023-11-29T11:50:00Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `e8c208c` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/658dedb1c39dd70008011116 |

mwielgus · 2023-11-29T11:50:12Z

cc: @trasc

keps/693-multikueue/README.md

keps/693-multikueue/kep.yaml

keps/693-multikueue/README.md

alculquicondor · 2023-11-29T20:25:44Z

cc @nstogner @ahaysx

alculquicondor · 2023-11-30T21:03:55Z

cc @dejanzele

keps/693-multikueue/README.md

trasc

I'll do another pass after I'm done with the delegate -> multikueue rename.

keps/693-multikueue/README.md

trasc · 2023-12-06T10:57:28Z

keps/693-multikueue/README.md

+Then it will remove the workloads from the remaining clusters and allow the
+single instance of the job to proceed. The workload will be also admitted in 
+the management cluster.


Suggested change

-Then it will remove the workloads from the remaining clusters and allow the
-single instance of the job to proceed. The workload will be also admitted in
-the management cluster.
+Then it will remove the workloads from the remaining remote clusters and allow the
+single instance of the job to proceed. The local workload will get the admission check set to retry in order to free the local quota.

Why set the admission check to retry? There is no local quota in the management cluster.
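
For readers following this thread, the "first admission wins" behavior described in the quoted KEP text can be sketched roughly as below. This is a minimal illustration in Python, not Kueue's implementation: `RemoteCluster`, `pick_winner`, and `settle` are hypothetical names, and the sketch assumes the first admitting cluster in list order is kept. It deliberately leaves out what happens to the local workload, since that is the point under discussion.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteCluster:
    """Hypothetical stand-in for a worker cluster; not a Kueue type."""
    name: str
    workloads: set = field(default_factory=set)  # workload copies present here
    admitted: set = field(default_factory=set)   # copies that were admitted here


def pick_winner(workload, clusters):
    """Return the first cluster (in list order) that admitted the workload, else None."""
    for cluster in clusters:
        if workload in cluster.admitted:
            return cluster
    return None


def settle(workload, clusters):
    """Keep the workload only on the admitting cluster and delete the other copies."""
    winner = pick_winner(workload, clusters)
    if winner is None:
        return None  # nobody admitted it yet: leave every copy in place
    for cluster in clusters:
        if cluster is not winner:
            cluster.workloads.discard(workload)
            cluster.admitted.discard(workload)
    return winner.name


clusters = [
    RemoteCluster("worker-a", {"job-1"}),
    RemoteCluster("worker-b", {"job-1"}, {"job-1"}),
    RemoteCluster("worker-c", {"job-1"}),
]
print(settle("job-1", clusters))
print([sorted(c.workloads) for c in clusters])
```

In this toy run only `worker-b` admitted `job-1`, so the copies on the other two clusters are removed and a single instance remains.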

keps/693-multikueue/README.md

dejanzele · 2023-12-07T20:01:22Z

How will you handle a worker cluster failing for some reason (network issues, the cluster going down, ...)? Will there be some sort of job lease, with jobs periodically reporting back that they are still executing?

keps/693-multikueue/README.md

alculquicondor

/approve

Just incorporate some thoughts based on the last comments from @mimowo.

@tenzen-y anything to add?

keps/693-multikueue/README.md

tenzen-y · 2023-12-28T11:10:44Z

I can review this KEP within this week.

alculquicondor · 2023-12-28T14:02:44Z

Thanks
/lgtm
/hold for @tenzen-y

k8s-ci-robot · 2023-12-28T14:02:52Z

LGTM label has been added.

Git tree hash: 62b43f0304ec7815182c32fcebbfecfd87bdb00e

tenzen-y

@mwielgus Can a management cluster serve concurrently in the role of a worker cluster?
We can imagine the following situation:

cluster A: Management And Worker cluster
cluster B: Worker cluster

keps/693-multikueue/README.md

tenzen-y · 2023-12-28T20:58:37Z

keps/693-multikueue/README.md

+When the job is running MultiKueue controller will copy its status from worker cluster
+to the management cluster, to keep the impression that the job is running in the management 
+cluster. This is needed to allow pipelines and workflow engines to execute against 
+the management cluster with MultiKueue without any extra changes. 
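
The status copying described in this hunk can be pictured with a small sketch. It is an illustration under assumptions, not the MultiKueue controller: jobs are modeled as plain dictionaries, and `mirror_status` is a hypothetical helper that overwrites the management-cluster copy's `status` with the worker-cluster copy's `status` whenever they differ.

```python
import copy


def mirror_status(local_job, remote_job):
    """Copy the remote job's status onto the local job.

    Returns True if the local copy changed, False if it was already in sync.
    Both arguments are plain dicts standing in for API objects (an assumption).
    """
    if local_job.get("status") == remote_job.get("status"):
        return False
    # Deep copy so later changes to the remote object do not leak into the local one.
    local_job["status"] = copy.deepcopy(remote_job.get("status", {}))
    return True


local = {"metadata": {"name": "job-1"}, "status": {}}
remote = {"metadata": {"name": "job-1"}, "status": {"active": 2, "succeeded": 1}}

print(mirror_status(local, remote))  # first sync copies the status
print(mirror_status(local, remote))  # second sync finds nothing to do
print(local["status"])
```

A client that only watches the management cluster, such as a workflow engine, would then see the job progressing as if it ran there.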


What happens when the kueue manager loses connectivity to the worker cluster after some workloads are admitted?

In particular, what happens with preemption and waitForPodsReady?

We assume the total loss of the cluster and all admitted workloads are suspended/requeued. Once the cluster is reconnected, we remove duplicated admitted workloads just as if two of them were admitted at the same time.
Added to the doc.
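
The handling described in this reply can be sketched as follows. This is a rough model under stated assumptions, not Kueue code: the management cluster's bookkeeping is a dictionary from workload name to the cluster it is admitted on, and `on_cluster_lost` and `on_cluster_reconnected` are hypothetical helpers. The reply does not say which copy survives a duplicate, so the sketch assumes the assignment currently recorded by the management cluster is kept, and that a surviving copy with no other assignment is adopted.

```python
def on_cluster_lost(admitted_on, lost):
    """Treat the cluster as gone: drop its assignments and return the workloads to requeue."""
    requeued = sorted(w for w, c in admitted_on.items() if c == lost)
    for workload in requeued:
        del admitted_on[workload]
    return requeued


def on_cluster_reconnected(admitted_on, cluster, still_admitted_there):
    """Return the workloads whose copy on the reconnected cluster is a duplicate."""
    duplicates = []
    for workload in sorted(still_admitted_there):
        current = admitted_on.get(workload)
        if current is None:
            # Assumption: nothing else admitted it meanwhile, so adopt this copy.
            admitted_on[workload] = cluster
        elif current != cluster:
            # Admitted elsewhere in the meantime: the reconnected copy is a duplicate.
            duplicates.append(workload)
    return duplicates


admitted_on = {"job-1": "worker-a", "job-2": "worker-b", "job-3": "worker-a"}
print(on_cluster_lost(admitted_on, "worker-a"))

admitted_on["job-1"] = "worker-b"  # job-1 was requeued and admitted elsewhere
print(on_cluster_reconnected(admitted_on, "worker-a", {"job-1", "job-3"}))
print(admitted_on)
```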

If preemption targets exist in the cluster that lost its connection, what happens?
The Kueue scheduler will keep trying to preempt those targets forever, right?

I read the updated doc and now understand what happens in the above situation.

tenzen-y · 2023-12-28T21:34:15Z

> @mwielgus Can a management cluster serve concurrently in the role of a worker cluster? We can imagine the following situation:
>
> cluster A: Management And Worker cluster
> cluster B: Worker cluster

@mwielgus I'm only waiting for an answer to this.
If I missed a mention of it somewhere, please let me know.

mwielgus · 2023-12-28T21:38:22Z

@tenzen-y Yes, such a configuration will be possible in the future, once we establish kubernetes/enhancements#4370 as a universal standard for selectively disabling controllers for other API/CRD objects. Right now the only option for CRDs in the management cluster is to install the API definitions without their controllers, which prevents combining the two roles in one cluster.

tenzen-y · 2023-12-28T21:45:42Z

> @tenzen-y Yes, such configuration will be possible in the future, once we establish kubernetes/enhancements#4370 as a universal standard for disabling controllers for other APIs/CRDs. Right now the only option for CRDs in the management cluster is to install API definitions but without controllers, that prevents allowing two roles inside one cluster.

@mwielgus That makes sense. Can we mention that in Non-Goals? I think we can say that multiple roles inside one cluster cannot be supported without kubernetes/enhancements#4370.

mwielgus · 2023-12-28T21:50:55Z

@tenzen-y Added.

tenzen-y

@mwielgus Thanks! I'm looking forward to MultiKueue 🎉
/lgtm
/approve

k8s-ci-robot · 2023-12-28T21:53:27Z

LGTM label has been added.

Git tree hash: 828418fd930219441c5d1295b67184cce9512373

k8s-ci-robot · 2023-12-28T21:53:28Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mwielgus, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tenzen-y · 2023-12-28T21:53:32Z

/hold cancel

k8s-ci-robot added the release-note-none (Denotes a PR that doesn't merit a release note.), kind/feature (Categorizes issue or PR as related to a new feature.), and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels Nov 29, 2023

k8s-ci-robot requested review from mimowo and trasc November 29, 2023 11:50

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 29, 2023

mimowo reviewed Nov 29, 2023

mwielgus force-pushed the mk-kep branch from ecba1de to c245900 Compare November 29, 2023 12:53

mimowo reviewed Nov 29, 2023

mwielgus force-pushed the mk-kep branch from c245900 to 20263b4 Compare November 30, 2023 11:00

alculquicondor reviewed Nov 30, 2023

mimowo reviewed Dec 5, 2023

trasc mentioned this pull request Dec 5, 2023

Introduce util admissioncheck package and make ConfigHelper generic #1354

Merged

mimowo mentioned this pull request Dec 5, 2023

Prebuilt workload suport #1358

Merged

mwielgus force-pushed the mk-kep branch from 20263b4 to 6e59f26 Compare December 5, 2023 11:51

mimowo mentioned this pull request Dec 6, 2023

Multi-cluster tests setups #1360

Merged

trasc reviewed Dec 6, 2023

ahaysx reviewed Dec 7, 2023

ahaysx reviewed Dec 7, 2023

dejanzele reviewed Dec 7, 2023

dejanzele reviewed Dec 7, 2023

dejanzele reviewed Dec 7, 2023

dejanzele reviewed Dec 7, 2023

dejanzele reviewed Dec 8, 2023

dejanzele reviewed Dec 8, 2023

alculquicondor reviewed Dec 27, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 27, 2023

alculquicondor reviewed Dec 27, 2023

mwielgus force-pushed the mk-kep branch from bdd7109 to a956643 Compare December 28, 2023 11:41

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 28, 2023

k8s-ci-robot assigned alculquicondor Dec 28, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2023

tenzen-y reviewed Dec 28, 2023

mwielgus force-pushed the mk-kep branch from a956643 to 1a52afe Compare December 28, 2023 21:24

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2023

k8s-ci-robot requested a review from alculquicondor December 28, 2023 21:24

KEP-693: MultiKueue

e8c208c

mwielgus force-pushed the mk-kep branch from 1a52afe to e8c208c Compare December 28, 2023 21:50

tenzen-y reviewed Dec 28, 2023

k8s-ci-robot assigned tenzen-y Dec 28, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2023

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 28, 2023

k8s-ci-robot merged commit 51a1773 into kubernetes-sigs:main Dec 28, 2023
14 checks passed

k8s-ci-robot added this to the v0.6 milestone Dec 28, 2023

mimowo mentioned this pull request Feb 27, 2024

Support for the Job managedBy field (alpha) kubernetes/kubernetes#123273

Merged
