Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multikueue initial implementation #1313

Merged
merged 14 commits into from Jan 18, 2024
Merged

Conversation

trasc
Copy link
Contributor

@trasc trasc commented Nov 8, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add the basic functionality of MultiKueue.

Which issue(s) this PR fixes:

Relates to #693

Special notes for your reviewer:

Noteble differences vs KEP:

  • MultiKueueClusters are members of the MultiKueueConfig (not standalone objects)
  • Live Job Status update is not implemented

They can be in follow-ups.

Other follow-up points:

  • Mark the admission check ready to reach local Admission when the workload is admitted on a remote.
    We need to be sure that the local job controller is not starting the execution once the job is unsuspended.
  • Revisit the behavior for miss-configuration of queues (having multiple multikueue admission checks).
    Currently the workload is skipped.
  • Implement remote objects deletion on local workload removal.
  • Revisit connection between management cluster and worker cluster is lost. (More relevant for the case when the worker lost is the one executing the workload)

Does this PR introduce a user-facing change?

MultiKueue: Multi cluster job dispatching for k8s Job. This doesn't include support for live status updates.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 8, 2023
Copy link

netlify bot commented Nov 8, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 7980152
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65a9338208f67500078a35b9

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 8, 2023
@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test all

@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test pull-kueue-test-e2e-main-1-25

@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test pull-kueue-test-e2e-main-1-28

@mimowo
Copy link
Contributor

mimowo commented Nov 8, 2023

This will be very helpful for manual and automated testing, thanks!

@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test pull-kueue-test-e2e-main-1-28

@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test all

@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

This will be very helpful for manual and automated testing, thanks!

Sure, however ,for the time being I propose to keep it as draft until we have a real job for it to do.
I've created the PR mainly to make sure it will work in prow jobs.

@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test all

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 8, 2023
@trasc
Copy link
Contributor Author

trasc commented Nov 8, 2023

/test all

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 10, 2023
@trasc
Copy link
Contributor Author

trasc commented Nov 10, 2023

/test all

@trasc
Copy link
Contributor Author

trasc commented Nov 10, 2023

/test all

@trasc trasc changed the title Multikueue e2e test setup [POC][Multikueue] Cross cluster workload sync Nov 10, 2023
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, some nits, remaining ask about the rootContext, and make sure we have the follow ups listed

@mimowo
Copy link
Contributor

mimowo commented Jan 18, 2024

I still have two comments pending:

  1. Multikueue initial implementation #1313 (comment)
  2. Multikueue initial implementation #1313 (comment)

I consider them non-blocking, because
(1.) will be addressed in the future IIUC as part of API changes follow-up, and
(2.) is just about naming which confused me, but it is not a big deal

Still, feel free to address these nice-to-haves, I will renew LGTM.

/lgtm

In case @tenzen-y has some other questions
/hold

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 18, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9ef999d8d174f897d1e313a9260b021352821577

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm

keps/693-multikueue/README.md Show resolved Hide resolved
},
},
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trasc Where is its case? I could not find it.


type remoteController struct {
localClient client.Client
watchCtx context.Context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trasc What is a real object? Does this mean that after we graduate MultiKueueConfig to beta?

pkg/controller/admissionchecks/multikueue/remoteclient.go Outdated Show resolved Hide resolved
Comment on lines +136 to +138

func (a *wlReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, even if we add markers here, manifests aren't updated.
However, we have been adding markers to all reconcilers so that we can clarify which permissions the reconciler must have.

So, I still think that having kubebuilder markers would be better. Or we should add markers to the batchJobAdapter instead of here.

test/e2e/multikueue/e2e_test.go Outdated Show resolved Hide resolved
test/integration/multikueue/multikueue_test.go Outdated Show resolved Hide resolved
g.Expect(managerCluster.client.Get(managerCluster.ctx, acKey, &updatetedAc)).To(gomega.Succeed())
cond := apimeta.FindStatusCondition(updatetedAc.Status.Conditions, kueue.AdmissionCheckActive)
g.Expect(cond).NotTo(gomega.BeNil())
g.Expect(cond.Status).To(gomega.Equal(metav1.ConditionTrue), "Reason: %s, Message: %q", cond.Reason, cond.Status)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice error message.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 18, 2024
@mimowo
Copy link
Contributor

mimowo commented Jan 18, 2024

Still
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 18, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 89a095055d41d19e6daa7200ba7b79d5391eb5a1

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trasc Thank you!
/lgtm
/approve
/hold cancel

},
},
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's right. Thanks.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 18, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, tenzen-y, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 659cf4b into kubernetes-sigs:main Jan 18, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Jan 18, 2024
@trasc trasc deleted the mk-e2e branch January 18, 2024 15:25
@alculquicondor
Copy link
Contributor

/release-note-edit

Basic implementation of MultiKueue for Job. This doesn't include support for live status updates.

@alculquicondor
Copy link
Contributor

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 23, 2024
@tenzen-y
Copy link
Member

/kind api-change

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Feb 12, 2024
@alculquicondor
Copy link
Contributor

/release-note-edit

MultiKueue: Multi cluster job dispatching for k8s Job. This doesn't include support for live status updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants