Skip to content
This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

Make scheduler-plugins the default gang scheduler #209

Merged
merged 1 commit into from
Mar 21, 2023

Conversation

Syulin7
Copy link
Contributor

@Syulin7 Syulin7 commented Jan 29, 2023

What this PR does / why we need it:

Now supports many gang schedulers(volcano, scheduler-plugins). This PR supports koordinator gang scheduler.

Related: kubeflow/training-operator#1746

@johnugeorge
Copy link
Member

Thanks @Syulin7 We are in the process of training operator release(currently rc0). Since kubeflow/common dependency has not branched out, we have to wait a bit to have the final release and then merge this one.

/cc @tenzen-y Can you review this?

@tenzen-y
Copy link
Member

Thanks @Syulin7 We are in the process of training operator release(currently rc0). Since kubeflow/common dependency has not branched out, we have to wait a bit to have the final release and then merge this one.

/cc @tenzen-y Can you review this?

Sure.
First, I will ask questions about the Koordinator at kubeflow/training-operator#1746.

@tenzen-y
Copy link
Member

/hold

@Syulin7
Copy link
Contributor Author

Syulin7 commented Jan 31, 2023

@tenzen-y This PR is ready for review. PTAL.

@tenzen-y
Copy link
Member

Can you squash commits into one?

@Syulin7
Copy link
Contributor Author

Syulin7 commented Jan 31, 2023

Can you squash commits into one?

Done

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/assign @johnugeorge

@Syulin7 Syulin7 force-pushed the koordinator branch 2 times, most recently from fec0139 to 34510ef Compare March 15, 2023 15:30
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit, otherwise lgtm

@@ -72,6 +72,8 @@ const (
GangSchedulerNone GangScheduler = "None"
GangSchedulerVolcano GangScheduler = "volcano"
GangSchedulerSchedulerPlugins GangScheduler = "scheduler-plugins"
// GangSchedulerDefault Using this scheduler name or any scheduler name different than volcano uses the scheduler-plugins PodGroup
GangSchedulerDefault GangScheduler = "default-scheduler"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove if this is not being used

@alculquicondor
Copy link
Contributor

/retitle Make scheduler-plugins the default gang scheduler

@Syulin7 Syulin7 changed the title Support Koordinator Gang Scheduler Make scheduler-plugins the default gang scheduler Mar 16, 2023
@Syulin7
Copy link
Contributor Author

Syulin7 commented Mar 16, 2023

@alculquicondor @tenzen-y @terrytangyuan all conversations are done, PTAL.

@@ -164,6 +165,10 @@ func (r *SchedulerFrameworkReconciler) DecoratePodForGangScheduling(
podTemplate *corev1.PodTemplateSpec,
job client.Object,
) {
if podTemplate.Spec.SchedulerName == "" && r.GetGangSchedulerName() != "default-scheduler" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if podTemplate.Spec.SchedulerName == "" && r.GetGangSchedulerName() != "default-scheduler" {
if podTemplate.Spec.SchedulerName == "" {

Let's use a logic the same as the podgroup_control.

@tenzen-y
Copy link
Member

I left a comment for nit. Overall, lgtm.

Signed-off-by: Syulin7 <735122171@qq.com>
@tenzen-y
Copy link
Member

@Syulin7 Thanks for the great work!
/lgtm

@johnugeorge
Do you decide on the week for releasing the training operator v1.6, although releasing kubeflow v1.7 seems delayed?

Week 23 - Distribution Testing Ends: Wednesday, Mar 22nd 2023

@google-oss-prow google-oss-prow bot added the lgtm label Mar 16, 2023
}

// NewSchedulerPluginsControl returns a SchedulerPluginsControl
func NewSchedulerPluginsControl(c client.Client) PodGroupControlInterface {
return &SchedulerPluginsControl{Client: c}
func NewSchedulerPluginsControl(c client.Client, schedulerName string) PodGroupControlInterface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not about this PR, but I realized that some functions below are unnecessary, like UpdatePodGroup, and CreatePodGroup. They are casting unnecessarily: the controller-runtime client can already handle the casting via reflection. And I think GetPodGroup and DeletePodGroup could be refactored so that we don't need the casting either. Maybe even use generics.

@alculquicondor
Copy link
Contributor

/lgtm

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@johnugeorge
Copy link
Member

@tenzen-y The final release is not yet made due to delay in Kubeflow upstream. Sorry for the delay. This should not happen again in the future as we always branch out for all repos except common. We will target this week end to merge- Friday 24 March

@tenzen-y
Copy link
Member

@tenzen-y The final release is not yet made due to delay in Kubeflow upstream. Sorry for the delay. This should not happen again in the future as we always branch out for all repos except common. We will target this week end to merge- Friday 24 March

Thanks for letting me know! I appreciate your hard work!

@johnugeorge
Copy link
Member

Training operator release is made. This can be merged.

Thanks for the patience

@tenzen-y
Copy link
Member

@Syulin7 Thank you!
/hold cancel

@tenzen-y
Copy link
Member

@johnugeorge Can we cut a new release on this repo to move forward kubeflow/training-operator#1747?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants