
Add kep for coscheduling base on CRD #42

Merged · 1 commit · Sep 11, 2020

Conversation

cwdsuzhou (Member):

This PR adds a plugin named batch. Our original repo is here.

In Tencent, this plugin has been stably running for more than half a year.

/cc @Huang-Wei

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 24, 2020
k8s-ci-robot (Contributor):

Welcome @cwdsuzhou!

It looks like this is your first PR to kubernetes-sigs/scheduler-plugins 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/scheduler-plugins has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot (Contributor):

Hi @cwdsuzhou. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 24, 2020
Huang-Wei (Contributor) left a comment:

Thanks @cwdsuzhou for contributing Tencent's coscheduling plugins. I believe this will help grow the ecosystem of the Scheduler Framework a lot.

First comment is on KEP organization, let's make the KEP part a separate PR, and we will discuss the design details there.

kep/batch/README.md (outdated)
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 25, 2020
@cwdsuzhou cwdsuzhou changed the title Add batch plugin Add kep for coscheduling base on CRD Aug 25, 2020

1. Allow the pods do not belong to any group.
2. If there are no groups scheduling, we check resource, if enough, we allow the pod.
3. If there are groups running, we check if the current pod belong the max finished group, if it is, we allow it.
Member:

What's the max finished group?
@cwdsuzhou

cwdsuzhou (Member Author):

The group that has made the most progress.

Contributor:

Please explain max in the document.

denkensk (Member):

@cwdsuzhou Thanks for your contribution. The definition of PodGroup is very important for coscheduling.

After reviewing the KEP, I find that its key points are similar to the previous one (https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/kep/2-lightweight-coscheduling), like queueSort and Permit.

Can we consider supporting PodGroup based on the original coscheduling implementation instead of reimplementing the plugin? Looking forward to your reply.

cwdsuzhou (Member Author):

@cwdsuzhou Thanks for your contribution. The definition of PodGroup is very important for coscheduling.

After reviewing the KEP, I find that its key points are similar to the previous one (https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/kep/2-lightweight-coscheduling), like queueSort and Permit.

Can we consider supporting PodGroup based on the original coscheduling implementation instead of reimplementing the plugin? Looking forward to your reply.

Yes, that is similar. But we consider resources more, and we also add another controller to reconcile the PodGroup status. Actually, our coscheduling implementation has been finished since Nov. 2019. Though the current coscheduling has a similar implementation, it may not cover some scenarios or may perform badly in some cases. We have run the same integration test cases. The results are as follows:

[screenshots of the integration test results]

denkensk (Member) commented Aug 25, 2020:

we consider resources more, and we also add another controller to reconcile the PodGroup status.


I think it's OK to reconcile the PodGroup status in a controller and to pre-check or reserve the resources like Volcano does. These features are not in conflict with the current co-scheduling. @cwdsuzhou

it may not cover some scenarios or may perform badly in some cases.


Can you share the scenarios not covered by the current co-scheduling?

cwdsuzhou (Member Author):

we consider resources more, and we also add another controller to reconcile the PodGroup status.

I think it's OK to reconcile the PodGroup status in a controller and to pre-check or reserve the resources like Volcano does. These features are not in conflict with the current co-scheduling. @cwdsuzhou

it may not cover some scenarios or may perform badly in some cases.

Can you share the scenarios not covered by the current co-scheduling?

Suppose there are 300 units of a resource in a cluster, and a job requires 3 pods with 150 units requested per pod. 2 pods would be scheduled, but the job actually cannot run. If another job consisting of 3 pods with 100 units requested per pod then arrives, it would not run immediately either, because the resources have been occupied by the first job.
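To make the arithmetic of this scenario explicit, here is a minimal sketch (names and units are illustrative, not from the KEP) of why a group-level feasibility check avoids stranding resources:

package main

import "fmt"

// fitsAsGroup reports whether an entire group of pods can run at once,
// which is what gang scheduling checks before admitting any member pod.
func fitsAsGroup(freeUnits, unitsPerPod, numPods int) bool {
	return freeUnits >= unitsPerPod*numPods
}

func main() {
	free := 300 // total schedulable units in the cluster

	// Job A: 3 pods x 150 units. Per-pod scheduling admits 2 pods,
	// but the job as a whole can never run: 3*150 = 450 > 300.
	fmt.Println(fitsAsGroup(free, 150, 3)) // false

	// Job B: 3 pods x 100 units would fit exactly (300 >= 300), but only
	// if job A's two stranded pods were not already holding 2*150 = 300.
	fmt.Println(fitsAsGroup(free, 100, 3)) // true
}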

denkensk (Member):

Suppose there are 300 units of a resource in a cluster, and a job requires 3 pods with 150 units requested per pod. 2 pods would be scheduled, but the job actually cannot run. If another job consisting of 3 pods with 100 units requested per pod then arrives, it would not run immediately either, because the resources have been occupied by the first job.


This is a cluster whose resources are highly utilized. The resources occupied by the first job will be released after the timeout. We can also add more policies, like a resource pre-check in PreFilter.

I don't think there is any problem that can't be solved without reimplementing the coscheduling plugin, is there?
@cwdsuzhou

// If specified, indicates the PodGroup's priority. "system-node-critical" and
// "system-cluster-critical" are two special keywords which indicate the
// highest priorities with the former being the highest priority. Any other
// name must be defined by creating a PriorityClass object with that name.
Contributor:

Does PriorityClass mean another CRD or something else? Can you introduce more about this type?

cwdsuzhou (Member Author), Aug 26, 2020:

The name of a PriorityClass, same as the default one in K8s. But it is not used now; I am considering whether I should remove it.

// MinResources defines the minimal resource of members/tasks to run the pod group;
// if there's not enough resources to start all tasks, the scheduler
// will not start anyone.
MinResources *v1.ResourceList `json:"minResources,omitempty"`
Contributor:

One pod's minimal resource or one PodGroup's minimal resource?

cwdsuzhou (Member Author):

pod

Contributor:

To be honest, I don't quite get the point of MinResources.

The resource request is just one dimension of a PodGroup's constraints, how can you ensure all Pods of a PodGroup can be scheduled successfully by only looking at its resource requests?

pod

but the comments said it's for all tasks.

cwdsuzhou (Member Author):

Yes, this seems unnecessary. Actually, we did not set it in our current implementation; the value is derived from the Pod. I would remove it.

Huang-Wei (Contributor) left a comment:

First round of review.

@@ -0,0 +1,140 @@
# Coscheduling based on PodGroup CRD
Contributor:

Let's rename this folder to include the issue number, i.e., kep/42-podgroup-coscheduling.

<!-- /toc -->

## Motivation
Currently, through the default scheduler of Kubernetes, we cannot ensure a group of pods scheduled at the same time . Under some scene, it would waste resources since some pods need work together, like spark, tensorflow and so on . So, podgroup-coscheduling is aimed at solving the issue.
Contributor:

Suggested change
Currently, through the default scheduler of Kubernetes, we cannot ensure a group of pods scheduled at the same time . Under some scene, it would waste resources since some pods need work together, like spark, tensorflow and so on . So, podgroup-coscheduling is aimed at solving the issue.
Currently, through the default scheduler of Kubernetes, we cannot ensure a group of pods can be scheduled altogether. Under some scenes, it would waste resources since the whole application cannot work with only partial Pods' running, like Spark jobs, TensorFlow jobs, and so on. This proposal is aimed at solving the issue, by introducing a PodGroup CRD to do the heavy lifting on wiring a group of Pods.

Sort the job when we submit jobs to a cluster. Currently, we can only do this base on pods.

## Use Cases
Spark jobs, tensorflow jobs and other pods have to run together.
Contributor:

Suggested change
Spark jobs, tensorflow jobs and other pods have to run together.
Batch workloads such as Spark jobs, TensorFlow jobs that have to run altogether.

2. Define a CRD name PodGroup to help pod scheduling.

## Non-Goals
Sort the job when we submit jobs to a cluster. Currently, we can only do this base on pods.
Contributor:

Is Job here the official k8s Job, or a general term for a batch job? Also, we didn't mention "sort" in previous paragraphs, so "sort the job" sounds a bit abrupt.

cwdsuzhou (Member Author):

A general term for a batch job.

Currently, through the default scheduler of Kubernetes, we cannot ensure a group of pods scheduled at the same time . Under some scene, it would waste resources since some pods need work together, like spark, tensorflow and so on . So, podgroup-coscheduling is aimed at solving the issue.

## Goals
1. Base on scheduling framework, implementing the gang scheduling.
Contributor:

Suggested change
1. Base on scheduling framework, implementing the gang scheduling.
1. Base on the scheduling framework, implement the co-scheduling feature.


// OccupiedBy marks the podgroup occupied by which group.
// Owner reference would be used to filled it, it not initialize, it is empty
OccupiedBy string `json:"occupiedBy,omitempty"`
Contributor:

Are you trying to describe a hierarchical relation here? For example, a PodGroup is owned/referred to by another PodGroup? If so, you can inject the relation into ObjectMeta.OwnerReferences, so you don't need to define a new API field.

cwdsuzhou (Member Author):

No, in our current implementation this is just the UID of a Deployment, StatefulSet or other CR, e.g. a TF job or an MPI job.


### Controller

We define a controller to reconcile PodGroup status, and we can query the job status through describe the PodGroup. Any pod in a group failed, the Group Status is marked Failed. Controller would also help whem recovering from abnormal cases, e.g. batch scheduling is interpreted due to
Contributor:

  • ... through describing ...
  • ... Once a pod in a group failed ... is marked as Failed...
  • ... Controller would also help to recover from ...
  • interpreted -> interrupted?
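Separately from these wording fixes, here is a minimal sketch of the status logic the quoted paragraph describes, with simplified stand-in types (the real controller and its phase names live in the plugin's repo):

// podPhase is a simplified stand-in for v1.PodPhase.
type podPhase string

const (
	podFailed    podPhase = "Failed"
	podSucceeded podPhase = "Succeeded"
	podRunning   podPhase = "Running"
)

// groupPhase derives a PodGroup phase from its member pods: any failed
// pod marks the whole group Failed, as the quoted text describes.
func groupPhase(members []podPhase, minMember int) string {
	running, succeeded := 0, 0
	for _, p := range members {
		switch p {
		case podFailed:
			return "Failed"
		case podSucceeded:
			succeeded++
		case podRunning:
			running++
		}
	}
	if succeeded >= minMember {
		return "Finished" // illustrative phase name
	}
	if running >= minMember {
		return "Running"
	}
	return "Pending"
}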


#### PreFilter

This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The main proggress is as follows:
Contributor:

Suggested change
This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The main proggress is as follows:
This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The overall flow works as below:


This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The main proggress is as follows:

1. Allow the pods do not belong to any group.
Contributor:

Suggested change
1. Allow the pods do not belong to any group.
1. Allow the pods that do not belong to any group.


1. Allow the pods do not belong to any group.
2. If there are no groups scheduling, we check resource, if enough, we allow the pod.
3. If there are groups running, we check if the current pod belong the max finished group, if it is, we allow it.
Contributor:

Please explain max in the document.

cwdsuzhou (Member Author):

First round of review.

Thanks for the review, I will update this KEP according to your kind suggestions.

Huang-Wei (Contributor) left a comment:

More comments.

kep/42-podgroup-coscheduling/README.md (resolved)
@@ -70,7 +55,7 @@ type PodGroupStatus struct {
Phase PodGroupPhase `json:"phase"`

// OccupiedBy marks the podgroup occupied by which group.
// Owner reference would be used to filled it, it not initialize, it is empty
// Owner reference would be used to filled it, if not initialize, it is empty
Contributor:

Suggested change
// Owner reference would be used to filled it, if not initialize, it is empty
// Owner reference would be used to fill it. It's empty if not initialized.

// MinResources defines the minimal resource of members/tasks to run the pod group;
// if there's not enough resources to start all tasks, the scheduler
// will not start anyone.
MinResources *v1.ResourceList `json:"minResources,omitempty"`
Contributor:

If MinResources is needed for the implementation, I think it can be kept. IIRC this serves as a preFilter optimization, right?

cwdsuzhou (Member Author):

In our original version, we fill this field from the PodSpec. Keeping this is also OK.

cwdsuzhou (Member Author):

If this is defined, we would use it for pre-filtering. If not, we compute the resource requirements from the Pod spec.

Member:

I'm confused about how to set MinResources. Is it the total minimal resource of the PodGroup, or of one pod?

cwdsuzhou (Member Author):

Yes
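A sketch of the fallback described a few comments above - use the CRD's MinResources when set, otherwise sum the member pods' requests. The helper name is hypothetical; only the v1.ResourceList and resource.Quantity APIs are real:

import (
	v1 "k8s.io/api/core/v1"
)

// groupMinResources returns the resources to pre-check for a PodGroup:
// the explicit spec.MinResources if set, otherwise the sum of the member
// pods' container requests. A sketch; init containers and overhead omitted.
func groupMinResources(min *v1.ResourceList, pods []*v1.Pod) v1.ResourceList {
	if min != nil {
		return *min
	}
	total := v1.ResourceList{}
	for _, pod := range pods {
		for _, c := range pod.Spec.Containers {
			for name, q := range c.Resources.Requests {
				sum := total[name]
				sum.Add(q)
				total[name] = sum
			}
		}
	}
	return total
}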

@@ -96,7 +81,7 @@ type PodGroupStatus struct {

### Controller

We define a controller to reconcile PodGroup status, and we can query the job status through describe the PodGroup. Any pod in a group failed, the Group Status is marked Failed. Controller would also help whem recovering from abnormal cases, e.g. batch scheduling is interpreted due to
We define a controller to reconcile PodGroup status, and we can query the job status through describing the PodGroup. Onece a pod in a group failed, the Group Status is marked Failed. Controller would also help recover from abnormal cases, e.g. batch scheduling is interrupted due to
Contributor:

Onece -> Once

Contributor:

This comment is still outstanding.


This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The overall flow works as below:

1. Allow the pods that do not belong to any group.
Contributor:

Suggested change
1. Allow the pods that do not belong to any group.
1. If the pod doesn't belong to a pod group, allow it; otherwise, go to the following steps.

#### Permit

1. When number of pods cannot meet the `minMember` defines in the PodGroup, `Wait` is returned. They will be added to cache with TLL(equal to MaxScheduleTime).
2. When number meet that, we would send a signal to permit the pods waiting.
Contributor:

Suggested change
2. When number meet that, we would send a signal to permit the pods waiting.
2. When the number is equal or greater than `minMember`, send a signal to permit the waiting pods.
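A sketch of the Permit decision just described, with the framework's Status codes replaced by strings for brevity (this is not the plugin's actual code):

import "time"

// permitDecision mirrors the two steps above: until minMember pods of the
// group have reached the Permit stage, each pod waits with a TTL equal to
// MaxScheduleTime; once the threshold is met, the waiting pods are allowed.
func permitDecision(arrivedPods, minMember int, maxScheduleTime time.Duration) (verdict string, timeout time.Duration) {
	if arrivedPods < minMember {
		return "Wait", maxScheduleTime
	}
	// The real plugin would iterate the framework's waiting pods here and
	// signal Allow for every pod that belongs to the same group.
	return "Allow", 0
}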

1. When number of pods cannot meet the `minMember` defines in the PodGroup, `Wait` is returned. They will be added to cache with TLL(equal to MaxScheduleTime).
2. When number meet that, we would send a signal to permit the pods waiting.

We can define `MaxScheduleTime` for a PodGroup. If anyone of the pods times out, the whole group would be rejected.
Contributor:

FYI: it's now defined in the latest codebase, and will be available in the master branch once #45 gets reviewed.

cwdsuzhou (Member Author):

I think we should keep this. Our original version supports setting it from both the args and the CRD. If the CRD does not set the time, the value defined in the args takes effect.

The scenario is as below:
A job that needs 10 pods may time out after 5s, but a job that needs 10000 pods would like it to be 5min.

denkensk (Member), Sep 2, 2020:

I think it is better not to let the user set the MaxScheduleTime. Users will prefer to set MaxScheduleTime greater than needed in order to reserve the resources, which will lead to resources being reserved for a long time.

A job that needs 10 pods may time out after 5s, but a job that needs 10000 pods would like it to be 5min.


We talked about this before. We can use a factor to compute the WaitingTime; the factor can be related to the number of pods. I added a TODO in the code earlier.

cwdsuzhou (Member Author):

But for different clusters the factor would also be different, e.g. 100 nodes vs. 2000 nodes.

Member:

Yes, the size of the cluster is also a factor to be considered along with the number of pods. This can be discussed. And I am still worried about it. @Huang-Wei @cwdsuzhou

I think it is better not to let the user set the MaxScheduleTime. Users will prefer to set MaxScheduleTime greater than needed in order to reserve the resources, which will lead to resources being reserved for a long time.
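For illustration only, here is a hypothetical version of the factor idea being debated above - nothing like this was agreed on in the thread, and the scaling rule is made up:

import "time"

// waitTimeout sketches a factor-based WaitingTime: scale a base timeout by
// group size, tempered by cluster size, so the user never sets it directly.
func waitTimeout(base time.Duration, numPods, numNodes int) time.Duration {
	if numNodes < 1 {
		numNodes = 1
	}
	factor := 1 + numPods/(10*numNodes) // illustrative scaling only
	return base * time.Duration(factor)
}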

1. When number of pods cannot meet the `minMember` defines in the PodGroup, `Wait` is returned. They will be added to cache with TLL(equal to MaxScheduleTime).
2. When number meet that, we would send a signal to permit the pods waiting.

We can define `MaxScheduleTime` for a PodGroup. If anyone of the pods times out, the whole group would be rejected.
Contributor:

Suggested change
We can define `MaxScheduleTime` for a PodGroup. If anyone of the pods times out, the whole group would be rejected.
We can define `MaxScheduleTime` for a PodGroup. If any pod times out, the whole pod group would be rejected.


This extension is mainly used for helping record the PodGroup Status. When pod binds successfully, we would update the scheduling status of a PodGroup.

We can define `MaxScheduleTime` for a PodGroup. If anyone of the pods times out, the whole group would be rejected.
Contributor:

Suggested change
We can define `MaxScheduleTime` for a PodGroup. If anyone of the pods times out, the whole group would be rejected.
We can define `MaxScheduleTime` for a PodGroup. If any pod times out, the whole group would be rejected.

Comment on lines 7 to 13
- "@Huang-Wei"
- "@ahg-g"
- "@alculquicondor"
- "k82cn"
- "@resouer"
- "@hex108"
- "@everpeace"
Contributor:

Remove the names of those who are not reviewing.

Huang-Wei (Contributor):

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 1, 2020
cwdsuzhou (Member Author):

@Huang-Wei The PR has been updated according to your latest suggestions, PTAL, thanks.

denkensk (Member) commented Sep 2, 2020:

/cc @denkensk

kep/42-podgroup-coscheduling/README.md (outdated)

#### PreFilter

This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The overall flow works as below:
Contributor:

This comment is still outstanding.

"helps ..." and "... useful ..." are a bit verbose and don't add extra info to make the words more understandable. Can you rephrase it?

This extension helps pre-filter pods. It is useful, especially when there are not enough resources in a cluster. The overall flow works as below:

1. Allow the pods that do not belong to any group.
2. If there are no groups scheduling, we check resource, if enough, we allow the pod.
Contributor:

This comment is still outstanding.

@@ -96,7 +81,7 @@ type PodGroupStatus struct {

### Controller

We define a controller to reconcile PodGroup status, and we can query the job status through describe the PodGroup. Any pod in a group failed, the Group Status is marked Failed. Controller would also help whem recovering from abnormal cases, e.g. batch scheduling is interpreted due to
We define a controller to reconcile PodGroup status, and we can query the job status through describing the PodGroup. Onece a pod in a group failed, the Group Status is marked Failed. Controller would also help recover from abnormal cases, e.g. batch scheduling is interrupted due to
Contributor:

This comment is still outstanding.


Comment on lines 104 to 106
3. If there are groups running, we check if the current pod belong the group having the max progress(num(Pods)/minMember), if it is, we allow it.
4. Otherwise, we check if the max finished group can still run when allow this pod. If we can, allow it.
5. Otherwise, we check if the pod has higher priority compared with the max finished one. If yes, we reject the pod belongs to the group and allow the current one.
Contributor:

Regarding @denkensk's concern, I'd leave it for @cwdsuzhou to comment. Probably it's not a bad idea to also look at the absolute number to reach completion.

4. Otherwise, we check if the max finished group can still run when allow this pod. If we can, allow it.
5. Otherwise, we check if the pod has higher priority compared with the max finished one. If yes, we reject the pod belongs to the group and allow the current one.

Any pod rejected to run, their group would be added to a denied list with a ttl.
Contributor:

This comment is still outstanding.
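Pulling the quoted steps together, here is a sketch of the whole PreFilter decision path, including the denied list with a TTL. All types and helpers are simplified stand-ins, not the plugin's API:

import "time"

// groupInfo is a simplified stand-in for the plugin's bookkeeping.
type groupInfo struct {
	name     string
	progress float64 // num(scheduled pods) / minMember
	priority int32
}

// denyCache models the denied list: group name -> expiry time.
type denyCache map[string]time.Time

func (d denyCache) denied(group string, now time.Time) bool {
	expiry, ok := d[group]
	return ok && now.Before(expiry)
}

// preFilterAllow walks the five steps quoted above.
func preFilterAllow(group *groupInfo, podPriority int32, enoughResources bool,
	leader *groupInfo, deny denyCache, now time.Time) bool {
	if group == nil {
		return true // 1. pods without a group are always allowed
	}
	if deny.denied(group.name, now) {
		return false // recently rejected group: back off until the TTL expires
	}
	if leader == nil {
		return enoughResources // 2. no group in flight: a plain resource check decides
	}
	if group.name == leader.name {
		return true // 3. the pod belongs to the group with the max progress
	}
	if enoughResources {
		return true // 4. the leading group can still finish even with this pod admitted
	}
	return podPriority > leader.priority // 5. higher priority may displace the leader
}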


#### Permit

1. When number of pods cannot meet the `minMember` defines in the PodGroup, `Wait` is returned. They will be added to cache with TLL(equal to MaxScheduleTime).
Contributor:

This comment is still outstanding.


#### PostBind

This extension is mainly used for helping record the PodGroup Status. When pod binds successfully, we would update the scheduling status of a PodGroup.
Contributor:

This comment is still outstanding.
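A sketch of the bookkeeping described in the quoted PostBind text, using stand-in types (the phase name is illustrative, not from the KEP):

// groupStatus is a simplified stand-in for PodGroupStatus.
type groupStatus struct {
	Scheduled int32
	MinMember int32
	Phase     string
}

// postBind records a successful bind: the scheduled count goes up, and
// once it reaches MinMember the group is marked as scheduled.
func postBind(s *groupStatus) {
	s.Scheduled++
	if s.Scheduled >= s.MinMember {
		s.Phase = "Scheduled" // illustrative phase name
	}
}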

Huang-Wei (Contributor) left a comment:

Some final comments. We're close to getting it merged.

@@ -0,0 +1,13 @@
title: Coscheduling based on PodGroup CRD
kep-number: 3
Contributor:

Suggested change
kep-number: 3
kep-number: 42

As we're using the issue number as the KEP number.

Comment on lines 109 to 111
3. If there are groups running, we check if the current pod belong the group having the max progress(num(Pods)/minMember), if it is, we allow it.
4. Otherwise, we check if the max finished group can still run when allow this pod. If we can, allow it.
5. Otherwise, we check if the pod has higher priority compared with the max finished one. If yes, we reject the pod belongs to the group and allow the current one.
Contributor:

Please see if the wording in https://github.com/kubernetes-sigs/scheduler-plugins/pull/42/files#r481413128 makes more sense to you.

cwdsuzhou (Member Author):

Sure, I missed this last time.

// MinResources defines the minimal resource of members/tasks to run the pod group;
// if there's not enough resources to start all tasks, the scheduler
// will not start anyone.
MinResources *v1.ResourceList `json:"minResources,omitempty"`
Member:

What if the PodGroup is made up of different types of pods, like ps and trainer in TensorFlow? How should MinResources be set then? @cwdsuzhou

cwdsuzhou (Member Author):

This is the v1alpha1 version. I could add that if needed and extend this CRD later, e.g.:

type PodGroup struct {
	...
	Groups map[string]PodGroup
	...
}

Contributor:

I think one PodGroup should describe a particular type of workload. In terms of tensorflow application, a good model is to have PS and Worker referencing their corresponding (Sub-)PodGroup, and then there can be a (Parent-)PodGroup to wire those two (Sub-)PodGroup(s).

(not quite sure it would work, need a PoC on this)
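As a rough illustration of that idea - explicitly unproven in this thread - the shape could look like:

// Hypothetical shape for hierarchical PodGroups: each role (PS, Worker)
// gets its own sub-group, wired together by a parent group.
type SubGroupRef struct {
	Name      string
	MinMember int32
}

type ParentPodGroup struct {
	SubGroups []SubGroupRef // e.g. {"ps", 2} and {"worker", 8}
}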

Comment:

I thought per #42 (comment) MinResources was going to be removed?

Huang-Wei (Contributor):

/approve

Thanks @cwdsuzhou ! Please squash the commits, then I will /lgtm.

k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cwdsuzhou, Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 10, 2020
cwdsuzhou (Member Author):

/approve

Thanks @cwdsuzhou ! Please squash the commits, then I will /lgtm.

Done, PTAL, thanks! @Huang-Wei

Huang-Wei (Contributor):

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2020
@k8s-ci-robot k8s-ci-robot merged commit 6303fc5 into kubernetes-sigs:master Sep 11, 2020

This extension pre-filters pods to save scheduling cycles. This is especially helpful when there are not enough resources in a cluster. The overall flow works as below:

1. If the pod doesn't belong to a pod group, allow it; otherwise, go to the following steps.
Comment:

How is this "pod belonging to a pod group" relationship expressed?

Contributor:

  • pod to PodGroup: the pod gets a PodGroup label or annotation applied
  • PodGroup to pod: either implicitly by dynamic computing, or via PodGroup's OccupiedBy - which refers to the group of pods' controller (i.e., Deployment or StatefulSet).

@cwdsuzhou can confirm.
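For illustration, the pod -> PodGroup direction could be resolved from a label, as sketched below. The label key here is an assumption; the authoritative key is defined by the plugin, not this comment:

import (
	v1 "k8s.io/api/core/v1"
)

// podGroupLabel is an assumed label key for illustration; consult the
// plugin's documentation for the real constant.
const podGroupLabel = "pod-group.scheduling.sigs.k8s.io/name"

// podGroupOf returns the name of the pod's PodGroup; an empty string
// means the pod does not belong to any group.
func podGroupOf(pod *v1.Pod) string {
	return pod.Labels[podGroupLabel]
}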


// OccupiedBy marks the workload (e.g., deployment, statefulset) UID that occupies the podgroup.
// It is empty if not initialized.
OccupiedBy string `json:"occupiedBy,omitempty"`
Comment:

As in, owned, like OwnerReference?

cwdsuzhou (Member Author):

Yes

Comment:

I think it would be more intuitive to use the more common terminology?

Contributor:

The canonical meta.OwnerReference may be used to exercise the concept of hierarchical PodGroups, i.e., a child PodGroup "ownedBy" the other parent PodGroup, so using the term "OccupiedBy" in status can distinguish from that semantics IMO - to represent the semantics of PodGroup <-> Workload mapping.

@cwdsuzhou may comment further.

cwdsuzhou (Member Author):

The canonical meta.OwnerReference may be used to exercise the concept of hierarchical PodGroups, i.e., a child PodGroup "ownedBy" the other parent PodGroup, so using the term "OccupiedBy" in status can distinguish from that semantics IMO - to represent the semantics of PodGroup <-> Workload mapping.

@cwdsuzhou may comment further.

Yep, this field is used for recording the related object, e.g. a workload or a CRD. But it is not exactly the same as the owner reference.

swatisehgal pushed a commit to swatisehgal/scheduler-plugins that referenced this pull request Jun 29, 2022