
KEP for co-scheduling plugin based on scheduler framework #1463

Closed
wants to merge 25 commits

Conversation

@denkensk (Member) commented Jan 16, 2020

Add a co-scheduling plugin for the new scheduler framework.
This plugin will be put in kubernetes-sigs/scheduler-plugins first.

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 16, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: denkensk
To complete the pull request process, please assign ahg-g
You can assign the PR to them by writing /assign @ahg-g in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jan 16, 2020
@denkensk denkensk requested review from Huang-Wei and removed request for k82cn January 16, 2020 13:43
@denkensk (Member Author)

/assign @ahg-g
/cc @Huang-Wei
/cc @hex108

@Huang-Wei (Member) left a comment

Thanks @denkensk. Some comments below. I will go for a second round of review on the design details.

<!-- /toc -->

## Motivation
Kubernetes has become a popular solution for orchestrating containerized workloads. Due to limitation of Kubernetes scheduler, some offline workloads(ML/DL) are managed in a different system. For the sake of operation efficiency, we'd like to treat Kubneretes as a unified management platform. But the ML jobs have the characteristics of All-or-Nothing in the scheduling process, which requires all tasks of a job to be scheduled at the same time. If the job only start part of tasks, it will wait for other tasks be ready to begin to work. In worst case, all jobs are pending like deadlock. To solve the above problem, co-scheduling is needed for the scheduler. The new scheduler framework makes the goal possible.
Member

... for the sake of improving cluster utilization and operation efficiency...

Member Author

Done. thx


In order to implement the coscheduling, we developed plugins in different extension points. In `Sort` phase, add strategy to try to make sure that the pods belong to same PodGroup are queued continuously. In `Permit` phase,put the pod that doesn't meet min-available into WaitingMap and reserve resources until min-available are met or timeout. In `Unreserve` phase,clean up the pods which is timeout.

![image](20200116-coscheduling-plugin-based-on-scheduler-framework-extensions.png)
Member

Can you put it somewhere and comment with the URL? so that it can be viewed during the review.

Member Author

Done. thx. I add a comment


## Proposal

In order to implement the coscheduling, we developed plugins in different extension points. In `Sort` phase, add strategy to try to make sure that the pods belong to same PodGroup are queued continuously. In `Permit` phase,put the pod that doesn't meet min-available into WaitingMap and reserve resources until min-available are met or timeout. In `Unreserve` phase,clean up the pods which is timeout.
Member

How about: In QueueSort phase, implement a strategy to ensure the Pods belonging to the same PodGroup are queued continously. For example, suppose PodGroup A owns Pod-A1, Pod-A2, Pod-A3, while PodGroup B owns Pod-B1, Pod-B2. The pods of the two PodGroups cannot be zigzagged - it should be always <PodGroup-A, PodGroup-B> or the other way around; it should NOT be any way like <Pod-A1, Pod-B1, Pod-A2, ...>.

Member Author

Done thx

### Extension points

#### QueueSort
In order to make the pods which belongs to the same `PodGroup` to be scheduled together as much as possible, much policies are added in `QueueSort` phase.
Member

What do you mean by "much policies"?

@denkensk (Member Author) Jan 19, 2020

I change it to implement a strategy in QueueSort phase.

```go
func Less(podA *PodInfo, podB *PodInfo) bool
```
1. When priorities are different, they are compared based on their priorities. When priorities are the same, they are operated according to the following process.
Member

When priorities (i.e. .spec.priorityValue) are different

Member Author

Done thx

```
1. When priorities are different, they are compared based on their priorities. When priorities are the same, they are operated according to the following process.

2. When podA and podB are both regularPods (we will check it by the label of pod), it follows the same logic of default in-tree `PrioritySort` plugin.
Member

... we will check it by their labels ...

Member

Add a link for plugin.

Member Author

Done thx


In order to implement the coscheduling, we developed plugins in different extension points. In `QueueSort` phase, implement a strategy to ensure the Pods belonging to the same PodGroup are queued continously. For example, suppose PodGroup A owns Pod-A1, Pod-A2, Pod-A3, while PodGroup B owns Pod-B1, Pod-B2. The pods of the two PodGroups cannot be zigzagged - it should be always <PodGroup-A, PodGroup-B> or the other way around; it should NOT be any way like <Pod-A1, Pod-B1, Pod-A2, ...>In `Permit` phase,put the pod that doesn't meet min-available into WaitingMap and reserve resources until min-available are met or timeout. In `Unreserve` phase,clean up the pods which is timeout.

![image](./20200116-coscheduling-plugin-based-on-scheduler-framework-extensions.png)
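
To make the flow above more concrete, here is a minimal, hypothetical sketch of how the named extension points could hang together. It is not taken from the KEP or the eventual plugin code: the types and method signatures below are simplified stand-ins for the scheduler framework's real interfaces.

```go
// Illustrative only: simplified stand-ins for the scheduler framework types.
package coscheduling

import "time"

type PodInfo struct {
	Name      string
	PodGroup  string    // value of the PodGroup label; empty for a regular pod
	Priority  int32
	GroupTime time.Time // the group's shared timestamp; the pod's own queue time for regular pods
}

// Coscheduling sketches the plugin described in the proposal.
type Coscheduling struct {
	minAvailable map[string]int // PodGroup name -> minAvailable
}

// Less (QueueSort): compare priority first; ties are broken with the shared
// per-PodGroup timestamp and the PodGroup name, so pods of the same group
// queue back-to-back (the exact ordering is refined later in this review).
func (cs *Coscheduling) Less(a, b *PodInfo) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	if !a.GroupTime.Equal(b.GroupTime) {
		return a.GroupTime.Before(b.GroupTime)
	}
	return a.PodGroup > b.PodGroup
}

// Permit: let the pod through once enough of its group has arrived, otherwise
// park it in the WaitingMap with a timeout. waitingPeers is the number of pods
// of the same PodGroup already waiting.
func (cs *Coscheduling) Permit(p *PodInfo, waitingPeers int) (wait bool, timeout time.Duration) {
	if p.PodGroup == "" || waitingPeers+1 >= cs.minAvailable[p.PodGroup] {
		return false, 0 // allow immediately
	}
	return true, 30 * time.Second // placeholder timeout
}

// Unreserve: called when a waiting pod times out or is rejected, so the
// resources reserved for it (and its waiting peers) can be released.
func (cs *Coscheduling) Unreserve(p *PodInfo) {
	// The real plugin would reject the remaining waiting pods of p's PodGroup here.
}
```
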
@ahg-g (Member) left a comment

first review, I didn't yet read the part from the Coscheduling section and below.

@ahg-g (Member) left a comment

few more comments, please try to describe the goal of each step first before describing the implementation.

**Note1**: There are different `LastFailureTimestamp` (even if they are the same, the UID will be different). So when the pods enter the queue, the pods that belongs to the same PodGroup will be together.

#### Pre-Filter
1. When a pod which belongs to a `PodGroup` comes, calculate the total number of Pods excludes terminating ones belongs to the same PodGroup. If the result is less than minAvailable, the scheduling process is terminated directly.
Member

Suggested change
1. When a pod which belongs to a `PodGroup` comes, calculate the total number of Pods excludes terminating ones belongs to the same PodGroup. If the result is less than minAvailable, the scheduling process is terminated directly.
1. When a pod which belongs to a `PodGroup` comes, calculate the total number of Pods that belong to the same PodGroup excluding terminating ones. If the result is less than minAvailable, the scheduling process is terminated directly.

Member

as in PreFilter returns unschedulable? note that this will cause a scheduling failure not a "fit error": see here: https://github.com/kubernetes/kubernetes/blob/e6b5194ec12a7aa241c702019d6cd6f360b232d5/pkg/scheduler/core/generic_scheduler.go#L187

Member

I am not sure I understand what you are tying to achieve, so similar to the comments above, start by describing the goal that PreFilter is trying to achieve not its implementation. For example: "PreFilter validates that ...."

Member Author

PreFilter validates that if the total number of pods belonging to the same PodGroup is less than minAvailable. If so, the scheduling process will be interrupted directly.
I change it to return UnschedulableAndUnresolvable in PreFilter here.
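
For illustration only, here is a tiny self-contained sketch of the check being discussed; `prefilterVerdict` is a hypothetical helper that uses plain ints instead of the real plugin and framework types.

```go
package main

import "fmt"

// prefilterVerdict mirrors the validation above: abort the scheduling cycle
// early when the group cannot possibly reach minAvailable yet.
func prefilterVerdict(totalGroupPods, minAvailable int) string {
	if totalGroupPods < minAvailable {
		// The real plugin returns an Unschedulable / UnschedulableAndUnresolvable
		// Status here; as noted in this thread, either way the cycle ends with a
		// normal error rather than a fit error.
		return "abort scheduling cycle"
	}
	return "continue with the rest of the scheduling cycle"
}

func main() {
	// The example from the thread: 3 pods created so far, minAvailable is 5.
	fmt.Println(prefilterVerdict(3, 5)) // abort scheduling cycle
	fmt.Println(prefilterVerdict(5, 5)) // continue with the rest of the scheduling cycle
}
```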

Member

It doesn't matter whether the return status is UnschedulableAndUnresolvable or Unschedulable, in both cases the scheduler will return a normal error, not a fit error, look at the code I pointed to in the comment above.

Member Author

em.. @ahg-g I think it is not a fit error, maybe a normal error. For example, the current total number of all pods belonging to the same PodGroup is 3 (maybe there are other reasons why the remaining pods haven't been created yet), but the minAvailable of the PodGroup is 5. I think it is better to abort the scheduling cycle in PreFilter.

Member

I think it's meant to fail quickly for the first N-1 unassigned Pods, while recording the number in the cache. Then when the Nth Pod comes, the scheduler knows there are N-1 Pods pending in the internal queue, so it proceeds with the regular scheduling cycle for the Nth Pod. Is that right? @denkensk

denkensk and others added 9 commits January 21, 2020 10:38
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
…duler-framework.md

Co-Authored-By: ahg-g <40361897+ahg-g@users.noreply.github.com>
```go
func Less(podA *PodInfo, podB *PodInfo) bool
```
1. Trying to order by pod priority (i.e. .spec.priorityValue), pod with higher priority is scheduled ahead of other pod with lower priority. When priorities are the same, they are operated according to the following process.
Member

what will happen if the scheduler gets low/high priority pods alternately, depending on the timeout? That's a common case under high loads :)

Member

Within the same podgroup or in general?

```
1. Trying to order by pod priority (i.e. .spec.priorityValue), pod with higher priority is scheduled ahead of other pod with lower priority. When priorities are the same, they are operated according to the following process.

2. When podA and podB are both regularPods (we will check it by their labels), it follows the same logic of default in-tree [PrioritySort](https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/queuesort/priority_sort.go#L41-L45) plugin.
Member

what will happen if we have more pods than minAvailable? how to guarantee that?

Member Author

When the count of successfully scheduled pods belonging to the same PodGroup reaches min-available, they will be allowed in Permit.

denkensk and others added 2 commits February 10, 2020 11:15
…duler-framework.md

Co-Authored-By: Wei Huang <hweicdl@gmail.com>
## Alternatives considered
1. Using `PodGroup` as a scheduling unit. This requires major refactoring, as the scheduler only supports Pods as the scheduling unit today.


Contributor

Risks and Mitigations are also needed; for example, the current scheduler's throughput is about 100 pods/s. If PodGroup is enabled, then some measures are needed to ease the pressure on the scheduler.

...
cs.FrameworkHandle.IterateOverWaitingPods(func(p framework.WaitingPod) {
if p.GetPod().Labels[PodGroupName] == podGroupName {
p.Allow(cs.Name())
Contributor

Must PodGroupName be unique?

## Testing Plan
1. Add detailed unit and integration tests for workloads.
2. Add basic e2e tests, to ensure all components are working together.

Contributor

(as someone who mainly contributes to SIG Docs) I like to see at least a sketch of documentation requirements for feature graduation.

Would that be OK?

@Huang-Wei (Member) left a comment

@denkensk Some more comments.

#### QueueSort
In order to make the pods which belongs to the same `PodGroup` to be scheduled together as much as possible, implement a strategy in `QueueSort` phase.

**limition**: `QueueSort` is the core part of our design and only one `QueueSort` plugin is allowed in the scheduling framework. So our design only supports the case that `QueueSort` extension point isn't implemented in other plugins.
Member

s/our design/this design/

Member

s/limition/limitation

1. Trying to order by the `LastFailureTimestamp` of the podGroup. Pod with earlier timestamp is scheduled ahead of other pod.

2. If `LastFailureTimestampA` is equal to `LastFailureTimestampB`, trying to order by the `UID` of `PodGroup`,`Pod` with lexicographically greater `UID` is scheduled ahead of other pod. (The purpose is to distinguish different `PodGroup` with the same `LastFailureTimestamp` and to keep the pods of the same `PodGroup` together)

Member

How about reorganizing 1-4 paragraphs like this:


Firstly, we will inherit the default in-tree PrioritySort plugin so as to honor .spec.priority to ensure high-priority Pods are always sorted ahead of low-priority ones.

Secondly, if two Pods hold the same priority, the sorting precedence is described as below:

  • If they are both regularPods (without particular PodGroup label), compare their Timestamp (the time a Pod gets added to the scheduling queue) field: the Pod with earlier timestamp is positioned ahead of the other.
  • If one is regularPod and the other is pgPod, compare regularPod's Timestamp with pgPod's LastFailureTimestamp: the Pod with earlier timestamp is positioned ahead of the other.
  • If they are both pgPods:
    • Compare their LastFailureTimestamp: the Pod with earlier timestamp is positioned ahead of the other.
    • If their LastFailureTimestamp is identical, order by their UID of PodGroup: a Pod with lexicographically greater UID is scheduled ahead of the other Pod. (The purpose is to tease different PodGroups with the same LastFailureTimestamp apart, while also keeping Pods belonging to the same PodGroup back-to-back)
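
A minimal sketch of the precedence described above, using simplified stand-in types rather than the framework's (an illustration, not the plugin's code):

```go
package queuesort

import "time"

// queuedPod is a simplified stand-in for the framework's queued pod info.
type queuedPod struct {
	Priority         int32
	Timestamp        time.Time // time the pod was added to the scheduling queue
	PodGroupUID      string    // empty for regular pods
	GroupLastFailure time.Time // the PodGroup's LastFailureTimestamp (pgPods only)
}

// sortTime picks the timestamp used for ordering: a pgPod uses its group's
// LastFailureTimestamp, a regular pod uses its own queue timestamp.
func (p queuedPod) sortTime() time.Time {
	if p.PodGroupUID != "" {
		return p.GroupLastFailure
	}
	return p.Timestamp
}

// less follows the precedence suggested above.
func less(a, b queuedPod) bool {
	// 1. Higher priority first (as the in-tree PrioritySort plugin does).
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	// 2. Earlier timestamp first.
	ta, tb := a.sortTime(), b.sortTime()
	if !ta.Equal(tb) {
		return ta.Before(tb)
	}
	// 3. Identical timestamps: order by PodGroup UID, which also keeps pods of
	//    the same PodGroup back-to-back.
	return a.PodGroupUID > b.PodGroupUID
}
```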

Member

Just to emphasize what is different in this writing: it focuses on intention, rather than implementation details. This makes the design easier to understand.


2. When podA and podB are both regularPods (we will check it by their labels), it follows the same logic of default in-tree [PrioritySort](https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/queuesort/priority_sort.go#L41-L45) plugin.

3. When podA is regularPod, podB is pgPod, trying to order by podA's `Timestamp` (the time pod added to the scheduling queue) and podB’s `LastFailureTimestamp` (we get the PodGroupInfo from the cache). Pod with earlier timestamp is scheduled ahead of other pod.
Member

@denkensk this comment is still outstanding. Could you answer here?


**Note1**: There are different `LastFailureTimestamp` (even if they are the same, the UID will be different). So when the pods enter the queue, the pods that belongs to the same PodGroup will be together.

#### Pre-Filter
Member

Instead of jumping straight into what this phase does, I'd suggest putting a paragraph explaining why we need it and the rough workflow. I suppose you're trying to say:


When a pgGroup Pod is being scheduled for the first time, we have 2 options to deal with it: either start the full scheduling cycle regardless of whether its grouping Pods are present inside the scheduling queue, or fail its scheduling cycle quickly because the number of its grouping Pods doesn't meet minAvailable. The former case is more efficient, but may cause partial Pods to hold system resources until a timeout. The latter case may result in extra scheduling cycles (even if we're going to fail them fast), but will reduce the overall scheduling time for the whole group - as we wait for them all to be present in the queue first and then start the full scheduling cycle for each.

Here we're adopting the latter approach, blabla...

**Note1**: There are different `LastFailureTimestamp` (even if they are the same, the UID will be different). So when the pods enter the queue, the pods that belongs to the same PodGroup will be together.

#### Pre-Filter
1. `PreFilter` validates that if the total number of pods belonging to the same `PodGroup` is less than `minAvailable`. If so, the scheduling process will be interrupted directly.
Member

This design might still lead to deadlocks if more than one pod group already has minpods in the queues.

I think so. So I'm questioning why it only allows ONE on-going PodGroup. See my comment below.

#### Pre-Filter
1. `PreFilter` validates that if the total number of pods belonging to the same `PodGroup` is less than `minAvailable`. If so, the scheduling process will be interrupted directly.

2. `PreFilter` validates that if we should reject `WaitingPods` in advance. When the pod belongs to a new `PodGroup` (we can check it by the `LastPodGroup` in cache), it indicates that the new PodGroup scheduling cycle has been entered. Then we will check if the last `PodGroup` meets the minAvailable, if not, we will reject it directly in advance. The purpose is 1.To avoid invalid waiting, 2.To make the pod belongs to the same `PodGroup` fail together, rather than waiting partially.
Member

Can you show me why rejecting on-going waiting PodGroup is mandatory? It seems that we should use a Set (instead of a String) for the on-going PodGroup, as a new PodGroup's scheduling is independent of the existing on-going one, isn't it?

If we keep rejecting the existing on-going PodGroup, it can still lead to deadlock even if right now we put Permit at the end of the scheduling cycle.

Member Author

Can you show me why rejecting on-going waiting PodGroup is mandatory?

The purpose of this is to make the pods belonging to the same PodGroup adjacent to each other in the process, rather than interleaved with pods from other PodGroups. e.g.: podA1, podA2, podA3 (PodGroupA min=3), podB1, podB2, podB3 (PodGroupB min=3)

  1. PodA1 and PodA2 fail (insufficient resources in the current cluster)
  2. PodA3 succeeds into the WaitingMap (it can't actually be scheduled successfully because min=3)
  3. PodB1 and PodB2 fail (insufficient resources in the current cluster)
  4. PodB3 succeeds into the WaitingMap
  5. Due to the timeout mechanism, PodA3 and PodB3 sit in the WaitingMap. The new scheduling order is PodA1, PodA2, PodB1, PodB2, so PodGroupA ends up interleaved with other pods.

The above logic is meant to reduce the number of scheduling failures. But if rejecting the existing on-going PodGroup can lead to deadlock, I think it is not necessary. WDYT? @Huang-Wei

Member

Ah, I see the point: any Pod in a PodGroup cannot "reject" the on-going PodGroup - because you don't know if you're the last one and hence cannot tell whether to reject or not. A Pod in an on-going PodGroup can only "approve" - when the waitingPod number exceeds the minAvailable.

So the 2nd paragraph is more an optimization to fail fast the previous PodGroup - which is destined to be unschedulable; please highlight that. And I think it's not necessarily only the (1st) Pod of another PodGroup can make this (reject) decision - any Pod other than the Pod belonging to the previous PodGroup can make the decision, right?

Member

So you are basically trying to hold a second PodGroup until the first PodGroup has satisfied minAvailable. I have the feeling that Wei has a different interpretation.

But then pods from the second group will move to the unschedulable queue. How do you plan to bring them back once the first group satisfies its lower limit?

@denkensk (Member Author) Mar 12, 2020

And I think it's not necessarily only the (1st) Pod of another PodGroup can make this (reject) decision - any Pod other than the Pod belonging to the previous PodGroup can make the decision, right?

Yes, that's right. It is just an optimization mechanism to fail fast the previous PodGroup. But I think it is difficult to choose when to reject the previous PodGroup. I temporarily removed this mechanism from the KEP. @Huang-Wei

So you are basically trying to hold a second PodGroup until the first PodGroup has satisfied minAvailable.

Not to hold the second PodGroup until the first PodGroup has satisfied minAvailable. It is to fail the first PodGroup fast if it is determined that the first PodGroup is unable to satisfy minAvailable, because the second PodGroup is coming into the scheduling cycle. @alculquicondor

Member

This is definitely not clear at all. Let's add the conditions for considering a pod group unsatisfiable.

But also I don't think this is a good idea. Let's say some PG1 and PG2 (both requiring 3 min) are being created at almost the same time and the queue looks like this:

```
queue              | events about to come from apiserver
PG1, PG1, PG2, PG2 | PG1, PG2
```

The pods after the line are not in the queue yet, because there might be a delay from apiserver. But it seems that in your proposal, once the first PG2 is seen, we already reject PG1, because it's not satisfiable. But it will be soon.

Member Author

Yes, you are right. There's something wrong with this mechanism, because it is difficult to choose when to reject the previous PodGroup. I removed this mechanism from PreFilter. Thank you @alculquicondor

denkensk and others added 2 commits March 12, 2020 00:01
…duler-framework.md

Co-Authored-By: Wei Huang <hweicdl@gmail.com>
…duler-framework.md

Co-Authored-By: Wei Huang <hweicdl@gmail.com>
<!-- /toc -->

## Motivation
Kubernetes has become a popular solution for orchestrating containerized workloads. Due to limitation of Kubernetes scheduler, some offline workloads (ML/DL) are managed in a different system. To improve cluster utilization and operation efficiency, we'd like to treat Kubneretes as a unified management platform. But ML jobs are All-or-Nothing: they require all tasks of a job to be scheduled at the same time. If the job only start part of tasks, it will wait for other tasks to be ready to begin to work. In the worst case, all jobs are pending leading to a deadlock. To solve this problem, co-scheduling is needed for the scheduler. The new scheduler framework makes the goal possible.
Member

s/different system/different controller

As you introduce fixes here and there, consider splitting paragraphs in more lines to ease the reviews.

Contributor

nit: “Kubneretes”


## Proposal

In order to implement coscheduling, we developed plugins in different extension points. In `QueueSort` we ensure that the Pods belonging to the same PodGroup are queued back-to-back. For example, suppose PodGroup A owns Pod-A1, Pod-A2, Pod-A3, while PodGroup B owns Pod-B1, Pod-B2. The pods of the two PodGroups should not interleave - it should be always <PodGroup-A, PodGroup-B> or the other way around; but never <Pod-A1, Pod-B1, Pod-A2, ...>. In `Permit` phase we put the pod that doesn't meet min-available into the WaitingMap and reserve resources until min-available are met or timeout is triggered. In `Unreserve` phase,clean up the pods that timed-out.
Member

nit: Too much detail for QueueSort in this summary.

Also, focus on the general idea. IIRC, only one pod group is processed at a time?

PodLister corelisters.PodLister
// Key is the name of PodGroup.
PodGroupInfos map[string]PodGroupInfo
// Name of the last scheduled podgroup
Member

Please update the comment accordingly


type PodGroupInfo struct {
// LastFailureTimestamp stores the timestamp of last scheduling failure.
LastFailureTimestamp time.Time
Member

ping

}
```

1. `PodGroupInfo` is initialized the first time the pod belongs to the PodGroup is encountered, and LastFailureTimestamp is updated every time the PodGroup fails to schedule.
Member

again, what is "fail to schedule"?

Member Author

"fails to schedule" means that the PodGroup fails in 2 places:

  1. Waiting time timeout in WaitingMap
  2. the previous PodGroup is rejected by the second PodGroup in PreFilter

I just want to define a timestamp to compare with the timestamp of PodInfo in QueueSort. But I found that if I update the LastFailureTimestamp, it may lead to undefined behavior in the heap. So I changed it to InitialTimestamp (InitialTimestamp stores the timestamp of the initialization time of the PodGroup), and compare it with the InitialAttemptTimestamp of PodInfo.
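
A small sketch of the resulting cache entry, as I read the change described above (an assumption, not the final KEP text):

```go
package coscheduling

import (
	"time"

	"k8s.io/apimachinery/pkg/types"
)

// PodGroupInfo is the per-PodGroup cache entry, mirroring the field list shown
// elsewhere in this thread minus LastFailureTimestamp.
type PodGroupInfo struct {
	UID          types.UID
	Name         string
	MinAvailable int
	// InitialTimestamp is set once, when the PodGroup is first seen, and never
	// updated afterwards: mutating a key that the scheduling queue's heap has
	// already ordered on is what could lead to undefined behavior.
	InitialTimestamp time.Time
}
```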

Member

sg, please add these notes to the KEP.

Member Author

thanks~~ I update in the KEP @alculquicondor

#### QueueSort
In order to make the pods which belongs to the same `PodGroup` to be scheduled together as much as possible, implement a strategy in `QueueSort` phase.

**limition**: `QueueSort` is the core part of our design and only one `QueueSort` plugin is allowed in the scheduling framework. So our design only supports the case that `QueueSort` extension point isn't implemented in other plugins.
Member

s/limition/limitation


2. When podA and podB are both regularPods (we will check it by their labels), it follows the same logic of default in-tree [PrioritySort](https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/queuesort/priority_sort.go#L41-L45) plugin.

3. When podA is regularPod, podB is pgPod, trying to order by podA's `Timestamp` (the time pod added to the scheduling queue) and podB’s `LastFailureTimestamp` (we get the PodGroupInfo from the cache). Pod with earlier timestamp is scheduled ahead of other pod.
Member

Define LastFailureTimestamp and InitialTimestamp.

1. Trying to order by the `LastFailureTimestamp` of the podGroup. Pod with earlier timestamp is scheduled ahead of other pod.

2. If `LastFailureTimestampA` is equal to `LastFailureTimestampB`, trying to order by the `UID` of `PodGroup`,`Pod` with lexicographically greater `UID` is scheduled ahead of other pod. (The purpose is to distinguish different `PodGroup` with the same `LastFailureTimestamp` and to keep the pods of the same `PodGroup` together)

Member

Just to emphasize what is different in this writing: it focuses on intention, rather than implementation details. This makes the design easier to understand.

#### Pre-Filter
1. `PreFilter` validates that if the total number of pods belonging to the same `PodGroup` is less than `minAvailable`. If so, the scheduling process will be interrupted directly.

2. `PreFilter` validates that if we should reject `WaitingPods` in advance. When the pod belongs to a new `PodGroup` (we can check it by the `LastPodGroup` in cache), it indicates that the new PodGroup scheduling cycle has been entered. Then we will check if the last `PodGroup` meets the minAvailable, if not, we will reject it directly in advance. The purpose is 1.To avoid invalid waiting, 2.To make the pod belongs to the same `PodGroup` fail together, rather than waiting partially.
Member

So you are basically trying to hold a second PodGroup until the first PodGroup has satisfied minAvailable. I have the feeling that Wei has a different interpretation.

But then pods from the second group will move to the unschedulable queue. How do you plan to bring them back once the first group satisfies its lower limit?

denkensk and others added 2 commits March 12, 2020 23:56
…duler-framework.md

Co-Authored-By: Wei Huang <hweicdl@gmail.com>
// LastFailureTimestamp stores the timestamp of last scheduling failure.
LastFailureTimestamp time.Time
// InitialTimestamp stores the timestamp of the initialization time of PodGroup.
InitialTimestamp time.Time
UID types.UID
MinAvailable int
Name String
}
```

1. `PodGroupInfo` is initialized the first time the pod belongs to the PodGroup is encountered, and LastFailureTimestamp is updated every time the PodGroup fails to schedule.
Member

This line seems out of date.

Member

Please remove "and LastFailureTimestamp is updated every time the PodGroup fails to schedule."

Member Author

Done

- Compare their `InitialAttemptTimestamp`: the Pod with earlier timestamp is positioned ahead of the other.
- If their `InitialAttemptTimestamp` is identical, order by their UID of PodGroup: a Pod with lexicographically greater UID is scheduled ahead of the other Pod. (The purpose is to tease different PodGroups with the same `InitialAttemptTimestamp` apart, while also keeping Pods belonging to the same PodGroup back-to-back)

**Note1**: There are different `InitialTimestamp` (even if they are the same, the UID will be different). So when the pods enter the queue, the pods that belongs to the same PodGroup will be together.
Member

This note seems unnecessary

Member Author

Done

- If one is regularPod and the other is pgPod, compare regularPod's `InitialAttemptTimestamp` with pgPod's `InitialTimestamp`: the Pod with earlier timestamp is positioned ahead of the other.

- If they are both pgPods:
- Compare their `InitialAttemptTimestamp`: the Pod with earlier timestamp is positioned ahead of the other.
Member

Have you thought of starvation?
If two pod groups have the same priority, the first one to enter the queue will always have higher precedence than the other, or even individual pods.

This is different than the original, where the scheduler tries to give the same scheduling opportunities to all pods. But, on the other hand, this might be something that we want for co-scheduling.

Ref kubernetes/kubernetes#83834 (comment)
cc @yqwang-ms


Thanks for cc.

BTW, seems co-scheduling by itself just needs "Gang Queuing", i.e.:

Pods belonging to the same PodGroup are queued back-to-back

Which is not necessary to be also FIFO. (And FIFO might NOT be something that we want for co-scheduling, it is more like that to implement co-scheduling, to be FIFO is a helper, so we coupled them)

Given that current K8S has the priority starvation issues mentioned in kubernetes/kubernetes#86373.
E.g. an early large-scale pod group that keeps waiting will also cause a later small pod group to starve, even if the small pod group could fit into the free resources; from the user's view, they have the same priority and only slightly different creation times.

So, we would better decouple FIFO with co-scheduling, such as, for sometime, we move the whole early group behind later group, and for other times, we move the whole back.

Member

If two pod groups have the same priority, the first one to enter the queue will always have higher precedence than the other, or even individual pods.

Technically, due to the limitation of the current global one-queue mechanism, it's possible with a large pod group, as it's pretty challenging to accommodate all precedence in one queue.

But, on the other hand, this might be something that we want for co-scheduling.

Compared to the way that randomly sorts PodGroup Pods (where wait() is more unpredictable), this approach seems to have a better chance of avoiding starvation. However, as I mentioned above, technically it can still happen.

Member

So, we would better decouple FIFO with co-scheduling, such as, for sometime, we move the whole early group behind later group, and for other times, we move the whole back.

Could you rephrase this? Did you mean couple or decouple? What would be "other times"?


I mean we would better decouple them given the current situation.
For example, a large early pod group A, and another small later pod group B.
For queuing order, could we make A have a 70% chance of being ahead of B (so B still has a 30% chance of being ahead of A)?
In this way, we can at least mitigate or avoid starvation for pods of the same priority. Otherwise, same-priority pods also suffer from this proposal, right?


I mean can we easily queue pods in a group as a gang/unit, instead of leveraging detailed tuned compare rules to make sure they are back-to-back (has side effects to be FIFO)?

Member Author

I mean can we easily queue pods in a group as a gang/unit, instead of leveraging detailed tuned compare rules to make sure they are back-to-back (has side effects to be FIFO)?

I mentioned it in Alternatives considered. I think this requires major refactoring, as the scheduler only supports Pods as the scheduling unit today. @yqwang-ms


2. `PreFilter` validates that if we should reject `WaitingPods` in advance. When the pod belongs to a new `PodGroup` (we can check it by the `LastPodGroup` in cache), it indicates that the new PodGroup scheduling cycle has been entered. Then we will check if the last `PodGroup` meets the minAvailable, if not, we will reject it directly in advance. The purpose is 1.To avoid invalid waiting, 2.To make the pod belongs to the same `PodGroup` fail together, rather than waiting partially.
Here we're adopting the latter approach, `PreFilter` validates that if the total number of pods belonging to the same `PodGroup` is less than `minAvailable`. If so, the scheduling process will be interrupted directly.
Member

Are we still removing the PreFilter?

Member Author

I removed the mechanism to reject the previous PodGroup, and just kept that `PreFilter` validates whether the total number of pods belonging to the same `PodGroup` is less than `minAvailable`. If so, the scheduling process will be interrupted directly.

Member

the total number of pods where? i.e. where does this counter get increased?

@denkensk (Member Author) Mar 17, 2020

the total number of pods where? i.e. where does this counter get increased?

I get the total number of Pods belonging to the same PodGroup through PodLister.List(selector) because they all have the same label pod-group.scheduling.sigs.k8s.io/name @alculquicondor
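
A sketch of that lookup, assuming the label key quoted above and the standard client-go pod lister (not the plugin's actual code):

```go
package coscheduling

import (
	"k8s.io/apimachinery/pkg/labels"
	corelisters "k8s.io/client-go/listers/core/v1"
)

const podGroupLabel = "pod-group.scheduling.sigs.k8s.io/name"

// countPodGroupPods lists every pod carrying the PodGroup label and skips
// terminating ones, as described in the comment above.
func countPodGroupPods(lister corelisters.PodLister, pgName string) (int, error) {
	selector := labels.SelectorFromSet(labels.Set{podGroupLabel: pgName})
	pods, err := lister.List(selector)
	if err != nil {
		return 0, err
	}
	total := 0
	for _, p := range pods {
		if p.DeletionTimestamp == nil { // exclude terminating pods
			total++
		}
	}
	return total, nil
}
```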

Member

That would bypass all the caches, which may lead to race conditions. Also, it moves pods to the "unschedulable" queue if the number is lower than min. Then there is no way to bring them back unless some condition in the cluster changes that makes the pod "more schedulable". I know there is the flush every minute, which I want to get rid of. But you probably don't want the plugin to have to wait 1 min anyway.

Member

That lister doesn't include unscheduled pods, which these are.

That's true.

That's not the case, because when a pod gets deleted, we bring pods from the unschedulable queue back to the active queue.

But in both implementations, no (waiting) Pods get deleted - they just get moved from activeQ to unschedulableQ/backoffQ, or the other way around.

Member

You said "Preempt plugin". I thought you were just referring to preemption. Are you referring to Reserve?
In that case, pods move to the plugin's cache. They are neither in the activeQ nor the unschedulableQ.

Member

oops... my bad, typo, I meant "Permit" plugins.

Member

No worries. So my comment still applies. We are not moving pods to unschedulable queue when we tell pods to "wait".

Member

True. My point was that based on current extension points, this seems to be a common limitation among all coscheduling plugins.

- If their `InitialAttemptTimestamp` is identical, order by their UID of PodGroup: a Pod with lexicographically greater UID is scheduled ahead of the other Pod. (The purpose is to tease different PodGroups with the same `InitialAttemptTimestamp` apart, while also keeping Pods belonging to the same PodGroup back-to-back)

#### Pre-Filter
When a `pgGroup` Pod is being scheduled for the first time, we have 2 options to deal with it: either start the full scheduling cycle regardless of whether its grouping Pods are present inside the scheduling queue, or fail its scheduling cycle quickly because the number of its grouping Pods doesn't meet `minAvailable`. The former case is more efficient, but may cause partial Pods to hold system resources until a timeout. The latter case may result in extra scheduling cycles (even if we're going to fail them fast), but will reduce the overall scheduling time for the whole group - as we wait for them all to be present in the queue first and then start the full scheduling cycle for each.

s/pgGroup Pod/pgPod ?

Member Author

done

In `Permit` phase, we put the pod that doesn't meet min-available into the WaitingMap and reserve resources until min-available are met or timeout is triggered.
1. Get the number of Running pods that belong to the same PodGroup
2. Get the number of WaitingPods (used to record pods in waiting status) that belong to the same PodGroup
3. If Running + WaitingPods + 1 >= min-available(1 means the pod itself), approve the waiting pods that belong to the same PodGroup. Otherwise, put the pod into WaitingPods and set the timeout (eg: the timeout is dynamic value depends on the size of the `PodGroup`.)
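
The arithmetic in step 3 could look like the toy below (self-contained, not the plugin's code; a fixed maxWait stands in for the dynamic timeout mentioned above):

```go
package main

import (
	"fmt"
	"time"
)

// permitDecision applies the rule above: Running + WaitingPods + 1 >= min-available,
// where the +1 counts the pod currently passing through Permit.
func permitDecision(running, waiting, minAvailable int, maxWait time.Duration) (string, time.Duration) {
	if running+waiting+1 >= minAvailable {
		return "allow (and approve the waiting pods of the same PodGroup)", 0
	}
	return "wait", maxWait // park the pod in the WaitingMap until its peers arrive or it times out
}

func main() {
	fmt.Println(permitDecision(0, 1, 3, 30*time.Second)) // wait: only 2 of the 3 required pods are accounted for
	fmt.Println(permitDecision(0, 2, 3, 30*time.Second)) // allow: the group reaches min-available
}
```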

s/min-available/MinAvailable, the former is easily misunderstood as min minus available

Member Author

done

@k8s-ci-robot (Contributor)

@denkensk: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | d0f33a9 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

In `Permit` phase, we put the pod that doesn't meet `minAvailable` into the WaitingMap and reserve resources until `minAvailable` are met or timeout is triggered.
1. Get the number of Running pods that belong to the same PodGroup
2. Get the number of WaitingPods (used to record pods in waiting status) that belong to the same PodGroup
3. If Running + WaitingPods + 1 >= `minAvailable`(1 means the pod itself), approve the waiting pods that belong to the same PodGroup. Otherwise, put the pod into WaitingPods and set the timeout (eg: the timeout is dynamic value depends on the size of the `PodGroup`.)
@angao Mar 18, 2020

I thought of an algorithm to generate the dynamic timeout. We know that the pods of a group are exactly the same. When the cluster nodes are unchanged, we can assume that the scheduling time of each pod is the same, so we can get a formula:
(minAvailable - waitingPods - 1) * schedulingTime, where schedulingTime is the time from when a pod joins the scheduling queue until it reaches the Permit plugin. For example, if the minAvailable of a job is 5 and the time to schedule a pod is 10ms, then the waiting time for the first pod is (5-0-1) * 10ms, the waiting time for the second pod is (5-1-1) * 10ms, and so on for the rest of the pods. Considering network, load, etc., it is possible to add a random time to the timeout. WDYT?
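
A sketch of that formula, assuming schedulingTime is measured per pod and the random slack is bounded by a caller-chosen maxJitter:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// dynamicTimeout computes (minAvailable - waitingPods - 1) * schedulingTime plus
// a small random slack to absorb network and load jitter.
func dynamicTimeout(minAvailable, waitingPods int, schedulingTime, maxJitter time.Duration) time.Duration {
	remaining := minAvailable - waitingPods - 1 // pods of the group still expected to arrive
	if remaining < 0 {
		remaining = 0
	}
	jitter := time.Duration(rand.Int63n(int64(maxJitter) + 1))
	return time.Duration(remaining)*schedulingTime + jitter
}

func main() {
	// The example above: minAvailable=5, schedulingTime=10ms, no jitter.
	fmt.Println(dynamicTimeout(5, 0, 10*time.Millisecond, 0)) // 40ms for the first pod
	fmt.Println(dynamicTimeout(5, 1, 10*time.Millisecond, 0)) // 30ms for the second pod
}
```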

@denkensk (Member Author) commented Mar 26, 2020

Because this plugin is not an in-tree core feature, this KEP is moved to kubernetes-sigs/scheduler-plugins#2
It will be incubated in https://github.com/kubernetes-sigs/scheduler-plugins
