
Priority Scheduling w/o Preemption #62068

Closed
salilgupta1 opened this Issue Apr 3, 2018 · 13 comments


salilgupta1 commented Apr 3, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:
We are looking to use our Kubernetes cluster as a workload-processing system for our clients. We want to use the priority functionality in 1.9; however, we have a hard requirement that we cannot preempt one client's job for another's. Is there any way to disable preemption but maintain priority? We only need priority to order the scheduling of workloads going onto the cluster.
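
For context, a minimal sketch of the priority side of the feature we want to keep (the apiVersion reflects the 1.9-era alpha API group and is an assumption about the exact cluster version):

```yaml
# A PriorityClass assigns an integer priority to pods that reference it by
# name. Higher values are scheduled earlier from the pending queue.
apiVersion: scheduling.k8s.io/v1alpha1
kind: PriorityClass
metadata:
  name: client-jobs-default   # hypothetical name
value: 1000
globalDefault: false
description: "Default priority for client batch jobs."
```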

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:


salilgupta1 commented Apr 3, 2018

/sig scheduling


Member

k82cn commented Apr 4, 2018

For now, priority and preemption are enabled/disabled together, but it's a reasonable request to sort Pods by priority only.

/cc @bsalamat
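
For reference, a single alpha feature gate controlled both behaviors at the time of this thread; a sketch of how it was typically switched on (the manifest excerpt and image tag are hypothetical, only the --feature-gates flag is the actual mechanism, and the same gate also had to be enabled on the API server):

```yaml
# Hypothetical excerpt from a kube-scheduler static Pod manifest.
# PodPriority=true enabled BOTH priority-based queue ordering and preemption;
# there was no gate to turn on one without the other.
spec:
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.9.0   # hypothetical image tag
    command:
    - kube-scheduler
    - --feature-gates=PodPriority=true
```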


Member

resouer commented Apr 4, 2018

We can separate them into feature gates, but I would like to understand this request better, because AFAIK features.PodPriority only affects the behavior of the scheduling queue, which is more of an internal implementation detail and "invisible" to the user.

[Update]: I mean it's not clear how this change will affect users, as the scheduler queue is not directly accessible to them.

@salilgupta1 could you please explain a little how your system interacts with the Pod priority feature? Especially how features.PodPriority=false would affect it?


salilgupta1 commented Apr 4, 2018

@resouer At least from the documentation, it seems like PodPriority is tightly bound to preemption:

Pods in Kubernetes 1.8 and later can have priority. Priority indicates the importance of a Pod relative to other Pods. When a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible. In Kubernetes 1.9 and later, Priority also affects scheduling order of Pods and out-of-resource eviction ordering on the Node.

Our use case: clients run a variety of jobs on our platform, and we want to run those jobs on K8s. The issue is that a client (let's say Client A) could flood the job queue with hundreds of jobs that take up all the available resources on our cluster, blocking other clients from running jobs. If we could set a priority on the jobs such that Client A's jobs get de-prioritized relative to other clients' jobs, we would have a way of protecting the queue from being flooded by a single client. We want to use the K8s PodPriority feature, but we cannot preempt already-running jobs.
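
As an illustration of the setup described above (all names, values, and images are hypothetical), each client could be mapped to its own PriorityClass, and each job's pod template would reference it; with preemption disabled, the priority value would only decide queue order:

```yaml
# Hypothetical per-client priority class: Client A has been de-prioritized.
apiVersion: scheduling.k8s.io/v1alpha1
kind: PriorityClass
metadata:
  name: client-a-low
value: 100
---
apiVersion: batch/v1
kind: Job
metadata:
  name: client-a-job   # hypothetical job name
spec:
  template:
    spec:
      priorityClassName: client-a-low   # pending pods with a higher priority
      restartPolicy: Never              # value are scheduled first
      containers:
      - name: worker
        image: example.com/client-a/worker:latest   # hypothetical image
```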


Contributor

bsalamat commented Apr 5, 2018

Our use case: clients run a variety of jobs on our platform, and we want to run those jobs on K8s. The issue is that a client (let's say Client A) could flood the job queue with hundreds of jobs that take up all the available resources on our cluster, blocking other clients from running jobs. If we could set a priority on the jobs such that Client A's jobs get de-prioritized relative to other clients' jobs, we would have a way of protecting the queue from being flooded by a single client. We want to use the K8s PodPriority feature, but we cannot preempt already-running jobs.

As you have noticed, priority and preemption are enabled together. We can separate them, but first let me understand the problem better.
This is the scenario from your example: Client A runs hundreds of jobs, fills up all the available resources in the cluster, and still has quite a few more jobs in the scheduling queue. Another client, Client B, with higher priority arrives, creates a few pods, and gets in front of Client A's jobs in the scheduling queue. Client B's jobs won't schedule when preemption is disabled. So, what's the point of getting in front of Client A in the queue?


Member

aveshagarwal commented Apr 5, 2018

One use case of priority without preemption: let's say a node (or a set of nodes, or even the whole cluster) is drained and restarted after an upgrade or node issues; priority alone then makes sure that higher-priority pods in the scheduling queue get scheduled first.


Contributor

bsalamat commented Apr 6, 2018

One use case of priority without preemption: let's say a node (or a set of nodes, or even the whole cluster) is drained and restarted after an upgrade or node issues; priority alone then makes sure that higher-priority pods in the scheduling queue get scheduled first.

Makes sense. I am fine with adding a separate feature gate for preemption.


Member

resouer commented Apr 7, 2018

Great to see the use case is clear. Allow me to take care of this change if no one has picked it up.

/assign


Member

k82cn commented Apr 7, 2018

Client B's jobs won't schedule when preemption is disabled. So, what's the point of getting in front of Client A in the queue?

IMO, these are a kind of run-to-completion pods; maybe @salilgupta1 can give more info :)

As commented above, that's a reasonable request to me.


salilgupta1 commented Apr 9, 2018

Another client, Client B, with higher priority arrives, creates a few pods, and gets in front of Client A's jobs in the scheduling queue. Client B's jobs won't schedule when preemption is disabled. So, what's the point of getting in front of Client A in the queue?

Hopefully a single client isn't using all the resources at any given time, but yes, it could happen. If that is the case, we definitely want the next available resources (when any of Client A's jobs finish) to go to someone else, which is why it seems advantageous to bump Client B further up in the queue.

We're hoping to give users a fairer chance at future available resources.

And @k82cn, these are not long-lived services.

k8s-merge-robot added a commit that referenced this issue Apr 19, 2018

Merge pull request #62243 from resouer/fix-62068
Automatic merge from submit-queue (batch tested with PRs 59592, 62308, 62523, 62635, 62243). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Separate pod priority from preemption

**What this PR does / why we need it**:
Users requested that the priority and preemption feature gate be split so they can use priority separately.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #62068 

**Special notes for your reviewer**:

~~I kept `ENABLE_POD_PRIORITY` as the ENV name for the GCE cluster scripts for backward-compatibility reasons. Please let me know if another approach is preferred.~~

~~This is a potential **breaking change**, as existing clusters will be affected; we may need to include this in 1.11 maybe?~~

TODO: update this doc https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/

[Update] Usage, in the scheduler's config file:
```yaml
apiVersion: componentconfig/v1alpha1
kind: KubeSchedulerConfiguration
...
disablePreemption: true
```

**Release note**:

```release-note
Split PodPriority and PodPreemption feature gate
```

Member

resouer commented Apr 19, 2018

@salilgupta1 FYI, you can now use a scheduler config to specify whether you want to disable preemption.

I will also update https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/ in a follow-up PR to reflect this change.
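
Putting the pieces together, a minimal sketch of the merged behavior (the file path is hypothetical; `disablePreemption` is the field added by #62243):

```yaml
# Hypothetical /etc/kubernetes/scheduler-config.yaml. This file is consumed by
# the kube-scheduler binary via its --config flag; it is not an object created
# in the cluster.
apiVersion: componentconfig/v1alpha1
kind: KubeSchedulerConfiguration
disablePreemption: true   # keep priority-based queue ordering, skip preemption
```

The scheduler would then be started with something like `kube-scheduler --config=/etc/kubernetes/scheduler-config.yaml`.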


byndcivilization commented Aug 3, 2018

@resouer we're trying to enable this and running into issues with the componentconfig/v1alpha1 API group in k8s 1.11. Is there a trick here? We're getting the following error:
no matches for kind "KubeSchedulerConfiguration" in version "componentconfig/v1alpha1"
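
That "no matches for kind" message is what the API server returns when a manifest is applied with kubectl, which suggests (an assumption about the setup) that the config is being created as a cluster object. KubeSchedulerConfiguration is a component config file read by the kube-scheduler binary itself, not an API resource served by the cluster, so a sketch of the intended wiring would be:

```yaml
# Hypothetical excerpt from a kube-scheduler static Pod manifest: the config
# file is passed to the scheduler binary with --config instead of being
# applied to the API server with kubectl.
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler
    - --config=/etc/kubernetes/scheduler-config.yaml   # hypothetical path
```

Note also that later releases renamed the config API group (to kubescheduler.config.k8s.io), so the correct apiVersion line depends on the Kubernetes version in use.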
