preemption and overprovisioning questions #73861

Open · jolson490 opened this issue Feb 8, 2019 · 3 comments

jolson490 commented Feb 8, 2019

hello,

Is there a best practice for determining, given the specifics of your own cluster, how many overprovisioning pods to create and how large each one should be (i.e. its resources.requests.memory)?

https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#preemption says that when the scheduler is trying to schedule a newly created (Pending) pod (called P), "If no Node is found that satisfies all the specified requirements of the Pod" then the scheduler "tries to find a Node where removal of one or more Pods with lower priority than P would enable P to be scheduled on that Node", and that "If such a Node is found, one or more lower priority Pods get evicted from the Node". Can someone confirm that the scheduler may indeed evict more than one lower-priority Running pod in order to schedule a single higher-priority Pending pod?
The reason I ask:

  • When I set resources.requests.memory for each overprovisioning pod to be relatively small (compared to the other, higher-priority pods in my cluster), the scheduler did not seem to evict Running overprovisioning pods in order to schedule a higher-priority Pending pod.
  • But when I changed each overprovisioning pod's memory request to match the size of the largest higher-priority pod in my cluster, preemption did occur.

Regarding the quantity of overprovisioning pods to create, it seems that should depend on how long it takes your cluster to provision a new node? (i.e. a longer provisioning time would mean you'd want a larger buffer of overprovisioning pods?)


Extra info:

(This is my first time creating an issue in this repo, and I'm new to preemption and overprovisioning.) Any pointers would be greatly appreciated - thanks!

Note that before I created this issue, I searched the existing issues in this repo for the following terms (and I read those issues): "Unable to schedule", "preemption", & "overprovisioning".

I'm running K8s v1.12.3 in AWS (and I'm using cluster-autoscaler v1.12.2).

I followed https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler to create the overprovisioning pods in my cluster (though I didn't deploy cluster-proportional-autoscaler). FWIW, that example specifies 200m for the overprovisioning pods' resources.requests.cpu, but no value for memory.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption is also useful reference info.
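
For concreteness, the FAQ's overprovisioning example boils down to something like the sketch below. The memory request is my addition for illustration (the FAQ example only sets cpu), and all of the numbers are placeholders rather than recommendations:

```yaml
# PriorityClass for the placeholder pods: lower than the default priority (0)
# of normal pods, so they are the first to be preempted.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning pods."
---
# Deployment of pause pods that reserve capacity. replicas is the
# "quantity" question; resources.requests is the "size" question.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 3                    # placeholder
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: k8s.gcr.io/pause
        resources:
          requests:
            cpu: 200m
            memory: 1Gi          # placeholder; not in the FAQ example
```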

jolson490 commented Feb 8, 2019

@kubernetes/sig-scheduling-proposals

k8s-ci-robot commented Feb 8, 2019

@jolson490: Reiterating the mentions to trigger a notification:
@kubernetes/sig-scheduling-proposals

In response to this:

@kubernetes/sig-scheduling-proposals

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bsalamat commented Feb 12, 2019

@jolson490 Scheduler preemption is capable of evicting more than one pod from a single node if needed.
Without access to logs it is hard to say why it didn't evict more than one pod in your case, but the most common scenario is that the pending pod cannot be scheduled even after the lower-priority pods are removed - for example, because the pending pod is so large that it still does not fit on the node, or because it has node affinity, pod affinity, etc. that cannot be satisfied after removing the other lower-priority pods.
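
For illustration, a pending pod shaped like the sketch below could remain unschedulable even with every lower-priority pod evicted, either because no node has 64Gi allocatable or because no node carries the (made-up) label the affinity requires:

```yaml
# Illustrative only: names, labels, and numbers are made up.
apiVersion: v1
kind: Pod
metadata:
  name: example-pending-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: example.com/node-pool   # hypothetical node label
            operator: In
            values: ["gpu"]
  containers:
  - name: app
    image: k8s.gcr.io/pause
    resources:
      requests:
        memory: 64Gi                     # may exceed any single node's allocatable memory
```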

BTW, please use our sig-scheduling channel on https://kubernetes.slack.com for asking support questions.
