Preferred scheduling #211
/kind feature
Apart from https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
I tried to work on it but found it difficult to implement this feature. I believe it is beneficial for anyone concerned with this issue to understand the fundamental difficulty. It is indeed possible to detect that a pod has a more preferable node to be scheduled on, in terms of the sum of affinity weights. However, even if we evict the pod to let kube-scheduler place it on another node with a higher affinity score, it sometimes ends up with the new pod on the same node as before. This is because scheduling is decided not only by node affinity or inter-pod affinity (https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/#scoring). So unless descheduler can make the same decision as kube-scheduler, it can cause this kind of ineffective pod eviction. I have no good idea how to overcome this difficulty. Copying all scheduling policies from kube-scheduler into descheduler is not realistic. As a user of Kubernetes, I decided to always use
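For reference, the kind of weighted preferred affinity discussed above looks like this (a minimal sketch; the label keys, values, and weights are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # kube-scheduler adds each matching term's weight to the node's score;
      # a node matching both terms below gains 80 + 20 = 100 from this plugin,
      # but other scoring plugins can still outweigh that sum, which is why
      # an evicted pod may land back on the same node.
      - weight: 80
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["zone-a"]
      - weight: 20
        preference:
          matchExpressions:
          - key: node-type
            operator: In
            values: ["high-memory"]
  containers:
  - name: app
    image: nginx
```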
I am looking for the same behavior, although for a different use case, unless there is a Kubernetes feature I'm unaware of that supports exactly that...
@barucoh take a look at the topologySpreadConstraints feature. It was promoted to beta and is enabled by default starting with k8s v1.18.
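A sketch of the suggested topologySpreadConstraints (label selector and skew values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    # ScheduleAnyway makes this a soft constraint: the scheduler prefers
    # an even spread across zones but never fails scheduling outright
    # when the spread cannot be satisfied.
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: app
    image: nginx
```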
This is taken from my other comment on the above closed issue, describing my use case as an example.
The issue referenced above does have a PR against it with a potential implementation, however.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
There is some discussion about giving the scheduler a dry-run capability, which could mean that @asnkh's solution wouldn't require importing all of the scheduling logic into this project. Alas, the issue says that at the moment the way to "check capacity" is using the cluster-capacity tool. kubernetes/kubernetes#58242 <-- kube-scheduler dry run request
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
If someone still wants this functionality in a simple, quick-and-dirty way, please check https://github.com/decayofmind/kube-better-node
/remove-lifecycle rotten
@decayofmind Looks really great! I also have a similar use case. We have spot autoscaling groups and I need to be sure that my pods can always be scheduled, so I cannot use required node affinity, because our spot ASG can be scaled down to 0. And I want a pod rescheduled whenever its preferred node affinity can be satisfied.
We also have a similar use case, where we want descheduler to work with preferredDuringSchedulingIgnoredDuringExecution, because if we use "required" then the number of pods we can spread is limited to the number of nodes in a single-zone cluster, or to the number of zones in a multi-zonal cluster.
@rajivml have you looked at topology spread constraints? Your situation sounds similar to one that was mentioned above #211 (comment)
I have the exact same use case as @sergeyshevch, and I don't know of any other way to address this issue. I'd like to add that a 100% solution isn't needed here; I'd be happy with 80% as well, and I don't mind the occasional ineffective eviction @asnkh mentioned earlier. As long as the evictions respect PDBs, I don't mind them at all.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its triage rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
As I understand it, the problem here is that this could result in "flapping" of pods. For some of us, that could be acceptable. We don't currently have a good way to make the same decision as kube-scheduler (e.g. the cluster-capacity tool, a dry run, etc.), so what about accepting the limitation and reducing the impact of the flapping? For example, if I could say "don't deschedule a pod with an age of less than 30 minutes", the pod could flap at most once every 30 minutes. The hope would be that, in a dynamic cluster, the pod would eventually be moved as desired. Worst case, you have a periodic flap, and you would understand and accept this if you choose to use the feature.
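A sketch of what such a guard could look like in a descheduler policy. Note that minPodAgeSeconds is a hypothetical field invented here to illustrate the idea, not an existing descheduler option:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      # Hypothetical knob (not a real descheduler option): only consider
      # pods older than 30 minutes, bounding flapping to at most one
      # eviction per pod per 30-minute window.
      minPodAgeSeconds: 1800
```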
/remove-lifecycle stale @seanmalloy I guess such a use case can be implemented later, and we should freeze this issue to continue the discussion and future implementations. Can you freeze it?
It seems this has finally been implemented: #1210
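If #1210 follows the shape of the existing RemovePodsViolatingNodeAffinity strategy, enabling it should look roughly like this (a sketch; check the descheduler docs for the exact field names and supported values):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      nodeAffinityType:
      # Previously only the "required..." affinity type was handled;
      # #1210 adds the preferred variant discussed in this issue.
      - "preferredDuringSchedulingIgnoredDuringExecution"
```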
/close
It would be nice if we could have a soft check like "preferredDuringSchedulingIgnoredDuringExecution"
that calculates whether there is a node with a better weight than the node where the pod is running, evicts the pod, and thereby gets it scheduled on the better node.
For example: because of network routes I prefer my deployment's pods to run in ZoneA, but if that is not possible because the nodes there are offline, I accept being deployed in ZoneB. Once ZoneA is reachable again, I want the pods rescheduled back to ZoneA.
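The ZoneA/ZoneB preference described above can be expressed as a soft node affinity like this (zone label values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # Soft preference: schedule in ZoneA when possible, otherwise fall
      # back to any other zone (e.g. ZoneB). Moving the pod back once
      # ZoneA recovers is exactly what this issue asks of descheduler.
      - weight: 100
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["zone-a"]
  containers:
  - name: app
    image: nginx
```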