
Preferred scheduling #211

Closed
segator opened this issue Jan 3, 2020 · 34 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@segator

segator commented Jan 3, 2020

It would be nice if we could have a soft check like "preferredDuringSchedulingIgnoredDuringExecution"
that calculates whether there is a node with a better weight than the node where the pod is currently running, and if so evicts the pod so it can be scheduled onto the better node.

For example, because of network routes I prefer my deployment's pods to run in ZoneA, but if that is not possible because the nodes there are offline, I accept them being deployed in ZoneB. However, once ZoneA is reachable again I want them rescheduled back to ZoneA.
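
For reference, a minimal sketch of such a soft preference as it exists on the pod spec today (the zone label values are illustrative). kube-scheduler only honors it at scheduling time; moving the pod back once ZoneA recovers is exactly what is being requested here:

```yaml
# Illustrative pod spec: prefer zone-a (weight 100) over zone-b (weight 10).
# Nothing moves the pod back to zone-a after it recovers; that is the gap
# this issue asks the descheduler to fill.
apiVersion: v1
kind: Pod
metadata:
  name: zone-preference-example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["zone-a"]
      - weight: 10
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["zone-b"]
  containers:
  - name: app
    image: nginx
```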

@seanmalloy
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 6, 2020
@nathan-vp

nathan-vp commented Feb 27, 2020

Apart from preferredDuringSchedulingIgnoredDuringExecution, it would be nice if soft taints like PreferNoSchedule could also be taken into account.

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
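
For context, PreferNoSchedule is the node-side counterpart of a soft affinity: the scheduler tries to avoid the node but will still use it if nothing else fits. A minimal sketch (the node name and taint key are illustrative):

```yaml
# Illustrative node with a soft taint; the scheduler avoids it when possible.
# The request above is for the descheduler to also weigh such taints when
# deciding which pods are worth moving.
apiVersion: v1
kind: Node
metadata:
  name: node-1
spec:
  taints:
  - key: example.com/prefer-avoid
    value: "true"
    effect: PreferNoSchedule
```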

@asnkh

asnkh commented Mar 21, 2020

I tried to work on this but found the feature difficult to implement. I believe it is beneficial for anyone concerned with this issue to understand the fundamental difficulty.

It is indeed possible to detect that a pod has a more preferable node, in terms of the sum of its affinity weights. However, even if we evict the pod to let kube-scheduler place it on a node with a higher affinity score, it sometimes ends up on the same node as before, because scheduling is decided by more than node affinity or inter-pod affinity. (https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/#scoring)

So unless the descheduler can make the same decision as kube-scheduler, it can cause this kind of ineffective pod eviction. I have no good idea for overcoming this difficulty; copying all of kube-scheduler's scheduling policies into the descheduler is not realistic.

As a user of Kubernetes, I decided to always use requiredDuringSchedulingIgnoredDuringExecution. Doing so brought other issues, but they were resolvable in my case.
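
For comparison, the hard-requirement workaround described above looks like the fragment below (the zone value is illustrative). It guarantees placement but leaves pods Pending when the zone has no capacity, which is the kind of "other issue" mentioned:

```yaml
# Illustrative pod spec fragment: a hard zone requirement instead of a soft
# preference. Pods stay Pending if zone-a has no schedulable capacity.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["zone-a"]
```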

@barucoh

barucoh commented Apr 2, 2020

I am looking for the same behavior, although for a different use case:
I need to "redistribute" pods that use preferredDuringSchedulingIgnoredDuringExecution across the available AZs to ensure high availability at all times, while still allowing scale-out to more pods than the cluster has AZs (requiredDuringSchedulingIgnoredDuringExecution would cap the number of pods at the number of AZs the cluster sees).

Unless I am missing a Kubernetes feature that supports exactly that...

@seanmalloy
Member

I am looking for the same behavior, although for a different use case:
I need to "redistribute" pods that use preferredDuringSchedulingIgnoredDuringExecution across the available AZs to ensure high availability at all times, while still allowing scale-out to more pods than the cluster has AZs (requiredDuringSchedulingIgnoredDuringExecution would cap the number of pods at the number of AZs the cluster sees).

Unless I am missing a Kubernetes feature that supports exactly that...

@barucoh take a look at the topologySpreadConstraints feature. It was promoted to beta and is enabled by default starting with k8s v1.18.
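
A minimal sketch of the suggested constraint (the label selector and values are illustrative); with whenUnsatisfiable: ScheduleAnyway it stays a soft rule, so scaling beyond the number of AZs still works. The descheduler later gained a RemovePodsViolatingTopologySpreadConstraint strategy that can rebalance such pods afterwards:

```yaml
# Illustrative pod spec fragment: spread replicas evenly across zones as a
# soft constraint; pods are still scheduled even when the skew can't be met.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: my-app
```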

@yoda

yoda commented Apr 29, 2020

This is taken from my other comment on the closed issue referenced above, describing my use case.

The use case is one where you have two autoscaling groups of nodes, one using spot instances and the other standard on-demand instances. The spot nodes get terminated when the price goes above a threshold, which invalidates the preferredDuringScheduling affinity and results in pods being scheduled onto standard nodes. Over time the spot price goes back down, and the descheduler could then reschedule those pods back onto spot instances.

The issue referenced above does have a PR against it with a potential implementation. However,
@asnkh raises an awkward point that in some cases this could result in "flapping" of the redeploys, though maybe that can be dealt with through additional affinities.
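
A sketch of the affinity for this spot-versus-on-demand case, assuming (purely for illustration) that the node groups label their nodes with lifecycle=spot:

```yaml
# Illustrative fragment: prefer spot nodes, fall back to on-demand nodes.
# Once spot capacity returns, nothing moves the pod back today; that is the
# descheduler behaviour being asked for in this issue.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: lifecycle        # assumed label applied by the node groups
          operator: In
          values: ["spot"]
```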

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2020
@seanmalloy
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 30, 2020
@kesor

kesor commented Sep 23, 2020

There is some discussion about giving the scheduler a dry-run capability, which could mean that the approach @asnkh describes wouldn't require importing all the scheduling logic into this project. Alas, that issue says that, at the moment, the way to "check capacity" is to use the cluster-capacity tool.

kubernetes/kubernetes#58242 <-- kube-scheduler dry run request
https://github.com/kubernetes-sigs/cluster-capacity <-- cluster-capacity tool to check where a pod "might be" scheduled

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2020
@seanmalloy
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 22, 2021
@seanmalloy
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 23, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 21, 2021
@decayofmind

If someone still wants this functionality in a simple, quick-and-dirty way, please check out https://github.com/decayofmind/kube-better-node

@seanmalloy
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 20, 2021
@sergeyshevch

@decayofmind Looks really great! I also have a similar use case. We have spot autoscaling groups, and I need to be sure that my pods can always be scheduled.

So I cannot use required node affinity, because there can be a case where my spot ASG is scaled down to 0. And I want the pod rescheduled once the preferred node affinity can be satisfied.

@rajivml

rajivml commented Dec 2, 2021

We also have a similar use case, where we want the descheduler to work with preferredDuringSchedulingIgnoredDuringExecution, because if we use "required" the number of pods we can run is limited to the number of nodes in a single-zone cluster, or to the number of zones in a multi-zonal cluster.

@damemi
Contributor

damemi commented Dec 2, 2021

@rajivml have you looked at topology spread constraints? Your situation sounds similar to one that was mentioned above #211 (comment)

@dvdvorle

I have the exact same use case as @sergeyshevch, and I don't know of any other way to address this issue.

I'd like to add that a 100% solution isn't needed here; I'd be happy with 80% as well, and I don't mind that

it can cause this kind of ineffective pod eviction

as @asnkh mentioned earlier. Especially because the RemoveDuplicates strategy has the same restriction, no? It can also lead to ineffective pod evictions, but it's still a useful strategy to have.

As long as the evictions respect PDBs, I don't mind them at all.
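
For reference, the RemoveDuplicates strategy mentioned above is enabled with a small policy like the sketch below (v1alpha1 policy format). Descheduler evictions go through the eviction API, so PodDisruptionBudgets are honored:

```yaml
# Illustrative DeschedulerPolicy enabling RemoveDuplicates.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
```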

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2022
@sergeyshevch

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 6, 2022
@shaneqld

It is indeed possible to detect that a pod has a more preferable node, in terms of the sum of its affinity weights. However, even if we evict the pod to let kube-scheduler place it on a node with a higher affinity score, it sometimes ends up on the same node as before, because scheduling is decided by more than node affinity or inter-pod affinity. (https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/#scoring)

So unless the descheduler can make the same decision as kube-scheduler, it can cause this kind of ineffective pod eviction. I have no good idea for overcoming this difficulty; copying all of kube-scheduler's scheduling policies into the descheduler is not realistic.

As I understand, the problem here is that this could result in "flapping" of pods. For some of us, this could be acceptable.

We don't currently have a good solution for making the same decision as kube-scheduler (e.g. the cluster-capacity tool, dry run, etc.), so what about accepting the limitation and reducing the impact of the flapping? For example, if I could say "don't deschedule a pod with an age of less than 30 minutes", the pod could flap at most once every 30 minutes.

The hope would be, in a dynamic cluster, the pod would eventually be moved as desired. Worst case, you have a flap periodically, and you would understand and accept this if you want to use the feature.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2022
@sergeyshevch

/remove-lifecycle stale

@seanmalloy I guess such a use case can be implemented later; we should freeze this issue so the discussion and follow-up implementations can continue.

Can you freeze it?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2022
@z0rc

z0rc commented Dec 15, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 15, 2023
@z0rc

z0rc commented Mar 15, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 15, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 13, 2023
@z0rc

z0rc commented Jun 13, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 13, 2023
@miqm

miqm commented Sep 4, 2023

It seems this has finally been implemented: #1210
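
For anyone landing here later, a hedged sketch of what the policy would look like once #1210 is available, assuming the RemovePodsViolatingNodeAffinity strategy accepts the preferred affinity type as described in that PR (v1alpha1 policy format shown):

```yaml
# Illustrative DeschedulerPolicy: evict pods whose preferred node affinity
# could now be satisfied by a better-scoring node (support added by #1210).
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      nodeAffinityType:
      - "preferredDuringSchedulingIgnoredDuringExecution"
```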

@a7i
Contributor

a7i commented Oct 25, 2023

/close

@a7i a7i closed this as completed Oct 25, 2023