
Improve cluster autoscaling #24404

Closed
fgrzadkowski opened this issue Apr 18, 2016 · 7 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@fgrzadkowski
Contributor

fgrzadkowski commented Apr 18, 2016

The current hacky version has a number of drawbacks:

  • works only on GCE
  • doesn't support a number of cases, so we may still end up with pending pods
  • very cloud-provider specific
  • very hard to update configuration
  • UX is very poor

We'd like to improve it to:

  • support all cases for pending pods
  • provide a reference implementation that could be ported to other cloud providers.

It should support both scaling up (P1) and scaling down (P2).

This is an umbrella bug used for referencing PRs etc.

@fgrzadkowski fgrzadkowski added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. team/control-plane labels Apr 18, 2016
@fgrzadkowski fgrzadkowski added this to the v1.3 milestone Apr 18, 2016
@fgrzadkowski
Contributor Author

@roberthbailey
Contributor

/cc @erictune since this is effectively a "feature issue" and might be a good use case for the new feature issue proposal.

fgrzadkowski added a commit to fgrzadkowski/kubernetes that referenced this issue Apr 20, 2016
Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed.

Ref kubernetes#24404
fgrzadkowski added a commit to fgrzadkowski/kubernetes that referenced this issue May 12, 2016
Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed.

Ref kubernetes#24404
k8s-github-robot pushed a commit that referenced this issue May 12, 2016
Automatic merge from submit-queue

Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed

Set the `PodScheduled` condition to `ConditionFalse` in `scheduleOne()` if scheduling failed and to `ConditionTrue` in the `/bind` subresource.

Ref #24404

@mml (as it seems to be related to "why pending" effort)

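For illustration, here is a minimal sketch of how such a `PodScheduled` condition can be recorded on a pod's status. This is not the actual scheduler code from the PR; it uses the present-day `k8s.io/api` types, and the helper name `updateScheduledCondition` is hypothetical:

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// updateScheduledCondition (hypothetical helper) records the outcome of a
// scheduling attempt on the pod's status, so that consumers such as the
// cluster autoscaler can detect pods the scheduler already tried and
// failed to place.
func updateScheduledCondition(pod *v1.Pod, scheduled bool, reason, message string) {
	status := v1.ConditionFalse
	if scheduled {
		status = v1.ConditionTrue
	}
	cond := v1.PodCondition{
		Type:               v1.PodScheduled,
		Status:             status,
		LastTransitionTime: metav1.Now(),
		Reason:             reason,
		Message:            message,
	}
	// Update an existing PodScheduled condition in place, or append one.
	for i := range pod.Status.Conditions {
		if pod.Status.Conditions[i].Type == v1.PodScheduled {
			pod.Status.Conditions[i] = cond
			return
		}
	}
	pod.Status.Conditions = append(pod.Status.Conditions, cond)
}
```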
@davidopp
Member

Can you list what work remains to be done before you can move this issue to the 1.4 milestone?

@davidopp
Member

ping - can you list what work remains to be done before you can move this issue to the 1.4 milestone?

@matchstick matchstick modified the milestones: next-candidate, v1.3 Jun 15, 2016
@matchstick
Contributor

Please list what work needs to be done for next candidate.

@mwielgus
Contributor

mwielgus commented Jun 15, 2016

  • More efficient unneeded-node analysis. The current code will work OK for small to medium clusters (50-150 nodes with 1k-3k pods) but will not work for 1k nodes/30k pods.
  • Reduce latency in scale down. Right now we are super conservative when considering nodes for scale down - for 10 min we check whether it is possible to schedule all of their pods on other machines. Once we delete one node, the checks for all other machines are invalidated (to some degree) and we start counting from the beginning. This means that we can only remove 6 nodes per hour. We have to either consider deleting multiple machines at once or relax our strategy. (A simplified sketch of this feasibility check follows the list.)
  • Cleanups in the code.
  • Decide what to do with best-effort pods.
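As a reference for the scale-down point above, here is a simplified, self-contained sketch of the per-node feasibility check, assuming a greedy first-fit placement over resource requests. All types and the `canDrain` name are hypothetical illustrations, not actual Cluster Autoscaler code:

```go
package sketch

type pod struct {
	cpuMilli, memBytes int64 // resource requests
}

type node struct {
	name         string
	freeCPUMilli int64
	freeMemBytes int64
	pods         []pod
}

// canDrain reports whether every pod on the candidate node could be
// rescheduled onto the other nodes, placing each pod greedily on the
// first node with enough spare capacity for its requests.
func canDrain(candidate node, others []node) bool {
	// Simulate against copies so the inputs are not mutated.
	free := make([]node, len(others))
	copy(free, others)

	for _, p := range candidate.pods {
		placed := false
		for i := range free {
			if free[i].freeCPUMilli >= p.cpuMilli && free[i].freeMemBytes >= p.memBytes {
				free[i].freeCPUMilli -= p.cpuMilli
				free[i].freeMemBytes -= p.memBytes
				placed = true
				break
			}
		}
		if !placed {
			return false
		}
	}
	return true
}
```

Because each simulated placement consumes spare capacity, actually deleting one node changes the inputs for every other candidate - which is exactly the invalidation problem described in the comment above.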

@timothysc timothysc added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Jun 16, 2016
@wojtek-t wojtek-t removed sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. team/control-plane (deprecated - do not use) labels May 30, 2017
@mwielgus
Contributor

Closing the issue. All Cluster Autoscaler issues should be tracked inside https://github.com/kubernetes/autoscaler.
Moreover, all of the improvements mentioned above were done.
