
AWS: We should consider updating/reusing cluster-autoscaler to support AWS #11935

Closed
justinsb opened this issue Jul 28, 2015 · 58 comments
Comments

@justinsb (Member) commented Jul 28, 2015

We have an autoscaling group for the minions; we should consider enabling auto-scaling based on, e.g., CPU or a custom metric we publish.

@erictune (Member) commented Jul 30, 2015

A group of us have been discussing node autoscaling this week, including @bgrant0607 @vmarmol @davidopp @jszczepkowski @piosz @gmarek @mwielgus @wojtek-t (probably forgetting some people)

@erictune (Member) commented Jul 30, 2015

One thing we talked about was maybe layering the system like this:

  • Pod horizontal autoscaler scales up pod count using CPU as a signal, and maybe later custom metrics, such as http request rate, http latency, etc.
  • Node (horizontal) autoscaler adds nodes when pods are pending due to the scheduler not being able to find a place in the cluster for the pod (failed PodFitsResources check in scheduler). This assumes that pods set reasonable CPU and memory limits.
    So, the Node autoscaler wouldn't directly look at CPU, but indirectly hears about it due to pods being pending.
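The pending-driven layering above can be sketched as a tiny simulation. This is a hypothetical Python illustration, not Kubernetes code: `fits` stands in for the scheduler's PodFitsResources check, and all names and resource numbers are invented for the example.

```python
# Hypothetical sketch of the layered design: the node autoscaler never looks
# at CPU directly; it only reacts to pods the scheduler could not place.

def fits(pod, node):
    """Toy stand-in for a PodFitsResources check: does the pod's request fit?"""
    return (pod["cpu"] <= node["free_cpu"] and
            pod["mem"] <= node["free_mem"])

def nodes_to_add(pending_pods, nodes, node_template):
    """Estimate how many new nodes (of one template) would let pending pods fit."""
    extra = 0
    free = [dict(n) for n in nodes]  # work on a copy of the node list
    for pod in pending_pods:
        placed = False
        for node in free:
            if fits(pod, node):
                node["free_cpu"] -= pod["cpu"]
                node["free_mem"] -= pod["mem"]
                placed = True
                break
        if not placed:
            # Simulate adding a fresh node from the template and bind the pod to it.
            new_node = dict(node_template)
            new_node["free_cpu"] -= pod["cpu"]
            new_node["free_mem"] -= pod["mem"]
            free.append(new_node)
            extra += 1
    return extra
```

For example, two pending 2-CPU pods that don't fit on an almost-full node can share a single new 4-CPU node, so `nodes_to_add` returns 1, not 2; a real autoscaler would run a similar bin-packing simulation rather than reading CPU metrics.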
@justinsb (Member, Author) commented Jul 30, 2015

That makes a lot of sense to me. I would love to be involved in any discussions.

AWS auto-scaling groups (and I believe Google MIGs via autoscalers) allow for a quick-and-dirty version of this. Your approach is infinitely better, though I suspect it will take a little longer!

The fact that the scheduler will avoid overloading the cluster makes auto-scaling externally much less useful, so we would be in custom metric territory. Even then, I think that having the master node manage the instances will be a much better experience.

Maybe we could promote this interface out of pkg/cloudprovider/aws (currently used only for e2e tests):
https://github.com/GoogleCloudPlatform/kubernetes/blob/8d5a6b063c68b50e9e2e481c04c4cfec4fa57bde/pkg/cloudprovider/aws/aws.go#L147-L154
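For illustration only, a promoted provider-neutral interface might look roughly like the following Python sketch. The names (`NodeGroup`, `set_desired_size`, and so on) are invented for this example and are not the actual Go interface linked above:

```python
# Hedged sketch of a provider-neutral node-group abstraction; the real
# interface lives in pkg/cloudprovider/aws and may differ in every detail.
from abc import ABC, abstractmethod

class NodeGroup(ABC):
    """A resizable group of nodes (an AWS ASG, a GCE MIG, ...)."""

    @abstractmethod
    def min_size(self) -> int: ...

    @abstractmethod
    def max_size(self) -> int: ...

    @abstractmethod
    def current_size(self) -> int: ...

    @abstractmethod
    def set_desired_size(self, n: int) -> None: ...

class FakeNodeGroup(NodeGroup):
    """In-memory implementation, useful for tests."""
    def __init__(self, lo, hi, size):
        self._lo, self._hi, self._size = lo, hi, size
    def min_size(self): return self._lo
    def max_size(self): return self._hi
    def current_size(self): return self._size
    def set_desired_size(self, n):
        # Clamp to the group's bounds, as a cloud provider would.
        self._size = max(self._lo, min(self._hi, n))
```

An autoscaler written against such an interface would work on any provider that can report and adjust group size, which is the point of promoting it out of the AWS package.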

@ecowan commented Nov 19, 2015

Hi everyone, I too am very interested in seeing progress on this front. I would really appreciate it if someone could point me to any resources / pull requests that have been done. Thanks!

@satheessh commented Jan 8, 2016

+1

@rafaljanicki commented Jan 12, 2016

+1

@valery-zhurbenko commented Feb 15, 2016

+1

@piosz piosz added the help-wanted label Feb 15, 2016
@piosz (Member) commented Feb 15, 2016

If anyone would like to integrate Kubernetes with the AWS autoscaler, I'm happy to share our experience with integrating Kubernetes with the GCE autoscaler.

cc @fgrzadkowski @mwielgus

@sstarcher commented Feb 16, 2016

@piosz I would be interested in hearing your experience with Kubernetes GCE autoscaler and I may be interested in helping with this feature.

@miguelfrde (Contributor) commented Feb 17, 2016

@piosz I would be interested in hearing about your experience and helping with this feature as well.

@jimmycuadra (Member) commented Mar 2, 2016

@piosz Yes, please! Very interested in this.

@dengshuan commented Mar 24, 2016

Is there a schedule for this feature? Or is there any more detailed discussion about it?

@mwielgus (Contributor) commented Mar 24, 2016

For 1.3 we have a plan to revisit cluster autoscaling in Kubernetes and make it more user-friendly. At this moment we are discussing our 1.3 priorities and project assignments internally at Google. We will let you know once we reach some agreement regarding the possible scope of the improvements that can be delivered by Google and the integration plans for other cloud providers (we will definitely need community help there).

@sstarcher commented Mar 24, 2016

Our AWS scaling strategy for Kubernetes currently has 3 parts:

  • Add instances when there are Pending pods
  • Remove instances not running pods
  • A change to the scheduler to pack our load instead of spreading it
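The first two parts of that strategy can be sketched as a small decision function. This is an illustrative Python sketch with invented names, using plain dicts instead of real Kubernetes API objects; the scheduler change is out of scope here:

```python
# Sketch of the add-on-Pending / remove-idle policy described above.

def scaling_actions(pods, nodes):
    """pods: list of {"node": node-name-or-None}; nodes: list of node names.
    Returns (number_of_nodes_to_add, names_of_nodes_to_remove)."""
    pending = [p for p in pods if p["node"] is None]
    used = {p["node"] for p in pods if p["node"] is not None}
    idle = [n for n in nodes if n not in used]
    if pending:
        # One node per pending pod is a crude upper bound; a real autoscaler
        # would simulate bin-packing before deciding how many nodes to add.
        return len(pending), []
    # Only retire idle nodes when nothing is waiting for capacity.
    return 0, idle
```

Note the guard: scaling down only happens when no pods are Pending, so the two halves of the policy never fight each other within one decision cycle.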
@apobbati commented Aug 24, 2016

@pbitty Have you made any progress on this issue? I'd like to help any way I can.

@philk commented Aug 24, 2016

#1377 might be what you're looking for


@bjoernhaeuser commented Aug 24, 2016

@philk I think the mentioned PR is not what we are looking for. Is there a typo or similar?

@andrewsykim (Member) commented Aug 24, 2016

kubernetes-retired/contrib#1311 is probably the link you are looking for. It references other PRs that have been opened regarding the cluster autoscaler for AWS.

@philk commented Aug 24, 2016

Oh, yeah. I was on mobile and didn't realize which repo I was in; kubernetes-retired/contrib#1377 was what I meant. (Though #1311 above is useful too.)

@bgrant0607 bgrant0607 removed the help-wanted label Aug 30, 2016
@aliakhtar commented Sep 11, 2016

What's the status on this feature? I came across this blog: http://blog.kubernetes.io/2016/07/autoscaling-in-kubernetes.html which said AWS auto scaling would be coming in 1.3. The current stable version is 1.3.6, but I can't find any info on this.

The AWS getting started doc says the max / desired instances in the AWS auto scaling group can be set, but do the new AWS instances auto register themselves?

@fgrzadkowski (Contributor) commented Sep 12, 2016

That blog post was released after 1.3 and said that AWS support would be ready soon. AFAIK that's already the case.

@mwielgus Can you please verify? Are there instructions for how to set it up? Have we released the image? Does it require Kubernetes 1.4, or is it just a matter of starting a different add-on?

@andrewsykim (Member) commented Sep 12, 2016

There's a README here. I don't think an official image was made so you would have to fork the contrib repo and build/push the image yourself for now.

@btdlin commented Sep 26, 2016

New to the thread; trying to set up auto-scaling with k8s on AWS. Is this supported now? I checked the README, but I'm not sure exactly what needs to be done to build/push the image. Any update would be really appreciated. Thanks.

@andrewsykim (Member) commented Sep 26, 2016

@btdlin you have to build your own Docker image at whatever revision added cluster autoscaler support for AWS, and push it to your own registry. If you don't want to do that, my company has published a public image for our own use cases which has AWS support: wattpad/cluster-autoscaler:v1.1.

@jimmycuadra (Member) commented Sep 26, 2016

Is there going to be an official image for the autoscaler? Why make people build it for themselves?

@btdlin commented Sep 26, 2016

Thanks @andrewsykim. Looks like v1.4 was just released a few hours ago; do we know if the autoscaler for AWS is included in v1.4?

@andrewsykim (Member) commented Sep 27, 2016

@jimmycuadra yes, I believe there is already an official Docker image; we just didn't know if the published one supported AWS as a cloud provider, so we built our own.

@fgrzadkowski (Contributor) commented Sep 27, 2016

@mwielgus Can we make sure that the cluster autoscaler image is released to an official repo? And I think we should close this issue now, as we support AWS :)

@danbeaulieu commented Sep 27, 2016

@fgrzadkowski Hi, I am very much interested in this feature but I find the lack of documentation to be an issue. The README leaves a bit to be desired.

  • How are instances scaled in? I.e., is there any rhyme or reason to which instance is picked?
  • Is it possible to have heterogeneous instance types in the cluster?
  • What metrics can I use to scale? CPU usage? Container count? Etc.

I am a heavy AWS user but new to Kubernetes if that helps understand the audience.

@jimmycuadra (Member) commented Sep 27, 2016

Once there is an official image for it, let's make sure the docs for the autoscaler mention where it is!

@fgrzadkowski (Contributor) commented Sep 28, 2016

We already have a PR in flight for better documentation: kubernetes-retired/contrib#1731

@mwielgus I think that to improve documentation we will also need:

@mwielgus Can we close this issue as fixed?

@andyxning (Member) commented Nov 22, 2016

@erictune @sstarcher Does monitoring the Pending pods mean that we can use the InsufficientCPU or InsufficientMemory events to get the same result, and add new nodes to the cluster based on these events?

These two event types are emitted when pods cannot be scheduled because the required resources (CPU/memory) cannot be fulfilled.

@fgrzadkowski (Contributor) commented Nov 22, 2016

Quick comment: events were not designed to be an API that other components depend on. That's why we added the Scheduled pod condition with reason Unschedulable.

@andyxning (Member) commented Nov 22, 2016

@fgrzadkowski IIUC, you mean that events are not reliable and were not designed to be depended on for usage like this. The most reliable way is to query the pod info and check the Scheduled condition in the pod status.

After reading the source code, it seems that the scheduler emits a FailedScheduling event before updating the pod status.

@fgrzadkowski (Contributor) commented Nov 22, 2016

The scheduler will emit events, but they are not considered part of the API for other components.

Yes, you should just check pod condition, which is part of PodStatus.
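A sketch of that condition-based check, with pods as plain dicts shaped like the relevant slice of PodStatus rather than real API objects (in a real client you would read the same fields from the typed Pod object):

```python
# Detect pods the scheduler has marked unschedulable by inspecting the
# PodScheduled condition in their status, instead of watching events.

def is_unschedulable(pod):
    """True if the pod carries PodScheduled=False with reason Unschedulable."""
    for cond in pod.get("status", {}).get("conditions", []):
        if (cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"):
            return True
    return False

def unschedulable_pods(pods):
    """Names of all pods that cannot currently be scheduled."""
    return [p["metadata"]["name"] for p in pods if is_unschedulable(p)]
```

A node autoscaler polling or watching pods can drive its scale-up decisions from this condition alone, which is exactly the stable signal the comment above recommends over events.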

@motymichaely commented Dec 22, 2016

Hey team, is there any k8s version this feature is aimed to be released in? Any suggestions for implementing this with AWS ASG + custom metrics?

@mwielgus (Contributor) commented Dec 22, 2016

The current version of Cluster Autoscaler (0.4.0) supports AWS ASG. Closing the issue.

@mwielgus mwielgus closed this Dec 22, 2016
@mwielgus (Contributor) commented Dec 22, 2016

BTW, Cluster Autoscaler is not driven by metrics but rather by the real need for a new node because some pods cannot schedule.
