
Implement DigitalOcean cloud provider #254

Closed
klausenbusk opened this issue Aug 23, 2017 · 30 comments · Fixed by #2245
Assignees: andrewsykim
Labels: area/cluster-autoscaler, kind/feature, lifecycle/rotten

Comments

@klausenbusk

I'm not exactly sure how to implement this, but I think the easiest way would be creating a droplet from a snapshot that is already configured to join the existing cluster.

@mwielgus
Contributor

cc: @andrewsykim

@andrewsykim
Member

/assign andrewsykim

@andrewsykim
Member

@mwielgus thanks for the ping

@klausenbusk feel free to start work on this issue! Here are some ideas I have so far:

  • create a CRD (custom resource definition) that contains node info that you can use as a template
  • get node info off of existing droplets

Both of these solutions should have a way to reference snapshots.

@klausenbusk
Author

So what we need for creating a new droplet is:

  • region
  • snapshot ID
  • droplet size
  • ssh keys
  • ipv6/private network enabled/disabled (??)

That should be doable with a CRD; we then just need some nodeGroup -> CRD mapping logic. Every nodeGroup should have its own config, although it could inherit some defaults from a default CRD (like SSH keys and droplet size, or even the snapshot ID; snapshot names should work across regions, I think).

@andrewsykim what do you think?
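
For illustration, here is a rough Go sketch of what the spec of such a droplet-template CRD could look like, covering the fields listed above. Every type and field name is hypothetical; nothing like this exists in the tree yet.

```go
// Hypothetical types for the droplet-template CRD discussed above.
// All names are made up for illustration only.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DropletTemplateSpec holds everything needed to create a droplet for a
// given node group.
type DropletTemplateSpec struct {
	// Region the droplets are created in, e.g. "fra1".
	Region string `json:"region"`
	// SnapshotID of the image to boot from (or a snapshot name that can
	// be resolved per region).
	SnapshotID string `json:"snapshotID"`
	// Size is the droplet size slug, e.g. "s-2vcpu-4gb".
	Size string `json:"size"`
	// SSHKeys are fingerprints or IDs of keys to inject.
	SSHKeys []string `json:"sshKeys,omitempty"`
	// Optional networking toggles.
	IPv6              bool `json:"ipv6,omitempty"`
	PrivateNetworking bool `json:"privateNetworking,omitempty"`
	// UserData passed to cloud-init.
	UserData string `json:"userData,omitempty"`
}

// DropletTemplate is the per-nodeGroup object; fields left empty could be
// inherited from a cluster-wide default template.
type DropletTemplate struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              DropletTemplateSpec `json:"spec"`
}
```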

@andrewsykim
Member

You'll probably want user data too, but in general that seems like the right direction to me. I would even consider having separate CRDs for droplets and droplet groups, but that's an implementation detail to address later.

@klausenbusk
Author

Instead of using a snapshot, we could let the autoscaler create the droplet and then let an external script initialize it. I think that could make sense, as there are a "million ways" to set up k8s (bootkube and kubeadm, to name a few).

Autoscaler -> IncreaseSize -> Create Droplet -> HTTP POST (ip, ssh key) to another pod..
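
A minimal sketch of that flow, assuming the official godo client; the bootstrap endpoint, payload shape, and the hard-coded region/size/image values are made up for illustration:

```go
// Sketch of IncreaseSize creating droplets via the DigitalOcean API and
// handing them to an external bootstrapper, as described above.
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"os"

	"github.com/digitalocean/godo"
)

func increaseSize(ctx context.Context, client *godo.Client, delta int) error {
	for i := 0; i < delta; i++ {
		droplet, _, err := client.Droplets.Create(ctx, &godo.DropletCreateRequest{
			Name:   fmt.Sprintf("autoscaled-node-%d", i),
			Region: "fra1",        // would come from the node group config
			Size:   "s-2vcpu-4gb", // likewise
			Image:  godo.DropletCreateImage{Slug: "ubuntu-18-04-x64"},
		})
		if err != nil {
			return err
		}

		// Hand the new droplet off to an external bootstrapper (kubeadm,
		// bootkube, ...) which looks up the IP and joins the node.
		payload, _ := json.Marshal(map[string]interface{}{
			"dropletID": droplet.ID,
			"name":      droplet.Name,
		})
		resp, err := http.Post("http://bootstrapper.kube-system/join", // hypothetical endpoint
			"application/json", bytes.NewReader(payload))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}

func main() {
	client := godo.NewFromToken(os.Getenv("DIGITALOCEAN_TOKEN"))
	if err := increaseSize(context.Background(), client, 1); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

The point being that the autoscaler only provisions capacity; how the node actually joins the cluster stays entirely in the external script.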

@JorgeCeja

Any updates, anyone? I am using StackPoint's autoscaler and it is working just fine. I am wondering if they are using a fork of this repo. I would appreciate some input on it and any pointers on how to help. Thanks!

@klausenbusk
Author

@JorgeCeja They use a fork: https://github.com/StackPointCloud/autoscaler/tree/stackpointio/cluster-autoscaler/cloudprovider/spc which uses the SPC API to create/delete droplets.

@JorgeCeja

Nice, thanks! I guess I'll be stuck with SPC until this gets resolved. In the meantime, I will give it a shot and see how far I can get implementing it. If it takes too long, I am willing to open a bounty!

@igauravsehrawat

@JorgeCeja
Quick question: have you been able to scale up DigitalOcean using the autoscaler? When I use SPC with the autoscaler solution, I get an error during initialization: Error installing node_autoscaler: Failed to set up autoscaler, cannot get machine specs.

Have you encountered this kind of problem with DigitalOcean?

Thanks

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed enhancement labels Jun 5, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 3, 2018
@kamushadenes
Contributor

👍 would love to use this with my Rancher 2 cluster

@kamushadenes
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 24, 2018
@scruplelesswizard

If someone is looking to implement this, they will likely want to look at leveraging cluster-api. There is already a DigitalOcean provider available, and it makes scaling nodes trivial (e.g. you can run kubectl scale machineset <machineset name> --replicas 5 to scale your cluster to 5 nodes).

@MaciekPytel
Contributor

There is some effort to implement Cluster API support in CA: kubernetes/enhancements#609. The main issue as of now is the fact that CA absolutely needs to be able to delete a specific machine, not just scale down to a given number of replicas. There is an ongoing discussion on how to extend Cluster API to support this.
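
For context, this is roughly the part of the cluster-autoscaler NodeGroup interface a cloud provider has to implement (abridged from cloudprovider/cloud_provider.go; comments are paraphrased). Scale-down goes through DeleteNodes with concrete node objects, which is exactly what a replica-count-only API cannot express:

```go
package cloudprovider

import apiv1 "k8s.io/api/core/v1"

// NodeGroup (abridged): the methods most relevant to this discussion.
type NodeGroup interface {
	// TargetSize returns the current desired size of the group.
	TargetSize() (int, error)
	// IncreaseSize grows the group by delta; used for scale-up.
	IncreaseSize(delta int) error
	// DeleteNodes removes the given, specific nodes from the group and
	// decreases its size accordingly; this is the operation discussed above.
	DeleteNodes([]*apiv1.Node) error
	// DecreaseTargetSize lowers the target without deleting existing nodes.
	DecreaseTargetSize(delta int) error
}
```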

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 26, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrewsykim
Member

cc @timoreimann

@fatih
Contributor

fatih commented Jul 16, 2019

Hi,

I'm going to look into adding autoscaler support for DigitalOcean. Is there a way we can reopen this issue? I just want to make sure people who follow this issue get updates on where we are.

Thanks

@andrewsykim
Member

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jul 16, 2019
@k8s-ci-robot
Contributor

@andrewsykim: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dave08

dave08 commented Jul 17, 2019

But now, instead of the CRD-for-node-template solution proposed above, you'll be doing it with node pools, right @fatih? If so, it would be nice to have node pools with 0 nodes that are configured to auto-scale; then, using node labels and affinities, the autoscaler could know which pool to use... I currently have a use case where I need very powerful nodes for certain CI tasks that I don't want running all the time.

@fatih
Contributor

fatih commented Jul 17, 2019

@dave08 it'll probably be tightly integrated with our node pools, indeed. I'm still investigating how to implement it. I'll post here occasionally with updates. Once I have a working version, you'll be able to test it and then we can figure out what to improve on our end.

@dave08

dave08 commented Jul 17, 2019

By the way, I think pools should probably also have a minSize and maxSize when auto-scaling is enabled... @fatih

@dave08

dave08 commented Aug 13, 2019

@fatih When will this actually be released in our DOKS clusters? Does it depend on the k8s version deployed? Thanks a lot for the work 👍!

@fatih
Contributor

fatih commented Aug 13, 2019

@dave08 We're now planning to incorporate this into our new base images. We're still working on it, so I can't give a timeline right now.

When will this actually be released in our DOKS clusters?

Yes! Either that, or you'll be able to install it into an existing cluster afterwards.

Does it depend on the k8s version deployed?

We're planning to release it beginning with the v1.15.x versions. It's still in the early phases, so we don't know what it'll look like in the end. We're going to update this issue or let people know once it's finished.

@timoreimann
Contributor

@dave08 we are going to use digitalocean/DOKS#5 to track the integration effort. Feel free to subscribe to that issue to be notified of any progress made.

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024: feat: filter inactive clusterQueues in scheduling