Implement DigitalOcean cloud provider #254
cc: @andrewsykim
/assign andrewsykim
@mwielgus thanks for the ping. @klausenbusk feel free to start work on this issue! Here are some ideas I have so far:
Both of these solutions should have a way to reference snapshots.
So what we need for creating a new droplet is:
That should be doable with a CRD; we then just need some nodeGroup -> CRD mapping logic. Every nodeGroup should have its own config, although it could inherit some defaults from a default CRD (like SSH keys, droplet size, and even snapshot ID; a snapshot name should work across regions, I think). @andrewsykim what do you think?
You'll probably want user data too, but in general that seems like the right direction to me. I would even consider having separate CRDs for droplets and droplet groups, but that's an implementation detail to address later.
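To make the CRD idea above a bit more concrete, here is a rough sketch of what such a type could look like. Every name and field here (DropletGroup, SnapshotID, and so on) is a made-up illustration of the proposal, not an agreed design:

```go
// Hypothetical CRD types sketching the "droplet group" idea discussed above.
// All names and fields are illustrative assumptions, not an agreed-upon API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DropletGroupSpec describes how to create droplets for one node group.
type DropletGroupSpec struct {
	// Region and Size map directly to DigitalOcean droplet options.
	Region string `json:"region"`
	Size   string `json:"size"`
	// SnapshotID references a pre-built image that joins the cluster on boot.
	SnapshotID string `json:"snapshotID,omitempty"`
	// SSHKeys and UserData could be inherited from a default DropletGroup.
	SSHKeys  []string `json:"sshKeys,omitempty"`
	UserData string   `json:"userData,omitempty"`
	// MinSize and MaxSize bound what the autoscaler may do with this group.
	MinSize int `json:"minSize"`
	MaxSize int `json:"maxSize"`
}

// DropletGroup is the target of the nodeGroup -> CRD mapping mentioned above.
type DropletGroup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              DropletGroupSpec `json:"spec"`
}
```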
Instead of using a snapshot, we could let the autoscaler create the droplet and then let an external script initialize it. I think that could make sense, as there are a "million ways" to set up k8s (bootkube and kubeadm, to name a few): Autoscaler -> IncreaseSize -> Create Droplet -> HTTP POST (ip, ssh key) to another pod.
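A minimal sketch of that flow, assuming the godo client library, is shown below. The nodeGroup fields and the bootstrap endpoint are illustrative assumptions, not part of any agreed design:

```go
// Sketch of the flow proposed above: IncreaseSize creates droplets via the
// DigitalOcean API and hands each one off to an external bootstrap service.
package doprovider

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/digitalocean/godo"
)

type nodeGroup struct {
	client       *godo.Client
	region       string
	size         string
	sshKeyID     int
	bootstrapURL string // hypothetical pod that runs kubeadm/bootkube against new droplets
}

// IncreaseSize creates `delta` droplets and POSTs each droplet's IP and SSH key
// to the bootstrap service, which then joins the node to the cluster.
func (ng *nodeGroup) IncreaseSize(delta int) error {
	ctx := context.Background()
	for i := 0; i < delta; i++ {
		droplet, _, err := ng.client.Droplets.Create(ctx, &godo.DropletCreateRequest{
			Name:    fmt.Sprintf("autoscaled-node-%d", i),
			Region:  ng.region,
			Size:    ng.size,
			Image:   godo.DropletCreateImage{Slug: "ubuntu-18-04-x64"},
			SSHKeys: []godo.DropletCreateSSHKey{{ID: ng.sshKeyID}},
		})
		if err != nil {
			return err
		}
		// In practice the droplet's IP is only available once it becomes active,
		// so a real implementation would poll the API before this step.
		ip, err := droplet.PublicIPv4()
		if err != nil {
			return err
		}
		payload, _ := json.Marshal(map[string]interface{}{"ip": ip, "sshKeyID": ng.sshKeyID})
		resp, err := http.Post(ng.bootstrapURL, "application/json", bytes.NewReader(payload))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}
```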
Any updates, anyone? I am using StackPoint's autoscaler and it is working just fine. I am wondering if they are using a fork of this repo. I would appreciate some input on it and any updates on how to help. Thanks!
@JorgeCeja They use a fork: https://github.com/StackPointCloud/autoscaler/tree/stackpointio/cluster-autoscaler/cloudprovider/spc which uses the SPC API to create/delete droplets.
Nice, thanks! I guess I'll be stuck with SPC until this gets resolved. In the meantime, I will give it a shot and see how far I can get implementing it. If it takes too long, I am willing to open a bounty!
@JorgeCeja Have you encountered this kind of problem with DigitalOcean? Thanks
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
👍 would love to use this with my Rancher 2 cluster
/remove-lifecycle rotten
If someone is looking to implement this, they will likely want to look at leveraging cluster-api. There is already a DigitalOcean provider available, and it makes scaling nodes trivial (e.g. you can run the command …).
There is some effort to implement Cluster API support in CA: kubernetes/enhancements#609. The main issue as of now is that CA absolutely needs to be able to delete a specific machine, not just scale down to a given number of replicas. There is an ongoing discussion on how to extend Cluster API to support this.
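For context, the "delete a specific machine" requirement comes from cluster-autoscaler's NodeGroup abstraction. An abridged, paraphrased excerpt of that interface (from the cloudprovider package; the exact method set varies between releases) looks roughly like this:

```go
// Abridged, paraphrased excerpt of cluster-autoscaler's cloudprovider.NodeGroup
// interface -- the contract any DigitalOcean provider would have to satisfy.
package cloudprovider

import apiv1 "k8s.io/api/core/v1"

type NodeGroup interface {
	MinSize() int
	MaxSize() int
	// TargetSize returns the desired number of nodes in this group.
	TargetSize() (int, error)
	// IncreaseSize only grows the group; scale-down goes through DeleteNodes.
	IncreaseSize(delta int) error
	// DeleteNodes must remove these specific nodes. This is the requirement that
	// a plain "scale replicas to N" Cluster API operation did not cover.
	DeleteNodes(nodes []*apiv1.Node) error
	// Id returns a unique identifier for the node group.
	Id() string
}
```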
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc @timoreimann
Hi, I'm going to look into adding autoscaler support for DigitalOcean. Is there a way we can reopen this issue? I just want to make sure people who follow this issue get updates on where we are. Thanks
/reopen
@andrewsykim: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
But now, instead of the CRD-for-node-templates solution proposed above, you'll be doing it with node pools, right @fatih? If so, it would be nice to have node pools with 0 nodes that are configured to auto-scale; then, using node labels and affinities, the autoscaler could know which pool to use... I currently have a use case where I need very powerful nodes for certain CI tasks that I don't want running all the time.
@dave08 it'll probably be tightly integrated with our node pools indeed. I'm still investigating how to implement it. I'll post here occasionally with updates. Once I have a working version you'll be able to test it, and then we can figure out what to improve on our end.
By the way, I also think pools should probably have minSize and maxSize when auto-scaling is enabled... @fatih
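A rough sketch of that idea, under the assumption that DOKS node pools become the unit of scaling, might look like the following. The type and its fields are illustrative only; the actual DOKS integration may look completely different:

```go
// Hypothetical wrapper that treats a DOKS node pool as an autoscaler node group
// with minSize/maxSize bounds (minSize could be 0 for the CI use case above).
package doprovider

import "fmt"

type doNodePool struct {
	id      string
	minSize int // smallest size the autoscaler may scale down to
	maxSize int // hard upper bound the autoscaler must never exceed
	curSize int
}

func (p *doNodePool) MinSize() int { return p.minSize }
func (p *doNodePool) MaxSize() int { return p.maxSize }

// IncreaseSize refuses to grow past maxSize; a real implementation would then
// call the DOKS API to resize the underlying node pool.
func (p *doNodePool) IncreaseSize(delta int) error {
	if delta <= 0 {
		return fmt.Errorf("delta must be positive, got %d", delta)
	}
	if p.curSize+delta > p.maxSize {
		return fmt.Errorf("increase by %d would exceed maxSize %d", delta, p.maxSize)
	}
	p.curSize += delta // placeholder for the actual resize call
	return nil
}
```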
@fatih When will this actually be released in our DOKS clusters? Does it depend on the k8s version deployed? Thanks a lot for the work 👍!
@dave08 We're now planning to incorporate this into our new base images. We're still working on it, so I can't give a timeline right now.
Yes! Either that, or you'll be able to install it on an existing cluster afterwards.
We're planning to release it beginning with the v1.15.x versions. It's still in the early phases, so we don't know how it'll look in the end. We're going to update this issue or let people know once it's finished.
@dave08 we are going to use digitalocean/DOKS#5 to track the integration effort. Feel free to subscribe to that issue to be notified of any progress made.
I'm not exactly sure how to implement this, but I think the easiest way would be creating a droplet from a snapshot that is already configured to join the existing cluster.