Feature request: dynamic cluster addon resizer #13048
Comments
It sounds like there are two problems. I think (1) is more important than (2) right now, but (2) may become more important if people start using cluster autoscaling. (1) seems like something we should work on soon, while (2) could be deferred unless people are already doing a lot of manual cluster resizing.
Yes, that's true. Although on the other hand, solving (2) also solves (1), and isn't much more difficult if we're already coming up with logic for determining at which size cutoffs we do different things.
Yes, I brought this up to @roberthbailey and @zmerlynn before the 1.0 release, and pushed #7046 to the 1.0 milestone. But there is no dynamic resizing. We need to address this issue soon to meet our scalability goal.
@dchen1107 #7046 looks more like it's about upgrading the binary version of an addon, not changing the number of replicas or the resource request. But I agree they are somewhat related. @a-robinson It seems like solving (1) is simpler, i.e. it could be done statically in the setup scripts, whereas (2) requires a continuously-running control loop. (Unless I am misunderstanding.)
Even if we do scale the addons based on the cluster size, there'd be cases where they still don't fit in the cluster. We need a minimum requirement spec for a cluster.
@davidopp your understanding matches mine; I think we just have different expectations around how much work is involved in putting the logic into a control loop. @yujuhong I checked into this for GKE, so in case it helps with a more general spec, this is the current state of our addon / system component resource usage. On GCE, a single g1-small can handle all of this with an acceptable amount of room to spare, but f1-micros don't really work unless you have at least three of them. All larger instance types are fine.
cc @kubernetes/rh-cluster-infra @derekwaynecarr
We don't necessarily need dynamic sizing, but I'm bumping this to P1 so that we at least get smarter sizing on startup, since I've now personally had to help multiple customers who ran into issues with this.
Specifically, I'm referring to heapster OOMing.
If this is for 1.1, it needs to be P0 at this point. Should it be?
It isn't a true blocker for 1.1, so I'll remove it from the milestone. It would be a very nice-to-have to help out the many customers who have run into problems like kubernetes-retired/heapster#632, though. I'm OOO most of the next couple of weeks, but if anyone else has cycles to do something here, it'd be a nice addition to 1.1. Taking care of #15716 in the context of large clusters is probably enough.
I'll assign it to myself to make sure we don't lose track of it. Will also add to v1.2-candidate.
cc @marekbiskup
@brendandburns is going to implement (1) from
@mikedanese mentions this will be easier once the addons are managed by Deployment.
@brendandburns and I discussed an alternative, which is to make (at least some of) the common system pods auto-size themselves. Heapster, for instance, is collecting metrics about the cluster, so it should have a pretty good idea about how many nodes / pods / etc exist and need to be monitored. It could decide that it needs to scale itself up/down, change its pod definition, and then reschedule itself. This may not work as well if we move to a sharded model, but for the current singleton model it would allow us to create a solution for the addon that needs the most tuning without needing to solve the generic problem.
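To make the self-sizing idea concrete, here is a minimal, hypothetical sketch in Go (written against a modern client-go API rather than what existed when this thread was active). The MY_MEMORY_LIMIT env var and the 140Mi-plus-4Mi-per-node rule are illustrative assumptions, not anything heapster actually does:

```go
// Hypothetical self-sizing check inside a singleton addon like heapster,
// which already tracks the node count for monitoring purposes.
package addonutil

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// maybeResizeSelf compares the memory limit the pod was launched with
// (surfaced here via a made-up MY_MEMORY_LIMIT env var, which the downward
// API could populate) against a made-up per-node rule, and rewrites the
// addon's own RC when they diverge. The process would then exit so that
// the RC recreates it with the new limit.
func maybeResizeSelf(ctx context.Context, client kubernetes.Interface, nodeCount int) error {
	launched := resource.MustParse(os.Getenv("MY_MEMORY_LIMIT"))
	desired := resource.MustParse("140Mi")
	desired.Add(*resource.NewQuantity(int64(nodeCount)*4<<20, resource.BinarySI))
	if launched.Cmp(desired) == 0 {
		return nil // already sized for the current cluster
	}
	rc, err := client.CoreV1().ReplicationControllers("kube-system").
		Get(ctx, "heapster", metav1.GetOptions{})
	if err != nil {
		return err
	}
	c := &rc.Spec.Template.Spec.Containers[0]
	if c.Resources.Limits == nil {
		c.Resources.Limits = corev1.ResourceList{}
	}
	c.Resources.Limits[corev1.ResourceMemory] = desired
	_, err = client.CoreV1().ReplicationControllers("kube-system").
		Update(ctx, rc, metav1.UpdateOptions{})
	return err
}
```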
@roberthbailey: What if these system pods go pending after triggering a re-schedule event? There is currently no means to define priority.
True, that would be an issue.
Is this still targeted for 1.2? Heapster being too small after a cluster has had more nodes added has been causing issues for customers.
No. This never appeared on any of the lists of features needed for 1.2. My bad for not removing the label; if this was giving the (rather reasonably-interpreted) impression that we were implementing it for 1.2, my apologies. Probably we need to make a pass over all the issues tagged as 1.2 and make sure the label syncs up with reality, as I think this may not be the only one that is out of sync.
@piosz - have we addressed this concern for heapster?
Yes, there is addon-resizer implemented by @Q-Lee.
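For reference, the addon-resizer (pod_nanny) runs as a sidecar next to the addon it sizes and scales the addon's resource requests linearly with node count (roughly base + extra × nodes), only rewriting the target when the value drifts past a threshold. A hedged example of how it might be wired into the addon's pod spec; the image tag, resource values, and the heapster-v1.0.2 deployment name are illustrative:

```yaml
# Illustrative sidecar fragment of a heapster pod spec.
containers:
- name: heapster-nanny
  image: gcr.io/google_containers/addon-resizer:1.8  # tag is an example
  command:
    - /pod_nanny
    - --cpu=80m          # base CPU request for the sized container
    - --extra-cpu=0.5m   # additional CPU per node
    - --memory=140Mi     # base memory request
    - --extra-memory=4Mi # additional memory per node
    - --threshold=5      # percent drift tolerated before resizing
    - --deployment=heapster-v1.0.2
    - --container=heapster
  env:
    # The nanny discovers its own pod via the downward API.
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
```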
The original issue description:
People create clusters at a wide range of sizes. People then resize clusters to a wide range of sizes. Our cluster addons are configured statically and do not respond to such changes in size. This causes problems in some cases, for example heapster OOMing once a cluster grows well past the size its static resource limits were tuned for.
Some really simple control logic should be able to make the situation better by listing the nodes in the cluster and updating the addon RCs using a few basic rules. It could be part of the node controller, or be a very small container that runs on the master or even in the user's cluster (but it had better be very small to justify its benefits).
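A minimal sketch of what such a control loop might look like, assuming it runs in-cluster and resizes a single addon RC. The heapster target, the one-minute poll, and the 140Mi-plus-4Mi-per-node rule are illustrative assumptions, and the client-go API shown is the modern one, not what existed when this issue was filed:

```go
// Hypothetical control loop: list nodes, derive a memory request from the
// node count, and update the addon's RC when the value has drifted.
package main

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// memoryForNodes is a made-up rule: a 140Mi base plus 4Mi per node.
func memoryForNodes(n int) resource.Quantity {
	q := resource.MustParse("140Mi")
	q.Add(*resource.NewQuantity(int64(n)*4<<20, resource.BinarySI))
	return q
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for ; ; time.Sleep(time.Minute) {
		ctx := context.Background()
		nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		if err != nil {
			continue
		}
		rc, err := client.CoreV1().ReplicationControllers("kube-system").
			Get(ctx, "heapster", metav1.GetOptions{})
		if err != nil {
			continue
		}
		want := memoryForNodes(len(nodes.Items))
		res := &rc.Spec.Template.Spec.Containers[0].Resources
		cur := res.Requests[corev1.ResourceMemory]
		if cur.Cmp(want) == 0 {
			continue // already sized for the current node count
		}
		if res.Requests == nil {
			res.Requests = corev1.ResourceList{}
		}
		res.Requests[corev1.ResourceMemory] = want
		// Note: updating an RC's template does not touch running pods; the
		// pod must be recreated for the new request to take effect.
		_, _ = client.CoreV1().ReplicationControllers("kube-system").
			Update(ctx, rc, metav1.UpdateOptions{})
	}
}
```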
@roberthbailey @zmerlynn