Feature request: dynamic cluster addon resizer #13048

Closed
a-robinson opened this Issue Aug 21, 2015 · 24 comments

@a-robinson
Member

a-robinson commented Aug 21, 2015

People create clusters at a wide range of sizes. People then resize clusters to a wide range of sizes. Our cluster addons are configured statically, and do not respond to such changes in size. This causes problems in some cases:

  1. The default addons don't all fit on sufficiently small clusters (e.g. create a cluster with a single f1-micro on GCE, prepare to feel the pain)
  2. The default addons waste a large proportion of resources on small clusters -- does my two-node cluster really need 300MiB for heapster? Will it ever need more than one DNS pod?
  3. The default addons don't scale properly to large clusters - heapster with 300MiB of memory isn't going to cut it on even some medium-sized clusters, let alone clusters with a hundred nodes (example). More than one DNS pod will be useful for availability.

Some really simple control logic should be able to make the situation better by listing the nodes in the cluster and updating the addon RCs using a few basic rules. It could be part of the node controller, or be a very small container that runs on the master or even in the user's cluster (but it better be very small to justify its benefits).
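As a rough illustration, the "few basic rules" could be a pure function from node count to addon size, with cutoffs tuned per addon. A minimal sketch in Go; the cutoffs, growth factor, and memory figures below are made-up values for illustration, not proposed defaults:

```go
package main

import "fmt"

// AddonSize is the horizontal and vertical sizing for one addon.
type AddonSize struct {
	Replicas  int
	MemoryMiB int
}

// sizeHeapster maps the cluster's node count to a Heapster size using
// a few basic rules. All numbers here are illustrative placeholders.
func sizeHeapster(nodes int) AddonSize {
	switch {
	case nodes <= 5:
		return AddonSize{Replicas: 1, MemoryMiB: 128}
	case nodes <= 100:
		return AddonSize{Replicas: 1, MemoryMiB: 300}
	default:
		// Grow memory linearly with node count past 100 nodes.
		return AddonSize{Replicas: 1, MemoryMiB: 300 + 4*(nodes-100)}
	}
}

func main() {
	for _, n := range []int{2, 50, 500} {
		s := sizeHeapster(n)
		fmt.Printf("%3d nodes -> %d replica(s), %d MiB\n", n, s.Replicas, s.MemoryMiB)
	}
}
```

The control loop around this would just list nodes on a timer, compute the desired size, and patch the addon RC whenever the computed size differs from the current one.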

@roberthbailey @zmerlynn

@davidopp


Member

davidopp commented Aug 21, 2015

It sounds like there are two problems
(1) We do not take cluster size into account when setting vertical and horizontal dimensions of the cluster addons at cluster creation time
(2) Even if we did (1), people might resize their cluster later, requiring us to resize the addons

I think (1) is more important than (2) right now, but (2) may become more important if people start using cluster autoscaling. (1) seems like something we should work on soon, while (2) could be deferred unless people are already doing a lot of manual cluster resizing.

@dchen1107 @yujuhong

@a-robinson


Member

a-robinson commented Aug 21, 2015

Yes, that's true.

Although on the other hand, solving (2) also solves (1), and isn't much more difficult if we're already coming up with logic for determining at which size cutoffs we do different things.

@dchen1107


Member

dchen1107 commented Aug 22, 2015

Yes, I brought this up to @roberthbailey and @zmerlynn before the 1.0 release, and pushed #7046 to the 1.0 milestone. But there is no dynamic resizing. We need to address this issue soon to meet our scalability goal.

@davidopp


Member

davidopp commented Aug 22, 2015

@dchen1107 #7046 looks more like it's about upgrading the binary version of an addon, not changing the number of replicas or resource requests. But I agree they are somewhat related.

@a-robinson It seems like solving (1) is simpler, i.e. it could be done statically in the setup scripts, whereas (2) requires a continuously running control loop. (Unless I am misunderstanding.)

@yujuhong


Contributor

yujuhong commented Aug 22, 2015

Even if we do scale the addons based on the cluster size, there'd be cases where they still don't fit in the cluster. We need a minimum requirement spec for a cluster.

@a-robinson


Member

a-robinson commented Aug 24, 2015

@davidopp your understanding matches mine, I think we just have different expectations around how much work is involved in putting the logic into a control loop.

@yujuhong I checked into this for GKE, so in case it helps with a more general spec, this is the current state of our addon / system component resource usage:
CPU:
  DNS: 100m + 100m + 100m + 10m = 310m
  UI: 100m
  Heapster: 100m
  Fluentd: 100m per node
  Kubelet: ??? per node
  Docker: ??? per node
Memory:
  DNS: 50Mi + 50Mi + 50Mi + 20Mi = 170Mi
  UI: 50Mi
  Heapster: 300Mi
  Fluentd: 200Mi per node
  Kubelet: 70Mi per node (but not actually limited by a cgroup AFAIK)
  Docker: 30Mi per node (but not actually limited by a cgroup AFAIK)

On GCE, a single g1-small can handle all of this with a reasonable amount of room to spare, but f1-micros don't really work unless you have at least three of them. All larger instance types are fine.
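Folding those memory numbers together gives a quick sanity check of whether a machine type can host the system components at all. A sketch; the machine capacities are my assumptions about the GCE instance types mentioned, and Kubelet/Docker CPU is omitted since it's unknown above:

```go
package main

import "fmt"

// Memory figures from the comment above.
const (
	fixedMemMiB   = 170 + 50 + 300 // DNS + UI + Heapster (cluster-wide)
	perNodeMemMiB = 200 + 70 + 30  // Fluentd + Kubelet + Docker (each node)
)

// fitsOn reports whether one node of the given memory capacity can hold
// the per-node components plus the entire fixed addon footprint, i.e.
// the worst case for a single-node cluster. The capacities passed in
// main are assumptions, not figures from this thread.
func fitsOn(capacityMiB int) bool {
	return fixedMemMiB+perNodeMemMiB <= capacityMiB
}

func main() {
	fmt.Println("fixed:", fixedMemMiB, "MiB, per-node:", perNodeMemMiB, "MiB")
	fmt.Println("f1-micro (~614 MiB): fits =", fitsOn(614))
	fmt.Println("g1-small (~1740 MiB): fits =", fitsOn(1740))
}
```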

@piosz

Member

piosz commented Aug 27, 2015

@derekwaynecarr
@a-robinson


Member

a-robinson commented Oct 1, 2015

We don't necessarily need dynamic sizing, but I'm bumping this to P1 to at least get smarter sizing on startup, since I've now personally had to help multiple customers who ran into issues with this.

@a-robinson


Member

a-robinson commented Oct 1, 2015

Specifically referring to heapster OOMing, that is

@a-robinson a-robinson added this to the v1.1-candidate milestone Oct 2, 2015

@bgrant0607-nocc bgrant0607-nocc modified the milestones: v1.1-candidate, v1.1 Oct 5, 2015

@bgrant0607


Member

bgrant0607 commented Oct 15, 2015

If this is for 1.1, it needs to be P0 at this point. Should it be?

@a-robinson


Member

a-robinson commented Oct 15, 2015

It isn't a true blocker for 1.1, so I'll remove it from the milestone. It would be a very nice-to-have to help out the many customers that have run into problems like kubernetes/heapster#632, though.

I'm OOO most of the next couple weeks, but if anyone else has cycles to do something here, it'd be a nice addition to 1.1. Taking care of #15716 in the context of large clusters is probably enough.

@a-robinson a-robinson removed this from the v1.1 milestone Oct 15, 2015

@davidopp davidopp self-assigned this Oct 15, 2015

@davidopp


Member

davidopp commented Oct 15, 2015

I'll assign it to myself to make sure we don't lose track of it. Will also add it to the v1.2-candidate milestone.

@davidopp davidopp added this to the v1.2-candidate milestone Oct 15, 2015

@piosz

Member

piosz commented Oct 16, 2015

@davidopp

Member

davidopp commented Oct 21, 2015

@davidopp davidopp modified the milestones: v1.2-candidate, v1.2 Oct 21, 2015

@davidopp


Member

davidopp commented Oct 21, 2015

@brendandburns is going to implement (1) from
#13048 (comment)
for 1.1. But since this issue is titled "dynamic cluster addon resizer", let's keep it focused on dynamic resizing and leave it for 1.2. The static cluster addon sizer can be covered by #15716.

@davidopp


Member

davidopp commented Dec 15, 2015

@mikedanese mentions this will be easier once the addons are managed by Deployment.

@roberthbailey


Member

roberthbailey commented Dec 16, 2015

@brendandburns and I discussed an alternative, which is to make (at least some of) the common system pods auto-size themselves. Heapster, for instance, is collecting metrics about the cluster, so it should have a pretty good idea about how many nodes / pods / etc. exist and need to be monitored. It could decide that it needs to scale itself up/down, change its pod definition, and then reschedule itself. This may not work as well if we move to a sharded model, but for the current singleton model it would allow us to create a solution for the addon that needs the most tuning without needing to solve the generic problem.
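One way to make the self-sizing idea concrete is a resize check with hysteresis, so the singleton doesn't reschedule itself on every small node-count change. A sketch; the base/per-node figures and the 20% band are invented for illustration, not Heapster's actual tuning:

```go
package main

import "fmt"

// desiredMemMiB maps the node count a component observes to the memory
// it should request for itself. Illustrative linear model only.
func desiredMemMiB(nodes int) int {
	return 128 + 4*nodes
}

// shouldResize returns the new request and whether the component should
// rewrite its pod definition and reschedule itself. It only acts when
// the current request is more than 20% away from the desired value.
func shouldResize(currentMiB, nodes int) (int, bool) {
	want := desiredMemMiB(nodes)
	low, high := want*8/10, want*12/10
	if currentMiB < low || currentMiB > high {
		return want, true
	}
	return currentMiB, false
}

func main() {
	// Within the 20% band around the desired value: stay put.
	fmt.Println(shouldResize(300, 40))
	// Cluster grew well past the band: resize.
	fmt.Println(shouldResize(300, 200))
}
```

The hysteresis band is the important part: without it, a singleton that reschedules itself on every node add/remove would churn constantly, and as noted below, each reschedule risks the pod going pending.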

@vishh


Member

vishh commented Dec 17, 2015

@roberthbailey: What if these system pods go pending after triggering a re-schedule event? There is currently no means to define priority.

@roberthbailey


Member

roberthbailey commented Dec 18, 2015

True, that would be an issue.

@a-robinson


Member

a-robinson commented Jan 20, 2016

Is this still targeted for 1.2? Heapster being too small after a cluster has had more nodes added has been causing issues for customers.

@davidopp


Member

davidopp commented Jan 20, 2016

No. This never appeared on any of the lists of features needed for 1.2. My bad for not removing the label; if this was giving the (rather reasonably-interpreted) impression that we were implementing it for 1.2, my apologies. Probably we need to make a pass over all the issues tagged as 1.2 and make sure the label syncs up with reality, as I think this may not be the only one that is out of sync.

@roberthbailey


Member

roberthbailey commented Apr 21, 2017

@piosz - have we addressed this concern for heapster?

@piosz


Member

piosz commented Apr 27, 2017

Yes, there is addon-resizer implemented by @Q-Lee.
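For reference, the addon-resizer's approach is roughly a linear model: each request is a base amount plus a per-node increment, and the addon is only updated when the computed value drifts far enough from the current request. A sketch of the linear part; the concrete numbers are illustrative, not the real defaults, so check the addon-resizer repository for the actual knobs:

```go
package main

import "fmt"

// LinearModel sizes a resource request as base + extra*nodes, in the
// spirit of the addon-resizer. The field values set in main are
// illustrative assumptions, not shipped defaults.
type LinearModel struct {
	BaseMemMiB    float64
	ExtraMemMiB   float64 // per node
	BaseCPUMilli  float64
	ExtraCPUMilli float64 // per node
}

// Requests returns the memory/CPU requests for a given node count.
func (m LinearModel) Requests(nodes int) (memMiB, cpuMilli float64) {
	n := float64(nodes)
	return m.BaseMemMiB + m.ExtraMemMiB*n, m.BaseCPUMilli + m.ExtraCPUMilli*n
}

func main() {
	heapster := LinearModel{BaseMemMiB: 140, ExtraMemMiB: 4, BaseCPUMilli: 80, ExtraCPUMilli: 0.5}
	for _, n := range []int{3, 100, 1000} {
		mem, cpu := heapster.Requests(n)
		fmt.Printf("%4d nodes: %.0f MiB, %.1f mCPU\n", n, mem, cpu)
	}
}
```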

@piosz piosz closed this Apr 27, 2017
