This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Re: cluster-autoscaler support #629

Merged

mumoshu merged 1 commit into master from re-cluster-autoscaler on May 22, 2017

Conversation

mumoshu
Contributor

@mumoshu mumoshu commented May 9, 2017

In #151, we introduced the initial, incomplete, almost theoretical support for cluster-autoscaler. The situation has changed considerably since then - now we can finally complete the cluster-autoscaler support.

Features

  • Automatically scale out a node group by adding node(s) when one or more pods in the group become unschedulable due to insufficient resources.
  • Automatically scale in a node group by removing node(s) that are safe to remove. Basically, a node is considered safe to remove when it is not running a critical k8s component.

kube-aws scope changes

  • CA is now deployed automatically
    • More concretely, a k8s deployment for cluster-autoscaler is automatically created on a k8s cluster when the cluster-autoscaler addon is enabled in cluster.yaml (an illustrative sketch of such a deployment follows below)
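
For context, the rendered manifest would be roughly along these lines. This is only an illustrative sketch: the image tag, the auto-discovery tag key, and the scheduling constraints are assumptions, not the exact output of cloud-config-controller.

```yaml
# Illustrative sketch only - the actual manifest is rendered from cloud-config-controller.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: quay.io/kube-aws/cluster-autoscaler:<tag>
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # Discovers target node pools via ASG tags instead of hard-coded min/max sizes per group
        - --node-group-auto-discovery=asg:tag=<autoscaling-target-tag-key>
        env:
        - name: AWS_REGION
          value: <aws-region>
```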

Configuration changes

A valid cluster.yaml for a CA-enabled cluster would now look like:

```yaml
# The cluster-autoscaler addon is deployed and enabled to watch node groups so that
# it can scale in/out node group(s) that are registered as autoscaling targets.
addons:
  clusterAutoscaler:
    enabled: true
worker:
  nodePools:
  - name: scaled
    # Make this node pool an autoscaling target
    autoscaling:
      clusterAutoscaler:
        enabled: true
  - name: notScaled
    # This node pool is not an autoscaling target
```
  • The former `experimental.clusterAutoscalerSupport.enabled` is dropped for controller nodes in favor of `addons.clusterAutoscaler.enabled`
  • `worker.nodePools[].clusterAutoscaler.minSize` and `maxSize` are dropped in favor of `worker.nodePools[].autoscaling.clusterAutoscaler.enabled` and the auto discovery feature of cluster-autoscaler
  • `worker.nodePools[].clusterAutoscalerSupport` is kept as-is, but it no longer needs to be `true`: when `addons.clusterAutoscaler` is enabled, kube-aws by default gives the required IAM permissions only to controller nodes, and CA is scheduled there.
  • This work currently relies on a docker image built from a fork of cluster-autoscaler which supports the automatic node group discovery feature

cloud-config-controller changes

  • Add the cluster-autoscaler deployment, which is created when the CA addon is enabled

Go changes

  • The former `ClusterAutoscalerImage` is renamed to `ClusterProportionalAutoscalerImage`
  • Introduce `ClusterAutoscalerImage` (`clusterAutoscalerImage` in cluster.yaml) for the cluster-autoscaler docker image reference
  • `ClusterAutoscalerSupport` is a no-op on controller nodes and is used only for worker nodes. `Addons.ClusterAutoscaler` is used instead to give controller nodes appropriate IAM permissions and to deploy CA to them.
  • Most CA-related types and funcs are moved from `core/controlplane/config` to the `model` package

IAM changes

  • `autoscaling:DescribeTags` is now allowed in IAM so that the automatic node group discovery feature of cluster-autoscaler can be enabled (see the sketch below)
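
For reference, the read-only statement enabling auto discovery on the controller IAM role would look roughly like the following. This is a sketch; the exact statement kube-aws renders may differ.

```json
{
  "Effect": "Allow",
  "Action": [
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:DescribeAutoScalingInstances",
    "autoscaling:DescribeTags"
  ],
  "Resource": "*"
}
```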

Gotchas

Note that a node running cluster-autoscaler or kube-resources-autosave cannot be scaled in:

```
09 05:29:21.523426       1 cluster.go:74] Fast evaluation: ip-10-0-0-68.ap-northeast-1.compute.internal for removal
I0509 05:29:21.523467       1 cluster.go:88] Fast evaluation: node ip-10-0-0-68.ap-northeast-1.compute.internal cannot be removed: non-deamons set, non-mirrored, kube-system pod present: cluster-autoscaler-998591511-thcpj
I0509 05:29:21.523479       1 cluster.go:74] Fast evaluation: ip-10-0-0-133.ap-northeast-1.compute.internal for removal
I0509 05:29:21.523488       1 cluster.go:103] Fast evaluation: node ip-10-0-0-133.ap-northeast-1.compute.internal may be removed
I0509 05:29:21.523493       1 cluster.go:74] Fast evaluation: ip-10-0-0-150.ap-northeast-1.compute.internal for removal
I0509 05:29:21.523530       1 cluster.go:88] Fast evaluation: node ip-10-0-0-150.ap-northeast-1.compute.internal cannot be removed: non-deamons set, non-mirrored, kube-system pod present: kube-resources-autosave-2845171460-1cbs5
```

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 9, 2017
@mumoshu
Contributor Author

mumoshu commented May 9, 2017

I'm trying to complete the cluster-autoscaler support according to our roadmap for v0.9.7: https://github.com/kubernetes-incubator/kube-aws/blob/master/ROADMAP.md#v097

@codecov-io

codecov-io commented May 9, 2017

Codecov Report

Merging #629 into master will decrease coverage by 0.04%.
The diff coverage is 62.96%.

Impacted file tree graph

```diff
@@            Coverage Diff            @@
##           master    #629      +/-   ##
=========================================
- Coverage   37.14%   37.1%   -0.05%     
=========================================
  Files          51      52       +1     
  Lines        3201    3210       +9     
=========================================
+ Hits         1189    1191       +2     
- Misses       1836    1842       +6     
- Partials      176     177       +1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| model/controller.go | 0% <0%> (ø) | ⬆️ |
| model/node_pool_config.go | 20.28% <0%> (ø) | ⬆️ |
| model/node_labels.go | 0% <0%> (ø) | |
| core/controlplane/config/config.go | 57.09% <94.44%> (+0.66%) | ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 910fbd0...166c34b. Read the comment docs.

@mumoshu
Contributor Author

mumoshu commented May 9, 2017

This work depends on kubernetes/autoscaler#11

e2e/run Outdated
```
@@ -156,13 +163,17 @@ customize_cluster_yaml() {
 worker:
   nodePools:
   - name: asg1
     clusterAutoscalerSupport:
       enabled: true
```
Contributor Author

@mumoshu mumoshu May 9, 2017


`clusterAutoscalerSupport` is meant to provide enough permissions to call the AWS APIs required to host cluster-autoscaler. Then we should probably provide a nodeSelector for cluster-autoscaler so that we can ensure CA is scheduled onto nodes with sufficient permissions?

Edit: And node labels corresponding to `clusterAutoscalerSupport.enabled`, to match the nodeSelector?
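
For illustration, the idea would boil down to something like this in the cluster-autoscaler pod spec. The label key here is hypothetical, not one kube-aws defines in this PR:

```yaml
spec:
  # Hypothetical label key - whatever label kube-aws would attach to nodes
  # whose clusterAutoscalerSupport is enabled.
  nodeSelector:
    kube-aws.coreos.com/cluster-autoscaler-supported: "true"
```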

Contributor Author

@mumoshu mumoshu May 9, 2017


We should add a validation which emits an error when there is no worker node pool or controller node with `clusterAutoscalerSupport` enabled.
Otherwise cluster-autoscaler may be unable to work or be scheduled due to missing IAM permissions.

Contributor


Can't we run CA on controller nodes? They already have elevated permissions; adding more won't make it much worse.

Contributor Author


@redbaron Yes, we can. kube-aws as of today already supports `controller.clusterAutoscalerSupport.enabled` to provide appropriate IAM permissions to controller nodes. It would be a matter of just adding appropriate node labels and node selectors to stick cluster-autoscaler to whatever nodes (worker or controller) have clusterAutoscalerSupport enabled.

```
@@ -145,6 +145,13 @@
],
"MinSize": "{{.MinCount}}",
"Tags": [
{{if gt .ClusterAutoscaler.MaxSize 0}}
```
Contributor Author


Should be `if .ClusterAutoscaler.Enabled`
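
That is, something along these lines. A sketch only; the tag entry shown is a placeholder rather than the actual key kube-aws emits:

```
"Tags": [
  {{if .ClusterAutoscaler.Enabled}}
  {
    "Key": "<auto-discovery-tag-key>",
    "Value": "true",
    "PropagateAtLaunch": "false"
  },
  {{end}}
```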

```yaml
- name: AWS_REGION
  value: {{.Region}}
volumeMounts:
- name: ssl-certs
```
Contributor Author


This volumeMount and the corresponding volume will be unnecessary once kubernetes/autoscaler#48 is merged into CA

Contributor Author


The PR is merged

@mumoshu
Contributor Author

mumoshu commented May 12, 2017

Since I have no control over when a docker image for cluster-autoscaler containing the improvements this PR depends on will be released, I have opened https://github.com/kube-aws/autoscaler and https://quay.io/repository/kube-aws/cluster-autoscaler for hosting our own.

Update: This is how our docker image is built and released.

```
TAG=$(git rev-parse HEAD) REGISTRY=quay.io/kube-aws make release
```

And images can be found at https://quay.io/repository/kube-aws/cluster-autoscaler?tab=tags

@mumoshu
Contributor Author

mumoshu commented May 12, 2017

kubernetes/autoscaler#11 is now merged but a docker image containing it isn't released yet.

Update: #629 (comment)

@mumoshu mumoshu added this to the backlog milestone May 12, 2017
@mumoshu mumoshu mentioned this pull request May 12, 2017
2 tasks
@mumoshu mumoshu modified the milestones: wip, backlog, tbd, v0.9.7 May 12, 2017
@mumoshu mumoshu added this to To be implemented in v0.9.7 May 12, 2017
@mumoshu mumoshu force-pushed the re-cluster-autoscaler branch 3 times, most recently from 27e9c6a to f112823 Compare May 18, 2017 08:13
```diff
@@ -203,12 +203,13 @@
 "Resource": [ "*" ]
 },
 {{end}}
-{{if .Experimental.ClusterAutoscalerSupport.Enabled}}
+{{if .Addons.ClusterAutoscaler.Enabled}}
```
Contributor

@redbaron redbaron May 19, 2017


```json
{
  "Effect": "Allow",
  "Action": [
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:DescribeAutoScalingInstances",
    "autoscaling:DescribeTags"
  ],
  "Resource": "*"
},
{
  "Action": [
    "autoscaling:SetDesiredCapacity",
    "autoscaling:TerminateInstanceInAutoScalingGroup"
  ],
  "Condition": {
    "Null": { "autoscaling:ResourceTag/kubernetes.io/cluster/{{.ClusterName}}": "false" }
  },
  "Resource": "*",
  "Effect": "Allow"
},
```

Contributor Author


What is the Null condition? Could you point me to a doc about it?

Contributor


Contributor Author


Thanks! So it permits an action only when the targeted resource has the tag? Great - I will incorporate it.

Contributor Author


Done for controller nodes.
I'm still unsure whether we'd want to support the customization of scheduling CA onto worker nodes: #629 (comment)

```
@@ -361,6 +368,7 @@
"Action": [
"autoscaling:DescribeAutoScalingGroups",
```
Contributor


I still don't understand: what is the benefit of running cluster-autoscaler on worker nodes?

Contributor Author


@redbaron Thanks, good question 👍
The bigger a k8s cluster is, the more resources CA uses.
So I guess one might want a dedicated node pool with a single worker node, which can be recreated more easily than controller nodes, to run CA, possibly scaled up/down by a vertical-pod-autoscaler.
That's why I tried to support running CA on worker nodes, even though it isn't the default setup.
What do you think? 😃

Contributor Author


For a usual use case, running CA on a controller node is recommended, and that's the default setup.

Contributor


On larger clusters you'd have large controller nodes too :) Anyway, I see the point, and I guess adding this support isn't much work.

Contributor Author


@redbaron

> On larger clusters you'd have large controller nodes too :)

Yea, that's certainly true!
Do you think we can just recommend that users make controllers large enough, instead of complicating cluster.yaml like this?

```go
}
sort.Strings(keys)
for _, k := range keys {
	v := l[k]
```
Contributor


```go
v := l[k]
if len(v) > 0 {
	labels = append(labels, fmt.Sprintf("%s=%s", k, v))
} else {
	labels = append(labels, fmt.Sprintf("%s", k))
}
```

Contributor Author


@redbaron Thanks!
Is a node label spec like `mykey=` invalid?

Contributor


No, it isn't, but seeing args on a command line with an empty `=` is a little confusing, at least for me. The end result is the same.

Contributor Author


Thanks for the confirmation. Ok, I will make the suggested change 👍

Contributor Author


Done
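
For reference, a self-contained sketch of the behaviour settled on above. The function name and the example label keys are illustrative, not the exact kube-aws code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// joinNodeLabels renders node labels as a comma-separated key=value list,
// omitting the "=" when a label has an empty value, as suggested above.
func joinNodeLabels(l map[string]string) string {
	keys := make([]string, 0, len(l))
	for k := range l {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	labels := make([]string, 0, len(keys))
	for _, k := range keys {
		if v := l[k]; len(v) > 0 {
			labels = append(labels, fmt.Sprintf("%s=%s", k, v))
		} else {
			labels = append(labels, k)
		}
	}
	return strings.Join(labels, ",")
}

func main() {
	// Prints: kube-aws.coreos.com/autoscaler,role=worker
	fmt.Println(joinNodeLabels(map[string]string{
		"role":                           "worker",
		"kube-aws.coreos.com/autoscaler": "",
	}))
}
```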

@mumoshu
Contributor Author

mumoshu commented May 22, 2017

I've updated the PR description to reflect the current state of the work.

@mumoshu mumoshu modified the milestones: v0.9.7-rc.1, v0.9.7-rc.<tbd> May 22, 2017
@mumoshu
Contributor Author

mumoshu commented May 22, 2017

Added a cluster-autoscaling section in the doc

@mumoshu
Contributor Author

mumoshu commented May 22, 2017

Testing E2E after merging with the master branch locally

@mumoshu mumoshu changed the title WIP: Re: cluster-autoscaler support Re: cluster-autoscaler support May 22, 2017
@mumoshu
Contributor Author

mumoshu commented May 22, 2017

Whoa, all the tests have passed.

@mumoshu mumoshu merged commit 0504707 into kubernetes-retired:master May 22, 2017
@mumoshu mumoshu deleted the re-cluster-autoscaler branch May 22, 2017 07:59
@mumoshu mumoshu mentioned this pull request May 24, 2017
@cknowles
Contributor

cluster.yaml defaults could do with a bit of an update here; they still list config which is not read any more.

@Vince-Cercury

Sorry to comment on a closed ticket (happy to open a new one if required).

What happens if the condition is missing for the IAM role:

"Condition": { "Null": { "autoscaling:ResourceTag/kubernetes.io/cluster/{{.ClusterName}}": "false" } },

I don't have much control over IAM roles and my roles are shared between all my clusters.
I tried using a wildcard / star but it is not accepted by AWS.
What would you recommend for a case where the role is shared between many clusters?
What happens if I omit the condition altogether? The upstream project does not have it.
Is there a more generic tag I should check instead of being specific to a cluster?

@mumoshu
Contributor Author

mumoshu commented Oct 11, 2017

@Vincemd If it was missing for Describe* operations, it would be OK. I'm afraid you would end up leaking more autoscaling group names when one of your pods is hijacked by attackers, but that wouldn't be a huge problem anyway?

However, if it was missing for TerminateInstanceInAutoScalingGroup, CA could theoretically terminate EC2 instances outside of the K8S cluster when, e.g., CA had bug(s) resulting in doing so. #800 is an example of that.

In your case, my suggestion is to tag EC2 instances with something like "PartOfKubernetes": "true" and use it in the condition of the IAM policy, so that CA simply can't terminate EC2 instances that are not part of K8S cluster(s). I suppose it isn't the best thing to do - the best would be to create an IAM role per cluster - but it's better than nothing.

Does my explanation make sense? 😃
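
For illustration, with the tag suggested above the write-permission statement might look like this (the tag name is the hypothetical one from the suggestion; adjust it to whatever tag you actually apply):

```json
{
  "Action": [
    "autoscaling:SetDesiredCapacity",
    "autoscaling:TerminateInstanceInAutoScalingGroup"
  ],
  "Condition": {
    "Null": { "autoscaling:ResourceTag/PartOfKubernetes": "false" }
  },
  "Resource": "*",
  "Effect": "Allow"
}
```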

@Vince-Cercury

@mumoshu thanks for the clear explanation. I got it.

Good news is that my production Kubernetes cluster will have its own IAM role, so I can be specific and use the conditions properly (can only terminate/setDesired for the prod cluster based on specific tag)

It's only my non prod clusters, which are non public facing and sharing same IAM role at the moment. I will work with a tag, as suggested.

Problem solved!

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018