
AWS Cluster Autoscaler Permissions #113

Closed
pluttrell opened this issue Jun 9, 2017 · 17 comments
Labels
area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider

Comments

@pluttrell

Using v0.5.4 of the aws-cluster-autoscaler, we're getting this error:

E0609 23:20:59.162974       1 static_autoscaler.go:108] Failed to update node registry: Unable to get first autoscaling.Group for node-us-west-2a.dev.clusters.mydomain.io

It certainly looks like a permissions problem, but per the instructions I have the following policy attached to my instance role, nodes.dev.clusters.mydomain.io:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}

Without this policy attached, I get a different error:

E0609 23:05:48.475214       1 static_autoscaler.go:108] Failed to update node registry: AccessDenied: User: arn:aws:sts::11111111111:assumed-role/nodes.dev.clusters.mydomain.io/i-0472257b3f8d4ec43 is not authorized to perform: autoscaling:DescribeAutoScalingGroups
	status code: 403, request id: 2cf17af0-4d68-11e7-825c-73c99354b20d

So we believe we have the necessary permissions.

For reference, here's our execution config:

./cluster-autoscaler
--cloud-provider=aws
--nodes=1:10:node-us-west-2a.dev.clusters.mydomain.io
--nodes=1:10:node-us-west-2b.dev.clusters.mydomain.io
--nodes=1:10:node-us-west-2c.dev.clusters.mydomain.io
--scale-down-delay=10m
--skip-nodes-with-local-storage=false
--skip-nodes-with-system-pods=true
--v=4

Any ideas on what to do?
Is there any strategy for debugging this?

@zaa

zaa commented Jun 12, 2017

Judging by the code from https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L114 it looks like you've passed an incorrect group name.

@mwielgus
Contributor

@pluttrell Was it a problem with the group name?

@pluttrell
Author

Nope, the group names were identical to what was in AWS.

We do, however, have the aws-cluster-autoscaler working perfectly using the Kubernetes resource files directly, without Helm, so we've gone with that option for now.

@mwielgus
Copy link
Contributor

Great :). Closing the bug.

@7chenko

7chenko commented Sep 8, 2017

I'm getting a similar error with kops 1.7.0, Kubernetes 1.7.5, and cluster-autoscaler 0.6.1, but only when trying to scale from 0 nodes. According to this, as of CA 0.6.1 I should be able to scale to/from 0. I'm getting errors like this:

E0908 03:18:13.511590       1 static_autoscaler.go:118] Failed to update node registry: RequestError: send request failed
caused by: Post https://autoscaling.us-west-2.amazonaws.com/: dial tcp: i/o timeout

I'm using a deployment similar to this one, and it works as long as there is at least one node up:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        - image: gcr.io/google_containers/cluster-autoscaler:v0.6.1
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --nodes=0:10:nodes.uswest2.metamoto.net
          env:
            - name: AWS_REGION
              value: us-west-2
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-certificates.crt"
      tolerations:
        - key: "node-role.kubernetes.io/master"
          effect: NoSchedule

@7chenko

7chenko commented Sep 8, 2017

Figured this out: the problem was that no kube-dns pod was running on the master node. To get one running there, I had to add the master toleration to the kube-dns deployment (the same toleration as in the cluster-autoscaler deployment above). Once kube-dns was running on the master, the autoscaler was able to use it to resolve the AWS autoscaling endpoint, fetch ASG info, and scale up from 0 nodes.
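
For reference, a minimal sketch of that kube-dns change, assuming the default deployment name and namespace (only the fields relevant to the change are shown):

# Sketch: add the master toleration to the kube-dns Deployment so a replica can
# be scheduled onto the master, mirroring the cluster-autoscaler deployment above.
# The deployment name/namespace are assumptions and may differ in your cluster.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  template:
    spec:
      tolerations:
        - key: "node-role.kubernetes.io/master"
          effect: NoSchedule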

@MrHohn
Member

MrHohn commented Nov 1, 2017

Out of curiosity, does cluster-autoscaler depend on the in-cluster DNS service? Probably not?

Instead of putting kube-dns on the master, what about setting dnsPolicy: Default for cluster-autoscaler so that name resolution does not go through kube-dns?

Using dnsPolicy: ClusterFirst on pods that run on the master node might not work unless a kube-proxy pod also runs on the master (for Service VIP -> backend Pods routing), which isn't always the case (e.g. in GCE kube-up it doesn't).
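
For illustration, a minimal sketch of that change applied to the cluster-autoscaler deployment above (only the relevant pod-spec fields are shown; the image tag is copied from the earlier manifest):

# Sketch: dnsPolicy: Default makes the pod use the node's resolv.conf directly,
# so lookups of autoscaling.<region>.amazonaws.com do not go through kube-dns.
spec:
  template:
    spec:
      dnsPolicy: Default
      containers:
        - name: cluster-autoscaler
          image: gcr.io/google_containers/cluster-autoscaler:v0.6.1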

@shiv9012

shiv9012 commented Mar 14, 2018

@MrHohn @7chenko @StevenACoffman I have tried:

  1. running both cluster-autoscaler and kube-dns on the master
  2. using dnsPolicy: Default for cluster-autoscaler

I'm still getting this error:

Failed to update node registry: RequestError: send request failed
caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp ...*:443: i/o timeout

Please advise.

@MrHohn
Member

MrHohn commented Apr 9, 2018

Failed to update node registry: RequestError: send request failed
caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp ...*:443: i/o timeout

This looks like a routing or firewall issue instead.

@srossross-tableau

srossross-tableau commented May 9, 2018

I'm getting the original error posted above: Failed to update node registry: Unable to get first autoscaling.Group nodes.public-prod.k8s.local

What steps can I take to debug and fix this?

@srossross-tableau

I think I have the correct AWS permissions to describe the autoscaling groups.

If I exec into the cluster-autoscaler pod and install the AWS CLI, I can run:

aws --region us-west-2 autoscaling describe-auto-scaling-groups | grep nodes
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-west-2:***:autoScalingGroup:****:autoScalingGroupName/nodes.public-prod.k8s.local", 

@aleksandra-malinowska aleksandra-malinowska added area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider labels May 10, 2018
@aleksandra-malinowska
Contributor

Briefly looking at the code, it seems that AWS returns no groups with this name, even though, judging by the error message, the method is called with the correct group name.

I'm unable to replicate or debug this, but if you get different results for requests made by the Go library and by the command-line tool, the maintainers of those tools may be better able to help.

@christopherhein
Member

christopherhein commented May 16, 2018

@srossross-tableau can you confirm that the original request includes the region, the same way your aws call from inside the container does?

You might need to make sure your env is set correctly.

env:
- name: AWS_REGION
  value: us-west-2

@srossross-tableau

Thanks @christopherhein, that was the issue.

@dthomason

I tested the dnsPolicy: Default approach suggested above and feel it's the best one. It keeps you from having to modify the kube-dns deployment while keeping your masters clean. Thanks!!

ingvagabund pushed a commit to ingvagabund/autoscaler that referenced this issue Aug 26, 2019
UPSTREAM: <carry>: openshift: add custom nodeset comparator
@gazal-k

gazal-k commented Oct 21, 2019

Setting dnsPolicy: Default for cluster-autoscaler, as suggested above, worked for me too on EKS 1.13.

@waterdrops

I hit the same error on EKS 1.13, and setting dnsPolicy: Default fixed it for me as well. Thank you very much @gazal-k.
