Failed to generate AWS EC2 Instance Types: unable to load EC2 Instance Type list #3216

Closed
mrichman opened this issue Jun 9, 2020 · 9 comments · Fixed by lablabs/terraform-aws-eks-cluster-autoscaler#11

Comments

@mrichman

mrichman commented Jun 9, 2020

I'm not sure if my worker node role is missing a permission:

aws_cloud_provider.go:363] Failed to generate AWS EC2 Instance Types: unable to load EC2 Instance Type list

I currently have the following applied:

  • AmazonEKSWorkerNodePolicy
  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy

and this inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}
@gjtempleton
Member

The code path that currently fails is actually fetching from the AWS Pricing API over HTTP, here (the link points at master, as you haven't mentioned which version of the CA you're using):
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_util.go#L55:1

@mrichman
Author

I updated to CA 1.16.5 and added these additional permissions and it's now working:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeInstanceTypes"
            ],
            "Resource": "*"
        }
    ]
}
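
To make the change between the two policies in this thread easier to see, here is a minimal sketch that diffs the original inline policy against the updated one. The `allowed_actions` and `make_policy` helpers are hypothetical names introduced for illustration; the action lists are copied from the two comments above. It only compares the listed strings and does not check that each name is a valid IAM action.

```python
import json


def allowed_actions(policy_json):
    """Return the set of actions named in a policy document's Allow statements."""
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        actions.update([listed] if isinstance(listed, str) else listed)
    return actions


def make_policy(actions):
    """Hypothetical helper: wrap actions in the one-statement layout used in this thread."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    })


# Actions from the original inline policy in the first comment.
original = make_policy([
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:DescribeAutoScalingInstances",
    "autoscaling:DescribeTags",
    "autoscaling:SetDesiredCapacity",
    "autoscaling:TerminateInstanceInAutoScalingGroup",
])

# Actions from the updated policy reported as working with CA 1.16.5.
updated = make_policy([
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:DescribeAutoScalingInstances",
    "autoscaling:DescribeInstances",
    "autoscaling:DescribeLaunchConfigurations",
    "autoscaling:DescribeTags",
    "autoscaling:SetDesiredCapacity",
    "autoscaling:TerminateInstanceInAutoScalingGroup",
    "ec2:DescribeLaunchTemplateVersions",
    "ec2:DescribeInstanceTypes",
])

added = sorted(allowed_actions(updated) - allowed_actions(original))
print("\n".join(added))
```

Run as-is, this lists the four actions that appear only in the updated policy; nothing was removed from the original.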

@git-heera

My deployment is failing with the error below. Is there a particular reason it is going to pricing.us-east-1.amazonaws.com?

W0806 10:18:39.046022 1 aws_util.go:71] Error fetching https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/ap-east-1/index.json skipping...
F0806 10:18:39.046052 1 aws_cloud_provider.go:358] Failed to generate AWS EC2 Instance Types: unable to load EC2 Instance Type list

Please advise

@gjtempleton
Member

@git-heera that seems related to #3276. As this issue is closed, you're better off moving the discussion there.

(The reason it's using the us-east-1 endpoint is here: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_util.go#L35 )
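
As a rough illustration of why the host stays in us-east-1 even for a cluster in ap-east-1, here is a minimal sketch. It assumes the URL shape seen in the warning above, where the host is fixed and only the path segment carries the cluster's region. The template string is inferred from that log line rather than copied from aws_util.go, and `pricing_url` is a hypothetical name.

```python
from urllib.parse import urlparse

# Assumption: template inferred from the logged URL, not taken from the CA source.
PRICING_URL_TEMPLATE = (
    "https://pricing.us-east-1.amazonaws.com"
    "/offers/v1.0/aws/AmazonEC2/current/{region}/index.json"
)


def pricing_url(region):
    """Build the offer-file URL for a region; only the path changes with the region."""
    return PRICING_URL_TEMPLATE.format(region=region)


url = pricing_url("ap-east-1")
parsed = urlparse(url)
print(parsed.hostname)  # the fixed endpoint host
print(parsed.path)      # the part that varies with the cluster's region
```

Under that assumption, a node that cannot reach the us-east-1 host, or a region with no offer file at that path, would hit the same warning regardless of where the cluster runs.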

@leonK-DI

I updated to CA 1.16.5 and added these additional permissions and it's now working:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeInstanceTypes"
            ],
            "Resource": "*"
        }
    ]
}

This should be added to the official documentation: https://github.com/kubernetes/autoscaler/tree/master/charts/cluster-autoscaler#aws---iam right?

@gjtempleton
Member

This should be added to the official documentation: https://github.com/kubernetes/autoscaler/tree/master/charts/cluster-autoscaler#aws---iam right?

Huh, that's a good point. I've raised #4670 to rework the IAM docs elsewhere, as I'd forgotten we also document these permissions in the chart's README.

My suspicion is that we're better off keeping the docs in one place and linking to that single source of truth from the chart's README, rather than trying to keep the two in sync and up to date. Thoughts?

@stijnbrouwers

Same here.
After I updated from 9.9.2 to 9.15.0, the pod suddenly wouldn't start anymore.
I had to add the last line, "ec2:DescribeInstanceTypes", to my IAM role.
The full policy now looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeInstanceTypes"
            ],
            "Resource": "*"
        }
    ]
}
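
A quick way to catch this before an upgrade is to check whether the policy already grants the action. The sketch below is a simplified, hypothetical check (`is_allowed` is not an AWS API): it looks only at the Action patterns of Allow statements, treats `*` and `?` as wildcards, compares names case-insensitively, and ignores Resource, Condition, and Deny. The "before" policy is reconstructed by dropping the last line from the policy above.

```python
from fnmatch import fnmatchcase


def is_allowed(policy, action):
    """True if any Allow statement lists a pattern matching `action`.

    Simplification: Resource, Condition, and Deny statements are ignored.
    """
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement object is valid too
        statements = [statements]
    wanted = action.lower()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(fnmatchcase(wanted, pattern.lower()) for pattern in patterns):
            return True
    return False


# The policy from the comment above.
AFTER = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup",
            "ec2:DescribeLaunchTemplateVersions",
            "ec2:DescribeInstanceTypes",
        ],
        "Resource": "*",
    }],
}

# Assumption: the pre-upgrade policy was the same list without its last entry.
BEFORE = {
    "Version": AFTER["Version"],
    "Statement": [{
        "Effect": "Allow",
        "Action": AFTER["Statement"][0]["Action"][:-1],
        "Resource": "*",
    }],
}

for name, policy in (("before", BEFORE), ("after", AFTER)):
    print(name, is_allowed(policy, "ec2:DescribeInstanceTypes"))
```

A policy that grants a broader pattern such as "ec2:Describe*" would also pass this check, since the wildcard covers the action.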

@gjtempleton
Member

Hi @leonK-DI and @stijnbrouwers, thanks for raising this. I've raised #4701 to update the chart's README to point to the other IAM docs in this repo. You can see my reasoning there, but it would be great to get feedback on that PR from both of you as users who encountered this.

@rishabh-jain

rishabh-jain commented Apr 17, 2023

From my understanding, if anyone is experiencing this issue while deploying to one of the newest regions (such as Zurich, Spain, or Hyderabad) with an older release, please try the latest release, 1.24.1.

The error is misleading: it tries to describe the instance groups using the region, which the previous releases don't know about.
