Cluster autoscaler barfs on ASG-based instance groups that specify `instanceRequirements` #15306

danports · 2023-04-08T16:43:45Z

/kind bug

1. What kops version are you running? The command `kops version` will display
this information.
1.26.2

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.26.3

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Deploy a cluster with autoscaling enabled and with an instance group that includes an instanceRequirements section, e.g.:

    instanceRequirements:
      cpu:
        min: "2"
        max: "4"
      memory:
        min: "4G"
        max: "8G"

5. What happened after the commands executed?
The cluster deploys just fine, but then the cluster autoscaler gets stuck and never actually autoscales anything:

    E0408 01:37:29.084682       1 mixed_nodeinfos_processor.go:151] Unable to build proper template node for my-nice-asg: ASG "my-nice-asg" uses the unknown EC2 instance type ""
    E0408 01:37:29.084470       1 aws_wrapper.go:727] Failed to query instance requirements for ASG my-nice-asg: UnauthorizedOperation: You are not authorized to perform this operation.

6. What did you expect to happen?
Cluster autoscaler shouldn't get stuck in an error loop.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

See snippet above.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

No problems with the kOps output.

9. Anything else we need to know?

It looks like the cluster autoscaler calls the EC2 API `GetInstanceTypesFromInstanceRequirements` to figure out which instance types an ASG includes, and the control plane role that kOps generates doesn't grant the corresponding IAM permission. Adding that permission to the role seems like it would be an easy fix.
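Until that lands, a possible workaround is to grant the permission by hand through the cluster spec. This is an untested sketch, not a confirmed fix: it assumes the IAM action is named `ec2:GetInstanceTypesFromInstanceRequirements` (matching the API call above) and that `additionalPolicies` accepts the `master` role key on this kOps version (some releases may expect `control-plane` instead).

    # Cluster spec fragment (edit with: kops edit cluster)
    spec:
      additionalPolicies:
        # Assumption: "master" is the control plane role key on kOps 1.26;
        # check the IAM roles docs for your version.
        master: |
          [
            {
              "Effect": "Allow",
              "Action": ["ec2:GetInstanceTypesFromInstanceRequirements"],
              "Resource": ["*"]
            }
          ]

After a `kops update cluster --yes`, the `UnauthorizedOperation` error should go away if the missing permission really is the only problem.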


k8s-triage-robot · 2023-07-07T16:57:49Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sillyfrog · 2023-07-11T23:24:10Z

/remove-lifecycle stale

k8s-triage-robot · 2024-01-24T06:57:06Z

/lifecycle stale

sillyfrog · 2024-01-28T02:06:29Z

/remove-lifecycle stale

k8s-triage-robot · 2024-04-27T02:35:45Z

/lifecycle stale

k8s-triage-robot · 2024-05-27T02:39:18Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 8, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2024
