Cluster Autoscaler IAM policy lacks permission to list EC2 instance types #13520

Closed
seh opened this issue Apr 19, 2022 · 0 comments · Fixed by #13532
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@seh
Contributor

seh commented Apr 19, 2022

1. What kops version are you running?

1.23.0 (git-a067cd7742a497a5c512762b9880664d865289f1)

2. What Kubernetes version are you running?

1.23.5

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Set the following fields in a Cluster manifest:

spec:
  clusterAutoscaler:
    enabled: true
  iam:
    useServiceAccountExternalPermissions: true
  serviceAccountIssuerDiscovery:
    enableAWSOIDCProvider: true

Given that, run the following:

kops replace --filename=cluster.yaml
kops update cluster --yes

5. What happened after the commands executed?

The "cluster-autoscaler" Deployment's pod spec includes the three environment variables required to use IRSA for its "cluster-autoscaler" ServiceAccount. For each of the cluster autoscaler's pods, the container starts, then dies immediately for lack of AWS IAM permission:

aws_cloud_provider.go:369] Failed to generate AWS EC2 Instance Types: UnauthorizedOperation: You are not authorized to perform this operation.

The related IAM role's policy lacks the "ec2:DescribeInstanceTypes" action. Per #12187, by default, we have the cluster autoscaler generate its set of EC2 instance types dynamically, and the cluster autoscaler documentation mentions granting the "ec2:DescribeInstanceTypes" permission when using this dynamic EC2 catalog survey.

6. What did you expect to happen?

kOps would recognize the intersection of using "external service account permissions" and the cluster autoscaler's dynamic EC2 catalog survey being enabled, and add the missing "ec2:DescribeInstanceTypes" action to the cluster autoscaler's dedicated IAM role's policy.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

cluster.yaml file
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: redacted-cluster-name
spec:
  additionalSans:
  - api.redacted
  - api.internal.redacted
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-04bfa48a96656906e
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    defaultIssuer: not-sure-which
    enabled: true
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
    disableSecurityGroupIngress: true
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://bucket-1/redacted-cluster-name
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubeAPIServer:
    featureGates:
      EphemeralContainers: "true"
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    featureGates:
      EphemeralContainers: "true"
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.23.5
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-0b963d861ceaf3b17
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://bucket-2/redacted
    enableAWSOIDCProvider: true
  snapshotController:
    enabled: true
  sshAccess:
  - 18.223.165.172/32
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-0c828450b78705439
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-0398d4a75a3888c0c
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-058646b4f6fbd9929
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-094e8b4023a0f8093
    id: subnet-08a9f946eb2814ee4
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-026c73e6779288dc7
    id: subnet-0442a24d37181e2a3
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0e034de3e76de6d12
    id: subnet-03617a6eec6a5fbed
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

9. Anything else do we need to know?

I first brought up this problem in the "kops-users" channel of the "Kubernetes" Slack workspace.

I see that it's possible to disable the cluster autoscaler's dynamic EC2 instance type survey by setting the "spec.clusterAutoscaler.awsUseStaticInstanceList" field to true.
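
A minimal sketch of that workaround, assuming the field sits alongside the other "clusterAutoscaler" settings shown in the manifest above. It is untested here and does not address the missing IAM permission itself:

```yaml
spec:
  clusterAutoscaler:
    enabled: true
    # Assumed workaround: skip the dynamic EC2 instance type survey,
    # so the autoscaler should not need to call ec2:DescribeInstanceTypes.
    awsUseStaticInstanceList: true
```

The change would be applied with the same "kops replace" and "kops update cluster --yes" commands as in step 4. The likely trade-off is that the autoscaler then relies on its built-in instance type list, which may not include newer EC2 instance types.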

/kind bug
