
Master IAMRolePolicy too long with long cluster names. #12606

Closed
BenWolstencroft opened this issue Oct 25, 2021 · 14 comments · Fixed by #12700
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@BenWolstencroft

BenWolstencroft commented Oct 25, 2021

/kind bug

1. What kops version are you running?
1.21.2

2. What Kubernetes version are you running?
1.21.4

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
kops update cluster --yes

5. What happened after the commands executed?
IAMRolePolicy/. Example error: error reading actual policy document: policy size was 11655. Policy cannot exceed 10240 bytes.

6. What did you expect to happen?
Update to succeed

7. Please provide your cluster manifest.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-05-22T10:30:56Z"
  name: redacted
spec:
  api:
    dns: {}
    loadBalancer:
      class: Classic
      type: Internal
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    defaultIssuer: redacted
    enabled: true
  channel: stable
  cloudLabels:
    BudgetCode: Kube
    ProjectCode: Kube-Testing-Standalone-Cluster
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: false
    cpuRequest: 100m
    enabled: true
    expander: least-waste
    memoryRequest: 300Mi
    newPodScaleUpDelay: 0s
    scaleDownDelayAfterAdd: 10m0s
    scaleDownUtilizationThreshold: "0.5"
    skipNodesWithLocalStorage: true
    skipNodesWithSystemPods: true
  configBase: s3://redacted/redacted
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    memoryRequest: 100Mi
    name: events
  externalPolicies:
    master:
    - arn:aws:iam::redacted:policy/Kubernetes-Cluster-Systems-MasterNodePolicy-1DXUGOJ7A8T6E
    node:
    - arn:aws:iam::redacted:policy/Kubernetes-Cluster-Systems-NodePolicy-K9TNB85U5ELB
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    oidcClientID: systems-kubernetes
    oidcGroupsClaim: groups
    oidcIssuerURL: https://redacted/
    oidcUsernameClaim: email
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - 10.0.0.0/8
  kubernetesVersion: 1.21.4
  masterInternalName: api.internal.redacted
  masterPublicName: api.redacted
  metricsServer:
    enabled: true
  networkCIDR: 10.30.4.0/22
  networkID: redacted
  networking:
    calico:
      crossSubnet: true
  nodeTerminationHandler:
    enableSQSTerminationDraining: true
    enabled: true
    managedASGTag: aws-node-termination-handler/managed
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.0.0.0/8
  sshKeyName: redacted
  subnets:
  - cidr: 10.30.4.0/23
    egress: External
    id: redacted
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 10.30.6.0/23
    egress: External
    id: redacted
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

9. Anything else do we need to know?

I believe this is due to a combination of the number of addons we have enabled and the length of the cluster name: the cluster name is included in many of the policy statements to limit the resources that the permissions are granted against.

We have an identical cluster with a shorter name (40 characters long) which works, versus 43 characters for this one, which fails.

Maybe one way to prevent addons from growing the base policy would be to separate the addon permissions out into their own policy?
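
For illustration only (a rough sketch, not how kops or AWS actually measures the limit): once you have a rendered policy document locally, counting how often the cluster name is repeated gives a feel for how much each extra character in the name costs. The file and cluster names below are placeholders.

grep -oF 'my.fortythree.character.cluster.example.com' masters-policy.json | wc -l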

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 25, 2021
@BenWolstencroft BenWolstencroft changed the title Kops Master IAMRolePolicy too long with long cluster names. Master IAMRolePolicy too long with long cluster names. Oct 25, 2021
@BenWolstencroft
Author

Note: while this gives the same result as #12558, it's not the same problem. We are not using additional inline permissions for our own policy additions; we use managed policies for that. Our configuration is failing purely because of the addons that are enabled combined with a long cluster name.

@rifelpet
Member

I agree we should put addons in their own policy (or policies). There is a limit of 10 (increasable to 20) attached IAM policies per IAM role, so we'll need to be cognizant of that. We could start with all addons in one separate policy, which should be sufficient for now given that the control plane policy itself is fairly large.
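
For anyone wondering how close they already are to that limit, listing the managed policies attached to the control plane role is a quick check (a sketch; the role name assumes the kops default of masters.<cluster-name>):

aws iam list-attached-role-policies --role-name masters.<cluster-name>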

IRSA is another valid workaround and IMO the solution we should be encouraging here, given that each (addon) service account has its own IAM role and policy.
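
For reference, a minimal sketch of what enabling IRSA can look like in the cluster spec, assuming the serviceAccountIssuerDiscovery and useServiceAccountExternalPermissions fields available in recent kops releases (the S3 location is a placeholder and must be publicly readable):

spec:
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://some-publicly-readable-bucket/oidc
    enableAWSOIDCProvider: true
  iam:
    useServiceAccountExternalPermissions: true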

@olemarkus
Member

This bug was filed against kops 1.21. Can you try 1.22.1?

@BenWolstencroft
Author

@olemarkus - same with 1.22.1

@olemarkus
Member

1.22 has a test that specifically covers this. The cluster name in that test is not that long, but the margin is fairly large.
The cluster spec used can be seen here: https://github.com/kubernetes/kops/blob/v1.22.1/tests/integration/update_cluster/many-addons/in-v1alpha2.yaml
and the resulting policy for master nodes can be seen here: https://github.com/kubernetes/kops/blob/v1.22.1/tests/integration/update_cluster/many-addons/data/aws_iam_role_policy_masters.minimal.example.com_policy

Using 1.22.1, it would be interesting to see how your policy differs from the one above. The one above is just shy of 8k, and the maximum policy size is 10k, so with regard to the cluster name there should indeed be a decent margin.
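
As a rough comparison, you can check the size of that reference policy yourself (a sketch; the raw URL is just the file linked above served via raw.githubusercontent.com):

curl -sL https://raw.githubusercontent.com/kubernetes/kops/v1.22.1/tests/integration/update_cluster/many-addons/data/aws_iam_role_policy_masters.minimal.example.com_policy | wc -c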

@BenWolstencroft
Author

@olemarkus - is there a way for me to get the resultant policy? The dry run shows me the diff of the changes it's trying to make, but not the complete document.

@rifelpet
Member

Your best bet might be to run kops update cluster --yes -v 9 and find the PutRolePolicy API call in the output. That should include the full document.
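
Something along these lines should do it (a sketch; the log file name is arbitrary):

kops update cluster --yes -v 9 2>&1 | tee kops-update.log
grep -n PutRolePolicy kops-update.log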

@olemarkus
Member

Or run with Terraform output, which leaves the policy document locally with a similar location and name.
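
For example (a sketch; the output directory is arbitrary, and the data/ file naming matches the integration test output linked above):

kops update cluster --target terraform --out ./tf-out
ls ./tf-out/data/aws_iam_role_policy_masters.*_policy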

@mattoz0

mattoz0 commented Nov 4, 2021

I am also getting hit by this issue. The worst part is that editing the cluster configuration doesn't change the error message: error reading actual policy document: policy size was 11224. Policy cannot exceed 10240 bytes.

It always seems to be 11224 bytes. The issue happens regardless of whether I have an inline policy or not.

@olemarkus
Member

Hey. Same as above: we'd need the generated policy to be able to investigate this further.

@mattoz0

mattoz0 commented Nov 9, 2021

Not sure if this is the correct policy; it doesn't seem to be 11224 bytes. I changed all the details in the policy so they don't reflect my actual setup, but I made sure to keep the same character count.

It seems like the most appropriate one based on the error message:

error running task "IAMRolePolicy/masters.kubernetes.example1234.dev" (9m58s remaining to succeed): error reading actual policy document: policy size was 11224. Policy cannot exceed 10240 bytes.

This was exported using kops update cluster ${cluster} --yes --target terraform

aws_iam_role_policy_masters.kubernetes.example1234.dev_policy

{
  "Statement": [
    {
      "Action": "ec2:AttachVolume",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/KubernetesCluster": "kubernetes.example1234.dev",
          "aws:ResourceTag/k8s.io/role/master": "1"
        }
      },
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    },
    {
      "Action": [
        "s3:Get*"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::kops-state-store-test/kubernetes.example1234.dev/*"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::kops-state-store-test/kubernetes.example1234.dev/backups/etcd/main/*"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::kops-state-store-test/kubernetes.example1234.dev/backups/etcd/events/*"
    },
    {
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetEncryptionConfiguration",
        "s3:ListBucket",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::kops-state-store-test"
      ]
    },
    {
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets",
        "route53:GetHostedZone"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:route53:::hostedzone/Z00118143D7LEO5A8IZU4"
      ]
    },
    {
      "Action": [
        "route53:GetChange"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:route53:::change/*"
      ]
    },
    {
      "Action": [
        "route53:ListHostedZones",
        "route53:ListTagsForResource"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    },
    {
      "Action": "ec2:CreateTags",
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": [
            "CreateVolume",
            "CreateSnapshot"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:snapshot/*"
      ]
    },
    {
      "Action": "ec2:CreateTags",
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": [
            "CreateVolume",
            "CreateSnapshot"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:snapshot/*"
      ]
    },
    {
      "Action": "ec2:DeleteTags",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/KubernetesCluster": "kubernetes.example1234.dev"
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:snapshot/*"
      ]
    },
    {
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:DeleteSecurityGroup",
        "ec2:RevokeSecurityGroupIngress",
        "elasticloadbalancing:ModifyTargetGroupAttributes",
        "elasticloadbalancing:ModifyRule",
        "elasticloadbalancing:DeleteRule",
        "elasticloadbalancing:AddTags",
        "elasticloadbalancing:RemoveTags"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/elbv2.k8s.aws/cluster": "kubernetes.example1234.dev"
        }
      },
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstances",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeRegions",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeTags",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumesModifications",
        "ec2:DescribeVpcs",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:DescribeRepositories",
        "ecr:GetAuthorizationToken",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:ListImages",
        "elasticloadbalancing:CreateRule",
        "elasticloadbalancing:DescribeListenerCertificates",
        "elasticloadbalancing:DescribeListeners",
        "elasticloadbalancing:DescribeLoadBalancerAttributes",
        "elasticloadbalancing:DescribeLoadBalancerPolicies",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DescribeRules",
        "elasticloadbalancing:DescribeTags",
        "elasticloadbalancing:DescribeTargetGroupAttributes",
        "elasticloadbalancing:DescribeTargetGroups",
        "elasticloadbalancing:DescribeTargetHealth",
        "iam:GetServerCertificate",
        "iam:ListServerCertificates",
        "kms:DescribeKey",
        "kms:GenerateRandom"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:AttachVolume",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:DeleteRoute",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteVolume",
        "ec2:DetachVolume",
        "ec2:ModifyInstanceAttribute",
        "ec2:ModifyVolume",
        "ec2:RevokeSecurityGroupIngress",
        "elasticloadbalancing:AddTags",
        "elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
        "elasticloadbalancing:AttachLoadBalancerToSubnets",
        "elasticloadbalancing:ConfigureHealthCheck",
        "elasticloadbalancing:DeleteListener",
        "elasticloadbalancing:DeleteLoadBalancer",
        "elasticloadbalancing:DeleteLoadBalancerListeners",
        "elasticloadbalancing:DeleteTargetGroup",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets",
        "elasticloadbalancing:DetachLoadBalancerFromSubnets",
        "elasticloadbalancing:ModifyListener",
        "elasticloadbalancing:ModifyLoadBalancerAttributes",
        "elasticloadbalancing:ModifyTargetGroup",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:RegisterTargets",
        "elasticloadbalancing:SetLoadBalancerPoliciesForBackendServer",
        "elasticloadbalancing:SetLoadBalancerPoliciesOfListener"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/KubernetesCluster": "kubernetes.example1234.dev"
        }
      },
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Action": [
        "ec2:CreateSecurityGroup",
        "ec2:CreateVolume",
        "elasticloadbalancing:CreateListener",
        "elasticloadbalancing:CreateLoadBalancer",
        "elasticloadbalancing:CreateLoadBalancerListeners",
        "elasticloadbalancing:CreateLoadBalancerPolicy",
        "elasticloadbalancing:CreateTargetGroup"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/KubernetesCluster": "kubernetes.example1234.dev"
        }
      },
      "Effect": "Allow",
      "Resource": "*"
    }
  ],
  "Version": "2012-10-17"
}

@BenWolstencroft
Author

BenWolstencroft commented Nov 9, 2021

Hi, apologies for the slow response here. My kops update cluster --yes -v 9 output did not contain the term PutRolePolicy. I have uploaded the entire log output to the following gist (it's long):

https://gist.github.com/BenWolstencroft/bb15bc888c92893facd97006fad49c53

I've redacted as much sensitive information as I could find in the log.

@BenWolstencroft
Author

BenWolstencroft commented Nov 9, 2021

@olemarkus @rifelpet @mattoz0 - I've had some success here. Looking through the logs, it appears the issue is not when trying to write a new IAMRolePolicy, but when trying to read back the current one to establish the current state and generate a change!

I modified the contents of the current inline policy via the AWS console to have just a single Action *, Resource * statement (dangerous, I know, but I needed a policy that would work and was short), then reran kops update cluster --yes, and it succeeded and overwrote it with the new, updated, correct policy (the same policy I get when I export for Terraform).
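
For anyone else hitting the same read-back failure, the CLI equivalent of that workaround looks roughly like this (a hedged sketch; role and policy names assume the kops defaults, and the wide-open policy only exists long enough for kops to overwrite it):

cat > /tmp/tiny-policy.json <<'EOF'
{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
EOF
aws iam put-role-policy --role-name masters.<cluster-name> --policy-name masters.<cluster-name> --policy-document file:///tmp/tiny-policy.json
kops update cluster --yes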

@rifelpet
Member

rifelpet commented Nov 9, 2021

Yes, that lines up with the originally reported error message error reading actual policy document .... I have a theory that this is due to kOps not ignoring whitespace when evaluating a policy document's size. I have a fix in #12700; if you're able to test that and confirm it works, that would be great.
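
If you want to sanity-check that theory yourself, AWS documents that whitespace is not counted against the inline policy quota, so comparing the raw and whitespace-stripped sizes of the current policy should show the gap (a rough sketch; role and policy names assume the kops defaults, and stripping whitespace this way also removes spaces inside strings, so it slightly undercounts):

aws iam get-role-policy --role-name masters.<cluster-name> --policy-name masters.<cluster-name> --query PolicyDocument --output json | wc -c
aws iam get-role-policy --role-name masters.<cluster-name> --policy-name masters.<cluster-name> --query PolicyDocument --output json | tr -d '[:space:]' | wc -c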
