NodeGroup create fails on iam:passRole when path is a boundary requirement. #2847

Closed
ChristopherMotesHalfaker opened this issue Nov 16, 2020 · 12 comments
Labels
kind/bug priority/important-soon Ideally to be resolved in time for the next release

Comments

ChristopherMotesHalfaker commented Nov 16, 2020

What happened?
Node groups fail to create when the boundary policy requires an IAM path for `iam:PassRole`. Since 0.30.0, the path section of instanceRoleARN is compressed (stripped) prior to creating the resource. Recently tested with 0.32.0-rc.0.
Works successfully with 0.29.2.

What you expected to happen?
Nodegroups are created successfully.

How to reproduce it?

$ cat nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: myCluster
  region: redacted
  tags:
    Environment: Sandbox

managedNodeGroups:
  - name: myNode
    privateNetworking: true
    instanceType: t3.medium
    minSize: 1
    maxSize: 4
    volumeSize: 20
    iam:
      instanceRoleARN: arn:partition:iam::AccountId:role/boundryRequirement/boundryRequirement-eks-managed-node
$ /tmp/eksctl create nodegroup -f nodegroup.yaml
[ℹ]  eksctl version 0.30.0
[ℹ]  using region <redacted>
[ℹ]  will use version 1.18 for new nodegroup(s) based on control plane version
[ℹ]  nodegroup "myNode" present in the given config, but missing in the cluster
[ℹ]  1 nodegroup (myNode) was included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for each of 1 managed nodegroups in cluster "mycluster"
[ℹ]  2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "myNode" } } }
[ℹ]  checking cluster stack for missing resources
[ℹ]  cluster stack has all required resources
[ℹ]  building managed nodegroup stack "eksctl-mycluster-nodegroup-myNode"
[ℹ]  deploying stack "eksctl-mycluster-nodegroup-myNode"
[✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-mycluster-nodegroup-myNode"
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[✖]  AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "User: arn:partition:sts::AccountId:assumed-role/boundryRequirement-jenkins-role/i-instanceId is not authorized to perform: iam:PassRole on resource: arn:partition:iam::AccountId:role/boundryRequirement-eks-managed-node (Service: AmazonEKS; Status Code: 403; Error Code: AccessDeniedException; Request ID: requestId; Proxy: null)"
[ℹ]  1 error(s) occurred and nodegroups haven't been created properly, you may wish to check CloudFormation console
[ℹ]  to cleanup resources, run 'eksctl delete nodegroup --region=<redacted> --cluster=mycluster --name=<name>' for each of the failed nodegroup
[✖]  waiting for CloudFormation stack "eksctl-mycluster-nodegroup-myNode": ResourceNotReady: failed waiting for successful resource state

Anything else we need to know?
We're using an IAM Instance Role to create/update clusters and nodegroups with many boundary restrictions.
Versions
Please paste in the output of these commands:

$ sh-4.2$ /tmp/eksctl version
0.30.0
$ kubectl  version
Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-eks-7c9bda", GitCommit:"7c9bda52c425d0d56d7b93f1377a826b4132c05c", GitTreeState:"clean", BuildDate:"2020-08-28T23:07:29Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:53:22Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Logs
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-mycluster-nodegroup-myNode"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "User: arn:partition:sts::AccountId:assumed-role/boundryRequirement-jenkins-role/i-instanceId is not authorized to perform: iam:PassRole on resource: arn:partition:iam::AccountId:role/boundryRequirement-eks-managed-node (Service: AmazonEKS; Status Code: 403; Error Code: AccessDeniedException; Request ID: requestId; Proxy: null)"
[ℹ] 1 error(s) occurred and nodegroups haven't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete nodegroup --region=<redacted> --cluster=mycluster --name=<name>' for each of the failed nodegroup
[✖] waiting for CloudFormation stack "eksctl-mycluster-nodegroup-myNode": ResourceNotReady: failed waiting for successful resource state

@michaelbeaumont michaelbeaumont added the priority/important-soon Ideally to be resolved in time for the next release label Nov 17, 2020
@aclevername aclevername self-assigned this Dec 2, 2020
@aclevername
Contributor

Hey @ChristopherMotesHalfaker, I could not reproduce the issue on 0.32.0. Could you expand on what you think the problem is? The output makes it look like an issue with the provided role.

> the path section from instanceRoleARN is compressed prior creating the resource

Is this referring to the work done as part of #2689? You are correct that when you provide an instanceRoleARN with a nested/deep resource path, it now gets trimmed down. Your example would be mutated as follows:

before: 
`arn:partition:iam::AccountId:role/boundryRequirement/boundryRequirement-eks-managed-node`
after:
`arn:partition:iam::AccountId:role/boundryRequirement-eks-managed-node`

Is the problem that your iam:passRole specifies the full ARN, which is causing a mismatch between the two?
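
For readers following along, the trimming described above can be sketched roughly as follows. This is only an illustration of the before/after behaviour in this comment, not eksctl's actual implementation; the function name `compress_role_arn` is hypothetical.

```python
def compress_role_arn(arn: str) -> str:
    """Drop the IAM path from a role ARN, keeping only the role name.

    Hypothetical helper: it mirrors the before/after example in this
    thread and is not taken from the eksctl source.
    """
    prefix, sep, resource = arn.partition(":role/")
    if not sep:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    # Everything before the last "/" in the resource part is the IAM path.
    role_name = resource.rsplit("/", 1)[-1]
    return f"{prefix}:role/{role_name}"


before = "arn:partition:iam::AccountId:role/boundryRequirement/boundryRequirement-eks-managed-node"
after = compress_role_arn(before)
print(after)
```

An ARN without a path passes through unchanged, so applying the helper twice gives the same result as applying it once.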

@ChristopherMotesHalfaker
Author

0.32.0 Fails with the same error.
I'm certainly familiar with the problem in #2689. We worked around it separately.
Yes, the passRole in our boundary is explicit regarding the path. E.g. arn:partition:iam::AccountId:role/boundryRequirement-eks-managed-node doesn't match the allowed resources for iam:passRole.
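
A rough sketch of why the compressed ARN is rejected, assuming the boundary allows `iam:PassRole` only on roles under the `boundryRequirement/` path. The resource pattern below is a guess at what such a boundary might contain (the real policy is not shown in this thread), and `fnmatchcase` only approximates IAM wildcard matching.

```python
from fnmatch import fnmatchcase

# Assumed Resource pattern for iam:PassRole in the permissions boundary.
# Hypothetical: the actual boundary policy is not part of this issue.
ALLOWED_PASS_ROLE = "arn:partition:iam::AccountId:role/boundryRequirement/*"


def pass_role_allowed(role_arn: str, pattern: str = ALLOWED_PASS_ROLE) -> bool:
    """Approximate IAM resource matching with case-sensitive glob matching."""
    return fnmatchcase(role_arn, pattern)


full = "arn:partition:iam::AccountId:role/boundryRequirement/boundryRequirement-eks-managed-node"
compressed = "arn:partition:iam::AccountId:role/boundryRequirement-eks-managed-node"

print(pass_role_allowed(full))        # True: the pathed ARN matches the pattern
print(pass_role_allowed(compressed))  # False: the path segment is gone
```

Under that assumption, the ARN that CloudFormation passes as NodeRole never matches the allowed resource, which is consistent with the 403 in the log above.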

@aclevername
Contributor

> 0.32.0 Fails with the same error.
> I'm certainly familiar with the problem in #2689. We worked around it separately.
> Yes, the passRole in our boundary is explicit regarding the path. E.g. arn:partition:iam::AccountId:role/boundryRequirement-eks-managed-node doesn't match the allowed resources for iam:passRole.

@ChristopherMotesHalfaker if you update your passRole to match the compressed ARN does it work?

@ChristopherMotesHalfaker
Author

Do you mean update passRole in the boundary? The contents of that boundary policy are outside my team's control. If you mean somewhere else, you'll need to be more specific.

@ChristopherMotesHalfaker
Author

Is it possible to just get a boolean, like `compressNodeARN: true`, so we can set it to false?

@michaelbeaumont
Contributor

michaelbeaumont commented Dec 10, 2020

@ChristopherMotesHalfaker Looking at https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting_iam.html#security-iam-troubleshoot-ConfigMap, it seems like using a role with a path shouldn't work so I'm surprised that it ever worked. What does the kube-system/aws-auth ConfigMap look like for a cluster with a managed nodegroup where you pass a role with a path (using an earlier version of eksctl)?

@ChristopherMotesHalfaker
Author

The path is compressed in the aws-auth ConfigMap. The failure happens when CloudFormation tries to use the compressed ARN in NodeRole. We can't pass that through our boundary. Maybe it helps if I give you more insight into our workflow:

  1. Create/validate required IAM resources
  2. Execute cluster build with eksctl
  3. Configure non-standard resources with helm/kubectl
  4. Configure aws-auth with helm/kubectl
  5. Build node groups with eksctl

Everything works as expected until step 5 with version >=0.30.0. Once step 4 is completed, I can manually create a node group from the console. The console only allows me to select from a dropdown that has the fully pathed ARN.

@michaelbeaumont
Contributor

The interaction between cloudformation, the role and the boundary is clear. But when I create a managed nodegroup from the console using a role ARN with a path, it ends up pathed in aws-auth and the creation fails (which is the reason for #2689).

Are you adding the compressed role ARN to aws-auth yourself before creating the managed nodegroup? If so, this will work, although you end up with a useless, pathed entry in the ConfigMap.

@michaelbeaumont michaelbeaumont self-assigned this Dec 11, 2020
@ChristopherMotesHalfaker
Author

Yup, we re-write aws-auth after cluster creation and before node creation. It's the only way we could get it to work. We'll need to keep our nodegroup configs separate from the cluster config for routine updates.
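
As a minimal sketch of what such a rewritten node-role entry could look like, the snippet below builds one `mapRoles` item with the path stripped from the role ARN. The helper name `node_role_mapping` is hypothetical and the actual rewrite tooling used in this workflow is not shown in the thread; the username and groups are the standard values for EKS worker nodes.

```python
import json


def node_role_mapping(role_arn: str) -> dict:
    """Build an aws-auth mapRoles entry for a node role, without the IAM path.

    Hypothetical helper for illustration only.
    """
    prefix, sep, resource = role_arn.partition(":role/")
    if not sep:
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    role_name = resource.rsplit("/", 1)[-1]  # drop any path segments
    return {
        "rolearn": f"{prefix}:role/{role_name}",
        "username": "system:node:{{EC2PrivateDNSName}}",
        "groups": ["system:bootstrappers", "system:nodes"],
    }


entry = node_role_mapping(
    "arn:partition:iam::AccountId:role/boundryRequirement/boundryRequirement-eks-managed-node"
)
print(json.dumps(entry, indent=2))
```

The resulting entry would be serialized into the `mapRoles` field of the `kube-system/aws-auth` ConfigMap alongside any other roles the workflow manages.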

@michaelbeaumont
Contributor

@ChristopherMotesHalfaker if we were to fix this by compressing the path ourselves in the config map for managed nodes, would that allow you to skip rewriting aws-auth from your workflow?

@ChristopherMotesHalfaker
Author

Probably not. This isn't the only role we control in aws-auth.

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Jan 18, 2021
@aclevername aclevername removed their assignment Jan 28, 2021
@michaelbeaumont michaelbeaumont removed their assignment Mar 15, 2021