
(Managed)NodeGroups: Handle existing instanceRoleARN with nested resource path #2689

Merged
merged 2 commits on Oct 6, 2020

Conversation


@dmcneil dmcneil commented Sep 29, 2020

Description

When creating a cluster with either unmanaged or managed node groups that use an existing instanceRoleARN containing a nested/deep resource path, the nodes fail to join the cluster and the create process eventually times out somewhere around:

[ℹ]  building managed nodegroup stack "eksctl-<cluster>-nodegroup-<nodegroup>"
[ℹ]  deploying stack "eksctl-<cluster>-nodegroup-<nodegroup>"

Example configuration to reproduce the issue:

# ... truncated for brevity

nodeGroups: # also applies to managedNodeGroups.
  - name: mng-1
    instanceType: m5.large
    iam:
      instanceRoleARN: arn:aws:iam::1234567890:role/foo/bar/baz/custom-eks-role

Currently, when using a role ARN like this, the aws-auth ConfigMap in kube-system gets the ARN copied in verbatim, but it should be normalized to arn:aws:iam::1234567890:role/custom-eks-role:

$ kubectl -n kube-system get configmaps aws-auth -o yaml

apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::1234567890:role/foo/bar/baz/custom-eks-role # BAD
      rolearn: arn:aws:iam::1234567890:role/custom-eks-role # GOOD
      username: system:node:{{EC2PrivateDNSName}}
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system

It's a bit confusing: to the user, the normalized ARN may look invalid, but that's the form EKS expects...🤷‍♂️

The AWS EKS Console shows the node group(s) as failed with the vague "Nodes fail to join cluster" message. The AWS documentation for this falls under the "Container runtime network not ready" troubleshooting section, so you can only determine that this is the issue if you are able to SSH into one of the nodes and inspect the system/process logs, which show a series of Unauthorized messages when the node tries to register with and talk to the Kubernetes API.

This PR allows the user to keep providing the fully-qualified ARN in their configuration while eksctl handles the normalized ARN form with the expected behavior. I initially tried adding the fix at just the AuthConfig level, but the logic there isn't applied during the initial cluster creation. I have verified that the CloudFormation process works correctly even when the normalized ARN is passed in the CF template. Additional test cases that explicitly cover this fix have been added, and all existing tests still pass, so there should be no regressions.
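
Roughly, the normalization looks like this (a minimal sketch only; the helper name normalizeRoleARN and the use of the aws-sdk-go arn package here are illustrative, the actual helper added in this PR is the NormalizeARN function mentioned in the checklist below):

// Sketch only -- illustrative, not the exact implementation in this PR.
package iam

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws/arn"
)

// normalizeRoleARN strips any nested resource path from an IAM role ARN, e.g.
// arn:aws:iam::1234567890:role/foo/bar/baz/custom-eks-role becomes
// arn:aws:iam::1234567890:role/custom-eks-role.
func normalizeRoleARN(roleARN string) (string, error) {
	parsed, err := arn.Parse(roleARN)
	if err != nil {
		return "", fmt.Errorf("invalid ARN %q: %w", roleARN, err)
	}
	parts := strings.Split(parsed.Resource, "/")
	if len(parts) < 2 || parts[0] != "role" {
		// Not a role ARN with a path; leave it unchanged.
		return roleARN, nil
	}
	// Keep only the resource type ("role") and the final path segment (the role name).
	parsed.Resource = parts[0] + "/" + parts[len(parts)-1]
	return parsed.String(), nil
}

The normalized form is what ends up in both the CloudFormation template and the aws-auth ConfigMap entry.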

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
    The godoc comment for NormalizeARN gives a summary of its purpose.
  • Manually tested
    Tested manually with both managed and unmanaged node groups along with no custom role at all for regression.
  • Added labels for change area (e.g. area/nodegroup), target version (e.g. version/0.12.0) and kind (e.g. kind/improvement)
  • Make sure the title of the PR is a good description that can go into the release notes
