(Managed)NodeGroups: Handle existing instanceRoleARN with nested resource path #2689
Description
When creating a cluster with either unmanaged or managed node groups that use an existing instanceRoleARN containing a nested/deep resource path, the nodes fail to join the cluster and the create process eventually times out somewhere around:

Example configuration to reproduce the issue:
Currently, when using a role ARN like this, the aws-auth ConfigMap in kube-system will have the ARN copied directly, but it should be copied as arn:aws:iam::1234567890:role/custom-eks-role:

It's a bit confusing. To the user, the ARN may look invalid, but that's how it is supposed to be on the EKS side...🤷♂️
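To illustrate the normalization described above, here is a minimal sketch in Go. It is not the NormalizeARN implementation from this PR: the helper name normalizeRoleARN and the nested path some/nested/path are hypothetical, and the sketch assumes that dropping everything between role/ and the final role name is all the normalization requires.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRoleARN is a hypothetical helper written for illustration only;
// it is not the NormalizeARN function added by this PR.
// It removes any resource path between "role/" and the role name, so that
// ".../role/a/b/name" becomes ".../role/name".
func normalizeRoleARN(arn string) string {
	const marker = ":role/"
	i := strings.Index(arn, marker)
	if i < 0 {
		// Not a role ARN in the expected shape; return it unchanged.
		return arn
	}
	prefix := arn[:i+len(marker)]
	resource := arn[i+len(marker):]
	// Keep only the last path segment, which is the role name.
	if j := strings.LastIndex(resource, "/"); j >= 0 {
		resource = resource[j+1:]
	}
	return prefix + resource
}

func main() {
	// The nested path below is an assumed example, not taken from the PR.
	in := "arn:aws:iam::1234567890:role/some/nested/path/custom-eks-role"
	fmt.Println(normalizeRoleARN(in))
	// prints "arn:aws:iam::1234567890:role/custom-eks-role"
}
```

An ARN that has no nested path passes through unchanged, so a helper like this could be applied to every configured role without special-casing.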
The AWS EKS Console will show the node group(s) as failed with the vague "Nodes fail to join cluster" message. The AWS documentation for this falls under "Container runtime network not ready" troubleshooting, so you can only determine that this is the issue if you are able to SSH into one of the nodes and view the system/process logs, which will show a bunch of Unauthorized messages when trying to register/interact with the Kubernetes API.

This PR allows the user to still provide the fully-qualified ARN in their configuration while handling the correct ARN form with the expected behavior. I initially tried adding the fix at just the AuthConfig level, but the logic there isn't applied during the initial cluster creation. I have verified that the CloudFormation process works correctly even when passing the normalized ARN in the CF template. Additional test cases that explicitly test this fix are added, and the existing tests all still pass.

Checklist
README.md, or the userdocs directory)
The godoc comment for NormalizeARN gives a summary of its purpose.
Tested manually with both managed and unmanaged node groups along with no custom role at all for regression.
area/nodegroup), target version (e.g. version/0.12.0) and kind (e.g. kind/improvement)