Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to use assume role #15234

Merged
merged 2 commits into from
May 9, 2023
Merged

Conversation

zetaab
Copy link
Member

@zetaab zetaab commented Mar 15, 2023

What is our current problem? We are hosting kOps management portal in AWS (which has AWS role). We are using that to provision clusters to another AWS accounts. Currently we are using management portal role and calling assumerole to another account role. However, because its called "role chaining" we can get maximum 1 hour long credentials from STS. In all cases the 1 hour long credentials are not enough (for instance kops rolling-update). My goal in this PR is that kOps could automatically refresh assumed role. So in future I could just specify AWS_ASSUME_ROLE_ARN to kOps cli binary, and it will automatically assume and refresh itself if needed.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/provider/aws Issues or PRs related to aws provider labels Mar 15, 2023
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 15, 2023
@zetaab
Copy link
Member Author

zetaab commented Mar 15, 2023

cc @hakman @olemarkus I need some help here, if you read the description and check the solution. Does it make sense at all? Is there easier way to solve this?

func stscreds.NewCredentials():

// NewCredentials returns a pointer to a new Credentials value wrapping the
// AssumeRoleProvider. The credentials will expire every 15 minutes and the
// role will be named after a nanosecond timestamp of this operation. The
// Credentials value will attempt to refresh the credentials using the provider
// when Credentials.Get is called, if the cached credentials are expiring.

so if I understand this correctly it will handle the credential refresh automatically out of the box?

I need to code test software tomorrow, which will use this and the "old" (aka assumerole before calling it) way

@rifelpet
Copy link
Member

Can you define the role ARN in a profile in the AWS config file, and specify the profile via AWS_PROFILE?

https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html

@zetaab
Copy link
Member Author

zetaab commented Mar 16, 2023

@rifelpet we are not using aws cli to call anything, its just go code. I will make sample code about this and test it inside kubernetes

@zetaab
Copy link
Member Author

zetaab commented Mar 16, 2023

I will test that, does it help if I generate ~/.aws/config and export AWS_PROFILE=foobar

@zetaab
Copy link
Member Author

zetaab commented Mar 16, 2023

it seems that its not possible, similar issue aws/aws-cli#3875

the problem is that the portal is getting the original aws role as env variables:

% kubectl exec -it credtest-d7c469dc4-lcrrf -- env|grep AWS
AWS_REGION=eu-central-1
AWS_DEFAULT_REGION=eu-central-1
AWS_STS_REGIONAL_ENDPOINTS=regional
AWS_ROLE_ARN=arn:aws:iam::xxx:role/kapi-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token

after that it cannot assume role again (see issue)

similar request boto/boto3#2360 so it seems that this is not widely supported in aws sdks

@zetaab
Copy link
Member Author

zetaab commented Mar 16, 2023

Screenshot 2023-03-16 at 10 31 11

setup is something like this. We have pod running in account x, using role r1. We are executing kops inside that pod, but before we can really execute anything we need to assume the role to r2 (account y role which is allowed to do stuff in account y).

@zetaab zetaab changed the title wip: make it possible to use assume role Make it possible to use assume role Mar 16, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2023
@zetaab
Copy link
Member Author

zetaab commented Mar 16, 2023

this does not break any current behaviour, but makes it work for this use-case. I do not see huge risk of merging this. However, if someone have better idea how to make it work - that is maybe better to discuss first.

test code available here https://github.com/zetaab/credtest/blob/main/main.go

explanations:
func aliveCheck - current behaviour, we will create new credentials with assumerole before starting kops cli. The credentials will expire after one hour (which is problem when rolling updating cluster)
func aliveCheckNew - the same behaviour than in this PR, works as planned. It will automatically refresh the credentials when needed
func aliveCheckNew2 - tried to use aws configfile to override the behaviour if AWS clients could assume role before when executing commands. However, it does not work - it does not assume role again if its already using role.

@zetaab
Copy link
Member Author

zetaab commented Apr 3, 2023

@hakman any news to this?

@hakman
Copy link
Member

hakman commented Apr 6, 2023

@zetaab I will try to discuss it during today's office hours with @justinsb.

@justinsb
Copy link
Member

justinsb commented Apr 8, 2023

So I was wondering about whether this should be AWS_ASSUME_ROLE_ARN or AWS_ROLE_ARN, . AWS_ROLE_ARN is used by the aws cli and most SDKs, but it is only used for web identity. It also sounds like we want to avoid potential conflicts here, and anyone using IRSA (as you are) will have AWS_ROLE_ARN set, so there would be a high chance of conflicts.

Code LGTM, just trying to understand the nuances here.

@zetaab
Copy link
Member Author

zetaab commented Apr 8, 2023

@justinsb yep AWS_ROLE_ARN is used by AWS CDKs in case of web identity. We have the use-case that pod is started using web identity role, which means that it will inject role to pod which is used for commands. However, after that we would like to assume it again using another role (otherwise we need to add own oidc providers for ALL aws accounts). So when we start pod using IRSA, the contoller will inject env variable AWS_ROLE_ARN to pod. After that using AWS_ASSUME_ROLE_ARN works fine and it will not conflict with anything.

One option is that we rename it without AWS prefix like KOPS_ROLE_ARN or KOPS_ASSUME_ROLE_ARN? Then we can be quite sure that it does not conflict with things in future?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2023
@zetaab
Copy link
Member Author

zetaab commented May 8, 2023

@hakman / @justinsb how we could proceed with this one? IMO it should not break any existing setup, but it makes it possible to use roles with roles :)

@hakman
Copy link
Member

hakman commented May 9, 2023

@zetaab what do you think about KOPS_AWS_ROLE_ARN for now?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 9, 2023
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 9, 2023
@zetaab
Copy link
Member Author

zetaab commented May 9, 2023

@hakman I rebased, needs lgtm again

@hakman
Copy link
Member

hakman commented May 9, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 9, 2023
@hakman
Copy link
Member

hakman commented May 9, 2023

/test pull-kops-e2e-k8s-aws-calico

@k8s-ci-robot k8s-ci-robot merged commit 40072c1 into kubernetes:master May 9, 2023
8 checks passed
@zetaab zetaab deleted the assumerole branch May 9, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/office-hours lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants