Feat: Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation and Implement Credentials Cache #5177

ThaSami · 2023-11-10T18:41:17Z

In the existing setup, when AWS roles are overridden through pod identity mechanism or by specifying the awsRoleArn trigger authentication environment variable, KEDA attempts to assume the specified role using its Operator's IAM role. This process necessitates granting the KEDA operator's role the necessary permissions to assume the target role. Additionally, the trust relationship on the target role must be configured to permit this assumption. While this approach aligns well with setups using kube2iam, it introduces a redundant step in environments that utilize IRSA.

The proposed Pull Request (PR) enables KEDA to directly assume AWS roles by using OpenID Connect (OIDC) and federation mechanisms. This direct method bypasses the need to configure additional permissions on the KEDA operator's role. By doing so, it simplifies the role assumption process in IRSA environments, reducing the configuration overhead typically associated with setting up cross-account role permissions and trust policies.
The modification maintains backward compatibility; if role assumption is unsuccessful, it will revert to attempting role assumption via the operator's role

Additionally, this update introduces a credentials cache for AWS roles. When multiple ScaledObjects utilize the same AWS role, this cache significantly cuts down on the number of AWS API calls.

Checklist

Tests have been added
Changelog has been updated and is aligned with our changelog requirements
A PR is opened to update the documentation on (repo) (if applicable)
Commits are signed with Developer Certificate of Origin (DCO - learn more)

Fixes #5178
Fixes #5297

github-actions · 2023-11-10T18:41:28Z

Thank you for your contribution! 🙏 We will review your PR as soon as possible.

While you are waiting, make sure to:

Add an entry in our changelog in alphabetical order and link related issue
Update the documentation, if needed
Add unit & e2e tests for your changes
GitHub checks are passing
Is the DCO check failing? Here is how you can fix DCO issues

Learn more about:

Our contribution guide

Signed-off-by: Sami S <25616506+ThaSami@users.noreply.github.com>

blakepettersson · 2023-11-20T15:16:20Z

I think in general this makes sense - perhaps it would be good to showcase how other operators (e.g external-secrets) uses STS, or to link to some reference documentation on how STS is meant to be used by a k8s operator to showcase to people not so familiar with AWS that this is indeed the recommended path to go.

This also ties in with the work @JorTurFer is doing in #5061 and kedacore/keda-docs#1251 so would be good to get his eyes on this as well.

Otherwise it would be good to add some E2E tests for this, but I'm not familiar with how this is done in KEDA so that would also need @JorTurFer's input 😄

ThaSami · 2023-11-23T15:20:04Z

Thank you, @blakepettersson, for highlighting the external-secrets approach.
Their method is interesting they provide 3 ways to auth:

Utilizing the operator's IAM role, the same as what we do in KEDA. This includes the option for the controller's role to assume another role via sts:AssumeRole. This is closely related to what we're addressing in the current PR, which aims to modify this.
Employing AWS secret keys.
Leveraging service account tokens, provided that volume projection is enabled — a setting activated by default in EKS environments.

The third option catches my interest. @JorTurFer, it might be worth considering this into the new AWS provider.

While I still believe the work being done in this PR holds some value, we could shift our approach a little bit. Instead of retrieving KEDA's service account token via os.GetEnv(WEB_IDENTITY_TOKEN), we might use External-secret's approach method to dynamically generate the other workload's service account token.
The external-secrets team shows how to generate credentials given the service account's name, as seen in their implementation: Token Fetcher in external-secrets.

JorTurFer · 2023-11-27T23:41:34Z

While I still believe the work being done in this PR holds some value, we could shift our approach a little bit. Instead of retrieving KEDA's service account token via os.GetEnv(WEB_IDENTITY_TOKEN), we might use External-secret's approach method to dynamically generate the other workload's service account token.

Honestly, I don't think that this is worth because KEDA could create a token for any identity within the cluster, and that's a huge risk IMHO. I didn't know that external-secrets does this, but now it's a no-go for me. Adding the capability of requesting service account tokens without any limitation is potentially risky because KEDA could request a token for a service account bound with a role that it's not allowed for KEDA, but as KEDA can just request a valid k8s token for that workload, KEDA could by-pass the assumption policy and escalate its privileges. This approach also would require a huge privilege increase in KEDA side, because KEDA needs this permission by RBAC. I strongly disagree with this, but maybe I'm missing something important. @zroubalik WDYT?
I'd like to add also that this approach has been copied from the aws implementation from secrets store CSI and I also think that it's terrible. Their RBAC allows this, so the controller can fake any service account in the cluster without restrictions (including privilege escalation of the component itself, requesting service account token from other workloads with more privileges). This isn't in k8s repos but in aws organization.

About the approach, I have thoughts in conflict with doing the fallback from NewWebIdentityRoleProvider to NewAssumeRoleProvider because it can produce confusion to users, and I'd just select one of them and document it as way to go, but I don't have any strong opinion about doing it or not (if it's well documented). In any case, if we have consensus about doing or not the fallback, and if we agree with not doing it, having consensus about using WebIdentity or not, I can just update my PR to reflect these agreements.

zroubalik · 2023-12-07T17:34:28Z

I am with @JorTurFer on this, though to be fair, I don't consider myself an expert on the area.

JorTurFer · 2023-12-07T17:53:18Z

Just for update this PR, @ThaSami and I have been talking about directly via slack and we will move this to the new aws auth (the one I have as wip), adding to it some extra features, so the new auth will include also:

The fallback from assumeRoleWithWebIdentity to AssumeRole, documenting it well. We agreed on this because SDKs already have a default option that does exactly this, so it's not totally confusing
We need to create a session cache on top of SDK CredentialsCache to reduce the amount of requested tokens. This is because currently, each scaler generates it's own token, 100 scalers means 100 tokens although they are for the same ARN. The token for a given ARN has to be shared accross all the scalers that uses the ARN

These to features are in addition to already pending changes in that auth

zroubalik · 2023-12-07T18:18:08Z

Just for update this PR, @ThaSami and I have been talking about directly via slack and we will move this to the new aws auth (the one I have as wip), adding to it some extra features, so the new auth will include also:
* The fallback from assumeRoleWithWebIdentity  to AssumeRole, documenting it well. We agreed on this because SDKs already have a default option that does exactly this, so it's not totally confusing

* We need to create a session cache on top of SDK CredentialsCache  to reduce the amount of requested tokens. This is because currently, each scaler generates it's own token, 100 scalers means 100 tokens although they are for the same ARN. The token for a given ARN has to be shared accross all the scalers that uses the ARN
These to features are in addition to already pending changes in that auth

Excellent, I really like the cache!

ThaSami · 2023-12-17T20:15:11Z

Thanks, @JorTurFer @zroubalik for the insights

Following our conversation on Slack with Jorge, I've implemented a cache for AWS Role credentials. This cache monitors each AWS role's usage and performs cleanup only when a scaler is removed. I would appreciate your input.

JorTurFer · 2023-12-31T16:32:27Z

Thanks for your contribution! As we have talked about offline, I close this PR in favor of #5061 which adds this functionality to the new auth.

ThaSami requested a review from a team as a code owner November 10, 2023 18:41

assume roles using webIdentity

49ea026

Signed-off-by: Sami S <25616506+ThaSami@users.noreply.github.com>

ThaSami force-pushed the irsa-federation branch from a3ab249 to 49ea026 Compare November 10, 2023 18:54

Update changelog

2a8df17

Signed-off-by: Sami S <25616506+ThaSami@users.noreply.github.com>

ThaSami added 2 commits December 17, 2023 20:07

Add AWS credentials cache

e712ca5

Update changelogs

c1014c7

ThaSami changed the title ~~Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation~~ Feat: Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation and Implement Credentials Cache Dec 17, 2023

JorTurFer closed this Dec 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat: Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation and Implement Credentials Cache #5177

Feat: Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation and Implement Credentials Cache #5177

ThaSami commented Nov 10, 2023 •

edited

github-actions bot commented Nov 10, 2023

blakepettersson commented Nov 20, 2023

ThaSami commented Nov 23, 2023

JorTurFer commented Nov 27, 2023 •

edited

zroubalik commented Dec 7, 2023

JorTurFer commented Dec 7, 2023 •

edited

zroubalik commented Dec 7, 2023

ThaSami commented Dec 17, 2023

JorTurFer commented Dec 31, 2023

Feat: Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation and Implement Credentials Cache #5177

Feat: Enhance podIdentity Role Assumption in AWS by Direct Integration with OIDC/Federation and Implement Credentials Cache #5177

Conversation

ThaSami commented Nov 10, 2023 • edited

Checklist

github-actions bot commented Nov 10, 2023

blakepettersson commented Nov 20, 2023

ThaSami commented Nov 23, 2023

JorTurFer commented Nov 27, 2023 • edited

zroubalik commented Dec 7, 2023

JorTurFer commented Dec 7, 2023 • edited

zroubalik commented Dec 7, 2023

ThaSami commented Dec 17, 2023

JorTurFer commented Dec 31, 2023

ThaSami commented Nov 10, 2023 •

edited

JorTurFer commented Nov 27, 2023 •

edited

JorTurFer commented Dec 7, 2023 •

edited