Skip to content

WIP: ARO-24037 feat(azure): enable ACR image pulls via Projected SA#8397

Closed
twolff-gh wants to merge 1 commit intoopenshift:mainfrom
twolff-gh:azure-acr-projected-sa
Closed

WIP: ARO-24037 feat(azure): enable ACR image pulls via Projected SA#8397
twolff-gh wants to merge 1 commit intoopenshift:mainfrom
twolff-gh:azure-acr-projected-sa

Conversation

@twolff-gh
Copy link
Copy Markdown

@twolff-gh twolff-gh commented May 1, 2026

What this PR does / why we need it:

Add RBAC to guest clusters allowing kubelets to request projected service account tokens
for the api://AzureADTokenExchange audience. This is one of the prerequisites for
KEP-4412 ACR image pulls via federated workload identity on ARO HCP.

When a pod's ServiceAccount is annotated with ACR credentials, the kubelet requests a
projected SA token from the API server. The token is exchanged with Azure AD via a
federated identity credential on a User-Assigned Managed Identity, which returns an
access token used to pull from private ACR — no pull secrets or node-level identity
attachment required.

The ServiceAccountNodeAudienceRestriction admission controller (enabled by default
in K8s 1.33+) requires RBAC authorization for the requested audience. Without this
ClusterRole, the token request is denied and the image pull fails. OCP 4.21 ships
K8s 1.34 where KEP-4412 is beta and the kubelet credential provider SA token support
is enabled by default.

Gated behind azureutil.IsAroHCP() — only applies to managed ARO HCP clusters.

References:

Which issue(s) this PR fixes:

Fixes ARO-24037

Special notes for your reviewer:

Test Plan

  • Validated end-to-end on OCP 4.21.9 (K8s 1.34.6) personal dev environment
  • Pod pulled private ACR image in 3.3s via projected SA token with RBAC applied
  • Verified without RBAC the same pod fails with audience not authorized error
  • make test passes
  • make build passes

Validated on personal dev environment: OCP 4.21.9 (K8s 1.34.6), ARO HCP.

Prerequisites

  • Private ACR with a test image
  • User-Assigned MI with AcrPull role on the ACR
  • Federated identity credential on the MI trusting the HC's OIDC issuer
  • tokenAttributes injected into worker nodes via MachineConfig (CS-side, not part of this PR)

Steps

  1. Apply RBAC to guest cluster:
oc --kubeconfig /tmp/guest-kubeconfig apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:node:credential-provider-token-audience
rules:
- apiGroups: [""]
  resources: ["serviceaccounts/token"]
  verbs: ["create"]
- verbs: ["request-serviceaccounts-token-audience"]
  apiGroups: [""]
  resources: ["api://AzureADTokenExchange"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:node:credential-provider-token-audience
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node:credential-provider-token-audience
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
EOF

2. Create annotated ServiceAccount:
oc --kubeconfig /tmp/guest-kubeconfig create namespace acr-sa-test
oc --kubeconfig /tmp/guest-kubeconfig apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: acr-pull-sa
  namespace: acr-sa-test
  annotations:
    kubernetes.azure.com/acr-client-id: "${MI_CLIENT_ID}"
    kubernetes.azure.com/acr-tenant-id: "${TENANT_ID}"
EOF

3. Deploy test pod pulling from private ACR:
oc --kubeconfig /tmp/guest-kubeconfig apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: acr-sa-test
  namespace: acr-sa-test
spec:
  serviceAccountName: acr-pull-sa
  containers:
  - name: test
    image: ${ACR_NAME}.azurecr.io/private-nginx:latest
  restartPolicy: Never
EOF

4. Verify pod pulls successfully:
oc --kubeconfig /tmp/guest-kubeconfig get pod acr-sa-test -n acr-sa-test
# NAME          READY   STATUS    RESTARTS   AGE
# acr-sa-test   1/1     Running   0          9s

## Checklist:
- [x] Subject and description added to both, commit and PR.
- [x] Relevant issues have been referenced.
- [ ] This change includes docs.
- [x] This change includes unit tests.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Enabled service account token request capabilities for node credential providers, supporting Azure AD token exchange in hosted clusters.

* **Tests**
  * Added tests to validate credential provider token audience RBAC configuration and role binding assignments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

…ovider token audience

Add ClusterRole and ClusterRoleBinding in the guest cluster granting
system:nodes the request-serviceaccounts-token-audience verb. This is
required by the ServiceAccountNodeAudienceRestriction feature gate
(enabled by default in K8s 1.33+) to authorize kubelet to request
projected service account tokens for the api://AzureADTokenExchange
audience used by credential providers (KEP-4412).

Conditional on Azure (ARO HCP) via azureutil.IsAroHCP().

Ref: ARO-24037

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 1, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9baf8193-085a-4391-ab19-9705c44fa260

📥 Commits

Reviewing files that changed from the base of the PR and between 302e42b and 10618dd.

📒 Files selected for processing (4)
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/manifests/rbac.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/rbac/reconcile.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/rbac/reconcile_test.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go

📝 Walkthrough

Walkthrough

This change introduces RBAC reconciliation support for node credential provider token audience in the hosted cluster configuration operator. Two new manifest helper functions create ClusterRole and ClusterRoleBinding objects. Corresponding reconciliation functions populate the ClusterRole with rules granting nodes permission to create service account tokens and request the credential-provider audience, and configure the ClusterRoleBinding to assign these permissions to the system:nodes group. The reconciliation entries are integrated into the Azure ARO HCP workflow. Tests validate that the reconciliation functions correctly set the expected rules, references, and subjects.

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Test file has 17 assertions with only 1 meaningful failure message; 16 assertions lack descriptive messages. Add descriptive failure messages to all assertions using Gomega's Expect(value).To(matcher, "message") syntax.
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references ARO-24037 and ACR image pulls via Projected SA, which aligns with the PR's core objective of enabling ACR image pulls through federated workload identity. However, it doesn't clearly convey the primary technical change: adding RBAC permissions for kubelets to request projected ServiceAccount tokens.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test names are stable and deterministic with no dynamic values, generated identifiers, timestamps, or runtime-dependent information.
Microshift Test Compatibility ✅ Passed PR contains no Ginkgo e2e tests. Added tests in reconcile_test.go are standard Go unit tests using testing.T, not Ginkgo-based.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests. The new tests in reconcile_test.go are standard Go unit tests using testing.T and Gomega assertions, not Ginkgo e2e tests. They test RBAC reconciliation functions in isolation without making any assumptions about cluster topology, multi-node configurations, or SNO compatibility.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces only RBAC resources (ClusterRole/ClusterRoleBinding) with no scheduling constraints, affinity rules, or topology-specific assumptions.
Ote Binary Stdout Contract ✅ Passed PR adds pure utility functions and unit tests for RBAC configuration with no process-level code, stdout writes, or klog/log configuration.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR contains only standard Go unit tests for RBAC reconciliation with no Ginkgo e2e tests, IPv4 hardcoding, or external connectivity requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels May 1, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: twolff-gh
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 72.09302% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.36%. Comparing base (e087807) to head (10618dd).
⚠️ Report is 325 commits behind head on main.

Files with missing lines Patch % Lines
...igoperator/controllers/resources/manifests/rbac.go 0.00% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8397      +/-   ##
==========================================
+ Coverage   29.94%   34.36%   +4.41%     
==========================================
  Files        1049     1053       +4     
  Lines       97557    99420    +1863     
==========================================
+ Hits        29216    34167    +4951     
+ Misses      65837    62603    -3234     
- Partials     2504     2650     +146     
Files with missing lines Coverage Δ
...igoperator/controllers/resources/rbac/reconcile.go 3.76% <100.00%> (+3.76%) ⬆️
...rconfigoperator/controllers/resources/resources.go 50.40% <100.00%> (-1.40%) ⬇️
...igoperator/controllers/resources/manifests/rbac.go 0.00% <0.00%> (ø)

... and 245 files with indirect coverage changes

Flag Coverage Δ
cmd-support 30.31% <ø> (?)
cpo-hostedcontrolplane 36.45% <ø> (?)
cpo-other 37.83% <72.09%> (?)
hypershift-operator 47.84% <ø> (?)
other 17.05% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@twolff-gh
Copy link
Copy Markdown
Author

RBAC in the guest cluster is the customer's responsibility, not the platform's

@twolff-gh twolff-gh closed this May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant