@bryan-cox
Member

Define design for running HyperShift hosted control planes on
self-managed Azure infrastructure. Covers deployment workflow,
infrastructure prerequisites, and workload identity integration.

Fixes: https://issues.redhat.com/browse/CNTRLPLANE-2209

Signed-off-by: Bryan Cox <brcox@redhat.com>

@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Dec 10, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 10, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci bot commented Dec 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bryan-cox changed the title from "docs(hypershift): add self-managed Azure enhancement proposal" to "feature(hypershift): add self-managed Azure enhancement proposal" on Dec 10, 2025
@bryan-cox changed the title from "feature(hypershift): add self-managed Azure enhancement proposal" to "CNTRLPLANE-2209: Add self-managed Azure HCP enhancement proposal" on Dec 10, 2025
@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Dec 10, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 10, 2025

@bryan-cox: This pull request references CNTRLPLANE-2209 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Define design for running HyperShift hosted control planes on
self-managed Azure infrastructure. Covers deployment workflow,
infrastructure prerequisites, and workload identity integration.

Fixes: https://issues.redhat.com/browse/CNTRLPLANE-2209

Signed-off-by: Bryan Cox <brcox@redhat.com>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

deployments. While SNO could theoretically serve as a management cluster for
HyperShift, this is not a target use case for self-managed Azure.

### Implementation Details/Notes/Constraints
Contributor

Here I would expect detailed info on which components need to be updated for self-managed HCP on Azure and how. For me personally, the most important information would be that everyone in a hosted control plane who needs to talk to the Azure cloud must use a token minter sidecar because of XYZ, and how CPO tells its operands to do so.

You (HyperShift team) need to do a lot of work to support new hypershift install, hypershift create cluster azure etc., but there are no details about it.

Member Author

The "Component Changes" section now provides the details you're looking for. The key points:

  1. CPO tells operands to use token minter sidecar because hosted control plane components run in the management cluster but need to authenticate to Azure using the guest cluster's service accounts. The token-minter sidecar mints tokens from the guest cluster's API server (via kubeconfig) and writes them to a shared volume that the main container reads.

  2. Why token minter is needed: Standard workload identity relies on projected service account tokens, but HCP components are in a different cluster than where their service accounts are defined. The token-minter bridges this gap.

  3. HyperShift CLI work: The hypershift create cluster azure command already supports user-provided infrastructure. The hypershift create infra azure and hypershift create credentials azure commands are planned for Tech Preview to simplify setup.

The "Hosted Control Plane Components Requiring Token Minter Sidecar" table and "CSI Driver Token Minter Configuration" sections detail the specific components and how CPO configures them.
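As a rough illustration of the sidecar pattern described above, the sketch below shows one way a minter could hand a fresh token to the main container through a shared volume. It is not the HyperShift implementation: `mint_token` is a hypothetical stand-in for the call that requests a token from the guest cluster's API server, and the file path is whatever the shared volume mount happens to be.

```python
import os
import tempfile
from typing import Callable


def write_token_atomically(token: str, token_path: str) -> None:
    """Write the token so a reader never observes a partially written file."""
    directory = os.path.dirname(token_path) or "."
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".token-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(token)
        os.replace(tmp_path, token_path)  # atomic replacement of the old token
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def refresh_once(mint_token: Callable[[], str], token_path: str) -> None:
    """Mint one token (hypothetical callable) and publish it on the shared volume."""
    write_token_atomically(mint_token(), token_path)
```

A real sidecar would call something like `refresh_once` on a timer well before the previous token expires; the main container only ever reads the file.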


AI-assisted response via Claude Code

Comment on lines +115 to +134
## Proposal

Self-managed Azure HyperShift extends the existing HyperShift architecture to
support Azure as a platform where users manage all infrastructure themselves.
The implementation leverages existing HyperShift patterns while adding
Azure-specific infrastructure provisioning guidance and workload identity
integration.

The deployment consists of three main phases:

1. **Azure Workload Identity Setup**: Create managed identities for OpenShift
components, configure the OIDC issuer, and establish federated credentials.

2. **Management Cluster Setup**: Install the HyperShift operator on an existing
OpenShift cluster in Azure, optionally configure External DNS, and prepare
the cluster to host control planes.

3. **Hosted Cluster Creation**: Provision Azure infrastructure for the hosted
cluster, deploy the control plane, create worker node VMs, and integrate
workload identities.
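To make the first phase more concrete, here is a minimal sketch of the data a federated credential ties together. The field names (`issuer`, `subject`, `audiences`) follow Azure's federated identity credential resource, and the subject uses the standard Kubernetes `system:serviceaccount:<namespace>:<name>` form. The issuer URL, namespace, and service account name in the usage note are hypothetical placeholders, not values taken from the proposal.

```python
from dataclasses import dataclass, field
from typing import List

# Audience that Azure expects on tokens presented for workload identity federation.
AZURE_TOKEN_EXCHANGE_AUDIENCE = "api://AzureADTokenExchange"


@dataclass
class FederatedCredential:
    name: str
    issuer: str   # must match the `iss` claim of the service account token exactly
    subject: str  # must match the `sub` claim of the service account token exactly
    audiences: List[str] = field(
        default_factory=lambda: [AZURE_TOKEN_EXCHANGE_AUDIENCE]
    )


def service_account_subject(namespace: str, name: str) -> str:
    """Return the `sub` claim Kubernetes puts in a service account token."""
    return f"system:serviceaccount:{namespace}:{name}"


def build_federated_credential(
    issuer_url: str, namespace: str, service_account: str
) -> FederatedCredential:
    """Describe the trust between one service account and one Azure identity."""
    return FederatedCredential(
        name=f"{namespace}-{service_account}",  # naming scheme is an assumption
        issuer=issuer_url,
        subject=service_account_subject(namespace, service_account),
    )
```

For example, `build_federated_credential("https://example.invalid/oidc", "kube-system", "example-sa")` yields the subject `system:serviceaccount:kube-system:example-sa`; one such credential would be needed per component identity.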
Contributor

Copying from the template:

Enumerate all of the proposed changes at a high level, including all of the components that need to be modified and how they will be different. Include the reason for each choice in the design and implementation that is proposed here.

So, what components need to be modified?

Contributor

@jsafrane Dec 16, 2025

I can see there is a list of components below now.

@bryan-cox
Member Author

bryan-cox commented Dec 15, 2025

@jsafrane Thank you for the feedback on component details. I've updated the proposal to address your comments:

Re: Comment at line 134 (Component list)
Added a new "Component Changes" subsection that enumerates:

  1. HyperShift Repository Components - CLI commands (hypershift install, hypershift create cluster azure, hypershift create infra azure, hypershift create credentials azure), CPO, HyperShift Operator, and NodePool Controller with their specific modifications
  2. Hosted Control Plane Components Requiring Token Minter Sidecar - Table listing each component (CCM, CSI drivers, Image Registry, Ingress, CAP/CAPZ) and their Azure API usage

Re: Comment at line 284 (Token minter sidecar details)
Added detailed "Azure Workload Identity Architecture" section explaining:

  • How the OIDC issuer is set up
  • Token flow diagram showing how control plane pods authenticate to Azure
  • CPO token minter configuration details (service account annotations, projected token volumes, environment variables)

The enhancement now documents how CPO configures operands to use workload identity federation for Azure API access.
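For readers unfamiliar with the exchange step in that token flow, the sketch below builds the request a client sends to Microsoft Entra ID when trading a federated service account token for an Azure access token. In practice the Azure SDK does this internally, so this is illustration only. It assumes the Azure public cloud login host and the Azure Resource Manager scope; other clouds and APIs use different values.

```python
from urllib.parse import urlencode

# OAuth 2.0 assertion type used when the client proves its identity with a JWT.
CLIENT_ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"

# Assumption: Azure Resource Manager in the public cloud.
DEFAULT_SCOPE = "https://management.azure.com/.default"


def token_endpoint(tenant_id: str) -> str:
    """Token endpoint for a tenant in the Azure public cloud (assumed host)."""
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def build_token_request(
    client_id: str, federated_token: str, scope: str = DEFAULT_SCOPE
) -> str:
    """Return the form-encoded body for the client credentials grant.

    The federated (service account) token is sent as the client assertion,
    so no client secret or certificate is involved.
    """
    return urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_assertion_type": CLIENT_ASSERTION_TYPE,
            "client_assertion": federated_token,
            "scope": scope,
        }
    )
```

The body would be POSTed to `token_endpoint(tenant_id)` with content type `application/x-www-form-urlencoded`; the response carries the short-lived access token.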


AI-assisted response via Claude Code

@bryan-cox force-pushed the CNTRLPLANE-2209 branch 6 times, most recently from c4589ea to 64b90d5 on December 15, 2025 17:44
@bryan-cox
Member Author

/test ?

@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2025

@bryan-cox: The following commands are available to trigger required jobs:

/test markdownlint

Use /test all to run all jobs.

Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bryan-cox
Member Author

/test markdownlint

@bryan-cox force-pushed the CNTRLPLANE-2209 branch 2 times, most recently from d86d881 to 0db22b3 on December 15, 2025 20:35
@bryan-cox
Member Author

/test markdownlint

@bryan-cox
Member Author

/test markdownlint

@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2025

@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +150 to +153
- `hypershift create infra azure`: New infrastructure provisioning command
for creating Azure resources (VNets, subnets, NSGs, storage accounts).
- `hypershift create credentials azure`: New command for generating workload
identity credentials and federated credential configurations.
Contributor

I haven't seen these in the docs yet - is this supposed to simplify OIDC and identity setup? Is this aimed at Tech Preview?

Member Author

The hypershift create infra azure and hypershift create credentials azure commands are planned for Tech Preview to simplify the OIDC and workload identity setup. For Dev Preview, users will follow the manual setup documented at https://hypershift.pages.dev/how-to/azure/self-managed-azure-index/. The goal for Tech Preview is to provide a more streamlined experience similar to what we offer for AWS.


AI-assisted response via Claude Code

1. Projects a Kubernetes service account token to a well-known file path
2. Configures Azure SDK environment variables for workload identity
authentication
3. Enables components to obtain Azure access tokens without long-lived
Contributor

Just leaving a note here: Azure CSI drivers still don't use "real" short-term credential setup (SMB), and we don't have ETA yet: kubernetes-sigs/azurefile-csi-driver#1737 (comment)

But it should not block this enhancement, and I'd expect this to be a rather hidden fix without behavior change - but something we might want to focus on when testing.

Member Author

Thanks for the note. I've added a callout in the CSI Driver section about this upstream limitation (kubernetes-sigs/azurefile-csi-driver#1737). This is good to keep in mind for testing but as you noted, shouldn't block the enhancement and will be transparent to users when upstream support is available.


AI-assisted response via Claude Code

- A projected service account token volume
- Environment variables: `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`,
`AZURE_FEDERATED_TOKEN_FILE`
- The appropriate managed identity client ID for its Azure RBAC permissions
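A minimal sketch of how a component might consume these three settings, assuming only the variable names listed above. The function name and the returned dictionary shape are illustrative, and real components would normally let the Azure SDK read these variables rather than parse them by hand.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")


def load_workload_identity(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the workload identity settings and the current federated token."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing workload identity settings: " + ", ".join(missing))
    # Re-read the file on every call: the token on disk is rotated over time.
    with open(env["AZURE_FEDERATED_TOKEN_FILE"], encoding="utf-8") as token_file:
        federated_token = token_file.read().strip()
    return {
        "client_id": env["AZURE_CLIENT_ID"],
        "tenant_id": env["AZURE_TENANT_ID"],
        "federated_token": federated_token,
    }
```

Failing fast on a missing variable makes a misconfigured pod easier to diagnose than a later authentication error from Azure.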
Contributor

Suggested change
- The appropriate managed identity client ID for its Azure RBAC permissions
- The appropriate workload identity client ID for its Azure RBAC permissions

Member Author

Done. Changed "managed identity" to "workload identity" for consistency.


AI-assisted response via Claude Code


| Environment Variable | Value |
|---------------------|-------|
| `AZURE_CLIENT_ID` | Managed identity client ID from `HostedCluster.Spec.Platform.Azure.WorkloadIdentities.<component>.ClientID` |
Contributor

Suggested change
| `AZURE_CLIENT_ID` | Managed identity client ID from `HostedCluster.Spec.Platform.Azure.WorkloadIdentities.<component>.ClientID` |
| `AZURE_CLIENT_ID` | Workload identity client ID from `HostedCluster.Spec.Platform.Azure.WorkloadIdentities.<component>.ClientID` |

Member Author

Done. Changed "Managed identity" to "Workload identity" for consistency.


AI-assisted response via Claude Code

- Creates storage accounts for OIDC and image registry
- Creates managed identities for OpenShift components

2. The platform engineer installs the OpenShift management cluster on Azure
Contributor

It would be nice to have a statement on what type of management clusters we expect to support. So far we've been testing with standalone Azure OpenShift without Workload Identity - is that the only configuration we expect to support?

Member Author

Done. Added a statement clarifying the supported management cluster configuration. For Dev Preview, the supported configuration is a standalone OpenShift cluster on Azure with Workload Identity (backed by federated managed identities).


AI-assisted response via Claude Code

Member Author

Correction to my previous reply: The supported management cluster configuration is a standalone OpenShift cluster, which can run on Azure or AWS. Updated the document to reflect this.


AI-assisted response via Claude Code

3. **Workload Identity Integration**: Azure Workload Identity Federation is used
for secure authentication, requiring:
- OIDC issuer configuration
- Managed identities for each OpenShift component
Contributor

Suggested change
- Managed identities for each OpenShift component
- Workload identities for each OpenShift component

Member Author

Done. Changed "Managed identities" to "Workload identities" for consistency.


AI-assisted response via Claude Code


- Azure subscription for CI/CD testing with appropriate quotas
- Test Azure infrastructure (VNets, storage accounts, DNS zones) for e2e tests
- Integration with existing HyperShift CI infrastructure
Contributor

Azure Graph API access is required.

Member Author

Done. Added Azure Graph API access requirement to the Infrastructure Needed section.


AI-assisted response via Claude Code
