Skip to content

enhancement: dual-stream RHEL 9/10 NodePool support in HyperShift#2019

Open
enxebre wants to merge 2 commits into
openshift:masterfrom
enxebre:dual-stream-rhel-nodepool
Open

enhancement: dual-stream RHEL 9/10 NodePool support in HyperShift#2019
enxebre wants to merge 2 commits into
openshift:masterfrom
enxebre:dual-stream-rhel-nodepool

Conversation

@enxebre
Copy link
Copy Markdown
Member

@enxebre enxebre commented May 21, 2026

Summary

  • Add enhancement proposal for OCPSTRAT-3014
  • Introduces spec.osImageStream on NodePool API for per-NodePool RHEL stream selection (rhel-9 / rhel-10)
  • Threads OS stream through token secret → ignition server → MCO bootstrap pipeline via OSImageStream CR injection
  • Handles runc upgrade guard: RHEL 10 does not ship runc, so NodePools with runc MachineConfigs stay on RHEL 9
  • Updates boot image metadata parsing for the new multi-stream ConfigMap format (streams key)
  • All stream validation surfaces through NodePoolValidMachineConfigConditionType

Test plan

  • Unit tests for getRHELStream(), config hash behavior, runc detection, OSImageStream CR generation
  • E2E tests on TechPreview CI: dual-stream NodePools, validation rejection, runc fallback

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from csrwng and sjenning May 21, 2026 16:40
Copy link
Copy Markdown

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, Alberto, the enhancement looks really nice. I've left some inline comments worth looking at. The main areas I'd like to see addressed:

  1. Boot image availability: What happens when a user requests a stream whose boot image doesn't exist in their region/platform? (Comment on Risks section)
  2. InPlace NodePools: The enhancement doesn't address osImageStream changes on InPlace upgrade-type NodePools, which has significant implications for baremetal and disconnected deployments. (Comment on Upgrade section)
  3. Karpenter: The Karpenter path needs a brief subsection explaining how the default stream flows through the pipeline. (Comment on Non-Goals)
  4. Hash invariant: The separation between spec-value and resolved-value in the config hash is critical for avoiding unintended rollouts and deserves explicit documentation. (Comment on Phase 2)

Everything else is either minor (condition type, pipeline diagram) or informational (CI cost, graduation timeline, status data source).


3. **runc detection false negatives.** A user could configure runc via a mechanism other than CRI-O drop-in files. *Mitigation:* Match the MCO's detection logic exactly. If the MCO considers it sufficient for standalone, it is sufficient for HyperShift.

4. **Mixed-stream clusters.** A HostedCluster with RHEL 9 and RHEL 10 NodePools is a new topology. Component compatibility across RHEL versions must be validated. *Mitigation:* This is the same topology supported by standalone clusters via per-MachineConfigPool stream selection.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enhancement acknowledges that RHEL 10 AMIs exist in only 2 AWS regions and Azure has no marketplace images. However, it doesn't specify the user-facing behavior when someone requests osImageStream: rhel-10 in a region or platform where the boot image doesn't exist.

A few questions:

  • Does the NodePool controller reject the request via ValidPlatformImage=False? Does it fall back to rhel-9 silently? Or does it proceed and fail at machine provisioning time?
  • Which platforms are actually supported for rhel-10 in TechPreview? AWS (2 regions only)? Azure (VHD only, no marketplace)? GCP? KubeVirt? The enhancement should explicitly list which platform+region combinations are supported and what happens for unsupported ones.
  • For disconnected environments: if dual-stream payloads carry additional images, customers need to mirror both streams. The enhancement should document what additional content needs to be mirrored (two sets of boot images + two sets of node OS container images) and whether IDMS handles this transparently.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidPlatformImage=False

yes, this continues current behaviour and signaling. When a boot image doesn't exist for a stream we fail. @yuqi-zhang can you help hcp align with mco/installer expectations/ux on standalone? Should we instead fallback when a boot image is not available for a stream and assume that the appropriate os container image will be discovered and succeed to install via image-references?

The MCO's `GetBuiltinDefaultStreamName()` already returns a version-based default. HyperShift could skip generating the OSImageStream CR and let the MCO choose.

This was rejected because:
- It removes per-NodePool stream control — all NodePools would get the same default stream
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test plan lists 5 E2E scenarios, several of which create HostedClusters with dual-stream NodePools. A couple of concerns on CI cost:

  • Each test with two NodePools (rhel-9 + rhel-10) roughly doubles the node count per test. What's the estimated overhead in time and infrastructure per CI run?
  • HyperShift doesn't have a dedicated TechPreview CI job — tests that need TechPreview features simply create a HostedCluster with FeatureSet: TechPreviewNoUpgrade (as seen in create_cluster_test.go). These new tests would run alongside existing E2E tests in the same jobs. What's the impact on overall job duration?
  • Given RHEL 10 AMIs are limited to 2 AWS regions, the tests are geographically constrained. Do the current CI jobs run in those regions?

This doesn't block the enhancement, but the team should have visibility into the CI cost before approving.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hypershift has a techpreview job and also can set any HC to be techpreview itself.
This validations would require about ~6 additional ec2 instances per run.
Update the section to clarify all this.


1. Administrator upgrades a HostedCluster from 4.19 to 5.0. An existing NodePool has no `spec.osImageStream` (would default to `rhel-10`) but has a MachineConfig setting `default_runtime = "runc"`.
2. NodePool controller's `validMachineConfigCondition` calls `getRHELStream()` which detects runc and returns `"rhel-9"` (fallback).
3. The condition is set to `NodePoolValidMachineConfigConditionType=True` with message "OS stream defaulted to rhel-9: RHEL 10 is incompatible with default_runtime=runc". The NodePool stays on RHEL 9.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enhancement routes stream validation errors through NodePoolValidMachineConfigConditionType. This works technically — getRHELStream() errors propagate through NewConfigGenerator() and the existing condition handler catches them.

However, for SRE observability, it might be worth having a separate condition (e.g., ValidOSStream) so that monitoring dashboards can distinguish "malformed MachineConfig" from "incompatible OS stream selection" without parsing the condition message. This is especially relevant for ROSA HCP and ARO HCP where SRE teams monitor conditions at fleet scale.

I'd suggest validating this with the ROSA and ARO SRE teams to see if a dedicated condition would help their workflows.

name: "rhel-10"
Resolve stream ────────▶ Token Secret
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small correction on the pipeline description: copyMCOOutputToMCC() is not a separate top-level step in GetPayload(). Looking at local_ignitionprovider.go, it's called inside runMCO() at line 454, not as an independent step between runMCO() and runMCC().

The actual sequence in GetPayload() is:

1. runFeatureGateRender()          — line 735
2. runMCO()                        — line 740
     └── copyMCOOutputToMCC()      — line 454 (inside runMCO)
3. [NEW: write 99_osimagestream.yaml to mccDir]
4. runMCC()                        — line 745
5. runMCSAndFetchPayload()         — line 750

The injection point between runMCO() and runMCC() (lines 743-745) is correct. Just the diagram that should reflect copyMCOOutputToMCC is internal to runMCO, not a standalone step.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


7. **Pass stream through `IgnitionProvider` interface** — add `osStream string` parameter to `GetPayload()` in the `IgnitionProvider` interface (`tokensecret_controller.go`). The `TokenSecretReconciler` reads `os-stream` from the token secret data and passes it to `GetPayload()`. Update the mock `IgnitionProvider` in tests accordingly.

```go
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a critical implementation invariant that I think deserves more emphasis in the enhancement:

The hash must use the value from spec.osImageStream.name directly (empty string when the field is unset), never the resolved value from getRHELStream(). If someone accidentally puts the resolved default (e.g., "rhel-10" for implicit >=5.0) into the hash input, it would change the hash for every existing NodePool without explicit osImageStream set, triggering a mass rollout of all nodes across the fleet.

In Go, "abc" + "" == "abc", so appending an empty string to the hash input is safe — the hash stays identical. But appending a non-empty resolved default would break this property.

I'd suggest documenting this explicitly as a design invariant, something like: "The rhelStream field in rolloutConfig MUST be populated from spec.osImageStream.name, never from the return value of getRHELStream(). This ensures zero-impact on existing NodePools."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

- Boot image ConfigMap parsing supports multi-stream format.
- Stream plumbed through token secret to ignition server.
- OSImageStream CR generated in `GetPayload()`.
- MCO ExternalTopologyMode guard removed (coordinated with MCO team).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The graduation criteria describe TechPreview first, then GA, but don't specify a timeline. A couple of questions:

  • Is TechPreview targeting 5.0? If so, when does GA land?
  • If GA targets 5.0 as well (no TechPreview phase), then full E2E coverage is needed before that release. The current test plan only covers TechPreview CI jobs.
  • The NodePool API field promotion must be coordinated with the MachineConfigPool OSStreams feature gate promotion in openshift/api — is there alignment on timeline with the MCO team?


### Architecture Overview

The OS stream flows through three layers:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enhancement mentions in Non-Goals that "Karpenter NodePools always use the version-derived default stream" but doesn't elaborate on how this works in practice.

Looking at the current code, OpenshiftEC2NodeClass has no field for OS stream selection. setKarpenterAMILabels() in token.go resolves a single AMI per architecture from StreamMetadata with no stream dimension. The in-memory NodePool created by KarpenterIgnitionReconciler.createInMemoryNodePool() doesn't carry spec.osImageStream.

Could you add a brief "Karpenter" subsection that clarifies:

  • How the version-derived default stream flows through the Karpenter pipeline (from release image → in-memory NodePool → setKarpenterAMILabels → AMI labels → EC2NodeClass)?
  • Whether OpenshiftEC2NodeClass will gain a stream field in a future phase, or if this is explicitly a non-goal?
  • Whether the AMI label scheme (hypershift.openshift.io/ami, hypershift.openshift.io/ami-arm64) needs any changes for dual-stream, given it currently assumes one AMI per architecture.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to clarify

- No manual action required.

**Downgrade to a version without this feature:**
- `spec.osImageStream` is ignored by older controllers. NodePools revert to the single-stream behavior (always using the `rhel-coreos` image from the release payload).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section covers upgrade/downgrade for the feature itself but doesn't address what happens when spec.osImageStream changes on a NodePool with upgradeType: InPlace.

For Replace NodePools this is straightforward — a new hash triggers a MachineDeployment rollout, old nodes are replaced with new ones booting the correct stream. But for InPlace NodePools, changing the stream would mean the MCD on the running node attempts an rpm-ostree rebase from RHEL 9 to RHEL 10 (or vice versa) — which is a cross-major-version OS rebase on a live node.

Looking at the current codebase:

  • The in-place upgrader (inplaceupgrader.go) has no rollback mechanism. If a node enters MachineConfigDaemonStateDegraded, the upgrade stops but does not revert previously upgraded nodes. (Grep for "rollback"/"revert"/"undo" in the inplaceupgrader directory returns zero results.)
  • E2E tests for in-place upgrades (nodepool_upgrade_test.go, nodepool_lifecycle_test.go) test forward version upgrades only. No test exercises cross-RHEL-version changes or failure/degradation scenarios.
  • The Agent platform has no restrictions on upgrade type — Agent NodePools can be InPlace. In self-managed baremetal deployments, InPlace is common because re-provisioning hardware is expensive.

This means a baremetal operator could change osImageStream on an InPlace NodePool, triggering an OS rebase across all metal nodes with no rollback path if something goes wrong.

I'd suggest the enhancement explicitly addresses InPlace NodePools. Options:

  1. Reject osImageStream changes on InPlace NodePools via validation (safest for TechPreview)
  2. Document it as supported-but-risky with a clear warning in the condition message
  3. Explicitly list it as a non-goal for the initial implementation

Also worth noting: for disconnected/air-gapped baremetal deployments, dual-stream means customers need to mirror both streams' container images and boot artifacts. The enhancement should document what additional content needs to be mirrored.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version downgrades are not supported. Clarified. Day 2 changes to the streams are supported as they are also in the canonical machineConfigPool API.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuqi-zhang does mco does dualstream UX really supports moving back and forward inplace between rhel 9 and 10?


1. Update `DeserializeImageMetadata` (`support/releaseinfo/deserialize.go`) — parse the `streams` key when present, fall back to `stream` for older payloads. Return stream-indexed metadata.
2. Update `CoreOSStreamMetadata` (`support/releaseinfo/releaseinfo.go`) — support multiple streams (map of stream name to per-arch metadata).
3. Wire default boot image resolution per platform — accept stream parameter and look up stream-specific boot images from the parsed metadata:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small verification question: the enhancement says the NodePool controller infers the OS stream from CAPI Machine.Status.NodeInfo.OSImage. The current code in nodeVersionsFromMachines() (version.go) reads KubeletVersion from Machine.Status.NodeInfo but doesn't read OSImage today.

CAPI should copy NodeInfo (including OSImage) from the guest Node to the Machine status, so this should work. But has this been verified across all platforms (AWS, Azure, Agent, KubeVirt)? If any platform's CAPI provider doesn't populate Machine.Status.NodeInfo.OSImage, the status reporting would silently fail.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a provider specific thing, it's core capi bubbling up nodeinfo, that's a contractual boundary we can consider consumable.

@enxebre
Copy link
Copy Markdown
Member Author

enxebre commented May 25, 2026

/label tide/merge-method-squash

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 25, 2026
enxebre and others added 2 commits May 25, 2026 15:19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix pipeline diagram: copyMCOOutputToMCC() is internal to runMCO(), not a standalone step
- Add boot image error behavior for missing stream+region (NodePoolValidPlatformImageType=False)
- Add Karpenter step (phase 1) clarifying version-derived default flow
- Add design invariant for rolloutConfig.rhelStream hash safety
- Rewrite E2E section: 8 NodePools in TestNodePool suite covering explicit/implicit streams, validation, runc, and upgrade stream switches (Replace + InPlace)
- Add disconnected mirroring risk (IDMS/ITMS handles both streams transparently)
- Downgrade section: downgrades are not supported

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@enxebre enxebre force-pushed the dual-stream-rhel-nodepool branch from c413be6 to 97df15f Compare May 25, 2026 13:19
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2026

@enxebre: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint 97df15f link true /test markdownlint

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Copy Markdown

@sdminonne sdminonne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks for this @enxebre


4. **Mixed-stream clusters.** A HostedCluster with RHEL 9 and RHEL 10 NodePools is a new topology. Component compatibility across RHEL versions must be validated. *Mitigation:* This is the same topology supported by standalone clusters via per-MachineConfigPool stream selection.

5. **Disconnected environments require mirroring both streams.** Dual-stream payloads carry two sets of node OS container images (one per RHEL stream). Disconnected customers using both streams must mirror both sets. Boot images are platform-specific and already handled outside the payload (e.g. pre-uploaded AMIs, VHDs). The OS container images are referenced via `ImageDigestMirrorSet` / `ImageTagMirrorSet` — existing IDMS/ITMS mirroring workflows handle this transparently as long as both stream images are included in the mirror list. No additional HyperShift-specific mirroring tooling is needed.
Copy link
Copy Markdown

@sdminonne sdminonne May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't supply hard values but if the size of two streams is critical in disconnected we should eventually double check with ROSA people for ECR size limit for their zero egress config.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdminonne
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@csrwng csrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Just a question after a brief look


## Open Questions

1. Should `spec.osImageStream` be immutable after creation (requiring NodePool replacement to change streams), or mutable (triggering a rolling update)? The current proposal allows mutation with rollout, mirroring how other NodePool spec changes work.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense for replace strategy. For in-place, mutation would only make sense if you're moving forward (rhel9 -> rhel10), but not the other way, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants