CNTRLPLANE-631: Install image registry CA via MCO in HyperShift#7943
CNTRLPLANE-631: Install image registry CA via MCO in HyperShift#7943hypershift-jira-solve-ci[bot] wants to merge 3 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
/auto-cc |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughWhen Image Registry capability is enabled, the HostedControlPlane reconciler now creates/reconciles a self-signed image-registry signer Secret and reads its CA certificate. The reconciler always invokes an ignition reconciliation that builds or updates a core ignition ConfigMap and corresponding MachineConfig embedding the signer CA for worker nodes. If the capability is disabled, the image-registry ignition ConfigMap is deleted. If the signer Secret is absent during reconciliation, the process logs this and produces ignition content with empty CA data. Sequence Diagram(s)sequenceDiagram
participant HCP as HostedControlPlane Reconciler
participant PKI as PKI Reconciler
participant Secret as ImageRegistry Signer Secret (k8s)
participant Ignition as Ignition Config Builder
participant MC as MachineConfig (k8s)
participant Worker as Worker Nodes
HCP->>HCP: Check Image Registry capability
alt Capability enabled
HCP->>PKI: ReconcileImageRegistrySignerSecret(...)
PKI->>Secret: Create/Update self-signed signer Secret
HCP->>Secret: Fetch signer Secret (may be NotFound)
Secret-->>HCP: CA data or NotFound
HCP->>Ignition: ReconcileImageRegistryCAIgnitionConfig(cm, ownerRef, CA-data)
Ignition->>MC: Create/Update MachineConfig with CA file at imageRegistryCertPath
MC-->>Ignition: MachineConfig reconciled
MC->>Worker: MachineConfig delivered/applied
Worker->>Worker: CA file written to filesystem
else Capability disabled
HCP->>Ignition: Delete ignition ConfigMap for image-registry CA
end
🚥 Pre-merge checks | ✅ 8 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| // image registry service-CA certificate on worker nodes. This allows nodes to | ||
| // trust the internal image registry without requiring the node-ca daemonset. | ||
| // When serviceCA is empty (e.g., during initial cluster bootstrap before the | ||
| // service-ca operator has started), a no-op MachineConfig is created. |
There was a problem hiding this comment.
We have been moving away from using the service-ca operator at all. How does that affect this PR and this comment here?
We self sign certs. If you look in the codebase, you should see this pattern.
There was a problem hiding this comment.
Good point. The current implementation reads the service-CA certificate from the service-serving-ca ConfigMap that the ManagedCAObserver syncs from the guest cluster's openshift-managed-config/service-ca. If HyperShift is moving away from using the service-ca operator entirely, this approach would need to change.
Looking at the codebase, the self-signed cert pattern is used in the pki/ package (e.g., pki/ca.go). The alternative approach would be to generate a self-signed CA for the image registry's serving cert in the control plane PKI reconciliation, rather than depending on the guest cluster's service-ca operator to produce it.
I'll investigate the self-signed cert pattern and rework this to align with that direction. Could you point me to a specific example in the codebase that best represents the pattern you'd like followed here?
AI-assisted response via Claude Code
There was a problem hiding this comment.
You should be able to look at reconcileSelfSignedCA in the codebase
There was a problem hiding this comment.
Done. Reworked to use the reconcileSelfSignedCA pattern:
- Added
ImageRegistrySignerSecretmanifest andReconcileImageRegistrySignerSecretinpki/ca.go - Wired it into
reconcilePKIto create a self-signedimage-registry-signerSecret - Updated
reconcileImageRegistryCAIgnitionConfigto read the CA cert from the signer Secret (ca.crtkey) instead of theservice-serving-caConfigMap
This eliminates the dependency on the service-ca operator and the bootstrap chicken-and-egg problem.
AI-assisted response via Claude Code
|
|
||
| "github.com/clarketm/json" | ||
| igntypes "github.com/coreos/ignition/v2/config/v3_2/types" | ||
| "github.com/vincent-petithory/dataurl" |
There was a problem hiding this comment.
Is there something else we can use beside this? Something more official.
There was a problem hiding this comment.
Good question. The github.com/vincent-petithory/dataurl package is already used elsewhere in the codebase — specifically in hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go (line 472) for encoding ignition file contents via dataurl.EncodeBytes(). So it's an existing dependency, not a new one introduced by this PR.
That said, if you'd prefer to avoid this third-party library, we could use fmt.Sprintf("data:text/plain;charset=utf-8;base64,%s", base64.StdEncoding.EncodeToString([]byte(serviceCA))) from the standard library instead, which produces the same RFC 2397 data URL format that ignition expects. Let me know which approach you'd prefer.
AI-assisted response via Claude Code
There was a problem hiding this comment.
If it's already in the repo, I guess that is fine.
There was a problem hiding this comment.
Sounds good, thanks for confirming! 👍
AI-assisted response via Claude Code
| return nil | ||
| } | ||
|
|
||
| // Fetch the service-serving CA. This ConfigMap is populated by the ManagedCAObserver |
There was a problem hiding this comment.
ManagedCAObserver is an existing controller in this codebase at control-plane-operator/hostedclusterconfigoperator/controllers/cmca/configmap_observer.go (line 59). It watches two CA ConfigMaps in the guest cluster (openshift-managed-config/router-ca and openshift-managed-config/service-ca) and syncs their content to a ConfigMap on the management cluster side. A separate controller then uses that synced content to adjust the Kube controller manager CA.
This PR reads the service-serving-ca ConfigMap that the ManagedCAObserver has already synced to the control plane namespace, to get the service-CA certificate for the image registry. However, per your earlier comment about moving away from the service-ca operator, this approach may need to be reworked to use self-signed certs instead.
AI-assisted response via Claude Code
|
@bryan-cox Re: the AI-assisted response via Claude Code |
|
Re: dataurl dependency concern Thanks for confirming, @bryan-cox! 👍 AI-assisted response via Claude Code |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: hypershift-jira-solve-ci[bot] The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Around line 2075-2078: The code currently treats an existing
imageRegistrySigner secret with a missing certs.CASignerCertMapKey as a no-op;
instead, detect signerCA == "" and fail fast by returning an error from the
reconciliation (e.g., fmt.Errorf("image registry signer secret exists but %s is
missing or empty", certs.CASignerCertMapKey)) so the controller surfaces the
problem and requeues; update the block around signerCA/imageRegistrySigner to
log the error and return that error (rather than continuing with an empty
signerCA).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: efc9fe56-ec94-4e19-aebd-76df774ebd1f
📒 Files selected for processing (4)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.gocontrol-plane-operator/controllers/hostedcontrolplane/ignition/reconcile.gocontrol-plane-operator/controllers/hostedcontrolplane/manifests/pki.gocontrol-plane-operator/controllers/hostedcontrolplane/pki/ca.go
🚧 Files skipped from review as they are similar to previous changes (1)
- control-plane-operator/controllers/hostedcontrolplane/ignition/reconcile.go
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai Thanks for the review. This is already handled in the current code. Lines 2075-2078 of caBytes, hasCA := imageRegistrySigner.Data[certs.CASignerCertMapKey]
if !hasCA || len(caBytes) == 0 {
return fmt.Errorf("image registry signer secret %s/%s is missing %q", imageRegistrySigner.Namespace, imageRegistrySigner.Name, certs.CASignerCertMapKey)
}This was addressed in commit AI-assisted response via Claude Code |
|
hypershift-jira-solve-ci - unit tests are failing |
|
@bryan-cox Fixed. The karpenter ignition controller tests ( Added the missing AI-assisted response via Claude Code |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #7943 +/- ##
==========================================
+ Coverage 36.08% 36.13% +0.04%
==========================================
Files 767 767
Lines 93486 93565 +79
==========================================
+ Hits 33737 33812 +75
+ Misses 57042 57040 -2
- Partials 2707 2713 +6
🚀 New features to boost your workflow:
|
|
Use the /restructure-commits command out of the hypershift repo to clean up the commits |
|
/test address-review-comments |
|
Done. Restructured the 8 commits into 2 component-based commits:
AI-assisted response via Claude Code |
|
this change would trigger a fleet wide nodepool rollout. /hold |
|
Re: #7943 (comment) @enxebre Valid concern. Adding the 4th core ignition ConfigMap (image registry CA) changes the The unit test hash changes (e.g. The AI-assisted response via Claude Code |
eb7831f to
2c8b52a
Compare
Add ReconcileImageRegistryCAIgnitionConfig to distribute the internal image registry CA certificate to worker nodes through a MachineConfig ignition config. This allows nodes to trust the internal image registry without relying on the node-ca daemonset. - Create a MachineConfig placing the CA cert at the CRI-O trust path /etc/docker/certs.d/image-registry.openshift-image-registry.svc:5000/ca.crt - Add IgnitionImageRegistryCAConfig and MachineConfigImageRegistryCA manifest helpers - Wire up reconciliation in the hosted control plane controller with automatic cleanup when image registry capability is disabled - Add unit tests for the ignition config reconciliation Refs: CNTRLPLANE-631 Signed-off-by: OpenShift CI Bot <ci-bot@redhat.com>
Update the NodePool controller to account for the new image registry CA ignition ConfigMap when image registry capability is enabled, incrementing the expected core config resource count by one. Update tests across nodepool conditions, config generation, secret janitor, and karpenter ignition controllers to reflect the additional core ignition ConfigMap. Refs: CNTRLPLANE-631 Signed-off-by: OpenShift CI Bot <ci-bot@redhat.com>
c757975 to
b5742f7
Compare
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
| } | ||
| if capabilities.IsImageRegistryCapabilityEnabled(cg.hostedCluster.Spec.Capabilities) { | ||
| // additional core config resource for image registry CA distribution. | ||
| expectedCoreConfigResources += 1 |
There was a problem hiding this comment.
This is dangerous, it will affect to all HostedCluster currently deployed, causing a rollout.
There was a problem hiding this comment.
Good point. This increment only takes effect when the image registry capability is enabled (which is the default). For existing clusters, this works safely because the CPO creates the new ignition-config-image-registry-ca ConfigMap during its reconcile loop, so by the time the NodePool controller checks expectedCoreConfigResources, the ConfigMap already exists. The rollout is intentional — workers need the CA cert to trust the internal image registry's TLS certificate. Without it, image pulls from the internal registry fail.
That said, I've added a note in the PR description about the bootstrap behavior: workers initially boot with a no-op MachineConfig (no CA data), and once the service-ca-operator runs and ManagedCAObserver syncs the CA, the MachineConfig is updated with the real certificate, which triggers the MCO-driven node rollout.
AI-assisted response via Claude Code
| return nil | ||
| } | ||
|
|
||
| // reconcileImageRegistryCAIgnitionConfig creates or updates a core ignition config |
There was a problem hiding this comment.
Add unit tests for this method, at least with the 5 basic flow paths (capability disabled, ConfigMap not found, key missing, key empty, happy path).
There was a problem hiding this comment.
Done. Added unit tests in hostedcontrolplane_controller_test.go covering all 5 flow paths:
- Capability disabled → verifies ignition config is deleted
- ConfigMap not found → verifies no-op ignition config is created
- Key missing from ConfigMap → verifies error is returned
- Key empty in ConfigMap → verifies error is returned
- Happy path (valid CA data) → verifies ignition config is created with correct labels and data
AI-assisted response via Claude Code
| // Fetch the service-serving-ca ConfigMap which is synced from the guest cluster | ||
| // by ManagedCAObserver. This CA signs the image registry's serving certificate | ||
| // via the guest cluster's service-ca-operator. | ||
| const serviceCAKey = "service-ca.crt" |
There was a problem hiding this comment.
I think we have this constant in other place, do you mind move it to manifests pkg?
There was a problem hiding this comment.
Done. Moved the constant to manifests.ServiceCAKey in control-plane-operator/controllers/hostedcontrolplane/manifests/kcm.go alongside the ServiceServingCA() manifest function, and updated the controller to reference manifests.ServiceCAKey.
AI-assisted response via Claude Code
| ctx := t.Context() | ||
|
|
||
| pullSecret, ignitionServerCACert, machineConfig, ignitionConfig, ignitionConfig2, ignitionConfig3 := setupTestObjects() | ||
| pullSecret, ignitionServerCACert, machineConfig, ignitionConfig, ignitionConfig2, ignitionConfig3, ignitionConfig4 := setupTestObjects() |
There was a problem hiding this comment.
The setupTestObjects() function now returns 7 positional values, which is hard to read and will keep growing with each new core config:
pullSecret, ignitionServerCACert, machineConfig, ignitionConfig, ignitionConfig2, ignitionConfig3, ignitionConfig4 := setupTestObjects()
Consider returning a struct instead — this would make adding future core configs a one-line change rather than updating the return signature and every call site:
type testFixtures struct {
pullSecret *corev1.Secret
ignitionServerCACert *corev1.Secret
machineConfig *corev1.ConfigMap
coreIgnitionConfigs []*corev1.ConfigMap
}
This is a pre-existing pattern issue but IMHO make sense to change it in this PR to avoid increase the tech debt.
There was a problem hiding this comment.
Done. Refactored setupTestObjects() to return a testFixtures struct with named fields:
```go
type testFixtures struct {
pullSecret *corev1.Secret
ignitionServerCACert *corev1.Secret
machineConfig *corev1.ConfigMap
coreIgnitionConfigs []*corev1.ConfigMap
}
```
The 4 core ignition ConfigMaps are now in a slice, so adding future core configs is a one-line change. Updated both call sites (TestUpdatingConfigCondition and TestUpdatingVersionCondition) to use the struct.
AI-assisted response via Claude Code
| "github.com/vincent-petithory/dataurl" | ||
| ) | ||
|
|
||
| func TestReconcileImageRegistryCAIgnitionConfig(t *testing.T) { |
There was a problem hiding this comment.
Add a check for the OwnerReference added during the reconciliation.
There was a problem hiding this comment.
Done. Added OwnerReference assertions in the test to verify that APIVersion, Kind, and Name are set correctly after reconciliation.
AI-assisted response via Claude Code
| t.Logf("AllMachinesReady was not observed as False with aggregated message during provisioning "+ | ||
| t.Logf("AllMachinesReady was not observed as False with aggregated message during provisioning " + | ||
| "(CAPI provider may have set Ready condition before we could observe nil state)") | ||
| } |
There was a problem hiding this comment.
Check the the CA cert file exists.
There was a problem hiding this comment.
Done. Added an explicit assertion in the unit test verifying the CA cert file path matches the expected value (`/etc/docker/certs.d/image-registry.openshift-image-registry.svc:5000/ca.crt`). The unit test in `reconcile_image_registry_ca_test.go` now checks both the constant reference and the literal path string.
For e2e validation on actual worker nodes, that would require SSH access to the node which is beyond the scope of this unit test change — happy to add an e2e test case in a follow-up if needed.
AI-assisted response via Claude Code
| // service-serving-ca certificate on worker nodes. This allows nodes to trust | ||
| // the internal image registry without requiring the node-ca daemonset. | ||
| // When serviceCA is empty (e.g., during initial cluster bootstrap before the | ||
| // service-serving-ca ConfigMap has been synced), a no-op MachineConfig is created. |
There was a problem hiding this comment.
Consider adding a brief note in the PR description about the bootstrap behavior:
worker nodes initially boot with a no-op MachineConfig (no CA data). Once the guest cluster's service-ca-operator runs and ManagedCAObserver syncs the CA, the MachineConfig is updated with the real certificate, which triggers an MCO-driven node rollout. This is expected and unavoidable, but operationally significant for anyone debugging a node restart during initial cluster provisioning.
There was a problem hiding this comment.
Done. Added the bootstrap behavior note to the PR description:
Worker nodes initially boot with a no-op MachineConfig (no CA data). Once the guest cluster's service-ca-operator runs and ManagedCAObserver syncs the CA, the MachineConfig is updated with the real certificate, which triggers an MCO-driven node rollout. This is expected and unavoidable, but operationally significant for anyone debugging a node restart during initial cluster provisioning.
AI-assisted response via Claude Code
| // that distributes the service-serving-ca certificate to worker nodes, enabling | ||
| // them to trust the internal image registry's TLS certificate. The service-serving-ca | ||
| // ConfigMap is synced from the guest cluster by ManagedCAObserver. | ||
| // When image registry capability is disabled, the ignition ConfigMap is cleaned up. |
There was a problem hiding this comment.
I would add something like:
// NOTE: This relies on the guest cluster's service-ca-operator to sign the
// image registry's serving certificate. If HyperShift moves away from
// service-ca-operator in the future, this will need to use whatever
// replaces it as the CA source.
There was a problem hiding this comment.
Done. Added the suggested NOTE comment to the `reconcileImageRegistryCAIgnitionConfig` function documentation:
```go
// NOTE: This relies on the guest cluster's service-ca-operator to sign the
// image registry's serving certificate. If HyperShift moves away from
// service-ca-operator in the future, this will need to use whatever
// replaces it as the CA source.
```
AI-assisted response via Claude Code
|
/test address-review-comments |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-631 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
- Move serviceCAKey constant to manifests.ServiceCAKey for reuse - Add NOTE comment about service-ca-operator dependency - Add unit tests for reconcileImageRegistryCAIgnitionConfig covering all 5 flow paths (capability disabled, ConfigMap not found, key missing, key empty, happy path) - Refactor setupTestObjects() in nodepool conditions_test.go to return a testFixtures struct instead of 7 positional values - Add OwnerReference and CA cert file path assertions in the ignition reconcile test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Re: #7943 (review) Thanks @jparrill for the thorough review! All comments have been addressed: Code changes made:
Replies posted:
AI-assisted response via Claude Code |
|
@hypershift-jira-solve-ci[bot]: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
The PR's last commit was on April 27, 2026, while conflicting changes to Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryBoth failures — Root CauseThe PR branch Conflicting file: Commits on
Git also flagged merge activity (but no conflict) in two additional files the PR touches:
Because the merge failed, the CI job recorded The PR also carries the Recommendations
Evidence
|
What this PR does / why we need it:
Distributes the image registry service-CA certificate to worker nodes via MCO (MachineConfig Operator) instead of relying on the node-ca daemonset.
This PR:
ReconcileImageRegistryCAIgnitionConfigfunction that creates a MachineConfig (99-worker-image-registry-ca) placing the service-CA certificate at/etc/docker/certs.d/image-registry.openshift-image-registry.svc:5000/ca.crton worker nodes, allowing CRI-O to trust the internal image registry's TLS certificatereconcileCoreIgnitionConfigflow, fetching the service-serving-ca ConfigMap (synced from the guest cluster by ManagedCAObserver)getCoreConfigsexpected core config count to account for the new image registry CA ignition config when the image registry capability is enabledBootstrap behavior
Worker nodes initially boot with a no-op MachineConfig (no CA data). Once the guest cluster's service-ca-operator runs and ManagedCAObserver syncs the CA, the MachineConfig is updated with the real certificate, which triggers an MCO-driven node rollout. This is expected and unavoidable, but operationally significant for anyone debugging a node restart during initial cluster provisioning.
Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNTRLPLANE-631
Special notes for your reviewer:
Capabilitiesis nilServiceCAKeyconstant is defined in the manifests package for reusesetupTestObjects()helper in nodepool condition tests now returns atestFixturesstruct instead of 7 positional valuesChecklist:
Always review AI generated responses prior to use.
Generated with Claude Code via
/jira:solve [CNTRLPLANE-631](https://issues.redhat.com/browse/CNTRLPLANE-631)Summary by CodeRabbit
New Features
Tests