CNTRLPLANE-2207: feat(install): self-manage webhook certs instead of relying on service-ca#8174
Conversation
…e-ca The service-ca operator is not available on non-OpenShift clusters (e.g. AKS). Instead of relying on its annotations to generate serving certs and inject CA bundles, always self-manage them: - At install time: generate a self-signed CA and serving cert using support/certs, set caBundle directly on CRDs and webhook configs. - At runtime: a WebhookCertReconciler (following the SharedIngressReconciler pattern) auto-renews the serving cert when < 30 days of validity remain and patches caBundle on CRDs and webhook configurations. The service-ca annotations (inject-cabundle on CRDs/webhook configs, serving-cert-secret-name on the Service) are removed as they are no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify that this function is only called once at install time, not during runtime renewal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ecret controller The default controller name is derived from the watched resource type, which conflicts with an existing secret controller in the same manager. Signed-off-by: Alberto Garcia <agarcial@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The self-managed webhook cert change removed the service-ca annotations but only injected the CA bundle into CRDs at install time, not into the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources. The webhook configurations were deployed without a CA bundle, causing API calls through the webhook to fail with "x509: certificate signed by unknown authority" until the runtime controller patched them. Move the cert generation before webhook config creation and pass the CA bundle directly into all webhook ClientConfig entries at install time. Signed-off-by: Borja Clemente <bclement@redhat.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@clebs: This pull request references CNTRLPLANE-2207 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThis pull request implements explicit webhook certificate management for HyperShift operators, removing reliance on OpenShift-specific service annotations. The changes introduce: (1) initial webhook certificate generation ( Sequence DiagramsequenceDiagram
participant Install as Installation Process
participant CA as Certificate Generation
participant APIServer as Kubernetes API
participant Reconciler as WebhookCertReconciler<br/>Controller
participant Secrets as Secret Resources
participant WebhookConfig as Webhook/CRD<br/>Resources
Install->>CA: GenerateInitialWebhookCerts(namespace, serviceName)
CA->>Secrets: Create CA Secret<br/>(self-signed certificate)
CA->>Secrets: Create Serving TLS Secret
CA-->>Install: Return CA bundle bytes
Install->>WebhookConfig: Apply configs with explicit<br/>CABundle field
Reconciler->>APIServer: Watch Secrets in namespace
Secrets-->>Reconciler: Secret created/modified event
Reconciler->>Secrets: ReconcileSelfSignedCA (CA Secret)
Reconciler->>Secrets: ReconcileSignedCert (Serving Secret)
Reconciler->>WebhookConfig: Patch CRDs conversion<br/>webhook.clientConfig.caBundle
Reconciler->>WebhookConfig: Patch MutatingWebhookConfiguration<br/>clientConfig.caBundle
Reconciler->>WebhookConfig: Patch ValidatingWebhookConfiguration<br/>clientConfig.caBundle
Reconciler->>Reconciler: Requeue after 12h
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
|
/test e2e-aws e2e-aks e2e-kubevirt-aws-ovn-reduced |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8174 +/- ##
==========================================
+ Coverage 29.89% 30.27% +0.38%
==========================================
Files 1050 1051 +1
Lines 97819 98053 +234
==========================================
+ Hits 29240 29688 +448
+ Misses 66075 65823 -252
- Partials 2504 2542 +38
🚀 New features to boost your workflow:
|
Test Resultse2e-aks
e2e-aws
|
|
@clebs: This pull request references CNTRLPLANE-2207 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go (1)
156-204: Consider using patch instead of update for webhook configurations.The current Get/Update pattern could encounter conflicts if another controller modifies the webhook configuration between the Get and Update calls. While this is unlikely in practice, using a patch operation (similar to
patchCRDsCABundle) would be more robust.♻️ Suggested refactor to use patch
// Patch MutatingWebhookConfiguration mwc := &admissionregistrationv1.MutatingWebhookConfiguration{} if err := r.Client.Get(ctx, client.ObjectKey{Name: webhookName}, mwc); err != nil { if !apierrors.IsNotFound(err) { return fmt.Errorf("failed to get MutatingWebhookConfiguration: %w", err) } } else { needsPatch := false + patch := client.MergeFrom(mwc.DeepCopy()) for i := range mwc.Webhooks { if !bytes.Equal(mwc.Webhooks[i].ClientConfig.CABundle, caBundle) { needsPatch = true mwc.Webhooks[i].ClientConfig.CABundle = caBundle } } if needsPatch { - if err := r.Client.Update(ctx, mwc); err != nil { + if err := r.Client.Patch(ctx, mwc, patch); err != nil { return fmt.Errorf("failed to update MutatingWebhookConfiguration: %w", err) } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go` around lines 156 - 204, Replace the current Get/Update pattern in patchWebhookConfigsCABundle with a patch using client.Patch to avoid update conflicts: for both MutatingWebhookConfiguration and ValidatingWebhookConfiguration (variables mwc and vwc) keep the existing Get call to load the resource and create an original := mwc.DeepCopy() (or vwc.DeepCopy()), then modify the DeepCopy's Webhooks[*].ClientConfig.CABundle where different, and if any changes call r.Client.Patch(ctx, modified, client.MergeFrom(original)); handle apierrors.IsNotFound the same way and return the same wrapped errors on Patch failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`:
- Around line 156-204: Replace the current Get/Update pattern in
patchWebhookConfigsCABundle with a patch using client.Patch to avoid update
conflicts: for both MutatingWebhookConfiguration and
ValidatingWebhookConfiguration (variables mwc and vwc) keep the existing Get
call to load the resource and create an original := mwc.DeepCopy() (or
vwc.DeepCopy()), then modify the DeepCopy's Webhooks[*].ClientConfig.CABundle
where different, and if any changes call r.Client.Patch(ctx, modified,
client.MergeFrom(original)); handle apierrors.IsNotFound the same way and return
the same wrapped errors on Patch failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 55f58e8b-55c7-4666-a859-9eafca18fcd7
📒 Files selected for processing (6)
cmd/install/assets/hypershift_operator.gocmd/install/install.gocmd/install/install_test.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.gohypershift-operator/main.go
|
/test images |
|
/test e2e-aws |
2 similar comments
|
/test e2e-aws |
|
/test e2e-aws |
|
/cc @LiangquanLi930 |
|
/label tide/merge-method-squash |
|
/pipeline required |
|
Scheduling tests matching the |
|
/test e2e-aws e2e-aws-upgrade-hypershift-operator |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, clebs, enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by e2e |
|
@clebs: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@clebs: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
…relying on service-ca (openshift#8174) * feat(install): self-manage webhook certs instead of relying on service-ca The service-ca operator is not available on non-OpenShift clusters (e.g. AKS). Instead of relying on its annotations to generate serving certs and inject CA bundles, always self-manage them: - At install time: generate a self-signed CA and serving cert using support/certs, set caBundle directly on CRDs and webhook configs. - At runtime: a WebhookCertReconciler (following the SharedIngressReconciler pattern) auto-renews the serving cert when < 30 days of validity remain and patches caBundle on CRDs and webhook configurations. The service-ca annotations (inject-cabundle on CRDs/webhook configs, serving-cert-secret-name on the Service) are removed as they are no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: rename GenerateWebhookCerts to GenerateInitialWebhookCerts Clarify that this function is only called once at install time, not during runtime renewal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(webhookcerts): name controller to avoid collision with existing secret controller The default controller name is derived from the watched resource type, which conflicts with an existing secret controller in the same manager. Signed-off-by: Alberto Garcia <agarcial@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(install): inject CA bundle into webhooks The self-managed webhook cert change removed the service-ca annotations but only injected the CA bundle into CRDs at install time, not into the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources. The webhook configurations were deployed without a CA bundle, causing API calls through the webhook to fail with "x509: certificate signed by unknown authority" until the runtime controller patched them. Move the cert generation before webhook config creation and pass the CA bundle directly into all webhook ClientConfig entries at install time. Signed-off-by: Borja Clemente <bclement@redhat.com> --------- Signed-off-by: Alberto Garcia <agarcial@redhat.com> Signed-off-by: Borja Clemente <bclement@redhat.com> Co-authored-by: enxebre <alberto.garcial@hotmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The webhookcerts controller introduced in PR openshift#8174 (CNTRLPLANE-2207) performs Get/Update on MutatingWebhookConfiguration and ValidatingWebhookConfiguration via the cached client, which triggers lazy informer creation requiring list/watch at cluster scope. The ClusterRole was missing these permissions, causing repeated reflector errors in the operator logs. Add a cluster-scoped RBAC rule for mutatingwebhookconfigurations and validatingwebhookconfigurations with get, list, watch, and update verbs. Closes: OCPBUGS-83751 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
The service-ca operator is not available on non-OpenShift clusters (e.g. AKS). Instead of relying on its annotations to generate serving certs and inject CA bundles, always self-manage them for the HO SVC/Webhook:
The service-ca annotations (inject-cabundle on CRDs/webhook configs, serving-cert-secret-name on the Service) are removed as they are no longer needed.
Checklist:
Summary by CodeRabbit
Release Notes
New Features
Improvements