
CNTRLPLANE-2207: feat(install): self-manage webhook certs instead of relying on service-ca #8174

Merged
openshift-merge-bot[bot] merged 4 commits into openshift:main from clebs:webhook-certs
Apr 8, 2026

@clebs
Member

@clebs clebs commented Apr 7, 2026

The service-ca operator is not available on non-OpenShift clusters (e.g. AKS). Instead of relying on its annotations to generate serving certs and inject CA bundles, always self-manage them for the HyperShift Operator (HO) Service and webhooks:

  • At install time: generate a self-signed CA and serving cert using support/certs, set caBundle directly on CRDs and webhook configs.
  • At runtime: a WebhookCertReconciler (following the SharedIngressReconciler pattern) auto-renews the serving cert when < 30 days of validity remain and patches caBundle on CRDs and webhook configurations.

The service-ca annotations (inject-cabundle on CRDs/webhook configs, serving-cert-secret-name on the Service) are removed as they are no longer needed.

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic webhook certificate generation and management for improved platform compatibility.
    • Webhook CA bundles are now explicitly configured during operator installation.
  • Improvements

    • Enhanced webhook configuration to work consistently across different platforms by replacing annotation-based certificate injection with explicit certificate handling.

enxebre and others added 4 commits April 1, 2026 13:05
…e-ca

The service-ca operator is not available on non-OpenShift clusters
(e.g. AKS). Instead of relying on its annotations to generate serving
certs and inject CA bundles, always self-manage them:

- At install time: generate a self-signed CA and serving cert using
  support/certs, set caBundle directly on CRDs and webhook configs.
- At runtime: a WebhookCertReconciler (following the
  SharedIngressReconciler pattern) auto-renews the serving cert
  when < 30 days of validity remain and patches caBundle on CRDs
  and webhook configurations.

The service-ca annotations (inject-cabundle on CRDs/webhook configs,
serving-cert-secret-name on the Service) are removed as they are
no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify that this function is only called once at install time,
not during runtime renewal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ecret controller

The default controller name is derived from the watched resource type,
which conflicts with an existing secret controller in the same manager.

Signed-off-by: Alberto Garcia <agarcial@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The self-managed webhook cert change removed the service-ca annotations
but only injected the CA bundle into CRDs at install time, not into the
MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources.

The webhook configurations were deployed without a CA bundle, causing API
calls through the webhook to fail with "x509: certificate signed by
unknown authority" until the runtime controller patched them.

Move the cert generation before webhook config creation and pass the CA
bundle directly into all webhook ClientConfig entries at install time.

Signed-off-by: Borja Clemente <bclement@redhat.com>
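
The ordering fix in the commit above, generating the CA first and only then building the webhook entries, can be sketched as follows. The types are simplified local stand-ins for the Kubernetes webhook structs, and the webhook names and placeholder bundle are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// clientConfig and webhook are simplified stand-ins for the Kubernetes types.
type clientConfig struct {
	CABundle []byte
}

type webhook struct {
	Name         string
	ClientConfig clientConfig
}

// buildWebhooks returns one entry per name, each carrying the CA bundle.
// It refuses to build anything when the bundle has not been generated yet,
// which is the situation that produced the x509 errors described above.
func buildWebhooks(names []string, caBundle []byte) ([]webhook, error) {
	if len(caBundle) == 0 {
		return nil, errors.New("CA bundle must be generated before webhook configs are built")
	}
	out := make([]webhook, 0, len(names))
	for _, n := range names {
		out = append(out, webhook{Name: n, ClientConfig: clientConfig{CABundle: caBundle}})
	}
	return out, nil
}

func main() {
	// Placeholder bytes standing in for the PEM-encoded CA certificate.
	caBundle := []byte("placeholder-ca-pem")
	hooks, err := buildWebhooks([]string{"example.mutating", "example.validating"}, caBundle)
	if err != nil {
		panic(err)
	}
	for _, h := range hooks {
		fmt.Printf("%s has CA bundle: %t\n", h.Name, len(h.ClientConfig.CABundle) > 0)
	}
}
```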
@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 7, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 7, 2026

@clebs: This pull request references CNTRLPLANE-2207 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 7, 2026

Walkthrough

This pull request implements explicit webhook certificate management for HyperShift operators, removing reliance on OpenShift-specific service annotations. The changes introduce: (1) initial webhook certificate generation (GenerateInitialWebhookCerts) that creates self-signed CA and serving certificate Secrets during installation, (2) explicit CABundle fields added to webhook configuration types, and (3) a new WebhookCertReconciler controller that monitors and maintains webhook certificate Secrets at runtime while patching CRDs and webhook configurations with current CA bundles. The install process now passes generated CA bundle bytes to webhook configurations instead of relying on annotation-based injection.

Sequence Diagram

sequenceDiagram
    participant Install as Installation Process
    participant CA as Certificate Generation
    participant APIServer as Kubernetes API
    participant Reconciler as WebhookCertReconciler<br/>Controller
    participant Secrets as Secret Resources
    participant WebhookConfig as Webhook/CRD<br/>Resources

    Install->>CA: GenerateInitialWebhookCerts(namespace, serviceName)
    CA->>Secrets: Create CA Secret<br/>(self-signed certificate)
    CA->>Secrets: Create Serving TLS Secret
    CA-->>Install: Return CA bundle bytes
    Install->>WebhookConfig: Apply configs with explicit<br/>CABundle field
    
    Reconciler->>APIServer: Watch Secrets in namespace
    Secrets-->>Reconciler: Secret created/modified event
    Reconciler->>Secrets: ReconcileSelfSignedCA (CA Secret)
    Reconciler->>Secrets: ReconcileSignedCert (Serving Secret)
    Reconciler->>WebhookConfig: Patch CRDs conversion<br/>webhook.clientConfig.caBundle
    Reconciler->>WebhookConfig: Patch MutatingWebhookConfiguration<br/>clientConfig.caBundle
    Reconciler->>WebhookConfig: Patch ValidatingWebhookConfiguration<br/>clientConfig.caBundle
    Reconciler->>Reconciler: Requeue after 12h

@openshift-ci
Contributor

openshift-ci Bot commented Apr 7, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/needs-area and area/cli (Indicates the PR includes changes for CLI) labels Apr 7, 2026
@clebs
Member Author

clebs commented Apr 7, 2026

/test e2e-aws e2e-aks e2e-kubevirt-aws-ovn-reduced

@openshift-ci openshift-ci Bot added the area/hypershift-operator (Indicates the PR includes changes for the hypershift operator and API - outside an OCP release) label and removed the do-not-merge/needs-area label Apr 7, 2026
@codecov

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 62.03209% with 71 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.27%. Comparing base (9f96a0f) to head (911ba26).
⚠️ Report is 75 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...ontrollers/webhookcerts/webhookcerts_controller.go     68.15%    36 Missing and 14 partials ⚠️
cmd/install/assets/hypershift_operator.go                 0.00%     8 Missing ⚠️
hypershift-operator/main.go                               0.00%     8 Missing ⚠️
cmd/install/install.go                                    64.28%    4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8174      +/-   ##
==========================================
+ Coverage   29.89%   30.27%   +0.38%     
==========================================
  Files        1050     1051       +1     
  Lines       97819    98053     +234     
==========================================
+ Hits        29240    29688     +448     
+ Misses      66075    65823     -252     
- Partials     2504     2542      +38     
Files with missing lines                                  Coverage Δ
cmd/install/install.go                                    51.39% <64.28%> (+0.03%) ⬆️
cmd/install/assets/hypershift_operator.go                 28.24% <0.00%> (+0.34%) ⬆️
hypershift-operator/main.go                               0.00% <0.00%> (ø)
...ontrollers/webhookcerts/webhookcerts_controller.go     68.15% <68.15%> (ø)

... and 9 files with indirect coverage changes

@cwbotbot

cwbotbot commented Apr 7, 2026

Test Results

e2e-aks

e2e-aws

@clebs clebs marked this pull request as ready for review April 7, 2026 15:12
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2026
@openshift-ci openshift-ci Bot requested review from cblecker and sjenning April 7, 2026 15:13

@clebs
Member Author

clebs commented Apr 7, 2026

/test e2e-aws

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go (1)

156-204: Consider using patch instead of update for webhook configurations.

The current Get/Update pattern could encounter conflicts if another controller modifies the webhook configuration between the Get and Update calls. While this is unlikely in practice, using a patch operation (similar to patchCRDsCABundle) would be more robust.

♻️ Suggested refactor to use patch
 	// Patch MutatingWebhookConfiguration
 	mwc := &admissionregistrationv1.MutatingWebhookConfiguration{}
 	if err := r.Client.Get(ctx, client.ObjectKey{Name: webhookName}, mwc); err != nil {
 		if !apierrors.IsNotFound(err) {
 			return fmt.Errorf("failed to get MutatingWebhookConfiguration: %w", err)
 		}
 	} else {
 		needsPatch := false
+		patch := client.MergeFrom(mwc.DeepCopy())
 		for i := range mwc.Webhooks {
 			if !bytes.Equal(mwc.Webhooks[i].ClientConfig.CABundle, caBundle) {
 				needsPatch = true
 				mwc.Webhooks[i].ClientConfig.CABundle = caBundle
 			}
 		}
 		if needsPatch {
-			if err := r.Client.Update(ctx, mwc); err != nil {
+			if err := r.Client.Patch(ctx, mwc, patch); err != nil {
 				return fmt.Errorf("failed to update MutatingWebhookConfiguration: %w", err)
 			}
 		}
 	}
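
The conflict the reviewer describes comes from Kubernetes optimistic concurrency: a full Update carries the resourceVersion that was read, so it is rejected if the object changed in between, while a merge patch without a resourceVersion only sends the changed fields. The following is a toy in-memory model of that difference, not the controller's code; the `store` and `object` types and the `TimeoutSeconds` field are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a toy resource with a version counter and two independent fields.
type object struct {
	ResourceVersion int
	CABundle        string
	TimeoutSeconds  int
}

var errConflict = errors.New("conflict: object was modified since it was read")

// store is a toy API server holding a single object.
type store struct {
	obj object
}

// get returns a copy of the current object.
func (s *store) get() object { return s.obj }

// update replaces the whole object and fails if the caller's copy is stale.
func (s *store) update(o object) error {
	if o.ResourceVersion != s.obj.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// patchCABundle changes a single field and does not depend on the
// caller holding the latest version of the object.
func (s *store) patchCABundle(ca string) {
	s.obj.CABundle = ca
	s.obj.ResourceVersion++
}

func main() {
	s := &store{obj: object{ResourceVersion: 1, CABundle: "old", TimeoutSeconds: 10}}

	stale := s.get() // our controller reads the object

	other := s.get() // another writer changes an unrelated field first
	other.TimeoutSeconds = 5
	fmt.Println("other writer update:", s.update(other))

	stale.CABundle = "new"
	fmt.Println("stale update:", s.update(stale))

	s.patchCABundle("new")
	fmt.Printf("after patch: %+v\n", s.get())
}
```

In the toy model the patch also keeps the other writer's change, whereas a successful whole-object update from a stale copy would have overwritten it.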
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`
around lines 156 - 204, replace the current Get/Update pattern in
patchWebhookConfigsCABundle with client.Patch to avoid update
conflicts: for both MutatingWebhookConfiguration and
ValidatingWebhookConfiguration (variables mwc and vwc), keep the existing Get
call to load the resource and capture original := mwc.DeepCopy() (or
vwc.DeepCopy()), then set Webhooks[*].ClientConfig.CABundle on the fetched
object where it differs, and if anything changed call r.Client.Patch(ctx, mwc,
client.MergeFrom(original)); handle apierrors.IsNotFound the same way and return
the same wrapped errors on Patch failure.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 55f58e8b-55c7-4666-a859-9eafca18fcd7

📥 Commits

Reviewing files that changed from the base of the PR and between 899fd2a and 911ba26.

📒 Files selected for processing (6)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • cmd/install/install_test.go
  • hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go
  • hypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.go
  • hypershift-operator/main.go

@clebs
Member Author

clebs commented Apr 7, 2026

/test images

@clebs
Member Author

clebs commented Apr 7, 2026

/test e2e-aws

@clebs
Member Author

clebs commented Apr 7, 2026

/test e2e-aws

@clebs
Member Author

clebs commented Apr 8, 2026

/test e2e-aws

@clebs
Member Author

clebs commented Apr 8, 2026

/cc @LiangquanLi930
/assign @enxebre

@enxebre
Member

enxebre commented Apr 8, 2026

/label tide/merge-method-squash
/approve

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 8, 2026
@enxebre
Member

enxebre commented Apr 8, 2026

/pipeline required

@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-21
/test e2e-aws-4-21
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2026
@clebs
Member Author

clebs commented Apr 8, 2026

/test e2e-aws e2e-aws-upgrade-hypershift-operator

Member

@bryan-cox bryan-cox left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2026
@openshift-ci-robot

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox, clebs, enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@clebs
Member Author

clebs commented Apr 8, 2026

/verified by e2e

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 8, 2026
@openshift-ci-robot

@clebs: This PR has been marked as verified by e2e.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 8, 2026

@clebs: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit c674483 into openshift:main Apr 8, 2026
29 checks passed
@clebs clebs deleted the webhook-certs branch April 8, 2026 19:06
mehabhalodiya pushed a commit to mehabhalodiya/hypershift that referenced this pull request Apr 13, 2026
…relying on service-ca (openshift#8174)

* feat(install): self-manage webhook certs instead of relying on service-ca

The service-ca operator is not available on non-OpenShift clusters
(e.g. AKS). Instead of relying on its annotations to generate serving
certs and inject CA bundles, always self-manage them:

- At install time: generate a self-signed CA and serving cert using
  support/certs, set caBundle directly on CRDs and webhook configs.
- At runtime: a WebhookCertReconciler (following the
  SharedIngressReconciler pattern) auto-renews the serving cert
  when < 30 days of validity remain and patches caBundle on CRDs
  and webhook configurations.

The service-ca annotations (inject-cabundle on CRDs/webhook configs,
serving-cert-secret-name on the Service) are removed as they are
no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rename GenerateWebhookCerts to GenerateInitialWebhookCerts

Clarify that this function is only called once at install time,
not during runtime renewal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(webhookcerts): name controller to avoid collision with existing secret controller

The default controller name is derived from the watched resource type,
which conflicts with an existing secret controller in the same manager.

Signed-off-by: Alberto Garcia <agarcial@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(install): inject CA bundle into webhooks

The self-managed webhook cert change removed the service-ca annotations
but only injected the CA bundle into CRDs at install time, not into the
MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources.

The webhook configurations were deployed without a CA bundle, causing API
calls through the webhook to fail with "x509: certificate signed by
unknown authority" until the runtime controller patched them.

Move the cert generation before webhook config creation and pass the CA
bundle directly into all webhook ClientConfig entries at install time.

Signed-off-by: Borja Clemente <bclement@redhat.com>

---------

Signed-off-by: Alberto Garcia <agarcial@redhat.com>
Signed-off-by: Borja Clemente <bclement@redhat.com>
Co-authored-by: enxebre <alberto.garcial@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jparrill added a commit to jparrill/hypershift that referenced this pull request Apr 18, 2026
The webhookcerts controller introduced in PR openshift#8174 (CNTRLPLANE-2207)
performs Get/Update on MutatingWebhookConfiguration and
ValidatingWebhookConfiguration via the cached client, which triggers
lazy informer creation requiring list/watch at cluster scope. The
ClusterRole was missing these permissions, causing repeated reflector
errors in the operator logs.

Add a cluster-scoped RBAC rule for mutatingwebhookconfigurations and
validatingwebhookconfigurations with get, list, watch, and update verbs.

Closes: OCPBUGS-83751

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
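
The RBAC rule described in the follow-up commit above can be sketched as a Go value. The `policyRule` struct is a local stand-in that mirrors the shape of a Kubernetes RBAC PolicyRule, and the `allows` helper is a hypothetical addition for illustration; only the resources and verbs come from the commit message, and `admissionregistration.k8s.io` is the API group these resources belong to.

```go
package main

import "fmt"

// policyRule mirrors the shape of a Kubernetes RBAC PolicyRule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// webhookConfigRule grants the verbs a cached client needs to read and
// update webhook configurations at cluster scope.
func webhookConfigRule() policyRule {
	return policyRule{
		APIGroups: []string{"admissionregistration.k8s.io"},
		Resources: []string{"mutatingwebhookconfigurations", "validatingwebhookconfigurations"},
		Verbs:     []string{"get", "list", "watch", "update"},
	}
}

// contains reports whether list includes want.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

// allows reports whether the rule covers the given resource and verb.
func (r policyRule) allows(resource, verb string) bool {
	return contains(r.Resources, resource) && contains(r.Verbs, verb)
}

func main() {
	rule := webhookConfigRule()
	// list and watch are what the lazily created informer needs.
	fmt.Println("watch allowed:", rule.allows("mutatingwebhookconfigurations", "watch"))
	fmt.Println("delete allowed:", rule.allows("mutatingwebhookconfigurations", "delete"))
}
```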
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/cli: Indicates the PR includes changes for CLI
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.
  • verified: Signifies that the PR passed pre-merge verification criteria

5 participants