Skip to content

CNTRLPLANE-3426: restart operator upon tls config change#1389

Open
ricardomaraschini wants to merge 1 commit into
openshift:mainfrom
ricardomaraschini:restart-cvo-on-tls-changes
Open

CNTRLPLANE-3426: restart operator upon tls config change#1389
ricardomaraschini wants to merge 1 commit into
openshift:mainfrom
ricardomaraschini:restart-cvo-on-tls-changes

Conversation

@ricardomaraschini
Copy link
Copy Markdown

@ricardomaraschini ricardomaraschini commented May 18, 2026

when the cluster wide tls config changes we now restart the operator by cancelling the run context. by doing so we ensure that the metrics end point will always use the correct tls version and ciphers.

Summary by CodeRabbit

  • New Features

    • Metrics endpoint now honors the cluster TLS security profile (minimum TLS version and cipher suites) and applies those settings to the metrics server.
    • Added optional TLS override settings to explicitly specify minimum TLS version and cipher suites.
    • Operator watches cluster TLS profile and will restart the metrics server when settings change.
  • Tests

    • Added tests validating observation of APIServer TLS profiles and derived TLS settings across scenarios.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 18, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@ricardomaraschini: This pull request explicitly references no jira issue.

Details

In response to this:

when the cluster wide tls config changes we now restart the operator by cancelling the run context. by doing so we ensure that the metrics end point will always use the correct tls version and ciphers.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

The PR observes the cluster APIServer TLS security profile, returns a defensive MinTLSVersion and cipher suite list, applies them to the metrics serving TLS template, and wires an APIServer informer to trigger shutdown when those settings change so the metrics server restarts with updated TLS settings.

Changes

APIServer TLS Configuration Integration

Layer / File(s) Summary
TLS observer and tests
pkg/cvo/metrics.go, pkg/cvo/metrics_test.go
Adds APIServer lister import, MetricsOptions TLS override fields, and ObserveAPIServerTLSConfig(ctx, lister) which extracts MinTLSVersion and cipher suites from the APIServer TLS profile (including custom profiles) with defensive fallbacks. Adds TestObserveAPIServerTLSConfig covering custom/intermediate/nil/invalid scenarios.
Metrics TLS Configuration Application
pkg/cvo/metrics.go
RunMetrics now applies MetricsOptions.MinTLSVersion and MetricsOptions.CipherSuites overrides onto the baseTlSConfig template used by the serving certificate controller before starting metrics serving.
Operator lifecycle and change detection
pkg/start/start.go
When metrics are enabled, registers a post-cache-sync APIServer informer watcher that re-observes TLS settings and calls runCancel() if MinTLSVersion or cipher suites change. On leader start, re-observes and writes observed TLS values into o.MetricsOptions for use by the metrics server.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Critical error: Test calls ObserveAPIServerTLSConfig(ctx, lister) with 2 args but function takes only 1. WaitForCacheSync blocks indefinitely without timeout per K8s docs. Remove ctx parameter from function call. Add timeout to WaitForCacheSync or remove unused ctx variable. Add informative assertion messages.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: the operator restarts when TLS configuration changes to apply updated TLS settings to the metrics endpoint.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR does not use Ginkgo testing framework. New tests use standard Go testing package with stable, descriptive test case names containing no dynamic information.
Microshift Test Compatibility ✅ Passed The PR adds a standard Go unit test (TestObserveAPIServerTLSConfig), not a Ginkgo e2e test. The custom check only applies to new Ginkgo e2e tests, so it does not apply to this PR.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. Only standard Go unit tests were added (TestObserveAPIServerTLSConfig). The check for SNO compatibility applies only to Ginkgo e2e tests, not unit tests.
Topology-Aware Scheduling Compatibility ✅ Passed No scheduling constraints introduced. Only operator code changes for TLS config handling. No Pod/Deployment manifests, affinity rules, replica counts, or topology assumptions added.
Ote Binary Stdout Contract ✅ Passed No OTE stdout violations. New code uses klog (configured to stderr) in callbacks/functions, not process-level code. No direct stdout writes found.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. The new test (TestObserveAPIServerTLSConfig) is a standard Go unit test using testing.T, not a Ginkgo e2e test. The custom check is not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ricardomaraschini
Once this PR has been reviewed and has the lgtm label, please assign pratikmahajan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ricardomaraschini ricardomaraschini marked this pull request as draft May 18, 2026 11:20
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2026
@ricardomaraschini
Copy link
Copy Markdown
Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2026
@ricardomaraschini ricardomaraschini marked this pull request as ready for review May 18, 2026 11:22
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/cvo/metrics.go`:
- Around line 435-440: The code assigns options.CipherSuites directly to
baseTlSConfig.CipherSuites (in the block handling Apply APIServer TLS
configuration) which may contain TLS 1.3 suite IDs that Go ignores; update the
assignment to filter out TLS 1.3 cipher suite IDs (those defined in
ObserveAPIServerTLSConfig / the observed profile) before setting
baseTlSConfig.CipherSuites, or alternatively add a clear comment/log explaining
that TLS 1.3 suites are not configurable in crypto/tls; locate the logic around
options.CipherSuites and baseTlSConfig.CipherSuites and implement the filter
step to drop TLS 1.3 IDs prior to assignment.

In `@pkg/start/start.go`:
- Around line 334-350: The baseline TLS profile is captured from the lister
before the APIServer informer cache is started/synced
(o.MetricsOptions.ListenAddress branch), so startMinTLS/startCiphers (from
cvo.ObserveAPIServerTLSConfig) can be read from an empty cache causing an
immediate false-positive runCancel; fix by deferring the initial call to
cvo.ObserveAPIServerTLSConfig until after ConfigInformerFactory.Start(...) and
WaitForCacheSync(...) complete (or otherwise ensure the
apiServerInformer.Informer() has synced) and only then set
startMinTLS/startCiphers once (or set a boolean so the first post-sync
observation initializes the baseline), leaving the existing handler logic that
compares current vs start and calls runCancel unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d6e1b8c7-a6f5-4891-b1d2-dcd57d5c2c0b

📥 Commits

Reviewing files that changed from the base of the PR and between e75b70a and fc32b95.

📒 Files selected for processing (3)
  • pkg/cvo/metrics.go
  • pkg/cvo/metrics_test.go
  • pkg/start/start.go

Comment thread pkg/cvo/metrics.go Outdated
Comment thread pkg/start/start.go Outdated
@ricardomaraschini ricardomaraschini force-pushed the restart-cvo-on-tls-changes branch from fc32b95 to 9469b26 Compare May 18, 2026 11:29
Comment thread pkg/cvo/metrics.go Outdated
@ricardomaraschini ricardomaraschini force-pushed the restart-cvo-on-tls-changes branch from 9469b26 to 55d653c Compare May 18, 2026 13:50
@ricardomaraschini ricardomaraschini changed the title NO-JIRA: restart operator upon tls config change CNTRLPLANE-3426: restart operator upon tls config change May 19, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented May 19, 2026

@ricardomaraschini: This pull request references CNTRLPLANE-3426 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

when the cluster wide tls config changes we now restart the operator by cancelling the run context. by doing so we ensure that the metrics end point will always use the correct tls version and ciphers.

Summary by CodeRabbit

  • New Features

  • Metrics endpoint now honors the cluster's TLS security profile (minimum TLS version and cipher suites) and applies those settings to the metrics server.

  • Operator updates metrics TLS settings when cluster-level TLS configuration changes and will restart the metrics server to apply updates.

  • Tests

  • Added tests validating observation of APIServer TLS profiles and derived TLS settings across scenarios.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ricardomaraschini
Copy link
Copy Markdown
Author

/jira refresh

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented May 19, 2026

@ricardomaraschini: This pull request references CNTRLPLANE-3426 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ricardomaraschini
Copy link
Copy Markdown
Author

/retest

@ricardomaraschini ricardomaraschini force-pushed the restart-cvo-on-tls-changes branch from 55d653c to b7a72a9 Compare May 19, 2026 10:00
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/start/start.go`:
- Around line 370-379: The current event handler only uses UpdateFunc and misses
Add/Delete transitions; extract the TLS-change detection into a helper
closure/function (e.g., checkAPIServerTLSChange) that calls
cvo.ObserveAPIServerTLSConfig(apiServerLister), compares with startMinTLSVersion
and startCipherSuites (use slices.Equal for ciphers), logs via klog.Infof and
calls runCancel() on change, then register that helper for AddFunc, UpdateFunc
and DeleteFunc on apiServerSharedIndexInformer via
cache.ResourceEventHandlerFuncs so all three events trigger the same check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: bae20338-20bd-41b3-a6b5-aec4c02873db

📥 Commits

Reviewing files that changed from the base of the PR and between 55d653c and b7a72a9.

📒 Files selected for processing (3)
  • pkg/cvo/metrics.go
  • pkg/cvo/metrics_test.go
  • pkg/start/start.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cvo/metrics_test.go
  • pkg/cvo/metrics.go

Comment thread pkg/start/start.go Outdated
@ricardomaraschini ricardomaraschini force-pushed the restart-cvo-on-tls-changes branch 2 times, most recently from bb2538b to 884fdc1 Compare May 19, 2026 10:23
when the cluster wide tls config changes we now restart the operator by
cancelling the run context. by doing so we ensure that the metrics end
point will always use the correct tls version and ciphers.
@ricardomaraschini ricardomaraschini force-pushed the restart-cvo-on-tls-changes branch from 884fdc1 to ab62b8b Compare May 19, 2026 10:41
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 19, 2026

@ricardomaraschini: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lint 884fdc1 link true /test lint

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants