CNTRLPLANE-3363: Add KMS plugin health reporter design #2005

Open
ibihim wants to merge 1 commit into openshift:master from ibihim:2026-05-07_kmsv2tpv2-health-monitor

@ibihim ibihim commented May 8, 2026

What

A health reporter sidecar runs alongside every API server pod replica when KMS is enabled. It probes the colocated KMS plugin(s) and writes a single advisory KMSHealthReporter_<nodeName> condition per node on the apiserver operator CR.

Why

Exposes plugin health state through the operator CRs and onward into the ClusterOperator's Degraded condition, so a misbehaving KMS plugin is visible in oc get co rather than staying hidden until KAS encryption fails.

Supports future key rotation: per-plugin keyID in the reporter's Message lets a rotation controller verify all nodes agree on the active key before initiating rotation.
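
The rotation pre-check described above can be sketched roughly as follows. This is a minimal illustration, not the design's implementation: the function name keyIDsAgree and the node-to-keyID map are hypothetical, and extracting the keyID from each KMSHealthReporter_<nodeName> condition is assumed to have happened already.

```go
package main

// keyIDsAgree reports whether every node advertises the same non-empty keyID.
// keyIDByNode is a hypothetical map of node name to the keyID taken from that
// node's reporter condition. It returns the agreed keyID and true only when
// at least one node is present and all nodes match.
func keyIDsAgree(keyIDByNode map[string]string) (string, bool) {
	agreed := ""
	seen := false
	for _, id := range keyIDByNode {
		if id == "" {
			// A node without a keyID cannot confirm the active key.
			return "", false
		}
		if !seen {
			agreed, seen = id, true
			continue
		}
		if id != agreed {
			return "", false
		}
	}
	return agreed, seen
}
```

A rotation controller built along these lines would hold off on rotation whenever the second return value is false, including the case where no node has reported yet.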

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 8, 2026
openshift-ci-robot commented May 8, 2026

@ibihim: This pull request references CNTRLPLANE-3363 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What

A health reporter sidecar runs alongside every API server pod replica when KMS is enabled. It probes the colocated KMS plugin(s) and writes a single advisory KMSHealthReporter_<nodeName> condition per node on the apiserver operator CR. A separate aggregator controller reads those conditions and emits a single KMSPluginsDegraded rollup; library-go's StatusSyncer propagates the _Degraded suffix into the ClusterOperator's Degraded condition. The Message field on each per-node condition is structured input for the aggregator: one key=value line per probed plugin, carrying keyID, status, lastChecked, and an optional trailing detail.

Why

Exposes plugin health state through the operator CRs and onward into the ClusterOperator's Degraded condition, so a misbehaving KMS plugin is visible in oc get co rather than silently waiting until KAS encryption fails.

Supports future key rotation: per-plugin keyID in the reporter's Message lets a rotation controller verify all nodes agree on the active key before initiating rotation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from JoelSpeed and derekwaynecarr May 8, 2026 17:57
openshift-ci Bot commented May 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benluddy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Per-node health reporter sidecar publishes one advisory
  KMSHealthReporter_<nodeName> condition on the apiserver operator CR.
- Aggregator controller reads those conditions and emits a single
  KMSPluginsDegraded rollup; library-go's StatusSyncer routes the
  _Degraded suffix into the ClusterOperator's Degraded condition.
- Message format: one key=value line per probed plugin (keyID, status,
  lastChecked, optional trailing detail).
- Risks: stale reporter conditions, orphaned conditions on KMS disable,
  cold-start window.
@ibihim ibihim force-pushed the 2026-05-07_kmsv2tpv2-health-monitor branch from bb85f9a to b719627 Compare May 8, 2026 18:22
openshift-ci Bot commented May 8, 2026

@ibihim: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

2 participants