
CNTRLPLANE-3686: feat(api,cpo): add observedGeneration to ControlPlaneComponentStatus #8819

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from muraee:controlplanecomponent-observergeneartion on Jun 29, 2026

Conversation

@muraee

muraee commented Jun 23, 2026

Contributor

Summary

  • Adds observedGeneration field to ControlPlaneComponentStatus to track which HostedControlPlane generation each component last reconciled
  • Closes the gap where a component reports RolloutComplete=True from a previous reconcile while the CPO hasn't yet processed a new HCP spec change
  • The field is set unconditionally at the end of reconcileComponentStatus — conditions (RolloutComplete, Available) communicate success/failure, while observedGeneration signals "I've processed this generation"
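The split between conditions and observedGeneration can be illustrated with a minimal Go sketch. It is not the PR's code: `Condition`, `ComponentStatus`, `reconcileStatus`, and the `rolloutDone` flag are hypothetical stand-ins for the real types and rollout checks, used only to show the generation being recorded whether or not the rollout succeeded.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Kubernetes condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
}

// ComponentStatus is a hypothetical, trimmed-down ControlPlaneComponentStatus.
type ComponentStatus struct {
	Conditions         []Condition
	ObservedGeneration int64
}

// reconcileStatus is an illustrative analogue of reconcileComponentStatus.
// rolloutDone stands in for whatever rollout check the real code performs.
func reconcileStatus(status *ComponentStatus, hcpGeneration int64, rolloutDone bool) {
	rollout := "False"
	if rolloutDone {
		rollout = "True"
	}
	// Conditions communicate success or failure of the rollout.
	status.Conditions = []Condition{{Type: "RolloutComplete", Status: rollout}}

	// The generation is recorded unconditionally, as the last step:
	// it only signals that this generation has been processed.
	status.ObservedGeneration = hcpGeneration
}

func main() {
	var s ComponentStatus
	reconcileStatus(&s, 7, false) // rollout still in progress
	fmt.Println(s.ObservedGeneration, s.Conditions[0].Status)
}
```

In this sketch a failed or in-progress rollout still advances ObservedGeneration, so a reader of the status can tell "processed but not complete" apart from "not yet processed".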

Test plan

  • Unit tests updated — TestReconcileComponentStatus asserts ObservedGeneration equals HCP generation in all test cases
  • make api regenerated CRDs, deepcopy, clients, docs
  • make api-lint-fix passes with 0 issues
  • make verify passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Control plane component status now reports observedGeneration, indicating which Hosted Control Plane spec generation has been reconciled.
    • Control Plane Component CRD status output now emphasizes rollout completion with RolloutComplete and RolloutCompleteMessage (replacing progressing-focused columns).
  • Bug Fixes

    • Component status reconciliation now correctly sets observedGeneration based on the Hosted Control Plane’s current generation.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 258f547f-e911-4083-a826-1ef72b494fb9

📥 Commits

Reviewing files that changed from the base of the PR and between 149ac0c and 27d3b42.

⛔ Files ignored due to path filters (7)
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/zz_generated*
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/controlplanecomponents.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/controlplanecomponents.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/controlplanecomponent_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*
📒 Files selected for processing (3)
  • api/hypershift/v1beta1/controlplanecomponent_types.go
  • support/controlplane-component/status.go
  • support/controlplane-component/status_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • support/controlplane-component/status.go
  • support/controlplane-component/status_test.go
  • api/hypershift/v1beta1/controlplanecomponent_types.go

📝 Walkthrough

Walkthrough

ControlPlaneComponentStatus adds an ObservedGeneration field with validation bounds. reconcileComponentStatus now copies cpContext.HCP.Generation into that field after rollout and version handling. The status test sets a concrete hosted control plane generation and asserts that the reconciled component status records it.
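The gap described in the PR summary suggests how a consumer might combine the new field with the RolloutComplete condition. The following Go sketch is an assumption about consumer-side usage, not code from this PR: `componentStatus` and `isRolloutCurrent` are hypothetical names, and the condition is flattened to a boolean for brevity.

```go
package main

import "fmt"

// componentStatus is a hypothetical, flattened view of a component's status.
type componentStatus struct {
	RolloutComplete    bool  // value of the RolloutComplete condition
	ObservedGeneration int64 // HCP generation the component last reconciled
}

// isRolloutCurrent reports whether a completed rollout refers to the
// HostedControlPlane generation the caller cares about. A RolloutComplete=True
// left over from an earlier generation is treated as stale.
func isRolloutCurrent(hcpGeneration int64, st componentStatus) bool {
	return st.RolloutComplete && st.ObservedGeneration == hcpGeneration
}

func main() {
	stale := componentStatus{RolloutComplete: true, ObservedGeneration: 4}
	fresh := componentStatus{RolloutComplete: true, ObservedGeneration: 5}

	// The HCP spec has moved on to generation 5.
	fmt.Println(isRolloutCurrent(5, stale))
	fmt.Println(isRolloutCurrent(5, fresh))
}
```

Strict equality is one possible choice here; a consumer could also accept any observed generation at or above the one it is waiting for.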

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically describes the main change: adding observedGeneration to ControlPlaneComponentStatus.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | PASS: The touched test titles are static t.Run strings; no dynamic values, generated IDs, or time-based data appear in titles.
Test Structure And Quality | ✅ Passed | PASS: The changed test is table-driven unit code, not Ginkgo; it has isolated subtests, no cluster waits/resources, and matches existing package style.
Topology-Aware Scheduling Compatibility | ✅ Passed | Changes only add observedGeneration/status plumbing and tests; no pod affinity, nodeSelector, spread, replica-count, or control-plane scheduling logic.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The touched test is a plain unit test (testing/t.Run) with a fake client; no new Ginkgo e2e specs or IPv4/external-network assumptions appear in the changed files.
No-Weak-Crypto | ✅ Passed | Changed files only add observedGeneration/status plumbing and tests; no crypto primitives or secret/token comparisons appear in the patch.
Container-Privileges | ✅ Passed | PR only changes API/status code; no manifest files or privileged flags (privileged, hostPID/network/IPC, SYS_ADMIN, allowPrivilegeEscalation, root) were added.
No-Sensitive-Data-In-Logs | ✅ Passed | No new logging was added in the touched files; changes only set status fields/messages, and searches found no logger calls or sensitive-data patterns.

@muraee muraee changed the title feat(api,cpo): add observedGeneration to ControlPlaneComponentStatus CNTRLPLANE-3686: feat(api,cpo): add observedGeneration to ControlPlaneComponentStatus Jun 23, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 23, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 23, 2026

@muraee: This pull request references CNTRLPLANE-3686 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the following labels and removed the do-not-merge/needs-area label Jun 23, 2026:
  • area/api: Indicates the PR includes changes for the API
  • area/cli: Indicates the PR includes changes for CLI
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/documentation: Indicates the PR includes changes for documentation
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
@openshift-ci openshift-ci Bot requested review from jparrill and sdminonne June 23, 2026 16:32
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8819 June 23, 2026 16:35 Inactive
@codecov

codecov Bot commented Jun 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.56%. Comparing base (8efac9c) to head (27d3b42).
⚠️ Report is 93 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8819      +/-   ##
==========================================
+ Coverage   41.91%   42.56%   +0.64%     
==========================================
  Files         769      768       -1     
  Lines       96763    95299    -1464     
==========================================
+ Hits        40557    40560       +3     
+ Misses      53402    51932    -1470     
- Partials     2804     2807       +3     
Files with missing lines | Coverage Δ
support/controlplane-component/status.go | 72.28% <100.00%> (+0.33%) ⬆️

... and 9 files with indirect coverage changes

Flag | Coverage Δ
cmd-support | 35.46% <100.00%> (+<0.01%) ⬆️
cpo-hostedcontrolplane | 44.84% <ø> (ø)
cpo-other | 44.70% <ø> (+0.13%) ⬆️
hypershift-operator | 53.05% <ø> (+2.87%) ⬆️
other | 31.69% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@csrwng

csrwng commented Jun 23, 2026

Contributor

Comment thread api/hypershift/v1beta1/controlplanecomponent_types.go Outdated
@JoelSpeed

Contributor

/approve for API change

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026
@muraee muraee force-pushed the controlplanecomponent-observergeneartion branch from bc3c753 to 149ac0c Compare June 24, 2026 10:21
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8819 June 24, 2026 10:28 Inactive
Track which HostedControlPlane generation each component last
reconciled. This closes the gap where a component reports
RolloutComplete=True from a previous reconcile while the CPO hasn't
yet processed a new HCP spec change.

The field is set unconditionally at the end of reconcileComponentStatus
— conditions (RolloutComplete, Available) communicate success/failure,
while observedGeneration signals "I've processed this generation."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@muraee muraee force-pushed the controlplanecomponent-observergeneartion branch from 149ac0c to 27d3b42 Compare June 24, 2026 14:06
@csrwng

csrwng commented Jun 24, 2026

Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026
@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-v2-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2069784783814135808 | Cost: $2.6537709999999994 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@cwbotbot

cwbotbot commented Jun 24, 2026

Test Results

e2e-aws

e2e-aks

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2069784781284970496 | Cost: $3.026036499999999 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@muraee

muraee commented Jun 25, 2026

Contributor Author

/retest-required

@muraee

muraee commented Jun 25, 2026

Contributor Author

/verified by unit

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 25, 2026
@openshift-ci-robot

@muraee: This PR has been marked as verified by unit.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2070065991101976576 | Cost: $3.3597865 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 652dbf2 and 2 for PR HEAD 27d3b42 in total

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2070111964402552832 | Cost: $2.759507249999999 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 02675db and 1 for PR HEAD 27d3b42 in total

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 334535c and 0 for PR HEAD 27d3b42 in total

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2070308322560446464 | Cost: $2.08311025 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-merge-bot

Contributor

/hold

Revision 27d3b42 was retested 3 times: holding

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2026
@muraee

muraee commented Jun 26, 2026

Contributor Author

/retest-required
/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2026
@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD c180b15 and 2 for PR HEAD 27d3b42 in total

@muraee

muraee commented Jun 29, 2026

Contributor Author

/retest-required

@hypershift-jira-solve-ci

Key findings:

  1. PR changes are entirely unrelated to the failures: The PR adds observedGeneration to ControlPlaneComponentStatus and renames Progressing to RolloutComplete. It modifies API types, CRD manifests, docs, and status reconciliation code, none of which relate to packageserver, pull secret propagation, or Karpenter kubelet TLS.

  2. Three distinct, independent test failures — all are pre-existing/flaky:

    • TestCreateClusterHABreakGlassCredentials/EnsureNoCrashingPods: packageserver had 2 restarts (a transient OLM issue)
    • TestCreateCluster/EnsureGlobalPullSecret: pull secret propagation timed out after 150s
    • TestKarpenter/OpenshiftEC2NodeClass_Kubelet_propagation: TLS internal error accessing kubelet API on a Karpenter-provisioned node
  3. 562 tests ran, 551 passed, 29 skipped, 11 failed — the 11 failures cascade from the 3 leaf failures above.

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-aws
  • Build ID: 2071515473207365632
  • Target: e2e-aws
  • Result: 562 tests, 551 passed, 29 skipped, 11 failures (3 distinct root failures)

Test Failure Analysis

Error

3 independent test failures (11 total including parent cascades):

1. TestCreateClusterHABreakGlassCredentials/ValidateHostedCluster/EnsureNoCrashingPods:
   Container packageserver in pod packageserver-5854977657-w7fbr has a restartCount > 0 (2)

2. TestCreateCluster/Main/EnsureGlobalPullSecret/When_management-cluster_hostedCluster.Spec.PullSecret_is_updated_in-place_it_should_propagate_to_guest_without_rollout:
   Timed out after 150.000s. openshift-config/pull-secret did not propagate dummy entry

3. TestKarpenter/Main/Parallel_provisioning_tests/OpenshiftEC2NodeClass_Kubelet_propagation:
   Get "https://10.0.133.67:10250/containerLogs/kube-system/kubelet-config-checker/checker": remote error: tls: internal error

Summary

All three test failures are pre-existing flaky tests unrelated to PR #8819's changes. The PR exclusively modifies the ControlPlaneComponentStatus API, adding an observedGeneration field and renaming the Progressing condition to RolloutComplete, and touches only CRD manifests, API type definitions, docs, and a single line in status.go. None of the failing tests exercise ControlPlaneComponent status reconciliation, and none of the PR's changes touch pull secret propagation, packageserver lifecycle, or Karpenter kubelet TLS connectivity. The 98% pass rate (551/562) with 3 independent, unrelated leaf failures is consistent with known environmental flakiness in the HyperShift e2e suite.

Root Cause

These failures are NOT caused by PR #8819. Each failure has an independent, pre-existing root cause:

  1. EnsureNoCrashingPods (packageserver restarts): The OLM packageserver pod restarted twice during cluster provisioning. This is a known transient condition — packageserver can restart during initial cluster bootstrap as the OLM operator reconciles catalog sources. The test has a zero-tolerance policy for any container restart, making it inherently flaky.

  2. EnsureGlobalPullSecret (propagation timeout): The test patches the management-cluster pull secret and waits 150 seconds for the change to propagate to the guest cluster's openshift-config/pull-secret. The propagation timed out, likely due to the control plane reconciliation loop not picking up the change within the timeout window. This is a timing-sensitive test that can fail under load when multiple hosted clusters are running concurrently (as in this e2e run with TestCreateCluster, TestKarpenter, TestAutoscaling, etc. all running in parallel).

  3. OpenshiftEC2NodeClass Kubelet propagation (TLS internal error): The test creates a Karpenter-provisioned node with a custom kubelet config, then tries to read container logs from the kubelet API on that node. The tls: internal error when connecting to https://10.0.133.67:10250 indicates the kubelet's serving certificate was not yet properly signed or trusted by the kube-apiserver. On Karpenter-provisioned nodes, the kubelet serving cert approval can race with the test's 120-second timeout, especially when the node just joined the cluster (the test confirmed the node became ready only 5m9s before the log fetch attempt).

PR #8819's changes are limited to:

  • api/hypershift/v1beta1/controlplanecomponent_types.go — adds ObservedGeneration int64 field
  • CRD manifests — regenerated to include the new field
  • support/controlplane-component/status.go — sets component.Status.ObservedGeneration = cpContext.HCP.Generation
  • Docs — updated API reference

None of these files are in the code path of any failing test.

Recommendations
  1. Retest the PR — Run /retest to trigger a new e2e-aws run. These failures are flaky and unrelated to the PR changes. A clean run is expected.

  2. No code changes needed in PR #8819: The PR's changes to the ControlPlaneComponentStatus API and CRD manifests are not involved in any of the 3 failure modes.

  3. For the HyperShift team (separate from this PR):

    • The EnsureNoCrashingPods test could benefit from excluding known-flaky pods like packageserver or allowing a small restart tolerance during bootstrap.
    • The EnsureGlobalPullSecret propagation timeout (150s) may need to be increased or the test should retry the check.
    • The OpenshiftEC2NodeClass_Kubelet_propagation test's 120s timeout for kubelet log access on a freshly-provisioned Karpenter node may be insufficient — kubelet serving cert approval can take longer.
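The first suggestion above, a small restart tolerance for known-flaky containers, could look roughly like the Go sketch below. It is illustrative only and does not reflect how EnsureNoCrashingPods is implemented: `containerRestart`, `crashingContainers`, and the per-container tolerance map are hypothetical names introduced for this example.

```go
package main

import "fmt"

// containerRestart is a hypothetical record of one container's restart count.
type containerRestart struct {
	Pod          string
	Container    string
	RestartCount int32
}

// crashingContainers returns the containers whose restart count exceeds the
// tolerance configured for that container name. Containers missing from the
// tolerance map get the zero value, which keeps the zero-restart policy.
func crashingContainers(observed []containerRestart, tolerance map[string]int32) []containerRestart {
	var offenders []containerRestart
	for _, c := range observed {
		if c.RestartCount > tolerance[c.Container] {
			offenders = append(offenders, c)
		}
	}
	return offenders
}

func main() {
	observed := []containerRestart{
		{Pod: "packageserver-5854977657-w7fbr", Container: "packageserver", RestartCount: 2},
		{Pod: "example-pod", Container: "example", RestartCount: 1}, // hypothetical pod
	}
	// Assumed policy: allow up to 2 packageserver restarts during bootstrap.
	tolerance := map[string]int32{"packageserver": 2}

	for _, c := range crashingContainers(observed, tolerance) {
		fmt.Printf("%s/%s restarted %d times\n", c.Pod, c.Container, c.RestartCount)
	}
}
```

Keeping the default at zero means only explicitly listed containers are relaxed, so the check stays strict for everything else.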
Evidence
Evidence | Detail
PR scope | Adds observedGeneration to ControlPlaneComponentStatus, renames Progressing to RolloutComplete; 10 files changed, all API/CRD/docs/status
Files changed by PR | controlplanecomponent_types.go, CRD YAMLs, status.go (+1 line: component.Status.ObservedGeneration = cpContext.HCP.Generation), docs
Failure 1: packageserver | packageserver-5854977657-w7fbr had restartCount=2, unrelated to ControlPlaneComponent reconciliation
Failure 2: pull secret | Timed out at 150s waiting for openshift-config/pull-secret propagation to guest cluster; timing/load sensitive
Failure 3: Karpenter TLS | tls: internal error on kubelet port 10250 of Karpenter node 10.0.133.67; kubelet serving cert not yet approved within 120s timeout
Test pass rate | 551/562 tests passed (98%), 29 skipped, 11 failed (3 distinct leaf failures cascading to parents)
Failure independence | 3 failures are in 3 separate test suites (TestCreateClusterHABreakGlassCredentials, TestCreateCluster, TestKarpenter) with no common code path
Failure step | hypershift-aws-run-e2e-nested (test phase); pre/post phases passed
Build log | .work/prow-job-analyze-test-failure/2071515473207365632/logs/hypershift-aws-run-e2e-nested-build-log.txt
JUnit XML | .work/prow-job-analyze-test-failure/2071515473207365632/logs/junit_e2e.xml

@muraee

muraee commented Jun 29, 2026

Contributor Author

/retest-required

@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Contributor

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 0b20ee5 into openshift:main Jun 29, 2026
51 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/api: Indicates the PR includes changes for the API
  • area/cli: Indicates the PR includes changes for CLI
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/documentation: Indicates the PR includes changes for documentation
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

5 participants