OCPBUGS-86306: Stop controllers fighting over HCP status by jkyros · Pull Request #8562 · openshift/hypershift

jkyros · 2026-05-20T23:31:33Z

What this PR does / why we need it:

We have multiple controllers writing to the HCP's status object with Update() calls, which replace the entire object.
Depending on what the controller performing the Update() understands and sees, and when it performs the update, it can blow away fields such as vcpus in the autonode status, which we added in AUTOSCALE-615: include Karpenter node vCPUs in billing metric #8265.
This PR replaces the Status().Update() calls with Status().Patch() calls that use client.MergeFromWithOptions() together with client.MergeFromWithOptimisticLock{}, so that one controller's write no longer wipes other controllers' status updates.
Testing this explicitly is unpleasant without some kind of refactor, because the parts of the controllers that write the status aren't set up for easy testing. The issue is still clearly possible, given that multiple controllers call Update() on the same object.
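
As a rough illustration of the difference, the sketch below computes a JSON-merge-patch-style diff using only the Go standard library. It is not controller-runtime's code: `mergeDiff`, `decode`, and the `ready` field are hypothetical, and the real client.MergeFrom builds its patch from typed objects. The point is only that a patch built from a before/after pair carries the fields this writer changed, so a field it never touched (here autoNode.vcpus) is not sent at all.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mergeDiff (hypothetical helper) returns a merge-patch-style difference
// between two decoded JSON objects. Only changed keys are emitted, nested
// objects are diffed recursively, arrays are treated as opaque values that
// are replaced whole, and keys missing from modified are emitted as null.
func mergeDiff(original, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, mv := range modified {
		ov, ok := original[k]
		if !ok {
			patch[k] = mv
			continue
		}
		om, oIsMap := ov.(map[string]any)
		mm, mIsMap := mv.(map[string]any)
		if oIsMap && mIsMap {
			if sub := mergeDiff(om, mm); len(sub) > 0 {
				patch[k] = sub
			}
			continue
		}
		if !reflect.DeepEqual(ov, mv) {
			patch[k] = mv
		}
	}
	for k := range original {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

// decode parses a JSON object literal and panics on malformed input.
func decode(s string) map[string]any {
	var m map[string]any
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	// What this writer read, and what it wants to persist. It flips one
	// field and never touches status.autoNode.
	original := decode(`{"status":{"autoNode":{"vcpus":8},"ready":false}}`)
	modified := decode(`{"status":{"autoNode":{"vcpus":8},"ready":true}}`)

	out, _ := json.Marshal(mergeDiff(original, modified))
	fmt.Println(string(out)) // prints {"status":{"ready":true}}
}
```

Arrays are replaced wholesale in a merge patch, so a writer holding a stale conditions list could still overwrite conditions set by someone else. That is presumably where the optimistic lock earns its keep.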

Which issue(s) this PR fixes:

Fixes OCPBUGS-86306

Special notes for your reviewer:

Checklist:

Subject and description added to both, commit and PR.
Relevant issues have been referenced.
This change includes docs.
This change includes unit tests.

Summary by CodeRabbit

Refactor
- Switched status persistence across multiple controllers to optimistic-locking status patches for more robust concurrent updates and clearer success/error logs.
Tests
- Added tests validating status patch generation and behavior: include intended fields, preserve unrelated status fields, and retain full condition arrays when updating a single condition.

openshift-merge-bot · 2026-05-20T23:31:36Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci · 2026-05-20T23:31:37Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot · 2026-05-20T23:31:40Z

@jkyros: This pull request references Jira Issue OCPBUGS-86306, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

coderabbitai · 2026-05-20T23:31:46Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cf90fa5a-fc68-4d1a-bf6a-6c30a5364c66

📥 Commits

Reviewing files that changed from the base of the PR and between 10d5332 and 9995cf5.

📒 Files selected for processing (7)

control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
hypershift-operator/controllers/etcdbackup/reconciler.go
hypershift-operator/controllers/etcdbackup/reconciler_test.go
ignition-server/controllers/local_ignitionprovider.go

📝 Walkthrough

Walkthrough

This pull request replaces direct HostedControlPlane status updates with optimistic-locking status patches across multiple controllers. The hcpstatus controller now generates RFC‑6902 JSON Patch ops (including a resourceVersion test) and issues Status().Patch. Other controllers (hostedcontrolplane, resources, etcdbackup, ignition-server) deep-copy original HCP objects and persist condition changes via Status().Patch with MergeFromWithOptimisticLock. New unit tests validate patch generation and backup-condition patch behavior.
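
For readers unfamiliar with the JSON Patch approach mentioned in the walkthrough, here is a minimal stdlib-only sketch of what such a patch body can look like. `buildPatch` and `exampleField` are hypothetical names, not the PR's helper, and the sketch assumes simple keys that need no JSON Pointer escaping. The leading test operation makes the whole patch fail if the object's resourceVersion has moved on since it was read.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// op is one RFC 6902 JSON Patch operation.
type op struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// buildPatch (hypothetical) emits a resourceVersion test followed by one
// "add" per changed status field. In RFC 6902, "add" on an existing object
// member replaces its value, so it works for both new and existing fields.
func buildPatch(resourceVersion string, changes map[string]any) ([]byte, error) {
	ops := []op{{Op: "test", Path: "/metadata/resourceVersion", Value: resourceVersion}}

	// Sort keys so the output is deterministic.
	keys := make([]string, 0, len(changes))
	for k := range changes {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		ops = append(ops, op{Op: "add", Path: "/status/" + k, Value: changes[k]})
	}
	return json.Marshal(ops)
}

func main() {
	patch, err := buildPatch("12345", map[string]any{"exampleField": "x"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```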

Suggested reviewers

enxebre
csrwng

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	TestUpdateHCPBackupCondition lacks messages on critical assertions: cond.Status, cond.Reason, cond.ObservedGeneration, and FindStatusCondition checks.	Add meaningful messages to all critical assertions in TestUpdateHCPBackupCondition to help diagnose test failures. Example: g.Expect(cond.Status).To(Equal(...), "EtcdBackupSucceeded condition status mismatch").

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'OCPBUGS-86306: Stop controllers fighting over HCP status' clearly and concisely summarizes the main change: replacing Status().Update() calls with optimistic-locking patches to prevent concurrent status conflicts.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All 9 test cases (7 in TestBuildStatusPatch, 2 in TestUpdateHCPBackupCondition) use stable, deterministic test names with only static descriptive strings, no dynamic values or generated identifiers.
Microshift Test Compatibility	✅ Passed	No Ginkgo e2e tests (It, Describe, Context, When) added. PR adds only standard Go unit tests: TestBuildStatusPatch and TestUpdateHCPBackupCondition using testing.T and Gomega.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	The PR adds two standard Go unit tests (TestBuildStatusPatch, TestUpdateHCPBackupCondition), not Ginkgo e2e tests. SNO compatibility check applies only to Ginkgo tests, so not applicable here.
Topology-Aware Scheduling Compatibility	✅ Passed	PR only changes status update mechanism (Update→Patch with optimistic locking). No new scheduling constraints, affinity rules, or topology assumptions introduced.
Ote Binary Stdout Contract	✅ Passed	No violations detected. Changes involve only Kubernetes client Status().Patch calls in controller reconcilers. No process-level stdout writes, fmt.Print calls, or Ginkgo suite functions.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo e2e tests added. PR adds only standard Go unit tests (TestBuildStatusPatch, TestUpdateHCPBackupCondition) using testing.T, not Ginkgo framework.

openshift-ci · 2026-05-20T23:31:59Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jkyros
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2026-05-20T23:32:50Z

@jkyros: This pull request references Jira Issue OCPBUGS-86306, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ignition-server/controllers/local_ignitionprovider.go (1)
768-788: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap reconcileValidReleaseInfoCondition in retry logic for optimistic-lock conflicts.

This helper now uses Status().Patch(..., client.MergeFromWithOptimisticLock{}), which returns 409 conflicts during concurrent HCP status writes. However, the caller at lines 451-452 only logs the error and continues, leaving IgnitionServerValidReleaseInfo stale indefinitely—there's no guaranteed near-term retry.

Wrap the helper logic in retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) and re-fetch the HostedControlPlane inside the retry block. This pattern is already used in support/k8sutil/object.go and aligns with Kubernetes best practices.
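
The retry shape being suggested can be sketched without any Kubernetes dependencies. Everything below is a toy: `store`, `patchCondition`, and `retryOnConflict` are hypothetical stand-ins for the API server, a status patch with an optimistic lock, and retry.RetryOnConflict, and the real helper also sleeps between attempts according to its backoff. What it shows is that the fresh read has to happen inside the retried function, otherwise every attempt reuses the stale resourceVersion.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: resourceVersion mismatch")

type object struct {
	ResourceVersion int
	Conditions      map[string]string
}

// store is a toy stand-in for the API server holding a single object.
type store struct{ obj object }

// get returns a copy of the stored object, like a fresh read.
func (s *store) get() object {
	c := object{ResourceVersion: s.obj.ResourceVersion, Conditions: map[string]string{}}
	for k, v := range s.obj.Conditions {
		c.Conditions[k] = v
	}
	return c
}

// patchCondition writes one condition only if the caller's resourceVersion
// still matches, which is the optimistic-lock check.
func (s *store) patchCondition(readVersion int, typ, status string) error {
	if readVersion != s.obj.ResourceVersion {
		return errConflict
	}
	s.obj.Conditions[typ] = status
	s.obj.ResourceVersion++
	return nil
}

// retryOnConflict re-runs fn while it reports errConflict, up to attempts times.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

func main() {
	s := &store{obj: object{ResourceVersion: 1, Conditions: map[string]string{}}}
	tries := 0
	err := retryOnConflict(5, func() error {
		tries++
		fresh := s.get() // re-read inside the retry loop
		if tries == 1 {
			// Simulate another controller writing between our read and our patch.
			_ = s.patchCondition(fresh.ResourceVersion, "Other", "True")
		}
		return s.patchCondition(fresh.ResourceVersion, "ValidReleaseInfo", "True")
	})
	fmt.Println(err, tries, s.obj.Conditions)
}
```

The first attempt loses the race and fails with a conflict; the second re-reads, sees the other writer's condition, and succeeds without discarding it.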
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ignition-server/controllers/local_ignitionprovider.go` around lines 768 -
788, Wrap the existing status-patch logic in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) so
optimistic-lock 409s are retried: inside the retry block re-fetch the latest
HostedControlPlane using r.Client.Get into hostedControlPlane, create
originalHCP := hostedControlPlane.DeepCopy(), compute/set the
IgnitionServerValidReleaseInfo condition (use
releaseImageProvider.GetMissingImages() and meta.SetStatusCondition as before),
then call r.Client.Status().Patch(ctx, &hostedControlPlane,
client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
and return the patch error; let RetryOnConflict surface a final error if retries
fail.

🧹 Nitpick comments (3)

control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go (2)

19-160: ⚡ Quick win

Add parallelization to align with unit-test execution guidance.

This test suite does not use t.Parallel(), so it misses the repository’s unit-test parallelism requirement.

Suggested change

 func TestMergeFromPatchDoesNotStompUnrelatedFields(t *testing.T) {
+	t.Parallel()
 	t.Run("When hcpstatus controller patches conditions and versionStatus, it should not include autoNode in the patch", func(t *testing.T) {
+		t.Parallel()
 		g := NewWithT(t)
@@
 	t.Run("When hcpstatus controller makes no changes, it should produce an empty patch", func(t *testing.T) {
+		t.Parallel()
 		g := NewWithT(t)
@@
 	t.Run("When a condition-only controller patches, the conditions array should contain ALL conditions from the read", func(t *testing.T) {
+		t.Parallel()
 		g := NewWithT(t)

As per coding guidelines, "Use race detection and parallel execution for unit tests".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go`
around lines 19 - 160, The test suite
TestMergeFromPatchDoesNotStompUnrelatedFields is missing parallelization; add
t.Parallel() at the start of TestMergeFromPatchDoesNotStompUnrelatedFields and
inside each subtest closure (the three t.Run anonymous funcs) so each subtest
and the parent test run with testing parallelism and race detection enabled;
look for the TestMergeFromPatchDoesNotStompUnrelatedFields function and the
three t.Run blocks to insert the t.Parallel() calls.

151-152: ⚡ Quick win

Avoid panic-prone type assertions in tests.

These direct assertions can panic and hide the real failure context. Prefer checked assertions so failures stay diagnostic.

Suggested change

-		status := patchMap["status"].(map[string]interface{})
-		conditions := status["conditions"].([]interface{})
+		status, ok := patchMap["status"].(map[string]interface{})
+		g.Expect(ok).To(BeTrue(), "patch should contain a status object")
+		conditions, ok := status["conditions"].([]interface{})
+		g.Expect(ok).To(BeTrue(), "status.conditions should be a JSON array")
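
The same idea can be tried outside the test suite. This is a small stdlib-only sketch, with `conditionsFromPatch` as a hypothetical helper name, showing the comma-ok form turning a would-be panic into a descriptive error when the decoded patch does not have the expected shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// conditionsFromPatch (hypothetical) extracts status.conditions from a JSON
// patch body and returns an error, rather than panicking, when a level of
// the document is missing or has the wrong type.
func conditionsFromPatch(raw []byte) ([]any, error) {
	var patchMap map[string]any
	if err := json.Unmarshal(raw, &patchMap); err != nil {
		return nil, fmt.Errorf("decoding patch: %w", err)
	}
	status, ok := patchMap["status"].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("patch has no status object (got %T)", patchMap["status"])
	}
	conditions, ok := status["conditions"].([]any)
	if !ok {
		return nil, fmt.Errorf("status.conditions is not an array (got %T)", status["conditions"])
	}
	return conditions, nil
}

func main() {
	good := []byte(`{"status":{"conditions":[{"type":"Available"},{"type":"Degraded"}]}}`)
	conds, err := conditionsFromPatch(good)
	fmt.Println(len(conds), err)

	bad := []byte(`{"status":{"conditions":"oops"}}`)
	_, err = conditionsFromPatch(bad)
	fmt.Println(err)
}
```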

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go`
around lines 151 - 152, Replace the unchecked type assertions on
patchMap["status"] and status["conditions"] with guarded checks so the test
fails with a clear message instead of panicking: retrieve the value using the
comma-ok form (e.g., statusIface, ok := patchMap["status"]; if !ok {
t.Fatalf(...) } then status, ok := statusIface.(map[string]interface{}); if !ok
{ t.Fatalf(...) }) and likewise check conditionsIface, ok :=
status["conditions"]; if !ok { t.Fatalf(...) } then conditions, ok :=
conditionsIface.([]interface{}); if !ok { t.Fatalf(...) }); alternatively use
your test helper (e.g., require.IsType/require.NotNil) around the same symbols
(patchMap, status, conditions) to make assertions explicit and diagnostic.

control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go (1)

570-576: ⚡ Quick win

Skip no-op recovery status patches.

reconcileClusterRecovery always patches after meta.SetStatusCondition, even when the condition is unchanged. Because this path polls every 120 seconds, steady-state recovery checks become repeated write attempts and avoidable optimistic-lock conflicts on the same HCP status object. Guard on the boolean returned by meta.SetStatusCondition to skip writing unchanged conditions.

♻️ Suggested change

 	originalHCP := hcp.DeepCopy()
-	meta.SetStatusCondition(&hcp.Status.Conditions, *condition)
+	if !meta.SetStatusCondition(&hcp.Status.Conditions, *condition) {
+		if !finished {
+			return ctrl.Result{RequeueAfter: 120 * time.Second}, nil
+		}
+		return ctrl.Result{}, nil
+	}
 	log.Info("setting condition", "type", condition.Type, "status", condition.Status, "message", condition.Message)
 	if err := r.cpClient.Status().Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{})); err != nil {
 		return ctrl.Result{}, fmt.Errorf("failed to patch status on hcp for hosted cluster recovery: %w. Condition error message: %v", err, condition.Message)
 	}
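
To make the effect of that guard concrete, here is a stdlib-only sketch. `setCondition` is a simplified, hypothetical stand-in for meta.SetStatusCondition (it ignores transition times and observed generation), the `Recovered` condition type is made up, and the `writes` counter stands in for the status patch call.

```go
package main

import "fmt"

type condition struct {
	Type, Status, Reason, Message string
}

// setCondition (hypothetical) inserts or updates a condition by type and
// reports whether anything actually changed.
func setCondition(conds *[]condition, c condition) bool {
	for i := range *conds {
		if (*conds)[i].Type != c.Type {
			continue
		}
		if (*conds)[i] == c {
			return false // identical condition already present
		}
		(*conds)[i] = c
		return true
	}
	*conds = append(*conds, c)
	return true
}

func main() {
	conds := []condition{}
	writes := 0

	// reconcile mimics one polling pass: it writes only when the
	// condition differs from what is already recorded.
	reconcile := func(c condition) {
		if !setCondition(&conds, c) {
			return
		}
		writes++ // stands in for the status patch
	}

	inProgress := condition{Type: "Recovered", Status: "False", Reason: "InProgress"}
	reconcile(inProgress)
	reconcile(inProgress)
	reconcile(inProgress)
	fmt.Println(writes) // prints 1
}
```

Three identical polling passes result in a single write, so steady-state checks no longer compete for the object's resourceVersion.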

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go`
around lines 570 - 576, reconcileClusterRecovery is always patching HCP status
even when meta.SetStatusCondition didn't change the condition; capture the
boolean return from meta.SetStatusCondition and only call
r.cpClient.Status().Patch (using originalHCP and the same
MergeFromWithOptions/OptimisticLock) if that boolean is true to skip no-op
writes and avoid redundant optimistic-lock updates.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go`:
- Around line 79-80: The patch call on h.mgtClusterClient.Status().Patch (using
crclient.MergeFromWithOptions and MergeFromWithOptimisticLock) must treat
optimistic-lock conflicts differently: after the Patch returns an error, detect
conflicts with apierrors.IsConflict(err) and, if true, return reconcile.Result{}
with a nil error so the controller runtime can requeue with backoff; for
non-conflict errors continue returning the wrapped fmt.Errorf("failed to patch
hcp status: %w", err). Locate the error handling directly after the Patch
invocation in the hcp status reconciliation code and insert the
apierrors.IsConflict check there.

In `@hypershift-operator/controllers/etcdbackup/reconciler.go`:
- Around line 302-305: The patch that sets meta.SetStatusCondition and calls
r.Status().Patch(hcp, client.MergeFromWithOptions(originalHCP,
client.MergeFromWithOptimisticLock{})) should be wrapped in
retry.RetryOnConflict so optimistic-lock 409s are retried: on each retry re-get
the latest HCP (fresh read into a new variable), reapply
condition.ObservedGeneration = hcp.Generation and
meta.SetStatusCondition(&hcp.Status.Conditions, condition) against the fresh
HCP, then call r.Status().Patch with client.MergeFromWithOptions(originalHCP,
client.MergeFromWithOptimisticLock{}) (or use a new merge-from base as
appropriate) until success or non-conflict error; ensure errors are returned if
retries exhaust.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d0b2014e-1faa-4d10-a400-26d60e220b42

📥 Commits

Reviewing files that changed from the base of the PR and between c23dbb8 and 58ac9ee.

📒 Files selected for processing (6)

control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
hypershift-operator/controllers/etcdbackup/reconciler.go
ignition-server/controllers/local_ignitionprovider.go

coderabbitai · 2026-05-20T23:38:11Z

+		if err := h.mgtClusterClient.Status().Patch(ctx, hcp, crclient.MergeFromWithOptions(originalHCP, crclient.MergeFromWithOptimisticLock{})); err != nil {
+			return reconcile.Result{}, fmt.Errorf("failed to patch hcp status: %w", err)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file at the specified lines
cat -n control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go | sed -n '70,90p'

Repository: openshift/hypershift

Length of output: 1230

🏁 Script executed:

# Let's also see more context around the patch operation
cat -n control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go | sed -n '60,100p'

Repository: openshift/hypershift

Length of output: 2002

🏁 Script executed:

# Check if there are other patch operations in the file to understand the pattern
rg -n "Status\(\)\.Patch|MergeFromWithOptimisticLock" control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go -A 2 -B 2

Repository: openshift/hypershift

Length of output: 450

🏁 Script executed:

# Search for how conflicts are handled elsewhere in the codebase
rg -n "409|Conflict|optimistic" control-plane-operator/ --type go -i | head -20

Repository: openshift/hypershift

Length of output: 4005

🏁 Script executed:

# Check the controller-runtime and kubernetes patterns for conflict handling
rg -n "MergeFromWithOptimisticLock" . --type go | head -10

Repository: openshift/hypershift

Length of output: 1802

🏁 Script executed:

# Check the vendored controller-runtime to understand conflict handling patterns
cat vendor/sigs.k8s.io/controller-runtime/pkg/client/patch.go | sed -n '50,80p'

Repository: openshift/hypershift

Length of output: 1445

🏁 Script executed:

# Search for error handling patterns around optimistic locks
rg -n "apiutil|IsConflict|StatusError" control-plane-operator/ --type go | head -15

Repository: openshift/hypershift

Length of output: 1952

🏁 Script executed:

# Check if there are any examples of conflict detection in the codebase
rg -n "apierrors.IsConflict|StatusError" . --type go | head -20

Repository: openshift/hypershift

Length of output: 3023

🌐 Web query:

Kubernetes controller-runtime conflict 409 error handling optimistic locking best practices

💡 Result:

Handling 409 Conflict errors is an essential part of managing optimistic concurrency in Kubernetes [1][2]. When multiple clients (e.g., controllers, users, HPA) modify the same resource, the Kubernetes API server uses the resourceVersion field to ensure updates are based on the latest state [1][3]. If a conflict occurs, the API server rejects the update with a 409 Conflict status [3][4].

Best practices for handling these conflicts include:

1. Let the controller reconcile (recommended for most cases). In standard controller-runtime development, the most idiomatic approach is to return the error from your Reconcile function [5][3]. This allows the controller-runtime's built-in work queue to catch the error, log it, and automatically re-enqueue the request with exponential backoff [6][7][8]. This avoids manual retry loops inside your reconciliation logic and keeps the controller simple and idempotent [9][10].

2. Use RetryOnConflict (for specific update operations). If you are performing a complex update operation within a reconciliation loop that must succeed before continuing, you can use the k8s.io/client-go/util/retry helper [5][11][12]. The helper re-runs the supplied function when it returns a conflict; the function itself re-fetches the object to get the latest resourceVersion before attempting the update again [1][12][13]. Example:

	err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// 1. Re-fetch the latest version of the object
		if err := r.Get(ctx, req.NamespacedName, latestObj); err != nil {
			return err
		}
		// 2. Apply modifications to the fresh object
		latestObj.Status.Phase = "Running"
		// 3. Attempt update
		return r.Status().Update(ctx, latestObj)
	})

3. Consider Server-Side Apply. For complex objects with many fields modified by multiple controllers, Server-Side Apply (SSA) is the preferred alternative to traditional Update calls [5]. SSA allows for field-level ownership, significantly reducing the frequency of 409 conflicts compared to full object updates [5][12].

Key principles:
- Avoid manual, infinite retry loops.
- Do not manually manipulate or suppress the resourceVersion field to bypass concurrency checks [2].
- Separate spec updates from status updates; status subresources help minimize conflicts [2].
- Leverage the framework: return errors to the reconciler to benefit from built-in rate-limiting and backoff [6][8].

Citations:

1: https://medium.com/@pankajhasija2109/mastering-kubernetes-api-concurrency-7cc3083ffbb9

2: https://oneuptime.com/blog/post/2026-02-09-optimistic-concurrency-resourceversion/view

3: https://github.com/gardener/gardener/blob/master/docs/development/kubernetes-clients.md

4: https://medium.com/@timebertt/kubernetes-controllers-at-scale-clients-caches-conflicts-patches-explained-aa0f7a8b4332

5: How to elegantly solve the update conflict problem kubernetes-sigs/controller-runtime#1748

6: https://dev.to/jamesli/client-go-deep-dive-workqueue-the-reliable-task-queue-for-kubernetes-controllers-3pjc

7: https://oneuptime.com/blog/post/2026-02-09-reconciliation-loops-exponential-backoff/view

8: https://stuartleeks.com/posts/error-back-off-with-controller-runtime/

9: Docs suggestion: Optimistic Concurrency kubernetes-sigs/kubebuilder#617

10: https://book.kubebuilder.io/cronjob-tutorial/controller-implementation.html

11: https://oneuptime.com/blog/post/2026-02-09-retry-logic-api-conflicts-controllers/view

12: https://platformwale.blog/2026/02/24/kubernetes-operators-a-deep-dive-into-the-internals/

13: https://medium.com/@vamshitejanizam/building-resilient-kubernetes-controllers-a-practical-guide-to-retry-mechanisms-0d689160fa51

🏁 Script executed:

# Check for controller-runtime configuration or custom backoff settings
rg -n "DefaultRetry|ExponentialBackoff|SetupWithManager" control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go -A 3

Repository: openshift/hypershift

Length of output: 46

🏁 Script executed:

# Check if there's any custom error handling wrapper in the codebase
rg -n "apierrors.IsConflict" . --type go -B 3 -A 3 | head -40

Repository: openshift/hypershift

Length of output: 3994

Handle optimistic-lock conflicts distinctly from other patch failures.

The current code treats 409 conflicts from optimistic locking the same as other patch errors, which can create misleading error metrics and logs. Conflicts are expected under contention and should be handled separately. Detect conflicts using apierrors.IsConflict(err) and return nil error to allow the framework's work queue to naturally requeue with exponential backoff, keeping other patch errors as genuine failures. See reference implementation in vendor/sigs.k8s.io/cluster-api/util/patch/patch.go for this pattern.
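
A stdlib-only sketch of that split is below. All names are hypothetical: `errConflict` stands in for an error that apierrors.IsConflict would match, and `result` for reconcile.Result. One caveat worth checking against the controller-runtime version in use: returning a nil error with an empty result does not by itself schedule another reconcile, so this sketch asks for a requeue explicitly rather than relying on that.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels standing in for API errors.
var (
	errConflict    = errors.New("conflict: object has been modified")
	errUnreachable = errors.New("connection refused")
)

// result is a toy stand-in for reconcile.Result.
type result struct{ Requeue bool }

// handlePatchError (hypothetical) treats conflicts as expected contention
// and everything else as a genuine failure.
func handlePatchError(err error) (result, error) {
	switch {
	case err == nil:
		return result{}, nil
	case errors.Is(err, errConflict):
		// Expected under contention: try again later without
		// reporting a reconcile error.
		return result{Requeue: true}, nil
	default:
		return result{}, fmt.Errorf("failed to patch hcp status: %w", err)
	}
}

func main() {
	inputs := []error{nil, fmt.Errorf("patching: %w", errConflict), errUnreachable}
	for _, in := range inputs {
		res, err := handlePatchError(in)
		fmt.Printf("requeue=%v err=%v\n", res.Requeue, err)
	}
}
```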

coderabbitai · 2026-05-20T23:38:11Z

+	originalHCP := hcp.DeepCopy()
 	condition.ObservedGeneration = hcp.Generation
 	meta.SetStatusCondition(&hcp.Status.Conditions, condition)
-	return r.Status().Update(ctx, hcp)
+	return r.Status().Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd hypershift-operator/controllers/etcdbackup && wc -l reconciler.go

Repository: openshift/hypershift

Length of output: 82

🏁 Script executed:

# First, let's look at the updateHCPBackupCondition function
cd hypershift-operator/controllers/etcdbackup && sed -n '295,310p' reconciler.go

Repository: openshift/hypershift

Length of output: 968

🏁 Script executed:

# Now examine the callers at lines 193, 452, and 484
cd hypershift-operator/controllers/etcdbackup && sed -n '188,198p' reconciler.go

Repository: openshift/hypershift

Length of output: 396

🏁 Script executed:

cd hypershift-operator/controllers/etcdbackup && sed -n '447,457p' reconciler.go

Repository: openshift/hypershift

Length of output: 391

🏁 Script executed:

cd hypershift-operator/controllers/etcdbackup && sed -n '479,489p' reconciler.go

Repository: openshift/hypershift

Length of output: 393

🏁 Script executed:

# Check imports to verify retry is available
cd hypershift-operator/controllers/etcdbackup && head -30 reconciler.go

Repository: openshift/hypershift

Length of output: 950

Retry optimistic-lock conflicts instead of silently dropping the HCP condition update.

MergeFromWithOptimisticLock{} surfaces 409 Conflict errors when the HCP is concurrently modified, but all callers at lines 193, 452, and 484 only log the error and continue. This means conflicts can permanently stale EtcdBackupSucceeded conditions on all three terminal paths (in-progress, success, failure).

The adjacent function updateHostedClusterBackupURL already handles this correctly with retry.RetryOnConflict, reflecting the same concurrency concern. Wrap this function's patch in retry.RetryOnConflict with a fresh HCP read to safely retry on conflicts.

Proposed fix

 func (r *HCPEtcdBackupReconciler) updateHCPBackupCondition(ctx context.Context, hcp *hyperv1.HostedControlPlane, condition metav1.Condition) error {
-	originalHCP := hcp.DeepCopy()
-	condition.ObservedGeneration = hcp.Generation
-	meta.SetStatusCondition(&hcp.Status.Conditions, condition)
-	return r.Status().Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
+	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		current := &hyperv1.HostedControlPlane{}
+		if err := r.Get(ctx, client.ObjectKeyFromObject(hcp), current); err != nil {
+			return err
+		}
+
+		originalHCP := current.DeepCopy()
+		condition.ObservedGeneration = current.Generation
+		meta.SetStatusCondition(&current.Status.Conditions, condition)
+		return r.Status().Patch(ctx, current, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
+	})
 }
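To illustrate why the retry needs a fresh read on every attempt, here is a stdlib-only sketch of the pattern. It is a model under stated assumptions: `store`, `errConflict`, `retryOnConflict`, and `setCondition` are hypothetical stand-ins for the API server, a 409 response, and `retry.RetryOnConflict`, and unlike the real helper this loop does not sleep or back off between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict models a 409 returned when the caller's resourceVersion is stale.
var errConflict = errors.New("conflict: stale resourceVersion")

// store is a hypothetical in-memory stand-in for the API server's copy.
type store struct {
	resourceVersion int
	conditions      map[string]string
}

// get returns the current resourceVersion, like a fresh read.
func (s *store) get() int { return s.resourceVersion }

// patch writes one condition only if rv matches (optimistic lock).
func (s *store) patch(rv int, condType, status string) error {
	if rv != s.resourceVersion {
		return errConflict
	}
	s.conditions[condType] = status
	s.resourceVersion++
	return nil
}

// retryOnConflict reruns fn while it reports a conflict, up to maxAttempts.
func retryOnConflict(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// setCondition re-reads inside the closure so each attempt uses a current
// resourceVersion. interfere simulates another controller writing once,
// between this writer's read and its patch.
func setCondition(s *store, condType, status string, interfere func()) (int, error) {
	attempts := 0
	err := retryOnConflict(5, func() error {
		attempts++
		rv := s.get()
		if interfere != nil {
			interfere()
			interfere = nil
		}
		return s.patch(rv, condType, status)
	})
	return attempts, err
}

func demo() (int, map[string]string, error) {
	s := &store{resourceVersion: 1, conditions: map[string]string{}}
	other := func() { _ = s.patch(s.get(), "Available", "True") }
	attempts, err := setCondition(s, "EtcdBackupSucceeded", "True", other)
	return attempts, s.conditions, err
}

func main() {
	attempts, conds, err := demo()
	fmt.Println(attempts, conds, err)
}
```

In this model the first attempt loses the race and the second succeeds, with both writers' conditions intact. Reusing the stale resourceVersion inside the loop would conflict on every attempt.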

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/etcdbackup/reconciler.go` around lines 302 - 305, The patch that sets meta.SetStatusCondition and calls r.Status().Patch(hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{})) should be wrapped in retry.RetryOnConflict so optimistic-lock 409s are retried: on each retry re-get the latest HCP (fresh read into a new variable), reapply condition.ObservedGeneration = hcp.Generation and meta.SetStatusCondition(&hcp.Status.Conditions, condition) against the fresh HCP, then call r.Status().Patch with client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}) (or use a new merge-from base as appropriate) until success or non-conflict error; ensure errors are returned if retries exhaust.

codecov · 2026-05-20T23:41:49Z

Codecov Report

❌ Patch coverage is 64.19753% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.43%. Comparing base (d86f3d4) to head (9995cf5).
⚠️ Report is 17 commits behind head on main.

Files with missing lines	Patch %	Lines
...rconfigoperator/controllers/hcpstatus/hcpstatus.go	70.49%	16 Missing and 2 partials ⚠️
...ostedcontrolplane/hostedcontrolplane_controller.go	22.22%	5 Missing and 2 partials ⚠️
...rconfigoperator/controllers/resources/resources.go	42.85%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8562      +/-   ##
==========================================
+ Coverage   40.34%   40.43%   +0.08%     
==========================================
  Files         755      755              
  Lines       93167    93298     +131     
==========================================
+ Hits        37587    37723     +136     
+ Misses      52877    52871       -6     
- Partials     2703     2704       +1

Files with missing lines	Coverage Δ
...hift-operator/controllers/etcdbackup/reconciler.go	`76.81% <100.00%> (+0.03%)`	⬆️
...ition-server/controllers/local_ignitionprovider.go	`38.14% <100.00%> (+0.08%)`	⬆️
...rconfigoperator/controllers/resources/resources.go	`55.39% <42.85%> (+0.01%)`	⬆️
...ostedcontrolplane/hostedcontrolplane_controller.go	`45.05% <22.22%> (+0.02%)`	⬆️
...rconfigoperator/controllers/hcpstatus/hcpstatus.go	`23.36% <70.49%> (+23.36%)`	⬆️

... and 3 files with indirect coverage changes

Flag	Coverage Δ
cmd-support	`34.44% <ø> (+0.13%)`	⬆️
cpo-hostedcontrolplane	`41.77% <22.22%> (+<0.01%)`	⬆️
cpo-other	`40.48% <67.64%> (+0.34%)`	⬆️
hypershift-operator	`50.73% <100.00%> (+<0.01%)`	⬆️
other	`31.54% <100.00%> (+<0.01%)`	⬆️

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ignition-server/controllers/local_ignitionprovider.go (1)
767-788: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use retry.RetryOnConflict to prevent silent condition update loss on 409 conflicts.

This function uses optimistic locking but doesn't retry on conflicts. Since runMCO at lines 450-452 only logs the error, any 409 conflict will silently drop the IgnitionServerValidReleaseInfo condition update. Wrap the patch in retry.RetryOnConflict(retry.DefaultBackoff, func() error {...}) with a fresh HCP read inside the closure, following the pattern used in support/k8sutil.UpdateObject and elsewhere in the codebase.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ignition-server/controllers/local_ignitionprovider.go` around lines 767 -
788, The patch of hostedControlPlane’s status can silently fail on 409
conflicts; wrap the r.Client.Status().Patch call in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) and inside the
closure re-read the current HostedControlPlane (hostedControlPlane) from the API
to compute/merge the IgnitionServerValidReleaseInfo condition (using
releaseImageProvider.GetMissingImages() as before), then call
r.Client.Status().Patch (or reuse support/k8sutil.UpdateObject pattern)
returning any error so conflicts are retried; keep originalHCP only for the
MergeFrom base inside the closure.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go`:
- Around line 20-161: Add parallelization by calling t.Parallel() at the start
of the top-level TestMergeFromPatchDoesNotStompUnrelatedFields function and
inside each subtest's closure (the anonymous func passed to t.Run).
Specifically, insert t.Parallel() as the first statement in
TestMergeFromPatchDoesNotStompUnrelatedFields and as the first statement in each
subtest function (the func(t *testing.T) for the three t.Run blocks) so the test
and its subtests run with Go's t.Parallel.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6b55879f-4645-4cc2-9269-fc6811eee015

📥 Commits

Reviewing files that changed from the base of the PR and between 58ac9ee and 10d5332.

📒 Files selected for processing (6)

control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
hypershift-operator/controllers/etcdbackup/reconciler.go
ignition-server/controllers/local_ignitionprovider.go

jkyros · 2026-05-21T00:18:53Z

/test e2e-aws-autonode

Right now we have multiple controllers writing HCP status, and several of them do full Update() calls instead of patches, which means the whole object gets overwritten. This creates race conditions in which fields such as vcpus get wiped from the status object when something else writes the object. This swaps all those Status().Update() calls for Patch + MergeFrom + optimistic locks, so the controllers should get a 409 and retry instead of stomping on fields updated by other controllers.

Signed-off-by: John Kyros <jkyros@redhat.com>
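The difference between a full update and a merge patch can be sketched with plain maps. This is a simplified model of JSON merge patch semantics, not the controller-runtime implementation: `diff`, `apply`, and `simulate` are hypothetical helpers, and the `ready` and `autoNode.vcpus` keys are illustrative rather than the real HCP status schema. The optimistic lock is left out to keep the focus on which fields each kind of write touches.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// diff returns only the keys that differ between original and modified.
// Keys removed in modified map to nil, as in a JSON merge patch.
func diff(original, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, mv := range modified {
		ov, ok := original[k]
		om, oIsMap := ov.(map[string]any)
		mm, mIsMap := mv.(map[string]any)
		switch {
		case !ok:
			patch[k] = mv
		case oIsMap && mIsMap:
			if sub := diff(om, mm); len(sub) > 0 {
				patch[k] = sub
			}
		case !reflect.DeepEqual(ov, mv):
			patch[k] = mv
		}
	}
	for k := range original {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

// apply merges patch into target: nil deletes, maps merge, the rest replaces.
func apply(target, patch map[string]any) {
	for k, pv := range patch {
		if pv == nil {
			delete(target, k)
			continue
		}
		if pm, ok := pv.(map[string]any); ok {
			tm, tIsMap := target[k].(map[string]any)
			if !tIsMap {
				tm = map[string]any{}
				target[k] = tm
			}
			apply(tm, pm)
			continue
		}
		target[k] = pv
	}
}

func simulate() (afterUpdate, afterPatch map[string]any) {
	// Server copy, including a field written by another controller.
	server := map[string]any{"ready": false, "autoNode": map[string]any{"vcpus": 8}}
	// This writer's snapshot does not contain autoNode.
	snapshot := map[string]any{"ready": false}
	modified := map[string]any{"ready": true}

	afterUpdate = modified // a full update replaces the whole object
	afterPatch = server
	apply(afterPatch, diff(snapshot, modified)) // the patch carries only "ready"
	return afterUpdate, afterPatch
}

func main() {
	u, p := simulate()
	uj, _ := json.Marshal(u)
	pj, _ := json.Marshal(p)
	fmt.Println("after full update:", string(uj))
	fmt.Println("after merge patch:", string(pj))
}
```

In the model, the full update loses `autoNode` because the writer never had it, while the patch leaves it alone because the computed diff never mentions it.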

hypershift-jira-solve-ci · 2026-05-21T07:33:59Z

Test Failure Analysis Complete

Job Information

Prow Job: pull-ci-openshift-hypershift-main-e2e-aws-autonode
Build ID: 2057254802236116992
Target: e2e-aws-autonode
PR: #8562 (OCPBUGS-86306: Stop controllers fighting over HCP status)
Failed Test: TestKarpenter/ValidateHostedCluster
Duration: 2502s (41m42s) — timed out after 30m rollout wait

Test Failure Analysis

Error

util.go:615: Failed to wait for HostedCluster e2e-clusters-zkl9p/karpenter-xq47w to rollout in 30m0s: context deadline exceeded
eventually.go:225: observed *v1beta1.HostedCluster e2e-clusters-zkl9p/karpenter-xq47w invalid at RV 27888 after 30m0s: wanted most recent version history to have state Completed, has state Partial

Incorrect conditions:
 - wanted ClusterVersionProgressing=False, got ClusterVersionProgressing=Unknown: StatusUnknown(Condition not found in the CVO.)
 - wanted ClusterVersionSucceeding=True, got ClusterVersionSucceeding=Unknown: StatusUnknown(Condition not found in the CVO.)
 - wanted ClusterVersionReleaseAccepted=True, got ClusterVersionReleaseAccepted=Unknown: StatusUnknown(Condition not found in the CVO.)
 - wanted ClusterVersionAvailable=True, got ClusterVersionAvailable=Unknown: StatusUnknown(Condition not found in the CVO.)

Summary

Two independent failures — neither caused by PR #8562's code changes.

1. e2e-aws-autonode (Prow): The TestKarpenter/ValidateHostedCluster test failed because the Cluster Version Operator (CVO) in the hosted cluster never completed its rollout within 30 minutes. The version history remained in Partial state and all ClusterVersion* conditions stayed Unknown with message "Condition not found in the CVO." The test also hit API client rate limiting (client rate limiter Wait returned an error: context deadline exceeded), suggesting resource pressure on the management cluster. The PR's Status().Update() → Status().Patch() changes are uninvolved — the hcpstatus controller correctly reports Unknown when CVO conditions are absent; the root issue is the CVO never set them.

2. codecov/patch: The PR achieved 37.50% diff coverage against a target of 40.34%. The new test file (hcpstatus_test.go, 161 lines) provides good coverage of patch semantics, but several modified files (hostedcontrolplane_controller.go, resources.go, local_ignitionprovider.go, reconciler.go) contain patching changes in code paths that are not exercised by unit tests, pulling patch coverage below the threshold.

Root Cause

e2e-aws-autonode — Pre-existing flake, unrelated to PR #8562:

The CVO in the hosted cluster failed to complete its initial rollout. The failure sequence:

HostedCluster karpenter-xq47w was created successfully (24s)
Guest API server became reachable after 2m24s (with initial DNS resolution failures and i/o timeouts — normal during bootstrap)
2 worker nodes joined and became ready (8m24s)
The test waited 30 minutes for the HostedCluster rollout to reach state: Completed, but it remained at state: Partial
All CVO conditions (ClusterVersionProgressing, ClusterVersionAvailable, ClusterVersionSucceeding, ClusterVersionReleaseAccepted) were Unknown because the CVO in the hosted cluster never populated its conditions on the ClusterVersion object
During this wait, the test also encountered client rate limiter Wait returned an error: context deadline exceeded, indicating API server throttling prevented even basic HC status polling

The PR changes the hcpstatus controller from Status().Update() to Status().Patch() with optimistic locking. This controller reads CVO conditions from the hosted cluster's ClusterVersion object and writes them to HCP status. The Unknown status is the correct behavior when CVO conditions are absent — it means the controller ran and found no conditions, not that the controller failed. The CVO conditions being absent is a hosted cluster infrastructure issue (CVO deployment never completed), not a controller write mechanism issue.

codecov/patch — Coverage gap in modified files:

The PR modified 6 files but added unit tests only for hcpstatus.go. The other files (hostedcontrolplane_controller.go, resources.go, local_ignitionprovider.go, reconciler.go) contain Status().Update() → Status().Patch() changes in code paths that are integration-tested by e2e but not unit-tested, resulting in 37.50% patch coverage (below the 40.34% target).

Recommendations

e2e-aws-autonode: Retry the job — this is a pre-existing infrastructure flake where the hosted cluster CVO failed to complete its initial rollout within 30 minutes. The failure is unrelated to the PR's Update→Patch changes. The client rate limiter errors suggest management cluster resource pressure during this run.
codecov/patch: To resolve the coverage gate, consider adding unit tests for the modified patch paths in:
- hostedcontrolplane_controller.go — reconcileValidIDPConfigurationCondition and removeCloudResources
- resources.go — reconcileClusterRecovery and destroyCloudResources
- reconciler.go — updateHCPBackupCondition
- local_ignitionprovider.go — reconcileValidReleaseInfoCondition
Alternatively, if the project treats codecov/patch as non-blocking, no action is needed.
Consider adding retry-on-conflict logic (as noted by CodeRabbit review): The optimistic lock patches can fail with 409 Conflict errors when multiple controllers update HCP status concurrently. Adding retry.RetryOnConflict would make the patches more resilient, though the controller-runtime reconcile loop provides built-in retry via requeue.

Evidence

Evidence	Detail
Test name	`TestKarpenter/ValidateHostedCluster`
Failure mode	30m timeout waiting for rollout `state: Completed`, stuck at `state: Partial`
CVO conditions	All `Unknown: StatusUnknown(Condition not found in the CVO.)` — CVO never set them
HC status	`Available=True`, `Progressing=False`, `EtcdAvailable=True`, `KubeAPIServerAvailable=True`
Rate limiter	`client rate limiter Wait returned an error: context deadline exceeded` during rollout wait
Version history	`Partial` — CVO install never completed
API connectivity	Initial DNS failures + i/o timeouts resolved after 2m24s (normal bootstrap)
Nodes	2 nodes became ready successfully (8m24s)
PR changes	`Status().Update()` → `Status().Patch(MergeFromWithOptimisticLock)` — no CVO condition logic changed
codecov/patch	37.50% diff coverage vs 40.34% target — insufficient unit test coverage on modified files
codecov/project	40.34% (+0.00%) — overall project coverage unaffected

So our conversion from Update() to Patch() was clever, but our patch calls can't write a nil; the field just gets omitted, and we have fields where that nil is normal, expected, and necessary, so this tries to work around it. This is gross, but if it works, at least we know we understand the problem.

Signed-off-by: John Kyros <jkyros@redhat.com>
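One general property of JSON merge patches (RFC 7386) may be relevant to the behavior described here: a null in the patch means "remove this key", so a merge patch cannot leave an explicit null stored. The sketch below shows that with the standard library. It is only an illustration: `apply` and `demo` are hypothetical helpers, `expiresAt` is a made-up field name, and whether this is the exact mechanism behind the failure in this PR is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apply merges patch into target with merge-patch semantics:
// nil deletes the key, nested maps merge, everything else replaces.
func apply(target, patch map[string]any) {
	for k, pv := range patch {
		if pv == nil {
			delete(target, k)
			continue
		}
		if pm, ok := pv.(map[string]any); ok {
			tm, tIsMap := target[k].(map[string]any)
			if !tIsMap {
				tm = map[string]any{}
				target[k] = tm
			}
			apply(tm, pm)
			continue
		}
		target[k] = pv
	}
}

// demo patches a stored object with {"expiresAt": null} and reports the
// resulting JSON and whether the key still exists afterwards.
func demo() (string, bool, error) {
	stored := map[string]any{}
	if err := json.Unmarshal([]byte(`{"ready":true,"expiresAt":"2026-05-21T00:00:00Z"}`), &stored); err != nil {
		return "", false, err
	}
	patch := map[string]any{}
	if err := json.Unmarshal([]byte(`{"expiresAt":null}`), &patch); err != nil {
		return "", false, err
	}
	apply(stored, patch)

	out, err := json.Marshal(stored)
	_, present := stored["expiresAt"]
	return string(out), present, err
}

func main() {
	out, present, err := demo()
	fmt.Println(out, present, err)
}
```

After the patch the key is absent rather than null. A schema that treats a field as required but nullable would accept `"expiresAt": null` and reject the missing key, which is one way a switch from Update() to Patch() could change what gets stored.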

jkyros · 2026-05-21T17:18:48Z

/test e2e-aws-autonode

openshift-ci · 2026-05-21T18:49:54Z

@jkyros: all tests passed!

jkyros · 2026-05-21T20:17:52Z

Ugh that passed. Great. There's something in here with Patch and fields that are +required +nullable that I think breaks it if we don't mess with it. The grotesque way I solved it with the JSON patch is not ideal. Trying to do better.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2026

openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 20, 2026

openshift-ci Bot added the do-not-merge/needs-area label May 20, 2026

coderabbitai Bot reviewed May 20, 2026

jkyros force-pushed the karpenter-fix-hcp-status-race-oplocks branch from 58ac9ee to 10d5332 Compare May 20, 2026 23:59

coderabbitai Bot reviewed May 21, 2026

Comment thread control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go Outdated

jkyros force-pushed the karpenter-fix-hcp-status-race-oplocks branch from 10d5332 to 9995cf5 Compare May 21, 2026 16:37

hypershift-jira-solve-ci Bot mentioned this pull request May 22, 2026

OCPBUGS-85151: Re-enable serviceaccount-pull-secrets controller when registry managementState changes from Removed #8522

	if err := h.mgtClusterClient.Status().Patch(ctx, hcp, crclient.MergeFromWithOptions(originalHCP, crclient.MergeFromWithOptimisticLock{})); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to patch hcp status: %w", err)
	}
