OCPBUGS-86306: Stop controllers fighting over HCP status #8562

Draft
jkyros wants to merge 2 commits into openshift:main from jkyros:karpenter-fix-hcp-status-race-oplocks

@jkyros
Member

@jkyros jkyros commented May 20, 2026

What this PR does / why we need it:

  • Multiple controllers write to the HCP's status with Status().Update() calls, which replace the entire object.
  • Depending on what the controller performing the Update() understands and sees, and when it performs the update, it can blow away fields such as vcpus from the autonode status that we added in AUTOSCALE-615: include Karpenter node vCPUs in billing metric #8265.
  • This replaces the Status().Update() calls with Patch calls that use client.MergeFromWithOptions() together with client.MergeFromWithOptimisticLock{}, so that a full-object update no longer wipes other controllers' status updates.
  • Testing this explicitly is unpleasant without some kind of refactor, because the parts of the controllers that write the status aren't set up for easy testing. Still, the issue is clearly possible given that multiple controllers use Update() calls on the same object.
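
To make the failure mode concrete, here is a minimal, stdlib-only sketch of one way a full-object write can drop a field while a diff-based patch does not. It is a toy model, not the PR's code: the server type, the read/update/patch methods, and the "autoNode.vcpus" key are all hypothetical, and it assumes the writer's types predate the vcpus field, so that field is missing from what the writer reads.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: stale resourceVersion")

// server is a toy stand-in for the API server holding one object's status.
type server struct {
	resourceVersion int
	status          map[string]string
}

// read returns the current version plus only the status keys the caller's
// types know about, mimicking a typed client that drops unknown fields.
func (s *server) read(known ...string) (int, map[string]string) {
	view := map[string]string{}
	for _, k := range known {
		if v, ok := s.status[k]; ok {
			view[k] = v
		}
	}
	return s.resourceVersion, view
}

// update replaces the entire status with the caller's view.
func (s *server) update(rv int, status map[string]string) error {
	if rv != s.resourceVersion {
		return errConflict
	}
	s.status = status
	s.resourceVersion++
	return nil
}

// patch applies only the changed keys and, like a merge patch with an
// optimistic lock, is rejected when the caller's resourceVersion is stale.
func (s *server) patch(rv int, changes map[string]string) error {
	if rv != s.resourceVersion {
		return errConflict
	}
	for k, v := range changes {
		s.status[k] = v
	}
	s.resourceVersion++
	return nil
}

// diff returns keys that were added or changed (deletions are ignored here).
func diff(before, after map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range after {
		if old, ok := before[k]; !ok || old != v {
			out[k] = v
		}
	}
	return out
}

func newServer() *server {
	return &server{resourceVersion: 1, status: map[string]string{
		"conditions":     "Available=False",
		"autoNode.vcpus": "16",
	}}
}

// writeWithUpdate sends back everything the writer read, and nothing else.
func writeWithUpdate(s *server) error {
	rv, view := s.read("conditions")
	view["conditions"] = "Available=True"
	return s.update(rv, view)
}

// writeWithPatch sends only the difference between the read and the change.
func writeWithPatch(s *server) error {
	rv, original := s.read("conditions")
	modified := map[string]string{}
	for k, v := range original {
		modified[k] = v
	}
	modified["conditions"] = "Available=True"
	return s.patch(rv, diff(original, modified))
}

func main() {
	a := newServer()
	_ = writeWithUpdate(a)
	_, present := a.status["autoNode.vcpus"]
	fmt.Println("after Update, vcpus present:", present)

	b := newServer()
	_ = writeWithPatch(b)
	_, present = b.status["autoNode.vcpus"]
	fmt.Println("after Patch, vcpus present:", present)

	// A patch computed against an old read is rejected rather than applied.
	fmt.Println("stale patch:", b.patch(1, map[string]string{"conditions": "x"}))
}
```

In this model the patch path leaves keys the writer never saw untouched, and the version check turns a write based on an outdated read into an explicit conflict instead of a silent overwrite.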

Which issue(s) this PR fixes:

Fixes OCPBUGS-86306

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Refactor

    • Switched status persistence across multiple controllers to optimistic-locking status patches for more robust concurrent updates and clearer success/error logs.
  • Tests

    • Added tests validating status patch generation and behavior: include intended fields, preserve unrelated status fields, and retain full condition arrays when updating a single condition.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label May 20, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference and jira/valid-bug labels May 20, 2026
@openshift-ci-robot

@jkyros: This pull request references Jira Issue OCPBUGS-86306, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cf90fa5a-fc68-4d1a-bf6a-6c30a5364c66

📥 Commits

Reviewing files that changed from the base of the PR and between 10d5332 and 9995cf5.

📒 Files selected for processing (7)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • hypershift-operator/controllers/etcdbackup/reconciler.go
  • hypershift-operator/controllers/etcdbackup/reconciler_test.go
  • ignition-server/controllers/local_ignitionprovider.go

📝 Walkthrough


This pull request replaces direct HostedControlPlane status updates with optimistic-locking status patches across multiple controllers. The hcpstatus controller now generates RFC‑6902 JSON Patch ops (including a resourceVersion test) and issues Status().Patch. Other controllers (hostedcontrolplane, resources, etcdbackup, ignition-server) deep-copy original HCP objects and persist condition changes via Status().Patch with MergeFromWithOptimisticLock. New unit tests validate patch generation and backup-condition patch behavior.
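
As a rough illustration of what "RFC 6902 JSON Patch ops including a resourceVersion test" can look like on the wire, the sketch below builds such a patch with the standard library only. sketchStatusPatch and its parameters are hypothetical names for this example and are not the PR's helper; it assumes simple field names that need no JSON Pointer escaping and an existing /status object.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// patchOp is a single RFC 6902 operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// sketchStatusPatch (hypothetical) emits a "test" op that pins the
// resourceVersion, followed by one "add" op per status field. Per RFC 6902,
// "add" on an existing object member replaces its value.
func sketchStatusPatch(resourceVersion string, fields map[string]interface{}) ([]byte, error) {
	ops := []patchOp{{Op: "test", Path: "/metadata/resourceVersion", Value: resourceVersion}}

	// Sort keys so the generated patch is deterministic.
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		ops = append(ops, patchOp{Op: "add", Path: "/status/" + k, Value: fields[k]})
	}
	return json.Marshal(ops)
}

func main() {
	patch, err := sketchStatusPatch("42", map[string]interface{}{"ready": true})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

If the "test" op fails because the stored resourceVersion has moved on, the whole patch is rejected, which is what gives the patch its optimistic-locking behavior; fields not named in any op are left alone.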

Suggested reviewers

  • enxebre
  • csrwng
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality | ⚠️ Warning | TestUpdateHCPBackupCondition lacks messages on critical assertions: cond.Status, cond.Reason, cond.ObservedGeneration, and FindStatusCondition checks. | Add meaningful messages to all critical assertions in TestUpdateHCPBackupCondition to help diagnose test failures. Example: g.Expect(cond.Status).To(Equal(...), "EtcdBackupSucceeded condition status mismatch").

✅ Passed checks (10 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'OCPBUGS-86306: Stop controllers fighting over HCP status' clearly and concisely summarizes the main change: replacing Status().Update() calls with optimistic-locking patches to prevent concurrent status conflicts.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | All 9 test cases (7 in TestBuildStatusPatch, 2 in TestUpdateHCPBackupCondition) use stable, deterministic test names with only static descriptive strings, no dynamic values or generated identifiers.
Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests (It, Describe, Context, When) added. PR adds only standard Go unit tests: TestBuildStatusPatch and TestUpdateHCPBackupCondition using testing.T and Gomega.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The PR adds two standard Go unit tests (TestBuildStatusPatch, TestUpdateHCPBackupCondition), not Ginkgo e2e tests. SNO compatibility check applies only to Ginkgo tests, so not applicable here.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR only changes status update mechanism (Update→Patch with optimistic locking). No new scheduling constraints, affinity rules, or topology assumptions introduced.
Ote Binary Stdout Contract | ✅ Passed | No violations detected. Changes involve only Kubernetes client Status().Patch calls in controller reconcilers. No process-level stdout writes, fmt.Print calls, or Ginkgo suite functions.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests added. PR adds only standard Go unit tests (TestBuildStatusPatch, TestUpdateHCPBackupCondition) using testing.T, not Ginkgo framework.


@openshift-ci openshift-ci Bot added the area/control-plane-operator and area/hypershift-operator labels and removed the do-not-merge/needs-area label May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jkyros
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

@jkyros: This pull request references Jira Issue OCPBUGS-86306, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Summary by CodeRabbit

  • Refactor

  • Improved status condition handling across hosted cluster controllers for more reliable concurrent updates.

  • Tests

  • Added test coverage for status condition patching behavior to ensure accurate field updates without unintended side effects.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ignition-server/controllers/local_ignitionprovider.go (1)

768-788: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap reconcileValidReleaseInfoCondition in retry logic for optimistic-lock conflicts.

This helper now uses Status().Patch(..., client.MergeFromWithOptimisticLock{}), which returns a 409 Conflict when another writer updates the HCP status concurrently. However, the caller at lines 451-452 only logs the error and continues, so IgnitionServerValidReleaseInfo can stay stale indefinitely; there is no guaranteed near-term retry.

Wrap the helper logic in retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) and re-fetch the HostedControlPlane inside the retry block. This pattern is already used in support/k8sutil/object.go and aligns with Kubernetes best practices.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ignition-server/controllers/local_ignitionprovider.go` around lines 768 -
788, Wrap the existing status-patch logic in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) so
optimistic-lock 409s are retried: inside the retry block re-fetch the latest
HostedControlPlane using r.Client.Get into hostedControlPlane, create
originalHCP := hostedControlPlane.DeepCopy(), compute/set the
IgnitionServerValidReleaseInfo condition (use
releaseImageProvider.GetMissingImages() and meta.SetStatusCondition as before),
then call r.Client.Status().Patch(ctx, &hostedControlPlane,
client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
and return the patch error; let RetryOnConflict surface a final error if retries
fail.
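
The retry shape described above can be sketched without any Kubernetes dependencies. Everything here is a simplified stand-in: retryOnConflict only imitates the idea behind client-go's retry.RetryOnConflict (no backoff), and fakeAPI, hcp, and setValidReleaseInfo are hypothetical names. The point it tries to show is that the re-fetch has to happen inside the retried function.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: object has been modified")

// retryOnConflict reruns fn while it reports a conflict, up to attempts times.
// It is a simplified stand-in for a conflict-retry helper, without backoff.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// hcp is a toy object with a version and a set of conditions.
type hcp struct {
	resourceVersion int
	conditions      map[string]string
}

// fakeAPI simulates a server where other writers get in first a few times.
type fakeAPI struct {
	stored    hcp
	interfere int // number of competing writes still to be injected
}

// get returns a copy of the stored object, like a fresh read.
func (a *fakeAPI) get() hcp {
	conds := make(map[string]string, len(a.stored.conditions))
	for k, v := range a.stored.conditions {
		conds[k] = v
	}
	return hcp{resourceVersion: a.stored.resourceVersion, conditions: conds}
}

// patchCondition applies the change only if the caller's version is current.
func (a *fakeAPI) patchCondition(rv int, condType, status string) error {
	if a.interfere > 0 {
		a.interfere--
		a.stored.resourceVersion++ // another controller wrote first
	}
	if rv != a.stored.resourceVersion {
		return errConflict
	}
	a.stored.conditions[condType] = status
	a.stored.resourceVersion++
	return nil
}

// setValidReleaseInfo re-reads the object on every attempt before patching.
func setValidReleaseInfo(api *fakeAPI) (int, error) {
	tries := 0
	err := retryOnConflict(5, func() error {
		tries++
		latest := api.get() // re-fetch inside the retry block
		return api.patchCondition(latest.resourceVersion, "IgnitionServerValidReleaseInfo", "True")
	})
	return tries, err
}

func main() {
	api := &fakeAPI{
		stored:    hcp{resourceVersion: 1, conditions: map[string]string{}},
		interfere: 2,
	}
	tries, err := setValidReleaseInfo(api)
	fmt.Println("attempts:", tries, "error:", err)
}
```

If the read were hoisted out of the closure, every retry would resend the same stale version and conflict again, so the retry would add latency without ever succeeding.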
🧹 Nitpick comments (3)
control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go (2)

19-160: ⚡ Quick win

Add parallelization to align with unit-test execution guidance.

This test suite does not use t.Parallel(), so it misses the repository’s unit-test parallelism requirement.

Suggested change
 func TestMergeFromPatchDoesNotStompUnrelatedFields(t *testing.T) {
+	t.Parallel()
 	t.Run("When hcpstatus controller patches conditions and versionStatus, it should not include autoNode in the patch", func(t *testing.T) {
+		t.Parallel()
 		g := NewWithT(t)
@@
 	t.Run("When hcpstatus controller makes no changes, it should produce an empty patch", func(t *testing.T) {
+		t.Parallel()
 		g := NewWithT(t)
@@
 	t.Run("When a condition-only controller patches, the conditions array should contain ALL conditions from the read", func(t *testing.T) {
+		t.Parallel()
 		g := NewWithT(t)

As per coding guidelines, "Use race detection and parallel execution for unit tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go`
around lines 19 - 160, The test suite
TestMergeFromPatchDoesNotStompUnrelatedFields is missing parallelization; add
t.Parallel() at the start of TestMergeFromPatchDoesNotStompUnrelatedFields and
inside each subtest closure (the three t.Run anonymous funcs) so each subtest
and the parent test run with testing parallelism and race detection enabled;
look for the TestMergeFromPatchDoesNotStompUnrelatedFields function and the
three t.Run blocks to insert the t.Parallel() calls.

151-152: ⚡ Quick win

Avoid panic-prone type assertions in tests.

These direct assertions can panic and hide the real failure context. Prefer checked assertions so failures stay diagnostic.

Suggested change
-		status := patchMap["status"].(map[string]interface{})
-		conditions := status["conditions"].([]interface{})
+		status, ok := patchMap["status"].(map[string]interface{})
+		g.Expect(ok).To(BeTrue(), "patch should contain a status object")
+		conditions, ok := status["conditions"].([]interface{})
+		g.Expect(ok).To(BeTrue(), "status.conditions should be a JSON array")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go`
around lines 151 - 152, Replace the unchecked type assertions on
patchMap["status"] and status["conditions"] with guarded checks so the test
fails with a clear message instead of panicking: retrieve the value using the
comma-ok form (e.g., statusIface, ok := patchMap["status"]; if !ok {
t.Fatalf(...) } then status, ok := statusIface.(map[string]interface{}); if !ok
{ t.Fatalf(...) }) and likewise check conditionsIface, ok :=
status["conditions"]; if !ok { t.Fatalf(...) } then conditions, ok :=
conditionsIface.([]interface{}); if !ok { t.Fatalf(...) }); alternatively use
your test helper (e.g., require.IsType/require.NotNil) around the same symbols
(patchMap, status, conditions) to make assertions explicit and diagnostic.
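
For reference, the comma-ok form suggested above looks like this outside of a test. conditionsFromPatch is a hypothetical helper written for this sketch; it decodes a merge-patch body and reports a descriptive error where an unchecked assertion would panic.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// conditionsFromPatch (hypothetical) extracts status.conditions from a JSON
// merge-patch body using checked type assertions.
func conditionsFromPatch(patch []byte) ([]interface{}, error) {
	var patchMap map[string]interface{}
	if err := json.Unmarshal(patch, &patchMap); err != nil {
		return nil, fmt.Errorf("patch is not a JSON object: %w", err)
	}

	// A missing key yields a nil interface, so the assertion fails cleanly.
	status, ok := patchMap["status"].(map[string]interface{})
	if !ok {
		return nil, errors.New("patch should contain a status object")
	}

	conditions, ok := status["conditions"].([]interface{})
	if !ok {
		return nil, errors.New("status.conditions should be a JSON array")
	}
	return conditions, nil
}

func main() {
	good := []byte(`{"status":{"conditions":[{"type":"A"},{"type":"B"}]}}`)
	conds, err := conditionsFromPatch(good)
	fmt.Println(len(conds), err)

	bad := []byte(`{"status":{"conditions":"oops"}}`)
	_, err = conditionsFromPatch(bad)
	fmt.Println(err)
}
```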
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go (1)

570-576: ⚡ Quick win

Skip no-op recovery status patches.

reconcileClusterRecovery always patches after meta.SetStatusCondition, even when the condition is unchanged. Because this path polls every 120 seconds, steady-state recovery checks become repeated write attempts and avoidable optimistic-lock conflicts on the same HCP status object. Guard on the boolean returned by meta.SetStatusCondition to skip writing unchanged conditions.

♻️ Suggested change
 	originalHCP := hcp.DeepCopy()
-	meta.SetStatusCondition(&hcp.Status.Conditions, *condition)
+	if !meta.SetStatusCondition(&hcp.Status.Conditions, *condition) {
+		if !finished {
+			return ctrl.Result{RequeueAfter: 120 * time.Second}, nil
+		}
+		return ctrl.Result{}, nil
+	}
 	log.Info("setting condition", "type", condition.Type, "status", condition.Status, "message", condition.Message)
 	if err := r.cpClient.Status().Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{})); err != nil {
 		return ctrl.Result{}, fmt.Errorf("failed to patch status on hcp for hosted cluster recovery: %w. Condition error message: %v", err, condition.Message)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go`
around lines 570 - 576, reconcileClusterRecovery is always patching HCP status
even when meta.SetStatusCondition didn't change the condition; capture the
boolean return from meta.SetStatusCondition and only call
r.cpClient.Status().Patch (using originalHCP and the same
MergeFromWithOptions/OptimisticLock) if that boolean is true to skip no-op
writes and avoid redundant optimistic-lock updates.
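
The guard suggested above can be illustrated with a small stand-in. setCondition below is a simplified imitation of the changed/unchanged boolean that the review attributes to meta.SetStatusCondition, and condition, writer, and reconcileOnce are hypothetical names; the sketch only shows how that boolean can gate the write.

```go
package main

import "fmt"

// condition is a pared-down status condition.
type condition struct {
	Type, Status, Message string
}

// setCondition inserts or updates c and reports whether anything changed.
func setCondition(conds *[]condition, c condition) bool {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			if (*conds)[i] == c {
				return false // identical condition already present
			}
			(*conds)[i] = c
			return true
		}
	}
	*conds = append(*conds, c)
	return true
}

// writer counts how many status writes would have been sent.
type writer struct{ writes int }

func (w *writer) patchStatus() { w.writes++ }

// reconcileOnce only writes when the condition actually changed.
func reconcileOnce(conds *[]condition, c condition, w *writer) {
	if !setCondition(conds, c) {
		return // no-op: skip the patch and the chance of a conflict
	}
	w.patchStatus()
}

func main() {
	var conds []condition
	w := &writer{}
	recovering := condition{Type: "Recovery", Status: "False", Message: "in progress"}

	// Three polls with the same condition produce a single write.
	for i := 0; i < 3; i++ {
		reconcileOnce(&conds, recovering, w)
	}
	fmt.Println("writes after steady state:", w.writes)

	reconcileOnce(&conds, condition{Type: "Recovery", Status: "True", Message: "done"}, w)
	fmt.Println("writes after change:", w.writes)
}
```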
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go`:
- Around line 79-80: The patch call on h.mgtClusterClient.Status().Patch (using
crclient.MergeFromWithOptions and MergeFromWithOptimisticLock) must treat
optimistic-lock conflicts differently: after the Patch returns an error, detect
conflicts with apierrors.IsConflict(err) and, if true, return reconcile.Result{}
with a nil error so the controller runtime can requeue with backoff; for
non-conflict errors continue returning the wrapped fmt.Errorf("failed to patch
hcp status: %w", err). Locate the error handling directly after the Patch
invocation in the hcp status reconciliation code and insert the
apierrors.IsConflict check there.

In `@hypershift-operator/controllers/etcdbackup/reconciler.go`:
- Around line 302-305: The patch that sets meta.SetStatusCondition and calls
r.Status().Patch(hcp, client.MergeFromWithOptions(originalHCP,
client.MergeFromWithOptimisticLock{})) should be wrapped in
retry.RetryOnConflict so optimistic-lock 409s are retried: on each retry re-get
the latest HCP (fresh read into a new variable), reapply
condition.ObservedGeneration = hcp.Generation and
meta.SetStatusCondition(&hcp.Status.Conditions, condition) against the fresh
HCP, then call r.Status().Patch with client.MergeFromWithOptions(originalHCP,
client.MergeFromWithOptimisticLock{}) (or use a new merge-from base as
appropriate) until success or non-conflict error; ensure errors are returned if
retries exhaust.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d0b2014e-1faa-4d10-a400-26d60e220b42

📥 Commits

Reviewing files that changed from the base of the PR and between c23dbb8 and 58ac9ee.

📒 Files selected for processing (6)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • hypershift-operator/controllers/etcdbackup/reconciler.go
  • ignition-server/controllers/local_ignitionprovider.go

Comment on lines +79 to +80
if err := h.mgtClusterClient.Status().Patch(ctx, hcp, crclient.MergeFromWithOptions(originalHCP, crclient.MergeFromWithOptimisticLock{})); err != nil {
return reconcile.Result{}, fmt.Errorf("failed to patch hcp status: %w", err)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file at the specified lines
cat -n control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go | sed -n '70,90p'

Repository: openshift/hypershift

Length of output: 1230


🏁 Script executed:

# Let's also see more context around the patch operation
cat -n control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go | sed -n '60,100p'

Repository: openshift/hypershift

Length of output: 2002


🏁 Script executed:

# Check if there are other patch operations in the file to understand the pattern
rg -n "Status\(\)\.Patch|MergeFromWithOptimisticLock" control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go -A 2 -B 2

Repository: openshift/hypershift

Length of output: 450


🏁 Script executed:

# Search for how conflicts are handled elsewhere in the codebase
rg -n "409|Conflict|optimistic" control-plane-operator/ --type go -i | head -20

Repository: openshift/hypershift

Length of output: 4005


🏁 Script executed:

# Check the controller-runtime and kubernetes patterns for conflict handling
rg -n "MergeFromWithOptimisticLock" . --type go | head -10

Repository: openshift/hypershift

Length of output: 1802


🏁 Script executed:

# Check the vendored controller-runtime to understand conflict handling patterns
cat vendor/sigs.k8s.io/controller-runtime/pkg/client/patch.go | sed -n '50,80p'

Repository: openshift/hypershift

Length of output: 1445


🏁 Script executed:

# Search for error handling patterns around optimistic locks
rg -n "apiutil|IsConflict|StatusError" control-plane-operator/ --type go | head -15

Repository: openshift/hypershift

Length of output: 1952


🏁 Script executed:

# Check if there are any examples of conflict detection in the codebase
rg -n "apierrors.IsConflict|StatusError" . --type go | head -20

Repository: openshift/hypershift

Length of output: 3023


🌐 Web query:

Kubernetes controller-runtime conflict 409 error handling optimistic locking best practices

💡 Result:

Handling 409 Conflict errors is an essential part of working with optimistic concurrency in Kubernetes [1][2]. When multiple clients (e.g., controllers, users, HPA) modify the same resource, the Kubernetes API server uses the resourceVersion field to ensure updates are based on the latest state [1][3]. If a conflict occurs, the API server rejects the update with a 409 Conflict status [3][4]. Best practices for handling these conflicts include:

1. Let the controller reconcile (recommended for most cases). In standard controller-runtime development, the most idiomatic approach is to return the error from your Reconcile function [5][3]. This allows controller-runtime's built-in work queue to catch the error, log it, and automatically re-enqueue the request with exponential backoff [6][7][8]. This avoids manual retry loops inside your reconciliation logic and keeps the controller simple and idempotent [9][10].

2. Use RetryOnConflict (for specific update operations). If you are performing a complex update within a reconciliation loop that must succeed before continuing, you can use the k8s.io/client-go/util/retry helper [5][11][12]. The helper reruns the supplied function on conflict; the function re-fetches the object to get the latest resourceVersion before attempting the update again [1][12][13]. Example:

    err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
        // 1. Re-fetch the latest version of the object
        if err := r.Get(ctx, req.NamespacedName, latestObj); err != nil {
            return err
        }
        // 2. Apply modifications to the fresh object
        latestObj.Status.Phase = "Running"
        // 3. Attempt the update
        return r.Status().Update(ctx, latestObj)
    })

3. Consider Server-Side Apply. For complex objects with many fields modified by multiple controllers, Server-Side Apply (SSA) is the preferred alternative to traditional Update calls [5]. SSA allows for field-level ownership, significantly reducing the frequency of 409 conflicts compared to full object updates [5][12].

Key principles:
  • Avoid manual, infinite retry loops.
  • Do not manually manipulate or suppress the resourceVersion field to bypass concurrency checks [2].
  • Separate spec updates from status updates; status subresources help minimize conflicts [2].
  • Leverage the framework: return errors to the reconciler to benefit from built-in rate-limiting and backoff [6][8].

Citations:


🏁 Script executed:

# Check for controller-runtime configuration or custom backoff settings
rg -n "DefaultRetry|ExponentialBackoff|SetupWithManager" control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go -A 3

Repository: openshift/hypershift

Length of output: 46


🏁 Script executed:

# Check if there's any custom error handling wrapper in the codebase
rg -n "apierrors.IsConflict" . --type go -B 3 -A 3 | head -40

Repository: openshift/hypershift

Length of output: 3994


Handle optimistic-lock conflicts distinctly from other patch failures.

The current code treats 409 conflicts from optimistic locking the same as other patch errors, which can create misleading error metrics and logs. Conflicts are expected under contention and should be handled separately. Detect conflicts using apierrors.IsConflict(err) and return nil error to allow the framework's work queue to naturally requeue with exponential backoff, keeping other patch errors as genuine failures. See reference implementation in vendor/sigs.k8s.io/cluster-api/util/patch/patch.go for this pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go`
around lines 79 - 80, The patch call on h.mgtClusterClient.Status().Patch (using
crclient.MergeFromWithOptions and MergeFromWithOptimisticLock) must treat
optimistic-lock conflicts differently: after the Patch returns an error, detect
conflicts with apierrors.IsConflict(err) and, if true, return a reconcile.Result
that requests a requeue (Requeue or RequeueAfter set) with a nil error so the
controller runtime schedules a retry; an empty reconcile.Result{} with a nil
error would not requeue. For non-conflict errors continue returning the wrapped
fmt.Errorf("failed to patch hcp status: %w", err). Locate the error handling
directly after the Patch invocation in the hcp status reconciliation code and
insert the apierrors.IsConflict check there.

Comment on lines +302 to +305
+	originalHCP := hcp.DeepCopy()
 	condition.ObservedGeneration = hcp.Generation
 	meta.SetStatusCondition(&hcp.Status.Conditions, condition)
-	return r.Status().Update(ctx, hcp)
+	return r.Status().Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd hypershift-operator/controllers/etcdbackup && wc -l reconciler.go

Repository: openshift/hypershift

Length of output: 82


🏁 Script executed:

# First, let's look at the updateHCPBackupCondition function
cd hypershift-operator/controllers/etcdbackup && sed -n '295,310p' reconciler.go

Repository: openshift/hypershift

Length of output: 968


🏁 Script executed:

# Now examine the callers at lines 193, 452, and 484
cd hypershift-operator/controllers/etcdbackup && sed -n '188,198p' reconciler.go

Repository: openshift/hypershift

Length of output: 396


🏁 Script executed:

cd hypershift-operator/controllers/etcdbackup && sed -n '447,457p' reconciler.go

Repository: openshift/hypershift

Length of output: 391


🏁 Script executed:

cd hypershift-operator/controllers/etcdbackup && sed -n '479,489p' reconciler.go

Repository: openshift/hypershift

Length of output: 393


🏁 Script executed:

# Check imports to verify retry is available
cd hypershift-operator/controllers/etcdbackup && head -30 reconciler.go

Repository: openshift/hypershift

Length of output: 950


Retry optimistic-lock conflicts instead of silently dropping the HCP condition update.

MergeFromWithOptimisticLock{} surfaces 409 Conflict errors when the HCP is concurrently modified, but all callers at lines 193, 452, and 484 only log the error and continue. A conflict can therefore leave the EtcdBackupSucceeded condition permanently stale on any of the three paths (in-progress, success, failure).

The adjacent function updateHostedClusterBackupURL already handles this correctly with retry.RetryOnConflict, reflecting the same concurrency concern. Wrap this function's patch in retry.RetryOnConflict with a fresh HCP read to safely retry on conflicts.

Proposed fix
 func (r *HCPEtcdBackupReconciler) updateHCPBackupCondition(ctx context.Context, hcp *hyperv1.HostedControlPlane, condition metav1.Condition) error {
-	originalHCP := hcp.DeepCopy()
-	condition.ObservedGeneration = hcp.Generation
-	meta.SetStatusCondition(&hcp.Status.Conditions, condition)
-	return r.Status().Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
+	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		current := &hyperv1.HostedControlPlane{}
+		if err := r.Get(ctx, client.ObjectKeyFromObject(hcp), current); err != nil {
+			return err
+		}
+
+		originalHCP := current.DeepCopy()
+		condition.ObservedGeneration = current.Generation
+		meta.SetStatusCondition(&current.Status.Conditions, condition)
+		return r.Status().Patch(ctx, current, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/etcdbackup/reconciler.go` around lines 302 -
305, The patch that sets meta.SetStatusCondition and calls r.Status().Patch(hcp,
client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{}))
should be wrapped in retry.RetryOnConflict so optimistic-lock 409s are retried:
on each retry re-get the latest HCP (fresh read into a new variable), reapply
condition.ObservedGeneration = hcp.Generation and
meta.SetStatusCondition(&hcp.Status.Conditions, condition) against the fresh
HCP, then call r.Status().Patch with client.MergeFromWithOptions(originalHCP,
client.MergeFromWithOptimisticLock{}) (or use a new merge-from base as
appropriate) until success or non-conflict error; ensure errors are returned if
retries exhaust.

@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 64.19753% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.43%. Comparing base (d86f3d4) to head (9995cf5).
⚠️ Report is 17 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rconfigoperator/controllers/hcpstatus/hcpstatus.go | 70.49% | 16 Missing and 2 partials ⚠️ |
| ...ostedcontrolplane/hostedcontrolplane_controller.go | 22.22% | 5 Missing and 2 partials ⚠️ |
| ...rconfigoperator/controllers/resources/resources.go | 42.85% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8562      +/-   ##
==========================================
+ Coverage   40.34%   40.43%   +0.08%     
==========================================
  Files         755      755              
  Lines       93167    93298     +131     
==========================================
+ Hits        37587    37723     +136     
+ Misses      52877    52871       -6     
- Partials     2703     2704       +1     
| Files with missing lines | Coverage Δ |
|---|---|
| ...hift-operator/controllers/etcdbackup/reconciler.go | 76.81% <100.00%> (+0.03%) ⬆️ |
| ...ition-server/controllers/local_ignitionprovider.go | 38.14% <100.00%> (+0.08%) ⬆️ |
| ...rconfigoperator/controllers/resources/resources.go | 55.39% <42.85%> (+0.01%) ⬆️ |
| ...ostedcontrolplane/hostedcontrolplane_controller.go | 45.05% <22.22%> (+0.02%) ⬆️ |
| ...rconfigoperator/controllers/hcpstatus/hcpstatus.go | 23.36% <70.49%> (+23.36%) ⬆️ |

... and 3 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| cmd-support | 34.44% <ø> (+0.13%) ⬆️ |
| cpo-hostedcontrolplane | 41.77% <22.22%> (+<0.01%) ⬆️ |
| cpo-other | 40.48% <67.64%> (+0.34%) ⬆️ |
| hypershift-operator | 50.73% <100.00%> (+<0.01%) ⬆️ |
| other | 31.54% <100.00%> (+<0.01%) ⬆️ |


@jkyros jkyros force-pushed the karpenter-fix-hcp-status-race-oplocks branch from 58ac9ee to 10d5332 Compare May 20, 2026 23:59
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ignition-server/controllers/local_ignitionprovider.go (1)

767-788: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use retry.RetryOnConflict to prevent silent condition update loss on 409 conflicts.

This function uses optimistic locking but doesn't retry on conflicts. Since runMCO at lines 450-452 only logs the error, any 409 conflict will silently drop the IgnitionServerValidReleaseInfo condition update. Wrap the patch in retry.RetryOnConflict(retry.DefaultBackoff, func() error {...}) with a fresh HCP read inside the closure, following the pattern used in support/k8sutil.UpdateObject and elsewhere in the codebase.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ignition-server/controllers/local_ignitionprovider.go` around lines 767 -
788, The patch of hostedControlPlane’s status can silently fail on 409
conflicts; wrap the r.Client.Status().Patch call in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) and inside the
closure re-read the current HostedControlPlane (hostedControlPlane) from the API
to compute/merge the IgnitionServerValidReleaseInfo condition (using
releaseImageProvider.GetMissingImages() as before), then call
r.Client.Status().Patch (or reuse support/k8sutil.UpdateObject pattern)
returning any error so conflicts are retried; keep originalHCP only for the
MergeFrom base inside the closure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go`:
- Around line 20-161: Add parallelization by calling t.Parallel() at the start
of the top-level TestMergeFromPatchDoesNotStompUnrelatedFields function and
inside each subtest's closure (the anonymous func passed to t.Run).
Specifically, insert t.Parallel() as the first statement in
TestMergeFromPatchDoesNotStompUnrelatedFields and as the first statement in each
subtest function (the func(t *testing.T) for the three t.Run blocks) so the test
and its subtests run with Go's t.Parallel.

---

Outside diff comments:
In `@ignition-server/controllers/local_ignitionprovider.go`:
- Around line 767-788: The patch of hostedControlPlane’s status can silently
fail on 409 conflicts; wrap the r.Client.Status().Patch call in
retry.RetryOnConflict(retry.DefaultBackoff, func() error { ... }) and inside the
closure re-read the current HostedControlPlane (hostedControlPlane) from the API
to compute/merge the IgnitionServerValidReleaseInfo condition (using
releaseImageProvider.GetMissingImages() as before), then call
r.Client.Status().Patch (or reuse support/k8sutil.UpdateObject pattern)
returning any error so conflicts are retried; keep originalHCP only for the
MergeFrom base inside the closure.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6b55879f-4645-4cc2-9269-fc6811eee015

📥 Commits

Reviewing files that changed from the base of the PR and between 58ac9ee and 10d5332.

📒 Files selected for processing (6)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/hcpstatus/hcpstatus_test.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • hypershift-operator/controllers/etcdbackup/reconciler.go
  • ignition-server/controllers/local_ignitionprovider.go

@jkyros
Member Author

jkyros commented May 21, 2026

/test e2e-aws-autonode

So right now we have multiple controllers writing HCP status, and
several of them are doing full Update() calls instead of patches, which
means that the whole object gets overwritten.

This results in race conditions where fields such as vcpus get
wiped from the status object when something else writes the object.

This just swaps all those Status().Update() with Patch + MergeFrom +
oplocks so the controllers should get 409'd and retry instead of
stomping on fields updated by other controllers.

Signed-off-by: John Kyros <jkyros@redhat.com>
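One way to picture the stomping described in this commit message is a writer whose Go type does not know about a field another controller owns. The sketch below is a standard-library-only simulation under that assumption: `writerStatus`, `diff`, `apply`, and `writeReady` are hypothetical names, the status is flattened to top-level keys, and `diff` is a simplified top-level-only stand-in for the patch that `client.MergeFrom` would compute.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// writerStatus is the status as an older writer understands it. It has no
// vcpus field, so that field is dropped when the writer decodes the object.
type writerStatus struct {
	Ready bool `json:"ready"`
}

// toMap round-trips a value through JSON to get a generic map.
func toMap(v any) map[string]any {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	m := map[string]any{}
	if err := json.Unmarshal(b, &m); err != nil {
		panic(err)
	}
	return m
}

// diff builds a top-level merge patch: changed keys carry their new value,
// keys removed by the writer carry nil.
func diff(original, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, v := range modified {
		if !reflect.DeepEqual(original[k], v) {
			patch[k] = v
		}
	}
	for k := range original {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

// apply merges a top-level patch into a copy of target.
func apply(target, patch map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range target {
		out[k] = v
	}
	for k, v := range patch {
		if v == nil {
			delete(out, k)
		} else {
			out[k] = v
		}
	}
	return out
}

// writeReady simulates the writer setting ready=true and returns what the
// server would hold after a full update versus after a merge patch.
func writeReady(server map[string]any) (updated, patched map[string]any) {
	raw, err := json.Marshal(server)
	if err != nil {
		panic(err)
	}
	var w writerStatus
	if err := json.Unmarshal(raw, &w); err != nil {
		panic(err)
	}
	original := toMap(w)
	w.Ready = true
	modified := toMap(w)
	return modified, apply(server, diff(original, modified))
}

func main() {
	server := map[string]any{"ready": false, "vcpus": 8}
	updated, patched := writeReady(server)
	fmt.Println("after full update:", updated)
	fmt.Println("after merge patch:", patched)
}
```

The full update stores only what the writer serialized, while the patch carries just the changed key, so the field owned by the other controller stays in place.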
@hypershift-jira-solve-ci

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-aws-autonode
  • Build ID: 2057254802236116992
  • Target: e2e-aws-autonode
  • PR: #8562 (OCPBUGS-86306: Stop controllers fighting over HCP status)
  • Failed Test: TestKarpenter/ValidateHostedCluster
  • Duration: 2502s (41m42s) — timed out after 30m rollout wait

Test Failure Analysis

Error

util.go:615: Failed to wait for HostedCluster e2e-clusters-zkl9p/karpenter-xq47w to rollout in 30m0s: context deadline exceeded
eventually.go:225: observed *v1beta1.HostedCluster e2e-clusters-zkl9p/karpenter-xq47w invalid at RV 27888 after 30m0s: wanted most recent version history to have state Completed, has state Partial

Incorrect conditions:
 - wanted ClusterVersionProgressing=False, got ClusterVersionProgressing=Unknown: StatusUnknown(Condition not found in the CVO.)
 - wanted ClusterVersionSucceeding=True, got ClusterVersionSucceeding=Unknown: StatusUnknown(Condition not found in the CVO.)
 - wanted ClusterVersionReleaseAccepted=True, got ClusterVersionReleaseAccepted=Unknown: StatusUnknown(Condition not found in the CVO.)
 - wanted ClusterVersionAvailable=True, got ClusterVersionAvailable=Unknown: StatusUnknown(Condition not found in the CVO.)

Summary

Two independent failures — neither caused by PR #8562's code changes.

1. e2e-aws-autonode (Prow): The TestKarpenter/ValidateHostedCluster test failed because the Cluster Version Operator (CVO) in the hosted cluster never completed its rollout within 30 minutes. The version history remained in Partial state and all ClusterVersion* conditions stayed Unknown with message "Condition not found in the CVO." The test also hit API client rate limiting (client rate limiter Wait returned an error: context deadline exceeded), suggesting resource pressure on the management cluster. The PR's Status().Update() → Status().Patch() changes are uninvolved: the hcpstatus controller correctly reports Unknown when CVO conditions are absent; the root issue is that the CVO never set them.

2. codecov/patch: The PR achieved 37.50% diff coverage against a target of 40.34%. The new test file (hcpstatus_test.go, 161 lines) provides good coverage of patch semantics, but several modified files (hostedcontrolplane_controller.go, resources.go, local_ignitionprovider.go, reconciler.go) contain patching changes in code paths that are not exercised by unit tests, pulling patch coverage below the threshold.

Root Cause

e2e-aws-autonode — Pre-existing flake, unrelated to PR #8562:

The CVO in the hosted cluster failed to complete its initial rollout. The failure sequence:

  1. HostedCluster karpenter-xq47w was created successfully (24s)
  2. Guest API server became reachable after 2m24s (with initial DNS resolution failures and i/o timeouts — normal during bootstrap)
  3. 2 worker nodes joined and became ready (8m24s)
  4. The test waited 30 minutes for the HostedCluster rollout to reach state: Completed, but it remained at state: Partial
  5. All CVO conditions (ClusterVersionProgressing, ClusterVersionAvailable, ClusterVersionSucceeding, ClusterVersionReleaseAccepted) were Unknown because the CVO in the hosted cluster never populated its conditions on the ClusterVersion object
  6. During this wait, the test also encountered client rate limiter Wait returned an error: context deadline exceeded, indicating API server throttling prevented even basic HC status polling

The PR changes the hcpstatus controller from Status().Update() to Status().Patch() with optimistic locking. This controller reads CVO conditions from the hosted cluster's ClusterVersion object and writes them to HCP status. The Unknown status is the correct behavior when CVO conditions are absent — it means the controller ran and found no conditions, not that the controller failed. The CVO conditions being absent is a hosted cluster infrastructure issue (CVO deployment never completed), not a controller write mechanism issue.

codecov/patch — Coverage gap in modified files:

The PR modified 6 files but added unit tests only for hcpstatus.go. The other files (hostedcontrolplane_controller.go, resources.go, local_ignitionprovider.go, reconciler.go) contain Status().Update() → Status().Patch() changes in code paths that are integration-tested by e2e but not unit-tested, resulting in 37.50% patch coverage (below the 40.34% target).

Recommendations
  1. e2e-aws-autonode: Retry the job — this is a pre-existing infrastructure flake where the hosted cluster CVO failed to complete its initial rollout within 30 minutes. The failure is unrelated to the PR's Update→Patch changes. The client rate limiter errors suggest management cluster resource pressure during this run.

  2. codecov/patch: To resolve the coverage gate, consider adding unit tests for the modified patch paths in:

    • hostedcontrolplane_controller.go: reconcileValidIDPConfigurationCondition and removeCloudResources
    • resources.go: reconcileClusterRecovery and destroyCloudResources
    • reconciler.go: updateHCPBackupCondition
    • local_ignitionprovider.go: reconcileValidReleaseInfoCondition

    Alternatively, if the project treats codecov/patch as non-blocking, no action is needed.

  3. Consider adding retry-on-conflict logic (as noted by CodeRabbit review): The optimistic lock patches can fail with 409 Conflict errors when multiple controllers update HCP status concurrently. Adding retry.RetryOnConflict would make the patches more resilient, though the controller-runtime reconcile loop provides built-in retry via requeue.

Evidence
| Evidence | Detail |
|---|---|
| Test name | TestKarpenter/ValidateHostedCluster |
| Failure mode | 30m timeout waiting for rollout state: Completed, stuck at state: Partial |
| CVO conditions | All Unknown: StatusUnknown(Condition not found in the CVO.); CVO never set them |
| HC status | Available=True, Progressing=False, EtcdAvailable=True, KubeAPIServerAvailable=True |
| Rate limiter | client rate limiter Wait returned an error: context deadline exceeded during rollout wait |
| Version history | Partial; CVO install never completed |
| API connectivity | Initial DNS failures + i/o timeouts resolved after 2m24s (normal bootstrap) |
| Nodes | 2 nodes became ready successfully (8m24s) |
| PR changes | Status().Update() → Status().Patch(MergeFromWithOptimisticLock); no CVO condition logic changed |
| codecov/patch | 37.50% diff coverage vs 40.34% target; insufficient unit test coverage on modified files |
| codecov/project | 40.34% (+0.00%); overall project coverage unaffected |

So our conversion from Update() to Patch() was clever, but our patch
calls can't write a nil, the field just gets omitted, and we have fields
where that nil is normal, expected, and necessary, so this tries to work
around it.

This is gross, but if it works, at least we know we understand the
problem.

Signed-off-by: John Kyros <jkyros@redhat.com>
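The commit message does not spell out the mechanism, so the following is only one plausible reading: if the patches are JSON merge patches, RFC 7386 defines a null in the patch as "remove this member", which means a merge patch cannot store an explicit null. The sketch below implements the RFC 7386 merge algorithm with the standard library to show that behaviour; `mergePatch` and `applyJSON` are hypothetical helper names, and `nullableField` is a made-up field, not one from the HCP API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies an RFC 7386 JSON merge patch to target and returns the
// result. Object targets are modified in place.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch // a non-object patch replaces the target wholesale
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{}
	}
	for k, v := range p {
		if v == nil {
			delete(t, k) // null means "remove", never "store null"
		} else {
			t[k] = mergePatch(t[k], v)
		}
	}
	return t
}

// applyJSON decodes both documents, applies the patch, and re-encodes.
func applyJSON(doc, patch string) string {
	var d, p any
	if err := json.Unmarshal([]byte(doc), &d); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		panic(err)
	}
	out, err := json.Marshal(mergePatch(d, p))
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	doc := `{"status":{"nullableField":"x","ready":true}}`
	// Trying to "write null" removes the member instead of storing null.
	fmt.Println(applyJSON(doc, `{"status":{"nullableField":null}}`))
}
```

Under this reading, a field that must be present but may be null cannot be expressed through a merge patch alone, which would be consistent with the workaround described in the commit message.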
@jkyros jkyros force-pushed the karpenter-fix-hcp-status-race-oplocks branch from 10d5332 to 9995cf5 Compare May 21, 2026 16:37
@jkyros
Member Author

jkyros commented May 21, 2026

/test e2e-aws-autonode

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@jkyros: all tests passed!

@jkyros
Copy link
Copy Markdown
Member Author

jkyros commented May 21, 2026

Ugh that passed. Great. There's something in here with Patch and fields that are +required +nullable that I think breaks it if we don't mess with it. The grotesque way I solved it with the JSON patch is not ideal. Trying to do better.

Labels

  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
