HYPERFLEET-1043 - feat: E2E tests for force-delete lifecycle by pnguyen44 · Pull Request #111 · openshift-hyperfleet/hyperfleet-e2e

pnguyen44 · 2026-06-02T15:43:15Z

Summary

Adds E2E test for the force-delete lifecycle. Validates the escape hatch for clusters stuck in Finalizing state.

HYPERFLEET-1043

Changes

E2E tests for the force-delete lifecycle:
- E2E test for force-deleting a cluster stuck in Finalizing, verifying it and its nodepools are removed
- ~~Force-deleting a single nodepool does not affect the parent cluster~~
- ~~Adapter handles 404 gracefully without crashing or retrying~~
- ~~Negative cases: 409, 400, 404~~
Bumped hyperfleet-api-spec to v1.0.15 for force-delete API operations

Test Plan

Unit tests added/updated
make test-all passes
make lint passes
Helm chart changes validated with make test-helm (if applicable)
Deployed to a development cluster and verified (if Helm/config changes)
E2E tests passed (if cross-component or major changes)

coderabbitai · 2026-06-02T15:43:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR implements a force-delete escape hatch for clusters stuck in finalization. The change adds an HTTP response validation helper (handleHTTPNoBodyResponse), client API methods to force-delete clusters and nodepools via ForceDeleteRequest with a reason, enhances test cleanup to recognize 404 responses, introduces deferred cleanup registration for Ginkgo tests, and includes a comprehensive e2e test that simulates adapter unavailability, verifies soft-delete persistence, and validates hard deletion with cascade removal of child nodepools.

Sequence Diagram

sequenceDiagram
  participant Test as E2E Test
  participant API as HyperFleet API
  participant Adapter as Cluster Adapter
  
  Test->>API: Create cluster + nodepool
  API->>Adapter: Provision resources
  Adapter-->>API: Reconciled=True
  
  Test->>Adapter: Scale down to 0 (simulate outage)
  
  Test->>API: DELETE /clusters/{id} (soft-delete)
  API-->>Test: Cluster accessible, Reconciled=False
  Test->>API: Poll for hard-deletion
  API-->>Test: Still exists (Adapter unavailable)
  
  Test->>API: POST /clusters/{id}/force-delete
  API-->>Test: 204 No Content (hard-deleted)
  Test->>API: Verify cluster + nodepool
  API-->>Test: HTTP 404 (cascade removal)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding E2E tests for force-delete lifecycle with a clear feature label and issue reference.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Sec-02: Secrets In Log Output	✅ Passed	No secrets, tokens, passwords, or credentials found in any log statements across modified files. All logged fields contain safe metadata (IDs, names, paths) and user-provided justification text.
Description check	✅ Passed	The PR description directly relates to the changeset: E2E tests for force-delete lifecycle, API spec dependency bump, and helper methods to validate the escape hatch for clusters stuck in Finalizing state.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

pkg/client/client.go (1)

78-99: ⚡ Quick win

Extract the shared status-mismatch error construction.

Lines 82-96 duplicate the mismatch path in handleHTTPResponse (54-68) verbatim. A small helper keeps both in sync as the error shape evolves.

♻️ Proposed extraction

+// newHTTPError builds an HTTPError from a non-matching response, reading the body when possible.
+func newHTTPError(resp *http.Response, action string) *HTTPError {
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return &HTTPError{
+			StatusCode: resp.StatusCode,
+			Action:     action,
+			Body:       fmt.Sprintf("failed to read error response body: %v", err),
+		}
+	}
+	return &HTTPError{
+		StatusCode: resp.StatusCode,
+		Action:     action,
+		Body:       string(body),
+	}
+}
+
 // handleHTTPNoBodyResponse validates the status code for responses with no body (e.g. 204).
 func handleHTTPNoBodyResponse(resp *http.Response, expectedStatus int, action string) error {
 	defer func() { _ = resp.Body.Close() }()
 
 	if resp.StatusCode != expectedStatus {
-		body, err := io.ReadAll(resp.Body)
-		if err != nil {
-			return &HTTPError{
-				StatusCode: resp.StatusCode,
-				Action:     action,
-				Body:       fmt.Sprintf("failed to read error response body: %v", err),
-			}
-		}
-		return &HTTPError{
-			StatusCode: resp.StatusCode,
-			Action:     action,
-			Body:       string(body),
-		}
+		return newHTTPError(resp, action)
 	}
 
 	return nil
 }

handleHTTPResponse can reuse the same helper for its 54-68 block.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/client/client.go` around lines 78 - 99, The status-mismatch error
construction in handleHTTPNoBodyResponse and handleHTTPResponse should be
extracted into a single helper (e.g., buildStatusMismatchError) that takes (resp
*http.Response, expectedStatus int, action string) and returns error; implement
the helper to read and close resp.Body, handle read errors by returning an
*HTTPError with Body set to the read-error message, or otherwise return
*HTTPError with the response body string, and then replace the duplicated blocks
in both handleHTTPNoBodyResponse and handleHTTPResponse to call this helper
instead of duplicating the logic.

e2e/cluster/force_delete.go (1)

110-130: ⚖️ Poor tradeoff

Fix Ginkgo cleanup ordering to avoid leaving a soft-deleted cluster behind on failures

In e2e/cluster/force_delete.go, the ginkgo.DeferCleanup that restores API required adapters is registered before h.DeferClusterCleanup(clusterID); with Ginkgo’s LIFO DeferCleanup, cluster cleanup runs first. That means CleanupTestCluster (DeleteCluster + wait for hard-delete/404) can run while the stuck-adapter is still scaled down (and still present in the required-adapters set), so the DELETE won’t complete until the restore happens—risking a cleanup timeout and leaving the cluster in Finalizing for later specs. Reorder/depend the cleanups so RestoreAPIRequiredAdaptersWithRetry runs before DeferClusterCleanup (similar to how the nodepool suite does it).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/cluster/force_delete.go` around lines 110 - 130, The cleanup that
restores adapters (ginkgo.DeferCleanup calling
h.RestoreAPIRequiredAdaptersWithRetry) must be registered after the cluster
cleanup registration so it runs first (Ginkgo runs DeferCleanup in LIFO order);
swap the registration order around the cluster creation so
h.DeferClusterCleanup(clusterID) is called before ginkgo.DeferCleanup(...
RestoreAPIRequiredAdaptersWithRetry ...) (or otherwise ensure
RestoreAPIRequiredAdaptersWithRetry is the last DeferCleanup registered) to
guarantee adapters are restored before CleanupTestCluster/DeleteCluster runs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@e2e/cluster/force_delete.go`:
- Around line 110-130: The cleanup that restores adapters (ginkgo.DeferCleanup
calling h.RestoreAPIRequiredAdaptersWithRetry) must be registered after the
cluster cleanup registration so it runs first (Ginkgo runs DeferCleanup in LIFO
order); swap the registration order around the cluster creation so
h.DeferClusterCleanup(clusterID) is called before ginkgo.DeferCleanup(...
RestoreAPIRequiredAdaptersWithRetry ...) (or otherwise ensure
RestoreAPIRequiredAdaptersWithRetry is the last DeferCleanup registered) to
guarantee adapters are restored before CleanupTestCluster/DeleteCluster runs.

In `@pkg/client/client.go`:
- Around line 78-99: The status-mismatch error construction in
handleHTTPNoBodyResponse and handleHTTPResponse should be extracted into a
single helper (e.g., buildStatusMismatchError) that takes (resp *http.Response,
expectedStatus int, action string) and returns error; implement the helper to
read and close resp.Body, handle read errors by returning an *HTTPError with
Body set to the read-error message, or otherwise return *HTTPError with the
response body string, and then replace the duplicated blocks in both
handleHTTPNoBodyResponse and handleHTTPResponse to call this helper instead of
duplicating the logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1d341ad6-5251-4ddd-98af-d42af6bd2dc7

📥 Commits

Reviewing files that changed from the base of the PR and between 28bad99 and 5a97794.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum, !**/go.sum

📒 Files selected for processing (9)

e2e/cluster/force_delete.go
e2e/nodepool/force_delete.go
go.mod
pkg/client/client.go
pkg/client/cluster.go
pkg/client/nodepool.go
pkg/helper/helper.go
test-design/testcases/delete-cluster.md
test-design/testcases/delete-nodepool.md

pnguyen44 · 2026-06-02T16:05:29Z

@CodeRabbit The reason is a required field in the API contract. The log mirrors what a real client would emit for observability.

coderabbitai · 2026-06-02T16:05:48Z

Tip

For best results, initiate chat on the files or code changes.

@pnguyen44 Acknowledged — makes sense. Required field + observability parity with production clients is a solid justification. I'll update my understanding going forward.

📝 Learnings were identified and not saved (knowledge base disabled). Enable

force-delete lifecycle

rafabene · 2026-06-02T16:39:22Z

+				// Restore the adapter after the test regardless of outcome
+				ginkgo.DeferCleanup(func(ctx context.Context) {
+					ginkgo.By("Restore nodepool adapter to 1 replica")
+					if err := h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, 1); err != nil {


Tip
nit — non-blocking suggestion

Category: Pattern

This restores the adapter to a hardcoded 1 replica, but the original deployment might have a different count. Consider capturing the replica count before scaling down and restoring to that value:

// Before scaling down originalReplicas, err := h.GetDeploymentReplicas(ctx, h.Cfg.Namespace, deploymentName) Expect(err).NotTo(HaveOccurred()) // In DeferCleanup h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, originalReplicas)

If test environments always run 1 replica, feel free to ignore.

pnguyen44 · 2026-06-02T19:01:21Z

/retest-required

pnguyen44 · 2026-06-02T19:17:07Z

/retest-required

pnguyen44 · 2026-06-02T19:23:09Z

/retest

pnguyen44 · 2026-06-02T19:27:25Z

/retest

pnguyen44 · 2026-06-02T19:33:52Z

/retest-required

kuudori · 2026-06-02T19:52:00Z

/retest-required

pnguyen44 · 2026-06-02T20:17:19Z

/retest-required

kuudori · 2026-06-02T20:38:40Z

/lgtm

openshift-ci · 2026-06-02T20:38:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kuudori

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [kuudori]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci Bot requested review from kuudori and ma-hill June 2, 2026 15:43

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

pnguyen44 changed the title ~~HYPERFLEET-1043 - feat: add e3e tests for~~ HYPERFLEET-1043 - feat: E2E tests for force-delete lifecycle Jun 2, 2026

pnguyen44 force-pushed the HYPERFLEET-1043/force-delete branch from 5a97794 to 9b073ea Compare June 2, 2026 16:15

HYPERFLEET-1043 - feat: add e3e tests for

232a8da

force-delete lifecycle

pnguyen44 force-pushed the HYPERFLEET-1043/force-delete branch from 9b073ea to 232a8da Compare June 2, 2026 16:35

rafabene reviewed Jun 2, 2026

View reviewed changes

Comment thread e2e/nodepool/force_delete.go Outdated

rafabene reviewed Jun 2, 2026

View reviewed changes

HYPERFLEET-1043 - chore: add assertion to test

9763331

rh-amarin reviewed Jun 2, 2026

View reviewed changes

Comment thread test-design/testcases/delete-cluster.md Outdated

rh-amarin reviewed Jun 2, 2026

View reviewed changes

Comment thread test-design/testcases/delete-cluster.md Outdated

rh-amarin reviewed Jun 2, 2026

View reviewed changes

Comment thread test-design/testcases/delete-cluster.md Outdated

rh-amarin reviewed Jun 2, 2026

View reviewed changes

Comment thread test-design/testcases/delete-nodepool.md Outdated

HYPERFLEET-1043 - chore: address feedback

f8fe2ef

openshift-ci Bot assigned kuudori Jun 2, 2026

openshift-ci Bot added the lgtm label Jun 2, 2026

openshift-ci Bot added the approved label Jun 2, 2026

openshift-merge-bot Bot merged commit 918dbf9 into openshift-hyperfleet:main Jun 2, 2026
7 checks passed

Conversation

pnguyen44 commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test Plan

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Sequence Diagram

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026

Uh oh!

Uh oh!

rafabene Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

kuudori commented Jun 2, 2026

Uh oh!

pnguyen44 commented Jun 2, 2026

Uh oh!

kuudori commented Jun 2, 2026

Uh oh!

openshift-ci Bot commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

pnguyen44 commented Jun 2, 2026 •

edited

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading