HYPERFLEET-1043 - feat: E2E tests for force-delete lifecycle#111
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR implements a force-delete escape hatch for clusters stuck in finalization. The change adds an HTTP response validation helper ( Sequence DiagramsequenceDiagram
participant Test as E2E Test
participant API as HyperFleet API
participant Adapter as Cluster Adapter
Test->>API: Create cluster + nodepool
API->>Adapter: Provision resources
Adapter-->>API: Reconciled=True
Test->>Adapter: Scale down to 0 (simulate outage)
Test->>API: DELETE /clusters/{id} (soft-delete)
API-->>Test: Cluster accessible, Reconciled=False
Test->>API: Poll for hard-deletion
API-->>Test: Still exists (Adapter unavailable)
Test->>API: POST /clusters/{id}/force-delete
API-->>Test: 204 No Content (hard-deleted)
Test->>API: Verify cluster + nodepool
API-->>Test: HTTP 404 (cascade removal)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/client/client.go (1)
78-99: ⚡ Quick winExtract the shared status-mismatch error construction.
Lines 82-96 duplicate the mismatch path in
handleHTTPResponse(54-68) verbatim. A small helper keeps both in sync as the error shape evolves.♻️ Proposed extraction
+// newHTTPError builds an HTTPError from a non-matching response, reading the body when possible. +func newHTTPError(resp *http.Response, action string) *HTTPError { + body, err := io.ReadAll(resp.Body) + if err != nil { + return &HTTPError{ + StatusCode: resp.StatusCode, + Action: action, + Body: fmt.Sprintf("failed to read error response body: %v", err), + } + } + return &HTTPError{ + StatusCode: resp.StatusCode, + Action: action, + Body: string(body), + } +} + // handleHTTPNoBodyResponse validates the status code for responses with no body (e.g. 204). func handleHTTPNoBodyResponse(resp *http.Response, expectedStatus int, action string) error { defer func() { _ = resp.Body.Close() }() if resp.StatusCode != expectedStatus { - body, err := io.ReadAll(resp.Body) - if err != nil { - return &HTTPError{ - StatusCode: resp.StatusCode, - Action: action, - Body: fmt.Sprintf("failed to read error response body: %v", err), - } - } - return &HTTPError{ - StatusCode: resp.StatusCode, - Action: action, - Body: string(body), - } + return newHTTPError(resp, action) } return nil }
handleHTTPResponsecan reuse the same helper for its 54-68 block.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/client/client.go` around lines 78 - 99, The status-mismatch error construction in handleHTTPNoBodyResponse and handleHTTPResponse should be extracted into a single helper (e.g., buildStatusMismatchError) that takes (resp *http.Response, expectedStatus int, action string) and returns error; implement the helper to read and close resp.Body, handle read errors by returning an *HTTPError with Body set to the read-error message, or otherwise return *HTTPError with the response body string, and then replace the duplicated blocks in both handleHTTPNoBodyResponse and handleHTTPResponse to call this helper instead of duplicating the logic.e2e/cluster/force_delete.go (1)
110-130: ⚖️ Poor tradeoffFix Ginkgo cleanup ordering to avoid leaving a soft-deleted cluster behind on failures
In
e2e/cluster/force_delete.go, theginkgo.DeferCleanupthat restores API required adapters is registered beforeh.DeferClusterCleanup(clusterID); with Ginkgo’s LIFODeferCleanup, cluster cleanup runs first. That meansCleanupTestCluster(DeleteCluster+ wait for hard-delete/404) can run while the stuck-adapter is still scaled down (and still present in the required-adapters set), so the DELETE won’t complete until the restore happens—risking a cleanup timeout and leaving the cluster inFinalizingfor later specs. Reorder/depend the cleanups soRestoreAPIRequiredAdaptersWithRetryruns beforeDeferClusterCleanup(similar to how the nodepool suite does it).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/cluster/force_delete.go` around lines 110 - 130, The cleanup that restores adapters (ginkgo.DeferCleanup calling h.RestoreAPIRequiredAdaptersWithRetry) must be registered after the cluster cleanup registration so it runs first (Ginkgo runs DeferCleanup in LIFO order); swap the registration order around the cluster creation so h.DeferClusterCleanup(clusterID) is called before ginkgo.DeferCleanup(... RestoreAPIRequiredAdaptersWithRetry ...) (or otherwise ensure RestoreAPIRequiredAdaptersWithRetry is the last DeferCleanup registered) to guarantee adapters are restored before CleanupTestCluster/DeleteCluster runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@e2e/cluster/force_delete.go`:
- Around line 110-130: The cleanup that restores adapters (ginkgo.DeferCleanup
calling h.RestoreAPIRequiredAdaptersWithRetry) must be registered after the
cluster cleanup registration so it runs first (Ginkgo runs DeferCleanup in LIFO
order); swap the registration order around the cluster creation so
h.DeferClusterCleanup(clusterID) is called before ginkgo.DeferCleanup(...
RestoreAPIRequiredAdaptersWithRetry ...) (or otherwise ensure
RestoreAPIRequiredAdaptersWithRetry is the last DeferCleanup registered) to
guarantee adapters are restored before CleanupTestCluster/DeleteCluster runs.
In `@pkg/client/client.go`:
- Around line 78-99: The status-mismatch error construction in
handleHTTPNoBodyResponse and handleHTTPResponse should be extracted into a
single helper (e.g., buildStatusMismatchError) that takes (resp *http.Response,
expectedStatus int, action string) and returns error; implement the helper to
read and close resp.Body, handle read errors by returning an *HTTPError with
Body set to the read-error message, or otherwise return *HTTPError with the
response body string, and then replace the duplicated blocks in both
handleHTTPNoBodyResponse and handleHTTPResponse to call this helper instead of
duplicating the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 1d341ad6-5251-4ddd-98af-d42af6bd2dc7
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum,!**/go.sum
📒 Files selected for processing (9)
e2e/cluster/force_delete.goe2e/nodepool/force_delete.gogo.modpkg/client/client.gopkg/client/cluster.gopkg/client/nodepool.gopkg/helper/helper.gotest-design/testcases/delete-cluster.mdtest-design/testcases/delete-nodepool.md
|
@CodeRabbit The reason is a required field in the API contract. The log mirrors what a real client would emit for observability. |
|
Tip For best results, initiate chat on the files or code changes.
|
5a97794 to
9b073ea
Compare
force-delete lifecycle
9b073ea to
232a8da
Compare
| // Restore the adapter after the test regardless of outcome | ||
| ginkgo.DeferCleanup(func(ctx context.Context) { | ||
| ginkgo.By("Restore nodepool adapter to 1 replica") | ||
| if err := h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, 1); err != nil { |
There was a problem hiding this comment.
Tip
nit — non-blocking suggestion
Category: Pattern
This restores the adapter to a hardcoded 1 replica, but the original deployment might have a different count. Consider capturing the replica count before scaling down and restoring to that value:
// Before scaling down
originalReplicas, err := h.GetDeploymentReplicas(ctx, h.Cfg.Namespace, deploymentName)
Expect(err).NotTo(HaveOccurred())
// In DeferCleanup
h.ScaleDeployment(ctx, h.Cfg.Namespace, deploymentName, originalReplicas)If test environments always run 1 replica, feel free to ignore.
|
/retest-required |
1 similar comment
|
/retest-required |
|
/retest |
1 similar comment
|
/retest |
|
/retest-required |
2 similar comments
|
/retest-required |
|
/retest-required |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kuudori The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
918dbf9
into
openshift-hyperfleet:main
Summary
Adds E2E test for the force-delete lifecycle. Validates the escape hatch for clusters stuck in Finalizing state.
HYPERFLEET-1043
Changes
Force-deleting a single nodepool does not affect the parent clusterAdapter handles 404 gracefully without crashing or retryingNegative cases: 409, 400, 404hyperfleet-api-specto v1.0.15 for force-delete API operationsTest Plan
make test-allpassesmake lintpassesmake test-helm(if applicable)