fix(bench): synchronously wait for teardown deletes to complete #96
Merged
PR #95 fixed the namespace teardown→bootstrap race by waiting for any Terminating namespace inside ensureNamespace. The next end-to-end run hit the same race class on a different resource: profile np-stress's installCRDs got Create→IsAlreadyExists on a still-Terminating CRD from np-typical's teardown, skipped the Create, slept 3s, and during that sleep the CRD finalizer completed. The follow-up createSource then saw "the server could not find the requested resource".

Rather than playing whack-a-mole with one Terminating-aware Ensure per resource type (namespaces today, CRDs tomorrow, ClusterProjections next week), centralize the cleanup-completion wait in teardown itself. After issuing every Delete, teardown now polls until every namespace, CRD, and ClusterProjection it deleted is observed NotFound. Bounded at 120s; on timeout the function returns silently (next bootstrap will surface genuinely stuck state).

The PR #95 ensureNamespace wait stays in place as defense-in-depth: it covers external-actor deletes that happen during a run, not just the inter-profile teardown race this commit closes.
Bench smoke

| Path | Samples | p50 | p95 | p99 |
|---|---|---|---|---|
| NP single-target | 100 | 16.4ms | 21.4ms | 36.7ms |
| CP-selector earliest | 30 | 57.5ms | 111.2ms | 125.8ms |
| CP-selector slowest | 30 | 139.3ms | 176.8ms | 191.1ms |
| CP-list earliest | 30 | 38.1ms | 71.2ms | 82.7ms |
| CP-list slowest | 30 | 38.1ms | 77.9ms | 82.8ms |
Total wall: 88s • Commit: 29785f8 • Workflow run
Why
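PR #95 fixed the namespace teardown→bootstrap race by waiting for any `Terminating` namespace inside `ensureNamespace`. The next end-to-end `--profile=full` run hit the same race class on a different resource type: profile `np-stress`'s `installCRDs` got Create→`IsAlreadyExists` on a still-`Terminating` CRD left by `np-typical`'s teardown, skipped the Create, and slept 3s. The CRD finalizer completed during that sleep, so the follow-up `createSource` failed with "the server could not find the requested resource".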
Rather than play whack-a-mole (namespaces today via PR #95, CRDs tomorrow, ClusterProjections next week), this PR closes the whole class at the source.
What
`teardown` now synchronously polls until every namespace, CRD, and ClusterProjection it deleted is observed `NotFound`. Bounded at 120s; on timeout it returns silently (next bootstrap will surface genuinely-stuck state).
The PR #95 `ensureNamespace` wait stays in place as defense-in-depth — it covers external-actor deletes that happen during a run, not just the inter-profile teardown race this commit closes.
Trade-off
Teardown wall grows by however long the last finalizer takes. In practice:
Across an 8-profile `full` matrix that's maybe +1-2 min total wall. Acceptable for the predictability win.
Test plan