test(depot-client): stabilize fault harness#4924
test(depot-client): stabilize fault harness#4924NathanFlurry wants to merge 1 commit intosqlite-soak/sqlite-transport-traitfrom
Conversation
Code Review: PR #4924 — test(depot-client): stabilize fault harnessOverviewThis PR stabilizes the depot-client fault test harness by: (1) serializing fault scenario runs to prevent process-global state interference, and (2) strengthening the database pointer invariant check in verify.rs by cross-validating the scan result against resolve_database_branch. scenario.rsMutex choice is correct. parking_lot::Mutex (already imported at line 33) is used in the sync run() function, which aligns with the project convention of using parking_lot for sync contexts. Guard lifetime is correct. The named binding gives _run_guard a lifetime until the end of run(), not just the let statement. A bare underscore would drop immediately and release the lock. Error message is excellent. The bail! message names the exact cargo test invocation needed to fix the problem, which is very helpful for developers hitting this unexpectedly. Profile-based timeouts are appropriate. 30s vs. 120s for Simple vs. Chaos is a reasonable distinction given Chaos tests introduce fault injection delays. No concerns with scenario.rs. verify.rsBug fix is substantive (worth calling out in PR description). The original code had a silent control-flow issue: after calling self.violate("database pointer ... is missing") when the scan found nothing, it would fall through to Ok(current) -> Ok(None). Callers depending on a Some result would then panic rather than seeing the violation accumulate. The new early return after the violation is the correct fix. BucketId::from_gas_id(Id::nil()) pattern is established. This mirrors the same call in vfs_support.rs line 306 and is the correct pattern for test fixtures that do not use bucket isolation. Potential invariant gap to consider. The new cross-check only fires when both the scan and the resolver return Some. If resolved is Some but scanned_current is None, no violation is raised — whereas the old code would have violated with "database pointer ... is missing". Consider adding a check for the resolved-Some/scanned-None case. This may be intentional if resolve_database_branch can find pointers via its bucket-fallback path that the prefix scan misses, but it is worth confirming. Summary
One question to resolve before merging, otherwise this is clean stabilization work. |
48cc3bb to
e650129
Compare
29f3416 to
d83cee1
Compare
|
Landed in main via stack-merge fast-forward push. Commits are in main; closing to match. |

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: