
test(depot-client): stabilize fault harness #4924

Closed
NathanFlurry wants to merge 1 commit into sqlite-soak/sqlite-transport-trait from sqlite-soak/depot-fault-harness-serialization

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented May 4, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry marked this pull request as ready for review May 4, 2026 03:02
@claude
Contributor

claude Bot commented May 4, 2026

Code Review: PR #4924 — test(depot-client): stabilize fault harness

Overview

This PR stabilizes the depot-client fault test harness by: (1) serializing fault scenario runs to prevent process-global state interference, and (2) strengthening the database pointer invariant check in verify.rs by cross-validating the scan result against resolve_database_branch.

scenario.rs

Mutex choice is correct. parking_lot::Mutex (already imported at line 33) is used in the sync run() function, which aligns with the project convention of using parking_lot for sync contexts.

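Guard lifetime is correct. Binding the guard to a named variable (_run_guard) keeps it alive until the end of run(), not just until the end of the let statement. A bare underscore pattern would not bind the value at all, so the guard would be dropped immediately and the lock released before the scenario ran.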

Error message is excellent. The bail! message names the exact cargo test invocation needed to fix the problem, which is very helpful for developers hitting this unexpectedly.
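
For readers unfamiliar with the fail-fast pattern, here is a rough sketch of it, assuming the harness uses a non-blocking try_lock and reports an error when another run holds the lock. It uses std::sync::Mutex and a String error instead of parking_lot and bail! so that it stands alone; the function name and the message text are hypothetical, and the actual cargo test invocation named in the PR is not reproduced here.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

/// Tries to take the run lock without blocking. On contention it returns
/// an actionable error instead of waiting for the other run to finish.
fn acquire_run_guard(lock: &Mutex<()>) -> Result<MutexGuard<'_, ()>, String> {
    match lock.try_lock() {
        Ok(guard) => Ok(guard),
        // A poisoned std mutex only means an earlier holder panicked;
        // the unit payload cannot be left in a bad state, so reuse it.
        Err(TryLockError::Poisoned(poisoned)) => Ok(poisoned.into_inner()),
        // Placeholder message: the real one would name the exact command to run.
        Err(TryLockError::WouldBlock) => Err(String::from(
            "another fault scenario is already running in this process; \
             rerun the scenarios serially",
        )),
    }
}
```

The caller keeps the returned guard in a named binding for the duration of the run, so a second concurrent run fails immediately with the message rather than interleaving with the first.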

Profile-based timeouts are appropriate. 30s vs. 120s for Simple vs. Chaos is a reasonable distinction given Chaos tests introduce fault injection delays.
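
One way to express the profile-to-timeout mapping is a single match, sketched below. The 30s and 120s values come from the PR; the Profile enum and the scenario_timeout function are hypothetical names used only for illustration.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the harness's scenario profile.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Profile {
    Simple,
    Chaos,
}

/// Chaos runs inject fault delays, so they get a longer budget.
fn scenario_timeout(profile: Profile) -> Duration {
    match profile {
        Profile::Simple => Duration::from_secs(30),
        Profile::Chaos => Duration::from_secs(120),
    }
}
```

Keeping the mapping in one exhaustive match means a new profile variant will not compile until a timeout is chosen for it.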

No concerns with scenario.rs.

verify.rs

Bug fix is substantive (worth calling out in PR description). The original code had a silent control-flow issue: after calling self.violate("database pointer ... is missing") when the scan found nothing, it would fall through to Ok(current) -> Ok(None). Callers depending on a Some result would then panic rather than seeing the violation accumulate. The new early return after the violation is the correct fix.

BucketId::from_gas_id(Id::nil()) pattern is established. This mirrors the same call in vfs_support.rs line 306 and is the correct pattern for test fixtures that do not use bucket isolation.

Potential invariant gap to consider. The new cross-check only fires when both the scan and the resolver return Some. If resolved is Some but scanned_current is None, no violation is raised — whereas the old code would have violated with "database pointer ... is missing". Consider adding a check for the resolved-Some/scanned-None case. This may be intentional if resolve_database_branch can find pointers via its bucket-fallback path that the prefix scan misses, but it is worth confirming.
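
To make the gap concrete, the four combinations can be handled in one exhaustive match. This is only a sketch under stated assumptions: branch ids are modeled as plain u64 values, pointer_violations is a hypothetical free function rather than the method in verify.rs, the message texts are invented, and treating the scan-only and both-missing cases as violations is an assumption about the intended invariant.

```rust
/// Compares the prefix-scan result with the resolver result and returns
/// a message for every combination that breaks the assumed invariant.
fn pointer_violations(scanned: Option<u64>, resolved: Option<u64>) -> Vec<String> {
    let mut violations = Vec::new();
    match (scanned, resolved) {
        // Both sources agree: nothing to report.
        (Some(s), Some(r)) if s == r => {}
        // Both found a pointer but they disagree (the case the PR adds).
        (Some(s), Some(r)) => {
            violations.push(format!("scan found branch {s} but resolver returned {r}"));
        }
        // The gap raised in this review: resolver sees a pointer the scan missed.
        (None, Some(r)) => {
            violations.push(format!("resolver returned branch {r} but scan found no pointer"));
        }
        // Assumed to be a violation as well: scan sees a pointer the resolver cannot resolve.
        (Some(s), None) => {
            violations.push(format!("scan found branch {s} but resolver returned nothing"));
        }
        // Neither source found a pointer.
        (None, None) => {
            violations.push(String::from("database pointer is missing"));
        }
    }
    violations
}
```

If the resolver's bucket-fallback path legitimately finds pointers the scan cannot see, the (None, Some) arm would be dropped or downgraded, which is the question to settle.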

Summary

| Area | Assessment |
| --- | --- |
| Serialization approach | Correct: parking_lot::Mutex::try_lock in sync context |
| Timeout values | Reasonable for Simple/Chaos profiles |
| Bug fix in verify.rs | Real fix, not just cosmetic |
| Cross-check invariant | Strengthened, with one question about the resolved-Some/scanned-None gap |
| Conventions (Mutex, logging, comments) | All correct |

One question to resolve before merging, otherwise this is clean stabilization work.

@NathanFlurry
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.
