test(match): zsync sparse-match adversarial fixture #3657

Merged
oferchen merged 7 commits into master from test/zsync-sparse-match-2080 on May 5, 2026
@oferchen oferchen commented May 5, 2026

Summary

Adds crates/match/tests/sparse_match_fixture.rs, an integration test that drives the production DeltaSignatureIndex::from_signature + DeltaGenerator::generate pipeline over synthetic basis/source pairs in which almost no basis blocks recur in the source. This exercises the rolling-hash hot path under near-constant rejection, the worst case that docs/design/zsync-inspired-matching.md identifies as the target of the planned bithash prefilter (#2059). The test pins today's pre-bithash matching accuracy as a baseline, so future prefilter work can be benchmarked against a fixed accuracy reference.
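
For readers unfamiliar with the hot path in question, here is a minimal sketch of an rsync-style rolling weak checksum and a scan that counts weak-hash hits at every byte offset. It is illustrative only: `Rolling` and `count_weak_hits` are hypothetical names, and nothing here is taken from the crate's actual implementation. On a sparse-match input nearly every lookup misses, so this roll-and-probe loop dominates the run time.

```rust
use std::collections::HashSet;

/// Rsync-style weak checksum over a fixed-size window.
/// Hypothetical type for illustration, not the crate's API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rolling {
    a: u16, // sum of the window's bytes, mod 2^16
    b: u16, // sum of the running prefix sums, mod 2^16
    len: usize,
}

impl Rolling {
    /// Compute the checksum of `window` from scratch.
    pub fn new(window: &[u8]) -> Self {
        let (mut a, mut b) = (0u16, 0u16);
        for &byte in window {
            a = a.wrapping_add(u16::from(byte));
            b = b.wrapping_add(a);
        }
        Self { a, b, len: window.len() }
    }

    /// Slide the window one byte: drop `out` on the left, take `inp` on the right.
    pub fn roll(&mut self, out: u8, inp: u8) {
        self.a = self.a.wrapping_sub(u16::from(out)).wrapping_add(u16::from(inp));
        self.b = self
            .b
            .wrapping_sub((self.len as u16).wrapping_mul(u16::from(out)))
            .wrapping_add(self.a);
    }

    pub fn digest(&self) -> u32 {
        (u32::from(self.b) << 16) | u32::from(self.a)
    }
}

/// Count source offsets whose weak checksum equals that of some basis block.
/// A real matcher would confirm each hit with the strong checksum.
pub fn count_weak_hits(basis: &[u8], source: &[u8], block_size: usize) -> usize {
    assert!(block_size > 0);
    let index: HashSet<u32> = basis
        .chunks_exact(block_size)
        .map(|block| Rolling::new(block).digest())
        .collect();
    if source.len() < block_size {
        return 0;
    }
    let mut sum = Rolling::new(&source[..block_size]);
    let mut hits = usize::from(index.contains(&sum.digest()));
    for i in block_size..source.len() {
        sum.roll(source[i - block_size], source[i]);
        hits += usize::from(index.contains(&sum.digest()));
    }
    hits
}
```

Note that a weak-hash hit is not a match: distinct windows can share a weak checksum, which is why the byte-level invariants below are asserted on the final output rather than on probe counts.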

Matrix coverage

  • Planted matching block counts K in {0, 1, 2} (K=0 is the most important: full literal stream, exercises the no-match hot path).
  • Basis and source sizes in {64 KB, 1 MB, 16 MB}. The 16 MB cases are marked #[ignore] so default cargo nextest runs only the 64 KB and 1 MB fixtures; stress-test CI matrices opt in via --run-ignored only.
  • Block sizes in {1024, 4096}.
  • Strong checksums: MD5 (no seed) and XXH3-64.
  • Worst-case timing budget of 5s asserted on the 64 KB / 1024-block fixture as a tripwire for accidental quadratic regressions.
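
The matrix above could be enumerated roughly as follows. This is a sketch under stated assumptions: `Case`, `Strong`, `matrix`, and `within_budget` are hypothetical names, KB/MB are read as powers of two, and whether the real test uses a loop, a macro, or one #[test] function per case is not stated in this PR.

```rust
use std::time::{Duration, Instant};

const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Strong checksum under test (hypothetical enum, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Strong {
    Md5,
    Xxh3,
}

/// One cell of the fixture matrix.
#[derive(Clone, Copy, Debug)]
pub struct Case {
    pub planted: usize,    // K: number of basis blocks planted in the source
    pub size: usize,       // basis and source size in bytes
    pub block_size: usize, // signature block size in bytes
    pub strong: Strong,
}

impl Case {
    /// 16 MB cases are the ones a real test would mark `#[ignore]`.
    pub fn is_stress(&self) -> bool {
        self.size >= 16 * MIB
    }
}

/// Full cross product: 3 values of K x 3 sizes x 2 block sizes x 2 checksums.
pub fn matrix() -> Vec<Case> {
    let mut cases = Vec::new();
    for planted in [0usize, 1, 2] {
        for size in [64 * KIB, MIB, 16 * MIB] {
            for block_size in [1024usize, 4096] {
                for strong in [Strong::Md5, Strong::Xxh3] {
                    cases.push(Case { planted, size, block_size, strong });
                }
            }
        }
    }
    cases
}

/// Run `work` and report whether it finished within `budget`.
/// A tripwire in this style catches gross regressions, not small slowdowns.
pub fn within_budget<T>(budget: Duration, work: impl FnOnce() -> T) -> (T, bool) {
    let start = Instant::now();
    let out = work();
    (out, start.elapsed() <= budget)
}
```

Under these assumptions the cross product has 36 cells, 12 of which are 16 MB stress cases, leaving 24 for a default run.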

Wire-compat invariants confirmed

  • The pipeline emits exactly K Copy tokens at basis indices 0..K, in order.
  • matched_bytes == K * block_size, literal_bytes == source.len() - K * block_size, total_bytes == source.len().
  • Byte-domain disjointness between basis (in [0, 0x7f]) and source non-planted bytes (in [0x80, 0xff]) makes the "no-match" property robust at every sliding-window alignment, not just block-aligned offsets. A construction sanity-check asserts this directly on the smallest fixture.
  • Strong-checksum verify call counts are not directly observable through the public API; the test notes this in a comment with the upstream reference (rsum.c:362-366) and asserts only the byte-level invariants.
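
A minimal sketch of how such a fixture and its expected counters could be built. Everything here is an assumption for illustration: the helper names, the xorshift generator and its seeds, and the choice to plant the K basis blocks at the start of the source (the PR only states that the Copy tokens reference basis indices 0..K).

```rust
/// Tiny deterministic PRNG so the fixture is reproducible (illustrative choice).
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Basis bytes confined to [0x00, 0x7f].
pub fn make_basis(len: usize) -> Vec<u8> {
    let mut state = 0x9E37_79B9_7F4A_7C15u64; // arbitrary non-zero seed
    (0..len).map(|_| (xorshift(&mut state) & 0x7f) as u8).collect()
}

/// Source = first `planted` basis blocks, then filler confined to [0x80, 0xff].
pub fn make_source(basis: &[u8], len: usize, block_size: usize, planted: usize) -> Vec<u8> {
    let planted_len = planted * block_size;
    assert!(planted_len <= len && planted_len <= basis.len());
    let mut state = 0xD1B5_4A32_D192_ED03u64; // arbitrary non-zero seed
    let mut source = basis[..planted_len].to_vec();
    source.extend((planted_len..len).map(|_| 0x80 | (xorshift(&mut state) & 0x7f) as u8));
    source
}

/// Byte counters the PR asserts on the generated delta.
#[derive(Debug, PartialEq, Eq)]
pub struct Expected {
    pub matched_bytes: usize,
    pub literal_bytes: usize,
    pub total_bytes: usize,
}

pub fn expected(source_len: usize, block_size: usize, planted: usize) -> Expected {
    let matched_bytes = planted * block_size;
    Expected {
        matched_bytes,
        literal_bytes: source_len - matched_bytes,
        total_bytes: source_len,
    }
}

/// Construction sanity check: no source window that overlaps the filler region
/// equals any basis block, at any byte alignment. Quadratic, so only suitable
/// for the smallest fixture.
pub fn windows_disjoint(basis: &[u8], source: &[u8], block_size: usize, planted: usize) -> bool {
    assert!(block_size > 0);
    let planted_len = planted * block_size;
    source
        .windows(block_size)
        .enumerate()
        .filter(|(start, _)| start + block_size > planted_len)
        .all(|(_, window)| basis.chunks_exact(block_size).all(|block| block != window))
}
```

The disjointness check holds by construction: any window that overlaps the filler contains at least one byte with the high bit set, and no basis block does.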

Test plan

  • CI: fmt+clippy, nextest (stable), Windows, macOS, Linux musl all green.
  • Default cargo nextest run -p matching skips the #[ignore]-marked 16 MB cases.
  • Opt-in stress matrix: cargo nextest run -p matching --run-ignored only runs just the #[ignore]-marked 16 MB fixtures.

@github-actions github-actions Bot added the test label May 5, 2026
oferchen added 3 commits May 5, 2026 14:36
Add an integration test that drives DeltaSignatureIndex::build +
DeltaGenerator::generate over basis/source pairs where almost no
basis blocks recur in source. Pins today's pre-bithash baseline
matching behaviour for the rolling-hash hot path so the planned
prefilter work (#2059) can be benchmarked against a fixed accuracy
reference.

The matrix sweeps planted block counts K in {0, 1, 2}, basis sizes
in {64 KB, 1 MB, 16 MB}, block sizes in {1024, 4096}, and strong
checksum algorithms (MD5, XXH3-64). The 16 MB cases are gated
behind #[ignore] so default cargo nextest runs only the smaller
fixtures; stress-test CI matrices opt in via --run-ignored only.
@oferchen oferchen force-pushed the test/zsync-sparse-match-2080 branch from ea7f97a to 8b94112 Compare May 5, 2026 11:36
@oferchen oferchen merged commit 8e750a7 into master May 5, 2026
38 of 39 checks passed
@oferchen oferchen deleted the test/zsync-sparse-match-2080 branch May 6, 2026 18:57
oferchen added a commit that referenced this pull request May 18, 2026
* test(match): zsync sparse-match adversarial fixture
* style: cargo fmt

* fix(test): inline format args in sparse-match assert