
feat(match): add zsync-inspired bithash prefilter to MatchIndex#3737

Merged
oferchen merged 2 commits into master from feat/zsync-bithash-prefilter on May 6, 2026

oferchen commented May 5, 2026

Summary

  • Add zsync-inspired bithash prefilter (crates/match/src/index/bithash.rs) to DeltaSignatureIndex. Sized to ~8x the rsum-bucket count per docs/design/zsync-bithash.md section 1, populated alongside tag_table in populate_index, and probed in both find_match_bytes and find_match_slices between the tag_table gate and the FxHashMap walk.
  • Wire BitHash::clear() into DeltaSignatureIndex::rebuild so the INC_RECURSE per-segment rebuild path recycles the bit array without re-allocating, mirroring the existing tag_table reset.
  • Property tests (bithash_tests.rs, proptest 1.4) pin the no-false-negative contract end-to-end (every inserted basis block is still found through both probe paths) and at the BitHash level (every inserted rsum survives contains, including after clear + reinsert).
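For readers unfamiliar with the structure, here is a minimal sketch of a one-sided bit-array prefilter with the three operations named above (insert, contains, clear). It is not the code from bithash.rs: the `with_buckets` constructor, the power-of-two rounding, and keying on the low bits of the rsum are all assumptions made for illustration.

```rust
/// Hypothetical one-sided bit array: `contains` may return a false
/// positive, but never a false negative for an inserted rsum.
pub struct BitHash {
    words: Vec<u64>,
    mask: usize, // bit count minus one; the bit count is a power of two
}

impl BitHash {
    /// Size the array to roughly 8x `bucket_count` bits (assumed rounding:
    /// up to the next power of two, clamped to a sane range).
    pub fn with_buckets(bucket_count: usize) -> Self {
        let bits = bucket_count
            .saturating_mul(8)
            .clamp(64, 1 << 30)
            .next_power_of_two();
        Self { words: vec![0u64; bits / 64], mask: bits - 1 }
    }

    /// Word index and single-bit mask for an rsum (low-bit keying is assumed).
    fn slot(&self, rsum: u32) -> (usize, u64) {
        let bit = (rsum as usize) & self.mask;
        (bit / 64, 1u64 << (bit % 64))
    }

    pub fn insert(&mut self, rsum: u32) {
        let (word, bit) = self.slot(rsum);
        self.words[word] |= bit;
    }

    pub fn contains(&self, rsum: u32) -> bool {
        let (word, bit) = self.slot(rsum);
        (self.words[word] & bit) != 0
    }

    /// Zero the bits in place so a rebuild can reuse the allocation.
    pub fn clear(&mut self) {
        self.words.fill(0);
    }
}
```

Because `clear` only zeroes the existing words, a rebuild path that calls it keeps the same allocation, which is the recycling behaviour the second bullet describes.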

Wire-compat

In-memory only. No signature payload, NDX framing, capability string, multiplex frame, golden-byte fixture, or interop matrix is touched. The only observable effect is reduced receiver CPU during delta search.

Test plan

  • nextest run -p matching --all-features (CI)
  • fmt + clippy (CI)
  • Windows / macOS / Linux musl matrices (CI)
  • Interop matrix vs upstream 3.0.9 / 3.1.3 / 3.4.1 (CI)

Translate zsync's librcksum bithash gate into oc-rsync's existing
DeltaSignatureIndex. The bithash is an oversized one-sided bit array
sized to roughly 8x the rsum-bucket count and probed after the
1 KB tag_table check but before the FxHashMap walk. At the design
saturation density (1/8) it rejects ~7/8 of post-tag misses for
purely receiver-side CPU savings.
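
The ~7/8 figure can be motivated with a short estimate. This is an idealised sketch, assuming a probe's bit position is uniform and independent of the inserted set; the symbols $n$ (inserted rsums), $m$ (bits in the array), and $\rho$ (fraction of bits set) are introduced here for illustration, and $m = 8n$ is an assumed reading of "8x the rsum-bucket count".

$$
\rho = 1 - \left(1 - \frac{1}{m}\right)^{n} \approx 1 - e^{-n/m},
\qquad
P(\text{reject} \mid \text{miss}) = 1 - \rho
$$

$$
m = 8n \;\Rightarrow\; \rho \approx 1 - e^{-1/8} \approx 0.1175 \le \tfrac{1}{8},
\qquad
P(\text{reject} \mid \text{miss}) \approx 0.8825 \ge \tfrac{7}{8}
$$

Under these assumptions the density stays slightly below 1/8 because some insertions collide on an already-set bit, so 7/8 acts as a lower bound on the reject rate rather than an exact value.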

The prefilter is in-memory only: no signature payload, NDX framing,
capability negotiation, multiplex frame, golden-byte fixture, or
interop matrix is touched. Sizing follows docs/design/zsync-bithash.md
sections 1 and 2; insertion and probe sites match section 3 exactly.

Wires into populate_index alongside tag_table and lookup, and into
DeltaSignatureIndex::rebuild via clear() to recycle across the
INC_RECURSE per-segment rebuild path. The probe goes into both
find_match_bytes and find_match_slices; check_block_match_slices is
intentionally not gated since it bypasses the hash table.
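
The probe order described above (tag_table gate, then bithash, then hash-map walk) can be sketched as follows. Everything here is a hypothetical stand-in: `SketchIndex`, the `Probe` enum, the 13-bit tag keying, and the XOR-fold used for the bithash slot are assumptions, and std's `HashMap` replaces FxHashMap to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Which stage resolved a probe (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum Probe {
    TagReject,
    BitHashReject,
    MapMiss,
    Candidates(Vec<usize>),
}

pub struct SketchIndex {
    tag_table: [u8; 1024],            // 1 KiB = 8192 one-bit tags (keying assumed)
    bithash: Vec<u64>,                // ~8x oversized bit array
    bithash_mask: usize,
    lookup: HashMap<u32, Vec<usize>>, // rsum -> basis block indices
}

fn tag_slot(rsum: u32) -> usize {
    (rsum & 0x1FFF) as usize
}

fn bit_slot(rsum: u32, mask: usize) -> usize {
    ((rsum ^ (rsum >> 16)) as usize) & mask
}

impl SketchIndex {
    /// Populate all three structures in a single pass over the block rsums.
    pub fn new(block_rsums: &[u32]) -> Self {
        let bits = block_rsums
            .len()
            .saturating_mul(8)
            .clamp(64, 1 << 30)
            .next_power_of_two();
        let mut idx = Self {
            tag_table: [0; 1024],
            bithash: vec![0; bits / 64],
            bithash_mask: bits - 1,
            lookup: HashMap::new(),
        };
        for (block, &rsum) in block_rsums.iter().enumerate() {
            let t = tag_slot(rsum);
            idx.tag_table[t / 8] |= 1u8 << (t % 8);
            let b = bit_slot(rsum, idx.bithash_mask);
            idx.bithash[b / 64] |= 1u64 << (b % 64);
            idx.lookup.entry(rsum).or_default().push(block);
        }
        idx
    }

    /// Cheapest gate first; the hash map is only touched if both gates pass.
    pub fn probe(&self, rsum: u32) -> Probe {
        let t = tag_slot(rsum);
        if (self.tag_table[t / 8] & (1u8 << (t % 8))) == 0 {
            return Probe::TagReject;
        }
        let b = bit_slot(rsum, self.bithash_mask);
        if (self.bithash[b / 64] & (1u64 << (b % 64))) == 0 {
            return Probe::BitHashReject;
        }
        match self.lookup.get(&rsum) {
            Some(blocks) => Probe::Candidates(blocks.clone()),
            None => Probe::MapMiss,
        }
    }
}
```

Both gates are one-sided, so an inserted rsum always reaches the map; only misses are short-circuited.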

Property tests pin the no-false-negative contract end-to-end (every
inserted basis block is still found) and at the BitHash level
(contains is true for every previously inserted rsum, including
across clear/reinsert cycles).

Closes #2060
Closes #2061
Closes #2062
github-actions bot added the enhancement (New feature or request) label May 5, 2026
oferchen merged commit 3d0391d into master May 6, 2026
40 checks passed
oferchen deleted the feat/zsync-bithash-prefilter branch May 6, 2026 18:56
oferchen added a commit that referenced this pull request May 14, 2026
Introduces CHANGELOG.md and records the four zsync-inspired internal
optimizations to the receiver block-match path: bithash prefilter
(#3737), sequential-match extension (#3751), matched-block pruning
(#3748), and compact-key layout (#3994). All four are wire-compatible
refactors of the in-memory match index; golden-byte fixtures and
interop against upstream rsync 3.0.9 / 3.1.3 / 3.4.1 are unchanged.

Closes #2087.
oferchen added a commit that referenced this pull request May 18, 2026
* feat(match): add zsync-inspired bithash prefilter to MatchIndex

* fix(deps): regenerate Cargo.lock for proptest dev-dep
oferchen added a commit that referenced this pull request May 20, 2026
Adds the two ZSO-1 acceptance tests that complement the bithash
prefilter implementation already merged in PR #3737:

- bithash_rejects_at_least_seven_eighths_of_known_misses: builds a
  filter with N=1000 inserted rsums and probes with N=1000 disjoint
  known-miss rsums, asserting at least 87% reject at the bithash
  gate. Measured rate on this run: 97.2%.
- bithash_state_does_not_leak_between_independent_indexes: builds
  two DeltaSignatureIndex values from distinct basis bytes and
  confirms a probe for blocks indexed in one never matches in the
  other, pinning the per-NDX lifecycle relied on by the ZSO-7
  audit (docs/audits/zso-7-inc-recurse-per-segment-state-2026-05-21.md).

No production code changes. Wire-byte parity is unaffected; golden
tests stay green. Bithash design follows zsync 0.6.2 librcksum
(Colin Phipps, http://zsync.moria.org.uk/, README acknowledgement
added in PR #4597).
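
A reject-rate measurement of the kind the first test describes might look like the sketch below. It is not the acceptance test itself: `measure_reject_rate`, the multiplicative (Fibonacci) mixer, and the choice of `0..n` as inserted rsums with `n..2n` as the disjoint known-miss set are assumptions chosen to keep the example deterministic and dependency-free.

```rust
/// Map an rsum to a bit position with a multiplicative hash.
/// The mixer is an assumption of this sketch, not taken from the PR.
fn slot(rsum: u32, log2_bits: u32) -> usize {
    (rsum.wrapping_mul(0x9E37_79B9) >> (32 - log2_bits)) as usize
}

/// Insert `n` rsums, probe `n` disjoint rsums, and return the fraction
/// of probes rejected by the bit array alone.
pub fn measure_reject_rate(n: u32) -> f64 {
    assert!(n > 0 && n <= 1 << 20);
    // Roughly 8 bits per inserted rsum, rounded up to a power of two.
    let bits = (n as usize * 8).max(64).next_power_of_two();
    let log2_bits = bits.trailing_zeros();
    let mut words = vec![0u64; bits / 64];

    // Inserted set: 0..n.
    for rsum in 0..n {
        let s = slot(rsum, log2_bits);
        words[s / 64] |= 1u64 << (s % 64);
    }

    // Known-miss set: n..2n, disjoint from the inserted set by construction.
    let rejected = (n..2 * n)
        .filter(|&rsum| {
            let s = slot(rsum, log2_bits);
            (words[s / 64] & (1u64 << (s % 64))) == 0
        })
        .count();

    rejected as f64 / n as f64
}
```

A test in the style of the commit message would call `measure_reject_rate(1000)` and compare the result against the 87% threshold; the rate this sketch produces depends on its mixer and is not expected to reproduce the 97.2% reported above.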
oferchen added a commit that referenced this pull request May 21, 2026
…SO-5)

Adds the ZSO-5 (task #2513) regression suite at
crates/matching/tests/zsync_wire_parity.rs. For each ZSO-1..4
optimization, constructs a basis + source pair designed to exercise
that optimization's code path, runs generate_delta, serialises the
DeltaScript through the production wire-format encoder
(protocol::wire::write_token_stream), and pins the resulting bytes
against (a) determinism across repeated runs and (b) round-trip
reconstruction of the source via apply_delta.

Active tests:
- bithash_optimization_preserves_wire_bytes (ZSO-1, task #2510, shipped
  in PR #3737) - 16 KiB random basis + disjoint random source so every
  rolling probe walks the bithash reject path. Asserts all-literal
  script and deterministic wire bytes.
- seq_match_optimization_preserves_wire_bytes (ZSO-2, task #2510,
  landing on PR #4624) - 84-block basis used as its own source so
  every adjacent-block hint fires. Asserts single fat Copy token and
  deterministic wire bytes. Test was authored to remain green on
  master (extend_run + script-level coalescing already produce the
  pinned shape) and stays green when PR #4624 swaps the probe path.
- all_active_zso_fixtures_are_deterministic - cross-cutting check that
  signals "matching pipeline became non-deterministic" as a single
  failure.

Stub tests (#[ignore]):
- hash_chain_prune_preserves_wire_bytes (ZSO-3, task #2511)
- compact_rolling_key_preserves_wire_bytes (ZSO-4, task #2512)

The stubs are structured so the only step required when those
optimizations land is to drop #[ignore] and swap the placeholder
basis/source for the duplicate-heavy / small-basis corpora documented
in the test rustdoc.

Complements the protocol-layer golden tests at
crates/protocol/tests/golden_protocol_v28_mplex_delta_stats.rs which
pin the token frame format but do not synthesize ZSO-specific basis
fixtures.

No production code changes. No new dependencies (uses a small in-test
LCG to keep the corpus deterministic). All 321 matching tests pass;
407 protocol golden tests pass.
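
The "small in-test LCG" approach to a deterministic corpus can be illustrated as follows. This is a sketch, not the generator from zsync_wire_parity.rs: the `Lcg` type, the MMIX multiplier and increment, and the `is_deterministic` helper are hypothetical, and the closure passed to the helper stands in for the real generate_delta plus wire-encoder pipeline, whose signatures are not shown in this PR.

```rust
/// Minimal 64-bit linear congruential generator (Knuth's MMIX constants).
/// Hypothetical stand-in for the in-test generator.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }

    /// Advance the state and return its top byte, which has the best
    /// statistical quality in a power-of-two-modulus LCG.
    pub fn next_byte(&mut self) -> u8 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 56) as u8
    }
}

/// Build a reproducible pseudo-random corpus of `len` bytes from `seed`.
pub fn corpus(seed: u64, len: usize) -> Vec<u8> {
    let mut rng = Lcg::new(seed);
    (0..len).map(|_| rng.next_byte()).collect()
}

/// Run `encode` twice on the same input and report whether the output
/// bytes are identical, mirroring the determinism half of the pin.
pub fn is_deterministic<F>(input: &[u8], encode: F) -> bool
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    encode(input) == encode(input)
}
```

Seeding from a fixed constant means a failing run can be reproduced byte-for-byte without adding a random-number dependency to the test crate.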
oferchen added a commit that referenced this pull request May 21, 2026
…SO-5) (#4627)
