feat(match): add zsync-inspired bithash prefilter to MatchIndex #3737
Merged
Translate zsync's librcksum bithash gate into oc-rsync's existing DeltaSignatureIndex. The bithash is an oversized one-sided bit array sized to roughly 8x the rsum-bucket count and probed after the 1 KB tag_table check but before the FxHashMap walk. At the design saturation density (1/8) it rejects ~7/8 of post-tag misses for purely receiver-side CPU savings.

The prefilter is in-memory only: no signature payload, NDX framing, capability negotiation, multiplex frame, golden-byte fixture, or interop matrix is touched.

Sizing follows docs/design/zsync-bithash.md sections 1 and 2; insertion and probe sites match section 3 exactly. Wires into populate_index alongside tag_table and lookup, and into DeltaSignatureIndex::rebuild via clear() to recycle across the INC_RECURSE per-segment rebuild path. The probe goes into both find_match_bytes and find_match_slices; check_block_match_slices is intentionally not gated since it bypasses the hash table.

Property tests pin the no-false-negative contract end-to-end (every inserted basis block is still found) and at the BitHash level (contains is true for every previously inserted rsum, including across clear/reinsert cycles).

Closes #2060
Closes #2061
Closes #2062
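To make the description above concrete, here is a minimal sketch of a one-sided bit-array prefilter with the same three operations (insert, contains, clear). It is an illustration under stated assumptions, not the code in this PR: the field names, the `with_bucket_count` constructor, the power-of-two rounding, and the choice to index by the low bits of the rsum are all hypothetical.

```rust
/// Hypothetical sketch of a one-sided bit-array prefilter.
/// A set bit means "possibly present"; a clear bit means "definitely absent".
pub struct BitHash {
    words: Vec<u64>,
    mask: usize,
}

impl BitHash {
    /// Assumption: size to roughly 8 bits per bucket, rounded up to a power of two
    /// so the slot can be selected with a mask instead of a modulo.
    pub fn with_bucket_count(buckets: usize) -> Self {
        let bits = buckets.saturating_mul(8).max(64).next_power_of_two();
        Self {
            words: vec![0u64; bits / 64],
            mask: bits - 1,
        }
    }

    /// Maps an rsum to (word index, bit mask). Assumption: low bits of the rsum.
    fn slot(&self, rsum: u32) -> (usize, u64) {
        let bit = (rsum as usize) & self.mask;
        (bit / 64, 1u64 << (bit % 64))
    }

    pub fn insert(&mut self, rsum: u32) {
        let (word, bit) = self.slot(rsum);
        self.words[word] |= bit;
    }

    /// Never returns false for an rsum inserted since the last clear().
    pub fn contains(&self, rsum: u32) -> bool {
        let (word, bit) = self.slot(rsum);
        self.words[word] & bit != 0
    }

    /// Zeroes the bits in place, keeping the allocation for reuse.
    pub fn clear(&mut self) {
        self.words.fill(0);
    }
}
```

Because `insert` only ever sets bits and `contains` checks the same bit, an inserted rsum can never be rejected; only unrelated rsums that happen to share a bit slip through as false positives.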
oferchen added a commit that referenced this pull request on May 14, 2026
Introduces CHANGELOG.md and records the four zsync-inspired internal optimizations to the receiver block-match path: bithash prefilter (#3737), sequential-match extension (#3751), matched-block pruning (#3748), and compact-key layout (#3994). All four are wire-compatible refactors of the in-memory match index; golden-byte fixtures and interop against upstream rsync 3.0.9 / 3.1.3 / 3.4.1 are unchanged. Closes #2087.
oferchen added a commit that referenced this pull request on May 18, 2026
* feat(match): add zsync-inspired bithash prefilter to MatchIndex
* fix(deps): regenerate Cargo.lock for proptest dev-dep
oferchen added a commit that referenced this pull request on May 20, 2026
Adds the two ZSO-1 acceptance tests that complement the bithash prefilter implementation already merged in PR #3737:

- bithash_rejects_at_least_seven_eighths_of_known_misses: builds a filter with N=1000 inserted rsums and probes with N=1000 disjoint known-miss rsums, asserting at least 87% reject at the bithash gate. Measured rate on this run: 97.2%.
- bithash_state_does_not_leak_between_independent_indexes: builds two DeltaSignatureIndex values from distinct basis bytes and confirms a probe for blocks indexed in one never matches in the other, pinning the per-NDX lifecycle relied on by the ZSO-7 audit (docs/audits/zso-7-inc-recurse-per-segment-state-2026-05-21.md).

No production code changes. Wire-byte parity is unaffected; golden tests stay green.

Bithash design follows zsync 0.6.2 librcksum (Colin Phipps, http://zsync.moria.org.uk/, README acknowledgement added in PR #4597).
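The reject-rate test can be approximated with a small experiment. For a bit array of m bits holding n inserted keys, a uniformly distributed miss is rejected with probability (1 - 1/m)^n, which is close to e^(-n/m). The sketch below measures that rate on synthetic keys and compares it with the formula. Everything here is an assumption for illustration: the function names, the use of murmur3's 32-bit finalizer to fabricate disjoint rsums, and the 8192-bit array are not taken from the test in the repository, so the sketch is not expected to reproduce the 97.2% figure, which depends on the real sizing.

```rust
/// murmur3's 32-bit finalizer. It is a bijection on u32, so distinct inputs
/// yield distinct synthetic rsums (used here only to fabricate test keys).
fn fmix32(mut h: u32) -> u32 {
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Inserts `n` synthetic rsums into a `bits`-bit array, probes `n` different
/// ones, and returns the fraction of probes the bit array rejects.
fn measured_reject_rate(n: u32, bits: usize) -> f64 {
    assert!(bits.is_power_of_two() && bits >= 64);
    let mask = bits - 1;
    let mut words = vec![0u64; bits / 64];

    // Inserted set: fmix32(0..n).
    for i in 0..n {
        let bit = fmix32(i) as usize & mask;
        words[bit / 64] |= 1u64 << (bit % 64);
    }

    // Known-miss set: fmix32(n..2n), disjoint from the inserted set by bijectivity.
    let rejected = (n..2 * n)
        .filter(|&i| {
            let bit = fmix32(i) as usize & mask;
            words[bit / 64] & (1u64 << (bit % 64)) == 0
        })
        .count();

    rejected as f64 / n as f64
}

/// Expected reject probability for a uniformly distributed miss.
fn expected_reject_rate(n: u32, bits: usize) -> f64 {
    (1.0 - 1.0 / bits as f64).powi(n as i32)
}
```

With n = 1000 and 8192 bits the formula gives roughly 0.885, just above the 7/8 design target; a larger bits-to-keys ratio pushes the rate higher.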
oferchen added a commit that referenced this pull request on May 21, 2026
…SO-5)

Adds the ZSO-5 (task #2513) regression suite at crates/matching/tests/zsync_wire_parity.rs. For each ZSO-1..4 optimization, constructs a basis + source pair designed to exercise that optimization's code path, runs generate_delta, serialises the DeltaScript through the production wire-format encoder (protocol::wire::write_token_stream), and pins the resulting bytes against (a) determinism across repeated runs and (b) round-trip reconstruction of the source via apply_delta.

Active tests:

- bithash_optimization_preserves_wire_bytes (ZSO-1, task #2510, shipped in PR #3737) - 16 KiB random basis + disjoint random source so every rolling probe walks the bithash reject path. Asserts all-literal script and deterministic wire bytes.
- seq_match_optimization_preserves_wire_bytes (ZSO-2, task #2510, landing on PR #4624) - 84-block basis used as its own source so every adjacent-block hint fires. Asserts single fat Copy token and deterministic wire bytes. Test was authored to remain green on master (extend_run + script-level coalescing already produce the pinned shape) and stays green when PR #4624 swaps the probe path.
- all_active_zso_fixtures_are_deterministic - cross-cutting check that signals "matching pipeline became non-deterministic" as a single failure.

Stub tests (#[ignore]):

- hash_chain_prune_preserves_wire_bytes (ZSO-3, task #2511)
- compact_rolling_key_preserves_wire_bytes (ZSO-4, task #2512)

The stubs are structured so the only step required when those optimizations land is to drop #[ignore] and swap the placeholder basis/source for the duplicate-heavy / small-basis corpora documented in the test rustdoc.

Complements the protocol-layer golden tests at crates/protocol/tests/golden_protocol_v28_mplex_delta_stats.rs which pin the token frame format but do not synthesize ZSO-specific basis fixtures.

No production code changes. No new dependencies (uses a small in-test LCG to keep the corpus deterministic).

All 321 matching tests pass; 407 protocol golden tests pass.
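The commit message mentions a small in-test LCG for building a deterministic corpus without a new dependency. A minimal sketch of that idea follows. The `Lcg` type, its constants (Knuth's MMIX multiplier and increment), and the `encode_is_deterministic` helper are assumptions for illustration and are not the generator or helpers used in zsync_wire_parity.rs.

```rust
/// Hypothetical 64-bit linear congruential generator (MMIX constants).
/// The same seed always yields the same byte stream, so fixtures built
/// from it are reproducible across runs and machines.
struct Lcg(u64);

impl Lcg {
    fn next_byte(&mut self) -> u8 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // The high bits of an LCG are the best distributed, so emit the top byte.
        (self.0 >> 56) as u8
    }

    /// Builds a `len`-byte corpus from `seed`.
    fn bytes(seed: u64, len: usize) -> Vec<u8> {
        let mut gen = Lcg(seed);
        (0..len).map(|_| gen.next_byte()).collect()
    }
}

/// Runs `encode` twice on the same inputs and reports whether the output bytes
/// match. `encode` stands in for "generate the delta and serialise it".
fn encode_is_deterministic<F>(encode: F, basis: &[u8], source: &[u8]) -> bool
where
    F: Fn(&[u8], &[u8]) -> Vec<u8>,
{
    encode(basis, source) == encode(basis, source)
}
```

A test in this style would build the basis and source from two different seeds, pass the real delta-plus-encoder pipeline as `encode`, and then separately check that applying the decoded script to the basis reproduces the source.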
Summary
- Adds the bithash prefilter (crates/match/src/index/bithash.rs) to DeltaSignatureIndex. Sized to ~8x the rsum-bucket count per docs/design/zsync-bithash.md section 1, populated alongside tag_table in populate_index, and probed in both find_match_bytes and find_match_slices between the tag_table gate and the FxHashMap walk.
- Wires BitHash::clear() into DeltaSignatureIndex::rebuild so the INC_RECURSE per-segment rebuild path recycles the bit array without re-allocating, mirroring the existing tag_table reset.
- Property tests (bithash_tests.rs, proptest 1.4) pin the no-false-negative contract end-to-end (every inserted basis block is still found through both probe paths) and at the BitHash level (every inserted rsum survives contains, including after clear + reinsert).

Wire-compat
In-memory only. No signature payload, NDX framing, capability string, multiplex frame, golden-byte fixture, or interop matrix is touched. The only observable effect is reduced receiver CPU during delta search.
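The probe order from the summary (tag_table gate, then bithash, then the hash-map walk) and the in-place reset on rebuild can be sketched as follows. This is a simplified model, not the DeltaSignatureIndex code: `SketchIndex` and its fields are hypothetical, std's `HashMap` stands in for FxHashMap, and the choice of which rsum bits feed each gate is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical model of an index with two cheap gates in front of a hash map.
struct SketchIndex {
    tag_table: Vec<u64>, // 8192 bits = 1 KB, indexed by the top 13 bits of the rsum
    bithash: Vec<u64>,   // ~8 bits per bucket, indexed by the low bits of the rsum
    bithash_mask: usize,
    lookup: HashMap<u32, Vec<usize>>, // rsum -> block indices
}

impl SketchIndex {
    fn with_capacity(buckets: usize) -> Self {
        let bits = buckets.saturating_mul(8).max(64).next_power_of_two();
        Self {
            tag_table: vec![0u64; 8192 / 64],
            bithash: vec![0u64; bits / 64],
            bithash_mask: bits - 1,
            lookup: HashMap::new(),
        }
    }

    /// Resets all three structures in place (no reallocation of the bit arrays)
    /// and repopulates them from the new segment's rsums.
    fn rebuild(&mut self, rsums: &[u32]) {
        self.tag_table.fill(0);
        self.bithash.fill(0);
        self.lookup.clear();
        for (block, &rsum) in rsums.iter().enumerate() {
            let tag = (rsum >> 19) as usize;
            self.tag_table[tag / 64] |= 1u64 << (tag % 64);
            let bit = rsum as usize & self.bithash_mask;
            self.bithash[bit / 64] |= 1u64 << (bit % 64);
            self.lookup.entry(rsum).or_default().push(block);
        }
    }

    /// Cheapest check first; the hash map is consulted only if both gates pass.
    fn find(&self, rsum: u32) -> Option<&[usize]> {
        let tag = (rsum >> 19) as usize;
        if self.tag_table[tag / 64] & (1u64 << (tag % 64)) == 0 {
            return None; // rejected by the tag_table gate
        }
        let bit = rsum as usize & self.bithash_mask;
        if self.bithash[bit / 64] & (1u64 << (bit % 64)) == 0 {
            return None; // rejected by the bithash gate
        }
        self.lookup.get(&rsum).map(Vec::as_slice)
    }
}
```

Both gates can only turn a would-be miss into an earlier miss, which is why the result of `find` is the same as querying the hash map directly and nothing downstream of the index changes.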
Test plan
- cargo nextest run -p matching --all-features (CI)