feat(match): add zsync-inspired seq-match extend-run#3751
Merged
Coalesces consecutive matched basis blocks at the DeltaScript layer into
a single fat `DeltaToken::Copy { len = run * block_length }`, mirroring
zsync's `next_match` shortcut from `librcksum/rsum.c:262`. The wire
layer (`script_to_wire_delta`) expands fat Copy tokens back into one
DeltaOp per basis block, so the wire byte stream stays byte-identical
to the no-coalesce baseline (closes #2065).
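The coalesce-then-expand round trip described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: `Token`, `coalesce`, and `expand` are hypothetical stand-ins for `DeltaToken` and `script_to_wire_delta`, and the field names are assumptions.

```rust
/// Hypothetical stand-in for the crate's delta token (field names assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Literal(Vec<u8>),
    /// Copy `len` bytes starting at basis block `index`.
    Copy { index: u64, len: usize },
}

/// Merge runs of consecutive basis-block copies into one "fat" copy.
pub fn coalesce(tokens: Vec<Token>, block_length: usize) -> Vec<Token> {
    assert!(block_length > 0, "block_length must be non-zero");
    let mut out: Vec<Token> = Vec::new();
    for tok in tokens {
        if let Token::Copy { index, len } = &tok {
            if let Some(Token::Copy { index: prev_index, len: prev_len }) = out.last_mut() {
                // Only extend a run that covers whole blocks and ends
                // immediately before the block this token refers to.
                let whole_blocks = *prev_len % block_length == 0;
                let next_index = *prev_index + (*prev_len / block_length) as u64;
                if whole_blocks && next_index == *index {
                    *prev_len += *len;
                    continue;
                }
            }
        }
        out.push(tok);
    }
    out
}

/// Split every fat copy back into one copy per basis block.
pub fn expand(tokens: &[Token], block_length: usize) -> Vec<Token> {
    assert!(block_length > 0, "block_length must be non-zero");
    let mut out = Vec::new();
    for tok in tokens {
        match tok {
            Token::Copy { index, len } => {
                let (mut idx, mut remaining) = (*index, *len);
                while remaining > 0 {
                    let take = remaining.min(block_length);
                    out.push(Token::Copy { index: idx, len: take });
                    idx += 1;
                    remaining -= take;
                }
            }
            other => out.push(other.clone()),
        }
    }
    out
}
```

In this sketch, `expand(&coalesce(tokens, n), n)` returns the original per-block stream, which is the property that keeps the wire output unchanged.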
Adds `DeltaSignatureIndex::extend_run(start_block_index, target,
max_blocks)` as the public helper and pins the post-coalesce token
stream with a golden-byte regression test in
`crates/match/tests/seq_match_golden.rs` (closes #2066). Updates
shifted-insertion and sparse-match fixtures to expand fat Copy runs
before asserting per-block contiguity.
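A rough sketch of what a run-extension probe of this shape might look like is given below. It is an assumption-laden toy: `ToyIndex` and `block_digest` are hypothetical, the digest is the standard library's `DefaultHasher` rather than the rolling and strong checksums a real signature index would hold, and only the parameter names are taken from the signature quoted above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy per-block digest (assumption: a real index stores rsync-style sums).
fn block_digest(block: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    block.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical index holding one digest per basis block.
pub struct ToyIndex {
    block_length: usize,
    digests: Vec<u64>,
}

impl ToyIndex {
    pub fn new(basis: &[u8], block_length: usize) -> Self {
        assert!(block_length > 0, "block_length must be non-zero");
        let digests = basis.chunks(block_length).map(block_digest).collect();
        Self { block_length, digests }
    }

    /// Count how many consecutive basis blocks, starting at
    /// `start_block_index`, match successive full-length windows of
    /// `target`, stopping at the first mismatch or after `max_blocks`.
    pub fn extend_run(&self, start_block_index: usize, target: &[u8], max_blocks: usize) -> usize {
        let mut run = 0;
        for window in target.chunks_exact(self.block_length).take(max_blocks) {
            match self.digests.get(start_block_index + run) {
                Some(digest) if *digest == block_digest(window) => run += 1,
                _ => break,
            }
        }
        run
    }
}
```

The returned run length is what a caller would multiply by the block length to size a single fat copy.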
Wire-compat invariants from `docs/design/zsync-seq-match.md`:
- One `write_int(-(block_index + 1))` per basis block on the wire.
- CPRES_ZLIB dictionary sync still feeds one block per match call.
- `apply_delta` and `compute_file_checksum` already honour `len`, so there
  are no semantic changes downstream.
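To make the first invariant concrete, the sketch below lists the integers a run of matched blocks would put on the wire, one per basis block. The helper names are hypothetical, and the 4-byte little-endian encoding is an assumption about how `write_int` serializes its argument.

```rust
/// Token values for `run` consecutive basis blocks starting at
/// `start_block_index`: one `-(block_index + 1)` per block.
pub fn wire_tokens_for_run(start_block_index: i32, run: i32) -> Vec<i32> {
    (start_block_index..start_block_index + run)
        .map(|block_index| -(block_index + 1))
        .collect()
}

/// Serialize the tokens as 4-byte little-endian integers (assumed encoding).
pub fn encode_le(tokens: &[i32]) -> Vec<u8> {
    tokens.iter().flat_map(|token| token.to_le_bytes()).collect()
}
```

A fat copy spanning blocks 0 through 2 therefore still yields the three tokens -1, -2, -3, exactly as three separate single-block copies would.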
…ocks
The seq-match extend_run helper coalesces consecutive matching blocks into a single COPY token, so the prior token-count assertion no longer holds. Asserting that all input bytes are covered by COPY tokens (and zero literal bytes) preserves the test's intent without depending on the internal coalescing strategy.
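An assertion of that kind could look roughly like the following. `Token` and `byte_coverage` are hypothetical names used only for illustration; the point is that the check counts bytes, so it gives the same answer for a per-block stream and a coalesced one.

```rust
/// Hypothetical stand-in for the crate's delta token (field names assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Literal(Vec<u8>),
    Copy { index: u64, len: usize },
}

/// Returns (bytes covered by COPY tokens, bytes carried as literals).
pub fn byte_coverage(tokens: &[Token]) -> (usize, usize) {
    tokens.iter().fold((0, 0), |(copied, literal), tok| match tok {
        Token::Copy { len, .. } => (copied + len, literal),
        Token::Literal(data) => (copied, literal + data.len()),
    })
}
```

A test would then compare `byte_coverage(&tokens)` against `(input.len(), 0)` instead of asserting on `tokens.len()`.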
Both `seq_match_emits_single_fat_copy_for_full_basis_run` and `seq_match_matched_bytes_match_baseline` assume every basis block is full-length, so that `extend_run` can walk the whole basis in a single fat copy and the matched bytes equal `block_count * block_length`. With a 65 536-byte input and 700-byte blocks, the trailing 436-byte partial block (65 536 = 93 * 700 + 436) defeated those invariants. Sizing the basis to 700 * 94 = 65 800 bytes keeps it well under the 700^2 = 490 000-byte threshold (so the layout still picks 700) and removes the partial trailing block.
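The block arithmetic behind this fix can be checked with a few lines. The helper names are hypothetical, and the constant is defined here only for illustration, using the 700-byte value given in the message.

```rust
/// Block length named in the commit message (defined here for illustration).
pub const DEFAULT_BLOCK_SIZE: usize = 700;

/// Returns (number of full blocks, length of the trailing partial block).
pub fn split_into_blocks(len: usize, block_length: usize) -> (usize, usize) {
    (len / block_length, len % block_length)
}

/// Round `len` up to the next multiple of `block_length`.
pub fn padded_to_block_multiple(len: usize, block_length: usize) -> usize {
    (len + block_length - 1) / block_length * block_length
}
```

With these helpers, `split_into_blocks(65_536, DEFAULT_BLOCK_SIZE)` gives 93 full blocks plus a non-empty tail, while `padded_to_block_multiple(65_536, DEFAULT_BLOCK_SIZE)` gives 65 800, which splits into exactly 94 full blocks.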
oferchen added a commit that referenced this pull request on May 14, 2026
Introduces CHANGELOG.md and records the four zsync-inspired internal optimizations to the receiver block-match path: bithash prefilter (#3737), sequential-match extension (#3751), matched-block pruning (#3748), and compact-key layout (#3994). All four are wire-compatible refactors of the in-memory match index; golden-byte fixtures and interop against upstream rsync 3.0.9 / 3.1.3 / 3.4.1 are unchanged. Closes #2087.
oferchen added a commit that referenced this pull request on May 18, 2026
* feat(match): add zsync-inspired seq-match extend-run
* fix(match): pad extend_run test data to multiple of DEFAULT_BLOCK_SIZE
* fix(test): assert copied bytes instead of token count for repeated blocks
* fix(test): size synthetic basis to exact multiple of DEFAULT_BLOCK_SIZE
Summary
- Coalesces consecutive matched basis blocks into a single fat `DeltaToken::Copy { len = run * block_length }` at the script layer, mirroring zsync's `next_match` shortcut. The wire layer expands fat Copy tokens back into one DeltaOp per block, preserving byte-identical wire output (closes #2065).
- Adds `DeltaSignatureIndex::extend_run(start_block_index, target, max_blocks)` as the public helper that probes consecutive blocks against a buffered target and returns the matching run length.
- Adds a golden-byte regression test in `crates/match/tests/seq_match_golden.rs` covering the coalesced token shape, byte-equivalent reconstruction, the `extend_run` API, and the matched-bytes invariant from the design doc (closes #2066).

Wire-compat invariants
Per `docs/design/zsync-seq-match.md`:
- One `write_int(-(block_index + 1))` per basis block on the wire.
- CPRES_ZLIB dictionary sync still feeds one block per `send_block_match` call (post-expansion).
- `apply_delta` and `compute_file_checksum` already honour `len`, so a fat Copy round-trips without semantic changes.

Test plan
- `crates/match/tests/seq_match_golden.rs` confirms the post-coalesce token shape and byte-equivalence.