feat(match): add zsync-inspired seq-match extend-run #3751

Merged: oferchen merged 4 commits into master from feat/zsync-seq-match on May 6, 2026

@oferchen (Owner) commented May 5, 2026

Summary

  • Coalesces consecutive matched basis blocks into a single fat `DeltaToken::Copy { len = run * block_length }` at the script layer, mirroring zsync's `next_match` shortcut. The wire layer expands fat Copy tokens back into one `DeltaOp` per block, preserving byte-identical wire output (closes #2065: Stabilize run_client delete test timestamps).
  • Adds `DeltaSignatureIndex::extend_run(start_block_index, target, max_blocks)` as the public helper that probes consecutive blocks against a buffered target and returns the length of the matching run.
  • Adds a golden-byte regression test in `crates/match/tests/seq_match_golden.rs` covering the coalesced token shape, byte-equivalent reconstruction, the `extend_run` API, and the matched-bytes invariant from the design doc (closes #2066: Fix local delta copy when identical timestamps).
  • Updates the shifted-insertion and sparse-match fixtures to expand fat Copy runs before asserting per-block contiguity.
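A minimal sketch of what an `extend_run`-style probe does, assuming the basis blocks are held in memory and compared byte for byte. `BasisBlocks` is a hypothetical stand-in: the real `DeltaSignatureIndex` matches on checksums, and its exact signature and return units may differ.

```rust
/// Hypothetical stand-in for the signature index. It keeps raw basis blocks
/// (full-length only) instead of rolling/strong checksums, purely for illustration.
struct BasisBlocks {
    block_length: usize,
    blocks: Vec<Vec<u8>>,
}

impl BasisBlocks {
    /// Counts how many consecutive basis blocks, starting at `start_block_index`,
    /// match `target` back to back, stopping after at most `max_blocks`.
    fn extend_run(&self, start_block_index: usize, target: &[u8], max_blocks: usize) -> usize {
        let mut run = 0;
        while run < max_blocks {
            let offset = run * self.block_length;
            // Stop at the end of the basis or when the target has no full block left.
            let (Some(block), Some(window)) = (
                self.blocks.get(start_block_index + run),
                target.get(offset..offset + self.block_length),
            ) else {
                break;
            };
            if block.as_slice() != window {
                break;
            }
            run += 1;
        }
        run
    }
}
```

For example, with 4-byte blocks `aaaa`, `bbbb`, `cccc`, probing from block 0 against `aaaabbbbXccc` returns 2: the third window differs, so the run ends there.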

Wire-compat invariants

Per `docs/design/zsync-seq-match.md`:

  • One `write_int(-(block_index + 1))` per basis block on the wire.
  • `CPRES_ZLIB` dictionary sync still feeds one block per `send_block_match` call (post-expansion).
  • `apply_delta` and `compute_file_checksum` already honour `len`, so a fat Copy round-trips without semantic changes.
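To illustrate the first invariant, here is a hypothetical expansion step that turns each fat Copy back into one `-(block_index + 1)` match token per basis block. `ScriptToken`, `WireOp`, and `expand_for_wire` are illustrative names, not the crate's API. The sketch assumes fat Copy lengths are whole multiples of the block length and records a literal only by its positive length, omitting the data bytes that follow it on the wire.

```rust
/// Hypothetical script-layer token: a fat Copy covers `len` bytes starting at
/// basis block `block_index`.
#[derive(Debug)]
enum ScriptToken {
    Literal(Vec<u8>),
    Copy { block_index: u32, len: usize },
}

/// Hypothetical wire-level op, shown as the integer that would be written.
#[derive(Debug, PartialEq)]
enum WireOp {
    Literal(usize), // positive length; literal bytes would follow on the wire
    Match(i32),     // -(block_index + 1), one per basis block
}

fn expand_for_wire(script: &[ScriptToken], block_length: usize) -> Vec<WireOp> {
    let mut ops = Vec::new();
    for token in script {
        match token {
            ScriptToken::Literal(bytes) => ops.push(WireOp::Literal(bytes.len())),
            ScriptToken::Copy { block_index, len } => {
                // The sketch only handles runs made of whole blocks.
                assert_eq!(*len % block_length, 0, "sketch assumes whole blocks");
                let run = (*len / block_length) as u32;
                for i in 0..run {
                    let index = *block_index + i;
                    ops.push(WireOp::Match(-(index as i32 + 1)));
                }
            }
        }
    }
    ops
}
```

A fat Copy of three 700-byte blocks starting at block 3 expands to the matches -4, -5, -6, exactly what three separate single-block Copy tokens would have produced.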

Test plan

  • CI fmt+clippy passes on stable.
  • CI nextest stable / Windows / macOS / Linux musl pass with the new and updated fixtures.
  • crates/match/tests/seq_match_golden.rs confirms the post-coalesce token shape and byte-equivalence.
  • Existing shifted-insertion + sparse-match adversarial fixtures still hold their per-block contiguity invariants after the expand-fat-Copy update.

Coalesces consecutive matched basis blocks at the DeltaScript layer into
a single fat `DeltaToken::Copy { len = run * block_length }`, mirroring
zsync's `next_match` shortcut from `librcksum/rsum.c:262`. The wire
layer (`script_to_wire_delta`) expands fat Copy tokens back into one
DeltaOp per basis block, so the wire byte stream stays byte-identical
to the no-coalesce baseline (closes #2065).
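The coalescing side can be sketched the same way. Assuming the matcher first produces one Copy per matched full-length block, the hypothetical `coalesce_copies` below merges back-to-back Copy tokens whose basis indices are consecutive into a single fat Copy with `len = run * block_length`. It is an illustration of the idea, not the crate's implementation.

```rust
/// Hypothetical token type; `Copy` starts at basis block `block_index` and spans `len` bytes.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Literal(Vec<u8>),
    Copy { block_index: u32, len: usize },
}

/// Merges adjacent single-block Copy tokens that reference consecutive basis
/// blocks into one fat Copy. Literals, gaps, and partial blocks break the run.
fn coalesce_copies(tokens: Vec<Token>, block_length: usize) -> Vec<Token> {
    let mut out: Vec<Token> = Vec::with_capacity(tokens.len());
    for token in tokens {
        if let Token::Copy { block_index, len } = token {
            if let Some(Token::Copy { block_index: start, len: run_len }) = out.last_mut() {
                // Index the next block must have to continue the current run.
                let next_expected = *start + (*run_len / block_length) as u32;
                if len == block_length && *run_len % block_length == 0 && block_index == next_expected {
                    *run_len += len;
                    continue;
                }
            }
            out.push(Token::Copy { block_index, len });
        } else {
            out.push(token);
        }
    }
    out
}
```

With 4-byte blocks, the per-block stream `Copy 0, Copy 1, Copy 2, Literal, Copy 7, Copy 9` becomes `Copy { 0, 12 }, Literal, Copy 7, Copy 9`: blocks 7 and 9 are not consecutive, so they stay separate.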

Adds `DeltaSignatureIndex::extend_run(start_block_index, target,
max_blocks)` as the public helper and pins the post-coalesce token
stream with a golden-byte regression test in
`crates/match/tests/seq_match_golden.rs` (closes #2066). Updates
shifted-insertion and sparse-match fixtures to expand fat Copy runs
before asserting per-block contiguity.

Wire-compat invariants from `docs/design/zsync-seq-match.md`:
- One write_int(-(block_index + 1)) per basis block on the wire.
- CPRES_ZLIB dictionary sync still feeds one block per match call.
- `apply_delta` and `compute_file_checksum` already honour `len`,
  no semantic changes downstream.
github-actions (bot) added the enhancement label (New feature or request) May 6, 2026
oferchen added 3 commits May 6, 2026 06:36
…ocks

The seq-match extend_run helper coalesces consecutive matching blocks
into a single COPY token, so the prior token-count assertion no longer
holds. Asserting that all input bytes are covered by COPY tokens (and
zero literal bytes) preserves the test's intent without depending on
the internal coalescing strategy.
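The byte-coverage assertion described here does not depend on how Copy tokens are grouped. A sketch using a hypothetical `Token` type and `byte_coverage` helper (not the test suite's actual code): a coalesced stream and a per-block stream over the same input report the same copied and literal byte counts.

```rust
/// Hypothetical token type for the sketch.
#[derive(Debug)]
enum Token {
    Literal(Vec<u8>),
    Copy { block_index: u32, len: usize },
}

/// Returns (bytes covered by Copy tokens, literal bytes) for a token stream.
fn byte_coverage(tokens: &[Token]) -> (usize, usize) {
    tokens.iter().fold((0, 0), |(copied, literal), token| match token {
        Token::Copy { len, .. } => (copied + *len, literal),
        Token::Literal(bytes) => (copied, literal + bytes.len()),
    })
}
```

A test in this style would assert `byte_coverage(&tokens) == (input.len(), 0)` rather than pinning the number of tokens.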

Both seq_match_emits_single_fat_copy_for_full_basis_run and
seq_match_matched_bytes_match_baseline assume every basis block is
full-length so that extend_run can walk the whole basis in a single
fat Copy and matched bytes equal block_count * block_length. With
65 536-byte input and 700-byte blocks, the trailing 436-byte partial
block (65 536 - 93 * 700) defeated those invariants. Sizing to
700 * 94 = 65 800 bytes keeps the basis well under the 700^2
(490 000-byte) threshold, so the layout still picks 700, and removes
the partial trailing block.
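The block-layout arithmetic can be checked directly. This sketch assumes `DEFAULT_BLOCK_SIZE` is 700, as the message implies, and that the layout keeps 700-byte blocks for bases below 700^2 bytes; `block_layout` and `keeps_default_block_size` are hypothetical helpers.

```rust
/// Assumed value of DEFAULT_BLOCK_SIZE, taken from the commit message.
const DEFAULT_BLOCK_SIZE: u64 = 700;

/// Hypothetical helper: splits a basis length into (full blocks, trailing partial bytes).
fn block_layout(file_len: u64, block_length: u64) -> (u64, u64) {
    (file_len / block_length, file_len % block_length)
}

/// Hypothetical check for the assumed rule that bases below 700^2 bytes keep 700-byte blocks.
fn keeps_default_block_size(file_len: u64) -> bool {
    file_len < DEFAULT_BLOCK_SIZE * DEFAULT_BLOCK_SIZE
}
```

`block_layout(65_536, 700)` gives 93 full blocks plus a 436-byte tail, while 65 800 bytes divides into exactly 94 blocks and stays far below 490 000.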
oferchen merged commit 6122b50 into master May 6, 2026
39 checks passed
oferchen deleted the feat/zsync-seq-match branch May 6, 2026 18:56
oferchen added a commit that referenced this pull request May 14, 2026
Introduces CHANGELOG.md and records the four zsync-inspired internal
optimizations to the receiver block-match path: bithash prefilter
(#3737), sequential-match extension (#3751), matched-block pruning
(#3748), and compact-key layout (#3994). All four are wire-compatible
refactors of the in-memory match index; golden-byte fixtures and
interop against upstream rsync 3.0.9 / 3.1.3 / 3.4.1 are unchanged.

Closes #2087.
oferchen added a commit that referenced this pull request May 18, 2026
* feat(match): add zsync-inspired seq-match extend-run

Coalesces consecutive matched basis blocks at the DeltaScript layer into
a single fat `DeltaToken::Copy { len = run * block_length }`, mirroring
zsync's `next_match` shortcut from `librcksum/rsum.c:262`. The wire
layer (`script_to_wire_delta`) expands fat Copy tokens back into one
DeltaOp per basis block, so the wire byte stream stays byte-identical
to the no-coalesce baseline (closes #2065).

Adds `DeltaSignatureIndex::extend_run(start_block_index, target,
max_blocks)` as the public helper and pins the post-coalesce token
stream with a golden-byte regression test in
`crates/match/tests/seq_match_golden.rs` (closes #2066). Updates
shifted-insertion and sparse-match fixtures to expand fat Copy runs
before asserting per-block contiguity.

Wire-compat invariants from `docs/design/zsync-seq-match.md`:
- One write_int(-(block_index + 1)) per basis block on the wire.
- CPRES_ZLIB dictionary sync still feeds one block per match call.
- `apply_delta` and `compute_file_checksum` already honour `len`,
  no semantic changes downstream.

* fix(match): pad extend_run test data to multiple of DEFAULT_BLOCK_SIZE

* fix(test): assert copied bytes instead of token count for repeated blocks

The seq-match extend_run helper coalesces consecutive matching blocks
into a single COPY token, so the prior token-count assertion no longer
holds. Asserting that all input bytes are covered by COPY tokens (and
zero literal bytes) preserves the test's intent without depending on
the internal coalescing strategy.

* fix(test): size synthetic basis to exact multiple of DEFAULT_BLOCK_SIZE

Both seq_match_emits_single_fat_copy_for_full_basis_run and
seq_match_matched_bytes_match_baseline assume every basis block is
full-length so that extend_run can walk the whole basis in a single
fat-copy and matched bytes equal block_count * block_length. With
65 536-byte input and 700-byte blocks, the trailing 436-byte partial
block (65 536 - 93 * 700) defeated those invariants. Sizing to
700 * 94 = 65 800 bytes keeps the basis well under the 700^2
(490 000-byte) threshold, so the layout still picks 700, and removes
the partial trailing block.
oferchen added a commit that referenced this pull request May 18, 2026
Introduces CHANGELOG.md and records the four zsync-inspired internal
optimizations to the receiver block-match path: bithash prefilter
(#3737), sequential-match extension (#3751), matched-block pruning
(#3748), and compact-key layout (#3994). All four are wire-compatible
refactors of the in-memory match index; golden-byte fixtures and
interop against upstream rsync 3.0.9 / 3.1.3 / 3.4.1 are unchanged.

Closes #2087.

Labels: enhancement (New feature or request)
Projects: none
Participants: 1