test(matching): zsync adversarial - shifted-insertion + sparse-match (#2079 #2080) #4164
Merged
b238b2b to ce7e408
…2079 #2080)

Combine the two zsync-prefilter adversarial regression gates the optimisation tracker calls out in project_zsync_optimizations.md:

- #2079 shifted-insertion: extend the existing 32 KiB fixture with a 1 MiB-basis case driven by a deterministic xorshift64* PRNG. Asserts that the delta references basis blocks on both sides of the insertion for a parameterised (N, M) vector covering aligned, sub-block, sub-word, and tail-edge insert positions. Lost-block tolerance is one block (the boundary block of an unaligned insert).
- #2080 sparse-match: new internal unit test sitting inside src/index/ so it can probe the otherwise private bithash field without enlarging the public API. Builds a 100 MiB basis + 100 MiB target whose only overlap is a single 4 KiB planted region, then drives the rolling window over the full target asserting (a) exactly one block-aligned match per basis block in the planted region, (b) the match's offset and length, and (c) a >= 70 % bithash rejection rate over the non-planted region. The 100 MiB case is marked #[ignore] so the default nextest run stays fast; stress runs invoke it explicitly via --run-ignored.

Both tests use deterministic xorshift PRNGs so the inputs are reproducible without committing binary fixtures to the test tree.

Files:
- crates/matching/tests/shifted_insertion_fixture.rs (+234)
- crates/matching/src/index/sparse_match_tests.rs (new, +312)
- crates/matching/src/index/mod.rs (+2, register submodule)
ce7e408 to e68e069
oferchen added a commit that referenced this pull request May 16, 2026
Captures evidence that the cumulative zsync-inspired matching work (#2059-#2087) plus its three recent follow-up PRs (#4164 adversarial fixtures, #4166 arena allocator feasibility audit, #4169 zsync cleanup audit) has not regressed upstream rsync interoperability. Anchored on master commit 6a615aa where both the CI and Interop Validation workflows are green. Records the per-version matrix (3.0.9 / 3.1.3 / 3.4.1 / 3.4.2 native, plus protocols 28-32 against 3.4.2), the 71-scenario standalone list with the single pre-existing known failure, and the delta-stats parity proof (oc-rsync and upstream both report matched=91900) that the matching pipeline remains wire-equivalent.
Summary
Combine the two zsync-prefilter adversarial regression gates the optimisation tracker (`project_zsync_optimizations.md`) calls out for the rolling-hash + bithash + `MatchIndex` pipeline that landed earlier in the v0.6 cycle.
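The Summary names a rolling-hash + bithash + `MatchIndex` pipeline. As a minimal sketch of its first stage, assuming an rsync-style weak checksum built from two 16-bit running sums (the crate's actual rolling hash is not shown in this PR and may differ), the O(1) window slide that lets the matcher probe every byte offset of the target could look like this:

```rust
/// Hypothetical rsync-style weak rolling checksum (an assumption, not the
/// `matching` crate's code). s1 = sum of bytes, s2 = sum of the running
/// s1 values, both reduced mod 2^16.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rolling {
    s1: u32,
    s2: u32,
    len: u32,
}

impl Rolling {
    /// Computes the checksum of `window` from scratch in O(len).
    pub fn new(window: &[u8]) -> Self {
        let (mut s1, mut s2) = (0u32, 0u32);
        for &b in window {
            s1 = s1.wrapping_add(b as u32);
            s2 = s2.wrapping_add(s1);
        }
        Rolling { s1: s1 & 0xffff, s2: s2 & 0xffff, len: window.len() as u32 }
    }

    /// Slides the window one byte (drop `out`, append `inp`) in O(1).
    pub fn roll(&mut self, out: u8, inp: u8) {
        self.s1 = self.s1.wrapping_sub(out as u32).wrapping_add(inp as u32) & 0xffff;
        // The outgoing byte contributed `len` times to s2; the new s1 adds
        // every byte of the shifted window once more.
        self.s2 = self
            .s2
            .wrapping_sub(self.len.wrapping_mul(out as u32))
            .wrapping_add(self.s1)
            & 0xffff;
    }

    /// Packs both halves into the 32-bit value used for lookups.
    pub fn digest(&self) -> u32 {
        (self.s2 << 16) | self.s1
    }
}
```

Because `roll` only touches the outgoing and incoming bytes, scanning a target costs time linear in its length; the prefilter stages then decide which per-offset checksums are worth a table lookup.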
#2079 shifted-insertion
Extends `crates/matching/tests/shifted_insertion_fixture.rs` with `shifted_insertion_1mib_basis_matches_span_the_shift` (+234 LoC):

- A parameterised `(N, M)` fixture vector covering 7 cells: head-aligned single block, mid-aligned single + multi-block, near-tail aligned, mid sub-block (1 byte, 7 bytes), and a quarter-position multi-block-with-tail unaligned insert.
- For each cell, insert `N` filler bytes (a high-bit-set xorshift stream) at offset `M`, generate the delta, and assert (a) the script's total byte count, (b) `literal_bytes` covers at least `N`, (c) round-trip reconstruction is byte-identical, and (d) the COPY tokens reference basis blocks both before and after the insertion point, with at most one boundary block lost.
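A minimal sketch of how one `(N, M)` cell's inputs could be built. It assumes the basis is the top byte of each xorshift64* output and the filler is a second xorshift64* stream with the MSB forced on; `build_basis`, `insert_filler`, and the seeds are hypothetical, not the fixture's actual helpers.

```rust
/// xorshift64* (Vigna): xorshift state update, multiplicative output scramble.
pub struct XorShift64Star(u64);

impl XorShift64Star {
    pub fn new(seed: u64) -> Self {
        assert!(seed != 0, "xorshift64* seed must be non-zero");
        XorShift64Star(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Top byte of the output; the high bits are the strongest in xorshift64*.
    pub fn next_byte(&mut self) -> u8 {
        (self.next_u64() >> 56) as u8
    }
}

/// Hypothetical helper: deterministic basis of `len` bytes.
pub fn build_basis(seed: u64, len: usize) -> Vec<u8> {
    let mut rng = XorShift64Star::new(seed);
    (0..len).map(|_| rng.next_byte()).collect()
}

/// Hypothetical helper: splice `n` high-bit-set filler bytes into a copy of
/// `basis` at offset `m`, producing the target for one (N, M) cell.
pub fn insert_filler(basis: &[u8], n: usize, m: usize, seed: u64) -> Vec<u8> {
    assert!(m <= basis.len(), "insert offset past end of basis");
    let mut rng = XorShift64Star::new(seed);
    let mut target = Vec::with_capacity(basis.len() + n);
    target.extend_from_slice(&basis[..m]);
    target.extend((0..n).map(|_| rng.next_byte() | 0x80));
    target.extend_from_slice(&basis[m..]);
    target
}
```

Forcing the MSB on the filler is one way to realise the "high-bit-set" stream described above; the PR does not say exactly how the fixture derives it.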
This guards the property that any future prefilter (bithash, compact keys, seq-match) cannot silently lose matches shifted by `N` bytes against an aligned basis.
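The one-block tolerance follows from block alignment. A basis block that ends at or before `M` is still found at its original offset, a block that starts at or after `M` is found `N` bytes later by the rolling search, and only a block straddling an unaligned `M` is split by the filler. The hypothetical helper below (not part of the PR) makes that arithmetic explicit; the usage assumes 1 KiB blocks.

```rust
/// How the aligned basis blocks fare when filler is inserted at offset `m`.
#[derive(Debug, PartialEq, Eq)]
pub struct BlockFate {
    pub before: usize, // blocks ending at or before m: found at their original offset
    pub after: usize,  // blocks starting at or after m: found shifted by N
    pub lost: usize,   // block straddling m: no longer contiguous in the target
}

/// Hypothetical helper; ignores a short tail block past the last full block.
pub fn block_fate(basis_len: usize, block_len: usize, m: usize) -> BlockFate {
    assert!(block_len > 0, "block_len must be non-zero");
    let full_blocks = basis_len / block_len;
    // Block i ends at (i + 1) * block_len, so floor(m / block_len) blocks end by m.
    let before = (m / block_len).min(full_blocks);
    // First block whose start i * block_len is >= m is ceil(m / block_len).
    let after_start = (m + block_len - 1) / block_len;
    let after = full_blocks.saturating_sub(after_start);
    let lost = full_blocks - before - after;
    BlockFate { before, after, lost }
}
```

An aligned `M` loses nothing; any unaligned `M` inside the full blocks loses exactly the one block it cuts, which is the tolerance the assertion allows.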
#2080 sparse-match
Adds `crates/matching/src/index/sparse_match_tests.rs` (new, +312 LoC), registered from `mod.rs`. The test lives inside the `index` module so it can read the otherwise private `bithash` and `tag_table` fields without enlarging the public API.
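Rust privacy is module-scoped: a private item is visible in the module that defines it and in all of that module's descendants, which is why a submodule registered from `mod.rs` can read `bithash` and `tag_table` while integration tests under `tests/` cannot. A single-file sketch, with hypothetical field types and constructor:

```rust
// Stand-in for crates/matching/src/index/mod.rs. `MatchIndex`, `bithash`
// and `tag_table` are named in the PR; their types here are hypothetical.
pub mod index {
    pub struct MatchIndex {
        bithash: Vec<u64>,   // private: not part of the public API
        tag_table: Vec<u32>, // private as well
    }

    impl MatchIndex {
        pub fn with_bits(bits: usize) -> Self {
            MatchIndex { bithash: vec![0; (bits + 63) / 64], tag_table: Vec::new() }
        }
    }

    // In the crate this would be `#[cfg(test)] mod sparse_match_tests;`
    // backed by sparse_match_tests.rs; it is inlined and made `pub` here
    // only so the sketch compiles as one file.
    pub mod sparse_match_tests {
        use super::MatchIndex;

        // A child module may read its parent's private fields.
        pub fn bithash_words(idx: &MatchIndex) -> usize {
            idx.bithash.len()
        }

        pub fn tag_table_len(idx: &MatchIndex) -> usize {
            idx.tag_table.len()
        }
    }
}
```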
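- Builds a 100 MiB basis and a 100 MiB target whose only overlap is a single 4 KiB region taken verbatim from the basis at offset 32 MiB and replanted at target offset 64 MiB.
- Drives the rolling window over the full target and asserts exactly one block-aligned match per basis block in the planted region, each at the expected offset and length, referencing consecutive basis blocks starting at the planted offset / `block_len`, in order.
- Asserts a bithash rejection rate of at least 70 % over the non-planted region. The design-note theoretical bound is 7/8 (87.5 %); the observed value at 100 MiB / 1 KiB blocks is ~78 % because the tag-table fast path saturates at this scale and clustering on the upper `sum2` bits pulls the rejection rate below the uniform bound. The 70 % floor catches a regression below the "most probes are rejected" gate the project memory cites.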
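A sketch of the bithash idea and of how a rejection rate can be measured. With `b` bits per basis block at most `1/b` of the slots are set, so uniformly distributed probes are rejected at least `1 - 1/b` of the time; `b = 8` gives the 7/8 (87.5 %) figure. The sizing, the multiplicative mixing, and the `BitHash` type are assumptions for illustration; per the text above, the real index draws on the upper `sum2` bits, whose clustering is what pulls the observed rate to about 78 %.

```rust
/// Hypothetical bithash prefilter: a bitmap with one bit per slot. Every
/// basis block's weak checksum sets its slot; a probe whose slot is clear
/// cannot match any block and can skip the hash-table lookup.
pub struct BitHash {
    words: Vec<u64>,
    shift: u32, // slot = top `64 - shift` bits of the mixed checksum
}

impl BitHash {
    /// Creates a bitmap of 2^log2_bits slots (1 <= log2_bits <= 63).
    pub fn new(log2_bits: u32) -> Self {
        assert!((1..64).contains(&log2_bits), "log2_bits out of range");
        let words = 1usize << log2_bits.saturating_sub(6); // 64 slots per word
        BitHash { words: vec![0; words], shift: 64 - log2_bits }
    }

    /// Maps a checksum to a slot by Fibonacci hashing (an assumption here).
    fn slot(&self, weak: u32) -> usize {
        ((weak as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) >> self.shift) as usize
    }

    pub fn insert(&mut self, weak: u32) {
        let s = self.slot(weak);
        self.words[s / 64] |= 1u64 << (s % 64);
    }

    /// `false` means no basis block can have this checksum.
    pub fn may_contain(&self, weak: u32) -> bool {
        let s = self.slot(weak);
        self.words[s / 64] & (1u64 << (s % 64)) != 0
    }
}

/// Fraction of probes rejected by the prefilter (0.0 for no probes).
pub fn rejection_rate(bh: &BitHash, probes: impl Iterator<Item = u32>) -> f64 {
    let (mut total, mut rejected) = (0u64, 0u64);
    for p in probes {
        total += 1;
        if !bh.may_contain(p) {
            rejected += 1;
        }
    }
    if total == 0 { 0.0 } else { rejected as f64 / total as f64 }
}
```

The usage feeds uniformly distributed probes, so it measures the idealised bound rather than the fixture's clustered checksums. A bitmap prefilter of this shape never rejects a checksum that was inserted, so it can cost lookups but not matches.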
The 100 MiB test is marked `#[ignore]` (the existing fixture marks 16 MiB tests the same way) so the default `cargo nextest` matrix stays fast. Stress runs invoke it via:

```sh
cargo nextest run --release -p matching --all-features \
  --run-ignored only -E 'test(sparse_match_100mib_single_overlap)'
```

Locally the test passes in 90 s wall clock with an actual rejection rate of 0.786.
Two cheap sanity tests (`basis_builder_is_deterministic`, `target_non_planted_bytes_have_msb_set`) run as part of the default matrix and pin the fixture builder invariants on a 1 MiB prefix.
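A scaled-down sketch of the fixture builder and the two invariants the sanity tests pin. It assumes the basis is an unmasked xorshift64* stream and the target filler is a second stream with the MSB forced on (the PR only states that non-planted target bytes have the MSB set); `build_sparse_pair`, `expected_blocks`, and the seeds are hypothetical.

```rust
/// xorshift64* generator; the state must be non-zero.
pub struct XorShift64Star(pub u64);

impl XorShift64Star {
    /// Advances the state and returns the top byte of the scrambled output.
    pub fn next_byte(&mut self) -> u8 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        (x.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 56) as u8
    }
}

/// Hypothetical builder: basis and target of `len` bytes each, with
/// `plant_len` bytes copied from basis[src..] into target[dst..].
pub fn build_sparse_pair(len: usize, src: usize, dst: usize, plant_len: usize) -> (Vec<u8>, Vec<u8>) {
    assert!(src + plant_len <= len && dst + plant_len <= len, "plant out of range");
    let mut b = XorShift64Star(0x0123_4567_89AB_CDEF);
    let basis: Vec<u8> = (0..len).map(|_| b.next_byte()).collect();
    let mut t = XorShift64Star(0xFEDC_BA98_7654_3210);
    // Every non-planted target byte has its MSB set.
    let mut target: Vec<u8> = (0..len).map(|_| t.next_byte() | 0x80).collect();
    target[dst..dst + plant_len].copy_from_slice(&basis[src..src + plant_len]);
    (basis, target)
}

/// Basis block indices the planted region should match, in order
/// (assumes `src` and `plant_len` are multiples of `block_len`).
pub fn expected_blocks(src: usize, plant_len: usize, block_len: usize) -> Vec<usize> {
    (src / block_len..(src + plant_len) / block_len).collect()
}
```

At the PR's scale (planted basis offset 32 MiB, 1 KiB blocks) the same arithmetic gives basis blocks 32768 through 32771 as the four expected matches.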
Files
- `crates/matching/tests/shifted_insertion_fixture.rs` (+234)
- `crates/matching/src/index/sparse_match_tests.rs` (new, +312)
- `crates/matching/src/index/mod.rs` (+2, register the new `cfg(test)` submodule)
Test plan
- `cargo fmt --all -- --check`: clean
- `cargo clippy -p matching --all-targets --all-features --no-deps -- -D warnings`: clean
- `cargo nextest run -p matching --all-features`: 318 passed, 3 skipped (the existing 16 MiB ignored cases plus the new 100 MiB ignored case)
- `cargo nextest run --release -p matching --run-ignored only -E 'test(sparse_match_100mib_single_overlap)'`: passes locally in 90 s, bithash rejection rate observed 0.786
- `cargo nextest run -p matching -E 'test(shifted_insertion_1mib)'`: new 1 MiB shifted-insertion case passes in 0.3 s