test(matching): zsync adversarial - shifted-insertion + sparse-match (#2079 #2080) #4164

Merged
oferchen merged 1 commit into master from test/zsync-adversarial-2079-2080
May 16, 2026
Conversation

@oferchen
Owner

Summary

Combine the two zsync-prefilter adversarial regression gates that the
optimisation tracker (project_zsync_optimizations.md) calls out for
the rolling-hash + bithash + MatchIndex pipeline that landed earlier in
the v0.6 cycle.

#2079 shifted-insertion

Extends crates/matching/tests/shifted_insertion_fixture.rs with
shifted_insertion_1mib_basis_matches_span_the_shift (+234 LoC):

  • 1 MiB basis built from a deterministic xorshift64* PRNG.
  • Parameterised (N, M) fixture vector covering 7 cells: head-aligned
    single block, mid-aligned single + multi-block, near-tail aligned,
    mid sub-block (1 byte, 7 bytes), and a quarter-position
    multi-block-with-tail unaligned insert.
  • For each cell: insert N filler bytes (high-bit-set xorshift stream)
    at offset M, generate the delta, and assert (a) the script's total
    byte count, (b) literal_bytes covers at least N, (c) round-trip
    reconstruction is byte-identical, and (d) the COPY tokens reference
    basis blocks both before and after the insertion point with
    at most one boundary block lost.
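The fixture construction described above can be sketched roughly as follows. This is a minimal illustration, not the code in shifted_insertion_fixture.rs: the helper names (`prng_bytes`, `insert_filler`), the seeds, and the byte-packing order are assumptions; only the xorshift64* step itself is the standard published recurrence.

```rust
/// One xorshift64* step (standard shifts 12/25/27 and multiplier).
/// The state must be non-zero.
fn xorshift64star(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    x.wrapping_mul(0x2545_F491_4F6C_DD1D)
}

/// Hypothetical helper: `len` deterministic bytes from `seed`.
/// Passing `or_mask = 0x80` forces the high bit of every byte.
fn prng_bytes(len: usize, seed: u64, or_mask: u8) -> Vec<u8> {
    assert_ne!(seed, 0, "xorshift state must be non-zero");
    let mut state = seed;
    let mut out = Vec::with_capacity(len + 8);
    while out.len() < len {
        out.extend_from_slice(&xorshift64star(&mut state).to_le_bytes());
    }
    out.truncate(len);
    for byte in &mut out {
        *byte |= or_mask;
    }
    out
}

/// Hypothetical helper: target = basis[..m] + n filler bytes + basis[m..].
fn insert_filler(basis: &[u8], n: usize, m: usize, filler_seed: u64) -> Vec<u8> {
    let filler = prng_bytes(n, filler_seed, 0x80);
    let mut target = Vec::with_capacity(basis.len() + n);
    target.extend_from_slice(&basis[..m]);
    target.extend_from_slice(&filler);
    target.extend_from_slice(&basis[m..]);
    target
}
```

One (N, M) cell then amounts to `insert_filler(&basis, n, m, seed)` followed by the delta assertions; because both streams are seeded, no binary fixture needs to be committed.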

This guards the property that any future prefilter (bithash, compact
keys, seq-match) cannot silently lose matches shifted by N bytes
against an aligned basis.
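To make the guarded property concrete, here is a toy matcher that finds basis blocks at unaligned target offsets by rolling an rsync-style weak checksum one byte at a time. It is a sketch under stated assumptions: the `Token` enum, `generate_delta`, and `apply_delta` are hypothetical names, candidates are confirmed by a direct byte comparison rather than a strong hash, and no prefilter is involved.

```rust
use std::collections::HashMap;

/// Delta tokens for this sketch; the real crate's token type may differ.
#[derive(Debug, PartialEq)]
enum Token {
    Copy(usize),      // index of a basis block
    Literal(Vec<u8>), // bytes with no basis match
}

/// Deterministic test bytes from xorshift64* (seed must be non-zero).
fn pseudo_random(len: usize, seed: u64) -> Vec<u8> {
    let mut x = seed;
    (0..len)
        .map(|_| {
            x ^= x >> 12;
            x ^= x << 25;
            x ^= x >> 27;
            (x.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 56) as u8
        })
        .collect()
}

/// Weak checksum: a = byte sum, b = position-weighted sum, both mod 2^16.
fn weak_sum(window: &[u8]) -> (u32, u32) {
    let len = window.len() as u32;
    let (mut a, mut b) = (0u32, 0u32);
    for (i, &byte) in window.iter().enumerate() {
        a = a.wrapping_add(byte as u32);
        b = b.wrapping_add((len - i as u32).wrapping_mul(byte as u32));
    }
    (a & 0xffff, b & 0xffff)
}

fn generate_delta(basis: &[u8], target: &[u8], bl: usize) -> Vec<Token> {
    // Index every aligned basis block by its weak checksum.
    let mut index: HashMap<(u32, u32), Vec<usize>> = HashMap::new();
    for (i, block) in basis.chunks_exact(bl).enumerate() {
        index.entry(weak_sum(block)).or_default().push(i);
    }
    let (mut tokens, mut literal, mut pos) = (Vec::new(), Vec::new(), 0usize);
    let mut rolled: Option<(u32, u32)> = None;
    while pos + bl <= target.len() {
        let window = &target[pos..pos + bl];
        let (a, b) = rolled.unwrap_or_else(|| weak_sum(window));
        // Confirm weak-checksum candidates with a byte comparison.
        let hit = index.get(&(a, b)).and_then(|blocks| {
            blocks.iter().copied().find(|&i| &basis[i * bl..(i + 1) * bl] == window)
        });
        if let Some(block) = hit {
            if !literal.is_empty() {
                tokens.push(Token::Literal(std::mem::take(&mut literal)));
            }
            tokens.push(Token::Copy(block));
            pos += bl;
            rolled = None;
        } else {
            // Slide by one byte: drop `out`, take in the next byte if there is one.
            let out = target[pos] as u32;
            literal.push(target[pos]);
            pos += 1;
            rolled = target.get(pos + bl - 1).map(|&inn| {
                let a2 = (a + 0x1_0000 - out + inn as u32) & 0xffff;
                let b2 = b.wrapping_sub((bl as u32).wrapping_mul(out)).wrapping_add(a2) & 0xffff;
                (a2, b2)
            });
        }
    }
    literal.extend_from_slice(&target[pos..]);
    if !literal.is_empty() {
        tokens.push(Token::Literal(literal));
    }
    tokens
}

/// Rebuild the target from the basis and the token stream.
fn apply_delta(basis: &[u8], tokens: &[Token], bl: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for token in tokens {
        match token {
            Token::Copy(i) => out.extend_from_slice(&basis[*i * bl..(*i + 1) * bl]),
            Token::Literal(bytes) => out.extend_from_slice(bytes),
        }
    }
    out
}
```

With a 16-block basis and a 7-byte insert inside block 4, this sketch copies blocks 0-3 and 5-15 and emits the lost boundary block plus the filler as one literal. A prefilter that wrongly rejected the shifted windows would show up as missing COPY tokens after the insertion point, which is what assertion (d) is there to catch.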

#2080 sparse-match

Adds crates/matching/src/index/sparse_match_tests.rs (new, +312 LoC),
registered from mod.rs. The test lives inside the index module
so it can read the otherwise private bithash and tag_table fields
without enlarging the public API.
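The placement relies on Rust's privacy rule that a child module can see private items of its ancestors. A minimal sketch of that layout follows; the `MatchIndex` field layout, `from_keys`, and the `sparse_match_probe` module are hypothetical stand-ins, not the crate's actual definitions.

```rust
mod index {
    /// Stand-in for the real index type; the field layout here is hypothetical.
    pub struct MatchIndex {
        bithash: Vec<u64>, // private: not nameable from outside `index`
    }

    impl MatchIndex {
        /// Hypothetical constructor: set one of 128 slots per key.
        pub fn from_keys(keys: &[u32]) -> Self {
            let mut bithash = vec![0u64; 2];
            for &key in keys {
                let slot = (key % 128) as usize;
                bithash[slot / 64] |= 1u64 << (slot % 64);
            }
            MatchIndex { bithash }
        }
    }

    // The PR registers its submodule under cfg(test); that attribute is left
    // out here so the sketch also compiles as a plain program.
    pub mod sparse_match_probe {
        use super::MatchIndex;

        /// Reads the private field directly, which only code inside
        /// `index` (including its child modules) is allowed to do.
        pub fn set_slots(index: &MatchIndex) -> u32 {
            index.bithash.iter().map(|word| word.count_ones()).sum()
        }
    }
}
```

An integration test under crates/matching/tests/ would be compiled as a separate crate and could only reach `pub` items, so it could not read the field without a new accessor.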

  • 100 MiB basis built from a deterministic xorshift64* PRNG.
  • 100 MiB target: independent xorshift stream with the MSB forced,
    except a single 4 KiB region taken verbatim from the basis at offset
    32 MiB and replanted at target offset 64 MiB.
  • Drives the rolling window across the full target buffer asserting:
    1. Exactly four block-aligned hits (4 KiB planted region / 1024-byte
      block_len), referencing consecutive basis blocks in order, starting
      at block index (planted offset) / block_len.
    2. Bithash rejection rate over the non-planted region is at least
      70 %. The design-note theoretical bound is 7/8 (87.5 %); the
      observed value at 100 MiB / 1 KiB blocks is ~78 % because the
      tag-table fast path saturates at this scale and clustering on
      the upper sum2 bits pulls the rejection rate below the uniform
      bound. The 70 % floor catches a regression below the "most
      probes are rejected" gate the project memory cites.
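For intuition on the 7/8 figure, here is a small model of a bit-array prefilter. It assumes a sizing of eight slots per basis block, which is only my reading of the bound and is not stated in this PR; `BitHash`, `with_block_count`, and `rejection_rate` are hypothetical names. With at most one slot set per block, at most 1/8 of the slots are occupied, so uniformly distributed probes are rejected at least 7/8 of the time in expectation.

```rust
/// One xorshift64* step; the state must be non-zero.
fn xorshift64star(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    x.wrapping_mul(0x2545_F491_4F6C_DD1D)
}

/// Hypothetical prefilter: one bit per slot, eight slots per basis block
/// (the sizing is an assumption made for this sketch).
struct BitHash {
    bits: Vec<u64>,
    mask: u32,
}

impl BitHash {
    fn with_block_count(block_count: usize) -> Self {
        let slots = (block_count * 8).next_power_of_two().max(64);
        BitHash { bits: vec![0u64; slots / 64], mask: (slots - 1) as u32 }
    }

    fn insert(&mut self, key: u32) {
        let slot = (key & self.mask) as usize;
        self.bits[slot / 64] |= 1u64 << (slot % 64);
    }

    fn may_contain(&self, key: u32) -> bool {
        let slot = (key & self.mask) as usize;
        (self.bits[slot / 64] & (1u64 << (slot % 64))) != 0
    }
}

/// Fraction of uniform probe keys rejected after `block_count` uniform inserts.
fn rejection_rate(block_count: usize, probes: usize) -> f64 {
    let mut filter = BitHash::with_block_count(block_count);
    let mut basis_state = 0x9E37_79B9_7F4A_7C15u64;
    for _ in 0..block_count {
        filter.insert((xorshift64star(&mut basis_state) >> 32) as u32);
    }
    let mut target_state = 0xD1B5_4A32_D192_ED03u64;
    let rejected = (0..probes)
        .filter(|_| !filter.may_contain((xorshift64star(&mut target_state) >> 32) as u32))
        .count();
    rejected as f64 / probes as f64
}
```

This model uses uniform keys only, so it sits near the theoretical bound; it does not reproduce the sum2 clustering that pulls the real measurement down to about 78 %.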

The 100 MiB test is marked #[ignore] (the existing fixture marks
16 MiB tests the same way) so the default cargo nextest matrix stays
fast. Stress runs invoke it via:

cargo nextest run --release -p matching --all-features \
  --run-ignored only -E 'test(sparse_match_100mib_single_overlap)'

Locally the test passes in 90 s wall clock with the actual rejection
rate at 0.786.
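The gating pattern looks roughly like this. It is a sketch only: `planted_block_count` and both test names are hypothetical, and the stress-test body is elided.

```rust
/// Whole blocks in a planted region; 4 KiB / 1024 B gives the four
/// hits the sparse-match test expects.
fn planted_block_count(region_len: usize, block_len: usize) -> usize {
    region_len / block_len
}

#[cfg(test)]
mod stress_tests {
    use super::*;

    /// Cheap check: runs in the default matrix.
    #[test]
    fn planted_region_is_four_blocks() {
        assert_eq!(planted_block_count(4096, 1024), 4);
    }

    /// Expensive check: skipped unless ignored tests are requested.
    #[test]
    #[ignore = "100 MiB stress case; run with --run-ignored only"]
    fn sparse_match_stress_sketch() {
        // The 100 MiB basis/target construction and the rolling-window
        // assertions would go here.
    }
}
```

The reason string on `#[ignore]` is optional; it is shown in the test listing and helps explain why a case is skipped.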

Two cheap sanity tests (basis_builder_is_deterministic,
target_non_planted_bytes_have_msb_set) run as part of the default
matrix and pin the fixture builder invariants on a 1 MiB prefix.
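The two invariants can be illustrated at 1 MiB scale as below. This is a hedged sketch: `stream`, `build_target`, the seeds, and the one-byte-per-step packing are assumptions, and "MSB forced" is read here as setting the high bit of every non-planted target byte.

```rust
/// One xorshift64* step; the state must be non-zero.
fn xorshift64star(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    x.wrapping_mul(0x2545_F491_4F6C_DD1D)
}

/// Hypothetical builder: one byte per PRNG step, OR-ed with `or_mask`.
fn stream(len: usize, seed: u64, or_mask: u8) -> Vec<u8> {
    let mut state = seed;
    (0..len)
        .map(|_| (xorshift64star(&mut state) >> 56) as u8 | or_mask)
        .collect()
}

/// Hypothetical builder: an independent high-bit-set stream with one region
/// copied verbatim from `basis[src_off..]` to `target[dst_off..]`.
fn build_target(
    basis: &[u8],
    seed: u64,
    src_off: usize,
    dst_off: usize,
    plant_len: usize,
) -> Vec<u8> {
    let mut target = stream(basis.len(), seed, 0x80);
    target[dst_off..dst_off + plant_len]
        .copy_from_slice(&basis[src_off..src_off + plant_len]);
    target
}
```

Forcing the high bit keeps the target stream distinct from the basis stream even if the two seeds were ever related, while the planted region stays byte-identical to its source.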

Files

  • crates/matching/tests/shifted_insertion_fixture.rs (+234)
  • crates/matching/src/index/sparse_match_tests.rs (new, +312)
  • crates/matching/src/index/mod.rs (+2, register the new cfg(test)
    submodule)

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy -p matching --all-targets --all-features --no-deps -- -D warnings clean
  • cargo nextest run -p matching --all-features - 318 passed, 3 skipped (the existing 16 MiB ignored cases plus the new 100 MiB ignored case)
  • cargo nextest run --release -p matching --run-ignored only -E 'test(sparse_match_100mib_single_overlap)' - passes locally in 90 s, bithash rejection rate observed 0.786
  • cargo nextest run -p matching -E 'test(shifted_insertion_1mib)' - new 1 MiB shifted-insertion case passes in 0.3 s

@github-actions github-actions Bot added the test label May 16, 2026
@oferchen oferchen force-pushed the test/zsync-adversarial-2079-2080 branch 2 times, most recently from b238b2b to ce7e408 Compare May 16, 2026 17:52
…2079 #2080)

Combine the two zsync-prefilter adversarial regression gates the
optimisation tracker calls out in project_zsync_optimizations.md:

- #2079 shifted-insertion: extend the existing 32 KiB fixture with a
  1 MiB-basis case driven by a deterministic xorshift64* PRNG. Asserts
  that the delta references basis blocks on both sides of the insertion
  for a parameterised (N, M) vector covering aligned, sub-block,
  sub-word, and tail-edge insert positions. Lost-block tolerance is one
  block (the boundary block of an unaligned insert).

- #2080 sparse-match: new internal unit test sitting inside src/index/
  so it can probe the otherwise private bithash field without enlarging
  the public API. Builds a 100 MiB basis + 100 MiB target whose only
  overlap is a single 4 KiB planted region, then drives the rolling
  window over the full target asserting (a) exactly one block-aligned
  match per basis block in the planted region, (b) the match's offset
  and length, and (c) a >= 70 % bithash rejection rate over the
  non-planted region. The 100 MiB case is marked #[ignore] so the
  default nextest run stays fast; stress runs invoke it explicitly via
  --run-ignored.

Both tests use deterministic xorshift PRNGs so the inputs are
reproducible without committing binary fixtures to the test tree.

Files:
- crates/matching/tests/shifted_insertion_fixture.rs (+234)
- crates/matching/src/index/sparse_match_tests.rs (new, +312)
- crates/matching/src/index/mod.rs (+2, register submodule)
@oferchen oferchen force-pushed the test/zsync-adversarial-2079-2080 branch from ce7e408 to e68e069 Compare May 16, 2026 18:27
@oferchen oferchen merged commit 6e47121 into master May 16, 2026
39 checks passed
@oferchen oferchen deleted the test/zsync-adversarial-2079-2080 branch May 16, 2026 19:10
oferchen added a commit that referenced this pull request May 16, 2026
Captures evidence that the cumulative zsync-inspired matching work
(#2059-#2087) plus its three recent follow-up PRs (#4164 adversarial
fixtures, #4166 arena allocator feasibility audit, #4169 zsync cleanup
audit) has not regressed upstream rsync interoperability.

Anchored on master commit 6a615aa where both the CI and Interop
Validation workflows are green. Records the per-version matrix
(3.0.9 / 3.1.3 / 3.4.1 / 3.4.2 native, plus protocols 28-32 against
3.4.2), the 71-scenario standalone list with the single pre-existing
known failure, and the delta-stats parity proof (oc-rsync and upstream
both report matched=91900) that the matching pipeline remains
wire-equivalent.
oferchen added a commit that referenced this pull request May 16, 2026
…4171)

Captures evidence that the cumulative zsync-inspired matching work
(#2059-#2087) plus its three recent follow-up PRs (#4164 adversarial
fixtures, #4166 arena allocator feasibility audit, #4169 zsync cleanup
audit) has not regressed upstream rsync interoperability.

Anchored on master commit 6a615aa where both the CI and Interop
Validation workflows are green. Records the per-version matrix
(3.0.9 / 3.1.3 / 3.4.1 / 3.4.2 native, plus protocols 28-32 against
3.4.2), the 71-scenario standalone list with the single pre-existing
known failure, and the delta-stats parity proof (oc-rsync and upstream
both report matched=91900) that the matching pipeline remains
wire-equivalent.
oferchen added a commit that referenced this pull request May 18, 2026
…2079 #2080) (#4164)

Combine the two zsync-prefilter adversarial regression gates the
optimisation tracker calls out in project_zsync_optimizations.md:

- #2079 shifted-insertion: extend the existing 32 KiB fixture with a
  1 MiB-basis case driven by a deterministic xorshift64* PRNG. Asserts
  that the delta references basis blocks on both sides of the insertion
  for a parameterised (N, M) vector covering aligned, sub-block,
  sub-word, and tail-edge insert positions. Lost-block tolerance is one
  block (the boundary block of an unaligned insert).

- #2080 sparse-match: new internal unit test sitting inside src/index/
  so it can probe the otherwise private bithash field without enlarging
  the public API. Builds a 100 MiB basis + 100 MiB target whose only
  overlap is a single 4 KiB planted region, then drives the rolling
  window over the full target asserting (a) exactly one block-aligned
  match per basis block in the planted region, (b) the match's offset
  and length, and (c) a >= 70 % bithash rejection rate over the
  non-planted region. The 100 MiB case is marked #[ignore] so the
  default nextest run stays fast; stress runs invoke it explicitly via
  --run-ignored.

Both tests use deterministic xorshift PRNGs so the inputs are
reproducible without committing binary fixtures to the test tree.

Files:
- crates/matching/tests/shifted_insertion_fixture.rs (+234)
- crates/matching/src/index/sparse_match_tests.rs (new, +312)
- crates/matching/src/index/mod.rs (+2, register submodule)
oferchen added a commit that referenced this pull request May 18, 2026
…4171)

Captures evidence that the cumulative zsync-inspired matching work
(#2059-#2087) plus its three recent follow-up PRs (#4164 adversarial
fixtures, #4166 arena allocator feasibility audit, #4169 zsync cleanup
audit) has not regressed upstream rsync interoperability.

Anchored on master commit bfb940e where both the CI and Interop
Validation workflows are green. Records the per-version matrix
(3.0.9 / 3.1.3 / 3.4.1 / 3.4.2 native, plus protocols 28-32 against
3.4.2), the 71-scenario standalone list with the single pre-existing
known failure, and the delta-stats parity proof (oc-rsync and upstream
both report matched=91900) that the matching pipeline remains
wire-equivalent.
oferchen added a commit that referenced this pull request May 18, 2026
…4171)

Captures evidence that the cumulative zsync-inspired matching work
(#2059-#2087) plus its three recent follow-up PRs (#4164 adversarial
fixtures, #4166 arena allocator feasibility audit, #4169 zsync cleanup
audit) has not regressed upstream rsync interoperability.

Anchored on master commit 7329d95 where both the CI and Interop
Validation workflows are green. Records the per-version matrix
(3.0.9 / 3.1.3 / 3.4.1 / 3.4.2 native, plus protocols 28-32 against
3.4.2), the 71-scenario standalone list with the single pre-existing
known failure, and the delta-stats parity proof (oc-rsync and upstream
both report matched=91900) that the matching pipeline remains
wire-equivalent.