docs(design): intra-file parallel rolling-hash for large files (#2206) #4222

Merged
oferchen merged 1 commit into master from docs/delta-intra-file-parallel-design-2206 on May 17, 2026

@oferchen
Owner

Summary

Recommendation

Defer. Do not implement intra-file parallel rolling-hash until the MD4/MD5 multibuf SIMD benches (#4189, #4191) show single-thread strong-checksum verify is no longer the hot stage on representative workloads. Either the multibuf SIMD work lifts single-thread throughput enough that intra-file parallelism becomes unnecessary, or it does not; the decision must rest on bench numbers, not design speculation. Prototyping behind a feature gate is rejected because behavioural divergence between gated and default builds is exactly what the golden-byte parity tests are designed to catch.

Wire-format

No wire-format change. Token stream MUST be byte-identical to the serial matcher's output for every input. crates/protocol/tests/golden/ and the interop matrix must remain green for any future implementation PR.
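
To make the parity requirement concrete, here is a minimal sketch of the kind of check a future implementation would have to pass, reduced to the weak rolling checksum only. It assumes a simplified rsync-style checksum and is not the code in crates/matching. The helper names `rolling_sums` and `windowed_sums` are hypothetical. The two window scans run sequentially here for clarity. They share no state, so they could run on separate threads.

```rust
/// Simplified rsync-style weak checksum for every `block_len`-byte window
/// of `data`, computed with a single serial rolling scan.
/// (Illustrative only; not the project's actual matcher.)
fn rolling_sums(data: &[u8], block_len: usize) -> Vec<u32> {
    if block_len == 0 || data.len() < block_len {
        return Vec::new();
    }
    let len32 = block_len as u32;
    let mut a: u32 = 0;
    let mut b: u32 = 0;
    // Seed the two sums from the first full block.
    for (i, &byte) in data[..block_len].iter().enumerate() {
        a = a.wrapping_add(byte as u32);
        b = b.wrapping_add(((block_len - i) as u32).wrapping_mul(byte as u32));
    }
    let mut out = Vec::with_capacity(data.len() - block_len + 1);
    out.push((a & 0xffff) | (b << 16));
    // Roll one byte at a time: drop the leaving byte, add the entering byte.
    for i in block_len..data.len() {
        let leaving = data[i - block_len] as u32;
        let entering = data[i] as u32;
        a = a.wrapping_sub(leaving).wrapping_add(entering);
        b = b.wrapping_sub(len32.wrapping_mul(leaving)).wrapping_add(a);
        out.push((a & 0xffff) | (b << 16));
    }
    out
}

/// The same checksums computed as two overlapping windows. The first window
/// owns block start offsets [0, split) and reads `block_len - 1` bytes past
/// `split`; the second window owns every offset from `split` onward.
fn windowed_sums(data: &[u8], block_len: usize, split: usize) -> Vec<u32> {
    if block_len == 0 {
        return Vec::new();
    }
    let split = split.min(data.len());
    let first_end = (split + block_len - 1).min(data.len());
    let mut out = rolling_sums(&data[..first_end], block_len);
    out.extend(rolling_sums(&data[split..], block_len));
    out
}
```

Comparing `windowed_sums(&data, block_len, split)` against `rolling_sums(&data, block_len)` for many inputs and split points covers only the weak-checksum layer. The actual requirement above is stricter: the emitted token stream must match byte for byte.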

Test plan

  • Pure docs change - no code touched.
  • cargo fmt --all clean (verified before push).
  • Doc has no AI tooling references, no em-dashes.
  • All file:line citations resolve in the current tree.

Records the design space for splitting a single large basis file into
overlapping windows and running parallel rolling-hash matchers. Cites
the current sequential scan in crates/matching and crates/transfer,
proposes the MIN_PARALLEL_FILE_SIZE_BYTES threshold, and recommends
deferring the implementation until MD4/MD5 multibuf SIMD benches
(#4189, #4191) show single-thread strong-checksum verify headroom is
exhausted.
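
As a rough illustration of the overlapping-window idea described above, the following sketch computes the byte range each worker would read. The constant name comes from this PR, but its value here is an assumption (this page does not state one), and `split_windows` is a hypothetical helper, not a function in the tree.

```rust
use std::ops::Range;

/// Assumed value for illustration only; the design doc proposes the
/// threshold, but this page does not say what it is.
const MIN_PARALLEL_FILE_SIZE_BYTES: u64 = 64 * 1024 * 1024;

/// Byte ranges for each worker. Worker `i` owns the block start offsets in
/// its stride and reads `block_len - 1` extra bytes so that a block starting
/// at its last owned offset is still fully inside the window.
/// Files below the threshold get a single window (the serial path).
fn split_windows(file_len: u64, block_len: u64, workers: u64) -> Vec<Range<u64>> {
    if file_len < MIN_PARALLEL_FILE_SIZE_BYTES || workers <= 1 || block_len == 0 {
        return vec![0..file_len];
    }
    // Ceiling division so the last window is never longer than the others.
    let stride = (file_len + workers - 1) / workers;
    let mut windows = Vec::new();
    let mut start = 0;
    while start < file_len {
        let owned_end = (start + stride).min(file_len);
        let read_end = (owned_end + block_len - 1).min(file_len);
        windows.push(start..read_end);
        start = owned_end;
    }
    windows
}
```

For a 128 MiB file with 4 KiB blocks and four workers, this yields four windows whose owned offsets tile the file without gaps, each overlapping the next by `block_len - 1` bytes. How matches that straddle a window boundary are reconciled into a single serial-equivalent token stream is the hard part, and this sketch does not address it.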
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 17, 2026
@oferchen oferchen merged commit 372b111 into master May 17, 2026
8 checks passed
@oferchen oferchen deleted the docs/delta-intra-file-parallel-design-2206 branch May 17, 2026 19:18
oferchen added a commit that referenced this pull request May 18, 2026
#4222)

Records the design space for splitting a single large basis file into
overlapping windows and running parallel rolling-hash matchers. Cites
the current sequential scan in crates/matching and crates/transfer,
proposes the MIN_PARALLEL_FILE_SIZE_BYTES threshold, and recommends
deferring the implementation until MD4/MD5 multibuf SIMD benches
(#4189, #4191) show single-thread strong-checksum verify headroom is
exhausted.