perf(encoder): inline hash-chain walk into hash_chain_candidate (lazy L1) by polaz · Pull Request #185 · structured-world/structured-zstd

polaz · 2026-05-19T07:11:50Z

Summary

Inline the hash-chain walk directly into hash_chain_candidate to
eliminate the 4 KiB-per-call stack array that chain_candidates
materialized. With lazy_depth = 2 (levels 7+), pick_lazy_match
triggers three chain walks per committed position, so the array form
cost ~12 KiB of stack zero-fill + return-copy traffic per accepted
match before any useful comparison happened. Donor
ZSTD_HcFindBestMatch runs a single fused loop with no intermediate
buffer; this mirrors that.
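The ~12 KiB figure is simple arithmetic, sketched here for illustration. The slot count of 512 is an assumption, chosen so that the array is 4 KiB when `usize` is 8 bytes; the crate's actual `MAX_HC_SEARCH_DEPTH` value is not stated in this PR, and the helper function names are hypothetical.

```rust
use std::mem::size_of;

// Assumption: 512 slots, so the array is 4 KiB on targets with 8-byte usize.
// The real MAX_HC_SEARCH_DEPTH in the crate may differ.
const MAX_HC_SEARCH_DEPTH: usize = 512;

// lazy_depth = 2 means the lookahead searches pos, pos+1 and pos+2.
const LAZY_DEPTH: usize = 2;

/// Size of one by-value candidate array (hypothetical helper).
fn candidate_array_bytes() -> usize {
    size_of::<[usize; MAX_HC_SEARCH_DEPTH]>()
}

/// Stack bytes touched by the array form for one committed position:
/// one array per chain walk, and lazy_depth + 1 walks.
fn bytes_per_committed_position() -> usize {
    (LAZY_DEPTH + 1) * candidate_array_bytes()
}

fn main() {
    println!("per walk: {} bytes", candidate_array_bytes());
    println!(
        "per committed position: {} bytes",
        bytes_per_committed_position()
    );
}
```

On a 64-bit target this works out to 4096 bytes per walk and 12288 bytes per committed position, before counting the copy on return.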

chain_candidates itself stays live — the chain-walk unit tests
drive it directly, and the BT-optimal HC candidate collector in
match_generator.rs (around line 2437) consumes it through a macro
pipeline that inherits the array form. Inlining the array out of that
BT-optimal site is a separate, larger refactor and is NOT in this PR.

Scope (only lazy hot path)

Single file changed: zstd/src/encoding/hc/mod.rs.

hash_chain_candidate: chain walk is now inlined. Loop body fuses
candidate-position extraction, range check, donor speculative tail
gate, common_prefix_len, extend_backwards, and best update.
chain_candidates: unchanged signature and behavior, still used by
the BT-optimal HC collector and the chain-walk unit tests.
No public API changes, no behavior changes outside the lazy band's
internal candidate selection (same candidates considered, same
better_candidate ordering).
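A toy model of the two shapes may help when reading the diff. This is a minimal sketch, not the crate's code: the `prev` table, the `NONE` sentinel, the small `MAX_DEPTH` cap and both `best_*` functions are hypothetical, and the real walk also applies the window filter, tail gate and `extend_backwards`. It only shows that materializing candidates and then scanning them selects the same match as one fused loop.

```rust
const MAX_DEPTH: usize = 8; // hypothetical small cap for the sketch
const NONE: usize = usize::MAX; // hypothetical "end of chain" sentinel

/// Length of the common prefix of data[cand..] and data[pos..].
fn common_prefix_len(data: &[u8], cand: usize, pos: usize) -> usize {
    data[cand..]
        .iter()
        .zip(&data[pos..])
        .take_while(|(a, b)| a == b)
        .count()
}

/// Array form: fill a fixed-size buffer first and return it by value.
fn chain_candidates(prev: &[usize], head: usize) -> [usize; MAX_DEPTH] {
    let mut out = [NONE; MAX_DEPTH];
    let mut cur = head;
    for slot in out.iter_mut() {
        if cur == NONE {
            break;
        }
        *slot = cur;
        cur = prev[cur];
    }
    out
}

/// Array form consumer: scan the materialized buffer.
fn best_via_array(data: &[u8], prev: &[usize], head: usize, pos: usize) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    for cand in chain_candidates(prev, head) {
        if cand == NONE {
            break;
        }
        let len = common_prefix_len(data, cand, pos);
        if best.map_or(true, |(_, l)| len > l) {
            best = Some((cand, len));
        }
    }
    best
}

/// Fused form: follow the chain and compare in the same loop, no buffer.
fn best_fused(data: &[u8], prev: &[usize], head: usize, pos: usize) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    let mut cur = head;
    let mut steps = 0;
    while cur != NONE && steps < MAX_DEPTH {
        let len = common_prefix_len(data, cur, pos);
        if best.map_or(true, |(_, l)| len > l) {
            best = Some((cur, len));
        }
        cur = prev[cur];
        steps += 1;
    }
    best
}

fn main() {
    let data = b"abcxabcyabcdabcd";
    // Chain for the "abc" bucket: 8 -> 4 -> 0 -> end.
    let mut prev = [NONE; 16];
    prev[8] = 4;
    prev[4] = 0;
    let a = best_via_array(data, &prev, 8, 12);
    let b = best_fused(data, &prev, 8, 12);
    println!("array form: {:?}, fused form: {:?}", a, b);
}
```

Both functions visit the links in the same order and use the same strict "longer wins" update, so they pick the same candidate; the fused form just never writes the intermediate array.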

Measurements

compress/level_{5,8,12,15}_lazy/decodecorpus-z000033/matrix/pure_rust
— criterion 10 samples each, clean back-to-back vs origin/main,
p = 0.00 across all four cells:

level	main thrpt	branch thrpt	speedup
L5 lazy	13.5 MiB/s	25.8 MiB/s	1.91×
L8 lazy	9.6 MiB/s	17.0 MiB/s	1.77×
L12 lazy	8.3 MiB/s	14.0 MiB/s	1.69×
L15 lazy	8.1 MiB/s	13.7 MiB/s	1.70×

Ratio (full lazy × scenario matrix via REPORT lines, 77 cells):
bit-identical to origin/main. No ratio change anywhere, no
correctness change — the inlined walk visits the same chain links in
the same order and applies the same predicates.

Verification

534/534 lib tests pass (debug profile)
lint pass — clean
format check — clean
Ratio matrix unchanged vs main

Out of scope (tracked in #184)

L2: fuse chain-walk and speculative gate further; add donor
PREFETCH_L1(chain_table[next & chain_mask])
L3: share rep + chain results across the lazy lookahead at
pos, pos+1, pos+2
L4: validate target_len early-exit parity vs donor
BT-optimal chain_candidates callsite inlining

Summary by CodeRabbit

Refactor
- Optimized internal compression matching logic to improve efficiency and reduce memory overhead.

`hash_chain_candidate` previously consumed the output of `chain_candidates`, which returned `[usize; MAX_HC_SEARCH_DEPTH]`: a 4 KiB stack array that was zero-filled on entry and returned by value. With `lazy_depth = 2` (levels 7+) `pick_lazy_match` runs three chain walks per committed position, so the array form spent ~12 KiB of stack zero-fill and return-copy traffic per accepted match before any useful work happened.

Inline the chain walk directly into `hash_chain_candidate`: one fused loop that produces a candidate, runs the donor speculative tail check, runs `common_prefix_len`, and updates `best`, with no intermediate buffer. Mirrors donor `zstd_lazy.c` `ZSTD_HcFindBestMatch`, which never materializes a candidate array.

`chain_candidates` is kept as the dump-style helper that the chain-walk unit tests still drive directly.

Verified on `compress/level_{5,8,12,15}_lazy/decodecorpus-z000033/matrix/pure_rust` (criterion 10 samples, clean back-to-back vs origin/main, p = 0.00 across the board):

| level | main thrpt | this thrpt | speedup |
|---|---|---|---|
| L5 lazy | 13.5 MiB/s | 25.8 MiB/s | 1.91× |
| L8 lazy | 9.6 MiB/s | 17.0 MiB/s | 1.77× |
| L12 lazy | 8.3 MiB/s | 14.0 MiB/s | 1.69× |
| L15 lazy | 8.1 MiB/s | 13.7 MiB/s | 1.70× |

Ratio matrix (lazy band × all 7 scenarios): bit-identical to origin/main. 534/534 lib tests pass, clippy and fmt clean.

Part of #184.

coderabbitai · 2026-05-19T07:12:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 25e490ea-8caa-4be8-ae24-5f648802c099

📥 Commits

Reviewing files that changed from the base of the PR and between 62b8f5e and cc63fe2.

📒 Files selected for processing (1)

zstd/src/encoding/hc/mod.rs

📝 Walkthrough

Walkthrough

HcMatcher::hash_chain_candidate is refactored to inline hash-chain traversal with self-loop detection and speculative 4-byte tail gating. The chain is walked directly using cached hash-table state, candidates are filtered to the live window, and matching is evaluated with a gate that skips expensive prefix computation when monotonicity fails.

Changes

Hash-chain candidate matching optimization

Layer / File(s)	Summary
Inline chain walk initialization `zstd/src/encoding/hc/mod.rs`	Chain traversal is set up inline: hash chain and mask are computed, the current chain cursor is initialized, and max iteration steps are capped by `min(self.search_depth, MAX_HC_SEARCH_DEPTH)`.
Speculative matching in chain traversal loop `zstd/src/encoding/hc/mod.rs`	The while-loop walks the hash chain with self-loop detection, filters candidates to the live window `[history_abs_start, abs_pos)`, and applies speculative 4-byte tail gating only when `best` exists and `new_offset >= best.offset`. When gating fails, the candidate is skipped; otherwise full `common_prefix_len` and backward extension are performed. Early return triggers when `best.match_len >= target_len`.
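The gating described in the walkthrough can be sketched as follows. This is an illustrative model under stated assumptions, not the code in `zstd/src/encoding/hc/mod.rs`: the `Best` struct, the `prev` table, the `NONE` sentinel and the `walk` signature are hypothetical, the gate compares the 4 bytes ending at index `best.match_len` (modeled on the donor's check), leaving the window ends the walk (which assumes chain links only get older), and backward extension is omitted.

```rust
const NONE: usize = usize::MAX; // hypothetical "end of chain" sentinel

#[derive(Clone, Copy, Debug, PartialEq)]
struct Best {
    offset: usize,
    match_len: usize,
}

fn common_prefix_len(data: &[u8], cand: usize, pos: usize) -> usize {
    data[cand..].iter().zip(&data[pos..]).take_while(|(a, b)| a == b).count()
}

/// True when `cand` may still beat `best` and deserves a full comparison.
/// Expects cand < pos.
fn passes_tail_gate(data: &[u8], cand: usize, pos: usize, best: Option<Best>) -> bool {
    let best = match best {
        Some(b) => b,
        None => return true, // nothing to beat yet
    };
    // A nearer candidate can win at equal length, so it is never gated.
    if pos - cand < best.offset || best.match_len < 3 {
        return true;
    }
    // A strictly longer match must agree on bytes match_len-3 ..= match_len.
    let start = best.match_len - 3;
    let (a, b) = (cand + start, pos + start);
    if b + 4 > data.len() {
        return false; // no room for a strictly longer match
    }
    data[a..a + 4] == data[b..b + 4]
}

fn walk(
    data: &[u8],
    prev: &[usize],
    head: usize,
    pos: usize,
    window_start: usize,
    max_steps: usize,
    target_len: usize,
) -> Option<Best> {
    let mut best: Option<Best> = None;
    let mut cur = head;
    let mut steps = 0;
    while steps < max_steps && cur != NONE {
        // Live window is [window_start, pos).
        if cur < window_start || cur >= pos {
            break;
        }
        if passes_tail_gate(data, cur, pos, best) {
            let len = common_prefix_len(data, cur, pos);
            let offset = pos - cur;
            let better = match best {
                None => true,
                Some(b) => len > b.match_len || (len == b.match_len && offset < b.offset),
            };
            if better {
                best = Some(Best { offset, match_len: len });
                if len >= target_len {
                    return best; // early exit once the match is long enough
                }
            }
        }
        let next = prev[cur];
        if next == cur {
            break; // self-loop: the chain cannot make progress
        }
        cur = next;
        steps += 1;
    }
    best
}

fn main() {
    let data = b"abcxabcyabcdabcdZ";
    // Chain for the "abc" bucket: 8 -> 4 -> 0 -> end.
    let mut prev = [NONE; 17];
    prev[8] = 4;
    prev[4] = 0;
    println!("{:?}", walk(data, &prev, 8, 12, 0, 16, usize::MAX));
}
```

In this toy input the first candidate (position 8) sets a 4-byte best, and the two farther candidates are rejected by the 4-byte tail comparison alone, so `common_prefix_len` runs once instead of three times.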

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

structured-world/structured-zstd#184: Implements the L1 subtask of inlining chain_candidates() into hash_chain_candidate and removing the intermediate candidate buffer.

Possibly related PRs

structured-world/structured-zstd#125: Overlaps with speculative 4-byte tail check implementation and regression tests for HC matching in the same method.

Poem

🐰 I hopped along the hash-chain line,
Sniffed self-loops, skipped the wasted time,
Tail-gated matches, quick and lean,
No buffer baggage in between,
A tiny hop for faster rhyme.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: inlining the hash-chain walk into hash_chain_candidate for performance optimization in the encoder, with scope limited to the lazy L1 path.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

codecov · 2026-05-19T07:14:44Z

Codecov Report

❌ Patch coverage is 98.14815% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
zstd/src/encoding/hc/mod.rs	98.14%	1 Missing ⚠️

Copilot

Pull request overview

This PR optimizes the HC (hash-chain) match finder by inlining the chain-walk loop directly into HcMatcher::hash_chain_candidate, avoiding materializing a large fixed-size candidate buffer on the stack and reducing per-position stack traffic in lazy parsing.

Changes:

Inline the hash-chain walk into hash_chain_candidate (replacing the chain_candidates() buffer materialization in this hot path).
Preserve existing behaviors during the walk (window filtering, speculative tail gating, self-loop handling, and search depth cap).

Two doc-only adjustments to the inlined chain walk:

- Outer rationale block: correct the claim that `chain_candidates` is a test-only helper. It is still consumed by the BT-optimal HC candidate collector in match_generator.rs (around the `chain_candidates(...).into_iter()` callsite). Inlining the array out of that BT path is a separate refactor and is called out as out-of-scope.
- Per-iteration block inside the chain loop: drop the duplicate speculative-tail-gate rationale that restated the outer block. Keep one short pointer to the outer comment so the hot path stays readable.

No code-behavior change; 534/534 lib tests pass, clippy and fmt clean.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 19, 2026 07:11

Copilot started reviewing on behalf of polaz May 19, 2026 07:12 View session

Copilot AI reviewed May 19, 2026

Comment thread zstd/src/encoding/hc/mod.rs Outdated

Comment thread zstd/src/encoding/hc/mod.rs Outdated

Comment thread zstd/src/encoding/hc/mod.rs

polaz requested a review from Copilot May 19, 2026 07:27

Copilot started reviewing on behalf of polaz May 19, 2026 07:28 View session

Copilot AI reviewed May 19, 2026

polaz merged commit af4fddd into main May 19, 2026
25 checks passed

polaz deleted the feat/#184-lazy-investigation branch May 19, 2026 07:34

sw-release-bot Bot mentioned this pull request May 19, 2026

chore: release v0.0.22 #156

Merged

polaz merged 2 commits into main from feat/#184-lazy-investigation

2 participants
