perf(encoder): port donor ZSTD_compressBlock_fast — 4-cursor + per-level cParams + cmov + window-correctness (#198 phase 3) by polaz · Pull Request #221 · structured-world/structured-zstd

polaz · 2026-05-21T23:02:29Z

Summary

Native Rust port of donor's ZSTD_compressBlock_fast_noDict_generic (zstd_fast.c:192-456) — closes phase 3 of #198. Restores Fast-strategy throughput after the phase 1b regression and aligns encoder behaviour with donor parity on every front that affects ratio / speed / format correctness.

Squashed from PR #219 (41 review threads addressed across 10 rounds) into a single coherent commit.

Headline results

large-log-stream (25 MiB dense corpus, i9-9900K):

State	Time vs main	Throughput
main (pre-phase-1b)	0%	~290 MiB/s
#217 merged (phase 1b)	+122%	~235 MiB/s
This PR	−22%	~790 MiB/s

Net change phase 1b → phase 3: ~3.4× faster.

Ratio audit on `decodecorpus-z000033`

L-7..L-1 Fast: monotone acceleration gradient restored (−1.85% to −4.79% vs donor)
L2-3 Dfast: parity or slight win (−0.94% to −4.61%)
L5-15 Lazy: consistently beat donor by −5.7% to −6.18%
L16-17 btopt: parity / win (−0.33% to −3.59%)
L20-22 btultra2: parity (±0.2%)
L1 Fast: +7.43% residual gap, tracked as follow-up #220

Milestones (squashed from PR #219)

M1 — per-level fast_hash_log/fast_mls/fast_step_size threading through LevelParams
M2 — full 4-cursor ip0/ip1/ip2/ip3 lookahead body + immediate-rep2 inner loop
M3 — cmov match-found variant + per-window dispatch (10 monomorphisations: mls 4..=8 × use_cmov true/false)
M4 — beyond-donor fast_hash_log: 13 → 14 for negative levels (+32 KB memory, 2× fewer collisions on structured corpora)
M5 — reverted (adaptive mls peek did not pay off)
M6 — per-level fast_step_size from donor's targetLength = -level formula; restores acceleration gradient
M7 — added donor's missing current0+2 hash insertion after each match emit (zstd_fast.c:407); raised L1/decodecorpus sequence-match-rate 43.1% → 57.7%
M8 — dropped RESERVED_PREFIX_BYTES dummy byte; history layout now donor-parity, sentinel-0 protection via INITIAL_PREFIX_START_INDEX = 1 filter
M9 — sliding prefix floor enforces the advertised frame window (1 << window_log), NOT the dictionary-budget-inflated max_window_size. Prevents emitting offsets > advertised window during dictionary-primed compression — would otherwise produce format-invalid frames.
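
The M9 window rule can be sketched as a small index check. This is only an illustration under stated assumptions: `prefix_floor` and `offset_is_valid` are hypothetical helper names, not functions from this PR, and the real matcher tracks more state. The point it shows is that the floor is derived from `1 << window_log`, not from a larger dictionary-inflated budget.

```rust
/// Lowest history index a match may reference so that the emitted offset
/// never exceeds the advertised window (`1 << window_log`).
/// `floor` is the sentinel-protecting minimum index (1 in the PR's layout).
/// Hypothetical helper, not the PR's code.
pub fn prefix_floor(current: u32, window_log: u8, floor: u32) -> u32 {
    debug_assert!(window_log <= 30, "sketch assumes u32 indices");
    let window = 1u32 << window_log;
    current.saturating_sub(window).max(floor)
}

/// A candidate is usable only if it lies at or above the sliding floor
/// and strictly before the current position.
pub fn offset_is_valid(current: u32, match_idx: u32, window_log: u8, floor: u32) -> bool {
    match_idx >= prefix_floor(current, window_log, floor) && match_idx < current
}
```

With `current = 5000` and `window_log = 10`, index 3976 is accepted (offset 1024) and index 3975 is rejected (offset 1025), even if more history than that is still resident.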

API changes

FastKernelMatcher::with_params(window_log, hash_log, mls, step_size) — step_size promoted to 4th positional arg
FastKernelMatcher::reset(...) — same signature change
set_step_size post-init setter removed (closed gap where new code paths could silently fall back to step_size=2)
RESERVED_PREFIX_BYTES: 1 → 0; new INITIAL_PREFIX_START_INDEX = 1 constant
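
A minimal sketch of why passing `step_size` at construction closes the fallback gap. `FastParams` and `step_size_from_target_length` are hypothetical names used only for illustration, and the step formula is an assumed reading of the donor rule (target length plus one, with a minimum of 2) that should be checked against `zstd_fast.c`.

```rust
/// Hypothetical parameter bundle; field order mirrors the PR's argument order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FastParams {
    pub window_log: u8,
    pub hash_log: u8,
    pub mls: u8,
    pub step_size: usize,
}

impl FastParams {
    /// All four values are required up front, so no code path can
    /// construct a matcher and forget to set the step size afterwards.
    pub fn new(window_log: u8, hash_log: u8, mls: u8, step_size: usize) -> Self {
        assert!((4..=8).contains(&mls), "mls must be in 4..=8 (got {mls})");
        assert!(step_size >= 2, "step_size must be >= 2 (got {step_size})");
        Self { window_log, hash_log, mls, step_size }
    }
}

/// Assumed donor-style formula: target_length + (target_length == 0) + 1,
/// which never yields less than 2.
pub fn step_size_from_target_length(target_length: u32) -> usize {
    target_length as usize + usize::from(target_length == 0) + 1
}
```

Under this assumed formula, a target length of 0 or 1 gives a step of 2, and a target length of 7 gives a step of 8.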

Tests

577 nextest, all passing (+13 new tests covering per-level dispatch, cmov vs branch parity, cmov out-of-window false-positive, backward extension, sliding-floor enforcement on inflated max_window_size)
cross_validation FFI roundtrips on every level 1..=22 (dict + no-dict) — wire-format interop preserved
clippy clean

Out of scope (follow-ups)

#220: Fast L1 +7.43% ratio residual on decodecorpus
L18-19 btultra +3.5% — separate ratio investigation

Closes #198 phase 3.

Summary by CodeRabbit

Bug Fixes
- Stronger window-bound enforcement and out-of-window match rejection so emitted offsets never exceed the advertised decoder window.
Refactor
- Per-level fast tuning (including explicit step-size) exposed and threaded through match engines and driver.
- History/sentinel layout simplified to remove reserved dummy bytes and tighten hash/drain invariants.
- Inner compression loop reorganized into a donor-style pipeline with explicit step handling and safer bounds checks.
New Features
- Optional branchless (cmov) match path with parity validation versus the branch variant.
Tests
- New unit and regression tests for per-level tuning, driver wiring, cmov parity, match-found correctness, and window-bound behavior.
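
The cmov vs branch parity idea can be illustrated with a safe-Rust sketch. Everything here is an assumption for illustration: `match_found_sketch` is a hypothetical name, the dummy bytes are arbitrary, and safe Rust gives no guarantee that the compiler emits an actual `cmov`. What the sketch does show is the false-positive guard: an out-of-range candidate compares against dummy bytes, so the result must still be ANDed with the range check.

```rust
/// Illustrative dummy bytes read in place of an out-of-range candidate.
const CMOV_DUMMY: [u8; 4] = [0x12, 0x34, 0x56, 0x78];

fn read_u32(bytes: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3]])
}

/// Returns true when the 4 bytes at `match_idx` equal the 4 bytes at `cur`
/// and `match_idx` is at or above the lower limit `low`.
pub fn match_found_sketch<const USE_CMOV: bool>(
    data: &[u8],
    cur: usize,
    match_idx: u32,
    low: u32,
) -> bool {
    let in_range = match_idx >= low;
    if USE_CMOV {
        // Select the comparison source without an early exit.
        let candidate = if in_range {
            read_u32(data, match_idx as usize)
        } else {
            u32::from_le_bytes(CMOV_DUMMY)
        };
        // Bitwise `&`: the range check must veto an accidental dummy match.
        (candidate == read_u32(data, cur)) & in_range
    } else {
        // Branch variant: short-circuit before touching the candidate bytes.
        in_range && read_u32(data, match_idx as usize) == read_u32(data, cur)
    }
}
```

Both variants should agree for every `(match_idx, low)` pair, including the case where the current bytes happen to equal the dummy bytes.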

… + per-level cParams + cmov dispatch + window-correctness (#198 phase 3)

Closes phase 3 of #198: native Rust port of donor's `ZSTD_compressBlock_fast_noDict_generic` (`zstd_fast.c:192-456`) restoring Fast-strategy throughput after the phase 1b regression and aligning encoder behaviour with donor parity on every front that affects ratio / speed / format correctness.

## Headline results

`large-log-stream` (25 MiB dense corpus, i9-9900K):

| State                        | Time vs main |     Throughput |
|------------------------------|-------------:|---------------:|
| main (pre-phase-1b)          |           0% |     ~290 MiB/s |
| #217 merged (phase 1b)       |        +122% |     ~235 MiB/s |
| **This PR (phase 3, M1-M9)** |     **−22%** | **~790 MiB/s** |

Net change phase 1b → phase 3: ~3.4× faster.

## Ratio audit on `decodecorpus-z000033`

- L-7..L-1 Fast: monotone acceleration gradient restored (−1.85% to −4.79% vs donor); pre-M6 these levels all produced identical 585601 bytes
- L2-3 Dfast: parity or slight win (−0.94% to −4.61%)
- L5-15 Lazy: consistently beat donor by −5.7% to −6.18%
- L16-17 btopt: parity / win (−0.33% to −3.59%)
- L20-22 btultra2: parity (±0.2%)
- L1 Fast: +7.43% residual gap, tracked as follow-up #220

## Milestones (squashed)

- M1: per-level `fast_hash_log`/`fast_mls`/`fast_step_size` threading through `LevelParams`
- M2: full 4-cursor `ip0/ip1/ip2/ip3` lookahead body + immediate-rep2 inner loop ported from `zstd_fast.c`
- M3: `cmov` match-found variant + per-window dispatch surface (10 monomorphisations across `mls` 4..=8 × `use_cmov` true/false)
- M4: beyond-donor `fast_hash_log: 13 → 14` for negative levels (+32 KB memory, 2× fewer collisions on structured corpora)
- M5: reverted (adaptive mls peek did not pay off)
- M6: per-level `fast_step_size` from donor's `targetLength = -level` formula; restores acceleration gradient
- M7: added donor's missing `current0+2` hash insertion after each match emit (`zstd_fast.c:407`); raised L1/decodecorpus sequence-match-rate 43.1% → 57.7%
- M8: dropped `RESERVED_PREFIX_BYTES` dummy byte; history layout now donor-parity, sentinel-0 protection via `INITIAL_PREFIX_START_INDEX = 1` filter
- M9 (format correctness): sliding prefix floor at scan time enforces the advertised frame window (`1 << window_log`), NOT the dictionary-budget-inflated `max_window_size`. Prevents emitting offsets > advertised window during dictionary-primed compression, which would otherwise produce format-invalid frames. Regression test included.

## API changes (driver)

- `FastKernelMatcher::with_params(window_log, hash_log, mls, step_size)`: step_size promoted to 4th positional arg; previous `set_step_size` post-init setter removed (closed the gap where new code paths could silently fall back to default step_size=2)
- `FastKernelMatcher::reset(window_log, hash_log, mls, step_size)`: same signature change
- `RESERVED_PREFIX_BYTES`: 1 → 0 (no dummy region); legacy name retained for drain-offset math
- New `INITIAL_PREFIX_START_INDEX = 1` constant for sentinel-0 filter baseline

## Kernel structure

- `match_found<USE_CMOV>`: branchless cmov / branch dispatch via const generic; `# Safety` documented; bitwise `&` intentional to preserve cmov ordering (donor `__asm__("")` equivalent)
- `MatchFound::{Rep,Explicit}` variants carry explicit `current0` (donor's writeback position), correct on both probe paths, pre-backward-extension
- `ip3 > ilimit` exit (not `>=`): recovers the last hashable position at end-of-block
- Two hash insertions post-match-emit: at `current0+2` (donor zstd_fast.c:407, was missing) and `ip0-2`

## Tests

- 577 nextest, all passing
- New per-level dispatch test pinning hash_log / mls / step_size for L1, Fastest, Uncompressed, L-1..L-7
- cmov vs branch byte-for-byte equality + cmov out-of-window false-positive regression
- explicit_match_backward_extension deterministic via marker-byte layout (asserts match_len ≥ 5 + literals don't end with marker)
- start_matching_enforces_max_window_size_offset_bound
- start_matching_caps_offsets_at_window_log_not_inflated_max (M9 format-correctness regression)
- cross_validation FFI roundtrips on every level 1..=22 (dict + no-dict); wire-format interop preserved

## Out of scope (follow-ups)

- #220: Fast L1 +7.43% ratio residual on decodecorpus
- L18-19 btultra +3.5%: separate ratio investigation

coderabbitai · 2026-05-21T23:02:35Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4dc128a3-ed6b-4b09-85e3-e82872fee342

📥 Commits

Reviewing files that changed from the base of the PR and between 3d150f1 and e6153f2.

📒 Files selected for processing (1)

zstd/src/encoding/match_generator.rs

📝 Walkthrough

Walkthrough

LevelParams now exposes per-level Fast tuning and resolve_level_params/LEVEL_TABLE populate them; MatchGeneratorDriver threads fast_* into FastKernelMatcher::with_params/reset; FastKernelMatcher stores step_size/use_cmov and removes the dummy-prefix model; compress_block_fast becomes a 4-cursor donor pipeline with a USE_CMOV const generic; tests updated.
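
The runtime-to-const-generic dispatch described in the walkthrough can be sketched as follows. This is a toy under assumptions: `kernel` stands in for the real block compressor and only reports which instantiation ran, and `dispatch` is a hypothetical name. It shows how a runtime `(mls, use_cmov)` pair selects one of the 10 monomorphised variants.

```rust
/// Stand-in for a monomorphised kernel; it only reports its own parameters.
fn kernel<const MLS: usize, const USE_CMOV: bool>() -> (usize, bool) {
    (MLS, USE_CMOV)
}

/// Map runtime values onto compile-time instantiations (5 mls values x 2).
pub fn dispatch(mls: usize, use_cmov: bool) -> (usize, bool) {
    match (mls, use_cmov) {
        (4, false) => kernel::<4, false>(),
        (4, true) => kernel::<4, true>(),
        (5, false) => kernel::<5, false>(),
        (5, true) => kernel::<5, true>(),
        (6, false) => kernel::<6, false>(),
        (6, true) => kernel::<6, true>(),
        (7, false) => kernel::<7, false>(),
        (7, true) => kernel::<7, true>(),
        (8, false) => kernel::<8, false>(),
        (8, true) => kernel::<8, true>(),
        _ => panic!("unsupported mls {mls}; expected 4..=8"),
    }
}
```

The match is evaluated once per call, so the inner loop of each instantiation sees `MLS` and `USE_CMOV` as constants.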

Changes

Per-level Fast kernel parameter dispatch and donor pipeline

Layer / File(s)	Summary
LevelParams schema and per-level fast tuning `zstd/src/encoding/match_generator.rs`	`LevelParams` removes `hash_fill_step` and adds `fast_hash_log`, `fast_mls`, `fast_step_size`; `LEVEL_TABLE` and `resolve_level_params` updated (Uncompressed, Fastest, negative levels, Level(22) tier); unit tests validate resolved fast params.
MatchGeneratorDriver parameter threading `zstd/src/encoding/match_generator.rs`	Driver `new`/`reset` now pass per-level `params.fast_*` into `FastKernelMatcher::with_params`/`reset` instead of hardcoded donor-level defaults.
Fast kernel 4-cursor donor pipeline and cmov variant `zstd/src/encoding/simple/fast_kernel/kernel.rs`	`compress_block_fast` gains const generic `USE_CMOV` and `step_size` parameter; `K_STEP_INCR` and `CMOV_DUMMY` introduced; `match_found<const USE_CMOV>` replaces prior helper; main loop rewritten to a 4-cursor donor pipeline with MatchFound-driven probes, backward/forward extension, and immediate rep2 inner loop; tests updated (cmov vs branch equivalence, sentinel rejection).
FastKernelMatcher state and per-level threading `zstd/src/encoding/simple/fast_matcher.rs`	Matcher stores `use_cmov` (derived from window_log) and `step_size`; `with_params`/`reset` signatures accept `step_size` and validate it; history/prefix invariants change (HISTORY_DRAIN_BASE=0, INITIAL_PREFIX_START_INDEX=1); `drain_real_prefix`, eviction, and `start_matching` reworked; kernel dispatch selects `compress_block_fast::<MLS, use_cmov>` at runtime and passes `step_size`.
Prefix/layout and window safety tests `zstd/src/encoding/simple/fast_matcher.rs`, `zstd/src/encoding/simple/fast_kernel/kernel.rs`	Reserved dummy prefix removed and many tests updated to the new baseline; added regression tests asserting emitted offsets are capped by the advertised frame window after dictionary priming; test harness call sites updated to new kernel signature and matcher ctor/reset.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

perf(fast): port donor ZSTD_compressBlock_fast — 22× regression on Fast strategy #198: Implements the donor-style Fast kernel port goals (4-cursor pipeline, raw match probes) that this PR advances.

Possibly related PRs

structured-world/structured-zstd#217: Related work threading fast-kernel params and matcher wiring.
structured-world/structured-zstd#215: Prior donor-shape fast kernel implementation; this PR refactors/extends that kernel.
perf(encoder): bump Fast window_log to 19 (donor parity, large-log-stream ratio fix) #187: Related edits to level tuning/window sizing that overlap with driver/matcher wiring.

"I hop through hashes, four cursors in flight,
No dummy prefix now — the window's kept tight,
Step-size and cmov hum tuned per level,
Sequences march out, precise and on bevel,
A rabbit's small cheer for a kernel done right." 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: porting donor ZSTD_compressBlock_fast with 4-cursor optimization, per-level compression parameters, cmov variant, and window correctness fix. It is specific, concise, and clearly summarizes the primary contribution.
Linked Issues check	✅ Passed	All primary objectives from issue `#198` are met: Vec hash table [✓], 4-cursor pipeline [✓], donor-style matching [✓], backward extension [✓], cmov variant [✓], per-level parameter threading [✓], and window-correctness enforcement [✓].
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#198`: match_generator.rs refactors level parameter handling, kernel.rs ports the 4-cursor fast compression logic, and fast_matcher.rs integrates per-level parameters and window-correctness fixes. No unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Copilot

Pull request overview

Ports donor’s ZSTD_compressBlock_fast_noDict_generic behavior into the Rust Fast encoder path to restore Fast-strategy throughput and improve donor-parity correctness (including window/offset correctness under dictionary priming).

Changes:

Extend FastKernelMatcher to carry per-level Fast cParams (hash_log, mls, step_size) and derive use_cmov from window_log.
Replace the Fast kernel’s phase-1 single-cursor loop with the donor-style 4-cursor (ip0/ip1/ip2/ip3) pipeline, step acceleration, immediate-rep2 loop, and branch/cmov match-found variants via const generics.
Wire per-level Fast parameters through LevelParams and MatchGeneratorDriver reset/backend swap paths; add regression tests for per-level dispatch.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
`zstd/src/encoding/simple/fast_matcher.rs`	Threads per-level params into matcher lifecycle, adds cmov + step_size plumbing, and enforces advertised-window prefix flooring for format correctness.
`zstd/src/encoding/simple/fast_kernel/kernel.rs`	Implements donor 4-cursor fast block compressor, adds cmov/branch match-found variants, and step_size-driven acceleration + immediate-rep2 logic.
`zstd/src/encoding/match_generator.rs`	Introduces Fast-specific per-level parameters in `LevelParams`, wires them into Simple backend creation/reset, and adds dispatch tests.

codecov · 2026-05-21T23:05:31Z

Codecov Report

❌ Patch coverage is 91.51625% with 47 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
zstd/src/encoding/simple/fast_matcher.rs	79.58%	39 Missing ⚠️
zstd/src/encoding/simple/fast_kernel/kernel.rs	97.04%	8 Missing ⚠️

polaz · 2026-05-21T23:18:24Z

@coderabbitai full review

coderabbitai · 2026-05-21T23:18:29Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 8157-8204: Extend the test to actually exercise the driver wiring
by creating a MatchGeneratorDriver, for several Fast levels call the same
FastKernelMatcher::with_params(...) / m.reset(...) path used in the code (the
sites where you wire resolve_level_params into the matcher) and then assert the
inner matcher observed tuple equals (fast_hash_log, fast_mls, fast_step_size)
from resolve_level_params; in practice, for each chosen CompressionLevel call
resolve_level_params(level, None), reset the driver/matcher the same way the
production code does (the FastKernelMatcher::with_params(...) / m.reset(...)
calls), then read the matcher's effective (hash_log, mls, step_size) fields and
assert they match p.fast_hash_log, p.fast_mls, p.fast_step_size to catch any
swapping or stale values.

📥 Commits

Reviewing files that changed from the base of the PR and between 6d9ac63 and 123cf23.

📒 Files selected for processing (3)

zstd/src/encoding/match_generator.rs
zstd/src/encoding/simple/fast_kernel/kernel.rs
zstd/src/encoding/simple/fast_matcher.rs

…atorDriver::reset (#221 CR nitpick)

CR noted the existing per-level dispatch test only validated resolve_level_params() output. If with_params(...) / m.reset(...) on the Fast wiring sites swapped fast_hash_log and fast_mls, or kept a stale fast_step_size across resets, the existing test would still pass because it never inspected the inner matcher.

This adds fast_levels_driver_wiring_threads_cparams_into_inner_matcher: for every Fast level (Level(1), Fastest, Uncompressed, L-1..L-7) the test calls Matcher::reset on a real MatchGeneratorDriver, then reaches into the simple-backend FastKernelMatcher and asserts (hash_log, mls, step_size) == resolve_level_params(level). Added test-only crate getters on FastKernelMatcher (step_size / hash_log / mls) so the assertion can read the effective tuple without piercing private fields.

578/578 tests pass (+1 new); clippy clean. Part of #198 phase 3.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Doc comment listed (window_log, hash_log, mls) triple but the function signature includes step_size as 4th positional argument.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

zstd/src/encoding/simple/fast_matcher.rs (1)
319-322: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Minor: test helper returns max_window_size, not 1 << window_log.

The docstring says "Equals 1 << window_log", but the implementation returns self.max_window_size. These diverge after dictionary priming inflates max_window_size. Since this is test-only and existing tests only call it before any inflation, this works correctly today, but the implementation could be made consistent with the docstring by returning 1u64 << self.window_log.
Align implementation with docstring
     pub(crate) fn window_size(&self) -> u64 {
-        self.max_window_size as u64
+        1u64 << self.window_log
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/simple/fast_matcher.rs` around lines 319 - 322, The test
helper window_size currently returns self.max_window_size which diverges from
the docstring; change it to compute and return the canonical window size as 1u64
<< self.window_log (use the struct's window_log field) so window_size() matches
the documented "Equals `1 << window_log`" behavior instead of returning
max_window_size.

📥 Commits

Reviewing files that changed from the base of the PR and between 21887be and f0e8f48.

📒 Files selected for processing (1)

zstd/src/encoding/simple/fast_matcher.rs

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

window_size docstring states 'Equals 1 << window_log' but body returned self.max_window_size. Both fields are kept in sync, but reading window_log makes the helper literally match the documented contract.

…_BASE + hot-path debug_assert

The constant was set to 0 (M8 dropped the dummy-prefix region) so the old name no longer reflected what the value represents. Renaming to HISTORY_DRAIN_BASE matches its actual role: the drain start offset used by eviction paths, with sentinel-0 protection living entirely in INITIAL_PREFIX_START_INDEX.

Also downgrade the step_size >= 2 check inside compress_block_fast from assert! to debug_assert!: with_params / reset already validate this at matcher-construction time, so the release hot path no longer needs the per-block branch + formatting machinery.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

zstd/src/encoding/simple/fast_kernel/kernel.rs (1)
253-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep step_size validated in release builds.

compress_block_fast is still a safe pub(crate) boundary. Downgrading this to debug_assert! makes invalid step_size inputs silently build-dependent for any direct caller that bypasses FastKernelMatcher, even though the loop cadence assumes donor's step_size >= 2 contract.
Suggested fix
-    debug_assert!(
+    assert!(
         step_size >= 2,
         "Fast kernel requires step_size >= 2 (got {step_size}); \
          the donor formula clamps to a min of 2",
     );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/simple/fast_kernel/kernel.rs` around lines 253 - 263, The
check that enforces step_size >= 2 must run in release builds too: replace the
debug-only check in compress_block_fast with a runtime validation (e.g., assert!
or an explicit if that panics with the same message) so invalid callers that
bypass FastKernelMatcher (and its with_params/reset guards) cannot supply
step_size < 2; update the message to match the existing text ("Fast kernel
requires step_size >= 2 (got {step_size}); the donor formula clamps to a min of
2") so callers of compress_block_fast get a clear, release-time failure.

📥 Commits

Reviewing files that changed from the base of the PR and between f0e8f48 and 492e05b.

📒 Files selected for processing (2)

zstd/src/encoding/simple/fast_kernel/kernel.rs
zstd/src/encoding/simple/fast_matcher.rs

compress_block_fast is a safe pub(crate) boundary, so the step_size contract must hold in release builds too: a direct caller that bypasses FastKernelMatcher::with_params / reset would otherwise silently mis-iterate the loop cadence under release. The earlier downgrade to debug_assert! traded too much safety for a once-per-block branch that the predictor and per-block hash/probe work make negligible.
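
The difference between the two guards can be shown with a pair of hypothetical functions (neither is from the PR). `assert!` is always compiled in, while `debug_assert!` is compiled in only when `debug_assertions` is enabled, which is the default for debug builds and off for release builds.

```rust
/// Always-on guard: rejects a bad step size in every build profile.
pub fn guarded_step(step_size: usize) -> usize {
    assert!(
        step_size >= 2,
        "Fast kernel requires step_size >= 2 (got {step_size})"
    );
    step_size
}

/// Debug-only guard: the check disappears when `debug_assertions` is off,
/// so an invalid value passes straight through in a default release build.
pub fn debug_guarded_step(step_size: usize) -> usize {
    debug_assert!(
        step_size >= 2,
        "Fast kernel requires step_size >= 2 (got {step_size})"
    );
    step_size
}
```

Calling `guarded_step(1)` panics in both profiles. `debug_guarded_step(1)` panics only when `cfg!(debug_assertions)` is true.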

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

…omments with M8 layout

- prefix_start_index is pinned to INITIAL_PREFIX_START_INDEX (1), not HISTORY_DRAIN_BASE (0). The '1' floor is what rejects the hash table's all-zero empty-slot value, not a dummy prefix byte.
- Pre-accept_data path: history is genuinely empty under M8 (no dummy seeding), so the empty slice comes from a zero-length range, not from a HISTORY_DRAIN_BASE-seeded dummy region.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

…ertions and comments

Three test sites still spelled HISTORY_DRAIN_BASE 'dummy' / 'RESERVED dummy' even though M8 dropped the dummy region (HISTORY_DRAIN_BASE = 0). Reword each so the failure message describes what's actually being checked (history layout = retained prefix + last block / hashed range in absolute indices), not a layout that no longer exists.

Two related correctness gaps surfaced by CR:

- with_params / reset accepted any window_log u8, but the kernel stores indices as u32 and computes max_window_size = 1usize << window_log. window_log >= usize::BITS would overflow the shift (a panic in builds with overflow checks enabled); values above 31 are unrepresentable as u32 indices regardless. Reject at construction with assert!(window_log <= 31).
- compress_block_fast's 4-cursor loop computed ip2 = ip0 + step / ip3 = ip2 + 1 / next_step = ip0 + K_STEP_INCR with plain usize addition. Under a misbehaving caller (huge step_size) or after enough step-doubling iterations, the additions could wrap past usize::MAX, bypassing the ip3 > ilimit guard and turning the range check into a false-pass; base.add(ip2) / hash reads would then dereference an out-of-buffer position. Switch the cursor advances to checked_add (break the loop on overflow, identical drain path as the normal 'ran out of room' exit) and the step-doubling / restart paths to saturating_add.
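
A small sketch of the cursor-advance hazard, using hypothetical helper names rather than the kernel's actual code. It contrasts a wrapping advance, which can slip past the `ip3 > ilimit` guard, with a `checked_add` advance that treats overflow as "no room left".

```rust
/// Checked advance: returns the next (ip2, ip3) pair, or None when the
/// cursors would overflow or run past `ilimit`.
pub fn advance_checked(ip0: usize, step: usize, ilimit: usize) -> Option<(usize, usize)> {
    let ip2 = ip0.checked_add(step)?;
    let ip3 = ip2.checked_add(1)?;
    if ip3 > ilimit {
        return None;
    }
    Some((ip2, ip3))
}

/// Wrapping advance, shown only to demonstrate the false pass: a huge
/// `step` wraps around to a small index that satisfies the guard.
pub fn advance_wrapping(ip0: usize, step: usize, ilimit: usize) -> Option<(usize, usize)> {
    let ip2 = ip0.wrapping_add(step);
    let ip3 = ip2.wrapping_add(1);
    if ip3 > ilimit {
        return None;
    }
    Some((ip2, ip3))
}
```

With `ip0 = 10`, `step = usize::MAX - 5` and `ilimit = 100`, the wrapping version yields `Some((4, 5))`, while the checked version yields `None`.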

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

…s below u32::MAX

with_params / reset previously accepted window_log up to 31, allowing max_window_size = 2^31. accept_data lets history grow to roughly 2 * max_window_size before draining (eager-eviction band, donor parity), so window_log = 31 could push history.len() to 2^32 and trip the kernel's data.len() > u32::MAX panic. Lower the cap to 30 (the value of donor zstd's ZSTD_WINDOWLOG_MAX_32) so the eviction band tops out at 2^31 < u32::MAX with headroom for the pending block.
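
The cap arithmetic can be checked directly. The helper names are hypothetical, and the factor of 2 is the commit message's own approximation of the eviction band, not an exact bound.

```rust
/// Approximate peak history length for a given window_log, assuming the
/// history may grow to about twice the window before draining.
pub fn approx_peak_history(window_log: u8) -> u64 {
    2 * (1u64 << window_log)
}

/// True when that peak still fits in u32 indices.
pub fn fits_u32_indices(window_log: u8) -> bool {
    approx_peak_history(window_log) <= u64::from(u32::MAX)
}
```

For `window_log = 30` the peak is 2^31, which fits. For `window_log = 31` it is 2^32, one past `u32::MAX`.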

polaz · 2026-05-22T08:01:17Z

@coderabbitai full review

coderabbitai · 2026-05-22T08:01:23Z

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

…kip unreachable index 0 on rehash

Two CR findings on the post-window_log-cap revision:

- accept_data function-level docs still described the window-budget eviction as 'enforced lazily on the actual append (inside extend_history_with_pending)'. The actual code does the eviction EAGERLY in accept_data so the driver's commit_space can observe the pre/post history.len() delta; without that visibility the dict-budget retire never fires. Rewrite the doc paragraph to match.
- drain_real_prefix rebases prefix_start_index to INITIAL_PREFIX_START_INDEX (= 1) so the kernel's match_idx >= prefix_start_index filter rejects sentinel-0 lookups. The rehash loop was then primed from HISTORY_DRAIN_BASE (= 0), inserting one entry at position 0 that the kernel can never read back. Start the rehash at INITIAL_PREFIX_START_INDEX instead: same range coverage for every kernel-reachable byte, one fewer hash op and zero pollution of the freshly-cleared table.
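
A toy version of the sentinel-0 convention and the rehash start index. The hash function, table layout and helper names below are assumptions for illustration, and only the constant name is taken from the PR. An all-zero slot means "empty", so a lookup floor of 1 rejects it, and the rehash can skip position 0 because nothing could ever read it back.

```rust
pub const INITIAL_PREFIX_START_INDEX: u32 = 1;

/// Illustrative multiplicative hash over 4 bytes (requires 1 <= hash_log <= 31).
fn hash4(bytes: &[u8], pos: usize, hash_log: u32) -> usize {
    let v = u32::from_le_bytes([bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3]]);
    (v.wrapping_mul(2_654_435_761) >> (32 - hash_log)) as usize
}

/// Positions the rehash visits: starts at 1, ends at the last 4-byte-hashable index.
pub fn rehash_range(len: usize) -> std::ops::Range<usize> {
    (INITIAL_PREFIX_START_INDEX as usize)..len.saturating_sub(3)
}

/// Rebuild a cleared table from retained history, skipping index 0.
pub fn rehash(history: &[u8], hash_log: u32) -> Vec<u32> {
    let mut table = vec![0u32; 1usize << hash_log];
    for pos in rehash_range(history.len()) {
        table[hash4(history, pos, hash_log)] = pos as u32;
    }
    table
}

/// Look up a candidate; a stored 0 (empty slot) falls below the floor.
pub fn lookup(table: &[u32], history: &[u8], pos: usize, hash_log: u32) -> Option<u32> {
    let idx = table[hash4(history, pos, hash_log)];
    if idx >= INITIAL_PREFIX_START_INDEX {
        Some(idx)
    } else {
        None
    }
}
```

For a 12-byte history the rehash visits positions 1 through 8, one fewer than a loop starting at 0.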

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

polaz · 2026-05-22T09:01:20Z

@coderabbitai full review

coderabbitai · 2026-05-22T09:01:26Z

✅ Actions performed

Full review triggered.

polaz · 2026-05-22T09:03:44Z

@coderabbitai full review

coderabbitai · 2026-05-22T09:03:50Z

✅ Actions performed

Full review triggered.

coderabbitai

♻️ Duplicate comments (1)

zstd/src/encoding/match_generator.rs (1)

8215-8262: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Exercise the backend-switch Fast path in this test.

This loop never leaves BackendTag::Simple, so it only validates FastKernelMatcher::reset(...); the FastKernelMatcher::with_params(...) wiring at Line 963 is still untested. Reset through any non-Fast level before each Fast case so the swap-in path is covered too.

🔧 Minimal test adjustment

     for &level in &fast_levels {
         let p = resolve_level_params(level, None);
@@
-        crate::encoding::Matcher::reset(&mut driver, level);
+        crate::encoding::Matcher::reset(&mut driver, CompressionLevel::Default);
+        crate::encoding::Matcher::reset(&mut driver, level);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/match_generator.rs` around lines 8215 - 8262, The loop only
ever lives in BackendTag::Simple so it never exercises the backend-switch path
(FastKernelMatcher::with_params); before calling
crate::encoding::Matcher::reset(&mut driver, level) for each fast_levels entry,
first reset the driver with a non-Fast compression level (e.g., call
crate::encoding::Matcher::reset(&mut driver, CompressionLevel::Level(9)) or any
level that resolves to a non-Fast strategy) to force the matcher to swap back
into the Fast backend; this ensures the FastKernelMatcher::with_params wiring is
exercised for each case resolved by resolve_level_params and validates the
backend-switch path used by MatchGeneratorDriver::new / simple_mut().

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 70cb3e4 and 3d150f1.

📒 Files selected for processing (3)

zstd/src/encoding/match_generator.rs
zstd/src/encoding/simple/fast_kernel/kernel.rs
zstd/src/encoding/simple/fast_matcher.rs

…ckend-switch hop

The Fast-levels wiring test stayed inside BackendTag::Simple for the entire loop, so every iteration validated only FastKernelMatcher::reset and never the with_params path that MatchGeneratorDriver::new / simple_mut take when swapping into the Fast variant from a different backend. Bounce through a non-Fast strategy (Level(9) → Lazy) before each Fast reset so the next reset actually triggers the backend-switch + with_params construction, covering both reset and with_params wiring per Fast level.
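The effect of the bounce can be illustrated with a toy model. This is only a sketch under stated assumptions: `Strategy`, `Path`, `Driver`, `resolve_strategy` and `run_fast_levels` are hypothetical stand-ins, not the crate's types, and the level-to-strategy mapping is simplified to "levels at or below 1 are Fast". The toy records which branch each Fast reset takes; the `WithParams` branch stands in for the real backend-switch construction.

```rust
// Hypothetical stand-ins for illustration; none of these are the crate's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strategy {
    Fast,
    Lazy,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Path {
    /// Driver was already on the Fast backend: in-place reset only.
    Reset,
    /// Driver came from another backend: Fast matcher is rebuilt from params.
    WithParams,
}

/// Assumed mapping for this sketch: levels <= 1 are Fast, everything else Lazy.
fn resolve_strategy(level: i32) -> Strategy {
    if level <= 1 {
        Strategy::Fast
    } else {
        Strategy::Lazy
    }
}

struct Driver {
    active: Strategy,
    /// Which path each Fast reset took, in call order.
    fast_paths: Vec<Path>,
}

impl Driver {
    fn new() -> Self {
        Driver { active: Strategy::Fast, fast_paths: Vec::new() }
    }

    fn reset(&mut self, level: i32) {
        let next = resolve_strategy(level);
        if next == Strategy::Fast {
            let path = if self.active == Strategy::Fast {
                Path::Reset
            } else {
                Path::WithParams
            };
            self.fast_paths.push(path);
        }
        self.active = next;
    }
}

/// Runs the wiring loop, optionally bouncing through a non-Fast level first.
fn run_fast_levels(bounce: bool) -> Vec<Path> {
    let mut driver = Driver::new();
    for level in [-7, -3, -1, 1] {
        if bounce {
            driver.reset(9); // resolves to Lazy in this sketch
        }
        driver.reset(level);
    }
    driver.fast_paths
}
```

Without the bounce, every iteration in the toy takes the `Reset` branch; with it, every Fast reset is preceded by a backend change and takes the `WithParams` branch.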

polaz · 2026-05-22T09:32:31Z

@coderabbitai Fixed in e6153f2 — wiring test now bounces through CompressionLevel::Level(9) (resolves to Lazy strategy) before each Fast reset, so the loop exercises the backend-switch path that goes through FastKernelMatcher::with_params on top of the reset-only path it was already covering.

coderabbitai · 2026-05-22T09:33:14Z

@polaz: Thanks for the fix — bouncing through Level(9) before each Fast reset is a clean way to exercise the with_params backend-switch path without duplicating test structure. The wiring test now covers both construction-from-scratch and reset-only paths together.

(∧˃ᵕ˂∧)♡

Copilot AI review requested due to automatic review settings May 21, 2026 23:02

Copilot started reviewing on behalf of polaz May 21, 2026 23:02

Copilot AI reviewed May 21, 2026

coderabbitai Bot reviewed May 21, 2026

Comment thread zstd/src/encoding/match_generator.rs

polaz added 2 commits May 22, 2026 03:06

Merge branch 'main' into perf/#198-fast-matcher-phase3

21887be

Copilot AI review requested due to automatic review settings May 22, 2026 04:12

Copilot started reviewing on behalf of polaz May 22, 2026 04:12

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

docs(fast_matcher): correct with_params doc to include step_size

f0e8f48

Doc comment listed (window_log, hash_log, mls) triple but the function signature includes step_size as 4th positional argument.
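A doc comment that matches the four-argument shape might look roughly like the sketch below. The argument order follows the commit message; the struct name `FastParamsSketch`, the field types, and the wording for `step_size` are assumptions for illustration and are not taken from the crate.

```rust
/// Hypothetical stand-in for the Fast matcher; field types are assumptions.
#[derive(Debug, PartialEq, Eq)]
struct FastParamsSketch {
    window_log: u8,
    hash_log: u8,
    mls: u8,
    step_size: usize,
}

impl FastParamsSketch {
    /// Builds a matcher from the `(window_log, hash_log, mls, step_size)`
    /// quadruple.
    ///
    /// * `window_log`: log2 of the window size in bytes.
    /// * `hash_log`: log2 of the number of hash-table entries.
    /// * `mls`: minimum match length, i.e. bytes hashed per position.
    /// * `step_size`: cursor stride while no match is found (assumed meaning).
    fn with_params(window_log: u8, hash_log: u8, mls: u8, step_size: usize) -> Self {
        Self { window_log, hash_log, mls, step_size }
    }
}
```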

polaz requested a review from Copilot May 22, 2026 04:26

Copilot started reviewing on behalf of polaz May 22, 2026 04:26

coderabbitai Bot reviewed May 22, 2026

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

Comment thread zstd/src/encoding/simple/fast_kernel/kernel.rs

polaz added 2 commits May 22, 2026 07:38

refactor(fast_matcher): align window_size helper with its docstring

ce42669

window_size docstring states 'Equals 1 << window_log' but body returned self.max_window_size. Both fields are kept in sync, but reading window_log makes the helper literally match the documented contract.
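As a minimal sketch of that alignment, assuming a struct that stores both fields and keeps them in sync at construction (`FastMatcherSketch` and its field types are hypothetical, not the crate's definitions):

```rust
/// Hypothetical stand-in; the real matcher has more state than this.
struct FastMatcherSketch {
    window_log: u8,
    max_window_size: usize,
}

impl FastMatcherSketch {
    fn new(window_log: u8) -> Self {
        // Both fields are derived from the same input, so they stay in sync.
        Self { window_log, max_window_size: 1usize << window_log }
    }

    /// Equals `1 << window_log`.
    fn window_size(&self) -> usize {
        // Reading window_log makes the body match the docstring literally;
        // the debug assertion documents the invariant with max_window_size.
        debug_assert_eq!(self.max_window_size, 1usize << self.window_log);
        1usize << self.window_log
    }
}
```

Either field returns the same value while the invariant holds; deriving the result from `window_log` just removes the need to trust that invariant when reading the helper.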

coderabbitai Bot reviewed May 22, 2026

polaz requested a review from Copilot May 22, 2026 05:09

Copilot started reviewing on behalf of polaz May 22, 2026 05:10

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

Comment thread zstd/src/encoding/simple/fast_matcher.rs

polaz requested a review from Copilot May 22, 2026 05:26

Copilot started reviewing on behalf of polaz May 22, 2026 05:26

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

Copilot started reviewing on behalf of polaz May 22, 2026 06:53

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

Comment thread zstd/src/encoding/simple/fast_matcher.rs Outdated

polaz added 2 commits May 22, 2026 10:01

Merge branch 'main' into perf/#198-fast-matcher-phase3

1f777d4

Copilot AI review requested due to automatic review settings May 22, 2026 07:02

Copilot started reviewing on behalf of polaz May 22, 2026 07:02

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_kernel/kernel.rs

Comment thread zstd/src/encoding/simple/fast_kernel/kernel.rs

Comment thread zstd/src/encoding/simple/fast_kernel/kernel.rs

Comment thread zstd/src/encoding/simple/fast_kernel/kernel.rs

polaz requested a review from Copilot May 22, 2026 07:41

Copilot started reviewing on behalf of polaz May 22, 2026 07:41

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs

polaz requested a review from Copilot May 22, 2026 07:59

Copilot started reviewing on behalf of polaz May 22, 2026 07:59

Copilot AI reviewed May 22, 2026

Comment thread zstd/src/encoding/simple/fast_matcher.rs

Comment thread zstd/src/encoding/simple/fast_matcher.rs

polaz requested a review from Copilot May 22, 2026 08:45

Copilot started reviewing on behalf of polaz May 22, 2026 08:45

Copilot AI reviewed May 22, 2026

coderabbitai Bot reviewed May 22, 2026

polaz closed this May 22, 2026
