perf(encoder): donor-parity greedy parse at L4 — ratio + speed win by polaz · Pull Request #179 · structured-world/structured-zstd

polaz · 2026-05-18T15:18:17Z

Summary

L4 (StrategyTag::Greedy) was routed through the Row backend's plain start_matching with lazy_depth = 1, paying for an unconditional depth-1 lookahead while still calling itself "greedy". Donor ZSTD_compressBlock_lazy_generic at depth == 0 (zstd_lazy.c:1560) is structurally different — that's why the strategy is named "greedy": not because the matches are long, but because each iteration greedily prefers the cheaper 1 literal + repcode encoding over a 0-literal regular match.

This PR adds RowMatchGenerator::start_matching_greedy mirroring donor's depth-0 lazy_generic flow, and dispatches L4 (lazy_depth == 0) to it via a debug_assert_eq!(matcher.lazy_depth, 0) invariant — the runtime side of the routing is pinned in tests via driver.row_matcher().lazy_depth == 0. L5+ keep the existing plain start_matching path (their lazy lookahead already gives ratio wins vs donor); the plain row-lazy methods are kept as #[allow(dead_code)] scaffolding for a future routing of L5+ through the Row backend.

Structural changes carried over from donor

Default start = pos + 1: probe the repcode bank at abs_pos + 1 before regular search. A near-repcode beats a longer regular offset on encoding cost (4-5 bits vs 9-13 bits for the offset code).
Hybrid commit (not pure donor goto): donor's pure commit-on-first-rep was tried and cost +1.1pp ratio on z000033 because donor recovers that loss via components we don't replicate (minMatch=5, block splitting). Our hybrid still runs the regular search at abs_pos and picks the longer of (rep@+1, regular@pos) — keeps the speed shape, reclaims the ratio.
Skip-step grows on miss, shift = 10: donor uses 8 (kSearchStrength). Shift = 8 cost ~3pp ratio loss; shift = infinity cost ~+14% speed. Shift = 10 sits at the sweet spot — only kicks in past ~1 KiB literal run, so on structured data the step stays at 1.
Repcode min_match = 4 (donor MEM_read32 + ZSTD_count). ROW_MIN_MATCH_LEN = 6 gates the regular row-table search; the independent repcode probe benefits from the lower threshold because 4-5-byte reps are cheap to encode and frequently beat the literal alternative. Outer-loop guard is pos + GREEDY_MIN_LOOKAHEAD <= current_len (= 5) so the tail position whose abs_pos + 1 repcode probe needs 4 bytes of lookahead is reachable.

pick_lazy_match_shared is intentionally not called from the new path — depth == 0 means no lookahead, emit on first viable hit.

Note: donor's immediate-repcode loop after store (probe offset_2 for back-to-back hits + swap) was implemented and then removed. A panic! probe inside the inner loop ran the full 528-test suite + bench matrix without firing. The donor design is single-rep; ours is three-rep (repcode_candidate_shared evaluates rep1, rep2, rep3 plus the ll0 fallback), so the inner-loop slot is subsumed by the next main-loop iteration's rep probe. Removing it nudged decodecorpus-z000033 from 538142 to 537897 bytes (-245).

Measurements (L4 only, criterion --baseline main, p < 0.05 unless noted)

Scenario	rust_bytes vs main	time vs main
small-1k-random	0	+3.4%
small-10k-random	0	-0.2% (p=0.96, noise)
small-4k-log-lines	0	-5.1%
decodecorpus-z000033	-1522 bytes	-24.0%
high-entropy-1m	0	-7.8%
low-entropy-1m	0	-2.4%
large-log-stream	-2 bytes	+9.0%

Ratio: 5 cells equal, 2 cells better, 0 cells worse.
Speed: 4 wins (incl. -24% on z000033 — the cell that motivated #178), 1 noise, 2 regressions.

Tests

8 new L4-targeted tests (plus a private l4_greedy_round_trip helper):

driver_level4_selects_row_backend — backend routing + lazy_depth == 0 assertion (pins greedy vs lazy choice in release tests too).
driver_level4_greedy_round_trip_single_slice — non-empty triple emission on a repeating pattern.
driver_level4_greedy_round_trip_cross_slice — cross-slice match (offset >= chunk.len()).
driver_level4_greedy_tail_rep_only_reachable — guards the GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1 loop bound; without the fix the 5-byte-tail rep-only position was unreachable.
driver_level4_greedy_empty_input_emits_nothing — current_len == 0 early-return path.
driver_level4_greedy_sub_min_lookahead_input — 4-byte payload below GREEDY_MIN_LOOKAHEAD; outer loop never executes.
driver_level4_greedy_incompressible_input — pseudo-random 256B exercises the miss-branch.
driver_level4_greedy_long_literal_run_skip_step_growth — 2 KiB pseudo-random drives lit_len >> SKIP_STRENGTH.
driver_level4_greedy_all_zeros_heavy_rep1 — max_offset == 1 rep1 commit-on-first-hit.
driver_level4_greedy_periodic_pattern_rep_cascade — 16-byte period steady-state cascade.

row/mod.rs coverage: 89.12% to 96.69%. Remaining gaps are micro-branches (rep probe at exact end-of-buffer, regular-strictly-longer-than-rep arm, compact_history drain) and the #[allow(dead_code)] scaffold.

Known follow-up

The +9% large-log-stream regression and the +3.4% small-1k-random one reflect a different hotpath. samply profile on L4 large-log-stream shows RowMatchGenerator::skip_matching_with_hint consuming ~25% of self time — it inserts every position of each 128 KiB block into the row hash table (~102M insert_position calls on a 100 MB stream), dominating any per-iteration delta in the parse loop. Independent of this PR; tracked as #180.

Test plan

cargo nextest run: 528/528 passed (3 skipped)
cargo clippy --all-targets --features dict_builder: clean
cargo fmt --check: clean
cargo llvm-cov --lib: row/mod.rs at 96.69%
compare_ffi L4 matrix: full scenario sweep, ratios captured above
Ratio gate: rust_bytes <= ffi_bytes constraint from CLAUDE.md respected — no ratio regression on any cell

perf(decode + encode-greedy): close 3-5× donor gap on negative-level decompress; share SIMD primitives + add dedicated greedy strategy #178 — parent issue (decode/encode shared primitives + greedy)
perf(encoder): skip_matching_with_hint dominates self-time (~25%) — drop eager block-boundary inserts #180 — skip_matching_with_hint dominates self-time (follow-up for the speed regressions noted above)

Summary by CodeRabbit

Performance
- Level 4 Row backend now uses a more aggressive greedy matching path, improving matching behavior for repeats and periodic patterns and yielding faster/more consistent compression.
Tests
- Added extensive Level 4 test coverage: routing/invariant checks, single- and cross-slice greedy round-trips, and multiple regression cases (empty input, incompressible data, long literals, heavy repeats).

L4 (`StrategyTag::Greedy`) was routed through the Row backend's plain `start_matching` with `lazy_depth = 1`, paying for an unconditional depth-1 lookahead while still calling itself "greedy". Donor `ZSTD_compressBlock_lazy_generic` at `depth == 0` (`zstd_lazy.c:1560`) is structurally different — that's why the strategy is named "greedy": not because the matches are long, but because each iteration greedily prefers the cheaper `1 literal + repcode` encoding over a 0-literal regular match. Add `RowMatchGenerator::start_matching_greedy` that mirrors donor's depth-0 lazy_generic flow and dispatch L4 (lazy_depth = 0) to it. Structural features carried over: - **Default `start = pos + 1`**: probe the repcode bank at `abs_pos + 1` before regular search, so a near-repcode beats a longer regular offset on encoding cost. - **`if depth == 0: goto _storeSequence`**: a rep hit ends the iteration without paying for the regular search at `abs_pos`. Our hybrid still runs the regular search and picks the longer of (rep@ +1, regular@pos) — pure donor commit-on-first-rep was tried and cost +1.1pp ratio vs main because donor recovers the loss via components we don't replicate (`minMatch = 5`, block splitting). The hybrid keeps the speed shape and reclaims the ratio. - **Immediate-repcode loop after store**: scan forward for back-to-back `offset_2` hits with `lit_len = 0`. The donor `offset_1 ↔ offset_2` swap is reproduced by `encode_offset_with_history(actual = offset_hist[1], lit_len = 0)`. - **Skip-step grows on miss**: shift right by 10 (donor uses 8). Shift = 8 cost ~3pp ratio loss; shift = ∞ (no grow) cost ~+14% speed. Shift = 10 sits at the sweet spot — only kicks in past ~1 KiB literal run, so on structured data the step stays at 1. - **Repcode min_match = 4** (donor `MEM_read32 + ZSTD_count`). `ROW_MIN_MATCH_LEN = 6` gates the regular row-table search; the independent repcode probe benefits from the lower threshold because 4–5-byte reps are cheap to encode and frequently beat the literal alternative. `pick_lazy_match_shared` is intentionally not called from the new path — depth == 0 means no lookahead, emit on first viable hit. L4 measurements vs main (criterion `--baseline`, p < 0.05 unless noted): | Scenario | rust_bytes Δ | time Δ | |-----------------------|--------------|--------| | small-1k-random | 0 | +3.4% | | small-10k-random | 0 | -0.2% (noise, p=0.96) | | small-4k-log-lines | 0 | -5.1% | | decodecorpus-z000033 | -1277 bytes | -24.0% | | high-entropy-1m | 0 | -7.8% | | low-entropy-1m | 0 | -2.4% | | large-log-stream | -2 bytes | +9.0% | Ratio: 5 cells equal, 2 cells better, **0 cells worse**. The 2 speed regressions (small-1k-random, large-log-stream) reflect a different hotpath: profile on large-log-stream shows `RowMatchGenerator::skip_matching_with_hint` consuming ~25% of self time, dominating any per-iteration delta in the parse loop. Tracked separately (see follow-up issue) — independent of this PR. Refs #178

coderabbitai · 2026-05-18T15:18:31Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c27fcc1f-d69f-4821-96d0-85b5ba8bb646

📥 Commits

Reviewing files that changed from the base of the PR and between db07f98 and 3250a07.

📒 Files selected for processing (1)

zstd/src/encoding/match_generator.rs

📝 Walkthrough

Walkthrough

Adds a depth-0 greedy parse path to the Row matcher, sets level 4’s Row matcher lazy_depth = 0, routes the backend dispatcher to call the greedy path, and expands tests for level‑4 greedy single-slice, cross-slice, and edge-case behaviors.

Changes

Greedy Matching Path for Compression Level 4

Layer / File(s)	Summary
Greedy matcher implementation `zstd/src/encoding/row/mod.rs`	Implements `RowMatchGenerator::start_matching_greedy` (depth‑0 greedy loop), table backfill/insert logic, rep-vs-row candidate selection, skip-step on misses, `offset_hist` updates, tail insertion, and trailing `Sequence::Literals`; adds lazy-style docs and scoped `#[allow(dead_code)]` on helpers.
Dispatcher wiring, level 4 configuration, and tests `zstd/src/encoding/match_generator.rs`	Sets LEVEL_TABLE level 4 Row `lazy_depth = 0`; changes `BackendTag::Row` to call `start_matching_greedy` with `debug_assert_eq!(matcher.lazy_depth, 0)` and adds Level(4) greedy round-trip and regression tests (single-slice, cross-slice, empty input, sub-min lookahead, incompressible data, long literal-run skip growth, heavy rep1, periodic-pattern rep cascades).

Sequence Diagram

sequenceDiagram
  participant TestHarness
  participant MatchGeneratorDriver
  participant RowMatchGenerator
  participant RowTable
  participant OffsetHistory
  TestHarness->>MatchGeneratorDriver: request compress_block(Level(4), slices)
  MatchGeneratorDriver->>RowMatchGenerator: debug_assert_lazy_depth_then_start_matching_greedy()
  RowMatchGenerator->>RowTable: backfill & lookup (abs_pos, abs_pos+1)
  RowMatchGenerator->>OffsetHistory: encode_offset_with_history(offset)
  RowMatchGenerator->>MatchGeneratorDriver: emit Sequence::Triple / Sequence::Literals
  MatchGeneratorDriver->>TestHarness: reconstructed output / match metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

perf(decode + encode-greedy): close 3-5× donor gap on negative-level decompress; share SIMD primitives + add dedicated greedy strategy #178: Adds a dedicated Row greedy path and routes level 4 to greedy (matches this PR's changes).

Possibly related PRs

structured-world/structured-zstd#84: Introduced the Row backend and RowMatchGenerator that this PR extends.
structured-world/structured-zstd#125: Overlaps backend dispatch refactor touching the same compress_block Row arm.

"I hopped through rows with tuned delight,
Probing repcodes by day and emitting triples by night.
Level four taught me to choose the longer chase,
Backfilled tables and left literals a trace.
A greedy rabbit's hop—swift, clever, and light."

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'perf(encoder): donor-parity greedy parse at L4 — ratio + speed win' clearly and specifically describes the main change: implementing a donor-parity greedy parse for compression level 4 with performance improvements.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/#178-dedicated-greedy

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-18T15:21:06Z

Codecov Report

❌ Patch coverage is 92.64706% with 20 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
zstd/src/encoding/match_generator.rs	91.13%	18 Missing ⚠️
zstd/src/encoding/row/mod.rs	97.10%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/row/mod.rs`:
- Around line 290-315: The outer greedy loop is still gated by ROW_MIN_MATCH_LEN
which prevents evaluating the rep-only tail where REP_MIN_MATCH_LEN = 4 should
apply; introduce a GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1 (or simply use
REP_MIN_MATCH_LEN + 1) and replace the loop guard `while pos + ROW_MIN_MATCH_LEN
<= current_len` with `while pos + GREEDY_MIN_LOOKAHEAD <= current_len` so the
`rep_probe_pos = abs_pos + 1` / `rep_match` probe (which checks `rep_probe_pos +
REP_MIN_MATCH_LEN <= self.history_abs_end()`) becomes reachable for the final
bytes. Ensure REP_MIN_MATCH_LEN is defined/visible before the loop so the new
GREEDY_MIN_LOOKAHEAD can reference it.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2274cc9f-d4b5-46c2-9afb-d3d5c1be2110

📥 Commits

Reviewing files that changed from the base of the PR and between c71559e and d832376.

📒 Files selected for processing (2)

zstd/src/encoding/match_generator.rs
zstd/src/encoding/row/mod.rs

Copilot

Pull request overview

This PR refines the level-4 (Greedy) encoder path by adding a dedicated donor-parity greedy loop for the Row backend and routing L4 to it (via lazy_depth = 0), aiming to reduce overhead from unnecessary lazy lookahead while preserving (or improving) ratio via repcode-first probing and an immediate rep loop.

Changes:

Added RowMatchGenerator::start_matching_greedy() implementing a donor-inspired greedy parse loop for lazy_depth == 0.
Updated level-4 parameters to set lazy_depth: 0 so Greedy no longer routes through the depth-1 lazy machinery.
Added Row-backend dispatch logic to select the greedy path when lazy_depth == 0.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`zstd/src/encoding/row/mod.rs`	Adds the new greedy parsing loop (`start_matching_greedy`) with repcode-first probing, miss skip-step, and immediate rep loop.
`zstd/src/encoding/match_generator.rs`	Routes level 4 to `lazy_depth = 0` and dispatches Row matching to the new greedy path.

`row/mod.rs:290`: outer-loop guard relaxed from `pos + ROW_MIN_MATCH_LEN <= current_len` to `pos + GREEDY_MIN_LOOKAHEAD <= current_len` where `GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1 = 5`. The previous bound was over-strict: a repcode probe at `abs_pos + 1` only needs 4 bytes of lookahead beyond that probe point, so the last 5-byte position was unreachable even though its rep probe would succeed. `REP_MIN_MATCH_LEN` is hoisted out of the loop body so the new outer guard can reference it. `row/mod.rs:242` (doc): bullet (2) of the greedy parse description claimed "the regular search at abs_pos is skipped entirely" — that is donor's depth-0 behavior, not ours. Rewritten to document the hybrid form (rep probe at +1 *and* regular search at pos, longer wins, ties go to rep). Explicitly notes why donor's pure commit-on-first-rep was rejected: ratio cliff without minMatch=5 and superblock entropy sharing. `row/mod.rs:367` (doc): comment said `SKIP_STRENGTH = 12` (~4 KiB miss-run threshold) but the constant is 10 (~1 KiB). Comment realigned to the actual value. `match_generator.rs:1223` (dispatch): the `Row` backend arm had a runtime `if matcher.lazy_depth == 0` with a fall-through to the plain `start_matching` path, but `StrategyTag::backend` maps only `Greedy` → `Row`, so the `else` was unreachable in normal operation. Replaced with unconditional `start_matching_greedy` plus a `debug_assert_eq!(matcher.lazy_depth, 0, …)` documenting the invariant. The plain `start_matching` / `best_match` / `pick_lazy_match` / `repcode_candidate` methods on `RowMatchGenerator` now have no in-tree caller. Marked `#[allow(dead_code)]` rather than deleted: a future routing of a lazy strategy through the Row backend would otherwise have to re-derive the rep+row best-of-two pick and the `pick_lazy_match` driver. Doc comment makes the rationale explicit. `match_generator.rs` tests: new `driver_level4_greedy_round_trip_single_slice` locks down L4 greedy reconstruction on a repeating-pattern single slice and asserts at least one `Sequence::Triple` is emitted (the rep probe must produce a match). `driver_level4_greedy_round_trip_cross_slice` commits two identical slices and asserts the parse matches into the first slice's history (offset >= chunk.len()). Covers the new `start_matching_greedy` entry point + cross-block hash-table seeding. Verified locally: - cargo nextest run: 521 / 521 pass (3 skipped) - cargo clippy --all-targets --features dict_builder: clean - cargo fmt --check: clean - compare_ffi L4 REPORT lines: ratio unchanged on all scenarios (538142 z000033, 8947 large-log-stream, ...)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 3997-4118: The tests only verify round-trip reconstruction but
don't assert that Level(4) actually uses the greedy path; after calling
driver.reset(CompressionLevel::Level(4)) assert the matcher routing state (e.g.
assert_eq!(driver.row_matcher().lazy_depth, 0)) to prove greedy selection, and
add a targeted fixture in driver_level4_greedy_round_trip_single_slice (or a new
test) that inspects the first emitted sequence from start_matching (or uses a
probe of start_matching_greedy vs lazy behavior) to assert a greedy-specific
behavior (for example expecting an immediate rep Triple at start rather than the
lazy parser's different first sequence), referencing
MatchGeneratorDriver::reset, MatchGeneratorDriver::row_matcher, and
start_matching/start_matching_greedy to locate changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d39f8572-2f5f-4744-bb66-d9ce477cdaec

📥 Commits

Reviewing files that changed from the base of the PR and between d832376 and 6bf8628.

📒 Files selected for processing (2)

zstd/src/encoding/match_generator.rs
zstd/src/encoding/row/mod.rs

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

The immediate-rep loop inside `start_matching_greedy` was unreachable on every test and bench workload: a `panic!` probe placed inside the loop body ran the full 528-test suite without firing. The donor design it mirrors is single-rep, where the inner loop catches hits the main loop's single-slot probe wouldn't. Our main-loop rep probe goes through `repcode_candidate_shared` which already evaluates rep1, rep2, rep3 plus the `ll0` fallback — the inner-loop slot is subsumed by the next outer-iteration rep probe of the same slot, so the inner loop is dead by construction. Removed. Side effect on `decodecorpus-z000033` (L4): rust_bytes 538142 -> 537897 (-245 bytes), because the loop's offset_hist rotations on the few inputs where its slice-equality check passed turned out to bias subsequent encoding decisions slightly the wrong way. Other scenarios unchanged. Tests added (all hit `start_matching_greedy` via L4 driver): - `l4_greedy_round_trip` private test helper: commits multi-slice data per-slice with drive interleaved (single-shot driving loses earlier slices' bytes once they fall out of the window). - `driver_level4_greedy_tail_rep_only_reachable` — cross-slice payload whose tail position lives in the 5-byte window that was unreachable under the old `pos + ROW_MIN_MATCH_LEN <= current_len` guard. Asserts at least one match emitted from that region. - `driver_level4_greedy_empty_input_emits_nothing` — `current_len == 0` early-return guard exercises. - `driver_level4_greedy_sub_min_lookahead_input` — 4-byte payload below `GREEDY_MIN_LOOKAHEAD = 5`; outer loop never runs, all bytes emit as literals. - `driver_level4_greedy_incompressible_input` — 256 pseudo-random bytes hit the miss-branch + skip-step path with no rep / row candidate. - `driver_level4_greedy_long_literal_run_skip_step_growth` — 2 KiB of pseudo-random data drives the literal-run length past the `SKIP_STRENGTH = 10` threshold so the per-miss step grows beyond 1. - `driver_level4_greedy_all_zeros_heavy_rep1` — heavy rep1 path (offset = 1, byte-against-prev-byte). Asserts `max_offset == 1`. - `driver_level4_greedy_periodic_pattern_rep_cascade` — 16-byte period repeated 32x; locks the steady-state rep-cascade at offset >= 16. Doc on `start_matching_greedy` updated: the donor immediate-rep paragraph is rewritten to explain WHY it's omitted (single-rep donor vs three-rep us) and that removal was empirically validated via the panic probe. Coverage on `row/mod.rs`: 89.12% -> 96.69%. Remaining gaps are micro- branches (rep probe at exact end-of-buffer, regular-strictly-longer- than-rep arm, history compaction drain) and the `#[allow(dead_code)]` `start_matching` / `best_match` / `pick_lazy_match` / `repcode_candidate` scaffold for future lazy-on-Row routing. Verified: - cargo nextest run: 528 / 528 pass (3 skipped) - cargo clippy --all-targets --features dict_builder: clean - cargo fmt --check: clean - compare_ffi L4 REPORT (post-removal): decodecorpus-z000033: rust_bytes 537897 (was 538142, -245) large-log-stream: rust_bytes 8947 (unchanged) others: unchanged

`driver_level4_selects_row_backend`: extended to also assert `driver.row_matcher().lazy_depth == 0` after `reset(Level(4))`. The dispatcher in `MatchGeneratorDriver::start_matching` has a `debug_assert_eq!(matcher.lazy_depth, 0)` invariant that routes `BackendTag::Row` unconditionally into `start_matching_greedy` — but that invariant is only checked in debug builds. The runtime assertion here pins the routing choice in release tests too, so a future change that rerouted L4 through the lazy parse (`start_matching` with `lazy_depth >= 1`) would surface before the round-trip suite runs. Round-trip correctness alone passes on either parser, which is why that suite needs a routing-specific companion. Doc fixes: - `row/mod.rs:237` (start_matching_greedy docs): reworded "from a depth-1 lazy with `lazy_depth = 0`" — internally inconsistent (lazy_depth = 0 IS greedy, not depth-1 lazy). Now reads "from the lazy parse in `start_matching` which `lazy_depth >= 1` strategies use", removing the contradictory phrasing. - Round-trip test doc updated to acknowledge that round-trip alone does NOT pin the greedy-vs-lazy choice (the lazy_depth assertion in the sibling test does), and to describe what the round-trip suite actually guards (non-empty triple emission on structured input). Verified: - cargo nextest run: 528 / 528 pass (3 skipped)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/row/mod.rs`:
- Around line 409-425: The code currently calls self.insert_positions(abs_pos,
candidate.start + candidate.match_len) which fails to index bytes between
candidate.start and abs_pos when extend_backwards_shared moves candidate.start
backwards; update the insertion range to start from candidate.start (i.e. call
insert_positions with candidate.start as the first arg) so the committed match
region [candidate.start, candidate.start + candidate.match_len) is backfilled
into the row table before emitting the sequence (affecting insert_positions,
candidate.start, abs_pos, and candidate.match_len).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0fb07c76-aba3-45c2-b9bc-daad938da461

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf8628 and 6fb4d37.

📒 Files selected for processing (2)

zstd/src/encoding/match_generator.rs
zstd/src/encoding/row/mod.rs

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

… bound Two doc-only corrections + one declined suggestion with measurement rationale baked into the source comment. `row/mod.rs:172` (dead-code `start_matching` doc): the dispatch invariant referenced by this scaffold method lives in the `BackendTag::Row` arm of `MatchGeneratorDriver::compress_block` (`match_generator.rs:1230` area), not in `MatchGeneratorDriver::start_matching`. The doc comment had it on the wrong method, which would mislead anyone navigating the call stack from this scaffold back to its sole caller. `match_generator.rs:3995` (`driver_level4_selects_row_backend` test comment): same reference fix — the `debug_assert_eq!` lives inside `compress_block` not `start_matching`. `row/mod.rs:411` (`insert_positions` bound): added an explanatory comment pinning the `abs_pos` lower bound and explaining WHY the narrower range is intentional. The wider range `[candidate.start, candidate.start + match_len)` was tried and regresses `rust_bytes` on `decodecorpus-z000033` by +447 (537897 -> 538344). `extend_backwards_shared` absorbs literal bytes that were already row-indexed on earlier miss iterations via `insert_position(abs_pos)`, so re-indexing them at emit time overwrites the same `abs_pos -> position` mapping a second time and evicts more recent row-slot tenants. The narrow `abs_pos` bound keeps the row table healthier and the ratio +0 / -447 the right direction. No code-behavior change (the function body's emit logic is the same). Verified: - cargo nextest run: 528 / 528 pass (3 skipped) - cargo clippy --all-targets --features dict_builder: clean - cargo fmt --check: clean - compare_ffi L4 z000033 rust_bytes: 537897 (unchanged from pre-doc-fix baseline)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 4216-4225: The test currently asserts on total triples (triples >=
1) which can be satisfied by matches inside the first slice; instead inspect the
detailed match information returned by l4_greedy_round_trip and assert that at
least one match originates in the second slice (i.e., has a start position >=
first.len()). Replace the weak assert with a check on the second return value
from l4_greedy_round_trip (the matches/list/positions it returns) to ensure
there exists a match whose position >= first.len(), referencing the variables
first, second, combined and the l4_greedy_round_trip call.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ff899b25-64b2-4a30-ad1d-21318756ca7f

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb4d37 and c2df3f9.

📒 Files selected for processing (2)

zstd/src/encoding/match_generator.rs
zstd/src/encoding/row/mod.rs

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

…sertion The previous `driver_level4_greedy_tail_rep_only_reachable` asserted `triples >= 1` on a combined-payload round-trip — too weak, because the original first slice (`"BEEF_BEEF_aaaaaa"`) already contained "BEEF" twice and would emit at least one match through the regular row-table path even if the `GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1` guard relaxation regressed and the second-slice tail rep stayed unreachable. Pass-through assertion masking the case under test. Replace with a slice-by-slice drive that counts triples emitted from the **second slice's** `start_matching` pass and a deliberately constructed payload that triggers the exact tail boundary: - First slice: `"ABCDABCDABCDABCD"` (16 bytes, strict period 4) guarantees `offset_hist[0] = 4` by the time the parse reaches the end of the slice. - Second slice: `"ABCDA"` — exactly 5 bytes (= `GREEDY_MIN_LOOKAHEAD`). Outer loop runs once at `pos = 0`. The regular `row_candidate` requires 6 bytes from `abs_pos`, which is past the live-history end, so the only viable hit is the `abs_pos + 1` rep probe. `second[1..5] == "BCDA"` matches the 4-byte sequence `first[13..16] ++ second[0] == "BCDA"` at offset 4, and `extend_backwards_shared` then absorbs `second[0]` for a 5-byte rep match. With the old `pos + ROW_MIN_MATCH_LEN <= current_len` guard the outer loop would have refused to enter (0 + 6 > 5); with the relaxed `pos + GREEDY_MIN_LOOKAHEAD <= current_len` guard it enters and the rep fires. Without this strengthening, a future regression on the loop guard would not surface here. With it, the test fails immediately. Verified: - cargo nextest run: 528 / 528 pass (3 skipped) - cargo clippy --all-targets --features dict_builder: clean - cargo fmt --check: clean

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

…y asserts The `driver_level4_greedy_long_literal_run_skip_step_growth` test comment claimed it would catch "a future change to SKIP_STRENGTH" that would silently skip valid match positions, but the test body only asserts bit-exact round-trip — round-trip would pass on SKIP_STRENGTH = 6 or 14 as well, since both produce valid sequences, just at different per-iteration costs. The doc overstated the guarantee. Rewrite the comment to describe what the test actually verifies: a smoke that the miss branch + step-grow path in start_matching_greedy survives 2 KiB of incompressible input without panicking or violating round-trip. Pinning the exact step growth would require returning step / iteration metadata from the parse, which is invasive plumbing for a constant that has been stable since the original tuning round. Doc-only change. No code-behavior delta. Existing test continues to run as a stress smoke for the skip-step path. Verified compile: cargo build --release clean.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 18, 2026 15:18

polaz mentioned this pull request May 18, 2026

perf(encoder): skip_matching_with_hint dominates self-time (~25%) — drop eager block-boundary inserts #180

Open

6 tasks

Copilot started reviewing on behalf of polaz May 18, 2026 15:19 View session

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/row/mod.rs Outdated

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/row/mod.rs Outdated

Comment thread zstd/src/encoding/row/mod.rs Outdated

Comment thread zstd/src/encoding/match_generator.rs Outdated

Comment thread zstd/src/encoding/match_generator.rs Outdated

polaz requested a review from Copilot May 18, 2026 15:40

Copilot started reviewing on behalf of polaz May 18, 2026 15:41 View session

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/match_generator.rs

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/row/mod.rs Outdated

Comment thread zstd/src/encoding/match_generator.rs Outdated

polaz added 2 commits May 18, 2026 19:04

polaz requested a review from Copilot May 18, 2026 16:14

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/row/mod.rs

Copilot started reviewing on behalf of polaz May 18, 2026 16:15 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/row/mod.rs

Comment thread zstd/src/encoding/match_generator.rs Outdated

polaz requested a review from Copilot May 18, 2026 20:53

Copilot started reviewing on behalf of polaz May 18, 2026 20:53 View session

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/match_generator.rs Outdated

Copilot AI reviewed May 18, 2026

View reviewed changes

polaz requested a review from Copilot May 18, 2026 21:11

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/encoding/match_generator.rs Outdated

Copilot started reviewing on behalf of polaz May 18, 2026 21:20 View session

polaz requested a review from Copilot May 18, 2026 21:30

Copilot AI reviewed May 18, 2026

View reviewed changes

Copilot started reviewing on behalf of polaz May 18, 2026 21:53 View session

polaz merged commit 62fe6e1 into main May 18, 2026
25 checks passed

polaz deleted the perf/#178-dedicated-greedy branch May 18, 2026 22:01

sw-release-bot Bot mentioned this pull request May 18, 2026

chore: release v0.0.22 #156

Open

coderabbitai Bot mentioned this pull request May 18, 2026

feat: block splitting for improved compression ratio #23

Open

8 tasks

polaz mentioned this pull request May 18, 2026

Roadmap: structured-zstd feature parity with C zstd #28

Open

coderabbitai Bot mentioned this pull request May 19, 2026

perf+ratio(encoder): lazy band investigation — ratio and speed regressions vs FFI #184

Open

Conversation

polaz commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Structural changes carried over from donor

Measurements (L4 only, criterion --baseline main, p < 0.05 unless noted)

Tests

Known follow-up

Test plan

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Possibly related PRs

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

polaz commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading

codecov Bot commented May 18, 2026 •

edited

Loading