perf(encoder): donor-parity greedy parse at L4 — ratio + speed win#179
Conversation
L4 (`StrategyTag::Greedy`) was routed through the Row backend's plain
`start_matching` with `lazy_depth = 1`, paying for an unconditional
depth-1 lookahead while still calling itself "greedy". Donor
`ZSTD_compressBlock_lazy_generic` at `depth == 0` (`zstd_lazy.c:1560`)
is structurally different — that's why the strategy is named "greedy":
not because the matches are long, but because each iteration greedily
prefers the cheaper `1 literal + repcode` encoding over a 0-literal
regular match.
Add `RowMatchGenerator::start_matching_greedy` that mirrors donor's
depth-0 lazy_generic flow and dispatch L4 (lazy_depth = 0) to it.
Structural features carried over:
- **Default `start = pos + 1`**: probe the repcode bank at `abs_pos +
1` before regular search, so a near-repcode beats a longer regular
offset on encoding cost.
- **`if depth == 0: goto _storeSequence`**: a rep hit ends the
iteration without paying for the regular search at `abs_pos`. Our
hybrid still runs the regular search and picks the longer of (rep@
+1, regular@pos) — pure donor commit-on-first-rep was tried and
cost +1.1pp ratio vs main because donor recovers the loss via
components we don't replicate (`minMatch = 5`, block splitting).
The hybrid keeps the speed shape and reclaims the ratio.
- **Immediate-repcode loop after store**: scan forward for
back-to-back `offset_2` hits with `lit_len = 0`. The donor
`offset_1 ↔ offset_2` swap is reproduced by
`encode_offset_with_history(actual = offset_hist[1], lit_len = 0)`.
- **Skip-step grows on miss**: shift right by 10 (donor uses 8).
Shift = 8 cost ~3pp ratio loss; shift = ∞ (no grow) cost ~+14%
speed. Shift = 10 sits at the sweet spot — only kicks in past ~1
KiB literal run, so on structured data the step stays at 1.
- **Repcode min_match = 4** (donor `MEM_read32 + ZSTD_count`).
`ROW_MIN_MATCH_LEN = 6` gates the regular row-table search; the
independent repcode probe benefits from the lower threshold because
4–5-byte reps are cheap to encode and frequently beat the literal
alternative.
`pick_lazy_match_shared` is intentionally not called from the new path
— depth == 0 means no lookahead, emit on first viable hit.
L4 measurements vs main (criterion `--baseline`, p < 0.05 unless noted):
| Scenario | rust_bytes Δ | time Δ |
|-----------------------|--------------|--------|
| small-1k-random | 0 | +3.4% |
| small-10k-random | 0 | -0.2% (noise, p=0.96) |
| small-4k-log-lines | 0 | -5.1% |
| decodecorpus-z000033 | -1277 bytes | -24.0% |
| high-entropy-1m | 0 | -7.8% |
| low-entropy-1m | 0 | -2.4% |
| large-log-stream | -2 bytes | +9.0% |
Ratio: 5 cells equal, 2 cells better, **0 cells worse**. The 2 speed
regressions (small-1k-random, large-log-stream) reflect a different
hotpath: profile on large-log-stream shows
`RowMatchGenerator::skip_matching_with_hint` consuming ~25% of self
time, dominating any per-iteration delta in the parse loop. Tracked
separately (see follow-up issue) — independent of this PR.
Refs #178
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a depth-0 greedy parse path to the Row matcher, sets level 4’s Row matcher ChangesGreedy Matching Path for Compression Level 4
Sequence DiagramsequenceDiagram
participant TestHarness
participant MatchGeneratorDriver
participant RowMatchGenerator
participant RowTable
participant OffsetHistory
TestHarness->>MatchGeneratorDriver: request compress_block(Level(4), slices)
MatchGeneratorDriver->>RowMatchGenerator: debug_assert_lazy_depth_then_start_matching_greedy()
RowMatchGenerator->>RowTable: backfill & lookup (abs_pos, abs_pos+1)
RowMatchGenerator->>OffsetHistory: encode_offset_with_history(offset)
RowMatchGenerator->>MatchGeneratorDriver: emit Sequence::Triple / Sequence::Literals
MatchGeneratorDriver->>TestHarness: reconstructed output / match metadata
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/src/encoding/row/mod.rs`:
- Around line 290-315: The outer greedy loop is still gated by ROW_MIN_MATCH_LEN
which prevents evaluating the rep-only tail where REP_MIN_MATCH_LEN = 4 should
apply; introduce a GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1 (or simply use
REP_MIN_MATCH_LEN + 1) and replace the loop guard `while pos + ROW_MIN_MATCH_LEN
<= current_len` with `while pos + GREEDY_MIN_LOOKAHEAD <= current_len` so the
`rep_probe_pos = abs_pos + 1` / `rep_match` probe (which checks `rep_probe_pos +
REP_MIN_MATCH_LEN <= self.history_abs_end()`) becomes reachable for the final
bytes. Ensure REP_MIN_MATCH_LEN is defined/visible before the loop so the new
GREEDY_MIN_LOOKAHEAD can reference it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2274cc9f-d4b5-46c2-9afb-d3d5c1be2110
📒 Files selected for processing (2)
zstd/src/encoding/match_generator.rszstd/src/encoding/row/mod.rs
There was a problem hiding this comment.
Pull request overview
This PR refines the level-4 (Greedy) encoder path by adding a dedicated donor-parity greedy loop for the Row backend and routing L4 to it (via lazy_depth = 0), aiming to reduce overhead from unnecessary lazy lookahead while preserving (or improving) ratio via repcode-first probing and an immediate rep loop.
Changes:
- Added
RowMatchGenerator::start_matching_greedy()implementing a donor-inspired greedy parse loop forlazy_depth == 0. - Updated level-4 parameters to set
lazy_depth: 0so Greedy no longer routes through the depth-1 lazy machinery. - Added Row-backend dispatch logic to select the greedy path when
lazy_depth == 0.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
zstd/src/encoding/row/mod.rs |
Adds the new greedy parsing loop (start_matching_greedy) with repcode-first probing, miss skip-step, and immediate rep loop. |
zstd/src/encoding/match_generator.rs |
Routes level 4 to lazy_depth = 0 and dispatches Row matching to the new greedy path. |
`row/mod.rs:290`: outer-loop guard relaxed from
`pos + ROW_MIN_MATCH_LEN <= current_len` to
`pos + GREEDY_MIN_LOOKAHEAD <= current_len` where
`GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1 = 5`. The previous
bound was over-strict: a repcode probe at `abs_pos + 1` only needs
4 bytes of lookahead beyond that probe point, so the last 5-byte
position was unreachable even though its rep probe would succeed.
`REP_MIN_MATCH_LEN` is hoisted out of the loop body so the new
outer guard can reference it.
`row/mod.rs:242` (doc): bullet (2) of the greedy parse description
claimed "the regular search at abs_pos is skipped entirely" — that
is donor's depth-0 behavior, not ours. Rewritten to document the
hybrid form (rep probe at +1 *and* regular search at pos, longer
wins, ties go to rep). Explicitly notes why donor's pure
commit-on-first-rep was rejected: ratio cliff without minMatch=5
and superblock entropy sharing.
`row/mod.rs:367` (doc): comment said `SKIP_STRENGTH = 12` (~4 KiB
miss-run threshold) but the constant is 10 (~1 KiB). Comment
realigned to the actual value.
`match_generator.rs:1223` (dispatch): the `Row` backend arm had a
runtime `if matcher.lazy_depth == 0` with a fall-through to the
plain `start_matching` path, but `StrategyTag::backend` maps only
`Greedy` → `Row`, so the `else` was unreachable in normal
operation. Replaced with unconditional `start_matching_greedy`
plus a `debug_assert_eq!(matcher.lazy_depth, 0, …)` documenting
the invariant.
The plain `start_matching` / `best_match` / `pick_lazy_match` /
`repcode_candidate` methods on `RowMatchGenerator` now have no
in-tree caller. Marked `#[allow(dead_code)]` rather than deleted:
a future routing of a lazy strategy through the Row backend would
otherwise have to re-derive the rep+row best-of-two pick and the
`pick_lazy_match` driver. Doc comment makes the rationale explicit.
`match_generator.rs` tests: new
`driver_level4_greedy_round_trip_single_slice` locks down L4 greedy
reconstruction on a repeating-pattern single slice and asserts at
least one `Sequence::Triple` is emitted (the rep probe must produce
a match). `driver_level4_greedy_round_trip_cross_slice` commits two
identical slices and asserts the parse matches into the first
slice's history (offset >= chunk.len()). Covers the new
`start_matching_greedy` entry point + cross-block hash-table
seeding.
Verified locally:
- cargo nextest run: 521 / 521 pass (3 skipped)
- cargo clippy --all-targets --features dict_builder: clean
- cargo fmt --check: clean
- compare_ffi L4 REPORT lines: ratio unchanged on all scenarios
(538142 z000033, 8947 large-log-stream, ...)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 3997-4118: The tests only verify round-trip reconstruction but
don't assert that Level(4) actually uses the greedy path; after calling
driver.reset(CompressionLevel::Level(4)) assert the matcher routing state (e.g.
assert_eq!(driver.row_matcher().lazy_depth, 0)) to prove greedy selection, and
add a targeted fixture in driver_level4_greedy_round_trip_single_slice (or a new
test) that inspects the first emitted sequence from start_matching (or uses a
probe of start_matching_greedy vs lazy behavior) to assert a greedy-specific
behavior (for example expecting an immediate rep Triple at start rather than the
lazy parser's different first sequence), referencing
MatchGeneratorDriver::reset, MatchGeneratorDriver::row_matcher, and
start_matching/start_matching_greedy to locate changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d39f8572-2f5f-4744-bb66-d9ce477cdaec
📒 Files selected for processing (2)
zstd/src/encoding/match_generator.rszstd/src/encoding/row/mod.rs
The immediate-rep loop inside `start_matching_greedy` was unreachable
on every test and bench workload: a `panic!` probe placed inside the
loop body ran the full 528-test suite without firing. The donor design
it mirrors is single-rep, where the inner loop catches hits the main
loop's single-slot probe wouldn't. Our main-loop rep probe goes
through `repcode_candidate_shared` which already evaluates rep1, rep2,
rep3 plus the `ll0` fallback — the inner-loop slot is subsumed by the
next outer-iteration rep probe of the same slot, so the inner loop is
dead by construction. Removed.
Side effect on `decodecorpus-z000033` (L4): rust_bytes 538142 -> 537897
(-245 bytes), because the loop's offset_hist rotations on the few
inputs where its slice-equality check passed turned out to bias
subsequent encoding decisions slightly the wrong way. Other scenarios
unchanged.
Tests added (all hit `start_matching_greedy` via L4 driver):
- `l4_greedy_round_trip` private test helper: commits multi-slice data
per-slice with drive interleaved (single-shot driving loses earlier
slices' bytes once they fall out of the window).
- `driver_level4_greedy_tail_rep_only_reachable` — cross-slice payload
whose tail position lives in the 5-byte window that was unreachable
under the old `pos + ROW_MIN_MATCH_LEN <= current_len` guard. Asserts
at least one match emitted from that region.
- `driver_level4_greedy_empty_input_emits_nothing` — `current_len == 0`
early-return guard exercises.
- `driver_level4_greedy_sub_min_lookahead_input` — 4-byte payload
below `GREEDY_MIN_LOOKAHEAD = 5`; outer loop never runs, all bytes
emit as literals.
- `driver_level4_greedy_incompressible_input` — 256 pseudo-random
bytes hit the miss-branch + skip-step path with no rep / row
candidate.
- `driver_level4_greedy_long_literal_run_skip_step_growth` — 2 KiB of
pseudo-random data drives the literal-run length past the
`SKIP_STRENGTH = 10` threshold so the per-miss step grows beyond 1.
- `driver_level4_greedy_all_zeros_heavy_rep1` — heavy rep1 path
(offset = 1, byte-against-prev-byte). Asserts `max_offset == 1`.
- `driver_level4_greedy_periodic_pattern_rep_cascade` — 16-byte period
repeated 32x; locks the steady-state rep-cascade at offset >= 16.
Doc on `start_matching_greedy` updated: the donor immediate-rep
paragraph is rewritten to explain WHY it's omitted (single-rep donor
vs three-rep us) and that removal was empirically validated via the
panic probe.
Coverage on `row/mod.rs`: 89.12% -> 96.69%. Remaining gaps are micro-
branches (rep probe at exact end-of-buffer, regular-strictly-longer-
than-rep arm, history compaction drain) and the `#[allow(dead_code)]`
`start_matching` / `best_match` / `pick_lazy_match` / `repcode_candidate`
scaffold for future lazy-on-Row routing.
Verified:
- cargo nextest run: 528 / 528 pass (3 skipped)
- cargo clippy --all-targets --features dict_builder: clean
- cargo fmt --check: clean
- compare_ffi L4 REPORT (post-removal):
decodecorpus-z000033: rust_bytes 537897 (was 538142, -245)
large-log-stream: rust_bytes 8947 (unchanged)
others: unchanged
`driver_level4_selects_row_backend`: extended to also assert
`driver.row_matcher().lazy_depth == 0` after `reset(Level(4))`. The
dispatcher in `MatchGeneratorDriver::start_matching` has a
`debug_assert_eq!(matcher.lazy_depth, 0)` invariant that routes
`BackendTag::Row` unconditionally into `start_matching_greedy` — but
that invariant is only checked in debug builds. The runtime assertion
here pins the routing choice in release tests too, so a future change
that rerouted L4 through the lazy parse (`start_matching` with
`lazy_depth >= 1`) would surface before the round-trip suite runs.
Round-trip correctness alone passes on either parser, which is why
that suite needs a routing-specific companion.
Doc fixes:
- `row/mod.rs:237` (start_matching_greedy docs): reworded "from a
depth-1 lazy with `lazy_depth = 0`" — internally inconsistent
(lazy_depth = 0 IS greedy, not depth-1 lazy). Now reads "from
the lazy parse in `start_matching` which `lazy_depth >= 1`
strategies use", removing the contradictory phrasing.
- Round-trip test doc updated to acknowledge that round-trip alone
does NOT pin the greedy-vs-lazy choice (the lazy_depth assertion
in the sibling test does), and to describe what the round-trip
suite actually guards (non-empty triple emission on structured
input).
Verified:
- cargo nextest run: 528 / 528 pass (3 skipped)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/src/encoding/row/mod.rs`:
- Around line 409-425: The code currently calls self.insert_positions(abs_pos,
candidate.start + candidate.match_len) which fails to index bytes between
candidate.start and abs_pos when extend_backwards_shared moves candidate.start
backwards; update the insertion range to start from candidate.start (i.e. call
insert_positions with candidate.start as the first arg) so the committed match
region [candidate.start, candidate.start + candidate.match_len) is backfilled
into the row table before emitting the sequence (affecting insert_positions,
candidate.start, abs_pos, and candidate.match_len).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0fb07c76-aba3-45c2-b9bc-daad938da461
📒 Files selected for processing (2)
zstd/src/encoding/match_generator.rszstd/src/encoding/row/mod.rs
… bound
Two doc-only corrections + one declined suggestion with measurement
rationale baked into the source comment.
`row/mod.rs:172` (dead-code `start_matching` doc): the dispatch
invariant referenced by this scaffold method lives in the
`BackendTag::Row` arm of `MatchGeneratorDriver::compress_block`
(`match_generator.rs:1230` area), not in
`MatchGeneratorDriver::start_matching`. The doc comment had it on the
wrong method, which would mislead anyone navigating the call stack
from this scaffold back to its sole caller.
`match_generator.rs:3995` (`driver_level4_selects_row_backend` test
comment): same reference fix — the `debug_assert_eq!` lives inside
`compress_block` not `start_matching`.
`row/mod.rs:411` (`insert_positions` bound): added an explanatory
comment pinning the `abs_pos` lower bound and explaining WHY the
narrower range is intentional. The wider range
`[candidate.start, candidate.start + match_len)` was tried and
regresses `rust_bytes` on `decodecorpus-z000033` by +447
(537897 -> 538344). `extend_backwards_shared` absorbs literal bytes
that were already row-indexed on earlier miss iterations via
`insert_position(abs_pos)`, so re-indexing them at emit time
overwrites the same `abs_pos -> position` mapping a second time
and evicts more recent row-slot tenants. The narrow `abs_pos` bound
keeps the row table healthier and the ratio +0 / -447 the right
direction.
No code-behavior change (the function body's emit logic is the same).
Verified:
- cargo nextest run: 528 / 528 pass (3 skipped)
- cargo clippy --all-targets --features dict_builder: clean
- cargo fmt --check: clean
- compare_ffi L4 z000033 rust_bytes: 537897 (unchanged from
pre-doc-fix baseline)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 4216-4225: The test currently asserts on total triples (triples >=
1) which can be satisfied by matches inside the first slice; instead inspect the
detailed match information returned by l4_greedy_round_trip and assert that at
least one match originates in the second slice (i.e., has a start position >=
first.len()). Replace the weak assert with a check on the second return value
from l4_greedy_round_trip (the matches/list/positions it returns) to ensure
there exists a match whose position >= first.len(), referencing the variables
first, second, combined and the l4_greedy_round_trip call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ff899b25-64b2-4a30-ad1d-21318756ca7f
📒 Files selected for processing (2)
zstd/src/encoding/match_generator.rszstd/src/encoding/row/mod.rs
…sertion The previous `driver_level4_greedy_tail_rep_only_reachable` asserted `triples >= 1` on a combined-payload round-trip — too weak, because the original first slice (`"BEEF_BEEF_aaaaaa"`) already contained "BEEF" twice and would emit at least one match through the regular row-table path even if the `GREEDY_MIN_LOOKAHEAD = REP_MIN_MATCH_LEN + 1` guard relaxation regressed and the second-slice tail rep stayed unreachable. Pass-through assertion masking the case under test. Replace with a slice-by-slice drive that counts triples emitted from the **second slice's** `start_matching` pass and a deliberately constructed payload that triggers the exact tail boundary: - First slice: `"ABCDABCDABCDABCD"` (16 bytes, strict period 4) guarantees `offset_hist[0] = 4` by the time the parse reaches the end of the slice. - Second slice: `"ABCDA"` — exactly 5 bytes (= `GREEDY_MIN_LOOKAHEAD`). Outer loop runs once at `pos = 0`. The regular `row_candidate` requires 6 bytes from `abs_pos`, which is past the live-history end, so the only viable hit is the `abs_pos + 1` rep probe. `second[1..5] == "BCDA"` matches the 4-byte sequence `first[13..16] ++ second[0] == "BCDA"` at offset 4, and `extend_backwards_shared` then absorbs `second[0]` for a 5-byte rep match. With the old `pos + ROW_MIN_MATCH_LEN <= current_len` guard the outer loop would have refused to enter (0 + 6 > 5); with the relaxed `pos + GREEDY_MIN_LOOKAHEAD <= current_len` guard it enters and the rep fires. Without this strengthening, a future regression on the loop guard would not surface here. With it, the test fails immediately. Verified: - cargo nextest run: 528 / 528 pass (3 skipped) - cargo clippy --all-targets --features dict_builder: clean - cargo fmt --check: clean
…y asserts The `driver_level4_greedy_long_literal_run_skip_step_growth` test comment claimed it would catch "a future change to SKIP_STRENGTH" that would silently skip valid match positions, but the test body only asserts bit-exact round-trip — round-trip would pass on SKIP_STRENGTH = 6 or 14 as well, since both produce valid sequences, just at different per-iteration costs. The doc overstated the guarantee. Rewrite the comment to describe what the test actually verifies: a smoke that the miss branch + step-grow path in start_matching_greedy survives 2 KiB of incompressible input without panicking or violating round-trip. Pinning the exact step growth would require returning step / iteration metadata from the parse, which is invasive plumbing for a constant that has been stable since the original tuning round. Doc-only change. No code-behavior delta. Existing test continues to run as a stress smoke for the skip-step path. Verified compile: cargo build --release clean.
Summary
L4 (
StrategyTag::Greedy) was routed through the Row backend's plainstart_matchingwithlazy_depth = 1, paying for an unconditional depth-1 lookahead while still calling itself "greedy". DonorZSTD_compressBlock_lazy_genericatdepth == 0(zstd_lazy.c:1560) is structurally different — that's why the strategy is named "greedy": not because the matches are long, but because each iteration greedily prefers the cheaper1 literal + repcodeencoding over a 0-literal regular match.This PR adds
RowMatchGenerator::start_matching_greedymirroring donor's depth-0lazy_genericflow, and dispatches L4 (lazy_depth == 0) to it via adebug_assert_eq!(matcher.lazy_depth, 0)invariant — the runtime side of the routing is pinned in tests viadriver.row_matcher().lazy_depth == 0. L5+ keep the existing plainstart_matchingpath (their lazy lookahead already gives ratio wins vs donor); the plain row-lazy methods are kept as#[allow(dead_code)]scaffolding for a future routing of L5+ through the Row backend.Structural changes carried over from donor
start = pos + 1: probe the repcode bank atabs_pos + 1before regular search. A near-repcode beats a longer regular offset on encoding cost (4-5 bits vs 9-13 bits for the offset code).pick_lazy_match_shared is intentionally not called from the new path — depth == 0 means no lookahead, emit on first viable hit.
Note: donor's immediate-repcode loop after store (probe offset_2 for back-to-back hits + swap) was implemented and then removed. A panic! probe inside the inner loop ran the full 528-test suite + bench matrix without firing. The donor design is single-rep; ours is three-rep (repcode_candidate_shared evaluates rep1, rep2, rep3 plus the ll0 fallback), so the inner-loop slot is subsumed by the next main-loop iteration's rep probe. Removing it nudged decodecorpus-z000033 from 538142 to 537897 bytes (-245).
Measurements (L4 only, criterion --baseline main, p < 0.05 unless noted)
Ratio: 5 cells equal, 2 cells better, 0 cells worse.
Speed: 4 wins (incl. -24% on z000033 — the cell that motivated #178), 1 noise, 2 regressions.
Tests
8 new L4-targeted tests (plus a private l4_greedy_round_trip helper):
row/mod.rs coverage: 89.12% to 96.69%. Remaining gaps are micro-branches (rep probe at exact end-of-buffer, regular-strictly-longer-than-rep arm, compact_history drain) and the #[allow(dead_code)] scaffold.
Known follow-up
The +9% large-log-stream regression and the +3.4% small-1k-random one reflect a different hotpath. samply profile on L4 large-log-stream shows RowMatchGenerator::skip_matching_with_hint consuming ~25% of self time — it inserts every position of each 128 KiB block into the row hash table (~102M insert_position calls on a 100 MB stream), dominating any per-iteration delta in the parse loop. Independent of this PR; tracked as #180.
Test plan
Related
Summary by CodeRabbit
Performance
Tests