Skip to content

feat(encoder): strategy-aware literal gates (G4 + G5)#182

Merged
polaz merged 14 commits into
mainfrom
feat/#23-block-split-literal-gates
May 19, 2026
Merged

feat(encoder): strategy-aware literal gates (G4 + G5)#182
polaz merged 14 commits into
mainfrom
feat/#23-block-split-literal-gates

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented May 19, 2026

Summary

Donor-parity ZSTD_minLiteralsToCompress (G4) and ZSTD_minGain (G5)
wired into literal-section gates on both emit and estimator paths.
Companion to G3 (whole-block bail-out, #181).

G4 — ZSTD_minLiteralsToCompress (donor zstd_compress_literals.c:114-127)

Strategy-aware floor 8 << shift, 6-byte floor on reused HUF tables:

strategy band shift floor (no HUF reuse)
fast / dfast / greedy / lazy 3 64
btopt 2 32
btultra 1 16
btultra2 0 8

Previously hardcoded < 8 — same lower bound as btultra2 but ignored
strategy.

G5 — ZSTD_minGain (donor zstd_compress_internal.h:677-684)

Strategy-aware expansion margin (srcSize >> minlog) + 2:

strategy band minlog margin
fast..btopt 6 ~1.56% + 2
btultra 7 ~0.78% + 2
btultra2 8 ~0.39% + 2

G5 comparison uses donor formula cLitSize >= srcSize - minGain form
(payload + tree_desc vs literal payload length, no headers on either
side). Previously bare >= raw_section_bytes (zero margin) and earlier
iteration of >= raw_section_bytes - mg (which skewed by header diff)
forced fallback on marginally-winning compressed sections that donor
would keep.

Plumbing + estimator/emit parity

CompressState.strategy_tag threaded through FrameCompressor::new /
new_with_matcher / reset and synced into CompressedBlockScratch's
scratch_state. Reset-time sync happens at both reset sites:
FrameCompressor::compress and StreamingEncoder::ensure_frame_started.
Centralized use_raw_literal_fallback helper shared by emit
(compress_literals) and probe (estimate_literals_section_bytes) —
single decision point, no drift.

Emit's RLE pre-shortcut fires for any non-empty all-identical literal
section
(was gated by len >= 8). RLE and raw share the same lhSize
for a given len, so RLE = lhSize + 1 equals raw = lhSize + len at
len == 1 and is smaller by len - 1 bytes for len >= 2. The
pre-check now correctly covers:

  • len < min_lits all-identical (donor falls through to raw — we win)
  • len >= min_lits all-identical (donor reaches the same RLE block
    via cLitSize == 1 after compress — same wire output)

The pre-check runs BEFORE the min_lits gate, and
estimate_literals_section_bytes mirrors this order so splitter probe
costs match real emit byte-for-byte.

Verification

  • 534/534 lib tests pass (debug profile), including 6 new tests:
    • min_literals_to_compress_returns_per_strategy_floor — per-strategy
      floor + HUF reuse
    • min_gain_returns_per_strategy_margin — per-strategy margin +
      small-source edge
    • use_raw_literal_fallback_uses_payload_vs_srcsize_threshold
      walks the 2-byte gap between the donor payload-vs-srcSize formula
      and the on-wire-vs-on-wire form at literals.len() = 20
    • estimator_literals_section_mirrors_emit_for_short_inputs
      byte-equivalent estimator/emit on boundary lengths around each
      strategy's min_lits, sub-min_lits all-identical (RLE
      pre-check), and the HUF-reuse path with seeded last_huff_table
    • ensure_frame_started_refreshes_stale_strategy_tag_at_reset
      streaming reset-time sync: deliberately corrupts
      state.strategy_tag to a sentinel before the first write,
      asserts ensure_frame_started restores the correct tag
    • set_compression_level_then_compress_refreshes_strategy_tag
      FrameCompressor::set_compression_level from Fastest to
      Level(20) (Fast band → BtUltra2 band) followed by compress()
      must produce the new band's resolved tag
  • clippy release all-targets -D warnings — clean
  • fmt check — clean
  • Full ratio matrix (203 cells via REPORT lines): 0 deltas vs main
  • Speed (decodecorpus-z000033, criterion 10 samples, clean back-to-back):
    • L1 fast / L2 dfast / L3 dfast / L16 btopt — no significant change
      vs main (all p > 0.4 in latest clean run)
    • L20/L21/L22 btultra2 high-entropy-1m — no change vs main (p > 0.05),
      Rust ~10× faster than FFI; fast-skip preserved
    • L8 lazy: −8.4% vs main (pre-existing lazy regression; tracked
      separately as next-up lazy investigation, not introduced by this
      PR)

Notes / Follow-ups

  • L8 lazy −8% — separate lazy track (ratio + speed) follows this PR.
  • Block-level margin in SplitEstimator::estimate_subblock_size /
    emit_single_sequence_block still uses uniform (source_len >> 8) + 2
    (btultra2 value applied across all strategies). The min_gain helper
    is currently wired only into literal-section gates; migration to
    the block-level sites is a separate cleanup.
  • Remaining block-splitting gaps: G6 (HUF_flags_preferRepeat), G1
    (superblock shared entropy), G2 (cost-budgeted carve), G7 (extend
    post-split to lazy band) — tracked in feat: block splitting for improved compression ratio #23.

Related: #23

Summary by CodeRabbit

  • Refactor
    • Optimized internal compression literal-encoding logic for improved consistency and efficiency across compression levels.

Review Change Stack

polaz added 2 commits May 19, 2026 02:50
…y possible

The previous wording claimed the bail-out routes blocks that "would
have been raw-fallbacked by the real emit anyway, by the same
threshold". That is true ONLY for the single-partition path the
bail-out routes to — it is NOT a proof that a recursive split could
never beat the whole-block outcome. In principle both sub-blocks
could compress strictly (no raw-fallback in either half) and produce
`cost(first) + cost(second) < source_len + 3`. The wider donor band
gives at most `min_gain` bytes of theoretical recoverable ratio per
block — small, but non-zero.

Comment split into two paragraphs:
1. The "no wire-output drift" guarantee (only applies to the
   single-partition path the bail-out actually takes).
2. The "what we forgo by skipping the split case" caveat — both
   sub-blocks would need to compress strictly AND beat the whole-
   block cost; band is bounded by `min_gain`, empirically zero
   delta in the bench matrix.

No behavior change. Doc-only.
Donor `ZSTD_minLiteralsToCompress` (`zstd_compress_literals.c:114-127`)
floors literal-compression entry at `8 << shift` bytes with shift
derived from strategy index (fast/dfast/greedy/lazy=3, btopt=2,
btultra=1, btultra2=0) and 6-byte floor on reused HUF tables.
Previously hardcoded `< 8` floor — same lower bound but ignored
strategy: btultra2 missed the 8-byte tail compression opportunity,
btopt missed 32-byte band, etc.

Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`) requires
compressed literals to beat raw by `(srcSize >> minlog) + 2` margin
where minlog=6 for fast..btopt, 7 for btultra, 8 for btultra2.
Previously bare `>= raw_section_bytes` — overcompressed micro-wins
that fell back to raw on real emit, biasing splitter cost predictions.

`CompressState.strategy_tag` plumbed through frame compressor,
streaming encoder, and `CompressedBlockScratch::scratch_state` so emit
and estimator paths use identical gates — keeps splitter probe costs
in sync with actual emit decisions.

- Verified 514/514 lib tests pass, clippy + fmt clean
- Full ratio matrix (203 cells): 0 deltas vs main — gates donor-equivalent
  on current fixtures
- Block-level `min_gain = (srcSize >> 8) + 2` (btultra2 margin) kept
  uniform at `SplitEstimator::estimate_subblock_size` /
  `emit_single_sequence_block` — migration to strategy-aware block-level
  gain is a separate cleanup

Related: #23
Copilot AI review requested due to automatic review settings May 19, 2026 00:10
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6aba9502-5a8b-4e46-afe8-9245e35d19df

📥 Commits

Reviewing files that changed from the base of the PR and between f5014d5 and 3933521.

📒 Files selected for processing (4)
  • zstd/src/encoding/blocks/compressed.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/levels/fastest.rs
  • zstd/src/encoding/streaming_encoder.rs

📝 Walkthrough

Walkthrough

Adds strategy-aware literal-compression gating: CompressState carries a StrategyTag; new helpers compute per-strategy min-literals and min-gain; encoder and estimator use the same branch order (RLE first, then strategy gate) and donor-style payload-vs-srcSize raw-vs-compressed fallback; tests updated.

Changes

Strategy-aware literal compression gating

Layer / File(s) Summary
CompressState schema and encoder initialization
zstd/src/encoding/frame_compressor.rs, zstd/src/encoding/streaming_encoder.rs, zstd/src/encoding/levels/fastest.rs
CompressState gains a strategy_tag field. FrameCompressor::new, FrameCompressor::new_with_matcher, and StreamingEncoder::new_with_matcher initialize strategy_tag from CompressionLevel. FrameCompressor::compress and ensure_frame_started re-sync strategy_tag before each frame. Unit tests and a fixture updated to set strategy_tag.
Strategy-aware literal compression thresholds
zstd/src/encoding/blocks/compressed.rs
Adds min_literals_to_compress, min_gain, and use_raw_literal_fallback helpers that compute strategy-dependent thresholds (including prior-Huffman-table considerations).
Encoder literal compression path
zstd/src/encoding/blocks/compressed.rs
Reorders literal-section branching to run the all-identical RLE fast-path before the strategy gate, replaces fixed-length gating with min_literals_to_compress, threads strategy into compress_literals, and applies donor-style raw-fallback via use_raw_literal_fallback. compress_literals signature updated to accept strategy.
Estimator parity with encoder
zstd/src/encoding/blocks/compressed.rs
Splitter probe threads state.strategy_tag; estimate_literals_section_bytes accepts strategy, mirrors encoder ordering (RLE then min-lits), and uses use_raw_literal_fallback with consistent last-Huffman handling to match emitter decisions.
Comments, imports, and tests
zstd/src/encoding/blocks/compressed.rs, zstd/src/encoding/levels/fastest.rs, zstd/src/encoding/frame_compressor.rs
Updates whole-block bail-out comments; adjusts test imports to include min_gain, min_literals_to_compress, and StrategyTag; adds tests for per-strategy floors, margins, corrected raw-fallback boundary, estimator/emitter parity across strategies and boundary lengths, and regression tests for strategy_tag refresh.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I nibble bytes where thresholds play,
RLE first, then Huffman leads the way,
Strategy sewn into each tiny state,
Raw or tree — the gate decides my fate,
I hop, I pack, and compress the day.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'feat(encoder): strategy-aware literal gates (G4 + G5)' directly and clearly summarizes the main changes: implementation of strategy-aware literal compression gates G4 and G5 in the encoder.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#23-block-split-literal-gates

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/src/encoding/blocks/compressed.rs (1)

530-537: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Estimator RLE gate does not mirror emitter for short literal runs.

estimate_literals_section_bytes allows RLE on all-identical literals without the emitter’s len >= 8 guard. With reused HUF tables (min_lits == 6), 6–7 byte runs can be costed as RLE in estimator but not emitted as RLE, which breaks estimator/emit parity.

Proposed parity fix
 fn estimate_literals_section_bytes(
     literals: &[u8],
     last_huff: &mut Option<huff0_encoder::HuffmanTable>,
     counts: &mut [usize; 256],
     strategy: crate::encoding::strategy::StrategyTag,
 ) -> usize {
     // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches.
     let min_lits = min_literals_to_compress(strategy, last_huff.is_some());
-    if literals.len() < min_lits {
-        *last_huff = None;
-        return uncompressed_literals_header_bytes(literals.len()) + literals.len();
-    }
-    if all_bytes_identical(literals) {
+    if literals.len() >= 8 && all_bytes_identical(literals) {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + 1;
     }
+    if literals.len() < min_lits {
+        *last_huff = None;
+        return uncompressed_literals_header_bytes(literals.len()) + literals.len();
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/blocks/compressed.rs` around lines 530 - 537, The estimator
allows RLE for any all-identical literal slice but the emitter only emits RLE
for runs len >= 8, causing parity errors; in estimate_literals_section_bytes
(the block using min_literals_to_compress, last_huff, all_bytes_identical, and
uncompressed_literals_header_bytes) add the same short-run guard the emitter
uses by requiring literals.len() >= 8 before treating an all-identical run as
RLE, and otherwise fall through to the non-RLE path (clearing last_huff and
returning uncompressed header + length or 1 as currently done).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 530-537: The estimator allows RLE for any all-identical literal
slice but the emitter only emits RLE for runs len >= 8, causing parity errors;
in estimate_literals_section_bytes (the block using min_literals_to_compress,
last_huff, all_bytes_identical, and uncompressed_literals_header_bytes) add the
same short-run guard the emitter uses by requiring literals.len() >= 8 before
treating an all-identical run as RLE, and otherwise fall through to the non-RLE
path (clearing last_huff and returning uncompressed header + length or 1 as
currently done).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 54afb8a4-0802-4612-b2a5-29a28af9eae1

📥 Commits

Reviewing files that changed from the base of the PR and between f5014d5 and 19a3eda.

📒 Files selected for processing (4)
  • zstd/src/encoding/blocks/compressed.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/levels/fastest.rs
  • zstd/src/encoding/streaming_encoder.rs

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 99.56140% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zstd/src/encoding/blocks/compressed.rs 99.44% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Emitter only emits RLE when `len >= 8 && all_identical`
(`encode_block_parts_with_sequence_scratch` RLE branch). After G4
landed, estimator's `min_lits` can be 6 on reused HUF tables, so a
bare `all_identical` check let estimator predict a 1-byte RLE payload
for 6-7 byte all-identical literals that the emitter would actually
write as raw — splitter probe costs drift from real emit output.

Mirror the same `len >= 8` guard explicitly so estimator and emit
stay byte-equivalent on every input.

Reported by CodeRabbit on PR #182.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR threads encoder strategy information into compression state so literal-section emit and estimator paths can apply strategy-aware donor gates for minimum literals and minimum gain.

Changes:

  • Adds strategy_tag to CompressState and initializes/syncs it in frame and streaming encoders.
  • Adds min_literals_to_compress and min_gain helpers in compressed-block encoding.
  • Applies the new gates to literal compression and estimator paths, including post-split estimator state.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
zstd/src/encoding/streaming_encoder.rs Initializes streaming encoder compression state with a strategy tag.
zstd/src/encoding/levels/fastest.rs Updates tests to construct CompressState with the new field.
zstd/src/encoding/frame_compressor.rs Adds and synchronizes CompressState::strategy_tag for frame compression.
zstd/src/encoding/blocks/compressed.rs Implements and applies strategy-aware literal compression gates.
Comments suppressed due to low confidence (4)

zstd/src/encoding/blocks/compressed.rs:535

  • The estimator no longer mirrors the emit path for small RLE literal sections. For strategies whose min_lits is greater than 8, an 8..63 byte all-identical literal section is emitted as RLE by encode_block_parts_with_sequence_scratch, but this branch returns a raw-literals cost before checking all_bytes_identical, so splitter decisions are based on the wrong size/state.
    let min_lits = min_literals_to_compress(strategy, last_huff.is_some());
    if literals.len() < min_lits {
        *last_huff = None;
        return uncompressed_literals_header_bytes(literals.len()) + literals.len();
    }
    if all_bytes_identical(literals) {

zstd/src/encoding/blocks/compressed.rs:596

  • This does not match the donor cLitSize >= srcSize - minGain gate quoted below. total includes the zstd literals-section header, and raw_section_bytes includes the raw header, so the fallback boundary is shifted by the header-size difference instead of applying minGain to the literal source size/HUF payload. That can make the estimator choose raw in cases donor would still use compressed literals.
    let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
    let mg = min_gain(literals.len(), strategy);
    if total >= raw_section_bytes.saturating_sub(mg) {

zstd/src/encoding/blocks/compressed.rs:1762

  • This fallback still compares the full on-wire literals section against the raw section with minGain subtracted, rather than applying the quoted donor check to cLitSize versus srcSize - minGain. Because total_len includes the compressed literals header and raw_section_bytes includes the raw header, the decision boundary is offset by header sizes and can fall back to raw when donor would emit compressed literals.
    let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
    // Donor `compress_literals` (`zstd_compress_literals.c:187-188`):
    // `cLitSize >= srcSize - minGain` → raw fallback. `minGain` is
    // strategy-aware (`min_gain` helper above; ~1.5% for fast..btopt,
    // ~0.78% for btultra, ~0.39% for btultra2). Previously this check
    // used bare `>= raw_section_bytes` (zero margin) which let
    // marginally-compressing blocks emit huf when raw + reused table
    // would compress better in the next block. Saturating subtraction
    // covers tiny inputs where `raw_section_bytes < min_gain`.
    let mg = min_gain(literals.len(), strategy);
    if total_len >= raw_section_bytes.saturating_sub(mg) {

zstd/src/encoding/blocks/compressed.rs:537

  • When a previous HUF table exists, min_lits is 6, so 6- or 7-byte all-identical literal sections reach this branch and are costed as RLE. The real emit path only takes the RLE shortcut for len >= 8, and compress_literals does not have an RLE fallback for these sub-8 inputs, so estimator probes undercount these sections and can choose different splits than the encoder emits.
    let min_lits = min_literals_to_compress(strategy, last_huff.is_some());
    if literals.len() < min_lits {
        *last_huff = None;
        return uncompressed_literals_header_bytes(literals.len()) + literals.len();
    }
    if all_bytes_identical(literals) {
        *last_huff = None;
        return uncompressed_literals_header_bytes(literals.len()) + 1;

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 529-544: The estimator currently checks min_literals_to_compress
first which can return raw for 8-byte identical literals when emit would choose
RLE; in the function (mirror of encode_block_parts_with_sequence_scratch) swap
the order so the all_bytes_identical && literals.len() >= 8 RLE check runs
before the min_lits check: if it matches, set *last_huff = None and return
uncompressed_literals_header_bytes(literals.len()) + 1, otherwise fall through
to the min_literals_to_compress check and the existing raw path; keep references
to last_huff, min_literals_to_compress, all_bytes_identical and
uncompressed_literals_header_bytes so behavior matches the emitter exactly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f9e826e-9867-43a6-addb-5bc2245623c6

📥 Commits

Reviewing files that changed from the base of the PR and between 19a3eda and f7d7426.

📒 Files selected for processing (1)
  • zstd/src/encoding/blocks/compressed.rs

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Emit checks the RLE pre-shortcut (`len >= 8 && all_identical`)
BEFORE the strategy-aware `min_lits` gate, so an all-identical
8-byte literal section emits a 2-byte RLE block under every
strategy — including fast/dfast/greedy/lazy where `min_lits == 64`.
The estimator previously gated `min_lits` first, predicting raw
for those inputs while emit wrote RLE — splitter probe costs
diverged. Swap the order in `estimate_literals_section_bytes` so
both paths agree byte-for-byte; update emit's RLE-pre-check
comment to be accurate about the donor divergence.

Add unit tests covering:

- `min_literals_to_compress` per-strategy values with and without
  HUF reuse (donor table from `zstd_compress_literals.c:114-127`)
- `min_gain` per-strategy margins (`zstd_compress_internal.h:677-684`)
  including small-source `(src >> minlog) + 2` edge
- Estimator/emit parity for boundary literal lengths around each
  strategy's `min_lits` and across the RLE `>= 8` cliff

Verified 531/531 lib tests pass; ratio matrix (203 cells) shows
0 delta vs main.
Copilot AI review requested due to automatic review settings May 19, 2026 00:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

zstd/src/encoding/blocks/compressed.rs:1773

  • This compares the compressed literals size to raw_section_bytes - min_gain, but the donor check documented above is against srcSize - minGain (the literal payload length, without the raw header). Adding the raw header makes the gate 1–3 bytes too permissive, so marginal Huffman sections can be emitted where the strategy-aware min-gain rule should fall back to raw and clear the table state.
    if total_len >= raw_section_bytes.saturating_sub(mg) {

zstd/src/encoding/blocks/compressed.rs:543

  • This RLE mirror keeps the old len >= 8 cutoff, but the new strategy gate can attempt Huffman at length 6 when last_huff is present. If the emitter is corrected to RLE 6–7 byte all-identical sections with a reusable table, the estimator needs the same threshold or split probes will overestimate those partitions.
    if literals.len() >= 8 && all_bytes_identical(literals) {
        *last_huff = None;
        return uncompressed_literals_header_bytes(literals.len()) + 1;

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Two related parity fixes to the literal-section pipeline:

1. min_gain comparison now matches donor formula. Previously compared
   `total_len >= raw_section_bytes - mg` (compressed lhSize on LHS,
   raw lhSize on RHS), which skewed the threshold by
   `compressed_header - raw_header` bytes and forced raw fallback on
   marginally-winning compressed sections that donor would keep. Switch
   to donor's `cLitSize >= srcSize - minGain` form
   (`zstd_compress_literals.c:187-188`): compare
   `(total_len - compressed_header_len) >= literals.len() - mg`. Applied
   in both estimator (`estimate_literals_section_bytes`) and emit
   (`compress_literals`) so probe costs stay in lockstep.

2. RLE pre-check fires for any non-empty all-identical literal section,
   not just `len >= 8`. The old `len >= 8` floor caused 6/7-byte
   all-identical sections with a reusable HUF table (`min_lits == 6`)
   to skip RLE and emit a larger HUF block — donor reaches the same
   RLE block via `cLitSize == 1` after the `min_lits` gate. Dropping
   the floor matches donor on `>= min_lits` inputs and produces
   strictly smaller output than donor on the `< min_lits` all-identical
   edges (where donor falls through to raw). Applied symmetrically in
   estimator and emit.

531/531 lib tests pass; ratio matrix (203 cells via REPORT lines) shows
0 delta vs main — both fixes are ratio-safe on current fixtures.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/encoding/blocks/compressed.rs
Comment thread zstd/src/encoding/blocks/compressed.rs
The parity test previously only seeded `last_huff_table: None`, so the
floor-6 HUF reuse path (`ZSTD_minLiteralsToCompress` with valid prior
HUF table) and 6/7-byte all-identical sections that newly route
through the unconditional RLE pre-check were not exercised in
estimator/emit parity.

Add cases:
- Sub-`min_lits` all-identical (`len in 1..=7` at fast strategy):
  exercises the new "any non-empty all-identical → RLE" path
- Seeded HUF table with `Lazy` strategy at 6/7-byte all-identical and
  16-byte non-identical: exercises the `min_lits == 6` floor and the
  HUF-reuse decision under the new RLE pre-check

Also update the in-test docstring to drop the stale "RLE `>= 8` cliff"
phrasing — the cliff was removed when RLE became unconditional for
non-empty all-identical inputs.

531/531 lib tests pass.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
polaz added 2 commits May 19, 2026 04:24
…igin

`donor_clit` told the reader where the formula came from, not what the
value represents. Rename to `huf_section_size` (tree description plus
HUF payload, excluding the literals lhSize) — describes the bytes
themselves so the call site is readable without recalling reference
naming conventions. Same rename in both the emit (`compress_literals`)
and estimator (`estimate_literals_section_bytes`) sides.

Test names follow the same principle:
- `min_literals_to_compress_matches_donor_table`
  → `min_literals_to_compress_returns_per_strategy_floor`
- `min_gain_matches_donor_margin`
  → `min_gain_returns_per_strategy_margin`

Behavior unchanged; 531/531 lib tests pass, clippy/fmt clean, ratio
matrix 0 deltas vs main.
Extract `use_raw_literal_fallback(huf_section_size, literals_len,
strategy)` helper so `compress_literals` (emit) and
`estimate_literals_section_bytes` (probe) share one decision point —
neither side can drift back to an on-wire-vs-on-wire comparison that
inflates the threshold by the lhSize delta and rejects marginally
compressed sections.

Add `use_raw_literal_fallback_uses_payload_vs_srcsize_threshold` test
that walks the 2-byte gap (`compressed_lhsize - raw_lhsize`) at
`literals.len() = 20` to lock in the payload-vs-srcSize semantics.

Tighten the RLE shortcut docstring: RLE equals raw at `len == 1`
(both 2 bytes), strictly smaller only at `len >= 2`.

532/532 lib tests, clippy/fmt clean.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs
`min_literals_to_compress` Lazy mapping: drop the "most-aggressive-of-band"
phrasing — donor's shift table pins strategies 1..6 to `shift = 3`
uniformly, so Lazy → 64-byte floor regardless of which donor index in
the 4..6 band we'd nominally pick. No aggressiveness gradient exists
within this band.

`min_gain` doc: split the donor formula (gates both block-level and
literal-section in donor) from the current crate usage (helper is
wired into literal-section gates only; block-level emit and probe
still use uniform `(source_len >> 8) + 2`). Previous wording conflated
the two and read as if the helper were already plumbed everywhere.
@polaz polaz requested a review from Copilot May 19, 2026 02:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/encoding/frame_compressor.rs
Comment thread zstd/src/encoding/frame_compressor.rs
`StreamingEncoder::ensure_frame_started` resets the matcher and clears
last_huff / fse_tables for the new frame but did not refresh
`state.strategy_tag` from the active compression level. That left the
literal-compression gates (`min_literals_to_compress`, `min_gain`)
reading the construction-time strategy on every subsequent frame,
which would silently leak the wrong gate floor if the encoder were
ever extended with a between-frame level change.

Mirror `FrameCompressor::compress`'s reset-time sync at the streaming
reset site so both entry points are byte-equivalent at the gate
level. Tighten the `CompressState.strategy_tag` invariant docstring
to enumerate the two owned reset sites explicitly. Add a regression
test that exercises the streaming reset across four levels and
asserts the synced tag matches `StrategyTag::for_compression_level`.

533/533 lib tests pass; clippy and fmt clean.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/encoding/frame_compressor.rs
…y_tag

The streaming reset-time sync test only exercises a single fixed level
per encoder, which would pass even with the reset-time sync deleted
because the construction-time tag already matches. Add a regression
test that drives the level-change path explicitly: construct
`FrameCompressor` at `CompressionLevel::Fastest`, call
`set_compression_level(Best)` to cross from the Fast band into the
BtUltra2 band, run a full `compress()` cycle, then verify
`state.strategy_tag` reflects the new level's
`StrategyTag::for_compression_level` resolution.

Includes an `assert_ne!` fixture invariant that the two chosen levels
resolve to different `StrategyTag` variants, so a future renumbering
that collapses the bands would be loud about it rather than turning
the test into a no-op.

534/534 lib tests pass; clippy and fmt clean.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

zstd/src/encoding/blocks/compressed.rs:566

  • This repeats the small-section byte formula even though the estimator branch applies to all literal lengths; for lengths 32+ the headers are not 1 byte, so the comment should qualify the comparison while keeping the correct "RLE is smaller" conclusion.
    // (any non-empty section) BEFORE the `min_lits` gate — on
    // all-identical inputs RLE is equal to raw at `len == 1` (both
    // 2 bytes) and strictly smaller for `len >= 2`, so it is never
    // worse than raw and is selected regardless of strategy. Estimator

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
`min_literals_to_compress` doc: replace the "≥30 bytes" HUF
tree-description lower bound with neutral phrasing — small alphabets
with low max symbols can serialize substantially smaller descriptions.
The exact byte cost depends on alphabet size and max symbol, not a
fixed minimum; the surrounding sentence already conveys that
overhead-dominates-payload is what drives the per-strategy floor.

RLE-shortcut docstrings (emit and estimator branches): replace
"2 bytes vs `1 + len`" with the header-helper-aware form. Both RLE
and raw use the same `uncompressed_literals_header_bytes(len)`
(1/2/3/5 bytes by length tier), so the size relationship is
RLE = lhSize + 1, raw = lhSize + len → equal at `len == 1`, RLE
smaller by `len - 1` for `len >= 2`. The previous wording was only
correct for the `len < 32` tier where lhSize = 1.

No code changes; clippy and fmt clean.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (3)

zstd/src/encoding/blocks/compressed.rs:414

  • The literals header size tiers listed here include a 5-byte case, but write_uncompressed_literals_header/uncompressed_literals_header_bytes in this file only encode raw/RLE literal headers as 1, 2, or 3 bytes. Keeping the extra tier in the explanation makes the RLE-vs-raw size argument harder to verify.
    // reuse). RLE and raw share the same lhSize for a given `len`
    // (both use `uncompressed_literals_header_bytes`), so RLE = lhSize + 1
    // and raw = lhSize + len. That makes RLE equal to raw on `len == 1`
    // and smaller by exactly `len - 1` bytes for `len >= 2`, regardless of
    // the lhSize tier (1 / 2 / 3 / 5 bytes). Our pre-check fires for ANY

zstd/src/encoding/blocks/compressed.rs:572

  • This duplicates the same inaccurate header-size tier list: raw/RLE literal headers in this file are 1, 2, or 3 bytes, not 1/2/3/5. Please align the estimator comment with uncompressed_literals_header_bytes so future parity work does not reason from the wrong size table.
    // (any non-empty section) BEFORE the `min_lits` gate — RLE and raw
    // share `uncompressed_literals_header_bytes(len)` (1/2/3/5 bytes by
    // length tier), so on all-identical inputs RLE = lhSize + 1 equals
    // raw = lhSize + len at `len == 1` and is smaller by `len - 1` for
    // `len >= 2`. RLE is never worse than raw, so it is selected

zstd/src/encoding/blocks/compressed.rs:2003

  • This repeats the inaccurate historical claim for the 6/7-byte HUF-reuse cases. Before this PR, the hardcoded len >= 8 gate meant these cases did not enter HUF compression; they were emitted raw, so the comment should describe the new lowered-floor scenario rather than prior behavior.
            // HUF reuse path: floor drops to 6, so 6/7-byte sections
            // previously hit the HUF compress path. With the RLE
            // pre-check now non-empty-only, all-identical 6/7-byte
            // sections must route through RLE and stay byte-equivalent

Comment thread zstd/src/encoding/frame_compressor.rs Outdated
Comment thread zstd/src/encoding/streaming_encoder.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
…out the sync

Three factual cleanups so the literal-gate tests provably guard the
reset-time `strategy_tag` sync and the docstrings around the RLE
pre-check stop misrepresenting historical behavior.

- `set_compression_level_then_compress_refreshes_strategy_tag`: switch
  from `CompressionLevel::Best` to `CompressionLevel::Level(20)` so the
  post-switch tag genuinely crosses into the BtUltra2 band. `Best`
  resolves to `Lazy` today, which keeps `min_literals_to_compress` in
  the same `shift=3 → 64-byte` band as `Fast` and weakens the signal.
  Add an explicit `assert_eq!(expected, StrategyTag::BtUltra2)` fixture
  invariant so future renumbering breaks loudly.
- `ensure_frame_started_refreshes_stale_strategy_tag_at_reset`
  (renamed): the previous version constructed the encoder at the same
  level it asserted, so `StreamingEncoder::new`'s own init satisfied
  the assertion even if the reset-time sync were deleted. Now the test
  deliberately corrupts `state.strategy_tag` to a sentinel
  (`BtUltra2`) before the first write and asserts the reset path
  restores the legitimate tag — a missing sync leaves the sentinel
  visible and trips the assertion.
- RLE pre-check docstring on the emit side: drop the "6/7 bytes with
  HUF reuse" example. With HUF reuse `min_lits == 6`, so 6/7-byte
  sections are NOT below the gate; donor would HUF-encode them and
  reach RLE via `cLitSize == 1`. The justification for the pre-check
  is the genuine under-gate range (e.g. 8..63 bytes at
  fast/dfast/greedy/lazy without HUF reuse), not the HUF-reuse band.
- Estimator/emit parity test prose: the historical behavior for
  6/7-byte all-identical sections under the prior hardcoded `len >= 8`
  gate was RAW (not HUF). Correct both the per-case rationale and the
  block-level docstring; the test cases themselves are unchanged.

534/534 lib tests pass; clippy and fmt clean.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented May 19, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

✅ Actions performed

Full review triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants