From 7d2050a4a08e98c6aa014b123ed97ace93bcbca8 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 02:50:34 +0300
Subject: [PATCH 01/14] =?UTF-8?q?docs(encoder):=20honest=20G3=20bail-out?=
 =?UTF-8?q?=20wording=20=E2=80=94=20split-win=20is=20theoretically=20possi?=
 =?UTF-8?q?ble?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous wording claimed the bail-out routes blocks that "would
have been raw-fallbacked by the real emit anyway, by the same
threshold". That is true ONLY for the single-partition path the
bail-out routes to — it is NOT a proof that a recursive split could
never beat the whole-block outcome. In principle both sub-blocks
could compress strictly (no raw-fallback in either half) and produce
`cost(first) + cost(second) < source_len + 3`. The wider donor band
gives at most `min_gain` bytes of theoretical recoverable ratio per
block — small, but non-zero.

Comment split into two paragraphs:
1. The "no wire-output drift" guarantee (only applies to the
   single-partition path the bail-out actually takes).
2. The "what we forgo by skipping the split case" caveat — both
   sub-blocks would need to compress strictly AND beat the whole-
   block cost; band is bounded by `min_gain`, empirically zero
   delta in the bench matrix.

No behavior change. Doc-only.
---
 zstd/src/encoding/blocks/compressed.rs | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 543ced6d..1f742a86 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -968,14 +968,20 @@ impl SplitEstimator<'_> {
         //   single-partition path → `emit_single_sequence_block`,
         //   which applies the SAME `min_gain` expansion fallback (its
         //   `buffers.compressed.len() >= source_len - min_gain` check
-        //   right before deciding raw-fallback). So whatever block we
-        //   bail on would have been raw-fallbacked by the real emit
-        //   anyway, by the same threshold.
-        // - For a missed split-win to matter, both sub-blocks would
-        //   need to compress strictly (no raw-fallback in either
-        //   half), AND `cost(first) + cost(second) < source_len + 3`.
-        //   The wider donor band gives at most `min_gain` bytes of
-        //   theoretical recoverable ratio per block.
+        //   right before deciding raw-fallback). So for the
+        //   single-partition path specifically, any block we bail on
+        //   here would also raw-fallback there by the same threshold —
+        //   no wire-output drift from this bail-out vs the "let the
+        //   real emit decide" alternative.
+        // - Returning here does skip the split case, so this is NOT a
+        //   proof that a recursive split could never do better: in
+        //   principle, both sub-blocks could compress strictly (no
+        //   raw-fallback in either half) and beat the whole-block
+        //   outcome. For such a missed split-win to matter, both
+        //   sub-blocks would need to compress strictly AND
+        //   `cost(first) + cost(second) < source_len + 3`. The wider
+        //   donor band gives at most `min_gain` bytes of theoretical
+        //   recoverable ratio per block.
         // - Empirically validated: `compare_ffi --list` REPORT lines
         //   show **zero rust_bytes delta** vs main on every
         //   (scenario, level) cell across the full bench matrix.

From 19a3eda6b31ddea40fa230ad5ba4e225df716d0b Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 03:09:52 +0300
Subject: [PATCH 02/14] feat(encoder): strategy-aware literal gates (G4 + G5)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Donor `ZSTD_minLiteralsToCompress` (`zstd_compress_literals.c:114-127`)
floors literal-compression entry at `8 << shift` bytes with shift
derived from strategy index (fast/dfast/greedy/lazy=3, btopt=2,
btultra=1, btultra2=0) and 6-byte floor on reused HUF tables.
Previously hardcoded `< 8` floor — same lower bound but ignored
strategy: btultra2 missed the 8-byte tail compression opportunity,
btopt missed 32-byte band, etc.

Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`) requires
compressed literals to beat raw by `(srcSize >> minlog) + 2` margin
where minlog=6 for fast..btopt, 7 for btultra, 8 for btultra2.
Previously bare `>= raw_section_bytes` — overcompressed micro-wins
that fell back to raw on real emit, biasing splitter cost predictions.

`CompressState.strategy_tag` plumbed through frame compressor,
streaming encoder, and `CompressedBlockScratch::scratch_state` so emit
and estimator paths use identical gates — keeps splitter probe costs
in sync with actual emit decisions.

- Verified 514/514 lib tests pass, clippy + fmt clean
- Full ratio matrix (203 cells): 0 deltas vs main — gates donor-equivalent
  on current fixtures
- Block-level `min_gain = (srcSize >> 8) + 2` (btultra2 margin) kept
  uniform at `SplitEstimator::estimate_subblock_size` /
  `emit_single_sequence_block` — migration to strategy-aware block-level
  gain is a separate cleanup

Related: #23
---
 zstd/src/encoding/blocks/compressed.rs | 111 +++++++++++++++++++++++--
 zstd/src/encoding/frame_compressor.rs  |  23 +++++
 zstd/src/encoding/levels/fastest.rs    |   2 +
 zstd/src/encoding/streaming_encoder.rs |   3 +
 4 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 1f742a86..68d91851 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -13,6 +13,72 @@ use crate::{
 const MIN_SEQUENCES_BLOCK_SPLITTING: usize = 300;
 const MAX_NB_BLOCK_SPLITS: usize = 196;
 
+/// Donor `ZSTD_minLiteralsToCompress` (`zstd_compress_literals.c:114-127`):
+/// strategy-aware floor below which `compress_literals` does not even
+/// attempt huf compression and falls back to raw.
+///
+/// Formula: `shift = MIN(9 - donor_strategy, 3); mintc = (huf_repeat ==
+/// valid) ? 6 : (8 << shift)`. With huf reuse available, the per-block huf
+/// header overhead is gone, so the cheap floor is 6 bytes. Without it, the
+/// huf tree-description overhead (≥30 bytes) dominates on small payloads
+/// and we'd produce a worse-than-raw block — donor's shift table gives
+/// strategy 1..6 → 64 bytes, strategy 7 (btopt) → 32, strategy 8 (btultra)
+/// → 16, strategy 9 (btultra2) → 8.
+///
+/// Our `StrategyTag` enum has seven variants (no separate lazy2/btlazy2 —
+/// the `Lazy` variant covers donor strategies 4..6). Map our variants to
+/// donor strategy numbers using the most-aggressive-of-band mapping
+/// (`Lazy → strat=5 → shift=3`) so the threshold matches what donor would
+/// pick for the levels we route to that variant.
+#[inline]
+fn min_literals_to_compress(
+    strategy: crate::encoding::strategy::StrategyTag,
+    has_huf_table: bool,
+) -> usize {
+    use crate::encoding::strategy::StrategyTag;
+    if has_huf_table {
+        return 6;
+    }
+    let shift: u32 = match strategy {
+        StrategyTag::Fast | StrategyTag::Dfast | StrategyTag::Greedy | StrategyTag::Lazy => 3,
+        StrategyTag::BtOpt => 2,
+        StrategyTag::BtUltra => 1,
+        StrategyTag::BtUltra2 => 0,
+    };
+    8usize << shift
+}
+
+/// Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`):
+/// strategy-aware minimum-compression margin used by both block-level
+/// emit ("compressed block must beat raw + minGain bytes") and the
+/// literal-section expansion check.
+///
+/// Formula: `minlog = (strat >= btultra) ? strat - 1 : 6; (src_size >>
+/// minlog) + 2`. So:
+/// - fast..btopt (strat 1..7): minlog=6 → ~1.5% margin + 2 bytes
+/// - btultra (strat 8): minlog=7 → ~0.78% margin + 2 bytes
+/// - btultra2 (strat 9): minlog=8 → ~0.39% margin + 2 bytes
+///
+/// Used in `compress_literals` and `estimate_literals_section_bytes` to
+/// match donor's `cLitSize >= srcSize - minGain` raw-fallback gate.
+/// Existing block-level emit / probe paths (`emit_single_sequence_block`,
+/// `SplitEstimator::estimate_subblock_size`) already use `min_log = 8`
+/// directly (`source_len >> 8 + 2`) — that is the btultra2 margin, the
+/// most permissive of the band, applied uniformly. Migrating those sites
+/// to this strategy-aware helper is a separate cleanup; this commit only
+/// wires the helper into the literal-section gates that previously had
+/// no `min_gain` margin at all (bare `>= raw_section_bytes`).
+#[inline]
+fn min_gain(src_size: usize, strategy: crate::encoding::strategy::StrategyTag) -> usize {
+    use crate::encoding::strategy::StrategyTag;
+    let minlog: u32 = match strategy {
+        StrategyTag::BtUltra => 7,
+        StrategyTag::BtUltra2 => 8,
+        _ => 6,
+    };
+    (src_size >> minlog) + 2
+}
+
 /// Donor `kInverseProbabilityLog256`: floor(-log2(x / 256) * 256).
 const INVERSE_PROBABILITY_LOG_256: [usize; 256] = [
     0, 2048, 1792, 1642, 1536, 1453, 1386, 1329, 1280, 1236, 1197, 1162, 1130, 1100, 1073, 1047,
@@ -203,6 +269,7 @@ pub(crate) fn compress_block_with_post_split<M: Matcher>(
             fse_tables: clone_fse_tables(&state.fse_tables),
             block_scratch: super::CompressedBlockScratch::new(),
             offset_hist: state.offset_hist,
+            strategy_tag: state.strategy_tag,
         },
         workspace,
     };
@@ -302,11 +369,30 @@ fn encode_block_parts_with_sequence_scratch<M: Matcher>(
     // literals section
 
     let mut writer = BitWriter::from(output);
+    // Donor `compress_literals` (`zstd_compress_literals.c:153-160`):
+    // `srcSize < ZSTD_minLiteralsToCompress(strategy, prevHuf->repeatMode)`
+    // → `ZSTD_noCompressLiterals` (raw). The threshold is strategy-aware
+    // (see `min_literals_to_compress`). With huf reuse available the
+    // floor drops to 6 since there is no per-block huf-header overhead.
+    let strategy = state.strategy_tag;
+    let has_huf_table = state.last_huff_table.is_some();
+    let min_lits = min_literals_to_compress(strategy, has_huf_table);
+    // RLE keeps a separate `>= 8 + all_bytes_identical` gate: donor's
+    // `cLitSize == 1` RLE-detection path (`zstd_compress_literals.c:192-201`)
+    // fires after the compress attempt, but our pre-check is cheaper and
+    // produces the same RLE block for `len >= 8 && all_identical`. RLE
+    // for sub-8-byte all-identical inputs goes through the raw path in
+    // both donor (`srcSize < 8` arm) and us — same wire output.
     if literals_vec.len() >= 8 && all_bytes_identical(literals_vec) {
         rle_literals(literals_vec, &mut writer);
         state.last_huff_table = None;
-    } else if literals_vec.len() >= 8 {
-        match compress_literals(literals_vec, state.last_huff_table.as_ref(), &mut writer) {
+    } else if literals_vec.len() >= min_lits {
+        match compress_literals(
+            literals_vec,
+            state.last_huff_table.as_ref(),
+            &mut writer,
+            strategy,
+        ) {
             HuffmanTableUpdate::New(table) => {
                 state.last_huff_table.replace(table);
             }
@@ -416,6 +502,7 @@ fn estimate_block_parts_size<M: Matcher>(
         literals_vec,
         &mut state.last_huff_table,
         &mut workspace.lit_counts,
+        state.strategy_tag,
     );
 
     let seq_bytes = if workspace.sequences.is_empty() {
@@ -437,9 +524,11 @@ fn estimate_literals_section_bytes(
     literals: &[u8],
     last_huff: &mut Option<huff0_encoder::HuffmanTable>,
     counts: &mut [usize; 256],
+    strategy: crate::encoding::strategy::StrategyTag,
 ) -> usize {
     // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches.
-    if literals.len() < 8 {
+    let min_lits = min_literals_to_compress(strategy, last_huff.is_some());
+    if literals.len() < min_lits {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + literals.len();
     }
@@ -503,7 +592,8 @@ fn estimate_literals_section_bytes(
     // small payloads where the raw header (1 byte for size < 32) is smaller
     // than the compressed header (3 bytes for size < 1 KB).
     let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
-    if total >= raw_section_bytes {
+    let mg = min_gain(literals.len(), strategy);
+    if total >= raw_section_bytes.saturating_sub(mg) {
         *last_huff = None;
         return raw_section_bytes;
     }
@@ -1588,6 +1678,7 @@ fn compress_literals(
     literals: &[u8],
     last_table: Option<&huff0_encoder::HuffmanTable>,
     writer: &mut BitWriter<&mut Vec<u8>>,
+    strategy: crate::encoding::strategy::StrategyTag,
 ) -> HuffmanTableUpdate {
     let reset_idx = writer.index();
 
@@ -1659,7 +1750,16 @@ fn compress_literals(
     // `literals.len()` would bias toward raw on small payloads where the
     // raw header is smaller than the compressed header.
     let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
-    if total_len >= raw_section_bytes {
+    // Donor `compress_literals` (`zstd_compress_literals.c:187-188`):
+    // `cLitSize >= srcSize - minGain` → raw fallback. `minGain` is
+    // strategy-aware (`min_gain` helper above; ~1.5% for fast..btopt,
+    // ~0.78% for btultra, ~0.39% for btultra2). Previously this check
+    // used bare `>= raw_section_bytes` (zero margin) which let
+    // marginally-compressing blocks emit huf when raw + reused table
+    // would compress better in the next block. Saturating subtraction
+    // covers tiny inputs where `raw_section_bytes < min_gain`.
+    let mg = min_gain(literals.len(), strategy);
+    if total_len >= raw_section_bytes.saturating_sub(mg) {
         writer.reset_to(reset_idx);
         raw_literals(literals, writer);
         HuffmanTableUpdate::Cleared
@@ -1733,6 +1833,7 @@ mod tests {
             fse_tables: FseTables::new(),
             block_scratch: super::CompressedBlockScratch::new(),
             offset_hist: [10, 20, 30],
+            strategy_tag: crate::encoding::strategy::StrategyTag::Fast,
         };
         let source = [0xA5; 8];
         let sequences = [RawSequence {
diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs
index e58bb68d..bc67a14b 100644
--- a/zstd/src/encoding/frame_compressor.rs
+++ b/zstd/src/encoding/frame_compressor.rs
@@ -390,6 +390,15 @@ pub(crate) struct CompressState<M: Matcher> {
     /// Offset history for repeat offset encoding: [rep0, rep1, rep2].
     /// Initialized to [1, 4, 8] per RFC 8878 §3.1.2.5.
     pub(crate) offset_hist: [u32; 3],
+    /// Strategy tag resolved from the current `CompressionLevel` at every
+    /// `matcher.reset()` call. Used by the literal-compression gates
+    /// (`min_literals_to_compress`, `min_gain`) in
+    /// `encoding::blocks::compressed` to mirror donor's strategy-aware
+    /// thresholds (`zstd_compress_literals.c:114-127, 187-188`). Defaults
+    /// to `Fast` so a `CompressState` constructed without an explicit
+    /// reset still has a valid value (matches the level-1 default donor
+    /// would emit for `disableLiteralCompression == 0`).
+    pub(crate) strategy_tag: crate::encoding::strategy::StrategyTag,
 }
 
 impl<R: Read, W: Write> FrameCompressor<R, W, MatchGeneratorDriver> {
@@ -408,6 +417,9 @@ impl<R: Read, W: Write> FrameCompressor<R, W, MatchGeneratorDriver> {
                 fse_tables: FseTables::new(),
                 block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(),
                 offset_hist: [1, 4, 8],
+                strategy_tag: crate::encoding::strategy::StrategyTag::for_compression_level(
+                    compression_level,
+                ),
             },
             #[cfg(feature = "hash")]
             hasher: XxHash64::with_seed(0),
@@ -430,6 +442,9 @@ impl<R: Read, W: Write, M: Matcher> FrameCompressor<R, W, M> {
                 fse_tables: FseTables::new(),
                 block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(),
                 offset_hist: [1, 4, 8],
+                strategy_tag: crate::encoding::strategy::StrategyTag::for_compression_level(
+                    compression_level,
+                ),
             },
             compression_level,
             #[cfg(feature = "hash")]
@@ -489,6 +504,14 @@ impl<R: Read, W: Write, M: Matcher> FrameCompressor<R, W, M> {
         // Clearing buffers to allow re-using of the compressor
         self.state.matcher.reset(self.compression_level);
         self.state.offset_hist = [1, 4, 8];
+        // Sync `state.strategy_tag` to the level resolved at this reset so
+        // the literal-compression gates (`min_literals_to_compress` /
+        // `min_gain` in `encoding::blocks::compressed`) see the correct
+        // strategy for the next frame. Frame-by-frame level changes go
+        // through this same `compress()` entry point, so re-syncing here
+        // covers level switches without touching the matcher dispatch.
+        self.state.strategy_tag =
+            crate::encoding::strategy::StrategyTag::for_compression_level(self.compression_level);
         let cached_entropy = if use_dictionary_state {
             self.dictionary_entropy_cache.as_ref()
         } else {
diff --git a/zstd/src/encoding/levels/fastest.rs b/zstd/src/encoding/levels/fastest.rs
index 5200f359..5ddf09c3 100644
--- a/zstd/src/encoding/levels/fastest.rs
+++ b/zstd/src/encoding/levels/fastest.rs
@@ -186,6 +186,7 @@ mod tests {
             fse_tables: FseTables::new(),
             block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(),
             offset_hist: [1, 4, 8],
+            strategy_tag: crate::encoding::strategy::StrategyTag::Fast,
         };
         let mut output = Vec::new();
 
@@ -213,6 +214,7 @@ mod tests {
             fse_tables: FseTables::new(),
             block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(),
             offset_hist: [1, 4, 8],
+            strategy_tag: crate::encoding::strategy::StrategyTag::Fast,
         };
         let mut output = Vec::new();
 
diff --git a/zstd/src/encoding/streaming_encoder.rs b/zstd/src/encoding/streaming_encoder.rs
index 5ecee5db..54687234 100644
--- a/zstd/src/encoding/streaming_encoder.rs
+++ b/zstd/src/encoding/streaming_encoder.rs
@@ -66,6 +66,9 @@ impl<W: Write, M: Matcher> StreamingEncoder<W, M> {
                 fse_tables: FseTables::new(),
                 block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(),
                 offset_hist: [1, 4, 8],
+                strategy_tag: crate::encoding::strategy::StrategyTag::for_compression_level(
+                    compression_level,
+                ),
             },
             pending: Vec::new(),
             encoded_scratch: Vec::new(),

From f7d74262ec3eb75c6e69c16de9c13264413cd860 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 03:17:17 +0300
Subject: [PATCH 03/14] fix(encoder): estimator RLE gate mirrors emit's >= 8
 guard
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Emitter only emits RLE when `len >= 8 && all_identical`
(`encode_block_parts_with_sequence_scratch` RLE branch). After G4
landed, estimator's `min_lits` can be 6 on reused HUF tables, so a
bare `all_identical` check let estimator predict a 1-byte RLE payload
for 6-7 byte all-identical literals that the emitter would actually
write as raw — splitter probe costs drift from real emit output.

Mirror the same `len >= 8` guard explicitly so estimator and emit
stay byte-equivalent on every input.

Reported by CodeRabbit on PR #182.
---
 zstd/src/encoding/blocks/compressed.rs | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 68d91851..2fa3428d 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -532,7 +532,13 @@ fn estimate_literals_section_bytes(
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + literals.len();
     }
-    if all_bytes_identical(literals) {
+    // Mirror emit's RLE gate exactly: emitter only emits RLE when
+    // `len >= 8 && all_identical` (see comment at the RLE branch above).
+    // With reused HUF tables `min_lits == 6`, so a bare `all_identical`
+    // check here would have estimator predict a 1-byte RLE payload for
+    // 6-7 byte all-identical literals that the emitter would actually
+    // write as raw — splitter probe costs drift from real emit output.
+    if literals.len() >= 8 && all_bytes_identical(literals) {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + 1;
     }

From 42034fc7d00e256ec8b3d34bee1ff2842e735ff8 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 03:27:00 +0300
Subject: [PATCH 04/14] fix(encoder): mirror emit order in estimator + cover
 gates with tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Emit checks the RLE pre-shortcut (`len >= 8 && all_identical`)
BEFORE the strategy-aware `min_lits` gate, so an all-identical
8-byte literal section emits a 2-byte RLE block under every
strategy — including fast/dfast/greedy/lazy where `min_lits == 64`.
The estimator previously gated `min_lits` first, predicting raw
for those inputs while emit wrote RLE — splitter probe costs
diverged. Swap the order in `estimate_literals_section_bytes` so
both paths agree byte-for-byte; update emit's RLE-pre-check
comment to be accurate about the donor divergence.

Add unit tests covering:

- `min_literals_to_compress` per-strategy values with and without
  HUF reuse (donor table from `zstd_compress_literals.c:114-127`)
- `min_gain` per-strategy margins (`zstd_compress_internal.h:677-684`)
  including small-source `(src >> minlog) + 2` edge
- Estimator/emit parity for boundary literal lengths around each
  strategy's `min_lits` and across the RLE `>= 8` cliff

Verified 531/531 lib tests pass; ratio matrix (203 cells) shows
0 delta vs main.
---
 zstd/src/encoding/blocks/compressed.rs | 150 ++++++++++++++++++++++---
 1 file changed, 132 insertions(+), 18 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 2fa3428d..c755d70a 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -377,12 +377,16 @@ fn encode_block_parts_with_sequence_scratch<M: Matcher>(
     let strategy = state.strategy_tag;
     let has_huf_table = state.last_huff_table.is_some();
     let min_lits = min_literals_to_compress(strategy, has_huf_table);
-    // RLE keeps a separate `>= 8 + all_bytes_identical` gate: donor's
-    // `cLitSize == 1` RLE-detection path (`zstd_compress_literals.c:192-201`)
-    // fires after the compress attempt, but our pre-check is cheaper and
-    // produces the same RLE block for `len >= 8 && all_identical`. RLE
-    // for sub-8-byte all-identical inputs goes through the raw path in
-    // both donor (`srcSize < 8` arm) and us — same wire output.
+    // RLE pre-check: donor `compress_literals` reaches RLE through
+    // `cLitSize == 1` after a full compress attempt
+    // (`zstd_compress_literals.c:192-201`), but only AFTER passing the
+    // `min_lits` gate — so donor at fast/dfast/greedy/lazy emits raw
+    // for 8..63 all-identical inputs. Our pre-check fires earlier and
+    // emits the smaller RLE block instead (2 bytes vs `1+len` raw): a
+    // strict win in compressed size on all-identical sections, no
+    // donor binary parity but better ratio. Note the order — RLE
+    // pre-check runs BEFORE `min_lits`. `estimate_literals_section_bytes`
+    // mirrors this order so probe costs match emit byte-for-byte.
     if literals_vec.len() >= 8 && all_bytes_identical(literals_vec) {
         rle_literals(literals_vec, &mut writer);
         state.last_huff_table = None;
@@ -526,22 +530,23 @@ fn estimate_literals_section_bytes(
     counts: &mut [usize; 256],
     strategy: crate::encoding::strategy::StrategyTag,
 ) -> usize {
-    // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches.
+    // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches
+    // **in the same order**. The emitter checks the RLE pre-shortcut
+    // (`len >= 8 && all_identical`) BEFORE the `min_lits` gate, so an
+    // all-identical 8-byte literal section emits a 2-byte RLE block under
+    // any strategy — including fast/dfast/greedy/lazy where `min_lits == 64`.
+    // If estimator gated `min_lits` first, it would predict a raw block
+    // for those inputs while the emitter writes RLE, biasing splitter
+    // probe costs. Keep the order identical to emit.
+    if literals.len() >= 8 && all_bytes_identical(literals) {
+        *last_huff = None;
+        return uncompressed_literals_header_bytes(literals.len()) + 1;
+    }
     let min_lits = min_literals_to_compress(strategy, last_huff.is_some());
     if literals.len() < min_lits {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + literals.len();
     }
-    // Mirror emit's RLE gate exactly: emitter only emits RLE when
-    // `len >= 8 && all_identical` (see comment at the RLE branch above).
-    // With reused HUF tables `min_lits == 6`, so a bare `all_identical`
-    // check here would have estimator predict a 1-byte RLE payload for
-    // 6-7 byte all-identical literals that the emitter would actually
-    // write as raw — splitter probe costs drift from real emit output.
-    if literals.len() >= 8 && all_bytes_identical(literals) {
-        *last_huff = None;
-        return uncompressed_literals_header_bytes(literals.len()) + 1;
-    }
 
     counts.fill(0);
     for &b in literals {
@@ -1782,9 +1787,11 @@ mod tests {
 
     use super::{
         FseTableMode, RawSequence, choose_table, emit_single_sequence_block, encode_match_len,
-        encode_offset_with_history, previous_table, remember_last_used_tables,
+        encode_offset_with_history, min_gain, min_literals_to_compress, previous_table,
+        remember_last_used_tables,
     };
     use crate::encoding::frame_compressor::{CompressState, FseTables, PreviousFseTable};
+    use crate::encoding::strategy::StrategyTag;
     use crate::fse::fse_encoder::build_table_from_symbol_counts;
     use alloc::vec::Vec;
 
@@ -1824,6 +1831,113 @@ mod tests {
         assert_eq!(hist, [9, 10, 20]);
     }
 
+    #[test]
+    fn min_literals_to_compress_matches_donor_table() {
+        for strat in [
+            StrategyTag::Fast,
+            StrategyTag::Dfast,
+            StrategyTag::Greedy,
+            StrategyTag::Lazy,
+        ] {
+            assert_eq!(min_literals_to_compress(strat, false), 64);
+            assert_eq!(min_literals_to_compress(strat, true), 6);
+        }
+        assert_eq!(min_literals_to_compress(StrategyTag::BtOpt, false), 32);
+        assert_eq!(min_literals_to_compress(StrategyTag::BtOpt, true), 6);
+        assert_eq!(min_literals_to_compress(StrategyTag::BtUltra, false), 16);
+        assert_eq!(min_literals_to_compress(StrategyTag::BtUltra, true), 6);
+        assert_eq!(min_literals_to_compress(StrategyTag::BtUltra2, false), 8);
+        assert_eq!(min_literals_to_compress(StrategyTag::BtUltra2, true), 6);
+    }
+
+    #[test]
+    fn min_gain_matches_donor_margin() {
+        let src = 4096usize;
+        for strat in [
+            StrategyTag::Fast,
+            StrategyTag::Dfast,
+            StrategyTag::Greedy,
+            StrategyTag::Lazy,
+            StrategyTag::BtOpt,
+        ] {
+            assert_eq!(min_gain(src, strat), (src >> 6) + 2);
+        }
+        assert_eq!(min_gain(src, StrategyTag::BtUltra), (src >> 7) + 2);
+        assert_eq!(min_gain(src, StrategyTag::BtUltra2), (src >> 8) + 2);
+        assert_eq!(min_gain(0, StrategyTag::Fast), 2);
+        assert_eq!(min_gain(63, StrategyTag::Fast), 2);
+        assert_eq!(min_gain(64, StrategyTag::Fast), 3);
+    }
+
+    #[test]
+    fn estimator_literals_section_mirrors_emit_for_short_inputs() {
+        use super::{
+            CompressedBlockScratch, EntropyOnlyMatcher, EstimatorWorkspace,
+            encode_block_parts_with_sequence_scratch, estimate_block_parts_size,
+        };
+        // For each strategy at boundary literal lengths around `min_lits`
+        // and across the RLE pre-check `>= 8` cliff, estimator's predicted
+        // size MUST equal the bytes the emitter actually writes.
+        let cases: &[(StrategyTag, &[(usize, bool)])] = &[
+            (
+                StrategyTag::Fast,
+                &[(8, true), (8, false), (63, true), (63, false), (64, false)],
+            ),
+            (
+                StrategyTag::BtUltra2,
+                &[(7, true), (7, false), (8, true), (8, false), (16, false)],
+            ),
+            (StrategyTag::BtOpt, &[(8, true), (31, true), (32, false)]),
+        ];
+
+        for (strat, inputs) in cases {
+            for (len, identical) in *inputs {
+                let literals: Vec<u8> = if *identical {
+                    alloc::vec![0x5Au8; *len]
+                } else {
+                    (0..*len as u8).collect()
+                };
+                let mut est_state = CompressState::<EntropyOnlyMatcher> {
+                    matcher: EntropyOnlyMatcher,
+                    last_huff_table: None,
+                    fse_tables: FseTables::new(),
+                    block_scratch: CompressedBlockScratch::new(),
+                    offset_hist: [1, 4, 8],
+                    strategy_tag: *strat,
+                };
+                let mut emit_state = CompressState::<EntropyOnlyMatcher> {
+                    matcher: EntropyOnlyMatcher,
+                    last_huff_table: None,
+                    fse_tables: FseTables::new(),
+                    block_scratch: CompressedBlockScratch::new(),
+                    offset_hist: [1, 4, 8],
+                    strategy_tag: *strat,
+                };
+                let mut workspace = EstimatorWorkspace::default();
+                let est = estimate_block_parts_size(&mut est_state, &literals, &[], &mut workspace);
+                let mut emitted: Vec<u8> = Vec::new();
+                let mut scratch: Vec<crate::blocks::sequence_section::Sequence> = Vec::new();
+                encode_block_parts_with_sequence_scratch(
+                    &mut emit_state,
+                    &literals,
+                    &[],
+                    &mut emitted,
+                    &mut scratch,
+                );
+                assert_eq!(
+                    est,
+                    emitted.len(),
+                    "estimator/emit parity broken: strategy={:?} len={} identical={} est={} emit={}",
+                    strat,
+                    len,
+                    identical,
+                    est,
+                    emitted.len(),
+                );
+            }
+        }
+    }
+
     #[test]
     fn encode_match_len_uses_correct_upper_range_base() {
         assert_eq!(encode_match_len(65539), (52, 0, 16));

From 8c7e7724674a6e2fa3d7803abd00092927d3308f Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 03:49:06 +0300
Subject: [PATCH 05/14] fix(encoder): donor-formula min_gain + RLE for any
 all-identical section
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related parity fixes to the literal-section pipeline:

1. min_gain comparison now matches donor formula. Previously compared
   `total_len >= raw_section_bytes - mg` (compressed lhSize on LHS,
   raw lhSize on RHS), which skewed the threshold by
   `compressed_header - raw_header` bytes and forced raw fallback on
   marginally-winning compressed sections that donor would keep. Switch
   to donor's `cLitSize >= srcSize - minGain` form
   (`zstd_compress_literals.c:187-188`): compare
   `(total_len - compressed_header_len) >= literals.len() - mg`. Applied
   in both estimator (`estimate_literals_section_bytes`) and emit
   (`compress_literals`) so probe costs stay in lockstep.

2. RLE pre-check fires for any non-empty all-identical literal section,
   not just `len >= 8`. The old `len >= 8` floor caused 6/7-byte
   all-identical sections with a reusable HUF table (`min_lits == 6`)
   to skip RLE and emit a larger HUF block — donor reaches the same
   RLE block via `cLitSize == 1` after the `min_lits` gate. Dropping
   the floor matches donor on `>= min_lits` inputs and produces
   strictly smaller output than donor on the `< min_lits` all-identical
   edges (where donor falls through to raw). Applied symmetrically in
   estimator and emit.

531/531 lib tests pass; ratio matrix (203 cells via REPORT lines) shows
0 delta vs main — both fixes are ratio-safe on current fixtures.
---
 zstd/src/encoding/blocks/compressed.rs | 100 ++++++++++++++-----------
 1 file changed, 57 insertions(+), 43 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index c755d70a..4de1bab4 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -377,17 +377,21 @@ fn encode_block_parts_with_sequence_scratch<M: Matcher>(
     let strategy = state.strategy_tag;
     let has_huf_table = state.last_huff_table.is_some();
     let min_lits = min_literals_to_compress(strategy, has_huf_table);
-    // RLE pre-check: donor `compress_literals` reaches RLE through
-    // `cLitSize == 1` after a full compress attempt
-    // (`zstd_compress_literals.c:192-201`), but only AFTER passing the
-    // `min_lits` gate — so donor at fast/dfast/greedy/lazy emits raw
-    // for 8..63 all-identical inputs. Our pre-check fires earlier and
-    // emits the smaller RLE block instead (2 bytes vs `1+len` raw): a
-    // strict win in compressed size on all-identical sections, no
-    // donor binary parity but better ratio. Note the order — RLE
-    // pre-check runs BEFORE `min_lits`. `estimate_literals_section_bytes`
-    // mirrors this order so probe costs match emit byte-for-byte.
-    if literals_vec.len() >= 8 && all_bytes_identical(literals_vec) {
+    // RLE pre-check: donor `compress_literals` reaches RLE only through
+    // the `cLitSize == 1` branch (`zstd_compress_literals.c:192-201`)
+    // after passing the `min_lits` gate and running a full HUF compress —
+    // so donor emits raw for any all-identical section under `min_lits`
+    // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF
+    // reuse). RLE is unconditionally better than raw on any non-empty
+    // all-identical input (2 bytes vs `1 + len`), so our pre-check fires
+    // for ANY all-identical literal slice regardless of strategy/min_lits.
+    // This produces strictly smaller output than donor on the small
+    // all-identical edges while still matching donor on `>= min_lits`
+    // inputs (where donor's compress+`cLitSize==1` path reaches the same
+    // RLE block). Note the order — RLE pre-check runs BEFORE `min_lits`;
+    // `estimate_literals_section_bytes` mirrors this exactly so probe
+    // costs match emit byte-for-byte.
+    if !literals_vec.is_empty() && all_bytes_identical(literals_vec) {
         rle_literals(literals_vec, &mut writer);
         state.last_huff_table = None;
     } else if literals_vec.len() >= min_lits {
@@ -531,14 +535,12 @@ fn estimate_literals_section_bytes(
     strategy: crate::encoding::strategy::StrategyTag,
 ) -> usize {
     // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches
-    // **in the same order**. The emitter checks the RLE pre-shortcut
-    // (`len >= 8 && all_identical`) BEFORE the `min_lits` gate, so an
-    // all-identical 8-byte literal section emits a 2-byte RLE block under
-    // any strategy — including fast/dfast/greedy/lazy where `min_lits == 64`.
-    // If estimator gated `min_lits` first, it would predict a raw block
-    // for those inputs while the emitter writes RLE, biasing splitter
-    // probe costs. Keep the order identical to emit.
-    if literals.len() >= 8 && all_bytes_identical(literals) {
+    // **in the same order**. The emitter pre-checks `all_identical`
+    // (any non-empty section) BEFORE the `min_lits` gate — RLE is
+    // unconditionally smaller than raw on all-identical inputs, so it
+    // is selected regardless of strategy. Estimator must use the same
+    // ordering and predicate so probe costs match emit byte-for-byte.
+    if !literals.is_empty() && all_bytes_identical(literals) {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + 1;
     }
@@ -595,16 +597,23 @@ fn estimate_literals_section_bytes(
     let compressed_header = compressed_literals_header_bytes(literals.len());
     let total = compressed_header + tree_desc + payload;
 
-    // Mirror `compress_literals` raw fallback: compare on-wire sections
-    // *byte-for-byte* — `total` already includes the compressed literals
-    // header (and the Huffman tree description when emitting a new table),
-    // so the raw side must also include its own 1–3 byte header. Comparing
-    // against bare `literals.len()` would bias the splitter toward raw on
-    // small payloads where the raw header (1 byte for size < 32) is smaller
-    // than the compressed header (3 bytes for size < 1 KB).
+    // Donor `compress_literals` raw-fallback gate
+    // (`zstd_compress_literals.c:187-188`):
+    //   `cLitSize >= srcSize - minGain`
+    // where `cLitSize` is the encoded literals payload + tree description
+    // (output of `HUF_compress*`, excluding the surrounding lhSize bytes)
+    // and `srcSize` is the literal-payload length. In our terms:
+    //   - donor `cLitSize` ≡ `total - compressed_header` (tree_desc + payload)
+    //   - donor `srcSize`  ≡ `literals.len()`
+    // Using the on-wire `total >= raw_section_bytes - mg` form (which
+    // includes the compressed header on the LHS and the raw header on
+    // the RHS) skews the threshold by `compressed_header - raw_header`
+    // bytes and rejects compressed sections that donor would keep,
+    // losing ratio. Mirror donor's payload-vs-srcSize form here.
     let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
     let mg = min_gain(literals.len(), strategy);
-    if total >= raw_section_bytes.saturating_sub(mg) {
+    let donor_clit = total - compressed_header; // tree_desc + payload
+    if donor_clit >= literals.len().saturating_sub(mg) {
         *last_huff = None;
         return raw_section_bytes;
     }
@@ -1754,23 +1763,28 @@ fn compress_literals(
     writer.change_bits(size_index, encoded_len as u64, size_bits);
     let total_len = (writer.index() - reset_idx) / 8;
 
-    // Compare on-wire sections byte-for-byte: `total_len` already includes
-    // the compressed literals header + tree description + payload, so the
-    // raw side must also include its own 1–3 byte header
-    // (`uncompressed_literals_header_bytes`). Comparing against bare
-    // `literals.len()` would bias toward raw on small payloads where the
-    // raw header is smaller than the compressed header.
-    let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
-    // Donor `compress_literals` (`zstd_compress_literals.c:187-188`):
-    // `cLitSize >= srcSize - minGain` → raw fallback. `minGain` is
-    // strategy-aware (`min_gain` helper above; ~1.5% for fast..btopt,
-    // ~0.78% for btultra, ~0.39% for btultra2). Previously this check
-    // used bare `>= raw_section_bytes` (zero margin) which let
-    // marginally-compressing blocks emit huf when raw + reused table
-    // would compress better in the next block. Saturating subtraction
-    // covers tiny inputs where `raw_section_bytes < min_gain`.
+    // Donor `compress_literals` raw-fallback gate
+    // (`zstd_compress_literals.c:187-188`):
+    //   `cLitSize >= srcSize - minGain`
+    // where donor's `cLitSize` is the encoded literals payload plus the
+    // tree description (output of `HUF_compress*`, excluding the
+    // surrounding `lhSize` literals header), and `srcSize` is the
+    // literal-payload length. In our terms:
+    //   - donor `cLitSize` ≡ `total_len - compressed_literals_header_bytes`
+    //     (i.e. tree_desc + huf_payload, no lhSize)
+    //   - donor `srcSize`  ≡ `literals.len()`
+    // Comparing `total_len >= raw_section_bytes - minGain` (with the
+    // compressed-section lhSize on the LHS and raw-section header on
+    // the RHS) skews the threshold by `compressed_header - raw_header`
+    // bytes and rejects compressed sections that donor would keep —
+    // direct ratio loss. Mirror donor's payload-vs-srcSize form here.
+    // `minGain` is strategy-aware (`min_gain` helper above; ~1.56% for
+    // fast..btopt, ~0.78% for btultra, ~0.39% for btultra2). Saturating
+    // subtraction covers tiny inputs where `literals.len() < minGain`.
+    let compressed_header_len = compressed_literals_header_bytes(literals.len());
+    let donor_clit = total_len - compressed_header_len; // tree_desc + payload
     let mg = min_gain(literals.len(), strategy);
-    if total_len >= raw_section_bytes.saturating_sub(mg) {
+    if donor_clit >= literals.len().saturating_sub(mg) {
         writer.reset_to(reset_idx);
         raw_literals(literals, writer);
         HuffmanTableUpdate::Cleared

From 70e79ec482c4ebd616d2922c36fffeb3a4e8058c Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 04:08:24 +0300
Subject: [PATCH 06/14] test(encoder): extend literal-gate parity test to HUF
 reuse path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The parity test previously only seeded `last_huff_table: None`, so the
floor-6 HUF reuse path (`ZSTD_minLiteralsToCompress` with valid prior
HUF table) and 6/7-byte all-identical sections that newly route
through the unconditional RLE pre-check were not exercised in
estimator/emit parity.

Add cases:
- Sub-`min_lits` all-identical (`len in 1..=7` at fast strategy):
  exercises the new "any non-empty all-identical → RLE" path
- Seeded HUF table with `Lazy` strategy at 6/7-byte all-identical and
  16-byte non-identical: exercises the `min_lits == 6` floor and the
  HUF-reuse decision under the new RLE pre-check

Also update the in-test docstring to drop the stale "RLE `>= 8` cliff"
phrasing — the cliff was removed when RLE became unconditional for
non-empty all-identical inputs.

531/531 lib tests pass.
---
 zstd/src/encoding/blocks/compressed.rs | 70 ++++++++++++++++++++++----
 1 file changed, 61 insertions(+), 9 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 4de1bab4..ea28728e 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -1807,6 +1807,7 @@ mod tests {
     use crate::encoding::frame_compressor::{CompressState, FseTables, PreviousFseTable};
     use crate::encoding::strategy::StrategyTag;
     use crate::fse::fse_encoder::build_table_from_symbol_counts;
+    use crate::huff0::huff0_encoder;
     use alloc::vec::Vec;
 
     fn tables_match(
@@ -1890,30 +1891,80 @@ mod tests {
             encode_block_parts_with_sequence_scratch, estimate_block_parts_size,
         };
         // For each strategy at boundary literal lengths around `min_lits`
-        // and across the RLE pre-check `>= 8` cliff, estimator's predicted
-        // size MUST equal the bytes the emitter actually writes.
-        let cases: &[(StrategyTag, &[(usize, bool)])] = &[
+        // and across the all-identical RLE pre-check (fires for any
+        // non-empty all-identical input under every strategy/HUF state),
+        // estimator's predicted size MUST equal the bytes the emitter
+        // actually writes. Cases include: (a) fresh state with
+        // `last_huff_table: None` covering the strategy-specific
+        // `min_lits` band (8/16/32/64), (b) seeded HUF-reuse state
+        // covering the lowered floor of 6 and 6/7-byte all-identical
+        // sections that previously hit the HUF path instead of RLE,
+        // (c) sub-`min_lits` all-identical sections that take the
+        // RLE pre-check regardless of strategy.
+        type Inputs = &'static [(usize, bool)];
+        let cases: &[(StrategyTag, bool, Inputs)] = &[
+            // (strategy, seed_huff_reuse, [(len, all_identical)])
             (
                 StrategyTag::Fast,
-                &[(8, true), (8, false), (63, true), (63, false), (64, false)],
+                false,
+                &[
+                    (1, true), // sub-min_lits all-identical → RLE
+                    (5, true), // sub-min_lits all-identical → RLE
+                    (8, true),
+                    (8, false),
+                    (63, true),
+                    (63, false),
+                    (64, false),
+                ],
             ),
             (
                 StrategyTag::BtUltra2,
+                false,
                 &[(7, true), (7, false), (8, true), (8, false), (16, false)],
             ),
-            (StrategyTag::BtOpt, &[(8, true), (31, true), (32, false)]),
+            (
+                StrategyTag::BtOpt,
+                false,
+                &[(8, true), (31, true), (32, false)],
+            ),
+            // HUF reuse path: floor drops to 6, so 6/7-byte sections
+            // previously hit the HUF compress path. With the RLE
+            // pre-check now non-empty-only, all-identical 6/7-byte
+            // sections must route through RLE and stay byte-equivalent
+            // estimator-vs-emit. Also exercise non-identical 6-byte
+            // raw fallback and 16-byte HUF reuse path.
+            (
+                StrategyTag::Lazy,
+                true,
+                &[(6, true), (7, true), (6, false), (16, false)],
+            ),
         ];
 
-        for (strat, inputs) in cases {
+        for (strat, seed_huff, inputs) in cases {
             for (len, identical) in *inputs {
                 let literals: Vec<u8> = if *identical {
                     alloc::vec![0x5Au8; *len]
                 } else {
                     (0..*len as u8).collect()
                 };
+                // Seed both estimator and emit state with the same
+                // synthetic HUF table when `seed_huff` is true so the
+                // reuse path's `min_lits == 6` floor is exercised.
+                // Counts from a varied byte sequence give a valid
+                // (writeable) table that survives the `decide_huff_reuse`
+                // decision when literals are large enough to consider it.
+                let seed_table = if *seed_huff {
+                    let mut counts = [0usize; 256];
+                    for b in (0..=63u8).chain(64..=127u8) {
+                        counts[b as usize] = 1;
+                    }
+                    Some(huff0_encoder::HuffmanTable::build_from_counts(&counts))
+                } else {
+                    None
+                };
                 let mut est_state = CompressState::<EntropyOnlyMatcher> {
                     matcher: EntropyOnlyMatcher,
-                    last_huff_table: None,
+                    last_huff_table: seed_table.clone(),
                     fse_tables: FseTables::new(),
                     block_scratch: CompressedBlockScratch::new(),
                     offset_hist: [1, 4, 8],
@@ -1921,7 +1972,7 @@ mod tests {
                 };
                 let mut emit_state = CompressState::<EntropyOnlyMatcher> {
                     matcher: EntropyOnlyMatcher,
-                    last_huff_table: None,
+                    last_huff_table: seed_table,
                     fse_tables: FseTables::new(),
                     block_scratch: CompressedBlockScratch::new(),
                     offset_hist: [1, 4, 8],
@@ -1941,8 +1992,9 @@ mod tests {
                 assert_eq!(
                     est,
                     emitted.len(),
-                    "estimator/emit parity broken: strategy={:?} len={} identical={} est={} emit={}",
+                    "estimator/emit parity broken: strategy={:?} seed_huff={} len={} identical={} est={} emit={}",
                     strat,
+                    seed_huff,
                     len,
                     identical,
                     est,

From 60931c17edb9d8aff11bd80b748a075851cc33a7 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 04:24:36 +0300
Subject: [PATCH 07/14] refactor(encoder): rename literal-gate vars to describe
 value, not origin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`donor_clit` told the reader where the formula came from, not what the
value represents. Rename to `huf_section_size` (tree description plus
HUF payload, excluding the literals lhSize) — describes the bytes
themselves so the call site is readable without recalling reference
naming conventions. Same rename in both the emit (`compress_literals`)
and estimator (`estimate_literals_section_bytes`) sides.

Test names follow the same principle:
- `min_literals_to_compress_matches_donor_table`
  → `min_literals_to_compress_returns_per_strategy_floor`
- `min_gain_matches_donor_margin`
  → `min_gain_returns_per_strategy_margin`

Behavior unchanged; 531/531 lib tests pass, clippy/fmt clean, ratio
matrix 0 deltas vs main.
---
 zstd/src/encoding/blocks/compressed.rs | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index ea28728e..94bdfc52 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -612,8 +612,8 @@ fn estimate_literals_section_bytes(
     // losing ratio. Mirror donor's payload-vs-srcSize form here.
     let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
     let mg = min_gain(literals.len(), strategy);
-    let donor_clit = total - compressed_header; // tree_desc + payload
-    if donor_clit >= literals.len().saturating_sub(mg) {
+    let huf_section_size = total - compressed_header; // tree_desc + payload, no lhSize
+    if huf_section_size >= literals.len().saturating_sub(mg) {
         *last_huff = None;
         return raw_section_bytes;
     }
@@ -1782,9 +1782,9 @@ fn compress_literals(
     // fast..btopt, ~0.78% for btultra, ~0.39% for btultra2). Saturating
     // subtraction covers tiny inputs where `literals.len() < minGain`.
     let compressed_header_len = compressed_literals_header_bytes(literals.len());
-    let donor_clit = total_len - compressed_header_len; // tree_desc + payload
+    let huf_section_size = total_len - compressed_header_len; // tree_desc + payload, no lhSize
     let mg = min_gain(literals.len(), strategy);
-    if donor_clit >= literals.len().saturating_sub(mg) {
+    if huf_section_size >= literals.len().saturating_sub(mg) {
         writer.reset_to(reset_idx);
         raw_literals(literals, writer);
         HuffmanTableUpdate::Cleared
@@ -1847,7 +1847,7 @@ mod tests {
     }
 
     #[test]
-    fn min_literals_to_compress_matches_donor_table() {
+    fn min_literals_to_compress_returns_per_strategy_floor() {
         for strat in [
             StrategyTag::Fast,
             StrategyTag::Dfast,
@@ -1866,7 +1866,7 @@ mod tests {
     }
 
     #[test]
-    fn min_gain_matches_donor_margin() {
+    fn min_gain_returns_per_strategy_margin() {
         let src = 4096usize;
         for strat in [
             StrategyTag::Fast,

From bcc0c0d03ac2a8f8af59090870131e83fa2d1515 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 04:30:06 +0300
Subject: [PATCH 08/14] refactor(encoder): centralize raw-literal-fallback gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extract `use_raw_literal_fallback(huf_section_size, literals_len,
strategy)` helper so `compress_literals` (emit) and
`estimate_literals_section_bytes` (probe) share one decision point —
neither side can drift back to an on-wire-vs-on-wire comparison that
inflates the threshold by the lhSize delta and rejects marginally
compressed sections.

Add `use_raw_literal_fallback_uses_payload_vs_srcsize_threshold` test
that walks the 2-byte gap (`compressed_lhsize - raw_lhsize`) at
`literals.len() = 20` to lock in the payload-vs-srcSize semantics.

Tighten the RLE shortcut docstring: RLE equals raw at `len == 1`
(both 2 bytes), strictly smaller only at `len >= 2`.

532/532 lib tests, clippy/fmt clean.
---
 zstd/src/encoding/blocks/compressed.rs | 83 ++++++++++++++++++++++----
 1 file changed, 72 insertions(+), 11 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 94bdfc52..364147a1 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -79,6 +79,28 @@ fn min_gain(src_size: usize, strategy: crate::encoding::strategy::StrategyTag) -
     (src_size >> minlog) + 2
 }
 
+/// Donor `compress_literals` raw-fallback gate
+/// (`zstd_compress_literals.c:187-188`): emit raw when
+/// `cLitSize >= srcSize - minGain`, where `cLitSize` is the HUF payload
+/// plus tree description (the bytes `HUF_compress*` writes — excluding
+/// the surrounding literals lhSize) and `srcSize` is the literal-payload
+/// length. Compares payload-vs-srcSize, NOT on-wire-vs-on-wire, so the
+/// gate is symmetric in header overhead.
+///
+/// Centralized helper so `compress_literals` and
+/// `estimate_literals_section_bytes` share the exact same decision and
+/// neither side can drift back to the pre-2026-05 on-wire comparison
+/// (which inflated the threshold by `compressed_lhsize - raw_lhsize`
+/// bytes and rejected marginally-winning compressed sections).
+#[inline]
+fn use_raw_literal_fallback(
+    huf_section_size: usize,
+    literals_len: usize,
+    strategy: crate::encoding::strategy::StrategyTag,
+) -> bool {
+    huf_section_size >= literals_len.saturating_sub(min_gain(literals_len, strategy))
+}
+
 /// Donor `kInverseProbabilityLog256`: floor(-log2(x / 256) * 256).
 const INVERSE_PROBABILITY_LOG_256: [usize; 256] = [
     0, 2048, 1792, 1642, 1536, 1453, 1386, 1329, 1280, 1236, 1197, 1162, 1130, 1100, 1073, 1047,
@@ -382,13 +404,14 @@ fn encode_block_parts_with_sequence_scratch<M: Matcher>(
     // after passing the `min_lits` gate and running a full HUF compress —
     // so donor emits raw for any all-identical section under `min_lits`
     // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF
-    // reuse). RLE is unconditionally better than raw on any non-empty
-    // all-identical input (2 bytes vs `1 + len`), so our pre-check fires
-    // for ANY all-identical literal slice regardless of strategy/min_lits.
-    // This produces strictly smaller output than donor on the small
-    // all-identical edges while still matching donor on `>= min_lits`
-    // inputs (where donor's compress+`cLitSize==1` path reaches the same
-    // RLE block). Note the order — RLE pre-check runs BEFORE `min_lits`;
+    // reuse). RLE is at worst equal to raw on `len == 1`
+    // (both produce a 2-byte section) and strictly smaller for `len >= 2`
+    // (2 bytes vs `1 + len`), so our pre-check fires for ANY all-identical
+    // literal slice regardless of strategy/min_lits. This produces strictly
+    // smaller output than donor on the small all-identical edges while
+    // still matching donor on `>= min_lits` inputs (where donor's
+    // compress+`cLitSize==1` path reaches the same RLE block).
+    // Note the order — RLE pre-check runs BEFORE `min_lits`;
     // `estimate_literals_section_bytes` mirrors this exactly so probe
     // costs match emit byte-for-byte.
     if !literals_vec.is_empty() && all_bytes_identical(literals_vec) {
@@ -611,9 +634,8 @@ fn estimate_literals_section_bytes(
     // bytes and rejects compressed sections that donor would keep,
     // losing ratio. Mirror donor's payload-vs-srcSize form here.
     let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len();
-    let mg = min_gain(literals.len(), strategy);
     let huf_section_size = total - compressed_header; // tree_desc + payload, no lhSize
-    if huf_section_size >= literals.len().saturating_sub(mg) {
+    if use_raw_literal_fallback(huf_section_size, literals.len(), strategy) {
         *last_huff = None;
         return raw_section_bytes;
     }
@@ -1783,8 +1805,7 @@ fn compress_literals(
     // subtraction covers tiny inputs where `literals.len() < minGain`.
     let compressed_header_len = compressed_literals_header_bytes(literals.len());
     let huf_section_size = total_len - compressed_header_len; // tree_desc + payload, no lhSize
-    let mg = min_gain(literals.len(), strategy);
-    if huf_section_size >= literals.len().saturating_sub(mg) {
+    if use_raw_literal_fallback(huf_section_size, literals.len(), strategy) {
         writer.reset_to(reset_idx);
         raw_literals(literals, writer);
         HuffmanTableUpdate::Cleared
@@ -1884,6 +1905,46 @@ mod tests {
         assert_eq!(min_gain(64, StrategyTag::Fast), 3);
     }
 
+    #[test]
+    fn use_raw_literal_fallback_uses_payload_vs_srcsize_threshold() {
+        use super::{compressed_literals_header_bytes, use_raw_literal_fallback};
+        // Donor formula: `huf_section_size >= literals_len - min_gain`,
+        // payload-vs-srcSize (no headers on either side). Verify the
+        // gate is symmetric in header overhead by hitting the boundary
+        // where the old on-wire `total >= raw_section - mg` form would
+        // have disagreed.
+        let strategy = StrategyTag::Fast; // min_gain(20, Fast) = (20>>6)+2 = 2
+
+        // literals_len = 20: raw_header = 1, compressed_header = 3.
+        // New threshold (payload-vs-srcSize):
+        //   keep huf iff huf_section_size <  20 - 2 = 18
+        //   fallback   iff huf_section_size >= 18
+        //
+        // Old (regressed) threshold on-wire-vs-on-wire:
+        //   total = huf_section_size + 3
+        //   fallback iff total >= (20 + 1) - 2 = 19
+        //                iff huf_section_size >= 16
+        //
+        // Gap where formulas disagree: huf_section_size in [16, 18).
+        // The new formula MUST keep huf in this gap.
+        let literals_len = 20usize;
+        // Sanity-check the literals-header constants the math relies on.
+        assert_eq!(compressed_literals_header_bytes(literals_len), 3);
+        assert_eq!(super::uncompressed_literals_header_bytes(literals_len), 1);
+
+        // Inside the gap — new keeps, old would have rejected:
+        assert!(!use_raw_literal_fallback(16, literals_len, strategy));
+        assert!(!use_raw_literal_fallback(17, literals_len, strategy));
+
+        // At/above new threshold — both new and old fall back:
+        assert!(use_raw_literal_fallback(18, literals_len, strategy));
+        assert!(use_raw_literal_fallback(19, literals_len, strategy));
+
+        // Below the old threshold — both keep:
+        assert!(!use_raw_literal_fallback(15, literals_len, strategy));
+        assert!(!use_raw_literal_fallback(0, literals_len, strategy));
+    }
+
     #[test]
     fn estimator_literals_section_mirrors_emit_for_short_inputs() {
         use super::{

From bd6645f08f6dbba4c4980253a2880c0527c176f2 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 05:01:38 +0300
Subject: [PATCH 09/14] docs(encoder): clarify Lazy strategy mapping and
 min_gain usage scope
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`min_literals_to_compress` Lazy mapping: drop the "most-aggressive-of-band"
phrasing — donor's shift table pins strategies 1..6 to `shift = 3`
uniformly, so Lazy → 64-byte floor regardless of which donor index in
the 4..6 band we'd nominally pick. No aggressiveness gradient exists
within this band.

`min_gain` doc: split the donor formula (gates both block-level and
literal-section in donor) from the current crate usage (helper is
wired into literal-section gates only; block-level emit and probe
still use uniform `(source_len >> 8) + 2`). Previous wording conflated
the two and read as if the helper were already plumbed everywhere.
---
 zstd/src/encoding/blocks/compressed.rs | 33 +++++++++++++-------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 364147a1..2394331c 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -26,10 +26,11 @@ const MAX_NB_BLOCK_SPLITS: usize = 196;
 /// → 16, strategy 9 (btultra2) → 8.
 ///
 /// Our `StrategyTag` enum has seven variants (no separate lazy2/btlazy2 —
-/// the `Lazy` variant covers donor strategies 4..6). Map our variants to
-/// donor strategy numbers using the most-aggressive-of-band mapping
-/// (`Lazy → strat=5 → shift=3`) so the threshold matches what donor would
-/// pick for the levels we route to that variant.
+/// the `Lazy` variant covers donor strategies 4..6). Within the
+/// fast..lazy2 band donor's shift table is flat: strategies 1..6 all
+/// pin `shift = MIN(9 - strat, 3) = 3`, so `Lazy → 64-byte floor`
+/// regardless of which donor index (4, 5, or 6) we'd nominally use.
+/// No aggressiveness gradient within this band to preserve.
 #[inline]
 fn min_literals_to_compress(
     strategy: crate::encoding::strategy::StrategyTag,
@@ -49,9 +50,9 @@ fn min_literals_to_compress(
 }
 
 /// Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`):
-/// strategy-aware minimum-compression margin used by both block-level
-/// emit ("compressed block must beat raw + minGain bytes") and the
-/// literal-section expansion check.
+/// strategy-aware minimum-compression margin. In donor it gates both
+/// the block-level "compressed block must beat raw + minGain" decision
+/// and the literal-section `cLitSize >= srcSize - minGain` fallback.
 ///
 /// Formula: `minlog = (strat >= btultra) ? strat - 1 : 6; (src_size >>
 /// minlog) + 2`. So:
@@ -59,15 +60,15 @@ fn min_literals_to_compress(
 /// - btultra (strat 8): minlog=7 → ~0.78% margin + 2 bytes
 /// - btultra2 (strat 9): minlog=8 → ~0.39% margin + 2 bytes
 ///
-/// Used in `compress_literals` and `estimate_literals_section_bytes` to
-/// match donor's `cLitSize >= srcSize - minGain` raw-fallback gate.
-/// Existing block-level emit / probe paths (`emit_single_sequence_block`,
-/// `SplitEstimator::estimate_subblock_size`) already use `min_log = 8`
-/// directly (`source_len >> 8 + 2`) — that is the btultra2 margin, the
-/// most permissive of the band, applied uniformly. Migrating those sites
-/// to this strategy-aware helper is a separate cleanup; this commit only
-/// wires the helper into the literal-section gates that previously had
-/// no `min_gain` margin at all (bare `>= raw_section_bytes`).
+/// **Current usage in this crate:** wired into the literal-section
+/// raw-fallback gate (`compress_literals` +
+/// `estimate_literals_section_bytes`) only — those sites previously
+/// had no margin at all (bare `>= raw_section_bytes`).
+/// **Not yet wired into** the block-level emit/probe paths
+/// (`emit_single_sequence_block`, `SplitEstimator::estimate_subblock_size`),
+/// which still use a uniform `(source_len >> 8) + 2` calculation
+/// (the btultra2 value applied across all strategies). Migrating
+/// those sites is a separate cleanup.
 #[inline]
 fn min_gain(src_size: usize, strategy: crate::encoding::strategy::StrategyTag) -> usize {
     use crate::encoding::strategy::StrategyTag;

From 9945077ce369bacace5e58b04355f23dcb7a8a3c Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 05:24:13 +0300
Subject: [PATCH 10/14] docs(encoder): document strategy_tag invariant +
 accurate RLE-vs-raw
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`CompressState.strategy_tag` doc: replace the misleading "defaults to
Fast" wording with an explicit invariant — the field must be
initialized from the active `CompressionLevel` and re-synced on every
`matcher.reset()`. There is no `Default` impl; production constructors
and tests must supply a value.

`estimate_literals_section_bytes` docstring: tighten the RLE-vs-raw
size claim so it matches the actual invariant — RLE equals raw at
`len == 1` (both 2 bytes) and is strictly smaller for `len >= 2`,
i.e. RLE is never worse than raw on all-identical input.
---
 zstd/src/encoding/blocks/compressed.rs | 10 ++++++----
 zstd/src/encoding/frame_compressor.rs  | 14 ++++++++++----
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 2394331c..9ef6d2b5 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -560,10 +560,12 @@ fn estimate_literals_section_bytes(
 ) -> usize {
     // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches
     // **in the same order**. The emitter pre-checks `all_identical`
-    // (any non-empty section) BEFORE the `min_lits` gate — RLE is
-    // unconditionally smaller than raw on all-identical inputs, so it
-    // is selected regardless of strategy. Estimator must use the same
-    // ordering and predicate so probe costs match emit byte-for-byte.
+    // (any non-empty section) BEFORE the `min_lits` gate — on
+    // all-identical inputs RLE is equal to raw at `len == 1` (both
+    // 2 bytes) and strictly smaller for `len >= 2`, so it is never
+    // worse than raw and is selected regardless of strategy. Estimator
+    // must use the same ordering and predicate so probe costs match
+    // emit byte-for-byte.
     if !literals.is_empty() && all_bytes_identical(literals) {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + 1;
diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs
index bc67a14b..7c5069e2 100644
--- a/zstd/src/encoding/frame_compressor.rs
+++ b/zstd/src/encoding/frame_compressor.rs
@@ -394,10 +394,16 @@ pub(crate) struct CompressState<M: Matcher> {
     /// `matcher.reset()` call. Used by the literal-compression gates
     /// (`min_literals_to_compress`, `min_gain`) in
     /// `encoding::blocks::compressed` to mirror donor's strategy-aware
-    /// thresholds (`zstd_compress_literals.c:114-127, 187-188`). Defaults
-    /// to `Fast` so a `CompressState` constructed without an explicit
-    /// reset still has a valid value (matches the level-1 default donor
-    /// would emit for `disableLiteralCompression == 0`).
+    /// thresholds (`zstd_compress_literals.c:114-127, 187-188`).
+    ///
+    /// **Invariant (required of every construction site):** must be
+    /// initialized from the active `CompressionLevel` via
+    /// `StrategyTag::for_compression_level`, and re-synced on every
+    /// `matcher.reset()` so the level-aware gates stay correct after
+    /// a level change. There is no `Default` impl — production paths
+    /// (`FrameCompressor::new`, `new_with_matcher`, the streaming
+    /// encoder constructor) plumb this explicitly. Tests that build
+    /// `CompressState` by hand must also supply a value.
     pub(crate) strategy_tag: crate::encoding::strategy::StrategyTag,
 }
 

From 25d8dbaca19042afacbb58dc2907476853a5621c Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 05:43:40 +0300
Subject: [PATCH 11/14] fix(encoder): sync strategy_tag on streaming reset path

`StreamingEncoder::ensure_frame_started` resets the matcher and clears
last_huff / fse_tables for the new frame but did not refresh
`state.strategy_tag` from the active compression level. That left the
literal-compression gates (`min_literals_to_compress`, `min_gain`)
reading the construction-time strategy on every subsequent frame,
which would silently leak the wrong gate floor if the encoder were
ever extended with a between-frame level change.

Mirror `FrameCompressor::compress`'s reset-time sync at the streaming
reset site so both entry points are byte-equivalent at the gate
level. Tighten the `CompressState.strategy_tag` invariant docstring
to enumerate the two owned reset sites explicitly. Add a regression
test that exercises the streaming reset across four levels and
asserts the synced tag matches `StrategyTag::for_compression_level`.

533/533 lib tests pass; clippy and fmt clean.
---
 zstd/src/encoding/frame_compressor.rs  |  9 +++--
 zstd/src/encoding/streaming_encoder.rs | 47 ++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs
index 7c5069e2..c557eb75 100644
--- a/zstd/src/encoding/frame_compressor.rs
+++ b/zstd/src/encoding/frame_compressor.rs
@@ -398,9 +398,12 @@ pub(crate) struct CompressState<M: Matcher> {
     ///
     /// **Invariant (required of every construction site):** must be
     /// initialized from the active `CompressionLevel` via
-    /// `StrategyTag::for_compression_level`, and re-synced on every
-    /// `matcher.reset()` so the level-aware gates stay correct after
-    /// a level change. There is no `Default` impl — production paths
+    /// `StrategyTag::for_compression_level`, and re-synced from the
+    /// active level alongside every `matcher.reset()` call so the
+    /// level-aware gates stay correct after a level change. The two
+    /// reset sites that own this sync are `FrameCompressor::compress`
+    /// and `StreamingEncoder::ensure_frame_started`. There is no
+    /// `Default` impl — production constructors
     /// (`FrameCompressor::new`, `new_with_matcher`, the streaming
     /// encoder constructor) plumb this explicitly. Tests that build
     /// `CompressState` by hand must also supply a value.
diff --git a/zstd/src/encoding/streaming_encoder.rs b/zstd/src/encoding/streaming_encoder.rs
index 54687234..78e696a5 100644
--- a/zstd/src/encoding/streaming_encoder.rs
+++ b/zstd/src/encoding/streaming_encoder.rs
@@ -236,6 +236,13 @@ impl<W: Write, M: Matcher> StreamingEncoder<W, M> {
         self.state.fse_tables.ll_previous = None;
         self.state.fse_tables.ml_previous = None;
         self.state.fse_tables.of_previous = None;
+        // Sync `state.strategy_tag` from the active compression level so the
+        // literal-compression gates (`min_literals_to_compress`, `min_gain`
+        // in `encoding::blocks::compressed`) see the correct strategy for
+        // every frame. Mirrors `FrameCompressor::compress` and keeps both
+        // entry points byte-equivalent at the gate level.
+        self.state.strategy_tag =
+            crate::encoding::strategy::StrategyTag::for_compression_level(self.compression_level);
         #[cfg(feature = "hash")]
         {
             self.hasher = XxHash64::with_seed(0);
@@ -1175,6 +1182,46 @@ mod tests {
         );
     }
 
+    #[test]
+    fn ensure_frame_started_syncs_strategy_tag_from_compression_level() {
+        // The literal-compression gates (`min_literals_to_compress`,
+        // `min_gain`) read `state.strategy_tag`. Regression: every reset
+        // site MUST refresh that tag from the active compression level,
+        // otherwise gates use stale strategy after a frame start.
+        use crate::encoding::strategy::StrategyTag;
+        // Pick a level pair whose resolved StrategyTag differs so a
+        // missed sync would visibly leak the construction-time value.
+        let cases = [
+            (
+                CompressionLevel::Fastest,
+                StrategyTag::for_compression_level(CompressionLevel::Fastest),
+            ),
+            (
+                CompressionLevel::Default,
+                StrategyTag::for_compression_level(CompressionLevel::Default),
+            ),
+            (
+                CompressionLevel::Better,
+                StrategyTag::for_compression_level(CompressionLevel::Better),
+            ),
+            (
+                CompressionLevel::Best,
+                StrategyTag::for_compression_level(CompressionLevel::Best),
+            ),
+        ];
+        for (level, expected_tag) in cases {
+            let mut encoder = StreamingEncoder::new(Vec::new(), level);
+            // First write triggers `ensure_frame_started` → matcher reset
+            // → strategy_tag sync.
+            encoder.write_all(b"x").unwrap();
+            assert_eq!(
+                encoder.state.strategy_tag, expected_tag,
+                "strategy_tag drift after frame start at level {level:?}",
+            );
+            let _ = encoder.finish().unwrap();
+        }
+    }
+
     #[test]
     fn no_pledged_size_omits_fcs_from_header() {
         let mut encoder = StreamingEncoder::new(Vec::new(), CompressionLevel::Fastest);

From 150a660b7847c331d35ddb957862e8407913d429 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 06:02:11 +0300
Subject: [PATCH 12/14] =?UTF-8?q?test(encoder):=20cover=20FrameCompressor:?=
 =?UTF-8?q?:set=5Fcompression=5Flevel=20=E2=86=92=20strategy=5Ftag?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The streaming reset-time sync test only exercises a single fixed level
per encoder, which would pass even with the reset-time sync deleted
because the construction-time tag already matches. Add a regression
test that drives the level-change path explicitly: construct
`FrameCompressor` at `CompressionLevel::Fastest`, call
`set_compression_level(Best)` to cross from the Fast band into the
BtUltra2 band, run a full `compress()` cycle, then verify
`state.strategy_tag` reflects the new level's
`StrategyTag::for_compression_level` resolution.

Includes an `assert_ne!` fixture invariant that the two chosen levels
resolve to different `StrategyTag` variants, so a future renumbering
that collapses the bands would be loud about it rather than turning
the test into a no-op.

534/534 lib tests pass; clippy and fmt clean.
---
 zstd/src/encoding/frame_compressor.rs | 45 +++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs
index c557eb75..d5e76aba 100644
--- a/zstd/src/encoding/frame_compressor.rs
+++ b/zstd/src/encoding/frame_compressor.rs
@@ -2223,4 +2223,49 @@ mod tests {
         decoder.collect_to_writer(&mut decoded).unwrap();
         assert_eq!(decoded, data, "roundtrip must reproduce the input verbatim");
     }
+
+    /// Regression: `set_compression_level` followed by `compress()` must
+    /// refresh `state.strategy_tag` through the reset-time sync so the
+    /// literal-compression gates (`min_literals_to_compress`,
+    /// `min_gain`) use the NEW level's strategy. Picks a level pair that
+    /// crosses strategy bands (Fastest=Fast band → Best=BtUltra2 band)
+    /// so a missed sync would leave the construction-time tag visible
+    /// and trip the assertion.
+    #[cfg(feature = "std")]
+    #[test]
+    fn set_compression_level_then_compress_refreshes_strategy_tag() {
+        use super::CompressionLevel;
+        use crate::encoding::strategy::StrategyTag;
+
+        let data = vec![0xABu8; 256];
+        let mut out = Vec::new();
+        let mut compressor = FrameCompressor::new(CompressionLevel::Fastest);
+        let initial_tag = compressor.state.strategy_tag;
+        assert_eq!(
+            initial_tag,
+            StrategyTag::for_compression_level(CompressionLevel::Fastest),
+            "construction-time strategy_tag must reflect initial level",
+        );
+
+        // Switch to a level whose resolved strategy lives in a different
+        // band, then run a full compress cycle — the matcher.reset()
+        // inside `compress` is the only site that can refresh the tag.
+        compressor.set_compression_level(CompressionLevel::Best);
+        compressor.set_source(data.as_slice());
+        compressor.set_drain(&mut out);
+        compressor.compress();
+
+        let new_tag = compressor.state.strategy_tag;
+        let expected = StrategyTag::for_compression_level(CompressionLevel::Best);
+        assert_eq!(
+            new_tag, expected,
+            "strategy_tag must follow set_compression_level → compress, \
+             got {new_tag:?} expected {expected:?}",
+        );
+        assert_ne!(
+            new_tag, initial_tag,
+            "test fixture invariant: chosen levels must resolve to \
+             different StrategyTag variants",
+        );
+    }
 }

From a4b484b3c8e0cfbdb8751036fd8162a3686b917d Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 06:12:55 +0300
Subject: [PATCH 13/14] docs(encoder): drop hard-coded byte counts in
 literal-gate rationale
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`min_literals_to_compress` doc: replace the "≥30 bytes" HUF
tree-description lower bound with neutral phrasing — small alphabets
with low max symbols can serialize substantially smaller descriptions.
The exact byte cost depends on alphabet size and max symbol, not a
fixed minimum; the surrounding sentence already conveys that
overhead-dominates-payload is what drives the per-strategy floor.

RLE-shortcut docstrings (emit and estimator branches): replace
"2 bytes vs `1 + len`" with the header-helper-aware form. Both RLE
and raw use the same `uncompressed_literals_header_bytes(len)`
(1/2/3/5 bytes by length tier), so the size relationship is
RLE = lhSize + 1, raw = lhSize + len → equal at `len == 1`, RLE
smaller by `len - 1` for `len >= 2`. The previous wording was only
correct for the `len < 32` tier where lhSize = 1.

No code changes; clippy and fmt clean.
---
 zstd/src/encoding/blocks/compressed.rs | 36 +++++++++++++++-----------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index 9ef6d2b5..ef3698e6 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -20,8 +20,10 @@ const MAX_NB_BLOCK_SPLITS: usize = 196;
 /// Formula: `shift = MIN(9 - donor_strategy, 3); mintc = (huf_repeat ==
 /// valid) ? 6 : (8 << shift)`. With huf reuse available, the per-block huf
 /// header overhead is gone, so the cheap floor is 6 bytes. Without it, the
-/// huf tree-description overhead (≥30 bytes) dominates on small payloads
-/// and we'd produce a worse-than-raw block — donor's shift table gives
+/// huf tree-description must be serialized per block — alphabet size and
+/// max symbol determine its exact byte cost, but on payloads near the
+/// per-strategy floor that overhead dominates and the compressed section
+/// loses to raw. Donor's shift table picks the floor per strategy:
 /// strategy 1..6 → 64 bytes, strategy 7 (btopt) → 32, strategy 8 (btultra)
 /// → 16, strategy 9 (btultra2) → 8.
 ///
@@ -405,13 +407,16 @@ fn encode_block_parts_with_sequence_scratch<M: Matcher>(
     // after passing the `min_lits` gate and running a full HUF compress —
     // so donor emits raw for any all-identical section under `min_lits`
     // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF
-    // reuse). RLE is at worst equal to raw on `len == 1`
-    // (both produce a 2-byte section) and strictly smaller for `len >= 2`
-    // (2 bytes vs `1 + len`), so our pre-check fires for ANY all-identical
-    // literal slice regardless of strategy/min_lits. This produces strictly
-    // smaller output than donor on the small all-identical edges while
-    // still matching donor on `>= min_lits` inputs (where donor's
-    // compress+`cLitSize==1` path reaches the same RLE block).
+    // reuse). RLE and raw share the same lhSize for a given `len`
+    // (both use `uncompressed_literals_header_bytes`), so RLE = lhSize + 1
+    // and raw = lhSize + len. That makes RLE equal to raw on `len == 1`
+    // and smaller by exactly `len - 1` bytes for `len >= 2`, regardless of
+    // the lhSize tier (1 / 2 / 3 / 5 bytes). Our pre-check fires for ANY
+    // all-identical literal slice regardless of strategy/min_lits.
+    // This produces strictly smaller output than donor on the small
+    // all-identical edges while still matching donor on `>= min_lits`
+    // inputs (where donor's compress+`cLitSize==1` path reaches the same
+    // RLE block).
     // Note the order — RLE pre-check runs BEFORE `min_lits`;
     // `estimate_literals_section_bytes` mirrors this exactly so probe
     // costs match emit byte-for-byte.
@@ -560,12 +565,13 @@ fn estimate_literals_section_bytes(
 ) -> usize {
     // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches
     // **in the same order**. The emitter pre-checks `all_identical`
-    // (any non-empty section) BEFORE the `min_lits` gate — on
-    // all-identical inputs RLE is equal to raw at `len == 1` (both
-    // 2 bytes) and strictly smaller for `len >= 2`, so it is never
-    // worse than raw and is selected regardless of strategy. Estimator
-    // must use the same ordering and predicate so probe costs match
-    // emit byte-for-byte.
+    // (any non-empty section) BEFORE the `min_lits` gate — RLE and raw
+    // share `uncompressed_literals_header_bytes(len)` (1/2/3/5 bytes by
+    // length tier), so on all-identical inputs RLE = lhSize + 1 equals
+    // raw = lhSize + len at `len == 1` and is smaller by `len - 1` for
+    // `len >= 2`. RLE is never worse than raw, so it is selected
+    // regardless of strategy. Estimator must use the same ordering and
+    // predicate so probe costs match emit byte-for-byte.
     if !literals.is_empty() && all_bytes_identical(literals) {
         *last_huff = None;
         return uncompressed_literals_header_bytes(literals.len()) + 1;

From 3933521be56aa3020c6e8bbe9df2b18b21d92a52 Mon Sep 17 00:00:00 2001
From: Dmitry Prudnikov <mail@polaz.com>
Date: Tue, 19 May 2026 06:29:11 +0300
Subject: [PATCH 14/14] test(encoder): tighten strategy_tag regressions to
 actually fail without the sync
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three factual cleanups so the literal-gate tests provably guard the
reset-time `strategy_tag` sync and the docstrings around the RLE
pre-check stop misrepresenting historical behavior.

- `set_compression_level_then_compress_refreshes_strategy_tag`: switch
  from `CompressionLevel::Best` to `CompressionLevel::Level(20)` so the
  post-switch tag genuinely crosses into the BtUltra2 band. `Best`
  resolves to `Lazy` today, which keeps `min_literals_to_compress` in
  the same `shift=3 → 64-byte` band as `Fast` and weakens the signal.
  Add an explicit `assert_eq!(expected, StrategyTag::BtUltra2)` fixture
  invariant so future renumbering breaks loudly.
- `ensure_frame_started_refreshes_stale_strategy_tag_at_reset`
  (renamed): the previous version constructed the encoder at the same
  level it asserted, so `StreamingEncoder::new`'s own init satisfied
  the assertion even if the reset-time sync were deleted. Now the test
  deliberately corrupts `state.strategy_tag` to a sentinel
  (`BtUltra2`) before the first write and asserts the reset path
  restores the legitimate tag — a missing sync leaves the sentinel
  visible and trips the assertion.
- RLE pre-check docstring on the emit side: drop the "6/7 bytes with
  HUF reuse" example. With HUF reuse `min_lits == 6`, so 6/7-byte
  sections are NOT below the gate; donor would HUF-encode them and
  reach RLE via `cLitSize == 1`. The justification for the pre-check
  is the genuine under-gate range (e.g. 8..63 bytes at
  fast/dfast/greedy/lazy without HUF reuse), not the HUF-reuse band.
- Estimator/emit parity test prose: the historical behavior for
  6/7-byte all-identical sections under the prior hardcoded `len >= 8`
  gate was RAW (not HUF). Correct both the per-case rationale and the
  block-level docstring; the test cases themselves are unchanged.

534/534 lib tests pass; clippy and fmt clean.
---
 zstd/src/encoding/blocks/compressed.rs | 25 +++++-----
 zstd/src/encoding/frame_compressor.rs  | 23 ++++++---
 zstd/src/encoding/streaming_encoder.rs | 64 ++++++++++++++------------
 3 files changed, 66 insertions(+), 46 deletions(-)

diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs
index ef3698e6..f1c586df 100644
--- a/zstd/src/encoding/blocks/compressed.rs
+++ b/zstd/src/encoding/blocks/compressed.rs
@@ -406,8 +406,8 @@ fn encode_block_parts_with_sequence_scratch<M: Matcher>(
     // the `cLitSize == 1` branch (`zstd_compress_literals.c:192-201`)
     // after passing the `min_lits` gate and running a full HUF compress —
     // so donor emits raw for any all-identical section under `min_lits`
-    // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF
-    // reuse). RLE and raw share the same lhSize for a given `len`
+    // (e.g. 8..63 bytes at fast/dfast/greedy/lazy without HUF reuse).
+    // RLE and raw share the same lhSize for a given `len`
     // (both use `uncompressed_literals_header_bytes`), so RLE = lhSize + 1
     // and raw = lhSize + len. That makes RLE equal to raw on `len == 1`
     // and smaller by exactly `len - 1` bytes for `len >= 2`, regardless of
@@ -1966,9 +1966,11 @@ mod tests {
         // estimator's predicted size MUST equal the bytes the emitter
         // actually writes. Cases include: (a) fresh state with
         // `last_huff_table: None` covering the strategy-specific
-        // `min_lits` band (8/16/32/64), (b) seeded HUF-reuse state
-        // covering the lowered floor of 6 and 6/7-byte all-identical
-        // sections that previously hit the HUF path instead of RLE,
+        // `min_lits` band (8/16/32/64), (b) seeded HUF-reuse state at
+        // the lowered floor of 6 — under the prior hardcoded `len >= 8`
+        // gate 6/7-byte all-identical sections went raw, now they pass
+        // `min_lits == 6` and the donor-parity HUF+`cLitSize==1` path
+        // would route them to RLE; the pre-check shortcuts that path,
         // (c) sub-`min_lits` all-identical sections that take the
         // RLE pre-check regardless of strategy.
         type Inputs = &'static [(usize, bool)];
@@ -1997,12 +1999,13 @@ mod tests {
                 false,
                 &[(8, true), (31, true), (32, false)],
             ),
-            // HUF reuse path: floor drops to 6, so 6/7-byte sections
-            // previously hit the HUF compress path. With the RLE
-            // pre-check now non-empty-only, all-identical 6/7-byte
-            // sections must route through RLE and stay byte-equivalent
-            // estimator-vs-emit. Also exercise non-identical 6-byte
-            // raw fallback and 16-byte HUF reuse path.
+            // HUF reuse path: floor drops to 6. Under the prior
+            // hardcoded `len >= 8` gate 6/7-byte sections went raw;
+            // post-fix, all-identical 6/7-byte sections take the RLE
+            // pre-check and stay byte-equivalent estimator-vs-emit
+            // (donor parity would route them HUF→cLitSize==1→RLE).
+            // Also exercise non-identical 6-byte raw fallback and
+            // 16-byte HUF reuse path.
             (
                 StrategyTag::Lazy,
                 true,
diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs
index d5e76aba..140deece 100644
--- a/zstd/src/encoding/frame_compressor.rs
+++ b/zstd/src/encoding/frame_compressor.rs
@@ -2227,10 +2227,14 @@ mod tests {
     /// Regression: `set_compression_level` followed by `compress()` must
     /// refresh `state.strategy_tag` through the reset-time sync so the
     /// literal-compression gates (`min_literals_to_compress`,
-    /// `min_gain`) use the NEW level's strategy. Picks a level pair that
-    /// crosses strategy bands (Fastest=Fast band → Best=BtUltra2 band)
-    /// so a missed sync would leave the construction-time tag visible
-    /// and trip the assertion.
+    /// `min_gain`) use the NEW level's strategy. Picks a level pair
+    /// that genuinely crosses strategy bands — `Fastest` resolves to
+    /// `Fast`, `Level(20)` resolves to `BtUltra2` — so a missed sync
+    /// would leave the construction-time tag visible and trip the
+    /// assertion. `CompressionLevel::Best` would also pass type-wise
+    /// but resolves to `Lazy` today, which keeps `min_literals_to_compress`
+    /// in the same `shift=3 → 64-byte` band as `Fast` and weakens the
+    /// signal that the gate floor actually moved.
     #[cfg(feature = "std")]
     #[test]
     fn set_compression_level_then_compress_refreshes_strategy_tag() {
@@ -2250,18 +2254,25 @@ mod tests {
         // Switch to a level whose resolved strategy lives in a different
         // band, then run a full compress cycle — the matcher.reset()
         // inside `compress` is the only site that can refresh the tag.
-        compressor.set_compression_level(CompressionLevel::Best);
+        let new_level = CompressionLevel::Level(20);
+        compressor.set_compression_level(new_level);
         compressor.set_source(data.as_slice());
         compressor.set_drain(&mut out);
         compressor.compress();
 
         let new_tag = compressor.state.strategy_tag;
-        let expected = StrategyTag::for_compression_level(CompressionLevel::Best);
+        let expected = StrategyTag::for_compression_level(new_level);
         assert_eq!(
             new_tag, expected,
             "strategy_tag must follow set_compression_level → compress, \
              got {new_tag:?} expected {expected:?}",
         );
+        assert_eq!(
+            expected,
+            StrategyTag::BtUltra2,
+            "test fixture invariant: Level(20) must resolve to BtUltra2 \
+             so the post-switch tag visibly crosses the band boundary",
+        );
         assert_ne!(
             new_tag, initial_tag,
             "test fixture invariant: chosen levels must resolve to \
diff --git a/zstd/src/encoding/streaming_encoder.rs b/zstd/src/encoding/streaming_encoder.rs
index 78e696a5..5499f371 100644
--- a/zstd/src/encoding/streaming_encoder.rs
+++ b/zstd/src/encoding/streaming_encoder.rs
@@ -1183,40 +1183,46 @@ mod tests {
     }
 
     #[test]
-    fn ensure_frame_started_syncs_strategy_tag_from_compression_level() {
+    fn ensure_frame_started_refreshes_stale_strategy_tag_at_reset() {
         // The literal-compression gates (`min_literals_to_compress`,
-        // `min_gain`) read `state.strategy_tag`. Regression: every reset
-        // site MUST refresh that tag from the active compression level,
-        // otherwise gates use stale strategy after a frame start.
+        // `min_gain`) read `state.strategy_tag`. Regression: every
+        // reset site MUST refresh that tag from the active compression
+        // level — relying on construction-time initialization alone is
+        // not enough, because later mutations or reuse patterns can
+        // leave the tag stale.
+        //
+        // To exercise the RESET-time refresh (not just the
+        // construction-time init that `StreamingEncoder::new` does for
+        // free), this test deliberately corrupts `state.strategy_tag`
+        // to a value that does NOT match the active level, then
+        // triggers `ensure_frame_started` and asserts the reset path
+        // wrote the correct tag back. If the sync line in
+        // `ensure_frame_started` were deleted, the corrupted value
+        // would survive the write and fail the assertion.
         use crate::encoding::strategy::StrategyTag;
-        // Pick a level pair whose resolved StrategyTag differs so a
-        // missed sync would visibly leak the construction-time value.
-        let cases = [
-            (
-                CompressionLevel::Fastest,
-                StrategyTag::for_compression_level(CompressionLevel::Fastest),
-            ),
-            (
-                CompressionLevel::Default,
-                StrategyTag::for_compression_level(CompressionLevel::Default),
-            ),
-            (
-                CompressionLevel::Better,
-                StrategyTag::for_compression_level(CompressionLevel::Better),
-            ),
-            (
-                CompressionLevel::Best,
-                StrategyTag::for_compression_level(CompressionLevel::Best),
-            ),
-        ];
-        for (level, expected_tag) in cases {
+        for level in [
+            CompressionLevel::Fastest,
+            CompressionLevel::Default,
+            CompressionLevel::Better,
+            CompressionLevel::Best,
+        ] {
+            let expected = StrategyTag::for_compression_level(level);
             let mut encoder = StreamingEncoder::new(Vec::new(), level);
-            // First write triggers `ensure_frame_started` → matcher reset
-            // → strategy_tag sync.
+            // Pick a sentinel that differs from the legitimate tag so
+            // a missing reset-time sync is observable. BtUltra2 is the
+            // most-aggressive variant; the four levels above resolve
+            // to Fast/Dfast/Lazy/Lazy respectively, none equal to it.
+            let sentinel = StrategyTag::BtUltra2;
+            assert_ne!(
+                expected, sentinel,
+                "sentinel must differ from the legitimate tag at level {level:?}",
+            );
+            encoder.state.strategy_tag = sentinel;
             encoder.write_all(b"x").unwrap();
             assert_eq!(
-                encoder.state.strategy_tag, expected_tag,
-                "strategy_tag drift after frame start at level {level:?}",
+                encoder.state.strategy_tag, expected,
+                "reset-time strategy_tag sync missing at level {level:?}: \
+                 sentinel survived `ensure_frame_started`",
             );
             let _ = encoder.finish().unwrap();
         }