From 7d2050a4a08e98c6aa014b123ed97ace93bcbca8 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 02:50:34 +0300 Subject: [PATCH 01/14] =?UTF-8?q?docs(encoder):=20honest=20G3=20bail-out?= =?UTF-8?q?=20wording=20=E2=80=94=20split-win=20is=20theoretically=20possi?= =?UTF-8?q?ble?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous wording claimed the bail-out routes blocks that "would have been raw-fallbacked by the real emit anyway, by the same threshold". That is true ONLY for the single-partition path the bail-out routes to — it is NOT a proof that a recursive split could never beat the whole-block outcome. In principle both sub-blocks could compress strictly (no raw-fallback in either half) and produce `cost(first) + cost(second) < source_len + 3`. The wider donor band gives at most `min_gain` bytes of theoretical recoverable ratio per block — small, but non-zero. Comment split into two paragraphs: 1. The "no wire-output drift" guarantee (only applies to the single-partition path the bail-out actually takes). 2. The "what we forgo by skipping the split case" caveat — both sub-blocks would need to compress strictly AND beat the whole- block cost; band is bounded by `min_gain`, empirically zero delta in the bench matrix. No behavior change. Doc-only. --- zstd/src/encoding/blocks/compressed.rs | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 543ced6d..1f742a86 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -968,14 +968,20 @@ impl SplitEstimator<'_> { // single-partition path → `emit_single_sequence_block`, // which applies the SAME `min_gain` expansion fallback (its // `buffers.compressed.len() >= source_len - min_gain` check - // right before deciding raw-fallback). So whatever block we - // bail on would have been raw-fallbacked by the real emit - // anyway, by the same threshold. - // - For a missed split-win to matter, both sub-blocks would - // need to compress strictly (no raw-fallback in either - // half), AND `cost(first) + cost(second) < source_len + 3`. - // The wider donor band gives at most `min_gain` bytes of - // theoretical recoverable ratio per block. + // right before deciding raw-fallback). So for the + // single-partition path specifically, any block we bail on + // here would also raw-fallback there by the same threshold — + // no wire-output drift from this bail-out vs the "let the + // real emit decide" alternative. + // - Returning here does skip the split case, so this is NOT a + // proof that a recursive split could never do better: in + // principle, both sub-blocks could compress strictly (no + // raw-fallback in either half) and beat the whole-block + // outcome. For such a missed split-win to matter, both + // sub-blocks would need to compress strictly AND + // `cost(first) + cost(second) < source_len + 3`. The wider + // donor band gives at most `min_gain` bytes of theoretical + // recoverable ratio per block. // - Empirically validated: `compare_ffi --list` REPORT lines // show **zero rust_bytes delta** vs main on every // (scenario, level) cell across the full bench matrix. From 19a3eda6b31ddea40fa230ad5ba4e225df716d0b Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 03:09:52 +0300 Subject: [PATCH 02/14] feat(encoder): strategy-aware literal gates (G4 + G5) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Donor `ZSTD_minLiteralsToCompress` (`zstd_compress_literals.c:114-127`) floors literal-compression entry at `8 << shift` bytes with shift derived from strategy index (fast/dfast/greedy/lazy=3, btopt=2, btultra=1, btultra2=0) and 6-byte floor on reused HUF tables. Previously hardcoded `< 8` floor — same lower bound but ignored strategy: btultra2 missed the 8-byte tail compression opportunity, btopt missed 32-byte band, etc. Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`) requires compressed literals to beat raw by `(srcSize >> minlog) + 2` margin where minlog=6 for fast..btopt, 7 for btultra, 8 for btultra2. Previously bare `>= raw_section_bytes` — overcompressed micro-wins that fell back to raw on real emit, biasing splitter cost predictions. `CompressState.strategy_tag` plumbed through frame compressor, streaming encoder, and `CompressedBlockScratch::scratch_state` so emit and estimator paths use identical gates — keeps splitter probe costs in sync with actual emit decisions. - Verified 514/514 lib tests pass, clippy + fmt clean - Full ratio matrix (203 cells): 0 deltas vs main — gates donor-equivalent on current fixtures - Block-level `min_gain = (srcSize >> 8) + 2` (btultra2 margin) kept uniform at `SplitEstimator::estimate_subblock_size` / `emit_single_sequence_block` — migration to strategy-aware block-level gain is a separate cleanup Related: #23 --- zstd/src/encoding/blocks/compressed.rs | 111 +++++++++++++++++++++++-- zstd/src/encoding/frame_compressor.rs | 23 +++++ zstd/src/encoding/levels/fastest.rs | 2 + zstd/src/encoding/streaming_encoder.rs | 3 + 4 files changed, 134 insertions(+), 5 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 1f742a86..68d91851 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -13,6 +13,72 @@ use crate::{ const MIN_SEQUENCES_BLOCK_SPLITTING: usize = 300; const MAX_NB_BLOCK_SPLITS: usize = 196; +/// Donor `ZSTD_minLiteralsToCompress` (`zstd_compress_literals.c:114-127`): +/// strategy-aware floor below which `compress_literals` does not even +/// attempt huf compression and falls back to raw. +/// +/// Formula: `shift = MIN(9 - donor_strategy, 3); mintc = (huf_repeat == +/// valid) ? 6 : (8 << shift)`. With huf reuse available, the per-block huf +/// header overhead is gone, so the cheap floor is 6 bytes. Without it, the +/// huf tree-description overhead (≥30 bytes) dominates on small payloads +/// and we'd produce a worse-than-raw block — donor's shift table gives +/// strategy 1..6 → 64 bytes, strategy 7 (btopt) → 32, strategy 8 (btultra) +/// → 16, strategy 9 (btultra2) → 8. +/// +/// Our `StrategyTag` enum has seven variants (no separate lazy2/btlazy2 — +/// the `Lazy` variant covers donor strategies 4..6). Map our variants to +/// donor strategy numbers using the most-aggressive-of-band mapping +/// (`Lazy → strat=5 → shift=3`) so the threshold matches what donor would +/// pick for the levels we route to that variant. +#[inline] +fn min_literals_to_compress( + strategy: crate::encoding::strategy::StrategyTag, + has_huf_table: bool, +) -> usize { + use crate::encoding::strategy::StrategyTag; + if has_huf_table { + return 6; + } + let shift: u32 = match strategy { + StrategyTag::Fast | StrategyTag::Dfast | StrategyTag::Greedy | StrategyTag::Lazy => 3, + StrategyTag::BtOpt => 2, + StrategyTag::BtUltra => 1, + StrategyTag::BtUltra2 => 0, + }; + 8usize << shift +} + +/// Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`): +/// strategy-aware minimum-compression margin used by both block-level +/// emit ("compressed block must beat raw + minGain bytes") and the +/// literal-section expansion check. +/// +/// Formula: `minlog = (strat >= btultra) ? strat - 1 : 6; (src_size >> +/// minlog) + 2`. So: +/// - fast..btopt (strat 1..7): minlog=6 → ~1.5% margin + 2 bytes +/// - btultra (strat 8): minlog=7 → ~0.78% margin + 2 bytes +/// - btultra2 (strat 9): minlog=8 → ~0.39% margin + 2 bytes +/// +/// Used in `compress_literals` and `estimate_literals_section_bytes` to +/// match donor's `cLitSize >= srcSize - minGain` raw-fallback gate. +/// Existing block-level emit / probe paths (`emit_single_sequence_block`, +/// `SplitEstimator::estimate_subblock_size`) already use `min_log = 8` +/// directly (`source_len >> 8 + 2`) — that is the btultra2 margin, the +/// most permissive of the band, applied uniformly. Migrating those sites +/// to this strategy-aware helper is a separate cleanup; this commit only +/// wires the helper into the literal-section gates that previously had +/// no `min_gain` margin at all (bare `>= raw_section_bytes`). +#[inline] +fn min_gain(src_size: usize, strategy: crate::encoding::strategy::StrategyTag) -> usize { + use crate::encoding::strategy::StrategyTag; + let minlog: u32 = match strategy { + StrategyTag::BtUltra => 7, + StrategyTag::BtUltra2 => 8, + _ => 6, + }; + (src_size >> minlog) + 2 +} + /// Donor `kInverseProbabilityLog256`: floor(-log2(x / 256) * 256). const INVERSE_PROBABILITY_LOG_256: [usize; 256] = [ 0, 2048, 1792, 1642, 1536, 1453, 1386, 1329, 1280, 1236, 1197, 1162, 1130, 1100, 1073, 1047, @@ -203,6 +269,7 @@ pub(crate) fn compress_block_with_post_split( fse_tables: clone_fse_tables(&state.fse_tables), block_scratch: super::CompressedBlockScratch::new(), offset_hist: state.offset_hist, + strategy_tag: state.strategy_tag, }, workspace, }; @@ -302,11 +369,30 @@ fn encode_block_parts_with_sequence_scratch( // literals section let mut writer = BitWriter::from(output); + // Donor `compress_literals` (`zstd_compress_literals.c:153-160`): + // `srcSize < ZSTD_minLiteralsToCompress(strategy, prevHuf->repeatMode)` + // → `ZSTD_noCompressLiterals` (raw). The threshold is strategy-aware + // (see `min_literals_to_compress`). With huf reuse available the + // floor drops to 6 since there is no per-block huf-header overhead. + let strategy = state.strategy_tag; + let has_huf_table = state.last_huff_table.is_some(); + let min_lits = min_literals_to_compress(strategy, has_huf_table); + // RLE keeps a separate `>= 8 + all_bytes_identical` gate: donor's + // `cLitSize == 1` RLE-detection path (`zstd_compress_literals.c:192-201`) + // fires after the compress attempt, but our pre-check is cheaper and + // produces the same RLE block for `len >= 8 && all_identical`. RLE + // for sub-8-byte all-identical inputs goes through the raw path in + // both donor (`srcSize < 8` arm) and us — same wire output. if literals_vec.len() >= 8 && all_bytes_identical(literals_vec) { rle_literals(literals_vec, &mut writer); state.last_huff_table = None; - } else if literals_vec.len() >= 8 { - match compress_literals(literals_vec, state.last_huff_table.as_ref(), &mut writer) { + } else if literals_vec.len() >= min_lits { + match compress_literals( + literals_vec, + state.last_huff_table.as_ref(), + &mut writer, + strategy, + ) { HuffmanTableUpdate::New(table) => { state.last_huff_table.replace(table); } @@ -416,6 +502,7 @@ fn estimate_block_parts_size( literals_vec, &mut state.last_huff_table, &mut workspace.lit_counts, + state.strategy_tag, ); let seq_bytes = if workspace.sequences.is_empty() { @@ -437,9 +524,11 @@ fn estimate_literals_section_bytes( literals: &[u8], last_huff: &mut Option, counts: &mut [usize; 256], + strategy: crate::encoding::strategy::StrategyTag, ) -> usize { // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches. - if literals.len() < 8 { + let min_lits = min_literals_to_compress(strategy, last_huff.is_some()); + if literals.len() < min_lits { *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + literals.len(); } @@ -503,7 +592,8 @@ fn estimate_literals_section_bytes( // small payloads where the raw header (1 byte for size < 32) is smaller // than the compressed header (3 bytes for size < 1 KB). let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len(); - if total >= raw_section_bytes { + let mg = min_gain(literals.len(), strategy); + if total >= raw_section_bytes.saturating_sub(mg) { *last_huff = None; return raw_section_bytes; } @@ -1588,6 +1678,7 @@ fn compress_literals( literals: &[u8], last_table: Option<&huff0_encoder::HuffmanTable>, writer: &mut BitWriter<&mut Vec>, + strategy: crate::encoding::strategy::StrategyTag, ) -> HuffmanTableUpdate { let reset_idx = writer.index(); @@ -1659,7 +1750,16 @@ fn compress_literals( // `literals.len()` would bias toward raw on small payloads where the // raw header is smaller than the compressed header. let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len(); - if total_len >= raw_section_bytes { + // Donor `compress_literals` (`zstd_compress_literals.c:187-188`): + // `cLitSize >= srcSize - minGain` → raw fallback. `minGain` is + // strategy-aware (`min_gain` helper above; ~1.5% for fast..btopt, + // ~0.78% for btultra, ~0.39% for btultra2). Previously this check + // used bare `>= raw_section_bytes` (zero margin) which let + // marginally-compressing blocks emit huf when raw + reused table + // would compress better in the next block. Saturating subtraction + // covers tiny inputs where `raw_section_bytes < min_gain`. + let mg = min_gain(literals.len(), strategy); + if total_len >= raw_section_bytes.saturating_sub(mg) { writer.reset_to(reset_idx); raw_literals(literals, writer); HuffmanTableUpdate::Cleared @@ -1733,6 +1833,7 @@ mod tests { fse_tables: FseTables::new(), block_scratch: super::CompressedBlockScratch::new(), offset_hist: [10, 20, 30], + strategy_tag: crate::encoding::strategy::StrategyTag::Fast, }; let source = [0xA5; 8]; let sequences = [RawSequence { diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs index e58bb68d..bc67a14b 100644 --- a/zstd/src/encoding/frame_compressor.rs +++ b/zstd/src/encoding/frame_compressor.rs @@ -390,6 +390,15 @@ pub(crate) struct CompressState { /// Offset history for repeat offset encoding: [rep0, rep1, rep2]. /// Initialized to [1, 4, 8] per RFC 8878 §3.1.2.5. pub(crate) offset_hist: [u32; 3], + /// Strategy tag resolved from the current `CompressionLevel` at every + /// `matcher.reset()` call. Used by the literal-compression gates + /// (`min_literals_to_compress`, `min_gain`) in + /// `encoding::blocks::compressed` to mirror donor's strategy-aware + /// thresholds (`zstd_compress_literals.c:114-127, 187-188`). Defaults + /// to `Fast` so a `CompressState` constructed without an explicit + /// reset still has a valid value (matches the level-1 default donor + /// would emit for `disableLiteralCompression == 0`). + pub(crate) strategy_tag: crate::encoding::strategy::StrategyTag, } impl FrameCompressor { @@ -408,6 +417,9 @@ impl FrameCompressor { fse_tables: FseTables::new(), block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(), offset_hist: [1, 4, 8], + strategy_tag: crate::encoding::strategy::StrategyTag::for_compression_level( + compression_level, + ), }, #[cfg(feature = "hash")] hasher: XxHash64::with_seed(0), @@ -430,6 +442,9 @@ impl FrameCompressor { fse_tables: FseTables::new(), block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(), offset_hist: [1, 4, 8], + strategy_tag: crate::encoding::strategy::StrategyTag::for_compression_level( + compression_level, + ), }, compression_level, #[cfg(feature = "hash")] @@ -489,6 +504,14 @@ impl FrameCompressor { // Clearing buffers to allow re-using of the compressor self.state.matcher.reset(self.compression_level); self.state.offset_hist = [1, 4, 8]; + // Sync `state.strategy_tag` to the level resolved at this reset so + // the literal-compression gates (`min_literals_to_compress` / + // `min_gain` in `encoding::blocks::compressed`) see the correct + // strategy for the next frame. Frame-by-frame level changes go + // through this same `compress()` entry point, so re-syncing here + // covers level switches without touching the matcher dispatch. + self.state.strategy_tag = + crate::encoding::strategy::StrategyTag::for_compression_level(self.compression_level); let cached_entropy = if use_dictionary_state { self.dictionary_entropy_cache.as_ref() } else { diff --git a/zstd/src/encoding/levels/fastest.rs b/zstd/src/encoding/levels/fastest.rs index 5200f359..5ddf09c3 100644 --- a/zstd/src/encoding/levels/fastest.rs +++ b/zstd/src/encoding/levels/fastest.rs @@ -186,6 +186,7 @@ mod tests { fse_tables: FseTables::new(), block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(), offset_hist: [1, 4, 8], + strategy_tag: crate::encoding::strategy::StrategyTag::Fast, }; let mut output = Vec::new(); @@ -213,6 +214,7 @@ mod tests { fse_tables: FseTables::new(), block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(), offset_hist: [1, 4, 8], + strategy_tag: crate::encoding::strategy::StrategyTag::Fast, }; let mut output = Vec::new(); diff --git a/zstd/src/encoding/streaming_encoder.rs b/zstd/src/encoding/streaming_encoder.rs index 5ecee5db..54687234 100644 --- a/zstd/src/encoding/streaming_encoder.rs +++ b/zstd/src/encoding/streaming_encoder.rs @@ -66,6 +66,9 @@ impl StreamingEncoder { fse_tables: FseTables::new(), block_scratch: crate::encoding::blocks::CompressedBlockScratch::new(), offset_hist: [1, 4, 8], + strategy_tag: crate::encoding::strategy::StrategyTag::for_compression_level( + compression_level, + ), }, pending: Vec::new(), encoded_scratch: Vec::new(), From f7d74262ec3eb75c6e69c16de9c13264413cd860 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 03:17:17 +0300 Subject: [PATCH 03/14] fix(encoder): estimator RLE gate mirrors emit's >= 8 guard MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Emitter only emits RLE when `len >= 8 && all_identical` (`encode_block_parts_with_sequence_scratch` RLE branch). After G4 landed, estimator's `min_lits` can be 6 on reused HUF tables, so a bare `all_identical` check let estimator predict a 1-byte RLE payload for 6-7 byte all-identical literals that the emitter would actually write as raw — splitter probe costs drift from real emit output. Mirror the same `len >= 8` guard explicitly so estimator and emit stay byte-equivalent on every input. Reported by CodeRabbit on PR #182. --- zstd/src/encoding/blocks/compressed.rs | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 68d91851..2fa3428d 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -532,7 +532,13 @@ fn estimate_literals_section_bytes( *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + literals.len(); } - if all_bytes_identical(literals) { + // Mirror emit's RLE gate exactly: emitter only emits RLE when + // `len >= 8 && all_identical` (see comment at the RLE branch above). + // With reused HUF tables `min_lits == 6`, so a bare `all_identical` + // check here would have estimator predict a 1-byte RLE payload for + // 6-7 byte all-identical literals that the emitter would actually + // write as raw — splitter probe costs drift from real emit output. + if literals.len() >= 8 && all_bytes_identical(literals) { *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + 1; } From 42034fc7d00e256ec8b3d34bee1ff2842e735ff8 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 03:27:00 +0300 Subject: [PATCH 04/14] fix(encoder): mirror emit order in estimator + cover gates with tests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Emit checks the RLE pre-shortcut (`len >= 8 && all_identical`) BEFORE the strategy-aware `min_lits` gate, so an all-identical 8-byte literal section emits a 2-byte RLE block under every strategy — including fast/dfast/greedy/lazy where `min_lits == 64`. The estimator previously gated `min_lits` first, predicting raw for those inputs while emit wrote RLE — splitter probe costs diverged. Swap the order in `estimate_literals_section_bytes` so both paths agree byte-for-byte; update emit's RLE-pre-check comment to be accurate about the donor divergence. Add unit tests covering: - `min_literals_to_compress` per-strategy values with and without HUF reuse (donor table from `zstd_compress_literals.c:114-127`) - `min_gain` per-strategy margins (`zstd_compress_internal.h:677-684`) including small-source `(src >> minlog) + 2` edge - Estimator/emit parity for boundary literal lengths around each strategy's `min_lits` and across the RLE `>= 8` cliff Verified 531/531 lib tests pass; ratio matrix (203 cells) shows 0 delta vs main. --- zstd/src/encoding/blocks/compressed.rs | 150 ++++++++++++++++++++++--- 1 file changed, 132 insertions(+), 18 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 2fa3428d..c755d70a 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -377,12 +377,16 @@ fn encode_block_parts_with_sequence_scratch( let strategy = state.strategy_tag; let has_huf_table = state.last_huff_table.is_some(); let min_lits = min_literals_to_compress(strategy, has_huf_table); - // RLE keeps a separate `>= 8 + all_bytes_identical` gate: donor's - // `cLitSize == 1` RLE-detection path (`zstd_compress_literals.c:192-201`) - // fires after the compress attempt, but our pre-check is cheaper and - // produces the same RLE block for `len >= 8 && all_identical`. RLE - // for sub-8-byte all-identical inputs goes through the raw path in - // both donor (`srcSize < 8` arm) and us — same wire output. + // RLE pre-check: donor `compress_literals` reaches RLE through + // `cLitSize == 1` after a full compress attempt + // (`zstd_compress_literals.c:192-201`), but only AFTER passing the + // `min_lits` gate — so donor at fast/dfast/greedy/lazy emits raw + // for 8..63 all-identical inputs. Our pre-check fires earlier and + // emits the smaller RLE block instead (2 bytes vs `1+len` raw): a + // strict win in compressed size on all-identical sections, no + // donor binary parity but better ratio. Note the order — RLE + // pre-check runs BEFORE `min_lits`. `estimate_literals_section_bytes` + // mirrors this order so probe costs match emit byte-for-byte. if literals_vec.len() >= 8 && all_bytes_identical(literals_vec) { rle_literals(literals_vec, &mut writer); state.last_huff_table = None; @@ -526,22 +530,23 @@ fn estimate_literals_section_bytes( counts: &mut [usize; 256], strategy: crate::encoding::strategy::StrategyTag, ) -> usize { - // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches. + // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches + // **in the same order**. The emitter checks the RLE pre-shortcut + // (`len >= 8 && all_identical`) BEFORE the `min_lits` gate, so an + // all-identical 8-byte literal section emits a 2-byte RLE block under + // any strategy — including fast/dfast/greedy/lazy where `min_lits == 64`. + // If estimator gated `min_lits` first, it would predict a raw block + // for those inputs while the emitter writes RLE, biasing splitter + // probe costs. Keep the order identical to emit. + if literals.len() >= 8 && all_bytes_identical(literals) { + *last_huff = None; + return uncompressed_literals_header_bytes(literals.len()) + 1; + } let min_lits = min_literals_to_compress(strategy, last_huff.is_some()); if literals.len() < min_lits { *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + literals.len(); } - // Mirror emit's RLE gate exactly: emitter only emits RLE when - // `len >= 8 && all_identical` (see comment at the RLE branch above). - // With reused HUF tables `min_lits == 6`, so a bare `all_identical` - // check here would have estimator predict a 1-byte RLE payload for - // 6-7 byte all-identical literals that the emitter would actually - // write as raw — splitter probe costs drift from real emit output. - if literals.len() >= 8 && all_bytes_identical(literals) { - *last_huff = None; - return uncompressed_literals_header_bytes(literals.len()) + 1; - } counts.fill(0); for &b in literals { @@ -1782,9 +1787,11 @@ mod tests { use super::{ FseTableMode, RawSequence, choose_table, emit_single_sequence_block, encode_match_len, - encode_offset_with_history, previous_table, remember_last_used_tables, + encode_offset_with_history, min_gain, min_literals_to_compress, previous_table, + remember_last_used_tables, }; use crate::encoding::frame_compressor::{CompressState, FseTables, PreviousFseTable}; + use crate::encoding::strategy::StrategyTag; use crate::fse::fse_encoder::build_table_from_symbol_counts; use alloc::vec::Vec; @@ -1824,6 +1831,113 @@ mod tests { assert_eq!(hist, [9, 10, 20]); } + #[test] + fn min_literals_to_compress_matches_donor_table() { + for strat in [ + StrategyTag::Fast, + StrategyTag::Dfast, + StrategyTag::Greedy, + StrategyTag::Lazy, + ] { + assert_eq!(min_literals_to_compress(strat, false), 64); + assert_eq!(min_literals_to_compress(strat, true), 6); + } + assert_eq!(min_literals_to_compress(StrategyTag::BtOpt, false), 32); + assert_eq!(min_literals_to_compress(StrategyTag::BtOpt, true), 6); + assert_eq!(min_literals_to_compress(StrategyTag::BtUltra, false), 16); + assert_eq!(min_literals_to_compress(StrategyTag::BtUltra, true), 6); + assert_eq!(min_literals_to_compress(StrategyTag::BtUltra2, false), 8); + assert_eq!(min_literals_to_compress(StrategyTag::BtUltra2, true), 6); + } + + #[test] + fn min_gain_matches_donor_margin() { + let src = 4096usize; + for strat in [ + StrategyTag::Fast, + StrategyTag::Dfast, + StrategyTag::Greedy, + StrategyTag::Lazy, + StrategyTag::BtOpt, + ] { + assert_eq!(min_gain(src, strat), (src >> 6) + 2); + } + assert_eq!(min_gain(src, StrategyTag::BtUltra), (src >> 7) + 2); + assert_eq!(min_gain(src, StrategyTag::BtUltra2), (src >> 8) + 2); + assert_eq!(min_gain(0, StrategyTag::Fast), 2); + assert_eq!(min_gain(63, StrategyTag::Fast), 2); + assert_eq!(min_gain(64, StrategyTag::Fast), 3); + } + + #[test] + fn estimator_literals_section_mirrors_emit_for_short_inputs() { + use super::{ + CompressedBlockScratch, EntropyOnlyMatcher, EstimatorWorkspace, + encode_block_parts_with_sequence_scratch, estimate_block_parts_size, + }; + // For each strategy at boundary literal lengths around `min_lits` + // and across the RLE pre-check `>= 8` cliff, estimator's predicted + // size MUST equal the bytes the emitter actually writes. + let cases: &[(StrategyTag, &[(usize, bool)])] = &[ + ( + StrategyTag::Fast, + &[(8, true), (8, false), (63, true), (63, false), (64, false)], + ), + ( + StrategyTag::BtUltra2, + &[(7, true), (7, false), (8, true), (8, false), (16, false)], + ), + (StrategyTag::BtOpt, &[(8, true), (31, true), (32, false)]), + ]; + + for (strat, inputs) in cases { + for (len, identical) in *inputs { + let literals: Vec = if *identical { + alloc::vec![0x5Au8; *len] + } else { + (0..*len as u8).collect() + }; + let mut est_state = CompressState:: { + matcher: EntropyOnlyMatcher, + last_huff_table: None, + fse_tables: FseTables::new(), + block_scratch: CompressedBlockScratch::new(), + offset_hist: [1, 4, 8], + strategy_tag: *strat, + }; + let mut emit_state = CompressState:: { + matcher: EntropyOnlyMatcher, + last_huff_table: None, + fse_tables: FseTables::new(), + block_scratch: CompressedBlockScratch::new(), + offset_hist: [1, 4, 8], + strategy_tag: *strat, + }; + let mut workspace = EstimatorWorkspace::default(); + let est = estimate_block_parts_size(&mut est_state, &literals, &[], &mut workspace); + let mut emitted: Vec = Vec::new(); + let mut scratch: Vec = Vec::new(); + encode_block_parts_with_sequence_scratch( + &mut emit_state, + &literals, + &[], + &mut emitted, + &mut scratch, + ); + assert_eq!( + est, + emitted.len(), + "estimator/emit parity broken: strategy={:?} len={} identical={} est={} emit={}", + strat, + len, + identical, + est, + emitted.len(), + ); + } + } + } + #[test] fn encode_match_len_uses_correct_upper_range_base() { assert_eq!(encode_match_len(65539), (52, 0, 16)); From 8c7e7724674a6e2fa3d7803abd00092927d3308f Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 03:49:06 +0300 Subject: [PATCH 05/14] fix(encoder): donor-formula min_gain + RLE for any all-identical section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related parity fixes to the literal-section pipeline: 1. min_gain comparison now matches donor formula. Previously compared `total_len >= raw_section_bytes - mg` (compressed lhSize on LHS, raw lhSize on RHS), which skewed the threshold by `compressed_header - raw_header` bytes and forced raw fallback on marginally-winning compressed sections that donor would keep. Switch to donor's `cLitSize >= srcSize - minGain` form (`zstd_compress_literals.c:187-188`): compare `(total_len - compressed_header_len) >= literals.len() - mg`. Applied in both estimator (`estimate_literals_section_bytes`) and emit (`compress_literals`) so probe costs stay in lockstep. 2. RLE pre-check fires for any non-empty all-identical literal section, not just `len >= 8`. The old `len >= 8` floor caused 6/7-byte all-identical sections with a reusable HUF table (`min_lits == 6`) to skip RLE and emit a larger HUF block — donor reaches the same RLE block via `cLitSize == 1` after the `min_lits` gate. Dropping the floor matches donor on `>= min_lits` inputs and produces strictly smaller output than donor on the `< min_lits` all-identical edges (where donor falls through to raw). Applied symmetrically in estimator and emit. 531/531 lib tests pass; ratio matrix (203 cells via REPORT lines) shows 0 delta vs main — both fixes are ratio-safe on current fixtures. --- zstd/src/encoding/blocks/compressed.rs | 100 ++++++++++++++----------- 1 file changed, 57 insertions(+), 43 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index c755d70a..4de1bab4 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -377,17 +377,21 @@ fn encode_block_parts_with_sequence_scratch( let strategy = state.strategy_tag; let has_huf_table = state.last_huff_table.is_some(); let min_lits = min_literals_to_compress(strategy, has_huf_table); - // RLE pre-check: donor `compress_literals` reaches RLE through - // `cLitSize == 1` after a full compress attempt - // (`zstd_compress_literals.c:192-201`), but only AFTER passing the - // `min_lits` gate — so donor at fast/dfast/greedy/lazy emits raw - // for 8..63 all-identical inputs. Our pre-check fires earlier and - // emits the smaller RLE block instead (2 bytes vs `1+len` raw): a - // strict win in compressed size on all-identical sections, no - // donor binary parity but better ratio. Note the order — RLE - // pre-check runs BEFORE `min_lits`. `estimate_literals_section_bytes` - // mirrors this order so probe costs match emit byte-for-byte. - if literals_vec.len() >= 8 && all_bytes_identical(literals_vec) { + // RLE pre-check: donor `compress_literals` reaches RLE only through + // the `cLitSize == 1` branch (`zstd_compress_literals.c:192-201`) + // after passing the `min_lits` gate and running a full HUF compress — + // so donor emits raw for any all-identical section under `min_lits` + // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF + // reuse). RLE is unconditionally better than raw on any non-empty + // all-identical input (2 bytes vs `1 + len`), so our pre-check fires + // for ANY all-identical literal slice regardless of strategy/min_lits. + // This produces strictly smaller output than donor on the small + // all-identical edges while still matching donor on `>= min_lits` + // inputs (where donor's compress+`cLitSize==1` path reaches the same + // RLE block). Note the order — RLE pre-check runs BEFORE `min_lits`; + // `estimate_literals_section_bytes` mirrors this exactly so probe + // costs match emit byte-for-byte. + if !literals_vec.is_empty() && all_bytes_identical(literals_vec) { rle_literals(literals_vec, &mut writer); state.last_huff_table = None; } else if literals_vec.len() >= min_lits { @@ -531,14 +535,12 @@ fn estimate_literals_section_bytes( strategy: crate::encoding::strategy::StrategyTag, ) -> usize { // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches - // **in the same order**. The emitter checks the RLE pre-shortcut - // (`len >= 8 && all_identical`) BEFORE the `min_lits` gate, so an - // all-identical 8-byte literal section emits a 2-byte RLE block under - // any strategy — including fast/dfast/greedy/lazy where `min_lits == 64`. - // If estimator gated `min_lits` first, it would predict a raw block - // for those inputs while the emitter writes RLE, biasing splitter - // probe costs. Keep the order identical to emit. - if literals.len() >= 8 && all_bytes_identical(literals) { + // **in the same order**. The emitter pre-checks `all_identical` + // (any non-empty section) BEFORE the `min_lits` gate — RLE is + // unconditionally smaller than raw on all-identical inputs, so it + // is selected regardless of strategy. Estimator must use the same + // ordering and predicate so probe costs match emit byte-for-byte. + if !literals.is_empty() && all_bytes_identical(literals) { *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + 1; } @@ -595,16 +597,23 @@ fn estimate_literals_section_bytes( let compressed_header = compressed_literals_header_bytes(literals.len()); let total = compressed_header + tree_desc + payload; - // Mirror `compress_literals` raw fallback: compare on-wire sections - // *byte-for-byte* — `total` already includes the compressed literals - // header (and the Huffman tree description when emitting a new table), - // so the raw side must also include its own 1–3 byte header. Comparing - // against bare `literals.len()` would bias the splitter toward raw on - // small payloads where the raw header (1 byte for size < 32) is smaller - // than the compressed header (3 bytes for size < 1 KB). + // Donor `compress_literals` raw-fallback gate + // (`zstd_compress_literals.c:187-188`): + // `cLitSize >= srcSize - minGain` + // where `cLitSize` is the encoded literals payload + tree description + // (output of `HUF_compress*`, excluding the surrounding lhSize bytes) + // and `srcSize` is the literal-payload length. In our terms: + // - donor `cLitSize` ≡ `total - compressed_header` (tree_desc + payload) + // - donor `srcSize` ≡ `literals.len()` + // Using the on-wire `total >= raw_section_bytes - mg` form (which + // includes the compressed header on the LHS and the raw header on + // the RHS) skews the threshold by `compressed_header - raw_header` + // bytes and rejects compressed sections that donor would keep, + // losing ratio. Mirror donor's payload-vs-srcSize form here. let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len(); let mg = min_gain(literals.len(), strategy); - if total >= raw_section_bytes.saturating_sub(mg) { + let donor_clit = total - compressed_header; // tree_desc + payload + if donor_clit >= literals.len().saturating_sub(mg) { *last_huff = None; return raw_section_bytes; } @@ -1754,23 +1763,28 @@ fn compress_literals( writer.change_bits(size_index, encoded_len as u64, size_bits); let total_len = (writer.index() - reset_idx) / 8; - // Compare on-wire sections byte-for-byte: `total_len` already includes - // the compressed literals header + tree description + payload, so the - // raw side must also include its own 1–3 byte header - // (`uncompressed_literals_header_bytes`). Comparing against bare - // `literals.len()` would bias toward raw on small payloads where the - // raw header is smaller than the compressed header. - let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len(); - // Donor `compress_literals` (`zstd_compress_literals.c:187-188`): - // `cLitSize >= srcSize - minGain` → raw fallback. `minGain` is - // strategy-aware (`min_gain` helper above; ~1.5% for fast..btopt, - // ~0.78% for btultra, ~0.39% for btultra2). Previously this check - // used bare `>= raw_section_bytes` (zero margin) which let - // marginally-compressing blocks emit huf when raw + reused table - // would compress better in the next block. Saturating subtraction - // covers tiny inputs where `raw_section_bytes < min_gain`. + // Donor `compress_literals` raw-fallback gate + // (`zstd_compress_literals.c:187-188`): + // `cLitSize >= srcSize - minGain` + // where donor's `cLitSize` is the encoded literals payload plus the + // tree description (output of `HUF_compress*`, excluding the + // surrounding `lhSize` literals header), and `srcSize` is the + // literal-payload length. In our terms: + // - donor `cLitSize` ≡ `total_len - compressed_literals_header_bytes` + // (i.e. tree_desc + huf_payload, no lhSize) + // - donor `srcSize` ≡ `literals.len()` + // Comparing `total_len >= raw_section_bytes - minGain` (with the + // compressed-section lhSize on the LHS and raw-section header on + // the RHS) skews the threshold by `compressed_header - raw_header` + // bytes and rejects compressed sections that donor would keep — + // direct ratio loss. Mirror donor's payload-vs-srcSize form here. + // `minGain` is strategy-aware (`min_gain` helper above; ~1.56% for + // fast..btopt, ~0.78% for btultra, ~0.39% for btultra2). Saturating + // subtraction covers tiny inputs where `literals.len() < minGain`. + let compressed_header_len = compressed_literals_header_bytes(literals.len()); + let donor_clit = total_len - compressed_header_len; // tree_desc + payload let mg = min_gain(literals.len(), strategy); - if total_len >= raw_section_bytes.saturating_sub(mg) { + if donor_clit >= literals.len().saturating_sub(mg) { writer.reset_to(reset_idx); raw_literals(literals, writer); HuffmanTableUpdate::Cleared From 70e79ec482c4ebd616d2922c36fffeb3a4e8058c Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 04:08:24 +0300 Subject: [PATCH 06/14] test(encoder): extend literal-gate parity test to HUF reuse path MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The parity test previously only seeded `last_huff_table: None`, so the floor-6 HUF reuse path (`ZSTD_minLiteralsToCompress` with valid prior HUF table) and 6/7-byte all-identical sections that newly route through the unconditional RLE pre-check were not exercised in estimator/emit parity. Add cases: - Sub-`min_lits` all-identical (`len in 1..=7` at fast strategy): exercises the new "any non-empty all-identical → RLE" path - Seeded HUF table with `Lazy` strategy at 6/7-byte all-identical and 16-byte non-identical: exercises the `min_lits == 6` floor and the HUF-reuse decision under the new RLE pre-check Also update the in-test docstring to drop the stale "RLE `>= 8` cliff" phrasing — the cliff was removed when RLE became unconditional for non-empty all-identical inputs. 531/531 lib tests pass. --- zstd/src/encoding/blocks/compressed.rs | 70 ++++++++++++++++++++++---- 1 file changed, 61 insertions(+), 9 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 4de1bab4..ea28728e 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -1807,6 +1807,7 @@ mod tests { use crate::encoding::frame_compressor::{CompressState, FseTables, PreviousFseTable}; use crate::encoding::strategy::StrategyTag; use crate::fse::fse_encoder::build_table_from_symbol_counts; + use crate::huff0::huff0_encoder; use alloc::vec::Vec; fn tables_match( @@ -1890,30 +1891,80 @@ mod tests { encode_block_parts_with_sequence_scratch, estimate_block_parts_size, }; // For each strategy at boundary literal lengths around `min_lits` - // and across the RLE pre-check `>= 8` cliff, estimator's predicted - // size MUST equal the bytes the emitter actually writes. - let cases: &[(StrategyTag, &[(usize, bool)])] = &[ + // and across the all-identical RLE pre-check (fires for any + // non-empty all-identical input under every strategy/HUF state), + // estimator's predicted size MUST equal the bytes the emitter + // actually writes. Cases include: (a) fresh state with + // `last_huff_table: None` covering the strategy-specific + // `min_lits` band (8/16/32/64), (b) seeded HUF-reuse state + // covering the lowered floor of 6 and 6/7-byte all-identical + // sections that previously hit the HUF path instead of RLE, + // (c) sub-`min_lits` all-identical sections that take the + // RLE pre-check regardless of strategy. + type Inputs = &'static [(usize, bool)]; + let cases: &[(StrategyTag, bool, Inputs)] = &[ + // (strategy, seed_huff_reuse, [(len, all_identical)]) ( StrategyTag::Fast, - &[(8, true), (8, false), (63, true), (63, false), (64, false)], + false, + &[ + (1, true), // sub-min_lits all-identical → RLE + (5, true), // sub-min_lits all-identical → RLE + (8, true), + (8, false), + (63, true), + (63, false), + (64, false), + ], ), ( StrategyTag::BtUltra2, + false, &[(7, true), (7, false), (8, true), (8, false), (16, false)], ), - (StrategyTag::BtOpt, &[(8, true), (31, true), (32, false)]), + ( + StrategyTag::BtOpt, + false, + &[(8, true), (31, true), (32, false)], + ), + // HUF reuse path: floor drops to 6, so 6/7-byte sections + // previously hit the HUF compress path. With the RLE + // pre-check now non-empty-only, all-identical 6/7-byte + // sections must route through RLE and stay byte-equivalent + // estimator-vs-emit. Also exercise non-identical 6-byte + // raw fallback and 16-byte HUF reuse path. + ( + StrategyTag::Lazy, + true, + &[(6, true), (7, true), (6, false), (16, false)], + ), ]; - for (strat, inputs) in cases { + for (strat, seed_huff, inputs) in cases { for (len, identical) in *inputs { let literals: Vec = if *identical { alloc::vec![0x5Au8; *len] } else { (0..*len as u8).collect() }; + // Seed both estimator and emit state with the same + // synthetic HUF table when `seed_huff` is true so the + // reuse path's `min_lits == 6` floor is exercised. + // Counts from a varied byte sequence give a valid + // (writeable) table that survives the `decide_huff_reuse` + // decision when literals are large enough to consider it. + let seed_table = if *seed_huff { + let mut counts = [0usize; 256]; + for b in (0..=63u8).chain(64..=127u8) { + counts[b as usize] = 1; + } + Some(huff0_encoder::HuffmanTable::build_from_counts(&counts)) + } else { + None + }; let mut est_state = CompressState:: { matcher: EntropyOnlyMatcher, - last_huff_table: None, + last_huff_table: seed_table.clone(), fse_tables: FseTables::new(), block_scratch: CompressedBlockScratch::new(), offset_hist: [1, 4, 8], @@ -1921,7 +1972,7 @@ mod tests { }; let mut emit_state = CompressState:: { matcher: EntropyOnlyMatcher, - last_huff_table: None, + last_huff_table: seed_table, fse_tables: FseTables::new(), block_scratch: CompressedBlockScratch::new(), offset_hist: [1, 4, 8], @@ -1941,8 +1992,9 @@ mod tests { assert_eq!( est, emitted.len(), - "estimator/emit parity broken: strategy={:?} len={} identical={} est={} emit={}", + "estimator/emit parity broken: strategy={:?} seed_huff={} len={} identical={} est={} emit={}", strat, + seed_huff, len, identical, est, From 60931c17edb9d8aff11bd80b748a075851cc33a7 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 04:24:36 +0300 Subject: [PATCH 07/14] refactor(encoder): rename literal-gate vars to describe value, not origin MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `donor_clit` told the reader where the formula came from, not what the value represents. Rename to `huf_section_size` (tree description plus HUF payload, excluding the literals lhSize) — describes the bytes themselves so the call site is readable without recalling reference naming conventions. Same rename in both the emit (`compress_literals`) and estimator (`estimate_literals_section_bytes`) sides. Test names follow the same principle: - `min_literals_to_compress_matches_donor_table` → `min_literals_to_compress_returns_per_strategy_floor` - `min_gain_matches_donor_margin` → `min_gain_returns_per_strategy_margin` Behavior unchanged; 531/531 lib tests pass, clippy/fmt clean, ratio matrix 0 deltas vs main. --- zstd/src/encoding/blocks/compressed.rs | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index ea28728e..94bdfc52 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -612,8 +612,8 @@ fn estimate_literals_section_bytes( // losing ratio. Mirror donor's payload-vs-srcSize form here. let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len(); let mg = min_gain(literals.len(), strategy); - let donor_clit = total - compressed_header; // tree_desc + payload - if donor_clit >= literals.len().saturating_sub(mg) { + let huf_section_size = total - compressed_header; // tree_desc + payload, no lhSize + if huf_section_size >= literals.len().saturating_sub(mg) { *last_huff = None; return raw_section_bytes; } @@ -1782,9 +1782,9 @@ fn compress_literals( // fast..btopt, ~0.78% for btultra, ~0.39% for btultra2). Saturating // subtraction covers tiny inputs where `literals.len() < minGain`. let compressed_header_len = compressed_literals_header_bytes(literals.len()); - let donor_clit = total_len - compressed_header_len; // tree_desc + payload + let huf_section_size = total_len - compressed_header_len; // tree_desc + payload, no lhSize let mg = min_gain(literals.len(), strategy); - if donor_clit >= literals.len().saturating_sub(mg) { + if huf_section_size >= literals.len().saturating_sub(mg) { writer.reset_to(reset_idx); raw_literals(literals, writer); HuffmanTableUpdate::Cleared @@ -1847,7 +1847,7 @@ mod tests { } #[test] - fn min_literals_to_compress_matches_donor_table() { + fn min_literals_to_compress_returns_per_strategy_floor() { for strat in [ StrategyTag::Fast, StrategyTag::Dfast, @@ -1866,7 +1866,7 @@ mod tests { } #[test] - fn min_gain_matches_donor_margin() { + fn min_gain_returns_per_strategy_margin() { let src = 4096usize; for strat in [ StrategyTag::Fast, From bcc0c0d03ac2a8f8af59090870131e83fa2d1515 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 04:30:06 +0300 Subject: [PATCH 08/14] refactor(encoder): centralize raw-literal-fallback gate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extract `use_raw_literal_fallback(huf_section_size, literals_len, strategy)` helper so `compress_literals` (emit) and `estimate_literals_section_bytes` (probe) share one decision point — neither side can drift back to an on-wire-vs-on-wire comparison that inflates the threshold by the lhSize delta and rejects marginally compressed sections. Add `use_raw_literal_fallback_uses_payload_vs_srcsize_threshold` test that walks the 2-byte gap (`compressed_lhsize - raw_lhsize`) at `literals.len() = 20` to lock in the payload-vs-srcSize semantics. Tighten the RLE shortcut docstring: RLE equals raw at `len == 1` (both 2 bytes), strictly smaller only at `len >= 2`. 532/532 lib tests, clippy/fmt clean. --- zstd/src/encoding/blocks/compressed.rs | 83 ++++++++++++++++++++++---- 1 file changed, 72 insertions(+), 11 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 94bdfc52..364147a1 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -79,6 +79,28 @@ fn min_gain(src_size: usize, strategy: crate::encoding::strategy::StrategyTag) - (src_size >> minlog) + 2 } +/// Donor `compress_literals` raw-fallback gate +/// (`zstd_compress_literals.c:187-188`): emit raw when +/// `cLitSize >= srcSize - minGain`, where `cLitSize` is the HUF payload +/// plus tree description (the bytes `HUF_compress*` writes — excluding +/// the surrounding literals lhSize) and `srcSize` is the literal-payload +/// length. Compares payload-vs-srcSize, NOT on-wire-vs-on-wire, so the +/// gate is symmetric in header overhead. +/// +/// Centralized helper so `compress_literals` and +/// `estimate_literals_section_bytes` share the exact same decision and +/// neither side can drift back to the pre-2026-05 on-wire comparison +/// (which inflated the threshold by `compressed_lhsize - raw_lhsize` +/// bytes and rejected marginally-winning compressed sections). +#[inline] +fn use_raw_literal_fallback( + huf_section_size: usize, + literals_len: usize, + strategy: crate::encoding::strategy::StrategyTag, +) -> bool { + huf_section_size >= literals_len.saturating_sub(min_gain(literals_len, strategy)) +} + /// Donor `kInverseProbabilityLog256`: floor(-log2(x / 256) * 256). const INVERSE_PROBABILITY_LOG_256: [usize; 256] = [ 0, 2048, 1792, 1642, 1536, 1453, 1386, 1329, 1280, 1236, 1197, 1162, 1130, 1100, 1073, 1047, @@ -382,13 +404,14 @@ fn encode_block_parts_with_sequence_scratch( // after passing the `min_lits` gate and running a full HUF compress — // so donor emits raw for any all-identical section under `min_lits` // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF - // reuse). RLE is unconditionally better than raw on any non-empty - // all-identical input (2 bytes vs `1 + len`), so our pre-check fires - // for ANY all-identical literal slice regardless of strategy/min_lits. - // This produces strictly smaller output than donor on the small - // all-identical edges while still matching donor on `>= min_lits` - // inputs (where donor's compress+`cLitSize==1` path reaches the same - // RLE block). Note the order — RLE pre-check runs BEFORE `min_lits`; + // reuse). RLE is at worst equal to raw on `len == 1` + // (both produce a 2-byte section) and strictly smaller for `len >= 2` + // (2 bytes vs `1 + len`), so our pre-check fires for ANY all-identical + // literal slice regardless of strategy/min_lits. This produces strictly + // smaller output than donor on the small all-identical edges while + // still matching donor on `>= min_lits` inputs (where donor's + // compress+`cLitSize==1` path reaches the same RLE block). + // Note the order — RLE pre-check runs BEFORE `min_lits`; // `estimate_literals_section_bytes` mirrors this exactly so probe // costs match emit byte-for-byte. if !literals_vec.is_empty() && all_bytes_identical(literals_vec) { @@ -611,9 +634,8 @@ fn estimate_literals_section_bytes( // bytes and rejects compressed sections that donor would keep, // losing ratio. Mirror donor's payload-vs-srcSize form here. let raw_section_bytes = uncompressed_literals_header_bytes(literals.len()) + literals.len(); - let mg = min_gain(literals.len(), strategy); let huf_section_size = total - compressed_header; // tree_desc + payload, no lhSize - if huf_section_size >= literals.len().saturating_sub(mg) { + if use_raw_literal_fallback(huf_section_size, literals.len(), strategy) { *last_huff = None; return raw_section_bytes; } @@ -1783,8 +1805,7 @@ fn compress_literals( // subtraction covers tiny inputs where `literals.len() < minGain`. let compressed_header_len = compressed_literals_header_bytes(literals.len()); let huf_section_size = total_len - compressed_header_len; // tree_desc + payload, no lhSize - let mg = min_gain(literals.len(), strategy); - if huf_section_size >= literals.len().saturating_sub(mg) { + if use_raw_literal_fallback(huf_section_size, literals.len(), strategy) { writer.reset_to(reset_idx); raw_literals(literals, writer); HuffmanTableUpdate::Cleared @@ -1884,6 +1905,46 @@ mod tests { assert_eq!(min_gain(64, StrategyTag::Fast), 3); } + #[test] + fn use_raw_literal_fallback_uses_payload_vs_srcsize_threshold() { + use super::{compressed_literals_header_bytes, use_raw_literal_fallback}; + // Donor formula: `huf_section_size >= literals_len - min_gain`, + // payload-vs-srcSize (no headers on either side). Verify the + // gate is symmetric in header overhead by hitting the boundary + // where the old on-wire `total >= raw_section - mg` form would + // have disagreed. + let strategy = StrategyTag::Fast; // min_gain(20, Fast) = (20>>6)+2 = 2 + + // literals_len = 20: raw_header = 1, compressed_header = 3. + // New threshold (payload-vs-srcSize): + // keep huf iff huf_section_size < 20 - 2 = 18 + // fallback iff huf_section_size >= 18 + // + // Old (regressed) threshold on-wire-vs-on-wire: + // total = huf_section_size + 3 + // fallback iff total >= (20 + 1) - 2 = 19 + // iff huf_section_size >= 16 + // + // Gap where formulas disagree: huf_section_size in [16, 18). + // The new formula MUST keep huf in this gap. + let literals_len = 20usize; + // Sanity-check the literals-header constants the math relies on. + assert_eq!(compressed_literals_header_bytes(literals_len), 3); + assert_eq!(super::uncompressed_literals_header_bytes(literals_len), 1); + + // Inside the gap — new keeps, old would have rejected: + assert!(!use_raw_literal_fallback(16, literals_len, strategy)); + assert!(!use_raw_literal_fallback(17, literals_len, strategy)); + + // At/above new threshold — both new and old fall back: + assert!(use_raw_literal_fallback(18, literals_len, strategy)); + assert!(use_raw_literal_fallback(19, literals_len, strategy)); + + // Below the old threshold — both keep: + assert!(!use_raw_literal_fallback(15, literals_len, strategy)); + assert!(!use_raw_literal_fallback(0, literals_len, strategy)); + } + #[test] fn estimator_literals_section_mirrors_emit_for_short_inputs() { use super::{ From bd6645f08f6dbba4c4980253a2880c0527c176f2 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 05:01:38 +0300 Subject: [PATCH 09/14] docs(encoder): clarify Lazy strategy mapping and min_gain usage scope MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `min_literals_to_compress` Lazy mapping: drop the "most-aggressive-of-band" phrasing — donor's shift table pins strategies 1..6 to `shift = 3` uniformly, so Lazy → 64-byte floor regardless of which donor index in the 4..6 band we'd nominally pick. No aggressiveness gradient exists within this band. `min_gain` doc: split the donor formula (gates both block-level and literal-section in donor) from the current crate usage (helper is wired into literal-section gates only; block-level emit and probe still use uniform `(source_len >> 8) + 2`). Previous wording conflated the two and read as if the helper were already plumbed everywhere. --- zstd/src/encoding/blocks/compressed.rs | 33 +++++++++++++------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 364147a1..2394331c 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -26,10 +26,11 @@ const MAX_NB_BLOCK_SPLITS: usize = 196; /// → 16, strategy 9 (btultra2) → 8. /// /// Our `StrategyTag` enum has seven variants (no separate lazy2/btlazy2 — -/// the `Lazy` variant covers donor strategies 4..6). Map our variants to -/// donor strategy numbers using the most-aggressive-of-band mapping -/// (`Lazy → strat=5 → shift=3`) so the threshold matches what donor would -/// pick for the levels we route to that variant. +/// the `Lazy` variant covers donor strategies 4..6). Within the +/// fast..lazy2 band donor's shift table is flat: strategies 1..6 all +/// pin `shift = MIN(9 - strat, 3) = 3`, so `Lazy → 64-byte floor` +/// regardless of which donor index (4, 5, or 6) we'd nominally use. +/// No aggressiveness gradient within this band to preserve. #[inline] fn min_literals_to_compress( strategy: crate::encoding::strategy::StrategyTag, @@ -49,9 +50,9 @@ fn min_literals_to_compress( } /// Donor `ZSTD_minGain` (`zstd_compress_internal.h:677-684`): -/// strategy-aware minimum-compression margin used by both block-level -/// emit ("compressed block must beat raw + minGain bytes") and the -/// literal-section expansion check. +/// strategy-aware minimum-compression margin. In donor it gates both +/// the block-level "compressed block must beat raw + minGain" decision +/// and the literal-section `cLitSize >= srcSize - minGain` fallback. /// /// Formula: `minlog = (strat >= btultra) ? strat - 1 : 6; (src_size >> /// minlog) + 2`. So: @@ -59,15 +60,15 @@ fn min_literals_to_compress( /// - btultra (strat 8): minlog=7 → ~0.78% margin + 2 bytes /// - btultra2 (strat 9): minlog=8 → ~0.39% margin + 2 bytes /// -/// Used in `compress_literals` and `estimate_literals_section_bytes` to -/// match donor's `cLitSize >= srcSize - minGain` raw-fallback gate. -/// Existing block-level emit / probe paths (`emit_single_sequence_block`, -/// `SplitEstimator::estimate_subblock_size`) already use `min_log = 8` -/// directly (`source_len >> 8 + 2`) — that is the btultra2 margin, the -/// most permissive of the band, applied uniformly. Migrating those sites -/// to this strategy-aware helper is a separate cleanup; this commit only -/// wires the helper into the literal-section gates that previously had -/// no `min_gain` margin at all (bare `>= raw_section_bytes`). +/// **Current usage in this crate:** wired into the literal-section +/// raw-fallback gate (`compress_literals` + +/// `estimate_literals_section_bytes`) only — those sites previously +/// had no margin at all (bare `>= raw_section_bytes`). +/// **Not yet wired into** the block-level emit/probe paths +/// (`emit_single_sequence_block`, `SplitEstimator::estimate_subblock_size`), +/// which still use a uniform `(source_len >> 8) + 2` calculation +/// (the btultra2 value applied across all strategies). Migrating +/// those sites is a separate cleanup. #[inline] fn min_gain(src_size: usize, strategy: crate::encoding::strategy::StrategyTag) -> usize { use crate::encoding::strategy::StrategyTag; From 9945077ce369bacace5e58b04355f23dcb7a8a3c Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 05:24:13 +0300 Subject: [PATCH 10/14] docs(encoder): document strategy_tag invariant + accurate RLE-vs-raw MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `CompressState.strategy_tag` doc: replace the misleading "defaults to Fast" wording with an explicit invariant — the field must be initialized from the active `CompressionLevel` and re-synced on every `matcher.reset()`. There is no `Default` impl; production constructors and tests must supply a value. `estimate_literals_section_bytes` docstring: tighten the RLE-vs-raw size claim so it matches the actual invariant — RLE equals raw at `len == 1` (both 2 bytes) and is strictly smaller for `len >= 2`, i.e. RLE is never worse than raw on all-identical input. --- zstd/src/encoding/blocks/compressed.rs | 10 ++++++---- zstd/src/encoding/frame_compressor.rs | 14 ++++++++++---- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 2394331c..9ef6d2b5 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -560,10 +560,12 @@ fn estimate_literals_section_bytes( ) -> usize { // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches // **in the same order**. The emitter pre-checks `all_identical` - // (any non-empty section) BEFORE the `min_lits` gate — RLE is - // unconditionally smaller than raw on all-identical inputs, so it - // is selected regardless of strategy. Estimator must use the same - // ordering and predicate so probe costs match emit byte-for-byte. + // (any non-empty section) BEFORE the `min_lits` gate — on + // all-identical inputs RLE is equal to raw at `len == 1` (both + // 2 bytes) and strictly smaller for `len >= 2`, so it is never + // worse than raw and is selected regardless of strategy. Estimator + // must use the same ordering and predicate so probe costs match + // emit byte-for-byte. if !literals.is_empty() && all_bytes_identical(literals) { *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + 1; diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs index bc67a14b..7c5069e2 100644 --- a/zstd/src/encoding/frame_compressor.rs +++ b/zstd/src/encoding/frame_compressor.rs @@ -394,10 +394,16 @@ pub(crate) struct CompressState { /// `matcher.reset()` call. Used by the literal-compression gates /// (`min_literals_to_compress`, `min_gain`) in /// `encoding::blocks::compressed` to mirror donor's strategy-aware - /// thresholds (`zstd_compress_literals.c:114-127, 187-188`). Defaults - /// to `Fast` so a `CompressState` constructed without an explicit - /// reset still has a valid value (matches the level-1 default donor - /// would emit for `disableLiteralCompression == 0`). + /// thresholds (`zstd_compress_literals.c:114-127, 187-188`). + /// + /// **Invariant (required of every construction site):** must be + /// initialized from the active `CompressionLevel` via + /// `StrategyTag::for_compression_level`, and re-synced on every + /// `matcher.reset()` so the level-aware gates stay correct after + /// a level change. There is no `Default` impl — production paths + /// (`FrameCompressor::new`, `new_with_matcher`, the streaming + /// encoder constructor) plumb this explicitly. Tests that build + /// `CompressState` by hand must also supply a value. pub(crate) strategy_tag: crate::encoding::strategy::StrategyTag, } From 25d8dbaca19042afacbb58dc2907476853a5621c Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 05:43:40 +0300 Subject: [PATCH 11/14] fix(encoder): sync strategy_tag on streaming reset path `StreamingEncoder::ensure_frame_started` resets the matcher and clears last_huff / fse_tables for the new frame but did not refresh `state.strategy_tag` from the active compression level. That left the literal-compression gates (`min_literals_to_compress`, `min_gain`) reading the construction-time strategy on every subsequent frame, which would silently leak the wrong gate floor if the encoder were ever extended with a between-frame level change. Mirror `FrameCompressor::compress`'s reset-time sync at the streaming reset site so both entry points are byte-equivalent at the gate level. Tighten the `CompressState.strategy_tag` invariant docstring to enumerate the two owned reset sites explicitly. Add a regression test that exercises the streaming reset across four levels and asserts the synced tag matches `StrategyTag::for_compression_level`. 533/533 lib tests pass; clippy and fmt clean. --- zstd/src/encoding/frame_compressor.rs | 9 +++-- zstd/src/encoding/streaming_encoder.rs | 47 ++++++++++++++++++++++++++ 2 files changed, 53 insertions(+), 3 deletions(-) diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs index 7c5069e2..c557eb75 100644 --- a/zstd/src/encoding/frame_compressor.rs +++ b/zstd/src/encoding/frame_compressor.rs @@ -398,9 +398,12 @@ pub(crate) struct CompressState { /// /// **Invariant (required of every construction site):** must be /// initialized from the active `CompressionLevel` via - /// `StrategyTag::for_compression_level`, and re-synced on every - /// `matcher.reset()` so the level-aware gates stay correct after - /// a level change. There is no `Default` impl — production paths + /// `StrategyTag::for_compression_level`, and re-synced from the + /// active level alongside every `matcher.reset()` call so the + /// level-aware gates stay correct after a level change. The two + /// reset sites that own this sync are `FrameCompressor::compress` + /// and `StreamingEncoder::ensure_frame_started`. There is no + /// `Default` impl — production constructors /// (`FrameCompressor::new`, `new_with_matcher`, the streaming /// encoder constructor) plumb this explicitly. Tests that build /// `CompressState` by hand must also supply a value. diff --git a/zstd/src/encoding/streaming_encoder.rs b/zstd/src/encoding/streaming_encoder.rs index 54687234..78e696a5 100644 --- a/zstd/src/encoding/streaming_encoder.rs +++ b/zstd/src/encoding/streaming_encoder.rs @@ -236,6 +236,13 @@ impl StreamingEncoder { self.state.fse_tables.ll_previous = None; self.state.fse_tables.ml_previous = None; self.state.fse_tables.of_previous = None; + // Sync `state.strategy_tag` from the active compression level so the + // literal-compression gates (`min_literals_to_compress`, `min_gain` + // in `encoding::blocks::compressed`) see the correct strategy for + // every frame. Mirrors `FrameCompressor::compress` and keeps both + // entry points byte-equivalent at the gate level. + self.state.strategy_tag = + crate::encoding::strategy::StrategyTag::for_compression_level(self.compression_level); #[cfg(feature = "hash")] { self.hasher = XxHash64::with_seed(0); @@ -1175,6 +1182,46 @@ mod tests { ); } + #[test] + fn ensure_frame_started_syncs_strategy_tag_from_compression_level() { + // The literal-compression gates (`min_literals_to_compress`, + // `min_gain`) read `state.strategy_tag`. Regression: every reset + // site MUST refresh that tag from the active compression level, + // otherwise gates use stale strategy after a frame start. + use crate::encoding::strategy::StrategyTag; + // Pick a level pair whose resolved StrategyTag differs so a + // missed sync would visibly leak the construction-time value. + let cases = [ + ( + CompressionLevel::Fastest, + StrategyTag::for_compression_level(CompressionLevel::Fastest), + ), + ( + CompressionLevel::Default, + StrategyTag::for_compression_level(CompressionLevel::Default), + ), + ( + CompressionLevel::Better, + StrategyTag::for_compression_level(CompressionLevel::Better), + ), + ( + CompressionLevel::Best, + StrategyTag::for_compression_level(CompressionLevel::Best), + ), + ]; + for (level, expected_tag) in cases { + let mut encoder = StreamingEncoder::new(Vec::new(), level); + // First write triggers `ensure_frame_started` → matcher reset + // → strategy_tag sync. + encoder.write_all(b"x").unwrap(); + assert_eq!( + encoder.state.strategy_tag, expected_tag, + "strategy_tag drift after frame start at level {level:?}", + ); + let _ = encoder.finish().unwrap(); + } + } + #[test] fn no_pledged_size_omits_fcs_from_header() { let mut encoder = StreamingEncoder::new(Vec::new(), CompressionLevel::Fastest); From 150a660b7847c331d35ddb957862e8407913d429 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 06:02:11 +0300 Subject: [PATCH 12/14] =?UTF-8?q?test(encoder):=20cover=20FrameCompressor:?= =?UTF-8?q?:set=5Fcompression=5Flevel=20=E2=86=92=20strategy=5Ftag?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The streaming reset-time sync test only exercises a single fixed level per encoder, which would pass even with the reset-time sync deleted because the construction-time tag already matches. Add a regression test that drives the level-change path explicitly: construct `FrameCompressor` at `CompressionLevel::Fastest`, call `set_compression_level(Best)` to cross from the Fast band into the BtUltra2 band, run a full `compress()` cycle, then verify `state.strategy_tag` reflects the new level's `StrategyTag::for_compression_level` resolution. Includes an `assert_ne!` fixture invariant that the two chosen levels resolve to different `StrategyTag` variants, so a future renumbering that collapses the bands would be loud about it rather than turning the test into a no-op. 534/534 lib tests pass; clippy and fmt clean. --- zstd/src/encoding/frame_compressor.rs | 45 +++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs index c557eb75..d5e76aba 100644 --- a/zstd/src/encoding/frame_compressor.rs +++ b/zstd/src/encoding/frame_compressor.rs @@ -2223,4 +2223,49 @@ mod tests { decoder.collect_to_writer(&mut decoded).unwrap(); assert_eq!(decoded, data, "roundtrip must reproduce the input verbatim"); } + + /// Regression: `set_compression_level` followed by `compress()` must + /// refresh `state.strategy_tag` through the reset-time sync so the + /// literal-compression gates (`min_literals_to_compress`, + /// `min_gain`) use the NEW level's strategy. Picks a level pair that + /// crosses strategy bands (Fastest=Fast band → Best=BtUltra2 band) + /// so a missed sync would leave the construction-time tag visible + /// and trip the assertion. + #[cfg(feature = "std")] + #[test] + fn set_compression_level_then_compress_refreshes_strategy_tag() { + use super::CompressionLevel; + use crate::encoding::strategy::StrategyTag; + + let data = vec![0xABu8; 256]; + let mut out = Vec::new(); + let mut compressor = FrameCompressor::new(CompressionLevel::Fastest); + let initial_tag = compressor.state.strategy_tag; + assert_eq!( + initial_tag, + StrategyTag::for_compression_level(CompressionLevel::Fastest), + "construction-time strategy_tag must reflect initial level", + ); + + // Switch to a level whose resolved strategy lives in a different + // band, then run a full compress cycle — the matcher.reset() + // inside `compress` is the only site that can refresh the tag. + compressor.set_compression_level(CompressionLevel::Best); + compressor.set_source(data.as_slice()); + compressor.set_drain(&mut out); + compressor.compress(); + + let new_tag = compressor.state.strategy_tag; + let expected = StrategyTag::for_compression_level(CompressionLevel::Best); + assert_eq!( + new_tag, expected, + "strategy_tag must follow set_compression_level → compress, \ + got {new_tag:?} expected {expected:?}", + ); + assert_ne!( + new_tag, initial_tag, + "test fixture invariant: chosen levels must resolve to \ + different StrategyTag variants", + ); + } } From a4b484b3c8e0cfbdb8751036fd8162a3686b917d Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 06:12:55 +0300 Subject: [PATCH 13/14] docs(encoder): drop hard-coded byte counts in literal-gate rationale MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `min_literals_to_compress` doc: replace the "≥30 bytes" HUF tree-description lower bound with neutral phrasing — small alphabets with low max symbols can serialize substantially smaller descriptions. The exact byte cost depends on alphabet size and max symbol, not a fixed minimum; the surrounding sentence already conveys that overhead-dominates-payload is what drives the per-strategy floor. RLE-shortcut docstrings (emit and estimator branches): replace "2 bytes vs `1 + len`" with the header-helper-aware form. Both RLE and raw use the same `uncompressed_literals_header_bytes(len)` (1/2/3/5 bytes by length tier), so the size relationship is RLE = lhSize + 1, raw = lhSize + len → equal at `len == 1`, RLE smaller by `len - 1` for `len >= 2`. The previous wording was only correct for the `len < 32` tier where lhSize = 1. No code changes; clippy and fmt clean. --- zstd/src/encoding/blocks/compressed.rs | 36 +++++++++++++++----------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index 9ef6d2b5..ef3698e6 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -20,8 +20,10 @@ const MAX_NB_BLOCK_SPLITS: usize = 196; /// Formula: `shift = MIN(9 - donor_strategy, 3); mintc = (huf_repeat == /// valid) ? 6 : (8 << shift)`. With huf reuse available, the per-block huf /// header overhead is gone, so the cheap floor is 6 bytes. Without it, the -/// huf tree-description overhead (≥30 bytes) dominates on small payloads -/// and we'd produce a worse-than-raw block — donor's shift table gives +/// huf tree-description must be serialized per block — alphabet size and +/// max symbol determine its exact byte cost, but on payloads near the +/// per-strategy floor that overhead dominates and the compressed section +/// loses to raw. Donor's shift table picks the floor per strategy: /// strategy 1..6 → 64 bytes, strategy 7 (btopt) → 32, strategy 8 (btultra) /// → 16, strategy 9 (btultra2) → 8. /// @@ -405,13 +407,16 @@ fn encode_block_parts_with_sequence_scratch( // after passing the `min_lits` gate and running a full HUF compress — // so donor emits raw for any all-identical section under `min_lits` // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF - // reuse). RLE is at worst equal to raw on `len == 1` - // (both produce a 2-byte section) and strictly smaller for `len >= 2` - // (2 bytes vs `1 + len`), so our pre-check fires for ANY all-identical - // literal slice regardless of strategy/min_lits. This produces strictly - // smaller output than donor on the small all-identical edges while - // still matching donor on `>= min_lits` inputs (where donor's - // compress+`cLitSize==1` path reaches the same RLE block). + // reuse). RLE and raw share the same lhSize for a given `len` + // (both use `uncompressed_literals_header_bytes`), so RLE = lhSize + 1 + // and raw = lhSize + len. That makes RLE equal to raw on `len == 1` + // and smaller by exactly `len - 1` bytes for `len >= 2`, regardless of + // the lhSize tier (1 / 2 / 3 / 5 bytes). Our pre-check fires for ANY + // all-identical literal slice regardless of strategy/min_lits. + // This produces strictly smaller output than donor on the small + // all-identical edges while still matching donor on `>= min_lits` + // inputs (where donor's compress+`cLitSize==1` path reaches the same + // RLE block). // Note the order — RLE pre-check runs BEFORE `min_lits`; // `estimate_literals_section_bytes` mirrors this exactly so probe // costs match emit byte-for-byte. @@ -560,12 +565,13 @@ fn estimate_literals_section_bytes( ) -> usize { // Mirror `encode_block_parts_with_sequence_scratch` literal-mode branches // **in the same order**. The emitter pre-checks `all_identical` - // (any non-empty section) BEFORE the `min_lits` gate — on - // all-identical inputs RLE is equal to raw at `len == 1` (both - // 2 bytes) and strictly smaller for `len >= 2`, so it is never - // worse than raw and is selected regardless of strategy. Estimator - // must use the same ordering and predicate so probe costs match - // emit byte-for-byte. + // (any non-empty section) BEFORE the `min_lits` gate — RLE and raw + // share `uncompressed_literals_header_bytes(len)` (1/2/3/5 bytes by + // length tier), so on all-identical inputs RLE = lhSize + 1 equals + // raw = lhSize + len at `len == 1` and is smaller by `len - 1` for + // `len >= 2`. RLE is never worse than raw, so it is selected + // regardless of strategy. Estimator must use the same ordering and + // predicate so probe costs match emit byte-for-byte. if !literals.is_empty() && all_bytes_identical(literals) { *last_huff = None; return uncompressed_literals_header_bytes(literals.len()) + 1; From 3933521be56aa3020c6e8bbe9df2b18b21d92a52 Mon Sep 17 00:00:00 2001 From: Dmitry Prudnikov Date: Tue, 19 May 2026 06:29:11 +0300 Subject: [PATCH 14/14] test(encoder): tighten strategy_tag regressions to actually fail without the sync MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three factual cleanups so the literal-gate tests provably guard the reset-time `strategy_tag` sync and the docstrings around the RLE pre-check stop misrepresenting historical behavior. - `set_compression_level_then_compress_refreshes_strategy_tag`: switch from `CompressionLevel::Best` to `CompressionLevel::Level(20)` so the post-switch tag genuinely crosses into the BtUltra2 band. `Best` resolves to `Lazy` today, which keeps `min_literals_to_compress` in the same `shift=3 → 64-byte` band as `Fast` and weakens the signal. Add an explicit `assert_eq!(expected, StrategyTag::BtUltra2)` fixture invariant so future renumbering breaks loudly. - `ensure_frame_started_refreshes_stale_strategy_tag_at_reset` (renamed): the previous version constructed the encoder at the same level it asserted, so `StreamingEncoder::new`'s own init satisfied the assertion even if the reset-time sync were deleted. Now the test deliberately corrupts `state.strategy_tag` to a sentinel (`BtUltra2`) before the first write and asserts the reset path restores the legitimate tag — a missing sync leaves the sentinel visible and trips the assertion. - RLE pre-check docstring on the emit side: drop the "6/7 bytes with HUF reuse" example. With HUF reuse `min_lits == 6`, so 6/7-byte sections are NOT below the gate; donor would HUF-encode them and reach RLE via `cLitSize == 1`. The justification for the pre-check is the genuine under-gate range (e.g. 8..63 bytes at fast/dfast/greedy/lazy without HUF reuse), not the HUF-reuse band. - Estimator/emit parity test prose: the historical behavior for 6/7-byte all-identical sections under the prior hardcoded `len >= 8` gate was RAW (not HUF). Correct both the per-case rationale and the block-level docstring; the test cases themselves are unchanged. 534/534 lib tests pass; clippy and fmt clean. --- zstd/src/encoding/blocks/compressed.rs | 25 +++++----- zstd/src/encoding/frame_compressor.rs | 23 ++++++--- zstd/src/encoding/streaming_encoder.rs | 64 ++++++++++++++------------ 3 files changed, 66 insertions(+), 46 deletions(-) diff --git a/zstd/src/encoding/blocks/compressed.rs b/zstd/src/encoding/blocks/compressed.rs index ef3698e6..f1c586df 100644 --- a/zstd/src/encoding/blocks/compressed.rs +++ b/zstd/src/encoding/blocks/compressed.rs @@ -406,8 +406,8 @@ fn encode_block_parts_with_sequence_scratch( // the `cLitSize == 1` branch (`zstd_compress_literals.c:192-201`) // after passing the `min_lits` gate and running a full HUF compress — // so donor emits raw for any all-identical section under `min_lits` - // (e.g. 8..63 bytes at fast/dfast/greedy/lazy, 6..7 bytes with HUF - // reuse). RLE and raw share the same lhSize for a given `len` + // (e.g. 8..63 bytes at fast/dfast/greedy/lazy without HUF reuse). + // RLE and raw share the same lhSize for a given `len` // (both use `uncompressed_literals_header_bytes`), so RLE = lhSize + 1 // and raw = lhSize + len. That makes RLE equal to raw on `len == 1` // and smaller by exactly `len - 1` bytes for `len >= 2`, regardless of @@ -1966,9 +1966,11 @@ mod tests { // estimator's predicted size MUST equal the bytes the emitter // actually writes. Cases include: (a) fresh state with // `last_huff_table: None` covering the strategy-specific - // `min_lits` band (8/16/32/64), (b) seeded HUF-reuse state - // covering the lowered floor of 6 and 6/7-byte all-identical - // sections that previously hit the HUF path instead of RLE, + // `min_lits` band (8/16/32/64), (b) seeded HUF-reuse state at + // the lowered floor of 6 — under the prior hardcoded `len >= 8` + // gate 6/7-byte all-identical sections went raw, now they pass + // `min_lits == 6` and the donor-parity HUF+`cLitSize==1` path + // would route them to RLE; the pre-check shortcuts that path, // (c) sub-`min_lits` all-identical sections that take the // RLE pre-check regardless of strategy. type Inputs = &'static [(usize, bool)]; @@ -1997,12 +1999,13 @@ mod tests { false, &[(8, true), (31, true), (32, false)], ), - // HUF reuse path: floor drops to 6, so 6/7-byte sections - // previously hit the HUF compress path. With the RLE - // pre-check now non-empty-only, all-identical 6/7-byte - // sections must route through RLE and stay byte-equivalent - // estimator-vs-emit. Also exercise non-identical 6-byte - // raw fallback and 16-byte HUF reuse path. + // HUF reuse path: floor drops to 6. Under the prior + // hardcoded `len >= 8` gate 6/7-byte sections went raw; + // post-fix, all-identical 6/7-byte sections take the RLE + // pre-check and stay byte-equivalent estimator-vs-emit + // (donor parity would route them HUF→cLitSize==1→RLE). + // Also exercise non-identical 6-byte raw fallback and + // 16-byte HUF reuse path. ( StrategyTag::Lazy, true, diff --git a/zstd/src/encoding/frame_compressor.rs b/zstd/src/encoding/frame_compressor.rs index d5e76aba..140deece 100644 --- a/zstd/src/encoding/frame_compressor.rs +++ b/zstd/src/encoding/frame_compressor.rs @@ -2227,10 +2227,14 @@ mod tests { /// Regression: `set_compression_level` followed by `compress()` must /// refresh `state.strategy_tag` through the reset-time sync so the /// literal-compression gates (`min_literals_to_compress`, - /// `min_gain`) use the NEW level's strategy. Picks a level pair that - /// crosses strategy bands (Fastest=Fast band → Best=BtUltra2 band) - /// so a missed sync would leave the construction-time tag visible - /// and trip the assertion. + /// `min_gain`) use the NEW level's strategy. Picks a level pair + /// that genuinely crosses strategy bands — `Fastest` resolves to + /// `Fast`, `Level(20)` resolves to `BtUltra2` — so a missed sync + /// would leave the construction-time tag visible and trip the + /// assertion. `CompressionLevel::Best` would also pass type-wise + /// but resolves to `Lazy` today, which keeps `min_literals_to_compress` + /// in the same `shift=3 → 64-byte` band as `Fast` and weakens the + /// signal that the gate floor actually moved. #[cfg(feature = "std")] #[test] fn set_compression_level_then_compress_refreshes_strategy_tag() { @@ -2250,18 +2254,25 @@ mod tests { // Switch to a level whose resolved strategy lives in a different // band, then run a full compress cycle — the matcher.reset() // inside `compress` is the only site that can refresh the tag. - compressor.set_compression_level(CompressionLevel::Best); + let new_level = CompressionLevel::Level(20); + compressor.set_compression_level(new_level); compressor.set_source(data.as_slice()); compressor.set_drain(&mut out); compressor.compress(); let new_tag = compressor.state.strategy_tag; - let expected = StrategyTag::for_compression_level(CompressionLevel::Best); + let expected = StrategyTag::for_compression_level(new_level); assert_eq!( new_tag, expected, "strategy_tag must follow set_compression_level → compress, \ got {new_tag:?} expected {expected:?}", ); + assert_eq!( + expected, + StrategyTag::BtUltra2, + "test fixture invariant: Level(20) must resolve to BtUltra2 \ + so the post-switch tag visibly crosses the band boundary", + ); assert_ne!( new_tag, initial_tag, "test fixture invariant: chosen levels must resolve to \ diff --git a/zstd/src/encoding/streaming_encoder.rs b/zstd/src/encoding/streaming_encoder.rs index 78e696a5..5499f371 100644 --- a/zstd/src/encoding/streaming_encoder.rs +++ b/zstd/src/encoding/streaming_encoder.rs @@ -1183,40 +1183,46 @@ mod tests { } #[test] - fn ensure_frame_started_syncs_strategy_tag_from_compression_level() { + fn ensure_frame_started_refreshes_stale_strategy_tag_at_reset() { // The literal-compression gates (`min_literals_to_compress`, - // `min_gain`) read `state.strategy_tag`. Regression: every reset - // site MUST refresh that tag from the active compression level, - // otherwise gates use stale strategy after a frame start. + // `min_gain`) read `state.strategy_tag`. Regression: every + // reset site MUST refresh that tag from the active compression + // level — relying on construction-time initialization alone is + // not enough, because later mutations or reuse patterns can + // leave the tag stale. + // + // To exercise the RESET-time refresh (not just the + // construction-time init that `StreamingEncoder::new` does for + // free), this test deliberately corrupts `state.strategy_tag` + // to a value that does NOT match the active level, then + // triggers `ensure_frame_started` and asserts the reset path + // wrote the correct tag back. If the sync line in + // `ensure_frame_started` were deleted, the corrupted value + // would survive the write and fail the assertion. use crate::encoding::strategy::StrategyTag; - // Pick a level pair whose resolved StrategyTag differs so a - // missed sync would visibly leak the construction-time value. - let cases = [ - ( - CompressionLevel::Fastest, - StrategyTag::for_compression_level(CompressionLevel::Fastest), - ), - ( - CompressionLevel::Default, - StrategyTag::for_compression_level(CompressionLevel::Default), - ), - ( - CompressionLevel::Better, - StrategyTag::for_compression_level(CompressionLevel::Better), - ), - ( - CompressionLevel::Best, - StrategyTag::for_compression_level(CompressionLevel::Best), - ), - ]; - for (level, expected_tag) in cases { + for level in [ + CompressionLevel::Fastest, + CompressionLevel::Default, + CompressionLevel::Better, + CompressionLevel::Best, + ] { + let expected = StrategyTag::for_compression_level(level); let mut encoder = StreamingEncoder::new(Vec::new(), level); - // First write triggers `ensure_frame_started` → matcher reset - // → strategy_tag sync. + // Pick a sentinel that differs from the legitimate tag so + // a missing reset-time sync is observable. BtUltra2 is the + // most-aggressive variant; the four levels above resolve + // to Fast/Dfast/Lazy/Lazy respectively, none equal to it. + let sentinel = StrategyTag::BtUltra2; + assert_ne!( + expected, sentinel, + "sentinel must differ from the legitimate tag at level {level:?}", + ); + encoder.state.strategy_tag = sentinel; encoder.write_all(b"x").unwrap(); assert_eq!( - encoder.state.strategy_tag, expected_tag, - "strategy_tag drift after frame start at level {level:?}", + encoder.state.strategy_tag, expected, + "reset-time strategy_tag sync missing at level {level:?}: \ + sentinel survived `ensure_frame_started`", ); let _ = encoder.finish().unwrap(); }