feat(encoding): add dictionary compression support #44
Conversation
- add FrameCompressor dictionary APIs, including parse-from-bytes helper
- write dictionary id into frame header and prime matcher with dictionary history
- support raw-content dictionaries for dict_builder outputs
- add regression tests for dict-id enforcement, C interop, and dict_builder roundtrip

Closes #8
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (9)
📝 Walkthrough

Adds encoder-side dictionary support: new Dictionary constructor, stricter dictionary decode validation and errors, matcher priming and window-budget tracking, FrameCompressor APIs to attach/seed dictionaries and advertise dictionary_id, helpers to convert decoder tables to encoder tables, tests, and a Cargo packaging tweak.
Sequence diagram

sequenceDiagram
participant User as "User"
participant Dict as "Dictionary"
participant Compressor as "FrameCompressor"
participant MatchGen as "MatchGenerator"
participant Matcher as "Matcher"
User->>Dict: decode(bytes) / from_raw_content(id,content)
Dict-->>User: Dictionary | Err
User->>Compressor: set_dictionary(Dictionary)
User->>Compressor: compress(data)
Compressor->>Compressor: detect attached dictionary
alt dictionary attached and matcher supports priming
Compressor->>MatchGen: prime_with_dictionary(dict.content, dict.offset_hist)
MatchGen->>Matcher: commit chunks / populate hash & chains
MatchGen->>Matcher: apply offset_hist
Compressor->>Compressor: seed previous Huffman & FSE tables
Compressor->>Compressor: set FrameHeader.dictionary_id
end
Compressor-->>User: compressed bytes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Pull request overview
Adds dictionary compression support to the encoder pipeline, enabling frames to be compressed with a provided Zstd dictionary and advertising the dictionary ID in the frame header for decoder interoperability.
Changes:
- Extend the `Matcher` trait + default matcher to support priming matcher state from dictionary history/content.
- Add dictionary attachment APIs to `FrameCompressor` and emit `dictionary_id` in the encoded frame header.
- Add dictionary constructors/validation (including rejecting dictionary ID 0) and new regression tests for dictionary-compressed roundtrips + zstd-ffi interop.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| zstd/src/encoding/mod.rs | Adds prime_with_dictionary hook to the Matcher trait. |
| zstd/src/encoding/match_generator.rs | Implements dictionary priming for the default matcher backends and sets repeat-offset history. |
| zstd/src/encoding/frame_compressor.rs | Stores an attached dictionary, primes state per frame, writes dict ID in header, and adds dictionary compression tests. |
| zstd/src/decoding/errors.rs | Adds ZeroDictionaryId decode error variant and Display formatting. |
| zstd/src/decoding/dictionary.rs | Adds Dictionary::from_raw_content and rejects zero dictionary IDs during decode. |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.15.
| Benchmark suite | Current: 127d41d | Previous: 29de24a | Ratio |
|---|---|---|---|
| compress/fastest/small-4k-log-lines/matrix/pure_rust | 0.035 ms | 0.026 ms | 1.35 |
| compress/fastest/high-entropy-1m/matrix/c_ffi | 0.309 ms | 0.267 ms | 1.16 |
| compress/default/low-entropy-1m/matrix/c_ffi | 0.233 ms | 0.197 ms | 1.18 |
| decompress/fastest/small-10k-random/rust_stream/matrix/c_ffi | 0.002 ms | 0.001 ms | 2 |
| decompress/default/small-10k-random/rust_stream/matrix/c_ffi | 0.002 ms | 0.001 ms | 2 |
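The Ratio column is simply the current time divided by the previous time. A minimal sketch of the threshold check github-action-benchmark performs (the 1.15 threshold comes from the alert text above; the helper name is hypothetical):

```rust
// Flag a benchmark as a regression when current/previous exceeds the
// configured alert threshold (1.15 per the workflow's alert above).
fn is_regression(current_ms: f64, previous_ms: f64, threshold: f64) -> bool {
    previous_ms > 0.0 && current_ms / previous_ms > threshold
}

fn main() {
    // Values from the first table row: 0.035 / 0.026 ≈ 1.35 > 1.15.
    assert!(is_regression(0.035, 0.026, 1.15));
    // An improvement never trips the alert.
    assert!(!is_regression(0.020, 0.026, 1.15));
    println!("threshold check behaves as described");
}
```

Note that low-millisecond decompress timings (0.002 ms vs 0.001 ms) sit near timer resolution, so their "2x" ratios are noisier than the compress rows.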
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
- reject dictionary id 0 in FrameCompressor::set_dictionary
- return explicit DictionaryDecodeError on undersized dictionary buffers
- keep dict_tests assets in crate package so include_bytes tests compile downstream
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/decoding/dictionary.rs (1)
Lines 77-83: ⚠️ Potential issue | 🟠 Major

Critical: Initial `offset_hist` uses an incorrect default value.

Line 82 initializes `offset_hist` to `[2, 4, 8]`, but the ZSTD specification (RFC 8878 §3.1.2.5) mandates that the default repeat offsets are `[1, 4, 8]`. This is inconsistent with:

- `from_raw_content` at line 60, which uses `[1, 4, 8]`
- `DecoderScratch::new()` in `scratch.rs:44`, which uses `[1, 4, 8]`

This value is overwritten later (lines 136-138) when parsing a valid dictionary, so it only affects the intermediate state. However, if any error path or future refactoring reads this before it is overwritten, it would produce incorrect offset decoding.
Proposed fix:

```diff
 let mut new_dict = Dictionary {
     id: 0,
     fse: FSEScratch::new(),
     huf: HuffmanScratch::new(),
     dict_content: Vec::new(),
-    offset_hist: [2, 4, 8],
+    offset_hist: [1, 4, 8],
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/decoding/dictionary.rs` around lines 77 - 83, The Dictionary struct is initialized with the wrong default repeat offsets; change the initial offset_hist in the Dictionary::new-path (where Dictionary is constructed with FSEScratch::new() and HuffmanScratch::new()) from [2, 4, 8] to the correct RFC8878 defaults [1, 4, 8] so it matches from_raw_content and DecoderScratch::new(), ensuring any interim/error-path reads see the correct repeat offsets; update the offset_hist field in that Dictionary initialization accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 163-173: The test function
decode_dict_malformed_input_returns_error_instead_of_panicking contains assert!
macro calls with formatting that fails cargo fmt; reformat those assert!
invocations to adhere to rustfmt (put the condition and message on the same line
or follow rustfmt's preferred multi-line style) so the file passes cargo fmt;
locate the test function and update the two assert! calls around
Dictionary::decode_dict(&raw) and result.unwrap().is_err() (and any related
string literals referencing decode_dict or malformed dictionary messages) to be
formatted properly.
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 164-169: Add a short clarifying comment explaining why offset_hist
is assigned twice: once to FrameCompressor.state.offset_hist (used during
sequence encoding) and again via Matcher.prime_with_dictionary (which sets the
matcher's internal offset history used for match generation); place the comment
near the block in frame_compressor.rs where self.dictionary is primed (around
the self.state.offset_hist assignment and matcher.prime_with_dictionary call)
and mention both uses (sequence encoding vs match generation) and reference the
match_generator behavior so future maintainers understand the intentional dual
assignment.
- Around line 318-327: The method FrameCompressor::set_dictionary currently
panics via assert when dictionary.id == 0; change it to return a Result to match
set_dictionary_from_bytes so callers can handle the zero-ID error: replace the
assert in set_dictionary with a check that returns an Err variant (introduce or
reuse a DictionaryError/FrameCompressorError variant like InvalidDictionaryId)
when id == 0 and return Ok(previous_dictionary_opt) on success, and update
callers/tests accordingly to handle Result; keep the function name
set_dictionary and its return semantics (previous Option<Dictionary>) wrapped in
Result<Option<Dictionary>, YourErrorType> so API usage mirrors
set_dictionary_from_bytes.
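A hedged sketch of the suggested Result-based API shape. The struct, field, and error types below are simplified stand-ins for illustration, not the crate's actual definitions:

```rust
#[derive(Debug, PartialEq)]
enum SetDictionaryError {
    InvalidDictionaryId,
}

#[derive(Debug, PartialEq)]
struct Dictionary {
    id: u32,
}

struct FrameCompressor {
    dictionary: Option<Dictionary>,
}

impl FrameCompressor {
    /// Returns the previously attached dictionary on success instead of
    /// panicking on a zero id, mirroring set_dictionary_from_bytes.
    fn set_dictionary(
        &mut self,
        dictionary: Dictionary,
    ) -> Result<Option<Dictionary>, SetDictionaryError> {
        if dictionary.id == 0 {
            // Fail fast: id 0 is reserved for "no dictionary".
            return Err(SetDictionaryError::InvalidDictionaryId);
        }
        Ok(self.dictionary.replace(dictionary))
    }
}

fn main() {
    let mut compressor = FrameCompressor { dictionary: None };
    assert!(compressor.set_dictionary(Dictionary { id: 0 }).is_err());
    assert!(compressor.set_dictionary(Dictionary { id: 7 }).unwrap().is_none());
    println!("zero id rejected; valid id attached");
}
```

The `Ok(Option<Dictionary>)` shape preserves the existing "previous dictionary" return semantics while moving the zero-id contract out of a panic and into the type system.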
---
Outside diff comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 77-83: The Dictionary struct is initialized with the wrong default
repeat offsets; change the initial offset_hist in the Dictionary::new-path
(where Dictionary is constructed with FSEScratch::new() and
HuffmanScratch::new()) from [2, 4, 8] to the correct RFC8878 defaults [1, 4, 8]
so it matches from_raw_content and DecoderScratch::new(), ensuring any
interim/error-path reads see the correct repeat offsets; update the offset_hist
field in that Dictionary initialization accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 364b693b-b273-4d81-b744-5170a275b26f
📒 Files selected for processing (4)
- zstd/Cargo.toml
- zstd/src/decoding/dictionary.rs
- zstd/src/decoding/errors.rs
- zstd/src/encoding/frame_compressor.rs
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/decoding/dictionary.rs (1)
Lines 77-83: ⚠️ Potential issue | 🟡 Minor

Inconsistent default `offset_hist` initialization.

Line 82 initializes `offset_hist` to `[2, 4, 8]`, but:

- `from_raw_content` (line 60) uses `[1, 4, 8]`
- `CompressState` in `frame_compressor.rs` (lines 108, 127, 163) uses `[1, 4, 8]`
- RFC 8878 §3.1.2.5 specifies the default as `[1, 4, 8]`

This value gets overwritten by lines 136-138 for fully-parsed dictionaries, so it only affects the transient state. However, for consistency and correctness if the overwrite logic ever changes, this should match the RFC default.
Proposed fix:

```diff
 let mut new_dict = Dictionary {
     id: 0,
     fse: FSEScratch::new(),
     huf: HuffmanScratch::new(),
     dict_content: Vec::new(),
-    offset_hist: [2, 4, 8],
+    offset_hist: [1, 4, 8],
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/decoding/dictionary.rs` around lines 77 - 83, The Dictionary struct's transient default for offset_hist is inconsistent (currently [2, 4, 8]) and should match RFC 8878 and other code paths; change the default initialization in Dictionary::new (the Dictionary literal created in the constructor at the diff) to [1, 4, 8] so it matches from_raw_content and the CompressState usage; ensure you update the offset_hist field in the Dictionary instantiation (symbol: Dictionary, field: offset_hist) to [1, 4, 8] to keep behavior consistent with RFC 8878 §3.1.2.5.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 314-327: The method FrameCompressor::set_dictionary currently
asserts (panics) when dictionary.id == 0; change it to return a Result instead
to match set_dictionary_from_bytes and avoid panics for a public API: replace
the assert_ne! with an early Err return (e.g.,
Err(crate::error::InvalidDictionaryId) or an appropriate crate error type) and
change the signature to return Result<Option<crate::decoding::Dictionary>, _>,
keeping the existing behavior of self.dictionary.replace(dictionary) as
Ok(Some/None) on success; update callers/tests accordingly to handle the Result.
---
Outside diff comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 77-83: The Dictionary struct's transient default for offset_hist
is inconsistent (currently [2, 4, 8]) and should match RFC 8878 and other code
paths; change the default initialization in Dictionary::new (the Dictionary
literal created in the constructor at the diff) to [1, 4, 8] so it matches
from_raw_content and the CompressState usage; ensure you update the offset_hist
field in the Dictionary instantiation (symbol: Dictionary, field: offset_hist)
to [1, 4, 8] to keep behavior consistent with RFC 8878 §3.1.2.5.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7c3e769b-37d3-496f-87aa-3b7424082751
📒 Files selected for processing (4)
- zstd/Cargo.toml
- zstd/src/decoding/dictionary.rs
- zstd/src/decoding/errors.rs
- zstd/src/encoding/frame_compressor.rs
- use RFC default repeat offsets [1,4,8] in decode_dict initialization
- document intentional dual offset history priming in compressor state and matcher
- document fail-fast zero-id contract for set_dictionary

- assert first set_dictionary_from_bytes insert returns None
- explicitly discard optional previous dictionary in dict_builder roundtrip test
- exclude dict_tests/files/** while keeping dict_tests/dictionary for include_bytes tests
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/decoding/dictionary.rs (1)
Lines 120-138: ⚠️ Potential issue | 🟠 Major

Reject zero repeat offsets during dictionary parsing.

The new length guard prevents the slice panic, but `decode_dict()` still accepts `offset1..offset3 == 0` verbatim. Those values are copied straight into live decoder state by `zstd/src/decoding/scratch.rs`, and `zstd/src/decoding/sequence_execution.rs` treats them as repeat offsets (`scratch[0] - 1` on one branch). A malformed dictionary can therefore survive parsing and only fail much later when it is actually used. Please validate all three parsed repeat offsets here and return a decode error if any are zero.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/decoding/dictionary.rs` around lines 120 - 138, In decode_dict (dictionary.rs) validate the three parsed repeat offsets (offset1, offset2, offset3) after converting from raw_tables and before assigning into new_dict.offset_hist: if any offset == 0 return a DictionaryDecodeError indicating an invalid/zero repeat offset (e.g., Err(DictionaryDecodeError::InvalidRepeatOffset { index: <0|1|2>, got: 0 }) or the crate's closest error variant) instead of accepting them verbatim; this prevents zero offsets from being copied into new_dict.offset_hist and later used by scratch.rs / sequence_execution.rs.
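The suggested validation can be sketched in isolation; the error variant and helper name below are illustrative, not the crate's actual API:

```rust
#[derive(Debug, PartialEq)]
enum DictionaryDecodeError {
    ZeroRepeatOffset { index: usize },
}

/// Reject any zero repeat offset before it can reach live decoder state,
/// so a malformed dictionary fails at parse time rather than during use.
fn validate_offset_hist(offsets: [u32; 3]) -> Result<[u32; 3], DictionaryDecodeError> {
    for (index, &offset) in offsets.iter().enumerate() {
        if offset == 0 {
            return Err(DictionaryDecodeError::ZeroRepeatOffset { index });
        }
    }
    Ok(offsets)
}

fn main() {
    assert!(validate_offset_hist([1, 4, 8]).is_ok());
    assert_eq!(
        validate_offset_hist([1, 0, 8]),
        Err(DictionaryDecodeError::ZeroRepeatOffset { index: 1 })
    );
    println!("zero repeat offsets rejected at parse time");
}
```

Reporting which offset was zero (rather than a bare `is_err`) also lets the regression test pin the exact failure, as a later review comment asks for.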
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 164-170: When a dictionary is present, you must also restore the
encoder's previous entropy tables so the first block can use the "repeat
previous table" path: after priming offsets in the branch that checks
self.dictionary.as_ref(), detect whether dict.huf and dict.fse are populated
and, if so, set self.state.last_huff_table to the dictionary's Huffman tables
and set self.state.fse_tables.ll_previous, self.state.fse_tables.ml_previous,
and self.state.fse_tables.of_previous from dict.fse (or the corresponding fields
on dict) so the encoder's previous-table state matches the parsed dictionary
entropy tables.
---
Outside diff comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 120-138: In decode_dict (dictionary.rs) validate the three parsed
repeat offsets (offset1, offset2, offset3) after converting from raw_tables and
before assigning into new_dict.offset_hist: if any offset == 0 return a
DictionaryDecodeError indicating an invalid/zero repeat offset (e.g.,
Err(DictionaryDecodeError::InvalidRepeatOffset { index: <0|1|2>, got: 0 }) or
the crate's closest error variant) instead of accepting them verbatim; this
prevents zero offsets from being copied into new_dict.offset_hist and later used
by scratch.rs / sequence_execution.rs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f5742492-88f2-444d-9408-59e1818f8aae
📒 Files selected for processing (3)
- zstd/Cargo.toml
- zstd/src/decoding/dictionary.rs
- zstd/src/encoding/frame_compressor.rs
- restore previous Huffman/FSE encoder tables from parsed dictionaries before first block
- convert decoder-side entropy tables into encoder tables for dictionary priming
- reject zero repeat offsets during dictionary parsing with explicit decode error
- add regression tests for entropy seeding and zero-repeat-offset rejection

- precompute decoder->encoder entropy conversions when dictionary is set
- reuse cached tables across compress() calls to avoid per-frame rebuild
- keep explicit fail-fast comment for zero dictionary id API contract
- derive Clone for encoder HuffmanTable to support cache reuse

- clarify set_dictionary docs for uncompressed and non-priming matchers
- track retained dictionary budget separately from advertised live window
- shrink matcher capacity as primed dictionary bytes are evicted
- add regression tests for simple and dfast budget retirement

- seed huffman table directly via Option::clone_from from cached entropy
- cache FSE previous tables as PreviousFseTable to avoid per-frame reboxing
- remove temporary clone/map allocations in dictionary seeding path
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 196-209: The current seeding creates an extra temporary by calling
cached_entropy.and_then(|cache| cache.ll_previous.clone()) before clone_from;
instead, avoid the intermediate clone by matching borrowed cached_entropy and
passing a reference to the inner Option to clone_from: e.g. if let Some(cache) =
&cached_entropy {
self.state.fse_tables.ll_previous.clone_from(&cache.ll_previous); } else {
self.state.fse_tables.ll_previous.clone_from(&None); } and do the same for
ml_previous and of_previous so clone_from operates on a borrowed Option
reference rather than a freshly cloned value.
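The borrowed `clone_from` pattern the comment describes can be shown standalone; `CachedEntropy`, `ll_previous`, and `seed` below are simplified stand-ins, not the PR's real types:

```rust
// Cloning from a borrowed Option lets Clone::clone_from reuse the
// destination's existing allocation, whereas and_then(..).clone() first
// materializes a throwaway temporary Option.
struct CachedEntropy {
    ll_previous: Option<Vec<u8>>,
}

fn seed(dest: &mut Option<Vec<u8>>, cached: Option<&CachedEntropy>) {
    match cached {
        // clone_from operates on a borrowed Option reference: no temporary.
        Some(cache) => dest.clone_from(&cache.ll_previous),
        None => dest.clone_from(&None),
    }
}

fn main() {
    let cache = CachedEntropy { ll_previous: Some(vec![1, 2, 3]) };
    let mut dest = Some(vec![0u8; 16]); // existing buffer may be reused
    seed(&mut dest, Some(&cache));
    assert_eq!(dest, Some(vec![1, 2, 3]));
    seed(&mut dest, None);
    assert_eq!(dest, None);
    println!("seeded from borrowed cache without intermediate clones");
}
```

The same shape would apply to the `ml_previous` and `of_previous` fields the comment mentions.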
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 206b40f6-cc75-4c1d-8bb3-91e38f82131c
📒 Files selected for processing (1)
zstd/src/encoding/frame_compressor.rs
- keep dfast eviction callbacks on logical slice length, not vec capacity
- add regression tests for add_data/trim_to_window eviction length semantics
- remove intermediate Option clones in FSE dictionary seeding path
@coderabbitai re-review

🧠 Learnings used · ✅ Actions performed: full review triggered.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 233-245: Update the test decode_dict_rejects_zero_repeat_offsets
to assert the exact error variant instead of just is_err(): call
Dictionary::decode_dict(&raw), unwrap the Err and match it against
DictionaryDecodeError::ZeroRepeatOffsetInDictionary { index: 0 } (or use
assert_eq! on the error) so the test fails for unrelated parse errors; reference
the existing test name, Dictionary::decode_dict, offset_history_start, and the
enum variant DictionaryDecodeError::ZeroRepeatOffsetInDictionary when making the
change.
- Around line 43-61: The from_raw_content constructor currently accepts an empty
dict_content which yields a dictionary with no usable primed entropy; add a
guard at the start of Dictionary::from_raw_content to reject empty content (e.g.
check dict_content.is_empty()) and return an appropriate DictionaryDecodeError
variant (add a new variant like EmptyDictionaryDecode or reuse an existing
suitable error) so callers cannot create empty raw-content dictionaries; update
any error enum and tests accordingly.
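The empty-content guard can be sketched as follows; `RawDictionary`, the error variant, and the free function are illustrative stand-ins for `Dictionary::from_raw_content`, not the crate's real signatures:

```rust
#[derive(Debug, PartialEq)]
enum DictionaryDecodeError {
    EmptyDictionary,
}

#[derive(Debug, PartialEq)]
struct RawDictionary {
    id: u32,
    content: Vec<u8>,
}

/// Reject empty raw content up front: a dictionary with no bytes has
/// nothing to prime the matcher or entropy tables with.
fn from_raw_content(id: u32, content: Vec<u8>) -> Result<RawDictionary, DictionaryDecodeError> {
    if content.is_empty() {
        return Err(DictionaryDecodeError::EmptyDictionary);
    }
    Ok(RawDictionary { id, content })
}

fn main() {
    assert!(from_raw_content(5, Vec::new()).is_err());
    assert!(from_raw_content(5, b"sample history".to_vec()).is_ok());
    println!("empty raw-content dictionaries rejected");
}
```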
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 624-678: The test
dictionary_compression_roundtrips_with_dict_builder_dictionary currently only
verifies correctness and large-payload compression; add a no-dictionary baseline
and an assertion that the dict-trained compressor actually reduces size on a
small input: create a small representative payload (1–10 KB, e.g. take first N
bytes of payload or build a shorter payload) and compress it twice — once with a
FrameCompressor without calling set_dictionary and once with
set_dictionary(encoder_dict) — then compare the resulting compressed buffer
lengths and assert that the dict-compressed length is strictly smaller than the
no-dict compressed length; use the existing symbols FrameCompressor,
set_dictionary, set_source, set_drain, and compress and keep the
roundtrip/decoder checks unchanged.
- Around line 681-711: The assertions currently run after compressor.compress(),
which only checks the final encoder state; instead verify seeding immediately
after setting the dictionary and source but before any block is encoded: after
calling FrameCompressor::new(...), compressor.set_dictionary_from_bytes(...),
compressor.set_source(...), and compressor.set_drain(...), move the assertions
that check compressor.state.last_huff_table and
compressor.state.fse_tables.{ll_previous,ml_previous,of_previous} to run before
compressor.compress(); this ensures you assert the tables were seeded prior to
the first block being encoded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c996e1dc-f635-412b-a6cd-d76e2a7f8301
📒 Files selected for processing (9)
- zstd/Cargo.toml
- zstd/src/decoding/dictionary.rs
- zstd/src/decoding/errors.rs
- zstd/src/encoding/frame_compressor.rs
- zstd/src/encoding/match_generator.rs
- zstd/src/encoding/mod.rs
- zstd/src/fse/fse_decoder.rs
- zstd/src/huff0/huff0_decoder.rs
- zstd/src/huff0/huff0_encoder.rs
@coderabbitai re-review

🧠 Learnings used · ✅ Actions performed: full review triggered.
Summary

- `FrameCompressor::set_dictionary` and `set_dictionary_from_bytes`
- raw-content dictionaries (`Dictionary::from_raw_content`) for dict_builder output

Validation

Closes #8