feat(encoder): FSE table reuse and offset history optimization by polaz · Pull Request #33 · structured-world/structured-zstd

polaz · 2026-03-25T22:11:40Z

Summary

Implement repeat offset encoding (codes 1/2/3) with offset history tracking across blocks, matching the decoder's RFC 8878 §3.1.2.5 behavior
Replace hardcoded FSE table selection with a cost-based heuristic that charges the exact serialized table-header bit cost and builds candidate tables directly from symbol counts
Keep FSE Repeat mode interoperable by tracking the actual last-used table for each symbol type across Encoded, Predefined, and Repeat modes, and resetting that state at frame boundaries
Raise the touched crates to Rust 2024 edition syntax while following the latest stable Rust by default and keeping rust-version = "1.92" / CI msrv on 1.92.0

Technical Details

encode_offset_with_history() converts absolute offsets into repeat-offset codes, including the shifted zero-literal mapping (1 -> rep[1], 2 -> rep[2], 3 -> rep[0]-1)
choose_table() now compares Encoded vs Predefined vs Repeat using exact table_header_bits() accounting, avoids selecting unencodable Repeat/Predefined candidates, and falls back to a safe Encoded table when zero-cost candidates cannot represent the stream
After each compressed block, ll_previous / ml_previous / of_previous are updated to the actual table used by the block so the next Repeat mode matches decoder state; FrameCompressor::compress() clears that state at the start of each new frame
rust-toolchain.toml now follows stable for default local/CI use, while Cargo manifests keep rust-version = "1.92" and the dedicated CI msrv job verifies 1.92.0 explicitly
CI installs i686-unknown-linux-gnu explicitly in the cross job so the 32-bit target remains available independently of the default toolchain selection
Added regression coverage for exact FSE header sizing, repeat-offset mapping, multi-block interop, reused FrameCompressor state, and exact display-message text for touched decoder errors

Test Plan

cargo nextest run --workspace
cargo clippy --workspace --all-targets --features hash,std,dict_builder -- -D warnings
cargo fmt --all -- --check

Closes #17

Summary by CodeRabbit

New Features
- Improved compression for repeat-offset-heavy inputs via offset-history-aware encoding and smarter FSE table selection.
- More reliable reuse/selection of previously chosen encoding tables across blocks/frames.
Bug Fixes
- Fixed an out-of-bounds condition in FSE table serialization and strengthened header/encoding assertions.
Tests
- Added extensive roundtrip and cross-validation tests (Rust↔FFI), plus new unit tests for table/header/offset behaviors.
Chores
- Updated projects to Rust 2024 edition, added a toolchain file, and CI/workflow tweaks.

- Implement repeat offset encoding (codes 1/2/3) with offset history tracking across blocks, matching decoder's RFC 8878 §3.1.2.5 logic - Replace hardcoded FSE table selection with cost-based heuristic that compares new table (with header cost) vs predefined vs previous table - Add offset_hist [u32; 3] to CompressState, initialized per RFC 8878 - Add symbol_probability() and num_symbols() accessors to FSETable - Add 5 regression tests covering repeat offsets, zero-ll sequences, multi-block persistence, compression ratio, and C zstd interop Closes #17

coderabbitai · 2026-03-25T22:12:00Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Implements per‑frame RFC‑8878 repeat‑offset encoding with in‑place offset history, FSE table reuse/selection using full 256‑symbol histograms and exact header‑bit cost, persists last‑used tables across blocks, introduces helper APIs/tests, and bumps crate/tooling to Rust 2024.

Changes

Cohort / File(s)	Summary
Block encoding & FSE selection `zstd/src/encoding/blocks/compressed.rs`	Replace simple offset arithmetic with `encode_offset_with_history(...)`; build 256‑symbol histograms, estimate dynamic table + header cost, choose among RepeatLast/Predefined/Encoded, always persist last‑used tables, add helpers and unit tests, rename `RepeateLast`→`RepeatLast`.
Frame compressor state `zstd/src/encoding/frame_compressor.rs`	Introduce `PreviousFseTable` (`Default`/`Custom(Box<FSETable>)`), change `*_previous` fields to `Option<PreviousFseTable>`, add `as_table(...)`, add `CompressState::offset_hist: [u32;3]` initialized to `[1,4,8]`, and reset previous tables & offset history on rerun.
FSE encoder internals & tests `zstd/src/fse/fse_encoder.rs`, `zstd/src/fse/mod.rs`	Add `symbol_probability` and `table_header_bits`, strengthen `write_table` with alignment checks and debug header-bit verification, fix zero‑run bound, add `build_table_from_symbol_counts`, and unit test validating header bit counting.
Roundtrip & cross‑validation tests `zstd/src/tests/roundtrip_integrity.rs`, `zstd/tests/cross_validation.rs`, `zstd/src/tests/mod.rs`	New streaming helper, repeat‑offset fixtures, multi‑block/repeat/zero‑LL tests, compressor‑reuse tests, and 4 Rust↔C FFI cross‑validation tests; minor import/assert tweaks.
Ringbuffer, decoding & error messages `zstd/src/decoding/ringbuffer.rs`, `zstd/src/decoding/block_decoder.rs`, `zstd/src/decoding/errors.rs`	Wrap inner `unsafe` blocks with SAFETY comments, rename/swap copy helper implementations (`copy_with_checks`/`copy_with_nobranch_check`), add correctness tests for copy paths, reformat and tighten Display messages, and add unit tests for error strings.
Smaller refactors & formatting-only edits `zstd/src/huff0/huff0_encoder.rs`, `zstd/src/dictionary/mod.rs`, `zstd/src/encoding/frame_header.rs`, `zstd/src/encoding/levels/fastest.rs`, `zstd/src/encoding/match_generator.rs`, `zstd/src/bit_io/bit_writer.rs`, `zstd/benches/...`	Iterator/return simplification, removal of debug wrapper, conditional reformatting, import reorderings, minor test expression tweaks, and formatting-only panic/assert rewrites.
Build/tooling & CI `cli/Cargo.toml`, `zstd/Cargo.toml`, `rust-toolchain.toml`, `.github/workflows/ci.yml`, `.github/copilot-instructions.md`	Bump crate editions to 2024, add `rust-toolchain.toml` (stable + clippy/rustfmt), adjust CI cross‑target step and msrv pin, and narrow clippy instruction scope.

Sequence Diagram(s)

sequenceDiagram
    participant FC as FrameCompressor
    participant CS as CompressState
    participant BE as BlockEncoder
    participant TS as TableSelector
    participant FSE as FSEModule
    participant OE as OffsetEncoder

    FC->>CS: reset offset_hist = [1,4,8] & clear previous tables
    FC->>BE: start block encoding
    BE->>CS: read offset_hist & previous tables
    BE->>OE: encode_offset_with_history(actual_offset, ll, &mut offset_hist)
    OE-->>CS: update/rotate offset_hist
    BE->>TS: provide 256-symbol counts (LL/ML/OF)
    TS->>FSE: build_table_from_symbol_counts / table_header_bits()
    FSE-->>TS: candidate table + header bit cost
    TS-->>BE: choose mode (RepeatLast / Predefined / Encoded)
    BE->>CS: remember_last_used_tables(chosen_tables)
    BE-->>FC: emit encoded block

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through bits and memory lanes,

Rep offsets tucked in gentle chains,
Tables lingered, learned, then stayed,
Header bits counted, true not swayed,
Frames now snug — a compact art.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly summarizes the main changes: FSE table reuse and offset history optimization, which aligns with the primary objectives of implementing cost-based FSE table selection and offset history tracking across blocks.
Linked Issues check	✅ Passed	All core requirements from issue `#17` are met: FSE tables are reused via cost-based selection using exact table-header bit costs [blocks/compressed.rs], offset history is maintained and updated across blocks [frame_compressor.rs], repeat-offset encoding uses codes 1/2/3 [encode_offset_with_history], and crates are updated to Rust 2024 edition [Cargo.toml files]. Comprehensive test coverage validates the implementation.
Out of Scope Changes check	✅ Passed	All changes are directly related to the stated objectives: encoder FSE/offset optimizations, edition upgrades, CI/toolchain standardization, and test infrastructure. Refactoring in ringbuffer.rs and error message improvements are supporting quality improvements aligned with the feature scope.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/#17-feat-fse-table-reuse-and-offset-history-optimizati

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-25T22:13:12Z

Codecov Report

❌ Patch coverage is 97.50693% with 18 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
zstd/src/decoding/ringbuffer.rs	97.69%	6 Missing ⚠️
zstd/src/bit_io/bit_writer.rs	71.42%	4 Missing ⚠️
zstd/src/encoding/blocks/compressed.rs	98.85%	3 Missing ⚠️
zstd/src/decoding/block_decoder.rs	0.00%	2 Missing ⚠️
zstd/src/dictionary/mod.rs	0.00%	2 Missing ⚠️
zstd/src/encoding/frame_compressor.rs	90.90%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

The `--all-features` flag includes `rustc-dep-of-std` which pulls in `compiler_builtins` — an internal feature for Rust stdlib builds that fails on stable. Align with CI which already uses explicit features.

Copilot

Pull request overview

This PR improves encoder compression efficiency and interoperability by adding Zstd repeat-offset encoding (rep codes 1/2/3) with cross-block offset history, and by enabling FSE table reuse via a cost-based selection heuristic.

Changes:

Add repeat offset encoding with [rep0, rep1, rep2] history persisted across blocks and reset per frame.
Replace fixed FSE table selection with a heuristic that compares estimated encoding cost + header overhead vs predefined vs repeat-last table reuse.
Add regression and cross-validation tests for repeat offsets, multi-block persistence, and C zstd interop.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
zstd/src/encoding/blocks/compressed.rs	Implements offset history encoding and cost-based FSE table mode selection (encoded vs predefined vs repeat-last).
zstd/src/encoding/frame_compressor.rs	Adds `offset_hist` to compression state and resets it at frame start.
zstd/src/fse/fse_encoder.rs	Exposes table probability/introspection helpers used by the new heuristic.
zstd/src/tests/roundtrip_integrity.rs	Adds roundtrip tests for repeat offsets, multi-block behavior, and zero-literal-length sequences.
zstd/tests/cross_validation.rs	Adds Rust-compress → C-decompress cross-validation for repeat offsets and multi-block history.

Comments suppressed due to low confidence (1)

zstd/src/encoding/blocks/compressed.rs:63

choose_table() treats state.fse_tables.*_previous as “the table that Repeat will reuse”, but *_previous is only set when an Encoded table is used. If a block selects Predefined tables, *_previous will still point at an older encoded table; a later block may pick Repeat and encode with that stale table even though the decoder will repeat the immediately preceding predefined table, causing decode failure. Consider clearing *_previous when a block uses Predefined (or otherwise ensure *_previous always matches the last table actually used in the prior block).

        let ll_mode = choose_table(
            state.fse_tables.ll_previous.as_ref(),
            &state.fse_tables.ll_default,
            sequences.iter().map(|seq| encode_literal_length(seq.ll).0),
            9,

zstd/src/encoding/frame_compressor.rs

zstd/src/encoding/blocks/compressed.rs

zstd/src/tests/roundtrip_integrity.rs

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/copilot-instructions.md:
- Line 13: The Clippy instruction line lacks an explanation for excluding the
fuzz_exports crate; update the .github/copilot-instructions.md Clippy bullet
(the line mentioning `cargo clippy -p structured-zstd ...`) to add a brief
rationale for excluding `fuzz_exports` (e.g., "fuzz_exports is excluded because
fuzzing harnesses intentionally use patterns or unsafe code not meant for
regular linting"), referencing the existing "rustc-dep-of-std" exclusion text
for format and consistency; ensure the new sentence mentions `fuzz_exports` by
name so contributors understand the scope.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0250aeaf-34df-4710-bbb6-f81421b219ba

📥 Commits

Reviewing files that changed from the base of the PR and between 2a27161 and f1e162e.

📒 Files selected for processing (1)

.github/copilot-instructions.md

.github/copilot-instructions.md

- charge exact FSE table header cost in table selection - store the actual last-used tables across blocks and reset them per frame - add interop and regression coverage for multi-block and reuse cases Closes #17

- add a narrow regression for remembering last-used FSE tables - remove extra copying from reused FrameCompressor coverage - fix workspace lint noise in existing test helpers Closes #17

- cover Huffman-heavy seed=100 in ffi-to-rust direction - cover repeat-offset-friendly inputs in ffi-to-rust direction Closes #17

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

zstd/src/encoding/blocks/compressed.rs

zstd/src/fse/fse_encoder.rs

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 160-171: The dynamic-table candidate must be skipped for
single-symbol distributions: after computing max_symbol (from counts and
rposition) check if max_symbol == 0 (i.e., only symbol 0 may be present) and in
that case do not call build_table_from_symbol_counts/build_table_from_counts;
instead treat the dynamic candidate as unavailable by setting new_total_cost to
a very large value (e.g., usize::MAX) so the code falls back to
Repeat/Predefined. Update the code paths that reference new_table,
new_encoding_cost, and new_header_cost so they are only computed when max_symbol
> 0 (keep estimate_encoding_cost and table_header_bits usage unchanged
otherwise).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53718647-1aa2-4213-b4ae-4ce06e839fb1

📥 Commits

Reviewing files that changed from the base of the PR and between f1e162e and 5468134.

📒 Files selected for processing (7)

.github/copilot-instructions.md
zstd/src/encoding/blocks/compressed.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/fse/fse_encoder.rs
zstd/src/fse/mod.rs
zstd/src/tests/roundtrip_integrity.rs
zstd/tests/cross_validation.rs

zstd/src/encoding/blocks/compressed.rs

- cover choose_table on a single emitted symbol - reproduce the dynamic-table panic before the fix Closes #17

- skip the dynamic-table path for single-symbol distributions - reduce repeat-table persistence overhead for encoded modes - cache debug table-header verification in write_table Closes #17

coderabbitai

♻️ Duplicate comments (1)

zstd/src/encoding/blocks/compressed.rs (1)
161-172: ⚠️ Potential issue | 🔴 Critical

Single-symbol distributions can still trigger a panic.

The previous review flagged that when all emitted codes are the same symbol (e.g., max_symbol == 0), build_table_from_symbol_counts receives a single-element slice and may panic during internal redistribution logic. This appears unaddressed in the current code.

Consider guarding this path by checking if only one distinct symbol exists before building the dynamic table candidate, falling back to Predefined/Repeat when the data is single-symbol.
,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/encoding/blocks/compressed.rs` around lines 161 - 172, The code
builds a dynamic table even when there is only one distinct symbol, which can
panic inside build_table_from_symbol_counts; add a guard using the computed
max_symbol (or check counts[..=max_symbol].len() == 1) before calling
build_table_from_symbol_counts in this block and skip building the dynamic
candidate for single-symbol distributions—set
new_encoding_cost/new_header_cost/new_total_cost to usize::MAX (or otherwise
force selection of Predefined/Repeat) so the single-symbol path doesn't attempt
redistribution and avoids the panic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 161-172: The code builds a dynamic table even when there is only
one distinct symbol, which can panic inside build_table_from_symbol_counts; add
a guard using the computed max_symbol (or check counts[..=max_symbol].len() ==
1) before calling build_table_from_symbol_counts in this block and skip building
the dynamic candidate for single-symbol distributions—set
new_encoding_cost/new_header_cost/new_total_cost to usize::MAX (or otherwise
force selection of Predefined/Repeat) so the single-symbol path doesn't attempt
redistribution and avoids the panic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0003c641-ef3c-429d-9726-6d48761907a9

📥 Commits

Reviewing files that changed from the base of the PR and between 5468134 and 50812ea.

📒 Files selected for processing (5)

cli/src/progress.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/tests/mod.rs
zstd/src/tests/roundtrip_integrity.rs
zstd/tests/cross_validation.rs

- document why choose_table keeps exact cost comparison - explain why only the degenerate single-symbol case is short-circuited Closes #17

polaz · 2026-03-26T00:25:50Z

@coderabbitai Re duplicate single-symbol panic finding in zstd/src/encoding/blocks/compressed.rs: fixed in commits a154c95 and 9dd75e3. choose_table() now skips the dynamic-table candidate for single-symbol distributions, and choose_table_handles_single_symbol_distribution covers the regression.

coderabbitai · 2026-03-26T00:26:04Z

Tip

For best results, initiate chat on the files or code changes.

@polaz Thanks for the update! Confirmed — skipping the dynamic-table candidate in choose_table() for single-symbol distributions is the correct fix, and having choose_table_handles_single_symbol_distribution as a dedicated regression test ensures this path stays covered going forward.

🧠 Learnings used

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-25T04:29:01.996Z
Learning: Avoid suggesting architectural changes that diverge too far from upstream (ruzstd/KillingSpark/zstd-rs fork)

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 173-213: The code currently skips creating a mode-1/RLE candidate
when distinct_symbols == 1, so one-symbol streams are forced to
Predefined/Repeat and lose the optimal RLE path; fix by explicitly constructing
an RLE candidate when distinct_symbols == 1 (e.g. call the existing
RLE/table-builder helper or add a new build_rle_table_from_symbol_counts),
compute its cost using estimate_encoding_cost(...) plus any header cost
(table_header_bits() or zero if headerless), include that candidate in the cost
comparison along with new_table, predefined_cost, and previous_cost, and return
FseTableMode::Rle(...) when it is cheapest (add/use the appropriate FseTableMode
variant for RLE instead of falling back to Predefined).

In `@zstd/src/fse/fse_encoder.rs`:
- Around line 152-198: table_header_bits() currently assumes the header starts
at a byte boundary which mismatches write_table() that pads relative to the
writer's current bit index; this can produce off-by-up-to-7-bit estimates and
trigger the debug assertion. Fix by making table_header_bits accept a
starting_bit_offset (0..7) or a boolean assert_aligned flag: if you add
starting_bit_offset, incorporate it into the final byte-alignment calculation
(use (bits + starting_bit_offset) % 8 to compute padding and return the total
bits including that offset) and update all callers (and tests) to pass the
writer's current bit index mod 8; alternatively, if you prefer asserting
alignment, add an assertion at the start of table_header_bits that the caller is
byte-aligned (starting_bit_offset == 0) and document that requirement. Ensure
the change is mirrored where table_header_bits is used and also apply identical
logic to the other helper noted around the 200-250 range.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f1471ea5-79bf-47ce-ab69-d6b0ea89a14a

📥 Commits

Reviewing files that changed from the base of the PR and between 50812ea and dd48ad3.

📒 Files selected for processing (2)

zstd/src/encoding/blocks/compressed.rs
zstd/src/fse/fse_encoder.rs

zstd/src/encoding/blocks/compressed.rs

zstd/src/fse/fse_encoder.rs

…bol streams stay on predefined/repeat paths until sequence-section RLE exists\n- assert byte-aligned FSE table header writes and document the matching size contract\n\nCloses #17

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

zstd/src/encoding/blocks/compressed.rs

…epeat mode has no previous table\n- store previous default/custom tables without cloning full defaults each block\n- raise crates to edition 2024 and fix resulting clippy/unsafe issues\n\nCloses #17

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

zstd/src/encoding/frame_compressor.rs (1)
3-9: ⚠️ Potential issue | 🟡 Minor

Fix import ordering to pass cargo fmt.

The CI pipeline reports that cargo fmt --all -- --check failed due to formatting differences in the import block. Run cargo fmt to fix the import ordering.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/encoding/frame_compressor.rs` around lines 3 - 9, The import block
is out of rustfmt-expected order causing CI failure; fix it by running rustfmt
(cargo fmt) or manually reorder the use statements so they follow rustfmt's
canonical grouping/ordering (e.g., alloc::{boxed::Box, vec::Vec},
core::convert::TryInto, #[cfg(feature = "hash")] use twox_hash::XxHash64; and
#[cfg(feature = "hash")] use core::hash::Hasher;) so rustfmt no longer reports
differences.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/ringbuffer.rs`:
- Around line 522-524: The method
RingBuffer::extend_from_within_unchecked_branchless is incorrectly calling the
if-based helper copy_with_checks; change that call to the branchless helper
(e.g., copy_with_checks_branchless or the no-branch variant defined next to
copy_with_checks) so the method behavior and documentation match the optimized,
branchless path—update the single call in
extend_from_within_unchecked_branchless to invoke the branchless helper with the
same parameters (m1_ptr, m2_ptr, f1_ptr, f2_ptr, m1_in_f1, m2_in_f1, m1_in_f2,
m2_in_f2).

In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 569-580: The test module's use/import ordering and formatting is
off; run cargo fmt (or manually reorder and format the use statements) so the
imports in the #[cfg(test)] mod tests block are properly sorted/wrapped.
Specifically, fix the imports that reference choose_table,
encode_offset_with_history, previous_table, remember_last_used_tables,
FseTableMode, FseTables, PreviousFseTable, build_table_from_symbol_counts and
the tables_match helper so they follow rustfmt's ordering and line-wrapping
rules (e.g., group and alphabetize items inside use super::{...} and align
crate::... imports), then re-run cargo fmt to ensure CI passes.
- Around line 1-9: The import block in compressed.rs is not ordered per rustfmt;
run `cargo fmt` (or manually sort the `use` statements) so imports like
BitWriter, CompressState/FseTables/PreviousFseTable, Matcher/Sequence,
build_table_from_symbol_counts/FSETable/State, and huff0_encoder are grouped and
alphabetically ordered to satisfy rustfmt; ensure grouped crates (alloc vs
crate::...) remain separated and the listed symbols (BitWriter, CompressState,
FseTables, PreviousFseTable, Matcher, Sequence, build_table_from_symbol_counts,
FSETable, State, huff0_encoder) appear in the properly formatted import block.

In `@zstd/src/encoding/frame_header.rs`:
- Around line 45-48: Reformat the if-let chain to satisfy rustfmt by placing the
let pattern first and ensuring proper spacing/indentation; change the block to
use "if let Some(window_size) = self.window_size && !self.single_segment {" then
compute log/ exponent and call output.push(exponent << 3) with standard rustfmt
spacing, or simply run rustfmt on the file. Refer to symbols:
self.single_segment, self.window_size, window_size.next_power_of_two(), ilog2(),
exponent, and output.push to locate and fix the block.

---

Outside diff comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 3-9: The import block is out of rustfmt-expected order causing CI
failure; fix it by running rustfmt (cargo fmt) or manually reorder the use
statements so they follow rustfmt's canonical grouping/ordering (e.g.,
alloc::{boxed::Box, vec::Vec}, core::convert::TryInto, #[cfg(feature = "hash")]
use twox_hash::XxHash64; and #[cfg(feature = "hash")] use core::hash::Hasher;)
so rustfmt no longer reports differences.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5c140f9e-fbf2-42c2-bd8e-ce70b25e459b

📥 Commits

Reviewing files that changed from the base of the PR and between c08ea51 and 3316ccf.

📒 Files selected for processing (7)

cli/Cargo.toml
zstd/Cargo.toml
zstd/src/decoding/ringbuffer.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/encoding/frame_header.rs
zstd/src/huff0/huff0_encoder.rs

zstd/src/decoding/ringbuffer.rs

zstd/src/encoding/blocks/compressed.rs

zstd/src/encoding/frame_header.rs

…ithin_unchecked_branchless back to the no-branch helper\n- keep workspace formatting aligned after the edition-2024 cleanup\n\nCloses #17

…fter the edition-2024 cleanup\n- align imports and multiline formatting with current rustfmt output\n\nCloses #17

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/ci.yml (1)
133-134: 🧹 Nitpick | 🔵 Trivial

Consider pinning the benchmark job toolchain for consistency.

The lint, test, and cross-i686 jobs now pin toolchain: "1.94.0", but the benchmark job still uses floating @stable. This could cause:

Inconsistent benchmark results if Rust releases affect codegen/performance

Subtle differences between benchmarked code and tested code

If performance tracking against the latest stable is intentional, consider adding a comment to document this choice. Otherwise, pin to 1.94.0 for consistency:
♻️ Suggested change
       - uses: dtolnay/rust-toolchain@stable
+        with:
+          toolchain: "1.94.0"
       - uses: Swatinem/rust-cache@v2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 133 - 134, The benchmark job currently
uses dtolnay/rust-toolchain@stable (and the rust-cache action) which leaves the
Rust toolchain floating; change the benchmark job to pin the toolchain to
"1.94.0" the same way as the lint/test/cross-i686 jobs by updating the
rust-toolchain action invocation and any related rust-cache inputs to reference
toolchain "1.94.0" (or add a one-line comment in the benchmark job explaining
why a floating stable toolchain is intentionally desired); target the benchmark
job's dtolnay/rust-toolchain@stable and Swatinem/rust-cache@v2 usages when
making this change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 133-134: The benchmark job currently uses
dtolnay/rust-toolchain@stable (and the rust-cache action) which leaves the Rust
toolchain floating; change the benchmark job to pin the toolchain to "1.94.0"
the same way as the lint/test/cross-i686 jobs by updating the rust-toolchain
action invocation and any related rust-cache inputs to reference toolchain
"1.94.0" (or add a one-line comment in the benchmark job explaining why a
floating stable toolchain is intentionally desired); target the benchmark job's
dtolnay/rust-toolchain@stable and Swatinem/rust-cache@v2 usages when making this
change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a2975d88-73d4-4955-accd-341f395c6d43

📥 Commits

Reviewing files that changed from the base of the PR and between 5d54817 and 0854328.

📒 Files selected for processing (4)

.github/workflows/ci.yml
rust-toolchain.toml
zstd/src/decoding/errors.rs
zstd/tests/cross_validation.rs

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated no new comments.

polaz · 2026-03-26T10:41:56Z

@coderabbitai re-review

coderabbitai · 2026-03-26T10:42:41Z

@polaz Sure, I'll kick off a full re-review of PR #33 now, taking into account all the changes since the last review, including the single-symbol distribution fix and the FSE table reuse / offset history optimizations.

🧠 Learnings used

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-25T04:29:01.996Z
Learning: Avoid suggesting architectural changes that diverge too far from upstream (ruzstd/KillingSpark/zstd-rs fork)

✅ Actions performed

Full review triggered.

- switch the default workspace and CI toolchain back to stable\n- pin the dedicated msrv job to 1.92.0 explicitly\n- keep rust-version 1.92 in Cargo manifests as the compatibility floor\n\nCloses #17

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/ringbuffer.rs`:
- Around line 775-776: The helper assert_branchless_matches_checked() currently
calls checked.reserve(len) and branchless.reserve(len), which mutates the buffer
layouts and defeats tests for wraparound cases; remove those reserve(len) calls
and instead assert the precondition that the fixture already satisfies free() >=
len (e.g., assert!(checked.free() >= len && branchless.free() >= len)); update
tests that call wrapped_write() and wrapped_data() to use fixtures where free()
>= len so the helper compares the real split (wraparound) layouts without
relinearizing them. Ensure references to assert_branchless_matches_checked(),
wrapped_write(), wrapped_data(), checked and branchless are used when locating
the change.
- Line 517: Change the strict pointer comparisons in the debug assertions to
allow one-past-end equality: replace the `>` checks that compare
`self.buf.as_ptr().add(self.cap)` with `f1_ptr.add(m1_in_f1 + m2_in_f1)` (and
the analogous check using `f2_ptr` if present) to use `>=` instead of `>`, so
the `debug_assert!`s permit valid one-past-end pointer values during full-slice
fills.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c8587ec9-7321-46a0-960d-805a079d86f6

📥 Commits

Reviewing files that changed from the base of the PR and between 1c9c616 and 2e6edc5.

📒 Files selected for processing (24)

.github/copilot-instructions.md
.github/workflows/ci.yml
cli/Cargo.toml
cli/src/progress.rs
rust-toolchain.toml
zstd/Cargo.toml
zstd/benches/compare_ffi.rs
zstd/benches/decode_all.rs
zstd/src/bit_io/bit_writer.rs
zstd/src/decoding/block_decoder.rs
zstd/src/decoding/errors.rs
zstd/src/decoding/ringbuffer.rs
zstd/src/dictionary/mod.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/encoding/frame_header.rs
zstd/src/encoding/levels/fastest.rs
zstd/src/encoding/match_generator.rs
zstd/src/fse/fse_encoder.rs
zstd/src/fse/mod.rs
zstd/src/huff0/huff0_encoder.rs
zstd/src/tests/mod.rs
zstd/src/tests/roundtrip_integrity.rs
zstd/tests/cross_validation.rs

zstd/src/decoding/ringbuffer.rs

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

zstd/src/encoding/blocks/compressed.rs

zstd/src/tests/roundtrip_integrity.rs

- allow exact one-past-end pointer bounds in branchless ringbuffer writes\n- keep the wraparound regression helper on real split layouts\n- remove frame-overhead bias from the multi-block reuse size assertion\n- document why the FSE cost model keeps the shared entropy estimate\n\nCloses #17

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 235-243: In the None fallback branch where you construct
fallback_counts and call build_table_from_symbol_counts, add a concise inline
comment explaining that for max_symbol == 0 we create a two-element slice
[counts[0], 0] to satisfy build_table_from_symbol_counts' requirement for at
least two symbols (workaround for single-symbol streams) so the
FseTableMode::Encoded(fallback) can be built; reference the fallback_counts
variable, the max_symbol check, and the build_table_from_symbol_counts call so
future readers understand the phantom zero-count symbol is deliberate.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0bb2fedc-b2b8-49e4-9b85-5c2e5d9e66a3

📥 Commits

Reviewing files that changed from the base of the PR and between 8298996 and 7262247.

📒 Files selected for processing (3)

zstd/src/decoding/ringbuffer.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/tests/roundtrip_integrity.rs

zstd/src/encoding/blocks/compressed.rs

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

zstd/src/tests/roundtrip_integrity.rs

- document the single-symbol fallback table workaround\n- weaken the repetitive-vs-random compression assertion to a stable ordering check\n\nCloses #17

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated no new comments.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/tests/roundtrip_integrity.rs`:
- Around line 280-299: The test roundtrip_zero_literal_length_sequences claims
to exercise three shifted-repeat remap branches but only alternates prefix and
shifted_prefix so it exercises only the rep[1] path; either update the doc
comment to narrow the claim or extend the fixture to include a third overlapping
window that forces rep[2] and rep[0]-1 paths (e.g. create shifted_prefix2 =
data[2..52].to_vec() and change the loop to cycle through prefix,
shifted_prefix, shifted_prefix2 (or arrange a sequence like prefix,
shifted_prefix, shifted_prefix2, prefix) so the encoder is driven to emit the
rep[2] and rep[0]-1 remaps in roundtrip_zero_literal_length_sequences).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0dd5d2a2-ce29-4e0b-bd28-38fe34aa04f7

📥 Commits

Reviewing files that changed from the base of the PR and between 7262247 and 874b249.

📒 Files selected for processing (2)

zstd/src/encoding/blocks/compressed.rs
zstd/src/tests/roundtrip_integrity.rs

zstd/src/tests/roundtrip_integrity.rs

- align the zero literal-length regression comments with the fixture actually exercised end-to-end\n\nCloses #17

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 25, 2026 22:11

Copilot started reviewing on behalf of polaz March 25, 2026 22:12 View session

docs: fix clippy command in copilot-instructions

f1e162e

The `--all-features` flag includes `rustc-dep-of-std` which pulls in `compiler_builtins` — an internal feature for Rust stdlib builds that fails on stable. Align with CI which already uses explicit features.

Copilot AI reviewed Mar 25, 2026

View reviewed changes

zstd/src/encoding/frame_compressor.rs Show resolved Hide resolved

zstd/src/encoding/blocks/compressed.rs Outdated Show resolved Hide resolved

zstd/src/tests/roundtrip_integrity.rs Outdated Show resolved Hide resolved

zstd/src/tests/roundtrip_integrity.rs Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 25, 2026

View reviewed changes

.github/copilot-instructions.md Outdated Show resolved Hide resolved

polaz added 2 commits March 26, 2026 02:03

fix(encoder): align fse repeat state with decoder

5468134

- charge exact FSE table header cost in table selection - store the actual last-used tables across blocks and reset them per frame - add interop and regression coverage for multi-block and reuse cases Closes #17

test(encoder): harden fse regressions

e472885

- add a narrow regression for remembering last-used FSE tables - remove extra copying from reused FrameCompressor coverage - fix workspace lint noise in existing test helpers Closes #17

polaz requested a review from Copilot March 26, 2026 00:12

Copilot started reviewing on behalf of polaz March 26, 2026 00:12 View session

test(interop): add reverse fse regression coverage

50812ea

- cover Huffman-heavy seed=100 in ffi-to-rust direction - cover repeat-offset-friendly inputs in ffi-to-rust direction Closes #17

Copilot AI reviewed Mar 26, 2026

View reviewed changes

zstd/src/encoding/blocks/compressed.rs Outdated Show resolved Hide resolved

zstd/src/encoding/blocks/compressed.rs Outdated Show resolved Hide resolved

zstd/src/fse/fse_encoder.rs Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

zstd/src/encoding/blocks/compressed.rs Outdated Show resolved Hide resolved

polaz added 2 commits March 26, 2026 02:21

test(encoder): add single-symbol table regression

a154c95

- cover choose_table on a single emitted symbol - reproduce the dynamic-table panic before the fix Closes #17

fix(encoder): avoid single-symbol fse panic

9dd75e3

- skip the dynamic-table path for single-symbol distributions - reduce repeat-table persistence overhead for encoded modes - cache debug table-header verification in write_table Closes #17

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

docs(encoder): clarify fse cost comparison

dd48ad3

- document why choose_table keeps exact cost comparison - explain why only the degenerate single-symbol case is short-circuited Closes #17

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

zstd/src/encoding/blocks/compressed.rs Show resolved Hide resolved

zstd/src/fse/fse_encoder.rs Show resolved Hide resolved

fix(fse): assert table header alignment\n\n- document that single-sym…

c08ea51

…bol streams stay on predefined/repeat paths until sequence-section RLE exists\n- assert byte-aligned FSE table header writes and document the matching size contract\n\nCloses #17

polaz requested a review from Copilot March 26, 2026 06:39

Copilot started reviewing on behalf of polaz March 26, 2026 06:40 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

zstd/src/encoding/blocks/compressed.rs Outdated Show resolved Hide resolved

zstd/src/encoding/blocks/compressed.rs Show resolved Hide resolved

zstd/src/encoding/blocks/compressed.rs Outdated Show resolved Hide resolved

fix(encoder): harden repeat table state\n\n- avoid none unwrap when r…

3316ccf

…epeat mode has no previous table\n- store previous default/custom tables without cloning full defaults each block\n- raise crates to edition 2024 and fix resulting clippy/unsafe issues\n\nCloses #17

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

zstd/src/decoding/ringbuffer.rs Outdated Show resolved Hide resolved

zstd/src/encoding/blocks/compressed.rs Show resolved Hide resolved

zstd/src/encoding/blocks/compressed.rs Show resolved Hide resolved

zstd/src/encoding/frame_header.rs Outdated Show resolved Hide resolved

polaz added 2 commits March 26, 2026 09:40

fix(ringbuffer): restore branchless copy path\n\n- wire extend_from_w…

9ff9143

…ithin_unchecked_branchless back to the no-branch helper\n- keep workspace formatting aligned after the edition-2024 cleanup\n\nCloses #17

style(rustfmt): apply workspace formatting\n\n- run cargo fmt --all a…

7b72b57

…fter the edition-2024 cleanup\n- align imports and multiline formatting with current rustfmt output\n\nCloses #17

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

polaz requested a review from Copilot March 26, 2026 10:33

Copilot started reviewing on behalf of polaz March 26, 2026 10:33 View session

ci(bench): pin benchmark toolchain to 1.94

2e6edc5

Copilot AI reviewed Mar 26, 2026

View reviewed changes

ci(rust): follow stable and pin msrv

8298996

- switch the default workspace and CI toolchain back to stable\n- pin the dedicated msrv job to 1.92.0 explicitly\n- keep rust-version 1.92 in Cargo manifests as the compatibility floor\n\nCloses #17

polaz requested a review from Copilot March 26, 2026 10:54

Copilot started reviewing on behalf of polaz March 26, 2026 10:55 View session

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

zstd/src/decoding/ringbuffer.rs Outdated Show resolved Hide resolved

zstd/src/decoding/ringbuffer.rs Outdated Show resolved Hide resolved

Copilot AI reviewed Mar 26, 2026

View reviewed changes

zstd/src/encoding/blocks/compressed.rs Show resolved Hide resolved

zstd/src/tests/roundtrip_integrity.rs Outdated Show resolved Hide resolved

polaz requested a review from Copilot March 26, 2026 11:17

Copilot started reviewing on behalf of polaz March 26, 2026 11:17 View session

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

zstd/src/encoding/blocks/compressed.rs Show resolved Hide resolved

Copilot AI reviewed Mar 26, 2026

View reviewed changes

zstd/src/tests/roundtrip_integrity.rs Outdated Show resolved Hide resolved

test(zstd): stabilize review-driven assertions

874b249

- document the single-symbol fallback table workaround\n- weaken the repetitive-vs-random compression assertion to a stable ordering check\n\nCloses #17

polaz requested a review from Copilot March 26, 2026 11:35

Copilot started reviewing on behalf of polaz March 26, 2026 11:35 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

coderabbitai bot reviewed Mar 26, 2026

View reviewed changes

zstd/src/tests/roundtrip_integrity.rs Show resolved Hide resolved

test(zstd): narrow zero-ll coverage claim

95e929e

- align the zero literal-length regression comments with the fixture actually exercised end-to-end\n\nCloses #17

polaz requested a review from Copilot March 26, 2026 13:31

Copilot started reviewing on behalf of polaz March 26, 2026 13:32 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

polaz merged commit 98894d0 into main Mar 26, 2026
15 checks passed

polaz deleted the feat/#17-feat-fse-table-reuse-and-offset-history-optimizati branch March 26, 2026 15:15

sw-release-bot bot mentioned this pull request Mar 26, 2026

chore: release v0.0.3 #37

Merged

coderabbitai bot mentioned this pull request Mar 28, 2026

feat(encoding): add dictionary compression support #44

Merged

Conversation

polaz commented Mar 25, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Technical Details

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

codecov bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

polaz commented Mar 26, 2026

Uh oh!

coderabbitai bot commented Mar 26, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

polaz commented Mar 26, 2026

Uh oh!

polaz commented Mar 25, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 25, 2026 •

edited

Loading

codecov bot commented Mar 25, 2026 •

edited

Loading