- Implement repeat offset encoding (codes 1/2/3) with offset history tracking across blocks, matching decoder's RFC 8878 §3.1.2.5 logic
- Replace hardcoded FSE table selection with cost-based heuristic that compares new table (with header cost) vs predefined vs previous table
- Add `offset_hist: [u32; 3]` to `CompressState`, initialized per RFC 8878
- Add `symbol_probability()` and `num_symbols()` accessors to `FSETable`
- Add 5 regression tests covering repeat offsets, zero-ll sequences, multi-block persistence, compression ratio, and C zstd interop

Closes #17
📝 Walkthrough
Implements per‑frame RFC‑8878 repeat‑offset encoding with in‑place offset history, FSE table reuse/selection using full 256‑symbol histograms and exact header‑bit cost, persists last‑used tables across blocks, introduces helper APIs/tests, and bumps crate/tooling to Rust 2024.
Sequence Diagram(s)
sequenceDiagram
participant FC as FrameCompressor
participant CS as CompressState
participant BE as BlockEncoder
participant TS as TableSelector
participant FSE as FSEModule
participant OE as OffsetEncoder
FC->>CS: reset offset_hist = [1,4,8] & clear previous tables
FC->>BE: start block encoding
BE->>CS: read offset_hist & previous tables
BE->>OE: encode_offset_with_history(actual_offset, ll, &mut offset_hist)
OE-->>CS: update/rotate offset_hist
BE->>TS: provide 256-symbol counts (LL/ML/OF)
TS->>FSE: build_table_from_symbol_counts / table_header_bits()
FSE-->>TS: candidate table + header bit cost
TS-->>BE: choose mode (RepeatLast / Predefined / Encoded)
BE->>CS: remember_last_used_tables(chosen_tables)
BE-->>FC: emit encoded block
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
The `--all-features` flag includes `rustc-dep-of-std` which pulls in `compiler_builtins` — an internal feature for Rust stdlib builds that fails on stable. Align with CI which already uses explicit features.
Pull request overview
This PR improves encoder compression efficiency and interoperability by adding Zstd repeat-offset encoding (rep codes 1/2/3) with cross-block offset history, and by enabling FSE table reuse via a cost-based selection heuristic.
Changes:
- Add repeat offset encoding with `[rep0, rep1, rep2]` history persisted across blocks and reset per frame.
- Replace fixed FSE table selection with a heuristic that compares estimated encoding cost + header overhead vs predefined vs repeat-last table reuse.
- Add regression and cross-validation tests for repeat offsets, multi-block persistence, and C zstd interop.
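The repeat-offset mapping summarized above can be sketched as follows. This is a minimal illustration, not the crate's `encode_offset_with_history()`: the name `encode_offset_sketch` is hypothetical, the history is assumed to be `[rep0, rep1, rep2]`, and non-repeat offsets are assumed to be emitted as `offset + 3`. The zero-literal-length branch uses the shifted mapping (1 -> rep[1], 2 -> rep[2], 3 -> rep[0]-1).

```rust
/// Hypothetical sketch of repeat-offset encoding with history updates.
/// Returns the offset code to emit and updates `hist` in place.
fn encode_offset_sketch(actual: u32, lit_len: u32, hist: &mut [u32; 3]) -> u32 {
    let [r0, r1, r2] = *hist;
    if lit_len > 0 {
        // Normal case: codes 1/2/3 refer to rep0/rep1/rep2.
        if actual == r0 {
            return 1; // history unchanged
        }
        if actual == r1 {
            *hist = [r1, r0, r2];
            return 2;
        }
        if actual == r2 {
            *hist = [r2, r0, r1];
            return 3;
        }
    } else {
        // Zero literal length: codes are shifted by one position.
        if actual == r1 {
            *hist = [r1, r0, r2];
            return 1;
        }
        if actual == r2 {
            *hist = [r2, r0, r1];
            return 2;
        }
        // rep0 - 1 is only usable when it is a non-zero offset.
        if r0 > 1 && actual == r0 - 1 {
            *hist = [r0 - 1, r0, r1];
            return 3;
        }
    }
    // New offset: push it to the front of the history.
    *hist = [actual, r0, r1];
    actual + 3
}
```

Starting from the per-frame initial history `[1, 4, 8]`, an offset of 4 after some literals would emit code 2 and leave the history as `[4, 1, 8]`.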
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| zstd/src/encoding/blocks/compressed.rs | Implements offset history encoding and cost-based FSE table mode selection (encoded vs predefined vs repeat-last). |
| zstd/src/encoding/frame_compressor.rs | Adds offset_hist to compression state and resets it at frame start. |
| zstd/src/fse/fse_encoder.rs | Exposes table probability/introspection helpers used by the new heuristic. |
| zstd/src/tests/roundtrip_integrity.rs | Adds roundtrip tests for repeat offsets, multi-block behavior, and zero-literal-length sequences. |
| zstd/tests/cross_validation.rs | Adds Rust-compress → C-decompress cross-validation for repeat offsets and multi-block history. |
Comments suppressed due to low confidence (1)
zstd/src/encoding/blocks/compressed.rs:63
`choose_table()` treats `state.fse_tables.*_previous` as “the table that `Repeat` will reuse”, but `*_previous` is only set when an `Encoded` table is used. If a block selects `Predefined` tables, `*_previous` will still point at an older encoded table; a later block may pick `Repeat` and encode with that stale table even though the decoder will repeat the immediately preceding predefined table, causing decode failure. Consider clearing `*_previous` when a block uses `Predefined` (or otherwise ensure `*_previous` always matches the last table actually used in the prior block).
let ll_mode = choose_table(
state.fse_tables.ll_previous.as_ref(),
&state.fse_tables.ll_default,
sequences.iter().map(|seq| encode_literal_length(seq.ll).0),
9,
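One way to keep the encoder's notion of "previous table" aligned with the decoder is to record whichever table the block actually used, including the predefined one. The sketch below is only an illustration under that assumption; the types `Mode` and `LastUsed` and the function `remember_last_used` are hypothetical stand-ins, and a table is modeled as a plain `Vec<u8>`.

```rust
/// Hypothetical table mode chosen for a block.
#[derive(Clone, Debug, PartialEq)]
enum Mode {
    Predefined,
    Encoded(Vec<u8>),
    RepeatLast,
}

/// Hypothetical record of the table the previous block really used.
#[derive(Clone, Debug, PartialEq)]
enum LastUsed {
    Predefined,
    Custom(Vec<u8>),
}

/// Update the remembered table after a block has been encoded.
fn remember_last_used(prev: &mut Option<LastUsed>, mode: &Mode) {
    match mode {
        // A predefined block replaces any older custom table.
        Mode::Predefined => *prev = Some(LastUsed::Predefined),
        Mode::Encoded(table) => *prev = Some(LastUsed::Custom(table.clone())),
        // Repeat reuses the same table, so nothing changes.
        Mode::RepeatLast => {}
    }
}
```

With this bookkeeping, a `RepeatLast` choice after a `Predefined` block would reuse the predefined table rather than an older custom one.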
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/copilot-instructions.md:
- Line 13: The Clippy instruction line lacks an explanation for excluding the
fuzz_exports crate; update the .github/copilot-instructions.md Clippy bullet
(the line mentioning `cargo clippy -p structured-zstd ...`) to add a brief
rationale for excluding `fuzz_exports` (e.g., "fuzz_exports is excluded because
fuzzing harnesses intentionally use patterns or unsafe code not meant for
regular linting"), referencing the existing "rustc-dep-of-std" exclusion text
for format and consistency; ensure the new sentence mentions `fuzz_exports` by
name so contributors understand the scope.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0250aeaf-34df-4710-bbb6-f81421b219ba
📒 Files selected for processing (1)
.github/copilot-instructions.md
- charge exact FSE table header cost in table selection
- store the actual last-used tables across blocks and reset them per frame
- add interop and regression coverage for multi-block and reuse cases

Closes #17

- add a narrow regression for remembering last-used FSE tables
- remove extra copying from reused FrameCompressor coverage
- fix workspace lint noise in existing test helpers

Closes #17

- cover Huffman-heavy seed=100 in ffi-to-rust direction
- cover repeat-offset-friendly inputs in ffi-to-rust direction

Closes #17
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 160-171: The dynamic-table candidate must be skipped for
single-symbol distributions: after computing max_symbol (from counts and
rposition) check if max_symbol == 0 (i.e., only symbol 0 may be present) and in
that case do not call build_table_from_symbol_counts/build_table_from_counts;
instead treat the dynamic candidate as unavailable by setting new_total_cost to
a very large value (e.g., usize::MAX) so the code falls back to
Repeat/Predefined. Update the code paths that reference new_table,
new_encoding_cost, and new_header_cost so they are only computed when max_symbol
> 0 (keep estimate_encoding_cost and table_header_bits usage unchanged
otherwise).
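The guard suggested above can be sketched as a cost comparison in which the dynamic candidate is priced at `usize::MAX` whenever at most one distinct symbol is present. Everything here is a hypothetical illustration: `Choice`, `choose_sketch`, and the `encoded_cost` callback are not the crate's API, and the real `choose_table()` may order ties differently.

```rust
/// Hypothetical outcome of table selection.
#[derive(Debug, PartialEq)]
enum Choice {
    Encoded,
    Predefined,
    Repeat,
}

/// Pick the cheapest candidate; `encoded_cost` stands in for building a
/// dynamic table and summing its header and encoding cost in bits.
fn choose_sketch(
    counts: &[usize],
    predefined_cost: usize,
    repeat_cost: Option<usize>,
    encoded_cost: impl FnOnce(&[usize]) -> usize,
) -> Choice {
    let distinct = counts.iter().filter(|&&c| c > 0).count();
    // Single-symbol (or empty) input: never build the dynamic table.
    let new_total = if distinct <= 1 {
        usize::MAX
    } else {
        encoded_cost(counts)
    };
    let repeat = repeat_cost.unwrap_or(usize::MAX);
    if new_total < predefined_cost && new_total < repeat {
        Choice::Encoded
    } else if repeat_cost.is_some() && repeat <= predefined_cost {
        Choice::Repeat
    } else {
        Choice::Predefined
    }
}
```

Because the callback is only invoked for two or more distinct symbols, a table builder that cannot handle single-symbol input is never reached on that path.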
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 53718647-1aa2-4213-b4ae-4ce06e839fb1
📒 Files selected for processing (7)
.github/copilot-instructions.md
zstd/src/encoding/blocks/compressed.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/fse/fse_encoder.rs
zstd/src/fse/mod.rs
zstd/src/tests/roundtrip_integrity.rs
zstd/tests/cross_validation.rs
♻️ Duplicate comments (1)
zstd/src/encoding/blocks/compressed.rs (1)
161-172: ⚠️ Potential issue | 🔴 Critical
Single-symbol distributions can still trigger a panic.
The previous review flagged that when all emitted codes are the same symbol (e.g., `max_symbol == 0`), `build_table_from_symbol_counts` receives a single-element slice and may panic during internal redistribution logic. This appears unaddressed in the current code.
Consider guarding this path by checking if only one distinct symbol exists before building the dynamic table candidate, falling back to Predefined/Repeat when the data is single-symbol.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/blocks/compressed.rs` around lines 161 - 172, The code builds a dynamic table even when there is only one distinct symbol, which can panic inside build_table_from_symbol_counts; add a guard using the computed max_symbol (or check counts[..=max_symbol].len() == 1) before calling build_table_from_symbol_counts in this block and skip building the dynamic candidate for single-symbol distributions—set new_encoding_cost/new_header_cost/new_total_cost to usize::MAX (or otherwise force selection of Predefined/Repeat) so the single-symbol path doesn't attempt redistribution and avoids the panic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0003c641-ef3c-429d-9726-6d48761907a9
📒 Files selected for processing (5)
cli/src/progress.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/tests/mod.rs
zstd/src/tests/roundtrip_integrity.rs
zstd/tests/cross_validation.rs
- document why choose_table keeps exact cost comparison
- explain why only the degenerate single-symbol case is short-circuited

Closes #17
@coderabbitai Re duplicate single-symbol panic finding in |
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 173-213: The code currently skips creating a mode-1/RLE candidate
when distinct_symbols == 1, so one-symbol streams are forced to
Predefined/Repeat and lose the optimal RLE path; fix by explicitly constructing
an RLE candidate when distinct_symbols == 1 (e.g. call the existing
RLE/table-builder helper or add a new build_rle_table_from_symbol_counts),
compute its cost using estimate_encoding_cost(...) plus any header cost
(table_header_bits() or zero if headerless), include that candidate in the cost
comparison along with new_table, predefined_cost, and previous_cost, and return
FseTableMode::Rle(...) when it is cheapest (add/use the appropriate FseTableMode
variant for RLE instead of falling back to Predefined).
In `@zstd/src/fse/fse_encoder.rs`:
- Around line 152-198: table_header_bits() currently assumes the header starts
at a byte boundary which mismatches write_table() that pads relative to the
writer's current bit index; this can produce off-by-up-to-7-bit estimates and
trigger the debug assertion. Fix by making table_header_bits accept a
starting_bit_offset (0..7) or a boolean assert_aligned flag: if you add
starting_bit_offset, incorporate it into the final byte-alignment calculation
(use (bits + starting_bit_offset) % 8 to compute padding and return the total
bits including that offset) and update all callers (and tests) to pass the
writer's current bit index mod 8; alternatively, if you prefer asserting
alignment, add an assertion at the start of table_header_bits that the caller is
byte-aligned (starting_bit_offset == 0) and document that requirement. Ensure
the change is mirrored where table_header_bits is used and also apply identical
logic to the other helper noted around the 200-250 range.
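The alignment concern above comes down to how padding is computed. The sketch below assumes the first option from the comment, a `starting_bit_offset` in `0..8`, and assumes the estimate should return the bits actually written (payload plus padding, excluding the bits already in the writer). The function name `header_bits_with_padding` is hypothetical.

```rust
/// Hypothetical estimate of header size in bits, including the padding
/// needed to end on a byte boundary when the writer already holds
/// `starting_bit_offset` bits of a partial byte.
fn header_bits_with_padding(payload_bits: usize, starting_bit_offset: usize) -> usize {
    debug_assert!(starting_bit_offset < 8);
    let end = payload_bits + starting_bit_offset;
    // Bits needed to reach the next multiple of 8 (0 if already aligned).
    let padding = (8 - end % 8) % 8;
    payload_bits + padding
}
```

With a 13-bit payload, a byte-aligned writer pads 3 bits for a total of 16, while a writer at bit offset 3 needs no padding and the total stays at 13. This spread is why an estimate that assumes alignment can be off by up to 7 bits.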
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f1471ea5-79bf-47ce-ab69-d6b0ea89a14a
📒 Files selected for processing (2)
zstd/src/encoding/blocks/compressed.rs
zstd/src/fse/fse_encoder.rs
…bol streams stay on predefined/repeat paths until sequence-section RLE exists
- assert byte-aligned FSE table header writes and document the matching size contract

Closes #17

…epeat mode has no previous table
- store previous default/custom tables without cloning full defaults each block
- raise crates to edition 2024 and fix resulting clippy/unsafe issues

Closes #17
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/encoding/frame_compressor.rs (1)
3-9: ⚠️ Potential issue | 🟡 Minor
Fix import ordering to pass `cargo fmt`.
The CI pipeline reports that `cargo fmt --all -- --check` failed due to formatting differences in the import block. Run `cargo fmt` to fix the import ordering.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/frame_compressor.rs` around lines 3 - 9, The import block is out of rustfmt-expected order causing CI failure; fix it by running rustfmt (cargo fmt) or manually reorder the use statements so they follow rustfmt's canonical grouping/ordering (e.g., alloc::{boxed::Box, vec::Vec}, core::convert::TryInto, #[cfg(feature = "hash")] use twox_hash::XxHash64; and #[cfg(feature = "hash")] use core::hash::Hasher;) so rustfmt no longer reports differences.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/ringbuffer.rs`:
- Around line 522-524: The method
RingBuffer::extend_from_within_unchecked_branchless is incorrectly calling the
if-based helper copy_with_checks; change that call to the branchless helper
(e.g., copy_with_checks_branchless or the no-branch variant defined next to
copy_with_checks) so the method behavior and documentation match the optimized,
branchless path—update the single call in
extend_from_within_unchecked_branchless to invoke the branchless helper with the
same parameters (m1_ptr, m2_ptr, f1_ptr, f2_ptr, m1_in_f1, m2_in_f1, m1_in_f2,
m2_in_f2).
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 569-580: The test module's use/import ordering and formatting is
off; run cargo fmt (or manually reorder and format the use statements) so the
imports in the #[cfg(test)] mod tests block are properly sorted/wrapped.
Specifically, fix the imports that reference choose_table,
encode_offset_with_history, previous_table, remember_last_used_tables,
FseTableMode, FseTables, PreviousFseTable, build_table_from_symbol_counts and
the tables_match helper so they follow rustfmt's ordering and line-wrapping
rules (e.g., group and alphabetize items inside use super::{...} and align
crate::... imports), then re-run cargo fmt to ensure CI passes.
- Around line 1-9: The import block in compressed.rs is not ordered per rustfmt;
run `cargo fmt` (or manually sort the `use` statements) so imports like
BitWriter, CompressState/FseTables/PreviousFseTable, Matcher/Sequence,
build_table_from_symbol_counts/FSETable/State, and huff0_encoder are grouped and
alphabetically ordered to satisfy rustfmt; ensure grouped crates (alloc vs
crate::...) remain separated and the listed symbols (BitWriter, CompressState,
FseTables, PreviousFseTable, Matcher, Sequence, build_table_from_symbol_counts,
FSETable, State, huff0_encoder) appear in the properly formatted import block.
In `@zstd/src/encoding/frame_header.rs`:
- Around line 45-48: Reformat the if-let chain to satisfy rustfmt by placing the
let pattern first and ensuring proper spacing/indentation; change the block to
use "if let Some(window_size) = self.window_size && !self.single_segment {" then
compute log/ exponent and call output.push(exponent << 3) with standard rustfmt
spacing, or simply run rustfmt on the file. Refer to symbols:
self.single_segment, self.window_size, window_size.next_power_of_two(), ilog2(),
exponent, and output.push to locate and fix the block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5c140f9e-fbf2-42c2-bd8e-ce70b25e459b
📒 Files selected for processing (7)
cli/Cargo.toml
zstd/Cargo.toml
zstd/src/decoding/ringbuffer.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/encoding/frame_header.rs
zstd/src/huff0/huff0_encoder.rs
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)
133-134: 🧹 Nitpick | 🔵 Trivial
Consider pinning the benchmark job toolchain for consistency.
The `lint`, `test`, and `cross-i686` jobs now pin `toolchain: "1.94.0"`, but the benchmark job still uses floating `@stable`. This could cause:
- Inconsistent benchmark results if Rust releases affect codegen/performance
- Subtle differences between benchmarked code and tested code
If performance tracking against the latest stable is intentional, consider adding a comment to document this choice. Otherwise, pin to 1.94.0 for consistency:
♻️ Suggested change
  - uses: dtolnay/rust-toolchain@stable
+   with:
+     toolchain: "1.94.0"
  - uses: Swatinem/rust-cache@v2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 133 - 134, The benchmark job currently uses dtolnay/rust-toolchain@stable (and the rust-cache action) which leaves the Rust toolchain floating; change the benchmark job to pin the toolchain to "1.94.0" the same way as the lint/test/cross-i686 jobs by updating the rust-toolchain action invocation and any related rust-cache inputs to reference toolchain "1.94.0" (or add a one-line comment in the benchmark job explaining why a floating stable toolchain is intentionally desired); target the benchmark job's dtolnay/rust-toolchain@stable and Swatinem/rust-cache@v2 usages when making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a2975d88-73d4-4955-accd-341f395c6d43
📒 Files selected for processing (4)
.github/workflows/ci.yml
rust-toolchain.toml
zstd/src/decoding/errors.rs
zstd/tests/cross_validation.rs
@coderabbitai re-review |
✅ Actions performed: Full review triggered.
- switch the default workspace and CI toolchain back to stable
- pin the dedicated msrv job to 1.92.0 explicitly
- keep rust-version 1.92 in Cargo manifests as the compatibility floor

Closes #17
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/ringbuffer.rs`:
- Around line 775-776: The helper assert_branchless_matches_checked() currently
calls checked.reserve(len) and branchless.reserve(len), which mutates the buffer
layouts and defeats tests for wraparound cases; remove those reserve(len) calls
and instead assert the precondition that the fixture already satisfies free() >=
len (e.g., assert!(checked.free() >= len && branchless.free() >= len)); update
tests that call wrapped_write() and wrapped_data() to use fixtures where free()
>= len so the helper compares the real split (wraparound) layouts without
relinearizing them. Ensure references to assert_branchless_matches_checked(),
wrapped_write(), wrapped_data(), checked and branchless are used when locating
the change.
- Line 517: Change the strict pointer comparisons in the debug assertions to
allow one-past-end equality: replace the `>` checks that compare
`self.buf.as_ptr().add(self.cap)` with `f1_ptr.add(m1_in_f1 + m2_in_f1)` (and
the analogous check using `f2_ptr` if present) to use `>=` instead of `>`, so
the `debug_assert!`s permit valid one-past-end pointer values during full-slice
fills.
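The difference between `>` and `>=` in that bounds check can be illustrated with a safe stand-in. This is a sketch only: the helper names are hypothetical, `wrapping_add` is used so the pointer arithmetic itself is always defined, and the real assertions in `ringbuffer.rs` operate on the buffer's own fields.

```rust
/// Hypothetical strict check: rejects a write that ends exactly at the
/// one-past-end pointer of `buf`.
fn write_end_in_bounds_strict(buf: &[u8], start: usize, len: usize) -> bool {
    let base = buf.as_ptr();
    base.wrapping_add(buf.len()) > base.wrapping_add(start + len)
}

/// Hypothetical relaxed check: a write may end at one-past-end, which is
/// what happens when the remaining slice is filled completely.
fn write_end_in_bounds(buf: &[u8], start: usize, len: usize) -> bool {
    let base = buf.as_ptr();
    base.wrapping_add(buf.len()) >= base.wrapping_add(start + len)
}
```

For an 8-byte buffer, writing 4 bytes at index 4 ends exactly at one-past-end: the strict form rejects it and the relaxed form accepts it, while a 5-byte write at the same index is rejected by both.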
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c8587ec9-7321-46a0-960d-805a079d86f6
📒 Files selected for processing (24)
.github/copilot-instructions.md
.github/workflows/ci.yml
cli/Cargo.toml
cli/src/progress.rs
rust-toolchain.toml
zstd/Cargo.toml
zstd/benches/compare_ffi.rs
zstd/benches/decode_all.rs
zstd/src/bit_io/bit_writer.rs
zstd/src/decoding/block_decoder.rs
zstd/src/decoding/errors.rs
zstd/src/decoding/ringbuffer.rs
zstd/src/dictionary/mod.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/encoding/frame_header.rs
zstd/src/encoding/levels/fastest.rs
zstd/src/encoding/match_generator.rs
zstd/src/fse/fse_encoder.rs
zstd/src/fse/mod.rs
zstd/src/huff0/huff0_encoder.rs
zstd/src/tests/mod.rs
zstd/src/tests/roundtrip_integrity.rs
zstd/tests/cross_validation.rs
- allow exact one-past-end pointer bounds in branchless ringbuffer writes
- keep the wraparound regression helper on real split layouts
- remove frame-overhead bias from the multi-block reuse size assertion
- document why the FSE cost model keeps the shared entropy estimate

Closes #17
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/blocks/compressed.rs`:
- Around line 235-243: In the None fallback branch where you construct
fallback_counts and call build_table_from_symbol_counts, add a concise inline
comment explaining that for max_symbol == 0 we create a two-element slice
[counts[0], 0] to satisfy build_table_from_symbol_counts' requirement for at
least two symbols (workaround for single-symbol streams) so the
FseTableMode::Encoded(fallback) can be built; reference the fallback_counts
variable, the max_symbol check, and the build_table_from_symbol_counts call so
future readers understand the phantom zero-count symbol is deliberate.
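The workaround being documented can be sketched in isolation. The helper name `table_input_counts` is hypothetical, and the assumption that the table builder needs at least two entries is taken from the comment above rather than from the builder's source.

```rust
/// Hypothetical helper: produce the count slice handed to the table
/// builder, padding single-symbol input with a phantom zero-count symbol.
fn table_input_counts(counts: &[usize], max_symbol: usize) -> Vec<usize> {
    if max_symbol == 0 {
        // Only symbol 0 occurs; add a second, never-used symbol so the
        // builder sees at least two entries.
        vec![counts[0], 0]
    } else {
        counts[..=max_symbol].to_vec()
    }
}
```

The phantom entry has a count of zero, so it is never emitted; it exists only to satisfy the builder's minimum length.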
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0bb2fedc-b2b8-49e4-9b85-5c2e5d9e66a3
📒 Files selected for processing (3)
zstd/src/decoding/ringbuffer.rs
zstd/src/encoding/blocks/compressed.rs
zstd/src/tests/roundtrip_integrity.rs
- document the single-symbol fallback table workaround
- weaken the repetitive-vs-random compression assertion to a stable ordering check

Closes #17
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/tests/roundtrip_integrity.rs`:
- Around line 280-299: The test roundtrip_zero_literal_length_sequences claims
to exercise three shifted-repeat remap branches but only alternates prefix and
shifted_prefix so it exercises only the rep[1] path; either update the doc
comment to narrow the claim or extend the fixture to include a third overlapping
window that forces rep[2] and rep[0]-1 paths (e.g. create shifted_prefix2 =
data[2..52].to_vec() and change the loop to cycle through prefix,
shifted_prefix, shifted_prefix2 (or arrange a sequence like prefix,
shifted_prefix, shifted_prefix2, prefix) so the encoder is driven to emit the
rep[2] and rep[0]-1 remaps in roundtrip_zero_literal_length_sequences).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0dd5d2a2-ce29-4e0b-bd28-38fe34aa04f7
📒 Files selected for processing (2)
zstd/src/encoding/blocks/compressed.rs
zstd/src/tests/roundtrip_integrity.rs
- align the zero literal-length regression comments with the fixture actually exercised end-to-end

Closes #17
Summary
`rust-version = "1.92"` / CI `msrv` on `1.92.0`
Technical Details
- `encode_offset_with_history()` converts absolute offsets into repeat-offset codes, including the shifted zero-literal mapping (`1 -> rep[1]`, `2 -> rep[2]`, `3 -> rep[0]-1`)
- `choose_table()` now compares Encoded vs Predefined vs Repeat using exact `table_header_bits()` accounting, avoids selecting unencodable Repeat/Predefined candidates, and falls back to a safe Encoded table when zero-cost candidates cannot represent the stream
- `ll_previous`/`ml_previous`/`of_previous` are updated to the actual table used by the block so the next `Repeat` mode matches decoder state; `FrameCompressor::compress()` clears that state at the start of each new frame
- `rust-toolchain.toml` now follows `stable` for default local/CI use, while Cargo manifests keep `rust-version = "1.92"` and the dedicated CI `msrv` job verifies `1.92.0` explicitly
- `i686-unknown-linux-gnu` explicitly in the cross job so the 32-bit target remains available independently of the default toolchain selection
- `FrameCompressor` state, and exact display-message text for touched decoder errors
Test Plan
- `cargo nextest run --workspace`
- `cargo clippy --workspace --all-targets --features hash,std,dict_builder -- -D warnings`
- `cargo fmt --all -- --check`

Closes #17