docs(#154): bump quick-start to 0.0.21 + drop legacy ruzstd Changelog.md#155
Conversation
…ngelog.md The README quick-start blocks still pinned `structured-zstd = "0.0.20"` even though crates.io has been publishing 0.0.21 since the v0.0.21 release on 2026-05-17 and docs.rs renders 0.0.21 as latest. Both occurrences (default-features and `default-features = false`) updated. Top-level `Changelog.md` was the inherited ruzstd legacy file: > This document records the changes made between versions, starting > with version 0.5.0 > # After 0.8.2 (Current) That numbering belongs to the upstream ruzstd lineage and is unrelated to our 0.0.x line. The authoritative changelog is `zstd/CHANGELOG.md`, maintained by release-plz and up to date through 0.0.21. Remove the stale duplicate. Closes #154
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthroughREADME installation examples updated to reference the latest ChangesDocumentation Version Update
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
Updates documentation to reflect the v0.0.21 release and removes an outdated, upstream-inherited changelog file that no longer matches this project’s version lineage.
Changes:
- Bump the README quick-start dependency version from
0.0.20to0.0.21(including thedefault-features = falseexample). - Delete the legacy top-level
Changelog.md(ruzstd lineage) in favor of the authoritativezstd/CHANGELOG.md.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| README.md | Updates quick-start dependency snippets to 0.0.21. |
| Changelog.md | Removes stale upstream-derived changelog content that doesn’t apply to the 0.0.x release line. |
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: 99759db | Previous: 1e0f695 | Ratio |
|---|---|---|---|
compress/level_22_btultra2/low-entropy-1m/matrix/pure_rust |
1.769 ms |
1.293 ms |
1.37 |
compress/level_22_btultra2/low-entropy-1m/matrix/c_ffi |
1.728 ms |
1.184 ms |
1.46 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
Two doc polish carry-overs after the rebase onto main (which picked up #155 — the legacy Changelog.md removal): * README quick-start now points at zstd/CHANGELOG.md (release-plz maintained) so visitors browsing the GitHub repo or the docs.rs landing page have a one-click path to per-release history. * sequence_capture.rs:517 docstring still referenced constant_run_routes_through_matcher_path, but that test was renamed to rejects_constant_run_with_rle_on_wire_block when the raw/RLE detector started enforcing the contract at the public-API boundary. Update the cross-reference to point at the current test name and clarify that the new invariant is enforced by the rejection, not by a matcher-path check.
* feat(bench): #99 add Rust↔FFI sequence-stream comparator Lane D / 7-tooling-seq-cmp: prerequisite for 7-compress-default ratio-divergence audit. Captures the encoder sequence stream from both sides on a fixed (input, level) pair and prints a per-position diff so deviations from donor can be triaged into "algorithmic win" / "cost source" / "missed match upstream skipped" classes. Pieces: - `encoding::sequence_capture` (bench_internals feature) — public `compress_and_collect_sequences(input, level) -> Vec<CapturedRawSequence>` drives the production `FrameCompressor` through a `CapturingMatcher` wrapper that forwards every `Matcher` method to `MatchGeneratorDriver` and appends each `Sequence::Triple` to a shared recorder. Going through `FrameCompressor` (not the matcher in isolation) preserves block-splitter decisions and per-level resets so the captured stream matches what the on-wire frame encodes. 2 unit tests pin the wrapper contract (recording present on a repeating pattern, empty on incompressible LCG). - `benches/compare_ffi_sequences` (`harness=false`, requires `dict_builder` + `bench_internals`) — runs the capture against two fixtures (`decodecorpus_files/z000033` and a 16 KiB synthetic low-entropy log) and diffs against `ZSTD_generateSequences`. FFI block delimiters (offset=0, ml=0) are dropped and used solely to tag block indices. Output: per-fixture summary + the first 30 diverging rows in a plain-text table. Sample output on level=3 / z000033 (1 MiB): rust=46678 seqs, ffi=47713 seqs, 24.1% equal; surfaces the FFI emitting 5 small sequences at the start before Rust emits anything — the kind of shape that lets 7-compress-default zero in on the residual ratio gap. Closes #99 * fix(bench): #149 capture per-block trailing literals for alignment The first iteration dropped `Sequence::Literals` events on the Rust side and used FFI delimiter sequences only for block-index bookkeeping on the FFI side. Both choices undercount the cumulative position `Σ (ll + ml)` at every block boundary by the trailing-literal length of that block, shifting every subsequent diff row on multi-block fixtures. On `level=3 / z000033` (8 blocks) the spurious shift turned into ~38k RustOnly+FfiOnly rows (out of 60k total) — most of which were the same matches misaligned by 1-200 bytes. Fix: - `SequenceCapture { sequences, block_tail_lengths }` returned from `compress_and_collect_sequences`. `CapturingMatcher` records the `Sequence::Literals { literals }` tail per block (including the fully-literal case for `skip_matching` raw blocks, where the entire committed space is the tail). - `compare_ffi_sequences::ffi_generate_sequences` returns `(Vec<FfiSeq>, Vec<u32>)` — the dummy delimiter's `litLength` is now captured as the FFI-side trailing-literal length. - `align_and_diff` applies tail length at each block boundary on both sides before advancing cumulative position to the next triple. Helper `advance_tail_for_completed_blocks` keeps the bookkeeping in one place. - Module + bench docstrings narrowed to make explicit that the tool emits raw `Equal/Differ/RustOnly/FfiOnly` verdicts; the "algorithmic win / cost source / missed match" labels are human-applied interpretations of the raw verdicts, not tool output (PR #149 review #4). - Unit tests extended with the cumulative-position invariant `Σ (ll + ml) + Σ tails == input.len()` — fails immediately if any tail length is dropped on either path. Bench output on `level=3 / z000033` after fix: alignment: equal=20338 (34.0%) differ=14155 rust_only=12185 ffi_only=13220 (was: equal=16014 (24.1%) differ=11834 rust_only=18830 ffi_only=19865) * fix(bench): flush pending tails on iterator end + gate compare on equal current positions Three correctness bugs in compare_ffi_sequences alignment: 1. `advance_tail_for_completed_blocks` was gated on `peek()` returning `Some`, so the FINAL block's tail (always present) and any literal-only blocks past the iterator's end never contributed to the cumulative position counter. Fix: advance to `tails.len() as u32` when `peek()` is `None` so pending tails flush before the loop exits. 2. The Some/Some branch in `align_and_diff` compared field-by-field whenever `r_next_pos == f_next_pos`, even when the current positions diverged — collapsing an "extra sequence then catch-up" case into a single misleading `Differ` row whenever the summed deltas happened to balance. Gate the comparison on `rust_pos == ffi_pos`; otherwise advance the lagging side, which matches the function-comment contract. 3. RLE-style blocks that bypass the matcher would silently produce a short capture (Σ (ll+ml) + Σ tails != input.len()). Add a fail-fast invariant check after `compress()` so callers get a clear panic instead of silent wrong alignment. Constant runs in practice still flow through `skip_matching`, so the assert doesn't trigger today; the new test `constant_run_routes_through_matcher_path` pins that observation and will turn red if a future encoder change reintroduces a true bypass. Test suite: 539/539 (3 in sequence_capture::tests pinning cumulative invariants + 1 verifying constant-run path stays matcher-routed). * fix(bench): mirror FFI invariant assert + corpus env-var resolution + clarify tail-vec doc Three correctness / hygiene fixes: * ffi_generate_sequences previously trusted ZSTD_generateSequences to emit a block delimiter for every block. If a delimiter is omitted or filtered incorrectly the returned `tails` vec ends up short and align_and_diff walks a stale cursor — emitting misleading Differ/FfiOnly rows instead of failing on the broken precondition. Mirror the Rust-side fail-fast invariant `Σ(ll+ml)+Σ(tails)==input.len()` before returning `(out, tails)` with an FFI-specific diagnostic. * compare_ffi_sequences fixture lookup ignored STRUCTURED_ZSTD_BENCH_CORPUS_PATH and CARGO_MANIFEST_DIR — the same resolution path documented in `support::load_decode_corpus_scenario` for compare_ffi/compare_ffi_memory. Without it, CI runs that invoke the prebuilt bench binary directly (bypassing cargo) silently skip the canonical z000033 fixture. Honor both env vars first, then fall back to the repo-relative paths. * The `block_tail_lengths` doc on `SequenceCapture` previously equated its length to `sequences.last().block_idx+1`, but that's wrong for any block that emitted no `Sequence::Triple` (raw blocks via `skip_matching`, fully-literal blocks). Describe as one entry per emitted block, including zero-triple blocks. * fix(bench): reject Uncompressed level + correct align_and_diff doc + clarify skip_matching wording * compress_and_collect_sequences accepted CompressionLevel::Uncompressed, which routes through FrameCompressor's raw-block fast-path WITHOUT consulting CapturingMatcher. The recorder stayed empty and the fail-fast invariant later panicked with a misleading "matcher-bypassing block path" message on a valid enum variant. Reject the variant up front with an explicit diagnostic that points at the actual constraint. * align_and_diff's doc previously claimed the cumulative position counters fed an "equality assertion the bench prints in summary", but the function neither returns positions nor does the caller assert anything against them. Rewrite the paragraph to describe what actually happens: counters are internal state driving the per-iteration "which side is behind" decision and the final drain loop, with fail-fast invariants enforced upstream on the capture and FFI sides. * Renamed captures_no_triples_on_incompressible_input -> captures_bounded_triples_on_incompressible_input. The assertion has always been seqs.len() < data.len() / 16 (a sparse-trickle bound), not strict zero, because the dfast hash can luck into a 5-byte collision on any random 1 KiB stream. The old name misrepresented the actual contract. * The CapturingMatcher::skip_matching{,_with_hint} wording described the no-triple path as "raw / uncompressed" only, but skip_matching_with_hint is also taken on the RLE fast-path for constant runs. Accounting is unchanged (entire committed space as tail), but the wording could mislead future changes around RLE handling. Reworded both inline notes and the post-compress invariant block. Test suite: 539/539 pass (one test renamed, all other behavior identical). * fix(bench): guard set_len bounds + drop unused support module + use to_string + multi-block test * ffi_generate_sequences now asserts nb_seqs <= bound before buf.set_len(nb_seqs). ZSTD_sequenceBound is the documented upper bound but the assert is one line and prevents an uninitialized-memory exposure if libzstd ever ships a regression. * Removed `#[allow(dead_code)] mod support;` from compare_ffi_sequences. Cargo does not auto-include sibling modules in benches/, so leaving the declaration out is the correct way to opt out of the shared bench helper and decouple this diagnostic tool from unrelated changes to the harness. * Replaced `format!("{}", counter % 1000)` with `(counter % 1000).to_string()` in the synthetic log fixture builder. Equivalent semantics, idiomatic, avoids clippy::useless_format on strict bench-lint setups. * Added captures_multi_block_tails_and_indices: 256 KiB fixture (two 128 KiB frame blocks) pins per-block tail accounting across a block boundary. Asserts block_tail_lengths.len() >= 2, full Σ(ll+ml)+Σ(tails)==input.len() reconstruction, every triple's block_idx in [0, num_blocks), and contiguous block_idx monotonicity in the triple stream. Single-block tests pass even with a regression that mis-increments current_block; this is the first test that fails on that regression. Test suite: 540/540 (one new multi-block test). * fix(bench): reject empty-input and pre-split levels + document raw-fallback caveat * Reject empty input with an explicit assert. `FrameCompressor` emits a zero-length raw block for empty input without invoking the matcher, so the post-compress invariant trivially passes (`0 == 0`) but `block_tail_lengths.len()` stays 0 — violating the public "one entry per emitted block" contract. Callers using `tail_lengths.len()` as a block count would get misleading metadata. * Reject pre-split levels (`Best`, `Level(11..=22)`). The donor block-splitter (`donor_split_block_by_chunks` / `_fromBorders`) can emit multiple physical on-wire blocks per single `start_matching` call, but `CapturingMatcher::current_block` increments once per matcher invocation. The recorded `block_tail_lengths` would NOT be "one entry per emitted on-wire block" at those levels, shifting alignment against FFI delimiters by the split-block count. Current scope is the Lane A `7-compress-default` audit at `Level(3)`; higher levels would need per-physical-block hooks that don't exist yet. * Document the raw-fallback caveat. The capture records what the matcher PRODUCED, not what got written on-wire. `compress_block_encoded` may emit a raw block when `compressed.len() >= MAX_BLOCK_SIZE`, in which case the matcher's triples are still in the capture but no sequences are on the wire — comparator will show spurious `RustOnly` for that block. No clean rollback exists from inside the `Matcher` trait; fixing would require encoder-side hooks downstream of the matcher pass, which is out of scope for this bench-only tool. Documented the limitation in the public docstring so triage interpreters can recognize the pattern. * test(bench): pin guard panics for empty input + Uncompressed + pre-split levels Add four #[should_panic] regression tests for the compress_and_collect_sequences guard paths introduced in earlier fixes: * rejects_empty_input — empty slice triggers the "requires non-empty input" assert. * rejects_uncompressed_level — CompressionLevel::Uncompressed triggers the matcher-bypass diagnostic. * rejects_pre_split_numeric_level — Level(11) triggers the "does not support pre-split levels" diagnostic (covers the Level(11..=22) range). * rejects_best_preset — CompressionLevel::Best triggers the same pre-split guard (Best == Level 11 per public docs, so both name + number variants must reject). Without these tests a future relaxation of any guard would silently regress the captured stream into a wrong-but-passing shape on edge inputs. Suite: 544/544. * fix(bench): allow Best preset — only numeric Level(11..=22) pre-splits donor_pre_split_level matches the EXACT enum variants (CompressionLevel::Level(11..=15) and CompressionLevel::Level(16..=22)) and falls through to None for the named Best preset. Despite Best's docstring saying "roughly equivalent to Level 11", the routing in frame_compressor.rs treats Best as a distinct variant that does NOT trigger the donor block-splitter, so the per-matcher-call block counter stays correct for it. The previous guard incorrectly rejected Best, blocking a valid capture path with a misleading "multiple physical blocks per matcher call" diagnostic that does not apply to that variant. Replace rejects_best_preset (was a #[should_panic] test) with captures_through_best_preset, a positive test that drives a 32 KiB repeating-pattern fixture through Best and asserts Σ(ll+ml)+Σ(tails)==input.len(). The positive test pins the path so a future tightening of the guard to also reject Best would surface as a red test rather than silent regression. * fix(bench): detect raw/RLE on-wire blocks + extend pre-split reject + match seq by-ref + refresh stale wording Five findings from CodeRabbit (Major) and Copilot (Major+Minor): * Parse the emitted Zstandard frame headers (RFC 8878 §3.1.1.2.2) after compression and panic if any Raw_Block or RLE_Block is present. The matcher hook records triples eagerly; the encoder may later discard a compressed attempt and emit a Raw_Block when compressed.len() >= MAX_BLOCK_SIZE, leaving phantom triples in the capture whose on-wire form has no sequences. Without this detection the comparator would report spurious RustOnly rows for raw-fallback blocks. New helper detect_raw_or_rle_blocks_in_frame walks the frame header (variable-length: dict_id_flag, single_segment_flag, fcs_flag combinations) and iterates 3-byte block headers until last_block, returning on-wire indices of any raw / RLE blocks. * Tighten the pre-split-level guard from Level(11..=22) to Level(n >= 11). Positive numeric levels above CompressionLevel:: MAX_LEVEL clamp to Level 22 matcher parameters per resolve_level_params (match_generator.rs:412-415), so Level(23+) would slip past the old check and hit the same post-split path as Level(22). * Match `&seq` instead of `seq` in CapturingMatcher::start_matching. Today every Sequence field is Copy so the by-value match works, but binding by-ref is robust if a non-Copy field is added. * Refresh stale wording in two test docstrings: the rotating-pattern test no longer claims constant runs bypass the matcher (the new raw-block detector handles all bypass cases uniformly), and the pre-split-level test docstring now correctly distinguishes the numeric Level(n>=11) reject path from the Best-preset allow path that captures_through_best_preset pins. * Update the public docstring's "raw-fallback" section: the limitation is now fail-fast detected, not a triage caveat. Test reshape: two former positive tests (constant_run_routes_through_matcher_path, captures_bounded_triples_on_incompressible_input) became #[should_panic] tests for the new raw/RLE detector — those fixtures intentionally exercise the on-wire raw paths that the detector now rejects. captures_multi_block_tails_and_indices fixture shrunk from 256 KiB / 64-byte cycle to 200 KiB / 16-byte cycle to ensure both blocks stay on the Compressed_Block path. Suite: 544/544. * fix(bench): narrow post-split reject to Level(n >= 16) The previous guard rejected Level(n >= 11), but only Level(16..=22) actually breaks the per-matcher-call block counter. Two distinct splitter mechanisms exist in frame_compressor.rs: * Pre-split (Level(11..=15) borders via donor_optimal_block_size): the splitter chooses a shrunken block_len BEFORE the matcher runs; the suffix is parked in pending_input and the next compress-loop iteration calls the matcher again on the suffix. Each matcher call still maps to exactly ONE physical on-wire block, so CapturingMatcher::current_block tracks correctly. * Post-split (Level(16..=22) + window >= 1<<17, dispatched from levels/fastest.rs::compress_block_encoded via compress_block_with_post_split): a single matcher call's output is split into multiple physical blocks. One matcher call -> N blocks -> current_block only increments once, block_tail_lengths.len() is short by N - 1. Reject Level(n >= 16) only. Covers Level(16..=22) and clamped Level(>22) (match_generator.rs:412-415 lands on Level 22 params for n > 22). Renamed rejects_pre_split_numeric_level -> rejects_post_split_numeric_level (now uses Level(16) to hit the actual broken path). Added captures_through_pre_split_level_15 positive test: 32 KiB rotating pattern through Level(15) verifies the borders-pre-split path produces correct cumulative reconstruction. Suite: 559/559. * docs: surface release notes pointer in README + fix stale test reference Two doc polish carry-overs after the rebase onto main (which picked up #155 — the legacy Changelog.md removal): * README quick-start now points at zstd/CHANGELOG.md (release-plz maintained) so visitors browsing the GitHub repo or the docs.rs landing page have a one-click path to per-release history. * sequence_capture.rs:517 docstring still referenced constant_run_routes_through_matcher_path, but that test was renamed to rejects_constant_run_with_rle_on_wire_block when the raw/RLE detector started enforcing the contract at the public-API boundary. Update the cross-reference to point at the current test name and clarify that the new invariant is enforced by the rejection, not by a matcher-path check. * docs: link CHANGELOG by absolute GitHub URL so symlinked crate render works `zstd/README.md` is a crate-local symlink to `../README.md`, so the packaged crate published to crates.io / docs.rs renders the README from the crate root. The previous relative link `zstd/CHANGELOG.md` breaks there — it resolves to `zstd/zstd/CHANGELOG.md`, which does not exist. Repository-root rendering worked, but the crate-local render is what most users see. Switch to an absolute https://github.com/... URL that resolves identically from both rendering positions. * docs: switch README quick-start to `cargo add` to eliminate version drift Previously the README pinned `structured-zstd = "0.0.21"` in two TOML snippets. Every release required a manual bump in two places, and the pin went stale across the v0.0.20 → 0.0.21 cycle (caught in #154 only after the release). `cargo add structured-zstd` is the modern canonical install pattern (cargo 1.62+), and it never needs README updates — cargo resolves the latest compatible version at invocation time. Same for the `no_std` block: `cargo add structured-zstd --no-default-features`. Side-benefit: the README now stays evergreen across the rapid 0.0.x cadence without a release-plz hook or CI lint gate. * feat(bench): add level-sweep mode + assert_zstd_ok for FFI calls - STRUCTURED_ZSTD_BENCH_LEVEL accepts single / range / list / `all`, STRUCTURED_ZSTD_BENCH_MAX_ROWS caps per-fixture diff rows - Compact per-level summary line so a multi-level run stays scannable - Document Differ/RustOnly/FfiOnly as raw classifications, not value judgments — divergence from donor is allowed and sometimes expected - Route FFI panics through an `assert_zstd_ok` helper mirroring `encoding/match_generator.rs:6150-6162`, so messages carry the libzstd symbolic error name instead of a raw rc= - Refresh stale fixture-size note (256 KiB → 200 KiB) and rewrite the Raw_Block explanation to point at the raw fast-path classifier * fix(bench): reject non-positive levels in STRUCTURED_ZSTD_BENCH_LEVEL partition() now classifies values outside 1..=MAX_SUPPORTED_LEVEL as dropped (was only checking the upper bound, letting 0 and negatives through into compress_and_collect_sequences where they fail the Uncompressed/post-split guards anyway, but with a less helpful error) * fix(bench): mirror windowLog=14 small-input override + clarify diff label - FFI sequence path now applies the same windowLog=14 cap on inputs <= 16 KiB that compare_ffi.rs uses, so tiny fixtures see the same window on both sides and a divergence in the stream reflects a real strategy difference, not a window-config artifact - Summary line renames the bucket label from 'diverge' to 'differ' so RustOnly / FfiOnly aren't conflated with the DiffRow::Differ count
Summary
Two small polish gaps after the 0.0.21 release:
README quick-start (both root `README.md` and `zstd/README.md`, which is a symlink to it) still showed `structured-zstd = "0.0.20"` in both the default-features and `default-features = false` blocks. Bumped both to 0.0.21.
Top-level `Changelog.md` is inherited ruzstd content ("This document records the changes made between versions, starting with version 0.5.0" / "After 0.8.2 (Current)") — completely inapplicable to our 0.0.x lineage. The authoritative changelog is `zstd/CHANGELOG.md`, maintained by release-plz and current through 0.0.21. Deleted the stale duplicate.
Test plan
Doc-only, no code/tests/CI implications:
Closes #154
Summary by CodeRabbit
no_stdconfigurations.