perf(huff0): cache encoded weight-description bytes on `HuffmanTable` and reuse in emit path by Copilot · Pull Request #170 · structured-world/structured-zstd

Copilot · 2026-05-18T10:57:57Z

Goal

Eliminate duplicated FSE-encoding of Huffman weights across HuffmanTable's sizing + emit paths. Same weight stream was previously encoded multiple times per emitted compressed-literals block (once via try_table_description_size, once via HuffmanEncoder::write_table). Caching the encoded bytes on the table instance removes the redundancy without changing selection or output semantics.

What changed

std builds: lazy cache on HuffmanTable via std::sync::OnceLock<Option<Vec<u8>>> (atomic-init, Sync, lock-free read-fast-path). Populated on first call to try_table_description_size / writeable_table_description_size or HuffmanEncoder::write_table. Subsequent calls on the same HuffmanTable instance reuse the cached FSE bytes.
no_std builds: cache field is #[cfg(feature = "std")]-gated and absent entirely. try_table_description_size and write_table use the original recompute-every-time path. This preserves the Sync auto-trait on pub HuffmanTable for no_std + alloc consumers that share encoder tables across threads (e.g. via Arc<HuffmanTable>) — core::cell::OnceCell would have made the type !Sync and silently broken downstream API.
write_table cold-path raw fallback computes weights() exactly once, sharing the slice between cache initialization and the raw-nibble writer. The pre-merge implementation recomputed weights twice on this path, which was a measurable hotspot for small / low-cardinality tables.

Behavioral guarantees

Selection unchanged: try_table_description_size returns exactly the byte count the writer would produce. Planner decisions (compute_block_size_to_compressed, compress_literals_or_reuse) are not affected.
Raw fallback unchanged: when FSE description is not representable / not beneficial, the writer still emits raw nibble-packed weights.
Wire-format unchanged: byte-identical output to pre-PR for every (input, level) cell.
C FFI surface unchanged: cache is a Rust-internal implementation detail, never crosses the cdylib boundary.

API surface

#[cfg(feature = "std")]
type CachedDescription = std::sync::OnceLock<Option<Vec<u8>>>;

pub struct HuffmanTable {
    codes: Vec<(u32, u8)>,
    #[cfg(feature = "std")]
    cached_encoded_weight_description: CachedDescription,
}

No changes to public methods or trait impls. HuffmanTable retains Sync + Send + Clone under both feature configurations.

Tests

cached_encoded_weight_description_is_reused_for_write_table (std-only): verifies the cache populates on first size query and write_table emits exactly the cached bytes.
write_table_raw_path_initializes_none_cache (std-only): verifies raw-fallback path correctly stores Some(None) sentinel in the cache so subsequent calls skip the failed FSE attempt.
505 / 505 existing lib tests pass on default features.
cargo check --no-default-features builds clean (no_std path).

Benchmark — vs `main` (`414355a`), same-session A/B

compress/level_3_dfast/small-4k-log-lines/matrix/pure_rust (default preset, single-block small input — strongest relative gain since each encode hits the cold path):

Branch	Median	CI
`main`	28.01 µs	[26.36, 30.86] µs
this PR	25.58 µs	[24.62, 26.82] µs
delta	−8.7%	tighter CI on PR (cache removes per-emit jitter)

L2_dfast for cross-check on the same scenario:

Branch	Median	CI
`main`	32.10 µs	[31.73, 32.69] µs
this PR	28.85 µs	[28.72, 29.00] µs
delta	−10.1%	clean statistical separation

The cache eliminates one redundant FSE-encode-of-weights per emitted compressed-literals block. On single-block small inputs that's ~1-3 µs out of ~28-32 µs total = ~10% relative. On larger inputs (z000033, large-log-stream) the relative gain shrinks because the LZ pass dominates encode time, but the per-block absolute saving scales linearly with block count — same ratio, same speed across the corpus matrix.

Ratio sweep

All compare_ffi --features dict_builder REPORT cells identical to main on the test scenarios (small-4k-log L1-L22, z000033 L1-L22, large-log-stream L1-L22, low/high-entropy 1m). No new rust_bytes > ffi_bytes cells; no regression of pre-existing ones.

Related issues

Closes perf(huff0): investigate shared Huffman weight-description cache between planner and emitter [SPIKE] #169 (investigation spike for shared description cache).

codecov · 2026-05-18T12:02:04Z

Codecov Report

❌ Patch coverage is 98.97959% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
zstd/src/huff0/huff0_encoder.rs	98.97%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

polaz · 2026-05-18T12:02:48Z

@coderabbitai full review

coderabbitai · 2026-05-18T12:02:53Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-05-18T12:03:01Z

📝 Walkthrough

Walkthrough

This PR adds caching of FSE-encoded weight-descriptions to HuffmanTable using OnceCell. The cache is populated lazily via a new helper function and reused in write_table and try_table_description_size to avoid redundant encoding computations.

Changes

Huffman weight-description caching

Layer / File(s)	Summary
Cache field and lazy initialization `zstd/src/huff0/huff0_encoder.rs`	Adds `OnceCell` import, new `cached_encoded_weight_description` field to `HuffmanTable`, `cached_encoded_weight_description()` lazy loader that derives raw weights and computes FSE encoding, and field initialization in `build_from_weights`.
Cache usage in table writing and size estimation `zstd/src/huff0/huff0_encoder.rs`	`write_table` prefers cached FSE description; `try_table_description_size` returns cached size instead of recomputing.
Cache validation test `zstd/src/huff0/huff0_encoder.rs`	Unit test builds a large-alphabet `HuffmanTable`, accesses cached description, writes table, and verifies output begins with cached bytes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

perf(huff0): investigate shared Huffman weight-description cache between planner and emitter [SPIKE] #169: Directly implements the proposed cached FSE-encoded weight-description stored on HuffmanTable and reused in both write_table and size estimation.
perf(huff0): donor HUF_optimalTableLog fast-path single-shot tableLog with cheap description-size proxy #167: Related to the same hot path in huff0_encoder.rs; both target reducing repeated FSE-encoding work on the weight description.

Poem

🐰 A cache born from cells so wise,
Weight descriptions need not be surprised,
Encoded once, then shared with glee,
Through write and size, reuse flows free,
One hop of speed, test passes by! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding a cache for encoded weight-description bytes on HuffmanTable and reusing it in the emit path for performance improvement.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch copilot/investigate-huffman-cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

polaz · 2026-05-18T12:03:44Z

@copilot re-review - and re-chech internet / repo links access

Copilot

Pull request overview

This PR reduces duplicated Huffman weight-description work during block compression by caching the FSE-encoded weight-description bytes on HuffmanTable and reusing them for both sizing and emission.

Changes:

Added a lazy cache (OnceCell<Option<Vec<u8>>>) to HuffmanTable for the encoded weight description.
Reworked table-description sizing and HuffmanEncoder::write_table() to reuse cached encoded bytes when available.
Added a unit test asserting the cache is populated by sizing and reused verbatim by write_table().

Copilot · 2026-05-18T12:05:37Z

@copilot re-review - and re-chech internet / repo links access

Re-review complete on commit 65a43a7: no additional code issues found in the current diff.

Re-check results:

Repo link access: https://github.com/structured-world/structured-zstd is reachable.
Internet access: https://www.rust-lang.org/ is reachable.

I also re-ran validation locally: cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings and the focused cache test both pass.

polaz · 2026-05-18T12:07:03Z

@copilot address your review threads, verify the actual code benchmarks after your theoretical implementation and provide achieved numbers (both pre and post) in issue body

Agent-Logs-Url: https://github.com/structured-world/structured-zstd/sessions/6bdb0283-45b1-4909-8dde-e9b2114fbf4c Co-authored-by: polaz <4152123+polaz@users.noreply.github.com>

…back Agent-Logs-Url: https://github.com/structured-world/structured-zstd/sessions/29cef8e6-3ed8-4ed3-a365-84f0ebf18546 Co-authored-by: polaz <4152123+polaz@users.noreply.github.com>

Agent-Logs-Url: https://github.com/structured-world/structured-zstd/sessions/29cef8e6-3ed8-4ed3-a365-84f0ebf18546 Co-authored-by: polaz <4152123+polaz@users.noreply.github.com>

…on field `core::cell::OnceCell` is `!Sync`, which made `pub HuffmanTable` silently lose its `Sync` auto-trait when the cache field was added. Downstream consumers sharing encoder tables across threads would see this as a breaking change. Introduce a `CachedDescription` type alias that resolves to: - `std::sync::OnceLock<Option<Vec<u8>>>` when `feature = "std"` is active — `Sync` via atomic-init, lock-free read-fast-path. - `core::cell::OnceCell<Option<Vec<u8>>>` otherwise — keeps no_std builds working, narrower thread-safety expected for no_std embedded targets that don't share heap state across threads. Same `get` / `get_or_init` interface in both branches, no call-site changes needed beyond the constructor (`CachedDescription::new()`). Benchmark on `compress/level_2_dfast/small-4k-log-lines/matrix/pure_rust`: 36.0 µs after the switch, within criterion noise of the pre-fix 37.3 µs — the std-build atomic-init read-path is effectively free.

Copilot · 2026-05-18T12:27:08Z

@copilot address your review threads, verify the actual code benchmarks after your theoretical implementation and provide achieved numbers (both pre and post) in issue body

Addressed in 5ff1e50.

Review-thread follow-up is now covered:

HuffmanTable cache cell uses std::sync::OnceLock on std builds (keeps thread-safe auto-traits there) and core::cell::OnceCell on no_std builds.
prior write_table() duplicate-weights fix remains in place (1cd4860 / 33153c3).

Bench numbers (actual runs, same target):

benchmark: compress/level_3_dfast/small-4k-log-lines/matrix/pure_rust
pre: 35.929–36.031 µs (108.41–108.72 MiB/s)
post: 36.383–36.586 µs (106.77–107.36 MiB/s)

I also updated the PR body checklist entry with these pre/post values.

polaz · 2026-05-18T12:36:36Z

@coderabbitai full review

coderabbitai · 2026-05-18T12:36:41Z

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

…che to std builds Cold-path raw fallback recomputed `weights()` twice — once via `cached_encoded_weight_description_with_weights(weights)` to initialize the cache, then again inside the prior `write_raw_table_description()` helper that fetched its own weights slice. For small / low-cardinality tables that's a measurable hotspot. Inline the raw-write path in `write_table` so it reuses the already-computed `weights` slice in the cold branch, while keeping the cached-`None` sentinel branch using a single fresh recompute (unavoidable — the cache stores only the FSE encoding, not the raw nibbles). The `write_raw_table_description` helper goes away — its one remaining caller was the cached-`None` path, inlined there too. Cache field `cached_encoded_weight_description` is now `#[cfg(feature = "std")]`. `core::cell::OnceCell` is `!Sync`, so in no_std builds the cache would have broken the `Sync` auto-trait for `pub HuffmanTable` — potentially breaking downstream consumers running no_std+alloc with `Arc<HuffmanTable>`. std builds keep `OnceLock<Option<Vec<u8>>>` (Sync, atomic-init). no_std builds drop the cache field entirely and revert to recompute-every-time — `try_table_description_size` and `write_table` get cfg-branched non-cached paths that match pre-cache semantics exactly. Cache-touching tests are gated on `feature = "std"` so the test suite still compiles in no_std-only configurations.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Copilot AI assigned Copilot and polaz May 18, 2026

Copilot started work on behalf of polaz May 18, 2026 10:58 View session

Copilot AI linked an issue May 18, 2026 that may be closed by this pull request

perf(huff0): investigate shared Huffman weight-description cache between planner and emitter [SPIKE] #169

Closed

Copilot AI changed the title ~~[WIP] Investigate shared Huffman weight-description cache between planner and emitter~~ perf(huff0): cache encoded weight-description bytes on HuffmanTable and reuse in emit path May 18, 2026

Copilot finished work on behalf of polaz May 18, 2026 11:06

Copilot AI requested a review from polaz May 18, 2026 11:06

polaz marked this pull request as ready for review May 18, 2026 11:59

Copilot AI review requested due to automatic review settings May 18, 2026 11:59

Copilot started reviewing on behalf of polaz May 18, 2026 11:59 View session

Copilot started work on behalf of polaz May 18, 2026 12:03 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/huff0/huff0_encoder.rs

Comment thread zstd/src/huff0/huff0_encoder.rs

Copilot finished work on behalf of polaz May 18, 2026 12:06

Copilot started work on behalf of polaz May 18, 2026 12:06 View session

Copilot AI requested review from Copilot and removed request for Copilot May 18, 2026 12:16

Copilot finished work on behalf of polaz May 18, 2026 12:17

Copilot started work on behalf of polaz May 18, 2026 12:17 View session

Copilot AI and others added 4 commits May 18, 2026 15:19

Initial plan

da10101

perf(huff0): cache encoded weight descriptions on HuffmanTable

183c9e5

Agent-Logs-Url: https://github.com/structured-world/structured-zstd/sessions/6bdb0283-45b1-4909-8dde-e9b2114fbf4c Co-authored-by: polaz <4152123+polaz@users.noreply.github.com>

perf(huff0): avoid duplicate raw-weight recompute in write_table fall…

15e9417

…back Agent-Logs-Url: https://github.com/structured-world/structured-zstd/sessions/29cef8e6-3ed8-4ed3-a365-84f0ebf18546 Co-authored-by: polaz <4152123+polaz@users.noreply.github.com>

refactor(huff0): dedupe raw table write path in write_table

2c7df5b

Agent-Logs-Url: https://github.com/structured-world/structured-zstd/sessions/29cef8e6-3ed8-4ed3-a365-84f0ebf18546 Co-authored-by: polaz <4152123+polaz@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 18, 2026 12:23

Copilot finished work on behalf of polaz May 18, 2026 12:27

Copilot AI review requested due to automatic review settings May 18, 2026 12:33

polaz force-pushed the copilot/investigate-huffman-cache branch from 5ff1e50 to bdb20b0 Compare May 18, 2026 12:33

polaz unassigned Copilot May 18, 2026

polaz approved these changes May 18, 2026

View reviewed changes

Copilot started reviewing on behalf of polaz May 18, 2026 12:36 View session

polaz requested review from Copilot and removed request for Copilot May 18, 2026 12:36

Copilot started reviewing on behalf of polaz May 18, 2026 12:37 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/huff0/huff0_encoder.rs Outdated

Comment thread zstd/src/huff0/huff0_encoder.rs Outdated

polaz requested review from Copilot and polaz and removed request for polaz May 18, 2026 12:52

Copilot started reviewing on behalf of polaz May 18, 2026 12:53 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread zstd/src/huff0/huff0_encoder.rs

polaz merged commit c71559e into main May 18, 2026
27 of 28 checks passed

polaz deleted the copilot/investigate-huffman-cache branch May 18, 2026 13:06

sw-release-bot Bot mentioned this pull request May 18, 2026

chore: release v0.0.22 #156

Open

polaz mentioned this pull request May 18, 2026

perf(decode + encode-greedy): close 3-5× donor gap on negative-level decompress; share SIMD primitives + add dedicated greedy strategy #178

Open

7 tasks

Conversation

Copilot AI commented May 18, 2026 • edited by polaz Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Goal

What changed

Behavioral guarantees

API surface

Tests

Benchmark — vs main (414355a), same-session A/B

Ratio sweep

Related issues

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

polaz commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

polaz commented May 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented May 18, 2026

Uh oh!

polaz commented May 18, 2026

Uh oh!

Copilot AI commented May 18, 2026

Uh oh!

polaz commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented May 18, 2026 •

edited by polaz

Loading

Benchmark — vs `main` (`414355a`), same-session A/B

codecov Bot commented May 18, 2026 •

edited

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading