perf(encoder): bump Fast window_log to 19 (donor parity, large-log-stream ratio fix) #187
The Fast strategy's `MatchGenerator` keeps a per-`WindowEntry`
`SuffixStore`, so a hash entry's lifetime is bounded by `max_window_size`.
At `window_log = 17` that cap is exactly one `MAX_BLOCK_SIZE` block:
each new block evicts the previous one's `SuffixStore` and the
matcher cannot recall periodic patterns that span the block boundary.
On a 100 MiB periodic log stream (`large-log-stream`, ~380-byte
cycle, ~800 blocks at MAX_BLOCK_SIZE) this caused the Fast matcher
to re-emit the cycle's first ~380 bytes as literals at the head of
every block. Final wire output was 94 175 bytes vs donor's 9 752
bytes — a 9.66× ratio gap that masked Fast's entire compression
benefit on this fixture.
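
The scale of the problem follows from simple arithmetic on the figures above. A minimal sketch, assuming `MAX_BLOCK_SIZE` is 128 KiB (implied by `window_log = 17` covering exactly one block); the helper names are illustrative and not part of the crate:

```rust
// Assumption: one block is 128 KiB, as implied by the window_log = 17 case.
const MAX_BLOCK_SIZE: u64 = 128 * 1024;

/// Number of blocks needed to cover `input_len` bytes (rounded up).
fn block_count(input_len: u64) -> u64 {
    (input_len + MAX_BLOCK_SIZE - 1) / MAX_BLOCK_SIZE
}

/// Output size of this encoder relative to the donor's output size.
fn ratio_vs_donor(rust_bytes: u64, donor_bytes: u64) -> f64 {
    rust_bytes as f64 / donor_bytes as f64
}

/// Average number of output bytes spent per block.
fn bytes_per_block(output_bytes: u64, blocks: u64) -> f64 {
    output_bytes as f64 / blocks as f64
}
```

With the numbers from this fixture, `block_count(100 * 1024 * 1024)` is 800 and `ratio_vs_donor(94_175, 9_752)` rounds to 9.66, so the pre-fix output averaged roughly 118 bytes per block.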
Bumping `window_log` to 19 matches donor's "level 1 / base for
negative levels" row in `clevels.h` (`{ 19, 13, 14, 1, 7, 0,
ZSTD_fast }`) and gives the `SuffixStore`-per-`WindowEntry` ring
~4 blocks of history. With cross-block recall available, Fast finds
the period-380 long match starting in block 2 and the per-block
literal preamble collapses to a single 380-byte head + a chain of
short sequences:
- L1 fast × large-log-stream: rust_bytes 94 175 → 11 355
(ratio 9.66× → 1.16×, donor 9 752)
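
For readers unfamiliar with the donor table, the quoted row can be read field by field. The sketch below assumes the column order of zstd's `ZSTD_compressionParameters` (windowLog, chainLog, hashLog, searchLog, minMatch, targetLength, strategy); `DonorRow` and `FAST_ROW` are hypothetical names for illustration, not types from this crate:

```rust
/// Hypothetical Rust mirror of one donor `clevels.h` row.
/// The strategy column (`ZSTD_fast`) is omitted for brevity.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DonorRow {
    window_log: u32,
    chain_log: u32,
    hash_log: u32,
    search_log: u32,
    min_match: u32,
    target_length: u32,
}

/// The row quoted in the text: `{ 19, 13, 14, 1, 7, 0, ZSTD_fast }`.
const FAST_ROW: DonorRow = DonorRow {
    window_log: 19,
    chain_log: 13,
    hash_log: 14,
    search_log: 1,
    min_match: 7,
    target_length: 0,
};

impl DonorRow {
    /// Window size in bytes: 2^window_log.
    fn window_size(&self) -> usize {
        1usize << self.window_log
    }
}
```

Only the first column changes in this PR: `window_size()` goes from 128 KiB at `window_log = 17` to 512 KiB at `window_log = 19`.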
Same bump applied to the negative-level Fast branch
(`resolve_level_params` Level(n) where n<0) so L−7..L−1 inherit the
donor "base for negative levels" windowLog=19.
Negative levels gain too on this fixture:
- L−7 fast × large-log-stream: 11 348 vs donor 14 592 (much better
than pre-fix)
- L−1 fast × large-log-stream: 11 354 vs donor 9 771 (1.16×, similar
to L1)
Full Fast matrix (8 levels × 7 scenarios = 56 cells): no scenario
regressed by more than +16 bytes; the only non-trivial cost is on
`low-entropy-1m` L−2..L1 (rust 176 vs donor 160-167) and the
`large-log-stream` L−2..L1 cluster (rust ~11.35 kB vs donor 9.7-10.6
kB, under 1.2× of donor). Both are dwarfed by the ~80 KiB win on
the L1 cell that triggered this fix.
Test fixups for the new window size:
- `dictionary_roundtrip_stays_valid_after_output_exceeds_window`
payload bumped from `128 KiB + 64` to `512 KiB + 64` so it still
crosses the advertised window boundary at L1 Fastest.
- Two `driver.window_size()` assertions on `CompressionLevel::Fastest`
updated from `1 << 17` to `1 << 19`.
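
Why the payload had to grow can be checked directly. A minimal sketch, assuming "crosses the advertised window boundary" simply means the payload is longer than `1 << window_log` bytes; `crosses_window` is a hypothetical helper, not a function from the test suite:

```rust
const KIB: usize = 1024;

/// True when a payload of `payload_len` bytes extends past a window of
/// `1 << window_log` bytes.
fn crosses_window(payload_len: usize, window_log: u32) -> bool {
    payload_len > (1usize << window_log)
}
```

Under that assumption the old `128 * KIB + 64` payload crosses a `window_log = 17` window but fits inside a `window_log = 19` one, while `512 * KIB + 64` crosses the new window again.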
534/534 lib tests pass; clippy and fmt clean.
Closes #186.
Walkthrough
The PR increases the window-log parameter for the Fast compression strategy from 17 to 19 (expanding the window from 128 KiB to 512 KiB), updates the corresponding test assertions in two regression tests, and adjusts a cross-window-boundary test fixture to account for the larger window.
Codecov Report
All modified and coverable lines are covered by tests.
Pull request overview
This PR aligns the encoder’s Fast strategy window size with the donor zstd tuning by increasing window_log from 17 (128 KiB) to 19 (512 KiB). This fixes a large ratio regression in Fast mode by retaining enough match history to capture periodic patterns that span MAX_BLOCK_SIZE block boundaries.
Changes:
- Bump Level 1 (Fast / `CompressionLevel::Fastest`) `window_log` to 19 in the level parameter table.
- Bump negative (ultra-fast) levels' Fast backend `window_log` to 19 for the same cross-block pattern retention.
- Update unit tests that asserted Fastest window size and that depended on crossing the advertised window boundary.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `zstd/src/encoding/match_generator.rs` | Updates Fast level tuning (`window_log`) and adjusts associated driver window-size assertions in tests. |
| `zstd/src/encoding/frame_compressor.rs` | Updates a dictionary roundtrip test payload size so it still exceeds the new Fastest advertised window. |
Summary
Bump Fast strategy `window_log` from 17 to 19 to match donor's
`{ 19, 13, 14, 1, 7, 0, ZSTD_fast }` row in `clevels.h`. The Fast
matcher keeps a per-`WindowEntry` `SuffixStore`; at `window_log = 17`
the ring held exactly one `MAX_BLOCK_SIZE` block, so each new block
evicted the previous block's hash entries and the matcher could not
recall periodic patterns that span a block boundary. Bumping to 19
gives the ring ~4 blocks of history.
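
The eviction behaviour can be illustrated with a toy model. This is a sketch under stated assumptions, not the crate's `MatchGenerator`: `ToyWindow` is a hypothetical type that tracks only block lengths, assumes 128 KiB blocks, and evicts the oldest entries until the new block fits within `1 << window_log` bytes.

```rust
use std::collections::VecDeque;

// Assumption: one block is 128 KiB.
const MAX_BLOCK_SIZE: usize = 128 * 1024;

/// Toy stand-in for the matcher's window: one entry per committed block.
/// Only block lengths are tracked; real hash/suffix data is omitted.
struct ToyWindow {
    max_window_size: usize,
    entries: VecDeque<usize>, // block lengths, oldest first
    total: usize,             // sum of all lengths in `entries`
}

impl ToyWindow {
    fn new(window_log: u32) -> Self {
        Self {
            max_window_size: 1usize << window_log,
            entries: VecDeque::new(),
            total: 0,
        }
    }

    /// Add a block, evicting the oldest entries until it fits in the window.
    fn push_block(&mut self, len: usize) {
        while self.total + len > self.max_window_size {
            match self.entries.pop_front() {
                Some(old) => self.total -= old,
                None => break, // nothing left to evict
            }
        }
        self.entries.push_back(len);
        self.total += len;
    }

    /// Number of earlier blocks still searchable while matching the newest one.
    fn prior_blocks(&self) -> usize {
        self.entries.len().saturating_sub(1)
    }
}
```

In this model, pushing full-size blocks into `ToyWindow::new(17)` always leaves `prior_blocks()` at 0, while `ToyWindow::new(19)` settles at four entries in the ring, three of them earlier blocks. Whether the current block counts toward the "~4 blocks" depends on the real eviction order, which the sketch does not claim to reproduce.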
Impact
On `large-log-stream` (100 MiB of a repeating 4-line log cycle,
~380 byte period, ~800 blocks at `MAX_BLOCK_SIZE`) the Fast matcher
previously re-emitted the cycle's first ~380 bytes as literals at the
head of every block. Final output was 94 175 bytes vs donor's 9 752.
After the bump, the L1 cell drops from 94 175 to 11 355 bytes (ratio 9.66× → 1.16× of donor).
Full Fast matrix (8 levels × 7 scenarios = 56 cells) re-measured —
no scenario regressed by more than +16 bytes. The remaining
non-trivial costs:
- `low-entropy-1m` L−2..L1: rust 176 vs donor 160-167 (1.05-1.10×)
- `large-log-stream` L−2..L1 cluster: rust ~11.35 kB vs donor 9.7-10.6 kB (under 1.2×)
Both are dwarfed by the ~80 KiB win on the L1 cell.
Verification
across all 7 scenarios
`CompressionLevel::Fastest`; cross-window-boundary payload size in the dictionary roundtrip test)
Test fixups
- `dictionary_roundtrip_stays_valid_after_output_exceeds_window`: payload bumped from `128 KiB + 64` to `512 KiB + 64` so it still crosses the advertised window boundary at the new Fastest size.
- `driver_*_releases_oversized_hc_tables` and adjacent driver tests: `1u64 << 17` → `1u64 << 19` in two `window_size` assertions on `CompressionLevel::Fastest`.

Related