perf(encoder): bump Fast window_log to 19 (donor parity, large-log-stream ratio fix)#187

Merged: polaz merged 1 commit into main from feat/#186-fast-large-log-ratio on May 19, 2026

@polaz (Member) commented May 19, 2026

Summary

Bump the Fast strategy's window_log from 17 to 19 to match donor's
{ 19, 13, 14, 1, 7, 0, ZSTD_fast } row in clevels.h. The Fast
matcher keeps a per-WindowEntry SuffixStore; at window_log = 17
the ring held exactly one MAX_BLOCK_SIZE (128 KiB) block, so each new
block evicted the previous block's hash entries and the matcher could
not recall periodic patterns that span a block boundary. Bumping to 19
(512 KiB) gives the ring ~4 blocks of history.
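The window arithmetic behind the summary can be sketched as follows. This is a minimal illustration, assuming zstd's 128 KiB block cap (`1 << 17`); `CParams`, `DONOR_FAST`, `window_bytes` and `blocks_of_history` are hypothetical names, and the field order simply follows the donor's `ZSTD_compressionParameters` (windowLog, chainLog, hashLog, searchLog, minMatch, targetLength, strategy) rather than any type in this crate.

```rust
/// zstd's maximum block size: 128 KiB (1 << 17).
const MAX_BLOCK_SIZE: usize = 1 << 17;

/// Hypothetical mirror of one clevels.h row. Field order follows the donor's
/// ZSTD_compressionParameters; this is not a type from this crate.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct CParams {
    window_log: u32,
    chain_log: u32,
    hash_log: u32,
    search_log: u32,
    min_match: u32,
    target_length: u32,
    strategy: &'static str,
}

/// The donor row `{ 19, 13, 14, 1, 7, 0, ZSTD_fast }`.
const DONOR_FAST: CParams = CParams {
    window_log: 19,
    chain_log: 13,
    hash_log: 14,
    search_log: 1,
    min_match: 7,
    target_length: 0,
    strategy: "ZSTD_fast",
};

/// Window size in bytes for a given window_log.
fn window_bytes(window_log: u32) -> usize {
    1usize << window_log
}

/// Number of full MAX_BLOCK_SIZE blocks that fit in the window.
fn blocks_of_history(window_log: u32) -> usize {
    window_bytes(window_log) / MAX_BLOCK_SIZE
}

fn main() {
    // Old Fast window (17) versus the donor-parity window (19).
    for log in [17, DONOR_FAST.window_log] {
        println!(
            "window_log {log}: {} KiB, {} block(s) of history",
            window_bytes(log) / 1024,
            blocks_of_history(log)
        );
    }
}
```

Run as-is, it reports 128 KiB / 1 block at window_log 17 and 512 KiB / 4 blocks at window_log 19.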

Impact

On large-log-stream (100 MiB of a repeating 4-line log cycle,
~380-byte period, ~800 blocks at MAX_BLOCK_SIZE), the Fast matcher
previously re-emitted the cycle's first ~380 bytes as literals at the
head of every block. Final output was 94 175 bytes vs donor's 9 752
(9.66×). After the bump:

| Cell | rust_bytes before | rust_bytes after | donor bytes | ratio after (rust / donor) |
|---|---|---|---|---|
| L1 fast × large-log-stream | 94 175 | 11 355 | 9 752 | 1.16× |
| L−7 fast × large-log-stream | (similar regression) | 11 348 | 14 592 | 0.78× |
| L−4 fast × large-log-stream | (similar regression) | 11 353 | 12 182 | 0.93× |
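The ratio column is rust_bytes after divided by donor bytes, so values below 1.0 mean the Rust output is smaller than donor's. A small sketch with a hypothetical `ratio` helper reproduces the column from the table's numbers, plus the pre-fix L1 gap and the size of the L1 win:

```rust
/// Rust output size divided by donor output size (below 1.0 = smaller than donor).
fn ratio(rust_bytes: u64, donor_bytes: u64) -> f64 {
    rust_bytes as f64 / donor_bytes as f64
}

fn main() {
    // (cell, rust_bytes after, donor bytes), copied from the table.
    let rows = [
        ("L1", 11_355u64, 9_752u64),
        ("L-7", 11_348, 14_592),
        ("L-4", 11_353, 12_182),
    ];
    for (level, rust, donor) in rows {
        println!("{level}: {:.2}x", ratio(rust, donor));
    }

    // Pre-fix L1 gap and the absolute saving on that cell.
    println!("L1 before: {:.2}x", ratio(94_175, 9_752));
    let saved = 94_175u64 - 11_355;
    println!("L1 saving: {saved} bytes ({:.1} KiB)", saved as f64 / 1024.0);
}
```

The saving works out to 82 820 bytes, roughly 80.9 KiB, which is the "~80 KiB win" referred to below.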

Full Fast matrix (8 levels × 7 scenarios = 56 cells) re-measured:
no scenario regressed by more than +16 bytes. The remaining
non-trivial costs:

  • low-entropy-1m L−2..L1: rust 176 bytes vs donor 160-167 bytes
    (1.05-1.10×)
  • large-log-stream L−2..L1 cluster: rust ~11.35 kB (≈11 350 bytes)
    vs donor 9.7-10.6 kB (under 1.2×)

Both are dwarfed by the ~80 KiB win on the L1 cell.

Verification

  • 534/534 lib tests pass
  • clippy + fmt clean
  • Full ratio matrix re-measured via REPORT lines for Fast levels
    across all 7 scenarios
  • Three existing tests updated (window-size assertions on
    CompressionLevel::Fastest in two driver tests; cross-window-boundary
    payload size in the dictionary roundtrip test)

Test fixups

  • dictionary_roundtrip_stays_valid_after_output_exceeds_window:
    payload bumped from 128 KiB + 64 to 512 KiB + 64 so it still
    crosses the advertised window boundary at the new Fastest size.
  • driver_*_releases_oversized_hc_tables and adjacent driver tests:
    1u64 << 17 → 1u64 << 19 in two window_size assertions on
    CompressionLevel::Fastest.
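Why the roundtrip payload had to grow can be checked with plain arithmetic. This is a minimal sketch with a hypothetical `crosses_window` helper, assuming the test only exercises the cross-window path when the payload is strictly larger than the advertised Fastest window:

```rust
/// Advertised Fastest window before and after this change.
const OLD_WINDOW: usize = 1 << 17; // window_log 17 = 128 KiB
const NEW_WINDOW: usize = 1 << 19; // window_log 19 = 512 KiB

/// Hypothetical check: output only "exceeds the window" if it is strictly larger.
fn crosses_window(payload_len: usize, window: usize) -> bool {
    payload_len > window
}

fn main() {
    let old_payload = 128 * 1024 + 64;
    let new_payload = 512 * 1024 + 64;
    println!("old payload vs old window: {}", crosses_window(old_payload, OLD_WINDOW));
    println!("old payload vs new window: {}", crosses_window(old_payload, NEW_WINDOW));
    println!("new payload vs new window: {}", crosses_window(new_payload, NEW_WINDOW));
}
```

The old 128 KiB + 64 payload would silently stop crossing the boundary at window_log 19, which is why it was bumped rather than left in place.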

Related

Summary by CodeRabbit

  • Bug Fixes

    • Improved dictionary validation to correctly handle larger payloads across window boundaries.
  • Performance

    • Adjusted fast compression level tuning for improved compression window sizing.

The Fast strategy's `MatchGenerator` keeps a per-`WindowEntry`
`SuffixStore`, so a hash entry's lifetime is bounded by `max_window_size`.
At `window_log = 17` that cap is exactly one `MAX_BLOCK_SIZE` block:
each new block evicts the previous one's `SuffixStore` and the
matcher cannot recall periodic patterns that span the block boundary.
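The eviction behavior described above can be modeled with a bounded ring of per-block stores. This is a simplified sketch, not the crate's `MatchGenerator`: `WindowRing`, `WindowEntry`, `find` and `recalls_across_boundary` are hypothetical, each entry is assumed to cover exactly one `MAX_BLOCK_SIZE` block, and a plain `HashMap` stands in for `SuffixStore`.

```rust
use std::collections::{HashMap, VecDeque};

const MAX_BLOCK_SIZE: usize = 1 << 17;

/// Hypothetical window entry: a block's length plus its own hash store
/// (hash of a short prefix -> absolute position).
struct WindowEntry {
    len: usize,
    suffixes: HashMap<u64, usize>,
}

/// Hypothetical ring of window entries capped at `max_window_size` bytes.
struct WindowRing {
    max_window_size: usize,
    total: usize,
    entries: VecDeque<WindowEntry>,
}

impl WindowRing {
    fn new(window_log: u32) -> Self {
        Self { max_window_size: 1 << window_log, total: 0, entries: VecDeque::new() }
    }

    /// Append a block, then evict the oldest entries (and all their hashes)
    /// until the retained bytes fit in the window again.
    fn push(&mut self, entry: WindowEntry) {
        self.total += entry.len;
        self.entries.push_back(entry);
        while self.total > self.max_window_size {
            let old = self.entries.pop_front().expect("ring is non-empty");
            self.total -= old.len;
        }
    }

    /// Look a hash up across every retained entry, newest first.
    fn find(&self, hash: u64) -> Option<usize> {
        self.entries.iter().rev().find_map(|e| e.suffixes.get(&hash).copied())
    }
}

/// Record a hash in block 0, start block 1, and report whether the
/// block-0 position can still be recalled.
fn recalls_across_boundary(window_log: u32) -> bool {
    let mut ring = WindowRing::new(window_log);
    let mut first = WindowEntry { len: MAX_BLOCK_SIZE, suffixes: HashMap::new() };
    first.suffixes.insert(0xC0FFEE, 42);
    ring.push(first);
    ring.push(WindowEntry { len: MAX_BLOCK_SIZE, suffixes: HashMap::new() });
    ring.find(0xC0FFEE).is_some()
}

fn main() {
    println!("window_log 17 recalls block 0: {}", recalls_across_boundary(17));
    println!("window_log 19 recalls block 0: {}", recalls_across_boundary(19));
}
```

At window_log 17 pushing the second block drops the first entry wholesale, so the lookup fails; at 19 both entries fit and the block-0 position is found.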

On a 100 MiB periodic log stream (`large-log-stream`, ~380-byte
cycle, ~800 blocks at MAX_BLOCK_SIZE) this caused the Fast matcher
to re-emit the cycle's first ~380 bytes as literals at the head of
every block. Final wire output was 94 175 bytes vs donor's 9 752
bytes — a 9.66× ratio gap that masked Fast's entire compression
benefit on this fixture.

Bumping `window_log` to 19 matches the windowLog of donor's level-1 row
in `clevels.h` (`{ 19, 13, 14, 1, 7, 0, ZSTD_fast }`), which its "base
for negative levels" row shares, and gives the `SuffixStore`-per-`WindowEntry` ring
~4 blocks of history. With cross-block recall available, Fast finds
the period-380 long match starting in block 2 and the per-block
literal preamble collapses to a single 380-byte head + a chain of
short sequences:

- L1 fast × large-log-stream: rust_bytes 94 175 → 11 355
  (ratio 9.66× → 1.16×, donor 9 752)
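An idealized count of literal bytes shows why cross-block recall collapses the per-block preamble. The sketch assumes a perfectly periodic input and that every byte whose copy one period earlier is still in the ring gets matched; it ignores hash collisions, minimum match length and entropy coding, so it does not predict the wire sizes above. `literal_bytes` is a hypothetical helper.

```rust
/// Idealized literal count for a perfectly periodic stream.
/// `retained_blocks` = blocks the ring keeps, current block included.
fn literal_bytes(total_len: usize, block_size: usize, period: usize, retained_blocks: usize) -> usize {
    assert!(retained_blocks >= 1, "the ring always holds the current block");
    let mut literals = 0;
    let mut start = 0;
    while start < total_len {
        let end = (start + block_size).min(total_len);
        // Oldest position whose hash is still retained while coding this block.
        let earliest = start.saturating_sub((retained_blocks - 1) * block_size);
        // Positions p in [start, end) with p - period < earliest have no match source.
        let unmatched_end = (earliest + period).min(end);
        literals += unmatched_end.saturating_sub(start);
        start = end;
    }
    literals
}

fn main() {
    let stream = 100 << 20; // 100 MiB
    let block = 1 << 17; // MAX_BLOCK_SIZE
    println!("1 block of history:  {} literal bytes", literal_bytes(stream, block, 380, 1));
    println!("4 blocks of history: {} literal bytes", literal_bytes(stream, block, 380, 4));
}
```

With one block of history every one of the 800 blocks pays the 380-byte head (304 000 literal bytes before entropy coding); with four blocks only the very first head remains.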

Same bump applied to the negative-level Fast branch
(`resolve_level_params` Level(n) where n<0) so L−7..L−1 inherit the
donor "base for negative levels" windowLog=19.
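The level-to-window mapping after this change, restricted to the Fast levels the PR touches, can be sketched as a match. `fast_window_log` and `advertised_window` are hypothetical helpers, not `resolve_level_params`; other levels return `None` because their parameters are not described here.

```rust
/// Hypothetical: window_log for the Fast levels covered by this change.
fn fast_window_log(level: i32) -> Option<u32> {
    match level {
        // L1 (CompressionLevel::Fastest) and the negative levels L-7..=L-1
        // all use the donor windowLog of 19 after this change.
        1 | -7..=-1 => Some(19),
        // Everything else is outside the scope of this sketch.
        _ => None,
    }
}

/// Advertised window in bytes, mirroring the `1u64 << 19` test assertions.
fn advertised_window(level: i32) -> Option<u64> {
    fast_window_log(level).map(|log| 1u64 << log)
}

fn main() {
    for level in [-7, -1, 1, 3] {
        println!("level {level}: {:?}", advertised_window(level));
    }
}
```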

Negative levels gain too on this fixture:
- L−7 fast × large-log-stream: 11 348 vs donor 14 592 (much better
  than pre-fix)
- L−1 fast × large-log-stream: 11 354 vs donor 9 771 (1.16×, similar
  to L1)

Full Fast matrix (8 levels × 7 scenarios = 56 cells): no scenario
regressed by more than +16 bytes; the only non-trivial cost is on
`low-entropy-1m` L−2..L1 (rust 176 bytes vs donor 160-167) and the
`large-log-stream` L−2..L1 cluster (rust ~11.35 kB vs donor 9.7-10.6
kB, under 1.2× of donor). Both are dwarfed by the ~80 KiB win on
the L1 cell that triggered this fix.

Test fixups for the new window size:
- `dictionary_roundtrip_stays_valid_after_output_exceeds_window`
  payload bumped from `128 KiB + 64` to `512 KiB + 64` so it still
  crosses the advertised window boundary at L1 Fastest.
- Two `driver.window_size()` assertions on `CompressionLevel::Fastest`
  updated from `1 << 17` to `1 << 19`.

534/534 lib tests pass; clippy and fmt clean.

Closes #186.
Copilot AI review requested due to automatic review settings May 19, 2026 07:51

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: caf5a253-e4be-48a5-95a7-476e11a4b1f2

📥 Commits

Reviewing files that changed from the base of the PR and between af4fddd and bf47618.

📒 Files selected for processing (2)
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/match_generator.rs

📝 Walkthrough

The PR increases the window-log parameter for fast compression strategy from 17 to 19 (expanding the window from 128 KiB to 512 KiB), updates the corresponding test assertions in two regression tests, and adjusts a cross-window-boundary test fixture to account for the larger window.

Changes

Fast Strategy Window Size Adjustment

| Layer | File(s) | Summary |
|---|---|---|
| Fast strategy window parameter updates | zstd/src/encoding/match_generator.rs | The LEVEL_TABLE entry for compression level 1 and the negative-level path in resolve_level_params both change window_log from 17 to 19, expanding the advertised window for the fast backend. |
| Test expectations for new window size | zstd/src/encoding/match_generator.rs, zstd/src/encoding/frame_compressor.rs | Test assertions in driver_switches_backends_and_initializes_dfast_via_reset and driver_best_to_fastest_releases_oversized_hc_tables are updated to expect (1u64 << 19), and the payload in dictionary_roundtrip_stays_valid_after_output_exceeds_window is increased to 512 KiB + 64 bytes so it still exceeds the new 512 KiB window. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

🐰 A window grew from small to wide,
Fast matches find their hiding place,
Nineteen's log lets cycles hide no more—
While tests keep watch with assertions sure.
Compress fast, compress with pride! 💨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: bumping Fast strategy's window_log parameter from 17 to 19, which directly addresses the primary objective stated in linked issue #186. |
| Linked Issues check | ✅ Passed | Code changes successfully implement the fix for issue #186: window_log increased to 19 for Fast levels, test fixtures updated accordingly, and large-log-stream ratio improved from 9.66× to 1.16× donor parity, meeting the ≤1.5× acceptance criterion. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #186: window_log tuning in match_generator.rs and corresponding test adjustments in frame_compressor.rs. No unrelated modifications present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Copilot AI left a comment


Pull request overview

This PR aligns the encoder’s Fast strategy window size with the donor zstd tuning by increasing window_log from 17 (128 KiB) to 19 (512 KiB). This fixes a large ratio regression in Fast mode by retaining enough match history to capture periodic patterns that span MAX_BLOCK_SIZE block boundaries.

Changes:

  • Bump Level 1 (Fast / CompressionLevel::Fastest) window_log to 19 in the level parameter table.
  • Bump negative (ultra-fast) levels’ Fast backend window_log to 19 for the same cross-block pattern retention.
  • Update unit tests that asserted Fastest window size and that depended on crossing the advertised window boundary.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| zstd/src/encoding/match_generator.rs | Updates Fast level tuning (window_log) and adjusts associated driver window-size assertions in tests. |
| zstd/src/encoding/frame_compressor.rs | Updates a dictionary roundtrip test payload size so it still exceeds the new Fastest advertised window. |

Development

Successfully merging this pull request may close these issues.

ratio(encoder): fast L1 emits 9.66× more bytes than donor on large-log-stream

2 participants