forked from KillingSpark/zstd-rs
Closed
Labels
P1-high (High priority: core functionality), enhancement (New feature or request), performance (Performance optimization)
Description
Summary
Dictionary decompression works and dictionary building works (behind the `dict_builder` feature), but dictionary compression is unimplemented. This is critical for CoordiNode's per-label trained dictionaries in the LSM-tree, where small values benefit enormously from shared dictionaries.
Current state
- `frame_compressor.rs:148`: dictionary ID field set to `None`, no dict integration
- `encoding/blocks/compressed.rs:27`: offset history hardcoded to `[1, 4, 8]`, not loaded from dictionary
- `encoding/blocks/compressed.rs:54`: FSE table reuse not implemented
C reference implementation
Dictionary compression flow (zstd_compress.c)
- Load dictionary: parse magic, extract Huffman table, FSE tables, offset history, raw content
- Initialize matcher: prefill hash/chain tables with dictionary content positions
- Set initial state: offset history from dict (`rep[0..3]`), entropy tables from dict
- Frame header: write dictionary ID field
- First block: can reference dictionary content via offsets
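As a rough illustration of the first step, the sketch below only classifies a dictionary buffer: it checks for the dictionary magic number `0xEC30A437` and reads the 4-byte little-endian dictionary ID that follows it. The names `DictKind` and `classify_dictionary` are hypothetical and not part of this crate. Parsing of the entropy tables and the three stored repeat offsets is deliberately left out, and treating buffers shorter than 8 bytes as raw content is a simplifying assumption.

```rust
/// Magic number that opens a formatted zstd dictionary (stored little-endian).
const DICT_MAGIC: u32 = 0xEC30_A437;
/// Repeat offsets a frame starts with when no dictionary supplies its own.
const DEFAULT_OFFSETS: [u32; 3] = [1, 4, 8];

/// Hypothetical view of a dictionary as an encoder might first see it.
#[derive(Debug, PartialEq)]
enum DictKind<'a> {
    /// No magic number: the whole buffer is match content, with default offsets.
    RawContent { content: &'a [u8], offsets: [u32; 3] },
    /// Magic number present: the ID is known, the remainder (entropy tables,
    /// stored repeat offsets, content) still has to be parsed.
    Formatted { id: u32, after_id: &'a [u8] },
}

/// Classifies a dictionary buffer by looking at its first 8 bytes only.
fn classify_dictionary<'a>(bytes: &'a [u8]) -> DictKind<'a> {
    if bytes.len() >= 8 {
        let magic = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        if magic == DICT_MAGIC {
            let id = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
            return DictKind::Formatted { id, after_id: &bytes[8..] };
        }
    }
    // Assumption of this sketch: anything else is used as raw content.
    DictKind::RawContent { content: bytes, offsets: DEFAULT_OFFSETS }
}
```

For a formatted dictionary, the `after_id` slice is where the Huffman and FSE table parsing would continue; the offset history from the dictionary would then replace `DEFAULT_OFFSETS`.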
Key functions
- `ZSTD_compress_insertDictionary()`: main entry point
- `ZSTD_loadCEntropy()`: parse entropy tables from dict header
- `ZSTD_loadDictionaryContent()`: fill hash/chain tables with dict positions
- `ZSTD_CCtx_refCDict()`: reference pre-built dictionary
What needs to be implemented
- Dictionary loading in encoder: parse dict format, extract tables + content
- Matcher prefill: insert dictionary content positions into hash tables
- Initial offset history: load `[rep0, rep1, rep2]` from dictionary instead of `[1, 4, 8]`
- Initial entropy tables: use Huffman/FSE tables from dictionary for first block
- Frame header dict ID: write dictionary ID when dict is used
- FrameCompressor API: method to attach dictionary before compression
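The matcher prefill item can be pictured with a toy model. The sketch below is not the crate's matcher: `ToyMatcher`, `key_at`, and the fixed 4-byte key are hypothetical, and a `HashMap` stands in for the real hash/chain tables. It only shows the idea that dictionary positions are indexed before any input arrives, so the very first bytes of the input can already produce a match whose offset points back into the dictionary.

```rust
use std::collections::HashMap;

/// Shortest match this toy matcher looks for (hypothetical choice).
const MIN_MATCH: usize = 4;

/// Reads the 4-byte key starting at `pos`, if the buffer is long enough.
fn key_at(buf: &[u8], pos: usize) -> Option<[u8; MIN_MATCH]> {
    let s = buf.get(pos..pos + MIN_MATCH)?;
    Some([s[0], s[1], s[2], s[3]])
}

/// Hypothetical toy matcher, not the crate's real one.
struct ToyMatcher {
    /// Dictionary content; real input would be appended behind it.
    window: Vec<u8>,
    /// Maps a 4-byte sequence to the most recent position it starts at.
    table: HashMap<[u8; MIN_MATCH], usize>,
}

impl ToyMatcher {
    /// Prefill: index every dictionary position before any input is seen.
    fn with_dictionary(dict_content: &[u8]) -> Self {
        let mut table = HashMap::new();
        let mut pos = 0;
        while let Some(key) = key_at(dict_content, pos) {
            table.insert(key, pos); // later positions overwrite earlier ones
            pos += 1;
        }
        ToyMatcher { window: dict_content.to_vec(), table }
    }

    /// Looks up the first bytes of `input` and returns `(offset, length)`.
    /// `offset` counts back from the start of the input into the dictionary;
    /// `length` is capped at the end of the dictionary in this sketch.
    fn find_match(&self, input: &[u8]) -> Option<(usize, usize)> {
        let pos = *self.table.get(&key_at(input, 0)?)?;
        let length = self.window[pos..]
            .iter()
            .zip(input)
            .take_while(|(a, b)| a == b)
            .count();
        Some((self.window.len() - pos, length))
    }
}
```

With the dictionary `b"abcdefgh"` and the input `b"cdefXY"`, `find_match` reports offset 6 and length 4: the match starts six bytes before the input begins, which is only reachable because the dictionary was indexed up front.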
Acceptance criteria
- `FrameCompressor` accepts dictionary via new API method
- Compressed output includes dictionary ID in frame header
- C zstd can decompress dict-compressed output
- structured-zstd can decompress own dict-compressed output
- Compression ratio on small values (1-10KB) significantly improves with trained dict
- Roundtrip test with dict_builder-generated dictionaries
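The frame header criterion could be checked with a small helper like the one sketched here. It follows the Zstandard frame layout (magic number, frame header descriptor, optional window descriptor, then a dictionary ID field of 0, 1, 2, or 4 bytes selected by the low two bits of the descriptor). The function names `encode_dict_id` and `read_dict_id` are hypothetical and are not part of the current crate API.

```rust
/// Magic number at the start of every zstd frame (stored little-endian).
const FRAME_MAGIC: u32 = 0xFD2F_B528;

/// Picks the smallest dictionary ID field for `id`.
/// Returns the Dictionary_ID_flag (low two descriptor bits) and the field bytes.
fn encode_dict_id(id: u32) -> (u8, Vec<u8>) {
    match id {
        0 => (0, Vec::new()), // no field written at all
        1..=0xFF => (1, vec![id as u8]),
        0x100..=0xFFFF => (2, (id as u16).to_le_bytes().to_vec()),
        _ => (3, id.to_le_bytes().to_vec()),
    }
}

/// Reads the dictionary ID from the start of a frame.
/// Returns `Some(0)` when the frame carries no ID field, and `None` when the
/// magic number does not match or the header is truncated.
fn read_dict_id(frame: &[u8]) -> Option<u32> {
    let magic = frame.get(..4)?;
    if magic != &FRAME_MAGIC.to_le_bytes()[..] {
        return None;
    }
    let descriptor = *frame.get(4)?;
    // Bit 5 is the Single_Segment_flag; when it is clear, one
    // window descriptor byte sits between descriptor and dictionary ID.
    let single_segment = descriptor & 0x20 != 0;
    let field_len = [0usize, 1, 2, 4][(descriptor & 0x03) as usize];
    let start = if single_segment { 5 } else { 6 };
    let field = frame.get(start..start + field_len)?;
    // Assemble the little-endian value byte by byte.
    let mut id = 0u32;
    for (i, byte) in field.iter().enumerate() {
        id |= (*byte as u32) << (8 * i);
    }
    Some(id)
}
```

A roundtrip test could call `read_dict_id` on the compressor's output and compare the result with the ID of the attached dictionary before handing the frame to a decoder.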
Time estimate
3d
Blocked by
- feat: FSE table reuse and offset history optimization in encoder #17 (FSE table reuse and offset history — must work correctly before adding dict support)