zrip

Pure Rust zstd codec. Levels -8 through 4 (Fast and DFast strategies). Optimized for encode throughput in transfer pipelines that need standard zstd frames at high speed.

Why zrip

Fastest pure-Rust zstd encoder. Consistently outperforms other pure-Rust zstd implementations on both encode and decode across all supported levels. See the benchmarks below.

Negative levels (-8 through -1). Unlocks zstd's fastest compression tiers, useful when throughput matters more than ratio. L-8 is zrip's own addition beyond C zstd's range: raw literals only, no Huffman table build, approaching LZ4-class encode speed while still producing standard zstd frames.

Memory safety. Unsafe is minimized and confined to small, auditable primitives modules. The paranoid feature eliminates all remaining unsafe. See SAFETY.md.

Small codebase. ~12k lines of Rust. Levels above 4 add complexity for compression ratios that only matter in archival storage, not transfer pipelines.

Dictionary compression. COVER and FastCOVER training built in for small-message workloads (log lines, JSON records, RPC payloads).

no_std + alloc. Works in embedded and kernel contexts with the alloc feature; frame requires std.

WebAssembly. Available as @paddor/zrip on JSR. Auto-detects WASM SIMD support. 15% faster encode than C zstd compiled to WASM.

Performance

x86_64 details (per-file pipeline, scatter, matrix)

aarch64 (Apple M4)

wasm32 (wasmtime)

API

// One-shot (allocating)
let compressed = zrip::compress(input, 1)?;
let original   = zrip::decompress(&compressed)?;

// One-shot into caller buffer
let n = zrip::compress_into(input, &mut output_buf, 1)?;
zrip::decompress_into(&compressed, &mut output_vec)?;

// Reusable context (amortizes table allocation across calls)
let mut ctx = zrip::CompressContext::new(1)?;
let compressed = ctx.compress(input)?;

let mut dec = zrip::DecompressContext::new();
let original = dec.decompress(&compressed)?;

Streaming

use std::io::Write;

let mut enc = zrip::FrameEncoder::new(Vec::new(), 1)?;
enc.write_all(b"hello")?;
enc.write_all(b" world")?;
let compressed = enc.finish()?;

use std::io::Read;
let mut dec = zrip::FrameDecoder::new(&compressed[..]);
let mut out = String::new();
dec.read_to_string(&mut out)?;

FrameEncoder and FrameDecoder own persistent hash tables and workspace buffers. Call reset() to start a new frame while reusing all allocations:

let mut enc = zrip::FrameEncoder::new(Vec::new(), 1)?;
enc.write_all(b"first frame")?;
let first  = enc.reset(Vec::new())?;  // finishes frame, keeps buffers
enc.write_all(b"second frame")?;
let second = enc.finish()?;

Large-window and long distance matching

The normal match finder operates within a sliding window (512 KiB at L1). LDM finds matches at distances up to 1 << window_log bytes by sampling positions into a separate hash table. Useful for data with long-range repeats: log files, database dumps, source archives. See DESIGN.md for how LDM works.

let opts = zrip::Options::default().window_log(24).ldm(true);
let compressed = zrip::compress_opts(input, 1, &opts)?;

Streaming with LDM:

use std::io::Write;
let opts = zrip::Options::default().window_log(24).ldm(true);
let mut enc = zrip::FrameEncoder::with_options(Vec::new(), 1, &opts)?;
enc.write_all(input)?;
let compressed = enc.finish()?;

window_log controls the maximum match distance (24 = 16 MiB, 27 = 128 MiB). The main cost is memory: the encoder allocates an 8 MiB LDM hash table plus a window buffer of 1 << window_log bytes, and the decoder allocates a window buffer of the same size (declared in the frame header). At window_log=27 that is ~136 MiB on each side. On data without long-range repeats, LDM adds overhead with no ratio benefit.

Dictionary compression

let dict = zrip::Dictionary::from_bytes(&dict_bytes)?;
let compressed = zrip::compress_with_dict(input, 1, &dict)?;
let original   = zrip::decompress_with_dict(&compressed, &dict)?;

Streaming with a dictionary:

let mut enc = zrip::FrameEncoder::with_dict(Vec::new(), 1, dict.clone())?;
enc.write_all(input)?;
let compressed = enc.finish()?;

let mut dec = zrip::FrameDecoder::with_dict(&compressed[..], dict);
let mut output = Vec::new();
dec.read_to_end(&mut output)?;

CompressContext::with_dict() and DecompressContext::with_dict() provide the same reuse for one-shot compression.

Dictionary training

Build a dictionary from sample data using the built-in FastCOVER trainer. Requires the dict_builder feature.

use zrip::dict::{train_dict_fastcover, fastcover::FastCoverParams};

let samples: Vec<&[u8]> = messages.iter().map(|m| m.as_bytes()).collect();
let dict = train_dict_fastcover(&samples, 16384, FastCoverParams::default());
// Use with compress_with_dict() / decompress_with_dict()

Features

Feature	Default	Description
`std`	yes	Enables `CompressContext`, `DecompressContext`
`frame`	yes	Frame header parsing and writing; implies `std`
`alloc`	yes	`no_std` + heap via `alloc` crate
`ldm`	yes	Long distance matching for large-window compression
`dict_builder`	no	COVER/FastCOVER dictionary training
`simd`	yes	`fearless_simd` runtime dispatch (AVX2+BMI2, NEON, SIMD128)
`paranoid`	no	Pure safe Rust: `forbid(unsafe_code)` on all crates
`nightly`	no	`#[optimize]` attributes on hot functions

Safety

SAFETY.md documents the unsafe boundary and catalogs C zstd memory safety bugs that Rust prevents by construction.

All codec paths are fuzz-tested (16 targets, ~10.7M executions) and verified under Miri on both x86_64 and aarch64. Fuzz targets cover round-trip correctness, cross-validation against C zstd, streaming, dictionary modes, and corruption resistance (bitflip, splice, truncate, overwrite).

Design

DESIGN.md covers the encode/decode pipeline, performance architecture, SIMD dispatch, compile-time specialization, and divergences from C zstd.

Levels

Level	Strategy	Hash table	Min match	Literals	Sequences
-8	Fast	32 KB	5	Raw	Predefined FSE
-7..-1	Fast	32 KB	5	Huffman	Predefined/custom FSE
1	Fast	64 KB	4	Huffman	Predefined/custom FSE
2	Fast	512 KB	4	Huffman	Predefined/custom FSE
3	DFast	2x 1 MB	4	Huffman	Predefined/custom FSE
4	DFast	2x 2 MB	4	Huffman	Predefined/custom FSE

Level 0 maps to the library default (currently level 1). See DESIGN.md for parameter details and pipeline behavior per level.

Name		Name	Last commit message	Last commit date
Latest commit History 224 Commits
.cargo		.cargo
.github/workflows		.github/workflows
bench		bench
crates		crates
doc/charts		doc/charts
examples		examples
fuzz		fuzz
jsr		jsr
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.toml		Cargo.toml
DESIGN.md		DESIGN.md
DEVELOPMENT.md		DEVELOPMENT.md
LICENSE		LICENSE
README.md		README.md
SAFETY.md		SAFETY.md
SECURITY.md		SECURITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

zrip

Why zrip

Performance

API

Streaming

Large-window and long distance matching

Dictionary compression

Dictionary training

Features

Safety

Design

Levels

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

zrip

Why zrip

Performance

API

Streaming

Large-window and long distance matching

Dictionary compression

Dictionary training

Features

Safety

Design

Levels

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages