rust-zstd

Pure Rust implementation of the Zstandard compression format (RFC 8878). Compress and decompress with zero external C dependencies.

What is Zstandard?

Zstandard (zstd) is a lossless data compression algorithm developed by Yann Collet at Meta. It targets real-time compression scenarios, offering a wide range of compression/speed trade-offs while being backed by an extremely fast decoder.

How it works

Zstandard combines three classical compression techniques in a layered pipeline:

Input bytes
  │
  ▼
┌─────────────────────────────────────────────────┐
│  1. LZ77 Match Finding                          │
│     Slide a window over the input and find      │
│     repeated byte sequences ("matches").        │
│     Each match is encoded as a back-reference:  │
│       (literal_length, offset, match_length)    │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│  2. Huffman Coding (Literals)                   │
│     Bytes that don't belong to any match are    │
│     called "literals". They are compressed with │
│     Huffman coding — frequent bytes get shorter │
│     binary codes, rare bytes get longer ones.   │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│  3. FSE — Finite State Entropy (Sequences)      │
│     The sequence of (literal_length, offset,    │
│     match_length) triples is encoded with FSE,  │
│     a tANS-family entropy coder that approaches │
│     the Shannon limit while decoding at table-  │
│     lookup speed.                               │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│  4. Frame / Block Structure                     │
│     Compressed data is organized into frames,   │
│     each containing one or more blocks (≤128KB  │
│     decompressed). Blocks can be:               │
│       • Raw — stored uncompressed               │
│       • RLE — single repeated byte              │
│       • Compressed — Huffman literals + FSE     │
│         sequences                               │
└─────────────────────────────────────────────────┘

The decoder reverses this pipeline: parse frame/block headers, decode FSE sequences, decode Huffman literals, then execute the match copy operations to reconstruct the original data.
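The very first of those decoder steps is cheap to illustrate. A minimal sketch (not this crate's API; the function name is ours) of classifying a buffer by its leading magic number, per RFC 8878:

```rust
// Sketch only (not this crate's API): classify the two frame kinds a
// zstd decoder must recognize from the first four little-endian bytes.
const ZSTD_MAGIC: u32 = 0xFD2F_B528; // standard zstd frame
const SKIPPABLE_LO: u32 = 0x184D_2A50; // skippable frames: 0x184D2A50..=0x184D2A5F

fn frame_kind(input: &[u8]) -> Option<&'static str> {
    // Every frame starts with a 4-byte little-endian magic number.
    let bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
    match u32::from_le_bytes(bytes) {
        ZSTD_MAGIC => Some("zstd"),
        m if (SKIPPABLE_LO..=SKIPPABLE_LO | 0xF).contains(&m) => Some("skippable"),
        _ => None,
    }
}
```

A real decoder would then branch: parse the frame header for a zstd frame, or read the payload size and jump over it for a skippable one.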

Features

  • Pure Rust — no C bindings, no unsafe, no libc
  • Compress + Decompress — full codec, not decode-only
  • Spec-compliant — output is decodable by any standard zstd decoder (C libzstd, Python zstandard, etc.)
  • Compression levels 0–11 — from raw storage to deep lazy matching
  • Parallel compression — optional rayon-based block encoding (enabled by default)
  • Competitive ratios — within 5% of C zstd, smaller on some workloads
  • Fast decoder — 1.05–68x faster than C zstd across tested datasets
  • ~5400 lines of Rust (vs ~30,000 lines in C zstd)

Installation

Add to your Cargo.toml:

[dependencies]
zstd-rs = { git = "https://github.com/physwkim/rust-zstd.git" }

Parallel compression is enabled by default. To disable it:

[dependencies]
zstd-rs = { git = "https://github.com/physwkim/rust-zstd.git", default-features = false }

API

Compress

use zstd_rs::compress;

// Compress with a specific level (0-11)
let compressed = compress(b"Hello, World!", 3);

// Level guide:
//   0     — no compression (raw blocks, fastest)
//   1-2   — greedy matching (fast)
//   3-5   — lazy matching (balanced)
//   6-8   — lazy matching + deeper search
//   9-11  — lazy matching + deepest search (best ratio)

use zstd_rs::compress_to_vec;

// Convenience wrapper — compresses at level 1
let compressed = compress_to_vec(b"Hello, World!");

Decompress

use zstd_rs::decompress;

let original = decompress(&compressed).expect("valid zstd frame");

decompress supports:

  • Single and concatenated frames
  • Skippable frames (silently skipped)
  • All standard block types (Raw, RLE, Compressed)
  • Huffman and FSE entropy coding
  • Repeat offsets and all sequence modes
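Skippable frames in particular can be stepped over without any entropy decoding, because each carries its payload size as a 4-byte little-endian field right after the magic. A sketch under that spec rule (the function name is ours, not this crate's API):

```rust
// Sketch (not this crate's API): strip leading skippable frames from a
// buffer. A skippable frame is: 4-byte magic (0x184D2A50..=0x184D2A5F,
// little-endian) + 4-byte little-endian payload size + that many bytes.
fn strip_skippable(mut input: &[u8]) -> &[u8] {
    while input.len() >= 8 {
        let magic = u32::from_le_bytes(input[..4].try_into().unwrap());
        if !(0x184D_2A50..=0x184D_2A5F).contains(&magic) {
            break; // not skippable: a real zstd frame (or garbage) starts here
        }
        let size = u32::from_le_bytes(input[4..8].try_into().unwrap()) as usize;
        match input.get(8 + size..) {
            Some(rest) => input = rest,
            None => break, // truncated skippable frame
        }
    }
    input
}
```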

Roundtrip example

use zstd_rs::{compress, decompress};

fn main() {
    let data = b"The quick brown fox jumps over the lazy dog. \
                  The quick brown fox jumps over the lazy dog.";

    let compressed = compress(data, 3);
    println!(
        "compressed {} bytes -> {} bytes ({:.1}x)",
        data.len(),
        compressed.len(),
        data.len() as f64 / compressed.len() as f64
    );

    let decompressed = decompress(&compressed).unwrap();
    assert_eq!(&decompressed, data);
    println!("roundtrip OK");
}

Performance

Benchmarked against C zstd 1.5.6 on Apple M4, 128 KB test data per dataset:

Compression ratio (vs C zstd)

Dataset   Level 1   Level 3   Level 7   Level 11
zeros     0.75x     0.76x     0.77x     0.77x
text      1.04x     1.05x     1.05x     1.05x
f64       1.03x     1.01x     1.01x     1.01x
mixed     1.00x     1.00x     1.02x     1.02x

Values below 1.0 mean smaller output than C zstd (better). All datasets are within 5% of C zstd.

Decompression speed

Dataset   Rust (MB/s)   C (MB/s)   Speedup
zeros     33,653        492        68x
text      1,328         484        2.7x
f64       494           363        1.4x
mixed     385           368        1.05x

Compression speed

Dataset   Level 1   Level 3   Level 7   Level 11
zeros     57%       66%       33%       19%
text      74%       68%       44%       18%
f64       34%       36%       67%       131%
mixed     23%       21%       14%       20%

Percentage of C zstd compression speed. Match finding is the main bottleneck at lower levels.

Architecture

src/
├── lib.rs          # Public API: compress, compress_to_vec, decompress
├── compress.rs     # Encoder: match finding, Huffman, FSE, block encoding
├── decode.rs       # Decoder: frame/block parsing, entropy decoding, sequence execution
├── fse.rs          # FSE compression tables and sequence encoder
├── bitstream.rs    # Forward/backward bit writers
└── constants.rs    # Zstd format constants, predefined tables, code mappings

Compression pipeline

  1. Match finding — hash-based (greedy or lazy) across the entire input with cross-block window
  2. Block splitting — sequences partitioned into ≤128 KB decompressed blocks
  3. Repeat offset resolution — zstd's 3-offset history per RFC 8878 §3.1.2.5
  4. Literal encoding — Huffman with canonical codes, Treeless mode for sequential blocks
  5. Sequence encoding — FSE tables (predefined or custom) for literal lengths, match lengths, and offsets
  6. Block assembly — compressed block if smaller than raw, otherwise raw/RLE fallback
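The greedy end of step 1 can be sketched in a few lines. This is illustrative only, not the crate's encoder (which uses rolling hashes, a bounded window, and lazy evaluation at higher levels); the 4-byte key and `MIN_MATCH` are simplifications:

```rust
use std::collections::HashMap;

// Illustrative greedy match finder: remember the last position where each
// 4-byte window occurred; when it repeats, verify and extend the match.
// Emits (literal_length, offset, match_length) triples.
fn greedy_matches(input: &[u8]) -> Vec<(usize, usize, usize)> {
    const MIN_MATCH: usize = 4; // simplification; zstd allows 3-byte matches
    let mut table: HashMap<&[u8], usize> = HashMap::new();
    let mut seqs = Vec::new();
    let (mut pos, mut lit_start) = (0, 0);
    while pos + MIN_MATCH <= input.len() {
        let key = &input[pos..pos + MIN_MATCH];
        if let Some(&prev) = table.get(key) {
            // Extend the match as long as the bytes keep agreeing.
            let mut len = MIN_MATCH;
            while pos + len < input.len() && input[prev + len] == input[pos + len] {
                len += 1;
            }
            seqs.push((pos - lit_start, pos - prev, len));
            table.insert(key, pos);
            pos += len;
            lit_start = pos;
        } else {
            table.insert(key, pos);
            pos += 1;
        }
    }
    seqs // bytes from lit_start.. onward would be emitted as raw literals
}
```

Note that `match_length` may exceed `offset`, which is how the format encodes run-length-like repetition within a single sequence.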

Testing

# Run all tests (144 roundtrip tests across 12 levels × 12 datasets)
cargo test

# Without parallel feature
cargo test --no-default-features

Acknowledgments

The decoder (decode.rs) is ported from ruzstd 0.8.2 by Moritz Borcherding, used under the MIT license. The encoder and all other modules are original implementations based on the Zstandard specification and C zstd's educational decoder.

License

BSD-3-Clause

The decoder module (src/decode.rs) retains its original MIT license from ruzstd. See the file header for the full license text.
