A high-performance integer compression library implemented in Rust that uses SIMD instructions to compress 32-bit unsigned integers into variable-length byte sequences. This implementation follows the Stream VByte algorithm described in "Stream VByte: Faster Byte-Oriented Integer Compression".
- Variable-length Compression: Efficiently compresses 32-bit integers into 1-4 bytes based on value magnitude
- SIMD Optimization: Uses x86_64 SIMD instructions for parallel processing of integer blocks
- Dual Implementation: Provides both scalar and SIMD variants for maximum compatibility
- Zero-Copy Design: Employs unsafe Rust for direct memory manipulation without unnecessary copying
- Memory-efficient: Uses compact control headers (2 bits per integer) to record each integer's encoded byte length
The compressed data format consists of three sections:
- Total Integer Count (one `usize`, which is 8 bytes on 64-bit targets)
- Control Headers (compressed size indicators)
- Compressed Data (variable-length encoded integers)
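The three-section layout above can be sketched as a function that splits a buffer into its parts. This is an illustrative sketch, not the crate's code: `Sections` and `split_sections` are hypothetical names, and it assumes the count is stored in native byte order and that one header byte covers four integers.

```rust
use std::mem::size_of;

/// Hypothetical view over an encoded buffer (not a type from the crate).
struct Sections<'a> {
    count: usize,
    headers: &'a [u8],
    data: &'a [u8],
}

/// Splits `buf` into count, control headers, and compressed data.
/// Assumption: the count is a native-endian `usize`, and the header
/// section is ceil(count / 4) bytes long.
fn split_sections(buf: &[u8]) -> Option<Sections<'_>> {
    let n = size_of::<usize>();
    if buf.len() < n {
        return None;
    }
    let mut count_bytes = [0u8; size_of::<usize>()];
    count_bytes.copy_from_slice(&buf[..n]);
    let count = usize::from_ne_bytes(count_bytes);

    // One header byte per group of four integers, rounded up.
    let header_len = count / 4 + usize::from(count % 4 != 0);
    let rest = &buf[n..];
    if rest.len() < header_len {
        return None;
    }
    let (headers, data) = rest.split_at(header_len);
    Some(Sections { count, headers, data })
}
```

Returning `None` on a short buffer keeps the sketch free of panics; the crate itself reports errors through `anyhow`.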
Each control header uses 2 bits to indicate compression level:
- `00` (0): 1-byte compression
- `01` (1): 2-byte compression
- `10` (2): 3-byte compression
- `11` (3): 4-byte compression (uncompressed)
Headers are packed four per byte, with bits ordered right-to-left within each byte.
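As a rough illustration of the packing rule, the sketch below derives the 2-bit code from a value's magnitude and packs up to four codes into one header byte, with the first integer in the two least significant bits. The function names `control_code` and `pack_header` are hypothetical and are not taken from `lib.rs`.

```rust
/// Returns the 2-bit control code for `v`, i.e. encoded byte length minus one.
fn control_code(v: u32) -> u8 {
    match v {
        0..=0xFF => 0,
        0x100..=0xFFFF => 1,
        0x1_0000..=0xFF_FFFF => 2,
        _ => 3,
    }
}

/// Packs the codes of up to four values into one header byte.
/// The first value's code lands in bits 0-1, the second in bits 2-3, and so on.
fn pack_header(values: &[u32]) -> u8 {
    values
        .iter()
        .take(4)
        .enumerate()
        .fold(0u8, |byte, (i, &v)| byte | (control_code(v) << (2 * i)))
}
```

For example, the values `300, 16_777_216, 7, 0` need 2, 4, 1, and 1 bytes, so their codes are 1, 3, 0, 0 and the packed header byte is `0b0000_1101`.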
- SIMD Processing
  - Processes 8 integers simultaneously using 128-bit SIMD registers
  - Uses specialized x86_64 instructions for parallel comparisons and bit manipulation
  - Includes lookup tables for rapid compression length calculation
- Memory Management
  - Direct memory manipulation using unsafe Rust for zero-copy operations
  - Efficient slice manipulation without unnecessary allocations
  - Careful pointer arithmetic for optimal performance
- Error Handling
  - Comprehensive validation of input data
  - Robust error handling using the `anyhow` crate
  - Proper bounds checking during compression/decompression
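One way a lookup table for compression length calculation could look is a 256-entry table indexed by header byte, where each entry is the total number of data bytes for that group of four integers. This is only a sketch under that assumption. `LENGTH_TABLE` and `data_len` are hypothetical names, and the tables in `smd.rs` may be organized differently.

```rust
/// Builds a table where entry `h` is the number of data bytes described by
/// header byte `h`: four 2-bit codes, each standing for (code + 1) bytes.
const fn build_length_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut h = 0usize;
    while h < 256 {
        let mut total = 0u8;
        let mut i = 0;
        while i < 4 {
            total += (((h >> (2 * i)) & 0b11) as u8) + 1;
            i += 1;
        }
        table[h] = total;
        h += 1;
    }
    table
}

/// Computed once at compile time.
const LENGTH_TABLE: [u8; 256] = build_length_table();

/// Sums the data bytes covered by a run of full header bytes.
/// Assumes every header byte describes four integers; a trailing
/// partial group would be overcounted.
fn data_len(headers: &[u8]) -> usize {
    headers.iter().map(|&h| LENGTH_TABLE[h as usize] as usize).sum()
}
```

Replacing four shift-and-mask steps with a single indexed load per header byte is what makes this kind of table attractive in a hot loop.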
- Scalar Implementation (`scl.rs`)
  - Traditional single-integer processing
  - Fallback implementation for non-SIMD platforms
  - Clear, maintainable code for reference
- SIMD Implementation (`smd.rs`)
  - Leverages x86_64 SIMD instructions
  - Processes multiple integers in parallel
  - Uses lookup tables for optimization
- Common Utilities (`lib.rs`)
  - Shared constants and utilities
  - Header calculation functions
  - Type definitions and common traits
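To make the single-integer processing of the scalar path concrete, here is a minimal round-trip sketch. It is not the code in `scl.rs`: `encode_scalar` and `decode_scalar` are hypothetical names, and the sketch assumes a native-endian count and little-endian data bytes. It returns `Option` rather than an `anyhow` result so that it needs no external crates.

```rust
use std::mem::size_of;

/// Encodes `values` as: count | control headers | data bytes.
fn encode_scalar(values: &[u32]) -> Vec<u8> {
    let mut headers = vec![0u8; (values.len() + 3) / 4];
    let mut data = Vec::with_capacity(values.len() * 4);
    for (i, &v) in values.iter().enumerate() {
        // Significant bytes in `v`; zero still occupies one byte.
        let len = (((32 - v.leading_zeros() as usize) + 7) / 8).max(1);
        headers[i / 4] |= ((len - 1) as u8) << (2 * (i % 4));
        data.extend_from_slice(&v.to_le_bytes()[..len]);
    }
    let mut out = values.len().to_ne_bytes().to_vec();
    out.extend_from_slice(&headers);
    out.extend_from_slice(&data);
    out
}

/// Decodes a buffer produced by `encode_scalar`.
/// Returns `None` if the buffer is truncated.
fn decode_scalar(buf: &[u8]) -> Option<Vec<u32>> {
    let n = size_of::<usize>();
    let mut count_bytes = [0u8; size_of::<usize>()];
    count_bytes.copy_from_slice(buf.get(..n)?);
    let count = usize::from_ne_bytes(count_bytes);

    let header_len = count / 4 + usize::from(count % 4 != 0);
    let headers = buf.get(n..n.checked_add(header_len)?)?;
    let mut data = buf.get(n + header_len..)?;

    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        // Read the 2-bit code for integer `i` and turn it into a byte length.
        let len = ((headers[i / 4] >> (2 * (i % 4))) & 0b11) as usize + 1;
        let bytes = data.get(..len)?;
        let mut word = [0u8; 4];
        word[..len].copy_from_slice(bytes);
        out.push(u32::from_le_bytes(word));
        data = &data[len..];
    }
    Some(out)
}
```

A reference implementation like this is also useful as an oracle: the SIMD path can be tested by checking that it produces the same bytes for the same input.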
- Comprehensive unit tests for both implementations
- Property-based testing with random input data
- Edge case validation
- Performance benchmarking comparisons
- Memory Efficiency
  - Optimal compression ratios for different integer ranges
  - Minimal memory overhead for control structures
  - Efficient handling of large datasets
- Performance
  - SIMD parallelization for up to 8x throughput
  - Minimal branching in critical paths
  - Efficient bit manipulation techniques
- Code Quality
  - Type-safe Rust implementation
  - Clear separation of concerns
  - Well-documented interfaces
  - Comprehensive error handling

```rust
use svb::{smd, scl};

// SIMD-accelerated compression
let compressed = smd::enc(&integers)?;

// SIMD-accelerated decompression
let decompressed = smd::dec(&compressed)?;

// Scalar fallback compression
let compressed = scl::enc(&integers)?;

// Scalar fallback decompression
let decompressed = scl::dec(&compressed)?;
```

- Advanced Rust programming
- SIMD optimization
- Low-level memory management
- Algorithm implementation
- Performance optimization
- Systems programming
- Technical documentation
- Test-driven development
Bytes are organized as total integer count, followed by control headers, followed by the compressed data.

| Total Integer Count | Control Headers | Compressed Data |
|---|---|---|
| usize bytes | bytes | bytes |

Byte layout for svb compression.
Two bits indicate how much compression occurs in a 4-byte integer.
The two bits are called a control header.

| Compression Size | 1 byte | 2 bytes | 3 bytes | 4 bytes |
|---|---|---|---|---|
| Bit value | `00` | `01` | `10` | `11` |
| Integer value of bits | 0 | 1 | 2 | 3 |

Compression size represented as two bits.
A header byte holds four control headers.
Within the header byte, bit values are indexed from right-to-left.

| Header Byte Index | 3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Example bit values | `00` | `00` | `11` | `01` |

A header byte containing four header values. The right-most two bits indicate compression size for the first integer.
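The example header byte above can be decoded with a few shifts and masks. The helper below is a hypothetical sketch (`header_lengths` is not a name from the crate) that returns the encoded length of each of the four integers, first integer first.

```rust
/// Splits a header byte into the encoded byte lengths of its four integers.
/// Index 0 of the result corresponds to the right-most two bits.
fn header_lengths(header: u8) -> [usize; 4] {
    let mut lens = [0usize; 4];
    for (i, len) in lens.iter_mut().enumerate() {
        // A 2-bit code of k means the integer occupies k + 1 bytes.
        *len = ((header >> (2 * i)) & 0b11) as usize + 1;
    }
    lens
}
```

For the example byte `0b00_00_11_01`, this yields lengths of 2, 4, 1, and 1 bytes, so the group of four integers occupies 8 data bytes in total.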
- Lemire blog: Stream VByte: breaking new speed records for integer compression
- arXiv article: Stream VByte: Faster Byte-Oriented Integer Compression
- Lemire C code: streamvbyte
  - Good overview of format in README.
- Pierce Rust code: stream-vbyte-rust

```
.
├── Cargo.lock
├── Cargo.toml
├── LICENSE
├── README.md
└── svb
    ├── Cargo.toml
    └── src
        ├── lib.rs
        ├── scl.rs
        └── smd.rs
```