v0.3.0 — Levels, Compaction, and a Frozen Format
Pre-releaselsm-db v0.3.0 — Levels, Compaction, and a Frozen Format
The real engine. v0.3.0 turns the single-run foundation into a proper
log-structured merge store: flushes append immutable sorted runs, reads merge
across the memtable and every run, and a background thread compacts runs into one
when they accumulate — concurrent with reads and writes. A manifest makes the run
set crash-recoverable, and the on-disk format is now frozen for the 1.x
series.
The Tier-1 API is unchanged; the only public addition is configuration.
What is lsm-db?
A log-structured merge-tree storage engine for Rust — the write path that powers
RocksDB, LevelDB, Cassandra, and ScyllaDB, packaged as a small, audited library.
It is the storage layer the portfolio's database crates (txn-db, Hive DB) build
on, so the durability and read/write contract is implemented and tested once.
What's new in 0.3.0
Multiple runs and a newest-wins merge
Each flush now writes a new immutable run instead of rewriting one. Point reads
consult the memtable, then each run from newest to oldest; range
scans merge the memtable and every run into one
ascending stream, with the newest source winning per key and tombstones resolved
away. The merge surfaces a corruption error rather than silently truncating if a
run's block fails its checksum.
# fn main() -> Result<(), Box<dyn std::error::Error>> {
# let db = lsm_db::Lsm::open(tempfile::tempdir()?.path())?;
db.put(b"k", b"first")?;
db.flush()?; // run 1
db.put(b"k", b"second")?;
db.flush()?; // run 2 — newer
assert_eq!(db.get(b"k")?, Some(b"second".to_vec())); // newest wins
# Ok(())
# }Background compaction
A dedicated thread merges the runs into one when their count reaches the
configured trigger (default four). The expensive merge runs with no lock
held; only the final swap takes the engine lock, so compaction never blocks
reads or writes for the duration of the merge. A run superseded by compaction is
reference counted — its file is deleted only once the last reader still holding it
has finished. Dropping the Lsm stops and joins the compactor.
use lsm_db::{Lsm, LsmConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
// Compact whenever six runs accumulate.
let db = Lsm::open_with(
tempfile::tempdir()?.path(),
LsmConfig::new().compaction_trigger(6),
)?;
# let _ = db;
# Ok(())
# }A frozen, block-structured on-disk format
Runs are now block-structured: data blocks (~4 KiB) with a block index, a
per-block CRC32C verified on read, and a per-index CRC32C in the footer.
Tombstones are stored on disk so a deletion can mask older runs until compaction
resolves it. Opening a run reads only the footer and index; values are read one
block at a time with a single positioned read.
The byte layout is frozen for the 1.x series and specified in
docs/SSTABLE_FORMAT.md.
Crash recovery via a manifest
A MANIFEST file records the live runs in recency order and the next sequence
number, rewritten atomically on every flush and compaction. It is the single
source of truth: on open, the engine opens exactly the runs it names and reclaims
any temporary file or run file it does not — the orphans a crash mid-flush or
mid-compaction leaves behind. Because both runs and the manifest are installed by
atomic rename, recovery always lands on a consistent state.
Testing
- Property test: a randomized put/delete workload against a small,
frequently-compacting engine matches aBTreeMapmodel exactly — no live key
lost or duplicated through compaction. - Concurrency stress: four writer threads and a scanning reader against an
engine with a low compaction trigger; the final state is exact and scans are
always strictly increasing (no torn or duplicated keys). - Crash recovery: ungraceful exit, stale temporary file, orphan run, missing
run, and corrupted block are all handled correctly. loommodel: exhaustive interleavings of a reader versus a compaction swap
confirm a reader never observes a torn merge, for both a live value and a
deletion.
Counts at this tag:
- Default features: 54 unit + 4 integration + 2 compaction + 6 recovery +
3 property + 23 doctests. loom(underRUSTFLAGS="--cfg loom"): 2 model checks.
All green on stable and MSRV (1.85) across Linux, macOS, and Windows; cargo fmt,
cargo clippy -D warnings, cargo doc -D warnings, cargo deny check, and
cargo audit clean. Zero unsafe (#![forbid(unsafe_code)]).
What's next
- 0.4.0 — Durability + crash recovery. A
wal-db-backed write path under the
durabilityfeature: log before the memtable insert, and replay the log on
open so no acknowledged write is lost across a crash.
Installation
[dependencies]
lsm-db = "0.3"MSRV: Rust 1.85 (2024 edition).
Documentation
Full diff: v0.2.0...v0.3.0.
Changelog: CHANGELOG.md.