Skip to content

v0.2.0 — Foundation

Pre-release
Pre-release

Choose a tag to compare

@jamesgober jamesgober released this 06 Jun 22:10
· 9 commits to main since this release

lsm-db v0.2.0 — Foundation

The first working engine. v0.2.0 turns the scaffold into a real
log-structured merge store: writes buffer in a sorted in-memory memtable and
flush to an immutable, fsynced sorted run on disk, reads check the buffer and
fall through to the run, and deletes are tombstones that resolve away on flush.
The Tier-1 API — open, put, get, delete, scan — is locked in and
tested against a reference model. Flushed data survives reopening.

This is a pre-1.0 foundation release. The on-disk format is not frozen yet
(that happens in 0.3 alongside multi-level compaction), and un-flushed writes are
not yet crash-safe (write-ahead logging lands in 0.4).

What is lsm-db?

A log-structured merge-tree storage engine for Rust — the write path that powers
RocksDB, LevelDB, Cassandra, and ScyllaDB, packaged as a small, audited library.
It is the storage layer the portfolio's database crates (txn-db, Hive DB) build
on, so the durability and read/write contract is implemented and tested once.

What's new in 0.2.0

The Lsm engine — the Tier-1 API

The whole common case is five calls over one type. Keys and values are arbitrary
bytes; keys are ordered lexicographically.

use lsm_db::Lsm;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Lsm::open("my-db")?;

    db.put(b"user:1", b"alice")?;
    assert_eq!(db.get(b"user:1")?, Some(b"alice".to_vec()));

    db.delete(b"user:1")?;
    assert_eq!(db.get(b"user:1")?, None);
    Ok(())
}

Lsm is Send + Sync and every method takes &self, so one engine can be
shared across threads behind an Arc — readers run in parallel, writes
serialize, and a scan is a consistent snapshot that never blocks writers.

Range scans

scan merges the buffer and the on-disk run into one ascending stream over any
Vec<u8> range. The returned Scan is an ExactSizeIterator and
DoubleEndedIterator.

# fn main() -> Result<(), Box<dyn std::error::Error>> {
# let db = lsm_db::Lsm::open(tempfile::tempdir()?.path())?;
db.put(b"a", b"1")?;
db.put(b"b", b"2")?;
db.put(b"c", b"3")?;

// Half-open range [a, c).
let pairs: Vec<_> = db.scan(b"a".to_vec()..b"c".to_vec())?.collect();
assert_eq!(pairs, vec![(b"a".to_vec(), b"1".to_vec()), (b"b".to_vec(), b"2".to_vec())]);

// Prefix scan, full scan, reverse — all work.
assert_eq!(db.scan(..)?.count(), 3);
# Ok(())
# }

Grouped, atomic writes with Batch

A Batch collects puts and deletes and applies them under a single lock
acquisition, so concurrent readers see either none or all of the group.

# fn main() -> Result<(), Box<dyn std::error::Error>> {
use lsm_db::Batch;
# let db = lsm_db::Lsm::open(tempfile::tempdir()?.path())?;
let mut batch = Batch::new();
batch.put(b"a", b"1");
batch.put(b"b", b"2");
batch.delete(b"c");
db.write(batch)?;
# Ok(())
# }

Tunable write buffer with LsmConfig

Tier-2 tuning controls how large the memtable grows before it flushes. The
default is 4 MiB (DEFAULT_MEMTABLE_CAPACITY).

use lsm_db::{Lsm, LsmConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
let db = Lsm::open_with(
    tempfile::tempdir()?.path(),
    LsmConfig::new().memtable_capacity(64 * 1024),
)?;
# let _ = db;
# Ok(())
# }

Durable, atomic flush

A flush writes the new run to a temporary file, fsyncs it, drops the old run's
handle, and atomically renames the new file into place. A crash leaves either the
old run or the new one, never a torn file, and a leftover temporary file from an
interrupted flush is discarded on the next open. On-disk reads use positioned
reads (pread on Unix, seek_read on Windows) so concurrent readers share one
file handle without seeking over each other.

Error type integrated with error-forge

Every fallible call returns Result<T, Error>. Error implements
error_forge::ForgeError (kind / caption / is_fatal) and exposes the
underlying io::Error as its source. Corruption is the only fatal variant.

Testing

  • Property tests (proptest) check get and scan against a BTreeMap
    reference model across three memtable sizes (in-memory, frequent flush, flush
    every write), plus a flush-and-reopen check and randomized sub-range scans.
  • Integration tests cover multi-flush workloads with overwrites and deletes,
    reopen, atomic batches, and concurrent readers running alongside a writer.
  • criterion benchmarks cover point write, point read (hit and miss), and scan.

Counts at this tag:

  • Default features: 43 unit + 4 integration + 3 property + 21 doctests.
  • --all-features: identical (the optional integrations add no tested code yet).

All green on stable and MSRV (1.85) across Linux, macOS, and Windows; cargo fmt,
cargo clippy -D warnings, cargo doc -D warnings, cargo deny check, and
cargo audit clean. Zero unsafe (#![forbid(unsafe_code)]).

What's next

  • 0.3.0 — Levels + compaction + format freeze. Multiple sorted-run levels, a
    merge-iterator across them, background compaction concurrent with reads and
    writes, and the byte-level on-disk format specified and frozen for 1.x.

Installation

[dependencies]
lsm-db = "0.2"

MSRV: Rust 1.85 (2024 edition).

Documentation


Changelog: CHANGELOG.md.