Skip to content

v0.6.0 — Optimization

Pre-release
Pre-release

Choose a tag to compare

@jamesgober jamesgober released this 08 Jun 09:09
· 5 commits to main since this release

lsm-db v0.6.0 — Optimization

A block cache, and an honest comparison. v0.6.0 adds a shared cache of
decoded run blocks so repeat point reads do no I/O, and measures lsm-db
against sled and redb on the same workload — with the numbers recorded
honestly, gaps and all. No new public behaviour; one additive config knob.

What is lsm-db?

A log-structured merge-tree storage engine for Rust — the write path that powers
RocksDB, LevelDB, Cassandra, and ScyllaDB, packaged as a small, audited library.
It is the storage layer the portfolio's database crates (txn-db, Hive DB) build
on, so the durability and read/write contract is implemented and tested once.

What's new in 0.6.0

Block cache

A point lookup that reaches a run decodes one data block — a positioned read, a
CRC32C check, and a parse. The new block cache (on by default, 8 MiB) keeps
recently-read decoded blocks so a repeat lookup over a hot working set returns
its block from cache with none of that work.

use lsm_db::{Lsm, LsmConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
// A 64 MiB cache; or `.block_cache_capacity(0)` to turn it off.
let db = Lsm::open_with(
    tempfile::tempdir()?.path(),
    LsmConfig::new().block_cache_capacity(64 << 20),
)?;
# let _ = db;
# Ok(())
# }

The cache is shared across an engine's runs and uses sharded CLOCK eviction —
the classic O(1) buffer-pool policy — so it adds no new runtime dependency.
Sequential scans and compaction read each block once and bypass the cache rather
than pollute it. A CI-enforced test asserts that a repeat lookup of the same key
reads zero data blocks, while a lookup with the cache disabled reads one.

Measured against sled and redb

A fair-shape comparison (identical keys, values, counts) over a 10,000-key set,
recorded in docs/PERFORMANCE.md:

Operation lsm-db sled 0.34 redb 2.6
Point read (hit) 125 ns 215 ns 156 ns
Bulk insert (10k) 11.0 ms 24.9 ms 22.9 ms
Full scan (10k) 1.80 ms 1.61 ms 0.39 ms

lsm-db leads point reads and bulk inserts — the LSM tree's home turf. redb's
in-place B-tree scan is faster than lsm-db's, which materialises a consistent
snapshot of the range. That is the one place lsm-db trails, and it is called
out plainly rather than buried.

Notes

  • Range scan still materialises a snapshot. Streaming it lazily would close
    the scan gap but requires a fallible Scan iterator (block I/O would move
    into iteration), which conflicts with the simplified-API mandate (today's
    Scan is infallible) and the upcoming 0.7 API freeze. It is recorded as a 2.0
    consideration; the snapshot semantics will not change within 1.x.
  • The cache only touches the point-read path, so there is no regression on the
    write or scan paths.

Testing

  • CI-enforced cache behaviour: a repeat lookup reads zero data blocks; a lookup
    with the cache disabled reads one.
  • Block-cache unit tests (hit/miss, distinct keys, eviction bound, re-insert,
    disabled).
  • All prior suites pass with the cache on by default.

Counts at this tag:

  • Default features: 59 unit + 4 integration + 2 compaction + 6 recovery +
    3 property + 23 doctests.
  • --all-features: 72 unit + 6 bloom + 8 durability + 4 integration +
    2 compaction + 6 recovery + 3 property + 23 doctests.
  • loom (under RUSTFLAGS="--cfg loom"): 2 model checks.

All green on stable and MSRV (1.87) across Windows and Linux (WSL2); cargo fmt,
cargo clippy -D warnings, cargo doc -D warnings, cargo deny check, and
cargo audit clean. Zero unsafe (#![forbid(unsafe_code)]).

Breaking changes

None. LsmConfig::block_cache_capacity / block_cache_capacity_bytes and
DEFAULT_BLOCK_CACHE_CAPACITY are additive.

What's next

  • 0.7.0 — Hardening + API freeze. Adversarial and corrupted-input tests
    (truncated runs, garbage blocks — no panic, no over-allocation), edge cases
    (disk-full during flush/compaction, very large values), cross-platform
    re-verification; the public API formally frozen.

Installation

[dependencies]
lsm-db = "0.6"
# Crash-safe writes and/or bloom-filtered point reads:
lsm-db = { version = "0.6", features = ["durability", "bloom"] }

MSRV: Rust 1.87 (2024 edition).

Documentation


Full diff: v0.5.0...v0.6.0.
Changelog: CHANGELOG.md.