v0.6.0 — Optimization
Pre-releaselsm-db v0.6.0 — Optimization
A block cache, and an honest comparison. v0.6.0 adds a shared cache of
decoded run blocks so repeat point reads do no I/O, and measures lsm-db
against sled and redb on the same workload — with the numbers recorded
honestly, gaps and all. No new public behaviour; one additive config knob.
What is lsm-db?
A log-structured merge-tree storage engine for Rust — the write path that powers
RocksDB, LevelDB, Cassandra, and ScyllaDB, packaged as a small, audited library.
It is the storage layer the portfolio's database crates (txn-db, Hive DB) build
on, so the durability and read/write contract is implemented and tested once.
What's new in 0.6.0
Block cache
A point lookup that reaches a run decodes one data block — a positioned read, a
CRC32C check, and a parse. The new block cache (on by default, 8 MiB) keeps
recently-read decoded blocks so a repeat lookup over a hot working set returns
its block from cache with none of that work.
use lsm_db::{Lsm, LsmConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
// A 64 MiB cache; or `.block_cache_capacity(0)` to turn it off.
let db = Lsm::open_with(
tempfile::tempdir()?.path(),
LsmConfig::new().block_cache_capacity(64 << 20),
)?;
# let _ = db;
# Ok(())
# }The cache is shared across an engine's runs and uses sharded CLOCK eviction —
the classic O(1) buffer-pool policy — so it adds no new runtime dependency.
Sequential scans and compaction read each block once and bypass the cache rather
than pollute it. A CI-enforced test asserts that a repeat lookup of the same key
reads zero data blocks, while a lookup with the cache disabled reads one.
Measured against sled and redb
A fair-shape comparison (identical keys, values, counts) over a 10,000-key set,
recorded in docs/PERFORMANCE.md:
| Operation | lsm-db |
sled 0.34 |
redb 2.6 |
|---|---|---|---|
| Point read (hit) | 125 ns | 215 ns | 156 ns |
| Bulk insert (10k) | 11.0 ms | 24.9 ms | 22.9 ms |
| Full scan (10k) | 1.80 ms | 1.61 ms | 0.39 ms |
lsm-db leads point reads and bulk inserts — the LSM tree's home turf. redb's
in-place B-tree scan is faster than lsm-db's, which materialises a consistent
snapshot of the range. That is the one place lsm-db trails, and it is called
out plainly rather than buried.
Notes
- Range scan still materialises a snapshot. Streaming it lazily would close
the scan gap but requires a fallibleScaniterator (block I/O would move
into iteration), which conflicts with the simplified-API mandate (today's
Scanis infallible) and the upcoming 0.7 API freeze. It is recorded as a 2.0
consideration; the snapshot semantics will not change within 1.x. - The cache only touches the point-read path, so there is no regression on the
write or scan paths.
Testing
- CI-enforced cache behaviour: a repeat lookup reads zero data blocks; a lookup
with the cache disabled reads one. - Block-cache unit tests (hit/miss, distinct keys, eviction bound, re-insert,
disabled). - All prior suites pass with the cache on by default.
Counts at this tag:
- Default features: 59 unit + 4 integration + 2 compaction + 6 recovery +
3 property + 23 doctests. --all-features: 72 unit + 6 bloom + 8 durability + 4 integration +
2 compaction + 6 recovery + 3 property + 23 doctests.loom(underRUSTFLAGS="--cfg loom"): 2 model checks.
All green on stable and MSRV (1.87) across Windows and Linux (WSL2); cargo fmt,
cargo clippy -D warnings, cargo doc -D warnings, cargo deny check, and
cargo audit clean. Zero unsafe (#![forbid(unsafe_code)]).
Breaking changes
None. LsmConfig::block_cache_capacity / block_cache_capacity_bytes and
DEFAULT_BLOCK_CACHE_CAPACITY are additive.
What's next
- 0.7.0 — Hardening + API freeze. Adversarial and corrupted-input tests
(truncated runs, garbage blocks — no panic, no over-allocation), edge cases
(disk-full during flush/compaction, very large values), cross-platform
re-verification; the public API formally frozen.
Installation
[dependencies]
lsm-db = "0.6"
# Crash-safe writes and/or bloom-filtered point reads:
lsm-db = { version = "0.6", features = ["durability", "bloom"] }MSRV: Rust 1.87 (2024 edition).
Documentation
Full diff: v0.5.0...v0.6.0.
Changelog: CHANGELOG.md.