Release v0.2.0 · feichai0017/holt

Breaking

Public API surface closure. holt::layout, holt::journal,
holt::store are now pub(crate). The supported holt::*
surface is Tree, TreeBuilder, TreeConfig, Storage,
Error, Result, RangeBuilder, RangeEntry, RangeIter,
BlobStats, TreeStats, CheckpointerStats, TxnBatch,
CheckpointConfig, Backend, MemoryBackend,
PersistentBackend, AlignedBlobBuf, BlobGuid. The
metrics::render_prometheus renderer is part of the
metrics-feature surface.
pub use holt::BufferManager removed; BufferManager is
internal.
BlobGuid now re-exported at the crate root for custom
Backend implementations.
RangeBuilder::new is pub(crate) — use Tree::range() /
Tree::scan_prefix().
TreeConfig::checkpoint_byte_interval field +
TreeBuilder::checkpoint_byte_interval method removed. The
field was reserved and never read.
AllocOutcome shrunk to { slot }; ExtentAllocOutcome
shrunk to { byte_offset }. The other fields were dead.
encode_record returns () instead of Result<()> — no
fallible step.
BufferManager::capacity() / clear() removed. Dead code.
TreeConfig::flush_on_write renamed to
memory_flush_on_write — the field had no effect on
persistent trees; the v0.1 name suggested per-write fsync, which
it never was.
Error::NodeCorrupt is a struct variant with optional
blob_guid + slot fields. Construct via
Error::node_corrupt(ctx) and enrich via .with_blob_guid(g)
/ .with_slot(s). Pattern-matchers must spread the new fields
(NodeCorrupt { context, .. }).

Fixed — durability (W2D-strict)

Checkpoint error paths no longer drop drained state. Manual
Tree::checkpoint and the background round now restore every
snapshot they drained on every error return — WAL flush
failure, I/O worker channel-closed, and pre-delete Sync
failure paths previously left dirty / pending partially
drained, allowing the next round to truncate the WAL with cache
state still pending. See ARCHITECTURE.md §6 for the seven-phase
protocol.
Abort-on-dirty-failure gate before pending-delete. A failed
parent write_through no longer propagates to the dependent
child's manifest delete (which would have left the on-disk
parent referencing a slot the manifest no longer had). The pre-
delete sync still runs to fsync the writes that did succeed;
the pending set is restored and the next round retries the
parent + child together.
Writer ↔ background-checkpoint W2D race. Pending-delete
snapshot now drains inside the same wal.lock critical section
as snapshot_dirty + wal.flush, closing the inversion window
where a writer could land a fresh blob between the two drains.
scan.rs::refresh_blob_node_pointers inline bm.commit
replaced with bm.mark_dirty(parent_guid, STRUCTURAL_SEQ) so
the post-compact pointer repair stages through the unified
dirty-set protocol instead of pushing cache state straight to
backend.
Tree::compact documented NOT online-safe — running
concurrently with reads or writes can torn-read across
BlobNode crossings. The v0.3 maintenance latch will lift this.

Added

io-uring feature flag (Linux only). PersistentBackend
reads/writes route through a per-backend io_uring (depth 8)
instead of pread/pwrite.
tracing feature flag (off by default). Structured
tracing events on checkpoint round complete, spillover,
merge, compact, WAL truncate, and eviction sweeps. Zero-
cost when the feature is off.
metrics feature flag (off by default). Renders
TreeStats into Prometheus text format. Gauges
(holt_slots, holt_tombstones, holt_compactions) follow
the convention of dropping the _total suffix.
3-thread background checkpointer — planner + dedicated I/O
worker + cold-blob eviction sweep, parked between rounds via
park_timeout(idle_interval). Default disabled; opt in via
TreeBuilder::checkpoint(CheckpointConfig::default() .enabled(true)). Drop runs one final synchronous round on
the calling thread.
Tree::scan_prefix(p) — one-line wrapper for
tree.range().prefix(p).
Tree::stats extended with bm_dirty_count,
bm_pending_delete_count, bm_cache_hits / bm_cache_misses,
bm_optimistic_restarts, and an Option<CheckpointerStats>.
Silent observability reads — pin_silent /
get_cached_silent / collect_blob_guids_silent don't bump
cache counters or refresh the LRU tick, so Tree::stats and
metrics scrapes don't pollute the counters they report.
Error::Internal(&'static str) variant for invariant-
violation paths (previously Error::NotYetImplemented, now
reserved for genuine walker-arm feature gaps). Non-breaking
thanks to Error's #[non_exhaustive] marker.

Changed

Sharded BufferManager cache — v0.1's
Mutex<HashMap<BlobGuid, _>> + VecDeque<BlobGuid> LRU
replaced by DashMap<BlobGuid, Arc<CachedBlob>> with
clock_tick / last_touched eviction; concurrent pins on
different blobs hit different shards instead of contending on
a single mutex.
Cached Tree.root_pin — every get / put / delete
keeps the root pinned via Arc<CachedBlob> and skips the BM
hash lookup on the root hop (~300 ns/op on the hot path).
RangeIter delimiter fast-forward — after emitting a
CommonPrefix(C), ascend the descent stack past C's subtree
instead of scanning every leaf. *_list_dir is now
O(distinct_rollups).
Hardware-accelerated CRC32 via crc32fast — auto-detects
PCLMULQDQ on x86_64 and ARM-CRC32 on AArch64. Drops per-record
WAL cost from ~110 ns to ~20 ns on supported hardware.
SIMD Node48 / Node256 range-iter scans — vpcmpeqb / NEON
byte search for Node48::index[256], slot-index scan for
Node256::children[256]. Worth ~80-120 ns per next() on
wide branch nodes; matters most for *_list_dir.

Benchmarks

Group B — scale curve across kv / objstore / fs × four
dataset sizes ({ 20 k, 100 k, 500 k, 2 M }). The 500 k tier
already exceeds the default 32 MB buffer pool; the 2 M tier
(~192 MB payload) forces full eviction churn. Get scales
beautifully on all three workloads (holt wins every cell with
the lead vs RocksDB widening to 5.4× / 2.8× / 2.2× at 2 M).
Put wins at 20 k / 100 k / 500 k, ties RocksDB at 2 M kv,
but loses 8-22 % to RocksDB / SQLite at 2 M on objstore / fs
— the regime where LSM-style write amortization is the right
choice and ART-over-blobs isn't competitive; cross-blob lock-
coupling is queued for v0.3 to close the gap.
Group C — p95/p99 under maintenance interference
(tests/bench_contention_p95.rs, #[ignore]). 4 writer
threads + 5 ms-cadence background checkpointer + concurrent
Tree::compact(); tracks every put latency via
hdrhistogram. M3 Pro: 307k ops/s sustained, p50 = 2 µs,
p99 = 108 µs.
PGO build profile docs in PGO.md.

Full numbers in benches/RESULTS.md.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.2.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Breaking

Fixed — durability (W2D-strict)

Added

Changed

Benchmarks

Uh oh!