v0.5.0
This release adds a two-axis durability model (who owns durability ×
where data lives) and the metadata-shaped fast paths a replicated
metadata service needs, plus crash-consistent on-disk recovery for the
state-machine mode. It contains breaking API and on-disk changes — see
Changed.
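The two axes can be illustrated with a minimal sketch. It uses hypothetical enums that borrow the `Durability` and `Storage` names from these notes but are not holt's definitions; the `Memory` / `File` variant names are invented for illustration. The point it shows is that durability ownership is independent of where data lives, and that holt's own WAL is gated on the durability axis as well as on storage. It also assumes that an in-memory store has no log file to attach.

```rust
// Hypothetical model of the two durability axes. These enums mirror names
// from the release notes but are NOT holt's actual definitions.

/// Axis 1: who owns durability.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Durability {
    /// holt's own write-ahead log is the durable record (single-node).
    Wal { sync: bool },
    /// An external log (e.g. Raft) owns durability and replay.
    StateMachine,
}

/// Axis 2: where data lives (variant names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Storage {
    Memory,
    File,
}

/// Whether the store attaches its own WAL. The decision looks at durability
/// as well as storage, so a file-backed StateMachine store gets no WAL.
/// Assumption for this sketch: in-memory storage never attaches a WAL.
fn attaches_wal(durability: Durability, storage: Storage) -> bool {
    matches!((durability, storage), (Durability::Wal { .. }, Storage::File))
}

/// Which log a crashed node replays from, determined by axis 1 alone.
fn replay_source(durability: Durability) -> &'static str {
    match durability {
        Durability::Wal { .. } => "holt write-ahead log",
        Durability::StateMachine => "external replicated log (e.g. Raft)",
    }
}

fn main() {
    for d in [Durability::Wal { sync: true }, Durability::StateMachine] {
        for s in [Storage::Memory, Storage::File] {
            println!(
                "{d:?} x {s:?}: attach WAL = {}, replay from {}",
                attaches_wal(d, s),
                replay_source(d)
            );
        }
    }
}
```

Keeping the axes as separate enums means every combination is representable, and policy questions such as "attach a WAL?" become a single match over the pair.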
Added
- Durability policy.
`Durability::Wal { sync }` / `Durability::StateMachine` replaces the ad-hoc
`wal_sync` flag and is orthogonal to `Storage`. `Wal` is single-node: holt's
own write-ahead log is the durable record. `StateMachine` is for a replicated
state machine: an external log (e.g. Raft) owns durability and replay, and
holt attaches no WAL.
- Durable state-machine recovery. Under
`Durability::StateMachine` with file storage, `DB::commit_durable(applied_index)`
/ `Tree::commit_durable` write a crash-consistent on-disk checkpoint without a
WAL: a copy-on-write snapshot plus an atomic manifest rename recording the
durable roots, `applied_index`, and the resume `next_seq`. Reopen rehydrates
from it and exposes `durable_applied_index()`; the external log replays only
the tail past that index. Verified by fault injection and a SIGKILL crash soak.
- `DB::export_checkpoint` / `DB::install_checkpoint`: a whole-DB
logical-KV snapshot image carrying `applied_index`, for shipping and
installing state-machine snapshots (Raft `InstallSnapshot`).
- `Tree::put_many_if_absent`: creates every absent key in one atomic batch
(a single WAL record) and reports, per key, whether it was `Created` or
`AlreadyExists`.
- `DB::scatter`: independent single-key conditional writes across families
with no cross-family atomic barrier; each runs on its own per-key concurrent
path, so unrelated keys never serialize. Available only under
`Durability::StateMachine` (the external log owns write ordering).
- `ScanStats`: per-scan `visited` / `returned` / `rollup` / `restarts`
accounting on `RangeIter` / `KeyRangeIter` (read via `.stats()`), also returned
by `KeyRangeBuilder::visit`. It separates work done from results yielded, so
callers can spot tombstone-bloated listings (many entries visited, few returned).
- Copy-on-write snapshots.
`Tree::snapshot` returns a stable point-in-time `Snapshot` handle in O(1):
only the root frame is copied; the rest is shared with the live tree and
forked copy-on-write only when a live write would overwrite a frame the
snapshot still references. Reads have 1× amplification, and writes carry no
extra overhead while no snapshot is live.
- `Tree::gc` / `DB::gc` reclaim snapshot frames left orphaned by a crash that
occurred while a snapshot was still live.
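The recovery contract described above for `commit_durable` (an atomic manifest rename recording `applied_index` and the resume `next_seq`, after which the external log replays only the tail) can be sketched with the standard library alone. This is a minimal, hypothetical model, not holt's implementation: `Manifest`, `write_manifest_atomically`, `read_manifest`, and `replay_tail` are invented names, the text encoding is invented, and the copy-on-write snapshot and durable roots that holt's v2 manifest also records are left out. It only shows the write-temp, fsync, rename pattern and the replay boundary.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical manifest contents, named after fields listed in the notes.
#[derive(Debug, PartialEq, Eq)]
struct Manifest {
    applied_index: u64, // last external-log index whose effects are checkpointed
    next_seq: u64,      // sequence number to resume from after reopen
}

/// Write a temp file, fsync it, then rename it over MANIFEST. On POSIX
/// filesystems rename replaces the target atomically, so a crash leaves
/// either the old or the new manifest, never a torn one.
fn write_manifest_atomically(dir: &Path, m: &Manifest) -> io::Result<()> {
    let tmp = dir.join("MANIFEST.tmp");
    let mut f = File::create(&tmp)?;
    writeln!(f, "{} {}", m.applied_index, m.next_seq)?;
    f.sync_all()?; // contents are durable before they become visible
    fs::rename(&tmp, dir.join("MANIFEST"))?;
    // A production version would also fsync `dir` so the rename itself
    // survives power loss.
    Ok(())
}

/// On reopen: Ok(None) if no checkpoint was ever committed.
fn read_manifest(dir: &Path) -> io::Result<Option<Manifest>> {
    let text = match fs::read_to_string(dir.join("MANIFEST")) {
        Ok(t) => t,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let mut fields = text.split_whitespace().map(str::parse::<u64>);
    match (fields.next(), fields.next()) {
        (Some(Ok(applied_index)), Some(Ok(next_seq))) => {
            Ok(Some(Manifest { applied_index, next_seq }))
        }
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "malformed manifest")),
    }
}

/// The external log replays only entries past the durable applied index.
fn replay_tail(log_indices: &[u64], applied_index: u64) -> Vec<u64> {
    log_indices.iter().copied().filter(|&i| i > applied_index).collect()
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("manifest-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    write_manifest_atomically(&dir, &Manifest { applied_index: 41, next_seq: 1000 })?;
    write_manifest_atomically(&dir, &Manifest { applied_index: 42, next_seq: 1007 })?;
    let m = read_manifest(&dir)?.expect("a checkpoint was committed");
    let log: Vec<u64> = (40..=45).collect();
    // prints: resume at seq 1007, replay [43, 44, 45]
    println!("resume at seq {}, replay {:?}", m.next_seq, replay_tail(&log, m.applied_index));
    fs::remove_dir_all(&dir)
}
```

Because the manifest is the single commit point, recovery never needs a WAL of its own: whatever manifest survives the crash names the exact log index to resume after.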
Changed
- Breaking.
`TreeConfig.wal_sync` and `TreeBuilder::wal_sync()` are removed;
use `TreeConfig.durability` / `TreeBuilder::durability(Durability)`.
- Breaking.
`KeyRangeBuilder::visit` returns `ScanStats` instead of the emitted count
(the old count is `stats.returned + stats.rollup`).
- Breaking (on-disk). The file-store manifest is v2 (durable trailer); v1
manifests are not migrated.
- Under
`Durability::StateMachine`, atomic batches take the mutation gate shared
rather than exclusive. The external log already serializes writes, so applies
no longer fence concurrent range scans. `view` / `snapshot` capture still
fences, so consistent point-in-time reads are unaffected.
- `DB::open` gates the WAL on durability (`attach_wal()`), not just on storage,
so a file-backed `StateMachine` database no longer attaches a holt WAL.
- `Tree::view` / `DB::view` are reimplemented on copy-on-write
snapshots: same API and point-in-time semantics, but capture is now
O(1) instead of eagerly copying every reachable blob frame, and holds
no second in-memory copy of the captured subtree.
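The copy-on-write capture behind `Tree::snapshot`, and now `Tree::view` / `DB::view`, can be illustrated with reference-counted frames. This is a minimal sketch under stated assumptions: `Frame`, `Tree`, `Snapshot`, and `insert_into_child` are hypothetical toy types, not holt's, and `std::sync::Arc::make_mut` stands in as the fork-on-write primitive. Capture copies only the root frame; a live write clones a frame only while a snapshot still references it.

```rust
use std::sync::Arc;

/// Hypothetical frame of a toy tree; holt's real frame layout is not modeled.
#[derive(Clone, Debug)]
struct Frame {
    keys: Vec<u32>,
    children: Vec<Arc<Frame>>,
}

/// Live tree: owns a reference to its root frame.
struct Tree {
    root: Arc<Frame>,
}

/// Point-in-time view: its own copy of the root, sharing every frame below.
struct Snapshot {
    root: Arc<Frame>,
}

impl Tree {
    /// Copies only the root frame (its keys and child pointers); each child
    /// is shared by bumping a reference count, so the cost is independent of
    /// how many frames lie below the root.
    fn snapshot(&self) -> Snapshot {
        Snapshot { root: Arc::new((*self.root).clone()) }
    }

    /// Appends `key` to child `i`. `Arc::make_mut` clones a frame only if it
    /// is still shared (e.g. by a live snapshot); otherwise it mutates in
    /// place, so writes cost nothing extra when no snapshot exists.
    fn insert_into_child(&mut self, i: usize, key: u32) {
        let root = Arc::make_mut(&mut self.root);
        let child = Arc::make_mut(&mut root.children[i]);
        child.keys.push(key);
    }
}

fn main() {
    let leaf = |k: u32| Arc::new(Frame { keys: vec![k], children: Vec::new() });
    let mut tree = Tree {
        root: Arc::new(Frame { keys: Vec::new(), children: vec![leaf(1), leaf(2)] }),
    };
    let snap = tree.snapshot();
    tree.insert_into_child(0, 10); // child 0 is shared with `snap`, so it is forked
    // prints: live [1, 10], snapshot [1]
    println!("live {:?}, snapshot {:?}", tree.root.children[0].keys, snap.root.children[0].keys);
    // prints: child 1 still shared: true
    println!(
        "child 1 still shared: {}",
        Arc::ptr_eq(&tree.root.children[1], &snap.root.children[1])
    );
}
```

Snapshot reads walk the same shared frames as the live tree, which is one way to get the 1× read amplification the notes describe; the untouched child stays shared after the write, and only the written path is duplicated.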