Releases: feichai0017/holt
v0.7.1
Fixed
- Durability: an acknowledged write could be lost after a crash. The
checkpoint's WAL-truncate gate (maybe_truncate) only checked the
BufferManager dirty/flushing/pending counters, not the store's own deferred
durability (needs_flush). The I/O worker retires a written-through blob
right after the pwrite but before the data fsync + manifest-delta persist,
so the WAL could be truncated while a just-written blob's new slot mapping
was still only in the in-memory manifest, leaving a crashed reopen with the
acknowledged record in neither the WAL nor manifest.log. The gate now also
waits on needs_flush(), mirroring the existing run_round early-skip
guard. Surfaced by the nightly crash-soak; 0.7.0's lazy routing compaction
amplified the exposure by re-writing the root blob every round.
- Durability: a torn WAL tail is now truncated on reopen. Previously the
writer reopened with O_APPEND over the torn bytes, turning a partial tail
record into a mid-log torn record that a later replay would stop at,
silently stranding every acknowledged record written after it. replay_wal
now truncates the WAL to the last complete record on open, which is standard
WAL recovery; the torn record was never acknowledged (the crash preceded its
fdatasync), so nothing durable is lost.
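The torn-tail fix can be illustrated with a minimal sketch. The record framing here (a 4-byte little-endian length, the payload, then a 4-byte additive checksum) and the names `encode_record` and `last_complete_offset` are assumptions made for illustration; they are not holt's actual WAL format or API. The function returns the offset just past the last complete record, which is the length the log would be truncated to before appending resumes.

```rust
/// Hypothetical checksum: wrapping sum of the payload bytes.
fn checksum(payload: &[u8]) -> u32 {
    payload.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Encode one record in the assumed framing: [len u32][payload][checksum u32].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 8);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out
}

/// Byte offset just past the last complete, checksum-valid record.
/// Anything after this offset is a torn tail and can be cut off.
pub fn last_complete_offset(log: &[u8]) -> usize {
    let mut off = 0usize;
    loop {
        // A record needs at least a full length header.
        let header = match log.get(off..off + 4) {
            Some(h) => h,
            None => return off,
        };
        let len = read_u32(header) as usize;
        let body_start = off + 4;
        let end = match body_start.checked_add(len).and_then(|e| e.checked_add(4)) {
            Some(e) => e,
            None => return off,
        };
        // The payload and its checksum must both be fully present.
        let record = match log.get(body_start..end) {
            Some(r) => r,
            None => return off,
        };
        let (payload, sum) = record.split_at(len);
        if checksum(payload) != read_u32(sum) {
            return off;
        }
        off = end;
    }
}
```

On a real file the returned offset would be passed to something like `std::fs::File::set_len` before the writer reopens for append, so that new records follow the last complete one rather than the torn bytes.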
v0.5.5
Fixed
- FileBlobStore::open now takes an exclusive flock(2) on
<data_dir>/store.lock and holds it for the lifetime of the
instance. Two live instances on one data directory previously
replayed manifest.log into the same next_slot, assigned the
same slot to different blob GUIDs, and appended conflicting set
deltas, permanently poisoning the manifest (every later open
failed with "FileBlobStore::Manifest::duplicate slot") while the
colliding frames overwrote each other in blobs.dat. Since 0.5.0
even read-only snapshots persist frozen root frames, so the
overlap window of a plain handover (store = reopen(path)) was
enough to trip this. Open now waits up to 5 s for the previous
instance to finish dropping, so handover reopen serializes; a
genuinely concurrent second opener fails with a clear
WouldBlock error instead of corrupting the store. Same-process
double-opens are caught too (flock is per open-file-description),
and the kernel releases the lock if the holder crashes.
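The bounded wait described above can be sketched as a retry loop around a non-blocking lock attempt. This is only an illustration under assumptions: `acquire_with_deadline` is a hypothetical helper, and the closure stands in for a non-blocking exclusive flock call, which is not shown because it needs a platform binding. The 5 s budget from the note would be passed as `timeout`.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Retry a non-blocking lock attempt until it succeeds or `timeout` elapses.
///
/// `try_lock` is a stand-in for a non-blocking exclusive lock call:
/// `Ok(true)` means acquired, `Ok(false)` means another holder has it.
pub fn acquire_with_deadline<F>(
    mut try_lock: F,
    timeout: Duration,
    poll: Duration,
) -> io::Result<()>
where
    F: FnMut() -> io::Result<bool>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if try_lock()? {
            return Ok(());
        }
        if Instant::now() >= deadline {
            // A genuinely concurrent opener gets a clear error
            // instead of proceeding without the lock.
            return Err(io::Error::new(
                io::ErrorKind::WouldBlock,
                "data directory is locked by another instance",
            ));
        }
        thread::sleep(poll);
    }
}
```

A handover reopen succeeds once the previous holder drops its lock within the budget; an opener that never gets the lock surfaces `WouldBlock` to the caller.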
v0.7.0
Added
- BlobStore::read_blobs: a batched full-frame read on the public trait.
The default loops over read_blob; stores override it for device
parallelism (Linux io_uring submits one ring batch, the pread store
fans the reads across worker threads). Used by the cold-scan read-ahead.
- Page-granular cold reads. A point lookup on routed (write-cold) data fetches
only the header page, the blob's routing region, and the one leaf page its
descent reaches (~18 KB mean) instead of pinning the whole 512 KB frame
(~27× less cold I/O). The routing region is built at compaction.
- Per-blob bloom filter at the tail of the routing region (read for free with
it): cold negative lookups answer NotFound without a leaf-page read.
No false negatives. Additive on disk (bloom_len == 0 means no bloom).
- Bounded resident routing cache: routing regions for hot blobs are held in a
byte-bounded cache so repeat cold reads skip the routing-region read.
- Cold-scan I/O read-ahead: range scans prefetch upcoming child blobs through
pin_scan_many → batched read_blobs, reading them at the device's natural
queue depth instead of one serial round-trip each.
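The "default loops over read_blob, stores override it" pattern from the first item can be sketched with a trait default method. The trait below is a simplified, hypothetical stand-in: the `u64` ids, the `Vec<u8>` frames, and `MemStore` are assumptions for illustration and do not reflect holt's real signatures.

```rust
use std::collections::HashMap;
use std::io;

/// Simplified stand-in for a blob-store trait (not holt's actual trait).
pub trait BlobStore {
    /// Read one full blob frame.
    fn read_blob(&self, id: u64) -> io::Result<Vec<u8>>;

    /// Batched read. The default issues one `read_blob` per id, in order;
    /// a store with real device parallelism would override this method.
    fn read_blobs(&self, ids: &[u64]) -> io::Result<Vec<Vec<u8>>> {
        ids.iter().map(|&id| self.read_blob(id)).collect()
    }
}

/// In-memory store that relies on the default `read_blobs`.
pub struct MemStore {
    pub blobs: HashMap<u64, Vec<u8>>,
}

impl BlobStore for MemStore {
    fn read_blob(&self, id: u64) -> io::Result<Vec<u8>> {
        self.blobs
            .get(&id)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such blob"))
    }
}
```

Because callers only see `read_blobs`, a store can switch from the serial default to a parallel override without any change at the call site.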
Changed
- Breaking (on-disk format). Manifest format v4 → v6 (the blob header now
records the per-blob routing-region geometry). Older manifests are not
migrated: the loader rejects any non-v6 manifest, so a store written by
0.6.x cannot be opened by this release (and a v6 store cannot be opened by
0.6.x). Pre-1.0 with no production deployments; recreate the store on upgrade.
- Compaction builds (and the read path validates) the routing region + bloom;
structural write-path mutations de-route a blob, and write-cold blobs are
re-routed lazily by maintenance.
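A strict version gate of the kind described in the breaking-change item might look like the sketch below. The constant, the error type, and the function name are hypothetical; only the behaviour (accept exactly v6, reject everything else with no migration) comes from the note above.

```rust
/// The only manifest version this sketch accepts.
pub const MANIFEST_VERSION: u32 = 6;

#[derive(Debug, PartialEq, Eq)]
pub enum ManifestError {
    /// The manifest was written by an incompatible release.
    UnsupportedVersion { found: u32, expected: u32 },
}

/// Reject any manifest whose version is not exactly `MANIFEST_VERSION`.
/// There is no migration path: older and newer versions both fail.
pub fn check_manifest_version(found: u32) -> Result<(), ManifestError> {
    if found == MANIFEST_VERSION {
        Ok(())
    } else {
        Err(ManifestError::UnsupportedVersion {
            found,
            expected: MANIFEST_VERSION,
        })
    }
}
```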
Removed
- Removed the cold.idx cold-read sidecar; the in-blob routing region is now
the sole cold-read path.
- Removed the docs/design/ working notes. Rationale for shipped features lives
in commit messages; rationale for rejected paths (io_uring WAL rewrite, the
two blob-fill fixes) lives in git history.
v0.6.0
Added
- Added a shared in-memory WAL byte ring as the only append path. Foreground
writers reserve byte ranges concurrently, copy encoded records directly into
the ring, and a single flusher drains committed byte prefixes to the WAL file.
- Added loom coverage and crash-soak validation for the WAL ring's
reserve/publish/flush ordering, including multi-publisher gap-safety checks.
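The reserve/publish/flush ordering can be modelled in a few lines. This is a single-threaded model under assumptions, not holt's ring: `RingModel` and its methods are hypothetical names, and the real implementation is concurrent. The point it illustrates is gap safety: the flusher may only drain the contiguous published prefix, so a range published out of order stays unflushed until every earlier reservation has been published too.

```rust
use std::collections::BTreeMap;

/// Single-threaded model of reserve/publish/flush ordering.
pub struct RingModel {
    /// Next byte offset to hand out to a writer.
    reserved: u64,
    /// Every byte below this offset has been drained.
    flushed: u64,
    /// Published but not yet flushed ranges, keyed by start offset.
    published: BTreeMap<u64, u64>,
}

impl RingModel {
    pub fn new() -> Self {
        RingModel { reserved: 0, flushed: 0, published: BTreeMap::new() }
    }

    /// Reserve `len` bytes and return the half-open range (start, end).
    pub fn reserve(&mut self, len: u64) -> (u64, u64) {
        let start = self.reserved;
        self.reserved += len;
        (start, self.reserved)
    }

    /// Mark a previously reserved range as fully written.
    pub fn publish(&mut self, range: (u64, u64)) {
        self.published.insert(range.0, range.1);
    }

    /// Drain the contiguous published prefix; returns the bytes drained.
    /// A gap (reserved but unpublished range) stops the drain.
    pub fn flush(&mut self) -> u64 {
        let before = self.flushed;
        while let Some(end) = self.published.remove(&self.flushed) {
            self.flushed = end;
        }
        self.flushed - before
    }
}
```

For example, if three writers reserve ranges and the second publishes first, `flush` drains nothing until the first writer publishes, at which point both ranges drain together.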
Changed
- Flattened leaf storage and child addressing for the persistent ART layout:
small records can stay inline, child body offsets are stored directly, and
inner-node child scans use compact u16 addressing with SIMD fast paths.
- Reworked the WAL group-commit plumbing around ring backpressure instead of a
per-record channel/worker handoff, reducing the concurrent durable write
bottleneck while preserving the existing WAL record format and replay reader.
- Tightened journal validation so empty or over-capacity records are rejected
before reservation instead of relying on debug-only assertions.
Removed
- Removed the legacy WAL channel backend and its transitional design documents.
The ring-backed journal is now the only implementation.
- Removed rejected experiment notes that were no longer part of the supported
architecture.
Validation
- cargo test --workspace --all-features --locked
- cargo clippy --workspace --all-features --all-targets --locked -- -D warnings
- RUSTFLAGS="--cfg loom" cargo test -p holt --lib journal::ring::loom --locked
v0.5.4
Removed
- Removed the external-log state-machine surface from holt core:
Durability::StateMachine, DB::commit_durable,
Tree::commit_durable, durable_applied_index, DB::scatter,
DB::scatter_independent, and the file-store DurableManifest
trailer.
- Checkpoint images are now pure DB archive/transfer images. They
contain family key/value data and no longer carry an external
applied_index.
- Atomic DB/Tree batches always use the exclusive mutation gate again;
holt no longer has a StateMachine-only relaxed batch mode.
v0.5.3
Fixed
- Preserved checkpoint-owned cache images while copy-on-write snapshot reclaim,
DB-wide GC, direct blob deletes, or write-through paths run concurrently.
This fixes NoKV-style metadata pressure that could otherwise report
"snapshot_dirty_versions: dirty entry lost cache image" or
"write_through_batch: flushing entry lost cache image".
- Kept direct write-through from retiring another in-flight checkpoint epoch;
it now clears only unclaimed dirty state and leaves flushing ownership intact.
Validation
- cargo fmt --all -- --check
- cargo clippy --workspace --all-targets -- -D warnings
- cargo test store::buffer_manager::tests -- --nocapture
- NoKV sibling FUSE/RustFS/JuiceFS smoke with local Holt patch completed without
checkpoint invariant failures.
v0.5.2
Added
- Added CheckpointImage::validate() to validate a full exported DB
checkpoint image before install or archive handoff, not just its header.
- Added KeyScanOutcome and KeyRangeBuilder::visit_with_outcome so callers
can distinguish prefix-list cache hits from real ART walks without changing
the stable ScanStats field set.
- Added PrefixCount, Tree::prefix_count, and View::prefix_count for
bounded DFS-style prefix cardinality checks. Non-zero limits scan at most one
entry past the limit and report whether the count is exact.
- Added DB::scatter_independent for StateMachine-mode independent single-key
fan-out across named families. It rejects duplicate (tree, key) pairs and
applies unrelated writes concurrently through Holt's native per-key paths.
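The bounded prefix count can be sketched over a sorted key set. This is an illustration under assumptions rather than holt's implementation: `BoundedCount`, its fields, and the free function below are hypothetical, and treating a limit of 0 as "unbounded" is inferred from the phrase "non-zero limits". With a non-zero limit the scan looks at no more than `limit + 1` matching keys, which is just enough to tell whether the count is exact.

```rust
use std::collections::BTreeSet;

/// Hypothetical result shape: a count plus whether it is exact.
#[derive(Debug, PartialEq, Eq)]
pub struct BoundedCount {
    pub count: usize,
    pub exact: bool,
}

/// Count keys starting with `prefix`, visiting at most `limit + 1` of them.
/// A `limit` of 0 is assumed to mean "no limit".
pub fn prefix_count(keys: &BTreeSet<String>, prefix: &str, limit: usize) -> BoundedCount {
    // Keys sharing a prefix are contiguous in sorted order, starting at
    // the first key that is >= the prefix itself.
    let matching = keys
        .range(prefix.to_string()..)
        .take_while(|k| k.starts_with(prefix));

    if limit == 0 {
        return BoundedCount { count: matching.count(), exact: true };
    }

    // One entry past the limit is enough to know the count was truncated.
    let seen = matching.take(limit.saturating_add(1)).count();
    BoundedCount { count: seen.min(limit), exact: seen <= limit }
}
```

A caller that only needs to know "is this directory-like prefix larger than N?" can pass N as the limit and check `exact`, without paying for a full scan.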
Changed
- Refactored DB::scatter to share the same single-key apply helper as
scatter_independent, keeping ordered scatter semantics while avoiding a
second implementation of each operation kind.
- Clarified DB::install_checkpoint as a fresh/wiped-DB install path; Holt does
not expose online live-DB checkpoint replacement.
Validation
- cargo clippy --workspace --all-targets -- -D warnings
- cargo test --test scan_stats --test scatter --test checkpoint
v0.5.1
Fixed
- Added durable recovery coverage for NoKV-style metadata stores using many
named families and multi-family DB::atomic batches under
Durability::StateMachine.
- Verified that DB::commit_durable(applied_index) can reopen a
metadata-service-shaped checkpoint without a Holt WAL and retain the durable
applied index needed for external-log replay.
Validation
- cargo test --test sm_durable durable_recovers_metadata_store_shaped_workload -- --exact --nocapture
- cargo test --release --test sm_durable durable_recovers_metadata_store_shaped_workload -- --exact
- NoKV sibling validation with a local Holt patch:
cargo test --config 'patch.crates-io.holt.path="../holt"' -p nokv-meta -p nokv-cluster -p nokv-server
v0.5.0
This release adds a two-axis durability model (who owns durability ×
where data lives) and the metadata-shaped fast paths a replicated
metadata service needs, plus crash-consistent on-disk recovery for the
state-machine mode. It contains breaking API and on-disk changes — see
Changed.
Added
- Durability policy. Durability::Wal { sync } / Durability::StateMachine
replaces the ad-hoc wal_sync flag and is orthogonal to Storage. Wal is
single-node: holt's own write-ahead log is the durable record.
StateMachine is for a replicated state machine: an external log (e.g. Raft)
owns durability and replay, and holt attaches no WAL.
- Durable state-machine recovery. Under Durability::StateMachine with file
storage, DB::commit_durable(applied_index) / Tree::commit_durable write a
crash-consistent on-disk checkpoint without a WAL: a copy-on-write snapshot
plus an atomic manifest rename recording the durable roots, applied_index,
and the resume next_seq. Reopen rehydrates from it and exposes
durable_applied_index(); the external log replays only the tail past that
index. Verified by fault injection and a SIGKILL crash soak.
- DB::export_checkpoint / DB::install_checkpoint: a whole-DB
logical-KV snapshot image carrying applied_index, for shipping and
installing state-machine snapshots (Raft InstallSnapshot).
- Tree::put_many_if_absent: create every absent key as one atomic batch
(single WAL record), reporting per key whether it was Created or
AlreadyExists.
- DB::scatter: independent single-key conditional writes across families
with no cross-family atomic barrier; each runs on its own per-key concurrent
path so unrelated keys never serialize. StateMachine-only (the log owns
write ordering).
- ScanStats: per-scan visited/returned/rollup/restarts
accounting on RangeIter/KeyRangeIter (read via .stats()), and the
return of KeyRangeBuilder::visit. Surfaces work-vs-yield so callers can spot
tombstone-bloated listings.
- Copy-on-write snapshots. Tree::snapshot returns a stable
point-in-time Snapshot handle in O(1): only the root frame is
copied; the rest is shared with the live tree and forked
copy-on-write only when a live write would overwrite a frame the
snapshot still references. Reads have 1× amplification and there is no
write overhead while no snapshot is live.
- Tree::gc / DB::gc reclaim snapshot frames that a crash left
orphaned because it occurred while a snapshot was still live.
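The copy-on-write snapshot behaviour in the list above can be illustrated with a deliberately coarse sketch. `CowTree` and this `Snapshot` type are hypothetical and share a whole map behind one `Arc`, whereas the release note describes sharing at frame granularity; the sketch only shows the shape of the idea: taking a snapshot is a reference-count bump, a write forks only when a snapshot still shares the data, and writes with no live snapshot mutate in place.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Toy tree whose entire contents sit behind one shared root.
pub struct CowTree {
    root: Arc<BTreeMap<String, String>>,
}

/// Point-in-time handle; holds a reference to the root it captured.
pub struct Snapshot {
    root: Arc<BTreeMap<String, String>>,
}

impl CowTree {
    pub fn new() -> Self {
        CowTree { root: Arc::new(BTreeMap::new()) }
    }

    /// O(1): clones the Arc, not the map.
    pub fn snapshot(&self) -> Snapshot {
        Snapshot { root: Arc::clone(&self.root) }
    }

    /// `Arc::make_mut` copies the map only if a snapshot still shares it;
    /// with no live snapshot the write happens in place.
    pub fn put(&mut self, key: &str, value: &str) {
        Arc::make_mut(&mut self.root).insert(key.to_string(), value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.root.get(key).map(String::as_str)
    }

    /// True while the live tree and the snapshot share the same root.
    pub fn shares_root_with(&self, snap: &Snapshot) -> bool {
        Arc::ptr_eq(&self.root, &snap.root)
    }
}

impl Snapshot {
    pub fn get(&self, key: &str) -> Option<&str> {
        self.root.get(key).map(String::as_str)
    }
}
```

Until the first write after `snapshot()`, both handles point at the same data; after it, the snapshot keeps reading the old contents while the live tree sees the new value.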
Changed
- Breaking. TreeConfig.wal_sync and TreeBuilder::wal_sync() are removed;
use TreeConfig.durability / TreeBuilder::durability(Durability).
- Breaking. KeyRangeBuilder::visit returns ScanStats instead of the
emitted count (use stats.returned + stats.rollup).
- Breaking (on-disk). The file-store manifest is v2 (durable trailer); v1
manifests are not migrated.
- Under Durability::StateMachine, atomic batches take the mutation gate shared
rather than exclusive: the external log serializes writes, so applies no
longer fence concurrent range scans. view/snapshot capture still fences,
so consistent point-in-time reads are unaffected.
- DB::open gates the WAL on durability (attach_wal()), not just on storage,
so a file-backed StateMachine database no longer attaches a holt WAL.
- Tree::view / DB::view are reimplemented on copy-on-write
snapshots: same API and point-in-time semantics, but capture is now
O(1) instead of eagerly copying every reachable blob frame, and holds
no second in-memory copy of the captured subtree.
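The two-axis gating of the WAL can be sketched as a small predicate. Everything here is a simplified assumption: the `Storage` variants, the `bool` type of `sync`, and the free function `attach_wal` are illustrative stand-ins, and the rule that in-memory storage never attaches a WAL is inferred rather than stated in the notes. What the sketch does take from the text is that a file-backed StateMachine database attaches no WAL.

```rust
/// Where data lives (variants assumed for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Storage {
    Memory,
    File,
}

/// Who owns durability (the type of `sync` is assumed to be bool).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Durability {
    /// The store's own write-ahead log is the durable record.
    Wal { sync: bool },
    /// An external log owns durability and replay.
    StateMachine,
}

/// A WAL is attached only when both axes call for it:
/// file-backed storage and WAL-owned durability.
pub fn attach_wal(storage: Storage, durability: Durability) -> bool {
    matches!((storage, durability), (Storage::File, Durability::Wal { .. }))
}
```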
v0.4.2
Fixed
- Fixed a DB checkpoint race where a concurrent pending delete could
remove an in-flight cache image after the checkpoint worker had
claimed it, causing "write_through_batch: flushing entry lost cache image"
and blocking crash-safe checkpoint completion.
- Kept pending-delete cleanup from reclaiming cache and route-resident
state until the delete has been applied to the inner blob store.