Releases: feichai0017/holt

v0.7.1

13 Jun 14:32
279addf

Fixed

  • Durability: an acknowledged write could be lost after a crash. The
    checkpoint's WAL-truncate gate (maybe_truncate) only checked the
    BufferManager dirty/flushing/pending counters, not the store's own deferred
    durability (needs_flush). The I/O worker retires a written-through blob
    right after the pwrite but before the data fsync + manifest-delta persist,
    so the WAL could be truncated while a just-written blob's new slot mapping
    was still only in the in-memory manifest — leaving a crashed reopen with the
    acknowledged record in neither the WAL nor manifest.log. The gate now also
    waits on needs_flush(), mirroring the existing run_round early-skip
    guard. Surfaced by the nightly crash-soak; 0.7.0's lazy routing compaction
    amplified the exposure by re-writing the root blob every round.
  • Durability: a torn WAL tail is now truncated on reopen. Previously the
    writer reopened with O_APPEND over the torn bytes, turning a partial tail
    record into a mid-log torn record that a later replay would stop at,
    silently stranding every acknowledged record written after it. replay_wal
    now truncates the WAL to the last complete record on open — standard WAL
    recovery; the torn record was never acknowledged (the crash preceded its
    fdatasync), so nothing durable is lost.
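
The torn-tail recovery described above can be sketched roughly as follows. This is a minimal model, not holt's replay_wal: the record framing (4-byte little-endian length, 4-byte checksum, payload), the toy checksum, and the names encode_record, last_complete_offset, and truncate_torn_tail are all assumptions made for illustration.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical framing: [len: u32 LE][checksum: u32 LE][payload].
const HEADER_LEN: usize = 8;

/// Toy checksum for the sketch; a real WAL would likely use a CRC.
fn checksum(payload: &[u8]) -> u32 {
    payload
        .iter()
        .fold(0u32, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u32))
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(raw)
}

fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns the byte offset just past the last complete, valid record.
/// Scanning stops at the first short header, short payload, empty record,
/// or checksum mismatch.
fn last_complete_offset(log: &[u8]) -> usize {
    let mut pos = 0;
    while log.len() - pos >= HEADER_LEN {
        let len = read_u32(log, pos) as usize;
        let sum = read_u32(log, pos + 4);
        let start = pos + HEADER_LEN;
        let end = match start.checked_add(len) {
            Some(end) if len > 0 && end <= log.len() => end,
            _ => break, // empty or truncated record: treat as torn
        };
        if checksum(&log[start..end]) != sum {
            break; // payload bytes did not all reach the disk
        }
        pos = end;
    }
    pos
}

/// Truncates the file to the last complete record before any append,
/// so a torn tail can never end up in the middle of the log.
fn truncate_torn_tail(file: &mut File) -> io::Result<u64> {
    let mut log = Vec::new();
    file.seek(SeekFrom::Start(0))?;
    file.read_to_end(&mut log)?;
    let keep = last_complete_offset(&log) as u64;
    if keep < log.len() as u64 {
        file.set_len(keep)?;
        file.sync_all()?;
    }
    Ok(keep)
}
```

The key property is that the writer only appends after the truncation, so replay never meets a damaged record with valid records behind it.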

v0.5.5

12 Jun 07:32

Fixed

  • FileBlobStore::open now takes an exclusive flock(2) on
    <data_dir>/store.lock and holds it for the lifetime of the
    instance. Two live instances on one data directory previously
    replayed manifest.log into the same next_slot, assigned the
    same slot to different blob GUIDs, and appended conflicting set
    deltas — permanently poisoning the manifest (every later open
    failed with FileBlobStore::Manifest::duplicate slot) while the
    colliding frames overwrote each other in blobs.dat. Since 0.5.0
    even read-only snapshots persist frozen root frames, so the
    overlap window of a plain handover (store = reopen(path)) was
    enough to trip this. Open now waits up to 5 s for the previous
    instance to finish dropping, so handover reopen serializes; a
    genuinely concurrent second opener fails with a clear
    WouldBlock error instead of corrupting the store. Same-process
    double-opens are caught too (flock is per open-file-description),
    and the kernel releases the lock if the holder crashes.
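
The wait-then-fail behaviour of the lock can be sketched as a bounded retry loop. This is an illustrative model only: acquire_with_deadline and its try_lock closure are hypothetical names, and the closure stands in for a non-blocking flock(2) attempt (LOCK_EX | LOCK_NB) on store.lock.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Retries a non-blocking lock attempt until it succeeds or `timeout` passes.
/// `try_lock` returns Ok(true) when the lock was taken, Ok(false) when it is
/// currently held elsewhere, and Err for any other I/O failure.
fn acquire_with_deadline<F>(mut try_lock: F, timeout: Duration, poll: Duration) -> io::Result<()>
where
    F: FnMut() -> io::Result<bool>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if try_lock()? {
            return Ok(()); // previous holder finished dropping: handover case
        }
        if Instant::now() >= deadline {
            // A genuinely concurrent opener: fail loudly instead of sharing.
            return Err(io::Error::new(
                io::ErrorKind::WouldBlock,
                "store.lock is held by another live instance",
            ));
        }
        thread::sleep(poll);
    }
}
```

With a 5 s timeout this serializes a plain handover reopen while still rejecting a second long-lived opener.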

v0.7.0

11 Jun 13:19

Added

  • BlobStore::read_blobs — a batched full-frame read on the public trait.
    The default loops over read_blob; stores override it for device
    parallelism (Linux io_uring submits one ring batch, the pread store
    fans the reads across worker threads). Used by the cold-scan read-ahead.
  • Page-granular cold reads. A point lookup on routed (write-cold) data fetches
    only the header page, the blob's routing region, and the one leaf page its
    descent reaches (~18 KB mean) instead of pinning the whole 512 KB frame
    (~27× less cold I/O). The routing region is built at compaction.
  • Per-blob bloom filter at the tail of the routing region (read for free with
    it): cold negative lookups answer NotFound without a leaf-page read.
    No false negatives. Additive on disk (bloom_len == 0 = no bloom).
  • Bounded resident routing cache: routing regions for hot blobs are held in a
    byte-bounded cache so repeat cold reads skip the routing-region read.
  • Cold-scan I/O read-ahead: range scans prefetch upcoming child blobs through
    pin_scan_many → batched read_blobs, reading them at the device's natural
    queue depth instead of one serial round-trip each.
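
To illustrate why a per-blob bloom filter can answer NotFound without false negatives, here is a small sketch. It is not holt's on-disk layout: the Bloom type, the probe count, and the use of the standard library's DefaultHasher are assumptions, and only the bloom_len == 0 convention comes from the notes above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory bloom filter over `bloom_len` bytes.
struct Bloom {
    bits: Vec<u8>,
    probes: u32,
}

/// Seeded hash. DefaultHasher's algorithm is not guaranteed stable across
/// Rust releases, so a persisted filter would need a fixed hash function.
fn hash_with_seed(key: &[u8], seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

impl Bloom {
    fn new(bloom_len: usize, probes: u32) -> Self {
        Bloom { bits: vec![0; bloom_len], probes }
    }

    /// Double hashing: probe i lands on (h1 + i * h2) mod nbits.
    /// Callers must ensure the filter is non-empty.
    fn bit_positions(&self, key: &[u8]) -> Vec<usize> {
        let nbits = (self.bits.len() * 8) as u64;
        let h1 = hash_with_seed(key, 0);
        let h2 = hash_with_seed(key, 1) | 1;
        (0..self.probes as u64)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % nbits) as usize)
            .collect()
    }

    fn insert(&mut self, key: &[u8]) {
        if self.bits.is_empty() {
            return; // bloom_len == 0: no bloom present
        }
        for pos in self.bit_positions(key) {
            self.bits[pos / 8] |= 1u8 << (pos % 8);
        }
    }

    /// false means "definitely absent"; true means "go read the leaf page".
    fn may_contain(&self, key: &[u8]) -> bool {
        if self.bits.is_empty() {
            return true; // no bloom: always fall through to the leaf read
        }
        self.bit_positions(key)
            .into_iter()
            .all(|pos| (self.bits[pos / 8] & (1u8 << (pos % 8))) != 0)
    }
}
```

Every inserted key sets all of its probe bits, so a lookup for that key can never see a cleared bit. That is the no-false-negatives guarantee. False positives remain possible and simply cost one leaf-page read.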

Changed

  • Breaking (on-disk format). Manifest format v4 → v6 (the blob header now
    records the per-blob routing-region geometry). Older manifests are not
    migrated: the loader rejects any non-v6 manifest, so a store written by
    0.6.x cannot be opened by this release (and a v6 store cannot be opened by
    0.6.x). Pre-1.0 with no production deployments; recreate the store on upgrade.
  • Compaction builds (and the read path validates) the routing region + bloom;
    structural write-path mutations de-route a blob, and write-cold blobs are
    re-routed lazily by maintenance.
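
The strict version gate on the manifest can be pictured as below. The names MANIFEST_VERSION, OpenError, and check_manifest_version are hypothetical; the only behaviour taken from the notes is that anything other than v6 is rejected rather than migrated.

```rust
/// Hypothetical constant for the only manifest version this release accepts.
const MANIFEST_VERSION: u32 = 6;

#[derive(Debug, PartialEq)]
enum OpenError {
    UnsupportedManifest { found: u32, expected: u32 },
}

/// Rejects both older and newer manifests; there is no migration path.
fn check_manifest_version(found: u32) -> Result<(), OpenError> {
    if found == MANIFEST_VERSION {
        Ok(())
    } else {
        Err(OpenError::UnsupportedManifest { found, expected: MANIFEST_VERSION })
    }
}
```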

Removed

  • Removed the cold.idx cold-read sidecar — the in-blob routing region is now
    the sole cold-read path.
  • Removed the docs/design/ working notes. Rationale for shipped features lives
    in commit messages; rationale for rejected paths (io_uring WAL rewrite, the
    two blob-fill fixes) lives in git history.

v0.6.0

09 Jun 16:39
2122f1c

Added

  • Added a shared in-memory WAL byte ring as the only append path. Foreground
    writers reserve byte ranges concurrently, copy encoded records directly into
    the ring, and a single flusher drains committed byte prefixes to the WAL file.
  • Added loom coverage and crash-soak validation for the WAL ring's
    reserve/publish/flush ordering, including multi-publisher gap-safety checks.
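
The reserve/publish/flush ordering and its gap-safety rule can be modelled in a few lines. This is a single-threaded sketch under stated assumptions, not the ring itself: RingModel and its methods are hypothetical, offsets are logical byte positions, and wrap-around, backpressure, and atomics are left out.

```rust
use std::collections::BTreeMap;

/// Logical model of the ring: writers reserve ranges in order, may publish
/// them in any order, and the flusher only drains a contiguous prefix.
#[derive(Default)]
struct RingModel {
    reserved: u64,                 // next free logical offset
    flushed: u64,                  // everything below this may go to the WAL file
    published: BTreeMap<u64, u64>, // start -> end of ranges copied but not yet drained
}

impl RingModel {
    /// Hands out the next byte range; returns its start offset.
    fn reserve(&mut self, len: u64) -> u64 {
        let start = self.reserved;
        self.reserved += len;
        start
    }

    /// Marks a reserved range as fully copied into the ring.
    fn publish(&mut self, start: u64, len: u64) {
        self.published.insert(start, start + len);
    }

    /// Advances the flush point across published ranges with no gap.
    /// A published range behind an unpublished one must wait.
    fn drain(&mut self) -> u64 {
        while let Some(end) = self.published.remove(&self.flushed) {
            self.flushed = end;
        }
        self.flushed
    }
}
```

If writer B publishes before writer A, drain stays at A's start offset until A publishes. This is the multi-publisher gap that the loom coverage is described as checking.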

Changed

  • Flattened leaf storage and child addressing for the persistent ART layout:
    small records can stay inline, child body offsets are stored directly, and
    inner-node child scans use compact u16 addressing with SIMD fast paths.
  • Reworked the WAL group-commit plumbing around ring backpressure instead of a
    per-record channel/worker handoff, reducing the concurrent durable write
    bottleneck while preserving the existing WAL record format and replay reader.
  • Tightened journal validation so empty or over-capacity records are rejected
    before reservation instead of relying on debug-only assertions.

Removed

  • Removed the legacy WAL channel backend and its transitional design documents.
    The ring-backed journal is now the only implementation.
  • Removed rejected experiment notes that were no longer part of the supported
    architecture.

Validation

  • cargo test --workspace --all-features --locked
  • cargo clippy --workspace --all-features --all-targets --locked -- -D warnings
  • RUSTFLAGS="--cfg loom" cargo test -p holt --lib journal::ring::loom --locked

v0.5.4

07 Jun 07:53

Removed

  • Removed the external-log state-machine surface from holt core:
    Durability::StateMachine, DB::commit_durable,
    Tree::commit_durable, durable_applied_index, DB::scatter,
    DB::scatter_independent, and the file-store DurableManifest
    trailer.
  • Checkpoint images are now pure DB archive/transfer images. They
    contain family key/value data and no longer carry an external
    applied_index.
  • Atomic DB/Tree batches always use the exclusive mutation gate again;
    holt no longer has a StateMachine-only relaxed batch mode.

v0.5.3

06 Jun 14:36

Fixed

  • Preserved checkpoint-owned cache images while copy-on-write snapshot reclaim,
    DB-wide GC, direct blob deletes, or write-through paths run concurrently.
    This fixes NoKV-style metadata pressure that could otherwise report
    snapshot_dirty_versions: dirty entry lost cache image or
    write_through_batch: flushing entry lost cache image.
  • Kept direct write-through from retiring another in-flight checkpoint epoch;
    it now clears only unclaimed dirty state and leaves flushing ownership intact.

Validation

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test store::buffer_manager::tests -- --nocapture
  • NoKV sibling FUSE/RustFS/JuiceFS smoke with local Holt patch completed without
    checkpoint invariant failures.

v0.5.2

06 Jun 08:49

Added

  • Added CheckpointImage::validate() to validate a full exported DB
    checkpoint image before install or archive handoff, not just its header.
  • Added KeyScanOutcome and KeyRangeBuilder::visit_with_outcome so callers
    can distinguish prefix-list cache hits from real ART walks without changing
    the stable ScanStats field set.
  • Added PrefixCount, Tree::prefix_count, and View::prefix_count for
    bounded DFS-style prefix cardinality checks. Non-zero limits scan at most one
    entry past the limit and report whether the count is exact.
  • Added DB::scatter_independent for StateMachine-mode independent single-key
    fan-out across named families. It rejects duplicate (tree, key) pairs and
    applies unrelated writes concurrently through Holt's native per-key paths.
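
The bounded prefix-count rule (scan at most one entry past the limit, then report exactness) can be sketched over an ordinary sorted map. This is a model rather than the holt API: the function signature, the field names on PrefixCount, and the reading of limit == 0 as "unbounded" are assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical result shape: a count plus whether it is exact.
#[derive(Debug, PartialEq)]
struct PrefixCount {
    count: usize,
    exact: bool,
}

fn prefix_count(tree: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8], limit: usize) -> PrefixCount {
    // Keys sharing a prefix are contiguous in sorted order.
    let matching = tree
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix));

    if limit == 0 {
        // Assumed meaning of a zero limit: count everything.
        return PrefixCount { count: matching.count(), exact: true };
    }

    // Look at no more than limit + 1 entries: the extra one only tells us
    // whether more than `limit` keys exist.
    let seen = matching.take(limit.saturating_add(1)).count();
    PrefixCount { count: seen.min(limit), exact: seen <= limit }
}
```

A caller asking "does this directory hold more than N entries?" pays for at most N + 1 entries, however large the prefix is.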

Changed

  • Refactored DB::scatter to share the same single-key apply helper as
    scatter_independent, keeping ordered scatter semantics while avoiding a
    second implementation of each operation kind.
  • Clarified DB::install_checkpoint as a fresh/wiped-DB install path; Holt does
    not expose online live-DB checkpoint replacement.

Validation

  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --test scan_stats --test scatter --test checkpoint

v0.5.1

06 Jun 07:47

Fixed

  • Added durable recovery coverage for NoKV-style metadata stores using many
    named families and multi-family DB::atomic batches under
    Durability::StateMachine.
  • Verified that DB::commit_durable(applied_index) can reopen a
    metadata-service-shaped checkpoint without a Holt WAL and retain the durable
    applied index needed for external-log replay.

Validation

  • cargo test --test sm_durable durable_recovers_metadata_store_shaped_workload -- --exact --nocapture
  • cargo test --release --test sm_durable durable_recovers_metadata_store_shaped_workload -- --exact
  • NoKV sibling validation with a local Holt patch:
    cargo test --config 'patch.crates-io.holt.path="../holt"' -p nokv-meta -p nokv-cluster -p nokv-server

v0.5.0

06 Jun 05:58

This release adds a two-axis durability model (who owns durability ×
where data lives) and the metadata-shaped fast paths a replicated
metadata service needs, plus crash-consistent on-disk recovery for the
state-machine mode. It contains breaking API and on-disk changes — see
Changed.

Added

  • Durability policy. Durability::Wal { sync } / Durability::StateMachine
    replaces the ad-hoc wal_sync flag and is orthogonal to Storage. Wal is
    single-node — holt's own write-ahead log is the durable record.
    StateMachine is for a replicated state machine: an external log (e.g. Raft)
    owns durability and replay, and holt attaches no WAL.
  • Durable state-machine recovery. Under Durability::StateMachine with file
    storage, DB::commit_durable(applied_index) / Tree::commit_durable write a
    crash-consistent on-disk checkpoint without a WAL: a copy-on-write snapshot
    plus an atomic manifest rename recording the durable roots, applied_index,
    and the resume next_seq. Reopen rehydrates from it and exposes
    durable_applied_index(); the external log replays only the tail past that
    index. Verified by fault injection and a SIGKILL crash soak.
  • DB::export_checkpoint / DB::install_checkpoint — a whole-DB
    logical-KV snapshot image carrying applied_index, for shipping and
    installing state-machine snapshots (Raft InstallSnapshot).
  • Tree::put_many_if_absent — create every absent key as one atomic batch
    (single WAL record), reporting per key whether it was Created or
    AlreadyExists.
  • DB::scatter — independent single-key conditional writes across families
    with no cross-family atomic barrier; each runs on its own per-key concurrent
    path so unrelated keys never serialize. StateMachine-only (the log owns
    write ordering).
  • ScanStats — per-scan visited / returned / rollup / restarts
    accounting on RangeIter / KeyRangeIter (read via .stats()), also used as
    the return value of KeyRangeBuilder::visit. Surfaces work versus yield so
    callers can spot tombstone-bloated listings.
  • Copy-on-write snapshots. Tree::snapshot returns a stable
    point-in-time Snapshot handle in O(1) — only the root frame is
    copied; the rest is shared with the live tree and forked
    copy-on-write only when a live write would overwrite a frame the
    snapshot still references. Reads have 1× amplification and there is no
    write overhead while no snapshot is live.
  • Tree::gc / DB::gc reclaim snapshot frames that a crash left
    orphaned because it occurred while a snapshot was still live.
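
The per-key reporting of put_many_if_absent can be modelled on a plain sorted map. This sketch is not the Tree API: the free-function signature and the PutOutcome name are assumptions, and the treatment of a key repeated within one batch (the second occurrence reports AlreadyExists) is a guess. The single-WAL-record atomicity of the real call is not modelled.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-key result mirroring Created / AlreadyExists.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PutOutcome {
    Created,
    AlreadyExists,
}

/// Inserts every absent key and leaves existing values untouched,
/// returning one outcome per input entry in input order.
fn put_many_if_absent(
    tree: &mut BTreeMap<Vec<u8>, Vec<u8>>,
    entries: &[(&[u8], &[u8])],
) -> Vec<PutOutcome> {
    let mut outcomes = Vec::with_capacity(entries.len());
    for (key, value) in entries {
        if tree.contains_key(*key) {
            outcomes.push(PutOutcome::AlreadyExists);
        } else {
            tree.insert(key.to_vec(), value.to_vec());
            outcomes.push(PutOutcome::Created);
        }
    }
    outcomes
}
```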

Changed

  • Breaking. TreeConfig.wal_sync and TreeBuilder::wal_sync() are removed;
    use TreeConfig.durability / TreeBuilder::durability(Durability).
  • Breaking. KeyRangeBuilder::visit returns ScanStats instead of the
    emitted count (use stats.returned + stats.rollup).
  • Breaking (on-disk). The file-store manifest is v2 (durable trailer); v1
    manifests are not migrated.
  • Under Durability::StateMachine, atomic batches take the mutation gate shared
    rather than exclusive — the external log serializes writes, so applies no
    longer fence concurrent range scans. view / snapshot capture still fences,
    so consistent point-in-time reads are unaffected.
  • DB::open gates the WAL on durability (attach_wal()), not just on storage,
    so a file-backed StateMachine database no longer attaches a holt WAL.
  • Tree::view / DB::view are reimplemented on copy-on-write
    snapshots: same API and point-in-time semantics, but capture is now
    O(1) instead of eagerly copying every reachable blob frame, and holds
    no second in-memory copy of the captured subtree.
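
For callers migrating from the old KeyRangeBuilder::visit return value, the emitted count can be recovered from the stats as the notes suggest. The struct below is a stand-in with assumed field types (u64), and emitted is a hypothetical helper, not part of the holt API.

```rust
/// Stand-in for ScanStats; the field names come from the release notes,
/// the integer types are assumed.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct ScanStats {
    visited: u64,
    returned: u64,
    rollup: u64,
    restarts: u64,
}

/// Equivalent of the count that visit returned before 0.5.0.
fn emitted(stats: &ScanStats) -> u64 {
    stats.returned + stats.rollup
}
```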

v0.4.2

02 Jun 01:30

Fixed

  • Fixed a DB checkpoint race where a concurrent pending delete could
    remove an in-flight cache image after the checkpoint worker had
    claimed it, causing write_through_batch: flushing entry lost cache
    image and blocking crash-safe checkpoint completion.
  • Kept pending-delete cleanup from reclaiming cache and route-resident
    state until the delete has been applied to the inner blob store.