Streaming daemon: Slice 1 · Layer 1 — Foundations #809

Description

@chowbao

Part of #808 (epic) · Builds on: feature/full-history · Reference implementation: #805

Slice 1 · Layer 1 of 4 — Foundations. The durable-state substrate of the streaming daemon, with no daemon goroutines yet — self-contained and reviewable on its own.

Context

The daemon keeps two tiers — hot (recent ledgers in a per-chunk RocksDB) and cold (older ledgers as immutable files). All durable state is tracked in a small catalog (meta-store RocksDB) via a strict one-write protocol, so cleanup walks catalog keys and never lists directories. This layer builds that substrate — the types, the catalog API, config, and locking that every later layer composes. It produces no running daemon.

Scope

  • Geometry & layout (build on pkg/chunk)
    • chunk id ↔ chunkFirstLedger / chunkLastLedger; LedgersPerChunk = 10_000; genesis = ledger 2
    • bucket id (chunk / 1000, %05d) for the on-disk directory layout
    • the signed-chunk sentinel chunk −1 (chunkLastLedger(-1) = 1) for the "nothing ingested yet" watermark
  • Catalog key schema
    • key families: chunk:{c:08d}:{ledgers|events|txhash}, hot:chunk:{c:08d}, config:* pins
      • L1 wires the ledgers kind; the events / txhash kinds arrive in Slices 2–3
    • a strict key↔path bijection: every key maps to exactly one file path, and every file path maps back to exactly one key
    • states: freezing | frozen | pruning (per-chunk artifacts) and transient | ready (hot DB)
    • typed reads:
      • State(chunk, kind) / HotState(chunk) — artifact & hot-DB state
      • ChunkArtifactKeys(), HotChunkKeys(), ReadyHotChunkKeys() — enumerate keys
      • EarliestLedger() — config-pin read; plus low-level Get / Has
      • (the tx-hash FrozenCoverage / IndexKeys reads are added in Slice 3)
  • The one-write protocol (mark-then-write)
    • put "freezing" before any I/O
    • write the file
    • fsync the file + parent dirent (+ grandparent on a new bucket dir)
    • flip the key to "frozen" (single put per-chunk; atomic commit batch for multi-key)
    • deletion is the reverse: demote the key to "pruning" → unlink → fsyncDir → delete key, so that key absent ⟹ file gone
  • Key-driven chunk sweep (sweepChunkArtifacts)
    • the only per-chunk deleter: demote → unlink → fsyncDir → delete key, batched per family
  • Config
    • TOML schema ([service], [backfill], [backfill.bsb], [immutable_storage.*], [catalog], [streaming], [streaming.hot_storage], [logging]) + loader + defaults
    • single-process flock on the catalog path and every configured storage root + the hot-storage root (a second daemon on any shared root exits)
  • Cross-cutting primitives
    • RetentionGate — the reader-retention predicate (below-floor ⟹ not-found; ChunkBelowFloor)
    • the ArtifactSet / Kind abstraction + crash hooks
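
A minimal sketch of the geometry and key formatting listed above, for illustration only. It assumes chunk 0 starts at the genesis ledger (which is what makes chunkLastLedger(-1) = 1 hold); the helper names chunkOfLedger, bucketID, artifactKey and hotKey are hypothetical and are not taken from pkg/chunk.

```go
package main

import "fmt"

const (
	ledgersPerChunk = 10_000 // from the issue: LedgersPerChunk = 10_000
	genesisLedger   = 2      // from the issue: genesis = ledger 2
	chunksPerBucket = 1000   // from the issue: bucket id = chunk / 1000
)

// chunkFirstLedger returns the first ledger of chunk c. The chunk id is
// signed so that the sentinel chunk -1 is representable.
func chunkFirstLedger(c int64) int64 { return c*ledgersPerChunk + genesisLedger }

// chunkLastLedger returns the last ledger of chunk c.
func chunkLastLedger(c int64) int64 { return chunkFirstLedger(c+1) - 1 }

// chunkOfLedger is the inverse mapping (hypothetical helper); it is only
// meaningful for ledgers at or after genesis.
func chunkOfLedger(ledger int64) int64 { return (ledger - genesisLedger) / ledgersPerChunk }

// bucketID formats the on-disk bucket directory name for a non-negative chunk.
func bucketID(c int64) string { return fmt.Sprintf("%05d", c/chunksPerBucket) }

// artifactKey formats a per-chunk artifact key, e.g. chunk:00000042:ledgers.
func artifactKey(c int64, kind string) string { return fmt.Sprintf("chunk:%08d:%s", c, kind) }

// hotKey formats the hot-DB key for a chunk.
func hotKey(c int64) string { return fmt.Sprintf("hot:chunk:%08d", c) }

func main() {
	fmt.Println(chunkFirstLedger(0), chunkLastLedger(0)) // 2 10001
	fmt.Println(chunkLastLedger(-1))                     // 1 ("nothing ingested yet")
	fmt.Println(chunkOfLedger(10_002))                   // 1
	fmt.Println(bucketID(1234))                          // 00001
	fmt.Println(artifactKey(42, "ledgers"))              // chunk:00000042:ledgers
	fmt.Println(hotKey(42))                              // hot:chunk:00000042
}
```

Keeping the chunk id signed lets the watermark use chunk -1 without a separate "empty" flag, since its last ledger falls just below genesis.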

Acceptance

  • Crash-safety: simulated power-loss between each ordered protocol step; "every file has its key" and "key absent ⟹ file gone" hold at every interruption; multi-key batch atomicity.
  • Config: the loader parses valid configs and applies defaults. The malformed-case matrix (zero/over-max cpi, zero workers, negative retries, misaligned/sub-genesis floor) is rejected by validateConfig, which lives in Layer 4 (#812, Daemon assembly, operability & validation) because it calls networkTip. Those rejection assertions ship there, not in this layer's standalone tests.
  • Locking: a second daemon that shares any root with a running daemon fails to take the lock and exits.
  • Green standalone: go build + go vet + go test -short pass for the package on its own (no later layers).

Design references

Dependencies

  • Depends on: nothing (branches off feature/full-history)
  • Composes (already merged): pkg/chunk, pkg/rocksdb, pkg/stores/metastore

Out of scope (later layers)

  • validateConfig in full, covering both the structural malformed-case rejections and the network-dependent earliest-ledger resolution → Layer 4 (#812, Daemon assembly, operability & validation). It calls networkTip (a startup concern), so the whole function lands there; L1 supplies only the schema / loader / defaults
  • any daemon goroutine, hot-DB ingestion, freezing, the resolver → Layers 2–4
  • the events and tx-hash data types → Slices 2 and 3
