M3 foundation: v3 registry + convergent publish + composable FS5 layers #2

Merged
redsolver merged 7 commits into main from m3-foundation on Apr 27, 2026

@redsolver

Foundation work for Milestone 3 (Multi-Device Sync & Sharing). Lands the wire format, per-layer/per-blob abstractions, and convergence semantics that the upcoming feature PRs (relay E2E, FUSE mount, share-link export, snap-watch) all build on. No user-facing CLI surface added in this PR.

The marquee shift: concurrent producers (rw-mount flush + snap + parallel snaps from N writers) converge to the union of every writer's changes without any per-vault lock. The merge happens at the data layer (prolly tree merge_and_persist + CAS dedup); this PR adds the operational layer that actually triggers it.

What's in

  • feat(s5_core)!: v3 registry wire format. Per-vault namespace tag on registry entries: TYPE(0x5c) | KEYTYPE(0xed) | PUBKEY | VAULT_ID(16) | REVISION | LEN | PAYLOAD | SIG. Lookup is (pubkey, vault_id), so one device-signing-key signs across every vault, with VAULT_ID disambiguating. should_store becomes strict-CAS so a concurrent writer reliably detects "I lost the race" via verify-after-set; the Ord tie-break for read-side reconciliation is unchanged.

  • fix(s5_store_local): atomic put_bytes via tmp+rename. Closes the O_TRUNC window in which readers racing concurrent registry writers hit "insufficient bytes for deserialization" once 5+ publishers raced.

  • feat(s5_fs_v2): ReadableLayer + Pipeline + MergedView + WritableOverlay. Composable read/write primitives. Snapshot, MergedView, and WritableOverlay all implement ReadableLayer. Pipeline (with NodeCache) is the per-blob ops machinery, lifted out of Snapshot so writable mounts can hold it without dragging a full snapshot. WritableOverlay owns base + Pipeline + the entry buffer, and exposes flush(store) as the canonical "fold the entries into a fresh prolly tree on top of the base" operation, used by both rw mounts and ingest.

  • feat(s5_node): publish convergence loop + peer-snapshot loading. run_publish is now a verify-and-retry-with-merge loop (max 16 attempts, jittered backoff). Combined with strict-CAS in s5_core and atomic put_bytes in s5_store_local, this is what lets concurrent producers converge without locking. peer_load::load_peer_snapshot is the read-side primitive behind live multi-peer mounts: given a peer's device signing pubkey + the shared vault_id, fetch their published Transparent Node, age-decrypt with our recipient identity, return a Snapshot ready to drop into a MergedView. Recovery derivation chain: KEY_SLOT_RECOVERY (in the vault root's TraversalContext) yields vault_id and recovery_signing_key.

  • feat(s5_node, s5_node_api)!: M3 vault config schema + Result-shaped RPC. NodeConfigVault gets the M3 schema (recipients, sources, blob_stores, meta_targets, plaintext_tree, watch). meta_targets is intentionally separate from blob_stores because implicit fallback would leak vault structure to backends meant only for opaque content. RunTaskResponse becomes a tagged enum (Spawned / Refused); the CLI client wrapper flattens to Result<SpawnedTask>, so daemon-side dispatch failures come back as real Errs with the actual error string instead of the prior task_id == 0 sentinel.

  • chore: adapt consumers to v3 wire format + new APIs. Bindings, registries, ingest, s5_fs, and s5_cli adapt to the StreamKey API change, the WritableOverlay constructor, and the removal of peer-tied registry variants.

  • test(s5_node): concurrent_publish + peer_load round-trip. concurrent_publish covers back-to-back, parallel, and 10-way parallel divergent publishers, asserting every writer's changes are reachable in the latest published TN. peer_load proves the read side: device A publishes via TaskExecutor, then a fresh code path (no daemon, no per-device state) loads A's snapshot via load_peer_snapshot and surfaces it through MergedView. Drops three legacy tests that depended on the pre-v2 s5_fs::sync surface; their multi-device coverage is replaced by the async_relay test landing in the next PR.

Validation

cargo test --workspace --lib --tests   # every crate "ok.", no FAILED
cargo test --test concurrent_publish   # 10-way convergence
cargo test --test peer_load            # peer-side load via relay

Breaking changes

  • Registry wire byte 0x01 → 0x5c. Old bytes rejected on parse.
  • NodeConfigRegistry::{Remote, Tee} removed (peer-tied transport).

Commits

feat(s5_core)!: v3 registry wire format

Per-vault namespace tag on registry entries:

  TYPE(0x5c) | KEYTYPE(0xed) | PUBKEY | VAULT_ID(16) |
  REVISION(8 BE) | LEN | PAYLOAD | SIG

Lookup is `(pubkey, vault_id)` so one device-signing-key signs across
every vault. The 16-byte vault_id derives from the vault root's
recovery_secret (set up on the FS5 side).
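A minimal encode/decode sketch of this layout. It assumes a 32-byte Ed25519 PUBKEY, a 64-byte SIG and a one-byte LEN, none of which is spelled out above. RegistryEntryV3 and take are hypothetical names, and signature verification is left out:

```rust
/// Type byte for v3 registry entries; v1 entries (0x01) are rejected on parse.
pub const TYPE_REGISTRY_V3: u8 = 0x5c;
/// Key-type byte for Ed25519 public keys.
pub const KEYTYPE_ED25519: u8 = 0xed;

/// Hypothetical in-memory form of a v3 registry entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RegistryEntryV3 {
    pub pubkey: [u8; 32],   // device signing key (assumed 32 bytes)
    pub vault_id: [u8; 16], // per-vault namespace tag
    pub revision: u64,      // encoded as 8 bytes big-endian
    pub payload: Vec<u8>,   // prefixed by a one-byte LEN (assumption)
    pub sig: [u8; 64],      // Ed25519 signature (not verified here)
}

/// Split the first N bytes off `cur`, returning an error instead of panicking on short input.
fn take<const N: usize>(cur: &mut &[u8]) -> Result<[u8; N], String> {
    if cur.len() < N {
        return Err("insufficient bytes for deserialization".to_string());
    }
    let (head, rest) = cur.split_at(N);
    *cur = rest;
    let mut out = [0u8; N];
    out.copy_from_slice(head);
    Ok(out)
}

impl RegistryEntryV3 {
    /// Lookup key: one signing key serves many vaults, disambiguated by vault_id.
    pub fn lookup_key(&self) -> ([u8; 32], [u8; 16]) {
        (self.pubkey, self.vault_id)
    }

    pub fn encode(&self) -> Result<Vec<u8>, String> {
        if self.payload.len() > u8::MAX as usize {
            return Err("payload too long for a one-byte LEN".to_string());
        }
        let mut out = Vec::with_capacity(2 + 32 + 16 + 8 + 1 + self.payload.len() + 64);
        out.push(TYPE_REGISTRY_V3);
        out.push(KEYTYPE_ED25519);
        out.extend_from_slice(&self.pubkey);
        out.extend_from_slice(&self.vault_id);
        out.extend_from_slice(&self.revision.to_be_bytes());
        out.push(self.payload.len() as u8);
        out.extend_from_slice(&self.payload);
        out.extend_from_slice(&self.sig);
        Ok(out)
    }

    pub fn decode(bytes: &[u8]) -> Result<Self, String> {
        let mut cur = bytes;
        let [ty] = take::<1>(&mut cur)?;
        if ty != TYPE_REGISTRY_V3 {
            return Err(format!("unsupported registry type byte {:#04x}", ty));
        }
        let [key_type] = take::<1>(&mut cur)?;
        if key_type != KEYTYPE_ED25519 {
            return Err(format!("unsupported key type {:#04x}", key_type));
        }
        let pubkey = take::<32>(&mut cur)?;
        let vault_id = take::<16>(&mut cur)?;
        let revision = u64::from_be_bytes(take::<8>(&mut cur)?);
        let [len] = take::<1>(&mut cur)?;
        let len = len as usize;
        if cur.len() != len + 64 {
            return Err("payload/signature length mismatch".to_string());
        }
        let (payload, sig_bytes) = cur.split_at(len);
        let mut sig = [0u8; 64];
        sig.copy_from_slice(sig_bytes);
        Ok(Self { pubkey, vault_id, revision, payload: payload.to_vec(), sig })
    }
}
```

Decoding rejects a leading 0x01, which matches the breaking-change note that old bytes fail to parse.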

`StreamKey::PublicKeyEd25519` stays for non-vault callers
(DirActor, bindings) — their data round-trips unchanged.

`should_store` becomes strict-CAS: same-revision-different-hash is
rejected so a concurrent writer reliably detects "I lost the race"
via verify-after-set. The Ord tie-break is unchanged for read-side
reconciliation across replicas.
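The write-side and read-side rules can be sketched side by side. Candidate, this should_store signature and reconcile are illustrative stand-ins, not the s5_core API, and the 32-byte hash is an assumption. One plausible reading of why strict-CAS matters: under an Ord-based `incoming > existing` rule, a later same-revision write with a larger hash could replace an entry whose writer had already verified it and moved on.

```rust
/// Illustrative registry candidate; the derived Ord compares revision first, then hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Candidate {
    pub revision: u64,
    pub hash: [u8; 32],
}

/// Write side (strict CAS): a stored revision is final. Only a strictly higher
/// revision replaces it, so same-revision-different-hash is rejected and the
/// losing writer sees the winner's hash when it reads back.
/// (An identical re-send is refused too; how the real code treats it is not stated.)
pub fn should_store(existing: Option<&Candidate>, incoming: &Candidate) -> bool {
    match existing {
        None => true,
        Some(current) => incoming.revision > current.revision,
    }
}

/// Read side (unchanged): disagreeing replicas are reconciled with the Ord
/// tie-break, i.e. highest revision, then highest hash.
pub fn reconcile(replicas: &[Candidate]) -> Option<Candidate> {
    replicas.iter().copied().max()
}
```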

`MULTIHASH_BLAKE3` is re-exported from `s5_core` so registry
payloads and BlobIds spell the same byte the same way.

Assisted-by: claude-opus-4-7

fix(s5_store_local): atomic put_bytes via tmp+rename

The previous `tokio::fs::write` opens with O_TRUNC then streams,
leaving an empty/partial file in the window between truncate and
write completion. Concurrent readers under registry workloads that
do get-after-set saw "insufficient bytes for deserialization" the
moment 5+ writers raced.

Switch to write-tmp + rename. Tmp suffix combines pid (cross-process
uniqueness) with a process-wide atomic counter (intra-process
uniqueness — two concurrent put_bytes calls in the same process
otherwise race on a shared tmp path).
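Below is a blocking std::fs sketch of the pattern. The previous code path used tokio::fs; std keeps the example self-contained. put_bytes_atomic, tmp_path_for and write_tmp are hypothetical names, and the sync_all call is an extra durability step the commit does not mention. The tmp file is created next to the target because rename only replaces a file atomically within one filesystem:

```rust
use std::ffi::OsString;
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter: two concurrent calls in one process never share a tmp path.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// `<file>.tmp.<pid>.<n>` next to the target, so the rename stays on one filesystem.
fn tmp_path_for(path: &Path) -> PathBuf {
    let n = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let mut name: OsString = path.file_name().map(|s| s.to_os_string()).unwrap_or_default();
    name.push(format!(".tmp.{}.{}", std::process::id(), n));
    path.with_file_name(name)
}

fn write_tmp(tmp: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut file = fs::File::create(tmp)?;
    file.write_all(bytes)?;
    file.sync_all() // extra durability step, not described in the commit
}

/// Readers see either the old file or the complete new one, never a truncated one.
pub fn put_bytes_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = tmp_path_for(path);
    if let Err(e) = write_tmp(&tmp, bytes).and_then(|()| fs::rename(&tmp, path)) {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of an orphaned tmp file
        return Err(e);
    }
    Ok(())
}
```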

Assisted-by: claude-opus-4-7

feat(s5_fs_v2): ReadableLayer + Pipeline + MergedView + WritableOverlay

Composable per-layer/per-blob primitives so writable mounts,
multi-peer merges, and concurrent publishes compose cleanly.

ReadableLayer trait: Snapshot, MergedView, and WritableOverlay all
implement the same `(get, get_raw, scan, chunk_mask)` contract.
chunk_mask has a default; Snapshot reads from BuildContext,
MergedView delegates to layer 0, WritableOverlay to its base.

Pipeline (new): per-blob ops machinery (encrypt/decompress/upload,
walk_byte_stream, child_for). Lifted out of Snapshot so WritableFs
and ingest can hold the per-blob ops without a full Snapshot.
NodeCache wraps the shared decoded-node cache.

MergedView: k-way priority merge over Vec<Arc<dyn ReadableLayer>>.
Per-stream-exhaustion flag fixes the unfold re-poll panic that
fired when MergedView was used downstream of WritableOverlay.
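The priority merge can be sketched with a simplified, synchronous stand-in for ReadableLayer. It uses owned byte keys and values, has no get_raw and no tombstones, and gives chunk_mask a placeholder default. Here std::iter::from_fn plays the role of the async unfold. Layer 0 wins on duplicate keys, and the per-stream exhausted flags keep a drained layer from being polled again. Pair, MapLayer and DEFAULT_CHUNK_MASK are hypothetical:

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

type Pair = (Vec<u8>, Vec<u8>);
type PairIter<'a> = Box<dyn Iterator<Item = Pair> + 'a>;

/// Placeholder default; the real default value is not given in the PR.
const DEFAULT_CHUNK_MASK: u64 = (1 << 16) - 1;

/// Simplified stand-in for ReadableLayer.
pub trait ReadableLayer {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Entries with key >= start, in ascending key order.
    fn scan(&self, start: &[u8]) -> PairIter<'_>;
    fn chunk_mask(&self) -> u64 {
        DEFAULT_CHUNK_MASK
    }
}

/// Toy in-memory layer used to exercise MergedView.
pub struct MapLayer(pub BTreeMap<Vec<u8>, Vec<u8>>);

impl ReadableLayer for MapLayer {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn scan(&self, start: &[u8]) -> PairIter<'_> {
        Box::new(self.0.range(start.to_vec()..).map(|(k, v)| (k.clone(), v.clone())))
    }
}

/// k-way priority merge: layers[0] has the highest priority.
pub struct MergedView {
    pub layers: Vec<Arc<dyn ReadableLayer>>,
}

impl ReadableLayer for MergedView {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.layers.iter().find_map(|layer| layer.get(key))
    }

    fn scan(&self, start: &[u8]) -> PairIter<'_> {
        let mut streams: Vec<_> = self.layers.iter().map(|l| l.scan(start).peekable()).collect();
        let mut exhausted = vec![false; streams.len()];
        Box::new(std::iter::from_fn(move || {
            // Find the smallest pending key; strict `<` keeps the lowest layer index on ties.
            let mut best: Option<(usize, Vec<u8>)> = None;
            for (i, stream) in streams.iter_mut().enumerate() {
                if exhausted[i] {
                    continue; // never re-poll a drained stream
                }
                match stream.peek() {
                    None => exhausted[i] = true,
                    Some((key, _)) => {
                        if best.as_ref().map_or(true, |(_, best_key)| key < best_key) {
                            best = Some((i, key.clone()));
                        }
                    }
                }
            }
            let (winner, key) = best?;
            // Emit the winner's entry and skip the entries it shadows in lower layers.
            let mut emitted = None;
            for (i, stream) in streams.iter_mut().enumerate() {
                if !exhausted[i] && stream.peek().map_or(false, |(k, _)| *k == key) {
                    let entry = stream.next();
                    if i == winner {
                        emitted = entry;
                    }
                }
            }
            emitted
        }))
    }

    fn chunk_mask(&self) -> u64 {
        // MergedView delegates to layer 0.
        self.layers.first().map_or(DEFAULT_CHUNK_MASK, |l| l.chunk_mask())
    }
}
```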

WritableOverlay: owns base + Pipeline so callers reach for both via
one Arc. Exposes `flush(store)` — the canonical "fold entries into
a fresh prolly tree on top of the base" op, used by both rw mounts
and ingest. Two-way merge in scan() got the same exhaustion fix.

Snapshot becomes thinner: per-blob methods delegate to as_pipeline().
KEY_SLOT_RECOVERY (0x12) added — per-vault recovery_secret slot
from which vault_id and recovery_signing_key derive.

BlobPipeline.skip_when_unhelpful is per-pipeline (in
TraversalContext) so the policy travels with the encoding
definition and propagates correctly through merge_contexts.

Assisted-by: claude-opus-4-7

feat(s5_node): publish convergence loop + peer-snapshot loading

`run_publish` is now a verify-and-retry loop. Each attempt:

  1. Fetch the latest registry-published TN.
  2. If our local TN's tree differs, merge_and_persist both into the
     union (changes win on collision; tombstones resolve normally).
  3. Encrypt + upload the merged TN.
  4. Sign + registry.set at prev_revision + 1.
  5. Read back: if registry holds our hash, we won. Otherwise loop
     with jittered exponential backoff (max 16 attempts).

Combined with strict-CAS in s5_core and atomic put_bytes in
s5_store_local (this PR), concurrent rw-mount-flush + snap + parallel
snaps all converge to the union without per-vault locking. See
tests/concurrent_publish.rs.
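The loop can be sketched in a single process under these simplifications:

- A BTreeSet<String> stands in for the prolly tree, so the merge is a plain set union and tombstones are not modeled.
- An in-memory strict-CAS slot stands in for the registry.
- Encrypt/upload is elided.

MemRegistry, Entry, tree_hash, backoff and the 1 to 32 ms base backoff are hypothetical. Returning early when the fetched TN already contains everything local is an assumption about the "tree differs" check in step 2:

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::BTreeSet;
use std::hash::{BuildHasher, Hash, Hasher};
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

const MAX_ATTEMPTS: u32 = 16;

/// Published state: revision, content hash, and the tree it names.
#[derive(Clone, Debug)]
pub struct Entry {
    pub revision: u64,
    pub hash: u64,
    pub tree: BTreeSet<String>,
}

/// In-memory registry slot for one (pubkey, vault_id). The mutex only makes
/// each get/set atomic, like a single registry operation; publishers never
/// hold it across the read-merge-write cycle.
#[derive(Default)]
pub struct MemRegistry {
    slot: Mutex<Option<Entry>>,
}

impl MemRegistry {
    pub fn get(&self) -> Option<Entry> {
        self.slot.lock().unwrap().clone()
    }

    /// Strict CAS: only a strictly higher revision is stored.
    pub fn set(&self, entry: Entry) -> bool {
        let mut slot = self.slot.lock().unwrap();
        let accept = slot.as_ref().map_or(true, |cur| entry.revision > cur.revision);
        if accept {
            *slot = Some(entry);
        }
        accept
    }
}

fn tree_hash(tree: &BTreeSet<String>) -> u64 {
    let mut h = DefaultHasher::new();
    tree.hash(&mut h);
    h.finish()
}

/// Exponential backoff (base capped at 32 ms) plus up to 100% random jitter.
fn backoff(attempt: u32) -> Duration {
    let base_ms = 1u64 << attempt.min(5);
    let mut h = RandomState::new().build_hasher();
    h.write_u32(attempt);
    Duration::from_millis(base_ms + h.finish() % (base_ms + 1))
}

/// Returns the zero-based attempt that converged, or an error after MAX_ATTEMPTS.
pub fn run_publish(registry: &MemRegistry, local: &BTreeSet<String>) -> Result<u32, String> {
    for attempt in 0..MAX_ATTEMPTS {
        // 1. Fetch the latest published state.
        let latest = registry.get();
        let prev_revision = latest.as_ref().map_or(0, |e| e.revision);
        // 2. Merge local and latest into their union.
        let mut merged = local.clone();
        if let Some(e) = &latest {
            merged.extend(e.tree.iter().cloned());
        }
        let hash = tree_hash(&merged);
        if latest.as_ref().map_or(false, |e| e.hash == hash) {
            return Ok(attempt); // latest already contains everything local (assumption)
        }
        // 3. Encrypt + upload would happen here (elided).
        // 4. Set at prev_revision + 1; step 5 decides the outcome, not this result.
        let _ = registry.set(Entry { revision: prev_revision + 1, hash, tree: merged });
        // 5. Read back: our hash means we won; otherwise back off and retry.
        if registry.get().map_or(false, |e| e.hash == hash) {
            return Ok(attempt);
        }
        thread::sleep(backoff(attempt));
    }
    Err(format!("publish did not converge after {} attempts", MAX_ATTEMPTS))
}
```

In this sketch each writer lands at most one successful set, and every failed set is caused by another writer's success. N concurrent writers therefore need at most N + 1 attempts each, so eight writers finish well inside the 16-attempt budget.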

`peer_load::load_peer_snapshot` (new): given a peer's
device_signing_pubkey + the shared vault_id, fetch their published
TN, age-decrypt with our recipient identity, return a Snapshot
ready to drop into a MergedView. Primitive behind live multi-peer
mounts.

Recovery derivation chain: KEY_SLOT_RECOVERY → vault_id (16-byte
truncated blake3) and recovery_signing_key (Ed25519). The recovery
entry maps recovery_pubkey → device_signing_pubkey under the same
vault_id, so a holder of just the paper key can discover the
device's vault registry entries.

verify_recovery_secret_invariant: the recovery_secret in the fetched,
previously published TN must match the local one; a mismatch fails
the publish to prevent splicing in foreign history.

Assisted-by: claude-opus-4-7

feat(s5_node, s5_node_api)!: M3 vault config schema + Result-shaped RPC

NodeConfigVault gets the M3 schema:

  - recipients: Vec<String>   — [key.*] names the published TN is
    age-encrypted to
  - sources: Vec<String>      — [source.*] names this vault ingests
  - blob_stores: Vec<String>  — read-fallback chain for file content
  - meta_targets: Vec<String> — relay destinations for the encrypted
    TN. Distinct from blob_stores by design — implicit fallback would
    leak vault structure to backends meant only for opaque content.
  - plaintext_tree: bool      — store FS5 tree nodes in plaintext
    (for content-store interop, e.g. Hugging Face Xet). The
    published TN is still age-encrypted to recipients.
  - watch: bool               — flag for the future inotify path.

Drops the placeholder `peers: Vec<String>` field.

NodeConfigRegistry loses the `Remote` and `Tee` variants — they
wrapped the old peer-tied registry transport. Local, Redb, Memory,
Store(name), and Multi(backends, write_policy) remain.

config.validate() does cross-reference validation at config-load
time so missing recipients/sources/stores/keys surface as a startup
error rather than as an opaque task failure.
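A sketch of the cross-reference check follows. Only the NodeConfigVault field names come from the schema above. The following are hypothetical:

- NodeConfig and its keys/sources/stores/vaults fields
- the error strings
- the premise that meta_targets name entries in the same store table

meta_targets are validated as their own list, with no fallback to blob_stores:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Field names follow the M3 schema; serde derives are omitted.
pub struct NodeConfigVault {
    pub recipients: Vec<String>,   // [key.*] names the published TN is encrypted to
    pub sources: Vec<String>,      // [source.*] names this vault ingests
    pub blob_stores: Vec<String>,  // read-fallback chain for file content
    pub meta_targets: Vec<String>, // relay destinations for the encrypted TN
    pub plaintext_tree: bool,
    pub watch: bool,
}

/// Hypothetical top-level config: the names each section defines, plus the vaults.
pub struct NodeConfig {
    pub keys: BTreeSet<String>,
    pub sources: BTreeSet<String>,
    pub stores: BTreeSet<String>,
    pub vaults: BTreeMap<String, NodeConfigVault>,
}

impl NodeConfig {
    /// Collect every dangling reference so startup fails with all of them at once.
    pub fn validate(&self) -> Result<(), Vec<String>> {
        let mut errors = Vec::new();
        for (name, vault) in &self.vaults {
            let checks = [
                ("recipient key", &vault.recipients, &self.keys),
                ("source", &vault.sources, &self.sources),
                ("blob store", &vault.blob_stores, &self.stores),
                // meta_targets are checked as their own list; nothing here
                // lets them fall back to blob_stores.
                ("meta target", &vault.meta_targets, &self.stores),
            ];
            for (kind, refs, defined) in checks {
                for r in refs {
                    if !defined.contains(r) {
                        errors.push(format!("vault '{}': unknown {} '{}'", name, kind, r));
                    }
                }
            }
        }
        if errors.is_empty() { Ok(()) } else { Err(errors) }
    }
}
```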

RPC responses become tagged enums where they previously used
sentinels. `RunTaskResponse` is `Spawned { task_id, spec_json }` or
`Refused { error }`; the `S5NodeClient::run_task` wrapper flattens
to `Result<SpawnedTask>`. Replaces the prior `task_id == 0`
sentinel — daemon-side dispatch failures now come back as a real
Err with the actual error string.
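A sketch of the tagged response and its flattening. The u64 task_id, the String error type and the into_result helper are assumptions. The serde tagging that the RPC presumably uses is omitted:

```rust
/// Daemon reply to a run-task request.
#[derive(Debug, Clone, PartialEq)]
pub enum RunTaskResponse {
    Spawned { task_id: u64, spec_json: String },
    Refused { error: String },
}

/// What a caller gets back on success.
#[derive(Debug, Clone, PartialEq)]
pub struct SpawnedTask {
    pub task_id: u64,
    pub spec_json: String,
}

impl RunTaskResponse {
    /// Flatten to a Result so a refusal surfaces as an Err carrying the daemon's
    /// message, instead of a task_id == 0 sentinel the caller must remember to check.
    pub fn into_result(self) -> Result<SpawnedTask, String> {
        match self {
            RunTaskResponse::Spawned { task_id, spec_json } => Ok(SpawnedTask { task_id, spec_json }),
            RunTaskResponse::Refused { error } => Err(error),
        }
    }
}
```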

Assisted-by: claude-opus-4-7

chore: adapt consumers to v3 wire format + new APIs

Tracks the StreamKey API change (storage_key/from_storage_key
replaces to_bytes/from_bytes), the WritableOverlay::new signature
(Box<dyn ReadableLayer>), and the removal of
NodeConfigRegistry::Remote/Tee:

- bindings/{s5_flutter, s5_wasm}: switch from StreamKey::Vault
  placeholder to legacy PublicKeyEd25519 for non-vault use.
- s5_fs/src/actor{, /persistence}: same — DirActor is non-vault.
- registries/{redb, store}: storage_key() for the backend key
  encoding (uniform across all StreamKey variants).
- ingest/local/src/backup: WritableOverlay::new takes Box.
- s5_cli/{main, cmd/{blobs, mount, snapshots}}: drop peer-tied
  operations that referenced the removed registry variants.
- vup_cli/src/recovery: tests updated to the new
  recovery_signing_key signature (recovery_secret bytes, not
  age_secret + vault_name).

Assisted-by: claude-opus-4-7

test(s5_node): concurrent_publish + peer_load round-trip

Two new integration tests covering the foundation work:

- concurrent_publish: back-to-back, parallel, and 10-way parallel
  divergent publishers all converge to the union of every writer's
  changes. Demonstrates the convergence loop + strict-CAS + atomic
  put_bytes interplay end-to-end against MultiRegistry over a real
  LocalStore relay.

- peer_load: device A publishes via TaskExecutor + TaskSpec::Backup,
  then a fresh code path (no daemon, no per-device state) calls
  load_peer_snapshot against the same relay store + a fresh
  MultiRegistry, and verifies that the produced Snapshot's tree
  surfaces through MergedView.

local_links_serve was carried forward from the working tree
(absent on upstream/main) — exercises the pinned-link store path.

Drops three legacy tests (fs_sync_complete, fs_sync_large_blob,
workflows) that depended on the pre-v2 s5_fs sync surface and the
removed s5_node::sync module. Their multi-device-sync coverage is
replaced by the async_relay test landing in PR B.

Assisted-by: claude-opus-4-7
redsolver merged commit 104fdd3 into main on Apr 27, 2026
1 of 2 checks passed