Skip to content

peer-sync metadata integrity: watermark/correlation history (H6) + folder-Merkle self-check (M2)#87

Merged
mbertschler merged 7 commits into
mainfrom
claude/77-peersync-metadata
Jun 4, 2026
Merged

peer-sync metadata integrity: watermark/correlation history (H6) + folder-Merkle self-check (M2)#87
mbertschler merged 7 commits into
mainfrom
claude/77-peersync-metadata

Conversation

@mbertschler
Copy link
Copy Markdown
Owner

Closes #77 (SAFETY-AUDIT findings H6 + M2).

Stacked on #78 (claude/78-runs-audit) so the diff here is only #77's delta. I'll retarget the base to main once #78 merges (then the Closes keyword fires).

H6 — append-only watermark & correlation history

  • New insert-only peer_sync_state_history table (migration v12). Every UpsertPeerSyncState writes the upsert and a history row in one transaction, so a watermark can never be silently overwritten.
  • UpsertPeerSyncState gains an allowRewind bool: a non-monotonic move (new < current) is refused with a structured *WatermarkRewindError (matchable via errors.Is/errors.As). The production close path advances monotonically and passes false; a NULL/absent watermark imposes no floor. No CLI --allow-rewind surface (no operator flow needs it today).
  • SetCorrelatedRunID writes a runs_audit row, reusing runs: don't let the runs log lie — guard terminal-state overwrite (H2) + distinguish manual-fail (H3) #78's appendRunAuditTx + new constant TransitionSetCorrelatedRunID — no parallel audit table.
  • New squirrel peer-sync history <volume> <peer> lists watermark transitions.

M2 — folder Merkle self-check

  • squirrel audit --folders re-derives each folder's shallow/deep BLAKE3 from the live file/child rows (reusing store/folders.go helpers, read-only — no hashing reimplemented) and reports divergences from the stored values, exiting non-zero on any. Mutually exclusive with --deep.

Verify

Migration chain v10→v11→v12 passes; go vet ./... + go test ./... green, gofmt clean. Verified locally with rclone v1.74.1 on PATH so the node integration tests actually execute (they skip silently without it).

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj


Generated by Claude Code

claude added 6 commits June 3, 2026 22:57
peer_sync_state is overwrite-in-place by design — UpsertPeerSyncState
upserts, so the live row only ever shows the current watermark. Add a
v12 migration creating an insert-only peer_sync_state_history table that
preserves every watermark advance, so a bad or hostile watermark move
can be detected and the drift anchor reconstructed (SAFETY-AUDIT H6).

The (volume_id, peer_node_id) index backs the per-pair history read; the
pair is high-cardinality across a fleet so a plain composite index is
appropriate rather than a partial one. SchemaVersion bumps 11 -> 12 and
the migration appends sequentially after v11 in the registry.

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
UpsertPeerSyncState now writes the upsert and an insert-only
peer_sync_state_history row in one transaction, so the append-only
watermark log can never diverge from the live row. It also reads the
current watermark first and refuses a backwards move (new < current)
with a structured *WatermarkRewindError (wrapping the ErrWatermarkRewind
sentinel) unless a new allowRewind parameter is set — the opt-in for
genuine recovery (SAFETY-AUDIT H6).

The opt-in is a store-layer bool rather than a CLI --allow-rewind flag:
the only production caller is the receiver's successful-close path in
agent/sync.go, which advances monotonically and passes false. A
dedicated CLI surface would be disproportionate for a parameter no
operator flow sets today. Existing call sites (and tests) pass false;
ListPeerSyncStateHistory reads the per-pair log oldest-first.

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
SetCorrelatedRunID overwrites runs.correlated_run_id in place. Wrap the
update and a 'set-correlated-run-id' runs_audit row in one transaction
so the overwrite-in-place column gains an append-only trail of every
value it held; the note carries the old->new ids (prior NULL rendered as
"none"). Reuses #78's appendRunAuditTx helper and adds the exported
TransitionSetCorrelatedRunID constant alongside the existing transition
kinds (SAFETY-AUDIT H6).

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
New `peer-sync` parent command namespacing node-sync forensics, with a
`history` subcommand (its own file) that lists the append-only watermark
transition log for one (volume, peer) pair, oldest first. This is the
read surface for the peer_sync_state_history table — the live
peer_sync_state row only shows the current watermark (SAFETY-AUDIT H6).

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
CheckFolderHashes re-derives every folder's shallow and deep BLAKE3 for
a volume from the current file/child rows and compares each to the stored
shallow_blake3/deep_blake3, returning one FolderHashDivergence per folder
that disagrees. It reuses the steady-state recompute helpers (no hashing
reimplemented) and runs read-only in one transaction.

`squirrel audit --folders` drives this with no disk walk and no run row:
it prints every divergence (folder path, stored vs derived) and exits
non-zero if any folder diverged, or "consistent" and zero otherwise. The
flag is mutually exclusive with --deep (SAFETY-AUDIT M2).

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
Store tests: UpsertPeerSyncState appends one history row per advance and
the live row tracks only the latest; a backwards move is refused with a
*WatermarkRewindError (errors.Is/As) leaving the row and history
untouched, and succeeds under allowRewind; SetCorrelatedRunID appends a
'set-correlated-run-id' row whose note carries none->value then
value->value; the v11->v12 migration creates peer_sync_state_history.

CLI tests: `audit --folders` reports "consistent" on a clean volume and
detects a corrupted stored shallow hash (exiting non-zero with the
divergence detail); `peer-sync history` lists seeded watermark
transitions and fails fast on an unknown volume.

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
Copilot AI review requested due to automatic review settings June 3, 2026 22:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request addresses SAFETY-AUDIT issue #77 by hardening peer-sync metadata integrity (append-only watermark/correlation trails + monotonicity guard) and adding an index-only self-check to detect folder Merkle hash drift.

Changes:

  • Add append-only history for peer-sync watermark advances (peer_sync_state_history) and enforce monotonic watermark updates by default (rewind requires opt-in).
  • Append runs_audit entries when stamping runs.correlated_run_id, preserving an append-only trail for that overwrite-in-place column.
  • Add squirrel audit --folders (folder Merkle re-derivation + divergence reporting) and squirrel peer-sync history (watermark history listing), with tests and migration v12.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
sync/node_test.go Updates watermark upsert call sites for new allowRewind parameter.
store/runs.go Wraps correlated-run-id updates in a tx and appends a runs_audit transition.
store/runs_audit.go Adds TransitionSetCorrelatedRunID constant for correlation stamping audit entries.
store/peer_sync.go Adds rewind-guard error type, history row type, history listing API, and tx+history write on upsert.
store/peer_sync_test.go Adds unit/migration tests for watermark history, rewind guard, and correlated-run-id audit trail.
store/nodes_test.go Updates watermark upsert call sites for new allowRewind parameter.
store/migrations.go Bumps schema to v12 and adds migration creating peer_sync_state_history.
store/folder_check.go Adds store-level folder hash self-check (CheckFolderHashes).
cmd/squirrel/root.go Registers new peer-sync command namespace.
cmd/squirrel/peer_sync.go Adds squirrel peer-sync parent command.
cmd/squirrel/peer_sync_history.go Adds squirrel peer-sync history <volume> <peer> command implementation.
cmd/squirrel/peer_sync_history_test.go Adds CLI tests for peer-sync history.
cmd/squirrel/audit.go Adds audit --folders mode (index-only folder Merkle self-check), mutually exclusive with --deep.
cmd/squirrel/audit_test.go Adds CLI tests for audit --folders including a corruption-detection fixture.
agent/sync.go Advances peer-sync watermark with allowRewind=false on successful close.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread store/migrations.go Outdated
Comment on lines +1279 to +1281
// the same NowNs() tick still order deterministically by id. last_shared
// _run_id is nullable for the same reason it is on peer_sync_state: the
// first contact carries no shared run id. Like runs_audit, rows are
Comment thread store/peer_sync_test.go Outdated
Comment on lines +183 to +186
// TestMigrateV11ToV12CreatesPeerSyncHistory seeds a v11-shape database by
// hand (the schema the v12 migration builds on: volumes, nodes,
// peer_sync_state, runs_audit, schema_version=11), opens it to trigger
// the v11→v12 step, and asserts the peer_sync_state_history table exists
Comment thread cmd/squirrel/peer_sync_history_test.go Outdated
Comment on lines +29 to +30
if !strings.Contains(out, "7") || !strings.Contains(out, "42") {
t.Fatalf("history output missing watermark values:\n%s", out)
- migrations.go: fix a doc-comment line break that split last_shared_run_id
  across lines (rendered as "last_shared _run_id").
- peer_sync_test.go: correct the fixture comment — v11SchemaDDL seeds a
  minimal schema (schema_version, volumes, nodes, peer_sync_state); it does
  not create runs_audit, so the comment no longer claims it.
- peer_sync_history_test.go: tighten the watermark assertion to check the
  last column per data row (oldest-first [7, 42]) instead of a bare
  Contains, which could match digits inside the RFC3339 timestamps.

https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
Base automatically changed from claude/78-runs-audit to main June 4, 2026 10:08
@mbertschler mbertschler merged commit 7c8b952 into main Jun 4, 2026
2 checks passed
@mbertschler mbertschler deleted the claude/77-peersync-metadata branch June 4, 2026 10:09
mbertschler added a commit that referenced this pull request Jun 4, 2026
…ture

agent: fix UpsertPeerSyncState call broken by the #80/#87 merge integration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

peer-sync metadata integrity: append-only watermark/correlation history (H6) + folder-Merkle self-check (M2)

3 participants