peer-sync metadata integrity: watermark/correlation history (H6) + folder-Merkle self-check (M2)#87
Merged
Merged
Conversation
peer_sync_state is overwrite-in-place by design — UpsertPeerSyncState upserts, so the live row only ever shows the current watermark. Add a v12 migration creating an insert-only peer_sync_state_history table that preserves every watermark advance, so a bad or hostile watermark move can be detected and the drift anchor reconstructed (SAFETY-AUDIT H6). The (volume_id, peer_node_id) index backs the per-pair history read; the pair is high-cardinality across a fleet so a plain composite index is appropriate rather than a partial one. SchemaVersion bumps 11 -> 12 and the migration appends sequentially after v11 in the registry. https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
UpsertPeerSyncState now writes the upsert and an insert-only peer_sync_state_history row in one transaction, so the append-only watermark log can never diverge from the live row. It also reads the current watermark first and refuses a backwards move (new < current) with a structured *WatermarkRewindError (wrapping the ErrWatermarkRewind sentinel) unless a new allowRewind parameter is set — the opt-in for genuine recovery (SAFETY-AUDIT H6). The opt-in is a store-layer bool rather than a CLI --allow-rewind flag: the only production caller is the receiver's successful-close path in agent/sync.go, which advances monotonically and passes false. A dedicated CLI surface would be disproportionate for a parameter no operator flow sets today. Existing call sites (and tests) pass false; ListPeerSyncStateHistory reads the per-pair log oldest-first. https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
SetCorrelatedRunID overwrites runs.correlated_run_id in place. Wrap the update and a 'set-correlated-run-id' runs_audit row in one transaction so the overwrite-in-place column gains an append-only trail of every value it held; the note carries the old->new ids (prior NULL rendered as "none"). Reuses #78's appendRunAuditTx helper and adds the exported TransitionSetCorrelatedRunID constant alongside the existing transition kinds (SAFETY-AUDIT H6). https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
New `peer-sync` parent command namespacing node-sync forensics, with a `history` subcommand (its own file) that lists the append-only watermark transition log for one (volume, peer) pair, oldest first. This is the read surface for the peer_sync_state_history table — the live peer_sync_state row only shows the current watermark (SAFETY-AUDIT H6). https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
CheckFolderHashes re-derives every folder's shallow and deep BLAKE3 for a volume from the current file/child rows and compares each to the stored shallow_blake3/deep_blake3, returning one FolderHashDivergence per folder that disagrees. It reuses the steady-state recompute helpers (no hashing reimplemented) and runs read-only in one transaction. `squirrel audit --folders` drives this with no disk walk and no run row: it prints every divergence (folder path, stored vs derived) and exits non-zero if any folder diverged, or "consistent" and zero otherwise. The flag is mutually exclusive with --deep (SAFETY-AUDIT M2). https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
Store tests: UpsertPeerSyncState appends one history row per advance and the live row tracks only the latest; a backwards move is refused with a *WatermarkRewindError (errors.Is/As) leaving the row and history untouched, and succeeds under allowRewind; SetCorrelatedRunID appends a 'set-correlated-run-id' row whose note carries none->value then value->value; the v11->v12 migration creates peer_sync_state_history. CLI tests: `audit --folders` reports "consistent" on a clean volume and detects a corrupted stored shallow hash (exiting non-zero with the divergence detail); `peer-sync history` lists seeded watermark transitions and fails fast on an unknown volume. https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
There was a problem hiding this comment.
Pull request overview
This pull request addresses SAFETY-AUDIT issue #77 by hardening peer-sync metadata integrity (append-only watermark/correlation trails + monotonicity guard) and adding an index-only self-check to detect folder Merkle hash drift.
Changes:
- Add append-only history for peer-sync watermark advances (
peer_sync_state_history) and enforce monotonic watermark updates by default (rewind requires opt-in). - Append
runs_auditentries when stampingruns.correlated_run_id, preserving an append-only trail for that overwrite-in-place column. - Add
squirrel audit --folders(folder Merkle re-derivation + divergence reporting) andsquirrel peer-sync history(watermark history listing), with tests and migration v12.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| sync/node_test.go | Updates watermark upsert call sites for new allowRewind parameter. |
| store/runs.go | Wraps correlated-run-id updates in a tx and appends a runs_audit transition. |
| store/runs_audit.go | Adds TransitionSetCorrelatedRunID constant for correlation stamping audit entries. |
| store/peer_sync.go | Adds rewind-guard error type, history row type, history listing API, and tx+history write on upsert. |
| store/peer_sync_test.go | Adds unit/migration tests for watermark history, rewind guard, and correlated-run-id audit trail. |
| store/nodes_test.go | Updates watermark upsert call sites for new allowRewind parameter. |
| store/migrations.go | Bumps schema to v12 and adds migration creating peer_sync_state_history. |
| store/folder_check.go | Adds store-level folder hash self-check (CheckFolderHashes). |
| cmd/squirrel/root.go | Registers new peer-sync command namespace. |
| cmd/squirrel/peer_sync.go | Adds squirrel peer-sync parent command. |
| cmd/squirrel/peer_sync_history.go | Adds squirrel peer-sync history <volume> <peer> command implementation. |
| cmd/squirrel/peer_sync_history_test.go | Adds CLI tests for peer-sync history. |
| cmd/squirrel/audit.go | Adds audit --folders mode (index-only folder Merkle self-check), mutually exclusive with --deep. |
| cmd/squirrel/audit_test.go | Adds CLI tests for audit --folders including a corruption-detection fixture. |
| agent/sync.go | Advances peer-sync watermark with allowRewind=false on successful close. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+1279
to
+1281
| // the same NowNs() tick still order deterministically by id. last_shared | ||
| // _run_id is nullable for the same reason it is on peer_sync_state: the | ||
| // first contact carries no shared run id. Like runs_audit, rows are |
Comment on lines
+183
to
+186
| // TestMigrateV11ToV12CreatesPeerSyncHistory seeds a v11-shape database by | ||
| // hand (the schema the v12 migration builds on: volumes, nodes, | ||
| // peer_sync_state, runs_audit, schema_version=11), opens it to trigger | ||
| // the v11→v12 step, and asserts the peer_sync_state_history table exists |
Comment on lines
+29
to
+30
| if !strings.Contains(out, "7") || !strings.Contains(out, "42") { | ||
| t.Fatalf("history output missing watermark values:\n%s", out) |
- migrations.go: fix a doc-comment line break that split last_shared_run_id across lines (rendered as "last_shared _run_id"). - peer_sync_test.go: correct the fixture comment — v11SchemaDDL seeds a minimal schema (schema_version, volumes, nodes, peer_sync_state); it does not create runs_audit, so the comment no longer claims it. - peer_sync_history_test.go: tighten the watermark assertion to check the last column per data row (oldest-first [7, 42]) instead of a bare Contains, which could match digits inside the RFC3339 timestamps. https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #77 (SAFETY-AUDIT findings H6 + M2).
H6 — append-only watermark & correlation history
peer_sync_state_historytable (migration v12). EveryUpsertPeerSyncStatewrites the upsert and a history row in one transaction, so a watermark can never be silently overwritten.UpsertPeerSyncStategains anallowRewind bool: a non-monotonic move (new < current) is refused with a structured*WatermarkRewindError(matchable viaerrors.Is/errors.As). The production close path advances monotonically and passesfalse; a NULL/absent watermark imposes no floor. No CLI--allow-rewindsurface (no operator flow needs it today).SetCorrelatedRunIDwrites aruns_auditrow, reusing runs: don't let the runs log lie — guard terminal-state overwrite (H2) + distinguish manual-fail (H3) #78'sappendRunAuditTx+ new constantTransitionSetCorrelatedRunID— no parallel audit table.squirrel peer-sync history <volume> <peer>lists watermark transitions.M2 — folder Merkle self-check
squirrel audit --foldersre-derives each folder's shallow/deep BLAKE3 from the live file/child rows (reusingstore/folders.gohelpers, read-only — no hashing reimplemented) and reports divergences from the stored values, exiting non-zero on any. Mutually exclusive with--deep.Verify
Migration chain v10→v11→v12 passes;
go vet ./...+go test ./...green, gofmt clean. Verified locally with rclone v1.74.1 onPATHso the node integration tests actually execute (they skip silently without it).https://claude.ai/code/session_01GV1qgPE2YcUnuoLvy566kj
Generated by Claude Code