
fix: recover safe fleet sync drift #111

Merged
kunchenguid merged 4 commits into main from fm/fleet-stuck-recover-r6
Jun 27, 2026

@kunchenguid (Owner)

Intent

Close a real gap in firstmate's project-clone syncing: a pooled clone that drifts off its default branch was silently skipped forever by both the post-merge teardown sync and the bootstrap fleet-sync, so it fell further behind on every merge with only an easy-to-miss 'skipped' line (a live clone left on a detached HEAD by a stray 'git checkout origin/main' sat 17 commits behind, skipped on every bootstrap).
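The drifted state described above can be observed with read-only git plumbing. The following is a minimal bash sketch, not the code in bin/fm-fleet-sync.sh: `clone_facts` and its key=value output are hypothetical, and it assumes the clone's remote is named origin and has already been fetched.

```shell
# Hypothetical read-only probe; not the code in bin/fm-fleet-sync.sh.
# Assumes REPO has a remote named origin that was already fetched.
# Usage: clone_facts REPO DEFAULT   (prints key=value lines)
clone_facts() {
  local repo=$1 base="origin/$2" branch

  # symbolic-ref fails (quietly, with -q) when HEAD is detached.
  if branch=$(git -C "$repo" symbolic-ref -q --short HEAD); then
    echo "head=branch:$branch"
  else
    echo "head=detached"
  fi

  # Any porcelain status output means uncommitted work
  # (untracked files count too, in this sketch).
  if [[ -n $(git -C "$repo" status --porcelain) ]]; then
    echo "dirty=yes"
  else
    echo "dirty=no"
  fi

  # HEAD holds no unique commits exactly when it is an ancestor of origin/<default>.
  if git -C "$repo" merge-base --is-ancestor HEAD "$base"; then
    echo "head_in_base=yes"
  else
    echo "head_in_base=no"
  fi

  # The N in "N commits behind origin/<default>".
  echo "behind=$(git -C "$repo" rev-list --count "HEAD..$base")"
}
```

A clone left on a detached HEAD by `git checkout origin/main` and then left behind by later merges would report head=detached, head_in_base=yes, and a growing behind count.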

The change is to bin/fm-fleet-sync.sh (plus a small bin/fm-bootstrap.sh relay fix, tests, and an AGENTS.md section 3 doc update), with two behavioral additions on top of the existing fast-forward-when-safe logic; nothing is ever forced, stashed, or discarded.

  1. Auto-recover the one unambiguously safe drift: in sync_project, when the clone is on a detached HEAD (so not on the default branch) AND the working tree is clean AND the detached HEAD is an ancestor of origin/<default> (holds no unique commits) AND no other worktree currently has <default> checked out, re-attach with 'git checkout <default>' and let the existing fast-forward path run. This is reported distinctly as a 'recovered:' line, e.g. 'recover: recovered: re-attached main, synced 0ae3203..09c1b91' in the evidence below. It is a non-destructive checkout of an already-published commit, so it strands nothing, and it self-heals exactly the live 17-behind example.

  2. Keep every unsafe case a skip, but make it loud and quantified instead of a quiet drift: a clone that is dirty, on a non-default named branch, detached with unique commits (HEAD not an ancestor of origin/<default>), or diverged is left untouched and reported as 'STUCK: on ..., N commits behind origin/<default> - needs attention', with N computed via 'git rev-list --count HEAD..origin/<default>'. A chronically stuck clone (growing N) is now visibly distinct from a benign one-off skip.
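The two rules can be read as a small decision table. This is a hypothetical bash sketch rather than the script's real interface: `fleet_sync_action`, `stuck_line`, and their yes/no arguments are illustrative names, and the LOCAL_DEFAULT_OK input models the guard added during review (an existing local <default> must be an ancestor of origin/<default>).

```shell
# Hypothetical sketch of the recover/STUCK rules; not the real script.
# Each argument is the literal string yes or no:
#   $1 ON_DEFAULT        HEAD is the default branch
#   $2 DETACHED          HEAD is detached
#   $3 DIRTY             working tree has uncommitted changes
#   $4 HEAD_IN_BASE      HEAD is an ancestor of origin/<default>
#   $5 DEFAULT_ELSEWHERE <default> is checked out in another worktree
#   $6 LOCAL_DEFAULT_OK  local <default> is absent or an ancestor of origin/<default>
# Prints one of: fastforward, recover, stuck
fleet_sync_action() {
  local on_default=$1 detached=$2 dirty=$3 head_in_base=$4
  local default_elsewhere=$5 local_default_ok=$6

  if [[ $dirty == yes ]]; then
    # Dirty work is never touched, even when the clone is otherwise current.
    echo stuck
  elif [[ $on_default == yes ]] && [[ $head_in_base == yes ]]; then
    # Pre-existing path: clean default branch (a no-op fast-forward if current).
    echo fastforward
  elif [[ $on_default == yes ]]; then
    # Default not an ancestor of origin (diverged, or locally ahead in this sketch).
    echo stuck
  elif [[ $detached == yes ]] && [[ $head_in_base == yes ]] &&
       [[ $default_elsewhere == no ]] && [[ $local_default_ok == yes ]]; then
    # The one safe drift: re-attach, then let the fast-forward path run.
    echo recover
  else
    # Named non-default branch, unique commits, or a busy/unsafe default.
    echo stuck
  fi
}

# Formats the quantified warning; BEHIND would come from
# git rev-list --count HEAD..origin/<default>.
stuck_line() {
  local name=$1 state=$2 behind=$3 default=$4
  printf '%s: STUCK: on %s, %s commits behind origin/%s - needs attention\n' \
    "$name" "$state" "$behind" "$default"
}
```

For example, `fleet_sync_action no yes no yes no yes` prints recover (the live 17-behind case), while the same clone with a unique commit (HEAD_IN_BASE=no) prints stuck.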

Deliberate decisions and constraints, so the diff does not look surprising:

  • Only the safe case (clean, detached, HEAD an ancestor of origin/<default>) auto-recovers. A non-default named branch or any unique or uncommitted work is reported, never auto-changed, because it may hold real work. Preventing the detachment upstream is explicitly out of scope; this task is detect + safe-recover + alarm.
  • I added a 'default_checked_out_elsewhere' guard so we never attempt a checkout git would refuse (default checked out in another worktree); that case is reported STUCK instead.
  • The dirty check now runs once and is cached in a 'dirty' variable, and dirty is escalated from the old 'skipped: dirty working tree' to a STUCK warning, since a dirty firstmate clone is itself an anomaly worth flagging. A dirty clone that is also current still reports STUCK (0 behind) by design.
  • The previously-quiet 'diverged' skip is now also a STUCK report.
  • All pre-existing safety and behavior is deliberately preserved: on-default-and-clean clones still fast-forward exactly as before; local-only / no-origin / not-a-repo / fetch-failure remain benign 'skipped:' lines; gone-branch pruning is unchanged; and both the per-project form (fm-fleet-sync.sh given one project) and the whole-fleet form behave as before.
  • bin/fm-bootstrap.sh relays fleet-sync output as FLEET_SYNC lines, but its case pattern matched only ': skipped:', so the new outcomes would have been dropped. I added ': STUCK:' and ': recovered:' to that case so both flow through. AGENTS.md section 3 (and the bootstrap header comment) now documents the new recovered (self-healed, no action) and STUCK (needs attention) outcomes alongside skipped.
  • Added tests/fm-fleet-sync.test.sh covering: detached-clean-ancestor recovered and fast-forwarded; detached-with-unique-commit reported STUCK and untouched; dirty reported STUCK and untouched; non-default named branch reported STUCK and untouched; diverged reported STUCK and untouched; on-default-clean-behind fast-forwarded; already-current unchanged; no-origin and local-only skipped as before; the whole-fleet form; and a bootstrap integration test asserting STUCK and recovered flow through as FLEET_SYNC lines. Everything is shellcheck-clean, and the full existing suite stays green.
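The bootstrap relay change amounts to widening one `case` pattern. A hypothetical bash sketch of such a filter follows; `relay_fleet_sync` is an illustrative name, not the code in bin/fm-bootstrap.sh, and it assumes fleet-sync output arrives one outcome per line on stdin.

```shell
# Hypothetical relay filter; not the real bin/fm-bootstrap.sh.
# Re-emits only the outcome lines the bootstrap surfaces, prefixed with
# FLEET_SYNC:. Plain "synced" lines match no pattern and are dropped,
# mirroring the original ': skipped:'-only filter plus the two new outcomes.
relay_fleet_sync() {
  local line
  while IFS= read -r line; do
    case $line in
      *': skipped:'* | *': STUCK:'* | *': recovered:'*)
        printf 'FLEET_SYNC: %s\n' "$line"
        ;;
    esac
  done
}
```

In this sketch, piping the whole-fleet output through `relay_fleet_sync` keeps the `recovered:` and `STUCK:` lines and drops the ordinary `synced` line.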

What Changed

  • Adds fleet sync recovery for clean detached project clones whose HEAD is safely behind origin/<default>, reattaching them to the default branch before the existing fast-forward path runs.
  • Escalates unsafe clone states such as dirty worktrees, non-default branches, detached unique commits, and diverged defaults into STUCK reports with behind counts instead of quiet skips.
  • Relays the new fleet sync recovered and STUCK outcomes through bootstrap, documents the behavior, and adds coverage for safe recovery, unsafe stuck states, whole-fleet sync, and bootstrap reporting.

Risk Assessment

✅ Low: Captain, the change is narrowly scoped, keeps unsafe git states read-only, and adds focused coverage for the important detached, dirty, branch, diverged, and bootstrap relay cases.

Testing

Inspected the scoped diff and implementation, ran the focused fleet-sync behavior test, ran every tests/*.test.sh shell test, captured an end-to-end CLI transcript showing safe detached recovery, quantified STUCK warnings, preserved unsafe state, ordinary fast-forward, and bootstrap FLEET_SYNC relay, then confirmed the working tree stayed clean.

Evidence: fleet-sync end-to-end CLI transcript

Key excerpt:
recover: recovered: re-attached main, synced 0ae3203..09c1b91
unique: STUCK: on detached HEAD with unique commits, 1 commits behind origin/main - needs attention
FLEET_SYNC: boot-recover: recovered: re-attached main, synced 057de7a..93f7086
FLEET_SYNC: boot-stuck: STUCK: on branch main with uncommitted changes, 1 commits behind origin/main - needs attention

Fleet sync end-to-end evidence
root: /Users/kunchen/.no-mistakes/worktrees/016d88035d58/01KW5JBQEX9335CC171AT08YH5
fixture: /var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KW5JBQEX9335CC171AT08YH5/fleet-sync-e2e-fixture

Scenario: whole-fleet sync sees safe detached drift, unsafe states, and a normal behind default branch.
$ FM_HOME=/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KW5JBQEX9335CC171AT08YH5/fleet-sync-e2e-fixture/home-fleet FM_ROOT_OVERRIDE=/Users/kunchen/.no-mistakes/worktrees/016d88035d58/01KW5JBQEX9335CC171AT08YH5 bin/fm-fleet-sync.sh
dirty: STUCK: on branch main with uncommitted changes, 1 commits behind origin/main - needs attention
fastforward: synced 5796095..2ec7d70
feature: STUCK: on branch feature, 1 commits behind origin/main - needs attention
recover: recovered: re-attached main, synced 0ae3203..09c1b91
unique: STUCK: on detached HEAD with unique commits, 1 commits behind origin/main - needs attention

Post-state checks after whole-fleet sync:
recover branch: main
recover HEAD equals origin/main: yes
dirty edit preserved: yes
feature branch after sync: feature
unique clone state after sync: detached
fastforward HEAD equals origin/main: yes

Scenario: bootstrap relays recovered and STUCK fleet-sync outcomes as FLEET_SYNC lines.
$ FM_HOME=/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KW5JBQEX9335CC171AT08YH5/fleet-sync-e2e-fixture/home-bootstrap FM_ROOT_OVERRIDE=/Users/kunchen/.no-mistakes/worktrees/016d88035d58/01KW5JBQEX9335CC171AT08YH5 bin/fm-bootstrap.sh | grep '^FLEET_SYNC:'
FLEET_SYNC: boot-recover: recovered: re-attached main, synced 057de7a..93f7086
FLEET_SYNC: boot-stuck: STUCK: on branch main with uncommitted changes, 1 commits behind origin/main - needs attention

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

🔧 **Review** - 1 issue found → auto-fixed ✅
  • 🚨 bin/fm-fleet-sync.sh:198 - When the clone is detached at a clean ancestor of origin/main, this checkout runs before verifying that the local default branch itself is safe. If local main contains unique/diverged commits and is merely not checked out elsewhere, fleet sync will move the pool clone onto that unsafe local main, then report STUCK, violating the intended 'unsafe cases are left untouched' invariant and potentially basing later work on unlanded commits. Check that an existing local $DEFAULT is an ancestor of $BASE before checkout, or report the divergence without switching branches.

🔧 Fix: Guard detached recovery from diverged local defaults
✅ Re-checked - no issues remain.
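The review finding above reduces to two preconditions on `git checkout <default>`. A minimal bash sketch follows, assuming the clone's own HEAD is detached (the only case that is re-attached) and origin is already fetched; `default_safe_to_attach` is a hypothetical name, not the function from the fix commit.

```shell
# Hypothetical guard; not the function from the fix commit.
# default_safe_to_attach REPO DEFAULT
# Succeeds only when `git checkout DEFAULT` in REPO is a safe re-attach.
# Assumes REPO's own HEAD is detached, so any worktree listing
# "branch refs/heads/DEFAULT" must be a different worktree.
default_safe_to_attach() {
  local repo=$1 default=$2 base="origin/$2"

  # 1. git refuses to check out a branch that another worktree holds.
  if git -C "$repo" worktree list --porcelain |
       grep -qxF "branch refs/heads/$default"; then
    return 1
  fi

  # 2. An existing local DEFAULT with commits missing from origin/DEFAULT
  #    (diverged or ahead) must not become the clone's HEAD.
  if git -C "$repo" show-ref --verify --quiet "refs/heads/$default" &&
     ! git -C "$repo" merge-base --is-ancestor "refs/heads/$default" "$base"; then
    return 1
  fi

  # No local DEFAULT yet, or it is an ancestor of origin/DEFAULT: safe.
  return 0
}
```

When this returns non-zero, the sketch's caller would report STUCK and leave the clone where it is instead of switching branches.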

✅ **Test** - passed

✅ No issues found.

  • bash tests/fm-fleet-sync.test.sh
  • for t in tests/*.test.sh; do printf '== %s ==\n' "$t"; bash "$t"; done
  • Manual isolated fleet fixture: FM_HOME=/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KW5JBQEX9335CC171AT08YH5/fleet-sync-e2e-fixture/home-fleet FM_ROOT_OVERRIDE=/Users/kunchen/.no-mistakes/worktrees/016d88035d58/01KW5JBQEX9335CC171AT08YH5 bin/fm-fleet-sync.sh
  • Manual bootstrap relay fixture: FM_HOME=/var/folders/5x/4nqprlbx0518k3ybcb1sz6gr0000gn/T/no-mistakes-evidence/01KW5JBQEX9335CC171AT08YH5/fleet-sync-e2e-fixture/home-bootstrap FM_ROOT_OVERRIDE=/Users/kunchen/.no-mistakes/worktrees/016d88035d58/01KW5JBQEX9335CC171AT08YH5 bin/fm-bootstrap.sh | grep '^FLEET_SYNC:'
  • git status --short

✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

A pooled clone that drifts off its default branch was silently skipped
forever by both the post-merge teardown sync and bootstrap fleet-sync,
falling further behind on every merge with only an easy-to-miss skip line.

- Auto-recover the one safe case: a clean, detached HEAD that is an
  ancestor of origin/<default> and whose <default> is free to check out
  is re-attached and fast-forwarded, reported as 'recovered:'.
- Every other off-default state (dirty, non-default named branch,
  detached with unique commits, diverged) is left untouched and reported
  as a quantified 'STUCK: ... N commits behind ... - needs attention'
  warning instead of a quiet drift. Nothing is forced, stashed, or discarded.
- Relay the new recovered:/STUCK: outcomes through bootstrap FLEET_SYNC
  lines; document both in AGENTS.md section 3.
- Add tests/fm-fleet-sync.test.sh covering recover, every stuck variant,
  ordinary fast-forward, already-current, local-only/no-origin skips, the
  whole-fleet form, and the bootstrap relay.
kunchenguid force-pushed the fm/fleet-stuck-recover-r6 branch from 9e9efd9 to 8d89ec2 on June 27, 2026 23:05
kunchenguid merged commit 0869131 into main on Jun 27, 2026
4 checks passed
kunchenguid deleted the fm/fleet-stuck-recover-r6 branch on June 27, 2026 23:15
leo1oel added a commit to leo1oel/nemo that referenced this pull request Jun 29, 2026
…nes (#14)

Port upstream kunchenguid#111 to the herdr fork. A pooled clone that drifted off its
default branch was silently skipped forever by the per-spawn and post-merge
teardown syncs, falling further behind on every merge with only an easy-to-miss
skip line.

- Auto-recover the one safe case: a clean, detached HEAD that is an ancestor of
  origin/<default> and whose <default> is free to check out is re-attached and
  fast-forwarded, reported as 'recovered:'. Re-attaching to an already-published
  commit strands nothing.
- Every other off-default state (dirty, non-default named branch, detached with
  unique commits, diverged, or <default> checked out in another worktree) is left
  untouched and reported as a quantified 'STUCK: ... N commits behind ... - needs
  attention' warning instead of a quiet drift. Nothing is forced, stashed, or
  discarded.

Also align fm-fleet-sync.sh with the rest of the fork's FM_HOME contract
(AGENTS.md section 2): it now resolves projects/ from FM_HOME (honoring
FM_ROOT_OVERRIDE for bin/), matching fm-project-mode.sh and fm-guard.sh, so a
secondmate home syncs its own clones rather than the primary's. This was a latent
bug; the new whole-fleet test exercises it.
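The commit only states which variable wins for which directory, so the following is a hypothetical bash sketch of that split: `fm_root` and `fm_projects_dir` are illustrative names, the `<root>/bin/<script>` layout is assumed, and the fallback when FM_HOME is unset is an assumption rather than the fork's documented rule.

```shell
# Hypothetical resolver; names, layout, and fallback are assumptions.
# fm_root: directory containing bin/. FM_ROOT_OVERRIDE wins; otherwise the
# parent of this script's directory (assumed layout <root>/bin/<script>).
fm_root() {
  if [[ -n ${FM_ROOT_OVERRIDE:-} ]]; then
    printf '%s\n' "$FM_ROOT_OVERRIDE"
  else
    (cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)
  fi
}

# fm_projects_dir: clones live under $FM_HOME/projects, so a secondmate home
# syncs its own clones; with FM_HOME unset this sketch falls back to the root.
fm_projects_dir() {
  printf '%s/projects\n' "${FM_HOME:-$(fm_root)}"
}
```

The point of the split is that two homes can share one bin/ (via FM_ROOT_OVERRIDE) while each resolves its own projects/.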

Wire the recovered:/STUCK: outcomes into AGENTS.md (prime-directive fleet-sync
exception, teardown, layout) and the docs. Upstream's bootstrap FLEET_SYNC relay
has no analogue here: this fork has no fm-bootstrap; fleet-sync output flows
through fm-spawn and fm-teardown directly.

Add tests/fm-fleet-sync.test.sh (11 cases, self-contained per the fork's harness
convention): recover, every stuck variant, ordinary fast-forward, already-current,
local-only/no-origin benign skips, and the whole-fleet form. Upstream's
bootstrap-relay case is dropped (N/A).