fix(agents): close the SE PR-merge cycle — --auto, self-review fallback, CTO handoff, ci.yml edited trigger#201
Conversation
PR #200 hit "base branch policy prohibits the merge" 25s after the agent woke, because the bare `gh pr merge --squash` ran while `test` (~2min) was still in flight. The agent comment incorrectly diagnosed it as a self-approval block and asked ohld to merge manually. Branch protection on `production` does NOT require pull-request reviews (only `lint` and `test` status checks), so self-approval was never the issue. The actual problem was the merge-vs-CI race; the previous 5-iteration polling loop masked it but was removed in #187, exposing it. Replace the polling loop + bare merge with `gh pr merge --squash --auto`. GitHub now queues the squash-merge and fires it the moment required status checks (`lint`, `test`) pass. No polling, no race, no admin bypass needed. Verification step learns three states: - MERGED: CI was green when --auto ran; immediate merge → done - OPEN + autoMergeRequest != null: queued, GitHub will fire → done - OPEN + autoMergeRequest == null: real failure → blocked Single-shot fast-fail on already-red CI stays (avoids stale "queued" issues when there's nothing left to wait for). Explicitly forbids --admin in the merge instructions to remove the bad runtime suggestion that appeared on PR #200.
|
…ited trigger Three failure modes from this morning's investigation, all upstream of the --auto fix in this PR's first commit: 1. Step 7 told SE to never use `gh pr comment` for review signal, but `gh pr review --approve|--request-changes` always exits non-zero with the self-review block when the gh CLI is authed as the PR author — the dominant case for ohld-authored internal PRs. SE's working fallback (proven across PRs #193, #194, #195, #197, #198) is a prefixed `gh pr comment "STAFF ENGINEER REVIEW: APPROVED|CHANGES REQUESTED — <summary>"`. Document it as the expected path; reserve the "real review required" guidance for external-author PRs. 2. PR #174 sat 9 days because a `gh pr comment` with "Verdict: changes requested" was posted with no Paperclip handoff, so CTO never picked it up. Make `paperclipCreateIssue [pr:NNN] address review changes` MANDATORY whenever changes are requested (either path). 3. Step 8a gated on `reviewDecision == APPROVED`, which never triggers for ohld-authored PRs (self-review block leaves it empty). Re-state as "Step 7 produced an APPROVAL outcome" — covers both real `--approve` and the comment-fallback. Branch protection on `production` requires only `lint+test`, not a formal review, so `--auto` queues regardless. Plus: ci.yml `pull_request` triggers gain `edited` so retargeted PRs fire CI. Today PR #196 retargeted from the deleted `main` branch to `production` and `notify-staff-engineer` fired but `lint+test` did not. Net diff in AGENTS.md: +15/-9 lines. Step 8 (the --auto rewrite from the first commit) is unchanged. Followed Anthropic's Opus 4.7 prompting guidance: positive examples over negation, scope-explicit, no nested "but if" branching.
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED (Posted as comment because P1 — Blocking. I verified against the repo settings:
Fix is one line, not in this PR — flip the repo setting first: ``` Then this PR ships as-is and the race fix actually works. P2 — Non-blocking but real. Queued-merge `done` can strand PRs on later CI failure. In step 8 verification, `OPEN + autoMergeRequest != null` marks the Paperclip issue `done`. If `lint` or `test` later fails after the queue is set, GitHub leaves the PR open and the queue never fires. No synchronize event = no agent re-trigger. The PR sits forever until ohld notices. Mitigations to consider: keep the issue `in_progress` (blocked on a separate poller routine), or at minimum call this risk out in a CTO follow-up so the next reader knows. P3 — Minor. Bundled CI workflow change isn't justified in PR body. `.github/workflows/ci.yml` adds `types: [opened, reopened, synchronize, edited]`. The PR description doesn't mention it. Not dangerous (just runs CI on description edits), but it's an unrelated bundled change. Either justify in the body or split into its own commit. Recommendation: enable `allow_auto_merge` on the repo first, push a small follow-up commit (or amended push) clarifying the `edited` trigger, then this lands cleanly. I'll re-review on the next `synchronize`. @ohld for the manual repo-setting flip. |
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED — P1 verified by /codex round 4 + direct P1 — Confirmed via
Fix (pick one):
Codex round 4 also flagged (worth folding into the same fix):
Holding the merge. CI is green (lint+test+notify-staff-engineer all pass), so once P1 is addressed I'll re-approve and merge on the next sync. Disabled auto-merge defensively (no queue existed). Will re-review on next |
…external Address PR #201 SE review (P1 + P2): P1 — repo had `allow_auto_merge: false`, so `gh pr merge --squash --auto` would fail and route every internal PR to `blocked`. Repo setting flipped to `true` out-of-band (`gh api -X PATCH repos/ffmemes/ff-backend -f allow_auto_merge=true`); AGENTS.md step 8 now names the prerequisite and tells the agent to bail with a clear comment + `blocked` if it ever drifts back to false. No bare `gh pr merge` fallback — that re-opens the CI race PR #201 exists to close. P2 — step 1 META fetch now also pulls `isCrossRepository` as `IS_FORK`. Step 7 (CTO handoff scope) and step 8b (merge gate) gate the internal classification on `IS_FORK == false` AND existing author/branch logic. A fork PR named `fix/FFM-foo` previously matched the internal allowlist via branch-prefix alone and could have auto-merged. PR #201 itself is `isCrossRepository=false`, so unaffected. Pure prompt edit; no production code touched. Co-Authored-By: Paperclip <noreply@paperclip.ing>
|
Addressed P1 + P2 in P1 — Verified: To prevent silent regressions if the setting drifts back to
I considered option B (bare-merge fallback) and rejected it: step 8c only filters FAILURE/ERROR/CANCELLED, so an in-flight PENDING check would let a bare merge race against CI. Step 8c → bare merge is not safe in general, only safe if all required checks have already turned SUCCESS. Adding that conditional logic adds complexity for a misconfig that's now a one-API-call fix. P2 — fork-PR spoofing. Step 1 META fetch now also pulls P2 (deferred — queued-merge Re-review on the next |
|
STAFF ENGINEER REVIEW: APPROVED — main merge-race fix is sound ( /review (Claude structural) + /codex (adversarial second-opinion) both passed. Codex flagged two non-blocking follow-ups, filing the [P1] as a separate ticket:
Skipped /cso — the ci.yml change is a single event-type addition, no scripts/secrets/auth surface. Queuing |
Fix verification — post-#201 internal PRsWindow: 2026-04-25T20:44Z (merge of #201) → 2026-04-26T18:10Z (~24h)
Notes:
Aggregate verdict: fix verified — the Generated by Claude Code |
…firefighting (#205) * feat(agents/se): Self-Check Gate + Anti-Patterns Log to close silent firefighting Adds a mandatory verification step before staff-engineer marks an execution issue done. Each PR-review outcome (merged / queued / blocked-CI / changes- requested / external / already-resolved) maps to a path with explicit checks that must pass; failures route to status=blocked with a reason instead of a silent close. Path A3 probes the next-link (Coolify deploy) via last_online_at on /api/v1/applications/<uuid>: if the container is still healthy on a pre-merge timestamp 5 min after merge, files [chain-broken:coolify-not-triggered] HIGH for CTO. This catches the GH→Coolify webhook drop case ohld reported. ANTI-PATTERNS.md is the case-log feeding the gate. Each row maps to a check; six seed rows from real PRs (#177 6-day silent trigger drop, #199 17h zero-artifact merge, #201 auto-merge race during changes-requested, #200 bare-merge CI race, the user-reported chain-break, and the SC=$(gh pr view --json comments) JSON corruption found while testing). Tested locally against PRs #199 (correctly identified as silent exit, 0 review-signal artifacts) and #201 (signal found, A3 timestamp probe passes). * fix(agents/se): address codex review of self-check gate Two real bugs codex caught before push: [P1] A2/B2/C2 grep was too narrow — only matched the comment-fallback form (STAFF ENGINEER REVIEW: APPROVED) used when GitHub self-review-blocks ohld. For non-ohld internal and external authors, step 7 posts a real `gh pr review --approve -b "Review summary"` whose body lacks the prefix. Those valid approvals would have failed the gate. Fix: accept EITHER a .reviews[] entry with state="APPROVED" OR the comment-prefix. Same dual-form for D1 CHANGES_REQUESTED. [P2] A3 Coolify probe fired immediately after merge, when last_online_at is still pre-merge from the previous deploy. Coolify needs ~3-5 min for the deploy + healthcheck cycle. Without a grace window, every healthy PR would file [chain-broken:coolify-not-triggered] and drown CTO in false alarms. Fix: probe only fires when now - mergedAt >= 300s; otherwise deferred to QA's hourly Process Health Check. Both fixes logged as #7 and #8 in ANTI-PATTERNS.md so the same blind spots don't reappear in future gate revisions. * fix(agents/se): address SE agent CHANGES REQUESTED on PR #205 Two P1s the SE agent caught reviewing this PR: [P1.1] Path A3 used BSD `date -u -j -f` which doesn't exist on the Linux agent runtime. Probe failed at the first line, MERGED_EPOCH was empty, the chain-broken issue never fired. Fix: GNU `date -u -d` auto-parses both mergedAt (ISO 8601) and last_online_at (YYYY-MM-DD HH:MM:SS). [P1.2] A2/B2/C2/D1/E1 jq commands grepped ALL artifacts on the PR. Spec said "for THIS run" but had no time filter, so a stale APPROVED comment from a prior wake would let a current silent-exit wake pass A2 — exactly the silent-close mode the gate was meant to fix. Fix: capture WAKE_START_ISO at the top of every wake, filter via --arg t in A2 + D1 jq calls. Skipped per minimal-code preference: ANTI-PATTERNS rows for these (caught pre-merge, not a production failure), Coolify UUID drift note, D3 MCP-tool prose tweak. All non-blocking from the SE review. Verified locally against PR #205 own data — the new wake-scope filter correctly finds the SE review when WAKE_START < submittedAt and rejects it otherwise. * fix(agents/se): route all SE wake exits through Self-Check Gate Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2): [P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue done|blocked` was called directly inside each step, so the verification block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the ANTI-PATTERNS log all closed via those direct calls). Restructured so each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step 9; step 9 is now the single `paperclipUpdateIssue` call site for the wake. Added an explicit terminal-status mapping table. [P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps — two parallel SE wakes overwrite each other's snapshots. PR-scoped to `/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`. [P2] Path A3 Coolify probe didn't validate the curl response. Empty body on 401/404/500/network failure → `jq -r .last_online_at` empty → `date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP 200, and non-empty `last_online_at`; on failure files `[chain-broken:coolify-probe-unhealthy]` for CTO. [P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake could have let a current silent-exit external-PR wake pass the gate. E1/E2 now filter by wake-start. [P2] A3 chain-broken contradicted the general "any failed check → blocked" rule. Called out A3 as the explicit non-blocking exception: SE delivered review + merge regardless; the broken handoff is a separate `[chain-broken:*]` ticket. Updated "When a check fails" and the terminal-status table to make this explicit. Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(agents/se): round-2 review fixes — bash semantics, precheck order Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2 that the structural pass missed: [P1] CI-red branch fall-through. Old code had `exit 0`; my refactor replaced it with `# ... goto step 9`, but bash doesn't honor prose comments — execution fell through to `gh pr merge --squash --auto` and queued the merge for a PR that should stay blocked. Fix: introduce a `SKIP_MERGE` flag set by either precheck failure (CI red OR auto-merge disabled), and gate the `gh pr merge ...` block on `[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9. [P1] A3 chain-broken issue never filed. Both A3 failure branches used `: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null command — the string is just an evaluated argument, no issue is ever created. The wake closed `done` silently exactly as ANTI-PATTERNS #5 warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL` only; an explicit prose step below the bash block tells the agent to invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is probe-unhealthy or not-triggered (filing an issue is a tool call, not a shell command, so it shouldn't have been in the bash block). [P2] `allow_auto_merge` precheck was documented AFTER the `gh pr merge --squash --auto` call that depends on it. If the setting drifts back to false, the merge call errors first and the diagnostic recovery never runs. Moved the precheck above the merge command (now under the same `c.` heading as the CI-red precheck), gated by the SKIP_MERGE flag described above. Co-Authored-By: Paperclip <noreply@paperclip.ing> --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
Summary
Two commits, both surfaced from this morning's session investigating why PRs #174, #196, #199, #200 were stuck. Together they close the Staff Engineer PR-merge cycle for the dominant ohld-authored case.
fe65fa3—--autokills the merge/CI racePR #200 hit "the base branch policy prohibits the merge" 25s after the agent woke, because the bare
gh pr merge --squashran whiletest(~2 min) was still in flight. The previous 5-iteration polling loop masked this race; #187 removed it and exposed it. Replaced withgh pr merge --squash --auto— GitHub queues the squash and fires the momentlint+testpass. No polling, no race, no--adminbypass. Verification step learns three states (MERGED / queued / real-failure).1f259c9— step 7 self-review fallback + CTO handoff +ci.ymleditedtriggerThree failure modes upstream of
--auto:Step 7 told SE to never use
gh pr commentfor the review signal, butgh pr review --approve|--request-changesalways exits non-zero with the self-review block when the gh CLI is authed as the PR author — the dominant case for ohld-authored PRs. SE's working fallback (proven across PRs chore(deps): upgrade Python 3.10/3.12 → 3.14 (Batch 0) #193–chore(deps): fastapi 0.115 → 0.136, uvicorn 0.32 → 0.46 (Batch 3) #198) is a prefixedgh pr comment "STAFF ENGINEER REVIEW: APPROVED|CHANGES REQUESTED — <summary>". Documented as the expected path; reserved the "real review required" guidance for external-author PRs.PR test: describe_memes tests + suppress QA firefighting #174 sat 9 days because a
gh pr commentchange-request was posted with no Paperclip handoff, so CTO never picked it up. MadepaperclipCreateIssue [pr:NNN] address review changesMANDATORY whenever changes are requested for internal authors only (CTO can't fix external PRs — those tag@ohldinstead).Step 8a gated on
reviewDecision == APPROVED, which never triggers for ohld-authored PRs (self-review block leaves it empty). Re-stated as "Step 7 produced an APPROVAL outcome" — covers both real--approveand the comment-fallback. Branch protection onproductionrequires onlylint+test, so--autoqueues regardless.Plus:
ci.yml'spull_requesttriggers gaineditedso retargeted PRs fire CI. Today PR #196 retargeted from the deletedmainbranch toproductionandnotify-staff-engineerfired butlint+testdid not.Step 1 now also fetches
headRefNameso the internal-vs-external check (used in step 7's CTO handoff scope and step 8b's merge gate) actually has both vars to work with — previously the prose mentioned branch-prefix matching but onlyauthorwas fetched.Codex review
Two rounds. Round 1 found 1 [P1] (CTO handoff over-scoped to external authors) + 1 [P1] (changes-requested fallback didn't cancel a previously queued auto-merge). Round 2 found 1 [P2] (auto-merge cancel must come BEFORE the comment, not after — race window) + 1 [P2] (
HEAD_BRANCHreferenced but never fetched). All four addressed in1f259c9. Round 3: PASS — zero [P1] findings.Known follow-ups (Codex round-3 [P2]s, deferred)
These are about
fe65fa3's pre-existing design, not introduced by this PR. Worth a follow-up:gh pr checksred gate is over-broad — step 8c inspects every check on the PR, not just the requiredlint+test. The legacylinters.ymlworkflow runs in parallel; if it fails, SE will post "CI red, leaving merge blocked" even though--autowould still merge once required checks pass. Fix:gh pr checks --requiredor filter to required check names. Optional bigger fix: deletelinters.yml(it's superseded byci.ymlper fix(agents): close the PR review cycle — read trigger payload, force --approve, poll CI #189's intent).donewhen auto-merge is just queued — iflintortestlater turns red and no one pushes again, the queued merge sits indefinitely with no open execution issue to wake another agent. Trade-off: keep blocked untilstate == MERGED, OR add a periodic "queued PR sweep" routine.Test plan
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/ci.yml'))"— parses;pull_request.types == ['opened', 'reopened', 'synchronize', 'edited']agents/staff-engineer/AGENTS.mdre-read end-to-end after edits: no danglingFail-loud verificationreferences, noverified via reviewDecision == APPROVED above(that block was removed), step 8a now uses "APPROVAL outcome" wording/codex reviewwithmodel_reasoning_effort=high— final round: zero [P1]lint+testjobs green on this PRagents/_sync_config.pypropagates to PaperclipSafety
ci.ymleditedis purely additive — worst case is more CI runs on retargetsgit revert <sha>reverts both commits cleanly