feat(agents): lean skill-dedup v1 — trust imported skills#177
Merged
ohld merged 2 commits intoproductionfrom Apr 23, 2026
Merged
feat(agents): lean skill-dedup v1 — trust imported skills#177ohld merged 2 commits intoproductionfrom
ohld merged 2 commits intoproductionfrom
Conversation
Prompt-level hotfix targeting the three patterns from recent Paperclip issue data (200 issues analyzed): 1. Slug-first titles ([pr:NNN], [incident:<slug>], [deploy:...], etc.) so recurring work collapses onto one ticket instead of spawning new ones each firefight. 2. Dedupe preflight via paperclipListIssues/Api search before every paperclipCreateIssue — kills "DB pool exhausted x3 tickets" pattern. 3. Single-writer rule — only CEO opens strategic issues; other agents create execution tickets only from their explicit workflow. 4. Staff Engineer PR Review idempotency — check gh pr view state first; skip review entirely if merged/closed. Kills stale PR Review tickets for PRs that were merged/closed externally. Block is wrapped in HTML comment markers (BEGIN/END: issue-hygiene-v1) so it can be ripped out with one sed when Paperclip ships equivalent platform features. Also corrects runbook: local Paperclip MCP must be registered via `claude mcp add` (writes to ~/.claude.json), NOT via `mcpServers` in `.claude/settings.local.json` (Claude Code ignores that key there).
…reimplementing Based on a full audit of each AGENTS.md against the gstack skill catalog, deletes custom prose that duplicates an imported skill's logic, and adds skill imports for gaps. ## QA - Collapse "Log Sources" — 3 fenced code blocks of raw Sentry/Coolify/DB commands become 3 one-liners. `/sentry-cli` (new import) handles Sentry. - Post-Deploy Verification — trust `/canary` for error scan + baseline; delete the redundant "check Sentry, check DB" steps that `/canary` already covers. Add `/sentry-cli` scan alongside `/canary` for an explicit issue view. - Exploratory Testing — replace 8-bullet "improvise" list with `/qa exhaustive` (covers the same surface with a structured tier). - Bug escalation now runs `/investigate` (new import) before handing off to CTO, producing a real root-cause report instead of just the error string. - New skills in frontmatter: sentry-cli, health, investigate. ## CEO - Delete the "think like a CEO" bullet list in the daily heartbeat — `/plan-ceo-review` carries the 10-star thinking prompts. - Delete the weekly-review checklist — `/retro` produces exactly this output (commit history, velocity, per-person contributions, trends). ## Staff Engineer - Delete generic PR review checklist (N+1, SQL injection, trust boundaries, stale reads, secrets) — `/review` covers all of it. - Keep only project-specific paranoia: candidates.py interpolation, blender invariants, public-repo secret rejection. - Add `/codex review` (adversarial second opinion) + `/simplify` (reuse/quality audit) to the review workflow. - New skills in frontmatter: codex, simplify. Net: -40 lines across the 3 files, zero behavior loss. Skills added in frontmatter are documentation — actual attachment via Paperclip API PATCH needs to follow.
Member
Author
|
Retargeting to production base: close/reopen to re-fire CI and Staff Engineer trigger |
This was referenced Apr 24, 2026
ohld
added a commit
that referenced
this pull request
Apr 27, 2026
…firefighting (#205) * feat(agents/se): Self-Check Gate + Anti-Patterns Log to close silent firefighting Adds a mandatory verification step before staff-engineer marks an execution issue done. Each PR-review outcome (merged / queued / blocked-CI / changes- requested / external / already-resolved) maps to a path with explicit checks that must pass; failures route to status=blocked with a reason instead of a silent close. Path A3 probes the next-link (Coolify deploy) via last_online_at on /api/v1/applications/<uuid>: if the container is still healthy on a pre-merge timestamp 5 min after merge, files [chain-broken:coolify-not-triggered] HIGH for CTO. This catches the GH→Coolify webhook drop case ohld reported. ANTI-PATTERNS.md is the case-log feeding the gate. Each row maps to a check; six seed rows from real PRs (#177 6-day silent trigger drop, #199 17h zero-artifact merge, #201 auto-merge race during changes-requested, #200 bare-merge CI race, the user-reported chain-break, and the SC=$(gh pr view --json comments) JSON corruption found while testing). Tested locally against PRs #199 (correctly identified as silent exit, 0 review-signal artifacts) and #201 (signal found, A3 timestamp probe passes). * fix(agents/se): address codex review of self-check gate Two real bugs codex caught before push: [P1] A2/B2/C2 grep was too narrow — only matched the comment-fallback form (STAFF ENGINEER REVIEW: APPROVED) used when GitHub self-review-blocks ohld. For non-ohld internal and external authors, step 7 posts a real `gh pr review --approve -b "Review summary"` whose body lacks the prefix. Those valid approvals would have failed the gate. Fix: accept EITHER a .reviews[] entry with state="APPROVED" OR the comment-prefix. Same dual-form for D1 CHANGES_REQUESTED. [P2] A3 Coolify probe fired immediately after merge, when last_online_at is still pre-merge from the previous deploy. Coolify needs ~3-5 min for the deploy + healthcheck cycle. Without a grace window, every healthy PR would file [chain-broken:coolify-not-triggered] and drown CTO in false alarms. Fix: probe only fires when now - mergedAt >= 300s; otherwise deferred to QA's hourly Process Health Check. Both fixes logged as #7 and #8 in ANTI-PATTERNS.md so the same blind spots don't reappear in future gate revisions. * fix(agents/se): address SE agent CHANGES REQUESTED on PR #205 Two P1s the SE agent caught reviewing this PR: [P1.1] Path A3 used BSD `date -u -j -f` which doesn't exist on the Linux agent runtime. Probe failed at the first line, MERGED_EPOCH was empty, the chain-broken issue never fired. Fix: GNU `date -u -d` auto-parses both mergedAt (ISO 8601) and last_online_at (YYYY-MM-DD HH:MM:SS). [P1.2] A2/B2/C2/D1/E1 jq commands grepped ALL artifacts on the PR. Spec said "for THIS run" but had no time filter, so a stale APPROVED comment from a prior wake would let a current silent-exit wake pass A2 — exactly the silent-close mode the gate was meant to fix. Fix: capture WAKE_START_ISO at the top of every wake, filter via --arg t in A2 + D1 jq calls. Skipped per minimal-code preference: ANTI-PATTERNS rows for these (caught pre-merge, not a production failure), Coolify UUID drift note, D3 MCP-tool prose tweak. All non-blocking from the SE review. Verified locally against PR #205 own data — the new wake-scope filter correctly finds the SE review when WAKE_START < submittedAt and rejects it otherwise. * fix(agents/se): route all SE wake exits through Self-Check Gate Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2): [P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue done|blocked` was called directly inside each step, so the verification block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the ANTI-PATTERNS log all closed via those direct calls). Restructured so each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step 9; step 9 is now the single `paperclipUpdateIssue` call site for the wake. Added an explicit terminal-status mapping table. [P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps — two parallel SE wakes overwrite each other's snapshots. PR-scoped to `/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`. [P2] Path A3 Coolify probe didn't validate the curl response. Empty body on 401/404/500/network failure → `jq -r .last_online_at` empty → `date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP 200, and non-empty `last_online_at`; on failure files `[chain-broken:coolify-probe-unhealthy]` for CTO. [P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake could have let a current silent-exit external-PR wake pass the gate. E1/E2 now filter by wake-start. [P2] A3 chain-broken contradicted the general "any failed check → blocked" rule. Called out A3 as the explicit non-blocking exception: SE delivered review + merge regardless; the broken handoff is a separate `[chain-broken:*]` ticket. Updated "When a check fails" and the terminal-status table to make this explicit. Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(agents/se): round-2 review fixes — bash semantics, precheck order Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2 that the structural pass missed: [P1] CI-red branch fall-through. Old code had `exit 0`; my refactor replaced it with `# ... goto step 9`, but bash doesn't honor prose comments — execution fell through to `gh pr merge --squash --auto` and queued the merge for a PR that should stay blocked. Fix: introduce a `SKIP_MERGE` flag set by either precheck failure (CI red OR auto-merge disabled), and gate the `gh pr merge ...` block on `[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9. [P1] A3 chain-broken issue never filed. Both A3 failure branches used `: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null command — the string is just an evaluated argument, no issue is ever created. The wake closed `done` silently exactly as ANTI-PATTERNS #5 warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL` only; an explicit prose step below the bash block tells the agent to invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is probe-unhealthy or not-triggered (filing an issue is a tool call, not a shell command, so it shouldn't have been in the bash block). [P2] `allow_auto_merge` precheck was documented AFTER the `gh pr merge --squash --auto` call that depends on it. If the setting drifts back to false, the merge call errors first and the diagnostic recovery never runs. Moved the precheck above the merge command (now under the same `c.` heading as the CI-red precheck), gated by the SKIP_MERGE flag described above. Co-Authored-By: Paperclip <noreply@paperclip.ing> --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Audit of each AGENTS.md against the gstack skill catalog surfaced ~40 lines of custom prose that re-implements logic a skill the agent already imports already does, plus 3 skill imports that should be added. This PR ships the top-3 wins identified by the audit.
Stacked on top of #176 (issue hygiene v1). Target base was set to
feat/agent-issue-hygiene-v1— merge #176 first, then this.Changes
QA (
agents/qa/AGENTS.md, -45 lines)Log Sources: 3 fenced blocks of raw Sentry/Coolify/DB commands become 3 one-liners.Post-Deploy Verification: trust/canaryfor error scan + baseline; delete redundant "check Sentry / check DB" steps that/canaryalready covers. Add explicit Sentry issue scan.Exploratory Testing: replace 8-bullet "improvise" list with/qa exhaustive(structured tier covers the same surface)./investigatebefore handing off to CTO, producing a real root-cause report.health,investigate(attached via Paperclip API PATCH).CEO (
agents/ceo/AGENTS.md, -12 lines)/plan-ceo-reviewalready carries the 10-star thinking prompts./retroproduces exactly this output (commit history, velocity, per-person contributions, trends).Staff Engineer (
agents/staff-engineer/AGENTS.md, -11 lines)/reviewcovers it all.candidates.pySQL interpolation, blender weight invariants, public-repo secret rejection./codex reviewas adversarial second-opinion step (Codex CLI verified installed + authenticated on the Paperclip runtime at/paperclip/.codex/auth.json).codex(attached via Paperclip API PATCH).Skills NOT added
sentry-cliandsimplifyexist in Claude Code's plugin marketplace but are NOT in garrytan/gstack — so we can't import them into Paperclip without extra work. QA's Sentry calls use the rawsentryCLI directly (already installed on the server). Staff Engineer drops/simplifyentirely;/codex review+/reviewcover the surface.What's already done
./agents/deploy.sh)POST /skills/import)health,investigatecodexGET /api/agents/<id>Test plan
/investigateon a critical error instead of a bare "CRITICAL" paperclip ticket/canaryonly (no manual Sentry+DB duplicate scan)/review+/codex reviewin sequence/retrowithout the redundant checklist