Skip to content

feat(agents): lean skill-dedup v1 — trust imported skills#177

Merged
ohld merged 2 commits intoproductionfrom
feat/agent-skill-dedup-v1
Apr 23, 2026
Merged

feat(agents): lean skill-dedup v1 — trust imported skills#177
ohld merged 2 commits intoproductionfrom
feat/agent-skill-dedup-v1

Conversation

@ohld
Copy link
Copy Markdown
Member

@ohld ohld commented Apr 17, 2026

Summary

Audit of each AGENTS.md against the gstack skill catalog surfaced ~40 lines of custom prose that re-implements logic a skill the agent already imports already does, plus 3 skill imports that should be added. This PR ships the top-3 wins identified by the audit.

Stacked on top of #176 (issue hygiene v1). Target base was set to feat/agent-issue-hygiene-v1 — merge #176 first, then this.

Changes

QA (agents/qa/AGENTS.md, -45 lines)

  • Collapse Log Sources: 3 fenced blocks of raw Sentry/Coolify/DB commands become 3 one-liners.
  • Post-Deploy Verification: trust /canary for error scan + baseline; delete redundant "check Sentry / check DB" steps that /canary already covers. Add explicit Sentry issue scan.
  • Exploratory Testing: replace 8-bullet "improvise" list with /qa exhaustive (structured tier covers the same surface).
  • Critical-bug escalation now runs /investigate before handing off to CTO, producing a real root-cause report.
  • Added skills: health, investigate (attached via Paperclip API PATCH).

CEO (agents/ceo/AGENTS.md, -12 lines)

  • Drop the "think like a CEO" bullet list in the daily heartbeat — /plan-ceo-review already carries the 10-star thinking prompts.
  • Drop the weekly-review checklist — /retro produces exactly this output (commit history, velocity, per-person contributions, trends).

Staff Engineer (agents/staff-engineer/AGENTS.md, -11 lines)

  • Drop the generic PR review checklist (N+1, SQL injection, trust boundaries, stale reads, secrets) — /review covers it all.
  • Keep only project-specific paranoia: candidates.py SQL interpolation, blender weight invariants, public-repo secret rejection.
  • Add /codex review as adversarial second-opinion step (Codex CLI verified installed + authenticated on the Paperclip runtime at /paperclip/.codex/auth.json).
  • Added skill: codex (attached via Paperclip API PATCH).

Skills NOT added

sentry-cli and simplify exist in Claude Code's plugin marketplace but are NOT in garrytan/gstack — so we can't import them into Paperclip without extra work. QA's Sentry calls use the raw sentry CLI directly (already installed on the server). Staff Engineer drops /simplify entirely; /codex review + /review cover the surface.

What's already done

  • Edited AGENTS.md files locally
  • Deployed to Paperclip server (./agents/deploy.sh)
  • Re-imported gstack skills to pick up latest (POST /skills/import)
  • PATCH'd QA agent — attached health, investigate
  • PATCH'd Staff Engineer agent — attached codex
  • Verified attachment via GET /api/agents/<id>

Test plan

  • Next QA scan triggers /investigate on a critical error instead of a bare "CRITICAL" paperclip ticket
  • Next QA post-deploy run invokes /canary only (no manual Sentry+DB duplicate scan)
  • Next Staff Engineer PR review runs /review + /codex review in sequence
  • Next Monday CEO heartbeat runs /retro without the redundant checklist

ohld added 2 commits April 17, 2026 11:14
Prompt-level hotfix targeting the three patterns from recent Paperclip
issue data (200 issues analyzed):

1. Slug-first titles ([pr:NNN], [incident:<slug>], [deploy:...], etc.)
   so recurring work collapses onto one ticket instead of spawning new
   ones each firefight.
2. Dedupe preflight via paperclipListIssues/Api search before every
   paperclipCreateIssue — kills "DB pool exhausted x3 tickets" pattern.
3. Single-writer rule — only CEO opens strategic issues; other agents
   create execution tickets only from their explicit workflow.
4. Staff Engineer PR Review idempotency — check gh pr view state first;
   skip review entirely if merged/closed. Kills stale PR Review tickets
   for PRs that were merged/closed externally.

Block is wrapped in HTML comment markers (BEGIN/END: issue-hygiene-v1)
so it can be ripped out with one sed when Paperclip ships equivalent
platform features.

Also corrects runbook: local Paperclip MCP must be registered via
`claude mcp add` (writes to ~/.claude.json), NOT via `mcpServers` in
`.claude/settings.local.json` (Claude Code ignores that key there).
…reimplementing

Based on a full audit of each AGENTS.md against the gstack skill catalog,
deletes custom prose that duplicates an imported skill's logic, and adds
skill imports for gaps.

## QA
- Collapse "Log Sources" — 3 fenced code blocks of raw Sentry/Coolify/DB
  commands become 3 one-liners. `/sentry-cli` (new import) handles Sentry.
- Post-Deploy Verification — trust `/canary` for error scan + baseline;
  delete the redundant "check Sentry, check DB" steps that `/canary`
  already covers. Add `/sentry-cli` scan alongside `/canary` for an
  explicit issue view.
- Exploratory Testing — replace 8-bullet "improvise" list with
  `/qa exhaustive` (covers the same surface with a structured tier).
- Bug escalation now runs `/investigate` (new import) before handing
  off to CTO, producing a real root-cause report instead of just the
  error string.
- New skills in frontmatter: sentry-cli, health, investigate.

## CEO
- Delete the "think like a CEO" bullet list in the daily heartbeat —
  `/plan-ceo-review` carries the 10-star thinking prompts.
- Delete the weekly-review checklist — `/retro` produces exactly this
  output (commit history, velocity, per-person contributions, trends).

## Staff Engineer
- Delete generic PR review checklist (N+1, SQL injection, trust
  boundaries, stale reads, secrets) — `/review` covers all of it.
- Keep only project-specific paranoia: candidates.py interpolation,
  blender invariants, public-repo secret rejection.
- Add `/codex review` (adversarial second opinion) + `/simplify`
  (reuse/quality audit) to the review workflow.
- New skills in frontmatter: codex, simplify.

Net: -40 lines across the 3 files, zero behavior loss.

Skills added in frontmatter are documentation — actual attachment via
Paperclip API PATCH needs to follow.
@ohld ohld changed the base branch from feat/agent-issue-hygiene-v1 to production April 23, 2026 10:21
@ohld
Copy link
Copy Markdown
Member Author

ohld commented Apr 23, 2026

Retargeting to production base: close/reopen to re-fire CI and Staff Engineer trigger

@ohld ohld closed this Apr 23, 2026
@ohld ohld reopened this Apr 23, 2026
@ohld ohld merged commit a5fcf2c into production Apr 23, 2026
3 checks passed
@ohld ohld deleted the feat/agent-skill-dedup-v1 branch April 23, 2026 10:27
ohld added a commit that referenced this pull request Apr 27, 2026
…firefighting (#205)

* feat(agents/se): Self-Check Gate + Anti-Patterns Log to close silent firefighting

Adds a mandatory verification step before staff-engineer marks an execution
issue done. Each PR-review outcome (merged / queued / blocked-CI / changes-
requested / external / already-resolved) maps to a path with explicit checks
that must pass; failures route to status=blocked with a reason instead of a
silent close.

Path A3 probes the next-link (Coolify deploy) via last_online_at on
/api/v1/applications/<uuid>: if the container is still healthy on a pre-merge
timestamp 5 min after merge, files [chain-broken:coolify-not-triggered]
HIGH for CTO. This catches the GH→Coolify webhook drop case ohld reported.

ANTI-PATTERNS.md is the case-log feeding the gate. Each row maps to a check;
six seed rows from real PRs (#177 6-day silent trigger drop, #199 17h
zero-artifact merge, #201 auto-merge race during changes-requested, #200
bare-merge CI race, the user-reported chain-break, and the
SC=$(gh pr view --json comments) JSON corruption found while testing).

Tested locally against PRs #199 (correctly identified as silent exit, 0
review-signal artifacts) and #201 (signal found, A3 timestamp probe passes).

* fix(agents/se): address codex review of self-check gate

Two real bugs codex caught before push:

[P1] A2/B2/C2 grep was too narrow — only matched the comment-fallback form
(STAFF ENGINEER REVIEW: APPROVED) used when GitHub self-review-blocks ohld.
For non-ohld internal and external authors, step 7 posts a real `gh pr review
--approve -b "Review summary"` whose body lacks the prefix. Those valid
approvals would have failed the gate. Fix: accept EITHER a .reviews[] entry
with state="APPROVED" OR the comment-prefix. Same dual-form for D1
CHANGES_REQUESTED.

[P2] A3 Coolify probe fired immediately after merge, when last_online_at is
still pre-merge from the previous deploy. Coolify needs ~3-5 min for the
deploy + healthcheck cycle. Without a grace window, every healthy PR would
file [chain-broken:coolify-not-triggered] and drown CTO in false alarms.
Fix: probe only fires when now - mergedAt >= 300s; otherwise deferred to
QA's hourly Process Health Check.

Both fixes logged as #7 and #8 in ANTI-PATTERNS.md so the same blind spots
don't reappear in future gate revisions.

* fix(agents/se): address SE agent CHANGES REQUESTED on PR #205

Two P1s the SE agent caught reviewing this PR:

[P1.1] Path A3 used BSD `date -u -j -f` which doesn't exist on the Linux
agent runtime. Probe failed at the first line, MERGED_EPOCH was empty, the
chain-broken issue never fired. Fix: GNU `date -u -d` auto-parses both
mergedAt (ISO 8601) and last_online_at (YYYY-MM-DD HH:MM:SS).

[P1.2] A2/B2/C2/D1/E1 jq commands grepped ALL artifacts on the PR. Spec
said "for THIS run" but had no time filter, so a stale APPROVED comment
from a prior wake would let a current silent-exit wake pass A2 — exactly
the silent-close mode the gate was meant to fix. Fix: capture
WAKE_START_ISO at the top of every wake, filter via --arg t in A2 + D1
jq calls.

Skipped per minimal-code preference: ANTI-PATTERNS rows for these (caught
pre-merge, not a production failure), Coolify UUID drift note, D3 MCP-tool
prose tweak. All non-blocking from the SE review.

Verified locally against PR #205 own data — the new wake-scope filter
correctly finds the SE review when WAKE_START < submittedAt and rejects
it otherwise.

* fix(agents/se): route all SE wake exits through Self-Check Gate

Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2):

[P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue
done|blocked` was called directly inside each step, so the verification
block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the
ANTI-PATTERNS log all closed via those direct calls). Restructured so
each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step
9; step 9 is now the single `paperclipUpdateIssue` call site for the
wake. Added an explicit terminal-status mapping table.

[P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps —
two parallel SE wakes overwrite each other's snapshots. PR-scoped to
`/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`.

[P2] Path A3 Coolify probe didn't validate the curl response. Empty
body on 401/404/500/network failure → `jq -r .last_online_at` empty →
`date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP
200, and non-empty `last_online_at`; on failure files
`[chain-broken:coolify-probe-unhealthy]` for CTO.

[P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that
A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake
could have let a current silent-exit external-PR wake pass the gate.
E1/E2 now filter by wake-start.

[P2] A3 chain-broken contradicted the general "any failed check →
blocked" rule. Called out A3 as the explicit non-blocking exception:
SE delivered review + merge regardless; the broken handoff is a
separate `[chain-broken:*]` ticket. Updated "When a check fails" and
the terminal-status table to make this explicit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(agents/se): round-2 review fixes — bash semantics, precheck order

Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2
that the structural pass missed:

[P1] CI-red branch fall-through. Old code had `exit 0`; my refactor
replaced it with `# ... goto step 9`, but bash doesn't honor prose
comments — execution fell through to `gh pr merge --squash --auto`
and queued the merge for a PR that should stay blocked. Fix: introduce
a `SKIP_MERGE` flag set by either precheck failure (CI red OR
auto-merge disabled), and gate the `gh pr merge ...` block on
`[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9.

[P1] A3 chain-broken issue never filed. Both A3 failure branches used
`: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null
command — the string is just an evaluated argument, no issue is ever
created. The wake closed `done` silently exactly as ANTI-PATTERNS #5
warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL`
only; an explicit prose step below the bash block tells the agent to
invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is
probe-unhealthy or not-triggered (filing an issue is a tool call, not
a shell command, so it shouldn't have been in the bash block).

[P2] `allow_auto_merge` precheck was documented AFTER the
`gh pr merge --squash --auto` call that depends on it. If the setting
drifts back to false, the merge call errors first and the diagnostic
recovery never runs. Moved the precheck above the merge command (now
under the same `c.` heading as the CI-red precheck), gated by the
SKIP_MERGE flag described above.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant