From 14e3eb34b7742d60a8945f2089bd6548c2c6cfc9 Mon Sep 17 00:00:00 2001
From: Nir Adler
Date: Fri, 29 May 2026 16:29:57 +0300
Subject: [PATCH 01/16] docs: add debug-agent plugin spec and implementation plan

---
 .../plans/2026-05-29-debug-agent-plugin.md    | 101 +++++
 .../2026-05-29-debug-agent-plugin-design.md   | 392 ++++++++++++++++++
 2 files changed, 493 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-29-debug-agent-plugin.md
 create mode 100644 docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md

diff --git a/docs/superpowers/plans/2026-05-29-debug-agent-plugin.md b/docs/superpowers/plans/2026-05-29-debug-agent-plugin.md
new file mode 100644
index 0000000..52f4008
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-29-debug-agent-plugin.md
@@ -0,0 +1,101 @@
+# debug-agent Plugin — Implementation Plan (high-level)
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task.
+> **This is a borrow-and-refine plan, not from-scratch.** Tasks state the goal, the source material to pull, refinement directives, and acceptance criteria. **The executing subagent decides the specific edits** — what to copy, cut, merge, and reword — within those boundaries. Do not expect prescribed line-by-line code.
+
+**Goal:** Ship a Claude Code plugin `debug-agent` bundling the existing `dbga` debugger skill + 3 consolidated language skills (Python/Go/Node) + 4 agents (architect + 3 experts), installable as a full plugin and as single skills via `npx skills`.
+
+**Architecture:** Plugin lives at `plugin/` with `.claude-plugin/marketplace.json` at repo root. Canonical skills under `plugin/skills/`; language-invariant content in `skills/_shared/`; agents in `plugin/agents/`. Content is merged from wshobson/agents + VoltAgent (both MIT) and refined with our Evidence-First + clean-code principles.
+
+**Tech stack:** Markdown SKILL.md + agent definitions; `dbga` (Python CLI); skill-creator eval harness; `npx skills` CLI; `claude plugin validate`.
+
+**Spec:** `docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md` — read it before starting; it holds the principles, layout, and decisions every task must honor.
+
+---
+
+## Phase 0 — De-risk first (do before anything else)
+
+### Task 0: Verify `npx skills` resolves the planned layout
+**Why first:** the whole canonical-skills-under-`plugin/skills/` decision rests on this. The `skills` CLI does NOT scan arbitrary depth by default.
+- Create a throwaway `plugin/skills/probe/SKILL.md` and run `npx skills add --skill probe`.
+- **Acceptance:** resolves cleanly. If it does NOT, switch to declaring skills in the manifest (or `--full-depth`) and record the chosen mechanism in the spec's Decisions section before proceeding. Delete the probe.
+
+---
+
+## Phase A — Plugin skeleton (main thread, sequential)
+
+### Task 1: Scaffold plugin + manifests
+**Files:** `plugin/.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json` (repo root), `plugin/README.md`, `plugin/LICENSE`, `plugin/THIRD_PARTY_NOTICES.md`.
+- Use the manifest sketches in the spec verbatim as the starting point.
+- README documents BOTH install paths + the name glossary (`dbga` marketplace / `debug-agent` plugin / `debug_agent` import).
+- `THIRD_PARTY_NOTICES.md`: placeholder structure now; subagents fill upstream MIT text + SHA per file they vendor.
+- **Acceptance:** `claude plugin validate ./plugin` passes; `claude --plugin-dir ./plugin` loads with no errors.
+
+### Task 2: Move the `debug-agent` skill into the plugin
+**Scope:** move `skills/debug-agent/` → `plugin/skills/debug-agent/` (keep its references intact). Update **all 5** references: `CLAUDE.md`, `CHANGELOG.md`, `README.md` (×3).
+- Verify `git check-ignore -v plugin/.claude-plugin/plugin.json` does NOT match `.gitignore`'s `.claude/`.
+- **Acceptance:** repo test suite still green (`uv run pytest -m "not e2e"`); the moved skill loads under the plugin; existing `npx skills add … --skill debug-agent` documented against the new path.
+- **Exempt** this SKILL.md from the <500-word rule — do not rewrite it.
+
+### Task 3: Author `skills/_shared/`
+**Files:** `plugin/skills/_shared/{clean-code,evidence-first,dependency-hygiene}.md`.
+- Language-invariant only. clean-code = self-explaining, no-comments-unless-asked (mirror `code-simplifier` philosophy). evidence-first = the validation/debug discipline + the canonical Evidence-First block (single source of truth). dependency-hygiene = audit-then-**suggest** (mark mutating commands as suggest-only, never auto-run).
+- **Acceptance:** the three files exist, are concise, and are the only home for this content (language skills will cross-reference them by name).
+
+### Task 4: Author the `architect` agent
+**Files:** `plugin/agents/architect.md` (model: opus).
+- Orchestration loop per spec; wired as opt-in main-thread agent (NOT forced via settings.json `agent` key). Allowed to dispatch the experts with per-call model override. Concise: checklist + when-to-delegate, defers detail to skills.
+- **Acceptance:** appears in `/agents`; running `claude --agent debug-agent:architect` lets it dispatch an expert.
+
+### Task 5: `/debug-agent:setup` command + Task 6: `references/agent-teams.md` + Task 7: fix CLAUDE.md
+- T5: `plugin/commands/setup.md` — optional installer (uv → pipx → pip fallback), prints `dbga --version`, notes missing Go/Node toolchains. **Acceptance:** `/debug-agent:setup` installs and confirms version.
+- T6: `plugin/references/agent-teams.md` — document the experimental teams path (Windows = in-process). **Acceptance:** file present, accurate.
+- T7: update repo `CLAUDE.md` "Python-only by design today" to the merged multi-language reality, matching the skill's Honest Limits. **Acceptance:** line no longer contradicts the shipped Go/Node support.
+
+---
+
+## Phase B — Per-language (one subagent each, parallel, non-overlapping paths)
+
+> Dispatch 3 subagents — Python, Go, Node. Each owns ONLY `plugin/skills//**` and `plugin/agents/-expert.md`. **Each subagent figures out exactly what to borrow and how to refine it** within the directives below.
+
+### Task 8 / 9 / 10: Build `` skill + `-expert` agent
+**Sources to pull (MIT):**
+- Python: wshobson `python-development` skills (design-patterns, anti-patterns, code-style, error-handling, async, project-structure) + agent `python-pro`; VoltAgent `python-pro` depth.
+- Go: wshobson `systems-programming/go-concurrency-patterns` + agent `golang-pro`; VoltAgent `golang-pro`.
+- Node: wshobson `javascript-typescript` skills (modern-js, ts-advanced-types, nodejs-backend, js-testing) + agents `typescript-pro`/`javascript-pro`; VoltAgent `typescript-pro` (primary) + `javascript-pro` (JS-fallback section only).
+
+**Directives:**
+- Write `plugin/skills//SKILL.md` as a **slim index (<500 words)** routing to `references/` (language-specific deltas only — see spec layout). Cross-reference `skills/_shared/*` and `debug-agent` **by name**; do NOT copy their content.
+- Write language-specific reference files (design-patterns, concurrency/async, types where relevant, errors-structure, debugging recipes with `dbga`).
+- Write `plugin/agents/-expert.md` (model: sonnet) — merge VoltAgent depth + wshobson structure, dedup, inject the Evidence-First block, point at its skill. Concise; no restating reference content.
+- `description` = triggers only ("Use when…"), no workflow summary, keyword-rich.
+- Add upstream MIT notice + SHA to `THIRD_PARTY_NOTICES.md` for files substantially copied.
+- Draft `plugin/skills//evals/evals.json` (2–3 realistic prompts).
+- **Acceptance:** skill loads as `/debug-agent:`; `wc -w SKILL.md` < 500; expert in `/agents`; references present; evals.json present; no duplication of `_shared` content.
+
+---
+
+## Phase C — Eval + final verification
+
+### Task 11: Behavioral scenarios (all 4 skills)
+- Run the 3 subagent scenarios from the spec (e2e architect→debug→fix→verify; correct-reference retrieval; no-comments-under-pressure) via the skill-creator baseline-vs-with-skill pattern, through a POSIX shell, `generate_review.py --static`.
+- **Acceptance:** with-skill beats baseline on the no-comments + evidence-first assertions; gaps fed back into the skills.
+
+### Task 12: One shared description-trigger optimization
+- Single ~20-query set (negatives = cross-skill near-misses python/go/node/debug-agent); run `run_loop`; apply each `best_description`.
+- **Acceptance:** the four skills fire on their own intent and stay quiet on the others'.
+
+### Task 13: Full benchmark for `debug-agent` + `python` only
+- aggregate_benchmark → review. Go/Node spot-checked, not full-looped.
+- **Acceptance:** positive with-skill delta recorded (goal, not hard gate).
+
+### Task 14: Release verification
+- `claude plugin validate ./plugin`; `--plugin-dir` load; `/help` lists `/debug-agent:*`; `/agents` lists architect + 3 experts; `npx skills add --skill python|go|node|debug-agent` each install standalone; e2e architect loop on a known-buggy script.
+- **Acceptance:** all pass; tag `0.1.0`.
+
+---
+
+## Notes
+- Frequent commits per task on `feat/claude-plugin`.
+- No AI attribution in commits/PRs (per user rules).
+- Each Phase-B subagent works in isolated paths to avoid write conflicts; the main thread merges `THIRD_PARTY_NOTICES.md` additions if they touch the same file.
diff --git a/docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md b/docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md
new file mode 100644
index 0000000..0672f76
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md
@@ -0,0 +1,392 @@
+# Design: `debug-agent` Claude Code Plugin
+
+Date: 2026-05-29
+Status: Final — ready for implementation plan
+Owner: Nir
+
+## Goal
+
+Package the `dbga` evidence-first debugger plus a consolidated set of
+language skills and specialist agents as a distributable **Claude Code
+plugin**, giving a complete **design → develop → debug deeply → verify →
+clean up** workflow for Python, Go, and Node.
+
+Two install paths must both work cleanly:
+
+1. **Full plugin** via marketplace —
+   `claude plugin marketplace add niradler/dbga` then
+   `/plugin install debug-agent@dbga`.
+2. **Single skill** via the `skills` CLI —
+   `npx skills add niradler/dbga --skill python` (or `go`, `node`,
+   `debug-agent`).
+
+## Final shape
+
+**4 agents** and **4 skills** — one consolidated skill + one expert per
+language, an `architect` to orchestrate, and the debugger skill.
+
+### Agents (`agents/*.md`)
+
+| Agent | Model | Scope |
+| --- | --- | --- |
+| `architect` | **opus** | Language-agnostic. Owns high-level design, decomposition, cross-cutting decisions, and the evidence-first orchestration loop: gather runtime evidence → delegate language work to the matching expert → verify against real flows. Delegates; rarely writes code itself. |
+| `python-expert` | sonnet (architect may override to opus for hard tasks) | Full Python specialist. Drives the `python` + `debug-agent` skills. |
+| `go-expert` | sonnet (overridable) | Full Go specialist. Drives the `go` + `debug-agent` skills. |
+| `node-expert` | sonnet (overridable) | TypeScript-focused (small JS-fallback section). Drives the `node` + `debug-agent` skills. |
+
+There is no separate `code-reviewer` agent: clean-code review is a
+cross-cutting responsibility every agent carries (see Working Principles) and
+is backed by each skill's `clean-code` reference.
+
+### Skills (`skills/*/SKILL.md`)
+
+| Skill | Role |
+| --- | --- |
+| `python` | Main Python development skill. SKILL.md routes to many reference files (progressive disclosure). |
+| `go` | Main Go development skill + references. |
+| `node` | Main Node/TypeScript development skill + references. |
+| `debug-agent` | Existing evidence-first `dbga` driver (Python/Go/Node over DAP). Moved into the plugin. |
+
+Each skill is **self-contained** → any one installs cleanly on its own via
+`npx skills`. Agents are plugin-only (the `skills` CLI installs skills, not
+agents) — expected and documented.
+
+## Source material & licensing
+
+We **combine and learn from both** MIT-licensed sources — the goal is the
+best result, not fidelity to any one repo:
+
+- **wshobson/agents** (MIT) — has both agents and skills. Supplies the
+  per-topic skill content (design-patterns, code-style, error-handling, async,
+  anti-patterns, concurrency) and lean specialist agents.
+- **VoltAgent/awesome-claude-code-subagents** (MIT) — agents only, but deep
+  (e.g. `python-pro` ≈ 3,800 words: operational checklists, type-system
+  mastery, async, testing methodology, security, collaboration protocol).
+
+Combination rules:
+
+1. **Each language skill** consolidates the relevant wshobson skills as
+   **language-specific reference files**, enriched with the matching deep
+   sections harvested from VoltAgent's agents. Language-**invariant** content
+   (clean-code/no-comments, evidence-first discipline, dependency-hygiene
+   discipline) is authored **once** in `skills/_shared/` and cross-referenced
+   by name — never triple-copied across python/go/node.
+2. **Each expert agent** merges the VoltAgent + wshobson versions of that
+   language (VoltAgent depth + wshobson structure), deduplicated, then points
+   at its skill + the `debug-agent` skill.
+3. The `architect` agent is **authored fresh** (no single upstream
+   equivalent), distilling the cross-cutting orchestration + working
+   principles below.
+4. Preserve upstream LICENSE/attribution; record the source commit SHA of each
+   vendored file.
+
+## Working principles (embedded in every agent + each skill's SKILL.md)
+
+These are the non-negotiables the whole plugin enforces:
+
+1. **Evidence and validation first.** Decisions are made by validating against
+   **real use flows run against the code** — not by reasoning about source.
+   Use logs, debugger breakpoints (`dbga`), and common practices to observe
+   what actually happens. Never declare a fix done until correct behavior is
+   **observed** at the point the bug occurred.
+2. **Debug with the toolkit, don't guess.** On a crash/hang/wrong-output,
+   reach for the `debug-agent` skill and `dbga` (diagnose, live sessions,
+   `eval`, instrument) before sprinkling prints or guessing fixes.
+3. **Proactive dependency hygiene.** On new install/setup and when touching
+   dependencies, push to latest and audit proactively, then suggest bumps:
+   - Node: `npm outdated`, `npm audit`, `npm install @latest`.
+   - Python: `uv lock --upgrade` / `uv pip install -U`, `pip-audit`.
+   - Go: `go list -u -m all`, `go get -u ./...`, `govulncheck ./...`.
+4. **Clean, self-explaining code** (mirrors the official `code-simplifier`):
+   - Readable and **explicit over compact**; clarity beats brevity.
+   - **Never add code comments unless explicitly asked.** Code should explain
+     itself through clear names and structure. Remove comments that restate
+     obvious code.
+   - Avoid nested ternaries; prefer if/else or switch for multiple conditions.
+   - Reduce nesting and redundancy; consolidate related logic.
+   - Preserve functionality; don't over-simplify or strip helpful
+     abstractions.
+5. **Deliver clean, working, verified code — always.** The loop is design →
+   implement → run real flows → debug with evidence → simplify → verify.
+6. **Token economy.** These files are read by an agent, not a human. Slim,
+   to-the-point, minimum words while keeping what's vital. Authoring
+   constraints below enforce this.
+
+## Authoring constraints (slim, agent-facing — from writing-skills)
+
+Every skill and agent in this plugin follows:
+
+- **SKILL.md is the slim index, not the manual.** Target < 500 words; route to
+  `references/*.md` via progressive disclosure. Heavy/per-topic detail lives in
+  references, loaded only when needed.
+- **Descriptions are triggers only.** Third person, start with "Use when…",
+  list symptoms/contexts. **No workflow summary** (a summarized description
+  makes the agent skip the body).
+- **Names:** lowercase, hyphenated, active (`python`, `go`, `node`,
+  `debug-agent`; reference files like `error-handling`, `clean-code`).
+- **Keyword coverage** for discovery (errors, symptoms, tool/command names).
+- **Cross-reference by name**, not `@path` (no force-loading). Reference the
+  matching expert agent and `debug-agent` skill where relevant.
+- **One excellent example per pattern**, not many; no multi-language dilution.
+- **Agents are concise too** — operational checklist + when-to-delegate, detail
+  deferred to the skills they drive rather than restated inline.
+
+A short version of principles 1–2 is injected as a standard **Evidence-First
+Debugging** block in each agent/skill body:
+
+```markdown
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP —
+and the `debug-agent` skill. When code crashes, hangs, produces wrong output,
+or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- ` → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line --