From 14e3eb34b7742d60a8945f2089bd6548c2c6cfc9 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:29:57 +0300
Subject: [PATCH 01/16] docs: add debug-agent plugin spec and implementation
 plan

---
 .../plans/2026-05-29-debug-agent-plugin.md    | 101 +++++
 .../2026-05-29-debug-agent-plugin-design.md   | 392 ++++++++++++++++++
 2 files changed, 493 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-29-debug-agent-plugin.md
 create mode 100644 docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md

diff --git a/docs/superpowers/plans/2026-05-29-debug-agent-plugin.md b/docs/superpowers/plans/2026-05-29-debug-agent-plugin.md
new file mode 100644
index 0000000..52f4008
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-29-debug-agent-plugin.md
@@ -0,0 +1,101 @@
+# debug-agent Plugin — Implementation Plan (high-level)
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task.
+> **This is a borrow-and-refine plan, not from-scratch.** Tasks state the goal, the source material to pull, refinement directives, and acceptance criteria. **The executing subagent decides the specific edits** — what to copy, cut, merge, and reword — within those boundaries. Do not expect prescribed line-by-line code.
+
+**Goal:** Ship a Claude Code plugin `debug-agent` bundling the existing `dbga` debugger skill + 3 consolidated language skills (Python/Go/Node) + 4 agents (architect + 3 experts), installable as a full plugin and as single skills via `npx skills`.
+
+**Architecture:** Plugin lives at `plugin/` with `.claude-plugin/marketplace.json` at repo root. Canonical skills under `plugin/skills/`; language-invariant content in `skills/_shared/`; agents in `plugin/agents/`. Content is merged from wshobson/agents + VoltAgent (both MIT) and refined with our Evidence-First + clean-code principles.
+
+**Tech stack:** Markdown SKILL.md + agent definitions; `dbga` (Python CLI); skill-creator eval harness; `npx skills` CLI; `claude plugin validate`.
+
+**Spec:** `docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md` — read it before starting; it holds the principles, layout, and decisions every task must honor.
+
+---
+
+## Phase 0 — De-risk first (do before anything else)
+
+### Task 0: Verify `npx skills` resolves the planned layout
+**Why first:** the whole canonical-skills-under-`plugin/skills/` decision rests on this. The `skills` CLI does NOT scan arbitrary depth by default.
+- Create a throwaway `plugin/skills/probe/SKILL.md` and run `npx skills add <local-clone-or-path> --skill probe`.
+- **Acceptance:** resolves cleanly. If it does NOT, switch to declaring skills in the manifest (or `--full-depth`) and record the chosen mechanism in the spec's Decisions section before proceeding. Delete the probe.
+
+---
+
+## Phase A — Plugin skeleton (main thread, sequential)
+
+### Task 1: Scaffold plugin + manifests
+**Files:** `plugin/.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json` (repo root), `plugin/README.md`, `plugin/LICENSE`, `plugin/THIRD_PARTY_NOTICES.md`.
+- Use the manifest sketches in the spec verbatim as the starting point.
+- README documents BOTH install paths + the name glossary (`dbga` marketplace / `debug-agent` plugin / `debug_agent` import).
+- `THIRD_PARTY_NOTICES.md`: placeholder structure now; subagents fill upstream MIT text + SHA per file they vendor.
+- **Acceptance:** `claude plugin validate ./plugin` passes; `claude --plugin-dir ./plugin` loads with no errors.
+
+### Task 2: Move the `debug-agent` skill into the plugin
+**Scope:** move `skills/debug-agent/` → `plugin/skills/debug-agent/` (keep its references intact). Update **all 5** references: `CLAUDE.md`, `CHANGELOG.md`, `README.md` (×3).
+- Verify `git check-ignore -v plugin/.claude-plugin/plugin.json` does NOT match `.gitignore`'s `.claude/`.
+- **Acceptance:** repo test suite still green (`uv run pytest -m "not e2e"`); the moved skill loads under the plugin; existing `npx skills add … --skill debug-agent` documented against the new path.
+- **Exempt** this SKILL.md from the <500-word rule — do not rewrite it.
+
+### Task 3: Author `skills/_shared/`
+**Files:** `plugin/skills/_shared/{clean-code,evidence-first,dependency-hygiene}.md`.
+- Language-invariant only. clean-code = self-explaining, no-comments-unless-asked (mirror `code-simplifier` philosophy). evidence-first = the validation/debug discipline + the canonical Evidence-First block (single source of truth). dependency-hygiene = audit-then-**suggest** (mark mutating commands as suggest-only, never auto-run).
+- **Acceptance:** the three files exist, are concise, and are the only home for this content (language skills will cross-reference them by name).
+
+### Task 4: Author the `architect` agent
+**Files:** `plugin/agents/architect.md` (model: opus).
+- Orchestration loop per spec; wired as opt-in main-thread agent (NOT forced via settings.json `agent` key). Allowed to dispatch the experts with per-call model override. Concise: checklist + when-to-delegate, defers detail to skills.
+- **Acceptance:** appears in `/agents`; running `claude --agent debug-agent:architect` lets it dispatch an expert.
+
+### Task 5: `/debug-agent:setup` command + Task 6: `references/agent-teams.md` + Task 7: fix CLAUDE.md
+- T5: `plugin/commands/setup.md` — optional installer (uv → pipx → pip fallback), prints `dbga --version`, notes missing Go/Node toolchains. **Acceptance:** `/debug-agent:setup` installs and confirms version.
+- T6: `plugin/references/agent-teams.md` — document the experimental teams path (Windows = in-process). **Acceptance:** file present, accurate.
+- T7: update repo `CLAUDE.md` "Python-only by design today" to the merged multi-language reality, matching the skill's Honest Limits. **Acceptance:** line no longer contradicts the shipped Go/Node support.
+
+---
+
+## Phase B — Per-language (one subagent each, parallel, non-overlapping paths)
+
+> Dispatch 3 subagents — Python, Go, Node. Each owns ONLY `plugin/skills/<lang>/**` and `plugin/agents/<lang>-expert.md`. **Each subagent figures out exactly what to borrow and how to refine it** within the directives below.
+
+### Task 8 / 9 / 10: Build `<lang>` skill + `<lang>-expert` agent
+**Sources to pull (MIT):**
+- Python: wshobson `python-development` skills (design-patterns, anti-patterns, code-style, error-handling, async, project-structure) + agent `python-pro`; VoltAgent `python-pro` depth.
+- Go: wshobson `systems-programming/go-concurrency-patterns` + agent `golang-pro`; VoltAgent `golang-pro`.
+- Node: wshobson `javascript-typescript` skills (modern-js, ts-advanced-types, nodejs-backend, js-testing) + agents `typescript-pro`/`javascript-pro`; VoltAgent `typescript-pro` (primary) + `javascript-pro` (JS-fallback section only).
+
+**Directives:**
+- Write `plugin/skills/<lang>/SKILL.md` as a **slim index (<500 words)** routing to `references/` (language-specific deltas only — see spec layout). Cross-reference `skills/_shared/*` and `debug-agent` **by name**; do NOT copy their content.
+- Write language-specific reference files (design-patterns, concurrency/async, types where relevant, errors-structure, debugging recipes with `dbga`).
+- Write `plugin/agents/<lang>-expert.md` (model: sonnet) — merge VoltAgent depth + wshobson structure, dedup, inject the Evidence-First block, point at its skill. Concise; no restating reference content.
+- `description` = triggers only ("Use when…"), no workflow summary, keyword-rich.
+- Add upstream MIT notice + SHA to `THIRD_PARTY_NOTICES.md` for files substantially copied.
+- Draft `plugin/skills/<lang>/evals/evals.json` (2–3 realistic prompts).
+- **Acceptance:** skill loads as `/debug-agent:<lang>`; `wc -w SKILL.md` < 500; expert in `/agents`; references present; evals.json present; no duplication of `_shared` content.
+
+---
+
+## Phase C — Eval + final verification
+
+### Task 11: Behavioral scenarios (all 4 skills)
+- Run the 3 subagent scenarios from the spec (e2e architect→debug→fix→verify; correct-reference retrieval; no-comments-under-pressure) via the skill-creator baseline-vs-with-skill pattern. Run the eval scripts through a POSIX shell and use `generate_review.py --static`.
+- **Acceptance:** with-skill beats baseline on the no-comments + evidence-first assertions; gaps fed back into the skills.
+
+### Task 12: One shared description-trigger optimization
+- Single ~20-query set (negatives = cross-skill near-misses python/go/node/debug-agent); run `run_loop`; apply each `best_description`.
+- **Acceptance:** the four skills fire on their own intent and stay quiet on the others'.
+
+### Task 13: Full benchmark for `debug-agent` + `python` only
+- aggregate_benchmark → review. Go/Node spot-checked, not full-looped.
+- **Acceptance:** positive with-skill delta recorded (goal, not hard gate).
+
+### Task 14: Release verification
+- `claude plugin validate ./plugin`; `--plugin-dir` load; `/help` lists `/debug-agent:*`; `/agents` lists architect + 3 experts; `npx skills add <clone> --skill python|go|node|debug-agent` each install standalone; e2e architect loop on a known-buggy script.
+- **Acceptance:** all pass; tag `0.1.0`.
+
+---
+
+## Notes
+- Frequent commits per task on `feat/claude-plugin`.
+- No AI attribution in commits/PRs (per user rules).
+- Each Phase-B subagent works in isolated paths to avoid write conflicts; the main thread merges `THIRD_PARTY_NOTICES.md` additions if they touch the same file.
diff --git a/docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md b/docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md
new file mode 100644
index 0000000..0672f76
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-29-debug-agent-plugin-design.md
@@ -0,0 +1,392 @@
+# Design: `debug-agent` Claude Code Plugin
+
+Date: 2026-05-29
+Status: Final — ready for implementation plan
+Owner: Nir
+
+## Goal
+
+Package the `dbga` evidence-first debugger plus a consolidated set of
+language skills and specialist agents as a distributable **Claude Code
+plugin**, giving a complete **design → develop → debug deeply → verify →
+clean up** workflow for Python, Go, and Node.
+
+Two install paths must both work cleanly:
+
+1. **Full plugin** via marketplace —
+   `claude plugin marketplace add niradler/dbga` then
+   `/plugin install debug-agent@dbga`.
+2. **Single skill** via the `skills` CLI —
+   `npx skills add niradler/dbga --skill python` (or `go`, `node`,
+   `debug-agent`).
+
+## Final shape
+
+**4 agents** and **4 skills** — one consolidated skill + one expert per
+language, an `architect` to orchestrate, and the debugger skill.
+
+### Agents (`agents/*.md`)
+
+| Agent | Model | Scope |
+| --- | --- | --- |
+| `architect` | **opus** | Language-agnostic. Owns high-level design, decomposition, cross-cutting decisions, and the evidence-first orchestration loop: gather runtime evidence → delegate language work to the matching expert → verify against real flows. Delegates; rarely writes code itself. |
+| `python-expert` | sonnet (architect may override to opus for hard tasks) | Full Python specialist. Drives the `python` + `debug-agent` skills. |
+| `go-expert` | sonnet (overridable) | Full Go specialist. Drives the `go` + `debug-agent` skills. |
+| `node-expert` | sonnet (overridable) | TypeScript-focused (small JS-fallback section). Drives the `node` + `debug-agent` skills. |
+
+There is no separate `code-reviewer` agent: clean-code review is a
+cross-cutting responsibility every agent carries (see Working Principles) and
+is backed by the shared `clean-code` reference in `skills/_shared/`.
+
+### Skills (`skills/*/SKILL.md`)
+
+| Skill | Role |
+| --- | --- |
+| `python` | Main Python development skill. SKILL.md routes to many reference files (progressive disclosure). |
+| `go` | Main Go development skill + references. |
+| `node` | Main Node/TypeScript development skill + references. |
+| `debug-agent` | Existing evidence-first `dbga` driver (Python/Go/Node over DAP). Moved into the plugin. |
+
+Each skill is **self-contained** → any one installs cleanly on its own via
+`npx skills`. Agents are plugin-only (the `skills` CLI installs skills, not
+agents) — expected and documented.
+
+## Source material & licensing
+
+We **combine and learn from both** MIT-licensed sources — the goal is the
+best result, not fidelity to any one repo:
+
+- **wshobson/agents** (MIT) — has both agents and skills. Supplies the
+  per-topic skill content (design-patterns, code-style, error-handling, async,
+  anti-patterns, concurrency) and lean specialist agents.
+- **VoltAgent/awesome-claude-code-subagents** (MIT) — agents only, but deep
+  (e.g. `python-pro` ≈ 3,800 words: operational checklists, type-system
+  mastery, async, testing methodology, security, collaboration protocol).
+
+Combination rules:
+
+1. **Each language skill** consolidates the relevant wshobson skills as
+   **language-specific reference files**, enriched with the matching deep
+   sections harvested from VoltAgent's agents. Language-**invariant** content
+   (clean-code/no-comments, evidence-first discipline, dependency-hygiene
+   discipline) is authored **once** in `skills/_shared/` and cross-referenced
+   by name — never triple-copied across python/go/node.
+2. **Each expert agent** merges the VoltAgent + wshobson versions of that
+   language (VoltAgent depth + wshobson structure), deduplicated, then points
+   at its skill + the `debug-agent` skill.
+3. The `architect` agent is **authored fresh** (no single upstream
+   equivalent), distilling the cross-cutting orchestration + working
+   principles below.
+4. Preserve upstream LICENSE/attribution; record the source commit SHA of each
+   vendored file.
+
+## Working principles (embedded in every agent + each skill's SKILL.md)
+
+These are the non-negotiables the whole plugin enforces:
+
+1. **Evidence and validation first.** Decisions are made by validating against
+   **real use flows run against the code** — not by reasoning about source.
+   Use logs, debugger breakpoints (`dbga`), and common practices to observe
+   what actually happens. Never declare a fix done until correct behavior is
+   **observed** at the point the bug occurred.
+2. **Debug with the toolkit, don't guess.** On a crash/hang/wrong-output,
+   reach for the `debug-agent` skill and `dbga` (diagnose, live sessions,
+   `eval`, instrument) before sprinkling prints or guessing fixes.
+3. **Proactive dependency hygiene.** On new install/setup and when touching
+   dependencies, audit proactively and suggest bumps to latest (never auto-run):
+   - Node: `npm outdated`, `npm audit`, `npm install <pkg>@latest`.
+   - Python: `uv lock --upgrade` / `uv pip install -U`, `pip-audit`.
+   - Go: `go list -u -m all`, `go get -u ./...`, `govulncheck ./...`.
+4. **Clean, self-explaining code** (mirrors the official `code-simplifier`):
+   - Readable and **explicit over compact**; clarity beats brevity.
+   - **Never add code comments unless explicitly asked.** Code should explain
+     itself through clear names and structure. Remove comments that restate
+     obvious code.
+   - Avoid nested ternaries; prefer if/else or switch for multiple conditions.
+   - Reduce nesting and redundancy; consolidate related logic.
+   - Preserve functionality; don't over-simplify or strip helpful
+     abstractions.
+5. **Deliver clean, working, verified code — always.** The loop is design →
+   implement → run real flows → debug with evidence → simplify → verify.
+6. **Token economy.** These files are read by an agent, not a human. Slim,
+   to-the-point, minimum words while keeping what's vital. Authoring
+   constraints below enforce this.
+
+## Authoring constraints (slim, agent-facing — from writing-skills)
+
+Every skill and agent in this plugin follows:
+
+- **SKILL.md is the slim index, not the manual.** Target < 500 words; route to
+  `references/*.md` via progressive disclosure. Heavy/per-topic detail lives in
+  references, loaded only when needed.
+- **Descriptions are triggers only.** Third person, start with "Use when…",
+  list symptoms/contexts. **No workflow summary** (a summarized description
+  makes the agent skip the body).
+- **Names:** lowercase, hyphenated, active (`python`, `go`, `node`,
+  `debug-agent`; reference files like `error-handling`, `clean-code`).
+- **Keyword coverage** for discovery (errors, symptoms, tool/command names).
+- **Cross-reference by name**, not `@path` (no force-loading). Reference the
+  matching expert agent and `debug-agent` skill where relevant.
+- **One excellent example per pattern**, not many; no multi-language dilution.
+- **Agents are concise too** — operational checklist + when-to-delegate, detail
+  deferred to the skills they drive rather than restated inline.
+
+A short version of principles 1–2 is injected as a standard **Evidence-First
+Debugging** block in each agent/skill body:
+
+```markdown
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP —
+and the `debug-agent` skill. When code crashes, hangs, produces wrong output,
+or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then
+  `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault
+before declaring it done.
+```
+
+## Repository layout
+
+```text
+debug-cli/                          # repo root (existing Python project)
+├── .claude-plugin/
+│   └── marketplace.json            # one entry, source ./plugin
+├── plugin/                         # PLUGIN ROOT
+│   ├── .claude-plugin/
+│   │   └── plugin.json             # name: debug-agent, namespace /debug-agent:*
+│   ├── README.md                   # install + usage for both paths
+│   ├── LICENSE                     # plugin MIT
+│   ├── THIRD_PARTY_NOTICES.md      # verbatim upstream MIT notices + SHAs
+│   ├── skills/                     # CANONICAL skills home (single source of truth)
+│   │   ├── debug-agent/            # MOVED from repo-root skills/debug-agent/
+│   │   │   ├── SKILL.md
+│   │   │   └── references/         # existing: workflow, debugger, instrumentation, ...
+│   │   ├── _shared/                  # language-invariant, authored ONCE
+│   │   │   ├── clean-code.md          # self-explaining, no-comments rule
+│   │   │   ├── evidence-first.md      # the debugging/validation discipline
+│   │   │   └── dependency-hygiene.md  # audit-then-suggest discipline
+│   │   ├── python/
+│   │   │   ├── SKILL.md
+│   │   │   └── references/            # PYTHON-SPECIFIC deltas only
+│   │   │       ├── design-patterns.md # + idioms / anti-patterns
+│   │   │       ├── type-hints.md
+│   │   │       ├── async-concurrency.md
+│   │   │       ├── errors-structure.md
+│   │   │       └── debugging.md       # Python dbga recipes
+│   │   ├── go/
+│   │   │   ├── SKILL.md
+│   │   │   └── references/
+│   │   │       ├── design-patterns.md
+│   │   │       ├── concurrency.md     # goroutines, channels, sync
+│   │   │       ├── errors-structure.md
+│   │   │       └── debugging.md       # Go dbga + dlv recipes
+│   │   └── node/
+│   │       ├── SKILL.md
+│   │       └── references/
+│   │           ├── design-patterns.md
+│   │           ├── typescript-types.md
+│   │           ├── async-patterns.md
+│   │           ├── errors-structure.md
+│   │           ├── js-fallback.md
+│   │           └── debugging.md       # Node dbga + vscode-js-debug recipes
+│   ├── agents/
+│   │   ├── architect.md             # opus, language-agnostic orchestrator
+│   │   ├── python-expert.md
+│   │   ├── go-expert.md
+│   │   └── node-expert.md
+│   ├── commands/
+│   │   └── setup.md                 # /debug-agent:setup (optional one-shot installer)
+│   └── references/
+│       └── agent-teams.md           # optional advanced parallel-debugging mode
+└── ...                              # src/, tests/, pyproject.toml, etc.
+```
+
+Rationale for canonical skills under `plugin/skills/`:
+
+- The plugin manifest loads every skill in that dir automatically.
+- `npx skills add` reads the repo-root `.claude-plugin/marketplace.json` and
+  searches each plugin's `<source>/skills` dir one level deep (see Decisions),
+  so a single-skill install resolves from the same dir. **No duplication.**
+- The existing `skills/debug-agent/` is **moved** here; all 5 references
+  (`CLAUDE.md`, `CHANGELOG.md`, `README.md` ×3) are updated.
+
+## Orchestration & collaboration
+
+**Constraint (verified):** a subagent cannot spawn subagents. So `architect`
+**cannot** be a passively-delegated subagent that itself calls the experts.
+Two valid wirings, both opt-in:
+
+1. **Architect as the main thread** — `claude --agent debug-agent:architect`.
+   Running as the main thread, it *can* dispatch `python-expert` / `go-expert`
+   / `node-expert` as subagents (with a per-call `model` override for hard
+   tasks). This is the default orchestration path. We do **not** force it via
+   the plugin `settings.json` `agent` key — that would hijack every session;
+   it's user-invoked.
+2. **Architect as an agent-teams lead** — for parallel competing-hypothesis
+   debugging, documented in `references/agent-teams.md` (experimental
+   `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS`; Windows = in-process mode).
+
+The loop in either wiring: detect language → gather evidence via
+`debug-agent`/`dbga` → fix via the matching expert → verify at the original
+fault, then confirm the code is clean and self-explaining.
+
+If a user never invokes `architect`, the experts and skills still work
+directly — the main session drives them. Architect is the orchestration
+convenience, not a hard dependency.
+
+## Install handling (dbga is not pip-installed by the plugin)
+
+Kept deliberately minimal — **no hooks, no PATH launchers, no background
+checks.** Just one optional command plus a README line.
+
+- **`/debug-agent:setup`** command (optional convenience): install `dbga`
+  (prefer `uv tool install dbga`, fall back to `pipx install dbga`, then
+  `pip install --user dbga`), print `dbga --version` to confirm, and note any
+  missing Go (`dlv`) / Node (vscode-js-debug) toolchain with the install
+  command. Does not auto-install language toolchains.
+- The skills already tell the agent to run `dbga --version` first (existing
+  `debug-agent` SKILL.md). A missing binary surfaces there naturally — no hook
+  needed.
+- README documents the one-liner for users who skip the command.
+
+## Manifest sketches
+
+`plugin/.claude-plugin/plugin.json`:
+
+```json
+{
+  "name": "debug-agent",
+  "description": "Evidence-first debugging (Python/Go/Node over DAP) plus consolidated language skills and an architect to deliver clean, verified code.",
+  "version": "0.1.0",
+  "author": { "name": "Nir Adler" },
+  "homepage": "https://github.com/niradler/dbga",
+  "repository": "https://github.com/niradler/dbga",
+  "license": "MIT"
+}
+```
+
+`.claude-plugin/marketplace.json` (repo root):
+
+```json
+{
+  "name": "dbga",
+  "owner": { "name": "Nir Adler" },
+  "plugins": [
+    { "name": "debug-agent", "source": "./plugin" }
+  ]
+}
+```
+
+## Testing / verification
+
+Functional:
+
+- `claude plugin validate ./plugin` passes.
+- `claude --plugin-dir ./plugin` loads; `/help` lists `/debug-agent:*` skills;
+  `/agents` lists `architect`, `python-expert`, `go-expert`, `node-expert`.
+- `npx skills add <local-or-repo> --skill python` installs one skill
+  standalone (repeat for `go`, `node`, `debug-agent`).
+- `/debug-agent:setup` installs `dbga` and reports version on a clean machine.
+- Check `wc -w` for each SKILL.md (except `debug-agent`) against the < 500-word target.
+
+Behavioral (subagent scenarios, per writing-skills):
+
+- End-to-end: architect on a known-buggy Python script → evidence via
+  `debug-agent` → fix via `python-expert` → verified at the original fault,
+  no stray comments, clean code.
+- Each language skill: a subagent given a relevant task finds and applies the
+  right reference file.
+- Clean-code rule under pressure: a subagent does **not** add explanatory
+  comments unless asked.
+
+## Eval framework (lean — dev aid, not a 4× release gate)
+
+Trim the full skill-creator loop down to the pieces that pay off:
+
+1. **Behavioral subagent scenarios for all 4 skills** (the writing-skills
+   RED/GREEN core) — the three scenarios in Testing above. Cheap, highest
+   value.
+2. **One shared description-trigger optimization run** across all four
+   `description`s with a single ~20-query set whose negatives are the
+   cross-skill near-misses (python vs go vs node vs debug-agent). Mis-trigger
+   between the four is a single multi-class problem — one run, not four.
+3. **Full quantitative benchmark only for `debug-agent` + `python`** (richest
+   objective assertions). Go/Node are ported by analogy and spot-checked.
+
+Run eval scripts through a POSIX shell (Bash tool / WSL), use
+`generate_review.py --static`, and read `run_loop`'s `best_description` JSON
+directly. Eval is a dev aid; a positive with-skill delta is a goal, not a hard
+ship-gate for v0.1.
+
+Representative assertions (grading.json fields: `text`/`passed`/`evidence`):
+"added no code comments unless asked", "ran the real flow / debugger before
+proposing a fix", "loaded the correct reference file", "suggested a dependency
+bump when deps were stale".
+
+**Self-improvement (borrowed from SkillOpt, not adopted).** SkillOpt
+(MS, MIT, but 6 days old, benchmark-shaped, no SKILL.md/frontmatter or
+description-trigger model) isn't worth wiring in as a dependency. Its *idea*
+is: an optimizer-LLM proposes **bounded edits** to a skill doc, accepted
+**only on strict improvement against a held-out split**, with versioned
+`best`. That's complementary to `run_loop` (which only tunes the
+`description` trigger). Decision: keep `run_loop` for triggers; **optionally
+(v0.2+)** add a thin accept-on-improvement loop over our own
+`evals.json` + with-skill harness to refine SKILL.md/reference **bodies** —
+reimplemented as roughly one script, not via the SkillOpt package.
+
+## Decisions
+
+- **Nested skill resolution — VERIFIED 2026-05-29 (skills CLI v1.5.0).**
+  Resolves cleanly at **default depth, no `--full-depth` needed**. Mechanism:
+  the skills CLI (`dist/cli.mjs` `getPluginSkillPaths`/`discoverSkills`) reads
+  the repo-root `.claude-plugin/marketplace.json`, and for each plugin pushes
+  `<source>/skills` (here `plugin/skills`) into its **priority search dirs**,
+  scanning one level deep — so `plugin/skills/<name>/SKILL.md` is found by
+  `npx skills add niradler/dbga --skill python`. Empirically: with no
+  `marketplace.json`, a probe at `plugin/skills/probe/` was invisible at
+  default depth; after adding `marketplace.json` with `source: "./plugin"` it
+  resolved immediately. No per-skill `skills` array in the manifest required
+  (that is an additional, optional override the CLI also honors). `_shared/`
+  has no `SKILL.md`, so it is correctly skipped by the scan.
+- **Vendor attribution — DECIDED.** Ship `plugin/THIRD_PARTY_NOTICES.md` with
+  each upstream's verbatim MIT text + copyright line + repo URL + commit SHA,
+  and a per-file header on files that are substantially copied. SHA alone is
+  not MIT compliance.
+- **Per-task model — DECIDED.** Agent definitions take a single `model`
+  (architect=opus, experts=sonnet). Per-call `model` override at dispatch
+  handles "opus for hard tasks" — valid only on the main-thread architect path
+  (see Orchestration).
+- **Skill move blast radius — DECIDED.** `skills/debug-agent/` is referenced in
+  `CLAUDE.md`, `CHANGELOG.md`, and `README.md` (×3) — **5 references, not 1.**
+  All updated on move; add a CHANGELOG "skill relocated" note. Verify
+  `git check-ignore` does not swallow `.claude-plugin/` (`.gitignore` has
+  `.claude/`).
+- **Existing `debug-agent` SKILL.md is exempt** from the <500-word index rule —
+  it's a validated driver doc (~1,400 words), not a routing index. Don't rewrite
+  what works to hit a target.
+- **Stale doc fixed.** Update repo `CLAUDE.md` "Python-only by design today"
+  to reflect the merged multi-language reality (Go/Delve, Node/vscode-js-debug),
+  matching the skill's "Honest Limits".
+- **Scope: build all at once** (4 skills + 4 agents + lean eval), single
+  `0.1.0` release.
+
+## Build approach
+
+Parallelize authoring with **one subagent per language**:
+
+- 3 language subagents (Python, Go, Node) each own: pull the wshobson skill
+  refs + VoltAgent agent for that language, merge/dedup, write the consolidated
+  `<lang>` SKILL.md (slim index) + language-specific reference files, write the
+  `<lang>-expert` agent, cross-reference `skills/_shared/` + `debug-agent`,
+  inject the Evidence-First block, and draft that skill's `evals.json`.
+- Main thread owns the shared, sequential pieces: `skills/_shared/*`, the
+  `architect` agent, moving `debug-agent` + fixing its 5 references, the
+  manifests, `THIRD_PARTY_NOTICES.md`, `/debug-agent:setup`, README, and the
+  CLAUDE.md fix.
+- Each subagent works in non-overlapping paths (`plugin/skills/<lang>/` +
+  `plugin/agents/<lang>-expert.md`) to avoid write conflicts.

From d1b5523f6494bd25c564ddd522b5e2e4173b4afc Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:31:08 +0300
Subject: [PATCH 02/16] feat(plugin): scaffold debug-agent plugin manifests,
 README, LICENSE, notices

---
 .claude-plugin/marketplace.json   |  7 +++
 plugin/.claude-plugin/plugin.json |  9 ++++
 plugin/LICENSE                    | 21 ++++++++
 plugin/README.md                  | 79 +++++++++++++++++++++++++++++++
 plugin/THIRD_PARTY_NOTICES.md     | 50 +++++++++++++++++++
 5 files changed, 166 insertions(+)
 create mode 100644 .claude-plugin/marketplace.json
 create mode 100644 plugin/.claude-plugin/plugin.json
 create mode 100644 plugin/LICENSE
 create mode 100644 plugin/README.md
 create mode 100644 plugin/THIRD_PARTY_NOTICES.md

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
new file mode 100644
index 0000000..c990dac
--- /dev/null
+++ b/.claude-plugin/marketplace.json
@@ -0,0 +1,7 @@
+{
+  "name": "dbga",
+  "owner": { "name": "Nir Adler" },
+  "plugins": [
+    { "name": "debug-agent", "source": "./plugin" }
+  ]
+}
diff --git a/plugin/.claude-plugin/plugin.json b/plugin/.claude-plugin/plugin.json
new file mode 100644
index 0000000..75cc2e8
--- /dev/null
+++ b/plugin/.claude-plugin/plugin.json
@@ -0,0 +1,9 @@
+{
+  "name": "debug-agent",
+  "description": "Evidence-first debugging (Python/Go/Node over DAP) plus consolidated language skills and an architect to deliver clean, verified code.",
+  "version": "0.1.0",
+  "author": { "name": "Nir Adler" },
+  "homepage": "https://github.com/niradler/dbga",
+  "repository": "https://github.com/niradler/dbga",
+  "license": "MIT"
+}
diff --git a/plugin/LICENSE b/plugin/LICENSE
new file mode 100644
index 0000000..6523688
--- /dev/null
+++ b/plugin/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Nir Adler
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/plugin/README.md b/plugin/README.md
new file mode 100644
index 0000000..1de171b
--- /dev/null
+++ b/plugin/README.md
@@ -0,0 +1,79 @@
+# debug-agent — Claude Code plugin
+
+Evidence-first debugging for **Python, Go, and Node/TypeScript** over DAP, plus
+consolidated per-language development skills and an `architect` to orchestrate a
+complete **design → develop → debug deeply → verify → clean up** workflow.
+
+The plugin bundles the `dbga` debugger driver skill with three language skills
+and four agents, all enforcing the same principles: validate against real flows,
+debug with the toolkit instead of guessing, keep dependencies fresh, and ship
+clean, self-explaining code.
+
+## Name glossary
+
+Three names, three contexts — they refer to the same project:
+
+| Name | Where it appears |
+| --- | --- |
+| `dbga` | The marketplace name, the installable distribution (`uv tool install dbga`), and the CLI binary (`dbga --version`). |
+| `debug-agent` | The plugin name and its command/skill namespace (`/debug-agent:*`). |
+| `debug_agent` | The Python import name. |
+
+## What's inside
+
+- **Skills** (`/debug-agent:*`): `debug-agent` (the `dbga` driver), `python`,
+  `go`, `node`.
+- **Agents** (`/agents`): `architect` (opus, orchestrator), `python-expert`,
+  `go-expert`, `node-expert`.
+- **Command:** `/debug-agent:setup` — optional one-shot `dbga` installer.
+
+## Install — full plugin (recommended)
+
+Adds all skills, agents, and the setup command.
+
+```sh
+claude plugin marketplace add niradler/dbga   # from your shell
+/plugin install debug-agent@dbga              # slash command, inside Claude Code
+```
+
+Then run the optional installer to put `dbga` on your PATH:
+
+```sh
+/debug-agent:setup
+```
+
+…or install `dbga` yourself:
+
+```sh
+uv tool install dbga   # or: pipx install dbga   # or: pip install --user dbga
+dbga --version
+```
+
+## Install — a single skill
+
+The [`skills`](https://github.com/vercel-labs/skills) CLI installs any one skill
+standalone (skills only — agents and commands come with the full plugin):
+
+```sh
+npx skills add niradler/dbga --skill python   # or: go | node | debug-agent
+npx skills add niradler/dbga --list           # preview what's available
+```
+
+Resolution is automatic: the repo-root `.claude-plugin/marketplace.json` points
+the `skills` CLI at `plugin/skills/`, so no `--full-depth` flag is needed.
+
+## Usage
+
+- **Just debug:** invoke the `debug-agent` skill (or run `dbga`) when something
+  crashes, hangs, or returns wrong output.
+- **Develop in one language:** the matching skill (`python`/`go`/`node`) loads
+  language-specific references on demand.
+- **Orchestrate:** run `claude --agent debug-agent:architect` to let the
+  architect gather evidence and delegate to the language experts. See
+  [`references/agent-teams.md`](references/agent-teams.md) for the experimental
+  parallel-debugging mode.
+
+## License
+
+MIT — see [`LICENSE`](LICENSE). Upstream attributions in
+[`THIRD_PARTY_NOTICES.md`](THIRD_PARTY_NOTICES.md).
diff --git a/plugin/THIRD_PARTY_NOTICES.md b/plugin/THIRD_PARTY_NOTICES.md
new file mode 100644
index 0000000..1f2713b
--- /dev/null
+++ b/plugin/THIRD_PARTY_NOTICES.md
@@ -0,0 +1,50 @@
+# Third-Party Notices
+
+This plugin's language skills and expert agents are derived in part from two
+MIT-licensed upstream projects. Files that substantially copy or adapt upstream
+content carry a per-file header naming the source and commit SHA; the verbatim
+license texts and per-file attribution are recorded below.
+
+SHA references alone are not MIT compliance — the full license text of each
+source is reproduced here, as the MIT license requires.
+
+---
+
+## wshobson/agents
+
+- Repository: https://github.com/wshobson/agents
+- License: MIT
+- Vendored commit SHA: `<filled per file by the language subagent>`
+- Used for: per-topic skill content (design patterns, code style, error
+  handling, async, anti-patterns, concurrency) and lean specialist agent
+  structure.
+
+### Files derived from this source
+
+<!-- language subagents append: plugin/skills/<lang>/references/<file>.md — <upstream path> @ <sha> -->
+
+### License (verbatim)
+
+```
+<verbatim MIT license text from wshobson/agents, including its copyright line>
+```
+
+---
+
+## VoltAgent/awesome-claude-code-subagents
+
+- Repository: https://github.com/VoltAgent/awesome-claude-code-subagents
+- License: MIT
+- Vendored commit SHA: `<filled per file by the language subagent>`
+- Used for: deep specialist-agent sections (operational checklists, type-system
+  mastery, async, testing methodology, security, collaboration protocol).
+
+### Files derived from this source
+
+<!-- language subagents append: plugin/agents/<lang>-expert.md — <upstream path> @ <sha> -->
+
+### License (verbatim)
+
+```
+<verbatim MIT license text from VoltAgent/awesome-claude-code-subagents, including its copyright line>
+```

From f550b4fcecef555662f6be8ca49cbb5963c80267 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:33:19 +0300
Subject: [PATCH 03/16] refactor(plugin): relocate debug-agent skill into
 plugin/skills, update refs

---
 .claude/settings.local.json                         |  3 ++-
 CHANGELOG.md                                        | 10 +++++++++-
 CLAUDE.md                                           |  4 ++--
 README.md                                           | 13 +++++++++----
 {skills => plugin/skills}/debug-agent/SKILL.md      |  0
 .../skills}/debug-agent/references/advanced.md      |  0
 .../skills}/debug-agent/references/debugger.md      |  0
 .../debug-agent/references/instrumentation.md       |  0
 .../skills}/debug-agent/references/localization.md  |  0
 .../debug-agent/references/log-monitoring.md        |  0
 .../skills}/debug-agent/references/vscode-collab.md |  0
 .../skills}/debug-agent/references/workflow.md      |  0
 12 files changed, 22 insertions(+), 8 deletions(-)
 rename {skills => plugin/skills}/debug-agent/SKILL.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/advanced.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/debugger.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/instrumentation.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/localization.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/log-monitoring.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/vscode-collab.md (100%)
 rename {skills => plugin/skills}/debug-agent/references/workflow.md (100%)
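
Reviewer note (not part of the commit): after this move, any remaining
`skills/debug-agent` reference that is not prefixed with `plugin/` is
stale. The following is a minimal sketch of a scan for such leftovers. The
names `STALE_PATH` and `find_stale_references` are hypothetical, and the
pattern assumes the only valid new location is `plugin/skills/debug-agent`.

```python
import re

# Matches `skills/debug-agent` unless it is already prefixed with `plugin/`.
STALE_PATH = re.compile(r"(?<!plugin/)\bskills/debug-agent\b")


def find_stale_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still use the pre-move path."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE_PATH.search(line)
    ]
```

Feeding it the contents of README.md, CLAUDE.md, and CHANGELOG.md after
applying the patch should return an empty list for each file.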

diff --git a/.claude/settings.local.json b/.claude/settings.local.json
index 452a72d..6f2ffd3 100644
--- a/.claude/settings.local.json
+++ b/.claude/settings.local.json
@@ -1,7 +1,8 @@
 {
   "permissions": {
     "allow": [
-      "Bash(dir)"
+      "Bash(dir)",
+      "Bash(uv run *)"
     ]
   }
 }
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 20ed136..1309d93 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Changed
+
+- **`debug-agent` skill relocated** from `skills/debug-agent/` to
+  `plugin/skills/debug-agent/` as part of packaging the `debug-agent` Claude
+  Code plugin. `npx skills add niradler/dbga --skill debug-agent` still resolves
+  it (via the repo-root `.claude-plugin/marketplace.json`); update any manual
+  copy path accordingly.
+
 ## [0.1.0] — 2026-05-28
 
 Initial alpha release of `debug-agent` (CLI: `dbga`) — an evidence-first
@@ -57,7 +65,7 @@ daemon, with auto-context returned on every stop.
   (truncated to 200-char strings / 5-item collection previews), full stack
   (capped at 20 frames), recent output, warnings. No follow-up calls
   needed. Configurable via `--context-lines`.
-- **`debug-agent` skill** (`skills/debug-agent/`) — Claude/agent
+- **`debug-agent` skill** (`plugin/skills/debug-agent/`) — Claude/agent
   skill that drives `dbga` with evidence-first workflow, log
   monitoring, localization, instrumentation, debugger, VS Code collab, and
   advanced (hang/deadlock/wolf-fence/concurrency) reference docs.
diff --git a/CLAUDE.md b/CLAUDE.md
index ac67904..7b84761 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -72,6 +72,6 @@ Every CLI command returns a single JSON object on stdout via `core/format.emit_p
 - **Tear-down is best-effort and idempotent.** `DapSession.release()` is called from `finally`. Tree-killing the adapter is the unconditional fallback after a graceful `disconnect` request.
 - **The daemon idle-timeout watchdog** (default 1800s) exists so a forgotten session can't linger forever — don't disable it without thinking about cleanup.
 
-## The skill (`skills/debug-agent/`)
+## The skill (`plugin/skills/debug-agent/`)
 
-A Claude/agent skill ships in-repo at `skills/debug-agent/`. It documents the evidence-first workflow that the CLI is designed for (`SKILL.md` + `references/*.md`). If you change CLI command shapes or JSON schemas, audit the skill — it has concrete command examples that go stale silently.
+A Claude/agent skill ships in-repo at `plugin/skills/debug-agent/` (part of the `debug-agent` Claude Code plugin under `plugin/`). It documents the evidence-first workflow that the CLI is designed for (`SKILL.md` + `references/*.md`). If you change CLI command shapes or JSON schemas, audit the skill — it has concrete command examples that go stale silently.
diff --git a/README.md b/README.md
index 30e502a..6a595a4 100644
--- a/README.md
+++ b/README.md
@@ -145,8 +145,9 @@ they belong next to the code. Add `.debug-agent/` to your `.gitignore`:
 
 ## The `debug-agent` Skill
 
-`skills/debug-agent/` contains a Claude / agent skill that teaches
-evidence-first debugging on top of `dbga`. It includes:
+`plugin/skills/debug-agent/` contains a Claude / agent skill that teaches
+evidence-first debugging on top of `dbga`. It ships inside the `debug-agent`
+Claude Code plugin (see [`plugin/README.md`](plugin/README.md)) and includes:
 
 - **`SKILL.md`** — when to trigger, decision tree, mindset
 - **`references/workflow.md`** — the evidence-first loop
@@ -175,12 +176,16 @@ Manual install also works:
 
 ```sh
 # Linux / macOS
-cp -r skills/debug-agent ~/.claude/skills/
+cp -r plugin/skills/debug-agent ~/.claude/skills/
 
 # Windows PowerShell
-Copy-Item -Recurse skills/debug-agent $env:USERPROFILE\.claude\skills\
+Copy-Item -Recurse plugin/skills/debug-agent $env:USERPROFILE\.claude\skills\
 ```
 
+> Installing the full plugin (`/plugin install debug-agent@dbga`) brings this
+> skill plus the `python`/`go`/`node` skills and the agents — see
+> [`plugin/README.md`](plugin/README.md).
+
 ## Development
 
 ```sh
diff --git a/skills/debug-agent/SKILL.md b/plugin/skills/debug-agent/SKILL.md
similarity index 100%
rename from skills/debug-agent/SKILL.md
rename to plugin/skills/debug-agent/SKILL.md
diff --git a/skills/debug-agent/references/advanced.md b/plugin/skills/debug-agent/references/advanced.md
similarity index 100%
rename from skills/debug-agent/references/advanced.md
rename to plugin/skills/debug-agent/references/advanced.md
diff --git a/skills/debug-agent/references/debugger.md b/plugin/skills/debug-agent/references/debugger.md
similarity index 100%
rename from skills/debug-agent/references/debugger.md
rename to plugin/skills/debug-agent/references/debugger.md
diff --git a/skills/debug-agent/references/instrumentation.md b/plugin/skills/debug-agent/references/instrumentation.md
similarity index 100%
rename from skills/debug-agent/references/instrumentation.md
rename to plugin/skills/debug-agent/references/instrumentation.md
diff --git a/skills/debug-agent/references/localization.md b/plugin/skills/debug-agent/references/localization.md
similarity index 100%
rename from skills/debug-agent/references/localization.md
rename to plugin/skills/debug-agent/references/localization.md
diff --git a/skills/debug-agent/references/log-monitoring.md b/plugin/skills/debug-agent/references/log-monitoring.md
similarity index 100%
rename from skills/debug-agent/references/log-monitoring.md
rename to plugin/skills/debug-agent/references/log-monitoring.md
diff --git a/skills/debug-agent/references/vscode-collab.md b/plugin/skills/debug-agent/references/vscode-collab.md
similarity index 100%
rename from skills/debug-agent/references/vscode-collab.md
rename to plugin/skills/debug-agent/references/vscode-collab.md
diff --git a/skills/debug-agent/references/workflow.md b/plugin/skills/debug-agent/references/workflow.md
similarity index 100%
rename from skills/debug-agent/references/workflow.md
rename to plugin/skills/debug-agent/references/workflow.md

From fe05e309f5abbd1e4cc73e7f6c55fd617a3d3813 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:34:44 +0300
Subject: [PATCH 04/16] feat(plugin): add language-invariant _shared references
 (clean-code, evidence-first, dependency-hygiene)

---
 plugin/skills/_shared/clean-code.md         | 43 +++++++++++++++++
 plugin/skills/_shared/dependency-hygiene.md | 38 +++++++++++++++
 plugin/skills/_shared/evidence-first.md     | 51 +++++++++++++++++++++
 3 files changed, 132 insertions(+)
 create mode 100644 plugin/skills/_shared/clean-code.md
 create mode 100644 plugin/skills/_shared/dependency-hygiene.md
 create mode 100644 plugin/skills/_shared/evidence-first.md
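
Reviewer note (not part of the commit): dependency-hygiene.md splits
commands into read-only audits that may be run and mutating commands that
may only be suggested. The following is a minimal sketch of that gate,
using the audit commands listed in the reference. The table name
`AUDIT_COMMANDS` and the function `is_safe_to_run` are hypothetical, and
the exact-match policy is an assumption chosen for illustration.

```python
import shlex

# Read-only audit commands listed in dependency-hygiene.md, keyed by language.
AUDIT_COMMANDS = {
    "node": ["npm outdated", "npm audit"],
    "python": ["pip-audit", "uv pip list --outdated"],
    "go": ["go list -u -m all", "govulncheck ./..."],
}


def is_safe_to_run(command: str) -> bool:
    """True only for an exact allow-listed audit command; all else is suggest-only."""
    tokens = shlex.split(command)
    return any(
        tokens == shlex.split(allowed)
        for commands in AUDIT_COMMANDS.values()
        for allowed in commands
    )
```

Matching the whole argument list rather than a prefix is deliberate:
`npm audit` passes, while `npm audit fix` (a mutating command) does not.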

diff --git a/plugin/skills/_shared/clean-code.md b/plugin/skills/_shared/clean-code.md
new file mode 100644
index 0000000..43fbd18
--- /dev/null
+++ b/plugin/skills/_shared/clean-code.md
@@ -0,0 +1,43 @@
+# Clean, self-explaining code
+
+Language-invariant. The language skills reference this by name — do not copy it.
+
+## The rule
+
+Code explains itself through names and structure. This mirrors the official
+`code-simplifier`: clarity over cleverness, explicit over compact.
+
+- **No comments unless explicitly asked.** A comment that restates the code is
+  noise — delete it. If a line needs a comment to be understood, rename the
+  symbols or extract a well-named function instead. Keep only comments that
+  capture *why* a non-obvious choice was made and that the code genuinely cannot
+  express (a workaround, an external contract, a deliberate constraint).
+- **Readable over terse.** A clear `if/else` beats a dense one-liner. Optimize
+  for the next reader, not character count.
+- **No nested ternaries.** Use `if/else`, early returns, or a `switch`/`match`
+  for more than two branches.
+- **Reduce nesting.** Prefer guard clauses and early returns over deep `if`
+  pyramids. Flatten happy-path code to the left margin.
+- **Consolidate redundancy.** Pull repeated logic into one well-named place;
+  don't duplicate a rule in three branches.
+- **Names match behavior.** Name things for *what they do*, not *how*. Rename
+  the moment a name drifts from its meaning.
+
+## Don't over-simplify
+
+Preserve functionality and helpful abstractions. Simplification removes
+accidental complexity (noise, duplication, dead branches) — never essential
+structure. If collapsing a layer would hide a real boundary or lose a tested
+behavior, leave it.
+
+## When touching existing code
+
+Match the surrounding style — comment density, naming, idioms. Improve what you
+touch the way a careful developer would; don't restructure beyond your task.
+
+## Self-check before done
+
+- Did I add any comment I wasn't asked for? Remove it.
+- Could a rename or extraction replace an explanation?
+- Is any branch nested more than necessary?
+- Does every name still describe what the thing does?
diff --git a/plugin/skills/_shared/dependency-hygiene.md b/plugin/skills/_shared/dependency-hygiene.md
new file mode 100644
index 0000000..e9d4235
--- /dev/null
+++ b/plugin/skills/_shared/dependency-hygiene.md
@@ -0,0 +1,38 @@
+# Dependency hygiene — audit, then suggest
+
+Language-invariant. The language skills reference this by name — do not copy it.
+
+## The rule
+
+On new install/setup and whenever you touch dependencies, proactively audit and
+push toward latest — then **suggest** the bumps. Never run a mutating command
+(install, upgrade, lockfile rewrite) on your own; surface what you found and the
+exact command, and let the developer run it.
+
+- **Audit commands are safe to run** — they only read.
+- **Mutating commands are suggest-only** — present them, don't execute them.
+- Pin intent: explain *why* a bump matters (security advisory, bug fix, EOL)
+  rather than upgrading blindly.
+
+## Per language
+
+### Node
+
+- Audit (run): `npm outdated`, `npm audit`
+- Suggest (don't run): `npm install <pkg>@latest`, `npm audit fix`
+
+### Python
+
+- Audit (run): `pip-audit`, `uv pip list --outdated`
+- Suggest (don't run): `uv lock --upgrade`, `uv pip install -U <pkg>`
+
+### Go
+
+- Audit (run): `go list -u -m all`, `govulncheck ./...`
+- Suggest (don't run): `go get -u ./...`, `go get <module>@latest`
+
+## Reporting
+
+Lead with anything from a vulnerability audit, then stale-but-safe bumps. For
+each: package, current → available, the reason, and the suggest-only command.
+If nothing is stale or vulnerable, say so in one line and move on.
diff --git a/plugin/skills/_shared/evidence-first.md b/plugin/skills/_shared/evidence-first.md
new file mode 100644
index 0000000..5875c95
--- /dev/null
+++ b/plugin/skills/_shared/evidence-first.md
@@ -0,0 +1,51 @@
+# Evidence-first development & debugging
+
+Language-invariant. The language skills and agents reference this by name — do
+not copy it. This is the single source of truth for the discipline and the
+standard **Evidence-First Debugging** block embedded across the plugin.
+
+## The discipline
+
+1. **Validate against real flows, not source-reading.** Decide what the code
+   does by running a real use flow against it and observing the result with
+   logs, debugger breakpoints, and similar standard practices. Reasoning about
+   source is a hypothesis; a run is evidence.
+2. **Debug with the toolkit, don't guess.** On a crash, hang, or wrong output,
+   reach for the `debug-agent` skill and `dbga` *before* sprinkling prints or
+   guessing fixes. A debugger stop returns full context in one round-trip;
+   prints give you one value at a time.
+3. **Verify at the original fault.** Never declare a fix done until you have
+   **observed** correct behavior at the exact point the bug occurred — same
+   breakpoint, same input, same assertion that previously failed.
+
+The loop: design → implement → run the real flow → debug with evidence →
+simplify → verify at the fault.
+
+## Standard Evidence-First Debugging block
+
+Embed this (verbatim or trimmed) in agents and skill bodies:
+
+```markdown
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP —
+and the `debug-agent` skill. When code crashes, hangs, produces wrong output,
+or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then
+  `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault
+before declaring it done.
+```
+
+## Mindset (cross-language)
+
+- **Two strikes, rethink.** Two failed hypotheses at the same spot means your
+  model is wrong — form a different theory aimed elsewhere.
+- **Breakpoint where the problem *begins*,** not where it manifests. Walk up the
+  stack to the frame where the value first went wrong.
+- **Read-only eval.** Inspecting live state should not mutate it unless you're
+  deliberately probing a fix.

From d1cf3072751186f0f3c75518739a5fa98fda22ca Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:35:29 +0300
Subject: [PATCH 05/16] feat(plugin): add architect orchestrator agent (opus)

---
 plugin/agents/architect.md | 64 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 plugin/agents/architect.md

diff --git a/plugin/agents/architect.md b/plugin/agents/architect.md
new file mode 100644
index 0000000..61a8f43
--- /dev/null
+++ b/plugin/agents/architect.md
@@ -0,0 +1,64 @@
+---
+name: architect
+description: Use when a coding task spans design, multiple files, or more than one language and needs orchestration — decomposing the work, deciding cross-cutting architecture, then driving an evidence-first design→build→debug→verify loop. Use as the main-thread lead that delegates language work to python-expert, go-expert, or node-expert. Use for hard bugs that need runtime evidence gathered before a fix.
+model: opus
+---
+
+You are the architect: a language-agnostic orchestrator. You own high-level
+design, decomposition, and cross-cutting decisions, and you drive the
+evidence-first loop. You delegate implementation to the matching language
+expert and rarely write code yourself.
+
+## Orchestration loop
+
+1. **Frame.** Restate the goal and the definition of done. Surface ambiguities
+   and key decisions before building.
+2. **Detect language(s)** from the files and toolchain in play.
+3. **Gather evidence first.** For any crash, hang, wrong output, or unknown
+   runtime state, use the `debug-agent` skill and `dbga` to observe what
+   actually happens before proposing a change. Do not theorize from source.
+4. **Delegate to the expert.** Hand language-specific implementation to
+   `python-expert`, `go-expert`, or `node-expert`. Give each the framed task,
+   the evidence you gathered, and the definition of done.
+5. **Verify at the fault.** Re-run the real flow and confirm correct behavior at
+   the exact point the problem occurred. Not done until observed.
+6. **Simplify.** Ensure the result is clean and self-explaining before closing.
+
+## Delegating to experts (main-thread only)
+
+You can dispatch `python-expert` / `go-expert` / `node-expert` as subagents
+**only when you are the main thread** (`claude --agent debug-agent:architect`).
+A subagent cannot spawn subagents, so if you were yourself dispatched as a
+subagent, do the work directly using the matching skill instead of delegating.
+
+- Pass a per-call `model` override (e.g. opus) when the task is hard; experts
+  default to sonnet.
+- One expert owns one language's edits at a time — avoid parallel writers on the
+  same files.
+- For parallel competing-hypothesis debugging, see the plugin's
+  `references/agent-teams.md`.
+
+## Principles you enforce
+
+These are non-negotiable across every task and every expert you direct. The
+detail lives in the `_shared` references — apply them by name, don't restate:
+
+- **Evidence and validation first** — `_shared/evidence-first.md`.
+- **Clean, self-explaining code; no comments unless asked** —
+  `_shared/clean-code.md`.
+- **Proactive dependency hygiene; audit then suggest** —
+  `_shared/dependency-hygiene.md`.
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and
+the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you
+need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then
+  `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault before
+declaring it done.

From 066460cb66b9283910769ed2aaa74abe706169fc Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:36:25 +0300
Subject: [PATCH 06/16] feat(plugin): add setup command, agent-teams doc; fix
 CLAUDE.md multi-language status

---
 CLAUDE.md                        |  2 +-
 plugin/commands/setup.md         | 40 +++++++++++++++++++++++++
 plugin/references/agent-teams.md | 51 ++++++++++++++++++++++++++++++++
 3 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 plugin/commands/setup.md
 create mode 100644 plugin/references/agent-teams.md
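
Reviewer note (not part of the commit): setup.md asks for the first
available installer in the order `uv`, `pipx`, `pip`. The following is a
minimal sketch of that selection. The names `INSTALLERS` and
`pick_install_command` are hypothetical, and the injectable `which`
parameter is an assumption added so the logic can be exercised without
touching the real PATH.

```python
import shutil
from typing import Callable, Optional

# Installers in the preference order given by setup.md.
INSTALLERS = [
    ("uv", ["uv", "tool", "install", "dbga"]),
    ("pipx", ["pipx", "install", "dbga"]),
    ("pip", ["pip", "install", "--user", "dbga"]),
]


def pick_install_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[list[str]]:
    """Return the argv of the first installer found on PATH, or None."""
    for tool, argv in INSTALLERS:
        if which(tool):
            return argv
    return None
```

A `None` result corresponds to the "stop and tell the user to install one"
branch in step 2 of the command.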

diff --git a/CLAUDE.md b/CLAUDE.md
index 7b84761..c11ba14 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 `dbga` (distribution) / `debug_agent` (import name) — an evidence-first **Python** debugger CLI built on top of `debugpy`/DAP. The CLI surface is stateless; a per-session background daemon owns the live DAP connection. Every stop returns auto-contextualized JSON (location, source window, locals, full stack, recent output, warnings) so an AI agent can drive a real debugger one command at a time.
 
-Status: alpha. Python-only by design today — `debugpy` and `"type": "python"` are hardcoded in the launch path.
+Status: alpha. Multi-language over DAP via the `adapters/` registry: **Python** (debugpy, the most-validated path), **Go** (Delve), and **Node/TypeScript** (vscode-js-debug). Python is the richest surface — `instrument` source probes are Python-centric and the Node multi-process lifecycle is not yet validated (see the `debug-agent` skill's "Honest Limits"). Adding a language means subclassing `adapters.base.Adapter` and registering it.
 
 ## Commands
 
diff --git a/plugin/commands/setup.md b/plugin/commands/setup.md
new file mode 100644
index 0000000..7ba6271
--- /dev/null
+++ b/plugin/commands/setup.md
@@ -0,0 +1,40 @@
+---
+description: Install the dbga debugger CLI and report toolchain readiness for Python, Go, and Node
+---
+
+Install the `dbga` debugger CLI for the user and confirm it works. This is a
+convenience installer — no hooks, no PATH hacking, no background processes.
+
+## Steps
+
+1. **Check if `dbga` is already installed.** Run `dbga --version`. If it prints
+   a version, skip installation and report it as already present.
+
+2. **Install `dbga`** using the first available tool, in this order. Each is a
+   mutating command — run it directly here since the user invoked this installer
+   explicitly:
+   - `uv tool install dbga`  (preferred)
+   - else `pipx install dbga`
+   - else `pip install --user dbga`
+
+   If none of `uv`, `pipx`, or `pip` is available, stop and tell the user to
+   install one (recommend `uv`), then re-run `/debug-agent:setup`.
+
+3. **Confirm.** Run `dbga --version` and report the version. If it is not on
+   PATH after install, tell the user the install location and how to add it
+   (e.g. `uv tool dir --bin` shows where `uv tool` puts executables).
+
+4. **Report language toolchain readiness** (do NOT auto-install these — `dbga`
+   only needs them for the languages the user actually debugs):
+   - **Python** — debugpy is bundled; nothing extra needed.
+   - **Go** — check `dlv version`. If missing, note:
+     `go install github.com/go-delve/delve/cmd/dlv@latest`
+   - **Node** — check `node --version`, and note that vscode-js-debug is
+     required (VS Code/Cursor bundle it; otherwise extract a
+     `js-debug-dap-*.tar.gz` release or set `$DBGA_JS_DEBUG_SERVER`). See the
+     `debug-agent` skill's Languages table for discovery order.
+
+## Output
+
+A short summary: `dbga` version (or install result), then a one-line readiness
+status per language (ready / install command). Keep it terse.
diff --git a/plugin/references/agent-teams.md b/plugin/references/agent-teams.md
new file mode 100644
index 0000000..a71c6c9
--- /dev/null
+++ b/plugin/references/agent-teams.md
@@ -0,0 +1,51 @@
+# Agent teams — parallel competing-hypothesis debugging (experimental)
+
+An optional, advanced wiring for the `architect`. Default orchestration is the
+main-thread architect dispatching one expert at a time (see `architect.md`).
+Reach for teams only when a bug has **several plausible independent causes** and
+you want experts to chase them in parallel rather than in sequence.
+
+## When it helps
+
+- A hard, non-deterministic bug (hang, race, flaky test) with 2–3 competing
+  hypotheses that can be investigated independently.
+- A cross-language failure where Python, Go, and Node experts can each gather
+  evidence on their own surface at the same time.
+
+If one hypothesis clearly dominates, don't bother — sequential delegation is
+simpler and cheaper.
+
+## How to enable
+
+Agent teams are gated behind an experimental flag:
+
+```sh
+export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1   # POSIX
+$env:CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS = "1"  # Windows PowerShell
+```
+
+Run the architect as the main thread so it can coordinate the team:
+
+```sh
+claude --agent debug-agent:architect
+```
+
+## Platform note
+
+- **POSIX:** teammates can run as separate coordinated processes.
+- **Windows:** runs in **in-process mode** — teammates execute within the lead's
+  process. Functionally equivalent for this workflow; expect less true
+  parallelism.
+
+This is an experimental Claude Code capability and its surface may change. If
+the flag is unset or unsupported, the architect transparently falls back to
+sequential one-expert-at-a-time delegation — nothing breaks.
+
+## The loop with a team
+
+1. Architect frames the bug and enumerates the competing hypotheses.
+2. Each teammate (the matching language expert) takes one hypothesis and gathers
+   runtime evidence with `dbga` / the `debug-agent` skill — no guessing.
+3. Architect collects the evidence, picks the hypothesis the evidence supports,
+   and has the owning expert implement the fix.
+4. Verify at the original fault before closing — same rule as the default loop.

From b2e892ab2cf3779bc526b84e04e7fa88a8348127 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:48:10 +0300
Subject: [PATCH 07/16] feat(plugin): add python, go, node skills + language
 expert agents

---
 plugin/THIRD_PARTY_NOTICES.md                 |  90 ++++++++--
 plugin/agents/go-expert.md                    |  63 +++++++
 plugin/agents/node-expert.md                  |  53 ++++++
 plugin/agents/python-expert.md                |  42 +++++
 plugin/skills/go/SKILL.md                     |  73 ++++++++
 plugin/skills/go/evals/evals.json             |  36 ++++
 plugin/skills/go/references/concurrency.md    | 159 ++++++++++++++++++
 plugin/skills/go/references/debugging.md      |  99 +++++++++++
 .../skills/go/references/design-patterns.md   | 112 ++++++++++++
 .../skills/go/references/errors-structure.md  | 121 +++++++++++++
 plugin/skills/node/SKILL.md                   |  46 +++++
 plugin/skills/node/evals/evals.json           |  35 ++++
 .../skills/node/references/async-patterns.md  |  91 ++++++++++
 plugin/skills/node/references/debugging.md    |  63 +++++++
 .../skills/node/references/design-patterns.md |  77 +++++++++
 .../node/references/errors-structure.md       | 101 +++++++++++
 plugin/skills/node/references/js-fallback.md  |  53 ++++++
 .../node/references/typescript-types.md       |  97 +++++++++++
 plugin/skills/python/SKILL.md                 |  46 +++++
 plugin/skills/python/evals/evals.json         |  38 +++++
 .../python/references/async-concurrency.md    | 103 ++++++++++++
 plugin/skills/python/references/debugging.md  |  62 +++++++
 .../python/references/design-patterns.md      | 122 ++++++++++++++
 .../python/references/errors-structure.md     | 111 ++++++++++++
 plugin/skills/python/references/type-hints.md | 116 +++++++++++++
 25 files changed, 1992 insertions(+), 17 deletions(-)
 create mode 100644 plugin/agents/go-expert.md
 create mode 100644 plugin/agents/node-expert.md
 create mode 100644 plugin/agents/python-expert.md
 create mode 100644 plugin/skills/go/SKILL.md
 create mode 100644 plugin/skills/go/evals/evals.json
 create mode 100644 plugin/skills/go/references/concurrency.md
 create mode 100644 plugin/skills/go/references/debugging.md
 create mode 100644 plugin/skills/go/references/design-patterns.md
 create mode 100644 plugin/skills/go/references/errors-structure.md
 create mode 100644 plugin/skills/node/SKILL.md
 create mode 100644 plugin/skills/node/evals/evals.json
 create mode 100644 plugin/skills/node/references/async-patterns.md
 create mode 100644 plugin/skills/node/references/debugging.md
 create mode 100644 plugin/skills/node/references/design-patterns.md
 create mode 100644 plugin/skills/node/references/errors-structure.md
 create mode 100644 plugin/skills/node/references/js-fallback.md
 create mode 100644 plugin/skills/node/references/typescript-types.md
 create mode 100644 plugin/skills/python/SKILL.md
 create mode 100644 plugin/skills/python/evals/evals.json
 create mode 100644 plugin/skills/python/references/async-concurrency.md
 create mode 100644 plugin/skills/python/references/debugging.md
 create mode 100644 plugin/skills/python/references/design-patterns.md
 create mode 100644 plugin/skills/python/references/errors-structure.md
 create mode 100644 plugin/skills/python/references/type-hints.md

diff --git a/plugin/THIRD_PARTY_NOTICES.md b/plugin/THIRD_PARTY_NOTICES.md
index 1f2713b..df7ee3f 100644
--- a/plugin/THIRD_PARTY_NOTICES.md
+++ b/plugin/THIRD_PARTY_NOTICES.md
@@ -1,32 +1,63 @@
 # Third-Party Notices
 
 This plugin's language skills and expert agents are derived in part from two
-MIT-licensed upstream projects. Files that substantially copy or adapt upstream
-content carry a per-file header naming the source and commit SHA; the verbatim
-license texts and per-file attribution are recorded below.
-
-SHA references alone are not MIT compliance — the full license text of each
-source is reproduced here, as the MIT license requires.
+MIT-licensed upstream projects. Content was borrowed and refined (adapted,
+condensed, and rewritten in this plugin's house style) rather than copied
+verbatim, but the files below draw substantially on the listed sources. The
+full MIT license text of each upstream is reproduced at the end, as the MIT
+license requires — a commit SHA alone is not MIT compliance.
 
 ---
 
 ## wshobson/agents
 
 - Repository: https://github.com/wshobson/agents
-- License: MIT
-- Vendored commit SHA: `<filled per file by the language subagent>`
-- Used for: per-topic skill content (design patterns, code style, error
-  handling, async, anti-patterns, concurrency) and lean specialist agent
+- License: MIT — Copyright (c) 2024 Seth Hobson
+- Used for: per-topic skill content (design patterns, type safety, error
+  handling, async, concurrency, anti-patterns) and lean specialist agent
   structure.
 
 ### Files derived from this source
 
-<!-- language subagents append: plugin/skills/<lang>/references/<file>.md — <upstream path> @ <sha> -->
+| Plugin file | Upstream path | Commit SHA |
+| --- | --- | --- |
+| `skills/python/references/design-patterns.md` | `plugins/python-development/skills/python-design-patterns/references/details.md` (+ `python-anti-patterns/SKILL.md`) | `707d9c42` (+ `ee62f8c1`) |
+| `skills/python/references/type-hints.md` | `plugins/python-development/skills/python-type-safety/references/details.md` | (blob SHA not captured) |
+| `skills/python/references/async-concurrency.md` | `plugins/python-development/skills/async-python-patterns/references/details.md` | `475eb9ae` |
+| `skills/python/references/errors-structure.md` | `plugins/python-development/skills/python-error-handling/references/details.md` | `64f6611d` |
+| `skills/go/references/concurrency.md` | `plugins/systems-programming/skills/go-concurrency-patterns/SKILL.md` (+ `references/details.md`) | `be57c0b2` |
+| `skills/node/references/design-patterns.md` | `plugins/javascript-typescript/skills/nodejs-backend-patterns/SKILL.md` | `516bae62` |
+| `skills/node/references/async-patterns.md` | `plugins/javascript-typescript/skills/{nodejs-backend-patterns,modern-javascript-patterns}/SKILL.md` | `516bae62` / `a7739c73` |
+| `skills/node/references/errors-structure.md` | `plugins/javascript-typescript/skills/nodejs-backend-patterns/SKILL.md` | `516bae62` |
+| `skills/node/references/typescript-types.md` | `plugins/javascript-typescript/skills/typescript-advanced-types/SKILL.md` | `5057af79` |
+| `agents/python-expert.md` | `plugins/python-development/agents/python-pro.md` | `e03c788f` |
+| `agents/go-expert.md` | `plugins/systems-programming/agents/golang-pro.md` | `56848874` |
+| `agents/node-expert.md` | `plugins/javascript-typescript/agents/typescript-pro.md` | `3cc2a5a5` |
 
 ### License (verbatim)
 
 ```
-<verbatim MIT license text from wshobson/agents, including its copyright line>
+MIT License
+
+Copyright (c) 2024 Seth Hobson
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
 ```
 
 ---
@@ -34,17 +65,42 @@ source is reproduced here, as the MIT license requires.
 ## VoltAgent/awesome-claude-code-subagents
 
 - Repository: https://github.com/VoltAgent/awesome-claude-code-subagents
-- License: MIT
-- Vendored commit SHA: `<filled per file by the language subagent>`
+- License: MIT — Copyright (c) 2025 VoltAgent
 - Used for: deep specialist-agent sections (operational checklists, type-system
-  mastery, async, testing methodology, security, collaboration protocol).
+  mastery, async, testing methodology, security, collaboration protocol) and
+  the Node JS-fallback content.
 
 ### Files derived from this source
 
-<!-- language subagents append: plugin/agents/<lang>-expert.md — <upstream path> @ <sha> -->
+| Plugin file | Upstream path | Commit SHA |
+| --- | --- | --- |
+| `agents/python-expert.md` | `categories/02-language-specialists/python-pro.md` | `7a6ee971` |
+| `agents/go-expert.md` | `categories/02-language-specialists/golang-pro.md` | `c3e5f7a5` |
+| `agents/node-expert.md` | `categories/02-language-specialists/typescript-pro.md` | `dc87923e` |
+| `skills/node/references/js-fallback.md` | `categories/02-language-specialists/javascript-pro.md` | `2f45e056` |
 
 ### License (verbatim)
 
 ```
-<verbatim MIT license text from VoltAgent/awesome-claude-code-subagents, including its copyright line>
+MIT License
+
+Copyright (c) 2025 VoltAgent
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
 ```
diff --git a/plugin/agents/go-expert.md b/plugin/agents/go-expert.md
new file mode 100644
index 0000000..28b41f7
--- /dev/null
+++ b/plugin/agents/go-expert.md
@@ -0,0 +1,63 @@
+---
+name: go-expert
+description: >-
+  Use when writing, reviewing, optimizing, or debugging Go — concurrent systems (goroutines, channels, select, sync, context, errgroup), microservices, CLI tools, gRPC/REST APIs, generics, idiomatic error handling. Symptoms/keywords: data race, deadlock, goroutine leak, nil-pointer panic, "race detected", "all goroutines are asleep - deadlock", go.mod, go test -race, golangci-lint, govulncheck, dlv, slow/high-allocation Go code needing pprof.
+model: sonnet
+---
+
+You are a senior Go engineer (Go 1.21+) specializing in efficient, concurrent,
+idiomatic systems: microservices, CLIs, system and cloud-native code. You write
+clean, verified Go and prove it works before declaring done.
+
+Drive the **`go` skill** for all depth — patterns, concurrency, error structure,
+and debug recipes. Do not restate it here; load the matching reference. For the
+evidence-first debugging loop, use the **`debug-agent` skill** and `dbga`.
+
+## Operating principles
+
+- Accept interfaces, return structs; small interfaces defined at the consumer.
+- Channels for orchestration, mutexes for state. Every goroutine has an exit
+  path bound by `context`.
+- Errors are values: handle explicitly, wrap with `%w`, inspect with
+  `errors.Is`/`As`. `panic` only for programmer bugs.
+- Composition over inheritance (embedding); functional options for config.
+- Generics only where they remove `any`/type-assertions — not by reflex.
+- Clean, self-explaining code. Go exception: keep idiomatic exported-identifier
+  doc comments; add no other comments unless asked.
+
+## Operational checklist (before declaring done)
+
+1. `gofmt -l .` prints nothing; `go vet ./...` clean.
+2. `golangci-lint run` passes.
+3. `go test -race ./...` — table-driven tests, race detector on, no goroutine
+   leaks.
+4. Concurrent/fallible paths take `context`; benchmarks for hot paths
+   (`go test -bench=. -benchmem`), confirm wins with `pprof`.
+5. Dependency hygiene when touching deps: `go list -u -m all`, `go get -u ./...`,
+   `go mod tidy`, `govulncheck ./...` — then suggest bumps.
+
+## When to delegate / escalate
+
+- Cross-language or multi-file design and orchestration → the `architect` agent.
+- Non-Go surfaces (Python, Node/TS) → the matching expert.
+- Stay on Go implementation, concurrency, performance, and Go debugging.
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault before declaring it done.
+
+For Go, pass `--lang go` and `--cwd <module dir>` (the dir with `go.mod`):
+
+- `dbga diagnose --lang go --cwd <module dir> -- go run .`
+- `dbga session start --lang go --cwd <module dir> --break-at file.go:line -- .`
+  then `dbga session eval --expr "<x>"`
+
+Concurrency bugs first: `go test -race ./...`, and dump goroutine stacks
+(`SIGQUIT` / `GOTRACEBACK=all`) for deadlocks. Full recipes:
+`go` skill → `references/debugging.md`.
diff --git a/plugin/agents/node-expert.md b/plugin/agents/node-expert.md
new file mode 100644
index 0000000..54efb3a
--- /dev/null
+++ b/plugin/agents/node-expert.md
@@ -0,0 +1,53 @@
+---
+name: node-expert
+description: >-
+  Use when implementing, reviewing, or fixing Node.js or TypeScript code — TS type errors (TS2322/TS2345, "not assignable", strict-mode), tsconfig/build issues, async/await and unhandled-promise bugs, `Cannot read properties of undefined`, EventEmitter/stream/worker code, Express/Fastify/npm backends, Vitest/Jest tests, or plain-JS (no types) work. Keywords: typescript, ts, node, esm, async, promise, generics, vitest.
+model: sonnet
+---
+
+You are the Node/TypeScript expert. You write strict-typed, clean, verified Node and TS code, and drop to typed-JSDoc JavaScript only when a project genuinely has no TypeScript. You drive the `node` skill and the `debug-agent` skill — defer detail to them rather than restating it here.
+
+## Operating stance
+
+- **TypeScript-first, `strict: true`.** No `any` without a justified reason; model the domain so illegal states are unrepresentable; let inference carry non-boundary types.
+- **Evidence before fixes.** On a crash/hang/wrong output, gather runtime evidence with `dbga` before changing code (see below).
+- **Run a real flow before declaring done** — `tsc --noEmit`, the test suite, or the actual command.
+- **Clean, self-explaining code; no comments unless asked.**
+
+## Operational checklist
+
+1. Frame the task and definition of done; surface ambiguity early.
+2. Type the boundaries first (public API, external input); validate untrusted input at the edge (guard / zod).
+3. Implement with the right pattern — DI for testable seams, composition over inheritance, Result types for expected failures, exceptions for the exceptional.
+4. Handle async correctly: no floating promises, bounded concurrency, wrapped async handlers, graceful shutdown.
+5. Type-check (`tsc --noEmit`), lint (ESLint + Prettier), test (Vitest/Jest — cover edge cases).
+6. Verify the behavior at the original fault, then simplify.
+
+For the depth behind each step — advanced types, async combinators, error structure, design patterns, the JS fallback, and Node `dbga` recipes — use the **`node` skill** and its `references/*`. Apply the plugin's `_shared/clean-code.md`, `_shared/evidence-first.md`, and `_shared/dependency-hygiene.md` by name; do not restate them.
+
+## When to delegate / escalate
+
+- Cross-cutting design, decomposition, or multi-language work → defer to the `architect` (you may be dispatched by it).
+- Need a harder model for a gnarly type-level or concurrency problem → request an opus override at dispatch.
+- A bug needs runtime evidence → use `dbga` and the `debug-agent` skill before proposing a fix.
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault before declaring it done.
+
+For Node, the forms match the `debug-agent` SKILL.md:
+
+```powershell
+dbga diagnose --timeout 60 --cwd <dir> -- node buggy.js
+dbga session start --session node-demo --cwd <dir> --break-at buggy.js:3 --pretty -- buggy.js
+dbga session eval --session node-demo --expr "nums"     # → (3) [10, 20, 30]  (JS formatting)
+dbga session release --session node-demo
+```
+
+Node runs over **vscode-js-debug** (set `$DBGA_JS_DEBUG_SERVER` if it is not auto-discovered from a VS Code/Cursor install). Only a **single launched process** is validated today — worker-thread / `child_process` / `cluster` lifecycles are not. See the `node` skill's `references/debugging.md`.
diff --git a/plugin/agents/python-expert.md b/plugin/agents/python-expert.md
new file mode 100644
index 0000000..0ba4d97
--- /dev/null
+++ b/plugin/agents/python-expert.md
@@ -0,0 +1,42 @@
+---
+name: python-expert
+description: >-
+  Use when a task is primarily Python — writing, reviewing, refactoring, optimizing, or debugging Python code; building or fixing CLIs, async/asyncio services, FastAPI/Django/Flask APIs, data pipelines, or scripts; adding type hints or reaching mypy --strict; investigating Python errors (TypeError, ValueError, ImportError, AttributeError, tracebacks), hangs, or wrong output in a .py program. Keywords: Python, def, class, async/await, asyncio, dataclass, Protocol, type hints, mypy, ruff, uv, pytest, pydantic.
+model: sonnet
+---
+
+You are a senior Python engineer: idiomatic, type-safe, production Python (3.10+) with the modern toolchain — `uv`, `ruff`, `mypy --strict`, `pytest`. You write clean, self-explaining code and prove it works against real flows before declaring done.
+
+## Operating rules
+
+1. **Drive the `python` skill.** It is your knowledge base — design patterns, type system, async/concurrency, error structure, and Python `dbga` recipes live in its references. Load the reference for the task; don't restate it from memory.
+2. **Clean, self-explaining code.** No comments unless asked; clear names and structure over cleverness; guard clauses over nested pyramids. (`_shared/clean-code.md`.)
+3. **Evidence first.** Validate against a real run, not source-reading. Verify every fix at the exact point the bug occurred. (`_shared/evidence-first.md`.)
+4. **Dependency hygiene.** On setup or when touching deps, audit (`pip-audit`, `uv pip list --outdated`) and *suggest* bumps — never run mutating installs yourself. (`_shared/dependency-hygiene.md`.)
+
+## Checklist for any Python change
+
+- Full type hints on public signatures and attributes; `mypy --strict` clean; collections parameterized; minimal `Any`.
+- async-first for I/O; nothing blocking inside `async def` (offload sync work with `asyncio.to_thread`).
+- Errors carry context, chain with `raise ... from e`, and are validated at boundaries (pydantic at edges); no bare `except: pass`.
+- Resources via context managers. Layering kept clean (handler → service → repository); no ORM/internal types leaking out of an API.
+- Tests with `pytest` cover error and edge cases, not just the happy path; mock only external services.
+- `ruff check` + `ruff format` clean.
+
+## When to delegate / escalate
+
+- **Cross-language or high-level design / decomposition** → defer to the `architect` agent.
+- **A hard task needing deeper reasoning** → the architect may dispatch this agent with a `model` override (opus).
+- Stay in your lane: Python implementation, review, and debugging. Don't redesign system boundaries unasked.
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault before declaring it done.
+
+Python-specific `dbga` recipes (script-path sessions, async breakpoints, read-only eval, reversible instrument probes) are in the `python` skill's `references/debugging.md`.
diff --git a/plugin/skills/go/SKILL.md b/plugin/skills/go/SKILL.md
new file mode 100644
index 0000000..57c61ee
--- /dev/null
+++ b/plugin/skills/go/SKILL.md
@@ -0,0 +1,73 @@
+---
+name: go
+description: >-
+  Use when writing, reviewing, or fixing Go code — goroutines, channels, select, sync, context, errgroup; data races, deadlocks, goroutine leaks; error wrapping with %w, errors.Is/As, sentinel and typed errors; interfaces, generics, functional options; go.mod, go test -race, go vet, golangci-lint, govulncheck, dlv. Symptoms: "race detected", "all goroutines are asleep - deadlock", panic, leaking goroutines, nil-pointer deref.
+---
+
+# Go development
+
+Idiomatic, concurrent, evidence-verified Go (1.21+). This SKILL.md is a slim
+index — load the reference for the task at hand.
+
+## Core principles (Go deltas on the shared rules)
+
+- **Accept interfaces, return structs.** Small, focused interfaces defined at
+  the consumer.
+- **Don't communicate by sharing memory; share memory by communicating.**
+  Channels for orchestration, mutexes for state.
+- **Errors are values.** Handle explicitly; wrap with context. `panic` only for
+  programmer errors.
+- **Composition over inheritance** via embedding. Functional options for config.
+- Language-invariant rules live in the shared references — read them by name,
+  don't expect them restated here:
+  - `_shared/clean-code.md` — self-explaining code; Go exception: exported-
+    identifier doc comments are idiomatic, keep those; add no other comments.
+  - `_shared/evidence-first.md` — observe before you fix.
+  - `_shared/dependency-hygiene.md` — Go specifics below.
+
+## References — load on demand
+
+| Task | Read |
+| --- | --- |
+| Interfaces, generics, options, package layout, idioms/anti-patterns | `references/design-patterns.md` |
+| Goroutines, channels, select, sync, context, worker pools, errgroup | `references/concurrency.md` |
+| Error wrapping, `errors.Is/As`, sentinel/typed errors, panic boundaries | `references/errors-structure.md` |
+| Debug a crash/race/deadlock/hang with `dbga` + `dlv` | `references/debugging.md` |
+
+## Dependency hygiene (Go)
+
+On setup or when touching deps, audit then suggest bumps:
+
+```sh
+go list -u -m all      # what's outdated
+go get -u ./...        # update
+go mod tidy
+govulncheck ./...      # known vulnerabilities
+```
+
+See `_shared/dependency-hygiene.md` for the audit-then-suggest discipline.
+
+## Quality gate (run before declaring done)
+
+```sh
+gofmt -l .             # must print nothing
+go vet ./...
+golangci-lint run
+go test -race ./...    # race detector on
+```
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and
+the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you
+need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose --lang go --cwd <module dir> -- go run <main>` → triage a crash
+  to the deepest user frame
+- `dbga session start --lang go --cwd <module dir> --break-at file.go:line -- <main>`
+  then `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Concurrency bugs first: `go test -race ./...`. Validate against real use flows
+and verify the fix at the original fault before declaring it done. Details and
+`dlv` recipes in `references/debugging.md`.
diff --git a/plugin/skills/go/evals/evals.json b/plugin/skills/go/evals/evals.json
new file mode 100644
index 0000000..4bba2cd
--- /dev/null
+++ b/plugin/skills/go/evals/evals.json
@@ -0,0 +1,36 @@
+{
+  "skill": "go",
+  "evals": [
+    {
+      "name": "race-in-worker-pool",
+      "prompt": "My Go worker pool intermittently fails with 'WARNING: DATA RACE' on a shared results map. Find and fix the race.",
+      "expected_behavior": "Loads go/references/concurrency.md (and debugging.md). Runs `go test -race ./...` to observe the race before changing code. Fixes by guarding the map with a sync.Mutex/RWMutex or routing results through a channel rather than a shared map. Adds no explanatory code comments. Re-runs the race detector to confirm it is clean.",
+      "grading": [
+        { "text": "Ran `go test -race` (or `go run -race`) to gather evidence before proposing a fix", "passed": null, "evidence": "" },
+        { "text": "Loaded the correct reference file (concurrency.md / debugging.md)", "passed": null, "evidence": "" },
+        { "text": "Fixed the race with a mutex or channel, not time.Sleep", "passed": null, "evidence": "" },
+        { "text": "Added no code comments unless asked", "passed": null, "evidence": "" }
+      ]
+    },
+    {
+      "name": "error-wrapping-inspection",
+      "prompt": "This Go service compares errors with `err == ErrNotFound` but the check never matches even when the underlying error is ErrNotFound after the repository wraps it. Fix it idiomatically.",
+      "expected_behavior": "Loads go/references/errors-structure.md. Identifies that wrapping with %w breaks == comparison and switches the check to errors.Is(err, ErrNotFound) (or errors.As for typed errors), ensuring the wrap uses %w. Explicit error handling, no string matching on err.Error().",
+      "grading": [
+        { "text": "Used errors.Is / errors.As instead of == or string matching", "passed": null, "evidence": "" },
+        { "text": "Ensured the error is wrapped with %w to preserve the chain", "passed": null, "evidence": "" },
+        { "text": "Loaded go/references/errors-structure.md", "passed": null, "evidence": "" }
+      ]
+    },
+    {
+      "name": "panic-triage",
+      "prompt": "Running `go run .` in ./cmd/app panics with 'runtime error: integer divide by zero'. Find where it originates and fix it.",
+      "expected_behavior": "Reaches for the debug-agent toolkit: `dbga diagnose --lang go --cwd ./cmd/app -- go run .` to triage to the deepest user frame, inspects live state, fixes the divide-by-zero guard at the origin, then re-runs the real flow to verify the panic is gone at the original fault.",
+      "grading": [
+        { "text": "Used dbga diagnose/session with --lang go and --cwd before guessing", "passed": null, "evidence": "" },
+        { "text": "Fixed the bug at the originating frame, not the manifestation", "passed": null, "evidence": "" },
+        { "text": "Verified by re-running the real flow", "passed": null, "evidence": "" }
+      ]
+    }
+  ]
+}
diff --git a/plugin/skills/go/references/concurrency.md b/plugin/skills/go/references/concurrency.md
new file mode 100644
index 0000000..74cf947
--- /dev/null
+++ b/plugin/skills/go/references/concurrency.md
@@ -0,0 +1,159 @@
+# Go concurrency — goroutines, channels, sync, context
+
+> Don't communicate by sharing memory; share memory by communicating.
+
+Channels orchestrate; mutexes protect state. Every goroutine needs a defined
+exit path — a leaked goroutine is a bug.
+
+## Primitives
+
+| Primitive | Purpose |
+| --- | --- |
+| `goroutine` | Lightweight concurrent execution |
+| `channel` | Communication / synchronization |
+| `select` | Multiplex channel ops, timeouts, non-blocking |
+| `sync.Mutex` / `RWMutex` | Mutual exclusion for shared state |
+| `sync.WaitGroup` | Wait for a set of goroutines |
+| `context.Context` | Cancellation, deadlines, request values |
+| `errgroup.Group` | Concurrent ops that can fail, with cancellation |
+
+## Rules that prevent the common bugs
+
+- **Close channels from the sender side only.** Sending on a closed channel,
+  or closing one twice, panics.
+- **Every goroutine has an exit path.** Select on `ctx.Done()` so cancellation
+  reaches it.
+- **Buffer only when you know the count.** An oversized buffer hides leaks.
+- **Prefer channels over `time.Sleep` for synchronization.** Sleep-based "sync"
+  is a race waiting to happen.
+- **`errgroup` for concurrent fallible work** — first error cancels the rest.
+
+## Worker pool
+
+Bounded concurrency, results collected, cancellation honored.
+
+```go
+func WorkerPool(ctx context.Context, workers int, jobs <-chan Job) <-chan Result {
+	results := make(chan Result)
+	var wg sync.WaitGroup
+	for i := 0; i < workers; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for job := range jobs {
+				select {
+				case <-ctx.Done():
+					return
+				case results <- process(job):
+				}
+			}
+		}()
+	}
+	go func() { wg.Wait(); close(results) }()
+	return results
+}
+```
+
+A dedicated goroutine closes `results` once every worker has returned, so
+receivers can `range` until close.
+
+## Fan-out / fan-in
+
+Run multiple instances of a stage, then merge their outputs.
+
+```go
+func merge(ctx context.Context, cs ...<-chan int) <-chan int {
+	out := make(chan int)
+	var wg sync.WaitGroup
+	wg.Add(len(cs))
+	for _, c := range cs {
+		go func(c <-chan int) {
+			defer wg.Done()
+			for n := range c {
+				select {
+				case <-ctx.Done():
+					return
+				case out <- n:
+				}
+			}
+		}(c)
+	}
+	go func() { wg.Wait(); close(out) }()
+	return out
+}
+```
+
+## errgroup with cancellation and a concurrency limit
+
+```go
+func fetchAll(ctx context.Context, urls []string, limit int) ([]string, error) {
+	g, ctx := errgroup.WithContext(ctx)
+	g.SetLimit(limit)
+	results := make([]string, len(urls))
+	for i, url := range urls {
+		g.Go(func() error {
+			r, err := fetch(ctx, url)
+			if err != nil {
+				return fmt.Errorf("fetch %s: %w", url, err)
+			}
+			results[i] = r
+			return nil
+		})
+	}
+	if err := g.Wait(); err != nil {
+		return nil, err
+	}
+	return results, nil
+}
+```
+
+The first non-nil error cancels `ctx`, stopping the rest. On Go 1.22+ `i, url`
+are per-iteration; on 1.21, shadow them first (`i, url := i, url`).
+
+## select patterns
+
+```go
+select {
+case v := <-ch:
+	use(v)
+case <-time.After(time.Second):
+	// timeout
+case <-ctx.Done():
+	return ctx.Err()
+default:
+	// non-blocking: runs at once if nothing is ready, so omit it when you want the timeout
+}
+```
+
+## Graceful shutdown
+
+```go
+ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
+defer cancel()
+
+srv.Start(ctx)
+<-ctx.Done()        // wait for signal
+srv.Shutdown(5 * time.Second)
+```
+
+`Shutdown` should `wg.Wait()` in a goroutine and race it against
+`time.After(timeout)` so a stuck worker can't block exit forever.
+
+## State: mutex vs sync.Map
+
+- Default to `sync.RWMutex` guarding a plain map.
+- `sync.Map` only for read-heavy, write-rare key sets (caches, registries).
+- High write contention → shard the map across N `RWMutex`-guarded buckets.
+
+## Verifying concurrency code
+
+Always run the race detector — it is the single highest-value check here:
+
+```sh
+go test -race ./...
+go run -race .
+```
+
+For a hang/deadlock at runtime, capture goroutine stacks and drive a live
+session — see `references/debugging.md` (`SIGQUIT` dump, `dlv goroutines`,
+`dbga session`).
diff --git a/plugin/skills/go/references/debugging.md b/plugin/skills/go/references/debugging.md
new file mode 100644
index 0000000..0d535c0
--- /dev/null
+++ b/plugin/skills/go/references/debugging.md
@@ -0,0 +1,99 @@
+# Debugging Go — evidence first
+
+Observe what *does* happen; don't infer from source. The discipline is in
+`_shared/evidence-first.md` and the full loop is the `debug-agent` skill. This
+file is the Go-specific recipe sheet.
+
+Prereq: Delve on PATH — `go install github.com/go-delve/delve/cmd/dlv@latest`.
+Always pass `--lang go` and `--cwd <module dir>` (the dir holding `go.mod`).
+`dbga` returns file paths with forward slashes even on Windows.
+
+## First move by symptom
+
+| Symptom | First move |
+| --- | --- |
+| Panic / crash with a stack | `dbga diagnose` (triage to deepest user frame) |
+| Have only the panic text | `dbga localize --lang go --file trace.txt` |
+| Wrong value, need live state | `dbga session start --break-at` + `eval` |
+| `race detected` | `go test -race ./...` then session at the racy access |
+| `all goroutines are asleep - deadlock` / hang | goroutine dump (below) + `dlv goroutines` |
+
+## Crash → triage in one call
+
+`diagnose` reruns the program and pauses it at the deepest user frame, reporting
+full context (location, source, locals, stack, recent output).
+
+```sh
+dbga diagnose --lang go --timeout 60 --cwd ./cmd/app --pretty -- go run .
+```
+
+Returns `"status": "diagnosed"` with `error_type` (e.g. `panic`), `message`
+(e.g. `runtime error: integer divide by zero`), and `deepest_user_frame`
+(e.g. `main.average` line 10). `diagnose` reuses session `default`; if one is
+alive you get `session_exists` — clear it with `dbga session release` first.
+
+## Live session — inspect and verify
+
+```sh
+dbga session start --lang go --session go-bug --cwd ./cmd/app \
+  --break-at calc.go:10 --pretty -- go run .
+
+dbga session eval --session go-bug --expr "nums"   # → []int len: 3, cap: 3, [10,20,30]
+dbga session eval --session go-bug --expr "total"
+dbga session continue --session go-bug             # re-hits the breakpoint
+dbga session release  --session go-bug             # always clean up
+```
+
+`eval` runs in Go via Delve, with Go value formatting. Set the breakpoint where
+the value *first* goes wrong, not where it blows up — walk up the stack to the
+origin. Verify a fix by evaluating the fix-expression against live state at the
+same breakpoint before editing.
+
+## Concurrency: races, deadlocks, leaks
+
+Run the race detector first — it pinpoints the conflicting accesses:
+
+```sh
+go test -race ./...
+go run -race .
+```
+
+For a hang/deadlock, send `SIGQUIT` (`Ctrl+\` on POSIX) to dump every
+goroutine's stack; `GOTRACEBACK=all` makes a panic print them all too. The dump
+shows which goroutines block on which channel/lock: a cycle is your deadlock; a
+goroutine stuck with no exit path is your leak (cross-check `concurrency.md`).
+
+Then drive a live session: break just before the suspect channel op or
+`Lock()`, `eval` the relevant state, and step. For goroutine-level inspection
+beyond `dbga`, attach `dlv` directly (below).
+
+## Raw dlv when you need goroutine/thread control
+
+`dbga` covers the evidence-first loop; drop to `dlv` for goroutine switching,
+deferred-call inspection, or core dumps.
+
+```sh
+dlv debug ./cmd/app -- <args>     # build + debug
+dlv test ./pkg/...                # debug a test
+dlv attach <pid>                  # attach to a running process
+dlv core ./bin/app core.1234      # post-mortem from a core dump
+```
+
+Inside dlv: `break pkg.Func`, `continue`, `goroutines`, `goroutine <id>`,
+`stack`, `print <expr>`, `locals`, `next`, `step`.
+
+## Profiling (when "slow", not "wrong")
+
+```sh
+go test -bench=. -benchmem -cpuprofile cpu.out -memprofile mem.out -trace trace.out .   # profile flags need a single package
+go tool pprof cpu.out      # top, list <func>, web
+go tool trace trace.out
+```
+
+Benchmark before optimizing; confirm the win with a second benchmark.
+
+## Verify the fix
+
+Re-run the real flow (`go run` / the failing `go test -race`) and confirm
+correct behavior at the original fault location — not just that the program no
+longer crashes. Then run the quality gate from `go/SKILL.md`.
diff --git a/plugin/skills/go/references/design-patterns.md b/plugin/skills/go/references/design-patterns.md
new file mode 100644
index 0000000..9e6cb41
--- /dev/null
+++ b/plugin/skills/go/references/design-patterns.md
@@ -0,0 +1,112 @@
+# Go design patterns & idioms
+
+Go-specific deltas only. General clean-code rules: `_shared/clean-code.md`.
+
+## Accept interfaces, return structs
+
+Define the interface where it is *consumed*, not where the type is implemented.
+Keep interfaces small — one or two methods.
+
+```go
+type Store interface {
+	Get(ctx context.Context, id string) (User, error)
+}
+
+func NewService(s Store) *Service { return &Service{store: s} }
+```
+
+The caller depends on `Store`; any concrete struct with a matching `Get`
+satisfies it implicitly. This makes testing trivial — pass a fake.
+
+## Composition over inheritance (embedding)
+
+```go
+type Logger struct{ prefix string }
+
+func (l Logger) Log(msg string) { fmt.Println(l.prefix, msg) }
+
+type Server struct {
+	Logger
+	addr string
+}
+```
+
+`Server` promotes `Log` — no inheritance, just composition. Embed interfaces to
+extend behavior, embed structs to reuse it.
+
+## Functional options for configuration
+
+Preferred over giant config structs or many constructors. Each option is a
+closure that mutates the target; defaults stay in the constructor.
+
+```go
+type Server struct {
+	addr    string
+	timeout time.Duration
+}
+
+type Option func(*Server)
+
+func WithTimeout(d time.Duration) Option {
+	return func(s *Server) { s.timeout = d }
+}
+
+func NewServer(addr string, opts ...Option) *Server {
+	s := &Server{addr: addr, timeout: 30 * time.Second}
+	for _, opt := range opts {
+		opt(s)
+	}
+	return s
+}
+
+s := NewServer(":8080", WithTimeout(5*time.Second))
+```
+
+## Generics — when type parameters earn their place
+
+Use generics to remove `interface{}` and runtime type assertions from genuinely
+type-agnostic code (containers, map/filter/reduce). Do **not** reach for them
+when a plain interface expresses the contract better.
+
+```go
+func Map[T, U any](s []T, f func(T) U) []U {
+	out := make([]U, len(s))
+	for i, v := range s {
+		out[i] = f(v)
+	}
+	return out
+}
+```
+
+Constrain with `comparable` when the body needs `==`, or `cmp.Ordered` (Go
+1.21+; `constraints.Ordered` from `golang.org/x/exp` before that) for `<`.
+
+## Package layout
+
+- Package name = its purpose, lower-case, no `util`/`common` dumping grounds.
+- Exported API at the top of the file; unexported helpers below.
+- `internal/` for code that must not be imported by other modules.
+- One responsibility per package; avoid circular imports by depending on
+  interfaces, not concrete packages.
+
+## Idioms
+
+- Zero value should be useful (`sync.Mutex`, `bytes.Buffer` work with no initialization).
+- `defer` for cleanup right after acquiring a resource — pair `Open`/`Close`,
+  `Lock`/`Unlock` on adjacent lines.
+- Return early; keep the happy path un-indented.
+- Slices: pre-allocate with `make([]T, 0, n)` when the size is known.
+
+## Anti-patterns to reject
+
+| Anti-pattern | Do instead |
+| --- | --- |
+| Empty `interface{}` / `any` everywhere | A focused interface or generics |
+| A bare `bool` ok-flag standing in for a real failure | Return `error` |
+| Giant interfaces ("god interface") | Split into role-specific interfaces |
+| Naked `panic` for expected failures | Return an `error` (see `errors-structure.md`) |
+| Goroutine without a defined exit path | Bound it with `context` (see `concurrency.md`) |
+| `util` / `helpers` packages | Name packages by what they provide |
+
+When in doubt, run `go vet ./...` and `golangci-lint run`; they flag several of
+these mechanically.
diff --git a/plugin/skills/go/references/errors-structure.md b/plugin/skills/go/references/errors-structure.md
new file mode 100644
index 0000000..80bb9b0
--- /dev/null
+++ b/plugin/skills/go/references/errors-structure.md
@@ -0,0 +1,121 @@
+# Go error handling & structure
+
+Errors are values. Handle them explicitly at the level that can act on them.
+`panic` is for programmer errors only — never for expected failure.
+
+## Wrap with context using %w
+
+Add what *this* layer knows; preserve the chain so callers can inspect it.
+
+```go
+func loadConfig(path string) (*Config, error) {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil, fmt.Errorf("load config %s: %w", path, err)
+	}
+	...
+}
+```
+
+`%w` (not `%v`) keeps the wrapped error reachable by `errors.Is`/`errors.As`.
+One `%w` per `fmt.Errorf` call is the norm (Go 1.20+ accepts several); use `%v`
+when you want only the text, not unwrap-ability.
+
+## Inspect: errors.Is and errors.As
+
+```go
+if errors.Is(err, os.ErrNotExist) {
+	// matches a sentinel anywhere in the chain
+}
+
+var perr *fs.PathError
+if errors.As(err, &perr) {
+	log.Printf("op=%s path=%s", perr.Op, perr.Path)
+}
+```
+
+Never compare with `==` against a wrapped error, and never match on
+`err.Error()` string contents — both break the moment a layer re-wraps.
+
+## Sentinel errors — known, value-comparable conditions
+
+```go
+var ErrNotFound = errors.New("not found")
+
+func (s *Store) Get(id string) (User, error) {
+	u, ok := s.m[id]
+	if !ok {
+		return User{}, ErrNotFound
+	}
+	return u, nil
+}
+
+// caller
+if errors.Is(err, ErrNotFound) { ... }
+```
+
+## Typed errors — when the caller needs structured data
+
+Implement the `error` interface; expose fields and (optionally) `Unwrap`.
+
+```go
+type ValidationError struct {
+	Field string
+	Err   error
+}
+
+func (e *ValidationError) Error() string {
+	return fmt.Sprintf("validation failed on %s: %v", e.Field, e.Err)
+}
+
+func (e *ValidationError) Unwrap() error { return e.Err }
+```
+
+Retrieve it with `errors.As(err, &ve)`.
+
+## Multiple errors (Go 1.20+)
+
+`errors.Join` collects several errors; `errors.Is`/`As` match against any of
+them.
+
+```go
+var errs error
+for _, item := range items {
+	if err := validate(item); err != nil {
+		errs = errors.Join(errs, err)
+	}
+}
+return errs
+```
+
+## panic / recover — the boundary
+
+- `panic` only for unrecoverable programmer bugs (impossible state, broken
+  invariant).
+- `recover` only at a process boundary you own — e.g. a server's per-request
+  handler — to convert a panic into a 500 + logged stack, never to mask logic
+  errors.
+
+```go
+func safeHandler(h http.Handler) http.Handler {
+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		defer func() {
+			if v := recover(); v != nil {
+				log.Printf("panic: %v\n%s", v, debug.Stack())
+				http.Error(w, "internal error", http.StatusInternalServerError)
+			}
+		}()
+		h.ServeHTTP(w, r)
+	})
+}
+```
+
+## Discipline
+
+- Handle each error once: log **or** return, not both.
+- Add context going up; don't strip the chain.
+- Return early on error to keep the happy path flat.
+- Don't ignore errors — `_ = f()` only with a comment justifying why it is safe.
+
+When the error chain doesn't explain *why* a value went wrong at runtime, stop
+reading source and gather evidence — see `references/debugging.md`.
diff --git a/plugin/skills/node/SKILL.md b/plugin/skills/node/SKILL.md
new file mode 100644
index 0000000..6c4efe9
--- /dev/null
+++ b/plugin/skills/node/SKILL.md
@@ -0,0 +1,46 @@
+---
+name: node
+description: >-
+  Use when writing, reviewing, or fixing Node.js or TypeScript code — type errors (TS2322, TS2345, "is not assignable"), `tsconfig`/strict-mode issues, async/await bugs, unhandled promise rejections, `Cannot read properties of undefined`, EventEmitter/stream/worker code, npm/pnpm/Express/Fastify backends, Jest/Vitest tests, or plain-JS (no types) work. Keywords: typescript, ts, node, esm, async, promise, generics, vitest, jest.
+---
+
+# Node.js / TypeScript
+
+TypeScript-first development skill. Delivers strict-typed, clean, verified Node/TS code. JavaScript without types is the fallback, not the default.
+
+## Core stance
+
+- **Strict TypeScript.** `strict: true` with none of its flags switched back off; no `any` without a justification; types model the domain, not the other way around.
+- **Type inference over annotation.** Annotate public API boundaries and let inference carry the rest.
+- **Evidence first.** When code crashes, hangs, or returns wrong output, gather runtime evidence with `dbga` and the `debug-agent` skill before guessing. See `references/debugging.md`.
+- **Clean, self-explaining code; no comments unless asked** — see `_shared/clean-code.md` (cross-reference, do not restate).
+- **Audit then suggest dependency bumps** (`npm outdated`, `npm audit`) — see `_shared/dependency-hygiene.md`.
+- **Validation discipline** — see `_shared/evidence-first.md`.
+
+## References — load on demand
+
+| Need | Read |
+| --- | --- |
+| Module/composition/DI patterns, anti-patterns | `references/design-patterns.md` |
+| Advanced types: conditional, mapped, template-literal, branded, discriminated unions, `infer`, utility types | `references/typescript-types.md` |
+| async/await, Promise combinators, concurrency limits, streams, EventEmitter, graceful shutdown | `references/async-patterns.md` |
+| Typed errors, Result types, `never` exhaustiveness, custom error classes, async error wrapping | `references/errors-structure.md` |
+| Plain JS with no types: JSDoc typing, defensive coding, ESM | `references/js-fallback.md` |
+| `dbga` + vscode-js-debug recipes for Node/TS | `references/debugging.md` |
+
+## Toolchain
+
+- Runtime: `node` (use a current LTS). Package manager: npm / pnpm.
+- Type-check: `tsc --noEmit`. Lint/format: ESLint + Prettier.
+- Test: Vitest (preferred) or Jest. Cover edge cases, not just the happy path.
+- Run a real flow (`tsc --noEmit`, the test suite, or the actual command) before declaring anything done.
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose --timeout 60 --cwd <dir> -- node buggy.js` → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Node uses vscode-js-debug (set `$DBGA_JS_DEBUG_SERVER` if not auto-discovered); only a single launched process is validated today. Validate against real use flows and verify the fix at the original fault before declaring it done.
diff --git a/plugin/skills/node/evals/evals.json b/plugin/skills/node/evals/evals.json
new file mode 100644
index 0000000..9df6dba
--- /dev/null
+++ b/plugin/skills/node/evals/evals.json
@@ -0,0 +1,35 @@
+{
+  "skill": "node",
+  "evals": [
+    {
+      "name": "ts-type-error-not-assignable",
+      "prompt": "My TypeScript build fails with `TS2322: Type 'string | undefined' is not assignable to type 'string'` in src/user.ts. How do I fix it properly without using `any`?",
+      "expected_behavior": "Triggers the node skill. Loads references/typescript-types.md. Recommends narrowing (guard / nullish handling) or tightening the type rather than casting to any; references strict-mode discipline.",
+      "grading": [
+        { "text": "loaded the node skill / typescript-types reference", "passed": null, "evidence": "" },
+        { "text": "avoided suggesting `any` as the fix", "passed": null, "evidence": "" },
+        { "text": "proposed narrowing or an explicit type fix", "passed": null, "evidence": "" }
+      ]
+    },
+    {
+      "name": "node-unhandled-rejection-wrong-output",
+      "prompt": "My Node Express endpoint sometimes returns wrong data and I see an UnhandledPromiseRejection in the logs. Find and fix the bug.",
+      "expected_behavior": "Triggers the node skill. Gathers runtime evidence with dbga (diagnose or a session breakpoint) before guessing; loads references/async-patterns.md and/or errors-structure.md; identifies a floating promise / missing await / unwrapped async handler; verifies the fix at the fault.",
+      "grading": [
+        { "text": "ran the real flow / dbga before proposing a fix", "passed": null, "evidence": "" },
+        { "text": "loaded the correct reference (async-patterns or errors-structure)", "passed": null, "evidence": "" },
+        { "text": "identified the missing await / unwrapped async handler", "passed": null, "evidence": "" },
+        { "text": "added no code comments unless asked", "passed": null, "evidence": "" }
+      ]
+    },
+    {
+      "name": "node-deps-stale-on-setup",
+      "prompt": "I just cloned this Node project and ran npm install. Anything I should know before I start adding a feature?",
+      "expected_behavior": "Triggers the node skill. Proactively audits dependencies (npm outdated, npm audit) and suggests bumps without running mutating installs; references dependency-hygiene discipline.",
+      "grading": [
+        { "text": "ran audit commands (npm outdated / npm audit)", "passed": null, "evidence": "" },
+        { "text": "suggested bumps rather than auto-running installs", "passed": null, "evidence": "" }
+      ]
+    }
+  ]
+}
diff --git a/plugin/skills/node/references/async-patterns.md b/plugin/skills/node/references/async-patterns.md
new file mode 100644
index 0000000..ae9f89d
--- /dev/null
+++ b/plugin/skills/node/references/async-patterns.md
@@ -0,0 +1,91 @@
+# Async patterns (Node/TS)
+
+Node/TS-specific. `async/await` everywhere; raw `.then()` chains only when composing combinators.
+
+## Concurrency — parallel by default, bounded when needed
+
+Sequential `await` in a loop serializes I/O. Parallelize when the calls do not depend on each other's results.
+
+```typescript
+const users = await Promise.all(ids.map((id) => fetchUser(id)));
+```
+
+Bound concurrency so you don't open 10k sockets at once:
+
+```typescript
+import pLimit from "p-limit";
+
+const limit = pLimit(5);
+const users = await Promise.all(ids.map((id) => limit(() => fetchUser(id))));
+```
+
+## Promise combinators — pick the right one
+
+- `Promise.all` — all succeed, or reject on first failure.
+- `Promise.allSettled` — every result, success or failure (batch jobs, fan-out where partial failure is fine).
+- `Promise.race` — first to settle (timeouts).
+- `Promise.any` — first to *fulfill* (fastest healthy replica).
+
+```typescript
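+function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
+  let timer: ReturnType<typeof setTimeout> | undefined;
+  const timeout = new Promise<never>((_, reject) => {
+    timer = setTimeout(() => reject(new Error("timeout")), ms);
+  });
+  // Clear the timer once the race settles so it cannot keep the process alive.
+  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
+}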
+```
+
+## AbortController — cancellation, not orphaned work
+
+```typescript
+const ac = new AbortController();
+const timer = setTimeout(() => ac.abort(), 5000); // arm the timeout before awaiting
+const res = await fetch(url, { signal: ac.signal }).finally(() => clearTimeout(timer));
+```
+
+## Streams — backpressure for free
+
+For large data, stream instead of buffering. `pipeline` propagates errors and cleans up.
+
+```typescript
+import { pipeline } from "node:stream/promises";
+import { createReadStream, createWriteStream } from "node:fs";
+import { createGzip } from "node:zlib";
+
+await pipeline(createReadStream(src), createGzip(), createWriteStream(dst));
+```
+
+## EventEmitter — decouple producers from consumers
+
+```typescript
+import { EventEmitter } from "node:events";
+
+class Jobs extends EventEmitter {
+  async run(job: Job): Promise<void> {
+    await execute(job);
+    this.emit("done", job);
+  }
+}
+```
+
+Always attach an `error` listener — an unhandled `error` event crashes the process.
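+
+A minimal sketch of that rule, assuming two throwaway emitters (`guarded` and `bare` are hypothetical names, not part of any API): with a listener the error is delivered to the handler; without one, `emit("error", ...)` throws the error at the call site.
+
+```typescript
+import { EventEmitter } from "node:events";
+
+// Hypothetical emitters, for illustration only.
+const guarded = new EventEmitter();
+const seen: string[] = [];
+guarded.on("error", (err: Error) => {
+  seen.push(err.message); // handled: the emitter keeps working
+});
+guarded.emit("error", new Error("boom"));
+
+const bare = new EventEmitter();
+let thrown: unknown;
+try {
+  bare.emit("error", new Error("boom")); // no listener: emit() throws
+} catch (err) {
+  thrown = err;
+}
+```
+
+In a real service nothing wraps the `emit` call in `try/catch`, so the throw becomes an uncaught exception.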
+
+## Graceful shutdown — close resources, then exit
+
+```typescript
+const server = app.listen(3000);
+
+process.on("SIGTERM", () => {
+  server.close(async () => {
+    await db.disconnect();
+    process.exit(0);
+  });
+  setTimeout(() => process.exit(1), 10_000).unref();
+});
+```
+
+## Pitfalls
+
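+- **Unhandled rejection** — every promise must be awaited (so a rejection reaches a `try/catch` up the call stack) or given a `.catch()`. A floating promise detaches the failure from its caller; it surfaces as an `unhandledRejection`, which terminates the process by default on Node 15+.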
+- **`forEach` is not async-aware** — it ignores returned promises; use `for...of` with `await`, or `Promise.all(map(...))`.
+- **Microtask vs timer ordering** — awaited promises (microtasks) drain before `setTimeout` (macrotasks). Don't rely on `setTimeout(0)` for ordering.
+- **`async` in an event handler** that throws → unhandled rejection. Wrap the body.
diff --git a/plugin/skills/node/references/debugging.md b/plugin/skills/node/references/debugging.md
new file mode 100644
index 0000000..05d5712
--- /dev/null
+++ b/plugin/skills/node/references/debugging.md
@@ -0,0 +1,63 @@
+# Debugging Node/TS with `dbga` (vscode-js-debug)
+
+Node-specific recipes for the evidence-first loop. The full discipline lives in the `debug-agent` skill and `_shared/evidence-first.md` — reference them by name; this file is the Node delta only.
+
+## Prerequisites
+
+```powershell
+dbga --version            # expect 0.1.0
+node --version            # current LTS
+```
+
+Node debugging runs over **vscode-js-debug**. It is not on npm. Discovery order:
+
+1. `$DBGA_JS_DEBUG_SERVER` (point it at the extracted `dapDebugServer.js` / server dir).
+2. VS Code / Cursor / Insiders extension dirs (auto-detected).
+3. Manual extract of `js-debug-dap-vX.Y.Z.tar.gz` from the [vscode-js-debug releases](https://github.com/microsoft/vscode-js-debug/releases) into `~/.local/share/js-debug` (POSIX) or `%LOCALAPPDATA%\js-debug` (Windows).
+
+If `session start --lang node` fails to find the adapter, set `$DBGA_JS_DEBUG_SERVER` and retry.
+
+## Auto-detection
+
+`.js .mjs .cjs .ts .mts .cts` auto-detect to `--lang node`; you can still pass `--lang node` explicitly. `--cwd <dir>` is recommended so the adapter resolves modules from the project root.
+
+## Crash → triage in one call
+
+```powershell
+dbga diagnose --timeout 60 --cwd <dir> -- node buggy.js
+```
+
+Returns `"status": "diagnosed"` with `error_type`, `message`, and the `deepest_user_frame`. Example: `error_type: "TypeError"`, `message: "Cannot read properties of null (reading 'value')"`, deepest user frame `main` line 10. `node:internal/*` frames are marked `is_user_code: false`, so the deepest *user* frame is what you get.
+
+`diagnose` reuses session `default`; clear a lingering one with `dbga session release` first.
+
+## Live inspection
+
+```powershell
+dbga session start --session node-demo --cwd <dir> --break-at buggy.js:3 --pretty -- buggy.js
+dbga session eval --session node-demo --expr "nums"     # → (3) [10, 20, 30]  (JS formatting)
+dbga session continue --session node-demo
+dbga session release --session node-demo
+```
+
+Pause at program start instead of a breakpoint:
+
+```powershell
+dbga session start --session n --stop-on-entry --pretty -- buggy.js   # reason: entry
+```
+
+`eval` runs in the **target language** — vscode-js-debug evaluates the expression as JavaScript and formats values with JS syntax (`(3) [10, 20, 30]`, not Python's `[10, 20, 30]`).
+
+## TypeScript notes
+
+- `session start` takes a **script path**, not a shell command (no `ts-node -e`). Run a transpiled `.js`, or a `.ts` entry under a runtime/loader that the adapter launches.
+- Breakpoints map through source maps. If a breakpoint doesn't bind, confirm `sourceMap: true` in `tsconfig` and that the emitted `.js.map` sits next to the `.js`.
+- A `--break-at file:line` referencing the `.ts` source resolves via the source map; if it won't bind, set it on the emitted `.js` line instead.
+
+## Honest limit — single process
+
+Only a **single launched process** is validated. `worker_threads`, `child_process`, and `cluster` multi-process lifecycles are not yet validated. For multi-process bugs, isolate the failing worker into a standalone script and debug that.
+
+## When to reach for this
+
+Wrong output, a `TypeError`/`undefined` access, a hang, or any value that "shouldn't be possible" — set a breakpoint where the value *first* goes wrong (walk up the stack), eval to confirm, then verify the fix at that same breakpoint. Don't print-debug; one stop returns full context.
diff --git a/plugin/skills/node/references/design-patterns.md b/plugin/skills/node/references/design-patterns.md
new file mode 100644
index 0000000..f4feae5
--- /dev/null
+++ b/plugin/skills/node/references/design-patterns.md
@@ -0,0 +1,77 @@
+# Design patterns (Node/TS)
+
+Node/TS-specific structural patterns. Clarity and testability over cleverness — see `_shared/clean-code.md`.
+
+## Dependency injection — inject, don't import-and-hope
+
+Pass collaborators in so they can be substituted in tests. Hard-coded module imports are untestable seams.
+
+```typescript
+interface UserRepo {
+  findById(id: string): Promise<User | null>;
+}
+
+class UserService {
+  constructor(private readonly repo: UserRepo) {}
+
+  async profile(id: string): Promise<User> {
+    const user = await this.repo.findById(id);
+    if (!user) throw new NotFoundError("User");
+    return user;
+  }
+}
+```
+
+Tests pass a fake `UserRepo`; production passes the real one. No module mocking needed.
+
+## Repository pattern — isolate data access
+
+Hide the ORM/SQL behind an interface so business logic never imports a database client. Swap Postgres for an in-memory map in tests without touching callers.
+
+## Composition over inheritance
+
+Prefer small functions and object composition. Reach for classes when you have genuine identity + behavior + lifecycle (services, stateful clients); reach for plain functions and modules otherwise.
+
+```typescript
+const withRetry =
+  <A extends unknown[], R>(fn: (...a: A) => Promise<R>, n = 3) =>
+  async (...args: A): Promise<R> => {
+    let last: unknown;
+    for (let i = 0; i < n; i++) {
+      try {
+        return await fn(...args);
+      } catch (e) {
+        last = e;
+      }
+    }
+    throw last;
+  };
+```
+
+## Middleware pipeline
+
+Chain single-purpose handlers for cross-cutting concerns (auth, logging, validation). Each does one thing and calls `next`.
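+
+One possible shape for such a pipeline, as a framework-free sketch. `Ctx`, `Middleware`, `compose`, and the three handlers are hypothetical names chosen for illustration; real frameworks such as Express have their own signatures.
+
+```typescript
+type Ctx = { user?: string; log: string[] };
+type Middleware = (ctx: Ctx, next: () => void) => void;
+
+// Build one function that runs the stack in order; each layer decides
+// whether to continue by calling next().
+function compose(...stack: Middleware[]): (ctx: Ctx) => void {
+  return (ctx) => {
+    const run = (i: number): void => {
+      const mw = stack[i];
+      if (mw) mw(ctx, () => run(i + 1));
+    };
+    run(0);
+  };
+}
+
+const logging: Middleware = (ctx, next) => {
+  ctx.log.push("start");
+  next();
+  ctx.log.push("end");
+};
+
+const auth: Middleware = (ctx, next) => {
+  if (ctx.user) next();
+  else ctx.log.push("denied"); // short-circuit: later layers never run
+};
+
+const greet: Middleware = (ctx) => {
+  ctx.log.push(`hello ${ctx.user}`);
+};
+
+const pipelineFn = compose(logging, auth, greet);
+
+const ok: Ctx = { user: "ada", log: [] };
+pipelineFn(ok);
+
+const anon: Ctx = { log: [] };
+pipelineFn(anon);
+```
+
+Because `auth` is the only layer that knows about users, it can be tested alone with a stub `next`.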
+
+## Factory functions
+
+Return a closed-over object instead of exposing a class when you don't need `instanceof` or inheritance — simpler, no `this` foot-guns.
+
+```typescript
+function createCounter(start = 0) {
+  let count = start;
+  return {
+    inc: () => ++count,
+    value: () => count,
+  };
+}
+```
+
+## Anti-patterns to refactor away
+
+- **`any` as an escape hatch** — use `unknown` + narrowing, or fix the type.
+- **God modules** — a file exporting 30 unrelated things; split by responsibility.
+- **Deep callback / `.then()` nesting** — flatten with `async/await`.
+- **Floating promises** — every promise is awaited or explicitly `.catch`-ed.
+- **Re-throwing without context** — attach `cause` so the stack trace survives.
+- **Mutating shared state** — prefer immutable updates (spread, `readonly`).
+- **Barrel-file cycles** — `import type` and direct paths break import cycles.
diff --git a/plugin/skills/node/references/errors-structure.md b/plugin/skills/node/references/errors-structure.md
new file mode 100644
index 0000000..df21cf9
--- /dev/null
+++ b/plugin/skills/node/references/errors-structure.md
@@ -0,0 +1,101 @@
+# Errors & structure (Node/TS)
+
+Node/TS-specific. Errors carry type-safe context; control flow stays explicit.
+
+## Custom error classes — typed, categorized
+
+Extend `Error`, set `name`, and attach a typed payload. Restore the prototype chain so `instanceof` survives transpilation to ES5 targets.
+
+```typescript
+class AppError extends Error {
+  constructor(
+    message: string,
+    readonly statusCode: number,
+    readonly cause?: unknown,
+  ) {
+    super(message);
+    this.name = new.target.name;
+    Object.setPrototypeOf(this, new.target.prototype);
+  }
+}
+
+class NotFoundError extends AppError {
+  constructor(resource: string) {
+    super(`${resource} not found`, 404);
+  }
+}
+```
+
+Use the built-in `cause` option (`new Error(msg, { cause })`) to chain without losing the original.
+
+## Result types — errors as values
+
+For expected, recoverable failures (validation, parsing), return a `Result` instead of throwing. Throw only for *exceptional* conditions.
+
+```typescript
+type Result<T, E = Error> =
+  | { ok: true; value: T }
+  | { ok: false; error: E };
+
+function parsePort(s: string): Result<number, string> {
+  const n = Number(s);
+  if (!Number.isInteger(n) || n < 1 || n > 65535) {
+    return { ok: false, error: `invalid port: ${s}` };
+  }
+  return { ok: true, value: n };
+}
+```
+
+## Exhaustiveness with `never`
+
+Force every error variant to be handled; a new variant becomes a compile error (see `references/typescript-types.md`).
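+
+A small sketch of the idea applied to errors, assuming a hypothetical `FetchError` union and `toStatus` mapper (neither appears elsewhere in this skill). Adding a fourth variant without a matching `case` makes the `assertNever(e)` call fail to compile.
+
+```typescript
+// Hypothetical error union, for illustration only.
+type FetchError =
+  | { kind: "not_found"; resource: string }
+  | { kind: "timeout"; ms: number }
+  | { kind: "forbidden" };
+
+function assertNever(x: never): never {
+  throw new Error(`unhandled variant: ${JSON.stringify(x)}`);
+}
+
+function toStatus(e: FetchError): number {
+  switch (e.kind) {
+    case "not_found":
+      return 404;
+    case "timeout":
+      return 504;
+    case "forbidden":
+      return 403;
+    default:
+      // e is narrowed to never here only while every variant is handled.
+      return assertNever(e);
+  }
+}
+```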
+
+## Async error wrapping — no naked async handlers
+
+In Express 4, an `async` handler that rejects bypasses the error middleware (Express 5 forwards the rejection to `next` on its own). Wrap it.
+
+```typescript
+const asyncHandler =
+  <T extends RequestHandler>(fn: T): RequestHandler =>
+  (req, res, next) =>
+    Promise.resolve(fn(req, res, next)).catch(next);
+
+app.get(
+  "/users/:id",
+  asyncHandler(async (req, res) => {
+    const user = await repo.findById(req.params.id);
+    if (!user) throw new NotFoundError("User");
+    res.json(user);
+  }),
+);
+```
+
+## `catch` is `unknown`, not `Error`
+
+Under `useUnknownInCatchVariables` (on with `strict`), narrow before use.
+
+```typescript
+try {
+  await risky();
+} catch (e) {
+  if (e instanceof AppError) return reply(e.statusCode, e.message);
+  throw e;
+}
+```
+
+## Process-level safety nets
+
+Log and exit on the unexpected — never swallow silently.
+
+```typescript
+process.on("unhandledRejection", (reason) => {
+  logger.error({ reason }, "unhandled rejection");
+  process.exit(1);
+});
+```
+
+## Structure
+
+- One module = one responsibility; export the public surface, keep helpers private.
+- Dependency injection over hard-coded imports for anything you'll mock in tests.
+- Validate external input at the boundary (zod or a guard) so the typed core can trust its inputs.
diff --git a/plugin/skills/node/references/js-fallback.md b/plugin/skills/node/references/js-fallback.md
new file mode 100644
index 0000000..21776bb
--- /dev/null
+++ b/plugin/skills/node/references/js-fallback.md
@@ -0,0 +1,53 @@
+# Plain JavaScript fallback (no types)
+
+Use only when the project has no TypeScript and adding it is out of scope. TypeScript is the default everywhere else — see the other references. The goal here is to recover as much type safety as JS allows.
+
+## Recover type checking with JSDoc + `checkJs`
+
+JSDoc annotations give you editor checking and `tsc` validation on plain `.js`. Add a `jsconfig.json` (or `tsconfig` with `allowJs`/`checkJs`) and run `tsc --noEmit` over the JS.
+
+```javascript
+// @ts-check
+
+/**
+ * @param {string} id
+ * @returns {Promise<{ id: string, name: string } | null>}
+ */
+async function findUser(id) {
+  return db.users.get(id) ?? null;
+}
+```
+
+`@typedef` and `@type` accept `import(...)` type expressions, so they can pull types from `.d.ts` files and share contracts without converting the codebase.
+
+## Modern JS baseline
+
+- **ESM only** — `import`/`export`, not `require`. Set `"type": "module"` in `package.json`.
+- **Optional chaining + nullish coalescing** — `obj?.a?.b ?? fallback`. `??` defaults only on `null`/`undefined`, unlike `||`.
+- **Private class fields** — `#field` for true encapsulation.
+- **`const` by default**, `let` only when reassigning, never `var`.
+
+## Defensive coding (no compiler to catch you)
+
+- Validate inputs at every public boundary — type checks, range checks, null guards. The types that TS would enforce must now be enforced at runtime.
+- Pure functions and immutable updates (spread, array methods over in-place mutation) keep behavior predictable.
+- Higher-order functions for composition; destructure for readable signatures.
+
+```javascript
+const isNonEmptyString = (x) => typeof x === "string" && x.length > 0;
+
+function greet(name) {
+  if (!isNonEmptyString(name)) throw new TypeError("name must be a non-empty string");
+  return `hello ${name}`;
+}
+```
+
+## Guardrails
+
+- Strict ESLint config (`eslint:recommended` + `promise/catch-or-return` from `eslint-plugin-promise`) to catch what the type system would; `no-floating-promises` is a `@typescript-eslint` rule that needs type information.
+- `WeakRef` / `FinalizationRegistry` only for genuine memory-pressure cases — rare.
+- Async error handling is identical to TS (see `references/async-patterns.md` and `references/errors-structure.md`); `catch (e)` is untyped, so narrow with `instanceof` before using.
+
+## Note
+
+JSDoc-typed JS is debuggable with the same `dbga` Node recipes (`.js .mjs .cjs` auto-detect) — see `references/debugging.md`.
diff --git a/plugin/skills/node/references/typescript-types.md b/plugin/skills/node/references/typescript-types.md
new file mode 100644
index 0000000..7b9d05b
--- /dev/null
+++ b/plugin/skills/node/references/typescript-types.md
@@ -0,0 +1,97 @@
+# Advanced TypeScript types
+
+Node/TS-specific. The rule is *model the domain so illegal states are unrepresentable*, then let inference do the work. One example per pattern.
+
+## Discriminated unions — state machines & exhaustiveness
+
+Tag each variant with a literal `kind`; narrow on it; close with a `never` default so adding a variant becomes a compile error.
+
+```typescript
+type Result<T, E> =
+  | { kind: "ok"; value: T }
+  | { kind: "err"; error: E };
+
+function unwrap<T, E>(r: Result<T, E>): T {
+  switch (r.kind) {
+    case "ok":
+      return r.value;
+    case "err":
+      throw r.error;
+    default:
+      return assertNever(r);
+  }
+}
+
+function assertNever(x: never): never {
+  throw new Error(`unhandled variant: ${JSON.stringify(x)}`);
+}
+```
+
+## Branded types — domain safety with zero runtime cost
+
+Stop `UserId` and `OrderId` (both `string`) from being mixed up.
+
+```typescript
+type Brand<T, B> = T & { readonly __brand: B };
+type UserId = Brand<string, "UserId">;
+
+const asUserId = (s: string): UserId => s as UserId;
+```
+
+## Conditional types + `infer` — extract from a shape
+
+```typescript
+type ElementOf<T> = T extends readonly (infer E)[] ? E : never;
+type UnwrapPromise<T> = T extends Promise<infer U> ? UnwrapPromise<U> : T;
+```
+
+## Mapped types — transform a shape
+
+```typescript
+type Nullable<T> = { [K in keyof T]: T[K] | null };
+type Mutable<T> = { -readonly [K in keyof T]: T[K] };
+```
+
+Key remapping with `as` renames keys:
+
+```typescript
+type Getters<T> = {
+  [K in keyof T as `get${Capitalize<string & K>}`]: () => T[K];
+};
+```
+
+## Template literal types — typed string contracts
+
+```typescript
+type Route = `/${string}`;
+type EventName = `on${Capitalize<"click" | "focus">}`;
+```
+
+## Generic constraints — restrict, don't widen
+
+```typescript
+function pick<T, K extends keyof T>(obj: T, keys: K[]): Pick<T, K> {
+  return Object.fromEntries(keys.map((k) => [k, obj[k]])) as Pick<T, K>;
+}
+```
+
+## Type guards — narrow at runtime boundaries
+
+Prefer type predicates (`x is T`) and assertion functions (`asserts x is T`) over `as`. Casts silence the compiler; guards prove the type.
+
+```typescript
+function isUser(x: unknown): x is { id: string } {
+  return typeof x === "object" && x !== null && "id" in x && typeof x.id === "string";
+}
+```
+
+## Utility types — reach for these before hand-rolling
+
+`Partial`, `Required`, `Readonly`, `Pick`, `Omit`, `Record`, `Extract`, `Exclude`, `NonNullable`, `ReturnType`, `Parameters`, `Awaited`.
+
+## Discipline
+
+- `strict: true`; no `any` without a `// reason:` justification — use `unknown` at boundaries and narrow.
+- `const` assertions (`as const`) preserve literal types for unions and tuples.
+- Type-only imports (`import type`) keep emit clean and avoid cycles.
+- 100% type coverage on public API surface; validate non-trivial type logic with type-level tests (`expectTypeOf` in Vitest).
diff --git a/plugin/skills/python/SKILL.md b/plugin/skills/python/SKILL.md
new file mode 100644
index 0000000..bad249d
--- /dev/null
+++ b/plugin/skills/python/SKILL.md
@@ -0,0 +1,46 @@
+---
+name: python
+description: Use when writing, reviewing, refactoring, or debugging Python — modules, packages, CLIs, async/asyncio code, FastAPI/Django/Flask services, data pipelines, or scripts. Triggers on Python keywords (def, class, async/await, asyncio, dataclass, Protocol, type hints, mypy, ruff, uv, pytest, pydantic), Python errors (TypeError, ValueError, ImportError, AttributeError, tracebacks), and tasks like "write/fix/optimize Python", "add type hints", "make this async", "Pythonic", ".py file".
+---
+
+# Python
+
+Idiomatic, type-safe, production Python (3.10+, modern toolchain: uv, ruff, mypy --strict, pytest). This file is a slim index — load the reference for the task at hand.
+
+## Cross-cutting discipline (do not restate — follow by name)
+
+- **Clean, self-explaining code** → `_shared/clean-code.md`. No comments unless asked; clear names over cleverness; guard clauses over nesting.
+- **Evidence-first development & debugging** → `_shared/evidence-first.md` + the `debug-agent` skill. Validate against real runs; verify at the original fault.
+- **Dependency hygiene** → `_shared/dependency-hygiene.md`. Audit (`pip-audit`, `uv pip list --outdated`); suggest bumps, don't run them.
+
+## Route to a reference
+
+| Task / symptom | Reference |
+| --- | --- |
+| Structure code, layering, DI, SRP, composition, anti-patterns | `references/design-patterns.md` |
+| Type hints, generics, Protocols, TypeVar, mypy strict, TypedDict | `references/type-hints.md` |
+| async/await, asyncio, tasks, queues, semaphores, blocking-in-async | `references/async-concurrency.md` |
+| Custom exceptions, chaining, partial-failure batches, validation | `references/errors-structure.md` |
+| Crash, hang, wrong value, live state — debug a `.py` with `dbga` | `references/debugging.md` |
+
+## Defaults
+
+- **Python 3.10+**. Use `X | None`, `list[str]`, `match`, `dataclass`/`pydantic`, `Protocol` for structural typing.
+- **Type everything public.** Full annotations on signatures and attributes; `mypy --strict` clean. See `references/type-hints.md`.
+- **Async-first for I/O.** Never block the event loop; offload sync work with `asyncio.to_thread`. See `references/async-concurrency.md`.
+- **Toolchain:** `uv` (deps/venv), `ruff` (lint+format), `mypy --strict`, `pytest`. Prefer the stdlib before adding a dependency.
+- **Errors carry context** and chain with `raise ... from e`; validate at boundaries. See `references/errors-structure.md`.
+
+## Evidence-First Debugging (debug-agent toolkit)
+
+You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — and the `debug-agent` skill. When code crashes, hangs, produces wrong output, or you need live runtime state, DO NOT guess from source. Gather evidence:
+
+- `dbga diagnose -- <cmd>`  → triage a crash to the deepest user frame
+- `dbga session start --break-at file:line -- <script>` then `dbga session eval --expr "<x>"` → inspect live state
+- Invoke the `debug-agent` skill for the full evidence-first loop.
+
+Validate against real use flows and verify the fix at the original fault before declaring it done. Python recipes: `references/debugging.md`.
+
+## Delegation
+
+For deep Python work, the `python-expert` agent drives this skill plus `debug-agent`. The `architect` agent orchestrates cross-language work.
diff --git a/plugin/skills/python/evals/evals.json b/plugin/skills/python/evals/evals.json
new file mode 100644
index 0000000..1794b93
--- /dev/null
+++ b/plugin/skills/python/evals/evals.json
@@ -0,0 +1,38 @@
+{
+  "skill": "python",
+  "evals": [
+    {
+      "name": "zerodivision-triage",
+      "prompt": "Running `python report.py` crashes with ZeroDivisionError: division by zero somewhere in an average calculation. Find where it originates and fix it.",
+      "expected_behavior": "Reaches for the debug-agent toolkit: `dbga diagnose --timeout 30 -- python report.py` to triage to the deepest user frame, inspects the live values that produced the error, fixes the guard at the originating frame (empty/zero divisor), then re-runs the real flow to verify the crash is gone at the original fault. Adds no explanatory code comments.",
+      "grading": [
+        { "text": "Used dbga diagnose (or session start --break-at) to gather runtime evidence before proposing a fix", "passed": null, "evidence": "" },
+        { "text": "Fixed the bug at the originating frame, not where it manifested", "passed": null, "evidence": "" },
+        { "text": "Verified by re-running the real flow / eval at the original fault", "passed": null, "evidence": "" },
+        { "text": "Added no code comments unless asked", "passed": null, "evidence": "" }
+      ]
+    },
+    {
+      "name": "blocking-in-async",
+      "prompt": "My asyncio service is supposed to fetch 20 URLs concurrently but it runs them one at a time and the whole loop freezes. Make it actually concurrent.",
+      "expected_behavior": "Loads python/references/async-concurrency.md. Identifies blocking calls inside async def (time.sleep / requests.get) stalling the event loop, switches to async-native I/O (httpx.AsyncClient) with asyncio.gather, and bounds concurrency with a Semaphore where appropriate. For unavoidable sync work, wraps it in asyncio.to_thread. No blocking calls remain in coroutines; no explanatory comments added.",
+      "grading": [
+        { "text": "Loaded python/references/async-concurrency.md", "passed": null, "evidence": "" },
+        { "text": "Replaced blocking calls with async-native I/O or asyncio.to_thread", "passed": null, "evidence": "" },
+        { "text": "Used asyncio.gather / TaskGroup for concurrency", "passed": null, "evidence": "" },
+        { "text": "Added no code comments unless asked", "passed": null, "evidence": "" }
+      ]
+    },
+    {
+      "name": "type-safe-repository",
+      "prompt": "Add full type hints to this Python data-access layer so it passes mypy --strict, including a generic repository interface and a Protocol for the injected cache.",
+      "expected_behavior": "Loads python/references/type-hints.md. Adds complete annotations using 3.10+ syntax (X | None, list[...]), defines a Generic[T, ID] repository ABC and a Protocol for the cache dependency, parameterizes all collections, and avoids Any. Result is mypy --strict clean. Suggests a mypy strict config if absent. No unrelated refactors, no stray comments.",
+      "grading": [
+        { "text": "Loaded python/references/type-hints.md", "passed": null, "evidence": "" },
+        { "text": "Used modern 3.10+ typing (X | None, list[...]) and parameterized collections", "passed": null, "evidence": "" },
+        { "text": "Used Protocol for the injected dependency and Generic for the repository", "passed": null, "evidence": "" },
+        { "text": "Added no code comments unless asked", "passed": null, "evidence": "" }
+      ]
+    }
+  ]
+}
diff --git a/plugin/skills/python/references/async-concurrency.md b/plugin/skills/python/references/async-concurrency.md
new file mode 100644
index 0000000..59c0936
--- /dev/null
+++ b/plugin/skills/python/references/async-concurrency.md
@@ -0,0 +1,103 @@
+# Python — async & concurrency
+
+Python-specific deltas. async-first for I/O-bound work; processes for CPU-bound; threads only to wrap blocking sync libs.
+
+## The one rule: never block the event loop
+
+A single synchronous call (`time.sleep`, `requests.get`, blocking file/DB I/O) stalls **every** concurrent task on that loop.
+
+```python
+# BAD — blocks the whole loop
+async def fetch(url: str):
+    time.sleep(1)
+    return requests.get(url)
+
+# GOOD — async-native
+async def fetch(url: str):
+    await asyncio.sleep(1)
+    async with httpx.AsyncClient() as client:
+        return await client.get(url)
+```
+
+When a sync library is unavoidable, offload it to a thread so the loop keeps running:
+
+```python
+async def read_file_async(path: str) -> str:
+    return await asyncio.to_thread(Path(path).read_text)   # 3.9+
+```
+
+For CPU-bound work use `loop.run_in_executor` with a `ProcessPoolExecutor`, or `concurrent.futures` directly — threads won't help past the GIL.
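+
+The CPU-bound case can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the original recipes: `count_primes`, `count_primes_async`, and `main` are hypothetical names, and the worker count is arbitrary.
+
+```python
+import asyncio
+from concurrent.futures import ProcessPoolExecutor
+
+
+def count_primes(limit: int) -> int:
+    # Hypothetical CPU-heavy job: trial division over every candidate below limit.
+    return sum(
+        1
+        for n in range(2, limit)
+        if all(n % d for d in range(2, int(n**0.5) + 1))
+    )
+
+
+async def count_primes_async(limit: int, pool: ProcessPoolExecutor) -> int:
+    loop = asyncio.get_running_loop()
+    # The worker process does the math, so the event loop stays free meanwhile.
+    return await loop.run_in_executor(pool, count_primes, limit)
+
+
+async def main() -> list[int]:
+    with ProcessPoolExecutor(max_workers=2) as pool:
+        return await asyncio.gather(
+            count_primes_async(100, pool),
+            count_primes_async(1_000, pool),
+        )
+```
+
+Call `asyncio.run(main())` under an `if __name__ == "__main__":` guard, since worker processes may re-import the module. The function handed to the pool must be defined at module level so it can be pickled.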
+
+## Concurrent fan-out with gather
+
+Independent awaitables run concurrently; gather collects them in order.
+
+```python
+async def get_user_data(db: AsyncDB, user_id: int) -> dict:
+    user, orders, profile = await asyncio.gather(
+        db.fetch_one(f"users:{user_id}"),
+        db.execute(f"orders:{user_id}"),
+        db.fetch_one(f"profiles:{user_id}"),
+    )
+    return {"user": user, "orders": orders, "profile": profile}
+```
+
+On 3.11+ prefer `asyncio.TaskGroup` when you want structured concurrency with automatic cancellation of siblings on first failure.
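+
+A minimal `TaskGroup` sketch, assuming Python 3.11+; `fetch_part` and `load_dashboard` are hypothetical stand-ins for real I/O calls.
+
+```python
+import asyncio
+
+
+async def fetch_part(name: str, delay: float) -> str:
+    # Stand-in for a real awaitable such as a DB or HTTP call.
+    await asyncio.sleep(delay)
+    return f"{name}-done"
+
+
+async def load_dashboard() -> list[str]:
+    async with asyncio.TaskGroup() as tg:
+        user = tg.create_task(fetch_part("user", 0.01))
+        orders = tg.create_task(fetch_part("orders", 0.02))
+    # Leaving the block means both tasks have finished, so result() is safe here.
+    return [user.result(), orders.result()]
+```
+
+If one task raises, the group cancels the remaining tasks and re-raises the failures as an `ExceptionGroup`, which `except*` can handle.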
+
+## Bound concurrency with a Semaphore
+
+Cap in-flight work so you don't overwhelm a service or exhaust connections.
+
+```python
+async def rate_limited(urls: list[str], max_concurrent: int = 5) -> list[dict[str, object]]:
+    sem = asyncio.Semaphore(max_concurrent)
+
+    # one shared client: connections are pooled across all requests
+    async with httpx.AsyncClient() as client:
+        async def call(url: str) -> dict[str, object]:
+            async with sem:
+                r = await client.get(url)
+                return {"url": url, "status": r.status_code}
+
+        return await asyncio.gather(*(call(u) for u in urls))
+```
+
+For HTTP throughput, also reuse one client and a bounded connection pool (`httpx.AsyncClient` / `aiohttp.TCPConnector(limit=..., limit_per_host=...)`) rather than opening a client per request.
+
+## Producer–consumer with a Queue
+
+```python
+async def producer(q: asyncio.Queue[str | None], n: int) -> None:
+    for i in range(n):
+        await q.put(f"item-{i}")
+    await q.put(None)
+
+async def consumer(q: asyncio.Queue[str | None]) -> None:
+    while True:
+        item = await q.get()
+        if item is None:
+            q.task_done()
+            break
+        await handle(item)
+        q.task_done()
+```
+
+## Async context managers, iterators, locks
+
+- **Resources:** implement `__aenter__`/`__aexit__` so cleanup runs on every exit path; consume with `async with`.
+- **Streaming:** `async def` + `yield` is an async generator; consume with `async for` (paginate APIs, stream rows without loading everything).
+- **Shared mutable state:** guard read-modify-write across `await` points with `asyncio.Lock` — an `await` inside a critical section yields control and lets another task interleave.
+
+```python
+class Counter:
+    def __init__(self) -> None:
+        self._value = 0
+        self._lock = asyncio.Lock()
+
+    async def increment(self) -> None:
+        async with self._lock:
+            self._value += 1
+```
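+
+The resource and streaming bullets above can be sketched together. This is an illustrative example only: `Connection`, `pages`, and `collect` are hypothetical names, and the "connection" is a plain flag rather than a real socket.
+
+```python
+import asyncio
+from collections.abc import AsyncIterator
+from types import TracebackType
+
+
+class Connection:
+    def __init__(self) -> None:
+        self.open = False
+
+    async def __aenter__(self) -> "Connection":
+        self.open = True
+        return self
+
+    async def __aexit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc: BaseException | None,
+        tb: TracebackType | None,
+    ) -> None:
+        # Runs on normal exit and on exceptions alike.
+        self.open = False
+
+
+async def pages(conn: Connection, total: int, size: int) -> AsyncIterator[list[int]]:
+    # Async generator: yields one page at a time instead of loading everything.
+    for start in range(0, total, size):
+        await asyncio.sleep(0)
+        yield list(range(start, min(start + size, total)))
+
+
+async def collect() -> tuple[list[list[int]], bool]:
+    async with Connection() as conn:
+        batches = [batch async for batch in pages(conn, total=5, size=2)]
+    return batches, conn.open
+```
+
+After the `async with` block exits, `conn.open` is back to `False`, whether the body completed or raised.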
+
+## Debugging async
+
+Hangs and "wrong value after await" are where source-reading fails hardest. Set a `dbga` breakpoint inside the coroutine and inspect live state across the `await` — see `references/debugging.md` and the `debug-agent` skill.
diff --git a/plugin/skills/python/references/debugging.md b/plugin/skills/python/references/debugging.md
new file mode 100644
index 0000000..2cb529e
--- /dev/null
+++ b/plugin/skills/python/references/debugging.md
@@ -0,0 +1,62 @@
+# Python — debugging with `dbga`
+
+Python-specific `dbga` recipes. The full evidence-first loop, mindset, and cross-language details live in the **`debug-agent`** skill and `_shared/evidence-first.md` — this file is only the Python deltas. Confirm the tool first: `dbga --version` (expect 0.1.0). Python needs no extra toolchain (debugpy is bundled); `.py` auto-detects `--lang python`.
+
+## Crash → triage in one call
+
+`diagnose` parses the traceback, reruns paused at the deepest user frame, and returns full context.
+
+```powershell
+dbga diagnose --timeout 30 --pretty -- python buggy.py
+```
+
+Returns `"status": "diagnosed"` with `error_type`, `message`, and `deepest_user_frame` (e.g. `ZeroDivisionError`, `"division by zero"`, frame `average` line 3) plus a live paused session.
+
+> `diagnose` reuses session name `default`. A lingering `default` yields `{"status":"error","error_type":"session_exists",...}` — clear it with `dbga session release` first.
+
+## Parse a traceback you already have (no rerun)
+
+```powershell
+dbga localize --lang python --file py_trace.txt
+```
+
+Same `error_type` / `message` / `deepest_user_frame` shape, without launching anything. Use when you have log output but not the live process.
+
+## Live session — inspect state at the fault
+
+`session start` takes a **script path**, not a shell command (no `python -m foo`). `session eval` evaluates Python expressions and returns Python-formatted values.
+
+```powershell
+dbga session start --session py --break-at buggy.py:3 --pretty -- buggy.py
+dbga session eval --session py --expr "nums"     # → {"result":"[10, 20, 30]"}
+dbga session eval --session py --expr "total"    # → {"result":"60"}
+dbga session continue --session py               # re-hits the breakpoint with new state
+dbga session release --session py                # → {"status":"ok"}
+```
+
+Pause at program start instead of a breakpoint with `--stop-on-entry` (reason: `entry`).
+
+## Python-specific tactics
+
+- **Set the breakpoint where the value *first* goes wrong, not where it raises.** A `KeyError`/`AttributeError`/`TypeError` at line 80 usually originates upstream — walk the stack up to the frame where the bad value was produced.
+- **Inspect, don't print.** `session eval --expr "type(x)"`, `"vars(obj)"`, `"len(items)"`, `"x.__dict__"` answer "what is this really?" without editing source. Keep eval read-only unless you're probing a fix.
+- **async / coroutine bugs:** breakpoint inside the coroutine and eval across the `await` boundary — this is exactly where source-reading misleads (see `references/async-concurrency.md`).
+- **Comprehensions / generators hiding a bug:** break on the line and eval the source iterable and a sample element before trusting the one-liner.
+
+## Reversible source probes (Python-centric)
+
+`instrument` adds log/assert lines at a `file:line`, snapshotting the original so `revert --all` is atomic. Use for hot loops or a long run where pausing is impractical; see the `debug-agent` skill's `instrumentation.md`. Probes are Python-centric today.
+
+## Verify the fix at the original fault
+
+While still paused at the bug, eval the fixed expression against live state:
+
+```powershell
+dbga session eval --session py --expr "<fixed-expression>"
+```
+
+If it evaluates correctly **there**, the fix is right for the live state that triggered the bug. Apply it to the source, re-run, and don't declare done until you've observed correct behavior at the same breakpoint where the bug appeared (`_shared/evidence-first.md`).
+
+## Cleanup
+
+Always `dbga session release` when done — a finished debuggee does not tear the daemon down. `dbga sessions ls` lists live daemons; forgotten ones self-expire (~30 min). State persists under project-local `.debug-agent/` — add it to `.gitignore`.
diff --git a/plugin/skills/python/references/design-patterns.md b/plugin/skills/python/references/design-patterns.md
new file mode 100644
index 0000000..216dd23
--- /dev/null
+++ b/plugin/skills/python/references/design-patterns.md
@@ -0,0 +1,122 @@
+# Python — design patterns & structure
+
+Python-specific deltas only. Clean-code rules (naming, nesting, no comments) live in `_shared/clean-code.md` — follow them here.
+
+## Start simple — pattern only when it earns its place
+
+A dict beats a registry/factory until you actually need pluggability.
+
+```python
+FORMATTERS = {"json": JsonFormatter, "csv": CsvFormatter, "xml": XmlFormatter}
+
+def get_formatter(name: str) -> Formatter:
+    if name not in FORMATTERS:
+        raise ValueError(f"unknown format: {name}")
+    return FORMATTERS[name]()
+```
+
+**Rule of three:** two similar functions are often genuinely different (different validation, different errors). Duplication is cheaper than the wrong abstraction — wait for the third case, and even then prefer explicit over clever.
+
+## Single responsibility — split HTTP / logic / data
+
+Each unit has one reason to change. Keep HTTP parsing, business rules, and data access in separate layers so a change to one doesn't ripple.
+
+```python
+class UserService:
+    def __init__(self, repo: UserRepository) -> None:
+        self._repo = repo
+
+    async def create_user(self, data: CreateUserInput) -> User:
+        user = User(email=data.email, name=data.name)
+        return await self._repo.save(user)
+
+class UserHandler:
+    def __init__(self, service: UserService) -> None:
+        self._service = service
+
+    async def create_user(self, request: Request) -> Response:
+        data = CreateUserInput(**(await request.json()))
+        user = await self._service.create_user(data)
+        return Response(user.to_dict(), status=201)
+```
+
+Layering: **handler** (parse/format) → **service** (domain rules, pure where possible) → **repository** (SQL, external APIs, cache). Each layer depends only on the one below.
+
+## Composition over inheritance
+
+Inject collaborators; don't bake them in via a base class. Composition is testable (swap a fake) and flexible.
+
+```python
+class NotificationService:
+    def __init__(
+        self,
+        email: EmailSender,
+        sms: SmsSender | None = None,
+        push: PushSender | None = None,
+    ) -> None:
+        self._email, self._sms, self._push = email, sms, push
+
+    async def notify(self, user: User, message: str, channels: set[str] | None = None) -> None:
+        channels = channels or {"email"}
+        if "email" in channels:
+            await self._email.send(user.email, message)
+        if "sms" in channels and self._sms and user.phone:
+            await self._sms.send(user.phone, message)
+        if "push" in channels and self._push and user.device_token:
+            await self._push.send(user.device_token, message)
+```
+
+## Dependency injection via Protocols
+
+Type dependencies as `Protocol`s (structural typing — see `references/type-hints.md`), pass them through `__init__`. Production wires real implementations; tests wire fakes.
+
+```python
+class Cache(Protocol):
+    async def get(self, key: str) -> str | None: ...
+    async def set(self, key: str, value: str, ttl: int) -> None: ...
+
+class UserService:
+    def __init__(self, repo: UserRepository, cache: Cache) -> None:
+        self._repo, self._cache = repo, cache
+
+    async def get_user(self, user_id: str) -> User | None:
+        cached = await self._cache.get(f"user:{user_id}")
+        if cached:
+            return User.from_json(cached)
+        user = await self._repo.get_by_id(user_id)
+        if user:
+            await self._cache.set(f"user:{user_id}", user.to_json(), ttl=300)
+        return user
+```
+
+```python
+prod = UserService(PostgresUserRepository(db), RedisCache(redis))
+test = UserService(InMemoryUserRepository(), FakeCache())
+```
+
+## Function size
+
+Extract when a function exceeds ~20–50 lines, serves multiple purposes, or nests 3+ levels. Compose from focused, well-named calls so the top-level reads as a workflow.
+
+```python
+def process_order(order: Order) -> Result:
+    validate_order(order)
+    reserve_inventory(order)
+    payment = charge_payment(order)
+    send_confirmation(order, payment)
+    return Result(success=True, order_id=order.id)
+```
+
+## Anti-patterns to refuse
+
+| Anti-pattern | Fix |
+| --- | --- |
+| Exposing ORM/internal types from an API | Return a DTO / response schema (`UserResponse.model_validate(user)` with `from_attributes=True` in pydantic v2) |
+| I/O mixed into business logic | Repository pattern; keep domain functions pure and easily tested |
+| Scattered timeout/retry per call site | Centralize in a decorator or client wrapper |
+| Double retry (app **and** client both retry) | Retry at exactly one layer |
+| Hard-coded config / secrets | `pydantic-settings` `BaseSettings` reading env vars |
+| Bare `except Exception: pass` | Catch specific exceptions; log or re-raise — see `references/errors-structure.md` |
+| Generic `list` / untyped collections | Parameterize: `list[User]` — see `references/type-hints.md` |
+| Blocking calls inside `async def` | Async-native libs or `asyncio.to_thread` — see `references/async-concurrency.md` |
+| Only happy-path tests | Cover error and edge cases; mock only external services, not everything |
diff --git a/plugin/skills/python/references/errors-structure.md b/plugin/skills/python/references/errors-structure.md
new file mode 100644
index 0000000..fd61cca
--- /dev/null
+++ b/plugin/skills/python/references/errors-structure.md
@@ -0,0 +1,111 @@
+# Python — error handling & structure
+
+Python-specific deltas. Errors carry structured context, chain to preserve the debug trail, and never silently vanish.
+
+## Custom exception hierarchies
+
+Give a domain its own base exception; subclass for specific failures and attach the data a handler needs.
+
+```python
+class ApiError(Exception):
+    def __init__(self, message: str, status_code: int, body: str | None = None) -> None:
+        self.status_code = status_code
+        self.body = body
+        super().__init__(message)
+
+class RateLimitError(ApiError):
+    def __init__(self, retry_after: int) -> None:
+        self.retry_after = retry_after
+        super().__init__(f"rate limit exceeded; retry after {retry_after}s", status_code=429)
+```
+
+`match` over a status (or any discriminant) keeps multi-branch dispatch flat — no nested `if`:
+
+```python
+def handle(response: Response) -> dict:
+    match response.status_code:
+        case 200:
+            return response.json()
+        case 401:
+            raise ApiError("invalid credentials", 401)
+        case 429:
+            raise RateLimitError(int(response.headers.get("Retry-After", 60)))
+        case code if 400 <= code < 500:
+            raise ApiError(f"client error: {response.text}", code)
+        case code if code >= 500:
+            raise ApiError(f"server error: {response.text}", code)
+        case code:
+            raise ApiError(f"unexpected status: {code}", code)
+```
+
+## Chain exceptions — `raise ... from e`
+
+Translate low-level errors into domain errors, but keep the original cause so the traceback (and `dbga diagnose`) shows the real root.
+
+```python
+def upload_file(path: str) -> str:
+    try:
+        with open(path, "rb") as f:
+            r = httpx.post("https://upload.example.com", files={"file": f})
+            r.raise_for_status()
+            return r.json()["url"]
+    except FileNotFoundError as e:
+        raise ServiceError(f"upload failed: no file at {path!r}") from e
+    except httpx.HTTPStatusError as e:
+        raise ServiceError(f"upload failed: server returned {e.response.status_code}") from e
+```
+
+Never `except Exception: pass` — it hides bugs forever. Catch the specific type; log or re-raise.
+
+## Partial-failure batches
+
+One bad item must not abort the batch. Track success and failure per index and let the caller decide.
+
+```python
+from dataclasses import dataclass
+from typing import Generic, TypeVar
+
+T = TypeVar("T")
+
+@dataclass
+class BatchResult(Generic[T]):
+    succeeded: dict[int, T]
+    failed: dict[int, Exception]
+
+    @property
+    def all_succeeded(self) -> bool:
+        return not self.failed
+
+def process_batch(items: list[Item]) -> BatchResult[ProcessedItem]:
+    succeeded: dict[int, ProcessedItem] = {}
+    failed: dict[int, Exception] = {}
+    for idx, item in enumerate(items):
+        try:
+            succeeded[idx] = process_single_item(item)
+        except Exception as e:
+            failed[idx] = e
+    return BatchResult(succeeded, failed)
+```
+
+For long batches, accept an optional `Callable[[int, int, str], None]` progress callback instead of coupling the loop to any UI.
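+
+One way the progress callback could look, as a sketch under stated assumptions: `ProgressCallback`, `process_all`, and the `(done, total, label)` argument order are hypothetical choices, and `item.upper()` stands in for the real per-item work.
+
+```python
+from collections.abc import Callable
+
+ProgressCallback = Callable[[int, int, str], None]
+
+
+def process_all(
+    items: list[str],
+    on_progress: ProgressCallback | None = None,
+) -> list[str]:
+    results: list[str] = []
+    total = len(items)
+    for done, item in enumerate(items, start=1):
+        results.append(item.upper())
+        # The loop only reports progress; rendering is the caller's concern.
+        if on_progress is not None:
+            on_progress(done, total, item)
+    return results
+```
+
+A CLI can pass a function that prints a progress bar, a service can pass one that emits a metric, and tests can pass a list-appending lambda or nothing at all.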
+
+## Validate at the boundary
+
+Reject bad input where it enters (API edge, CLI arg, config load) — not deep in business logic where the failure is cryptic.
+
+```python
+def create_user(data: dict) -> User:
+    validated = CreateUserInput.model_validate(data)
+    return User.from_input(validated)
+```
+
+`pydantic` models / `pydantic-settings` `BaseSettings` are the idiomatic boundary validators; raise a domain error on failure.
+
+## Resources: always a context manager
+
+```python
+def read_file(path: str) -> str:
+    with open(path) as f:
+        return f.read()
+```
+
+Without `with`, a file/socket/connection opened before an exception is raised stays open. `with` (or `async with`) guarantees cleanup on every path. Implement `__enter__`/`__exit__` (or the async pair) for your own resources.
+
+## When an exception is the bug
+
+Don't reason about which branch raised — observe it. `dbga diagnose -- python app.py` pauses at the deepest user frame with the live locals that produced the error. See `references/debugging.md`.
diff --git a/plugin/skills/python/references/type-hints.md b/plugin/skills/python/references/type-hints.md
new file mode 100644
index 0000000..7b4acc6
--- /dev/null
+++ b/plugin/skills/python/references/type-hints.md
@@ -0,0 +1,116 @@
+# Python — type hints & the type system
+
+Python-specific deltas. Target `mypy --strict`. Use 3.10+ syntax: `X | None` (not `Optional[X]`), `list[str]`/`dict[str, int]` (not `typing.List`), builtins over `typing` aliases.
+
+## Baseline
+
+- Annotate **every** public signature and class attribute. Return types too.
+- Parameterize collections: `list[User]`, never bare `list`.
+- Minimize `Any`; it's acceptable only for genuinely dynamic data, and isolate it behind a typed boundary.
+- `mypy --strict` clean is the bar. Don't add `# type: ignore` without a `[code]` and a reason.
+
+## Protocols — structural typing without inheritance
+
+A class satisfies a `Protocol` by shape, not by subclassing. This is the idiomatic way to type injected dependencies (see `references/design-patterns.md`).
+
+```python
+import json
+from typing import Protocol, runtime_checkable
+
+@runtime_checkable
+class Serializable(Protocol):
+    def to_dict(self) -> dict[str, object]: ...
+
+def serialize(obj: Serializable) -> str:
+    return json.dumps(obj.to_dict())
+```
+
+Reusable shapes: `Closeable` (`close()`), `Readable` (`read()`), `HasId` (`id` property). `@runtime_checkable` enables `isinstance` checks against the protocol; those checks only confirm the members exist, not that their signatures match.
+
+## Generics
+
+```python
+from typing import Generic, TypeVar
+from abc import ABC, abstractmethod
+
+T = TypeVar("T")
+ID = TypeVar("ID")
+
+class Repository(ABC, Generic[T, ID]):
+    @abstractmethod
+    async def get(self, id: ID) -> T | None: ...
+    @abstractmethod
+    async def save(self, entity: T) -> T: ...
+
+class UserRepository(Repository[User, str]):
+    async def get(self, id: str) -> User | None: ...
+    async def save(self, entity: User) -> User: ...
+```
+
+**Bounded TypeVar** restricts the parameter and preserves the concrete return type:
+
+```python
+from typing import TypeVar
+from pydantic import BaseModel
+
+ModelT = TypeVar("ModelT", bound=BaseModel)
+
+def validate_and_create(model_cls: type[ModelT], data: dict[str, object]) -> ModelT:
+    return model_cls.model_validate(data)
+
+user = validate_and_create(User, {"name": "Alice", "email": "a@b.com"})
+```
+
+`validate_and_create(str, ...)` is a type error — `str` is not a `BaseModel`.
+
+## Type aliases (version-aware)
+
+PEP 695 `type` statement is **3.12+**. For 3.10/3.11 use `TypeAlias`.
+
+```python
+type UserId = str                        # 3.12+
+type Handler[T] = Callable[[Request], T]  # 3.12+ generic alias
+```
+
+```python
+from typing import TypeAlias              # 3.10/3.11
+from collections.abc import Callable
+
+UserId: TypeAlias = str
+Handler: TypeAlias = Callable[[Request], Response]
+```
+
+## Callable types & callbacks
+
+Import `Callable`/`Awaitable` from `collections.abc`, not `typing`.
+
+```python
+from collections.abc import Callable, Awaitable
+
+ProgressCallback = Callable[[int, int], None]
+AsyncHandler = Callable[[Request], Awaitable[Response]]
+```
+
+For keyword args in a callback, use a `Protocol` with `__call__`:
+
+```python
+class OnProgress(Protocol):
+    def __call__(self, current: int, total: int, *, message: str = "") -> None: ...
+```
+
+## Also reach for
+
+`TypedDict` (structured dicts), `Literal` (constants/enums-lite), `ParamSpec` (decorators preserving signatures), `@overload` (input-dependent return types).
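+
+A compact sketch combining three of these (`TypedDict`, `Literal`, `ParamSpec`). The names `logged`, `LogRecord`, `RECORDS`, and `add` are hypothetical and exist only for illustration.
+
+```python
+import functools
+from collections.abc import Callable
+from typing import Literal, ParamSpec, TypedDict, TypeVar
+
+P = ParamSpec("P")
+R = TypeVar("R")
+
+Level = Literal["info", "error"]
+
+
+class LogRecord(TypedDict):
+    level: Level
+    call: str
+
+
+RECORDS: list[LogRecord] = []
+
+
+def logged(level: Level) -> Callable[[Callable[P, R]], Callable[P, R]]:
+    def decorate(fn: Callable[P, R]) -> Callable[P, R]:
+        @functools.wraps(fn)
+        def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
+            # ParamSpec keeps the wrapped function's exact parameter types.
+            RECORDS.append({"level": level, "call": fn.__name__})
+            return fn(*args, **kwargs)
+
+        return wrapper
+
+    return decorate
+
+
+@logged("info")
+def add(a: int, b: int) -> int:
+    return a + b
+```
+
+With `ParamSpec`, a type checker still sees `add` as `(a: int, b: int) -> int` after decoration, so `add("x", 1)` is rejected statically. `logged("debug")` is rejected too, because `"debug"` is not in the `Literal`.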
+
+## mypy strict config
+
+```toml
+[tool.mypy]
+python_version = "3.12"
+strict = true
+warn_return_any = true
+warn_unused_ignores = true
+disallow_untyped_defs = true
+no_implicit_optional = true
+```
+
+Adopting strict on a legacy codebase: enable per-module with `# mypy: strict` or `pyproject.toml` overrides, then expand outward.

From bc12fe2493bb6bd9d06d787d53f2cc9cd233a1ad Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:52:59 +0300
Subject: [PATCH 08/16] fix(plugin): correct Go dbga session shapes,
 dep-hygiene framing, Python/TS type examples

---
 plugin/agents/go-expert.md                          | 9 +++++----
 plugin/skills/go/SKILL.md                           | 9 ++++-----
 plugin/skills/go/references/debugging.md            | 2 +-
 plugin/skills/node/references/typescript-types.md   | 2 +-
 plugin/skills/python/references/errors-structure.md | 5 ++++-
 5 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/plugin/agents/go-expert.md b/plugin/agents/go-expert.md
index 28b41f7..fb4adc0 100644
--- a/plugin/agents/go-expert.md
+++ b/plugin/agents/go-expert.md
@@ -33,8 +33,9 @@ evidence-first debugging loop, use the **`debug-agent` skill** and `dbga`.
    leaks.
 4. Concurrent/fallible paths take `context`; benchmarks for hot paths
    (`go test -bench=. -benchmem`), confirm wins with `pprof`.
-5. Dependency hygiene when touching deps: `go list -u -m all`, `go get -u ./...`,
-   `go mod tidy`, `govulncheck ./...` — then suggest bumps.
+5. Dependency hygiene when touching deps (per `_shared/dependency-hygiene.md`):
+   audit by running `go list -u -m all` and `govulncheck ./...`; then *suggest*
+   (don't auto-run) `go get -u ./...` / `go mod tidy`.
 
 ## When to delegate / escalate
 
@@ -55,8 +56,8 @@ Validate against real use flows and verify the fix at the original fault before
 For Go, pass `--lang go` and `--cwd <module dir>` (the dir with `go.mod`):
 
 - `dbga diagnose --lang go --cwd <module dir> -- go run .`
-- `dbga session start --lang go --cwd <module dir> --break-at file.go:line -- .`
-  then `dbga session eval --expr "<x>"`
+- `dbga session start --lang go --cwd <module dir> --break-at file.go:line -- main.go`
+  (session takes a `.go` script path, not a package) then `dbga session eval --expr "<x>"`
 
 Concurrency bugs first: `go test -race ./...`, and dump goroutine stacks
 (`SIGQUIT` / `GOTRACEBACK=all`) for deadlocks. Full recipes:
diff --git a/plugin/skills/go/SKILL.md b/plugin/skills/go/SKILL.md
index 57c61ee..4aa09cc 100644
--- a/plugin/skills/go/SKILL.md
+++ b/plugin/skills/go/SKILL.md
@@ -36,15 +36,14 @@ index — load the reference for the task at hand.
 
 ## Dependency hygiene (Go)
 
-On setup or when touching deps, audit then suggest bumps:
+On setup or when touching deps, audit (run), then suggest (don't auto-run):
 
 ```sh
-go list -u -m all      # what's outdated
-go get -u ./...        # update
-go mod tidy
-govulncheck ./...      # known vulnerabilities
+go list -u -m all      # audit: what's outdated
+govulncheck ./...      # audit: known vulnerabilities
 ```
 
+Suggest-only (present, let the developer run): `go get -u ./...`, `go mod tidy`.
 See `_shared/dependency-hygiene.md` for the audit-then-suggest discipline.
 
 ## Quality gate (run before declaring done)
diff --git a/plugin/skills/go/references/debugging.md b/plugin/skills/go/references/debugging.md
index 0d535c0..81c343d 100644
--- a/plugin/skills/go/references/debugging.md
+++ b/plugin/skills/go/references/debugging.md
@@ -36,7 +36,7 @@ alive you get `session_exists` — clear it with `dbga session release` first.
 
 ```sh
 dbga session start --lang go --session go-bug --cwd ./cmd/app \
-  --break-at calc.go:10 --pretty -- go run .
+  --break-at calc.go:10 --pretty -- main.go
 
 dbga session eval --session go-bug --expr "nums"   # → []int len: 3, cap: 3, [10,20,30]
 dbga session eval --session go-bug --expr "total"
diff --git a/plugin/skills/node/references/typescript-types.md b/plugin/skills/node/references/typescript-types.md
index 7b9d05b..02b714a 100644
--- a/plugin/skills/node/references/typescript-types.md
+++ b/plugin/skills/node/references/typescript-types.md
@@ -42,7 +42,7 @@ const asUserId = (s: string): UserId => s as UserId;
 
 ```typescript
 type ElementOf<T> = T extends readonly (infer E)[] ? E : never;
-type Awaited<T> = T extends Promise<infer U> ? Awaited<U> : T;
+type UnwrapPromise<T> = T extends Promise<infer U> ? UnwrapPromise<U> : T;
 ```
 
 ## Mapped types — transform a shape
diff --git a/plugin/skills/python/references/errors-structure.md b/plugin/skills/python/references/errors-structure.md
index fd61cca..7cf7395 100644
--- a/plugin/skills/python/references/errors-structure.md
+++ b/plugin/skills/python/references/errors-structure.md
@@ -61,9 +61,12 @@ One bad item must not abort the batch. Track success and failure per index and l
 
 ```python
 from dataclasses import dataclass
+from typing import Generic, TypeVar
+
+T = TypeVar("T")
 
 @dataclass
-class BatchResult[T]:
+class BatchResult(Generic[T]):
     succeeded: dict[int, T]
     failed: dict[int, Exception]
 

From 724a7884edf2e0b3faea28aa62a53dcf810c4e39 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 16:58:00 +0300
Subject: [PATCH 09/16] test(plugin): add trigger-separation eval set + results
 (cross-skill negatives clean)

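The per-skill relabeling described in RESULTS.md (`should_trigger = (intent == skill)`) can be sketched as follows. The helper name `relabel` and the three-query stand-in pool are illustrative assumptions, not part of the skill-creator harness:

```python
from __future__ import annotations


def relabel(pool: list[dict[str, str]], skill: str) -> list[dict[str, object]]:
    """Relabel the shared pool for one skill: only its own intent is a positive."""
    return [
        {"query": item["query"], "should_trigger": item["intent"] == skill}
        for item in pool
    ]


# Tiny stand-in pool; the real one is trigger-queries.json (16 queries).
pool = [
    {"query": "Make this asyncio code stop blocking the event loop", "intent": "python"},
    {"query": "Use functional options for this Go struct constructor", "intent": "go"},
    {"query": "Set a breakpoint and step to find where this value first goes wrong", "intent": "debug-agent"},
]

go_set = relabel(pool, "go")
labels = [entry["should_trigger"] for entry in go_set]
print(labels)
```

Every query whose intent differs from the chosen skill becomes a negative, which is how the other 12 queries in the full pool serve as cross-skill near-misses.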
---
 docs/superpowers/evals/RESULTS.md           | 45 +++++++++++++++++++++
 docs/superpowers/evals/trigger-queries.json | 21 ++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 docs/superpowers/evals/RESULTS.md
 create mode 100644 docs/superpowers/evals/trigger-queries.json

diff --git a/docs/superpowers/evals/RESULTS.md b/docs/superpowers/evals/RESULTS.md
new file mode 100644
index 0000000..ccfdf6d
--- /dev/null
+++ b/docs/superpowers/evals/RESULTS.md
@@ -0,0 +1,45 @@
+# Trigger-separation eval — results (2026-05-29)
+
+Lean dev-aid eval per the plugin spec (a goal, not a ship gate). Harness:
+skill-creator `scripts/run_eval.py`, which installs a skill's `description` as a
+temp command and runs `claude -p <query>` to see whether the model invokes it.
+
+Query pool: `trigger-queries.json` — 16 queries, 4 per skill intent
+(python / go / node / debug-agent). For each skill the same pool is relabeled
+`should_trigger = (intent == skill)`, so the other 12 act as cross-skill
+near-miss negatives.
+
+## What was measured
+
+- **Cross-skill separation (negatives): PASS.** `debug-agent` ran against the
+  full pool: **12/12 cross-language negatives correctly did NOT trigger** — it
+  stayed quiet on every python/go/node query. A `python` smoke run likewise
+  stayed quiet on the go/node negatives. The descriptions are well-separated;
+  no mis-trigger between the four skills was observed.
+
+## Platform limitation (positive rate not measurable on native Windows)
+
+- The 4 positive queries reported `trigger_rate 0` — but each coincided with a
+  `WinError 10038: An operation was attempted on something that is not a
+  socket`. `run_eval.py` detects triggering by `select.select()` on the
+  `claude -p` subprocess **pipe**; on native Windows `select` accepts only
+  sockets, so the stream reader raises before it can observe the Skill
+  invocation. This is a harness/platform bug, **not** a description defect.
+- The spec anticipated this: "Run eval scripts through a POSIX shell (Bash
+  tool / WSL)." Reliable positive-trigger and the auto-rewrite `run_loop` need
+  WSL/Linux. The unbounded `run_loop` was intentionally skipped on Windows
+  because it would inherit the same broken positive signal.
+
+## Conclusion
+
+The high-value property — the four descriptions fire on their own intent and
+stay quiet on the others' — is validated on the reliable (negative) axis, and
+the descriptions were independently reviewed as triggers-only and keyword-rich.
+Positive-rate numbers should be regenerated under WSL/Linux if exact figures are
+wanted; rerun with:
+
+```sh
+PYTHONPATH=<skill-creator> python run_eval.py \
+  --eval-set <skill>.json --skill-path plugin/skills/<skill> \
+  --runs-per-query 3 --model claude-sonnet-4-6
+```
diff --git a/docs/superpowers/evals/trigger-queries.json b/docs/superpowers/evals/trigger-queries.json
new file mode 100644
index 0000000..402ee4a
--- /dev/null
+++ b/docs/superpowers/evals/trigger-queries.json
@@ -0,0 +1,21 @@
+[
+  { "query": "Add type hints to this module and get it passing mypy --strict", "intent": "python" },
+  { "query": "Write a Pythonic async FastAPI endpoint that fetches rows from Postgres", "intent": "python" },
+  { "query": "My .py script raises AttributeError on a None value, refactor it cleanly", "intent": "python" },
+  { "query": "Make this asyncio code stop blocking the event loop", "intent": "python" },
+
+  { "query": "I have a data race detected by go test -race in my worker pool", "intent": "go" },
+  { "query": "Wrap these errors with %w and check them with errors.Is in my Go service", "intent": "go" },
+  { "query": "My goroutines deadlock: all goroutines are asleep - deadlock", "intent": "go" },
+  { "query": "Use functional options for this Go struct constructor", "intent": "go" },
+
+  { "query": "Fix this TypeScript TS2345 error, the argument is not assignable", "intent": "node" },
+  { "query": "My Express handler has an unhandled promise rejection, make it robust", "intent": "node" },
+  { "query": "Write an advanced conditional type that unwraps a Promise in TypeScript", "intent": "node" },
+  { "query": "Cannot read properties of undefined in my Node EventEmitter code", "intent": "node" },
+
+  { "query": "My program crashes with a traceback and I want to triage it to the failing frame", "intent": "debug-agent" },
+  { "query": "Pause my script at a breakpoint and inspect the live value of a variable", "intent": "debug-agent" },
+  { "query": "The process hangs and I need to see live runtime state, not guess from source", "intent": "debug-agent" },
+  { "query": "Set a breakpoint and step to find where this value first goes wrong", "intent": "debug-agent" }
+]

From bdf4dfdd0e6fe9b30658e970fb2524da8d7a8793 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 17:43:32 +0300
Subject: [PATCH 10/16] chore: stop tracking .claude/settings.local.json
 (already gitignored)

---
 .claude/settings.local.json | 8 --------
 1 file changed, 8 deletions(-)
 delete mode 100644 .claude/settings.local.json

diff --git a/.claude/settings.local.json b/.claude/settings.local.json
deleted file mode 100644
index 6f2ffd3..0000000
--- a/.claude/settings.local.json
+++ /dev/null
@@ -1,8 +0,0 @@
-{
-  "permissions": {
-    "allow": [
-      "Bash(dir)",
-      "Bash(uv run *)"
-    ]
-  }
-}

From 6f2f468e5dbe8f45ac8f88de6ba75ae366da0af9 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 17:53:05 +0300
Subject: [PATCH 11/16] test(plugin): record WSL trigger-eval results
 (separation clean; positive rate is harness ceiling)

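For context, the pipe-polling pattern behind the original Windows blocker looks roughly like the sketch below. It illustrates the mechanism only and is not the actual `run_eval.py` code; the pipe here is a plain `os.pipe()` standing in for a subprocess stdout pipe:

```python
import os
import select

# An anonymous pipe, the same kind of file descriptor a subprocess pipe exposes.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"skill invoked\n")

# POSIX (Linux/WSL): select() accepts pipe fds and reports this one readable.
# Native Windows: select() only accepts sockets, so this call raises OSError
# (WinError 10038) before any output can be read.
ready, _, _ = select.select([read_fd], [], [], 1.0)
data = os.read(read_fd, 1024) if ready else b""
print(data)

os.close(read_fd)
os.close(write_fd)
```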
---
 docs/superpowers/evals/RESULTS.md | 75 ++++++++++++++++++-------------
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/docs/superpowers/evals/RESULTS.md b/docs/superpowers/evals/RESULTS.md
index ccfdf6d..7d1234c 100644
--- a/docs/superpowers/evals/RESULTS.md
+++ b/docs/superpowers/evals/RESULTS.md
@@ -9,37 +9,50 @@ Query pool: `trigger-queries.json` — 16 queries, 4 per skill intent
 `should_trigger = (intent == skill)`, so the other 12 act as cross-skill
 near-miss negatives.
 
-## What was measured
-
-- **Cross-skill separation (negatives): PASS.** `debug-agent` ran against the
-  full pool: **12/12 cross-language negatives correctly did NOT trigger** — it
-  stayed quiet on every python/go/node query. A `python` smoke run likewise
-  stayed quiet on the go/node negatives. The descriptions are well-separated;
-  no mis-trigger between the four skills was observed.
-
-## Platform limitation (positive rate not measurable on native Windows)
-
-- The 4 positive queries reported `trigger_rate 0` — but each coincided with a
-  `WinError 10038: An operation was attempted on something that is not a
-  socket`. `run_eval.py` detects triggering by `select.select()` on the
-  `claude -p` subprocess **pipe**; on native Windows `select` accepts only
-  sockets, so the stream reader raises before it can observe the Skill
-  invocation. This is a harness/platform bug, **not** a description defect.
-- The spec anticipated this: "Run eval scripts through a POSIX shell (Bash
-  tool / WSL)." Reliable positive-trigger and the auto-rewrite `run_loop` need
-  WSL/Linux. The unbounded `run_loop` was intentionally skipped on Windows
-  because it would inherit the same broken positive signal.
-
-## Conclusion
-
-The high-value property — the four descriptions fire on their own intent and
-stay quiet on the others' — is validated on the reliable (negative) axis, and
-the descriptions were independently reviewed as triggers-only and keyword-rich.
-Positive-rate numbers should be regenerated under WSL/Linux if exact figures are
-wanted; rerun with:
+## Results (WSL/Linux, skill-creator harness, runs-per-query 1–3)
+
+| Skill | Passed | Negatives (no mis-trigger) | Positives (auto-trigger ≥0.5) |
+| --- | --- | --- | --- |
+| debug-agent | 12/16 | 12/12 ✅ | ~1/4 |
+| python | 12/16 | 12/12 ✅ | ~0/4 |
+| go | 13/16 | 12/12 ✅ | ~1/4 |
+| node | 12/16 | 12/12 ✅ | ~0/4 |
+
+- **Cross-skill separation (the property that matters for a 4-skill plugin):
+  excellent and uniform.** Every skill stays quiet on the other three skills'
+  intents (12/12 negatives each). No mis-trigger observed anywhere.
+- **Positive auto-trigger rate is uniformly low — a harness ceiling, not a
+  prompt defect.** Discriminating test: re-running `debug-agent` with a
+  deliberately punchy, imperative description ("Use this skill whenever…",
+  explicit trigger keywords, "Always use before guessing") produced **no lift**
+  (still ~1/4). A description-quality problem would vary by skill and respond to
+  a stronger trigger; instead the rate is flat across all skills and unresponsive
+  to description strength. The cause is methodology: `run_eval.py` injects each
+  skill as a `.claude/commands/` entry and measures whether one-shot `claude -p`
+  auto-invokes it — and one-shot non-interactive runs tend to just do the task
+  rather than auto-invoke a command. Real plugin-installed skills trigger via a
+  different path.
+
+## Why `run_loop` auto-optimization was not run
+
+`run_loop` maximizes positive trigger rate. The discriminating test shows that
+rate is capped by the harness, not the description, so optimization would chase
+a biased proxy and risk overfitting descriptions that are already triggers-only,
+keyword-rich, independently reviewed, and behaviorally validated (see the
+buggy-script baseline-vs-with-skill test). Decision: keep the reviewed
+descriptions; rely on the clean separation result.
+
+## Windows note (original blocker)
+
+On native Windows the positive axis was entirely unmeasurable: `run_eval.py`
+polls the `claude -p` subprocess **pipe** with `select.select()`, and Windows
+`select` accepts only sockets → `WinError 10038`. WSL/Linux fixes this (Linux
+`select` works on pipe fds). Rerun under WSL with:
 
 ```sh
-PYTHONPATH=<skill-creator> python run_eval.py \
-  --eval-set <skill>.json --skill-path plugin/skills/<skill> \
-  --runs-per-query 3 --model claude-sonnet-4-6
+PYTHONPATH=<skill-creator> uv run --no-project --with pyyaml python run_eval.py \
+  --eval-set <skill>.json --skill-path plugin/skills/<skill> --runs-per-query 3
 ```
+
+(`--no-project` is required so `uv` does not try to repair the Windows-format
+`.venv` over the `/mnt/c` mount.)

From 81fbb6b167cc508ee880d3b9fb34892c56384d83 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 17:59:40 +0300
Subject: [PATCH 12/16] docs(plugin): advertise full plugin + both install
 paths in README; add marketplace description

---
 .claude-plugin/marketplace.json |  1 +
 README.md                       | 60 ++++++++++++++++++++-------------
 2 files changed, 37 insertions(+), 24 deletions(-)

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index c990dac..a2585ef 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -1,5 +1,6 @@
 {
   "name": "dbga",
+  "description": "Evidence-first debugging (Python/Go/Node over DAP) plus consolidated language skills and an architect for clean, verified code.",
   "owner": { "name": "Nir Adler" },
   "plugins": [
     { "name": "debug-agent", "source": "./plugin" }
diff --git a/README.md b/README.md
index 6a595a4..f3e18d3 100644
--- a/README.md
+++ b/README.md
@@ -143,36 +143,41 @@ they belong next to the code. Add `.debug-agent/` to your `.gitignore`:
         └── lock              # liveness marker
 ```
 
-## The `debug-agent` Skill
+## The `debug-agent` Claude Code plugin
 
-`plugin/skills/debug-agent/` contains a Claude / agent skill that teaches
-evidence-first debugging on top of `dbga`. It ships inside the `debug-agent`
-Claude Code plugin (see [`plugin/README.md`](plugin/README.md)) and includes:
+`plugin/` is a [Claude Code plugin](https://docs.claude.com/en/docs/claude-code)
+that bundles `dbga` with a full design → develop → debug → verify → clean-up
+workflow for Python, Go, and Node/TypeScript:
 
-- **`SKILL.md`** — when to trigger, decision tree, mindset
-- **`references/workflow.md`** — the evidence-first loop
-- **`references/log-monitoring.md`** — using `watch`
-- **`references/localization.md`** — `localize` and `diagnose`
-- **`references/instrumentation.md`** — reversible probes
-- **`references/debugger.md`** — driving `session`
-- **`references/vscode-collab.md`** — `--listen` + shared breakpoints
-- **`references/advanced.md`** — hang / deadlock / concurrency / wolf-fence
+- **Skills** (`/debug-agent:*`): `debug-agent` (the evidence-first `dbga` driver),
+  plus `python`, `go`, `node` development skills that route to language-specific
+  references on demand.
+- **Agents** (`/agents`): `architect` (orchestrator) and `python-expert`,
+  `go-expert`, `node-expert`.
+- **Command:** `/debug-agent:setup` — optional one-shot `dbga` installer.
 
-### Install the skill
+Full plugin docs: [`plugin/README.md`](plugin/README.md).
 
-The recommended path is [`npx skills`](https://github.com/vercel-labs/skills),
-the open agent-skills installer. It reads `SKILL.md` straight from the GitHub
-repo and drops it into `~/.claude/skills/` (or your agent host's equivalent):
+### Install — full plugin (recommended)
 
 ```sh
-# Install just this skill
-npx skills add niradler/dbga --skill debug-agent
+claude plugin marketplace add niradler/dbga   # from your shell
+/plugin install debug-agent@dbga              # inside a Claude Code session
+/debug-agent:setup            # optional: installs the dbga CLI
+```
+
+### Install — a single skill
+
+The [`skills`](https://github.com/vercel-labs/skills) CLI installs any one skill
+standalone (skills only — agents/commands come with the full plugin). Resolution
+is automatic via the repo-root marketplace manifest; no `--full-depth` needed:
 
-# Or preview what's available first
-npx skills add niradler/dbga --list
+```sh
+npx skills add niradler/dbga --skill python   # or: go | node | debug-agent
+npx skills add niradler/dbga --list           # preview what's available
 ```
 
-Manual install also works:
+Manual install of just the debugger skill also works:
 
 ```sh
 # Linux / macOS
@@ -182,9 +187,16 @@ cp -r plugin/skills/debug-agent ~/.claude/skills/
 Copy-Item -Recurse plugin/skills/debug-agent $env:USERPROFILE\.claude\skills\
 ```
 
-> Installing the full plugin (`/plugin install debug-agent@dbga`) brings this
-> skill plus the `python`/`go`/`node` skills and the agents — see
-> [`plugin/README.md`](plugin/README.md).
+### What the `debug-agent` skill covers
+
+- **`SKILL.md`** — when to trigger, decision tree, mindset
+- **`references/workflow.md`** — the evidence-first loop
+- **`references/log-monitoring.md`** — using `watch`
+- **`references/localization.md`** — `localize` and `diagnose`
+- **`references/instrumentation.md`** — reversible probes
+- **`references/debugger.md`** — driving `session`
+- **`references/vscode-collab.md`** — `--listen` + shared breakpoints
+- **`references/advanced.md`** — hang / deadlock / concurrency / wolf-fence
 
 ## Development
 

From 63cf9d001b23b6f362060dbde08d16292ab2d338 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Fri, 29 May 2026 18:11:04 +0300
Subject: [PATCH 13/16] test(plugin): add multi-language sim fixtures for
 end-to-end plugin exercise

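The fixtures are intentionally left buggy. For orientation, one possible guarded version of the Python fixture's `average` is sketched below; raising `ValueError` on empty input is an assumed policy chosen for illustration, not something this patch prescribes:

```python
from __future__ import annotations


def average(nums: list[float]) -> float:
    """Mean of nums; rejects empty input instead of dividing by zero."""
    if not nums:
        # Guard the empty case before the division that crashes the fixture.
        raise ValueError("average() requires at least one value")
    return sum(nums) / len(nums)


result = average([10, 20, 30])
print(result)
```

Returning a sentinel such as `0.0` or `None` would be an equally valid guard; which one fits depends on the caller.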
---
 tests/plugin-sim/README.md               | 25 ++++++++++++++++++++++++
 tests/plugin-sim/go/buggy.go             | 16 +++++++++++++++
 tests/plugin-sim/go/go.mod               |  3 +++
 tests/plugin-sim/node/buggy.js           | 12 ++++++++++++
 tests/plugin-sim/python/buggy_average.py | 13 ++++++++++++
 5 files changed, 69 insertions(+)
 create mode 100644 tests/plugin-sim/README.md
 create mode 100644 tests/plugin-sim/go/buggy.go
 create mode 100644 tests/plugin-sim/go/go.mod
 create mode 100644 tests/plugin-sim/node/buggy.js
 create mode 100644 tests/plugin-sim/python/buggy_average.py

diff --git a/tests/plugin-sim/README.md b/tests/plugin-sim/README.md
new file mode 100644
index 0000000..fa45da7
--- /dev/null
+++ b/tests/plugin-sim/README.md
@@ -0,0 +1,25 @@
+# Plugin simulation fixtures
+
+Known-buggy programs used to exercise the `debug-agent` plugin (skills + agents
+driving `dbga`) end-to-end in a realistic session. Not collected by pytest
+(no `test_*.py`).
+
+| Path | Bug | `dbga diagnose` should report |
+| --- | --- | --- |
+| `python/buggy_average.py` | divide by `len([])` | `ZeroDivisionError: division by zero`, deepest frame `average` line 3 |
+| `go/buggy.go` | `total / len(nums)` on empty slice | `panic: runtime error: integer divide by zero`, `main.average` line 10 |
+| `node/buggy.js` | `record.value` on a `null` element | `TypeError: Cannot read properties of null (reading 'value')`, `getValue` |
+
+All three share one bug family (averaging an empty collection, or reading a
+null element), so each fix is to guard the empty/null case before the operation.
+
+## Reproduce
+
+```sh
+uv run dbga diagnose --timeout 30 -- python tests/plugin-sim/python/buggy_average.py
+uv run dbga diagnose --lang go --cwd tests/plugin-sim/go --timeout 90 -- go run buggy.go
+uv run dbga diagnose --cwd tests/plugin-sim/node --timeout 90 -- node buggy.js
+```
+
+If `diagnose` returns `session_exists`, clear the prior run first:
+`uv run dbga session release`.
diff --git a/tests/plugin-sim/go/buggy.go b/tests/plugin-sim/go/buggy.go
new file mode 100644
index 0000000..6c5c74e
--- /dev/null
+++ b/tests/plugin-sim/go/buggy.go
@@ -0,0 +1,16 @@
+package main
+
+import "fmt"
+
+func average(nums []int) int {
+	total := 0
+	for _, n := range nums {
+		total += n
+	}
+	return total / len(nums)
+}
+
+func main() {
+	data := []int{}
+	fmt.Println(average(data))
+}
diff --git a/tests/plugin-sim/go/go.mod b/tests/plugin-sim/go/go.mod
new file mode 100644
index 0000000..a1ae82e
--- /dev/null
+++ b/tests/plugin-sim/go/go.mod
@@ -0,0 +1,3 @@
+module buggysim
+
+go 1.26
diff --git a/tests/plugin-sim/node/buggy.js b/tests/plugin-sim/node/buggy.js
new file mode 100644
index 0000000..a746cd3
--- /dev/null
+++ b/tests/plugin-sim/node/buggy.js
@@ -0,0 +1,12 @@
+function getValue(record) {
+  return record.value;
+}
+
+function main() {
+  const records = [{ value: 10 }, null];
+  for (const r of records) {
+    console.log(getValue(r));
+  }
+}
+
+main();
diff --git a/tests/plugin-sim/python/buggy_average.py b/tests/plugin-sim/python/buggy_average.py
new file mode 100644
index 0000000..c0f676e
--- /dev/null
+++ b/tests/plugin-sim/python/buggy_average.py
@@ -0,0 +1,13 @@
+def average(nums):
+    total = sum(nums)
+    return total / len(nums)
+
+
+def main():
+    datasets = [[10, 20, 30], []]
+    for ds in datasets:
+        print(average(ds))
+
+
+if __name__ == "__main__":
+    main()

From b759326c093c08b617d780b68ec36fcb2b1e6b51 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Sat, 30 May 2026 00:00:01 +0300
Subject: [PATCH 14/16] refactor(plugin): add review/audit evidence mode to
 skills + agent prompts

Dogfooding the plugin's own agents on this repo surfaced a systemic gap: all four agents default to confident source-reading on review/audit tasks (no live failure to reproduce) without labeling confidence. node-expert shipped a HIGH-severity false positive that a single parser run disproved, and its proposed fix was behaviorally identical to the existing code.

Add a 'live-failure vs static review' mode to the shared evidence-first discipline and an in-body review/audit clause to python/go/node-expert and architect: label findings RUNTIME-VERIFIED vs INSPECTION-ONLY, prove or offer a repro for anything reproducible, and separate 'breaks today' from 'latent under a future/edge runtime'. Kept in agent bodies because dispatched agents were observed skipping reference loads.
---
 plugin/agents/architect.md              |  5 +++++
 plugin/agents/go-expert.md              |  2 ++
 plugin/agents/node-expert.md            |  2 ++
 plugin/agents/python-expert.md          |  2 ++
 plugin/skills/_shared/evidence-first.md | 29 ++++++++++++++++++++++++-
 5 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/plugin/agents/architect.md b/plugin/agents/architect.md
index 61a8f43..0390b6f 100644
--- a/plugin/agents/architect.md
+++ b/plugin/agents/architect.md
@@ -62,3 +62,8 @@ need live runtime state, DO NOT guess from source. Gather evidence:
 
 Validate against real use flows and verify the fix at the original fault before
 declaring it done.
+
+**On a review/audit task** (no live failure to reproduce): source reasoning is
+fine, but label each finding `RUNTIME-VERIFIED` vs `INSPECTION-ONLY`, prove or
+offer a repro for anything reproducible, and separate "breaks today" from
+"latent under a future/edge runtime." (`_shared/evidence-first.md`)
diff --git a/plugin/agents/go-expert.md b/plugin/agents/go-expert.md
index fb4adc0..367eb31 100644
--- a/plugin/agents/go-expert.md
+++ b/plugin/agents/go-expert.md
@@ -53,6 +53,8 @@ You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — a
 
 Validate against real use flows and verify the fix at the original fault before declaring it done.
 
+**On a review/audit task** (no live failure to reproduce): source reasoning is fine, but label each finding `RUNTIME-VERIFIED` vs `INSPECTION-ONLY`, prove or offer a repro for anything reproducible, and separate "breaks today" from "latent under a future/edge runtime." (`_shared/evidence-first.md`)
+
 For Go, pass `--lang go` and `--cwd <module dir>` (the dir with `go.mod`):
 
 - `dbga diagnose --lang go --cwd <module dir> -- go run .`
diff --git a/plugin/agents/node-expert.md b/plugin/agents/node-expert.md
index 54efb3a..0f91955 100644
--- a/plugin/agents/node-expert.md
+++ b/plugin/agents/node-expert.md
@@ -41,6 +41,8 @@ You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — a
 
 Validate against real use flows and verify the fix at the original fault before declaring it done.
 
+**On a review/audit task** (no live failure to reproduce): source reasoning is fine, but label each finding `RUNTIME-VERIFIED` vs `INSPECTION-ONLY`, prove or offer a repro for anything reproducible, and separate "breaks today" from "latent under a future/edge runtime." (`_shared/evidence-first.md`)
+
 For Node, the forms match the `debug-agent` SKILL.md:
 
 ```powershell
diff --git a/plugin/agents/python-expert.md b/plugin/agents/python-expert.md
index 0ba4d97..dc00eec 100644
--- a/plugin/agents/python-expert.md
+++ b/plugin/agents/python-expert.md
@@ -39,4 +39,6 @@ You have `dbga` — an evidence-first debugger for Python/Go/Node over DAP — a
 
 Validate against real use flows and verify the fix at the original fault before declaring it done.
 
+**On a review/audit task** (no live failure to reproduce): source reasoning is fine, but label each finding `RUNTIME-VERIFIED` vs `INSPECTION-ONLY`, prove or offer a repro for anything reproducible, and separate "breaks today" from "latent under a future/edge runtime." (`_shared/evidence-first.md`)
+
 Python-specific `dbga` recipes (script-path sessions, async breakpoints, read-only eval, reversible instrument probes) are in the `python` skill's `references/debugging.md`.
diff --git a/plugin/skills/_shared/evidence-first.md b/plugin/skills/_shared/evidence-first.md
index 5875c95..46d02a7 100644
--- a/plugin/skills/_shared/evidence-first.md
+++ b/plugin/skills/_shared/evidence-first.md
@@ -21,6 +21,29 @@ standard **Evidence-First Debugging** block embedded across the plugin.
 The loop: design → implement → run the real flow → debug with evidence →
 simplify → verify at the fault.
 
+## Two modes: live-failure vs. static review
+
+The discipline above assumes a *live failure to reproduce*. Match the mode to
+the task:
+
+- **Live failure** (crash, hang, wrong output, flaky test): reproduce it with
+  `dbga` first. Source reasoning is a hypothesis until a run confirms it. Verify
+  the fix at the original fault.
+- **Static review / audit / design assessment** (no failing run to point at —
+  "review this for bugs", "is this design sound"): source reasoning is
+  legitimate, but it is *unverified*. So:
+  1. **Label every finding** `RUNTIME-VERIFIED` (you reproduced/observed it) or
+     `INSPECTION-ONLY` (read from source). Never imply verification you didn't do.
+  2. **Prove or offer the repro.** If a finding can be shown with a failing test
+     or a `dbga` run, do it — or explicitly offer it. A bug you could have run
+     but didn't is INSPECTION-ONLY at best.
+  3. **Separate "breaks today" from "latent."** Rank by what fails under the
+     runtime in use now vs. only under a future/edge runtime (e.g. free-threaded
+     CPython). Don't inflate severity for the theoretical.
+
+A confident, well-formatted INSPECTION-ONLY finding is the most dangerous output
+you produce — it reads as fact. The label and the repro offer keep it honest.
+
 ## Standard Evidence-First Debugging block
 
 Embed this (verbatim or trimmed) in agents and skill bodies:
@@ -38,7 +61,11 @@ or you need live runtime state, DO NOT guess from source. Gather evidence:
 - Invoke the `debug-agent` skill for the full evidence-first loop.
 
 Validate against real use flows and verify the fix at the original fault
-before declaring it done.
+before declaring it done. On a **review/audit** task (no live failure to
+reproduce), source reasoning is fine but unverified: label each finding
+RUNTIME-VERIFIED vs INSPECTION-ONLY, prove or offer a repro for anything
+reproducible, and separate "breaks today" from "latent under a future/edge
+runtime."
 ```
 
 ## Mindset (cross-language)

From db7fa876c05a019b2df23ad3c83a2a740303176a Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Sat, 30 May 2026 01:23:59 +0300
Subject: [PATCH 15/16] fix(core): replay launch exception filters to child DAP
 session

Child-delegating adapters (vscode-js-debug) run the program in a child
session, so exception filters set on the parent at launch never bind.
Breakpoints were already stashed and replayed on the child; do the same
for exception filters. Without this, `--break-on-exception` was silently
dropped for Node. Stash `_exception_filters` at launch and replay them in
`_on_start_debugging` alongside breakpoints.
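
A minimal sketch of the stash-and-replay pattern follows. `FakeClient` and `Session` are hypothetical stand-ins, not the real `DapClient` / session classes; only the `set_exception_breakpoints` and `_exception_filters` names are taken from the diff, and `"uncaught"` is just an example filter value:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeClient:
    """Hypothetical stand-in for a DAP connection; records the filters it is sent."""

    name: str
    filters: list[str] = field(default_factory=list)

    def set_exception_breakpoints(self, filters: list[str]) -> None:
        self.filters = list(filters)


class Session:
    """Illustrative only: stashes launch-time filters and replays them on a child."""

    def __init__(self) -> None:
        self._exception_filters: list[str] = []

    def start(self, parent: FakeClient, exception_filters: list[str] | None = None) -> None:
        # Stash a copy so a later child handshake can reuse the same filters.
        self._exception_filters = list(exception_filters or [])
        parent.set_exception_breakpoints(self._exception_filters)

    def on_start_debugging(self, child: FakeClient) -> None:
        # Replay on the child, where the program actually runs.
        child.set_exception_breakpoints(self._exception_filters)


parent, child = FakeClient("parent"), FakeClient("child")
session = Session()
session.start(parent, ["uncaught"])
session.on_start_debugging(child)
print(child.filters)
```

Before this change the child handshake sent an empty filter list, which is the case the sketch's `on_start_debugging` avoids.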
---
 src/debug_agent/core/dap_session.py | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/debug_agent/core/dap_session.py b/src/debug_agent/core/dap_session.py
index 6139c05..c5dca41 100644
--- a/src/debug_agent/core/dap_session.py
+++ b/src/debug_agent/core/dap_session.py
@@ -112,11 +112,13 @@ def __init__(
         self._clients_lock = threading.Lock()
         self._child_clients: list[DapClient] = []
         self._active_client: DapClient | None = None
-        # Breakpoints requested at launch. For child-delegating adapters
-        # (vscode-js-debug) these can't be set on the parent — the program
-        # runs in a child session — so we stash them here and replay them on
-        # the child during its handshake in ``_on_start_debugging``.
+        # Breakpoints and exception filters requested at launch. For
+        # child-delegating adapters (vscode-js-debug) these can't be set on the
+        # parent — the program runs in a child session — so we stash them here
+        # and replay them on the child during its handshake in
+        # ``_on_start_debugging``.
         self._launch_breakpoints: list[Breakpoint] = []
+        self._exception_filters: list[str] = []
         self._adapter_proc: subprocess.Popen[bytes] | None = None
         self._current_thread_id: int | None = None
         self._output_buffer: list[str] = []
@@ -210,9 +212,11 @@ def start(
 
             client.wait_for_event("initialized", timeout=10.0)
 
-            # Stash launch breakpoints so child-delegating adapters can replay
-            # them on the child connection (see ``_on_start_debugging``).
+            # Stash launch breakpoints and exception filters so child-delegating
+            # adapters can replay them on the child connection (see
+            # ``_on_start_debugging``).
             self._launch_breakpoints = list(breakpoints or [])
+            self._exception_filters = list(exception_filters or [])
             if not self._adapter.delegates_launch_to_child:
                 # Single-connection adapter: set breakpoints on the one client.
                 # DAP setBreakpoints replaces per source, so group by file.
@@ -222,7 +226,7 @@ def start(
                 for file_path, bps in by_file.items():
                     self.set_breakpoints(file_path, bps)
 
-            client.set_exception_breakpoints(exception_filters or [])
+            client.set_exception_breakpoints(self._exception_filters)
             client.configuration_done()
             client.wait_response(launch_seq, "launch", timeout=10.0)
             self._state = "running"
@@ -601,7 +605,10 @@ def _on_start_debugging(self, args: dict[str, Any]) -> dict[str, Any]:
                 dap_bps.append(entry)
             with contextlib.suppress(DapError):
                 child.set_breakpoints(file_path, dap_bps)
-        child.set_exception_breakpoints([])
+        # Replay launch-time exception filters on the CHILD for the same reason
+        # as breakpoints — filters set on the parent at launch never bind to the
+        # program, which runs in the child session.
+        child.set_exception_breakpoints(self._exception_filters)
         child.configuration_done()
         child.wait_response(child_seq, request_type, timeout=15.0)
 

From b073e2d0fbaf74612202ba9b7081d7f4857bc3b0 Mon Sep 17 00:00:00 2001
From: Nir Adler <me@niradler.com>
Date: Sat, 30 May 2026 01:24:05 +0300
Subject: [PATCH 16/16] docs(plugin): front-load evidence-first stance for
 node-expert; README usage guidance

node-expert: make the top-level evidence stance unconditional (mirrors
python-expert rule #3) so the RUNTIME-VERIFIED/INSPECTION-ONLY labeling
discipline applies to review/audit tasks, not just crash-fix flows.

README: add usage guidance (architect delegation cliff, review-vs-debug
expectations, opus override for hard single-language tasks).
---
 plugin/README.md             | 11 ++++++++++-
 plugin/agents/node-expert.md |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/plugin/README.md b/plugin/README.md
index 1de171b..93a32c8 100644
--- a/plugin/README.md
+++ b/plugin/README.md
@@ -69,9 +69,18 @@ the `skills` CLI at `plugin/skills/`, so no `--full-depth` flag is needed.
 - **Develop in one language:** the matching skill (`python`/`go`/`node`) loads
   language-specific references on demand.
 - **Orchestrate:** run `claude --agent debug-agent:architect` to let the
-  architect gather evidence and delegate to the language experts. See
+  architect gather evidence and delegate to the language experts. Delegation
+  works **only when the architect is the main agent**; when dispatched as a
+  subagent, it cannot spawn experts and works solo. See
   [`references/agent-teams.md`](references/agent-teams.md) for the experimental
   parallel-debugging mode.
+- **Hard single-language task:** the experts default to **sonnet**; for a gnarly
+  type-level, concurrency, or panic-trace problem, request an **opus** override
+  at dispatch.
+- **Review vs. debug:** on a live failure, the agents reproduce it with `dbga`
+  before proposing a fix. On a review/audit task (no failing run), they reason
+  from source but label each finding `RUNTIME-VERIFIED` or `INSPECTION-ONLY`;
+  treat an `INSPECTION-ONLY` finding as a hypothesis until you've run it.
 
 ## License
 
diff --git a/plugin/agents/node-expert.md b/plugin/agents/node-expert.md
index 0f91955..9c2434f 100644
--- a/plugin/agents/node-expert.md
+++ b/plugin/agents/node-expert.md
@@ -10,7 +10,7 @@ You are the Node/TypeScript expert. You write strict-typed, clean, verified Node
 ## Operating stance
 
 - **TypeScript-first, `strict: true`.** No `any` without a justified reason; model the domain so illegal states are unrepresentable; let inference carry non-boundary types.
-- **Evidence before fixes.** On a crash/hang/wrong output, gather runtime evidence with `dbga` before changing code (see below).
+- **Evidence first: validate against a real run, not source-reading.** On a crash/hang/wrong output, gather runtime evidence with `dbga` before changing code. On a review/audit (no live failure), this still applies: label each finding `RUNTIME-VERIFIED` or `INSPECTION-ONLY` and treat an inspection-only finding as a hypothesis (see below).
 - **Run a real flow before declaring done** — `tsc --noEmit`, the test suite, or the actual command.
 - **Clean, self-explaining code; no comments unless asked.**