niradler · niradler · May 29, 2026 · May 29, 2026 · May 29, 2026 · May 29, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -0,0 +1,8 @@
+{
+  "name": "dbga",
+  "description": "Evidence-first debugging (Python/Go/Node over DAP) plus consolidated language skills and an architect for clean, verified code.",
+  "owner": { "name": "Nir Adler" },
+  "plugins": [
+    { "name": "debug-agent", "source": "./plugin" }
+  ]
+}
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+
+- Launch-time **exception filters are now replayed to the child session** for
+  child-delegating adapters (vscode-js-debug). They were only set on the parent
+  connection, so `--break-on-exception` was silently dropped for Node.
+
+### Changed
+
+- **`debug-agent` skill relocated** from `skills/debug-agent/` to
+  `plugin/skills/debug-agent/` as part of packaging the `debug-agent` Claude
+  Code plugin. `npx skills add niradler/dbga --skill debug-agent` still resolves
+  it (via the repo-root `.claude-plugin/marketplace.json`); update any manual
+  copy path accordingly.
+
 ## [0.1.1] — 2026-05-29
 
 Multi-language release. The debugger is no longer Python-only: the DAP
@@ -80,7 +94,7 @@ daemon, with auto-context returned on every stop.
   (truncated to 200-char strings / 5-item collection previews), full stack
   (capped at 20 frames), recent output, warnings. No follow-up calls
   needed. Configurable via `--context-lines`.
-- **`debug-agent` skill** (`skills/debug-agent/`) — Claude/agent
+- **`debug-agent` skill** (`plugin/skills/debug-agent/`) — Claude/agent
   skill that drives `dbga` with evidence-first workflow, log
   monitoring, localization, instrumentation, debugger, VS Code collab, and
   advanced (hang/deadlock/wolf-fence/concurrency) reference docs.

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 `dbga` (distribution) / `debug_agent` (import name) — an evidence-first **Python** debugger CLI built on top of `debugpy`/DAP. The CLI surface is stateless; a per-session background daemon owns the live DAP connection. Every stop returns auto-contextualized JSON (location, source window, locals, full stack, recent output, warnings) so an AI agent can drive a real debugger one command at a time.
 
-Status: alpha. Python-only by design today — `debugpy` and `"type": "python"` are hardcoded in the launch path.
+Status: alpha. Multi-language over DAP via the `adapters/` registry: **Python** (debugpy, the most-validated path), **Go** (Delve), and **Node/TypeScript** (vscode-js-debug). Python is the richest surface — `instrument` source probes are Python-centric and the Node multi-process lifecycle is not yet validated (see the `debug-agent` skill's "Honest Limits"). Adding a language means subclassing `adapters.base.Adapter` and registering it.
 
 ## Commands
 
@@ -72,6 +72,6 @@ Every CLI command returns a single JSON object on stdout via `core/format.emit_p
 - **Tear-down is best-effort and idempotent.** `DapSession.release()` is called from `finally`. Tree-killing the adapter is the unconditional fallback after a graceful `disconnect` request.
 - **The daemon idle-timeout watchdog** (default 1800s) exists so a forgotten session can't linger forever — don't disable it without thinking about cleanup.
 
-## The skill (`skills/debug-agent/`)
+## The skill (`plugin/skills/debug-agent/`)
 
-A Claude/agent skill ships in-repo at `skills/debug-agent/`. It documents the evidence-first workflow that the CLI is designed for (`SKILL.md` + `references/*.md`). If you change CLI command shapes or JSON schemas, audit the skill — it has concrete command examples that go stale silently.
+A Claude/agent skill ships in-repo at `plugin/skills/debug-agent/` (part of the `debug-agent` Claude Code plugin under `plugin/`). It documents the evidence-first workflow that the CLI is designed for (`SKILL.md` + `references/*.md`). If you change CLI command shapes or JSON schemas, audit the skill — it has concrete command examples that go stale silently.
diff --git a/README.md b/README.md
@@ -143,44 +143,61 @@ they belong next to the code. Add `.debug-agent/` to your `.gitignore`:
         └── lock              # liveness marker
 ```
 
-## The `debug-agent` Skill
+## The `debug-agent` Claude Code plugin
 
-`skills/debug-agent/` contains a Claude / agent skill that teaches
-evidence-first debugging on top of `dbga`. It includes:
+`plugin/` is a [Claude Code plugin](https://docs.claude.com/en/docs/claude-code)
+that bundles `dbga` with a full design → develop → debug → verify → clean-up
+workflow for Python, Go, and Node/TypeScript:
 
-- **`SKILL.md`** — when to trigger, decision tree, mindset
-- **`references/workflow.md`** — the evidence-first loop
-- **`references/log-monitoring.md`** — using `watch`
-- **`references/localization.md`** — `localize` and `diagnose`
-- **`references/instrumentation.md`** — reversible probes
-- **`references/debugger.md`** — driving `session`
-- **`references/vscode-collab.md`** — `--listen` + shared breakpoints
-- **`references/advanced.md`** — hang / deadlock / concurrency / wolf-fence
+- **Skills** (`/debug-agent:*`): `debug-agent` (the evidence-first `dbga` driver),
+  plus `python`, `go`, `node` development skills that route to language-specific
+  references on demand.
+- **Agents** (`/agents`): `architect` (orchestrator) and `python-expert`,
+  `go-expert`, `node-expert`.
+- **Command:** `/debug-agent:setup` — optional one-shot `dbga` installer.
 
-### Install the skill
+Full plugin docs: [`plugin/README.md`](plugin/README.md).
 
-The recommended path is [`npx skills`](https://github.com/vercel-labs/skills),
-the open agent-skills installer. It reads `SKILL.md` straight from the GitHub
-repo and drops it into `~/.claude/skills/` (or your agent host's equivalent):
+### Install — full plugin (recommended)
 
 ```sh
-# Install just this skill
-npx skills add niradler/dbga --skill debug-agent
+claude plugin marketplace add niradler/dbga
+/plugin install debug-agent@dbga
+/debug-agent:setup            # optional: installs the dbga CLI
+```
+
+### Install — a single skill
+
+The [`skills`](https://github.com/vercel-labs/skills) CLI installs any one skill
+standalone (skills only — agents/commands come with the full plugin). Resolution
+is automatic via the repo-root marketplace manifest; no `--full-depth` needed:
 
-# Or preview what's available first
-npx skills add niradler/dbga --list
+```sh
+npx skills add niradler/dbga --skill python   # or: go | node | debug-agent
+npx skills add niradler/dbga --list           # preview what's available
 ```
 
-Manual install also works:
+Manual install of just the debugger skill also works:
 
 ```sh
 # Linux / macOS
-cp -r skills/debug-agent ~/.claude/skills/
+cp -r plugin/skills/debug-agent ~/.claude/skills/
 
 # Windows PowerShell
-Copy-Item -Recurse skills/debug-agent $env:USERPROFILE\.claude\skills\
+Copy-Item -Recurse plugin/skills/debug-agent $env:USERPROFILE\.claude\skills\
 ```
 
+### What the `debug-agent` skill covers
+
+- **`SKILL.md`** — when to trigger, decision tree, mindset
+- **`references/workflow.md`** — the evidence-first loop
+- **`references/log-monitoring.md`** — using `watch`
+- **`references/localization.md`** — `localize` and `diagnose`
+- **`references/instrumentation.md`** — reversible probes
+- **`references/debugger.md`** — driving `session`
+- **`references/vscode-collab.md`** — `--listen` + shared breakpoints
+- **`references/advanced.md`** — hang / deadlock / concurrency / wolf-fence
+
 ## Development
 
 ```sh

diff --git a/docs/superpowers/evals/RESULTS.md b/docs/superpowers/evals/RESULTS.md
@@ -0,0 +1,58 @@
+# Trigger-separation eval — results (2026-05-29)
+
+Lean dev-aid eval per the plugin spec (a goal, not a ship gate). Harness:
+skill-creator `scripts/run_eval.py`, which installs a skill's `description` as a
+temp command and runs `claude -p <query>` to see whether the model invokes it.
+
+Query pool: `trigger-queries.json` — 16 queries, 4 per skill intent
+(python / go / node / debug-agent). For each skill the same pool is relabeled
+`should_trigger = (intent == skill)`, so the other 12 act as cross-skill
+near-miss negatives.
+
+## Results (WSL/Linux, skills CLI harness, runs-per-query 1–3)
+
+| Skill | Passed | Negatives (no mis-trigger) | Positives (auto-trigger ≥0.5) |
+| --- | --- | --- | --- |
+| debug-agent | 12/16 | 12/12 ✅ | ~1/4 |
+| python | 12/16 | 12/12 ✅ | ~0/4 |
+| go | 13/16 | 12/12 ✅ | ~1/4 |
+| node | 12/16 | 12/12 ✅ | ~0/4 |
+
+- **Cross-skill separation (the property that matters for a 4-skill plugin):
+  excellent and uniform.** Every skill stays quiet on the other three skills'
+  intents (12/12 negatives each). No mis-trigger observed anywhere.
+- **Positive auto-trigger rate is uniformly low — a harness ceiling, not a
+  prompt defect.** Discriminating test: re-running `debug-agent` with a
+  deliberately punchy, imperative description ("Use this skill whenever…",
+  explicit trigger keywords, "Always use before guessing") produced **no lift**
+  (still ~1/4). A description-quality problem would vary by skill and respond to
+  a stronger trigger; instead the rate is flat across all skills and unresponsive
+  to description strength. The cause is methodology: `run_eval.py` injects each
+  skill as a `.claude/commands/` entry and measures whether one-shot `claude -p`
+  auto-invokes it — and one-shot non-interactive runs tend to just do the task
+  rather than auto-invoke a command. Real plugin-installed skills trigger via a
+  different path.
+
+## Why `run_loop` auto-optimization was not run
+
+`run_loop` maximizes positive trigger rate. The discriminating test shows that
+rate is capped by the harness, not the description, so optimization would chase
+a biased proxy and risk overfitting descriptions that are already triggers-only,
+keyword-rich, independently reviewed, and behaviorally validated (see the
+buggy-script baseline-vs-with-skill test). Decision: keep the reviewed
+descriptions; rely on the clean separation result.
+
+## Windows note (original blocker)
+
+On native Windows the positive axis was entirely unmeasurable: `run_eval.py`
+polls the `claude -p` subprocess **pipe** with `select.select()`, and Windows
+`select` accepts only sockets → `WinError 10038`. WSL/Linux fixes this (Linux
+`select` works on pipe fds). Rerun under WSL with:
+
+```sh
+PYTHONPATH=<skill-creator> uv run --no-project --with pyyaml python run_eval.py \
+  --eval-set <skill>.json --skill-path plugin/skills/<skill> --runs-per-query 3
+```
+
+(`--no-project` is required so `uv` does not try to repair the Windows-format
+`.venv` over the `/mnt/c` mount.)
diff --git a/docs/superpowers/evals/trigger-queries.json b/docs/superpowers/evals/trigger-queries.json
@@ -0,0 +1,21 @@
+[
+  { "query": "Add type hints to this module and get it passing mypy --strict", "intent": "python" },
+  { "query": "Write a Pythonic async FastAPI endpoint that fetches rows from Postgres", "intent": "python" },
+  { "query": "My .py script raises AttributeError on a None value, refactor it cleanly", "intent": "python" },
+  { "query": "Make this asyncio code stop blocking the event loop", "intent": "python" },
+
+  { "query": "I have a data race detected by go test -race in my worker pool", "intent": "go" },
+  { "query": "Wrap these errors with %w and check them with errors.Is in my Go service", "intent": "go" },
+  { "query": "My goroutines deadlock: all goroutines are asleep - deadlock", "intent": "go" },
+  { "query": "Use functional options for this Go struct constructor", "intent": "go" },
+
+  { "query": "Fix this TypeScript TS2345 error, the argument is not assignable", "intent": "node" },
+  { "query": "My Express handler has an unhandled promise rejection, make it robust", "intent": "node" },
+  { "query": "Write an advanced conditional type that unwraps a Promise in TypeScript", "intent": "node" },
+  { "query": "Cannot read properties of undefined in my Node EventEmitter code", "intent": "node" },
+
+  { "query": "My program crashes with a traceback and I want to triage it to the failing frame", "intent": "debug-agent" },
+  { "query": "Pause my script at a breakpoint and inspect the live value of a variable", "intent": "debug-agent" },
+  { "query": "The process hangs and I need to see live runtime state, not guess from source", "intent": "debug-agent" },
+  { "query": "Set a breakpoint and step to find where this value first goes wrong", "intent": "debug-agent" }
+]