Merged
17 commits
14e3eb3
docs: add debug-agent plugin spec and implementation plan
niradler May 29, 2026
d1b5523
feat(plugin): scaffold debug-agent plugin manifests, README, LICENSE,…
niradler May 29, 2026
f550b4f
refactor(plugin): relocate debug-agent skill into plugin/skills, upda…
niradler May 29, 2026
fe05e30
feat(plugin): add language-invariant _shared references (clean-code, …
niradler May 29, 2026
d1cf307
feat(plugin): add architect orchestrator agent (opus)
niradler May 29, 2026
066460c
feat(plugin): add setup command, agent-teams doc; fix CLAUDE.md multi…
niradler May 29, 2026
b2e892a
feat(plugin): add python, go, node skills + language expert agents
niradler May 29, 2026
bc12fe2
fix(plugin): correct Go dbga session shapes, dep-hygiene framing, Pyt…
niradler May 29, 2026
724a788
test(plugin): add trigger-separation eval set + results (cross-skill …
niradler May 29, 2026
bdf4dfd
chore: stop tracking .claude/settings.local.json (already gitignored)
niradler May 29, 2026
6f2f468
test(plugin): record WSL trigger-eval results (separation clean; posi…
niradler May 29, 2026
81fbb6b
docs(plugin): advertise full plugin + both install paths in README; a…
niradler May 29, 2026
63cf9d0
test(plugin): add multi-language sim fixtures for end-to-end plugin e…
niradler May 29, 2026
b759326
refactor(plugin): add review/audit evidence mode to skills + agent pr…
niradler May 29, 2026
db7fa87
fix(core): replay launch exception filters to child DAP session
niradler May 29, 2026
b073e2d
docs(plugin): front-load evidence-first stance for node-expert; READM…
niradler May 29, 2026
1093caf
Merge remote-tracking branch 'origin/main' into feat/claude-plugin
niradler May 29, 2026
8 changes: 8 additions & 0 deletions .claude-plugin/marketplace.json
@@ -0,0 +1,8 @@
{
  "name": "dbga",
  "description": "Evidence-first debugging (Python/Go/Node over DAP) plus consolidated language skills and an architect for clean, verified code.",
  "owner": { "name": "Nir Adler" },
  "plugins": [
    { "name": "debug-agent", "source": "./plugin" }
  ]
}
7 changes: 0 additions & 7 deletions .claude/settings.local.json

This file was deleted.

16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed

- Launch-time **exception filters are now replayed to the child session** for
child-delegating adapters (vscode-js-debug). They were only set on the parent
connection, so `--break-on-exception` was silently dropped for Node.
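
A minimal sketch of the replay described in the entry above. The `setExceptionBreakpoints` request and its `filters` argument come from the Debug Adapter Protocol; everything else (`DapConnection`, `LaunchState`, the function names, and the `"uncaught"` filter ID used as sample data) is hypothetical and not taken from the dbga source.

```python
from dataclasses import dataclass, field


@dataclass
class DapConnection:
    """Hypothetical stand-in for a DAP connection; records requests instead of sending them."""
    name: str
    sent: list = field(default_factory=list)

    def request(self, command: str, arguments: dict) -> None:
        self.sent.append((command, arguments))


@dataclass
class LaunchState:
    """Launch-time settings that must outlive the parent connection (assumed shape)."""
    exception_filters: list = field(default_factory=list)


def set_exception_filters(conn: DapConnection, state: LaunchState, filters: list) -> None:
    # Remember the filters so they can be replayed later, then apply them now.
    state.exception_filters = list(filters)
    conn.request("setExceptionBreakpoints", {"filters": state.exception_filters})


def on_child_session(child: DapConnection, state: LaunchState) -> None:
    # A child-delegating adapter runs the debuggee in a child session, so
    # filters set on the parent have to be sent again to the child.
    if state.exception_filters:
        child.request("setExceptionBreakpoints", {"filters": state.exception_filters})


parent = DapConnection("parent")
child = DapConnection("child")
state = LaunchState()
set_exception_filters(parent, state, ["uncaught"])
on_child_session(child, state)
print(child.sent)
```

Storing the filters on shared launch state, rather than on the parent connection, is what lets the child receive them whenever it appears.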

### Changed

- **`debug-agent` skill relocated** from `skills/debug-agent/` to
`plugin/skills/debug-agent/` as part of packaging the `debug-agent` Claude
Code plugin. `npx skills add niradler/dbga --skill debug-agent` still resolves
it (via the repo-root `.claude-plugin/marketplace.json`); update any manual
copy path accordingly.

## [0.1.1] — 2026-05-29

Multi-language release. The debugger is no longer Python-only: the DAP
@@ -80,7 +94,7 @@ daemon, with auto-context returned on every stop.
(truncated to 200-char strings / 5-item collection previews), full stack
(capped at 20 frames), recent output, warnings. No follow-up calls
needed. Configurable via `--context-lines`.
-- **`debug-agent` skill** (`skills/debug-agent/`) — Claude/agent
+- **`debug-agent` skill** (`plugin/skills/debug-agent/`) — Claude/agent
skill that drives `dbga` with evidence-first workflow, log
monitoring, localization, instrumentation, debugger, VS Code collab, and
advanced (hang/deadlock/wolf-fence/concurrency) reference docs.
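
The auto-context caps quoted in this hunk (200-character strings, 5-item collection previews, 20 stack frames) can be illustrated with a small sketch. The three limits come from the changelog text; the helper names, the `"..."` suffix, and the preview shape are assumptions for illustration and may not match what `dbga` emits.

```python
MAX_STR_CHARS = 200      # string cap quoted in the changelog
MAX_PREVIEW_ITEMS = 5    # collection preview cap quoted in the changelog
MAX_STACK_FRAMES = 20    # stack cap quoted in the changelog


def preview_value(value):
    """Return a bounded preview of a local variable's value (hypothetical shape)."""
    if isinstance(value, str):
        if len(value) > MAX_STR_CHARS:
            return value[:MAX_STR_CHARS] + "..."
        return value
    if isinstance(value, (list, tuple)):
        shown = [preview_value(item) for item in value[:MAX_PREVIEW_ITEMS]]
        return {"preview": shown, "total": len(value)}
    return value


def cap_stack(frames):
    """Keep only the innermost frames up to the cap."""
    return frames[:MAX_STACK_FRAMES]


print(preview_value(list(range(8))))
print(len(preview_value("x" * 500)))
print(len(cap_stack([f"frame{i}" for i in range(50)])))
```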
6 changes: 3 additions & 3 deletions CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

`dbga` (distribution) / `debug_agent` (import name) — an evidence-first **Python** debugger CLI built on top of `debugpy`/DAP. The CLI surface is stateless; a per-session background daemon owns the live DAP connection. Every stop returns auto-contextualized JSON (location, source window, locals, full stack, recent output, warnings) so an AI agent can drive a real debugger one command at a time.

-Status: alpha. Python-only by design today — `debugpy` and `"type": "python"` are hardcoded in the launch path.
+Status: alpha. Multi-language over DAP via the `adapters/` registry: **Python** (debugpy, the most-validated path), **Go** (Delve), and **Node/TypeScript** (vscode-js-debug). Python is the richest surface — `instrument` source probes are Python-centric and the Node multi-process lifecycle is not yet validated (see the `debug-agent` skill's "Honest Limits"). Adding a language means subclassing `adapters.base.Adapter` and registering it.
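
The "subclass and register" step mentioned in the new status line might look roughly like the sketch below. The real `adapters.base.Adapter` interface is not shown in this diff, so the base class, the `launch_config` method, the `register` decorator, and the `"example"` language are all hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Stand-in for adapters.base.Adapter; the real interface may differ."""

    language: str

    @abstractmethod
    def launch_config(self, program: str) -> dict:
        """Build the DAP launch arguments for a program."""


# Hypothetical registry keyed by language name.
REGISTRY: dict = {}


def register(cls):
    """Class decorator that adds an adapter to the registry."""
    REGISTRY[cls.language] = cls
    return cls


@register
class ExampleAdapter(Adapter):
    language = "example"  # placeholder language, not a real dbga adapter

    def launch_config(self, program: str) -> dict:
        return {"type": self.language, "request": "launch", "program": program}


adapter = REGISTRY["example"]()
print(adapter.launch_config("main.ex"))
```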

## Commands

@@ -72,6 +72,6 @@ Every CLI command returns a single JSON object on stdout via `core/format.emit_p
- **Tear-down is best-effort and idempotent.** `DapSession.release()` is called from `finally`. Tree-killing the adapter is the unconditional fallback after a graceful `disconnect` request.
- **The daemon idle-timeout watchdog** (default 1800s) exists so a forgotten session can't linger forever — don't disable it without thinking about cleanup.
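
The tear-down rule above (graceful `disconnect` first, tree-kill as the unconditional fallback, safe to call twice) can be sketched as follows. The `Session` class and its injected callables are hypothetical; this is not the actual `DapSession.release()` implementation.

```python
class Session:
    """Hypothetical session wrapper illustrating best-effort, idempotent tear-down."""

    def __init__(self, disconnect, kill_tree):
        self._disconnect = disconnect
        self._kill_tree = kill_tree
        self._released = False

    def release(self):
        if self._released:
            return  # idempotent: a second call does nothing
        self._released = True
        try:
            self._disconnect()  # graceful path
        except Exception:
            pass  # best-effort: a failed disconnect must not block cleanup
        finally:
            self._kill_tree()  # unconditional fallback


calls = []


def failing_disconnect():
    calls.append("disconnect")
    raise ConnectionError("adapter already gone")


session = Session(failing_disconnect, lambda: calls.append("kill_tree"))
try:
    pass  # debugging work would happen here
finally:
    session.release()
session.release()
print(calls)
```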

-## The skill (`skills/debug-agent/`)
+## The skill (`plugin/skills/debug-agent/`)

-A Claude/agent skill ships in-repo at `skills/debug-agent/`. It documents the evidence-first workflow that the CLI is designed for (`SKILL.md` + `references/*.md`). If you change CLI command shapes or JSON schemas, audit the skill — it has concrete command examples that go stale silently.
+A Claude/agent skill ships in-repo at `plugin/skills/debug-agent/` (part of the `debug-agent` Claude Code plugin under `plugin/`). It documents the evidence-first workflow that the CLI is designed for (`SKILL.md` + `references/*.md`). If you change CLI command shapes or JSON schemas, audit the skill — it has concrete command examples that go stale silently.
61 changes: 39 additions & 22 deletions README.md
@@ -143,44 +143,61 @@ they belong next to the code. Add `.debug-agent/` to your `.gitignore`:
└── lock # liveness marker
```

-## The `debug-agent` Skill
+## The `debug-agent` Claude Code plugin

-`skills/debug-agent/` contains a Claude / agent skill that teaches
-evidence-first debugging on top of `dbga`. It includes:
+`plugin/` is a [Claude Code plugin](https://docs.claude.com/en/docs/claude-code)
+that bundles `dbga` with a full design → develop → debug → verify → clean-up
+workflow for Python, Go, and Node/TypeScript:

-- **`SKILL.md`** — when to trigger, decision tree, mindset
-- **`references/workflow.md`** — the evidence-first loop
-- **`references/log-monitoring.md`** — using `watch`
-- **`references/localization.md`** — `localize` and `diagnose`
-- **`references/instrumentation.md`** — reversible probes
-- **`references/debugger.md`** — driving `session`
-- **`references/vscode-collab.md`** — `--listen` + shared breakpoints
-- **`references/advanced.md`** — hang / deadlock / concurrency / wolf-fence
+- **Skills** (`/debug-agent:*`): `debug-agent` (the evidence-first `dbga` driver),
+plus `python`, `go`, `node` development skills that route to language-specific
+references on demand.
+- **Agents** (`/agents`): `architect` (orchestrator) and `python-expert`,
+`go-expert`, `node-expert`.
+- **Command:** `/debug-agent:setup` — optional one-shot `dbga` installer.

-### Install the skill
+Full plugin docs: [`plugin/README.md`](plugin/README.md).

-The recommended path is [`npx skills`](https://github.com/vercel-labs/skills),
-the open agent-skills installer. It reads `SKILL.md` straight from the GitHub
-repo and drops it into `~/.claude/skills/` (or your agent host's equivalent):
+### Install — full plugin (recommended)

```sh
-# Install just this skill
-npx skills add niradler/dbga --skill debug-agent
+claude plugin marketplace add niradler/dbga
+/plugin install debug-agent@dbga
+/debug-agent:setup # optional: installs the dbga CLI
+```
+
+### Install — a single skill
+
+The [`skills`](https://github.com/vercel-labs/skills) CLI installs any one skill
+standalone (skills only — agents/commands come with the full plugin). Resolution
+is automatic via the repo-root marketplace manifest; no `--full-depth` needed:

-# Or preview what's available first
-npx skills add niradler/dbga --list
+```sh
+npx skills add niradler/dbga --skill python # or: go | node | debug-agent
+npx skills add niradler/dbga --list # preview what's available
```

-Manual install also works:
+Manual install of just the debugger skill also works:

```sh
# Linux / macOS
-cp -r skills/debug-agent ~/.claude/skills/
+cp -r plugin/skills/debug-agent ~/.claude/skills/

# Windows PowerShell
-Copy-Item -Recurse skills/debug-agent $env:USERPROFILE\.claude\skills\
+Copy-Item -Recurse plugin/skills/debug-agent $env:USERPROFILE\.claude\skills\
```

+### What the `debug-agent` skill covers
+
+- **`SKILL.md`** — when to trigger, decision tree, mindset
+- **`references/workflow.md`** — the evidence-first loop
+- **`references/log-monitoring.md`** — using `watch`
+- **`references/localization.md`** — `localize` and `diagnose`
+- **`references/instrumentation.md`** — reversible probes
+- **`references/debugger.md`** — driving `session`
+- **`references/vscode-collab.md`** — `--listen` + shared breakpoints
+- **`references/advanced.md`** — hang / deadlock / concurrency / wolf-fence
+
## Development

```sh
58 changes: 58 additions & 0 deletions docs/superpowers/evals/RESULTS.md
@@ -0,0 +1,58 @@
# Trigger-separation eval — results (2026-05-29)

Lean dev-aid eval per the plugin spec (a goal, not a ship gate). Harness:
skill-creator `scripts/run_eval.py`, which installs a skill's `description` as a
temp command and runs `claude -p <query>` to see whether the model invokes it.

Query pool: `trigger-queries.json` — 16 queries, 4 per skill intent
(python / go / node / debug-agent). For each skill the same pool is relabeled
`should_trigger = (intent == skill)`, so the other 12 act as cross-skill
near-miss negatives.
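
The relabeling rule above, sketched in Python. The four sample entries are copied from `trigger-queries.json`; the `relabel` helper and the output shape (`query` plus `should_trigger`) are assumptions about what the harness consumes, not a documented format.

```python
import json

# One query per intent, taken from trigger-queries.json.
POOL_JSON = """
[
  {"query": "Make this asyncio code stop blocking the event loop", "intent": "python"},
  {"query": "Use functional options for this Go struct constructor", "intent": "go"},
  {"query": "Fix this TypeScript TS2345 error, the argument is not assignable", "intent": "node"},
  {"query": "Set a breakpoint and step to find where this value first goes wrong", "intent": "debug-agent"}
]
"""


def relabel(pool, skill):
    """Label each query as a positive only when its intent matches the skill."""
    return [
        {"query": entry["query"], "should_trigger": entry["intent"] == skill}
        for entry in pool
    ]


pool = json.loads(POOL_JSON)
go_set = relabel(pool, "go")
print(json.dumps(go_set, indent=2))
```

With the full 16-query pool the same function yields 4 positives and 12 cross-skill negatives for each skill.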

## Results (WSL/Linux, skills CLI harness, runs-per-query 1–3)

| Skill | Passed | Negatives (no mis-trigger) | Positives (auto-trigger ≥0.5) |
| --- | --- | --- | --- |
| debug-agent | 12/16 | 12/12 ✅ | ~1/4 |
| python | 12/16 | 12/12 ✅ | ~0/4 |
| go | 13/16 | 12/12 ✅ | ~1/4 |
| node | 12/16 | 12/12 ✅ | ~0/4 |

- **Cross-skill separation (the property that matters for a 4-skill plugin):
excellent and uniform.** Every skill stays quiet on the other three skills'
intents (12/12 negatives each). No mis-trigger observed anywhere.
- **Positive auto-trigger rate is uniformly low — a harness ceiling, not a
prompt defect.** Discriminating test: re-running `debug-agent` with a
deliberately punchy, imperative description ("Use this skill whenever…",
explicit trigger keywords, "Always use before guessing") produced **no lift**
(still ~1/4). A description-quality problem would vary by skill and respond to
a stronger trigger; instead the rate is flat across all skills and unresponsive
to description strength. The cause is methodology: `run_eval.py` injects each
skill as a `.claude/commands/` entry and measures whether one-shot `claude -p`
auto-invokes it — and one-shot non-interactive runs tend to just do the task
rather than auto-invoke a command. Real plugin-installed skills trigger via a
different path.
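
One reading of how the "Passed" column relates to the 0.5 threshold in the table header, sketched under assumptions: a query passes when its observed trigger rate falls on the expected side of the threshold. The function names and the sample counts are hypothetical, and `run_eval.py` may compute this differently.

```python
def query_passes(triggers, runs, should_trigger, threshold=0.5):
    """A positive passes at or above the threshold; a negative passes below it."""
    rate = triggers / runs
    return (rate >= threshold) == should_trigger


def tally(results):
    """Return (passed, total) for (triggers, runs, should_trigger) tuples."""
    passed = sum(query_passes(t, r, s) for t, r, s in results)
    return passed, len(results)


# Hypothetical counts: 12 quiet negatives, 1 positive that fires, 3 that do not.
sample = [(0, 3, False)] * 12 + [(2, 3, True)] + [(0, 3, True)] * 3
print(tally(sample))
```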

## Why `run_loop` auto-optimization was not run

`run_loop` maximizes positive trigger rate. The discriminating test shows that
rate is capped by the harness, not the description, so optimization would chase
a biased proxy and risk overfitting descriptions that are already triggers-only,
keyword-rich, independently reviewed, and behaviorally validated (see the
buggy-script baseline-vs-with-skill test). Decision: keep the reviewed
descriptions; rely on the clean separation result.

## Windows note (original blocker)

On native Windows the positive axis was entirely unmeasurable: `run_eval.py`
polls the `claude -p` subprocess **pipe** with `select.select()`, and Windows
`select` accepts only sockets → `WinError 10038`. WSL/Linux fixes this (Linux
`select` works on pipe fds). Rerun under WSL with:

```sh
PYTHONPATH=<skill-creator> uv run --no-project --with pyyaml python run_eval.py \
--eval-set <skill>.json --skill-path plugin/skills/<skill> --runs-per-query 3
```

(`--no-project` is required so `uv` does not try to repair the Windows-format
`.venv` over the `/mnt/c` mount.)
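
For context on the `select` limitation: a common portable alternative is to drain the subprocess pipe on a background thread and wait on a queue with a timeout, which behaves the same on Windows and Linux. This is a general sketch and not how `run_eval.py` works; the helper name and the sample child command are illustrative only.

```python
import queue
import subprocess
import sys
import threading


def read_lines_with_timeout(cmd, timeout_s):
    """Collect stdout lines from cmd, giving up if no line arrives within timeout_s."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    lines = queue.Queue()

    def pump():
        # Blocking reads happen on this thread, so no select() on a pipe is needed.
        for line in proc.stdout:
            lines.put(line.rstrip("\n"))
        lines.put(None)  # sentinel: the stream closed

    threading.Thread(target=pump, daemon=True).start()

    collected = []
    while True:
        try:
            item = lines.get(timeout=timeout_s)
        except queue.Empty:
            proc.kill()  # the child went quiet for too long
            break
        if item is None:
            break
        collected.append(item)
    proc.wait()
    return collected


result = read_lines_with_timeout(
    [sys.executable, "-c", "print('hello'); print('world')"], 10.0
)
print(result)
```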
21 changes: 21 additions & 0 deletions docs/superpowers/evals/trigger-queries.json
@@ -0,0 +1,21 @@
[
{ "query": "Add type hints to this module and get it passing mypy --strict", "intent": "python" },
{ "query": "Write a Pythonic async FastAPI endpoint that fetches rows from Postgres", "intent": "python" },
{ "query": "My .py script raises AttributeError on a None value, refactor it cleanly", "intent": "python" },
{ "query": "Make this asyncio code stop blocking the event loop", "intent": "python" },

{ "query": "I have a data race detected by go test -race in my worker pool", "intent": "go" },
{ "query": "Wrap these errors with %w and check them with errors.Is in my Go service", "intent": "go" },
{ "query": "My goroutines deadlock: all goroutines are asleep - deadlock", "intent": "go" },
{ "query": "Use functional options for this Go struct constructor", "intent": "go" },

{ "query": "Fix this TypeScript TS2345 error, the argument is not assignable", "intent": "node" },
{ "query": "My Express handler has an unhandled promise rejection, make it robust", "intent": "node" },
{ "query": "Write an advanced conditional type that unwraps a Promise in TypeScript", "intent": "node" },
{ "query": "Cannot read properties of undefined in my Node EventEmitter code", "intent": "node" },

{ "query": "My program crashes with a traceback and I want to triage it to the failing frame", "intent": "debug-agent" },
{ "query": "Pause my script at a breakpoint and inspect the live value of a variable", "intent": "debug-agent" },
{ "query": "The process hangs and I need to see live runtime state, not guess from source", "intent": "debug-agent" },
{ "query": "Set a breakpoint and step to find where this value first goes wrong", "intent": "debug-agent" }
]