feat: multi-language debugging — Python + Go + Node.js via Adapter abstraction #5

Merged: niradler merged 10 commits into main from feat/multi-language-debug on May 29, 2026

@niradler
Owner

Summary

Single consolidated PR adding multi-language debugging to dbga — Python (existing, refactored), Go (dlv dap), and Node.js / TypeScript (vscode-js-debug). Supersedes the stacked PRs #2 / #3 / #4; all their commits plus the post-review fixes are here on one branch against main.

What's in it

1. Language Adapter abstraction (adapters/base.py + registry)
Every language-specific decision — DAP-server spawn, launch payload, traceback parser, interpreter peeling, IDE-attach, instrument templates — lives behind an Adapter ABC. Adding a language = subclass + register. --lang flag on session start / localize / diagnose, auto-detected from file extension.

2. PythonAdapter — the original debugpy code lifted behind the interface, behavior-identical (verified: all pre-existing tests pass unchanged).

3. GoAdapter (dlv dap) — mode:"debug" launch, panic + fatal-error stack parser (method receivers, runtime-frame classification, oldest-first ordering), go run peeling, clear missing-dlv error.

4. NodeAdapter (vscode-js-debug) — discovery across $DBGA_JS_DEBUG_SERVER / VS Code・Cursor・Insiders extension dirs / manual install; V8 stack parser; node/ts-node/tsx peeling with -r handling.

5. DAP reverse-request + child-session support (dap_client.py + dap_session.py)
vscode-js-debug delegates every launched program to a child DAP session via a startDebugging reverse-request. DapClient now routes server→client requests to registered handlers; DapSession opens + tracks child connections, multiplexes their events, and routes ops to the active session. This is the change that makes Node fully live, and it's thread-safe (lock-guarded; see review below).

Review (parallel subagents, 3 languages)

Three reviewers ran in parallel over the Python path, the Go adapter, and the Node/DAP core. They found 2 blockers, both fixed in this branch (commit 7e71480):

  • Data race: `_child_clients` / `_active_client` were mutated on the reader thread and read unlocked on the main thread → could leak a child connection or resurrect a torn-down session. Fixed with `_clients_lock` + snapshot iteration + a released-state guard in the reverse-handler.
  • Go method-receiver truncation: `(*Server).Handle` parsed as `github.com/x/y.` (broke panics inside any method). Fixed the func regex + added a regression fixture.

Also fixed: js-debug extension version sort (1.10.0 now beats 1.9.0), an over-claim in the Node docstring (worker/child_process is now honestly scoped as future work), and a fabricated code comment.

Test plan

  • uv run pytest -v: 161 passed, 0 xfailed, 0 failures
  • Real-adapter integration tests green on Windows: debugpy, dlv dap (delve), vscode-js-debug v1.117.0 (full launch → stopOnEntry → continue → terminated)
  • New regression tests: Go method-receiver parsing, js-debug version sort, DAP reverse-request routing (4 isolated socket-pair tests)
  • ruff check + ruff format --check + mypy --strict src all clean (30 src files)
  • Integration tests auto-skip when a toolchain (dlv, node+js-debug) is absent, so CI stays green on Python-only images

Known scope (honest)

  • Node: single launched process is the validated path. Worker-thread / child_process sub-sessions attach (recursive handler), but wait_for_stop ends the session on the first child exit — full multi-process lifecycle is future work, documented in the NodeAdapter docstring.
  • Go: go test debugging out of scope (needs mode:"test").
  • instrument probe templates remain Python-flavored (passthrough for other langs).

Supersedes

Closes the stacked PRs #2, #3, #4 — same commits, consolidated here.

niradler added 5 commits May 29, 2026 03:01
…fic code behind it

Until now every language-specific decision (spawn debugpy.adapter, build the
`{"type": "python", ...}` launch payload, parse Python tracebacks, peel a
`python foo.py` interpreter prefix in `diagnose`) was inlined in the core /
command layer. Adding a second language would have required surgery in five
or six files.

This change introduces `adapters.base.Adapter` — an ABC that captures every
language-specific knob the rest of the codebase needs:

  * `spawn_adapter(port)` — start the language's DAP server
  * `launch_payload(...)` — build the DAP `launch` request body
  * `parse_traceback(text)` — language-specific stack/panic parser
  * `spawn_listen_mode(...)` + `supports_listen_mode()` — IDE attach (optional)
  * `attach_url(host, port)` — the URL scheme an IDE should use
  * `resolve_launch_target(cmd)` — peel an interpreter prefix for `diagnose`
  * `probe_template(kind, code)` — hook for future `instrument` defaults

`PythonAdapter` ports every existing debugpy code path behind this interface
with no behavior change. The registry in `adapters/__init__.py` exposes
`get_adapter`, `list_adapters`, `detect_language` (by file extension), and
`resolve_language` (explicit > detected > default).
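
The registry precedence described above (explicit > detected > default) can be illustrated with a minimal sketch. The reduced `Adapter` method set, the `extensions` attribute, and the `register` helper are assumptions made for illustration; the real ABC in `adapters/base.py` exposes more hooks and may be shaped differently.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class Adapter(ABC):
    """Reduced, illustrative subset of the language-specific hooks."""

    name: str = ""
    extensions: tuple = ()  # hypothetical attribute used for detection

    @abstractmethod
    def launch_payload(self, program: str, args: list) -> dict:
        """Build the body of the DAP `launch` request."""

    def supports_listen_mode(self) -> bool:
        return False


class PythonAdapter(Adapter):
    name = "python"
    extensions = (".py",)

    def launch_payload(self, program: str, args: list) -> dict:
        return {"type": "python", "request": "launch", "program": program, "args": args}


_REGISTRY: dict = {}


def register(adapter: Adapter) -> None:
    """Hypothetical helper: adding a language is subclass + register."""
    _REGISTRY[adapter.name] = adapter


def get_adapter(lang: str) -> Adapter:
    if lang not in _REGISTRY:
        raise ValueError(f"unknown language {lang!r}; known: {sorted(_REGISTRY)}")
    return _REGISTRY[lang]


def detect_language(path: str) -> Optional[str]:
    suffix = Path(path).suffix.lower()
    for adapter in _REGISTRY.values():
        if suffix in adapter.extensions:
            return adapter.name
    return None


def resolve_language(explicit: Optional[str], path: Optional[str], default: str = "python") -> str:
    # Precedence from the commit message: explicit > detected > default.
    if explicit:
        return explicit
    detected = detect_language(path) if path else None
    return detected or default


register(PythonAdapter())
```

With only the Python adapter registered, `resolve_language(None, "notes.txt")` falls back to the default, while `resolve_language("go", "app.py")` honors the explicit flag even though the extension says otherwise.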

CLI surface:
  * `session start --lang {python,...}` — auto-detected from script extension
    when omitted; persisted to `meta.json` so the daemon picks the right
    adapter on (re)start.
  * `localize --lang {python,...}` — picks the traceback parser.
  * `diagnose --lang {python,...}` — drives both traceback parsing and the
    `python foo.py` → `foo.py` launch-target peeling; auto-inferred from the
    command (interpreter basename or first script-like argument).

Internal moves:
  * `adapters/debugpy_adapter.py` (module) → `adapters/python.py` (class) +
    `adapters/_socket.py` (truly generic `find_free_port` /
    `wait_until_listening`, reusable by future Go/Node adapters).
  * `core/dap_session.py::DapSession` now takes an `Adapter` (defaults to
    PythonAdapter for backwards compatibility). It calls into the adapter
    instead of importing debugpy helpers directly.
  * `core/session_proc.py` reads `meta["lang"]` (defaults to "python" for
    pre-refactor meta files) and constructs the matching adapter.
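
For readers unfamiliar with the two generic helpers named above, a common way to write them with the standard library looks roughly like this. The signatures and timeouts are assumptions; the real `adapters/_socket.py` may differ.

```python
import socket
import time


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def wait_until_listening(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until something accepts TCP connections on (host, port), or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect proves the listener is up; close it right away.
            with socket.create_connection((host, port), timeout=0.25):
                return True
        except OSError:
            time.sleep(0.05)
    return False
```

Note that the port returned by `find_free_port` can in principle be claimed by another process before the adapter binds it, so callers of helpers like these usually need some retry story.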

Testing:
  * 11 new unit tests in `tests/unit/test_adapters.py` cover registry,
    detection, listen-mode flag, launch-payload shape, interpreter peeling,
    and traceback parsing through the adapter.
  * All existing 102 tests (unit + integration + e2e) pass unchanged — the
    Python behavior is byte-identical.
  * `ruff check`, `ruff format`, `mypy --strict src` all clean.

This is the first of a 3-PR stack. PR2 adds GoAdapter (delve `dlv dap`).
PR3 adds NodeAdapter (vscode-js-debug `dapDebugServer.js`).

Second PR in the multi-language stack. Stacks on top of #2 (Adapter ABC
refactor); merge that first.

GoAdapter implements the Adapter contract for Go programs:

  * spawn `dlv dap --listen=127.0.0.1:<port>` as the DAP server, with a
    clear "delve not installed" error pointing at the `go install` command
    when `dlv` isn't on PATH (no cryptic ENOENT).
  * launch payload uses `mode: "debug"` so delve compiles + runs in one
    step from a `.go` file or package directory.
  * `spawn_listen_mode` for IDE attach (VS Code Go extension dap-mode).
  * `parse_traceback` understands both `panic:` and `fatal error:` dumps,
    including extended runtime frames with `+0xN fp=0x... sp=0x... pc=0x...`
    annotations. Frames stored oldest-first (matches Python convention) so
    the shared `deepest_user_frame` heuristic lands on the panic site.
  * runtime / sync / reflect / internal frames are marked `is_user_code=False`
    so the deepest-user heuristic skips runtime scaffolding when reporting
    crash locations.
  * `resolve_launch_target` peels `go run [-flags] <main.go> args...` into
    `(main.go, args)` for `dbga diagnose`. `go test` is out of scope (would
    need `mode: "test"`); it returns None and surfaces the crash without
    rerun, matching the existing Python `-m` behavior.
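
The `go run` peeling described in the last bullet can be sketched as follows. The function name and the flag handling are assumptions: this version treats every leading `-flag` as a single token, which does not cover flags that take a separate value argument.

```python
from typing import Optional


def resolve_go_launch_target(cmd: list) -> Optional[tuple]:
    """Peel `go run [-flags] <main.go> args...` into (target, args), else None."""
    if len(cmd) < 3 or cmd[0] != "go" or cmd[1] != "run":
        # Covers `go test`, which the commit message leaves out of scope.
        return None
    rest = cmd[2:]
    i = 0
    # Assumption: build flags are single tokens such as -race or -tags=dev.
    while i < len(rest) and rest[i].startswith("-"):
        i += 1
    if i == len(rest):
        return None  # only flags, no launch target
    return rest[i], rest[i + 1:]
```

For example, `["go", "run", "-race", "main.go", "--port", "8080"]` yields `("main.go", ["--port", "8080"])`, and `["go", "test", "./..."]` yields `None`, so the caller can surface the crash without a rerun.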

CLI surface:

  $ dbga session start --break-at main.go:12 -- main.go
  $ dbga diagnose --timeout 30 -- go run main.go
  $ dbga localize --lang go --file panic.txt

Language is auto-detected from the script extension; `--lang go` forces it.

Test plan:

  * 14 new unit tests (`tests/unit/test_go_adapter.py`) covering registry,
    detection, listen-mode flag, launch-payload shape, `go run` peeling,
    panic + fatal-error parsing, and the missing-dlv error path.
  * 1 new integration test (`tests/integration/test_go_session.py`)
    drives real `dlv dap`: initialize / launch / stopOnEntry / continue /
    terminated. Auto-skips when `dlv` or `go` isn't on PATH so the suite
    stays green on Python-only machines.
  * `__debug_bin*` + `*.test` added to .gitignore — `dlv dap` leaves the
    compiled debug binary in cwd.

Local validation:
  - 76 unit tests pass (61 prior + 14 new + 1 misc).
  - 8 integration tests pass (7 Python + 1 Go) — driven against real
    delve 1.22+ on Windows.
  - 45 e2e tests pass unchanged.
  - ruff check + ruff format + mypy --strict all clean.

Out of scope for this PR (deferred to future work):
  - `go test ./...` debugging (needs DAP `mode: "test"` + `--test.run` plumbing).
  - `dbga instrument` probe templates for Go (`fmt.Println` defaults).

Third PR in the multi-language stack. Stacks on top of #3 (GoAdapter);
merge that first.

NodeAdapter scaffolds Node.js / TypeScript debugging via Microsoft's
vscode-js-debug (the same DAP server VS Code itself uses). Status: alpha.

Status — what works:
  * Discovery: `find_dap_server()` locates `dapDebugServer.js` from
    $DBGA_JS_DEBUG_SERVER, then VS Code / Cursor / Insiders extension
    dirs, then a manual `~/.local/share/js-debug/` install. Errors with
    a clear install hint (GitHub releases URL) when nothing is found.
    Verified end-to-end against vscode-js-debug v1.117.0 on Windows.
  * Spawn + handshake: `node dapDebugServer.js <port> 127.0.0.1` accepts
    our DAP `initialize` — `test_node_dap_initialize` passes against the
    real adapter.
  * V8 stack-trace parser: handles named + anonymous frames, node:internal
    + node_modules library detection, oldest-first frame ordering so the
    shared `deepest_user_frame` heuristic lands on the failure site.
    13 fixture-driven cases pass against real Node 20 / 26 traces.
  * `resolve_launch_target` peels `node [-flags] script args`, `ts-node`,
    `tsx`; correctly consumes the `-r module` / `--require module` pair.
  * `--lang node` flag plumbed through `session start`, `localize`,
    `diagnose`. Auto-detection covers `.js .mjs .cjs .ts .mts .cts`.
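
A minimal sketch of the V8 parsing idea from the list above: match named and anonymous frames, flag `node:` internals and `node_modules` as library code, and reverse the frames so the failure site comes last. The regex, the `Frame` dataclass, and this `deepest_user_frame` are illustrative assumptions, not the code in `adapters/node.py`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Named frame:     "    at fn (/path/file.js:10:5)"
# Anonymous frame: "    at /path/file.js:10:5"
_FRAME_RE = re.compile(
    r"^\s*at (?:(?P<func>.+?) \()?(?P<file>[^()]+?):(?P<line>\d+):(?P<col>\d+)\)?\s*$"
)


@dataclass
class Frame:
    func: str
    file: str
    line: int
    is_user_code: bool


def parse_v8_stack(text: str) -> list:
    frames = []
    for raw in text.splitlines():
        m = _FRAME_RE.match(raw)
        if not m:
            continue  # error message line or an unsupported frame shape
        file = m["file"]
        is_library = file.startswith("node:") or "node_modules" in file
        frames.append(Frame(m["func"] or "<anonymous>", file, int(m["line"]), not is_library))
    frames.reverse()  # V8 prints the failure site first; store oldest-first
    return frames


def deepest_user_frame(frames: list) -> Optional[Frame]:
    """Walk from the newest frame back and return the first user-code frame."""
    for frame in reversed(frames):
        if frame.is_user_code:
            return frame
    return None
```

Storing frames oldest-first means the same "last user frame" walk works for Python tracebacks, Go panics, and V8 stacks alike.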

Status — known blocker (intentional alpha scope):
  vscode-js-debug delegates the actual launched program to a *child* DAP
  session via a reverse `startDebugging` request. Our DapClient currently
  drops all server-to-client requests (see `dap_client.py::_dispatch`),
  so the child is never created and `stopped` never arrives. The full
  launch flow test (`test_node_dap_launch_stops`) is marked `xfail strict`
  and will start passing automatically once DapClient gains reverse-
  request + child-session handling. This is the documented follow-up
  scope — not in this PR.

Test plan:
  * 24 new unit tests in `tests/unit/test_node_adapter.py` covering
    registry, extension detection, launch-payload shape, listen-mode
    flag, V8 parser fixtures (TypeError + ReferenceError + node_modules
    + anonymous frames), missing-node hint, and the env-var discovery
    override path.
  * 1 new integration test pair:
      - test_node_dap_initialize — PASSES against real js-debug.
      - test_node_dap_launch_stops — xfail strict; tracks the reverse-
        request blocker.
    Both auto-skip when node + js-debug aren't both discoverable.
  * Full suite: 152 passed + 1 xfailed locally (61 Python unit +
    14 Go unit + 24 Node unit + 8 misc unit, 9 integration including
    Go + Node initialize, 45 e2e).
  * ruff check + ruff format + mypy --strict all clean (30 src files).

README updated with the language-toolchain install matrix.

Out of scope (deferred to a follow-up PR):
  * DapClient reverse-request handling (`startDebugging` etc.) — the
    one change that promotes NodeAdapter from "handshake works" to
    "full live session". Unblocks worker-thread + child_process attach
    as a bonus.
  * `dbga instrument` probe templates for JS (`console.log` defaults).

Promotes the Node adapter from "alpha (handshake only)" to a full,
live-debugger experience by teaching DapClient and DapSession to handle
DAP server-to-client requests — specifically vscode-js-debug's
`startDebugging`, which delegates every launched program to a fresh
child DAP session.

What this PR adds on top of the previous NodeAdapter scaffold:

DapClient — server-to-client requests
-------------------------------------
* `register_reverse_handler(command, handler)`: register a callable that
  runs when the DAP server sends `type: "request"`. The handler returns
  the response body (or `None` for empty); raising surfaces as a DAP
  `success: false` response with the error message.
* `_dispatch` now routes `type == "request"` through the handler map.
  Unknown commands respond with `"not supported"` so the server isn't
  left waiting.
* `_send_response` emits a properly-framed DAP response with a fresh
  client-side `seq` and the server's `request_seq` echoed back.
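
The routing rules above can be sketched without any sockets. `ReverseRouter`, `respond`, and `frame` are hypothetical names standing in for the relevant parts of `DapClient`; only `register_reverse_handler` and the response fields come from the commit message. The framing shown (a `Content-Length` header, a blank line, then JSON) is the standard DAP base protocol.

```python
import json
from typing import Callable, Optional

ReverseHandler = Callable[[dict], Optional[dict]]


class ReverseRouter:
    """Hypothetical stand-in for the reverse-request part of DapClient."""

    def __init__(self) -> None:
        self._handlers = {}
        self._seq = 0

    def register_reverse_handler(self, command: str, handler: ReverseHandler) -> None:
        self._handlers[command] = handler

    def respond(self, request: dict) -> dict:
        """Build the response message for one server-to-client request."""
        command = request["command"]
        self._seq += 1
        response = {
            "seq": self._seq,               # fresh client-side sequence number
            "type": "response",
            "request_seq": request["seq"],  # echo the server's seq back
            "command": command,
            "success": True,
        }
        handler = self._handlers.get(command)
        if handler is None:
            response.update(success=False, message="not supported")
            return response
        try:
            body = handler(request.get("arguments") or {})
        except Exception as exc:  # a raising handler becomes success: false
            response.update(success=False, message=str(exc))
            return response
        if body is not None:
            response["body"] = body
        return response


def frame(message: dict) -> bytes:
    """DAP wire framing: Content-Length header, blank line, UTF-8 JSON."""
    payload = json.dumps(message).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(payload) + payload
```

Always answering, even for unknown commands, matters because the server blocks on the response to its own request.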

DapSession — child-session orchestration
----------------------------------------
* Tracks adapter host/port and a `_child_clients` list. The `start()`
  path registers `startDebugging` so every adapter that delegates
  (currently only vscode-js-debug) gets transparent child-session
  support — Python/Go never send the request so the handler stays
  dormant for them.
* `_on_start_debugging` opens a fresh TCP connection to the same DAP
  server, runs the full DAP handshake on it (initialize / launch (or
  attach) / configurationDone) using whatever configuration the parent
  passed in, registers the handler recursively (workers / child_process
  nest deeper), and appends the new client to `_child_clients`. The
  handler runs on the parent's reader thread; it MUST only do I/O on
  the child connection it just opened to avoid reader-thread deadlock.
* `_active_client` is the client that owns the live debuggee. It starts
  as the parent and gets promoted to the child that just emitted
  `stopped`, so `continue_` / `step` / `evaluate` / `set_breakpoints`
  all route to the right place via `_require_client`.
* `wait_for_stop` now round-robin-polls the parent and every child
  client via `_poll_any_client`. Terminal events drain across all live
  clients.
* `release` disconnects child clients before the parent so they don't
  leak; the parent tree-kill remains the unconditional fallback.

Node.js: alpha → fully live
---------------------------
* NodeAdapter docstring updated — drops the "handshake-only" caveat.
  TypeScript via ts-node/tsx, plus worker threads and `child_process`
  children, all flow through the nested-session mechanism (handler is
  registered recursively on child clients).
* `tests/integration/test_node_session.py::test_node_dap_launch_stops`
  drops `@xfail`. Now goes through `DapSession.start()` (not raw
  DapClient) so the handler fires. Verified end-to-end against real
  vscode-js-debug v1.117.0 on Windows: launch → stopOnEntry → continue
  → terminated, ~6 seconds.

Tests
-----
* 4 new unit tests in `tests/unit/test_dap_reverse_requests.py` cover
  the DapClient routing in isolation (no debugger spawn): unknown
  reverse request → `success: false`; handler return value → response
  body; handler exception → `success: false` + message; response seq
  is distinct from request_seq.
* Full suite: 158 passed locally (previously 152 + 1 xfailed). 0
  failures, 0 xfails. ruff + ruff format + mypy --strict all clean.

CLAUDE.md updated to describe the reverse-request mechanism alongside
the rest of the DAP plumbing.

Sources / spec references this implementation followed:
  * DAP spec `startDebugging` reverse request:
    https://microsoft.github.io/debug-adapter-protocol/specification#Reverse_Requests_StartDebugging
  * vscode-js-debug's child-session model is the same one VS Code's
    debug-adapter client implements; this PR mirrors that contract.
…ames, js-debug version sort

Fixes from the consolidated three-language review (parallel subagent
review of the Python path, Go adapter, and Node/DAP core).

BLOCKER — DapSession child-session data race (core/dap_session.py)
  ``_on_start_debugging`` runs on the parent client's reader thread and
  mutated ``_child_clients`` / ``_active_client`` while the main thread
  read them in ``_poll_any_client`` / ``release`` / ``_require_client``.
  Concrete hazards: a ``startDebugging`` racing ``release`` could slip a
  child past the teardown loop (leaked DAP connection + debuggee) or
  resurrect a just-cleared list. Now guarded by ``_clients_lock``:
    * all reads/writes of both fields take the lock;
    * ``release`` snapshots-and-clears under the lock, then disconnects;
    * ``_on_start_debugging`` checks ``_state`` under the lock when
      publishing the child and disconnects it instead of resurrecting a
      released session.
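
The locking discipline in the three bullets above, reduced to a sketch. `ChildRegistry` and `publish_child` are hypothetical names, the `"released"` state string is an assumption, and clearing `_active_client` on release is also an assumption; only `_clients_lock`, `_child_clients`, `_active_client`, `_state`, and `release` are named in the commit message.

```python
import threading


class ChildRegistry:
    """Hypothetical reduction of the DapSession fields involved in the race."""

    def __init__(self, parent):
        self._clients_lock = threading.Lock()
        self._child_clients = []
        self._active_client = parent
        self._state = "running"

    def publish_child(self, child) -> bool:
        """Reader thread: publish a new child unless the session is released."""
        with self._clients_lock:
            if self._state != "released":
                self._child_clients.append(child)
                return True
        # Too late: tear the child down instead of resurrecting the session.
        child.disconnect()
        return False

    def release(self) -> None:
        """Main thread: snapshot and clear under the lock, then disconnect."""
        with self._clients_lock:
            self._state = "released"
            children, self._child_clients = self._child_clients, []
            self._active_client = None  # assumption: cleared on release
        for child in children:
            child.disconnect()  # I/O happens after the lock is dropped
```

Checking the state and appending under the same lock is what closes the window in which a late `startDebugging` could slip a child past the teardown loop.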

BLOCKER — Go method-receiver func names truncated (adapters/go.py)
  ``_FUNC_RE``'s non-greedy func group stopped at the first ``(``, so a
  pointer-receiver frame ``github.com/x/y.(*Server).Handle(0x..)`` parsed
  as func ``github.com/x/y.`` — garbage for any panic inside a method
  (the common case). Switched to a greedy ``\S+`` that backtracks to the
  final argument-parens, keeping embedded ``(*Server)`` in the name.
  Regression test + ``go_method_panic.txt`` fixture added.
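
The difference between the two matching strategies is easy to reproduce. Both patterns below are illustrative reconstructions; the real `_FUNC_RE` is not shown in this PR text and may differ.

```python
import re

# Lazy: stops at the first "(", which truncates method-receiver names.
LAZY_FUNC_RE = re.compile(r"^(?P<func>\S+?)\(")

# Greedy: \S+ overshoots, then backtracks to the last "(" whose contents
# contain no parentheses and run to the end of the line.
GREEDY_FUNC_RE = re.compile(r"^(?P<func>\S+)\((?P<args>[^()]*)\)\s*$")


def parse_func_name(line: str, pattern: re.Pattern = GREEDY_FUNC_RE) -> str:
    m = pattern.match(line)
    return m["func"] if m else ""


frame_line = "github.com/x/y.(*Server).Handle(0xc000012345, 0x1)"
print(parse_func_name(frame_line, LAZY_FUNC_RE))  # github.com/x/y.
print(parse_func_name(frame_line))                # github.com/x/y.(*Server).Handle
```

Plain functions such as `main.main()` parse identically under both patterns, which is presumably why only method frames exposed the bug.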

SHOULD-FIX — js-debug extension version sort (adapters/node.py)
  ``_latest_js_debug_extension`` sorted dirs lexicographically, so
  ``1.9.0`` beat ``1.10.0`` (``'9' > '1'``). Added ``_extension_version_key``
  parsing the trailing semver into an int tuple. Two unit tests added.
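
A sketch of the sort-key idea. The function name mirrors `_extension_version_key` from the commit message, but the body, the fallback key, and the directory names are assumptions for illustration.

```python
import re

_SEMVER_TAIL = re.compile(r"(\d+)\.(\d+)\.(\d+)$")


def extension_version_key(dirname: str) -> tuple:
    """Parse a trailing MAJOR.MINOR.PATCH into an int tuple for sorting."""
    m = _SEMVER_TAIL.search(dirname)
    # Assumption: directories without a parseable version sort lowest.
    return tuple(int(part) for part in m.groups()) if m else (-1,)


# Hypothetical extension directory names.
dirs = ["ms-vscode.js-debug-1.9.0", "ms-vscode.js-debug-1.10.0"]
print(max(dirs))                             # ms-vscode.js-debug-1.9.0
print(max(dirs, key=extension_version_key))  # ms-vscode.js-debug-1.10.0
```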

Honesty/scope fixes (adapters/node.py)
  * NodeAdapter docstring no longer claims worker-thread / child_process
    debugging "works"; it states the validated path is single-process and
    that multi-process lifecycle (surviving the first child exit) is
    future work — matching what wait_for_stop actually does today.
  * Corrected the fabricated "Preview Wildcat-Analytics" gloss on the
    ``pwa-node`` type id to its real Progressive-Web-App origin.

Validation: 161 passed (158 + 3 new regression tests), ruff + ruff
format + mypy --strict all clean. Real-adapter integration tests
(debugpy, dlv dap, vscode-js-debug v1.117.0) all green on Windows.
niradler added 5 commits May 29, 2026 13:39
… real-user-flow tests

Live-debugging all three languages as a user would (session start
--break-at → eval → continue → release) surfaced two Node-only bugs that
the existing tests missed because the Node integration test only did
launch → stopOnEntry → continue → terminate — it never set a launch
breakpoint or evaluated at a stop.

BUG 1 — launch-time breakpoints never bound (Node)
  vscode-js-debug runs the launched program in a CHILD DAP session, but
  DapSession.start set breakpoints on the PARENT connection before the
  child existed, so they came back "unresolved" and the program ran to
  completion. `session start --break-at app.js:N` returned status
  "terminated"; `diagnose --lang node` rerun never stopped.
  Fix: Adapter gains `delegates_launch_to_child` (True only for Node).
  When set, DapSession defers launch breakpoints and replays them on the
  child during its handshake in `_on_start_debugging` (after `initialized`,
  before `configurationDone`). Single-connection adapters (debugpy, dlv)
  are unchanged.
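
The defer-and-replay flow for BUG 1 can be sketched like this. `BreakpointPlanner` and the `client.request(command, arguments)` call are hypothetical; per the commit message the real logic lives inside `DapSession.start` and `_on_start_debugging`. The `setBreakpoints` argument shape follows the DAP specification.

```python
class BreakpointPlanner:
    """Hypothetical helper showing when launch breakpoints get sent."""

    def __init__(self, delegates_launch_to_child: bool):
        self.delegates = delegates_launch_to_child
        self._deferred = []  # list of (path, [line, ...]) pairs

    def on_parent_start(self, client, breakpoints) -> None:
        if self.delegates:
            # The debuggee will live in a child session; hold the breakpoints.
            self._deferred = list(breakpoints)
            return
        self._send(client, breakpoints)

    def on_child_initialized(self, child) -> None:
        """Call after the child's `initialized`, before `configurationDone`."""
        self._send(child, self._deferred)

    @staticmethod
    def _send(client, breakpoints) -> None:
        for path, lines in breakpoints:
            client.request(
                "setBreakpoints",
                {"source": {"path": path}, "breakpoints": [{"line": n} for n in lines]},
            )
```

Single-connection adapters take the direct branch, so their request sequence is the same as before the fix.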

BUG 2 — eval/inspect failed at a child-session breakpoint (Node)
  `session_proc._handle_eval` and `_build_inspect_context` resolved the
  stopped frame from `session.client` (the PARENT), which has no stopped
  thread for a js-debug child stop — so no frameId reached the child and
  js-debug returned "evaluate: request failed".
  Fix: added `DapSession.active_client` (parent for Python/Go, the stopped
  child for Node) and routed eval frame-resolution + inspect through it.

Why the tests didn't catch it / coverage added (real user flows):
  * tests/e2e/test_cli_session_node.py — start --break-at → assert stopped
    at the line → eval a local (asserts the array + the int) → continue →
    release. Drives the full CLI + daemon + child-session path. This is the
    exact flow whose live run previously returned "terminated" and
    "evaluate: request failed".
  * tests/e2e/test_cli_session_go.py — the same flow for Go (dlv), so
    breakpoint+eval coverage is symmetric across Python (pre-existing
    test_cli_session_ops), Go, and Node.
  * unit guards (no toolchain needed): only Node delegates_launch_to_child;
    active_client defaults to parent then follows the published child.

Verified live on Windows after the fix:
  - Node: `session start --break-at buggy.js:3` → stopped (breakpoint),
    `eval nums` → "(3) [10, 20, 30]", `eval total` → 60; `diagnose -- node
    buggy.js` rerun now stops at the deepest frame (line 10).
  - Python + Go: unchanged, full flows still pass.

Suite: 164 passed + 1 known debugpy adapter-spawn flake (passes in
isolation; documented thread-init race). ruff + mypy --strict clean.

Rewrote the in-repo `skills/debug-agent/` skill to cover Go and Node.js
alongside Python. Every command, flag, and output in the skill was
verified by running `dbga` LIVE against real programs in all three
languages (debugpy, dlv dap, vscode-js-debug) — not against source code.
Authored via subagents loading the skill-creator skill, constrained to a
live-evidence corpus as the sole source of truth.

SKILL.md:
  * description + title broadened to Python · Go · Node.js/TypeScript.
  * New Languages table: per-language toolchain prerequisite, install
    command, and auto-detected extensions; `--lang` + extension
    auto-detection explained.
  * diagnose / localize / session examples for all three languages with
    the exact observed outputs (error types, deepest frames, eval results).
  * Honest Limits section: Node validated path is single-process; eval is
    language-native (three distinct value formattings shown); diagnose
    reuses the `default` session name.
  * Stripped command syntax that wasn't live-verified this pass
    (run/watch/instrument/step/--listen) down to reference pointers rather
    than asserting unverified invocations.

references/:
  * localization.md — Go panic / Node V8 / Python traceback examples with
    real localize+diagnose outcomes; diagnose session_exists note.
  * debugger.md — same-flow-three-languages eval block; the SAME `nums`
    array shown printing three ways (Python `[10, 20, 30]`, Go
    `[]int len: 3, cap: 3, [10,20,30]`, Node `(3) [10, 20, 30]`).
  * vscode-collab.md — removed fabricated `--listen` attach-URL / launch.json
    examples (never exercised live); replaced with an explicit
    "not yet captured live" caveat + accurate prerequisites only.
  * instrumentation.md — probes are Python-flavored; insert/snapshot/revert
    is language-agnostic text editing.
  * workflow.md / advanced.md — minor factual notes.

Evidence corpus lives under tmp/ (gitignored) and is not committed.

Windows CI flaked on `test_session_continue_to_termination` with
"python DAP adapter exited with code 0 before listening ... Exception in
thread Thread-1 (accept_worker)". This is debugpy's known adapter-startup
race (the same thread-init race CLAUDE.md documents for
CREATE_NEW_PROCESS_GROUP): under back-to-back launches the adapter's
accept_worker thread crashes on init, so the adapter exits before it ever
listens. Pre-existing flake, but this PR's extra adapter-spawning tests
made it surface reliably on Windows CI.

Fix (not suppression): `open_adapter_connection` spawns the adapter and
connects with a bounded retry — on "exited before listening" / timeout it
tree-kills the corpse and respawns on a fresh port, up to 3 attempts.
`DapSession.start` now uses it. This benefits all adapters (debugpy, dlv,
js-debug) equally; a single transient startup crash no longer fails a
session.
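
The bounded-retry orchestration can be sketched with its collaborators passed in, which is also roughly the seam the unit tests are described as using. `open_with_retry`, `AdapterStartupError`, and all four callable signatures are assumptions; the real `open_adapter_connection` may be structured differently.

```python
class AdapterStartupError(RuntimeError):
    """Hypothetical error type for 'exited before listening' and timeouts."""


def open_with_retry(spawn, find_free_port, wait_until_listening, kill_tree, attempts=3):
    """Spawn the adapter on a fresh port, retrying a bounded number of times."""
    last_error = None
    for _ in range(attempts):
        port = find_free_port()  # fresh port for every attempt
        proc = spawn(port)
        try:
            if wait_until_listening(proc, port):
                return proc, port
            last_error = AdapterStartupError(f"adapter not listening on port {port}")
        except AdapterStartupError as exc:
            last_error = exc
        kill_tree(proc)  # clean up the failed process before respawning
    raise AdapterStartupError(
        f"adapter failed to start after {attempts} attempts"
    ) from last_error
```

On the happy path `kill_tree` is never called, and once the attempts are exhausted the last underlying error is chained onto the raised exception for diagnosis.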

TDD: tests/unit/test_adapter_connect_retry.py drives the orchestration
through a clean seam (fake adapter + monkeypatched find_free_port /
wait_until_listening / kill_tree) — retry-then-succeed, exhaust-then-raise,
and happy-path-no-kill. Written failing first, then implemented to green.

Verified: full suite 168 passed; the integration session suite run 5×
back-to-back is clean (previously intermittent). ruff + mypy clean.

`test_session_start_listen_returns_attach_url` intermittently failed on
Ubuntu CI (`_port_listening(...) == False`). Root cause: the test opened a
SECOND TCP connection to a debugpy `--listen --wait-for-client` listener to
re-confirm it was up — but that listener accepts a single client, so a
throwaway probe can perturb it, and the check was redundant anyway:
`_spawn_listen_mode` already gates `status: listening` on the port
accepting (it waits up to 10s before returning). Replaced the racy
re-probe with a listener-process liveness assertion (`is_pid_alive`),
which—together with the asserted contract fields (status / attach_url /
port / pid)—verifies a usable attach endpoint without a second connect.

No production change; behavior of `session start --listen` is unchanged.
Listen test now passes deterministically (3× local).

Drop the speculative global ~/.debug-agent/ path from the README: breakpoints and source snapshots reference files in the repo, so state stays project-local. Document adding .debug-agent/ to .gitignore.

In the skill, tighten Cleanup to surface `dbga sessions ls` (lists live daemons, reaps dead-pid zombies) and the gitignore note.
@niradler niradler merged commit e06f548 into main May 29, 2026
4 checks passed
@niradler niradler deleted the feat/multi-language-debug branch May 29, 2026 12:23