feat: add debug-pytest subcommand #2

Open
mvanhorn wants to merge 1 commit into illscience:main from mvanhorn:feat/debug-pytest

@mvanhorn mvanhorn commented May 9, 2026

Summary

vibe-debug debug-pytest <test_id> runs a pytest test under debugpy and stops at the failing assertion frame, returning locals + traceback in the same shape as debug-python. The path most agent debugging actually takes - "this test is failing, why?" - now works without needing the agent to know the file/line of the assertion ahead of time.

Why this matters

Today debug-python covers python script.py, but pytest's collection / fixture / conftest machinery doesn't fit through that surface. Two adjacent agent-debugger projects ship a pytest path with different shapes:

Microsoft's debug-gym paper (arXiv 2503.21557), which mcp-pdb cites as inspiration, used pytest as the harness for measuring agent debugger gains: Claude 3.7 Sonnet went from 37.2% on SWE-bench Lite without a debugger, to 48.4% with one, to 52.1% with debug(5). SWE-bench is mechanically "make this failing pytest pass." Without a pytest mode, vibe-debug isn't runnable against that benchmark.

Demo

[Image: debug-pytest demo]

The included examples/test_buggy_discount.py asserts apply_discount(120.0, "gold") == 102.0 against the existing buggy examples/buggy_discount.py. debug-pytest runs it, stops at the AssertionError frame, and dumps:

Pytest: test_buggy_discount.py::test_gold
Outcome: failed
Exception: AssertionError: assert 119.85 == 102.0
Stopped: test_buggy_discount.py:8 in test_gold
Locals:
  actual = 119.85
  loyalty_level = 'gold'
  price = 120.0

The agent gets the full state at the failure point in one round trip - no need to read the traceback, infer where to break, and re-run.

Changes

  • cli.py: new debug-pytest subcommand parallel to debug-python with --pytest-arg, --break, --break-on-failure/--no-break-on-failure (default ON), --eval, --cwd, --python, --timeout, --locals-limit, --json. New _debug_pytest_payload mirrors _debug_python_payload's shape and adds a pytest block (rootdir, test_id, pytest_args, outcome, duration_seconds). New _continue_pytest_execution skips exception stops in _pytest, pluggy, site-packages, and anything outside the test's working tree so the debugger lands at the user's failing assertion, not pytest's internal exception bookkeeping.
  • session.py: new DebugSession.launch_pytest spawns python -m debugpy --listen --wait-for-client -m pytest <test_id> and attaches via DAPClient. New set_exception_breakpoints and exception_info wrap the corresponding DAP requests. The launcher uses the user's --python (or default) so pytest must be installed in the target's environment - vibe-debug does NOT add pytest to its own dependencies.
  • mcp_server.py: new debug_pytest MCP tool with the same input shape. debug_launch and debug_attach schemas gain optional exception_filters for symmetry with debug_pytest's breakpoint-on-failure behavior.
  • examples/test_buggy_discount.py: a 7-line failing pytest against the existing buggy_discount.py. Used by runtime_proof.py to exercise the new path.
  • tools/runtime_proof.py: extended to call debug_pytest and assert the failed AssertionError stop. The proved[] list grows from 14 to 15 entries.
  • tests/test_cli.py: new tests for debug-pytest happy path (failing test → AssertionError stop), parser validation, and --no-break-on-failure mode.
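
The frame-skipping rule described for _continue_pytest_execution can be sketched as a path predicate. This is a minimal illustration, not the code in this PR: the helper name is_user_frame, the SKIPPED_PARTS set, and the choice to match on path components are all assumptions.

```python
from pathlib import Path

# Directory names whose frames are treated as framework internals (assumed set,
# taken from the list in the cli.py bullet above).
SKIPPED_PARTS = {"_pytest", "pluggy", "site-packages"}


def is_user_frame(source_path: str, working_tree: str) -> bool:
    """Return True if an exception stop at source_path should be reported.

    Frames inside _pytest, pluggy, or site-packages, and frames outside the
    test's working tree, are skipped so the caller can continue execution.
    """
    path = Path(source_path).resolve()
    root = Path(working_tree).resolve()
    if SKIPPED_PARTS & set(path.parts):
        return False
    # Path.is_relative_to requires Python 3.9 or newer.
    return path.is_relative_to(root)
```

A continuation loop would call a predicate like this on the top frame of each exception stop and issue a DAP continue request whenever it returns False.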

Testing

18 tests pass (python -m unittest discover -s tests -v). New tests cover:

  • tests/test_cli.py::test_debug_pytest_stops_at_failing_assertion - end-to-end run on the included example
  • tests/test_cli.py::test_debug_pytest_no_break_on_failure_runs_to_completion - opt-out behavior
  • tests/test_cli.py::test_debug_pytest_passes_through_pytest_args - --pytest-arg "-k pattern" shlex parsing

vibe-debug doctor passes. python tools/runtime_proof.py proves debug_pytest end-to-end.
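
As a rough illustration of the --pytest-arg handling, each raw value can be split with shlex.split and appended to the debugpy launch command. The function name build_launch_argv, the 127.0.0.1 host, the port value, and the placement of extra arguments after the test id are assumptions for this sketch; the description above does not state which listen address the launcher uses.

```python
import shlex
import sys
from typing import List, Sequence


def build_launch_argv(
    test_id: str,
    pytest_args: Sequence[str],
    python: str = sys.executable,
    port: int = 5678,  # assumed port; the real launcher may pick one dynamically
) -> List[str]:
    """Build an argv that runs pytest under debugpy and waits for a client."""
    extra: List[str] = []
    for raw in pytest_args:
        # "-k pattern" becomes ["-k", "pattern"]; quoting is honored by shlex.
        extra.extend(shlex.split(raw))
    return [
        python, "-m", "debugpy",
        "--listen", f"127.0.0.1:{port}",
        "--wait-for-client",
        "-m", "pytest", test_id,
        *extra,
    ]
```

For example, build_launch_argv("tests/test_foo.py", ["-k pattern"]) ends with "-m", "pytest", "tests/test_foo.py", "-k", "pattern".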

Notes

This is the second cold PR in two days (the first is #1, --log for non-pausing logpoints). Both touch independent seams (logpoints adds to set_breakpoints; this adds a new subcommand and launch_pytest), so they can land in either order or independently.

The --break-on-failure machinery (set_exception_breakpoints + exception_info in session.py) is the same building block plan 019 (general --break-on-exception flag for debug-python) would use - this PR adds the infrastructure but only exposes it via pytest's failure mode. Generalizing it to debug-python --break-on-exception {raised,uncaught,all} would be a small follow-up if you want it.
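
To make the follow-up concrete, here is a sketch of how a {raised,uncaught,all} choice could map onto a DAP setExceptionBreakpoints request. The mapping table and both function names are hypothetical and are not part of this PR; the "raised" and "uncaught" filter ids are the ones debugpy advertises, and the Content-Length framing follows the DAP base protocol.

```python
import json
from typing import Any, Dict

# Hypothetical mapping from a --break-on-exception value to debugpy filter ids.
MODE_TO_FILTERS = {
    "raised": ["raised"],
    "uncaught": ["uncaught"],
    "all": ["raised", "uncaught"],
}


def encode_dap_request(seq: int, command: str, arguments: Dict[str, Any]) -> bytes:
    """Frame a DAP request as 'Content-Length: N', a blank line, then JSON."""
    body = json.dumps(
        {"seq": seq, "type": "request", "command": command, "arguments": arguments}
    ).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def exception_breakpoints_request(seq: int, mode: str) -> bytes:
    """Build the setExceptionBreakpoints request for the chosen mode."""
    return encode_dap_request(
        seq, "setExceptionBreakpoints", {"filters": MODE_TO_FILTERS[mode]}
    )
```

This sketch does not show how the stop is narrowed to AssertionError for the pytest case; that part lives in the session and continuation code added by this PR.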

Happy to convert this to a Discussion or close if the surface (subcommand vs flag, default break mode, pytest-not-as-dep) doesn't match how you'd want it shaped.

Run a pytest test under debugpy and stop at the failing assertion frame
with locals + traceback captured. Same flag surface as debug-python plus:

  vibe-debug debug-pytest tests/test_foo.py::test_bar
  vibe-debug debug-pytest tests/test_foo.py --pytest-arg '-k pattern'
  vibe-debug debug-pytest tests/test_foo.py --break tests/test_foo.py:42

--break-on-failure (default ON) installs an AssertionError exception
breakpoint via DAP setExceptionBreakpoints. A continuation loop skips
exceptions raised inside _pytest, pluggy, and site-packages so the
debugger lands at the user's failing assertion.

Pytest is NOT added as a runtime dependency. The launcher invokes
'python -m pytest' under debugpy in the target's venv (uses --python
or the existing default), so vibe-debug stays a debugger-of-anything,
not a pytest-aware tool.

Output payload extends debug-python's shape with a 'pytest' block
containing rootdir, test_id(s), pytest_args, outcome (passed/failed/
stopped), and duration_seconds. An 'exception' block with name +
message + stackTrace appears when the run stops on an exception.

DAP plumbing additions in session.py:
  - DebugSession.launch_pytest spawns 'python -m debugpy --listen
    --wait-for-client -m pytest <test_id>' and attaches via DAPClient
  - DebugSession.set_exception_breakpoints wraps DAP request
  - DebugSession.exception_info wraps DAP request and normalizes the
    response shape

MCP parity: new debug_pytest tool with the same input shape, plus
exception_filters added to debug_launch / debug_attach for symmetry.
runtime_proof exercises the new path so CI-style proof stays green.