feat: add debug-pytest subcommand#2
Open
mvanhorn wants to merge 1 commit intoillscience:mainfrom
Open
Conversation
Run a pytest test under debugpy and stop at the failing assertion frame
with locals + traceback captured. Same flag surface as debug-python plus:
vibe-debug debug-pytest tests/test_foo.py::test_bar
vibe-debug debug-pytest tests/test_foo.py --pytest-arg '-k pattern'
vibe-debug debug-pytest tests/test_foo.py --break tests/test_foo.py:42
--break-on-failure (default ON) installs an AssertionError exception
breakpoint via DAP setExceptionBreakpoints. A continuation loop skips
exceptions raised inside _pytest, pluggy, and site-packages so the
debugger lands at the user's failing assertion.
Pytest is NOT added as a runtime dependency. The launcher invokes
'python -m pytest' under debugpy in the target's venv (uses --python
or the existing default), so vibe-debug stays a debugger-of-anything,
not a pytest-aware tool.
Output payload extends debug-python's shape with a 'pytest' block
containing rootdir, test_id(s), pytest_args, outcome (passed/failed/
stopped), and duration_seconds. An 'exception' block with name +
message + stackTrace appears when the run stops on an exception.
DAP plumbing additions in session.py:
- DebugSession.launch_pytest spawns 'python -m debugpy --listen
--wait-for-client -m pytest <test_id>' and attaches via DAPClient
- DebugSession.set_exception_breakpoints wraps DAP request
- DebugSession.exception_info wraps DAP request and normalizes the
response shape
MCP parity: new debug_pytest tool with the same input shape, plus
exception_filters added to debug_launch / debug_attach for symmetry.
runtime_proof exercises the new path so CI-style proof stays green.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
vibe-debug debug-pytest <test_id>runs a pytest test under debugpy and stops at the failing assertion frame, returning locals + traceback in the same shape asdebug-python. The path most agent debugging actually takes - "this test is failing, why?" - now works without needing the agent to know the file/line of the assertion ahead of time.Why this matters
Today
debug-pythoncoverspython script.py, but pytest's collection / fixture / conftest machinery doesn't fit through that surface. Two adjacent agent-debugger projects ship a pytest path with different shapes:start_debug(file_path, use_pytest=True)as a parameter on its main MCP toolpytest JSONreports for machine-readable failuresMicrosoft's debug-gym paper (arXiv 2503.21557), which mcp-pdb cites as inspiration, used pytest as the harness for measuring agent debugger gains: Claude 3.7 Sonnet went from 37.2% on SWE-bench Lite without a debugger, to 48.4% with one, to 52.1% with
debug(5). SWE-bench is mechanically "make this failing pytest pass." Without a pytest mode, vibe-debug isn't runnable against that benchmark.Demo
The included
examples/test_buggy_discount.pyassertsapply_discount(120.0, "gold") == 102.0against the existing buggyexamples/buggy_discount.py.debug-pytestruns it, stops at the AssertionError frame, and dumps:The agent gets the full state at the failure point in one round trip - no need to read the traceback, infer where to break, and re-run.
Changes
cli.py: newdebug-pytestsubcommand parallel todebug-pythonwith--pytest-arg,--break,--break-on-failure/--no-break-on-failure(default ON),--eval,--cwd,--python,--timeout,--locals-limit,--json. New_debug_pytest_payloadmirrors_debug_python_payload's shape and adds apytestblock (rootdir, test_id, pytest_args, outcome, duration_seconds). New_continue_pytest_executionskips exception stops in_pytest, pluggy, site-packages, and anything outside the test's working tree so the debugger lands at the user's failing assertion, not pytest's internal exception bookkeeping.session.py: newDebugSession.launch_pytestspawnspython -m debugpy --listen --wait-for-client -m pytest <test_id>and attaches via DAPClient. Newset_exception_breakpointsandexception_infowrap the corresponding DAP requests. The launcher uses the user's--python(or default) so pytest must be installed in the target's environment - vibe-debug does NOT add pytest to its own dependencies.mcp_server.py: newdebug_pytestMCP tool with the same input shape.debug_launchanddebug_attachschemas gain optionalexception_filtersfor symmetry withdebug_pytest's breakpoint-on-failure behavior.examples/test_buggy_discount.py: a 7-line failing pytest against the existingbuggy_discount.py. Used byruntime_proof.pyto exercise the new path.tools/runtime_proof.py: extended to calldebug_pytestand assert the failed AssertionError stop. The proved[] list grows from 14 to 15 entries.tests/test_cli.py: new tests fordebug-pytesthappy path (failing test → AssertionError stop), parser validation, and--no-break-on-failuremode.Testing
18 tests pass (
python -m unittest discover -s tests -v). New tests cover:tests/test_cli.py::test_debug_pytest_stops_at_failing_assertion- end-to-end run on the included exampletests/test_cli.py::test_debug_pytest_no_break_on_failure_runs_to_completion- opt-out behaviortests/test_cli.py::test_debug_pytest_passes_through_pytest_args---pytest-arg "-k pattern"shlex parsingvibe-debug doctorpasses.python tools/runtime_proof.pyproves debug_pytest end-to-end.Notes
This is the second cold PR in two days (the first is #1,
--logfor non-pausing logpoints). Both touch independent seams (logpoints adds toset_breakpoints; this adds a new subcommand andlaunch_pytest), so they can land in either order or independently.The
--break-on-failuremachinery (set_exception_breakpoints+exception_infoin session.py) is the same building block plan 019 (general--break-on-exceptionflag fordebug-python) would use - this PR adds the infrastructure but only exposes it via pytest's failure mode. Generalizing it todebug-python --break-on-exception {raised,uncaught,all}would be a small follow-up if you want it.Happy to convert this to a Discussion or close if the surface (subcommand vs flag, default break mode, pytest-not-as-dep) doesn't match how you'd want it shaped.