ReProAgent: A Multi-Stage Tool-Augmented Agent Framework for Reproduction Test Generation from Issue Reports
ReProAgent is an automated bug reproduction framework for SWT-bench. It uses multi-stage, tool-augmented agents to generate bug reproduction test cases from bug reports.

The system employs a multi-stage agent architecture:
- Bug Localization via Hierarchical Analysis — Identifies suspicious code locations
- Root Cause Analysis via Execution Path — Analyzes underlying bug causes
- Assertion-aware Test Planning — Produces a plan for test creation, including the assertions the test should make
- Test Generation via Triadic Review — Creates reproduction test cases and iteratively refines them based on test-execution results and test-review feedback
```
.
├── src/                          # Core source code
│   ├── main.py                   # Entry point for the reproduction pipeline
│   ├── config.py                 # LLM and agent configuration
│   ├── planner.py                # Planner agent (LangGraph)
│   ├── generator.py              # Generator agent (LangGraph)
│   ├── bug_localization_agent.py
│   ├── root_cause_analysis_agent.py
│   ├── test_generation_agent.py
│   ├── test_generation_hints_agent.py
│   ├── execution_review_agent.py
│   ├── context_manager.py        # Context compression and management
│   ├── environment_service.py    # Docker-based test execution
│   ├── docker_container_manager.py
│   ├── graph_retriever.py        # Knowledge Graph query interface
│   ├── knowledge_graph/          # Code Knowledge Graph system
│   ├── tools/                    # LLM-accessible tools (grep, read_file, etc.)
│   └── swtbench/                 # SWT-bench harness integration
├── tests/                        # Test suite
├── pyproject.toml                # Project dependencies (uv)
├── pytest.ini                    # Pytest configuration
└── .gitignore
```
- Python 3.10+
- uv — Python package manager
- Docker — For containerized test execution
- (Optional) Neo4j — For knowledge graph storage (falls back to in-memory store)
- Clone the repository and navigate to the project directory.

- Install dependencies with uv:

  ```shell
  uv sync
  ```

- Activate the virtual environment:

  ```shell
  source .venv/bin/activate
  ```

- Configure environment variables in `src/config.py` or via shell exports:

  ```shell
  export OPENAI_API_BASE="https://api.openai.com/v1"
  export OPENAI_API_KEY="sk-..."
  ```
Run the end-to-end bug reproduction pipeline on SWT-bench Lite:

```shell
python -m src.main
```

Run on SWT-bench Verified:

```shell
python -m src.main --dataset-name princeton-nlp/SWT-bench_Verified --dataset-type verified
```

Process specific instances by ID:

```shell
python -m src.main --instance-ids sympy__sympy-24152 django__django-17084
```

Process a range:

```shell
python -m src.main --start 0 --end 10
```

Process specific repositories:

```shell
python -m src.main --repos sympy/sympy django/django
```

Disable specific components:

```shell
# Disable bug localization
python -m src.main --disable-bug-localization

# Disable root cause analysis
python -m src.main --disable-root-cause-analysis

# Disable test generation hints
python -m src.main --disable-test-hints-generation

# Disable feedback iteration
python -m src.main --disable-feedback-iteration

# Multiple flags can be combined
python -m src.main --disable-bug-localization --disable-root-cause-analysis
```

After generating predictions, evaluate them using the SWT-bench harness:
```shell
python -m src.swtbench.run_evaluation \
    --dataset_name princeton-nlp/SWT-bench_Lite \
    --predictions_path outputs_lite/repro_agent_preds_gpt_5_mini_lite.jsonl \
    --max_workers 4 \
    --run_id repro_agent_eval
```

Run the test suite:

```shell
pytest
```

Key settings in `src/config.py`:
| Setting | Description |
|---|---|
| `OPENAI_API_BASE` / `OPENAI_API_KEY` | LLM endpoint credentials |
| `MAX_*_COUNT` | Iteration limits for each agent stage (default: 50) |
| `ENABLE_STREAMING` | Toggle LLM streaming responses |
| `RECURSION_LIMIT` | LangGraph recursion limit (default: 1024) |
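A common way to define settings like these is as module-level constants with environment-variable overrides. The sketch below follows that pattern using the names from the table; `MAX_ITERATION_COUNT` is one illustrative expansion of the `MAX_*_COUNT` family, and the exact contents of `src/config.py` may differ.

```python
import os

# Credentials come from the environment (see the shell exports above).
OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

# Iteration limit per agent stage (default 50, per the table above).
# Illustrative name standing in for the MAX_*_COUNT family of settings.
MAX_ITERATION_COUNT = int(os.environ.get("MAX_ITERATION_COUNT", "50"))

# Toggle LLM streaming responses.
ENABLE_STREAMING = os.environ.get("ENABLE_STREAMING", "true").lower() == "true"

# LangGraph recursion limit (default 1024, per the table above).
RECURSION_LIMIT = int(os.environ.get("RECURSION_LIMIT", "1024"))
```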
Pre-computed evaluation results are included in this repository:

| Dataset | Report |
|---|---|
| SWT-bench Lite | `outputs_gpt_5_mini/ReProAgent.repro_agent_gpt_5_mini_lite.json` |
| SWT-bench Verified | `outputs_gpt_5_mini/ReProAgent.repro_agent_gpt_5_mini_verified.json` |

Prediction files for re-evaluation:

| Dataset | Predictions |
|---|---|
| SWT-bench Lite | `outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_lite.jsonl` |
| SWT-bench Verified | `outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_verified.jsonl` |
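The prediction files are in JSON Lines format (one JSON object per line). A small sketch of loading one for inspection; the helper below is generic and makes no assumptions about the fields inside each record:

```python
import json

def load_predictions(path: str) -> list[dict]:
    """Read a .jsonl predictions file, one JSON object per non-blank line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

For example, `load_predictions("outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_lite.jsonl")` returns one dict per SWT-bench Lite instance.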