ReProAgent: A Multi-Stage Tool-Augmented Agent Framework for Reproduction Test Generation from Issue Reports
ReProAgent is an automated bug reproduction framework for SWT-bench. It uses multi-stage, tool-augmented agents to generate bug reproduction test cases from bug reports.

The system employs a multi-stage agent architecture:
- Bug Localization via Hierarchical Analysis — Identifies suspicious code locations
- Root Cause Analysis via Execution Path — Analyzes underlying bug causes
- Assertion-aware Test Planning — Produces a plan for test creation, including the assertions the test should make
- Test Generation via Triadic Review — Creates reproduction test cases and iteratively refines them based on test-execution results and test-review feedback
```
.
├── src/                          # Core source code
│   ├── main.py                   # Entry point for the reproduction pipeline
│   ├── config.py                 # LLM and agent configuration
│   ├── planner.py                # Planner agent (LangGraph)
│   ├── generator.py              # Generator agent (LangGraph)
│   ├── bug_localization_agent.py
│   ├── root_cause_analysis_agent.py
│   ├── test_generation_agent.py
│   ├── test_generation_hints_agent.py
│   ├── execution_review_agent.py
│   ├── context_manager.py        # Context compression and management
│   ├── environment_service.py    # Docker-based test execution
│   ├── docker_container_manager.py
│   ├── graph_retriever.py        # Knowledge Graph query interface
│   ├── knowledge_graph/          # Code Knowledge Graph system
│   ├── tools/                    # LLM-accessible tools (grep, read_file, etc.)
│   └── swtbench/                 # SWT-bench harness integration
├── tests/                        # Test suite
├── pyproject.toml                # Project dependencies (uv)
├── pytest.ini                    # Pytest configuration
└── .gitignore
```
- Python 3.10+
- uv — Python package manager
- Docker — For containerized test execution
- (Optional) Neo4j — For knowledge graph storage (falls back to in-memory store)
- Clone the repository and navigate to the project directory.

- Install dependencies with uv:

  ```shell
  uv sync
  ```

- Activate the virtual environment:

  ```shell
  source .venv/bin/activate
  ```

- Configure environment variables in `src/config.py` or via shell exports:

  ```shell
  export OPENAI_API_BASE="https://api.openai.com/v1"
  export OPENAI_API_KEY="sk-..."
  ```
Run the end-to-end bug reproduction pipeline on SWT-bench Lite:

```shell
python -m src.main
```

Run on SWT-bench Verified:

```shell
python -m src.main --dataset-name princeton-nlp/SWT-bench_Verified --dataset-type verified
```

Process specific instances by ID:

```shell
python -m src.main --instance-ids sympy__sympy-24152 django__django-17084
```

Process a range:

```shell
python -m src.main --start 0 --end 10
```

Process specific repositories:

```shell
python -m src.main --repos sympy/sympy django/django
```

Disable specific components:

```shell
# Disable bug localization
python -m src.main --disable-bug-localization

# Disable root cause analysis
python -m src.main --disable-root-cause-analysis

# Disable test generation hints
python -m src.main --disable-test-hints-generation

# Disable feedback iteration
python -m src.main --disable-feedback-iteration

# Multiple flags can be combined
python -m src.main --disable-bug-localization --disable-root-cause-analysis
```

After generating predictions, evaluate them using the SWT-bench harness:
```shell
python -m src.swtbench.run_evaluation \
    --dataset_name princeton-nlp/SWT-bench_Lite \
    --predictions_path outputs_lite/repro_agent_preds_gpt_5_mini_lite.jsonl \
    --max_workers 4 \
    --run_id repro_agent_eval
```

Run the test suite:

```shell
pytest
```

Key settings in `src/config.py`:
| Setting | Description |
|---|---|
| `OPENAI_API_BASE` / `OPENAI_API_KEY` | LLM endpoint credentials |
| `MAX_*_COUNT` | Iteration limits for each agent stage (default: 50) |
| `ENABLE_STREAMING` | Toggle LLM streaming responses |
| `RECURSION_LIMIT` | LangGraph recursion limit (default: 1024) |
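A common way to define settings like these is as module-level constants with environment-variable overrides. The sketch below follows that pattern using the names from the table; `MAX_ITERATION_COUNT` is one illustrative expansion of the `MAX_*_COUNT` family, and the exact contents of `src/config.py` may differ.

```python
import os

# Credentials come from the environment (see the shell exports above).
OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

# Iteration limit per agent stage (default 50, per the table above).
# Illustrative name standing in for the MAX_*_COUNT family of settings.
MAX_ITERATION_COUNT = int(os.environ.get("MAX_ITERATION_COUNT", "50"))

# Toggle LLM streaming responses.
ENABLE_STREAMING = os.environ.get("ENABLE_STREAMING", "true").lower() == "true"

# LangGraph recursion limit (default 1024, per the table above).
RECURSION_LIMIT = int(os.environ.get("RECURSION_LIMIT", "1024"))
```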
Pre-computed evaluation results are included in this repository:

| Dataset | Report |
|---|---|
| SWT-bench Lite | `outputs_gpt_5_mini/ReProAgent.repro_agent_gpt_5_mini_lite.json` |
| SWT-bench Verified | `outputs_gpt_5_mini/ReProAgent.repro_agent_gpt_5_mini_verified.json` |

Prediction files for re-evaluation:

| Dataset | Predictions |
|---|---|
| SWT-bench Lite | `outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_lite.jsonl` |
| SWT-bench Verified | `outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_verified.jsonl` |
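The prediction files are in JSON Lines format (one JSON object per line). A small sketch of loading one for inspection; the helper below is generic and makes no assumptions about the fields inside each record:

```python
import json

def load_predictions(path: str) -> list[dict]:
    """Read a .jsonl predictions file, one JSON object per non-blank line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

For example, `load_predictions("outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_lite.jsonl")` returns one dict per SWT-bench Lite instance.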