iSEngLab/ReProAgent

ReProAgent: A Multi-Stage Tool-Augmented Agent Framework for Reproduction Test Generation from Issue Reports

ReProAgent is an automated bug reproduction framework for SWT-bench. It uses multi-stage, tool-augmented agents to generate bug reproduction test cases from bug reports.

Overview

The system employs a multi-stage agent architecture:

  • Bug Localization via Hierarchical Analysis — Identifies suspicious code locations
  • Root Cause Analysis via Execution Path — Analyzes underlying bug causes
  • Assertion-aware Test Planning — Provides planning for test creation
  • Test Generation via Triadic Review — Creates reproduction test cases and iteratively refines them using test execution and test review results

Project Structure

.
├── src/                          # Core source code
│   ├── main.py                   # Entry point for the reproduction pipeline
│   ├── config.py                 # LLM and agent configuration
│   ├── planner.py                # Planner agent (LangGraph)
│   ├── generator.py              # Generator agent (LangGraph)
│   ├── bug_localization_agent.py
│   ├── root_cause_analysis_agent.py
│   ├── test_generation_agent.py
│   ├── test_generation_hints_agent.py
│   ├── execution_review_agent.py
│   ├── context_manager.py        # Context compression and management
│   ├── environment_service.py    # Docker-based test execution
│   ├── docker_container_manager.py
│   ├── graph_retriever.py        # Knowledge Graph query interface
│   ├── knowledge_graph/          # Code Knowledge Graph system
│   ├── tools/                    # LLM-accessible tools (grep, read_file, etc.)
│   └── swtbench/                 # SWT-bench harness integration
├── tests/                        # Test suite
├── pyproject.toml                # Project dependencies (uv)
├── pytest.ini                    # Pytest configuration
└── .gitignore

Prerequisites

  • Python 3.10+
  • uv — Python package manager
  • Docker — For containerized test execution
  • (Optional) Neo4j — For knowledge graph storage (falls back to in-memory store)

Installation

  1. Clone the repository and navigate to the project directory.

  2. Install dependencies with uv:

    uv sync

  3. Activate the virtual environment:

    source .venv/bin/activate

  4. Configure environment variables in src/config.py or via shell exports:

    export OPENAI_API_BASE="https://api.openai.com/v1"
    export OPENAI_API_KEY="sk-..."
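Before launching the pipeline it is worth confirming that both variables are actually set; a small sanity check along these lines (the helper name is ours, not part of ReProAgent's codebase):

```python
import os

def missing_llm_env(env=None):
    """Return the required LLM environment variables that are unset or empty.
    Hypothetical helper for illustration, not part of ReProAgent."""
    env = os.environ if env is None else env
    return [name for name in ("OPENAI_API_BASE", "OPENAI_API_KEY") if not env.get(name)]
```

An empty return value means both credentials are available to the pipeline.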

Running the Pipeline

Full Pipeline (Default)

Run the end-to-end bug reproduction pipeline on SWT-bench Lite:

python -m src.main

Dataset Selection

Run on SWT-bench Verified:

python -m src.main --dataset-name princeton-nlp/SWT-bench_Verified --dataset-type verified

Instance Filtering

Process specific instances by ID:

python -m src.main --instance-ids sympy__sympy-24152 django__django-17084

Process a range:

python -m src.main --start 0 --end 10

Process specific repositories:

python -m src.main --repos sympy/sympy django/django

Ablation Studies

Disable specific components:

# Disable bug localization
python -m src.main --disable-bug-localization

# Disable root cause analysis
python -m src.main --disable-root-cause-analysis

# Disable test generation hints
python -m src.main --disable-test-hints-generation

# Disable feedback iteration
python -m src.main --disable-feedback-iteration

# Multiple flags can be combined
python -m src.main --disable-bug-localization --disable-root-cause-analysis
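To sweep all single-component ablations plus the full baseline, the command lines can be generated programmatically. A sketch (the flag names come from the list above; the runner itself is an assumption):

```python
# Ablation flags documented above; one run disables one component at a time.
ABLATION_FLAGS = [
    "--disable-bug-localization",
    "--disable-root-cause-analysis",
    "--disable-test-hints-generation",
    "--disable-feedback-iteration",
]

def ablation_commands(base=("python", "-m", "src.main")):
    """Full baseline first, then one command per disabled component."""
    return [list(base)] + [list(base) + [flag] for flag in ABLATION_FLAGS]

# Each command can then be launched, e.g. with subprocess.run(cmd, check=True).
```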

Evaluation

After generating predictions, evaluate them using the SWT-bench harness:

python -m src.swtbench.run_evaluation \
    --dataset_name princeton-nlp/SWT-bench_Lite \
    --predictions_path outputs_lite/repro_agent_preds_gpt_5_mini_lite.jsonl \
    --max_workers 4 \
    --run_id repro_agent_eval
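Since the predictions file is JSON Lines (one JSON record per line), a quick sanity check before a long evaluation run is to count the records it contains. A sketch, assuming only the `.jsonl` format (the helper name is ours, not SWT-bench's):

```python
import json

def count_predictions(path):
    """Count the JSON records in a predictions .jsonl file, skipping blank lines."""
    with open(path) as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return len(records)
```

The count should match the number of instances processed by the pipeline.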

Running Tests

pytest

Configuration

Key settings in src/config.py:

Setting                           Description
OPENAI_API_BASE / OPENAI_API_KEY  LLM endpoint credentials
MAX_*_COUNT                       Iteration limits for each agent stage (default: 50)
ENABLE_STREAMING                  Toggle LLM streaming responses
RECURSION_LIMIT                   LangGraph recursion limit (default: 1024)
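Settings like these are commonly read from environment variables with in-file defaults. A hedged sketch of that pattern (the names mirror the table above, but the actual code in src/config.py may differ):

```python
import os

def load_settings(env=None):
    """Read settings from environment variables, falling back to defaults.
    Illustrative helper only; src/config.py may be structured differently."""
    env = os.environ if env is None else env
    return {
        "OPENAI_API_BASE": env.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
        "OPENAI_API_KEY": env.get("OPENAI_API_KEY", ""),
        "ENABLE_STREAMING": env.get("ENABLE_STREAMING", "1") == "1",
        "RECURSION_LIMIT": int(env.get("RECURSION_LIMIT", "1024")),
        "MAX_ITERATION_COUNT": int(env.get("MAX_ITERATION_COUNT", "50")),
    }
```

With this pattern, any setting can be overridden per run via a shell export without editing the file.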

Key Results

Pre-computed evaluation results are included in this repository:

Dataset             Report
SWT-bench Lite      outputs_gpt_5_mini/ReProAgent.repro_agent_gpt_5_mini_lite.json
SWT-bench Verified  outputs_gpt_5_mini/ReProAgent.repro_agent_gpt_5_mini_verified.json

Prediction files for re-evaluation:

Dataset             Predictions
SWT-bench Lite      outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_lite.jsonl
SWT-bench Verified  outputs_gpt_5_mini/repro_agent_preds_gpt_5_mini_verified.jsonl
