JoyCode is an end-to-end LLM-powered pipeline for fixing real-world open-source software issues. It generates patches, creates and verifies tests, and employs intelligent retry mechanisms to achieve high success rates on the SWE-bench dataset.
Project Status: JoyCode has achieved a 74.6% resolution rate on the SWE-bench Verified split, demonstrating near-state-of-the-art performance in automated software engineering.
Key Innovation: Our pipeline combines patch generation with intelligent test creation and failure attribution, enabling robust automated code repair with comprehensive validation and smart retry mechanisms.
- 74.6% Success Rate on SWE-bench Verified, ranking 2nd globally
- 30-50% Lower Resource Consumption than top competitors
- Exceptional cost-performance ratio with near state-of-the-art results
- Smart Test Generation: Automatic Fail2Pass and Pass2Pass test creation with pre-validation
- Collaborative Verification: Patches and tests generated together for comprehensive validation
- Closed-loop Iteration: "Generate → Validate → Refine" cycle replacing one-shot approaches (sketched after this list)
- Root Cause Analysis: Precise failure attribution to patch vs. test issues
- Targeted Retry Strategy: Experience-driven retries based on failure analysis
- CSR-Powered Learning: Historical success pattern retrieval for optimization
- Specialized Agents: Testing, Patch, CSR, and Decision agents with distinct roles
- ReAct-based Workflow: "Observe-Think-Act" loop mimicking human developers
- Smart Decision Making: LLM-powered voting for optimal patch selection
- Token-Efficient Design: Targeted LLM calls avoiding wasteful parallel sampling
- Early Failure Detection: Pre-validation to filter invalid paths
- Quality-First Generation: Fewer, higher-quality patches over massive sampling
- Containerized Execution: Isolated Docker environments with SWE-bench images
- Repository-Level Understanding: Multi-file coordination and cross-module reasoning
- Comprehensive Logging: Full trajectory recording with optional compression
- Multi-LLM Support: Flexible model configuration for different pipeline stages
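
To make the closed-loop design concrete, here is a minimal illustrative sketch of the "Generate → Validate → Refine" cycle and the Fail2Pass/Pass2Pass checks. Every helper here (`generate_patch`, `generate_tests`, `run_tests`, `attribute_failure`) is a hypothetical placeholder, not JoyCode's actual API:

```python
# Illustrative sketch only: all helpers below are hypothetical
# placeholders, not JoyCode's real implementation.

def closed_loop_repair(issue, repo, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        patch = generate_patch(issue, repo, feedback)  # Generate
        tests = generate_tests(issue, repo)

        # Fail2Pass: must fail on the original code, pass on the patched code.
        fail2pass_ok = (not run_tests(repo, tests.fail2pass)
                        and run_tests(repo.apply(patch), tests.fail2pass))
        # Pass2Pass: must keep passing, guarding against regressions.
        pass2pass_ok = run_tests(repo.apply(patch), tests.pass2pass)

        if fail2pass_ok and pass2pass_ok:              # Validate
            return patch

        # Attribute the failure to the patch or to the tests,
        # and feed that analysis into the next retry.  # Refine
        feedback = attribute_failure(patch, tests, repo)
    return None
```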
- Python 3.11+
- Docker with access to `docker.1ms.run`
- LLM API keys (OpenAI, Anthropic, etc.)
```bash
# Clone repository
git clone https://github.com/jd-opensource/joycode-agent.git
cd joycode-agent

# Create conda environment
conda create -n joycode python=3.11
conda activate joycode

# Install dependencies
pip install -r requirements.txt
```
Configure your models in `llm_server/model_config.json`:
```json
{
  "patch_generation": {
    "api_key": "your_api_key_here",
    "base_url": "https://api.openai.com/v1",
    "model_name": "gpt-5",
    "max_tokens": 4000,
    "temperature": 1
  }
}
```
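
Additional pipeline stages can be given their own entries alongside `patch_generation`. A minimal sketch of reading one stage's settings, assuming only the structure shown above (the helper name is hypothetical):

```python
import json

def load_stage_config(stage: str, path: str = "llm_server/model_config.json") -> dict:
    """Hypothetical helper: return the model settings for one pipeline stage."""
    with open(path) as f:
        return json.load(f)[stage]

cfg = load_stage_config("patch_generation")
print(cfg["model_name"], cfg["max_tokens"])  # e.g. gpt-5 4000
```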
Ensure Docker is running and you can access the registry:
```bash
# Test Docker connectivity
docker pull docker.1ms.run/swebench/sweb.eval.x86_64.django__django-11099:latest
```
Specify the instances to process in `instance_id.txt`:
```text
django__django-11099
matplotlib__matplotlib-23562
sympy__sympy-18189
...
```
```bash
# Run with default settings
python run_patch_pipeline.py --num-processes 1 --enable-post-processing

# Simple mode (patch only, no tests)
python run_patch_pipeline.py --simple-mode

# Single instance processing
python run_patch_pipeline.py --problem-id django__django-11099 --num-processes 1

# Batch processing with limits
python run_patch_pipeline.py --num-examples 10 --num-processes 4

# Disable test generation
python run_patch_pipeline.py --no-generate-tests --no-validate-with-tests

# Custom configuration
python run_patch_pipeline.py --enable-post-processing --num-processes 2
```
| Option | Description | Default |
|---|---|---|
| `--num-processes` | Number of parallel processes | 1 |
| `--enable-post-processing` | Enable trajectory compression and retries | False |
| `--simple-mode` | Patch generation only | False |
| `--problem-id` | Process a single instance | None |
| `--num-examples` | Limit number of instances | All |
| `--no-generate-tests` | Skip test generation | False |
| `--no-validate-with-tests` | Skip test validation | False |
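
The flags above map naturally onto a standard `argparse` declaration; the sketch below mirrors the table's names and defaults and is not the actual `run_patch_pipeline.py` source:

```python
import argparse

# Sketch mirroring the documented flags; not the real CLI definition.
parser = argparse.ArgumentParser()
parser.add_argument("--num-processes", type=int, default=1,
                    help="Number of parallel processes")
parser.add_argument("--enable-post-processing", action="store_true",
                    help="Enable trajectory compression and retries")
parser.add_argument("--simple-mode", action="store_true",
                    help="Patch generation only")
parser.add_argument("--problem-id", default=None,
                    help="Process a single instance")
parser.add_argument("--num-examples", type=int, default=None,
                    help="Limit number of instances (default: all)")
parser.add_argument("--no-generate-tests", action="store_true",
                    help="Skip test generation")
parser.add_argument("--no-validate-with-tests", action="store_true",
                    help="Skip test validation")
args = parser.parse_args()
```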
Compare multiple patch candidates and select the best one:
```bash
# Prepare voting inputs
python scripts/prepare_voting_data.py

# Run voting
python vote.py
```
Input Requirements:
- `patch_1.json`: Primary patch candidates
- `patch_2.json`: Alternative patch candidates
- `test-00000-of-00001.parquet`: Instance metadata with problem statements
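
Conceptually, voting asks an LLM to judge which candidate better resolves the problem statement. A minimal sketch of that idea, assuming an OpenAI-compatible client; the prompt wording and model name are placeholders, not `vote.py`'s actual logic:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def vote(problem_statement: str, patch_a: str, patch_b: str) -> str:
    """Ask the model to pick the better of two candidate patches."""
    prompt = (
        f"Problem statement:\n{problem_statement}\n\n"
        f"Patch A:\n{patch_a}\n\nPatch B:\n{patch_b}\n\n"
        "Which patch better resolves the problem? Reply with 'A' or 'B'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; configure as in model_config.json
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```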
- Container Setup: Pull and start SWE-bench Docker images
- Test Generation (optional): Create and validate tests on original code
- Agent Execution: Generate patches using LLM agents via `cli.py`
- Validation: Run tests and evaluate patch quality
- Post-processing (optional): Trajectory compression, similarity matching, intelligent retries
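
Composed end to end, the five stages look roughly like the sketch below; every helper is a hypothetical placeholder standing in for the corresponding stage:

```python
# Illustrative stage composition; not the real orchestration code.
def process_instance(instance_id, generate_tests=True, post_process=False):
    container = pull_and_start_image(instance_id)        # 1. Container Setup
    tests = (make_and_prevalidate_tests(container, instance_id)
             if generate_tests else None)                # 2. Test Generation
    patch = run_agent_cli(container, instance_id)        # 3. Agent Execution
    result = evaluate_patch(container, patch, tests)     # 4. Validation
    if post_process and not result.resolved:             # 5. Post-processing
        result = retry_with_compressed_trajectory(container, patch, result)
    return result
```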
```text
output_files/
├── <instance_id>/
│   ├── predictions.json               # Generated patch and metadata
│   ├── agent_logs.txt                 # Main agent execution logs
│   ├── test_generation_result.json    # Test generation results
│   ├── test_generation_logs.txt       # Test generation logs
│   ├── agent_logs_retry.txt           # Retry attempt logs
│   └── compressed_trajectory.txt      # Compressed execution trajectory
├── successful_cases.txt               # Summary of successful instances
├── failed_cases.txt                   # Summary of failed instances
├── empty_diff_cases.txt               # Cases with no generated patches
└── similar_case_matches_summary.json  # Similar case analysis
```
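
The per-instance files are easy to post-process; for example, a small script can collect every `predictions.json` under `output_files/` (the internal keys of those files are not documented here, so this sketch only gathers them):

```python
import json
from pathlib import Path

# Collect per-instance prediction files from the layout shown above.
output_root = Path("output_files")
predictions = {}
for pred_file in output_root.glob("*/predictions.json"):
    predictions[pred_file.parent.name] = json.loads(pred_file.read_text())

print(f"Loaded predictions for {len(predictions)} instances")
```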
```text
Submission summary for 20250909_JoyCode on SWE-bench verified split
==================================================
Resolved 373 instances (74.6%)
==================================================
Resolved by Repository:
- astropy/astropy: 13/22 (59.09%)
- django/django: 178/231 (77.06%)
- matplotlib/matplotlib: 25/34 (73.53%)
- mwaskom/seaborn: 1/2 (50.0%)
- pallets/flask: 1/1 (100.0%)
- psf/requests: 3/8 (37.5%)
- pydata/xarray: 19/22 (86.36%)
- pylint-dev/pylint: 2/10 (20.0%)
- pytest-dev/pytest: 17/19 (89.47%)
- scikit-learn/scikit-learn: 28/32 (87.5%)
- sphinx-doc/sphinx: 29/44 (65.91%)
- sympy/sympy: 57/75 (76.0%)
```
```text
joycode/
├── run_patch_pipeline.py                # Main entry point
├── cli.py                               # Core agent CLI
├── test_case_generator/                 # Test generation logic
├── test/                                # Test execution and validation
├── utils/docker_utils.py                # Container management
├── llm_server/                          # LLM integration layer
├── princeton-nlp___swe-bench_verified/  # Local SWE-bench dataset
└── vote.py                              # Patch voting system
```
Docker Issues:
```bash
# Check Docker connectivity
docker info
docker pull docker.1ms.run/hello-world
```
LLM Configuration:
```bash
# Verify model config
python -c "import json; print(json.load(open('llm_server/model_config.json')))"
```
Memory Issues:
```bash
# Reduce parallel processes
python run_patch_pipeline.py --num-processes 1
```
This project is licensed under the MIT License - see the LICENSE file for details.
- SWE-bench for providing the benchmark dataset