# Tutorial: Using EvolveAdapter with OpenEvolve Projects

This tutorial demonstrates how to adapt an existing OpenEvolve project to work with GEPA's `EvolveAdapter`. The key change required is modifying your `evaluate` function to accept a **batch of training data instances** and return a **list of evaluation results** (one per instance), rather than returning a single result.

> **Note:** This tutorial is also available as a standalone Python script: `tutorial_evolve_adapter.py`. You can run it directly instead of using this notebook.

## Why This Change?

GEPA's optimization engine works with batches of data to:
- Provide per-instance feedback for better program refinement
- Support minibatch-based optimization strategies

## Example Project: Function Minimization

We'll use the **function minimization** example from OpenEvolve: [examples/function_minimization](https://github.com/algorithmicsuperintelligence/openevolve/tree/main/examples/function_minimization).

## Step 1: Key Changes Required

Here are the **three main changes** you need to make to your `evaluate` function:

### Change 1: Function Signature
**Before:**
```python
def evaluate(program_path: str) -> EvaluationResult:
```

**After:**
```python
def evaluate(program_path: str, batch: list) -> list[EvaluationResult]:
```
- Add `batch` parameter
- Change return type from `EvaluationResult` to `list[EvaluationResult]`

### Change 2: Loop Over Batch Items
**Before:** The function runs the program multiple times internally and aggregates results.

**After:** Loop over each batch item and evaluate the program for each one:
```python
results = []
for batch_item in batch:
    # Evaluate program for this specific batch item
    # ... evaluation logic ...
    results.append(EvaluationResult(...))  # One result per batch item
return results
```
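
A minimal, self-contained sketch of this loop follows. `Result` is a hypothetical stand-in for `EvaluationResult` so the snippet runs without OpenEvolve installed, and `score_item` is an invented placeholder for the real evaluation logic. It illustrates that a failure on one item is caught and recorded, so the output always has one entry per batch item.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical stand-in for openevolve's EvaluationResult."""
    metrics: dict
    artifacts: dict = field(default_factory=dict)


def score_item(batch_item: dict) -> float:
    """Placeholder scoring logic (an assumption, not the tutorial's evaluator)."""
    target = batch_item["global_min_value"]  # raises KeyError if the key is missing
    return 1.0 / (1.0 + abs(target))


def evaluate_batch(batch: list) -> list:
    results = []
    for batch_item in batch:
        try:
            score = score_item(batch_item)
            results.append(Result(metrics={"combined_score": score}))
        except Exception as exc:
            # Record the failure for this item only and keep processing the rest.
            results.append(Result(
                metrics={"combined_score": 0.0, "error": 1.0},
                artifacts={"error": repr(exc)},
            ))
    return results


# The second item is deliberately malformed to exercise the error path.
demo_results = evaluate_batch([{"global_min_value": 0.0}, {}])
```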

### Change 3: Extract Parameters from Batch Items
**Before:** Hard-coded problem parameters (e.g., `GLOBAL_MIN_X = -1.704`)

**After:** Extract parameters from each batch item:
```python
GLOBAL_MIN_X = batch_item.get("global_min_x", -1.704)
GLOBAL_MIN_Y = batch_item.get("global_min_y", 0.678)
# ... etc
```
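
One way to keep this extraction tidy is to gather it in a small helper. The sketch below is only a suggestion for organizing the code: `ProblemSpec` and `parse_problem` are hypothetical names, and the defaults mirror the constants from the original example.

```python
from typing import NamedTuple


class ProblemSpec(NamedTuple):
    global_min_x: float
    global_min_y: float
    global_min_value: float
    bounds: tuple


def parse_problem(batch_item: dict) -> ProblemSpec:
    """Read problem parameters from a batch item, falling back to the original constants."""
    return ProblemSpec(
        global_min_x=float(batch_item.get("global_min_x", -1.704)),
        global_min_y=float(batch_item.get("global_min_y", 0.678)),
        global_min_value=float(batch_item.get("global_min_value", -1.519)),
        bounds=tuple(batch_item.get("bounds", (-5, 5))),
    )


# Keys that are missing from the batch item fall back to the defaults.
spec = parse_problem({"global_min_x": 0.0, "bounds": [-3, 3]})
```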

## Complete Modified Evaluate Function

Here's the complete modified `evaluate` function with all changes applied. In this function minimization example, each batch item represents a different minimization problem:

In [5]:
# Modified evaluator.py for EvolveAdapter
# For this example, each batch item represents a different function minimization problem
# 
# KEY CHANGES MADE:
# 1. Added 'batch: list' parameter to function signature
# 2. Changed return type to 'list[EvaluationResult]'
# 3. Added loop: 'for batch_item in batch:'
# 4. Extract parameters from batch_item instead of hard-coding
# 5. Return list of results instead of single result

import importlib.util
import numpy as np
import time
import concurrent.futures
import traceback
from openevolve.evaluation_result import EvaluationResult


def run_with_timeout(func, args=(), kwargs=None, timeout_seconds=5):
    """
    Run a function with a timeout using concurrent.futures

    Args:
        func: Function to run
        args: Arguments to pass to the function
        kwargs: Keyword arguments to pass to the function
        timeout_seconds: Timeout in seconds

    Returns:
        Result of the function or raises TimeoutError
    """
    # Use None as the default to avoid sharing one mutable dict across calls
    kwargs = kwargs or {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(func, *args, **kwargs)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            raise TimeoutError(f"Function timed out after {timeout_seconds} seconds")


def safe_float(value):
    """Convert a value to float safely."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


# CHANGE 1: Added 'batch' parameter, changed return type to list
def evaluate(program_path: str, batch: list) -> list[EvaluationResult]:
    """
    Evaluate the program on a batch of function minimization problems.
    
    Args:
        program_path: Path to the program file to evaluate
        batch: List of dicts, each containing:
            - 'global_min_x': Target x coordinate
            - 'global_min_y': Target y coordinate  
            - 'global_min_value': Target function value
            - 'bounds': Tuple of (min, max) bounds for search space
            - 'function_name': Optional name for the function
    
    Returns:
        List of EvaluationResult objects, one per batch item
    """
    # CHANGE 2: Initialize results list to collect one result per batch item
    results = []
    
    # Load the program once (shared across all batch items)
    try:
        spec = importlib.util.spec_from_file_location("program", program_path)
        program = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(program)
        
        if not hasattr(program, "run_search"):
            # Return error results for all batch items
            for _ in batch:
                results.append(EvaluationResult(
                    metrics={"combined_score": 0.0, "error": 1.0},
                    artifacts={"error": "Missing run_search function"}
                ))
            return results
    except Exception as e:
        # Return error results for all batch items
        for _ in batch:
            results.append(EvaluationResult(
                metrics={"combined_score": 0.0, "error": 1.0},
                artifacts={"error": str(e), "traceback": traceback.format_exc()}
            ))
        return results
    
    # CHANGE 3: Loop over each batch item
    for batch_item in batch:
        try:
            # CHANGE 4: Extract problem parameters from batch_item instead of hard-coding
            # In the original, these were hard-coded constants like:
            #   GLOBAL_MIN_X = -1.704
            # Now we get them from each batch item:
            GLOBAL_MIN_X = batch_item.get("global_min_x", -1.704)
            GLOBAL_MIN_Y = batch_item.get("global_min_y", 0.678)
            GLOBAL_MIN_VALUE = batch_item.get("global_min_value", -1.519)
            bounds = batch_item.get("bounds", (-5, 5))
            
            # Run multiple trials for this specific problem
            num_trials = 10
            x_values = []
            y_values = []
            values = []
            distances = []
            times = []
            success_count = 0
            
            for trial in range(num_trials):
                try:
                    start_time = time.time()
                    
                    # Run the program. Note: `bounds` is extracted above but not passed here;
                    # run_search must be modified to accept it for the bounds to take effect
                    result = run_with_timeout(program.run_search, timeout_seconds=5)
                    
                    # Handle different result formats
                    if isinstance(result, tuple):
                        if len(result) == 3:
                            x, y, value = result
                        elif len(result) == 2:
                            x, y = result
                            # Calculate function value (formula for the original sin/cos problem only)
                            value = np.sin(x) * np.cos(y) + np.sin(x * y) + (x**2 + y**2) / 20
                        else:
                            continue
                    else:
                        continue
                    
                    end_time = time.time()
                    
                    # Validate results
                    x = safe_float(x)
                    y = safe_float(y)
                    value = safe_float(value)
                    
                    if (np.isnan(x) or np.isnan(y) or np.isnan(value) or
                        np.isinf(x) or np.isinf(y) or np.isinf(value)):
                        continue
                    
                    # Calculate metrics for this trial
                    x_diff = x - GLOBAL_MIN_X
                    y_diff = y - GLOBAL_MIN_Y
                    distance_to_global = np.sqrt(x_diff**2 + y_diff**2)
                    
                    x_values.append(x)
                    y_values.append(y)
                    values.append(value)
                    distances.append(distance_to_global)
                    times.append(end_time - start_time)
                    success_count += 1
                    
                except Exception:
                    # Skip trials that raise or time out; they lower reliability_score
                    continue
            
            # If all trials failed, return error result
            if success_count == 0:
                results.append(EvaluationResult(
                    metrics={
                        "value_score": 0.0,
                        "distance_score": 0.0,
                        "reliability_score": 0.0,
                        "combined_score": 0.0,
                        "error": 1.0
                    },
                    artifacts={"error": "All trials failed"}
                ))
                continue
            
            # Calculate aggregated metrics for this batch item
            avg_value = float(np.mean(values))
            avg_distance = float(np.mean(distances))
            
            # Convert to scores (higher is better)
            value_score = float(1.0 / (1.0 + abs(avg_value - GLOBAL_MIN_VALUE)))
            distance_score = float(1.0 / (1.0 + avg_distance))
            reliability_score = float(success_count / num_trials)
            
            # Calculate combined score
            base_score = 0.5 * value_score + 0.3 * distance_score + 0.2 * reliability_score
            
            # Apply solution quality multiplier
            if avg_distance < 0.5:
                solution_quality_multiplier = 1.5
            elif avg_distance < 1.5:
                solution_quality_multiplier = 1.2
            elif avg_distance < 3.0:
                solution_quality_multiplier = 1.0
            else:
                solution_quality_multiplier = 0.7
            
            combined_score = float(base_score * solution_quality_multiplier)
            
            # Create artifacts
            artifacts = {
                "convergence_info": f"Converged in {num_trials} trials with {success_count} successes",
                "best_position": f"Final position: x={x_values[-1]:.4f}, y={y_values[-1]:.4f}",
                "average_distance_to_global": f"{avg_distance:.4f}",
                "search_efficiency": f"Success rate: {reliability_score:.2%}"
            }
            
            results.append(EvaluationResult(
                metrics={
                    "value_score": value_score,
                    "distance_score": distance_score,
                    "reliability_score": reliability_score,
                    "combined_score": combined_score,
                },
                artifacts=artifacts
            ))
            
        except Exception as e:
            # Return error result for this batch item
            results.append(EvaluationResult(
                metrics={"combined_score": 0.0, "error": 1.0},
                artifacts={
                    "error": str(e),
                    "traceback": traceback.format_exc()
                }
            ))
    
    # CHANGE 5: Return list of results (one per batch item) instead of single result
    return results

## Summary of Changes

Here's a quick reference of what changed:

| Aspect | Original | Modified for EvolveAdapter |
|--------|----------|---------------------------|
| **Function signature** | `evaluate(program_path: str)` | `evaluate(program_path: str, batch: list)` |
| **Return type** | `EvaluationResult` | `list[EvaluationResult]` |
| **Structure** | Single evaluation, aggregated result | Loop over batch, one result per item |
| **Parameters** | Hard-coded constants | Extracted from `batch_item` |
| **Return statement** | `return EvaluationResult(...)` | `return [EvaluationResult(...), ...]` |
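
The evaluator above calls `run_search` with no arguments, so the per-item `bounds` never reach the program unless `run_search` is changed to accept them. One possible approach, sketched below under the assumption that an adapted program exposes a `bounds` keyword parameter, is to inspect the signature and forward `bounds` only when it is supported. The helper name `call_run_search` and both toy `run_search` functions are hypothetical.

```python
import inspect


def call_run_search(run_search, bounds):
    """Call run_search, passing bounds only if the function declares that parameter."""
    params = inspect.signature(run_search).parameters
    if "bounds" in params:
        return run_search(bounds=bounds)
    return run_search()


def legacy_run_search():
    # Original style: the search space is fixed inside the program.
    return (0.0, 0.0, 0.0)


def bounded_run_search(bounds=(-5, 5)):
    # Adapted style: return the midpoint of the bounds as a trivial "search".
    mid = (bounds[0] + bounds[1]) / 2
    return (mid, mid, 0.0)


legacy_result = call_run_search(legacy_run_search, (-3, 3))
bounded_result = call_run_search(bounded_run_search, (0, 4))
```

Inside the evaluator, the equivalent step would be passing `kwargs={"bounds": bounds}` to `run_with_timeout` once the program is known to accept it.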

## Step 2: Modifying Cascade Evaluation Functions

If your project uses cascade evaluation, you must also modify your cascade evaluation functions (`evaluate_stage1`, `evaluate_stage2`, `evaluate_stage3`) to accept batch parameters and return lists, just like the main `evaluate` function.

### Why Cascade Functions Need Batch Support

When cascade evaluation is enabled, `EvolveAdapter` uses `CascadeEvaluationStrategy`, which calls your stage functions with a `batch` parameter. These functions must:
1. Accept `batch: list` as a parameter
2. Return `list[EvaluationResult]` (one result per batch item)
3. Loop over batch items and process each one

### Example: Modified `evaluate_stage1`

**Before:**
```python
def evaluate_stage1(program_path: str) -> EvaluationResult:
    # ... evaluation logic for single instance ...
    return EvaluationResult(metrics={...}, artifacts={...})
```

**After:**
```python
def evaluate_stage1(program_path: str, batch: list) -> list[EvaluationResult]:
    results = []
    
    # Load program once (shared across batch items)
    # ... load program ...
    
    # Loop over each batch item
    for batch_item in batch:
        # Extract parameters from batch_item if needed
        GLOBAL_MIN_X = batch_item.get("global_min_x", -1.704)
        # ... evaluation logic for this batch item ...
        results.append(EvaluationResult(metrics={...}, artifacts={...}))
    
    return results
```

### Example: Modified `evaluate_stage2`

**Before:**
```python
def evaluate_stage2(program_path: str) -> EvaluationResult:
    # Full evaluation as in the main evaluate function
    return evaluate(program_path)
```

**After:**
```python
def evaluate_stage2(program_path: str, batch: list) -> list[EvaluationResult]:
    # Full evaluation as in the main evaluate function
    return evaluate(program_path, batch)
```

### Key Points for Cascade Functions:

1. **Same signature pattern**: All stage functions should follow the same pattern as `evaluate`:
   - Add `batch: list` parameter
   - Change return type to `list[EvaluationResult]`
   - Loop over batch items
   - Return a list of results

2. **One result per batch item**: Ensure `len(results) == len(batch)`

3. **Error handling**: If a stage fails for a specific batch item, return an error `EvaluationResult` for that item rather than raising an exception
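
If a stage already has working single-instance logic, one option is to wrap it rather than rewrite it. The sketch below is an illustration and not part of GEPA or OpenEvolve: `batchify` is a hypothetical decorator, `Result` stands in for `EvaluationResult`, and the wrapped function is assumed to take `(program_path, batch_item)`.

```python
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class Result:
    """Hypothetical stand-in for openevolve's EvaluationResult."""
    metrics: dict
    artifacts: dict = field(default_factory=dict)


def batchify(per_item_fn):
    """Lift fn(program_path, batch_item) -> Result to fn(program_path, batch) -> list."""
    @wraps(per_item_fn)
    def wrapper(program_path: str, batch: list) -> list:
        results = []
        for batch_item in batch:
            try:
                results.append(per_item_fn(program_path, batch_item))
            except Exception as exc:
                # Convert a per-item failure into an error result instead of raising.
                results.append(Result(
                    metrics={"combined_score": 0.0, "error": 1.0},
                    artifacts={"error": repr(exc)},
                ))
        return results
    return wrapper


@batchify
def evaluate_stage1(program_path: str, batch_item: dict) -> Result:
    # Toy stage: unpacking fails with KeyError when the item has no bounds.
    low, high = batch_item["bounds"]
    return Result(metrics={"combined_score": 1.0 if low < high else 0.0})


stage1_results = evaluate_stage1("program.py", [{"bounds": (-5, 5)}, {}])
```

The wrapper keeps `len(results) == len(batch)` by construction, which covers points 2 and 3 above.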

## Step 3: (Optional) Modify Config System Message

You may need to modify the `system_message` in your `config.yaml` to match the new batch-based evaluation setup. For example, if your original OpenEvolve project described one hard-coded problem in the system prompt, you should remove or generalize that description.

**Example for function minimization:**
- **Before (hard-coded)**: `"You are an expert programmer specializing in optimization algorithms. Your task is to improve a function minimization algorithm to find the global minimum of a complex function with many local minima. The function is f(x, y) = sin(x) * cos(y) + sin(x*y) + (x^2 + y^2)/20. Focus on improving the search_algorithm function to reliably find the global minimum, escaping local minima that might trap simple algorithms."`
- **After (generalized)**: `"You are an expert programmer specializing in optimization algorithms. Your task is to improve a function minimization algorithm to find the global minimum of a complex function with many local minima. Focus on improving the search_algorithm function to reliably find the global minimum, escaping local minima that might trap simple algorithms."`

## Step 4: Using EvolveAdapter

Here is how to use `EvolveAdapter` with GEPA's optimization engine.

In [None]:
# Install required dependencies
%pip install gepa openevolve numpy scipy pyyaml litellm

In [None]:
from pathlib import Path
from gepa import optimize
from gepa.adapters.evolve_adapter.evolve_adapter import EvolveAdapter

# Path to your modified OpenEvolve project directory
# This should contain: config.yaml, evaluator.py, initial_program.py
project_path = Path("your-project-path")

# Create the adapter
adapter = EvolveAdapter(
    path=project_path
)

# Define training data (for this example, a batch of function minimization problems)
# Each item represents a different problem instance
trainset = [
    {
        "global_min_x": -1.704,
        "global_min_y": 0.678,
        "global_min_value": -1.519,
        "bounds": (-5, 5),
        "function_name": "sin_cos_function"
    },
    {
        "global_min_x": 0.0,
        "global_min_y": 0.0,
        "global_min_value": 0.0,
        "bounds": (-3, 3),
        "function_name": "quadratic_function"
    },
    # Add more problem instances as needed
]

# Read initial program
with open(project_path / "initial_program.py", "r") as f:
    initial_program = f.read()

# Define seed candidate (the program to evolve)
seed_candidate = {
    "program": initial_program
}

# Run GEPA optimization
result = optimize(
    seed_candidate=seed_candidate,
    trainset=trainset,
    adapter=adapter,
    max_metric_calls=60,  # Budget for evaluation calls; adjust as needed
    display_progress_bar=True
)

# Get the best score (GEPAResult doesn't have best_score, use val_aggregate_scores[best_idx])
best_score = result.val_aggregate_scores[result.best_idx]
print(f"Best score: {best_score}")
print(f"Best candidate index: {result.best_idx}")
print(f"Total candidates evaluated: {len(result.candidates)}")
print(f"Total metric calls: {result.total_metric_calls}")

# The evolved program is in result.best_candidate["program"]
best_program = result.best_candidate.get("program", "N/A")
print("\nBest candidate program:")
# Show at most the first 500 characters
print(best_program[:500] + "..." if len(best_program) > 500 else best_program)

## Quick Reference: Before vs After

### Original OpenEvolve Evaluate Function:
```python
def evaluate(program_path: str) -> EvaluationResult:
    # Hard-coded problem parameters
    GLOBAL_MIN_X = -1.704
    GLOBAL_MIN_Y = 0.678
    GLOBAL_MIN_VALUE = -1.519
    
    # Run program multiple times (trials) and aggregate
    num_trials = 10
    x_values = []
    y_values = []
    values = []
    distances = []
    for trial in range(num_trials):
        result = program.run_search()
        x, y, value = result
        x_values.append(x)
        y_values.append(y)
        values.append(value)
        # Distance from this trial's result to the known global minimum
        distances.append(np.sqrt((x - GLOBAL_MIN_X)**2 + (y - GLOBAL_MIN_Y)**2))
    
    # Aggregate across all trials
    avg_value = np.mean(values)
    avg_distance = np.mean(distances)
    # ... calculate aggregated metrics ...
    
    # Return single aggregated result
    return EvaluationResult(metrics={...}, artifacts={...})
```

### Modified for EvolveAdapter:
```python
def evaluate(program_path: str, batch: list) -> list[EvaluationResult]:
    results = []
    
    # Loop over each batch item
    for batch_item in batch:
        # Extract parameters from batch item
        GLOBAL_MIN_X = batch_item.get("global_min_x", -1.704)
        GLOBAL_MIN_Y = batch_item.get("global_min_y", 0.678)
        GLOBAL_MIN_VALUE = batch_item.get("global_min_value", -1.519)
        
        # Run program multiple times for this batch item
        for trial in range(num_trials):
            result = program.run_search()
            # ... aggregate results for this batch item ...
        
        # Append one result per batch item
        results.append(EvaluationResult(metrics={...}, artifacts={...}))
    
    # Return list of results (one per batch item)
    return results
```

## Notes:

1. **Batch Structure**: Each item in the batch should represent a distinct evaluation instance.

2. **Return Format**: The function must return a **list** of `EvaluationResult` objects, with `len(results) == len(batch)`.

3. **Error Handling**: If evaluation fails for a specific batch item, return an `EvaluationResult` with error metrics rather than raising an exception.

4. **Metrics**: Each `EvaluationResult` should include a `"combined_score"` metric in its `metrics` dict, as this is used by GEPA for optimization.

5. **Artifacts**: Use the `artifacts` field to store additional information. The adapter automatically uses these artifacts to create feedback for program improvement.

6. **Cascade Evaluation**: If your project uses cascade evaluation, remember to modify **all** stage functions (`evaluate_stage1`, `evaluate_stage2`, etc.) to accept `batch` and return `list[EvaluationResult]`. See Step 2 for details.
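
The requirements in these notes can be checked mechanically before starting a long optimization run. The helper below is a sketch: `check_results` is a hypothetical name, and it assumes only that each result exposes a `metrics` dict, as `EvaluationResult` does in the examples above. `SimpleNamespace` objects stand in for real results.

```python
import math
from types import SimpleNamespace


def check_results(batch: list, results: list) -> list:
    """Return a list of human-readable problems; an empty list means the contract holds."""
    if not isinstance(results, list):
        return [f"expected a list, got {type(results).__name__}"]
    problems = []
    if len(results) != len(batch):
        problems.append(f"expected {len(batch)} results, got {len(results)}")
    for i, result in enumerate(results):
        score = getattr(result, "metrics", {}).get("combined_score")
        if score is None:
            problems.append(f"result {i} has no 'combined_score' metric")
        elif not math.isfinite(float(score)):
            problems.append(f"result {i} has a non-finite combined_score")
    return problems


good = [SimpleNamespace(metrics={"combined_score": 0.8})]
bad = [SimpleNamespace(metrics={"error": 1.0})]

ok_report = check_results([{"bounds": (-5, 5)}], good)
bad_report = check_results([{"bounds": (-5, 5)}, {"bounds": (-3, 3)}], bad)
```

Here `ok_report` is empty, while `bad_report` flags both the length mismatch and the missing `"combined_score"`.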

## Next Steps:

1. Modify your `evaluator.py` to accept `batch` parameter and return `list[EvaluationResult]`
2. **If using cascade evaluation**: Modify all stage functions (`evaluate_stage1`, `evaluate_stage2`, etc.) to accept `batch` and return lists
3. Test with a small batch to ensure everything works
4. Run GEPA optimization with your adapted project