# GEPA: Optimize Anything using ASI (additional side information)

GEPA is a text evolution engine: given a target metric, GEPA can efficiently search over parameters of any kind (numerical values, text, or code) to improve that metric. Because of this, GEPA can optimize essentially _anything_ that has a textual representation. In this post, we build on this insight to present GEPA's `optimize_anything` API, which uses the reflective capabilities of LLMs to optimize anything representable as text. Crucially, GEPA can take advantage of any additional information available from the optimization environment simply by serializing it into text.
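To make "serializing it into text" concrete, here is a minimal sketch of how per-example side information (a dict like the `side_info` built in the fitness function later in this post) could be rendered as plain text for a reflection LM to read. The helpers `format_side_info` and `format_reflection_dataset` are hypothetical and the layout is our own assumption; this is not GEPA's internal formatting.

```python
from typing import Any


def format_side_info(side_info: dict[str, Any], index: int) -> str:
    """Render one example's side information as a text block.

    Hypothetical helper: the layout is an illustrative assumption,
    not the format GEPA uses internally.
    """
    lines = [f"### Example {index}"]
    for key, value in side_info.items():
        # Every value is stringified, so scores, answers, traces, or error
        # messages can all be shown to a reflection LM as plain text.
        lines.append(f"**{key}:**\n{value}")
    return "\n\n".join(lines)


def format_reflection_dataset(records: list[dict[str, Any]]) -> str:
    """Concatenate several examples into a single reflection context."""
    return "\n\n".join(
        format_side_info(record, i) for i, record in enumerate(records, start=1)
    )


records = [
    {"Input": "What is 2 + 3?", "Output": "5", "ExecutionFeedback": "Correct."},
    {"Input": "What is 7 * 6?", "Output": "43", "ExecutionFeedback": "Incorrect: expected 42."},
]
print(format_reflection_dataset(records))
```

Any environment signal that can be turned into a string (compiler errors, profiler output, judge critiques) fits into this shape without changing the optimizer.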

## Prompt optimization on AIME 2025


### Set up a dataset

```python
from examples.math.dataset import load_math_dataset

trainset, valset, testset = load_math_dataset()
```

```
Loaded 45 training examples
Loaded 45 validation examples
Loaded 30 test examples
```


### Seed candidate

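We use a DSPy `Signature`, wrapped in a `ChainOfThought` module, to feed each problem to the model and parse its output. The module's instructions are the prompt we will optimize.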

```python
import dspy
import os

# Define the task language model ("openai/" is the provider prefix, as in reflection_lm below)
api_key = os.environ.get("OPENAI_API_KEY")
lm = dspy.LM("openai/gpt-4.1-mini", api_key=api_key, temperature=1.0, max_tokens=32000)
dspy.configure(lm=lm)

# Let's optimize the prompt of the following dspy reasoning module.
class MathSolverSignature(dspy.Signature):
    input = dspy.InputField(desc="The math problem to solve.")
    answer = dspy.OutputField(desc="The final numerical answer.")

predictor = dspy.ChainOfThought(MathSolverSignature)

# This is the initial prompt that we will optimize.
SEED_PROMPT = """Solve the math problem carefully. Break down the steps and provide the final answer as a single number."""
```

### Fitness function

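```python
import dspy
from typing import Any, Sequence
from examples.math.main import math_metric

def fitness_fn(candidate: dict[str, str], batch: Sequence[Any]) -> list[tuple[float, Any, dict]]:
    # Install the candidate prompt as the instructions of the module being optimized.
    predictor.predict.signature.instructions = candidate["prompt"]

    # Run the predictor on the batch in parallel and score each example with math_metric.
    evaluator = dspy.Evaluate(
        devset=list(batch),
        metric=math_metric,
        num_threads=16,
        display_progress=True,
    )
    eval_result = evaluator(predictor)

    # Build one (score, artifact, side_info) tuple per example.
    results = []

    for example, prediction, metric_result in eval_result.results:
        score = metric_result.score
        feedback = metric_result.feedback

        # The artifact records what this candidate produced.
        artifact = {
            "prompt": candidate["prompt"],
            "answer": prediction.answer,
        }

        # Side information passed along with the score: the problem, the
        # model's answer and reasoning, and the metric's feedback.
        side_info = {
            "Input": example.input,
            "Output": prediction.answer,
            "Reasoning": prediction.reasoning,
            "ExecutionFeedback": feedback,
        }

        results.append((score, artifact, side_info))

    return results
```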
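The contract `fitness_fn` follows, as we read it from the code above, is: take a candidate (a dict of named text components) and a batch, and return one `(score, artifact, side_info)` tuple per example. The sketch below reproduces that shape without DSPy so it can run in isolation. `toy_solver` is a hypothetical stand-in for the LLM call, and exact string matching is our own simplification, not the `math_metric` used above.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class Problem:
    input: str   # mirrors the `input` field used by fitness_fn above
    answer: str  # gold answer, compared as a string


def toy_solver(prompt: str, problem: Problem) -> str:
    # Hypothetical stand-in for the LLM: it only answers correctly when the
    # prompt asks it to double-check, so different prompts score differently.
    return problem.answer if "double-check" in prompt.lower() else "0"


def toy_fitness_fn(
    candidate: dict[str, str], batch: Sequence[Problem]
) -> list[tuple[float, Any, dict]]:
    results = []
    for problem in batch:
        output = toy_solver(candidate["prompt"], problem)
        # Exact string match is a simplifying assumption, not math_metric.
        score = 1.0 if output.strip() == problem.answer else 0.0
        feedback = (
            "Correct." if score == 1.0
            else f"Incorrect: expected {problem.answer}, got {output}."
        )
        artifact = {"prompt": candidate["prompt"], "answer": output}
        side_info = {"Input": problem.input, "Output": output, "ExecutionFeedback": feedback}
        results.append((score, artifact, side_info))
    return results


batch = [Problem("What is 2 + 3?", "5"), Problem("What is 7 * 6?", "42")]
seed_results = toy_fitness_fn({"prompt": "Solve the problem."}, batch)
better_results = toy_fitness_fn({"prompt": "Solve the problem, then double-check."}, batch)
print([s for s, _, _ in seed_results], [s for s, _, _ in better_results])
```

Returning feedback per example, rather than only an aggregate score, gives the reflection step something concrete to reason about: here the seed prompt's side information states exactly which answers were wrong and what was expected.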

### Run the optimization

```python
from gepa.optimize_anything import (
    EngineConfig,
    GEPAConfig,
    ReflectionConfig,
    optimize_anything,
)

gepa_config = GEPAConfig(
    engine=EngineConfig(
        max_metric_calls=800,
        track_best_outputs=True,
    ),
    reflection=ReflectionConfig(
        reflection_minibatch_size=3,
        skip_perfect_score=False,
        reflection_lm="openai/gpt-5",
    ),
)

result = optimize_anything(
    seed_candidate={"prompt": SEED_PROMPT},
    fitness_fn=fitness_fn,
    dataset=trainset,
    valset=valset,
    config=gepa_config,
)
```

```
Average Metric: 21.00 / 45 (46.7%): 100%|██████████| 45/45 [00:00<00:00, 5713.45it/s]
2026/01/01 21:22:28 INFO dspy.evaluate.evaluate: Average Metric: 21.0 / 45 (46.7%)
Iteration 0: Base program full valset score: 0.4666666666666667 over 45 / 45 examples
```
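For intuition about how an optimizer can use a fitness function like this one, here is a deliberately simplified, greedy sketch of a reflective search loop. It is our own illustration, not GEPA's implementation: GEPA's engine is more involved (for instance, it keeps a pool of candidates rather than a single incumbent), and `propose` is a hand-written stand-in for the reflection LM. The `max_metric_calls` and `minibatch_size` parameters loosely mirror the config above, under the assumption that each evaluated example costs one metric call.

```python
import random
from typing import Callable, Sequence

Candidate = dict[str, str]
# Simplified contract for this sketch: one (score, feedback_text) pair per
# example. The real fitness_fn above also returns an artifact per example.
FitnessFn = Callable[[Candidate, Sequence[int]], list[tuple[float, str]]]
Proposer = Callable[[Candidate, list[str]], Candidate]


def simple_reflective_search(
    seed: Candidate,
    fitness_fn: FitnessFn,
    propose: Proposer,
    trainset: Sequence[int],
    valset: Sequence[int],
    max_metric_calls: int,
    minibatch_size: int = 3,
    rng_seed: int = 0,
) -> tuple[Candidate, float]:
    """Greedy reflective search under a metric-call budget (illustrative only)."""
    rng = random.Random(rng_seed)
    calls = 0

    def evaluate(candidate: Candidate, batch: Sequence[int]) -> list[tuple[float, str]]:
        nonlocal calls
        calls += len(batch)  # assumption: one metric call per evaluated example
        return fitness_fn(candidate, batch)

    def mean(results: list[tuple[float, str]]) -> float:
        return sum(score for score, _ in results) / len(results)

    best = seed
    best_val = mean(evaluate(best, valset))  # analogous to "Iteration 0" in the log
    # Start an iteration only if the worst case (two minibatch evaluations
    # plus a full validation pass) still fits in the budget.
    while calls + 2 * minibatch_size + len(valset) <= max_metric_calls:
        minibatch = rng.sample(list(trainset), minibatch_size)
        parent_results = evaluate(best, minibatch)
        # The textual feedback is what a reflection LM would read.
        child = propose(best, [text for _, text in parent_results])
        if mean(evaluate(child, minibatch)) <= mean(parent_results):
            continue  # reject children that do not improve on the minibatch
        child_val = mean(evaluate(child, valset))
        if child_val > best_val:
            best, best_val = child, child_val
    return best, best_val


def toy_fitness(candidate: Candidate, batch: Sequence[int]) -> list[tuple[float, str]]:
    # Toy task: the "prompt" is a threshold k, and it solves every x < k.
    k = int(candidate["prompt"])
    return [(1.0, "ok") if x < k else (0.0, f"failed on {x}") for x in batch]


def toy_propose(candidate: Candidate, feedback: list[str]) -> Candidate:
    # Hand-written stand-in for the reflection LM: read failures, raise k.
    failed = [int(text.split()[-1]) for text in feedback if text.startswith("failed")]
    if not failed:
        return candidate
    return {"prompt": str(max(failed) + 1)}


best, best_val = simple_reflective_search(
    {"prompt": "0"}, toy_fitness, toy_propose,
    trainset=range(10), valset=range(10), max_metric_calls=200,
)
print(best, best_val)
```

Checking a child on the small minibatch before paying for a full validation pass keeps the budget cheap in this sketch. In the toy task, the first iteration is always accepted, because every minibatch example fails for the seed threshold `k = 0`.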


### Check the improvement

```python
from examples.math.main import evaluate_on_dataset

# Baseline Evaluation
print("\nEvaluating Baseline (Initial Prompt)...")
predictor.predict.signature.instructions = SEED_PROMPT
baseline_score = evaluate_on_dataset(predictor, testset)

# Optimized Evaluation
print("\nEvaluating Best Optimized Program...")
best_prompt = result.best_candidate["prompt"]
print(f"Best Prompt Found:\n{best_prompt}")

predictor.predict.signature.instructions = best_prompt
optimized_score = evaluate_on_dataset(predictor, testset)

print(f"Baseline Score: {baseline_score:.2%}")
print(f"Optimized Score: {optimized_score:.2%}")
print(f"Improvement: {optimized_score - baseline_score:.2%}")
```

```
Evaluating Baseline (Initial Prompt)...
Average Metric: 15.00 / 30 (50.0%): 100%|██████████| 30/30 [00:00<00:00, 637.39it/s]
2026/01/01 16:51:24 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 30 (50.0%)

Evaluating Best Optimized Program...
Best Prompt Found:
Solve the math problem carefully. Break down the steps and provide the final answer as a single number.
Average Metric: 15.00 / 30 (50.0%): 100%|██████████| 30/30 [00:00<00:00, 5119.38it/s]
2026/01/01 16:51:24 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 30 (50.0%)

Baseline Score: 50.00%
Optimized Score: 50.00%
Improvement: 0.00%
```
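On a 30-problem test set, accuracy moves in steps of 1/30 ≈ 3.3 percentage points, and an unchanged total can still hide problems that were gained and lost. A paired, per-problem comparison makes this visible. The sketch below assumes you have 0/1 scores per test problem for both prompts, in the same order (for example, collected from the `results` of a `dspy.Evaluate` run); `compare_runs` is a hypothetical helper, not part of GEPA or DSPy, and the scores in the usage example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Comparison:
    baseline_acc: float
    optimized_acc: float
    gained: int  # problems solved only with the optimized prompt
    lost: int    # problems solved only with the baseline prompt

    @property
    def delta_pp(self) -> float:
        """Accuracy difference in percentage points."""
        return 100 * (self.optimized_acc - self.baseline_acc)


def compare_runs(baseline: Sequence[float], optimized: Sequence[float]) -> Comparison:
    """Paired comparison of per-problem scores (1.0 = solved, 0.0 = not)."""
    if len(baseline) != len(optimized):
        raise ValueError("both runs must score the same problems in the same order")
    n = len(baseline)
    pairs = list(zip(baseline, optimized))
    gained = sum(1 for b, o in pairs if b < 1.0 <= o)
    lost = sum(1 for b, o in pairs if o < 1.0 <= b)
    return Comparison(sum(baseline) / n, sum(optimized) / n, gained, lost)


# Made-up scores for illustration only: both runs solve 15/30 problems,
# but not the same 15.
baseline_scores = [1.0] * 15 + [0.0] * 15
optimized_scores = [1.0] * 13 + [0.0] * 2 + [1.0] * 2 + [0.0] * 13
comparison = compare_runs(baseline_scores, optimized_scores)
print(comparison.delta_pp, comparison.gained, comparison.lost)
```

Here the aggregate difference is zero, yet two problems were gained and two lost; the gained/lost counts are the quantities a paired significance test would start from.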
