# Notebook for DSPy
## In this notebook we use DSPy to optimize the generation of relevance scores for indices

DSPy optimizers, such as BootstrapFewShotWithRandomSearch, work by programmatically tuning prompts and, optionally, LM weights to maximize a user-defined metric. The process involves running a DSPy program on a set of training examples, generating and selecting few-shot demonstrations, and refining instructions or prompt content through iterative search and evaluation. For example, BootstrapFewShotWithRandomSearch repeatedly samples and evaluates different sets of few-shot examples and keeps the best-performing configuration. ChainOfThought, by contrast, is a DSPy module rather than an optimizer: it makes the model reason step by step before producing its outputs, and it is the program that gets optimized in this notebook.

These methods automate much of the manual prompt engineering process by using the supplied data and metric to systematically improve the program's performance. The optimizer compiles the high-level program into an optimized prompt or set of instructions, which is then used consistently for inference on new inputs.
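The sample-evaluate-keep-the-best loop described above can be illustrated with a small, DSPy-free sketch. Everything in it is a hypothetical stand-in: `toy_predict` replaces the LM call with a word-overlap lookup, `accuracy` plays the role of the metric, and `random_search`, `toy_train`, and `toy_val` are invented for illustration. It is not how DSPy implements the optimizer, only the general idea.

```python
import random

def toy_predict(demos, text):
    """Toy stand-in for an LM call: copy the label of the demo sharing the most words."""
    if not demos:
        return "0"  # with no demonstrations, always answer "0"
    words = set(text.lower().split())
    closest = max(demos, key=lambda d: len(words & set(d["text"].lower().split())))
    return closest["label"]

def accuracy(demos, valset):
    """Fraction of validation examples whose label the toy program gets right."""
    hits = sum(toy_predict(demos, ex["text"]) == ex["label"] for ex in valset)
    return hits / len(valset)

def random_search(trainset, valset, num_candidates=5, max_demos=3, seed=0):
    """Score a zero-shot baseline, then random demo subsets, and keep the best."""
    rng = random.Random(seed)
    best_demos, best_score = [], accuracy([], valset)
    for _ in range(num_candidates):
        k = rng.randint(1, min(max_demos, len(trainset)))
        demos = rng.sample(trainset, k)
        score = accuracy(demos, valset)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score

# Hypothetical toy data, unrelated to the real examples in this notebook.
toy_train = [
    {"text": "h5n1 human deaths in 2025", "label": "1"},
    {"text": "poultry flocks infected with h5n1", "label": "1"},
    {"text": "cryptocurrency market cap", "label": "0"},
    {"text": "world cup champion", "label": "0"},
]
toy_val = [
    {"text": "h5n1 outbreak in humans", "label": "1"},
    {"text": "bitcoin market price", "label": "0"},
]

best_demos, best_score = random_search(toy_train, toy_val)
print(len(best_demos), best_score)
```

Because the zero-shot baseline is scored first, the selected demo set can never do worse on the validation data than using no demonstrations at all.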

In [None]:
import os

# API Key
path_to_key = "/Users/erikarnold/Desktop/open_ai_key.txt"
with open(path_to_key, "r") as file:
    os.environ["OPENAI_API_KEY"] = file.read().strip()

### The following cell contains AI-generated test examples; I reviewed them and adjusted the rationales and scores as I saw fit

In [15]:
import dspy

test_examples = [
    # --- Disease / Epidemiology ---
    dspy.Example(
        index_question="Will there be a large-scale H5N1 outbreak in humans in the next 12 months?",
        market_title="Will more than 1000 humans die of H5N1 in 2025?",
        rationale="Direct measure of human outbreak severity.",
        label="1",
        score="0.9"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will there be a large-scale H5N1 outbreak in humans in the next 12 months?",
        market_title="Will 1000 poultry flocks be infected with H5N1 in 2025?",
        rationale="Indirectly related to human outbreak.",
        label="1",
        score="0.68"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will there be a large-scale H5N1 outbreak in humans in the next 12 months?",
        market_title="Will AI systems reach superhuman performance in 2025?",
        rationale="Unrelated to H5N1 outbreaks.",
        label="0",
        score="0.02"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global COVID-19 cases increase next year?",
        market_title="Will daily new COVID-19 cases exceed 500,000 worldwide in 2025?",
        rationale="Direct measure of outbreak magnitude.",
        label="1",
        score="0.91"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global COVID-19 cases increase next year?",
        market_title="Will global flu vaccination rates rise by 10%?",
        rationale="Vaccination rates indirectly affect COVID trends.",
        label="1",
        score="0.47"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global COVID-19 cases increase next year?",
        market_title="Will global cryptocurrency market cap exceed $3 trillion?",
        rationale="Cryptocurrency unrelated to disease cases.",
        label="0",
        score="0.03"
    ).with_inputs("index_question", "market_title"),

    # --- AI / Technology ---
    dspy.Example(
        index_question="Will AI systems achieve a new frontier-level milestone within the next 12 months?",
        market_title="Will OpenAI release GPT-5 by 2025?",
        rationale="Directly about frontier AI milestone.",
        label="1",
        score="0.85"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will AI systems achieve a new frontier-level milestone within the next 12 months?",
        market_title="Will Google DeepMind's AlphaGo team release a new model?",
        rationale="Loosely predictive of AI frontier progress.",
        label="1",
        score="0.62"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will AI systems achieve a new frontier-level milestone within the next 12 months?",
        market_title="Will US unemployment rate exceed 6% in 2025?",
        rationale="Economic metric unrelated to AI milestones.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will autonomous vehicles be fully deployed in at least 3 major cities within 12 months?",
        market_title="Will Tesla launch fully autonomous vehicles in three new cities?",
        rationale="Directly tests autonomous vehicle deployment.",
        label="1",
        score="0.93"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will autonomous vehicles be fully deployed in at least 3 major cities within 12 months?",
        market_title="Will regulations on electric vehicles increase?",
        rationale="Regulations may influence deployment but are indirect.",
        label="1",
        score="0.38"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will autonomous vehicles be fully deployed in at least 3 major cities within 12 months?",
        market_title="Will there be a major H5N1 outbreak in humans in the next 12 months?",
        rationale="Unrelated to autonomous vehicle deployment.",
        label="0",
        score="0.02"
    ).with_inputs("index_question", "market_title"),

    # --- Global Conflict ---
    dspy.Example(
        index_question="Will the total number of people killed in wars and armed conflicts worldwide in the next 12 months exceed the average annual deaths over the past five years?",
        market_title="Will civilian casualties in Syria exceed 10,000 in 2025?",
        rationale="Directly reflects conflict death counts.",
        label="1",
        score="0.89"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the total number of people killed in wars and armed conflicts worldwide in the next 12 months exceed the average annual deaths over the past five years?",
        market_title="Will global military spending increase in 2025?",
        rationale="Military spending only loosely predictive of deaths.",
        label="1",
        score="0.51"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the total number of people killed in wars and armed conflicts worldwide in the next 12 months exceed the average annual deaths over the past five years?",
        market_title="Will AI systems achieve a new frontier-level milestone within the next 12 months?",
        rationale="Unrelated to conflict deaths.",
        label="0",
        score="0.03"
    ).with_inputs("index_question", "market_title"),

    dspy.Example(
        index_question="Will a coup occur in a G7 country within 12 months?",
        market_title="Will there be a military coup in France by next year?",
        rationale="Directly measures index question.",
        label="1",
        score="0.96"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a coup occur in a G7 country within 12 months?",
        market_title="Will unemployment in Germany exceed 6%?",
        rationale="Economic data loosely related to political stability.",
        label="1",
        score="0.35"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a coup occur in a G7 country within 12 months?",
        market_title="Will a new exoplanet be discovered in 2025?",
        rationale="Unrelated to coups or politics.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- US Elections ---
    dspy.Example(
        index_question="Will the next national election favor the Democratic Party?",
        market_title="Will national polls in key battleground states favor the Democratic Party?",
        rationale="Polls directly indicate likely outcome.",
        label="1",
        score="0.94"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the next national election favor the Democratic Party?",
        market_title="Will state-level ballot initiatives increase Democratic turnout?",
        rationale="Indirectly predictive of outcome.",
        label="1",
        score="0.67"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the next national election favor the Democratic Party?",
        market_title="Will global CO2 emissions rise in 2025?",
        rationale="Environmental metrics unrelated to election outcome.",
        label="0",
        score="0.05"
    ).with_inputs("index_question", "market_title"),

    # --- Environment / Climate ---
    dspy.Example(
        index_question="Will global CO2 emissions increase in 2025?",
        market_title="Will total CO2 emissions exceed 36 gigatons?",
        rationale="Direct measure of CO2 increase.",
        label="1",
        score="0.96"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global CO2 emissions increase in 2025?",
        market_title="Will renewable energy adoption increase by 5%?",
        rationale="Partially predictive of emissions trends.",
        label="1",
        score="0.63"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global CO2 emissions increase in 2025?",
        market_title="Will AI systems achieve new milestones in 2025?",
        rationale="Unrelated to CO2 emissions.",
        label="0",
        score="0.03"
    ).with_inputs("index_question", "market_title"),

    # --- Economics / Finance ---
    dspy.Example(
        index_question="Will global inflation exceed 5% next year?",
        market_title="Will the US CPI exceed 5% in 2025?",
        rationale="Direct measure of inflation.",
        label="1",
        score="0.94"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global inflation exceed 5% next year?",
        market_title="Will central banks raise interest rates by 0.5%?",
        rationale="Interest rate increases partially predictive of inflation.",
        label="1",
        score="0.61"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global inflation exceed 5% next year?",
        market_title="Will the next FIFA World Cup champion be Brazil?",
        rationale="Sports outcomes unrelated to inflation.",
        label="0",
        score="0.02"
    ).with_inputs("index_question", "market_title"),

    # --- Health Metrics ---
    dspy.Example(
        index_question="Will global obesity prevalence increase in 2025?",
        market_title="Will adult obesity rates exceed 15% worldwide?",
        rationale="Direct measure of obesity prevalence.",
        label="1",
        score="0.92"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global obesity prevalence increase in 2025?",
        market_title="Will sugar consumption per capita increase by 5%?",
        rationale="Partially predictive of obesity trends.",
        label="1",
        score="0.57"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global obesity prevalence increase in 2025?",
        market_title="Will global CO2 emissions exceed 36 Gt?",
        rationale="Environmental metric unrelated to obesity.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Sports ---
    dspy.Example(
        index_question="Will the next FIFA World Cup champion be Brazil?",
        market_title="Will Brazil win the 2026 World Cup?",
        rationale="Directly predicts outcome of the event.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the next FIFA World Cup champion be Brazil?",
        market_title="Will the team with the highest FIFA ranking win?",
        rationale="Ranking partially predictive of champion.",
        label="1",
        score="0.6"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the next FIFA World Cup champion be Brazil?",
        market_title="Will global AI milestones be achieved in 2025?",
        rationale="Unrelated to World Cup outcome.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Space / Science ---
    dspy.Example(
        index_question="Will a crewed mission land on the Moon by 2025?",
        market_title="Will astronauts land on the Moon in 2025?",
        rationale="Directly answers the index question.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a crewed mission land on the Moon by 2025?",
        market_title="Will NASA announce a Moon mission plan?",
        rationale="Indirectly predictive, plan does not guarantee landing.",
        label="1",
        score="0.48"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a crewed mission land on the Moon by 2025?",
        market_title="Will global influenza deaths exceed 500,000?",
        rationale="Unrelated to Moon landing.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Culture / Media ---
    dspy.Example(
        index_question="Will a movie gross over $1 billion worldwide in 2025?",
        market_title="Will Marvel release a film earning over $1B?",
        rationale="Directly predicts box office outcome.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a movie gross over $1 billion worldwide in 2025?",
        market_title="Will streaming subscriptions increase by 10%?",
        rationale="Streaming trends partially predictive.",
        label="1",
        score="0.42"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a movie gross over $1 billion worldwide in 2025?",
        market_title="Will AI systems achieve new frontier milestones?",
        rationale="Unrelated to box office outcomes.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Social Metrics ---
    dspy.Example(
        index_question="Will global migration exceed 10 million people next year?",
        market_title="Will refugee numbers exceed 10 million worldwide?",
        rationale="Direct measure of migration.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global migration exceed 10 million people next year?",
        market_title="Will global unemployment rise by 2%?",
        rationale="Economic trends partially predictive of migration.",
        label="1",
        score="0.42"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global migration exceed 10 million people next year?",
        market_title="Will Brazil win the next FIFA World Cup?",
        rationale="Sports outcomes unrelated to migration.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Health / Epidemics ---
    dspy.Example(
        index_question="Will Ebola cases exceed 5000 worldwide next year?",
        market_title="Will Ebola infections exceed 5000 people?",
        rationale="Directly measures outbreak scale.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will Ebola cases exceed 5000 worldwide next year?",
        market_title="Will vaccination campaigns expand in West Africa?",
        rationale="Indirectly predictive of outbreak containment.",
        label="1",
        score="0.48"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will Ebola cases exceed 5000 worldwide next year?",
        market_title="Will global cryptocurrency prices rise in 2025?",
        rationale="Unrelated to Ebola cases.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Climate / Extreme Events ---
    dspy.Example(
        index_question="Will global average temperature exceed 1.2°C above pre-industrial levels next year?",
        market_title="Will global average temperature anomaly exceed 1.2°C?",
        rationale="Directly measures climate outcome.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global average temperature exceed 1.2°C above pre-industrial levels next year?",
        market_title="Will renewable energy capacity increase by 10% globally?",
        rationale="Indirectly predictive of temperature trends.",
        label="1",
        score="0.5"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global average temperature exceed 1.2°C above pre-industrial levels next year?",
        market_title="Will AI systems achieve frontier milestones?",
        rationale="Unrelated to climate change.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Geopolitics / International Relations continued ---
    dspy.Example(
        index_question="Will the UN pass a major climate treaty next year?",
        market_title="Will the UN approve a binding global climate agreement?",
        rationale="Directly addresses the treaty outcome.",
        label="1",
        score="0.94"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the UN pass a major climate treaty next year?",
        market_title="Will national CO2 emissions decrease in 2025?",
        rationale="Indirectly related, emissions affect treaty relevance.",
        label="1",
        score="0.53"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will the UN pass a major climate treaty next year?",
        market_title="Will Brazil win the next FIFA World Cup?",
        rationale="Unrelated to UN treaty outcomes.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    dspy.Example(
        index_question="Will Russia invade a NATO member in 2025?",
        market_title="Will Russian troops cross into NATO territory?",
        rationale="Directly answers the index question.",
        label="1",
        score="0.96"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will Russia invade a NATO member in 2025?",
        market_title="Will NATO increase defense spending above 6% of GDP?",
        rationale="Defense spending is weakly predictive of invasion risk.",
        label="1",
        score="0.31"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will Russia invade a NATO member in 2025?",
        market_title="Will AI systems achieve frontier milestones in 2025?",
        rationale="Unrelated to geopolitical events.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Economy / Finance continued ---
    dspy.Example(
        index_question="Will global GDP growth exceed 4% next year?",
        market_title="Will the IMF project >4% global GDP growth?",
        rationale="Directly measures economic outcome.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global GDP growth exceed 4% next year?",
        market_title="Will major central banks increase interest rates?",
        rationale="Interest rates partially predictive of growth.",
        label="1",
        score="0.48"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global GDP growth exceed 4% next year?",
        market_title="Will global CO2 emissions exceed 36 Gt?",
        rationale="Climate unrelated to immediate GDP growth prediction.",
        label="0",
        score="0.02"
    ).with_inputs("index_question", "market_title"),

    dspy.Example(
        index_question="Will Bitcoin price exceed $150,000 in 2025?",
        market_title="Will BTC surpass $150k by end of 2025?",
        rationale="Directly predicts the target price.",
        label="1",
        score="0.94"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will Bitcoin price exceed $150,000 in 2025?",
        market_title="Will Ethereum market cap surpass $1 trillion?",
        rationale="Related crypto market trend, partially predictive.",
        label="1",
        score="0.39"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will Bitcoin price exceed $150,000 in 2025?",
        market_title="Will global influenza deaths exceed 500,000?",
        rationale="Health metrics unrelated to BTC price.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Science / Physics continued ---
    dspy.Example(
        index_question="Will a physics experiment confirm quantum gravity in 2025?",
        market_title="Will quantum gravity effects be experimentally observed?",
        rationale="Directly answers the index question.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a physics experiment confirm quantum gravity in 2025?",
        market_title="Will new particle discoveries occur in the LHC?",
        rationale="Partially relevant, new physics may indirectly inform theory.",
        label="1",
        score="0.51"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a physics experiment confirm quantum gravity in 2025?",
        market_title="Will global obesity rates exceed 15%?",
        rationale="Unrelated to physics experiments.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- NASA / Space ---
    dspy.Example(
        index_question="Will NASA launch a mission to Mars in 2025?",
        market_title="Will a crewed Mars mission launch in 2025?",
        rationale="Direct measure of the mission outcome.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will NASA launch a mission to Mars in 2025?",
        market_title="Will planetary rover tests proceed on Earth?",
        rationale="Partially predictive, testing is preparatory.",
        label="1",
        score="0.42"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will NASA launch a mission to Mars in 2025?",
        market_title="Will AI systems achieve new frontier milestones?",
        rationale="Unrelated to Mars mission.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Culture / Media continued ---
    dspy.Example(
        index_question="Will a book become a bestseller in 2025?",
        market_title="Will a specific fantasy novel top NYT bestseller list?",
        rationale="Directly predicts index outcome.",
        label="1",
        score="0.95"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a book become a bestseller in 2025?",
        market_title="Will total book sales increase by 5% globally?",
        rationale="Partially predictive, reflects market trends.",
        label="1",
        score="0.41"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will a book become a bestseller in 2025?",
        market_title="Will H5N1 cause more than 1000 deaths?",
        rationale="Unrelated to book sales.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),

    # --- Social Metrics continued ---
    dspy.Example(
        index_question="Will global crime rates decrease in 2025?",
        market_title="Will overall crime statistics decline worldwide?",
        rationale="Directly measures index question.",
        label="1",
        score="0.94"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global crime rates decrease in 2025?",
        market_title="Will police budgets increase globally?",
        rationale="Partially predictive, enforcement may reduce crime.",
        label="1",
        score="0.49"
    ).with_inputs("index_question", "market_title"),
    dspy.Example(
        index_question="Will global crime rates decrease in 2025?",
        market_title="Will global AI systems achieve a new milestone?",
        rationale="Unrelated to crime rates.",
        label="0",
        score="0.01"
    ).with_inputs("index_question", "market_title"),
]



In [16]:
len(test_examples)

72
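Since the examples above were AI-generated and then edited by hand, a quick sanity check can catch a malformed label or score before optimization. The sketch below is one possible check and makes several assumptions: `find_problems` and `REQUIRED_FIELDS` are hypothetical names, labels are expected to be the strings "0" or "1", and scores are expected to parse as numbers in [0, 1]. It reads fields by attribute access, and uses `SimpleNamespace` objects as stand-ins so it runs without DSPy.

```python
from types import SimpleNamespace

REQUIRED_FIELDS = ("index_question", "market_title", "rationale", "label", "score")

def find_problems(example):
    """Return a list of human-readable problems found in one example."""
    problems = []
    # Every field should be a non-empty string, as in the examples above.
    for field in REQUIRED_FIELDS:
        value = getattr(example, field, None)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    # Assumption: labels are the strings "0" or "1".
    label = getattr(example, "label", None)
    if label not in ("0", "1"):
        problems.append(f"label must be '0' or '1', got {label!r}")
    # Assumption: scores are numeric strings between 0 and 1.
    try:
        score = float(getattr(example, "score", ""))
    except (TypeError, ValueError):
        problems.append("score is not a number")
    else:
        if not 0.0 <= score <= 1.0:
            problems.append(f"score out of range: {score}")
    return problems

# Stand-in records for illustration; the second has a deliberately bad score.
good = SimpleNamespace(
    index_question="Will global CO2 emissions increase in 2025?",
    market_title="Will total CO2 emissions exceed 36 gigatons?",
    rationale="Direct measure of CO2 increase.",
    label="1",
    score="0.96",
)
bad = SimpleNamespace(**{**vars(good), "score": "1.5"})

print(find_problems(good))
print(find_problems(bad))
```

In the notebook, the same function could be applied to each item in `test_examples`, for instance by collecting every example for which `find_problems` returns a non-empty list.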

The following code cell sets up a reproducible DSPy optimization workflow for a classification or reasoning task using OpenAI's GPT-4.1-mini model. It first shuffles the list of 72 DSPy examples with a fixed seed and splits them into a training set (30 examples) and a validation set (42 examples). Seeds are also set for Python and DSPy to make runs as repeatable as possible. The language model is configured with temperature=0 and top_p=1 to keep outputs as close to deterministic as the API allows, and a ChainOfThought DSPy module is defined to map the input fields (index_question, market_title) to the outputs (rationale, label, score) with the same sampling settings.

The optimizer, BootstrapFewShotWithRandomSearch, is then prepared to tune the prompt by searching for the best set of few-shot examples from the training data. During optimization, the program is compiled by evaluating different combinations of few-shot demonstrations, using the validation set to select the configuration that maximizes the user-defined metric. This process automates prompt engineering and results in an optimized program that can be used for consistent, high-quality inference on new data.

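This code configures the BootstrapFewShotWithRandomSearch optimizer to search for the best set of up to 30 bootstrapped and up to 30 labeled few-shot examples from the training data, with one bootstrapping round and num_candidate_programs=5 randomly sampled candidates. The log below reports 8 candidate programs in total because the optimizer also scores three baseline programs (seeds -3, -2, and -1) before the random ones. The metric checks whether the predicted label matches the true label; it does not look at the score or the rationale. The optimizer compiles the program using the provided train and validation sets, and the final cell saves the resulting optimized program under the name "relevance_model".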

This process automates prompt selection and tuning, so the final model uses the few-shot examples that scored best on the validation set among the candidates tried. The saved model can later be loaded for inference or further optimization.
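The metric in this notebook compares only the label. If the relevance score should also influence which candidate wins, one option is a stricter metric along the lines of the sketch below. It follows DSPy's metric convention of taking the gold example first and the prediction second; the function name `label_and_score_metric` and the `tolerance` of 0.25 are assumptions chosen for illustration, not values from this notebook. `SimpleNamespace` objects stand in for DSPy examples and predictions so the block runs on its own.

```python
from types import SimpleNamespace

def label_and_score_metric(example, pred, trace=None, tolerance=0.25):
    """True when the labels match and the predicted score is close to the gold score."""
    # Compare labels as stripped strings so "1" and " 1" count as equal.
    if str(pred.label).strip() != str(example.label).strip():
        return False
    # Scores are stored as strings, so parse them before comparing.
    try:
        gap = abs(float(pred.score) - float(example.score))
    except (TypeError, ValueError):
        return False  # a non-numeric score counts as a miss
    return gap <= tolerance

# Stand-in gold example and predictions for illustration.
gold = SimpleNamespace(label="1", score="0.9")
close = SimpleNamespace(label="1", score="0.8")
far = SimpleNamespace(label="1", score="0.2")

print(label_and_score_metric(gold, close))
print(label_and_score_metric(gold, far))
```

A tighter tolerance makes the metric harder to satisfy, which may leave fewer bootstrapped demonstrations for the optimizer to choose from, so the value is worth tuning against the data.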

In [None]:
import dspy
import random
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# shuffle the examples
random.Random(0).shuffle(test_examples)

# split into train and validation sets
trainset = test_examples[:30]  # first 30 for training
valset = test_examples[30:]    # remaining 42 for validation

# set seeds
random.seed(0)
dspy.settings.seed = 0   # controls DSPy randomness

# deterministic LLM
dspy.settings.configure(
    lm=dspy.LM(
        model="gpt-4.1-mini",
        provider="openai",
        temperature=0,
        top_p=1,
    )
)

# program definition
program = dspy.ChainOfThought(
    "index_question, market_title -> rationale, label, score",
    temperature=0,
    top_p=1
)

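# optimizer
# DSPy calls the metric as metric(example, prediction, trace), so the gold
# example comes first and the model's prediction second.
optimizer = BootstrapFewShotWithRandomSearch(
    metric=lambda example, pred, trace=None: example.label == pred.label,
    max_bootstrapped_demos=30,
    max_labeled_demos=30,
    max_rounds=1,
    num_candidate_programs=5
)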

# compile/train
optimized_program = optimizer.compile(
    program,
    trainset=trainset,
    valset=valset
)


Going to sample between 1 and 30 traces per predictor.
Will attempt to bootstrap 5 candidate sets.
Average Metric: 0.00 / 42 (0.0%): 100%|██████████| 42/42 [00:12<00:00,  3.23it/s] 

2025/11/17 21:44:18 INFO dspy.evaluate.evaluate: Average Metric: 0 / 42 (0.0%)



New best score: 0.0 for seed -3
Scores so far: [0.0]
Best score so far: 0.0
Average Metric: 36.00 / 42 (85.7%): 100%|██████████| 42/42 [00:13<00:00,  3.04it/s]

2025/11/17 21:44:32 INFO dspy.evaluate.evaluate: Average Metric: 36 / 42 (85.7%)



New best score: 85.71 for seed -2
Scores so far: [0.0, 85.71]
Best score so far: 85.71


100%|██████████| 30/30 [01:08<00:00,  2.30s/it]


Bootstrapped 28 full traces after 29 examples for up to 1 rounds, amounting to 30 attempts.
Average Metric: 38.00 / 42 (90.5%): 100%|██████████| 42/42 [00:19<00:00,  2.16it/s]

2025/11/17 21:46:00 INFO dspy.evaluate.evaluate: Average Metric: 38 / 42 (90.5%)



New best score: 90.48 for seed -1
Scores so far: [0.0, 85.71, 90.48]
Best score so far: 90.48


100%|██████████| 30/30 [01:06<00:00,  2.23s/it]


Bootstrapped 27 full traces after 29 examples for up to 1 rounds, amounting to 30 attempts.
Average Metric: 36.00 / 42 (85.7%): 100%|██████████| 42/42 [00:15<00:00,  2.65it/s]

2025/11/17 21:47:23 INFO dspy.evaluate.evaluate: Average Metric: 36 / 42 (85.7%)



Scores so far: [0.0, 85.71, 90.48, 85.71]
Best score so far: 90.48


 20%|██        | 6/30 [00:16<01:06,  2.76s/it]


Bootstrapped 5 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Average Metric: 34.00 / 42 (81.0%): 100%|██████████| 42/42 [00:11<00:00,  3.56it/s]

2025/11/17 21:47:51 INFO dspy.evaluate.evaluate: Average Metric: 34 / 42 (81.0%)



Scores so far: [0.0, 85.71, 90.48, 85.71, 80.95]
Best score so far: 90.48


100%|██████████| 30/30 [01:04<00:00,  2.16s/it]


Bootstrapped 28 full traces after 29 examples for up to 1 rounds, amounting to 30 attempts.
Average Metric: 38.00 / 42 (90.5%): 100%|██████████| 42/42 [00:14<00:00,  2.94it/s]

2025/11/17 21:49:11 INFO dspy.evaluate.evaluate: Average Metric: 38 / 42 (90.5%)



Scores so far: [0.0, 85.71, 90.48, 85.71, 80.95, 90.48]
Best score so far: 90.48


 33%|███▎      | 10/30 [00:27<00:55,  2.80s/it]


Bootstrapped 8 full traces after 10 examples for up to 1 rounds, amounting to 10 attempts.
Average Metric: 36.00 / 42 (85.7%): 100%|██████████| 42/42 [00:16<00:00,  2.62it/s]

2025/11/17 21:49:55 INFO dspy.evaluate.evaluate: Average Metric: 36 / 42 (85.7%)



Scores so far: [0.0, 85.71, 90.48, 85.71, 80.95, 90.48, 85.71]
Best score so far: 90.48


 30%|███       | 9/30 [00:25<00:58,  2.80s/it]


Bootstrapped 8 full traces after 9 examples for up to 1 rounds, amounting to 9 attempts.
Average Metric: 38.00 / 42 (90.5%): 100%|██████████| 42/42 [00:26<00:00,  1.56it/s]

2025/11/17 21:50:47 INFO dspy.evaluate.evaluate: Average Metric: 38 / 42 (90.5%)



Scores so far: [0.0, 85.71, 90.48, 85.71, 80.95, 90.48, 85.71, 90.48]
Best score so far: 90.48
8 candidate programs found.


In [None]:
# save to be loaded in pipeline
optimized_program.save("relevance_model", save_program=True)