<a href="https://colab.research.google.com/github/hanhanwu/Hanhan_COLAB_Experiemnts/blob/master/GenAI_Practice/Langwatch/try_dspy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Try DSPy for RAG Prompt Optimization


In [1]:
%%capture --no-stderr
%pip install --upgrade --quiet nbformat
%pip install --upgrade --quiet dspy

## Prepare LLM and Retriever

* The retrieval source is the ColBERTv2 endpoint at `http://20.102.90.50:2017/wiki17_abstracts`.

In [2]:
import os
import sys
import contextlib
import pandas as pd
from getpass import getpass
import dspy
from pprint import pprint
from google.colab import userdata
from dspy.datasets import HotPotQA
from dspy.teleprompt import MIPROv2


OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
llm = dspy.LM("openai/gpt-4.1-nano", api_key=OPENAI_API_KEY)

# GOOGLE_AI_API_KEY = userdata.get('GOOGLE_AI_API_KEY')
# llm = dspy.LM("gemini/gemini-2.0-flash", api_key=GOOGLE_AI_API_KEY)

print("LLM test response:", llm("Where's Silicon Valley?"))

# the retrieval model
colbertv2_wiki17_abstracts = dspy.ColBERTv2(
    url="http://20.102.90.50:2017/wiki17_abstracts"
)
dspy.settings.configure(lm=llm, rm=colbertv2_wiki17_abstracts)

LLM test response: ['Silicon Valley is located in the southern part of the San Francisco Bay Area in Northern California, United States. It primarily encompasses parts of Santa Clara County, San Mateo County, and Alameda County. The region is renowned as a global center for technology, innovation, and venture capital, home to many major tech companies and startups.']


## Preparing Dataset

In [3]:
dataset = HotPotQA(train_seed=1, train_size=32, eval_seed=2025, dev_size=50, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

print()
print(len(trainset), len(devset))
print(trainset[0])
print(devset[0])


32 50
Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
Example({'question': 'Pehchaan: The Face of Truth stars Vinod Khanna, Rati Agnihotri and which Indian actress, producer, and former model who also produced the film?', 'answer': 'Raveena Tandon', 'gold_titles': {'Raveena Tandon', 'Pehchaan: The Face of Truth'}}) (input_keys={'question'})


## Defining DSPy RAG

In [4]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context,
                               answer=prediction.answer,
                               reasoning=prediction.reasoning)


dev_example = devset[12]
print(f"[Devset] Question: {dev_example.question}")
print(f"[Devset] Answer: {dev_example.answer}")
print(f"[Devset] Relevant Wikipedia Titles: {dev_example.gold_titles}")
print()

generate_answer = RAG()
pred = generate_answer(question=dev_example.question)
print(f"[Prediction] Question: {dev_example.question}")
print(f"[Prediction] Predicted Answer: {pred.answer}")
print(f"[Prediction] Reasoning: {pred.reasoning}")

[Devset] Question: Twelve Inches is a compilation album by which 1980s British band?
[Devset] Answer: Frankie Goes to Hollywood
[Devset] Relevant Wikipedia Titles: {'Twelve Inches', 'Frankie Goes to Hollywood'}

[Prediction] Question: Twelve Inches is a compilation album by which 1980s British band?
[Prediction] Predicted Answer: Bananarama
[Prediction] Reasoning: The context mentions three compilation albums with "Twelve Inch" in their titles. The first is by Soft Cell, a British band active in the 1980s. The second is by Bananarama, also a British group from the 1980s. The third is by Spandau Ballet, another British band from the 1980s. The question asks specifically about "Twelve Inches," which is the album by Bananarama, a British band from the 1980s.
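
The wrong answer above suggests a retrieval miss: the reasoning lists three other compilations and never mentions the gold band. A quick way to check this is to measure how many gold titles show up among the retrieved passages. The sketch below is plain Python and assumes each passage is a string of the form `Title | text`; that format, the helper names `title_of` and `gold_title_coverage`, and the sample passages are all assumptions made for illustration, not actual retriever output.

```python
def title_of(passage: str) -> str:
    """Return the text before the first ' | ' separator, lowercased."""
    return passage.split(" | ", 1)[0].strip().lower()


def gold_title_coverage(passages, gold_titles):
    """Fraction of gold titles that appear among the retrieved passage titles."""
    retrieved = {title_of(p) for p in passages}
    gold = {t.strip().lower() for t in gold_titles}
    if not gold:
        return 0.0
    return len(gold & retrieved) / len(gold)


# Hypothetical passages, written only to illustrate the check.
passages = [
    "Some Other Compilation | A made-up abstract about an unrelated album.",
    "Yet Another Compilation | A made-up abstract about a second album.",
    "Frankie Goes to Hollywood | A made-up abstract about the band.",
]
gold_titles = {"Twelve Inches", "Frankie Goes to Hollywood"}

coverage = gold_title_coverage(passages, gold_titles)
print(coverage)  # 0.5: one of the two gold titles was retrieved
```

In the notebook, the same helper could be applied to `pred.context` and `dev_example.gold_titles`, provided the retrieved passages really follow the assumed format. Low coverage would point at retrieval (for example `num_passages`) rather than at the prompt.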


In [5]:
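def validate_context_and_answer(example, prediction, trace=None):
    """Case-insensitive exact match on the answer; the retrieved context is not checked."""
    # `trace` is accepted because DSPy optimizers may pass it as a third argument.
    gold = example.answer.strip().lower()
    pred = prediction.answer.strip().lower()
    score = int(gold == pred)

    # While stdout is redirected during compile(), this line is written to the stats file.
    print(f"[Trial] Q: {example.question} | Pred: {pred} | GT: {gold} | Score: {score}")
    return score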


optimizer = MIPROv2(
    metric=validate_context_and_answer,
    prompt_model=llm,
    task_model=llm,
    num_candidates=2,  # number of few-shot sets and instruction candidates to generate
    init_temperature=0.7,
    seed=10,
    auto=None,
    verbose=True,
    track_stats=True
)


# Send stdout (including the metric's prints) to a file; INFO logs still appear below.
with open('dspy_miprov2_verbose_stats.txt', 'w') as f:
    with contextlib.redirect_stdout(f):
        compiled_rag = optimizer.compile(
            RAG(),
            trainset=trainset,
            num_trials=5,
            max_bootstrapped_demos=2,
            max_labeled_demos=3,
            minibatch_size=4,
            requires_permission_to_run=False
        )

2025/06/06 02:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/06/06 02:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/06/06 02:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=2 sets of demonstrations...
2025/06/06 02:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/06/06 02:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/06/06 02:40:35 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=2 instructions...

2025/06/06 02:40:43 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/06/06 02:40:43 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Answer questions with 
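
The metric in cell `In [5]` is a strict string comparison after lowercasing and trimming, so a prediction that differs only by trailing punctuation or a leading article scores 0. The sketch below shows one possible relaxed variant in plain Python. The helper names `normalize_answer` and `normalized_exact_match` are hypothetical and are not taken from DSPy; DSPy ships its own answer-matching utilities, which this sketch does not use.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def normalized_exact_match(gold: str, pred: str) -> int:
    """Return 1 if both answers are equal after normalization, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


# The strict comparison from the notebook versus the relaxed one.
gold, pred = "Raveena Tandon", "Raveena Tandon."
strict = int(gold.strip().lower() == pred.strip().lower())
relaxed = normalized_exact_match(gold, pred)
print(strict, relaxed)  # 0 1
```

Whether the relaxed form is appropriate depends on the task: it is more forgiving during optimization, but it can also hide formatting problems that the strict form would expose.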

In [6]:
# inspect the optimized program
compiled_rag

generate_answer.predict = Predict(StringSignature(context, question -> reasoning, answer
    instructions='Given a set of relevant passages or context related to a factual question, generate a concise, accurate answer that directly responds to the question. Additionally, provide a step-by-step reasoning process explaining how you arrived at the answer, ensuring that your explanation reflects a clear understanding of the facts. Your responses should be brief and focused, emphasizing correctness and clarity to support knowledge verification tasks involving questions about names, dates, and specific details across various subjects. Carefully consider the context, think through the facts logically, and craft an answer that is both precise and supported by the retrieved information.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True j

In [7]:
# run the optimized program on a single dev example
dev_example = devset[0]
pred = compiled_rag(question=dev_example.question)
print("\n--- Test on dev example ---")
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Ground Truth: {dev_example.answer}")


--- Test on dev example ---
Question: Pehchaan: The Face of Truth stars Vinod Khanna, Rati Agnihotri and which Indian actress, producer, and former model who also produced the film?
Predicted Answer: Raveena Tandon
Ground Truth: Raveena Tandon
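
A single correct example does not show whether optimization helped. A fairer comparison runs both the baseline and the compiled program over the whole dev set with the same metric. The sketch below is a minimal, library-free version of that loop; `accuracy`, `exact_match`, `toy_devset`, and `toy_program` are hypothetical stand-ins so the block runs without an LLM or retriever.

```python
from types import SimpleNamespace


def exact_match(example, prediction, trace=None):
    """Case-insensitive exact match, mirroring the notebook's metric."""
    return int(example.answer.strip().lower() == prediction.answer.strip().lower())


def accuracy(program, examples, metric):
    """Average metric score of `program` over `examples`."""
    if not examples:
        return 0.0
    total = sum(metric(ex, program(question=ex.question)) for ex in examples)
    return total / len(examples)


# Stand-ins for devset entries and a RAG program (assumptions for illustration).
toy_devset = [
    SimpleNamespace(question="q1", answer="Alpha"),
    SimpleNamespace(question="q2", answer="Beta"),
]


def toy_program(question):
    answers = {"q1": "alpha", "q2": "gamma"}
    return SimpleNamespace(answer=answers[question])


score = accuracy(toy_program, toy_devset, exact_match)
print(score)  # 0.5
```

In the notebook, calling `accuracy(RAG(), devset, validate_context_and_answer)` and `accuracy(compiled_rag, devset, validate_context_and_answer)` should give the two numbers to compare, at the cost of one LLM call per dev example for each program. DSPy also provides an evaluation utility that may be more convenient for larger sets.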
