<a href="https://colab.research.google.com/github/hamzafarooq/multi-agent-course/blob/main/Module_6/DSPy/DSPy%20Introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to [DSPy](https://github.com/stanfordnlp/dspy?tab=readme-ov-file)

## 1. Installing the requirements

In [None]:
!pip install dspy-ai

In [None]:
import dspy
import sys
import os


## 2. Setting up your LM and RM

We'll start by setting up the language model (LM) and retrieval model (RM).

In this notebook, we'll work with GPT-4o mini (`gpt-4o-mini`) as the LM and ColBERTv2 as the retriever.

To make things easy, we've set up a ColBERTv2 server hosting a Wikipedia 2017 "abstracts" search index (i.e., an index containing the first paragraph of each article from the 2017 dump).


In [None]:
turbo = dspy.LM(model='gpt-4o-mini')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

In [None]:
from dspy.datasets import HotPotQA  # HotPotQA is a benchmark dataset for multi-hop question answering

# Load the dataset.
# Notice how small the training and dev sets are compared with typical ML training data.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

Let's check some examples!

In [None]:
for i in range(5):
  train_example = trainset[i]
  print(f"Question: {train_example.question}")
  print(f"Answer: {train_example.answer}", '\n')

As you can see, not all questions are multi-hop; the very first one is not. The second one is, because answering it requires breaking the question into pieces.

Let's check an example from the development set. We will not use it for training, only for evaluating metrics.

In [None]:
dev_example = devset[18]
print(f"Question: {dev_example.question}")
print(f"Answer: {dev_example.answer}")
print(f"Relevant Wikipedia Titles: {dev_example.gold_titles}")

In [None]:
# This cell shows which fields are passed to the model as inputs and which are kept as labels.
print(f"For this dataset, training examples have input keys {train_example.inputs().keys()} and label keys {train_example.labels().keys()}")
print(f"For this dataset, dev examples have input keys {dev_example.inputs().keys()} and label keys {dev_example.labels().keys()}")
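
To make the input/label split concrete, here is a minimal plain-Python sketch of the idea behind `with_inputs`. It does not use DSPy: `split_fields` and the sample record are hypothetical, and DSPy's own `Example` class is implemented differently.

```python
def split_fields(record, input_keys):
    """Partition a record into model inputs and labels/metadata.

    Hypothetical helper for illustration only; not part of DSPy.
    """
    input_keys = set(input_keys)
    inputs = {k: v for k, v in record.items() if k in input_keys}
    labels = {k: v for k, v in record.items() if k not in input_keys}
    return inputs, labels


# Made-up record shaped like a HotPotQA example.
record = {
    "question": "Which city hosted the event?",
    "answer": "Paris",
    "gold_titles": ["Paris", "Some Event"],
}

inputs, labels = split_fields(record, ["question"])
print(sorted(inputs))  # ['question']
print(sorted(labels))  # ['answer', 'gold_titles']
```

Only the fields named as inputs would be shown to the model; everything else stays available for metrics.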

## 3. Defining a simple Signature and Predictor

In [None]:
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [None]:
import os

os.environ["OPENAI_API_KEY"] = 'your api key'  # placeholder: replace with your own OpenAI API key

# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

If the cell above raises an error about a missing API key, get an OpenAI API key and set it through `os.environ["OPENAI_API_KEY"]`, as in the first line of that cell.

**Wrong answer**. The chef is [Robert Irvine](https://en.wikipedia.org/wiki/Robert_Irvine), who is in fact British.


We can explore the history of this answer.

In [None]:
turbo.inspect_history(n=1)

The history above contains no reasoning or chain of thought. Instead of `Predict`, we will now use DSPy's `ChainOfThought` module.
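
As a rough illustration of the difference, the sketch below shows how the same signature could be turned into a prompt with and without a reasoning step. The `render_prompt` function and its text layout are assumptions made for this example; DSPy's actual prompt template is different, and `inspect_history` shows the real one.

```python
def render_prompt(instructions, inputs, output_fields, with_reasoning=False):
    """Build a plain-text prompt from a signature-like description.

    Hypothetical format for illustration; not DSPy's real template.
    """
    lines = [instructions, ""]
    for name, value in inputs.items():
        lines.append(f"{name.capitalize()}: {value}")
    # A chain-of-thought variant asks for a reasoning field before the outputs.
    fields = (["reasoning"] if with_reasoning else []) + list(output_fields)
    # Leave the first requested field open for the model to complete.
    lines.append(f"{fields[0].capitalize()}:")
    if len(fields) > 1:
        lines.append("(then provide: " + ", ".join(fields[1:]) + ")")
    return "\n".join(lines)


instructions = "Answer questions with short factoid answers."
question = {"question": "Who wrote Hamlet?"}

plain = render_prompt(instructions, question, ["answer"])
cot = render_prompt(instructions, question, ["answer"], with_reasoning=True)

print(plain)
print("-" * 30)
print(cot)
```

The signature (instructions, inputs, outputs) is identical in both calls; only the module wrapping it decides whether a reasoning field is requested.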

In [None]:
dev_example = devset[18]
print(f"Question: {dev_example.question}")
print(f"Answer: {dev_example.answer}")
print(f"Relevant Wikipedia Titles: {dev_example.gold_titles}")


In [None]:
# Replace dspy.Predict(BasicQA) with dspy.ChainOfThought(BasicQA). Notice that the BasicQA signature is untouched.
generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)

# Call the predictor on the same input.
pred = generate_answer_with_chain_of_thought(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

In [None]:
turbo.inspect_history(n=1)

## 4. Retrieval and basic RAG

In [None]:
retrieve = dspy.Retrieve(k=3)
topK_passages = None
while True:
    try:
        topK_passages = retrieve(dev_example.question).passages
        break
    except Exception as e:
        continue

print(f"Top {retrieve.k} passages for question: {dev_example.question} \n", '-' * 30, '\n')

for idx, passage in enumerate(topK_passages):
    print(f'{idx+1}]', passage, '\n')
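
The retry loops in this notebook keep trying until a call succeeds, which can hang if the server is unreachable. A bounded variant could look like the sketch below. `call_with_retries` and the `flaky` stand-in are hypothetical names for this example and are not part of DSPy.

```python
import time


def call_with_retries(fn, max_attempts=5, base_delay=0.0):
    """Call fn() until it succeeds or max_attempts is reached.

    Hypothetical helper. Waits base_delay * 2**attempt seconds between tries
    and raises RuntimeError (chained to the last error) if every attempt fails.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as error:  # broad on purpose, mirroring the notebook
            last_error = error
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Stand-in for a flaky call such as a retriever request: fails twice, then works.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["passage A", "passage B"]


passages = call_with_retries(flaky, max_attempts=5)
print(passages, calls["count"])  # ['passage A', 'passage B'] 3
```

In this notebook, `fn` could be something like `lambda: retrieve(dev_example.question).passages`.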

In [None]:
topK_passages

In [None]:
#check 3 passages for the same question
for i in range(3):
  while True:
    try:
      print(retrieve(dev_example.question).passages[i], '\n')
      break
    except Exception as e:
      continue

In [None]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [None]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

Now it's time to bring in an optimizer.


In [None]:
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer exactly matches the gold answer.
# Note: despite the function's name, the retrieved context is not checked here.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)  # exact-match (EM) metric
    return answer_EM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer) #This line bootstraps few-shot examples

# Compile!
compiled_rag = None
while True:
    try:
      compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
      break
    except Exception as e:
      print(f"Exception: {str(e)}")

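In the cell above, bootstrapping stops once the optimizer has collected enough demonstrations that pass the metric (`max_bootstrapped_demos`, which defaults to 4) or has gone through the whole training set. It does not target a performance threshold.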
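
The idea behind bootstrapping few-shot examples can be sketched in plain Python: run the program on training examples and keep the ones whose prediction passes the metric. This is a simplified illustration under stated assumptions, not DSPy's implementation; `bootstrap_demos`, `toy_program`, and the toy data are hypothetical.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Collect training examples whose prediction passes the metric.

    Simplified illustration: a real optimizer also records the intermediate
    steps of the program, which is omitted here.
    """
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break  # enough demonstrations collected
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


# Toy "program" that answers from a lookup table and gets one question wrong.
toy_answers = {"2 + 2?": "4", "3 * 3?": "6", "Capital of France?": "Paris"}


def toy_program(question):
    return toy_answers[question]


def exact(example, prediction):
    return example["answer"] == prediction


toy_trainset = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "3 * 3?", "answer": "9"},
    {"question": "Capital of France?", "answer": "Paris"},
]

demos = bootstrap_demos(toy_program, toy_trainset, exact, max_demos=2)
print([d["question"] for d in demos])  # ['2 + 2?', 'Capital of France?']
```

The example that fails the metric is skipped, so only demonstrations the program got right end up in the prompt.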


In [None]:
# Ask any question you like to this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = None
while True:
    try:
      pred = compiled_rag(my_question)
      break
    except Exception as e:
      continue

# Print the question and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")


In [None]:
# Let's check the retrieved passage

print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")

**For Readability:**

Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']

**And the Wikipedia link**: [David Gregory](https://en.wikipedia.org/wiki/David_Gregory_(physician))




In [None]:
turbo.inspect_history(n=1)  # Show the most recent LM call. To see more of the recent calls, increase n.

Let's inspect the parameters, i.e. the demonstrations attached to each predictor of the compiled program.

In [None]:
for name, parameter in compiled_rag.named_predictors():
  print(name)
  print(parameter.demos[0], '\n')


## 5. Evaluate the Model

In [None]:
from dspy.evaluate import Evaluate

# Set up the `evaluate_on_hotpotqa` function.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match

# Retry until the evaluation completes (e.g. after transient API or retriever errors).
while True:
    try:
        exact_match_score = evaluate_on_hotpotqa(compiled_rag, metric=metric)
        break
    except Exception as e:
        print(f"Exception: {e}")

exact_match_score
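
Exact match is usually computed after light text normalization, so that casing, punctuation, and articles do not count as errors. The sketch below shows one common (SQuAD-style) way to do this. `normalize` and `exact_match` are hypothetical stand-ins, and the details of DSPy's own `answer_exact_match` may differ.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """Return True when both strings are identical after normalization."""
    return normalize(prediction) == normalize(gold)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # True
print(exact_match("Eiffel", "eiffel tower"))             # False
```

Exact match is strict: a partially correct answer such as "Eiffel" scores zero, which is why short factoid answers suit this metric.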

In [None]:
def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

compiled_rag_retrieval_score = None

while True:
    try:
        compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)
        break
    except Exception as e:
        continue
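
The retrieval metric above relies on passages being formatted as `Title | text`. The following plain-Python sketch shows the same subset check on made-up data. The helper names and sample passages are hypothetical, and the lowercase/strip normalization is a simplified stand-in for `dspy.evaluate.normalize_text`.

```python
def titles_from_passages(passages):
    """Extract the title from passages formatted as 'Title | text'."""
    return {p.split(" | ")[0].strip().lower() for p in passages}


def all_gold_retrieved(gold_titles, passages):
    """Return True only if every gold title appears among the retrieved titles."""
    gold = {t.strip().lower() for t in gold_titles}
    return gold.issubset(titles_from_passages(passages))


# Made-up retrieved passages.
passages = [
    "Alpha Article | First paragraph of the alpha article...",
    "Beta Article | First paragraph of the beta article...",
]

print(all_gold_retrieved(["Alpha Article"], passages))                   # True
print(all_gold_retrieved(["Alpha Article", "Gamma Article"], passages))  # False
```

For a multi-hop question, a single missing gold title makes the check fail, so this score reflects whether the retriever found every supporting article.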

For more advanced topics, refer to https://github.com/stanfordnlp/dspy/tree/main




<h3 align="center">---The End---</h3>








