## Import the Data
First, we choose the language model whose prompts we will optimize and set up access to the dataset.

In [None]:
import dspy

turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

Next, we split the data into training and development sets.

In [2]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)


(20, 50)
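To make the input/label split concrete, here is a dependency-free sketch of the idea behind `with_inputs('question')`. The `ToyExample` class is hypothetical and is not DSPy's implementation; it only mimics marking some fields as inputs and treating everything else as labels.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExample:
    """Hypothetical stand-in for a dataset record; not DSPy's Example class."""
    fields: dict
    input_keys: frozenset = field(default_factory=frozenset)

    def with_inputs(self, *keys):
        # Return a copy that remembers which fields the program may see.
        return ToyExample(dict(self.fields), frozenset(keys))

    def inputs(self):
        # Fields that would be passed to the program at prediction time.
        return {k: v for k, v in self.fields.items() if k in self.input_keys}

    def labels(self):
        # Everything else is kept aside for metrics and supervision.
        return {k: v for k, v in self.fields.items() if k not in self.input_keys}


# Made-up record, used only for illustration.
record = ToyExample({"question": "Who wrote Hamlet?", "answer": "William Shakespeare"})
example = record.with_inputs("question")

print(example.inputs())
print(example.labels())
```

The point of the split is that the program only ever receives the question, while the answer stays available to the metric.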

## Defining the task
DSPy is a great tool for prompt optimization. It can optimize your prompt along several dimensions: selecting the best demos (examples from your dataset) for few-shot learning, improving the task description, or generating its own demos to improve few-shot performance even further (bootstrapping).
We will start by defining the task, keeping it as simple as possible.

In [4]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

Next, we will define the RAG class that will actually perform the generation of the response.

In [3]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
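The control flow of `RAG.forward` (retrieve passages, then generate an answer from them) can be illustrated without any model or network access. The sketch below is only an analogy: the corpus is made up, `toy_retrieve` is a simple word-overlap ranker rather than ColBERTv2, and `toy_generate` is a stub in place of the language model call. All names are hypothetical.

```python
import re

# Toy corpus standing in for the retrieval index (made-up passages).
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Seine flows through Paris.",
]


def tokenize(text):
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(question, k=3):
    """Rank passages by word overlap with the question and return the top k."""
    q = tokenize(question)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def toy_generate(context, question):
    """Stub for the language model call: returns the top-ranked passage."""
    return context[0] if context else ""


def toy_rag(question, num_passages=3):
    # Same two steps as RAG.forward: retrieve context, then generate from it.
    context = toy_retrieve(question, k=num_passages)
    answer = toy_generate(context=context, question=question)
    return {"context": context, "answer": answer}


result = toy_rag("What is the capital of France?", num_passages=2)
print(result["answer"])
```

Returning the context alongside the answer, as the real module does, is what later lets a metric check whether the retrieved passages actually support the answer.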

## Observe your optimization
CLI output is cool, but what if you want to watch how the signature changes across optimization steps? Or what if you want to know how much the optimization cost you? LangWatch can help here!
First, you need to connect to the endpoint and save the API key in your `.env` file.

In [5]:
import langwatch

langwatch.endpoint = "https://app.langwatch.ai"
langwatch.login()

Please go to https://app.langwatch.ai/authorize to get your API key
LangWatch API key set


## Optimize and Explore
Finally, you can run the optimization and watch how it evolves. After each step completes, you can see which demos were chosen for few-shot learning. Be aware that the `BootstrapFewShot` optimizer does not change your signature (task description); if you want to optimize that as well, try `COPRO` or `MIPRO`.
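As a rough intuition for how demos get chosen, bootstrapping can be thought of as running the program on training examples and keeping only the runs that pass the metric. The sketch below is a simplified illustration under that assumption, not the `BootstrapFewShot` implementation; `toy_bootstrap`, `max_demos`, the lookup-table program, and the data are all hypothetical.

```python
def toy_bootstrap(program, trainset, metric, max_demos=4):
    """Collect demos from training examples whose prediction passes the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"])
        if metric(example, prediction):
            # Keep the input together with the program's own output as a demo.
            demos.append({"question": example["question"], "answer": prediction})
    return demos


# Hypothetical program: a lookup table that gets one question wrong.
KNOWN = {"2 + 2?": "4", "Capital of France?": "Lyon", "Largest planet?": "Jupiter"}


def program(question):
    return KNOWN.get(question, "")


def exact_match(example, prediction):
    """Case-insensitive comparison of the prediction against the gold answer."""
    return prediction.strip().lower() == example["answer"].strip().lower()


toy_trainset = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Largest planet?", "answer": "Jupiter"},
]

demos = toy_bootstrap(program, toy_trainset, exact_match, max_demos=2)
print([d["question"] for d in demos])
```

The failed example is skipped, so only runs the metric accepts end up as few-shot demos.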

In [7]:
from dspy.teleprompt import BootstrapFewShot
from dspy import evaluate
from dotenv import load_dotenv
load_dotenv()

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = evaluate.answer_exact_match(example, pred)
    answer_PM = evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

langwatch.dspy.init(experiment="rag-dspy-tutorial", optimizer=teleprompter)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)


[LangWatch] Experiment initialized, run_id: masked-acoustic-seal
[LangWatch] Open https://app.langwatch.ai/inbox-narrator/experiments/rag-dspy-tutorial?runIds=masked-acoustic-seal to track your DSPy training session live



 55%|█████▌    | 11/20 [00:13<00:11,  1.24s/it]
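The `validate_context_and_answer` metric above accepts a prediction only when two checks pass: the answer matches the gold answer, and the retrieved context contains that answer. The following dependency-free sketch illustrates that logic. The `normalize` helper and the `toy_*` functions are hypothetical simplifications; DSPy's own normalization and matching may differ in details.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def toy_exact_match(gold_answer, predicted_answer):
    # The prediction must equal the gold answer after normalization.
    return normalize(gold_answer) == normalize(predicted_answer)


def toy_passage_match(gold_answer, passages):
    # At least one retrieved passage must contain the gold answer.
    target = normalize(gold_answer)
    return any(target in normalize(p) for p in passages)


def toy_validate(gold_answer, predicted_answer, passages):
    # Mirrors `answer_EM and answer_PM`: both checks must pass.
    return (toy_exact_match(gold_answer, predicted_answer)
            and toy_passage_match(gold_answer, passages))


# Made-up passages, used only for illustration.
supporting = ["The Eiffel Tower is located in Paris, France."]
unrelated = ["Berlin is the capital of Germany."]

print(toy_validate("Paris", "paris.", supporting))
print(toy_validate("Paris", "Paris", unrelated))
print(toy_validate("Paris", "The city of Paris", supporting))
```

Requiring both checks means a correct answer produced without supporting context is rejected, which keeps unsupported guesses out of the bootstrapped demos.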
