<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop/blob/main/docs/module_04_llm_apps/02_dspy_demo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# <img src="../assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="../assets/dspy_banner.png">

---

## Intro
- Language models are complex systems that retrieve and reformulate information from an **extremely large latent space**.
- To guide this search and get the desired responses, we rely heavily on **complex, long and brittle prompts**, which are (at times) very specific to certain LLMs.
- This is an open area of research, and teams are approaching it from different perspectives to abstract away prompting and enable rapid development of **LLM-enabled systems**.
- **Stanford DSPy** is one such framework for algorithmically optimizing LM prompts and weights.


> $_{DSPy\ logo\ is\ copyright/ownership\ of\ respective\ teams}$

## Ok, You Got Me Intrigued, Tell Me More?

- The DSPy framework takes inspiration from deep learning frameworks such as <img src="./assets/pytorch_logo.png" width="2%">[PyTorch](https://pytorch.org/)
    - For instance, to build a deep neural network using PyTorch, we simply use standard layers such as ``convolution``, ``dropout`` and ``linear``, attach them to optimizers like ``Adam``, and train without implementing any of these from scratch every time.
- Similarly, DSPy provides a set of standard general-purpose **modules** (``ChainOfThought``, ``Predict``) and **optimizers** (``BootstrapFewShotWithRandomSearch``), and helps us build systems by composing these components, like layers, into a ``Program`` without explicitly dealing with prompts. Neat, isn't it?

### Usual Prompt Based Workflow
<img src="../assets/prompt_workflow.png" width="75%">

---


### LangChain-Like Workflow

<img src="../assets/langchain_workflow.png" width="75%">

---


### DSPy Workflow

<img src="../assets/dspy_workflow.png" >

---


## Time to Put Words into Action

In [None]:
# !pip3 install dspy==2.6.27

In [2]:
import dspy
dspy.__version__

'2.6.27'

In [1]:
import os
import sys
import dspy
from dsp.utils import deduplicate

import itertools
import random
from scraper_utils import NB_Markdown_Scraper

In [2]:
OPENAI_TOKEN = '<YOUR TOKEN>'
lm = dspy.LM('openai/gpt-4o-2024-11-20', api_key=OPENAI_TOKEN)
# lm = dspy.LM(
#     'ollama_chat/llama3.1',
#     api_base='http://localhost:11434',
#     api_key=OPENAI_TOKEN
# )
dspy.configure(lm=lm)

## Prepare Data

We will scrape the text/markdown cells from all notebooks in this repository and use them as our dataset.

In [3]:
[f'../{d}' for d in os.listdir("../") if d.startswith("module")]

['../module_01_lm_fundamentals',
 '../module_03_instruction_tuning_and_alignment',
 '../module_02_llm_building_blocks',
 '../module_04_llm_apps']

In [4]:
nb_scraper = NB_Markdown_Scraper([f'../{d}' for d in os.listdir("../") if d.startswith("module")])
nb_scraper.scrape_markdowns()

In [5]:
nb_scraper.notebook_md_dict.keys()

dict_keys(['module_01_lm_fundamentals_01_text_representation', 'module_01_lm_fundamentals_02_contextual_embeddings', 'module_03_instruction_tuning_and_alignment_01_instruction_tuning_llama_txt2py', 'module_03_instruction_tuning_and_alignment_02_RLHF_phi2', 'module_03_instruction_tuning_and_alignment_03_zephyr_alignment_dpo', 'module_02_llm_building_blocks_02_transformers_pipelines', 'module_02_llm_building_blocks_03_training_language_models', 'module_02_llm_building_blocks_01_transformers', 'module_02_llm_building_blocks_04_llm_training_and_scaling', 'module_04_llm_apps_03_mcp_getting_started', 'module_04_llm_apps_01_retrieval_augmented_llm_app', 'module_04_llm_apps_02_dspy_demo'])

In [6]:
with open("./dspy_content.tsv", "w") as record_file:
    for k,v in nb_scraper.notebook_md_dict.items():
        record_file.write(f"{k}\t{v}\n")
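
One caveat with the plain `write` call above: scraped markdown usually contains newlines (and sometimes tabs), so a single record can spill across several physical lines of the TSV file. The following is a minimal sketch of a more robust variant, assuming Python's standard `csv` module is acceptable here. `toy_md_dict`, `write_tsv` and `read_tsv` are made-up names for illustration, with `toy_md_dict` standing in for `nb_scraper.notebook_md_dict`.

```python
import csv
import io

# Hypothetical stand-in for nb_scraper.notebook_md_dict
toy_md_dict = {
    "module_01_demo": "# Title\nSome text\twith a tab",
    "module_02_demo": "Single line",
}

def write_tsv(md_dict, handle):
    """Write key/markdown pairs as TSV, quoting fields that contain tabs or newlines."""
    writer = csv.writer(
        handle, delimiter="\t", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    for key, markdown in md_dict.items():
        writer.writerow([key, markdown])

def read_tsv(handle):
    """Read the TSV back into a dict, restoring embedded newlines and tabs."""
    return {key: markdown for key, markdown in csv.reader(handle, delimiter="\t")}

# Round-trip through an in-memory buffer instead of a real file
buffer = io.StringIO()
write_tsv(toy_md_dict, buffer)
buffer.seek(0)
round_trip = read_tsv(buffer)
print(round_trip == toy_md_dict)
```

With a real file, the same functions would be used with `open(path, "w", newline="")` and `open(path, newline="")`, as the `csv` documentation recommends.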

In [7]:
# Prefix each notebook key with a 1-based counter to build unique document ids
doc_ids = [f'{ctr}_{k}' for ctr, k in enumerate(nb_scraper.notebook_md_dict, start=1)]

## Setup Chroma
We started this workshop with **text representation** as one of the key components of any NLP system.
As we progressed from a simple Bag of Words setup to highly contextualised Transformer models, we arrived at rich and dense representations.
The utility of such representations has also grown manyfold, from word/sentence representations to features that can be used for a number of downstream tasks.

These representations, also called vectors or embedding vectors, are long series of numbers. Their retrieval and persistence require specialised database management systems called **Vector Databases**.

Vector Databases are particularly suited for handling data in the form of vectors, embeddings, or feature representations, which are commonly used in various applications like machine learning, natural language processing, computer vision, and recommendation systems.

Key Features:
- High-dimensional Data Support
- Similarity Search
- Indexing Techniques
- Dimensionality Reduction
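
To make "similarity search" concrete, here is a small sketch of the brute-force version of the problem: score every stored vector against the query and keep the best `k`. Vector databases typically replace this exhaustive scan with index structures so that it scales, but the goal is the same. The vectors and the names `toy_vectors` and `top_k` are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the ids of the k vectors most similar to the query (exhaustive scan)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Made-up 3-dimensional "embeddings"; real models produce hundreds of dimensions
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(nearest)
```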

There are a number of different off-the-shelf options available, such as:
- [ChromaDB](https://www.trychroma.com/)
- [PineCone](https://www.pinecone.io/)
- [Milvus](https://milvus.io/)
- [Weaviate](https://weaviate.io/)
- [AeroSpike](https://aerospike.com/)
- [OpenSearch](https://opensearch.org/)

**Let's Install the Dependencies**
```python
!pip install -q chromadb
!pip install retry
!pip install -U sentence-transformers
```

> Ensure Chroma is running in a terminal: `chroma run --path ./chromadb`

## Vector Database: ChromaDB

As mentioned above, there are a number of offerings available. For this workshop we will make use of
[ChromaDB](https://www.trychroma.com/).

It is simple to set up and easy to use. The following figure showcases the overall flow.

<img src="../assets/04_chroma_workflow.png">

> Source :[chromadb](https://docs.trychroma.com/)

## Sentence Transformers

Sentence Transformers is a Python framework initially released alongside the seminal paper [Sentence-BERT](https://www.sbert.net/).
It provides clean high-level interfaces to easily use Language Models for computing text embeddings for various use-cases.

In this notebook we will use a pretrained Sentence Transformers model through Chroma's embedding function rather than calling the package directly.

There is now a [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) that tracks state-of-the-art embedding models: the **Massive Text Embedding Benchmark (MTEB) Leaderboard**.

<img src="../assets/04_mteb.png">

> Source : [HuggingFace](https://huggingface.co/spaces/mteb/leaderboard)

In [8]:
import chromadb
from chromadb.utils import embedding_functions
chroma_emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
client = chromadb.HttpClient()



In [9]:
CHROMA_COLLECTION_NAME = "workshop_collection"

In [10]:
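# Drop any earlier copy so the cell can be re-run; this raises if the collection does not exist yet (e.g. first run)
client.delete_collection(CHROMA_COLLECTION_NAME)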

In [11]:
collection = client.create_collection(
    CHROMA_COLLECTION_NAME,
    embedding_function=chroma_emb_fn,
    metadata={"hnsw:space": "cosine"}
)

In [12]:
# Add to collection
collection.add(
    documents=[v for _,v in nb_scraper.notebook_md_dict.items()], 
    ids=doc_ids, # must be unique for each doc
)

In [13]:
results = collection.query(
    query_texts=["RLHF"], # Chroma will embed using the function we provided
    n_results=3 # how many results to return
)
print(results['ids'][0])
print(results['distances'][0])
#print([i[:100] for j in results['documents'] for i in j])

['1_module_01_lm_fundamentals_01_text_representation', '4_module_03_instruction_tuning_and_alignment_02_RLHF_phi2', '2_module_01_lm_fundamentals_02_contextual_embeddings']
[0.799641268192753, 0.8011068851342594, 0.851910180221277]
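
Because the collection was created with `"hnsw:space": "cosine"`, the numbers printed above are cosine distances, where smaller means closer. The sketch below converts them into similarities, assuming Chroma's documented definition of cosine distance as one minus cosine similarity. The ids and distances are copied from the output above.

```python
# ids and distances copied from the query output above
ids = [
    "1_module_01_lm_fundamentals_01_text_representation",
    "4_module_03_instruction_tuning_and_alignment_02_RLHF_phi2",
    "2_module_01_lm_fundamentals_02_contextual_embeddings",
]
distances = [0.799641268192753, 0.8011068851342594, 0.851910180221277]

# Assumption: cosine distance = 1 - cosine similarity
similarities = [1.0 - d for d in distances]

for doc_id, sim in zip(ids, similarities):
    print(f"{sim:.3f}  {doc_id}")
```

All three similarities are low and close to each other, and the RLHF notebook ranks second rather than first. This hints that whole notebooks are fairly coarse units to embed for a short query like "RLHF".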


## Chroma as RM for DSPy

In [15]:
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM
import os
import openai
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

In [16]:
retriever_model = ChromadbRM(
    CHROMA_COLLECTION_NAME,
    './chromadb/',
    embedding_function=chroma_emb_fn,
    k=5
)

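# Query the retriever directly; `score` is the cosine distance reported by Chroma (lower is closer)
results = retriever_model("RLHF", k=5)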

for result in results:
    print(f'Document id::{result.id}')
    print(f'Document score::{result.score}')
    print("Document:", result.long_text[:50],'...' ,"\n")

Document id::1_module_01_lm_fundamentals_01_text_representation
Document score::0.799641268192753
Document: <a target="_blank" href="https://colab.research.go ... 

Document id::4_module_03_instruction_tuning_and_alignment_02_RLHF_phi2
Document score::0.8011068851342594
Document: <a target="_blank" href="https://colab.research.go ... 

Document id::2_module_01_lm_fundamentals_02_contextual_embeddings
Document score::0.851910180221277
Document: <a target="_blank" href="https://colab.research.go ... 

Document id::5_module_03_instruction_tuning_and_alignment_03_zephyr_alignment_dpo
Document score::0.8664296651969585
Document: <a target="_blank" href="https://colab.research.go ... 

Document id::7_module_02_llm_building_blocks_03_training_language_models
Document score::0.9139603656846649
Document: <a target="_blank" href="https://colab.research.go ... 



## Basic DSPy Program

In [17]:
# Set up the LM and RM
dspy.settings.configure(
    lm=lm,
    rm=retriever_model
)

In [18]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [19]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

In [20]:
my_question = "List the models covered in module02"
uncompiled_rag = RAG()  # uncompiled (i.e., zero-shot) program; no optimizer has been applied
# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = uncompiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
for c in pred.context:
    print(f"Retrieved Contexts (truncated):{c[:100]}..." )

Question: List the models covered in module02
Predicted Answer: GPT-2
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


## Multi-Hop DSPy Program

In [21]:
from dsp.utils import deduplicate

In [22]:
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

In [23]:
class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
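
To see the control flow of `forward` without calling an LM or a retriever, here is a framework-free sketch of the same loop. Everything in it is a stand-in: `TOY_CORPUS`, `stub_retrieve` and `stub_generate_query` are hypothetical, and `deduplicate_keep_order` only mimics what an order-preserving deduplication helper is assumed to do.

```python
def deduplicate_keep_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# Hypothetical toy corpus: search query -> retrieved passages
TOY_CORPUS = {
    "rlhf": ["RLHF notebook", "DPO notebook"],
    "ppo": ["RLHF notebook", "Training notebook"],
}

def stub_retrieve(query):
    """Stand-in for the retriever: look the query up in the toy corpus."""
    return TOY_CORPUS.get(query, [])

def stub_generate_query(context, question):
    """Stand-in for the LM: search "rlhf" on the first hop, "ppo" on later hops."""
    return "rlhf" if not context else "ppo"

def multi_hop(question, max_hops=2):
    """Accumulate context over several query -> retrieve -> deduplicate rounds."""
    context = []
    for _ in range(max_hops):
        query = stub_generate_query(context, question)
        context = deduplicate_keep_order(context + stub_retrieve(query))
    return context

collected = multi_hop("For RLHF what policy is covered?")
print(collected)
```

The second hop retrieves "RLHF notebook" again, so only three unique passages remain out of four retrieved. The same effect explains why a run can end up with fewer than `passages_per_hop * max_hops` contexts.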

In [24]:
# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen()  # uncompiled (i.e., zero-shot) program
pred = uncompiled_baleen(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
for c in pred.context:
    print(f"Retrieved Contexts (truncated):{c[:100]}..." )

Question: List the models covered in module02
Predicted Answer: GPT-2
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
Retrieved Contexts (truncated):<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


In [25]:
lm.inspect_history(n=1)





[2025-08-17T23:24:37.314904]

System message:

Your input fields are:
1. `context` (str): may contain relevant facts
2. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `answer` (str): often between 1 and 5 words
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Answer questions with short factoid answers.


User message:

[[ ## context ## ]]
[1] «««
    <a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop_dhs2025/blob/main/docs/module_02_llm_building_blocks/03_training_language_models.ipynb">
      <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
    </a> # Training Language Models ---
    ## Rec
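
The prompt above was generated from the `GenerateAnswer` signature, with the extra `reasoning` field coming from `ChainOfThought`. As a rough mental model only (this is not DSPy's actual adapter code, and `render_template` is a hypothetical helper), the field skeleton in the system message can be reproduced like this:

```python
def render_template(input_fields, output_fields):
    """Render the `[[ ## name ## ]]` skeleton seen in the system message."""
    sections = [
        f"[[ ## {name} ## ]]\n{{{name}}}"  # header line, then a {name} placeholder
        for name in input_fields + output_fields
    ]
    sections.append("[[ ## completed ## ]]")  # end-of-response marker
    return "\n\n".join(sections)

template = render_template(["context", "question"], ["reasoning", "answer"])
print(template)
```

The point of the sketch is that the prompt structure follows mechanically from the declared fields, which is why changing a signature changes the prompt without any hand-written prompt text.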

## How Does it All Go?

In [28]:
from IPython.display import display, Markdown

In [27]:
questions = [
    "Which model is used for instruction fine-tuning?",
    "List the models covered in module03",
    "Summarize key takeaways for module02",
    "What is the focus of the following modules: module_01,module_02,module_03 and module_04? Respond as a list",
    "For RLHF what policy is covered in module03?"
]

uncompiled_baleen = SimplifiedBaleen(passages_per_hop=5)  # uncompiled (i.e., zero-shot) program
for question in questions:
    display(Markdown(f"**Question**:{question}"))
    pred = uncompiled_baleen(question)
    display(Markdown(f"**Predicted Answer**: {pred.answer}"))
    display(Markdown("**Retrieved Contexts (truncated)**:"))
    for c in pred.context:
        print(f"{c[:100]}..." )
    display(Markdown("---"))

**Question**:Which model is used for instruction fine-tuning?

**Predicted Answer**: LLaMA 2-7B, 3.1-8B

**Retrieved Contexts (truncated)**:

<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


---

**Question**:List the models covered in module03

**Predicted Answer**: LLaMA, InstructGPT

**Retrieved Contexts (truncated)**:

<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


---

**Question**:Summarize key takeaways for module02

**Predicted Answer**: Training, tokenization, BERT, GPT

**Retrieved Contexts (truncated)**:

<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


---

**Question**:What is the focus of the following modules: module_01,module_02,module_03 and module_04? Respond as a list

**Predicted Answer**: - Module_01: Text representation and tokenization  
- Module_02: Training language models  
- Module_03: Instruction tuning and alignment  
- Module_04: Advanced applications (tool calling, MCP, vector databases)

**Retrieved Contexts (truncated)**:

<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


---

**Question**:For RLHF what policy is covered in module03?

**Predicted Answer**: Proximal Policy Optimization

**Retrieved Contexts (truncated)**:

<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...
<a target="_blank" href="https://colab.research.google.com/github/raghavbali/mastering_llms_workshop...


---