# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_banner.png">

---

## Intro
- Language models are extremely complex machines that can retrieve and reformulate information from an **extremely large latent space**.
- To guide this search and get the desired responses, we rely heavily on **complex, long and brittle prompts**, which are at times very specific to certain LLMs.
- This is an open area of research, and teams are approaching it from different perspectives to abstract away prompting and enable rapid development of **LLM-enabled systems**.
- **Stanford DSPy** is one such framework for algorithmically optimizing LM prompts and weights.


> $_{DSPy\ logo\ is\ copyright/ownership\ of\ respective\ teams}$

## Ok, You Got Me Intrigued, Tell Me More?

- The DSPy framework takes inspiration from deep learning frameworks such as <img src="./assets/pytorch_logo.png" width="2%">[PyTorch](https://pytorch.org/).
    - For instance, to build a deep neural network in PyTorch, we simply use standard layers such as ``convolution``, ``dropout`` and ``linear``, attach them to optimizers like ``Adam``, and train without implementing any of these from scratch every time.
- Similarly, DSPy provides a set of standard general-purpose **modules** (``ChainOfThought``, ``Predict``) and **optimizers** (``BootstrapFewShotWithRandomSearch``), and helps us build systems by composing these components as layers into a ``Program`` without explicitly dealing with prompts! Neat, isn't it?
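
To make the "no hand-written prompts" idea concrete, here is a minimal, library-free sketch in which a task is declared as fields and the prompt text is assembled from that declaration. `ToySignature` and `render_prompt` are hypothetical names invented for this illustration; this is not DSPy's API, nor its actual prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class ToySignature:
    """Hypothetical stand-in for a declarative task spec (not DSPy's Signature)."""
    instructions: str
    inputs: dict = field(default_factory=dict)   # field name -> description
    outputs: dict = field(default_factory=dict)  # field name -> description


def render_prompt(sig: ToySignature, **values) -> str:
    """Assemble a prompt from the declaration instead of hand-writing it."""
    lines = [sig.instructions, "---"]
    for name in sig.inputs:
        lines.append(f"{name.capitalize()}: {values[name]}")
    # Leave the output fields open for the language model to complete.
    for name in sig.outputs:
        lines.append(f"{name.capitalize()}:")
    return "\n".join(lines)


qa = ToySignature(
    instructions="Answer questions with short factoid answers.",
    inputs={"question": "the user question"},
    outputs={"answer": "often between 1 and 5 words"},
)
prompt = render_prompt(qa, question="What does RLHF stand for?")
print(prompt)
```

Because the prompt is derived from the declaration, a framework is free to rewrite or extend it (for example, by adding demonstrations) without the program code changing.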

### Usual Prompt Based Workflow
<img src="./assets/prompt_workflow.png" width="75%">

---


### LangChain-Like Workflow

<img src="./assets/langchain_workflow.png" width="75%">

---


### DSPy Workflow

<img src="./assets/dspy_workflow.png" >

---


## Time to Put Words into Action

In [1]:
import os
import sys
import dspy
from dsp.utils import deduplicate

import itertools
import random
from scraper_utils import NB_Markdown_Scraper

In [32]:
# TODO: make this modular (e.g., read the API key and model name from config or environment variables)
OPENAI_TOKEN = '<YOUR TOKEN>'
llm_model = dspy.OpenAI(model='gpt-4o-mini',
                    api_key=OPENAI_TOKEN,
                    max_tokens=1024*4,
                    temperature=0.7, 
                    model_type="chat")

## Prepare Data

We will scrape the text/markdown cells from all notebooks in this repository and build a dataset from them.

In [3]:
nb_scraper = NB_Markdown_Scraper([f'../module_0{i}' for i in range(1,5)])
nb_scraper.scrape_markdowns()
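
`NB_Markdown_Scraper` is a helper from this repository and its source is not shown here. As a rough idea of what such a scraper might do, the sketch below reads `.ipynb` files as JSON and keeps only the markdown cells. The function name `scrape_markdown_cells`, the space-joined output, and the use of the file stem as the key are assumptions made for illustration, not the helper's actual behaviour.

```python
import json
import tempfile
from pathlib import Path


def scrape_markdown_cells(notebook_dirs):
    """Map each notebook's file stem to the concatenated text of its markdown cells."""
    md_by_notebook = {}
    for nb_dir in notebook_dirs:
        for nb_path in sorted(Path(nb_dir).glob("*.ipynb")):
            nb = json.loads(nb_path.read_text(encoding="utf-8"))
            cells = [
                # `source` may be a list of lines or a single string; join handles both
                "".join(cell["source"])
                for cell in nb.get("cells", [])
                if cell.get("cell_type") == "markdown"
            ]
            md_by_notebook[nb_path.stem] = " ".join(cells)
    return md_by_notebook


# Tiny self-contained demo on a throwaway notebook file.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_nb = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title\n", "Some text"]},
            {"cell_type": "code", "source": ["print('hi')"]},
            {"cell_type": "markdown", "source": ["## Section"]},
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    (Path(tmp_dir) / "toy_notebook.ipynb").write_text(json.dumps(toy_nb), encoding="utf-8")
    scraped = scrape_markdown_cells([tmp_dir])

print(scraped)
```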

In [4]:
nb_scraper.notebook_md_dict.keys()

dict_keys(['module_01_03_explore_transformers', 'module_01_02_getting_started', 'module_02_02_simple_text_generator', 'module_03_02_instruction_tuning_llama_t2sql', 'module_03_01_llm_training_and_scaling', 'module_03_03_RLHF_phi2', 'module_04_04_retrieval_augmented_llm_app', 'module_04_02_vector_databases_hf_inference_endpoint', 'module_04_03_OpenSource_ClosedSource_LLMs', 'module_04_01_prompt_engineeering_and_langchain', 'module_04_05_dspy_demo'])

In [5]:
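import csv

# Markdown cells contain newlines and tabs, so let the csv module quote each field
# instead of writing raw text that would break the one-record-per-line TSV layout.
with open("./dspy_content.tsv", "w", newline="", encoding="utf-8") as record_file:
    writer = csv.writer(record_file, delimiter="\t")
    for k, v in nb_scraper.notebook_md_dict.items():
        writer.writerow([k, v])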

In [6]:
# Build a unique id per notebook: a 1-based counter followed by the notebook name
doc_ids = [f'{ctr}_{k}' for ctr, k in enumerate(nb_scraper.notebook_md_dict, start=1)]

## Setup Chroma
> Ensure the Chroma server is running in a terminal: `$>chroma run --path ./chromadb`

In [7]:
import chromadb
from chromadb.utils import embedding_functions
chroma_emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
client = chromadb.HttpClient()

In [8]:
CHROMA_COLLECTION_NAME = "workshop_collection"

In [9]:
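# Start from a clean slate. Note: this may raise if the collection does not exist yet (e.g., on a first run).
client.delete_collection(CHROMA_COLLECTION_NAME)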

In [10]:
collection = client.create_collection(
    CHROMA_COLLECTION_NAME,
    embedding_function=chroma_emb_fn,
    metadata={"hnsw:space": "cosine"}
)

In [11]:
# Add to collection
collection.add(
    documents=[v for _,v in nb_scraper.notebook_md_dict.items()], 
    ids=doc_ids, # must be unique for each doc
)

In [12]:
results = collection.query(
    query_texts=["RLHF"], # Chroma will embed using the function we provided
    n_results=3 # how many results to return
)
print(results['ids'][0])
print(results['distances'][0])
#print([i[:100] for j in results['documents'] for i in j])

['6_module_03_03_RLHF_phi2', '2_module_01_02_getting_started', '3_module_02_02_simple_text_generator']
[0.6174977412306334, 0.8062083377747705, 0.8820602339897555]
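
The numbers printed above are distances, not similarities, so smaller means closer. Since the collection was created with `"hnsw:space": "cosine"`, they can be read as cosine distances. A minimal sketch, assuming the usual definition of cosine distance as one minus cosine similarity (the toy vectors are made up for illustration):

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


query = [1.0, 0.0]
same_direction = cosine_distance(query, [2.0, 0.0])
orthogonal = cosine_distance(query, [0.0, 3.0])
opposite = cosine_distance(query, [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Under this definition the distance ranges from 0 (same direction) to 2 (opposite direction), which is why a value around 0.6 for the RLHF notebook ranks ahead of values around 0.8 to 0.9.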


## Chroma as the Retrieval Model (RM) for DSPy

In [13]:
import dspy

In [14]:
# dspy and os are already imported above; only the Chroma retriever is new here
from dspy.retrieve.chromadb_rm import ChromadbRM

In [15]:
retriever_model = ChromadbRM(
    CHROMA_COLLECTION_NAME,
    './chromadb/',
    embedding_function=chroma_emb_fn,
    k=5
)

results = retriever_model("RLHF", k=5)

for result in results:
    print(f'Document id::{result.id}')
    print(f'Document score::{result.score}')
    print("Document:", result.long_text[:50],'...' ,"\n")

Document id::6_module_03_03_RLHF_phi2
Document score::0.6174977412306334
Document: # Quick Overview of RLFH

The performance of Langu ... 

Document id::2_module_01_02_getting_started
Document score::0.8062083377747705
Document: # Getting Started : Text Representation
<img src=" ... 

Document id::3_module_02_02_simple_text_generator
Document score::0.8820602339897555
Document: # Text Generation <a target="_blank" href="https:/ ... 

Document id::11_module_04_05_dspy_demo
Document score::0.9200280698248913
Document: # <img src="./assets/dspy_logo.png" width="2%"> DS ... 

Document id::8_module_04_02_vector_databases_hf_inference_endpoint
Document score::0.947110437471832
Document: ## Vector Databases

<img src="./assets/vector_ban ... 



## Basic DSPy Program

In [16]:
# Set up the LM and RM
dspy.settings.configure(lm=llm_model,rm=retriever_model)

In [17]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [18]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

In [19]:
my_question = "List the models covered in module03"
uncompiled_rag = RAG()  # not compiled/optimized yet, i.e., a zero-shot program
# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = uncompiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
for c in pred.context:
    print(f"Retrieved Contexts (truncated):{c[:100]}..." )

Question: List the models covered in module03
Predicted Answer: GPT, BERT, T5
Retrieved Contexts (truncated):# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
Retrieved Contexts (truncated):# Scaling Neural Nets and Efficient Training

We have covered quite some ground in previous 2 module...
Retrieved Contexts (truncated):# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...


## Multi-Hop DSPy Program

In [20]:
from dsp.utils import deduplicate

In [21]:
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

In [22]:
class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
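
The control flow of the multi-hop program can be followed without any LM or vector store. The sketch below replaces the retriever and the query generator with hypothetical stubs (`toy_retrieve`, `toy_generate_query`, `TOY_CORPUS`) and assumes that `deduplicate` keeps the first occurrence of each passage in order; it illustrates the loop only and is not DSPy code.

```python
def deduplicate(items):
    """Order-preserving de-duplication (a sketch of the assumed behaviour)."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Hypothetical toy corpus and stubs standing in for the retriever and the query-writing LM.
TOY_CORPUS = {
    "rlhf": ["RLHF overview", "PPO basics"],
    "ppo": ["PPO basics", "Reward models"],
}


def toy_retrieve(query):
    return TOY_CORPUS.get(query, [])


def toy_generate_query(context, question):
    # First hop: search on the question topic; later hops: follow up on what was found.
    return "rlhf" if not context else "ppo"


def multi_hop(question, max_hops=2):
    context = []
    for _ in range(max_hops):
        query = toy_generate_query(context, question)
        # Each hop sees the passages gathered so far and adds only new ones.
        context = deduplicate(context + toy_retrieve(query))
    return context


hops = multi_hop("Which policy optimization method does RLHF use?")
print(hops)
```

The second hop retrieves "PPO basics" again, but it appears only once in the final context, which keeps the prompt for the answer step shorter.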

In [23]:
# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen()  # uncompiled (i.e., zero-shot) program
pred = uncompiled_baleen(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
for c in pred.context:
    print(f"Retrieved Contexts (truncated):{c[:100]}..." )


Question: List the models covered in module03
Predicted Answer: GPT-2, BERT, T5
Retrieved Contexts (truncated):# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
Retrieved Contexts (truncated):# Open Source Vs Close Sourced LLMs

Similar to any other piece of technology, LLMs are available in...
Retrieved Contexts (truncated):# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_b...
Retrieved Contexts (truncated):# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...


## Let Us Add Some Checks/Assertions

In [24]:
class CheckContextuality(dspy.Signature):
    """Check if the generated response is from the provided context"""

    context = dspy.InputField(desc="may contain relevant facts")
    response = dspy.InputField(desc="generated response to question")
    is_contextual = dspy.OutputField(desc="generate a boolean response as True if the response is based on context otherwise respond with False")

In [25]:
class SimplifiedBaleenwithAssertions(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.checkcontextuality = dspy.Predict(CheckContextuality)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        response = dspy.Prediction(context=context, answer=pred.answer)
        dspy.Suggest(
            self.checkcontextuality(context=context,response=response.answer).is_contextual.lower()=='true',
            "Response should be factual and based on provided context only",
        )
        return response
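
The check above compares the LM's reply to the exact string `'true'`, which can be brittle when the model adds whitespace or punctuation. A slightly more tolerant parser could look like the sketch below; `parse_bool_flag` is a hypothetical helper, not part of DSPy, and treating anything unclear as `False` is a design assumption.

```python
def parse_bool_flag(raw: str) -> bool:
    """Interpret a free-text LM reply as a boolean; anything unclear counts as False."""
    cleaned = raw.strip().strip(".!\"'").lower()
    return cleaned in {"true", "yes"}


replies = [" True\n", "TRUE.", "false", "maybe"]
parsed = [parse_bool_flag(reply) for reply in replies]
print(parsed)
```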

In [26]:
import functools
from dspy.primitives.assertions import assert_transform_module, backtrack_handler

In [27]:
# Wrap the program so that a failed `dspy.Suggest` triggers at most one retry (backtrack)
dspy_program_with_assertions_retry_once = assert_transform_module(SimplifiedBaleenwithAssertions(), 
    functools.partial(backtrack_handler, max_backtracks=1))
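
Conceptually, backtracking on a soft constraint means: run the step, check the result, and on failure retry with the failure message fed back in, up to a fixed number of times. The sketch below illustrates that idea in plain Python. `run_with_backtracking`, `toy_generate` and `toy_check` are hypothetical names, and this is a simplified illustration rather than DSPy's actual implementation.

```python
def run_with_backtracking(generate, check, max_backtracks=1):
    """Call generate(feedback); on a failed check, retry with the failure message."""
    feedback = None
    for attempt in range(max_backtracks + 1):
        output = generate(feedback)
        ok, message = check(output)
        if ok:
            return output, attempt
        feedback = message
    # Soft constraint: after exhausting retries, return the last output anyway.
    return output, max_backtracks


def toy_generate(feedback):
    # Stand-in for an LM call that improves once it receives feedback.
    return "grounded answer" if feedback else "ungrounded answer"


def toy_check(output):
    return output.startswith("grounded"), "Response should be based on provided context only"


result, retries = run_with_backtracking(toy_generate, toy_check)
print(result, retries)
```

With `max_backtracks=0` the same toy setup would return the first, unchecked answer, which mirrors how a suggestion differs from a hard assertion: it nudges the program but does not stop it.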

In [28]:
out_of_context_question = "Who is the Prime Minister of India?"
pred = dspy_program_with_assertions_retry_once(out_of_context_question)

# Print the contexts and the answer.
print(f"Question: {out_of_context_question}")
print(f"Predicted Answer: {pred.answer}")
for c in pred.context:
    print(f"Retrieved Contexts (truncated):{c[:100]}..." )

Question: Who is the Prime Minister of India?
Predicted Answer: Narendra Modi
Retrieved Contexts (truncated):# Scaling Neural Nets and Efficient Training

We have covered quite some ground in previous 2 module...
Retrieved Contexts (truncated):# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
Retrieved Contexts (truncated):# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...


In [29]:
llm_model.inspect_history(n=1)




Check if the generated response is from the provided context

---

Follow the following format.

Context: may contain relevant facts

Response: generated response to question

Previous Is Contextual: past Is Contextual: with errors

Instructions: Some instructions you must satisfy

Is Contextual: generate a boolean response as True if the response is based on context otherwise respond with False

---

Context:
[1] «# Scaling Neural Nets and Efficient Training

We have covered quite some ground in previous 2 modules and observed the steady increase in size and performance of the models. These gains come at huge cost, actual money and human labour apart from time researching and building these things. Can we estimate these costs and draw some insights about model sizes, datasets and comput requirements? ## Estimating Compute Costs
> Back of the Envelope Calculations : A quick way to get rough estimates


**[LLaMA 3.1](https://github.com/meta-llama/llama-models/blob/main/models/llama3_


## Next Steps

- Once assertions are in place, we need an evaluation metric such as ``number_of_outofcontext_responses``, which can be as simple as the fraction of cases where the assertion fails.
- Prepare a golden few-shot dataset.
- Fine-tune/distill a student model (without assertions) using a teacher model (with assertions) to improve the overall pipeline.
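
As a starting point for the first item, the metric can be sketched as a plain function over the per-question results of the contextuality check. The name `out_of_context_rate` and the list-of-booleans input are assumptions for illustration; wiring it into a DSPy evaluation loop is left open.

```python
def out_of_context_rate(is_contextual_flags):
    """Fraction of responses whose contextuality check failed (0.0 for no data)."""
    if not is_contextual_flags:
        return 0.0
    failures = sum(1 for flag in is_contextual_flags if not flag)
    return failures / len(is_contextual_flags)


# One flag per evaluated question: True means the answer was judged contextual.
flags = [True, False, True, True]
rate = out_of_context_rate(flags)
print(rate)
```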

## How Does it All Go?

In [30]:
from IPython.display import display, Markdown

In [35]:
questions = [
    "Which model is used for instruction fine-tuning?",
    "List the models covered in module03",
    "Summarize key takeways for module02",
    "What is the focus of the following modules: module_01,module_02,module_03 and module_04? Respond as a list",
    "For RLHF what policy is covered in module03?"
]

uncompiled_baleen = SimplifiedBaleen(passages_per_hop=5)  # uncompiled (i.e., zero-shot) program
for question in questions:
    display(Markdown(f"**Question**:{question}"))
    pred = uncompiled_baleen(question)
    display(Markdown(f"**Predicted Answer**: {pred.answer}"))
    display(Markdown("**Retrieved Contexts (truncated)**:"))
    for c in pred.context:
        print(f"{c[:100]}..." )
    display(Markdown("---"))

**Question**:Which model is used for instruction fine-tuning?

**Predicted Answer**: LLaMA

**Retrieved Contexts (truncated)**:

# Instruction Tuning with Optimizations

Instruction tuning is form of fine-tuning that enhances a m...
# Quick Overview of RLFH

The performance of Language Models until GPT-3 was kind of amazing as-is. ...
# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...
# Getting Started : Text Representation
<img src="./assets/banner_notebook_1.jpg">


The NLP domain ...
# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_b...


---

**Question**:List the models covered in module03

**Predicted Answer**: GPT, BERT, T5

**Retrieved Contexts (truncated)**:

# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
# Open Source Vs Close Sourced LLMs

Similar to any other piece of technology, LLMs are available in...
# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_b...
# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...
# Scaling Neural Nets and Efficient Training

We have covered quite some ground in previous 2 module...


---

**Question**:Summarize key takeways for module02

**Predicted Answer**: Answer: Text generation techniques

**Retrieved Contexts (truncated)**:

# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...
# Getting Started : Text Representation
<img src="./assets/banner_notebook_1.jpg">


The NLP domain ...
## Exploring Transformer Architectures ## The RNN Limitation
The RNN layer (LSTM, or GRU, etc.) take...
# Retrieval Augmented LLM App
<img src="./assets/rap_banner.jpeg">

We have covered quite some groun...
# Quick Overview of RLFH

The performance of Language Models until GPT-3 was kind of amazing as-is. ...
# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_b...


---

**Question**:What is the focus of the following modules: module_01,module_02,module_03 and module_04? Respond as a list

**Predicted Answer**: 1. Text Representation
2. Text Generation
3. Instruction Tuning
4. Prompt Engineering

**Retrieved Contexts (truncated)**:

# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_b...
# Getting Started : Text Representation
<img src="./assets/banner_notebook_1.jpg">


The NLP domain ...
# Instruction Tuning with Optimizations

Instruction tuning is form of fine-tuning that enhances a m...
# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...
# Open Source Vs Close Sourced LLMs

Similar to any other piece of technology, LLMs are available in...
# Retrieval Augmented LLM App
<img src="./assets/rap_banner.jpeg">

We have covered quite some groun...


---

**Question**:For RLHF what policy is covered in module03?

**Predicted Answer**: Proximal Policy Optimization (PPO)

**Retrieved Contexts (truncated)**:

# Quick Overview of RLFH

The performance of Language Models until GPT-3 was kind of amazing as-is. ...
# Prompt Engineering
<img src="./assets/pe_banner.jpg">

Prompt Engineering is this thrilling new di...
# Text Generation <a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_w...
# Getting Started : Text Representation
<img src="./assets/banner_notebook_1.jpg">


The NLP domain ...
# <img src="./assets/dspy_logo.png" width="2%"> DSPy: Beyond Prompting
---
<img src="./assets/dspy_b...


---