# DSPy Learn - Modules

https://dspy.ai/learn/programming/modules/

In [3]:
#!pip install dspy

In [4]:
import dspy 
from common.my_settings import MySettings  
from common.utils import md

settings = MySettings().get()

#lm_gpt35 = dspy.LM('gpt-3.5-turbo', temperature=0.8, model_type='chat', cache=False, api_key=settings.OPENAI_API_KEY)
lm_gpt4omin = dspy.LM('gpt-4o-mini', temperature=0.9, model_type='chat', cache=False, api_key=settings.OPENAI_API_KEY)
dspy.configure(lm=lm_gpt4omin)

lm = lm_gpt4omin
lm('hello there')


Getting keys from environment variables


['Hello! How can I assist you today?']

In [5]:
sentence = "it's a charming and often affecting journey"
classify = dspy.Predict("sentence -> sentiment: bool")
response = classify(sentence=sentence)
print(response.sentiment)

True
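The inline signature `"sentence -> sentiment: bool"` names the input fields on the left of `->` and the output fields on the right, with an optional type after each colon. The sketch below only illustrates how such a string can be read. It is not DSPy's own parser, and the helper names `split_top_level` and `parse_signature` are hypothetical. It assumes that untyped fields default to `str`.

```python
def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_fields(text: str) -> dict[str, str]:
    """Map each field name to its type name (assumed to be 'str' when omitted)."""
    fields = {}
    for part in split_top_level(text):
        name, _, type_name = part.partition(":")
        fields[name.strip()] = type_name.strip() or "str"
    return fields


def parse_signature(signature: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (input fields, output fields) for an 'inputs -> outputs' string."""
    inputs_text, outputs_text = signature.split("->")
    return parse_fields(inputs_text), parse_fields(outputs_text)


inputs, outputs = parse_signature("sentence -> sentiment: bool")
print(inputs)   # {'sentence': 'str'}
print(outputs)  # {'sentiment': 'bool'}
```

The bracket-aware split matters for nested types such as `list[dict[str, str]]`, where a plain `split(",")` would cut the type in half.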


In [6]:
question = "What's something great about the ColBERT retrieval module"
classify = dspy.ChainOfThought("question -> answer", n=5)
response = classify(question=question)
response.completions.answer

['A great aspect of the ColBERT retrieval module is its combination of deep contextual understanding from BERT with efficient late interaction, enabling fast and accurate information retrieval without exhaustive document comparisons.',
 'One great aspect of the ColBERT retrieval module is its ability to leverage contextual embeddings from BERT, which enhances the understanding of semantic relationships in search queries and documents, resulting in more relevant retrieval outcomes while maintaining computational efficiency through its late interaction mechanism.',
 'One great aspect of the ColBERT retrieval module is its ability to efficiently combine the contextual power of BERT with a late interaction mechanism, which optimizes retrieval speed while providing highly relevant results based on deep contextual embeddings.',
 'One great thing about the ColBERT retrieval module is its ability to combine dense and sparse retrieval methods effectively, offering both high-quality relevance sc

In [7]:
print(f"Reasoning: {response.reasoning}")
print(f"Answer: {response.answer}")

Reasoning: ColBERT (Contextualized Late Interaction over BERT) is designed to enhance information retrieval tasks by combining the strengths of BERT's deep contextual understanding with efficient retrieval mechanisms. One of the great features of ColBERT is its ability to perform late interaction, allowing for fast and scalable retrieval without the need to exhaustively compare every possible document during the query phase. This means it leverages the power of BERT to generate contextual embeddings for queries and documents while still maintaining efficient performance, ultimately improving both the accuracy and speed of retrieval tasks.
Answer: A great aspect of the ColBERT retrieval module is its combination of deep contextual understanding from BERT with efficient late interaction, enabling fast and accurate information retrieval without exhaustive document comparisons.


# Math

Install Deno first: `dspy.ProgramOfThought` runs the Python code it generates in a sandboxed interpreter that depends on Deno.

In [8]:
import os, pathlib, shutil, subprocess

# Point to your Deno install (Codespaces default)
deno_bin = str(pathlib.Path.home() / ".deno" / "bin")

# Add to PATH for this kernel session
os.environ["PATH"] = deno_bin + ":" + os.environ.get("PATH","")

# (Optional) sanity checks
print("deno path:", deno_bin)
print("which deno:", shutil.which("deno"))
subprocess.run(["deno","--version"], check=True)


deno path: /home/codespace/.deno/bin
which deno: /home/codespace/.deno/bin/deno


deno 2.4.5 (stable, release, x86_64-unknown-linux-gnu)
v8 13.7.152.14-rusty
typescript 5.8.3


CompletedProcess(args=['deno', '--version'], returncode=0)

In [9]:
maths = dspy.ProgramOfThought("question -> answer")
result = maths(question="Whats the volume of a circle that is 3cm in diameter?")
md(result)

Prediction(  
    reasoning='To find the volume of a sphere, we use the formula \\( V = \\frac{4}{3} \\pi r^3 \\), where \\( r \\) is the radius. In this case, the diameter of the circle is given as 3 cm, which means the radius is half of that, or 1.5 cm. Plugging this value into the volume formula, we calculate the volume to be approximately 14.14 cubic centimeters. The output of the code confirms this calculation.',  
    answer='The volume of a circle (sphere) that is 3 cm in diameter is approximately 14.14 cm³.'  
)
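The question asks for the volume of a "circle", which is not defined for a 2D shape, and the model interpreted it as a sphere. The reported figure can be checked independently with the formula quoted in the reasoning, V = (4/3)πr³. This is a plain arithmetic check and does not reproduce the code the model generated.

```python
import math

diameter_cm = 3.0
radius_cm = diameter_cm / 2  # 1.5 cm

# Volume of a sphere: V = (4/3) * pi * r^3
volume_cm3 = (4 / 3) * math.pi * radius_cm ** 3
print(f"{volume_cm3:.2f} cm^3")  # prints 14.14 cm^3
```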

In [10]:
maths.history

[{'prompt': None,
  'messages': [{'role': 'system',
    'content': 'Your input fields are:\n1. `question` (str):\nYour output fields are:\n1. `reasoning` (str): \n2. `generated_code` (str): python code that answers the question\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## question ## ]]\n{question}\n\n[[ ## reasoning ## ]]\n{reasoning}\n\n[[ ## generated_code ## ]]\n{generated_code}\n\n[[ ## completed ## ]]\nIn adhering to this structure, your objective is: \n        You will be given `question` and you will respond with `generated_code`.\n        Generating executable Python code that programmatically computes the correct `generated_code`.\n        After you\'re done with the computation and think you have the answer, make sure to provide your answer by calling the preloaded function `final_answer()`.\n        You should structure your answer in a dict object, like {"field_a": answer_a, ...}, evaluates to the correct value 

# RAG
https://dspy.ai/learn/programming/modules/#__tabbed_1_2


In [11]:
lm('Whats the name of the Castle that David Gregory inherited?')

['David Gregory inherited a castle called Dunsany Castle, located in County Meath, Ireland.']

In [35]:
def search(query: str) -> list[str]:
    """Retrieves abstracts from Wikipedia."""
    # ColBERTv2 is a neural retrieval model; the server below hosts an index of Wikipedia abstracts (2017).
    # The call returns the top-k passages that best match the query.
    result = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)

    md("**Search result**:", result[0])
    return [item['text'] for item in result]

rag = dspy.ChainOfThought("context, question -> response")
question = "Whats the name of the Castle that David Gregory inherited?"

md("**RAG with answerable question**")
md(rag(context=search(question), question=question))

md("**RAG with unanswerable question**")
# The context is still retrieved for the David Gregory question, so it cannot answer "Who is the president of Germany?".
# Ideally the model would reply with something like "I don't know" or "The context does not provide this information."
md(rag(context=search(question), question="Who is the president of Germany?")) 

**RAG with answerable question**

**Search result**:{'text': 'David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'pid': 3296134, 'rank': 1, 'score': 25.856355667114258, 'prob': 0.9928459586069872, 'long_text': 'David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.'}

Prediction(  
    reasoning='David Gregory, the Scottish physician mentioned in the context, inherited Kinnairdy Castle in 1664. The question asks for the name of the castle he inherited, which has been explicitly stated in the provided information.',  
    response='Kinnairdy Castle'  
)

**RAG with unanswerable question**

**Search result**:{'text': 'David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'pid': 3296134, 'rank': 1, 'score': 25.856355667114258, 'prob': 0.9928459586069872, 'long_text': 'David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.'}

Prediction(  
    reasoning='The context provided does not include any information regarding the political structure or current leadership of Germany, including the identity of the president. None of the biographies mentioned relate to contemporary political figures in Germany. Therefore, I cannot derive the answer from the context.',  
    response='The current president of Germany is Frank-Walter Steinmeier, who has been in office since March 2017.'  
)
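In the second run the reasoning states that the context does not contain the answer, yet the response is still filled in from the model's own knowledge. One way to flag such cases is a rough lexical check of whether the words in the response actually occur in the retrieved passages. The sketch below is a heuristic under stated assumptions: `content_words`, `is_grounded`, and the 0.5 threshold are hypothetical choices, and the passage text is abbreviated from the search result above.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def is_grounded(response: str, passages: list[str], threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the response's content words appear in the passages."""
    response_words = content_words(response)
    if not response_words:
        return False
    context_words = content_words(" ".join(passages))
    overlap = len(response_words & context_words) / len(response_words)
    return overlap >= threshold


# Abbreviated from the retrieved abstract shown in the notebook output.
passages = [
    "David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a "
    "Scottish physician and inventor. He inherited Kinnairdy Castle in 1664."
]

answerable = "Kinnairdy Castle"
unanswerable = (
    "The current president of Germany is Frank-Walter Steinmeier, "
    "who has been in office since March 2017."
)

print(is_grounded(answerable, passages))    # True
print(is_grounded(unanswerable, passages))  # False
```

Word overlap is crude: it cannot detect a response that reuses context words to say something false, so it is best treated as a cheap first filter.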

# Classification
https://dspy.ai/learn/programming/modules/#__tabbed_1_3

In [13]:
from typing import Literal 

class Classify(dspy.Signature):
    """Classify the sentiment"""
    sentence: str = dspy.InputField()
    sentiment: Literal['positive', 'neutral', 'negative'] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
md(classify(sentence='It is raining this evening, the roads will be dangerous'))


Prediction(  
    sentiment='negative',  
    confidence=0.85  
)
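The `Literal[...]` annotation restricts `sentiment` to three labels, while `confidence` is only declared as a `float`. The sketch below shows what those constraints mean when checked by hand, using the standard `typing.get_args`. The helper names are hypothetical, and the [0, 1] range for `confidence` is an assumption, since the signature itself does not bound it.

```python
from typing import Literal, get_args

Sentiment = Literal["positive", "neutral", "negative"]


def is_valid_sentiment(label: str) -> bool:
    """Check a label against the values allowed by the Literal type."""
    return label in get_args(Sentiment)


def is_valid_confidence(value: float) -> bool:
    """Assumed convention: confidence is a probability-like score in [0, 1]."""
    return 0.0 <= value <= 1.0


print(get_args(Sentiment))           # ('positive', 'neutral', 'negative')
print(is_valid_sentiment("negative"))  # True
print(is_valid_sentiment("angry"))     # False
print(is_valid_confidence(0.85))       # True
```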

# Information Extraction
https://dspy.ai/learn/programming/modules/#__tabbed_1_4

In [14]:
#text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."
#text = "The DSPy framework aims to resolve consistency and reliability issues by prioritizing declarative, systematic programming over manual prompt writing."
text = """
The Mongol Empire was the largest contiguous empire in history. Originating in present-day Mongolia in East Asia, the empire at its height stretched from the Sea of Japan 
to Eastern Europe, extending northward into Siberia and east and southward into the Indian subcontinent, mounting invasions of Southeast Asia, and conquering the Iranian 
plateau; and reaching westward as far as the Levant and the Carpathian Mountains.
"""

extract = dspy.Predict("text -> title, headings: list[str], entities_and_metadata: list[dict[str, str]]")
md(extract(text=text))

Prediction(  
    title='The Expansive Reach of the Mongol Empire',  
    headings=['Introduction', 'Geographical Extent', 'Historical Significance', 'Conclusion'],  
    entities_and_metadata=[{'entity': 'Mongol Empire', 'type': 'Empire'}, {'entity': 'Mongolia', 'type': 'Location'}, {'entity': 'Sea of Japan', 'type': 'Body of Water'}, {'entity': 'Eastern Europe', 'type': 'Region'}, {'entity': 'Siberia', 'type': 'Region'}, {'entity': 'Indian subcontinent', 'type': 'Region'}, {'entity': 'Southeast Asia', 'type': 'Region'}, {'entity': 'Iranian plateau', 'type': 'Location'}, {'entity': 'Levant', 'type': 'Region'}, {'entity': 'Carpathian Mountains', 'type': 'Mountain Range'}]  
)
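Because `entities_and_metadata` comes back as a list of dictionaries, it can be post-processed with ordinary Python. The sketch below groups the entities from the output above by their `type`. The `group_by_type` helper is hypothetical, and the `'entity'` and `'type'` keys are simply what the model chose in this run, so another run may use different keys.

```python
from collections import defaultdict

# Copied from the prediction shown above.
entities_and_metadata = [
    {"entity": "Mongol Empire", "type": "Empire"},
    {"entity": "Mongolia", "type": "Location"},
    {"entity": "Sea of Japan", "type": "Body of Water"},
    {"entity": "Eastern Europe", "type": "Region"},
    {"entity": "Siberia", "type": "Region"},
    {"entity": "Indian subcontinent", "type": "Region"},
    {"entity": "Southeast Asia", "type": "Region"},
    {"entity": "Iranian plateau", "type": "Location"},
    {"entity": "Levant", "type": "Region"},
    {"entity": "Carpathian Mountains", "type": "Mountain Range"},
]


def group_by_type(items: list[dict[str, str]]) -> dict[str, list[str]]:
    """Collect entity names under their type, keeping the original order."""
    grouped = defaultdict(list)
    for item in items:
        # .get() guards against a run where the model omits a key
        grouped[item.get("type", "Unknown")].append(item.get("entity", ""))
    return dict(grouped)


by_type = group_by_type(entities_and_metadata)
print(by_type["Region"])
# ['Eastern Europe', 'Siberia', 'Indian subcontinent', 'Southeast Asia', 'Levant']
```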

# Agent 
https://dspy.ai/learn/programming/modules/#__tabbed_1_5

In [15]:
def evaluate_math(expression: str) -> float:
    md("Python code: ", expression)
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str) -> list[str]:
    md("Searching Wikipedia for:", query)
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])

pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")

md(pred.answer)

Searching Wikipedia for:David Gregory Kinnairdy castle year of birth

Python code: 9362158 / 1625

5761.328

In [16]:
md("**Answer**: ", pred.answer)
md("**Reasoning**: ", pred.reasoning)

**Answer**: 5761.328

**Reasoning**: David Gregory of Kinnairdy Castle was born on December 20, 1625. By dividing 9362158 by his year of birth (1625), the result is approximately 5761.328.

## Agent thought process
Looping through the trajectory shows how the agent:
1. thinks on what to do 
1. decides on a tool to call to fetch info 
1. calls the tool 
1. observes the result of the tool
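The four steps above can be sketched as a plain loop. This is a toy illustration, not how `dspy.ReAct` is implemented: the language model is replaced by a hard-coded `SCRIPT` of decisions, the tools are simplified stand-ins, and the names `TOOLS`, `SCRIPT`, and `run_agent` are hypothetical.

```python
def search_wikipedia(query: str) -> str:
    # Stand-in for the real retriever; text abbreviated from the notebook output.
    return "David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor."


def evaluate_math(expression: str) -> float:
    # Stand-in that only understands "a / b"; the notebook uses a real interpreter.
    left, right = expression.split("/")
    return float(left) / float(right)


TOOLS = {"search_wikipedia": search_wikipedia, "evaluate_math": evaluate_math}

# Scripted (thought, tool name, tool args) triples standing in for the language model.
SCRIPT = [
    ("I need the year of birth first.", "search_wikipedia",
     {"query": "David Gregory Kinnairdy castle year of birth"}),
    ("He was born in 1625, so I can divide.", "evaluate_math",
     {"expression": "9362158 / 1625"}),
    ("I have the result.", "finish", {}),
]


def run_agent(script: list) -> dict:
    """Think, pick a tool, call it, record the observation; stop on 'finish'."""
    trajectory = {}
    for step, (thought, tool_name, tool_args) in enumerate(script):
        trajectory[f"thought_{step}"] = thought
        trajectory[f"tool_name_{step}"] = tool_name
        trajectory[f"tool_args_{step}"] = tool_args
        if tool_name == "finish":
            trajectory[f"observation_{step}"] = "Completed."
            break
        trajectory[f"observation_{step}"] = TOOLS[tool_name](**tool_args)
    return trajectory


trajectory = run_agent(SCRIPT)
print(trajectory["observation_1"])  # 5761.328
```

The flat `thought_N` / `tool_name_N` / `tool_args_N` / `observation_N` keys mirror the layout of the trajectory printed by the notebook.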

In [17]:
# Loop through the trajectory to see how the agent thought and acted

for r in pred.trajectory:
    md(f"**{r}**: ", ": ", pred.trajectory[r])

**thought_0**: : I need to find out the year of birth of David Gregory of Kinnairdy castle before I can perform the division of 9362158 by that year. I will search for information on David Gregory to find out his year of birth.

**tool_name_0**: : search_wikipedia

**tool_args_0**: : {'query': 'David Gregory Kinnairdy castle year of birth'}

**observation_0**: : ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'David Castle (cricketer) | David James Castle (born 25 May 1972 in Launceston, Tasmania) is a former cricket player who represented the Tasmanian Tigers in one first-class match.', 'David Castle (philosopher) | David Castle (born 1967 in Edmonton, Alberta) is a Canadian philosopher and bioethicist. He is Vice-President of Research at the University of Victoria. Previously he was Professor and Chair of Innovations in the Life Sciences at University of Edinburgh, where he investigated how to get others to innovate. From 2006–2010 he served as Canada Research Chair in Science and Society at the University of Ottawa where he developed ideas leading to the creation of the Institute for Science, Society and Policy.']

**thought_1**: : I have found that David Gregory was born on December 20, 1625. Now I can perform the division of 9362158 by 1625.

**tool_name_1**: : evaluate_math

**tool_args_1**: : {'expression': '9362158 / 1625'}

**observation_1**: : 5761.328

**thought_2**: : I have successfully calculated the result of dividing 9362158 by 1625, which is approximately 5761.328. Now I can finalize my response.

**tool_name_2**: : finish

**tool_args_2**: : {}

**observation_2**: : Completed.
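The trajectory is a flat dictionary whose keys end in the step index, so it can be regrouped into one record per step for easier inspection. The sketch below assumes that key layout, as seen in the output above. `group_by_step` is a hypothetical helper, and the sample values are abbreviated from the run shown.

```python
from collections import defaultdict


def group_by_step(trajectory: dict) -> list[dict]:
    """Turn {'thought_0': ..., 'tool_name_0': ...} into a list of per-step dicts."""
    steps = defaultdict(dict)
    for key, value in trajectory.items():
        # 'tool_name_0' -> ('tool_name', '0')
        name, _, index = key.rpartition("_")
        steps[int(index)][name] = value
    return [steps[i] for i in sorted(steps)]


# Abbreviated copy of the trajectory printed above.
trajectory = {
    "thought_0": "I need to find out the year of birth of David Gregory.",
    "tool_name_0": "search_wikipedia",
    "tool_args_0": {"query": "David Gregory Kinnairdy castle year of birth"},
    "observation_0": ["David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor."],
    "thought_1": "Now I can perform the division of 9362158 by 1625.",
    "tool_name_1": "evaluate_math",
    "tool_args_1": {"expression": "9362158 / 1625"},
    "observation_1": 5761.328,
    "thought_2": "Now I can finalize my response.",
    "tool_name_2": "finish",
    "tool_args_2": {},
    "observation_2": "Completed.",
}

steps = group_by_step(trajectory)
for number, step in enumerate(steps):
    print(number, step["tool_name"], step["tool_args"])
```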

# Compose multiple modules into a bigger program

https://dspy.ai/learn/programming/modules/#how-do-i-compose-multiple-modules-into-a-bigger-program

In [46]:
search("Stephen Curry is the best 3 pointer shooter ever in the human history")

ReadTimeout: HTTPConnectionPool(host='20.102.90.50', port=2017): Read timed out. (read timeout=10)
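The hosted retrieval server timed out here. If such timeouts are transient, wrapping the call in a small retry helper with exponential backoff can help, although it will not fix a server that is down. The sketch below is generic Python and not part of DSPy: `with_retries` and `flaky_search` are hypothetical names, and the built-in `TimeoutError` is used so the example runs without a network.

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, exceptions=(Exception,)):
    """Call fn(); on a listed exception wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)


# Simulated search that fails twice before succeeding.
attempts_seen = []


def flaky_search() -> list[str]:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise TimeoutError("simulated read timeout")
    return ["passage"]


result = with_retries(flaky_search, attempts=3, base_delay=0.0, exceptions=(TimeoutError,))
print(result, len(attempts_seen))  # ['passage'] 3
```

In the notebook the same idea would wrap the `search(...)` call and list the timeout exception raised by the HTTP client in `exceptions`.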

In [43]:


class Hop(dspy.Module):
    def __init__(self, num_docs=10, num_hops=1):
        self.num_docs = num_docs
        self.num_hops = num_hops

        self.generate_query = dspy.ChainOfThought('claim, notes -> query')
        self.append_notes = dspy.ChainOfThought('claim, notes, context -> new_notes: list[str], titles: list[str]')

    def forward(self, claim: str) -> dspy.Prediction:
        notes = []
        titles = []
        count = 1

        for _ in range(self.num_hops):
            print(count)
            count += 1
            query = self.generate_query(claim=claim, notes=notes).query
            print('a ########################################################')
            print(query)
            context = search(query)
            print('b ########################################################')
            prediction = self.append_notes(claim=claim, notes=notes, context=context)
            print('c ########################################################')
            notes.extend(prediction.new_notes)

            titles.extend(prediction.titles)

        return dspy.Prediction(notes=notes, titles=list(set(titles)))

hop = Hop()
md(hop(claim="The capital of Australia is Sydney"))

1
a ########################################################
What is the capital city of Australia?


ReadTimeout: HTTPConnectionPool(host='20.102.90.50', port=2017): Read timed out. (read timeout=10)
