INSTALL DSPY

In [None]:
!pip install -U dspy

Collecting dspy
  Downloading dspy-3.0.2-py3-none-any.whl.metadata (7.1 kB)
Collecting backoff>=2.2 (from dspy)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Collecting openai>=0.28.1 (from dspy)
  Downloading openai-1.101.0-py3-none-any.whl.metadata (29 kB)
Collecting ujson>=5.8.0 (from dspy)
  Downloading ujson-5.11.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (9.4 kB)
Collecting optuna>=3.4.0 (from dspy)
  Downloading optuna-4.5.0-py3-none-any.whl.metadata (17 kB)
Collecting magicattr>=0.1.6 (from dspy)
  Downloading magicattr-0.1.6-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting litellm>=1.64.0 (from dspy)
  Downloading litellm-1.76.0-py3-none-any.whl.metadata (41 kB)
Collecting diskcache>=5.6.0 (from dspy)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting json-repair>=0.30.0 (from dspy)
  

CONFIGURE THE DSPY LANGUAGE MODEL USING THE GEMINI API

In [None]:
import os
import dspy

# Read the API key from the GEMINI_API_KEY environment variable; never hardcode it in a notebook.
lm = dspy.LM("gemini/gemini-2.5-flash", api_key=os.environ["GEMINI_API_KEY"])
dspy.configure(lm=lm)

MATH OPERATION

In [None]:
math = dspy.ChainOfThought("question -> answer: float")
math(question="Two dice are tossed. What is the probability that the sum equals two?")

Prediction(
    reasoning='To find the probability that the sum of two dice equals two, we need to determine the number of favorable outcomes and divide it by the total number of possible outcomes.\n\n1.  **Total Possible Outcomes:**\n    When two dice are tossed, each die has 6 possible outcomes (1, 2, 3, 4, 5, 6).\n    The total number of combinations is 6 * 6 = 36.\n    These combinations can be listed as (1,1), (1,2), ..., (6,6).\n\n2.  **Favorable Outcomes (Sum equals two):**\n    We need to find the combinations where the sum of the two dice is 2.\n    The only way to get a sum of 2 is if both dice show a 1.\n    So, the only favorable outcome is (1, 1).\n    There is only 1 favorable outcome.\n\n3.  **Calculate Probability:**\n    Probability = (Number of Favorable Outcomes) / (Total Number of Possible Outcomes)\n    Probability = 1 / 36\n\n4.  **Convert to float:**\n    1 / 36 = 0.027777777777777776',
    answer=0.027777777777777776
)
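
The model's answer can be checked without any LLM by enumerating all outcomes of two dice. This is a minimal sketch in plain Python; the variable names are illustrative and not part of DSPy.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two six-sided dice (36 in total).
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose faces sum to two; only (1, 1) qualifies.
favorable = [pair for pair in outcomes if sum(pair) == 2]

probability = Fraction(len(favorable), len(outcomes))
print(probability)         # 1/36
print(float(probability))  # the same float as the answer field above
```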

WIKIPEDIA SEARCH

In [None]:
def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

rag = dspy.ChainOfThought("context, question -> response")

question = "What's the name of the castle that David Gregory inherited?"
rag(context=search_wikipedia(question), question=question)

Prediction(
    reasoning='The question asks for the name of the castle inherited by David Gregory. I will look for the text block that mentions "David Gregory" and then find the sentence that describes him inheriting a castle.\n\nIn text block [1], it states: "David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor... He inherited Kinnairdy Castle in 1664."\n\nThe name of the castle is Kinnairdy Castle.',
    response='Kinnairdy Castle'
)
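
If the hosted ColBERTv2 endpoint is unreachable, the shape of the retrieval step can still be explored offline. The sketch below is an assumption-laden stand-in, not ColBERTv2: `search_local` is a hypothetical helper that ranks a tiny in-memory corpus by plain word overlap and returns the top `k` passages as a list of strings, the same return type as `search_wikipedia`.

```python
def search_local(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return up to k passages ranked by the number of lowercase words shared with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.lower().split())), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no words with the query.
    return [p for score, p in ranked if score > 0][:k]

# Toy corpus for illustration only.
corpus = [
    "David Gregory inherited Kinnairdy Castle in 1664.",
    "The FIFA World Cup 2002 was hosted by South Korea and Japan.",
    "Two dice have 36 possible outcomes.",
]
print(search_local("castle that David Gregory inherited", corpus, k=1))
```

Word overlap is far cruder than a neural retriever, so this is only useful for testing the plumbing around `rag(context=..., question=...)`.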

CLASSIFICATION USING DSPY

In [None]:
from typing import Literal

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""

    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")

Prediction(
    sentiment='positive',
    confidence=0.75
)
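
The `float` annotation on `confidence` does not by itself restrict the value to the range 0 to 1, so downstream code may want a sanity check before trusting a prediction. A minimal sketch follows; `is_valid_prediction` is a hypothetical helper, not part of DSPy.

```python
from typing import Literal, get_args

# Same label set as the Classify signature above.
Sentiment = Literal["positive", "negative", "neutral"]

def is_valid_prediction(sentiment: str, confidence: float) -> bool:
    """Return True if the label is allowed and the confidence lies in [0, 1]."""
    return sentiment in get_args(Sentiment) and 0.0 <= confidence <= 1.0

print(is_valid_prediction("positive", 0.75))  # True
print(is_valid_prediction("mixed", 0.75))     # False
print(is_valid_prediction("neutral", 1.5))    # False
```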

INFORMATION EXTRACTION

In [None]:
class ExtractInfo(dspy.Signature):
    """Extract structured information from text."""

    text: str = dspy.InputField()
    title: str = dspy.OutputField()
    headings: list[str] = dspy.OutputField()
    entities: list[dict[str, str]] = dspy.OutputField(desc="a list of entities and their metadata")

module = dspy.Predict(ExtractInfo)

text = "Apple Inc. announced its latest iPhone 14 today. " \
    "The CEO, Tim Cook, highlighted its new features in a press release."
response = module(text=text)

print(response.title)
print(response.headings)
print(response.entities)

Apple Announces iPhone 14
[]
[{'name': 'Apple Inc.', 'type': 'Organization'}, {'name': 'iPhone 14', 'type': 'Product'}, {'name': 'Tim Cook', 'type': 'Person', 'role': 'CEO'}]
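
Because `entities` is a list of free-form dictionaries, the keys are not guaranteed to be uniform (only one entry above has a `role`). The sketch below shows one way to post-process such output defensively; `group_by_type` is a hypothetical helper, and the data is copied from the printed result above.

```python
from collections import defaultdict

# Copied from the printed output above.
entities = [
    {"name": "Apple Inc.", "type": "Organization"},
    {"name": "iPhone 14", "type": "Product"},
    {"name": "Tim Cook", "type": "Person", "role": "CEO"},
]

def group_by_type(items):
    """Map each entity type to the list of entity names of that type."""
    grouped = defaultdict(list)
    for item in items:
        # .get() guards against entries that lack a "type" or "name" key.
        grouped[item.get("type", "Unknown")].append(item.get("name", ""))
    return dict(grouped)

print(group_by_type(entities))
# {'Organization': ['Apple Inc.'], 'Product': ['iPhone 14'], 'Person': ['Tim Cook']}
```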


AGENTS USING DSPY

In [None]:
def evaluate_math(expression: str):
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str):
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])

pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)

5761.328
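
The agent's result can be verified by hand, assuming it used the birth year 1625 given in the passage retrieved in the Wikipedia search section. A short check in plain Python:

```python
birth_year = 1625    # David Gregory's year of birth, per the retrieved passage
numerator = 9362158

quotient = numerator / birth_year
whole, remainder = divmod(numerator, birth_year)

print(round(quotient, 3))  # 5761.328
print(whole, remainder)    # 5761 533
```

The remainder 533 divided by 1625 is exactly 0.328, so the agent's answer has no rounding error.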


MULTI STAGE PIPELINES

In [None]:
class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""

    topic: str = dspy.InputField()
    title: str = dspy.OutputField()
    sections: list[str] = dspy.OutputField()
    section_subheadings: dict[str, list[str]] = dspy.OutputField(desc="mapping from section headings to subheadings")

class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""

    topic: str = dspy.InputField()
    section_heading: str = dspy.InputField()
    section_subheadings: list[str] = dspy.InputField()
    content: str = dspy.OutputField(desc="markdown-formatted section")

class DraftArticle(dspy.Module):
    def __init__(self):
        super().__init__()
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)

    def forward(self, topic):
        outline = self.build_outline(topic=topic)
        sections = []
        for heading, subheadings in outline.section_subheadings.items():
            section, subheadings = f"## {heading}", [f"### {subheading}" for subheading in subheadings]
            section = self.draft_section(topic=outline.title, section_heading=section, section_subheadings=subheadings)
            sections.append(section.content)
        return dspy.Prediction(title=outline.title, sections=sections)

draft_article = DraftArticle()
article = draft_article(topic="World Cup 2002")
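
The module returns a title plus a list of markdown section strings, so a final step is usually needed to turn the prediction into a single document. A minimal sketch, assuming each section already begins with its own `##` heading as requested in `forward`; `render_article` is a hypothetical helper and the demo values are placeholders, not model output.

```python
def render_article(title: str, sections: list[str]) -> str:
    """Join a title and markdown sections into one markdown document."""
    parts = [f"# {title}"] + [section.strip() for section in sections]
    return "\n\n".join(parts) + "\n"

# Placeholder values standing in for article.title and article.sections.
demo = render_article(
    "Example Title",
    ["## First Section\n\nBody text.", "## Second Section\n\nMore text."],
)
print(demo)
```

With a real prediction this would be called as `render_article(article.title, article.sections)`.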

In [None]:
article

Prediction(
    title="FIFA World Cup 2002: Asia's Historic Football Spectacle",
    sections=["## Introduction and Historical Significance\n\nThe FIFA World Cup 2002, jointly hosted by South Korea and Japan, stands as a monumental chapter in the history of international football. More than just a tournament, it represented a significant paradigm shift for FIFA, expanding the sport's most prestigious event into new territories and fostering unprecedented collaboration.\n\n### Historical Context: First Asian and Co-hosted World Cup\n\nThe decision to award the 2002 World Cup to South Korea and Japan marked a profound moment of historical significance. It was the first time in the tournament's then 72-year history that the FIFA World Cup was held on Asian soil, a testament to the continent's growing passion for football and its burgeoning economic power. Equally groundbreaking was its status as the first-ever co-hosted World Cup, a bold experiment in international cooperation that saw tw