# DSPy - Declarative Self-improving Python
***

Package Website: [DSPy Programming--not prompting--LMs](https://dspy.ai)  
Databricks Website: [Building GenAI Apps on Databricks with DSPy](https://docs.databricks.com/en/generative-ai/dspy/index.html)  
Blog: [DSPy on Databricks](https://www.databricks.com/blog/dspy-databricks)

***

Instead of hand-written prompts, DSPy uses declarative Python code to instruct the LM to deliver high-quality outputs.  

***

## Installing DSPy

In [0]:
%pip install -U dspy

In [0]:
%restart_python

***
## Basic Use 

Import the package and configure the LM that will serve as the framework's "brain."  The code below shows the setup for Databricks when running in an interactive notebook or as a notebook task in a workflow.  

In [0]:
import dspy
lm = dspy.LM('databricks/databricks-meta-llama-3-1-70b-instruct')
dspy.configure(lm=lm)

Call the LM directly.  

In [0]:
lm("Say this is a test!", temperature=0.7)  # => ['This is a test!']
lm(messages=[{"role": "user", "content": "Say this is a test!"}])  # => ['This is a test!']

*** 
## Modules 

Modules decouple the input/output behavior we want from the prompts that would otherwise have to be maintained for a specific LM.  We specify input/output behavior as a signature and select a module to assign a strategy for invoking the LM.  DSPy turns the signatures into prompts and parses the typed outputs, so we can compose different modules into organized AI systems.  
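
To make the signature idea concrete, here is a minimal, library-free sketch of how a string such as `question -> answer: float` could be split into typed input and output fields and rendered as a prompt. It is only an illustration and not the library's implementation: `Field`, `parse_signature`, and `render_prompt` are hypothetical names, the prompt layout is invented, and the `str` default for unannotated fields is an assumption. Splitting on commas also means it does not handle types such as `dict[str, str]`.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    type_name: str = "str"  # assumption: unannotated fields are treated as strings


def parse_signature(signature: str) -> tuple[list[Field], list[Field]]:
    """Split 'a, b -> c: float' into (input fields, output fields)."""
    inputs_part, outputs_part = signature.split("->")

    def parse_fields(part: str) -> list[Field]:
        fields = []
        for item in part.split(","):
            # 'answer: float' -> ('answer', 'float'); 'question' -> ('question', '')
            name, _, type_name = item.partition(":")
            fields.append(Field(name.strip(), type_name.strip() or "str"))
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)


def render_prompt(signature: str, **values: str) -> str:
    """Render a plain-text prompt: filled-in inputs, then the outputs to produce."""
    inputs, outputs = parse_signature(signature)
    lines = [f"{field.name}: {values[field.name]}" for field in inputs]
    lines += [f"{field.name} ({field.type_name}):" for field in outputs]
    return "\n".join(lines)


inputs, outputs = parse_signature("question -> answer: float")
prompt = render_prompt("context, question -> response", context="...", question="Who?")
print(prompt)
```

The same two signature strings appear in the Math and RAG cells below, where the real modules do this work for us.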

### Math
***

In [0]:
math = dspy.ChainOfThought("question -> answer: float")
response = math(question="Two dice are tossed. What is the probability that the sum equals two?")

In [0]:
print(f"""
Reasoning: 
{response.completions['reasoning'][0]}

Answer: 
{response.completions['answer'][0]}
""")

### RAG
***

In [0]:
def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

rag = dspy.ChainOfThought('context, question -> response')

question = "What's the name of the castle that David Gregory inherited?"

In [0]:
context = search_wikipedia(question)
print(context)

In [0]:
rag(context=search_wikipedia(question), question=question)

### Classification 
***

In [0]:
from typing import Literal

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""

    sentence: str = dspy.InputField()
    sentiment: Literal['positive', 'negative', 'neutral'] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")
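
The typed fields in `Classify` mean the raw LM text has to be checked against the declared types. The sketch below shows that idea with the standard library only. The helper names `parse_sentiment` and `parse_confidence` are hypothetical, the cleanup rules are illustrative, and the `[0, 1]` range for confidence is an assumption (the signature above only says `float`).

```python
from typing import Literal, get_args

Sentiment = Literal["positive", "negative", "neutral"]


def parse_sentiment(raw: str) -> str:
    """Normalize raw LM text and check it against the allowed labels."""
    label = raw.strip().strip("\"'.").lower()
    if label not in get_args(Sentiment):
        raise ValueError(f"unexpected sentiment label: {raw!r}")
    return label


def parse_confidence(raw: str) -> float:
    """Convert raw LM text to a float; the [0, 1] range is an assumption."""
    value = float(raw.strip())
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence out of range: {value}")
    return value


print(parse_sentiment(" Positive. "), parse_confidence("0.75"))
```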

### Information Extraction 
***

In [0]:
class ExtractInfo(dspy.Signature):
    """Extract structured information from text."""

    text: str = dspy.InputField()
    title: str = dspy.OutputField(desc="the title of the document")
    headings: list[str] = dspy.OutputField(desc="a list of headings in the document")
    entities: list[dict[str, str]] = dspy.OutputField(desc="a list of entities and their metadata")

module = dspy.Predict(ExtractInfo)

text = "Apple Inc. announced its latest iPhone 14 today. " \
    "The CEO, Tim Cook, highlighted its new features in a press release."
response = module(text=text)

print(response.title)
print(response.headings)
print(response.entities)

### Agents
***

In [0]:
def evaluate_math(expression: str):
    # Caution: eval() executes arbitrary Python, so only use it with trusted input.
    # return dspy.PythonInterpreter({}).execute(expression)
    return eval(expression)

def search_wikipedia(query: str):
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])

pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)
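
Because the `evaluate_math` tool receives expressions written by the LM, a restricted evaluator may be preferable to a bare `eval`. The sketch below is one possible stand-in built on the standard `ast` module. `safe_evaluate_math` is a hypothetical name, and limiting it to `+ - * /` on numeric literals is an assumption about what the agent needs.

```python
import ast
import operator

# Only these operators are accepted; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_evaluate_math(expression: str) -> float:
    """Evaluate an arithmetic expression made only of numbers and + - * /."""

    def visit(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        # Names, calls, attribute access, and so on all end up here.
        raise ValueError(f"unsupported expression: {expression!r}")

    return visit(ast.parse(expression, mode="eval").body)


print(safe_evaluate_math("9362158 / 1625"))
```

A function like this could be passed in `tools=[...]` in place of `evaluate_math`, since the agent only needs a callable that takes an expression string.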

In [0]:
react(question = "What year was Scottish Physician David Gregory born?")

In [0]:
9362158/1625

In [0]:
react(question = "What is 9362158 divided by the year that Scottish Physician David Gregory was born?")

### Multi-Stage Pipelines
***

In [0]:
class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""

    topic: str = dspy.InputField(desc = "the topic that will be outlined")
    title: str = dspy.OutputField(desc = "the title of the outline")
    sections: list[str] = dspy.OutputField(desc = "a list of sections")
    section_subheadings: dict[str, list[str]] = dspy.OutputField(desc="mapping from section headings to subheadings")

class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""

    topic: str = dspy.InputField(desc = "the topic for the top-level section of the article")
    section_heading: str = dspy.InputField(desc = "the section heading of the article")
    section_subheadings: list[str] = dspy.InputField(desc = "a list of section subheadings")
    content: str = dspy.OutputField(desc="markdown-formatted section")

class DraftArticle(dspy.Module):
    def __init__(self):
        super().__init__()
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)

    def forward(self, topic):
        outline = self.build_outline(topic=topic)
        sections = []
        for heading, subheadings in outline.section_subheadings.items():
            section, subheadings = f"## {heading}", [f"### {subheading}" for subheading in subheadings]
            section = self.draft_section(topic=outline.title, section_heading=section, section_subheadings=subheadings)
            sections.append(section.content)
        return dspy.Prediction(title=outline.title, sections=sections)

draft_article = DraftArticle()
article = draft_article(topic="World Cup 2002")

In [0]:
article.title

In [0]:
import markdown

# Render the title as an H1 followed by a horizontal rule, then append each section.
html_content = markdown.markdown(f"# {article.title}\n\n***")
for section in article.sections:
    html_content += f"\n{markdown.markdown(section)}"

displayHTML(html_content)

## Optimizers 
***

### Optimizing prompts for a ReAct Agent

In [0]:
import dspy
from dspy.datasets import HotPotQA

dspy.configure(lm=dspy.LM('databricks/databricks-meta-llama-3-1-70b-instruct'))

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)

# Compiling takes a while, so the call is left commented out.
# optimized_react = tp.compile(react, trainset=trainset)
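
The optimizer scores candidate prompts with the metric it is given. The sketch below shows what an exact-match metric can look like, using the `(example, prediction, trace=None)` calling convention that DSPy metrics follow. Plain dicts stand in for the example and prediction objects, and the normalization steps (lowercasing, dropping punctuation and articles) are a common question-answering convention assumed here, not a copy of `dspy.evaluate.answer_exact_match`.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(example: dict, prediction: dict, trace=None) -> bool:
    """Return True when the normalized gold and predicted answers are identical."""
    return normalize_answer(example["answer"]) == normalize_answer(prediction["answer"])


gold = {"question": "Which castle did David Gregory inherit?", "answer": "The Kinnairdy Castle"}
print(exact_match(gold, {"answer": "kinnairdy castle."}))
```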

In [0]:
lm.history[-1]