**Rawan Alkurd** 
*Senior Data Scientist*  
[My GitHub Profile](https://github.com/rawanalkurd)  
[My LinkedIn Profile](https://linkedin.com/in/rawan-alkurd/)

# Why DSPy is the Future of LLM Programming (And How to Get Started!)

Imagine a world where interacting with large language models (LLMs) is as intuitive and efficient as programming with high-level abstractions. Picture this: you're working on an AI project late at night, wrestling with countless prompt tweaks to get your language model to understand your commands. You wish there were a way to simplify this process, to make it less about guessing and more about clear, structured programming. Welcome to DSPy, a framework for declarative, self-improving language model programs developed at Stanford University. DSPy is changing how we build and optimize AI applications, moving us away from hand-tuned, trial-and-error prompt engineering.

## The Tale of Traditional Prompt Engineering
A few months ago, prompt engineering was the buzzword in AI circles. The idea was to craft specific prompts to get the best responses from LLMs. However, this approach quickly revealed its limitations: it's inconsistent, labor-intensive, and relies heavily on trial and error. As tasks grow more complex, maintaining these prompts becomes daunting. Imagine trying to cook a perfect meal without a recipe, purely by trial and error: sometimes it works, but often it doesn't! This is where DSPy comes into play, offering a structured, scalable, and efficient solution.

## What is DSPy?
DSPy is a framework for building declarative, self-improving language model programs. Unlike traditional prompt engineering, DSPy provides a systematic approach to programming LLMs, treating them as modular components within a self-improving pipeline. It is built around three key components: signatures, modules, and teleprompters (optimizers). Together, they turn the chaotic world of prompt engineering into a structured programming model. Think of DSPy as a master chef who not only cooks the perfect dish but also keeps improving the recipe based on feedback. This approach abstracts away the guesswork, allowing you to focus on defining what you want rather than how to achieve it.

## How Does DSPy Compare to Other Frameworks?
In the ever-evolving landscape of LLM programming, various frameworks offer distinct approaches to streamlining interactions with LLMs. DSPy stands out by combining high-level abstractions with automated optimization and ease of use, which makes it an excellent choice for developers who want to simplify their LLM programming workflows. Other frameworks such as Guidance, LMQL, Marvin, Instructor, and LangChain have their own strengths, but DSPy's focus on self-improving pipelines sets it apart as a powerful tool in the AI developer's toolkit.

Here's how DSPy compares to other prominent frameworks:

### 1. Controlled Generation

Frameworks like Guidance and LMQL:
- Strengths: Use template-based and constraint-based techniques to control LLM outputs, ensuring they meet specific requirements.
- Limitations: Require manual setup and are better suited to completion tasks than to conversational models like GPT-4.

DSPy:
- Strengths: Automates prompt optimization using teleprompters, reducing manual effort and providing more consistent results.
- Limitations: The optimization (compilation) step makes many LLM calls, which adds computational overhead.

### 2. Schema-Driven Generation

Frameworks like Marvin and Instructor:
- Strengths: Use schemas to ensure outputs conform to predefined structures, simplifying integration with existing programming logic.
- Limitations: Can be less flexible in controlling the type of content generated.

DSPy:
- Strengths: Allows high-level task definitions using natural-language signatures, supporting a wide range of tasks with customizable modules.
- Limitations: New users face a learning curve in understanding and effectively using the high-level abstractions.

### 3. Compiling LM Programs

Frameworks like LangChain:
- Strengths: Provide pre-packaged utilities for rapid prototyping, with straightforward prompt generation and parsing tools.
- Limitations: Limited customization options and reliance on manual prompt engineering.

DSPy:
- Strengths: Offers a modular design and advanced optimization strategies to build and refine complex pipelines.
- Limitations: May feel less intuitive to developers accustomed to more traditional frameworks.

### 4. Open-Ended Agents

Frameworks like AutoGPT and MetaGPT:
- Strengths: Use pre-built pipelines to perform tasks with minimal human intervention, offering convenience.
- Limitations: Fixed prompts and capabilities can limit extensibility and adaptability.

DSPy:
- Strengths: Enables highly customized, extensible pipelines that improve themselves through optimization.
- Limitations: Optimization effectiveness depends on the quality and quantity of the training examples.

## Key Components of DSPy
Let's walk through the essential DSPy components needed to build a programmable LLM pipeline step by step. First, we'll set up our language model. For this tutorial, I'll be using Gemini Pro, but feel free to use any language model that suits your needs. If you need guidance on setting up an LLM locally, you can refer to my previous blog here:
https://medium.com/@rawanalkurd/running-llms-locally-a-guide-to-setting-up-ollama-with-docker-6ef8488e75d4

Here's how to set up our LLM for use with DSPy:

```python
import os

import dspy

# Assumption: the Gemini API key is stored in the GOOGLE_API_KEY environment variable
API_KEY = os.environ["GOOGLE_API_KEY"]

temperature = 0  # set your temperature here
# Note: dspy.Google comes from older DSPy releases; newer releases configure models through dspy.LM
model = dspy.Google("models/gemini-1.0-pro",
                    api_key=API_KEY,  # use your API key
                    temperature=temperature,
                    max_output_tokens=3024)

# Set the model to be used globally across all DSPy tasks
dspy.settings.configure(lm=model, max_tokens=3024)
```


## 1. Signatures
Signatures in DSPy are like the blueprints of desired behavior. They define the input/output relationship of a task in natural language, abstracting away the complexities of how to achieve it.
Example:

```python
sig_1 = dspy.Signature("question -> answer")
sig_2 = dspy.Signature("document -> summary")
sig_3 = dspy.Signature("question, context -> answer")
```

```python
# Here are the components of a signature:
sig_3
```

```text
StringSignature(question, context -> answer
    instructions='Given the fields `question`, `context`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    context = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Context:', 'desc': '${context}'})
    answer = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Answer:', 'desc': '${answer}'})
)
```

As shown above, a DSPy signature consists of three main components: instructions, inputs, and outputs. You can use a built-in string signature or write your own signature class, as shown in the ExtractActionItems example below:

```python
class ExtractActionItems(dspy.Signature):
    """
    Extract the action items from the provided text and return them as a Python list.
    """
    Text = dspy.InputField(desc="Text")
    Extracted_action_items = dspy.OutputField(desc="Python list of action items")
```

You can define as many signatures as you need and customize them for your problem.
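To make the three components concrete, here is a toy, stdlib-only sketch of how a string signature such as `"question, context -> answer"` can be split into inputs, outputs, and default instructions, and how a field name can be turned into a prompt prefix. The helpers `ToySignature`, `parse_signature`, and `field_prefix` are hypothetical and are not DSPy's internal implementation; they only mimic the `instructions` text and `Question:`/`Context:` prefixes in the `sig_3` output above, plus the `Extracted Action Items:` prefix that appears in the Predict example below.

```python
from dataclasses import dataclass


def field_prefix(name: str) -> str:
    """Hypothetical helper: 'Extracted_action_items' -> 'Extracted Action Items:'."""
    words = [w for w in name.split("_") if w]
    return " ".join(w[0].upper() + w[1:] for w in words) + ":"


@dataclass
class ToySignature:
    inputs: list[str]
    outputs: list[str]
    instructions: str


def parse_signature(spec: str) -> ToySignature:
    """Parse 'in1, in2 -> out1, out2' into a hypothetical ToySignature."""
    lhs, rhs = spec.split("->")
    inputs = [f.strip() for f in lhs.split(",") if f.strip()]
    outputs = [f.strip() for f in rhs.split(",") if f.strip()]
    # Default instructions, phrased like the ones DSPy printed for sig_3
    ins = ", ".join(f"`{f}`" for f in inputs)
    outs = ", ".join(f"`{f}`" for f in outputs)
    instructions = f"Given the fields {ins}, produce the fields {outs}."
    return ToySignature(inputs, outputs, instructions)


sig = parse_signature("question, context -> answer")
print(sig.instructions)  # Given the fields `question`, `context`, produce the fields `answer`.
print([field_prefix(f) for f in sig.inputs + sig.outputs])  # ['Question:', 'Context:', 'Answer:']
print(field_prefix("Extracted_action_items"))  # Extracted Action Items:
```

The real library also handles typed fields, descriptions, and other naming styles; the point here is only that field names flow directly into the prompt, which is why choosing descriptive names matters.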

## 2. Modules
DSPy modules are the building blocks of a DSPy program. Each module takes a signature and applies a prompting strategy to it, and modules can be combined to handle complex tasks efficiently. Let's look at the main types of DSPy modules, their specific uses, and practical examples that illustrate how to apply them.
Types of DSPy modules:
- Predict module
- ChainOfThought (CoT) module
- ProgramOfThought (PoT) module
- Task-specific modules

There are other, more advanced DSPy modules, such as ReAct, MultiChainComparison, and Majority. I will leave these to another blog post. Stay tuned!

### 1. Predict Module
Purpose: The Predict module is the foundational component that leverages a given Signature to generate prompts and produce results. It serves as a straightforward bridge between the Signature and the final output.

Usage:
- Extracting specific information: The Predict module can be used to extract action items from a text, summarize content, or generate specific outputs based on predefined Signatures.
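Under the hood, Predict turns the signature into a text prompt. The sketch below is a hypothetical, stdlib-only approximation of the prompt layout visible in the `inspect_history()` output of the example that follows: the instructions, a `---` separator, a "Follow the following format." block listing each field, another separator, then the filled-in inputs and an empty output prefix for the model to complete. `render_prompt` is not a DSPy function, and DSPy's actual prompt formatting differs between versions.

```python
def render_prompt(instructions: str, fields: dict[str, str],
                  inputs: dict[str, str], output_field: str) -> str:
    """Hypothetical re-creation of the prompt layout shown in this post.

    fields maps each prefix (e.g. "Text") to its description,
    inputs maps input prefixes to their values,
    output_field is the prefix the model is asked to complete.
    """
    format_lines = [f"{prefix}: {desc}" for prefix, desc in fields.items()]
    filled = [f"{prefix}: {value}" for prefix, value in inputs.items()]
    return "\n\n---\n\n".join([
        instructions,
        "Follow the following format.\n\n" + "\n".join(format_lines),
        "\n".join(filled + [f"{output_field}:"]),
    ])


prompt = render_prompt(
    "Extract the action items from the provided text and return them as a Python list.",
    {"Text": "Text", "Extracted Action Items": "Python list of action items"},
    {"Text": "Sarah will prepare the financial report by Friday."},
    "Extracted Action Items",
)
print(prompt)
```

Because the prompt ends with a bare `Extracted Action Items:` prefix, the model's completion is whatever text it writes after that prefix, which is exactly what shows up in the inspected history.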

Example:

```python
# Defining and using the ExtractActionItems Signature
class ExtractActionItems(dspy.Signature):
    """
    Extract the action items from the provided text and return them as a Python list.
    """
    Text = dspy.InputField(desc="Text")
    Extracted_action_items = dspy.OutputField(desc="Python list of action items")


# Extracting action items
Text = "In our meeting today, we discussed the progress of Project A. It was noted that there are several blockers that need attention. John will follow up with the IT department to resolve the server issues. Sarah will prepare the financial report for the next quarter by Friday. Additionally, we need to organize a team-building activity next month to boost morale. Lastly, review the new project proposals by next Wednesday."
action_items_extractor = dspy.Predict(ExtractActionItems)
response = action_items_extractor(Text=Text)
```



Let's inspect the prompt DSPy sent and the model's response:

```python
model.inspect_history()
```

```text
Extract the action items from the provided text and return them as a Python list.

---

Follow the following format.

Text: Text
Extracted Action Items: Python list of action items

---

Text: In our meeting today, we discussed the progress of Project A. It was noted that there are several blockers that need attention. John will follow up with the IT department to resolve the server issues. Sarah will prepare the financial report for the next quarter by Friday. Additionally, we need to organize a team-building activity next month to boost morale. Lastly, review the new project proposals by next Wednesday.
Extracted Action Items:- John will follow up with the IT department to resolve the server issues.
- Sarah will prepare the financial report for the next quarter by Friday.
- Organize a team-building activity next month to boost morale.
- Review the new project proposals by next Wednesday.
```

Notice that the LLM did not follow the requested format: instead of a Python list, it returned a bulleted list. Later in this post, we will see how DSPy can optimize a program so that its output aligns with specific metrics and meets our expectations or performance criteria.
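Until the program is optimized, you can also make downstream code tolerant of this kind of format drift. The helper below is a hypothetical post-processing sketch, not part of DSPy: it first tries to read the completion as a Python list literal with `ast.literal_eval`, and falls back to treating each bulleted (or plain) non-empty line as one action item, which matches what the model actually returned above.

```python
import ast


def parse_action_items(completion: str) -> list[str]:
    """Hypothetical helper: turn an LLM completion into a list of action items."""
    text = completion.strip()
    # Case 1: the model followed the instructions and returned a Python list literal
    try:
        value = ast.literal_eval(text)
        if isinstance(value, list):
            return [str(item).strip() for item in value]
    except (ValueError, SyntaxError, TypeError):
        pass
    # Case 2: fall back to one item per bulleted or plain non-empty line
    items = []
    for line in text.splitlines():
        line = line.strip().lstrip("-*•").strip()
        if line:
            items.append(line)
    return items


completion = """- John will follow up with the IT department to resolve the server issues.
- Sarah will prepare the financial report for the next quarter by Friday."""
print(parse_action_items(completion))
print(parse_action_items("['Fix bugs', 'Improve UI']"))  # ['Fix bugs', 'Improve UI']
```

`ast.literal_eval` only evaluates Python literals, so unlike `eval` it will not execute arbitrary code contained in the model's output.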

### 2. ChainOfThought (CoT) Module
Purpose: The ChainOfThought module is designed for tasks requiring logical reasoning or multi-step processing. It structures prompts to guide the model through a step-by-step reasoning process, improving the depth and accuracy of responses.

Usage:
- Logical reasoning: Ideal for tasks that need to follow a logical sequence of steps.
- Multi-step processing: Useful for breaking down complex tasks into manageable steps.

Example:
For this example, we will define the signature inline, passing it as a string directly to the Chain of Thought (CoT) module instead of defining it separately. Note that the names you choose for the input and output fields matter, because DSPy uses them to build the prompt and they guide the LLM toward reasonable answers. For example, for a question-answering problem, name the input `question` and the output `answer`, as shown below:

```python
QA_program = dspy.ChainOfThought('question -> answer')
question = "If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?"

response = QA_program(question=question)
```



Now let's inspect the output:

```python
model.inspect_history()
```

```text
Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}

---

Question: If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?
Reasoning: Let's think step by step in order toQuestion: If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?
Reasoning: Let's think step by step in order to produce the answer. First, we start with 10 oranges. Then, we eat 3 oranges, which leaves us with 10 - 3 = 7 oranges. Finally, we buy 7 more oranges, which gives us 7 + 7 = 14 oranges.
Answer: 14
```

Observe how chain-of-thought reasoning helped the LLM deduce the correct answer step by step. The CoT module returns not only the requested answer but also the reasoning behind each step, which makes the result more transparent and easier to understand.
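After the model responds, its completion has to be split back into the signature's output fields. The sketch below is a simplified, hypothetical version of that parsing step (`parse_fields` is not a DSPy function): each field runs from its `Prefix:` to the next line that starts with a known prefix. It uses the completion from the history above, which even includes the model's echoed `Question:` line.

```python
import re


def parse_fields(completion: str, prefixes: list[str]) -> dict[str, str]:
    """Hypothetical parser: map each 'Prefix:' at the start of a line to the text after it.

    A field runs until the next line that starts with one of the known prefixes.
    If a prefix appears more than once, the last occurrence wins.
    """
    alternation = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(rf"^({alternation}):[ \t]*", re.MULTILINE)
    matches = list(pattern.finditer(completion))
    fields: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(completion)
        fields[match.group(1)] = completion[match.end():end].strip()
    return fields


completion = """Question: If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?
Reasoning: Let's think step by step in order to produce the answer. First, we start with 10 oranges. Then, we eat 3 oranges, which leaves us with 10 - 3 = 7 oranges. Finally, we buy 7 more oranges, which gives us 7 + 7 = 14 oranges.
Answer: 14"""

parsed = parse_fields(completion, ["Question", "Reasoning", "Answer"])
print(parsed["Answer"])  # 14
```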

### 3. ProgramOfThought (PoT) Module
Purpose: The ProgramOfThought module extends the idea behind CoT: instead of reasoning only in text, it asks the model to write Python code that computes the answer, executes that code, and then passes the code's output back to the model to produce the final answer. It's suited for tasks that need precise computation and more control over the reasoning process.

Usage:
- Complex programming tasks: Suitable for scenarios where the model needs to handle intricate logic and calculations.

Example:

```python
# Define a signature for question answering
generate_answer_signature = dspy.Signature("question -> answer")

# Pass signature to ProgramOfThought Module
pot = dspy.ProgramOfThought(generate_answer_signature)

# Call the ProgramOfThought module
question = "If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?"
result = pot(question=question)
```

```text
prompt: /n  You will be given `question` and you will respond with `answer`.
Generating executable Python code that programmatically computes the correct `answer`.
After you're done with the computation, make sure the last line in your code evaluates to the correct value for `answer`.

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the generated_code}. We ...
Code: python code that answers the question

---

Question: If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?
Reasoning: Let's think step by step in order to
```

Now let's inspect the output:

```python
model.inspect_history()
```

```text
Given the final code `question`, `final_generated_code`, `code_output`, provide the final `answer`.

---

Follow the following format.

Question: ${question}

Code: python code that answers the question

Code Output: output of previously-generated python code

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Question: If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?

Code:
oranges_initial = 10
oranges_eaten = 3
oranges_bought = 7
oranges_now = oranges_initial - oranges_eaten + oranges_bought
answer = oranges_now
answer

Code Output: 14

Reasoning: Let's think step by step in order toQuestion: If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?

Code:
oranges_initial = 10
oranges_eaten = 3
oranges_bought = 7
oranges_now = oranges_initial - oranges_eaten + oranges_bought
answer = oranges_now
answer

Code Output: 14

Reasoning: Let's think step by step in o
```

Observe how the LLM wrote Python code to compute the answer, DSPy executed it (`Code Output: 14`), and the code, its output, and the model's reasoning were then used to produce the final answer.
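The step that sets ProgramOfThought apart is executing the generated code and reading back its result. Here is a minimal, hypothetical sketch of that step using Python's built-in `exec`/`eval` on the code from the history above; `run_generated_code` is not a DSPy function. Plain `exec` is not a sandbox, so never run untrusted model output this way outside of experiments.

```python
import ast


def run_generated_code(code: str):
    """Hypothetical helper: run model-written code and return the value of its last line.

    WARNING: exec() runs arbitrary code; only use this on code you trust.
    """
    tree = ast.parse(code)
    namespace: dict = {}
    last = tree.body[-1] if tree.body else None
    if isinstance(last, ast.Expr):
        # Run every statement except the final expression, then evaluate that expression
        body = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(body, "<generated>", "exec"), namespace)
        return eval(compile(ast.Expression(last.value), "<generated>", "eval"), namespace)
    # No trailing expression: just run the code and report no value
    exec(compile(tree, "<generated>", "exec"), namespace)
    return None


generated = """
oranges_initial = 10
oranges_eaten = 3
oranges_bought = 7
oranges_now = oranges_initial - oranges_eaten + oranges_bought
answer = oranges_now
answer
"""
print(run_generated_code(generated))  # 14
```

This mirrors the instruction in the PoT prompt that "the last line in your code evaluates to the correct value for `answer`": the value of that final expression is what gets reported as the code output.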

### 4. Task-Specific Modules
Purpose: These modules are tailored to specific tasks and applications. They build on DSPy's core modules to provide optimized solutions for particular use cases.

Usage:
- Custom tasks: Create modules for unique tasks such as document classification, sentiment analysis, or any other specific application.

Example: To avoid repetition, refer to the use case in the last section of this blog post, which demonstrates how to build a custom module tailored to a specific task.

DSPy modules offer a powerful and flexible way to handle a variety of AI programming tasks. By leveraging them, developers can simplify how they interact with large language models and make that work more efficient and effective. By embracing DSPy's modular approach, you're not just writing prompts; you're programming intelligent, adaptable systems capable of tackling intricate tasks.

## 3. Teleprompters (Optimizers)
Teleprompters are the optimizers in DSPy, guiding the learning and optimization of modules. They improve the performance of LLM pipelines by refining prompts and learning from data. Here's a look at the different types of DSPy optimizers and how to effectively use them.
Types of DSPy Optimizers
### 1. Automatic Few-Shot Learning Optimizers
These optimizers extend the signature by generating optimized examples and including them in the prompt, implementing few-shot learning.
- LabeledFewShot: Constructs few-shot examples from labeled input-output data points, helping the model understand the task from a small number of examples.
- BootstrapFewShot: Uses a teacher module to generate complete demonstrations for every stage of your program, keeping only the demonstrations that pass your metric. The teacher can be a more knowledgeable model that guides a less knowledgeable one (the student).
- BootstrapFewShotWithRandomSearch: Extends BootstrapFewShot by applying random search over the generated demonstrations to find the best few-shot set, which improves the robustness of the resulting prompts.
- BootstrapFewShotWithOptuna: Integrates Optuna, a hyperparameter optimization framework, to search across demonstration sets.
- KNNFewShot: Uses the k-Nearest Neighbors algorithm to find the examples in the dataset most relevant to the current input and uses them as few-shot demonstrations.
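To illustrate the idea behind KNNFewShot, here is a toy, stdlib-only sketch that picks the k training examples most similar to a new input to serve as demonstrations. Word-overlap (Jaccard) similarity stands in for the embedding vectors a real setup would use, and `select_demos` is a hypothetical name, not DSPy's API. The example texts are taken from the use case later in this post.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def select_demos(query: str, trainset: list[dict], k: int = 2) -> list[dict]:
    """Return the k training examples whose feedback_text is closest to the query."""
    q = tokens(query)
    ranked = sorted(trainset, key=lambda ex: jaccard(q, tokens(ex["feedback_text"])), reverse=True)
    return ranked[:k]


trainset = [
    {"feedback_text": "The app is user-friendly, but it crashes frequently.", "action_items": "fix crash issues"},
    {"feedback_text": "The pricing is reasonable, but the free trial period is too short.", "action_items": "extend trial period"},
    {"feedback_text": "Customer support was very helpful, but the response time was slow.", "action_items": "reduce response time"},
]

demos = select_demos("The design is appealing, but the app crashes on older devices.", trainset, k=1)
print(demos[0]["action_items"])  # fix crash issues
```

The selected examples would then be inserted into the prompt as demonstrations, so each input gets the demos that look most like it rather than one fixed set.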

### 2. Automatic Instruction Optimization Optimizers
These optimizers generate better instructions for the prompt.
- COPRO: Generates and refines new instructions using coordinate ascent, iteratively improving the instructions given to the model.
- MIPRO: Uses Bayesian Optimization to tune both the instructions and the few-shot examples, balancing exploration and exploitation to find the best configuration.
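Coordinate ascent, the search strategy COPRO uses, can be pictured with a toy example: each module in a program is a "coordinate" with a list of candidate instructions, and we improve one coordinate at a time while holding the others fixed, keeping a change only if the score goes up. In the hypothetical sketch below, the candidates are hard-coded and `score` is a stand-in for running the program on a training set; in COPRO, the candidates are proposed by an LM. Treat this as an illustration of the search strategy, not of DSPy's implementation.

```python
def coordinate_ascent(candidates: dict[str, list[str]], score, rounds: int = 2) -> dict[str, str]:
    """Pick one instruction per module, improving one module at a time."""
    # Start from the first candidate for every module
    current = {module: options[0] for module, options in candidates.items()}
    best = score(current)
    for _ in range(rounds):
        for module, options in candidates.items():
            for option in options:
                trial = {**current, module: option}  # change only this coordinate
                trial_score = score(trial)
                if trial_score > best:  # keep strict improvements only
                    current, best = trial, trial_score
    return current


candidates = {
    "summarize": ["Summarize the feedback.", "List sentiment, topics and action items."],
    "classify": ["Classify the sentiment.", "Label sentiment as positive, negative or mixed."],
}


def score(config: dict[str, str]) -> float:
    # Stand-in metric that rewards longer instructions (purely illustrative)
    return sum(len(text) for text in config.values())


print(coordinate_ascent(candidates, score))
```

With a real metric, each call to `score` means running the whole program over a dataset, which is why instruction optimizers can be expensive.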

### 3. Automatic Finetuning Optimizers
- BootstrapFinetune: Distills a prompt-based DSPy program into weight updates for smaller LMs. It takes a program optimized for a large model and fine-tunes a smaller model, updating its weights so it can run the program effectively.

### 4. Program Transformation Optimizers
- Ensemble: Combines a set of DSPy programs into a single program. It aggregates multiple DSPy programs into one more robust and reliable program that leverages the strengths of each.
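A common way to reduce an ensemble's outputs to a single answer is majority voting. Here is a stdlib-only sketch of that idea; `ensemble_vote` and the three stand-in "programs" are hypothetical and only illustrate the aggregation step.

```python
from collections import Counter
from typing import Callable


def ensemble_vote(programs: list[Callable[[str], str]], question: str) -> str:
    """Run every program on the same input and return the most common answer.

    Ties go to the answer that was seen first (Counter keeps insertion order).
    """
    answers = [program(question) for program in programs]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-ins for three separately optimized programs
programs = [
    lambda q: "14",
    lambda q: "14",
    lambda q: "7",
]
question = "If you have 10 oranges and you eat 3, then buy 7 more, how many oranges do you have now?"
print(ensemble_vote(programs, question))  # 14
```

Voting only helps when the programs make different mistakes; several copies of the same program with the same prompt tend to fail together.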
DSPy optimizers are a topic worth exploring in depth. MIPRO, for example, is a very interesting tool that can significantly improve the performance of your LLM program. I'll cover MIPRO and the other optimizers in a separate blog post, but to get you started, I'll show you how to use BootstrapFewShot. In the upcoming use case, I'll demonstrate how to use this optimizer and how it can improve your LLM's output.
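Before using the real optimizer in the use case, here is a toy picture of what "bootstrapping demonstrations" means: run a teacher on the training inputs and keep only the results that pass your metric, up to a fixed number of demos. `bootstrap_demos`, `teacher`, and `metric` below are hypothetical plain-Python stand-ins. The `max_demos` argument loosely mirrors BootstrapFewShot's `max_bootstrapped_demos`, but the real optimizer records full traces for every module in the program and has more options.

```python
from typing import Callable


def bootstrap_demos(trainset: list[dict], teacher: Callable[[str], dict],
                    metric: Callable[[dict, dict], bool], max_demos: int = 2) -> list[dict]:
    """Collect up to max_demos (input, output) pairs where the teacher passes the metric."""
    demos = []
    for example in trainset:
        prediction = teacher(example["feedback_text"])
        if metric(example, prediction):
            demos.append({"feedback_text": example["feedback_text"], **prediction})
        if len(demos) >= max_demos:
            break
    return demos


def teacher(text: str) -> dict:
    # Hypothetical teacher: a keyword rule standing in for an LLM-backed program
    sentiment = "mixed" if " but " in text.lower() else "positive"
    return {"sentiment": sentiment}


def metric(example: dict, prediction: dict) -> bool:
    return example["sentiment"] == prediction["sentiment"]


trainset = [
    {"feedback_text": "The new feature is great, but it needs more documentation.", "sentiment": "positive"},
    {"feedback_text": "The pricing is reasonable, but the free trial period is too short.", "sentiment": "mixed"},
    {"feedback_text": "The recent update added useful features, but it also introduced new bugs.", "sentiment": "mixed"},
]
demos = bootstrap_demos(trainset, teacher, metric)
print([d["sentiment"] for d in demos])  # ['mixed', 'mixed']
```

The first example is skipped because the teacher's answer disagrees with the label, which is how a metric keeps bad demonstrations out of the final prompt.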

## DSPy Workflow
The DSPy framework simplifies the process of building and optimizing language model pipelines. By breaking down the development process into clear, manageable steps, DSPy ensures that you can focus on defining the tasks and metrics that matter most. Here's a step-by-step guide to the DSPy workflow:
- Define the Task: Outline the task and metrics to optimize.
- Assemble Examples: Collect input examples, with or without labels.
- Construct the Pipeline: Choose built-in layers (modules) and assign input/output specifications.
- Optimize: Use a DSPy optimizer to transform the code into high-quality instructions and generate automatic few-shot examples.
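Both the first and last steps hinge on a metric. Conceptually, evaluating a program means running it over a development set and averaging the metric, which is the kind of loop DSPy's `Evaluate` utility (imported in the use case below) automates. Here is a hypothetical stdlib-only version of that loop; `evaluate_program` and `exact_match` are illustrative names, not DSPy functions.

```python
from typing import Callable


def evaluate_program(program: Callable[[str], str], devset: list[tuple[str, str]],
                     metric: Callable[[str, str], float]) -> float:
    """Average a metric over (input, expected_output) pairs and return it as a percentage."""
    if not devset:
        return 0.0
    total = sum(metric(expected, program(inp)) for inp, expected in devset)
    return 100.0 * total / len(devset)


def exact_match(expected: str, predicted: str) -> float:
    return float(expected.strip().lower() == predicted.strip().lower())


devset = [
    ("The software is fast, but the interface is not intuitive.", "mixed"),
    ("The tutorials are helpful, but there should be more advanced guides.", "positive"),
]
# Hypothetical program that always answers "mixed"
print(evaluate_program(lambda text: "mixed", devset, exact_match))  # 50.0
```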

## Use Case: Structured Feedback Summarization with DSPy
Scenario

Imagine you're receiving tons of unstructured feedback from users about your app. Some love the new features, others are frustrated with bugs, and a few offer suggestions for improvements. You need a way to make sense of all this data quickly and efficiently. Let's explore how DSPy, our new best friend, can turn this chaos into clarity.
### Setting Up the Environment
Here, we set up DSPy and configure our language model (we use Google's Gemini model) with the API key and some settings.

```python
import os
from typing import List, Dict, Any

import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate.evaluate import Evaluate
from rouge_score import rouge_scorer

# Assumption: the Gemini API key is stored in the GOOGLE_API_KEY environment variable
API_KEY = os.environ["GOOGLE_API_KEY"]

temperature = 0  # set your temperature here
model = dspy.Google("models/gemini-1.0-pro",
                    api_key=API_KEY,  # use your API key
                    temperature=temperature,
                    max_output_tokens=3024)
# Set the model to be used globally across all DSPy tasks
dspy.settings.configure(lm=model, max_tokens=3024)
```

### 1. Define the Task with Signatures
Think of signatures as blueprints. They outline what you want to achieve without worrying about the nitty-gritty details.

```python
class FeedbackSummary(dspy.Signature):
    feedback_text = dspy.InputField()
    sentiment = dspy.OutputField()
    topics = dspy.OutputField()
    action_items = dspy.OutputField()
```

Here, FeedbackSummary is a blueprint with an input for feedback_text and outputs for sentiment, topics, and action_items.

### 2. Create Modules for Processing
Modules are like little workers following the blueprint to get the job done. In this case, we create a module to process the feedback and generate structured summaries.

```python
class SummarizeFeedback(dspy.Module):
    def __init__(self):
        super().__init__()
        # Chain-of-thought predictor that follows the FeedbackSummary signature
        self.summary = dspy.ChainOfThought(FeedbackSummary)

    def forward(self, feedback_text):
        # You can add any post-processing you need here before returning the output.
        # Here, we simply return the prediction produced by the summary program.
        return self.summary(feedback_text=feedback_text)
```

Our SummarizeFeedback module takes feedback text and generates structured outputs based on our blueprint.

### 3. Add Training and Validation Examples
Next, I added some training and validation examples to show the model what good summaries look like. These examples act as a guide, demonstrating the desired output format and content: each one pairs a feedback text with its expected sentiment, key topics, and action items. The full dataset can be found in the notebook in my repository: [Generative-AI-DSPy](https://github.com/rawanalkurd/Generative-AI-DSPy).

In [16]:
# Training examples
def get_examples() -> list[dspy.Example]:
    return [
        dspy.Example(
            feedback_text="The new feature is great, but it needs more documentation.",
            sentiment="positive",
            topics="new feature documentation",
            action_items="improve documentation"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The app is user-friendly, but it crashes frequently.",
            sentiment="negative",
            topics="usability stability",
            action_items="fix crash issues"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="Customer support was very helpful, but the response time was slow.",
            sentiment="mixed",
            topics="customer support response time",
            action_items="reduce response time"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="I love the design of the app, but there are some bugs that need to be addressed.",
            sentiment="mixed",
            topics="design bugs",
            action_items="fix bugs"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The performance has improved significantly, but the UI could be more intuitive.",
            sentiment="mixed",
            topics="performance UI",
            action_items="improve UI"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The pricing is reasonable, but the free trial period is too short.",
            sentiment="mixed",
            topics="pricing trial period",
            action_items="extend trial period"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The installation process was straightforward, but the app is slow.",
            sentiment="mixed",
            topics="installation performance",
            action_items="improve performance"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The customer support solved my issue quickly, but the initial wait time was long.",
            sentiment="mixed",
            topics="customer support wait time",
            action_items="reduce wait time"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The recent update added useful features, but it also introduced new bugs.",
            sentiment="mixed",
            topics="update bugs",
            action_items="fix new bugs"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The app's search functionality is excellent, but it needs more filters.",
            sentiment="positive",
            topics="search functionality filters",
            action_items="add more filters"
        ).with_inputs("feedback_text")
    ]

# Validation examples
def get_validation_examples() -> list[dspy.Example]:
    return [
        dspy.Example(
            feedback_text="The software is fast, but the interface is not intuitive.",
            sentiment="mixed",
            topics="performance interface",
            action_items="improve interface"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="I appreciate the quick updates, but the app is still buggy.",
            sentiment="mixed",
            topics="updates bugs",
            action_items="fix bugs"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The customer service is excellent, but the product quality needs improvement.",
            sentiment="mixed",
            topics="customer service product quality",
            action_items="improve product quality"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The app has many features, but it's overwhelming to new users.",
            sentiment="mixed",
            topics="features usability",
            action_items="improve usability for new users"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The recent changes are beneficial, but the app's speed has decreased.",
            sentiment="mixed",
            topics="changes speed",
            action_items="improve speed"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The design is appealing, but the app crashes on older devices.",
            sentiment="mixed",
            topics="design stability",
            action_items="fix crashes on older devices"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The tutorials are helpful, but there should be more advanced guides.",
            sentiment="positive",
            topics="tutorials guides",
            action_items="add advanced guides"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The app integrates well with other tools, but the synchronization is slow.",
            sentiment="mixed",
            topics="integration synchronization",
            action_items="improve synchronization speed"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The pricing is competitive, but the subscription management is confusing.",
            sentiment="mixed",
            topics="pricing subscription management",
            action_items="simplify subscription management"
        ).with_inputs("feedback_text"),
        dspy.Example(
            feedback_text="The app is intuitive, but it lacks support for some file formats.",
            sentiment="mixed",
            topics="usability file formats",
            action_items="add support for more file formats"
        ).with_inputs("feedback_text")
    ]
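
Before using this set for evaluation, it can help to check its label balance. The sketch below copies the ten `sentiment` labels from `get_validation_examples()` into a plain list, so it runs without DSPy, and counts them. `validation_sentiments`, `majority_label` and `majority_accuracy` are illustrative names, not part of DSPy.

```python
from collections import Counter

# Sentiment labels copied, in order, from get_validation_examples() above.
# With DSPy installed you could instead build this list as:
#   [ex.sentiment for ex in get_validation_examples()]
validation_sentiments = [
    "mixed", "mixed", "mixed", "mixed", "mixed",
    "mixed", "positive", "mixed", "mixed", "mixed",
]

counts = Counter(validation_sentiments)
majority_label, majority_count = counts.most_common(1)[0]

# Accuracy of a trivial baseline that always predicts the most common label
majority_accuracy = majority_count / len(validation_sentiments)

print(counts)
print(majority_label, majority_accuracy)
```

Nine of the ten examples are labeled `mixed`. A summarizer that always answered `mixed` would therefore already match the sentiment on 90% of them, so the topic and action-item fields carry most of the signal in this set.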


### 4. Evaluating Model Performance
Next, we set up a metric to evaluate the performance of our model. The metric first checks that the generated summary contains all three required fields (sentiment, topics, and action items) and returns 0 if any of them is missing. It then scores how closely the output matches the expected output: an exact match on sentiment (weight 0.2) plus ROUGE similarity for topics and action items (weight 0.4 each). I chose ROUGE for this comparison, but there are many other metrics you can choose from, or you can write your own to suit your needs and use case.

import dspy
from dspy.evaluate.evaluate import Evaluate
from rouge_score import rouge_scorer

# Build the ROUGE scorer once and reuse it for every comparison
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

def calculate_rouge_score(prediction: str, reference: str) -> float:
    # Average F-measure of ROUGE-1, ROUGE-2 and ROUGE-L; score() takes (target, prediction)
    scores = scorer.score(reference, prediction)
    return (scores['rouge1'].fmeasure + scores['rouge2'].fmeasure + scores['rougeL'].fmeasure) / 3

def evaluate_summary(example: dspy.Example, pred: dspy.Prediction, trace=None) -> float:
    # A prediction missing any of the required fields scores zero
    if not pred or not all(key in pred for key in ["sentiment", "topics", "action_items"]):
        return 0.0

    expected_sentiment = example.get("sentiment")
    expected_topics = example.get("topics", "")
    expected_action_items = example.get("action_items", "")

    predicted_sentiment = pred.get("sentiment")
    predicted_topics = pred.get("topics", "")
    predicted_action_items = pred.get("action_items", "")

    # Exact, case-sensitive match on sentiment
    sentiment_match = 1.0 if predicted_sentiment == expected_sentiment else 0.0

    # ROUGE similarity for the free-text fields (ROUGE expects strings, not lists)
    topics_rouge_score = calculate_rouge_score(str(predicted_topics), str(expected_topics))
    action_items_rouge_score = calculate_rouge_score(str(predicted_action_items), str(expected_action_items))

    # Weighted total: 0.2 sentiment + 0.4 topics + 0.4 action items
    return 0.2 * sentiment_match + 0.4 * topics_rouge_score + 0.4 * action_items_rouge_score

def evaluate_model(model, examples):
    # Run the metric over the devset; display_table=1 shows the first result row
    evaluator = Evaluate(metric=evaluate_summary, devset=examples, display_progress=True, display_table=1)
    return evaluator(model)
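
To make the ROUGE part of the metric less of a black box, here is a minimal, hand-rolled ROUGE-1 F1, which measures clipped unigram overlap. It is a simplification that assumes lowercase alphanumeric tokens and no stemming. `rouge_scorer` with `use_stemmer=True` also applies Porter stemming, and `calculate_rouge_score` averages ROUGE-1 with ROUGE-2 and ROUGE-L. `tokenize` and `rouge1_f1` are illustrative helpers, not part of `rouge_score`.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits (simplified: no stemming)
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge1_f1(prediction: str, reference: str) -> float:
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: a token counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Predicted vs. expected topics for the first validation example (see the tables below)
print(round(rouge1_f1("Software performance, User interface", "performance interface"), 4))  # 0.6667
```

Both reference words appear in the prediction (recall 1.0), but only half of the predicted words are in the reference (precision 0.5). The F1 of 2/3 therefore penalizes the extra words "software" and "user".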

### 5. Initial Model Performance
Before we start optimizing, we need a baseline: the performance of our summarizer without any optimization.

initial_summarizer = SummarizeFeedback()
initial_performance = evaluate_model(initial_summarizer, get_validation_examples())
print(f"Initial performance: {initial_performance:.2f}")

  0%|          | 0/10 [00:00<?, ?it/s]prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The software is fast, but the interface is not intuitive.

Reasoning: Let's think step by step in order to
Average Metric: 0.2111111111111111 / 1  (21.1):  10%|█         | 1/10 [00:02<00:25,  2.85s/it]prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: I appreciate the quick updates, but the app is still bu

|  | feedback_text | example_sentiment | example_topics | example_action_items | rationale | pred_sentiment | pred_topics | pred_action_items | evaluate_summary |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The software is fast, but the interface is not intuitive. | mixed | performance interface | improve interface | Feedback Text: The software is fast, but the interface is not intuitive. Reasoning: Let's think step by step in order to improve the user experience.... | Mixed | Software performance, User interface | - Identify areas where the interface is not intuitive. - Make the interface more user-friendly. | ✔️ [0.2111111111111111] |


Initial performance: 28.27
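
The progress line shows `Average Metric: 0.2111111111111111 / 1  (21.1)`. Judging by that, `Evaluate` averages the per-example `evaluate_summary` scores and reports the result as a percentage, so 28.27 corresponds to an average metric of roughly 0.28 out of 1. The sketch below reproduces that aggregation in plain Python. `aggregate_percent` is a hypothetical helper, and rounding to two decimals is an assumption based on the printed values.

```python
def aggregate_percent(scores: list[float]) -> float:
    """Average per-example metric scores and express the result as a percentage."""
    if not scores:
        return 0.0
    return round(100 * sum(scores) / len(scores), 2)

# Score of the first validation example, as shown in the progress line above
print(aggregate_percent([0.2111111111111111]))  # 21.11
```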


Without optimization, the summarizer scored about 28.3 out of 100. That was a decent start, but it left plenty of room for improvement.
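
Part of the low baseline may come from formatting rather than a wrong answer. In the table above, the model predicted `Mixed` while the label is `mixed`. Because `evaluate_summary` compares sentiment with `==`, that example earns 0 for sentiment. One possible tweak is to normalize labels before comparing. This is an assumption on my part, not what was used for the scores in this post. `normalize_label`, `sentiment_match` and `weighted_score` are hypothetical helpers; the 0.2/0.4/0.4 weights come from `evaluate_summary`.

```python
def normalize_label(label) -> str:
    # Treat None as empty and ignore surrounding whitespace and letter case
    return str(label or "").strip().lower()

def sentiment_match(predicted, expected) -> float:
    return 1.0 if normalize_label(predicted) == normalize_label(expected) else 0.0

def weighted_score(sentiment: float, topics_rouge: float, actions_rouge: float) -> float:
    # Same weighting as evaluate_summary: 0.2 sentiment, 0.4 topics, 0.4 action items
    return 0.2 * sentiment + 0.4 * topics_rouge + 0.4 * actions_rouge

print(sentiment_match("Mixed", "mixed"))  # 1.0
print("Mixed" == "mixed")                 # False
```

The first validation example scored 0.2111 with a sentiment mismatch, so its topic and action-item terms contribute the whole 0.2111. A case-insensitive match would add the 0.2 sentiment weight and lift that example to about 0.4111. The baseline and optimized scores would both need to be re-run under the new metric.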

### 6. Optimizing the Model
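Optimizers are like personal trainers for your modules, helping them improve at their tasks by learning from examples. I decided to use the BootstrapFewShot optimizer to improve the model's performance. It runs the program on the training examples, keeps the traces that pass our metric as few-shot demonstrations (up to `max_bootstrapped_demos`), and tops them up with labeled examples (up to `max_labeled_demos`), as shown below: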

optimizer = BootstrapFewShot(metric=evaluate_summary, max_bootstrapped_demos=5, max_labeled_demos=5)
compiled_summarizer = optimizer.compile(SummarizeFeedback(), trainset=get_examples())
optimized_performance = evaluate_model(compiled_summarizer, get_validation_examples())
print(f"Optimized performance: {optimized_performance:.2f}")

  0%|          | 0/10 [00:00<?, ?it/s]

prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The installation process was straightforward, but the app is slow.
Sentiment: mixed
Topics: installation performance
Action Items: improve performance

---

Feedback Text: The app's search functionality is excellent, but it needs more filters.
Sentiment: positive
Topics: search functionality filters
Action Items: add more filters

---

Feedback Text: Customer support was very helpful, but the response time was slow.
Sentiment: mixed
Topics: customer support response time
Action Items: reduce response time

---

Feedback Text: The performance has improved significantly, but the UI could be more intuitive.
Sentiment: mixed
Topics: performanc

 10%|█         | 1/10 [00:01<00:10,  1.15s/it]

prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The installation process was straightforward, but the app is slow.
Sentiment: mixed
Topics: installation performance
Action Items: improve performance

---

Feedback Text: The app's search functionality is excellent, but it needs more filters.
Sentiment: positive
Topics: search functionality filters
Action Items: add more filters

---

Feedback Text: The new feature is great, but it needs more documentation.
Sentiment: positive
Topics: new feature documentation
Action Items: improve documentation

---

Feedback Text: Customer support was very helpful, but the response time was slow.
Sentiment: mixed
Topics: customer support response time
A

 20%|██        | 2/10 [00:03<00:12,  1.58s/it]

prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The installation process was straightforward, but the app is slow.
Sentiment: mixed
Topics: installation performance
Action Items: improve performance

---

Feedback Text: The app's search functionality is excellent, but it needs more filters.
Sentiment: positive
Topics: search functionality filters
Action Items: add more filters

---

Feedback Text: The new feature is great, but it needs more documentation.
Sentiment: positive
Topics: new feature documentation
Action Items: improve documentation

---

Feedback Text: The performance has improved significantly, but the UI could be more intuitive.
Sentiment: mixed
Topics: performance UI
Acti

 30%|███       | 3/10 [00:03<00:08,  1.28s/it]

prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The installation process was straightforward, but the app is slow.
Sentiment: mixed
Topics: installation performance
Action Items: improve performance

---

Feedback Text: The app's search functionality is excellent, but it needs more filters.
Sentiment: positive
Topics: search functionality filters
Action Items: add more filters

---

Feedback Text: The new feature is great, but it needs more documentation.
Sentiment: positive
Topics: new feature documentation
Action Items: improve documentation

---

Feedback Text: Customer support was very helpful, but the response time was slow.
Sentiment: mixed
Topics: customer support response time
A

 40%|████      | 4/10 [00:05<00:07,  1.23s/it]

prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The installation process was straightforward, but the app is slow.
Sentiment: mixed
Topics: installation performance
Action Items: improve performance

---

Feedback Text: The app's search functionality is excellent, but it needs more filters.
Sentiment: positive
Topics: search functionality filters
Action Items: add more filters

---

Feedback Text: The new feature is great, but it needs more documentation.
Sentiment: positive
Topics: new feature documentation
Action Items: improve documentation

---

Feedback Text: Customer support was very helpful, but the response time was slow.
Sentiment: mixed
Topics: customer support response time
A

 50%|█████     | 5/10 [00:06<00:06,  1.26s/it]


  0%|          | 0/10 [00:00<?, ?it/s]prompt: /n  Given the fields `feedback_text`, produce the fields `sentiment`, `topics`, `action_items`.

---

Follow the following format.

Feedback Text: ${feedback_text}

Reasoning: Let's think step by step in order to ${produce the action_items}. We ...

Sentiment: ${sentiment}

Topics: ${topics}

Action Items: ${action_items}

---

Feedback Text: The new feature is great, but it needs more documentation.

Reasoning: Let's think step by step in order to produce the action_items. We need to provide more documentation for the new feature.

Sentiment: positive

Topics: new feature documentation

Action Items: provide more documentation

---

Feedback Text: The app is user-friendly, but it crashes frequently.

Reasoning: Let's think step by step in order to produce the action_items. We need to identify the issues and their causes. The feedback mentions that the app is user-friendly, which is a positive aspect. However, it also mentions that the app 

|  | feedback_text | example_sentiment | example_topics | example_action_items | rationale | pred_sentiment | pred_topics | pred_action_items | evaluate_summary |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The software is fast, but the interface is not intuitive. | mixed | performance interface | improve interface | produce the action_items}. We need to improve the intuitiveness of the interface. | positive | speed interface | improve interface intuitiveness | ✔️ [0.4355555555555556] |


Optimized performance: 72.51


After compiling with the BootstrapFewShot optimizer, the summarizer scored 72.51 on the same ten validation examples. That's a huge jump, and it suggests the added demonstrations helped the model pick out the right sentiment, key topics, and action items from the feedback.
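
Conceptually, BootstrapFewShot runs the program on training examples, keeps the runs the metric accepts as demonstrations, and tops up with plain labeled examples until `max_labeled_demos` is reached. The sketch below is a simplified, DSPy-free illustration of that idea under my own assumptions: plain dictionaries instead of `dspy.Example`, a Python function as the "program", no shuffling, and a single round. It is not DSPy's implementation, and `bootstrap_demos`, `toy_program` and `toy_metric` are hypothetical names. As far as I know, DSPy also accepts a `metric_threshold`. Without one, any truthy metric value counts as a pass, so a float metric like `evaluate_summary` accepts almost any trace that scores above zero.

```python
from typing import Callable, Optional

Example = dict[str, str]

def bootstrap_demos(
    trainset: list[Example],
    program: Callable[[str], dict[str, str]],
    metric: Callable[[Example, dict[str, str]], float],
    max_bootstrapped_demos: int = 5,
    max_labeled_demos: int = 5,
    metric_threshold: Optional[float] = None,
) -> list[Example]:
    bootstrapped: list[Example] = []
    unused: list[Example] = []
    for example in trainset:
        if len(bootstrapped) >= max_bootstrapped_demos:
            unused.append(example)
            continue
        prediction = program(example["feedback_text"])
        score = metric(example, prediction)
        # Without a threshold, any truthy score counts as a successful trace
        passed = score >= metric_threshold if metric_threshold is not None else bool(score)
        if passed:
            # The demo keeps the inputs plus the program's own outputs (e.g. its reasoning)
            bootstrapped.append({**example, **prediction})
        else:
            unused.append(example)
    # Fill any remaining slots with raw labeled examples
    n_labeled = max(0, min(max_labeled_demos - len(bootstrapped), len(unused)))
    return bootstrapped + unused[:n_labeled]


def toy_program(text: str) -> dict[str, str]:
    # Stand-in for the LLM: always answers "mixed"
    return {"sentiment": "mixed", "rationale": "The feedback contains praise and a complaint."}

def toy_metric(example: Example, prediction: dict[str, str]) -> float:
    return 1.0 if prediction["sentiment"] == example["sentiment"] else 0.0

train = [
    {"feedback_text": "The installation process was straightforward, but the app is slow.", "sentiment": "mixed"},
    {"feedback_text": "The app's search functionality is excellent, but it needs more filters.", "sentiment": "positive"},
    {"feedback_text": "Customer support was very helpful, but the response time was slow.", "sentiment": "mixed"},
]
demos = bootstrap_demos(train, toy_program, toy_metric)
for demo in demos:
    print(demo["sentiment"], "rationale" in demo)
```

Here the two `mixed` examples become bootstrapped demos that carry the toy program's rationale. The `positive` one, which the toy program got wrong, is only added back as a plain labeled example.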

### Key Takeaways
- Big Improvement: Going from a score of about 28.3 to 72.51 is a significant boost. On this validation set, the model's summaries now match the expected outputs much more closely.
- Effective Optimization: The optimizer played a crucial role here. By bootstrapping high-quality demonstrations and including them in the prompt, it substantially improved the model's outputs without changing its weights.
- Better Summarization: Now, our model can provide more accurate and detailed summaries of customer feedback, which can be extremely useful for making informed decisions.
- Cost-Effective Enhancement: This improvement was achieved without fine-tuning the model, only by changing the prompt (adding automatically selected few-shot demonstrations). This is a big win in terms of cost efficiency.
- Modularization with DSPy: DSPy helped to modularize the problem, making it easy to optimize and manage different components of the task.
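
Because the gain came from the prompt rather than the weights, it helps to see concretely what "changing the prompt" means here. The printed traces show each call as an instruction line, a format block, and then demonstrations separated by `---`. Each call ends with the new input and an open `Reasoning:` line. The sketch below rebuilds a prompt in that shape from plain dictionaries. It only imitates the layout visible in the traces and is not DSPy's own formatting code; `FIELDS`, `format_block`, `format_demo` and `build_prompt` are hypothetical names.

```python
# Field order and labels as they appear in the printed traces
FIELDS = [
    ("feedback_text", "Feedback Text"),
    ("sentiment", "Sentiment"),
    ("topics", "Topics"),
    ("action_items", "Action Items"),
]

INSTRUCTIONS = (
    "Given the fields `feedback_text`, produce the fields "
    "`sentiment`, `topics`, `action_items`."
)

REASONING_TEMPLATE = (
    "Reasoning: Let's think step by step in order to ${produce the action_items}. We ..."
)


def format_block() -> str:
    """The 'Follow the following format.' section, one placeholder per field."""
    lines = ["Follow the following format."]
    for key, label in FIELDS:
        lines.append(f"{label}: ${{{key}}}")
        if key == "feedback_text":
            # The chain-of-thought step sits between the input and the outputs
            lines.append(REASONING_TEMPLATE)
    return "\n\n".join(lines)


def format_demo(demo: dict[str, str]) -> str:
    """One demonstration: each field present in the demo on its own line."""
    return "\n".join(f"{label}: {demo[key]}" for key, label in FIELDS if key in demo)


def build_prompt(demos: list[dict[str, str]], feedback_text: str) -> str:
    """Instructions, format block, demos and the new input, separated by '---'."""
    parts = [INSTRUCTIONS, format_block()]
    parts += [format_demo(demo) for demo in demos]
    parts.append(
        f"Feedback Text: {feedback_text}\n\n"
        "Reasoning: Let's think step by step in order to"
    )
    return "\n\n---\n\n".join(parts)


demos = [{
    "feedback_text": "The installation process was straightforward, but the app is slow.",
    "sentiment": "mixed",
    "topics": "installation performance",
    "action_items": "improve performance",
}]
prompt = build_prompt(demos, "The software is fast, but the interface is not intuitive.")
print(prompt)
```

The optimizer's job, in this view, is to choose what goes into `demos`. The model itself and the instruction text stay the same.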

In simpler terms, we started with a model that was just okay, and after some smart optimization it became much better at its job. This kind of improvement is exactly what we want when dealing with complex tasks like feedback summarization.

Stay tuned for more blog posts where we'll explore DSPy in greater depth and uncover its features. For more details on DSPy, visit their official documentation: https://dspy-docs.vercel.app/
You can access the full source code on my GitHub here: https://github.com/rawanalkurd/Generative-AI-DSPy