# Story Generator - Winnie the Pooh 

## Part 2: Simple DSPy Retriever and Module

[1. Imports and environment](#1-imports-and-environment)

[2. DSPy Set up](#2-dspy-set-up)

[3. Evaluation metrics](#3-evaluation-metrics)

### 1. Imports and environment

In [1]:
#pip install dspy-ai openai chromadb sentence_transformers spacy textstat deepeval python-dotenv

In [None]:
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM
import chromadb
from chromadb.utils import embedding_functions
import dotenv

from deepeval.metrics import AnswerRelevancyMetric, BaseMetric 
from deepeval.test_case import LLMTestCase

# Helpers defined outside this notebook
from evaluation_metrics import ReadabilityMetric, calculate_readability_scores


# Establish paths
CHROMA_PATH = '../data/chroma_db'
DB_COLLECTION = "winnie_the_pooh"
default_ef = embedding_functions.DefaultEmbeddingFunction()

# Load environment variables (including OPENAI_API_KEY) from a .env file
dotenv.load_dotenv()

True

### 2. DSPy Set up

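The language model and retriever configuration is taken from the previous notebook. The signature and module that generate the story are defined after it.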

In [3]:
# Configure OpenAI as the language model
llm = dspy.OpenAI(model="gpt-4o-mini", max_tokens=1000, temperature=1.0)

# Set up Chroma client and retriever
chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)
collection = chroma_client.get_collection(DB_COLLECTION)

# Set up ChromadbRM as the retriever model
chroma_retriever = ChromadbRM(
    collection_name=DB_COLLECTION, 
    persist_directory=CHROMA_PATH, 
    embedding_function=default_ef,
    )

# Configure DSPy settings
dspy.settings.configure(lm=llm, rm=chroma_retriever)

In [7]:

class GenerateStory(dspy.Signature):
    """Generate a Winnie the Pooh style story."""
    name = dspy.InputField()
    prompt = dspy.InputField(desc="details to include in the story.")
    context = dspy.InputField(desc="relevant passages from Winnie the Pooh stories and story structure.")
    story = dspy.OutputField(desc="generate a one-minute story for a child. Name is the main character who is friends with Pooh, and finish the story with 'The End.'")


class StoryGenerator(dspy.Module):
    def __init__(self, chroma_retriever):
        super().__init__()
        self.retriever = chroma_retriever
        self.generate = dspy.ChainOfThought(GenerateStory)

    def forward(self, name, prompt):
        retrieved = self.retriever(prompt, k=8)
        retrieved_context = [doc.long_text for doc in retrieved]
        context = "\n".join(retrieved_context)
        
        result = self.generate(context=context, prompt=prompt, name=name)
        return dspy.Prediction(story=result.story)

story_gen = StoryGenerator(chroma_retriever)

In [24]:
name = 'Hannah'
prompt = "They go on an adventure and climb a tree."

new_story = story_gen(name, prompt)
print(new_story.story)

Once upon a time in the Hundred Acre Wood, there lived a cheerful little girl named Hannah. She was the best of friends with Pooh, who was always in search of honey. One sunny morning, Hannah said, “Pooh, let’s climb that big tree over there! I want to see the world from up high!” 

“Climb a tree? Oh, that sounds like a splendid idea!” Pooh replied, his eyes shining with excitement. So off they went, arms and legs moving cheerfully towards the tall tree that tickled the clouds.

As they approached, Pooh looked up and sang a little song to himself: 
“Isn't it funny, 
How a bear likes honey?
Buzz! Buzz! Buzz! 
I hope there’s some that I can see!”

Hannah giggled and began to climb, her little feet finding the branches easily as she went higher and higher. “Look, Pooh! I can see the pond and Rabbit’s house from up here!” she called out.

Pooh climbed slowly, careful not to lose his balance. “Oh dear, I love how the sun shines on the leaves like honey on toast!” he exclaimed, smiling at th

### 3. Evaluation metrics

#### DeepEval - Answer Relevancy

In [25]:

actual_output = new_story.story


# Initialize the AnswerRelevancyMetric
metric = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

test_case = LLMTestCase(
    input=prompt,
    actual_output=actual_output
)

# Calculate the relevancy score
metric.measure(test_case)
print(metric.score)
print(metric.reason)

0.6875
The score is 0.69 because while the output contains relevant elements of an adventure, it includes several irrelevant statements that distract from the primary focus of climbing the tree. These extraneous details about characters and their feelings dilute the central narrative, preventing the score from being higher.


DeepEval's built-in AnswerRelevancyMetric does not seem to be an appropriate metric in this case. A fictional story will inevitably include text that the metric judges "irrelevant" to the prompt, such as details about the characters and their feelings. Instead, I will define a metric that better assesses the appropriateness of the output by measuring the readability of the generated story.

#### Readability Score (Flesch-Kincaid Grade)

In [26]:

test_case = LLMTestCase(input=prompt, actual_output=new_story.story)
readability_metric = ReadabilityMetric(threshold_high=3.0, threshold_low=2.0)
result = readability_metric.measure(test_case)
print("Readability acceptable:", result)
print(f"Readability score: {calculate_readability_scores(new_story.story)['Flesch-Kincaid Grade']}")

Readability acceptable: False
Readability score: 4.7
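
`ReadabilityMetric` and `calculate_readability_scores` come from the `evaluation_metrics` module, whose source is not part of this notebook. For orientation only, here is a minimal pure-Python sketch of the Flesch-Kincaid grade formula. The function names and the vowel-group syllable heuristic are assumptions made for illustration; a dedicated library such as `textstat` (listed in the install line above) counts syllables more carefully, so the numbers will not match exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    # Split on sentence-ending punctuation and drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Short sentences with short words should score lower than one long, wordy sentence
simple = "Pooh sat down. He ate some honey."
harder = "Pooh contemplated the extraordinary availability of honey throughout the neighbourhood."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(harder))
```

Shorter sentences and words with fewer syllables both lower the grade, which is why asking the model for simpler words and sentences is a plausible way to bring a story into the target range.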


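The original Winnie the Pooh stories have a mean Flesch-Kincaid grade of 3.8 with a standard deviation of 0.8. I would like the generated stories to fall within one standard deviation of the mean, that is, within the range 3.0-4.6. The score of 4.7 above falls just outside that range. The cells below therefore use 3.0 and 4.6 as the lower and upper thresholds, replacing the 2.0-3.0 band used above.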
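
The range follows directly from the mean and standard deviation quoted above. The helper names below are hypothetical, and treating the bounds as inclusive is an assumption, since the notebook does not show how `ReadabilityMetric` compares scores against its thresholds.

```python
def readability_band(mean: float, std: float, k: float = 1.0) -> tuple[float, float]:
    """Return (low, high) bounds k standard deviations around the mean, rounded to 1 decimal."""
    return round(mean - k * std, 1), round(mean + k * std, 1)


def in_band(score: float, low: float, high: float) -> bool:
    """Check whether a score lies inside the band (inclusive bounds are an assumption)."""
    return low <= score <= high


low, high = readability_band(mean=3.8, std=0.8)
print(low, high)                # 3.0 4.6
print(in_band(4.7, low, high))  # False
```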

In [27]:
def evaluate_readability(name, prompt):
    """
    Generate a story and evaluate its readability.

    input: name, prompt
    output: readability pass (bool), generated story
    """
    # Generate a story with the module defined above
    new_story = story_gen(name, prompt)
    actual_output = new_story.story

    # Accept Flesch-Kincaid grades within one standard deviation of the mean (3.0-4.6)
    metric = ReadabilityMetric(
        threshold_high=4.6,
        threshold_low=3.0
    )

    test_case = LLMTestCase(
        input=prompt,
        actual_output=actual_output
    )

    return metric.measure(test_case), actual_output


def print_story(name, prompt):
    """
    Print a story that passes the readability metric.

    If the first story fails, retry once with a prompt asking for simpler
    words and sentences. If the retry also fails, print an apology that
    suggests trying a different setting.

    input: name, prompt
    output: None (the story or the apology is printed)
    """
    passed, story = evaluate_readability(name, prompt)

    if passed:
        print(story)
        return

    # First attempt failed: retry once, asking for simpler language
    new_prompt = prompt + " Write the story using simplistic words and sentences."
    passed, story = evaluate_readability(name, new_prompt)

    if passed:
        print("Second Try: \n", story)
    else:
        print("I'm sorry, I was not able to write you a story. Try a different setting.")
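
The retry logic in `print_story` can be generalized to any number of attempts. The sketch below is one possible refactoring, not part of the original notebook: `generate_with_retries`, its `evaluate` argument, and the stub evaluator are hypothetical names introduced for illustration. The evaluator is injected so the loop can be exercised without calling the language model.

```python
from typing import Callable, Optional, Tuple

SIMPLIFY_HINT = " Write the story using simplistic words and sentences."


def generate_with_retries(
    name: str,
    prompt: str,
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 2,
) -> Optional[str]:
    """Return the first story that passes the check, or None if every attempt fails."""
    current_prompt = prompt
    for attempt in range(max_attempts):
        passed, story = evaluate(name, current_prompt)
        if passed:
            return story
        # After the first failure, ask for simpler language (the hint is added once)
        if attempt == 0:
            current_prompt = prompt + SIMPLIFY_HINT
    return None


# Stub evaluator standing in for evaluate_readability: passes only when the hint is present
def fake_evaluate(name: str, prompt: str) -> Tuple[bool, str]:
    return SIMPLIFY_HINT in prompt, f"A story about {name}."


print(generate_with_retries("Hannah", "They climb a tree.", fake_evaluate))
```

Assuming `evaluate_readability` keeps its `(passed, story)` return shape, it could be passed in directly as `generate_with_retries(name, prompt, evaluate_readability)`. Returning `None` instead of printing leaves it to the caller to decide how to report a failure.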


In [28]:
new_name = 'Hannah'
new_prompt = "They go on an adventure and climb a tree."

print_story(new_name, new_prompt)

Second Try: 
 Once upon a time, in the lovely Hundred Acre Wood, there lived a little girl named Hannah. She was a brave friend of Winnie-the-Pooh. One sunny day, Hannah said, "Let’s go on an adventure, Pooh!" 

Pooh smiled and nodded. "Oh, yes! Adventures are very nice!" So, off they went, laughing and singing as they wandered through the tall trees and bright flowers. 

After a while, they came to a big tree. It was the biggest tree they had ever seen! "Look, Pooh! Let’s climb it!" Hannah said excitedly. Pooh loved this idea. “Oh, I do love climbing—sometimes I see honey from up high!” he replied.

Hannah took a deep breath and began to climb. “One step, two steps, three steps!” she counted. Pooh followed along, a little slower because he was thinking of honey. “I think I smell honey!” he exclaimed as they climbed higher.

When they reached the first branch, they stopped to look around. “Oh, what a wonderful view!” said Hannah. The flowers looked like tiny dots below them. Pooh said,