# Story Generator - Winnie the Pooh 

## Part 3: Integrating Readability Metrics into DSPy Module

[1. Imports and environment](#1-imports-and-environment)

[2. DSPy set up](#2-dspy-set-up)

[3. Evaluation metrics](#3-evaluation-metrics)

[4. Gradio UI](#4-gradio-ui)

### 1. Imports and environment

In [1]:
#pip install dspy-ai openai chromadb sentence_transformers spacy textstat deepeval gradio python-dotenv

In [2]:
import os
os.environ['DEEPEVAL_TELEMETRY_OPT_OUT'] = "YES"


import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM
import chromadb
from chromadb.utils import embedding_functions
import dotenv

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

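# Local module that defines the custom readability metric and helper
from evaluation_metrics import ReadabilityMetric, calculate_readability_scores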


# Establish paths
CHROMA_PATH = '../data/chroma_db'
DB_COLLECTION = "winnie_the_pooh"
default_ef = embedding_functions.DefaultEmbeddingFunction()

# Set up OpenAI API key
dotenv.load_dotenv()
#openai_key = os.getenv('OPENAI_API_KEY')

True

### 2. DSPy set up

The DSPy configuration, signature, and module below are carried over from the previous notebook.

In [3]:
# Configure OpenAI as the language model
llm = dspy.OpenAI(model="gpt-4o-mini", max_tokens=1000, temperature=1.0)

# Set up Chroma client and retriever
chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)
collection = chroma_client.get_collection(DB_COLLECTION)

# Set up ChromadbRM as the retriever model
chroma_retriever = ChromadbRM(
    collection_name=DB_COLLECTION, 
    persist_directory=CHROMA_PATH, 
    embedding_function=default_ef,
    )

# Configure DSPy settings
dspy.settings.configure(lm=llm, rm=chroma_retriever)

In [4]:

class GenerateStory(dspy.Signature):
    """Generate a Winnie the Pooh style story."""
    name = dspy.InputField()
    prompt = dspy.InputField(desc="details to include in the story.")
    context = dspy.InputField(desc="relevant passages from Winnie the Pooh stories and story structure.")
    story = dspy.OutputField(desc="generate a one-minute story for a child. Name is the main character who is friends with Pooh, and finish the story with 'The End.'")


class StoryGenerator(dspy.Module):
    def __init__(self, chroma_retriever):
        super().__init__()
        self.retriever = chroma_retriever
        self.generate = dspy.ChainOfThought(GenerateStory)

    def forward(self, name, prompt):
        retrieved = self.retriever(prompt, k=8)
        retrieved_context = [doc.long_text for doc in retrieved]
        context = "\n".join(retrieved_context)
        
        result = self.generate(context=context, prompt=prompt, name=name)
        return dspy.Prediction(story=result.story)

story_gen = StoryGenerator(chroma_retriever)

In [5]:
name = 'Hannah'
prompt = "They go on an adventure and climb a tree."


new_story = story_gen(name, prompt)
print(new_story.story)

 		You are using the client GPT3, which will be removed in DSPy 2.6.
 		Changing the client is straightforward and will let you use new features (Adapters) that improve the consistency of LM outputs, especially when using chat LMs. 

 		Learn more about the changes and how to migrate at
 		https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb


Once upon a time in the wonderful Hundred Acre Wood, there lived a cheerful little girl named Hannah. One sunny morning, while wandering through the forest, she heard a familiar hum. It was her dear friend, Winnie the Pooh! 

“Hannah!” called Pooh, bouncing up and down with excitement. “I just discovered the most delicious honey up in that tall tree!” He pointed to a grand oak that seemed to touch the sky. 

“Let’s climb it together, Pooh!” Hannah exclaimed. So, hand in paw, they began their adventure. As they climbed higher and higher, Hannah sang a little song:

“Isn’t it funny,
How we climb this tree,
To find some honey,
Just Pooh and me?”

Pooh chuckled as they climbed, his little belly bouncing along. “Oh, how I love honey!” he said. 

After a few moments, they reached a sturdy branch with a view of the whole forest. “Look, Hannah! There’s the river and all our friends down below!” Pooh exclaimed.

“Wow, Pooh! This is wonderful!” Hannah replied with wide eyes. 

But just then, a g

### 3. Evaluation metrics

#### DeepEval - Answer Relevancy

In [6]:

actual_output = new_story.story


# Initialize the AnswerRelevancyMetric
metric_relevancy = AnswerRelevancyMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

test_case = LLMTestCase(
    input=prompt,
    actual_output=actual_output
)

# Calculate the relevancy score
metric_relevancy.measure(test_case)
print(metric_relevancy.score)
print(metric_relevancy.reason)

Output()

0.4090909090909091
The score is 0.41 because while some elements of the output touch on adventure, many statements are irrelevant to the core theme of tree climbing and do not provide actionable or relevant details about the adventure.


DeepEval's built-in `AnswerRelevancyMetric` does not seem to be an appropriate metric in this case. It penalizes statements that do not directly address the input, and a fictional story will inevitably include narrative detail drawn from the retrieved context that the prompt never mentions, so much of the story is judged "irrelevant". I will instead define a metric that better assesses the appropriateness of the output by measuring the readability of the generated story.

#### Readability Score (Flesch-Kincaid Grade)

In [7]:

test_case = LLMTestCase(input=prompt, actual_output=new_story.story)
metric = ReadabilityMetric(threshold_high=3.0, threshold_low=2.0)

result = metric.measure(test_case)

print("Readability acceptable:", result)
print(f"Readability score: {calculate_readability_scores(new_story.story)['Flesch-Kincaid Grade']}")

Readability acceptable: False
Readability score: 4.3
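
For reference, the Flesch-Kincaid Grade combines average sentence length and average syllables per word: `0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59`. The sketch below is a hand-rolled approximation of that formula, not the code in `evaluation_metrics` (which is not shown here). The function names and the vowel-group syllable counter are assumptions made for illustration, so its scores will differ somewhat from a library such as `textstat`.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid Grade for a piece of English text."""
    # Split on sentence-ending punctuation and drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


simple_text = "Pooh sat on a log. He ate his honey."
complex_text = (
    "The extraordinarily inquisitive bear investigated the mysterious "
    "honeycomb with considerable determination."
)
print(flesch_kincaid_grade(simple_text))
print(flesch_kincaid_grade(complex_text))
```

Short sentences made of short words produce a low grade, which is why the grade is a reasonable proxy for whether a story suits a young child.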


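The original Winnie the Pooh stories have an average readability score (Flesch-Kincaid Grade) of 3.8, with a standard deviation of 0.8. I would like the generated stories to fall within one standard deviation of the mean, that is, within the range 3.0-4.6. Note that the cell below sets only the upper bound (`threshold_high=4.6`); the lower bound is left commented out.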
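
The target range is simply the mean plus or minus one standard deviation. A minimal sketch of that arithmetic follows; `readability_band` and `in_band` are hypothetical helper names used only for illustration and are not part of `evaluation_metrics`.

```python
from typing import Tuple


def readability_band(mean: float, std: float, n_std: float = 1.0) -> Tuple[float, float]:
    """Return (low, high) bounds n_std standard deviations around the mean."""
    # Round to two decimals to avoid floating-point noise at the bounds
    return round(mean - n_std * std, 2), round(mean + n_std * std, 2)


def in_band(score: float, low: float, high: float) -> bool:
    """True when the score lies inside the inclusive [low, high] range."""
    return low <= score <= high


# Mean and standard deviation of the original stories, as stated above
low, high = readability_band(3.8, 0.8)
print(low, high)
print(in_band(4.3, low, high))
```

With these bounds, the score of 4.3 from the previous cell falls inside the band, even though it failed the stricter 2.0-3.0 thresholds used in that cell.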

In [8]:
def evaluate_readability(name, prompt):
    """
    Generate a story and evaluate its readability.

    input: name, prompt
    output: (readability pass (bool), generated story)
    """

    new_story = story_gen(name, prompt)
    actual_output = new_story.story

    # Initialize the ReadabilityMetric with only the upper bound set
    # (one standard deviation above the mean of the original stories)
    metric = ReadabilityMetric(
        threshold_high=4.6,
        #threshold_low=0.0
    )

    test_case = LLMTestCase(
        input=prompt,
        actual_output=actual_output
    )

    return metric.measure(test_case), actual_output


def print_story(name, prompt):
    """
    Return a story that passes the readability metric.

    If the first story fails, retry once with an added instruction to use
    simpler words and sentences. If the retry also fails, return an apology
    message instead of a story.

    input: name, prompt
    output: story, or a message asking for a different prompt
    """
    passed, story = evaluate_readability(name, prompt)

    if passed:
        return story

    # Retry once, asking the model for simpler language
    new_prompt = prompt + " Write the story using simplistic words and sentences."
    passed, story = evaluate_readability(name, new_prompt)

    if passed:
        return story

    return "I'm sorry, I was not able to write you a story. Try a different prompt."
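
The retry pattern above generalizes to any number of fallback instructions. The sketch below shows one possible shape, assuming an `evaluate` callable that takes a prompt and returns `(passed, story)`. The names `first_passing_story` and `fake_evaluate` are hypothetical, and `fake_evaluate` is a stand-in so the example runs without calling the language model.

```python
from typing import Callable, List, Optional, Tuple


def first_passing_story(
    evaluate: Callable[[str], Tuple[bool, str]],
    prompt: str,
    hints: List[str],
) -> Optional[str]:
    """Try the original prompt, then the prompt plus each hint, in order."""
    candidates = [prompt] + [f"{prompt} {hint}" for hint in hints]
    for candidate in candidates:
        passed, story = evaluate(candidate)
        if passed:
            return story
    # No candidate passed the check
    return None


def fake_evaluate(prompt: str) -> Tuple[bool, str]:
    """Stand-in evaluator: passes only when the prompt asks for short sentences."""
    return ("short sentences" in prompt, f"story for: {prompt}")


hints = ["Use simple words.", "Use short sentences."]
story = first_passing_story(fake_evaluate, "They climb a tree.", hints)
print(story)
```

In this notebook, `evaluate` could be something like `lambda p: evaluate_readability(test_name, p)`, with the caller turning a `None` result into the apology message.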


In [9]:
test_name = 'Hannah'
test_prompt = "They go on an adventure and climb a tree."

print_story(test_name, test_prompt)

'Once upon a time in the wonderful Hundred Acre Wood, there lived a cheerful little girl named Hannah. One sunny morning, while wandering through the forest, she heard a familiar hum. It was her dear friend, Winnie the Pooh! \n\n“Hannah!” called Pooh, bouncing up and down with excitement. “I just discovered the most delicious honey up in that tall tree!” He pointed to a grand oak that seemed to touch the sky. \n\n“Let’s climb it together, Pooh!” Hannah exclaimed. So, hand in paw, they began their adventure. As they climbed higher and higher, Hannah sang a little song:\n\n“Isn’t it funny,\nHow we climb this tree,\nTo find some honey,\nJust Pooh and me?”\n\nPooh chuckled as they climbed, his little belly bouncing along. “Oh, how I love honey!” he said. \n\nAfter a few moments, they reached a sturdy branch with a view of the whole forest. “Look, Hannah! There’s the river and all our friends down below!” Pooh exclaimed.\n\n“Wow, Pooh! This is wonderful!” Hannah replied with wide eyes. \n\n

### 4. Gradio UI

In [10]:
import gradio as gr
from theme_violet_amber import theme as violet_amber


# Gradio UI
with gr.Blocks(theme=violet_amber) as demo:
    gr.Markdown(
    """
    # Winnie the Pooh Story Generator
    *Simply enter a character name and setting, then I will write a story from the Hundred Acre Wood for you!*
    """)
    textbox = gr.Textbox(label="Character Name")
    textbox2 = gr.Textbox(label="Story Setting")
    
    with gr.Row():
        button = gr.Button("Submit", variant="primary")
        clear = gr.Button('Clear')
    
    output = gr.Textbox(label="A story for you... ")
    
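    button.click(print_story, [textbox, textbox2], output)
    # Reset both inputs and the output when "Clear" is pressed
    clear.click(lambda: ("", "", ""), None, [textbox, textbox2, output])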

demo.launch()


* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




In [11]:
demo.close()

Closing server running on port: 7860
