# Story Generator - Winnie the Pooh 

## Part 2: Simple DSPy Retriever and Module

[1. Imports and environment](#1-imports-and-environment)

[2. Chroma retriever](#2-chroma-retriever)

[3. DSPy module](#3-dspy-module)

[4. Testing StoryGenerator](#4-testing-storygenerator)

[5. Evaluation metrics](#5-evaluation-metrics)
- [5.1. Readability scores](#51-readability-scores)
- [5.2. Sentiment analysis](#52-sentiment-analysis)

### 1. Imports and environment

In [1]:
#pip install dspy-ai openai chromadb sentence_transformers spacy textstat

In [19]:
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM
import chromadb
from chromadb.utils import embedding_functions
import dotenv
import os
import spacy
from textstat import flesch_reading_ease, flesch_kincaid_grade
from textstat.textstat import textstatistics

# Establish paths
CHROMA_PATH = '../data/chroma_db'
DB_COLLECTION = "winnie_the_pooh"
default_ef = embedding_functions.DefaultEmbeddingFunction()

# Set up OpenAI API key
dotenv.load_dotenv()
#openai_key = os.getenv('OPENAI_API_KEY')

True

In [2]:
# List all collections in the Chroma database
chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)
collections = chroma_client.list_collections()
print(collections)

[Collection(id=5b97b6bf-4d1a-4d4c-977e-ec9c78025777, name=winnie_the_pooh)]


### 2. Chroma retriever

In [3]:
# Configure OpenAI as the language model
llm = dspy.OpenAI(model="gpt-4o-mini", max_tokens=1000, temperature=1.3)

# Set up Chroma client and retriever
chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)
collection = chroma_client.get_collection(DB_COLLECTION)

# Set up ChromadbRM as the retriever model
chroma_retriever = ChromadbRM(
    collection_name=DB_COLLECTION, 
    persist_directory=CHROMA_PATH, 
    embedding_function=default_ef,
    )

# Configure DSPy settings
dspy.settings.configure(lm=llm, rm=chroma_retriever)

In [4]:
#example of calling retriever
results = chroma_retriever("honey")
len(results)
results[0]

{'id': '81857d41-f75e-4db3-98ee-9d6bbaf14858',
 'score': 1.08407461643219,
 'long_text': 'then he got up, and said: "And the only reason for making honey is so as I can eat it." So he began to climb the tree. He climbed and he climbed and he climbed, and as he climbed he sang a little',
 'metadatas': {'author': 'A. A. Milne',
  'chapter': 1.0,
  'chunk': 17.0,
  'title': 'Winnie the Pooh'}}

### 3. DSPy module

In [5]:

class GenerateStory(dspy.Signature):
    """Generate a Winnie the Pooh style story."""
    context = dspy.InputField(desc="relevant passages from Winnie the Pooh stories and story structure.")
    prompt = dspy.InputField(desc="details to include in the story.")
    story = dspy.OutputField(desc="generate a one minute story for a child.")


class StoryGenerator(dspy.Module):
    def __init__(self, chroma_retriever):
        super().__init__()
        self.retriever = chroma_retriever
        self.generate = dspy.ChainOfThought(GenerateStory)

    def forward(self, prompt):
        retrieved = self.retriever(prompt, k=8)
        context = "\n".join([doc.long_text for doc in retrieved])

        result = self.generate(context=context, prompt=prompt)
        return dspy.Prediction(story=result.story)


### 4. Testing StoryGenerator 

In [6]:
# Create an instance of the StoryGenerator
story_gen = StoryGenerator(chroma_retriever)

new_story = story_gen("Winnie the Pooh and friends go on a picnic")
print(new_story)


 		You are using the client GPT3, which will be removed in DSPy 2.6.
 		Changing the client is straightforward and will let you use new features (Adapters) that improve the consistency of LM outputs, especially when using chat LMs. 

 		Learn more about the changes and how to migrate at
 		https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb


Prediction(
    story='On a bright and cheerful morning in the Hundred Acre Wood, Winnie-the-Pooh had a splendid idea. "Oh, what a lovely day for a picnic! I\'ll invite all my friends!" He smacked his paws together as a gentle breeze rustled the leaves above him.\n\nFirst, Pooh waddled over to Piglet’s house. "Piglet," he called, "would you like to come on a picnic with me?" \n\n"Oh, how wonderful, Pooh!" squeaked Piglet, washing his hands carefully. "What should I bring?" \n\n"Perhaps you might bring some of your delicious little sandwiches?" Pooh suggested generously. Off went Piglet with excitement tucked under his little hoof.\n\nNext, Pooh bounced over to Rabbit\'s home, finding him busy in the garden. "Rabbit, how about some honey for the picnic?" asked Pooh with delight.\n\n“I can gather the veggies!” said Rabbit, carefully arranging his carrots.  “Don\'t forget to bring napkins, Pooh!”\n\nFeeling pleased with his invites, Pooh replied, "Of course! We’ll have the finest picnic e

In [7]:
new_story2 = story_gen("Rosie, a little girl, and Pooh climb a tree")
print(new_story2)

Prediction(
    story='Once upon a time in the sunny, cheerful Hundred Acre Wood, a little girl named Rosie met her friend, Pooh Bear. Blooms danced in the breeze, and everything smelled pleasantly of honey – or was it just Pooh somewhere? \n\n“Hello, Pooh! Would you like to climb a tree with me?” Rosie squeaked with excitement.\n\n“Oh, I do love climbing trees, especially when there might be a pot of honey at the top,” hummed Pooh, scratching his nose delightfully. So up they went, the little girl filling with curiosity and Pooh, well, wondering about the honey likely waiting just above.\n\nAs they started up the tree, Pooh\'s tummy rumbled. "Rosie," he called, peering way up, "Do you think I’ll find a pot of honey sitting on one of those branches?" \n\nRosie giggled, "Maybe, if we go just a little bit higher!" They climbed past fluttering butterflies and birds singing a cheery tune. \n\n“What if we see the bees?” Pooh pondered aloud, imagining pots overflowing with honey, swirling in

### 5. Evaluation Metrics

#### 5.1. Readability Scores

**Flesch–Kincaid grade level** - core measurements are word and sentence length
- score is grade level, where lowest possible score is -3.40 (ex. Dr. Suess' Green Eggs and Ham has a grade level of -1.3)
- formula:  grade = 0.39 * ( total words / total sentences ) + 11.8 * ( total syllables / total words ) - 15.59


In [20]:

# Splits the text into sentences, using spacy's sentence segmentation
def break_sentences(text):
	nlp = spacy.load('en_core_web_sm')
	doc = nlp(text)
	return list(doc.sents)

# Returns Number of Words in the text
def word_count(text):
	sentences = break_sentences(text)
	words = 0
	for sentence in sentences:
		words += len([token for token in sentence])
	return words

# Returns the number of sentences in the text
def sentence_count(text):
	sentences = break_sentences(text)
	return len(sentences)

# Returns average sentence length
def avg_sentence_length(text):
	words = word_count(text)
	sentences = sentence_count(text)
	average_sentence_length = float(words / sentences)
	return average_sentence_length

# Using textstat library to calculate syllables in a word
def syllables_count(word):
	return textstatistics().syllable_count(word)

# Returns the average number of syllables per word in the text
def avg_syllables_per_word(text):
	syllable = syllables_count(text)
	words = word_count(text)
	ASPW = float(syllable) / float(words)
	return round(ASPW, 1)

# Return total Difficult Words in a text
def difficult_words(text):
	
	nlp = spacy.load('en_core_web_sm')
	doc = nlp(text)
	# Find all words in the text
	words = []
	sentences = break_sentences(text)
	for sentence in sentences:
		words += [str(token) for token in sentence]

	# difficult words are those with syllables >= 2
	# easy_word_set is provide by Textstat as 
	# a list of common words
	diff_words_set = set()
	
	for word in words:
		syllable_count = syllables_count(word)
		if word not in nlp.Defaults.stop_words and syllable_count >= 2:
			diff_words_set.add(word)

	return len(diff_words_set)

# A word is polysyllablic if it has more than 3 syllables
# Counts the number of polysyllabic words in the text
def poly_syllable_count(text):
	count = 0
	words = []
	sentences = break_sentences(text)
	for sentence in sentences:
		words += [token for token in sentence]

	for word in words:
		
		try: 
			syllable_count = syllables_count(word)
			if syllable_count >= 3:
				count += 1
		except:
			pass
	return count


def flesch_kincaid_grade(text):
	"""
		Implements Flesch-Kincaid Grade Formula:
		grade level = 0.39 * (ASL) + 11.8 * (AWL) - 15.59
		Where,
		ASL = average sentence length (number of words divided by number of sentences)
		ASW = average word length in syllables (number of syllables divided by number of words)
	"""
	FRE = 0.39 * float(avg_sentence_length(text)) + 11.8 * float(avg_syllables_per_word(text)) - 15.59
	return round(FRE, 1)


def calculate_readability_scores(text):
    sentences = len(break_sentences(text))
    words = word_count(text)
    syllables = syllables_count(text)
    
    flesch_ease = flesch_reading_ease(text)
    flesch_kincaid = flesch_kincaid_grade(text)
    
    return {
        "Flesch Reading Ease": flesch_ease,
        "Flesch-Kincaid Grade": flesch_kincaid,
        "Sentence Count": sentences,
        "Word Count": words,
        "Syllable Count": syllables,
        "Average Syllables per Word": avg_syllables_per_word(text)
    }


In [22]:
calculate_readability_scores(new_story2.story)

{'Flesch Reading Ease': 75.71,
 'Flesch-Kincaid Grade': 2.6,
 'Sentence Count': 31,
 'Word Count': 505,
 'Syllable Count': 517,
 'Average Syllables per Word': 1.0}

#### 5.2. Sentiment Analysis

