# 2. Get the data 

In [None]:
data = pd.read_csv("full_dataset.csv").drop('Unnamed: 0', axis=1)

In [None]:
data.head()

Unnamed: 0,title,ingredients,directions,link,source,NER
0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. eva...","[""In a heavy 2-quart saucepan, mix brown sugar...",www.cookbooks.com/Recipe-Details.aspx?id=44874,Gathered,"[""brown sugar"", ""milk"", ""vanilla"", ""nuts"", ""bu..."
1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned ...","[""Place chipped beef on bottom of baking dish....",www.cookbooks.com/Recipe-Details.aspx?id=699419,Gathered,"[""beef"", ""chicken breasts"", ""cream of mushroom..."
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg...","[""In a slow cooker, combine all ingredients. C...",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""gar..."
3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans...","[""Boil and debone chicken."", ""Put bite size pi...",www.cookbooks.com/Recipe-Details.aspx?id=897570,Gathered,"[""chicken"", ""chicken gravy"", ""cream of mushroo..."
4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker ...","[""Combine first four ingredients and press in ...",www.cookbooks.com/Recipe-Details.aspx?id=659239,Gathered,"[""peanut butter"", ""graham cracker crumbs"", ""bu..."


In [None]:
data.shape

(2231142, 6)

In [None]:
data.columns

Index(['title', 'ingredients', 'directions', 'link', 'source', 'NER'], dtype='object')

## Features

### title
Title of the recipe

### ingredients
Vector of strings describing the amount of each ingredient required.

### directions
Vector of sentences containing the step-by-step actions necessary to reproduce the recipe.

### link
The URL the recipe was scraped from.

### source
Whether the recipe comes from the `Recipes1M` data set or was scraped (shown as `Gathered`).

### NER
Vector of the food entities in the recipe, found by named entity recognition and stored in string format. These entities can be used as input.

# 3. Explore the data

In [None]:
eda_copy = data.copy()

## Exploring the features in the data set

In [None]:
eda_copy.head()

Unnamed: 0,title,ingredients,directions,link,source,NER
0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. eva...","[""In a heavy 2-quart saucepan, mix brown sugar...",www.cookbooks.com/Recipe-Details.aspx?id=44874,Gathered,"[""brown sugar"", ""milk"", ""vanilla"", ""nuts"", ""bu..."
1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned ...","[""Place chipped beef on bottom of baking dish....",www.cookbooks.com/Recipe-Details.aspx?id=699419,Gathered,"[""beef"", ""chicken breasts"", ""cream of mushroom..."
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg...","[""In a slow cooker, combine all ingredients. C...",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""gar..."
3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans...","[""Boil and debone chicken."", ""Put bite size pi...",www.cookbooks.com/Recipe-Details.aspx?id=897570,Gathered,"[""chicken"", ""chicken gravy"", ""cream of mushroo..."
4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker ...","[""Combine first four ingredients and press in ...",www.cookbooks.com/Recipe-Details.aspx?id=659239,Gathered,"[""peanut butter"", ""graham cracker crumbs"", ""bu..."


In [None]:
eda_copy.dtypes

title          object
ingredients    object
directions     object
link           object
source         object
NER            object
dtype: object

Every feature has dtype `object`, i.e. it is stored as a string. The `ingredients`, `directions` and `NER` columns hold stringified lists and must be parsed into arrays before use.
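A minimal sketch of that parsing step, assuming each cell is a JSON-style list of strings as in the preview above. The helper name `parse_list_column` is hypothetical and not part of this notebook. `json.loads` also decodes escapes such as `\u00b0`, which otherwise stay in the text as raw backslash sequences.

```python
import ast
import json


def parse_list_column(value: str) -> list:
    """Parse a stringified list such as '["a", "b"]' into a Python list.

    Hypothetical helper: assumes the cell holds a JSON-style list.
    """
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Fallback for Python-literal style strings, e.g. single quotes.
        return list(ast.literal_eval(value))


sample = '["brown sugar", "milk", "vanilla"]'
parsed = parse_list_column(sample)
print(parsed)
```

On the DataFrame this could be applied per column, for example `data["NER"].apply(parse_list_column)`. On more than two million rows that is slow, so trying it on a small slice first is probably wise.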

The data set has already been cleaned of duplicate recipes. For example, if two recipes have exactly the same directions and ingredients but differ in whitespace, one of them has been removed.
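How the data set authors removed these duplicates is not shown here. As an illustration only, a whitespace-insensitive check could look like the sketch below. The names `normalize` and `drop_whitespace_duplicates` are hypothetical, and the records are made-up stand-ins rather than rows from the data set.

```python
import re


def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def drop_whitespace_duplicates(recipes):
    """Keep the first recipe for each normalized (ingredients, directions) pair."""
    seen = set()
    unique = []
    for recipe in recipes:
        key = (normalize(recipe["ingredients"]), normalize(recipe["directions"]))
        if key not in seen:
            seen.add(key)
            unique.append(recipe)
    return unique


# Made-up example records: the second differs from the first only in whitespace.
recipes = [
    {"ingredients": "1 c. sugar", "directions": "Mix well."},
    {"ingredients": "1  c.  sugar ", "directions": "Mix   well."},
    {"ingredients": "2 eggs", "directions": "Boil."},
]
unique = drop_whitespace_duplicates(recipes)
print(len(unique))
```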

# 4. Prepare the data

In [None]:
data_cleaned = data.copy()

In [None]:
def drop_unnecessary_columns(df: pd.DataFrame) -> pd.DataFrame:
    # `link` and `source` are dropped here; `columns=` already implies axis=1.
    return df.drop(columns=['link', 'source'])


In [None]:
data_cleaned = drop_unnecessary_columns(data_cleaned)

In [None]:
data_cleaned

Unnamed: 0,title,ingredients,directions,NER
0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. eva...","[""In a heavy 2-quart saucepan, mix brown sugar...","[""brown sugar"", ""milk"", ""vanilla"", ""nuts"", ""bu..."
1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned ...","[""Place chipped beef on bottom of baking dish....","[""beef"", ""chicken breasts"", ""cream of mushroom..."
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg...","[""In a slow cooker, combine all ingredients. C...","[""frozen corn"", ""cream cheese"", ""butter"", ""gar..."
3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans...","[""Boil and debone chicken."", ""Put bite size pi...","[""chicken"", ""chicken gravy"", ""cream of mushroo..."
4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker ...","[""Combine first four ingredients and press in ...","[""peanut butter"", ""graham cracker crumbs"", ""bu..."
...,...,...,...,...
2231137,Sunny's Fake Crepes,"[""1/2 cup chocolate hazelnut spread (recommend...","[""Spread hazelnut spread on 1 side of each tor...","[""chocolate hazelnut spread"", ""tortillas"", ""bu..."
2231138,Devil Eggs,"[""1 dozen eggs"", ""1 paprika"", ""1 salt and pepp...","[""Boil eggs on medium for 30mins."", ""Then cool...","[""eggs"", ""paprika"", ""salt"", ""choice"", ""miracle..."
2231139,Extremely Easy and Quick - Namul Daikon Salad,"[""150 grams Daikon radish"", ""1 tbsp Sesame oil...","[""Julienne the daikon and squeeze out the exce...","[""radish"", ""Sesame oil"", ""White sesame seeds"",..."
2231140,Pan-Roasted Pork Chops With Apple Fritters,"[""1 cup apple cider"", ""6 tablespoons sugar"", ""...","[""In a large bowl, mix the apple cider with 4 ...","[""apple cider"", ""sugar"", ""kosher salt"", ""bay l..."


In [None]:
data_cleaned.dtypes

title          object
ingredients    object
directions     object
NER            object
dtype: object

In [None]:
text = """"Bob the Builder" is a British animated children's television series that first premiered in 1999. The first season introduces the main character, Bob, a construction worker, and his team of anthropomorphic construction vehicles. The team consists of machines such as Scoop (a backhoe loader), Muck (a dump truck), Dizzy (a cement mixer), and Roley (a steamroller).
Throughout the first season, Bob and his team take on various construction projects in the fictional town of Bobsville. Each episode follows a similar structure, with Bob receiving a construction job, facing challenges, and ultimately completing the project with teamwork and problem-solving.
The show emphasizes positive values such as cooperation, perseverance, and the importance of a strong work ethic. It also introduces basic concepts related to construction and problem-solving, making it an educational and entertaining show for young children.
The characters and stories in the first season of "Bob the Builder" lay the foundation for the series' continued success and popularity among preschool audiences.
I'm afraid I can't provide a detailed summary of each episode in the first season of "Bob the Builder" as that would involve a significant amount of information. However, I can give you a general idea of the types of episodes and themes featured in the first season.
"Travis Gets Lucky": Travis, a tractor, feels left out and unlucky until Bob helps him find a way to contribute to the team.
"Bob's Barnraising": Bob and the team work together to build a new barn for Farmer Pickles.
"Buffalo Bob": Bob and the team build a buffalo enclosure at the zoo.
"Wendy's Busy Day": Wendy, Bob's assistant, has a busy day juggling multiple tasks.
"Bob Saves the Hedgehogs": Bob and the team create a safe space for hedgehogs.
"Bob's Bugle": Bob and the machines build a new music area for the town.
"Bob's Birthday": The team plans a surprise birthday party for Bob.
"Travis Paints the Town": Travis takes on the responsibility of painting road signs.
"Travis and Scoop's Race Day": Travis and Scoop compete in a friendly race.
"Bob's Badger": Bob and the team build a new home for badgers.
These summaries are quite basic, and each episode typically follows a formula where Bob and his team work together to overcome challenges and complete construction projects while imparting positive messages and lessons for young viewers. The first season sets the tone for the series, emphasizing teamwork, problem-solving, and the importance of community.
Certainly, I can provide a brief fictional page for each episode. Keep in mind that these are imaginative summaries, as I don't have the specific details of each episode at my disposal.
"Travis Gets Lucky"
In this exciting episode, Travis, the hardworking tractor, feels a bit down and left out. Bob and the team notice his struggles and decide to find a special job just for him. Through teamwork and encouragement, Travis discovers his unique skills and learns that everyone has an important role to play.
"Bob's Barnraising"
Farmer Pickles needs a new barn, and Bob and the team are ready for the challenge. The construction crew comes together to raise a brand-new barn, showcasing the power of teamwork and community spirit. It's a day filled with hard work, laughter, and a sense of accomplishment.
"Buffalo Bob"
The zoo needs a new home for its buffalo, and Bob and his crew are on the job. Join the team as they navigate the challenges of building an enclosure for these majestic animals. Along the way, the crew learns about wildlife conservation and the importance of creating habitats for all creatures.
"Wendy's Busy Day"
Wendy, the efficient assistant, has a day packed with various tasks. From organizing the construction site to managing paperwork, Wendy shows that being organized is a crucial part of any successful project. The episode highlights the importance of multitasking and staying on top of responsibilities.
"Bob Saves the Hedgehogs"
Bob and the team are on a mission to create a safe haven for hedgehogs in the town. As they build a hedgehog-friendly environment, they encounter challenges and surprises. The episode teaches young viewers about the needs of wildlife and the significance of creating spaces where animals can thrive.
"Bob's Bugle"
Bob and the machines take on a musical project, building a new area for the town's musicians. From constructing a stage to setting up sound systems, the team showcases their diverse skills. The episode emphasizes the importance of creativity and the arts in the community.
"Bob's Birthday"
It's a special day in Bobsville as the team plans a surprise birthday party for Bob. From secret preparations to unexpected twists, the episode is filled with fun and heartwarming moments. It's a celebration of friendship and appreciation for the hard work Bob does every day.
"Travis Paints the Town"
Travis takes on a new responsibility as the town's painter. From road signs to colorful murals, Travis transforms the town with his artistic touch. The episode explores the theme of taking pride in one's work and the joy that comes from contributing to the community.
"Travis and Scoop's Race Day"
It's a friendly competition between Travis and Scoop as they gear up for a race. The episode explores the excitement of healthy competition, as the two machines learn that winning isn't everything. Through teamwork and sportsmanship, they discover the true meaning of racing.
"Bob's Badger"
Bob and the team are on a mission to build a new home for the town's badgers. The crew faces challenges in creating a habitat that suits the needs of these shy creatures. The episode highlights the importance of respecting and protecting local wildlife.
These fictional pages aim to capture the spirit of each episode and the valuable lessons they convey to young audiences.
"Spud the Super Helper"
In this lively episode, Spud, the mischievous scarecrow, decides to be a super helper, but his attempts at assisting the team lead to comical chaos. Bob and the machines must find a way to channel Spud's energy positively while getting the job done. Through humor and teamwork, the episode emphasizes the importance of responsibility and finding one's unique strengths.
"Dizzy's Birdwatch"
Dizzy, the enthusiastic cement mixer, discovers a passion for birdwatching. The team joins her in creating a bird-friendly environment, learning about different bird species along the way. This nature-filled episode encourages appreciation for wildlife and showcases the value of pursuing personal interests.
"Roley's Apple Press"
When the town's apple trees produce an abundance of fruit, Roley takes charge of building an apple press. The episode explores the concept of resourcefulness as Bob and the team work together to find creative solutions to the surplus of apples. It's a tale of innovation and making the most of what nature provides.
"Travis and the Tropical Fruit"
Travis finds himself in a tropical fruit conundrum when a delivery of exotic fruits arrives in Bobsville. With the help of the team, Travis learns about different fruits and the importance of trying new things. The episode showcases diversity and the joy of exploration.
"Muck's Sleep-Over"
Muck, the energetic dump truck, invites his machine friends for a sleepover. The construction site transforms into a lively campsite as the machines share stories and experiences. This episode celebrates friendship and the value of bonding with others in a fun and imaginative way.
"Bob's Bugle"
Bob's musical talents take center stage again as he forms a band with the machines. Together, they prepare for a town concert, showcasing the importance of creativity and collaboration. The episode highlights the joy that comes from pursuing shared interests and working harmoniously as a team.
"Bob's Big Surprise"
A mysterious package arrives at the construction site, leading to anticipation and curiosity. As the team opens the surprise, they discover a new addition to their equipment. The episode revolves around the excitement of unexpected gifts and the positive impact they can have on a team.
"Dizzy Goes Camping"
Dizzy's enthusiasm for camping leads the team on a camping adventure. The episode explores the great outdoors, teamwork, and the thrill of experiencing new activities. It's a journey of discovery and friendship as the machines navigate the challenges of camping in the wilderness.
"Pilchard Steals the Show"
When Pilchard, Bob's cat, unexpectedly becomes the star of a pet talent show, the construction team must support her newfound fame. The episode explores the theme of encouragement and celebrating the unique talents of each team member, even the furry ones.
"Wendy's Magic Birthday"
It's Wendy's birthday, and the team plans a magical surprise for her. From decorations to a special birthday performance, the episode highlights the joy of celebrating special occasions with friends. It's a heartwarming tale of appreciation and the importance of creating memorable moments together.
"Bob's Boots"
Bob's trusty work boots go missing, causing a construction conundrum. As the team rallies to find the missing footwear, they encounter humorous mishaps and surprising places where Bob's boots may have ended up. The episode explores problem-solving, teamwork, and the value of a good laugh in the face of unexpected challenges.
"Mucky Muck"
Muck gets a makeover when he accidentally becomes the canvas for a playful paint mishap. The team turns this colorful accident into an opportunity to teach Muck about the importance of taking pride in one's appearance. Through creativity and teamwork, they transform Muck's mishap into a work of art, promoting self-expression and self-care.
"Lofty's Long Load"
Lofty, the crane with a gentle spirit, faces a challenge when tasked with transporting an unusually long load. The episode delves into themes of problem-solving and determination as Lofty navigates obstacles, ultimately learning that with patience and perseverance, even the toughest tasks can be accomplished.
"Bob's Metal Detector"
Bob discovers a metal detector, leading the team on a treasure hunt around the construction site. The episode combines adventure with problem-solving as the team unearths unexpected items buried underground. It's a tale of exploration, curiosity, and the thrill of discovering hidden treasures in unexpected places.
"Spud the DJ"
Spud takes on a new role as the construction site DJ, creating musical mayhem with his eclectic playlist. The team must find a way to balance work and play while appreciating Spud's enthusiasm. This musical escapade explores the importance of finding joy in everyday tasks and the positive impact of creativity in the workplace.
"Bob's Bugle"
Bob's musical talents shine once again as he organizes a town band. This time, the team prepares for a grand performance at the town's summer festival. The episode celebrates community spirit, the joy of shared interests, and the positive impact of bringing people together through music.
"Wendy's Party Plan"
Wendy takes charge of planning a surprise party for Bob, incorporating creative elements and teamwork to ensure the celebration is a success. The episode explores the art of party planning, highlighting the importance of attention to detail and collaboration in creating memorable events.
"Pilchard's Breakfast"
Pilchard, Bob's cat, becomes the focus of the episode as the team discovers her love for breakfast food. The construction site transforms into a culinary adventure as the team works together to create a special breakfast for Pilchard. It's a heartwarming exploration of caring for pets and the joy of surprising loved ones with thoughtful gestures.
"Roley to the Rescue"
Roley takes center stage when he discovers his unique ability to assist in rescue missions. The team faces a construction emergency, and Roley's newfound skills prove instrumental in overcoming the challenges. The episode emphasizes the value of recognizing individual strengths and the importance of being prepared for unexpected situations.
"Dizzy's Sleepover"
Dizzy, the energetic cement mixer, hosts a sleepover for her machine friends. The construction site transforms into a cozy campsite as the machines share stories, play games, and experience the excitement of a sleepover. This delightful episode celebrates friendship, camaraderie, and the joy of spending quality time with those we care about.
"""


In [None]:
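def stringify_directions(amount_of_rows):
    """Join the raw `directions` strings of the first rows, one recipe per line."""
    rows = data_cleaned["directions"][:amount_of_rows].tolist()
    # str.join builds the text in one pass instead of repeated concatenation.
    text = "\n".join(rows)
    # Strip every square bracket: the list delimiters and any inside the text.
    return text.replace("[", "").replace("]", "")
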

In [None]:
text = stringify_directions(200)
print(text)

"In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.", "Stir over medium heat until mixture bubbles all over top.", "Boil and stir 5 minutes more. Take off heat.", "Stir in vanilla and cereal; mix well.", "Using 2 teaspoons, drop and shape into 30 clusters on wax paper.", "Let stand until firm, about 30 minutes."
"Place chipped beef on bottom of baking dish.", "Place chicken on top of beef.", "Mix soup and cream together; pour over chicken. Bake, uncovered, at 275\u00b0 for 3 hours."
"In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings."
"Boil and debone chicken.", "Put bite size pieces in average size square casserole dish.", "Pour gravy and cream of mushroom soup over chicken; level.", "Make stuffing according to instructions on box (do not make too moist).", "Put stuffing on top of chicken and gravy; level.", "Sprinkle shredded c

In [None]:
vocab = sorted(set(text))
vocab

['\n',
 ' ',
 '!',
 '"',
 '%',
 "'",
 '(',
 ')',
 '*',
 ',',
 '-',
 '.',
 '/',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'Y',
 'Z',
 '\\',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [None]:
import tensorflow as tf

s = ["bob", "the builder"]
s2 = ["bob", "the maker"]

chars = tf.strings.unicode_split(s, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'b', b'o', b'b'],
 [b't', b'h', b'e', b' ', b'b', b'u', b'i', b'l', b'd', b'e', b'r']]>

In [None]:
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=list(vocab), mask_token=None)

In [None]:
ids = ids_from_chars(chars)
ids

<tf.RaggedTensor [[53, 66, 53], [71, 59, 56, 2, 53, 72, 60, 63, 55, 56, 69]]>

In [None]:
chars_from_ids = tf.keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

In [None]:
chars = chars_from_ids(ids)
chars

<tf.RaggedTensor [[b'b', b'o', b'b'],
 [b't', b'h', b'e', b' ', b'b', b'u', b'i', b'l', b'd', b'e', b'r']]>

In [None]:
def text_from_ids(ids):
  return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)


print(text_from_ids(ids))

tf.Tensor([b'bob' b'the builder'], shape=(2,), dtype=string)
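The round trip above can be mirrored in plain Python. This is a sketch of the idea, not of how `StringLookup` is implemented. It assumes, as the IDs above suggest (a space maps to 2), that index 0 is reserved for unknown characters and the vocabulary starts at 1. The names `build_lookup`, `encode` and `decode` are hypothetical.

```python
UNK = "[UNK]"


def build_lookup(vocab):
    """Return (char -> id, id -> char) tables with id 0 reserved for unknowns."""
    id_of = {ch: i + 1 for i, ch in enumerate(vocab)}
    char_of = [UNK] + list(vocab)
    return id_of, char_of


def encode(text, id_of):
    """Map each character to its id; unknown characters map to 0."""
    return [id_of.get(ch, 0) for ch in text]


def decode(ids, char_of):
    """Map ids back to characters and join them into one string."""
    return "".join(char_of[i] for i in ids)


vocab = sorted(set("bob the builder"))
id_of, char_of = build_lookup(vocab)
ids = encode("the builder", id_of)
print(ids)
print(decode(ids, char_of))
```

Because the vocabulary is built from the text itself, every character in the corpus gets an ID; only characters seen later (for example in a user prompt) would fall back to 0.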


In [None]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(50390,), dtype=int64, numpy=array([ 4, 34, 65, ..., 56, 12,  4], dtype=int64)>

In [None]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

In [None]:
for ids in ids_dataset.take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

"
I
n
 
a
 
h
e
a
v


In [None]:
seq_length = 100

In [None]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'"' b'I' b'n' b' ' b'a' b' ' b'h' b'e' b'a' b'v' b'y' b' ' b'2' b'-'
 b'q' b'u' b'a' b'r' b't' b' ' b's' b'a' b'u' b'c' b'e' b'p' b'a' b'n'
 b',' b' ' b'm' b'i' b'x' b' ' b'b' b'r' b'o' b'w' b'n' b' ' b's' b'u'
 b'g' b'a' b'r' b',' b' ' b'n' b'u' b't' b's' b',' b' ' b'e' b'v' b'a'
 b'p' b'o' b'r' b'a' b't' b'e' b'd' b' ' b'm' b'i' b'l' b'k' b' ' b'a'
 b'n' b'd' b' ' b'b' b'u' b't' b't' b'e' b'r' b' ' b'o' b'r' b' ' b'm'
 b'a' b'r' b'g' b'a' b'r' b'i' b'n' b'e' b'.' b'"' b',' b' ' b'"' b'S'
 b't' b'i' b'r'], shape=(101,), dtype=string)


In [None]:
for seq in sequences.take(5):
  print(text_from_ids(seq).numpy())

b'"In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.", "Stir'
b' over medium heat until mixture bubbles all over top.", "Boil and stir 5 minutes more. Take off heat.'
b'", "Stir in vanilla and cereal; mix well.", "Using 2 teaspoons, drop and shape into 30 clusters on wa'
b'x paper.", "Let stand until firm, about 30 minutes."\n"Place chipped beef on bottom of baking dish.", '
b'"Place chicken on top of beef.", "Mix soup and cream together; pour over chicken. Bake, uncovered, at'


In [None]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

In [None]:
dataset = sequences.map(split_input_target)

In [None]:
for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b'"In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.", "Sti'
Target: b'In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.", "Stir'
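The target is the input shifted by one character, so at every position the model is asked to predict the next character. A plain-Python sketch of the chunk-and-shift steps, using a toy string and a short `seq_length` (the helper `chunk` is hypothetical and stands in for `batch(seq_length+1, drop_remainder=True)`):

```python
def chunk(items, seq_length):
    """Cut items into pieces of seq_length + 1, dropping any incomplete tail."""
    size = seq_length + 1
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def split_input_target(sequence):
    """Input is everything but the last item; target is everything but the first."""
    return sequence[:-1], sequence[1:]


chunks = chunk("Stir well.", seq_length=4)
pairs = [split_input_target(c) for c in chunks]
for input_text, target_text in pairs:
    print(repr(input_text), "->", repr(target_text))
```

Each chunk holds `seq_length + 1` characters precisely so that both the input and the target end up with `seq_length` characters after the shift.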


In [None]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>
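
The buffer-based shuffling described in the comment above can be illustrated in plain Python. This is a simplified sketch of the idea, not `tf.data`'s actual implementation; `buffer_shuffle` is a hypothetical helper introduced only for this example.

```python
import random

def buffer_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # The stream is exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffer_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

In this sketch an element can only move as far forward as the buffer allows, so a buffer much smaller than the dataset gives a local shuffle rather than a uniform one.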

In [None]:
# Length of the vocabulary in StringLookup Layer
vocab_size = len(ids_from_chars.get_vocabulary())

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [None]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [None]:
model = MyModel(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [None]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 78) # (batch_size, sequence_length, vocab_size)


In [None]:
model.summary()


Model: "my_model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding_5 (Embedding)     multiple                  19968

 gru_4 (GRU)                 multiple                  3938304

 dense_4 (Dense)             multiple                  79950

=================================================================
Total params: 4038222 (15.40 MB)
Trainable params: 4038222 (15.40 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
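
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras GRU default `reset_after=True`, which gives each of the three weight blocks (update gate, reset gate, candidate state) two bias vectors; `vocab_size = 78` is read off the prediction shape printed earlier.

```python
vocab_size = 78       # last dimension of the (64, 100, 78) prediction
embedding_dim = 256
rnn_units = 1024

# Embedding: one vector of size embedding_dim per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GRU: three blocks, each with an input kernel, a recurrent kernel,
# and (assuming reset_after=True) two bias vectors.
gru_params = 3 * (embedding_dim * rnn_units
                  + rnn_units * rnn_units
                  + 2 * rnn_units)

# Dense: weight matrix plus one bias per output logit.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```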


In [None]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [None]:
sampled_indices

array([60, 35, 63, 45, 54, 59, 52, 35, 10, 66, 14, 66,  8, 34, 34, 45, 48,
       48, 69, 65,  0,  7, 42, 71,  7, 76, 42, 63, 37,  5, 71, 42,  9, 35,
       61, 60, 34, 61, 13, 76, 25, 62, 65, 64, 73, 22, 43, 56, 34, 39, 18,
       57, 77, 15, 76, 59, 18, 26, 37, 33, 47, 76, 42, 27, 49, 73, 54, 75,
        8, 57, 30, 33,  7, 62, 76, 72, 61, 40,  9, 65, 68,  7, 19, 12, 69,
       23, 43, 36, 10, 32,  1, 47, 63, 48, 16, 51, 73, 35, 53, 52],
      dtype=int64)

In [None]:
print("Input:\n", text_from_ids(input_example_batch[0]).numpy())
print()
print("Next Char Predictions:\n", text_from_ids(sampled_indices).numpy())

Input:
 b'rator."\n"Wash potatoes; prick several times with a fork.", "Microwave them with a wet paper towel co'

Next Char Predictions:
 b'iJlTchaJ,o0o)IITWWrn[UNK](Qt(yQlL%tQ*JjiIj/y;knmv8ReIN4fz1yh4ALHVyQBYvcx)fEH(kyujO*nq(5.r9RK,G\nVlW2\\vJba'


In [None]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)


In [None]:
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", example_batch_mean_loss)

Prediction shape:  (64, 100, 78)  # (batch_size, sequence_length, vocab_size)
Mean loss:         tf.Tensor(4.359045, shape=(), dtype=float32)


In [None]:
tf.exp(example_batch_mean_loss).numpy()

78.182434
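
The exponential of the mean loss being close to the vocabulary size is a useful sanity check: a model that assigns equal probability to all 78 tokens has a cross-entropy of ln(78). A short check of that arithmetic, using the loss value printed above:

```python
import math

vocab_size = 78
observed_loss = 4.359045  # mean loss printed for the untrained model

# Cross-entropy of a uniform guess over the vocabulary.
uniform_loss = math.log(vocab_size)

# exp(loss) is the "effective" number of equally likely choices.
effective_choices = math.exp(observed_loss)

print("Uniform-guess loss:", uniform_loss)
print("exp(observed loss):", effective_choices)
```

A freshly initialized model should land near these values; a much larger initial loss would suggest the initialization or the `from_logits` setting deserves a second look.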

In [None]:
model.compile(optimizer='adam', loss=loss)


In [None]:
import os
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
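
The `{epoch}` placeholder in the prefix is filled in at save time. The sketch below shows the names this template produces, assuming the callback substitutes the 1-based epoch number through ordinary string formatting; it does not call TensorFlow.

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Assumption: the callback formats the template with the 1-based epoch number.
example_paths = [checkpoint_prefix.format(epoch=e) for e in (1, 2, 100)]
for path in example_paths:
    print(path)
```

Because the callback saves once per epoch by default, a 100-epoch run would leave one set of weight files per epoch under this directory.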

In [None]:
EPOCHS = 100

In [None]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/100
Epoch 2/100
...
Epoch 78/100
Epoch 7

In [None]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits, states = self.model(inputs=input_ids, states=states,
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return the characters and model state.
    return predicted_chars, states
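
The two adjustments made to the logits before sampling, dividing by the temperature and adding a `-inf` mask, can be illustrated without TensorFlow. This is a minimal sketch on a hypothetical four-token vocabulary where index 0 stands in for `[UNK]`; the logit values and the helper names `softmax` and `next_char_probs` are made up for the example.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_char_probs(logits, mask, temperature=1.0):
    # Same order as generate_one_step: scale by temperature, then add the mask.
    scaled = [x / temperature for x in logits]
    masked = [x + m for x, m in zip(scaled, mask)]
    return softmax(masked)

logits = [2.0, 1.0, 0.5, 0.1]          # hypothetical logits, index 0 = "[UNK]"
mask = [-float('inf'), 0.0, 0.0, 0.0]  # -inf at the index to suppress

for t in (0.5, 1.0, 2.0):
    probs = next_char_probs(logits, mask, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

The masked token always ends up with probability zero, even though it has the largest raw logit here. Lower temperatures concentrate probability on the most likely remaining token, while higher temperatures flatten the distribution and make the generated text more varied.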

In [None]:
one_step_model = OneStep(model, chars_from_ids, ids_from_chars)


In [None]:
import time 

start = time.time()
states = None
next_char = tf.constant(['"1 egg, one bag of corn flour, one apple, two carrots, 1/2 cup of urine"'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

"1 egg, one bag of corn flour, one apple, two carrots, 1/2 cup of urine", "Thing "y large borings.", "Drain cream cheese in milt.", "Remove from pan."
"Mix iaster and Criln seasoning.", "Cook beef remaining ingredients.", "Puen", "reserved noodles, soup mix, broccoli, cornstarch and well browned.", "Cut ih everly bitter oil allovailing whipped cream.", "Mix well.", "Pour into an oblong shepper cake mix directions)."
"Mix Cool Whip, solt jall quickly and pour into buttently.", "Add 1/2 cup sugar and frout (hely.", "Sprinkle cheese ov rot into greased mix dry ingredients.", "Combine tomato sauce, water, brown sugar, mustard, salt and pepper.", "Allow to square pan, 8 x 8-inch pan.", "Next mix Eagle Brang tigether to cheese until smooth.", "Add gelatin and beat at high speed untill sorm cream the hut in a garales and remeid for 1 hour.", "Combine tomato sigers and chif in salt.", "Add Minger with whites encold cheese (1/4-inch thickness, saucepan, mix brown sugar, nuts.", "Mix with cansy.

In [None]:

start = time.time()
states = None
next_char = tf.constant(['"Two cups of milk, one scoop of brown sugar'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

"Two cups of milk, one scoop of brown sugar and oil; let stand overn 1/2 cup of nuts until tender.", "Mix eggs, entar and cook; then and beat until smooth and well bello; stir tile peaky at rediege the chicken bottom.", "Bake at 350\u00b0 for 1 hour.", "Let stand 5 minutes before slicing.", "Makes 8 servings."
"Aloul hots in pan; ddain. Axt 3 ingredients.", "Cover."
"Drop batter pileclion for 5 minutes. Yield: 2 1/3 dozen."
"In a 6-quart punch bowl mix all of the cansf tamales.", "Spread Top Chips."
"Mix ground beef and cheese; pinch sides together."
"Mix and lolf pinest of waxmprate mixture over top.", "Mix in saucepan, mix brown sugar, nuts and cheese; pinch sides together."
"Mix ingredients and simmer for 40 minutes.", "Serve hot over corn chips."
"Sift first 5id ingredients together firm wowh th tomatoes until done, about 30 to 30 minutes."
"Brown beef with onions.", "Add greased 8 x 8 x 2-inches or until for 1 to 1 1/2 minutes cour of Ifel ard 1/2 cup of bacon.", "Cook until heate