# Video: Implementing Retrieval-Augmented Generation

This video shows how to quickly implement retrieval-augmented generation using an existing language model with embedding support.

Script: (faculty on screen)
* Retrieval-augmented generation, or RAG, aims for a best-of-both-worlds approach to text generation.
* It has the extreme flexibility of language models.
* And it uses document search based on the same language models to fetch relevant documents and provide relevant context to the language model for in-context learning.
* This video will show you a basic RAG implementation that answers questions about recipes, grounded in the recipes on my Bacon Powered Recipes site.

In [None]:
import json

import google.genai as genai
from google.genai import types
from google.colab import userdata
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

In [None]:
client = genai.Client(api_key=userdata.get('GEMINI_API_KEY'))

In [None]:
embedding_model_name = 'gemini-embedding-001'
model_name = 'gemini-2.0-flash'

In [None]:
recipes = pd.read_csv("https://raw.githubusercontent.com/bu-cds-omds/dx704-examples/refs/heads/main/data/recipes.tsv.gz", sep="\t")
recipes = recipes.set_index("recipe_slug")
recipes = recipes[:1000]

In [None]:
recipes.head()

recipe_slug,recipe_title,recipe_introduction,recipe_ingredients,recipe_instructions,recipe_conclusion,recipe_related_slugs,recipe_ts
spiced-pear-and-walnut-salad,Spiced Pear And Walnut Salad,Spiced pear and walnut salad is a delicious an...,"[""2 ripe pears, thinly sliced"", ""4 cups mixed ...","[""In a small bowl, whisk together the olive oi...",\N,"[""pear-and-blue-cheese-salad"", ""walnut-and-cra...",2023-06-17 22:10:35.744536+00
roasted-pear-and-butternut-squash-soup,Roasted Pear And Butternut Squash Soup,Roasted pear and butternut squash soup is a cr...,"[""2 medium-sized butternut squash, peeled and ...","[""Preheat the oven to 400°F."", ""In a large bow...",\N,"[""roasted-butternut-squash-and-apple-soup"", ""p...",2023-06-17 22:10:46.428069+00
peach-clafoutis,Peach Clafoutis,Peach clafoutis is a classic French dessert th...,"[""4 ripe peaches, peeled and sliced"", ""3 eggs""...","[""Preheat the oven to 375°F."", ""Grease a 9-inc...",\N,"[""cherry-clafoutis"", ""blueberry-clafoutis"", ""a...",2023-06-17 19:05:50.44248+00
plum-clafoutis,Plum Clafoutis,Plum clafoutis is a classic French dessert mad...,"[""4-5 ripe plums, pitted and sliced"", ""3 eggs""...","[""Preheat the oven to 375°F (190°C) and butter...",\N,"[""cherry-clafoutis"", ""apple-clafoutis"", ""blueb...",2023-06-17 19:05:42.705122+00
pear,Pear,Pears are a sweet and juicy fruit that come in...,"[""1 sheet of puff pastry"", ""2 ripe pears, peel...","[""Preheat the oven to 400°F."", ""Roll out the p...",\N,"[""pear-and-goat-cheese-salad"", ""pear-and-ginge...",2023-06-17 22:11:13.760378+00


In [None]:
raw_embeddings = {}

In [None]:
recipe_embeddings = {}

In [None]:
def get_embedding(text):
    if text in raw_embeddings:
        return raw_embeddings[text]
    response = client.models.embed_content(model=embedding_model_name,
                                           contents=text)

    raw_embeddings[text] = np.array(response.embeddings[0].values)
    return raw_embeddings[text]

In [None]:
def save_recipe_embedding(recipe_tuple):
    recipe_slug = recipe_tuple.Index
    if recipe_slug in recipe_embeddings:
        return

    embedding = get_embedding(recipe_tuple.recipe_introduction)
    recipe_embeddings[recipe_slug] = embedding

In [None]:
for r in recipes.itertuples():
    save_recipe_embedding(r)

In [None]:
def get_response(contents):
    response = client.models.generate_content(model=model_name,
                                              contents=contents)
    return response.text

Script:
* I've already loaded the Google genai modules and fetched embeddings for a thousand recipes.

In [None]:
query = "What ingredients are common in dessert recipes?"

Script:
* I am going to ask this question: "What ingredients are common in dessert recipes?"
* Without context, I expect a very generic answer to this question.

In [None]:
print(get_response(query))

Dessert recipes commonly include a range of ingredients that contribute to their sweetness, texture, and flavor. Here's a breakdown of some of the most common ones:

**Sweeteners:**

*   **Sugar:** The foundation of most desserts. Comes in various forms like granulated, powdered (icing), brown, and caster sugar (superfine). Each has a different texture and molasses content, affecting the final product.
*   **Honey:** A natural sweetener that adds a distinct flavor.
*   **Maple Syrup:** Another natural sweetener, offering a unique flavor profile.
*   **Corn Syrup:** Used for its texture-enhancing properties, often in candies and icings.
*   **Molasses:** A byproduct of sugar production, providing a rich, dark flavor.
*   **Other Syrups:** Agave, rice malt syrup, etc., used as alternative sweeteners.

**Fats:**

*   **Butter:** Adds richness, flavor, and tenderness.
*   **Oil:** Provides moisture and a tender crumb. Often used in cakes and quick breads.
*   **Shortening:** Offers a neutr

Script:
* As you can see, the response was totally generic.
* It is not specific at all to the recipes at Bacon Powered.
* Let's use the embeddings to find recipes that might be relevant to this question.

In [None]:
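def search_recipes(query):
    # Rank recipes by Euclidean distance between their embedding and the query's.
    query_embedding = get_embedding(query)
    candidates = list(recipe_embeddings.keys())
    candidates.sort(key=lambda x: np.linalg.norm(recipe_embeddings[x] - query_embedding))
    # Keep the 10 closest recipe slugs.
    return candidates[:10]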

search_recipes(query)

['chocolate-brownies',
 'cheesecake',
 'apple-crumble',
 'blondie-bars',
 'chocolate-mousse',
 'pancakes',
 'fudge',
 'strawberry-tart',
 'apple-crisp',
 'coconut-cream-pie']
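
The search above sorts slugs with one distance computation per recipe inside a Python-level sort. As a minimal sketch, the same Euclidean ranking can be computed in a single vectorized NumPy pass. The function name `rank_by_distance` and the toy 2-dimensional vectors are made up for illustration and are not part of the notebook.

```python
import numpy as np

def rank_by_distance(query_embedding, embeddings_by_slug, top_k=10):
    """Return the top_k slugs closest to the query by Euclidean distance."""
    slugs = list(embeddings_by_slug.keys())
    # Stack the per-recipe vectors into one (n_recipes, dim) matrix.
    matrix = np.stack([embeddings_by_slug[s] for s in slugs])
    # One norm call computes every distance at once.
    distances = np.linalg.norm(matrix - query_embedding, axis=1)
    # A stable sort keeps insertion order for ties, like list.sort does.
    order = np.argsort(distances, kind="stable")[:top_k]
    return [slugs[i] for i in order]

# Toy 2-dimensional vectors, invented for illustration only.
toy_embeddings = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([0.9, 0.1]),
}
toy_query = np.array([1.0, 0.0])
toy_ranking = rank_by_distance(toy_query, toy_embeddings, top_k=2)
print(toy_ranking)
```

If the embedding vectors happen to be normalized to unit length, ranking by Euclidean distance gives the same order as ranking by cosine similarity, so either measure could be used in that case.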

Script:
* These 10 recipes all sound relevant to desserts except for pancakes.
* That's a bit context-dependent: are they stacked with fruit and whipped cream?
* The language model will have to decide whether to include them.
* How do we use all these recipes to improve the answer to our question?

In [None]:
def get_rag_prompt(query):
    rag_prompt = []
    rag_prompt.append(f"QUESTION: {query}\n\n")
    for recipe_slug in search_recipes(query):
        recipe = recipes.loc[recipe_slug]
        rag_prompt.append(f"RELEVANT RECIPE: {recipe.recipe_title}\n\n")
        rag_prompt.append(recipe.recipe_introduction)
        rag_prompt.append("\n\n")
        rag_prompt.append("Ingredients:\n")
        for ingredient in json.loads(recipe.recipe_ingredients):
            rag_prompt.append(f"* {ingredient}\n\n")
        rag_prompt.append("Instructions:\n")
        for instruction in json.loads(recipe.recipe_instructions):
            rag_prompt.append(f"* {instruction}\n\n")
        rag_prompt.append("\n\n")

    rag_prompt.append(f"QUESTION: {query}\n\n")
    rag_prompt.append("ANSWER: ")
    rag_prompt = "".join(rag_prompt)

    return rag_prompt
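
The prompt builder above places the question both before and after the retrieved recipes. As a rough sketch, that placement could be made a parameter so the variants are easy to compare. The helper `build_prompt`, its `placement` argument, and the example context strings are hypothetical and not part of the notebook.

```python
def build_prompt(question, context_blocks, placement="both"):
    """Assemble a prompt with the question before, after, or around the context."""
    if placement not in ("before", "after", "both"):
        raise ValueError(f"unknown placement: {placement!r}")
    parts = []
    if placement in ("before", "both"):
        parts.append(f"QUESTION: {question}\n\n")
    # Each context block is one retrieved document rendered as text.
    for block in context_blocks:
        parts.append(f"{block}\n\n")
    if placement in ("after", "both"):
        parts.append(f"QUESTION: {question}\n\n")
    # End with an answer cue, as in the notebook's prompt.
    parts.append("ANSWER: ")
    return "".join(parts)

# Placeholder context strings, invented for illustration only.
example_prompt = build_prompt(
    "What ingredients are common in dessert recipes?",
    ["RELEVANT RECIPE: Example One", "RELEVANT RECIPE: Example Two"],
)
print(example_prompt)
```

With the default `placement="both"`, the output has the same overall shape as the notebook's prompt: question, context, question, answer cue.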

Script:
* I designed this prompt to start with the question and then list all the recipes, including their introductions, ingredient lists, and instructions.
* Then I repeated the question, and added a prompt for the answer.
* Why repeat the question?
* Some research has shown that some queries work better with the question before the context and others work better with the question after it. Repeating the question both before and after tends to be as good as or better than correctly guessing which placement to use.
* We don't have any scientifically grounded rules at this point, and the answers vary from model to model and as the models improve.
* Similarly, I included the recipes' introductions, ingredients, and instructions pretending that I did not know that the question was just about ingredients.
* This makes this prompt structure more generally useful, and it would defeat the point of having the model figure out what was important if I had to pick and choose the data per question.
* Let's look at the generated prompt now.

In [None]:
rag_prompt = get_rag_prompt(query)
print(rag_prompt)

QUESTION: What ingredients are common in dessert recipes?

RELEVANT RECIPE: Chocolate Brownies

Chocolate brownies are a classic dessert that are loved by many. They are a rich and decadent treat that are perfect for satisfying a sweet tooth. Brownies are typically made with cocoa powder, sugar, flour, eggs, and butter, and can be customized with additional ingredients such as nuts, chocolate chips, or frosting. They are typically baked in a square or rectangular pan and cut into individual servings.

Ingredients:
* 1 cup unsalted butter

* 2 1/4 cups granulated sugar

* 4 large eggs

* 1 1/4 cups cocoa powder

* 1 teaspoon salt

* 1 teaspoon baking powder

* 1 teaspoon vanilla extract

* 1 1/2 cups all-purpose flour

* 1 cup chocolate chips

Instructions:
* Preheat the oven to 350°F and grease a 9x13 inch baking pan.

* In a large mixing bowl, melt the butter in the microwave or on the stove.

* Add the sugar to the melted butter and stir until well combined.

* Add the eggs one at a 
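
Prompts like this one get long quickly. As a very rough sketch, the size of the retrieved context could be capped with an estimated token budget. The 4-characters-per-token figure is an assumed heuristic, not a property of any particular model (an exact count depends on the model's tokenizer), and the helper names `estimate_tokens` and `take_within_budget` are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; chars_per_token=4 is an assumed heuristic."""
    return -(-len(text) // chars_per_token)  # ceiling division

def take_within_budget(blocks, max_tokens):
    """Keep blocks in order until the next one would exceed the estimated budget."""
    kept, used = [], 0
    for block in blocks:
        cost = estimate_tokens(block)
        if used + cost > max_tokens:
            break
        kept.append(block)
        used += cost
    return kept, used

# Toy blocks of 40 characters each, so about 10 estimated tokens apiece.
toy_blocks = ["a" * 40, "b" * 40, "c" * 40]
kept_blocks, used_tokens = take_within_budget(toy_blocks, max_tokens=25)
print(len(kept_blocks), used_tokens)
```

Because the search results are ordered from closest to farthest, stopping at the first block that does not fit keeps the most relevant recipes and drops the tail.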

Script:
* That's a lot of text.
* I bet the API providers charging for tokens love these long RAG queries.
* Implementers still love this setup, though, because it improves answer quality a lot.
* Language models are getting a lot smarter, but they won't know about a company's internal documents and rules unless we share them.
* How did this prompt perform?

In [None]:
print(get_response(rag_prompt))

Based on the provided recipes, common ingredients in dessert recipes include:

*   **Flour:** All-purpose flour is used in most of the listed recipes.
*   **Sugar:** Both granulated sugar and brown sugar appear frequently.
*   **Butter:** Unsalted butter is a very common ingredient.
*   **Eggs:** Used in many desserts for richness and binding.
*   **Vanilla extract:** A common flavoring agent.
*   **Milk/Cream:** Used in many desserts for moisture and richness.
*   **Chocolate:** Chocolate chips, cocoa powder, and semisweet chocolate are common.
*   **Baking powder:** Used as a leavening agent in some recipes.
*   **Salt:** Used to enhance flavors.

Other ingredients that appear in multiple desserts:

*   **Fruit:** Apples, strawberries
*   **Oats:** Rolled oats
*   **Nuts:** Chopped nuts,
*   **Coconut:** Shredded coconut
*   **Cream Cheese:** Used in cheesecake


Script:
* That's a lot more specific to the provided recipes from my site.
* For corporate use cases, this specificity is important.


Script: (faculty on screen)
* Retrieval-augmented generation is a great way to improve the accuracy and relevance of generated text.
* Newer language models need less help to be generally correct, but retrieving context still helps a lot with being situationally correct.