# GenAI/RAG in Python 2025

## Session 01. A Basic RAG Framework

In [6]:
import os
import numpy as np
import pandas as pd
from openai import OpenAI

## 1. Let's grab some text... Italian cuisine, for example?

In [7]:
# Path to the CSV file
file_path = "../../_data/italian_recipes_clean.csv"

# Load the CSV into a Pandas DataFrame
df = pd.read_csv(file_path)

# Display some basic information
df.info()  # info() prints its own summary and returns None, so no print() is needed
print(df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 220 entries, 0 to 219
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   title    220 non-null    object
 1   receipt  220 non-null    object
dtypes: object(2)
memory usage: 3.6+ KB
                 title                                            receipt
0  BROTH OR SOUP STOCK  (Brodo) To obtain good broth the meat must be ...
1           BREAD SOUP  (Panata) This excellent and nutritious soup is...
2              GNOCCHI  This is an excellent soup, but as it requires ...
3       VEGETABLE SOUP  (Zuppa Sante) Any kind of vegetables may be us...
4         QUEEN'S SOUP  (Zuppa Regina) This is made with the white mea...


### We would like to build a system that...

1. takes user input in the form of a question (e.g. "I'd like to cook something with carrots"),
2. performs a similarity search across the recipes in the `df` DataFrame,
3. retrieves the five most similar recipes and lists them, and
4. combines them with a prompt sent to an OpenAI chat model (GPT-4 in this notebook) to shape the final response shown to the user.
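Before going step by step, the four stages can be sketched as a single function. This is a minimal sketch, not the notebook's own code: `answer_question`, `embed` and `chat` are hypothetical names, and the two callables stand in for the OpenAI embedding and chat calls used later, so the flow can be run without an API key. It ranks the retrieved recipes best match first.

```python
from typing import Callable, Sequence

import numpy as np


def answer_question(
    question: str,
    titles: Sequence[str],
    recipes: Sequence[str],
    recipe_vectors: np.ndarray,               # shape (n_recipes, dim)
    embed: Callable[[str], Sequence[float]],  # hypothetical: text -> vector
    chat: Callable[[str], str],               # hypothetical: prompt -> reply
    k: int = 5,
) -> str:
    # (1) Embed the user's question
    q = np.asarray(embed(question), dtype=float)
    # (2) Cosine similarity between the question and every recipe vector
    norms = np.linalg.norm(recipe_vectors, axis=1) * np.linalg.norm(q)
    scores = recipe_vectors @ q / norms
    # (3) Indices of the k most similar recipes, best match first
    top = np.argsort(scores)[::-1][:k]
    context = "\n\n".join(f"{titles[i]}:\n{recipes[i]}" for i in top)
    # (4) Combine the retrieved recipes with the question and ask the LLM
    prompt = f"Recipes:\n{context}\n\nUser's question: {question}"
    return chat(prompt)


# Toy run with fake 2-D "embeddings" and an "LLM" that just echoes its prompt
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
reply = answer_question(
    "What goes with pork?", ["A", "B", "C"], ["ra", "rb", "rc"], vectors,
    embed=lambda text: [1.0, 0.1], chat=lambda prompt: prompt, k=2,
)
print(reply)
```

With the notebook's objects, `embed` could be `lambda t: client.embeddings.create(model=model_name, input=[t]).data[0].embedding` and `chat` a wrapper around `client.chat.completions.create(...)`.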

## 2. Vector embeddings for similarity search

Before we can run a similarity search, every recipe must be converted into a vector embedding.

This time we will use an OpenAI embedding model (`text-embedding-3-small`).

In [8]:
# Set your API key (ensure OPENAI_API_KEY is set in your environment)
api_key = os.getenv("OPENAI_API_KEY")

# Instantiate the OpenAI client with your API key  
client = OpenAI(api_key=api_key)           

# Select the embedding model to use (as per OpenAI docs)  
model_name = "text-embedding-3-small"      

# Prepare a list to collect embedding vectors  
embeddings = []                            

# Iterate over each row in your DataFrame `df`
for idx, row in df.iterrows():
    # Grab the recipe text (stored in the `receipt` column) for this row
    text = row["receipt"]
    # If it's not a valid string, skip embedding
    if not isinstance(text, str) or text.strip() == "":
        # Append None instead of skipping the row outright: `embeddings` must keep
        # exactly one entry per row of `df` so it can be assigned as a new column
        # later. Rows with None get a score of -1.0 in the similarity step.
        embeddings.append(None)
        continue

    # Optimization note: this loop sends one request per recipe (220 in total),
    # which is slow and consumes credits. Two common remedies: pass many texts
    # per request (the `input` parameter accepts a list of strings), and save
    # the results to a file so they can be reused instead of recomputed.

    # Call the embeddings endpoint on the client
    resp = client.embeddings.create(
        model=model_name,
        input=[text]
    )

    # Extract the embedding vector from the response object
    emb = resp.data[0].embedding

    # Append that embedding vector to our list
    embeddings.append(emb)

# Show first few rows to verify
print("df.head() BEFORE adding embeddings:")
df.head()

df.head() BEFORE adding embeddings:


| | title | receipt |
|---|---|---|
| 0 | BROTH OR SOUP STOCK | (Brodo) To obtain good broth the meat must be ... |
| 1 | BREAD SOUP | (Panata) This excellent and nutritious soup is... |
| 2 | GNOCCHI | This is an excellent soup, but as it requires ... |
| 3 | VEGETABLE SOUP | (Zuppa Sante) Any kind of vegetables may be us... |
| 4 | QUEEN'S SOUP | (Zuppa Regina) This is made with the white mea... |
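The loop above sends one request per recipe. A minimal sketch of two common remedies, batching and caching, is shown below. It assumes a caller-supplied `embed_batch` function (a hypothetical name) that takes a list of strings and returns one vector per string, in order; `embed_texts_batched`, `load_or_embed` and the JSON cache file are likewise illustrative choices, not part of the OpenAI library. `batch_size=100` is an arbitrary, conservative assumption: the API limits how many inputs and tokens one request may carry, so check the current limits before raising it.

```python
import json
import os
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical signature: list of strings in, one vector per string out, same order
EmbedBatchFn = Callable[[List[str]], List[Vector]]


def embed_texts_batched(texts: Sequence[object], embed_batch: EmbedBatchFn,
                        batch_size: int = 100) -> List[Optional[Vector]]:
    """Embed texts in chunks; rows without usable text map to None."""
    result: List[Optional[Vector]] = [None] * len(texts)
    # Positions of rows that contain a non-empty string
    valid = [i for i, t in enumerate(texts) if isinstance(t, str) and t.strip()]
    for start in range(0, len(valid), batch_size):
        chunk = valid[start:start + batch_size]
        vectors = embed_batch([texts[i] for i in chunk])  # one request per chunk
        for i, vec in zip(chunk, vectors):
            result[i] = list(vec)
    return result


def load_or_embed(cache_path: str, texts: Sequence[object],
                  embed_batch: EmbedBatchFn,
                  batch_size: int = 100) -> List[Optional[Vector]]:
    """Reuse embeddings cached in a JSON file; otherwise compute and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    result = embed_texts_batched(texts, embed_batch, batch_size)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(result, f)  # None is stored as JSON null and read back as None
    return result
```

With the notebook's client, `embed_batch` could be `lambda batch: [d.embedding for d in client.embeddings.create(model=model_name, input=batch).data]`, and the call `load_or_embed("recipe_embeddings.json", df["receipt"].tolist(), embed_batch)` would replace the loop. This simple cache does not notice if the recipe texts change, so delete the file whenever the data or the model changes.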


In [9]:
# After the loop, assign the embeddings list to a new DataFrame column.
# This works because `embeddings` has exactly one entry per row of `df`
# (None where the text was skipped).
df["embedding"] = embeddings

# Show first few rows to verify
print("df.head() AFTER adding embeddings:")
df.head()

df.head() AFTER adding embeddings:


| | title | receipt | embedding |
|---|---|---|---|
| 0 | BROTH OR SOUP STOCK | (Brodo) To obtain good broth the meat must be ... | [0.00074602389940992, -0.03424371778964996, -0... |
| 1 | BREAD SOUP | (Panata) This excellent and nutritious soup is... | [0.01498448383063078, -0.008606121875345707, 0... |
| 2 | GNOCCHI | This is an excellent soup, but as it requires ... | [-0.003438756102696061, -0.004649834707379341,... |
| 3 | VEGETABLE SOUP | (Zuppa Sante) Any kind of vegetables may be us... | [-0.016981083899736404, 0.001846199156716466, ... |
| 4 | QUEEN'S SOUP | (Zuppa Regina) This is made with the white mea... | [0.014747325330972672, 0.007032071240246296, 0... |


In [10]:
type(df['embedding'][0])

list

In [6]:
len(df['embedding'][0])

1536
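Each cell of the `embedding` column is a plain Python list of 1536 floats. A minimal sketch, assuming you want to compare all recipes at once with NumPy, stacks that column into a single 2-D array plus a boolean mask marking rows that were skipped (`embeddings_to_matrix` is a hypothetical helper, not a pandas or NumPy function).

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def embeddings_to_matrix(
    column: Sequence[Optional[Sequence[float]]],
) -> Tuple[np.ndarray, np.ndarray]:
    """Stack per-row embedding lists into one 2-D float32 array.

    Rows whose embedding is None become all-zero rows and are marked False
    in the returned boolean mask, so they can be excluded from the search.
    """
    # Dimension taken from the first non-missing vector (0 if all are missing)
    dim = next((len(v) for v in column if v is not None), 0)
    matrix = np.zeros((len(column), dim), dtype=np.float32)
    mask = np.zeros(len(column), dtype=bool)
    for i, vec in enumerate(column):
        if vec is not None:
            matrix[i] = vec
            mask[i] = True
    return matrix, mask


matrix, mask = embeddings_to_matrix([[0.1, 0.2, 0.3], None, [0.4, 0.5, 0.6]])
print(matrix.shape, mask)
```

In the notebook it would be called as `embeddings_to_matrix(df["embedding"].tolist())`, giving one row per recipe and 1536 columns; a single matrix-vector product can then replace the per-row Python loop used for scoring.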

## 3. Now we need some user input...

In [7]:
user_text = """
Hi! I’d like to cook a good Italian dish for lunch! I have potatoes, carrots, 
rosemary, and pork. Can you recommend a recipe and help me a bit with 
preparation tips?
"""

... and of course we need an embedding of `user_text` as well:

In [8]:
resp = client.embeddings.create(
    model=model_name,
    input=[user_text]
)
user_query = resp.data[0].embedding

print(type(user_query))
print(len(user_query))

<class 'list'>
1536


## 4. Find the most suitable examples that match the user input

In [9]:
# scipy has a function to compute cosine distance: cosine()
from scipy.spatial.distance import cosine

# Compute similarity scores: similarity = 1 − cosine_distance
scores = []
for emb in df["embedding"]:
    # Note: converting distance to similarity is not required for ranking.
    # Skipping the None rows, computing cosine(np.array(emb), np.array(user_query))
    # directly and sorting by the smallest distance would select the same recipes,
    # since a smaller cosine distance means a higher similarity. Using similarity
    # makes "higher = better" and lets -1.0 (the lowest possible cosine similarity)
    # mark rows that have no embedding.
    if emb is None:
        scores.append(-1.0)
    else:
        scores.append(1.0 - cosine(np.array(emb), np.array(user_query)))

# Get the indices of the 5 highest scores
top5 = np.argsort(scores)[-5:]
# N.B. np.argsort(scores) returns an array of indices that would
# sort scores in ascending order.
# [-5:] takes the last 5 indices from that sorted-indices array.
# Since the full array is in ascending order, its last 5 indices correspond to
# the 5 highest scores, ordered from the 5th best match to the best one
# (so the most similar recipe is printed last).

# Build a single output string with titles and recipes
output_lines = []
for i in top5:
    title = df.iloc[i]["title"]
    recipe = df.iloc[i]["receipt"]
    output_lines.append(f"{title}:\n{recipe}")
prompt_recipes = "\n\n".join(output_lines)

print(prompt_recipes)

LOIN OF PORK ROASTED:
(Lombo di maiale arrosto) The loin of pork, cut in little pieces forms an excellent roast at the spit. The pieces of pork are to be divided by little pieces of toast and greased with oil. If the pork is to be baked, choose that piece of the loin that has its ribs and that may weigh six or eight pounds. Lard it with garlic, rosemary or bay leaf and a few cloves, but moderately, and season with salt and pepper. This roast is very popular in Italy, where they call it =arista=.

POT ROAST WITH GARLIC AND ROSEMARY:
(Arrosto morto coll'odore dell'aglio e del ramerino) Cook the meat as above, but add a clove of garlic and one or two bunches of rosemary in the saucepan. When serving the roast rub the gravy through a sieve without pressing and surround the meat with potatoes or vegetables cooked apart. The leg of lamb comes very well in this way, baked in the oven.

LAMB WITH PEAS:
(Agnello ai piselli) Take a piece of lamb from the hind side, lard it with two cloves of gar

The **cosine similarity** of two vectors $\mathbf{a}$ and $\mathbf{b}$ is the cosine of the angle $\theta$ between them:

$$
\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\;\|\mathbf{b}\|}
= \frac{\sum_{i=1}^n a_i\,b_i}{\sqrt{\sum_{i=1}^n a_i^2}\;\sqrt{\sum_{i=1}^n b_i^2}}
$$

The **cosine distance**, which is what `scipy.spatial.distance.cosine()` returns, is defined as:

$$
d_{\text{cos}}(\mathbf{a},\mathbf{b}) = 1 - \cos\theta
$$

In text and embedding applications, a higher cosine similarity (equivalently, a lower cosine distance) means the vectors are more semantically aligned.
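The two formulas translate directly into NumPy. A minimal sketch, assuming the recipe embeddings are stacked into a 2-D array and none of the vectors is all zeros (a zero norm would make the division undefined); `cosine_similarity` and `cosine_distance` are illustrative helpers here, not SciPy functions. The second part checks the point made in the scoring loop's comment: ranking by ascending distance picks the same items as ranking by descending similarity.

```python
import numpy as np


def cosine_similarity(matrix: np.ndarray, query: np.ndarray) -> np.ndarray:
    """cos(theta) = (a . b) / (||a|| ||b||) for every row a of `matrix` and b = `query`."""
    dots = matrix @ query
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return dots / norms


def cosine_distance(matrix: np.ndarray, query: np.ndarray) -> np.ndarray:
    """d_cos = 1 - cos(theta), the same quantity scipy's cosine() returns per pair."""
    return 1.0 - cosine_similarity(matrix, query)


# Worked example: a = (1, 0) and b = (1, 1) are 45 degrees apart,
# so cos(theta) = 1 / sqrt(2) ~ 0.7071 and d_cos ~ 0.2929.
a = np.array([[1.0, 0.0]])
b = np.array([1.0, 1.0])
print(cosine_similarity(a, b), cosine_distance(a, b))

# Ranking check on random vectors: top 5 by similarity (descending)
# equals top 5 by distance (ascending), because 1 - x reverses the order.
rng = np.random.default_rng(0)
docs = rng.normal(size=(10, 4))
q = rng.normal(size=4)
top_by_sim = np.argsort(cosine_similarity(docs, q))[::-1][:5]
top_by_dist = np.argsort(cosine_distance(docs, q))[:5]
print(top_by_sim, top_by_dist)
```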



## 5. Finally, use an LLM to shape the final response

In [11]:
prompt = f"""
You are a helpful Italian cooking assistant.  
Here are some recipe examples I found that may or may not be relevant to the user's request:

{prompt_recipes}

User’s question: "{user_text}"

From the examples above:
1. Determine which recipes are *relevant* to what the user asked and which are not.
2. Discard or ignore irrelevant ones, and focus on relevant ones.
3. For each relevant example, rephrase the recipe in a more narrative, 
conversational style, adding cooking tips, alternative ingredients, variations, 
or suggestions.
4. Then produce a final response to the user: a narrative that weaves 
together those enhanced recipes (titles + steps + tips) in an engaging way.
5. Don't forget to use the original titles of the recipes.
6. Advise on more than one recipe if more than one is relevant!

Do not just list recipes — tell a story, connect to the user's question, 
and use the examples as inspirations, but enhance them.  
Make sure your response is clear, helpful, and focused on what the user wants.
"""

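The prompt above repeats the role description that the chat call also sends as its system message, and `user_text` carries leading and trailing newlines into the quoted question. A minimal sketch of an alternative layout, assuming you prefer the role to live only in the system message: `build_messages` and `SYSTEM_PROMPT` are hypothetical names, and the returned list has the `role`/`content` shape that the Chat Completions `messages` parameter expects.

```python
from typing import Dict, List

# Assumption: the assistant's role is stated once, in the system message only
SYSTEM_PROMPT = "You are a helpful Italian cooking assistant."


def build_messages(recipes_block: str, question: str,
                   instructions: str) -> List[Dict[str, str]]:
    """Assemble a chat `messages` list.

    The system message carries the role; the user message carries the
    retrieved recipes, the whitespace-stripped question and the task.
    """
    user_content = (
        "Here are some recipe examples I found that may or may not be relevant "
        "to the user's request:\n\n"
        f"{recipes_block}\n\n"
        f"User's question: \"{question.strip()}\"\n\n"
        f"{instructions.strip()}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


messages = build_messages("LOIN OF PORK ROASTED:\n...", "\nWhat can I cook?\n",
                          "Determine which recipes are relevant.")
print(messages[1]["content"])
```

In the notebook, `build_messages(prompt_recipes, user_text, ...)` with the numbered instructions as the third argument would produce a list that can be passed as `messages=` to `client.chat.completions.create(...)`.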

In [29]:
response = client.chat.completions.create(
    model="gpt-4",    # or whichever model you prefer
    messages=[
        {"role": "system", "content": "You are a helpful Italian cooking assistant."},
        {"role": "user", "content": prompt}
    ],
    temperature=1,
    max_tokens=5000
)

reply_text = response.choices[0].message.content
print(reply_text)

Buongiorno! Since you’ve got potatoes, carrots, rosemary, and pork, we can prepare an enticing Italian dish just using these ingredients. Thinking about what we have at hand, two recipes come to mind that could bring the Italian countryside's flavor into your kitchen.

First, we can take inspiration from the "Loin of Pork Roasted" recipe. You don't have a whole loin, but don't worry, any cut of pork will do. Grab your pork and cut it into small portions, about an inch thick – this will ensure the meat cooks evenly and helps to absorb more flavors. Sprinkle each piece with salt, pepper, and finely chopped rosemary. Let this marinate for a while to let the flavors settle in. While this is happening, we can prepare the potatoes and carrots into bite-sized pieces - this will be our rustic side dish.

Put a drizzle of olive oil in a roasting pan and lay your pork pieces in, making sure not to crowd the pan. We want the meat to roast, not steam. Arrange your potato and carrot pieces around t