# GenAI/RAG in Python 2025

## Session 03

## (1) Postgres+pgvector Vector Storage
## (2) Function Calling

In [4]:
import os
import numpy as np
import pandas as pd
from openai import OpenAI

## 1. Postgres + pgvector as our vector storage

Pull the Docker image:

```
docker pull pgvector/pgvector:pg16
```

Run the container:

```
docker run --name ragdb \
  -e POSTGRES_USER=raguser \
  -e POSTGRES_PASSWORD=ragpass \
  -e POSTGRES_DB=ragdb \
  -p 5432:5432 \
  -d pgvector/pgvector:pg16
```

| Option                 | Meaning                                                 |
| ---------------------- | ------------------------------------------------------- |
| `--name ragdb`         | names the container (handy for stopping/starting later) |
| `-e POSTGRES_USER`     | creates a default Postgres user                         |
| `-e POSTGRES_PASSWORD` | sets that user’s password                               |
| `-e POSTGRES_DB`       | creates a database on startup                           |
| `-p 5432:5432`         | exposes port 5432 on localhost                          |
| `-d`                   | runs in detached (background) mode                      |


Postgres + pgvector is now running locally on port `5432`.

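Verify by opening a `psql` session inside the container:

```
docker exec -it ragdb psql -U raguser -d ragdb
```

Inside `psql`, enable the extension and list the installed extensions (`vector` should appear in the `\dx` output):

```
CREATE EXTENSION IF NOT EXISTS vector;
\dx
```

A quick sanity check:

```
SELECT 'pgvector ready' AS status;
```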

Stopping and cleaning up:

```
docker stop ragdb         # stop the container
docker start ragdb        # restart it
docker rm -f ragdb        # remove it completely
docker volume rm pgdata   # remove stored data (optional)
```

In [5]:
df = pd.read_csv("_data/italian_recipes_embedded.csv")

In [7]:
df

|     | Unnamed: 0 | title | receipt | embedding |
| --- | --- | --- | --- | --- |
| 0 | 0 | BROTH OR SOUP STOCK | (Brodo) To obtain good broth the meat must be ... | [0.0007909321575425565, -0.03435778617858887, ... |
| 1 | 1 | BREAD SOUP | (Panata) This excellent and nutritious soup is... | [0.01498448383063078, -0.008606121875345707, 0... |
| 2 | 2 | GNOCCHI | This is an excellent soup, but as it requires ... | [-0.003453409532085061, -0.004623207729309797,... |
| 3 | 3 | VEGETABLE SOUP | (Zuppa Sante) Any kind of vegetables may be us... | [-0.016981083899736404, 0.001846199156716466, ... |
| 4 | 4 | QUEEN'S SOUP | (Zuppa Regina) This is made with the white mea... | [0.014747325330972672, 0.007032071240246296, 0... |
| ... | ... | ... | ... | ... |
| 215 | 215 | LEMON ICE | (Gelato di limone) Granulated sugar, 3/4 lb. W... | [0.011548703536391258, -0.034989822655916214, ... |
| 216 | 216 | STRAWBERRY ICE | (Gelato di fragola) Ripe strawberries, 3/4 lb.... | [0.014095728285610676, -0.019511036574840546, ... |
| 217 | 217 | ORANGE ICE | (Gelato di aranci) Four big oranges. One lemon... | [-0.012138275429606438, -0.025227639824151993,... |
| 218 | 218 | PISTACHE ICE CREAM | (Gelato di pistacchi) Milk, one quart. Sugar, ... | [0.009850133210420609, -0.06877513974905014, -... |


Next, we migrate `df` into Postgres.

In [9]:
df.dtypes

Unnamed: 0     int64
title         object
receipt       object
embedding     object
dtype: object

Note: `embedding` has dtype `object`; each value is actually a string containing the list's text, not a list of floats.
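Why is it a string? CSV has no list type, so a column holding lists is written out as each list's text form (pandas `to_csv` does this, which is presumably how this file was produced), and reading the file back yields that text. A minimal stdlib-only sketch of the round trip, using a made-up two-element toy vector rather than a real 1536-dimensional embedding, plus two ways to parse the text back:

```python
import ast
import csv
import io
import json

# Toy embedding (made-up values) standing in for a real embedding
row = {"title": "TOY RECIPE", "embedding": [0.25, -0.5]}

# Write to an in-memory CSV: the csv module stores str() of the list
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "embedding"])
writer.writeheader()
writer.writerow(row)

# Read it back: every field comes back as a plain string
buf.seek(0)
loaded = next(csv.DictReader(buf))
print(type(loaded["embedding"]), loaded["embedding"])

# Two ways to turn the text back into a list of floats
via_ast = ast.literal_eval(loaded["embedding"])  # safely parses a Python literal
via_json = json.loads(loaded["embedding"])       # works because this text is also valid JSON
print(via_ast == via_json == [0.25, -0.5])
```

`json.loads` is a common alternative to `ast.literal_eval` here; both avoid the unsafe `eval`.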

In [10]:
import ast
import psycopg
from psycopg.rows import dict_row

In [14]:
# --- 1. Parse string embeddings into numpy arrays ---
df['embedding_vector'] = df['embedding'].apply(
    lambda x: np.array(ast.literal_eval(x), dtype=np.float32)
)
df

|     | Unnamed: 0 | title | receipt | embedding | embedding_vector |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | BROTH OR SOUP STOCK | (Brodo) To obtain good broth the meat must be ... | [0.0007909321575425565, -0.03435778617858887, ... | [0.00079093216, -0.034357786, -0.00049442815, ... |
| 1 | 1 | BREAD SOUP | (Panata) This excellent and nutritious soup is... | [0.01498448383063078, -0.008606121875345707, 0... | [0.014984484, -0.008606122, 0.0067268386, -0.0... |
| 2 | 2 | GNOCCHI | This is an excellent soup, but as it requires ... | [-0.003453409532085061, -0.004623207729309797,... | [-0.0034534095, -0.0046232077, -0.0049738525, ... |
| 3 | 3 | VEGETABLE SOUP | (Zuppa Sante) Any kind of vegetables may be us... | [-0.016981083899736404, 0.001846199156716466, ... | [-0.016981084, 0.0018461992, 0.023807365, 0.00... |
| 4 | 4 | QUEEN'S SOUP | (Zuppa Regina) This is made with the white mea... | [0.014747325330972672, 0.007032071240246296, 0... | [0.014747325, 0.0070320712, 0.03286345, 0.0097... |
| ... | ... | ... | ... | ... | ... |
| 215 | 215 | LEMON ICE | (Gelato di limone) Granulated sugar, 3/4 lb. W... | [0.011548703536391258, -0.034989822655916214, ... | [0.011548704, -0.034989823, -0.003018932, -0.0... |
| 216 | 216 | STRAWBERRY ICE | (Gelato di fragola) Ripe strawberries, 3/4 lb.... | [0.014095728285610676, -0.019511036574840546, ... | [0.014095728, -0.019511037, -0.014948981, -0.0... |
| 217 | 217 | ORANGE ICE | (Gelato di aranci) Four big oranges. One lemon... | [-0.012138275429606438, -0.025227639824151993,... | [-0.012138275, -0.02522764, 0.01355302, -0.011... |
| 218 | 218 | PISTACHE ICE CREAM | (Gelato di pistacchi) Milk, one quart. Sugar, ... | [0.009850133210420609, -0.06877513974905014, -... | [0.009850133, -0.06877514, -0.0041708536, -0.0... |


Why is this parsing step needed? Look at a raw value:

In [15]:
df['embedding'][0]

'[0.0007909321575425565, -0.03435778617858887, -0.0004944281536154449, 0.003790360875427723, 0.00041151398909278214, -0.020247863605618477, -0.052820514887571335, -0.029638176783919334, 0.011994658038020134, -0.10407597571611404, 0.015210351906716824, -0.03812369331717491, -0.0043864259496331215, 0.03726780787110329, -0.009598172269761562, 0.0026303271297365427, 0.06739506125450134, 0.00019200165115762502, -0.036778729408979416, 0.0037200557999312878, 0.04343020170927048, -0.031863484531641006, -0.03582502529025078, 0.02606790140271187, 0.01979546621441841, -0.016200736165046692, -0.006541429553180933, 0.015674976631999016, 0.00601566955447197, -0.025578822940587997, -0.03279273584485054, -0.011499466374516487, -0.025016382336616516, -0.06030341982841492, -0.024588437750935555, 0.015723884105682373, 0.04237867891788483, 0.01259377971291542, -0.027706315740942955, -0.01307063177227974, 0.0016032615676522255, 0.017826924100518227, 0.007935304194688797, -0.026532527059316635, -0.013755341

In [16]:
type(df['embedding'][0])

str

In [17]:
list(df['embedding'][0])

['[', '0', '.', '0', '0', '0', '7', '9', '0', '9', '3', '2', '1', '5', '7', '5', '4', '2', '5', '5', '6', '5', ',', ' ',
 '-', '0', '.', '0', '3', '4', '3', '5', '7', '7', '8', '6', '1', '7', '8', '5', '8', '8', '8', '7', ',', ' ',
 '-', '0', '.', '0', '0', '0', '4', '9', '4', '4', '2', '8', '1', '5', '3', '6', '1', '5', '4', '4', '4', '9', ',', ' ',
 '0', '.', '0', '0', '3', '7', '9', '0', '3', '6', '0', '8', '7', '5', '4', '2', '7', '7', '2', '3', ',', ' ',
 '0', '.', '0', '0', '0', '4', '1', '1', '5', '1', '3', '9', '8', '9', '0', '9', '2', '7', '8', '2', '1', '4', ',', ' ',
 '-', '0', '.', '0', '2', '0', '2', '4', '7', '8', '6', '3', '6', '0', '5', '6', '1', '8', '4', '7', '7', ',', ' ',
 '-', '0', '.', '0', '5', '2', '8', '2', '0', '5', '1', '4', '8', '8', '7', '5', '7', '1', '3', '3', '5', ',', ' ',
 '-', '0', '.', '0', '2'

Nope: calling `list()` on a string just splits it into single characters.

In [18]:
np.array(ast.literal_eval(df['embedding'][0]), dtype=np.float32)

array([ 0.00079093, -0.03435779, -0.00049443, ..., -0.00139005,
       -0.01881731, -0.00839382], shape=(1536,), dtype=float32)

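This works because `ast.literal_eval` safely evaluates the string as a Python literal and returns a real list of floats: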

In [19]:
ast.literal_eval(df['embedding'][0])

[0.0007909321575425565,
 -0.03435778617858887,
 -0.0004944281536154449,
 0.003790360875427723,
 0.00041151398909278214,
 -0.020247863605618477,
 -0.052820514887571335,
 -0.029638176783919334,
 0.011994658038020134,
 -0.10407597571611404,
 0.015210351906716824,
 -0.03812369331717491,
 -0.0043864259496331215,
 0.03726780787110329,
 -0.009598172269761562,
 0.0026303271297365427,
 0.06739506125450134,
 0.00019200165115762502,
 -0.036778729408979416,
 0.0037200557999312878,
 0.04343020170927048,
 -0.031863484531641006,
 -0.03582502529025078,
 0.02606790140271187,
 0.01979546621441841,
 -0.016200736165046692,
 -0.006541429553180933,
 0.015674976631999016,
 0.00601566955447197,
 -0.025578822940587997,
 -0.03279273584485054,
 -0.011499466374516487,
 -0.025016382336616516,
 -0.06030341982841492,
 -0.024588437750935555,
 0.015723884105682373,
 0.04237867891788483,
 0.01259377971291542,
 -0.027706315740942955,
 -0.01307063177227974,
 0.0016032615676522255,
 0.017826924100518227,
 0.00793530419468

In [20]:
type(ast.literal_eval(df['embedding'][0]))

list

Connect to Postgres and register pgvector's type adapter, then create the table:

In [33]:
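# --- 1. Connect to Postgres ---
from pgvector.psycopg import register_vector
conn = psycopg.connect(
    host="localhost",
    dbname="ragdb",
    user="raguser",
    password="ragpass",
    port=5432
)

# register_vector() looks up the `vector` type, so the extension must exist first
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

cur = conn.cursor()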

Drop the `receipts` table in case this notebook was run before:

In [22]:
cur.execute("DROP TABLE IF EXISTS receipts;")
conn.commit()

In [23]:
# --- 2. Create table if not exists ---

# - dimensionality
dim = len(df['embedding_vector'][0])

# - create:
create_table_sql = f"""
CREATE TABLE IF NOT EXISTS receipts (
    id SERIAL PRIMARY KEY,
    title TEXT,
    receipt TEXT,
    embedding VECTOR({dim})
);
"""
cur.execute(create_table_sql)
conn.commit()
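`VECTOR({dim})` fixes the column to a single dimensionality, and pgvector rejects vectors of any other length on insert. Taking `dim` from the first row assumes every row has the same length. A small sketch of checking that up front; `common_dimension` is a hypothetical helper, not part of pgvector or pandas:

```python
from typing import Sequence

def common_dimension(vectors: Sequence[Sequence[float]]) -> int:
    """Return the shared length of all vectors, or raise if the lengths differ."""
    if not vectors:
        raise ValueError("no vectors given")
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent embedding lengths: {sorted(lengths)}")
    return lengths.pop()

# Usage with toy vectors (made-up values)
print(common_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))  # 3

try:
    common_dimension([[0.1, 0.2], [0.3]])
except ValueError as err:
    print(err)  # inconsistent embedding lengths: [1, 2]
```

In the notebook this would replace the first-row lookup with something like `dim = common_dimension(df['embedding_vector'].tolist())`.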

Transfer:

In [24]:
for _, row in df.iterrows():
    cur.execute(
        "INSERT INTO receipts (title, receipt, embedding) VALUES (%s, %s, %s)",
        (row['title'], row['receipt'], row['embedding_vector'].tolist())
    )

conn.commit()
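The loop above hands psycopg a Python list and relies on the driver plus pgvector to turn it into a `vector`. pgvector also accepts vectors as text in the form `'[1,2,3]'`, which is handy when no adapter is registered (plain SQL, another driver). A sketch of building that text; `to_vector_literal` is a hypothetical helper:

```python
import math
from typing import Iterable

def to_vector_literal(values: Iterable[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.25,-0.5]'."""
    parts = []
    for v in values:
        x = float(v)
        if not math.isfinite(x):
            # pgvector does not accept NaN or infinite values
            raise ValueError(f"non-finite value: {x}")
        parts.append(repr(x))  # repr keeps full float precision
    return "[" + ",".join(parts) + "]"

print(to_vector_literal([0.25, -0.5, 1]))  # [0.25,-0.5,1.0]
```

Such a string would be passed as a normal parameter with an explicit cast, e.g. `%s::vector` in the SQL.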

Check table:

In [25]:
# - a simple query to test
cur.execute("""
    SELECT id, title, receipt, embedding
    FROM receipts
    LIMIT 5;
""")

# - fetch results
rows = cur.fetchall()
colnames = [desc[0] for desc in cur.description]

# DataFrame
df_test = pd.DataFrame(rows, columns=colnames)
df_test

|     | id | title | receipt | embedding |
| --- | --- | --- | --- | --- |
| 0 | 1 | BROTH OR SOUP STOCK | (Brodo) To obtain good broth the meat must be ... | [0.00079093216, -0.034357786, -0.00049442815, ... |
| 1 | 2 | BREAD SOUP | (Panata) This excellent and nutritious soup is... | [0.014984484, -0.008606122, 0.0067268386, -0.0... |
| 2 | 3 | GNOCCHI | This is an excellent soup, but as it requires ... | [-0.0034534095, -0.0046232077, -0.0049738525, ... |
| 3 | 4 | VEGETABLE SOUP | (Zuppa Sante) Any kind of vegetables may be us... | [-0.016981084, 0.0018461992, 0.023807365, 0.00... |
| 4 | 5 | QUEEN'S SOUP | (Zuppa Regina) This is made with the white mea... | [0.014747325, 0.0070320712, 0.03286345, 0.0097... |


### Client:

In [26]:
# Set your API key (ensure OPENAI_API_KEY is set in your environment)
api_key = os.getenv("OPENAI_API_KEY")

# Instantiate the OpenAI client with your API key  
client = OpenAI(api_key=api_key)

In [27]:
user_text = """
Hi! I’d like to cook a good Italian dish for lunch! I have potatoes, carrots, 
rosemary, and pork. Can you recommend a recipe and help me a bit with 
preparation tips?
"""

... and of course we need an embedding of `user_text` as well:

In [28]:
# Select the embedding model to use (as per OpenAI docs)  
model_name = "text-embedding-3-small"   

resp = client.embeddings.create(        
        model=model_name,                   
        input=[user_text]                        
    )
user_query = resp.data[0].embedding

print(type(user_query))
print(len(user_query))

<class 'list'>
1536
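OpenAI's documentation states that its embeddings are normalized to length 1, so cosine similarity equals the plain dot product and both give the same ranking. A quick stdlib check of that identity on toy unit vectors (made-up values, not real embeddings); the same `norm` check could be run on `user_query`:

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

# Two toy unit vectors
a = [0.6, 0.8]
b = [1.0, 0.0]

print(round(norm(a), 6), round(norm(b), 6))               # 1.0 1.0
print(math.isclose(cosine_similarity(a, b), dot(a, b)))  # True
```

For unit vectors this is why pgvector's negative inner product operator `<#>` would rank rows the same way as the cosine distance operator `<=>`.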


In [30]:
print(user_query)

[-0.00511555140838027, -0.06346769630908966, -0.02818438969552517, -0.04233996942639351, -0.006982714403420687, -0.07052435725927353, -0.0485515221953392, 0.005535464733839035, 0.024254633113741875, -0.0622422881424427, -0.036339692771434784, -0.014609824866056442, 0.02005021460354328, 0.011081493459641933, -0.031247910112142563, -0.027550557628273964, -0.012338593602180481, 0.047664158046245575, -0.02562793530523777, 0.022712308913469315, 0.08020085841417313, -0.019712170585989952, -0.028353411704301834, 0.021634794771671295, 0.029008371755480766, 0.035579096525907516, 0.004114625044167042, 0.05970695987343788, -0.0014023530529811978, -0.0012168250977993011, -0.0011382563970983028, -0.008313761092722416, -0.050410762429237366, -0.056326523423194885, 0.010104336775839329, -0.04466401785612106, 0.01663808710873127, -0.0038558105006814003, -0.02307147905230522, 0.015349294990301132, -0.010421251878142357, 0.02528989128768444, -0.001218145596794784, -0.02864919975399971, 0.006037248298525

Find the recipes that best match the user input using cosine distance in Postgres: pgvector's `<=>` operator returns the cosine distance, so `1 - (embedding <=> query)` is the cosine similarity.

In [34]:
# Run the retrieval query using pgvector's cosine-distance operator <=>
# NOTE: %s::vector casts the parameter (a Python list, sent as a float array)
# to pgvector's `vector` type

sql = """
SELECT id, title, receipt, 1 - (embedding <=> %s::vector) AS similarity
FROM receipts
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

cur.execute(sql, (user_query, user_query))

rows = cur.fetchall()
colnames = [desc[0] for desc in cur.description]

# - Load into pandas DataFrame
prompt_recipes = pd.DataFrame(rows, columns=colnames)

# Show results
print(prompt_recipes)

    id                               title  \
0    8                   VEGETABLE CHOWDER   
1   49                         STEWED HARE   
2   74                      LAMB WITH PEAS   
3  125  POT ROAST WITH GARLIC AND ROSEMARY   
4  140                LOIN OF PORK ROASTED   

                                             receipt  similarity  
0  (Minestrone alla Milanese) Cut off the rind of...    0.549674  
1  (Stufato di lepre) Take half of a good sized h...    0.524103  
2  (Agnello ai piselli) Take a piece of lamb from...    0.523721  
3  (Arrosto morto coll'odore dell'aglio e del ram...    0.520807  
4  (Lombo di maiale arrosto) The loin of pork, cu...    0.518094  
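What the query computes: `<=>` gives the cosine distance 1 - cos(a, b), ordering by ascending distance puts the best matches first, and `1 - distance` turns it back into a similarity. A pure-Python sketch of the same top-k retrieval over an in-memory list; the titles and 2-D vectors are toy values chosen for illustration:

```python
import math

def cosine_distance(a, b):
    """Same quantity as pgvector's <=>: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def top_k(query, rows, k=5):
    """Mimic ORDER BY embedding <=> query LIMIT k, returning similarities."""
    scored = [(title, cosine_distance(vec, query)) for title, vec in rows]
    scored.sort(key=lambda item: item[1])                  # ascending distance
    return [(title, 1.0 - d) for title, d in scored[:k]]   # 1 - distance = similarity

rows = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("C", [1.0, 1.0])]
for title, sim in top_k([1.0, 0.2], rows, k=2):
    print(title, round(sim, 3))
```

Postgres does the same work server-side, which matters once the table is too large to pull into Python.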


Nice. Now close the cursor and the connection:

In [35]:
cur.close()
conn.close()

Integrate results into the prompt:

In [36]:
# Build a single output string with titles and recipes
output_lines = []
for _, row in prompt_recipes.iterrows():
    title = row["title"]
    recipe = row["receipt"]
    output_lines.append(f"{title}:\n{recipe}")
prompt_recipes = "\n\n".join(output_lines)
print(prompt_recipes)

VEGETABLE CHOWDER:
(Minestrone alla Milanese) Cut off the rind of 1/2 lb. salt pork and put it into two quarts of water to boil. Cut off a small slice of the pork and beat it to a paste with two or three sprigs of parsley, a little celery and one kernel of garlic. Add this paste to the pork and water. Slice two carrots, cut the rib out of the leaves of 1/4 medium sized cabbage. Add the carrots, cabbage leaves, other vegetables, seasoning and butter to the soup, and let it boil slowly for 2-1/2 hours. The last 1/2 hour add one small handful of rice for each person. When the pork is very soft, remove and slice in little ribbons and put it back. The minestrone is equally good eaten cold.

STEWED HARE:
(Stufato di lepre) Take half of a good sized hare and, after cutting it in pieces, chop fine one medium sized onion, one clove of garlic, a stalk of celery and several leaves of rosemary. Put on the fire with some pieces of butter, two tablespoonfuls of olive oil and four or five strips of b

In [37]:
prompt = f"""
You are a helpful Italian cooking assistant.  
Here are some recipe examples I found that may or may not be relevant to the user's request:

{prompt_recipes}

User’s question: "{user_text}"

From the examples above:
1. Determine which recipes are *relevant* to what the user asked and which are not.
2. Discard or ignore irrelevant ones, and focus on relevant ones.
3. For each relevant example, rephrase the recipe in a more narrative, 
conversational style, adding cooking tips, alternative ingredients, variations, 
or suggestions.
4. Then produce a final response to the user: a narrative that weaves 
together those enhanced recipes (titles + steps + tips) in an engaging way.
5. Don't forget to use the original titles of the recipes.
6. Advise on more than one recipe if more than one is relevant!

Do not just list recipes — tell a story, connect to the user's question, 
and use the examples as inspirations, but enhance them.  
Make sure your response is clear, helpful, and focused on what the user wants.
"""

Run the prompt, sit back, and enjoy:

In [38]:
response = client.chat.completions.create(
    model="gpt-4",    # or whichever model you prefer
    messages=[
        {"role": "system", "content": "You are a helpful Italian cooking assistant."},
        {"role": "user", "content": prompt}
    ],
    temperature=0,
    max_tokens=5000
)

reply_text = response.choices[0].message.content
print(reply_text)

Buongiorno! It sounds like you're in the mood for a hearty Italian meal. With the ingredients you have on hand, I have two delightful recipes in mind that will transport you straight to the heart of Italy. 

The first one is a variation of the "Loin of Pork Roasted" or "Lombo di maiale arrosto". This dish is a classic in Italian households, and it's perfect for a cozy lunch. Here's how you can prepare it:

Start by preheating your oven to 375°F (190°C). While the oven is heating, take your pork loin and season it generously with salt, pepper, and finely chopped rosemary. If you have garlic, you can also add some for an extra kick of flavor. Once your pork is well-seasoned, place it in a roasting pan. 

Next, take your potatoes and carrots, peel them, and cut them into bite-sized pieces. Toss them in a bowl with a drizzle of olive oil, a pinch of salt, and some more chopped rosemary. Once they're well coated, arrange them around the pork in the roasting pan. 

Roast everything in the ov

## 2. Function Calling with OpenAI ChatGPT

OpenAI's function calling feature (exposed through the `tools` parameter of the Chat Completions API) lets the model output structured data (JSON arguments) for calling external functions or APIs. The model decides when one of the tools you provide is needed and returns the tool name plus the arguments it wants to use; it never runs the function itself. Your code executes the function (e.g., calls an external API) and passes the result back to the model. This is useful for retrieving real-time information (like weather or stock prices) or performing computations that the model cannot do reliably on its own.
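The loop described above (the model picks a tool and arguments, your code runs it, the result goes back) boils down to a name-to-function lookup. A minimal sketch with a hypothetical stub tool so it runs offline; the `name` and `arguments` strings mimic the shape of `tool_calls[i].function` in an OpenAI response, where `arguments` is a JSON-encoded string:

```python
import json
from typing import Callable, Dict

def fake_weather(location: str) -> str:
    """Hypothetical stub standing in for a real API call."""
    return f"Weather for {location}: stub data"

# Registry mapping the tool names exposed to the model onto Python callables
TOOLS: Dict[str, Callable[..., str]] = {"get_current_weather": fake_weather}

def dispatch(name: str, arguments: str) -> str:
    """Run the tool the model asked for; `arguments` is a JSON string."""
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    kwargs = json.loads(arguments)
    return TOOLS[name](**kwargs)

print(dispatch("get_current_weather", '{"location": "Paris"}'))
# Weather for Paris: stub data
```

With a registry, adding a tool means adding one schema entry and one dict entry; the dispatch code stays the same.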

### 2.1 The Tool

First, we need an external API that our tool (formerly called a `function` in the OpenAI API) will call. For this educational example, we'll use the `wttr.in` weather API. It's a free API (no API key needed) that returns current weather information for a given location in JSON format.

We'll write a Python function `get_current_weather(location)` that calls this API. This function will serve as the "tool" that ChatGPT can use via function calling. In a real application, this function might call any external service or perform some calculation. Here, it will fetch weather data and return a simple text summary.

In [39]:
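import requests
from urllib.parse import quote

def get_current_weather(location: str) -> str:
    """
    Fetches the current weather in Celsius for a given location using wttr.in.
    Returns a simple summary string.
    """
    # URL-encode the location so names like "New York" form a valid path
    url = f"https://wttr.in/{quote(location)}?format=j1"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    data = resp.json()
    current = data["current_condition"][0]
    temp_c = current["temp_C"]
    desc = current["weatherDesc"][0]["value"]
    return f"Temperature: {temp_c}°C, Condition: {desc}"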

In [41]:
get_current_weather("New York")

'Temperature: 13°C, Condition: Sunny'
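To test the parsing logic without hitting the network, it can be split into a pure function that takes the already-decoded JSON. The sketch below (hypothetical `summarize_wttr`) reads the same keys as `get_current_weather`; the sample dict is hand-made with only those keys and illustrative values, not real wttr.in output:

```python
def summarize_wttr(data: dict) -> str:
    """Build the summary string from a decoded wttr.in ?format=j1 payload."""
    current = data["current_condition"][0]
    temp_c = current["temp_C"]
    desc = current["weatherDesc"][0]["value"]
    return f"Temperature: {temp_c}°C, Condition: {desc}"

# Hand-made payload containing only the fields read above (illustrative values)
sample = {"current_condition": [{"temp_C": "20", "weatherDesc": [{"value": "Clear"}]}]}
print(summarize_wttr(sample))  # Temperature: 20°C, Condition: Clear
```

`get_current_weather` could then end with `return summarize_wttr(data)`, keeping the network call and the parsing separately testable.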

### 2.2 Defining the Function Schema for the OpenAI API

Now that we have a Python function, we need to tell the OpenAI API about it. We do this by providing a tool schema in our API call. The schema includes the tool name, description, and parameters (with types) that the tool accepts. This information helps the model decide when and how to call the function.

We'll prepare a list of tool definitions that follows the OpenAI function calling specification. It will later be passed to the `tools` parameter of the chat completion API call.

In [None]:
# Define the function schema for the OpenAI API
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city (Celsius only)",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Paris or London"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

This schema is how we register the function with the model. Passing `tools=tools` in the API call tells the model that a tool named `get_current_weather` is available and how to call it.

In [66]:
# messages
messages = [
    {"role": "user", 
     "content": "What's the weather in Paris right now?"}
]

# call
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    temperature=0
)

# result
response_message = response.choices[0].message
print(response_message)

ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='call_IPAALPF57DLiTymFqC422YBR', function=Function(arguments='{"location":"Paris"}', name='get_current_weather'), type='function')])


In [67]:
resp = response_message.tool_calls[0].function.arguments
print(resp)

{"location":"Paris"}


In [78]:
response_message.tool_calls[0].function

Function(arguments='{"location":"Paris"}', name='get_current_weather')

In [68]:
print(type(resp))
import json
args = json.loads(resp)
print(type(args))

<class 'str'>
<class 'dict'>


In [69]:
args['location']

'Paris'
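The OpenAI API reference cautions that the model does not always generate valid JSON and may produce parameters not defined in the schema, so it is worth validating before calling the tool. A hedged sketch (hypothetical `parse_tool_arguments`) that checks decoded arguments against the `required` list and declared `properties` of a parameters schema shaped like `tools[0]["function"]["parameters"]` above:

```python
import json

# Same shape as the parameters schema defined earlier (repeated so this runs standalone)
schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"],
}

def parse_tool_arguments(raw: str, params: dict) -> dict:
    """Decode model-produced JSON arguments and do basic schema checks."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"arguments are not valid JSON: {err}") from err
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [k for k in params.get("required", []) if k not in args]
    unknown = [k for k in args if k not in params.get("properties", {})]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return args

print(parse_tool_arguments('{"location":"Paris"}', schema))  # {'location': 'Paris'}
```

This only checks key presence, not value types; a JSON Schema validator would be the more thorough option.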

Call the tool with the extracted argument:

In [None]:
weather_conditions = get_current_weather(args['location'])
print(weather_conditions)

Temperature: 11°C, Condition: Partly cloudy


Final response:

In [71]:
messages[0]['content']

"What's the weather in Paris right now?"

In [74]:
prompt = f"""
This is the question the user asked: {messages[0]['content']}

This is the response obtained from a relevant API: {weather_conditions}.

Provide a polite response to the user in English including the info found 
in the API response and include some recommendations.
"""

# messages
messages = [
    {"role": "user", 
     "content": prompt}
]

# call
response = client.chat.completions.create(
    messages = messages,
    model="gpt-4-1106-preview",
    temperature = .25
)

In [76]:
print(response.choices[0].message.content)

Hello! Currently, in Paris, the temperature is a cool 11°C with partly cloudy skies. It's a lovely day for a stroll around the city, perhaps enjoying the sights of the Eiffel Tower or the gardens of Luxembourg. If you're planning to be outdoors, you might want to wear a light jacket to stay comfortable. Enjoy your time in the beautiful city of Paris!
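The cells above answer by sending a brand-new user prompt that embeds the tool output. The conventional tool-calling flow instead keeps the original conversation: it appends the assistant's tool call plus a `role: "tool"` message carrying the result (linked by `tool_call_id`), then calls the API again with the same `tools`. A sketch of building those messages as plain dicts; the call id is a placeholder, not a real value:

```python
import json

def build_followup_messages(history, call_id, name, arguments, result):
    """Append the assistant tool call and the tool result to the conversation."""
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": call_id,
            "type": "function",
            "function": {"name": name, "arguments": arguments},
        }],
    }
    tool_msg = {"role": "tool", "tool_call_id": call_id, "content": result}
    return history + [assistant_msg, tool_msg]

history = [{"role": "user", "content": "What's the weather in Paris right now?"}]
msgs = build_followup_messages(
    history,
    call_id="call_123",  # placeholder id
    name="get_current_weather",
    arguments=json.dumps({"location": "Paris"}),
    result="Temperature: 11°C, Condition: Partly cloudy",
)
print(json.dumps([m["role"] for m in msgs]))  # ["user", "assistant", "tool"]
```

`msgs` would then be sent with `client.chat.completions.create(model=..., messages=msgs, tools=tools)` so the model can phrase the final answer itself.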
