<a href="https://colab.research.google.com/github/jeevanshrestha/GenAi/blob/main/OCR_RAG_with_OPENAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [95]:
#set working directory
import os
os.chdir('/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI')

## RAG and OCR with OpenAI

### OpenAI Setup

In [2]:
!pip install openai pdf2image -q

In [23]:
!apt-get install -y poppler-utils -q

Reading package lists...
Building dependency tree...
Reading state information...
poppler-utils is already the newest version (22.02.0-2ubuntu0.8).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.


In [3]:
from google.colab import userdata
api_key = userdata.get('genai_course')

In [4]:
# Mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [26]:
#Import OpenAI Libraries
from openai import OpenAI
from IPython.display import Markdown, display

#Define the model
MODEL = "gpt-4o-mini"

In [27]:
# Connect to the OpenAI API
client = OpenAI(api_key=api_key)

In [75]:
#Import the libraries
from pdf2image import convert_from_path
import os
import base64
import json
import numpy as np
import pandas as pd

In [8]:
print(os.getcwd())

/content


In [21]:
def pdf_to_images(pdf_path, output_folder):
    # Create the output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)

    # Convert PDF to images
    images = convert_from_path(pdf_path)
    image_paths = []

    for i, image in enumerate(images):
        image_path = os.path.join(output_folder, f"page_{i+1}.jpg")
        image.save(image_path, "JPEG")
        image_paths.append(image_path)

    return image_paths

#### Add the PDF path to convert to images

In [17]:
pdf_name = "Things mother used to make.pdf"
directory = os.path.join(os.getcwd(),'drive','MyDrive','GenAI','RAG','RAG with OpenAI')
pdf_path = os.path.join(directory,pdf_name)
os.path.exists(pdf_path)

True

In [22]:
output_folder = os.path.join(os.getcwd(),'drive','MyDrive','GenAI','RAG','RAG with OpenAI','output')
image_paths = pdf_to_images(pdf_path,output_folder)
for images in image_paths:
  print(images)

/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_1.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_2.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_3.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_4.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_5.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_6.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_7.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_8.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_9.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_10.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_11.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_12.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_13.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_14.jpg
/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/p

##### Encode image to base64

In [29]:
image_path = os.path.join(output_folder,"page_23.jpg")
with open(image_path, "rb") as image_file:
    encoded_image_data = base64.b64encode(image_file.read()).decode('utf-8')
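
The base64 step above can be wrapped in a small helper so that single-page tests and the batch loop build the image payload the same way. This is a minimal sketch: the name `to_data_url` and the `mime_type` parameter are illustrative and not part of the original notebook, and the sample bytes are a stand-in rather than a real JPEG.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Return a data URL that embeds the raw image bytes as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in bytes for illustration; in the notebook these would come from
# open(image_path, "rb").read().
sample_url = to_data_url(b"\xff\xd8\xff\xe0 fake jpeg bytes")
print(sample_url[:23])
```

The returned string has the same `data:image/jpeg;base64,...` form that the notebook passes in the `image_url` field.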


### OpenAI for OCR

In [31]:
SYSTEM_PROMPT = """
Please analyze the content of this image and extract any related recipe information.

"""

In [32]:

user_content = [
    {"type": "text",
     "text": "This is an image of a recipe from the Things Mother Used to Make cookbook."},
    {"type": "image_url",
     "image_url": {
         "url": f"data:image/jpeg;base64,{encoded_image_data}",
         "detail": "high"
     }}
]

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content}
    ]
)

In [35]:
#Define a function to get the gpt response and display in markdown
def get_gpt_response(response):
  return display(Markdown(response.choices[0].message.content))

In [37]:
# Display Output
get_gpt_response(response)

Here are the recipes extracted from the image:

### Bannocks
**Ingredients:**
- 1 Cupful of Thick Sour Milk
- ½ Cupful of Sugar
- 1 Egg
- 2 Cupfuls of Flour
- ½ Cupful of Indian Meal
- 1 Teaspoonful of Soda
- A pinch of Salt

**Instructions:**
1. Make the mixture stiff enough to drop from a spoon.
2. Drop mixture, size of a walnut, into boiling fat.
3. Serve warm, with maple syrup.

---

### Boston Brown Bread
**Ingredients:**
- 1 Cupful of Rye Meal
- 1 Cupful of Graham Meal
- ½ Cupful of Flour
- 1 Cupful of Indian Meal
- 1 Cupful of Sweet Milk
- 1 Cupful of Sour Milk
- 1 Cupful of Molasses
- ½ Teaspoonful of Salt
- 1 Heaping Teaspoonful of Soda

**Instructions:**
1. Stir the meals and salt together.
2. Beat the soda into the molasses until it foams; add sour milk, mix all together, and pour into a tin pail which has been well greased, if you have no brown-bread steamer.

In [38]:
# Define improved system prompt

system_prompt2 = """
Please analyze the content of this image and extract any related recipe information into structured components.
Specifically, extract the recipe title, list of ingredients, step-by-step instructions, cuisine type, dish type, and any relevant tags or metadata.
The output must be formatted in a way suited for embedding in a Retrieval Augmented Generation (RAG) system.
"""

In [39]:
# Call the api to extract the information
response = client.chat.completions.create(
    model = MODEL,
    messages = [
        # Provide the system prompt
        {"role": "system", "content": system_prompt2},

         # The user message contains both the text and image URL / path
        {"role": "user", "content": user_content}
    ],
    temperature = 0, # Temperature 0 makes the output more repeatable (not strictly deterministic)
)

In [40]:
get_gpt_response(response)

Here’s the structured information extracted from the recipe image:

### Recipe Title
- **Breads**

### Dishes
- **Bannocks**
- **Boston Brown Bread**

### Ingredients

#### Bannocks
- 1 Cupful of Thick Sour Milk
- ½ Cupful of Sugar
- 1 Egg
- 2 Cupfuls of Flour
- ½ Cupful of Indian Meal
- 1 Teaspoonful of Soda
- A pinch of Salt

#### Boston Brown Bread
- 1 Cupful of Rye Meal
- 1 Cupful of Graham Meal
- ½ Cupful of Flour
- 1 Cupful of Indian Meal
- 1 Cupful of Sweet Milk
- 1 Cupful of Sour Milk
- 1 Cupful of Molasses
- ½ Teaspoonful of Salt
- 1 Heaping Teaspoonful of Soda

### Instructions

#### Bannocks
1. Make the mixture stiff enough to drop from a spoon.
2. Drop mixture, size of a walnut, into boiling fat.
3. Serve warm, with maple syrup.

#### Boston Brown Bread
1. Stir the meals and salt together.
2. Beat the soda into the molasses until it foams.
3. Add sour milk, mix all together, and pour into a tin pail which has been well greased, if you have no brown-bread steamer.

### Cuisine Type
- Traditional/Heritage

### Tags/Metadata
- Breads
- Traditional Recipes
- Comfort Food
- Baking

This format is suitable for embedding in a Retrieval Augmented Generation (RAG) system.
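
The reply above is Markdown organised under `###` headings. If individual fields need to be stored or filtered separately, one option is to split the text on those headings. This is only a sketch: the helper name `split_sections` is hypothetical, and it assumes top-level fields always use exactly three `#` characters, which the model does not guarantee.

```python
import re


def split_sections(markdown_text: str) -> dict[str, str]:
    """Map each '### Heading' to the text beneath it, up to the next '### ' line."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown_text.splitlines():
        # '####' sub-headings do not match and stay inside the parent section.
        match = re.match(r"^###\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {title: body.strip() for title, body in sections.items()}


# Shortened sample in the same shape as the model output above.
sample = (
    "Intro line\n\n"
    "### Recipe Title\n- **Breads**\n\n"
    "### Cuisine Type\n- Traditional/Heritage\n"
)
sections = split_sections(sample)
print(sections)
```

Text before the first heading, such as the introductory sentence, is dropped in this sketch.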

### Data Extraction and Data Processing

In [42]:
import openai
from openai import RateLimitError
import time
def call_openai_with_retry(user_content, model, system_prompt, max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_content}
                ],
                temperature=0,
            )
            return response.choices[0].message.content
        except RateLimitError as e:
            wait_time = 2 ** retries  # exponential backoff
            print(f"Rate limit hit. Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
            retries += 1
    raise Exception("Rate limit exceeded after multiple retries.")
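
The retry loop above can be factored into a reusable helper and tried out without calling the API. The sketch below is a variation rather than the notebook's method: it adds random jitter to each wait and re-raises the original error after the last attempt. The names `retry_with_backoff`, `retry_on`, `base_delay`, and `sleep` are all illustrative.

```python
import random
import time


def retry_with_backoff(fn, retry_on=(Exception,), max_retries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a listed exception wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)


# Demo with a stand-in function that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, len(delays))
```

In the notebook, `fn` would wrap the `client.chat.completions.create(...)` call and `retry_on` would be `(RateLimitError,)`.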

In [44]:
# Extract the info about all the images/recipes

SYSTEM_PROMPT = """
Please analyze the content of this image and extract any related recipe information into structured components.
Specifically, extract the recipe title, list of ingredients, step-by-step instructions, cuisine type, dish type, and any relevant tags or metadata.
The output must be formatted in a way suited for embedding in a Retrieval Augmented Generation (RAG) system.

"""
extracted_recipes = []
for image_path in image_paths:
    print(f'Processing image: {image_path}')

    # Encode image to base64
    with open(image_path, "rb") as image_file:
        encoded_image_data = base64.b64encode(image_file.read()).decode('utf-8')

    user_content = [
        {"type": "text", "text": "This is an image of a recipe from Things Mother Used to Make cookbook."},
        {"type": "image_url", "image_url": {
            "url": f"data:image/jpeg;base64,{encoded_image_data}",
            "detail": "high"
        }}
    ]

    # Call GPT with retry logic
    recipe_info = call_openai_with_retry(user_content, MODEL, SYSTEM_PROMPT)

    extracted_recipes.append({
        "image_path": image_path,
        "recipe_info": recipe_info
    })

    print(f'Extracted info from image: {image_path}, recipes: {recipe_info}')

Processing image: /content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_1.jpg
Extracted info from image: /content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_1.jpg, recipes: I can't extract specific recipe information from the image you provided. If you have the text of the recipe or any details about it, feel free to share, and I can help you organize that information!
Processing image: /content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_2.jpg
Extracted info from image: /content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_2.jpg, recipes: I can't extract specific recipe information from the image you provided. If you have the text of the recipe or any details about it, feel free to share, and I can help you organize that information!
Processing image: /content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_3.jpg
Extracted info from image: /content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_3.jpg, recipes: It seems that the image you prov

KeyboardInterrupt: 

In [45]:
extracted_recipes

[{'image_path': '/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_1.jpg',
  'recipe_info': "I can't extract specific recipe information from the image you provided. If you have the text of the recipe or any details about it, feel free to share, and I can help you organize that information!"},
 {'image_path': '/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_2.jpg',
  'recipe_info': "I can't extract specific recipe information from the image you provided. If you have the text of the recipe or any details about it, feel free to share, and I can help you organize that information!"},
 {'image_path': '/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_3.jpg',
  'recipe_info': 'It seems that the image you provided does not contain any recipe information, such as ingredients or instructions. It appears to be a bibliographic entry for the book "Things Mother Used to Make." If you have another image or specific details about a recipe, feel free to share!'},
 {

In [47]:
# Filter out the non-recipe pages
keywords = ["ingredients", "instructions", "recipe title"]
# Phrases are lowercase because they are matched against the lowercased text
filter_out_keywords = [
    "does not contain any recipe",
    "content provided is an introduction",
    "image contains a foreword",
    "i'm unable to view or analyze images directly",
]
filtered_recipes = [
    recipe for recipe in extracted_recipes
    if any(keyword in recipe['recipe_info'].lower() for keyword in keywords)
    and not any(phrase in recipe['recipe_info'].lower() for phrase in filter_out_keywords)
]

In [64]:

filtered_recipes

[{'image_path': '/content/drive/MyDrive/GenAI/RAG/RAG with OpenAI/output/page_13.jpg',
  'recipe_info': 'Here’s the structured information extracted from the image of the recipe contents:\n\n### Recipe Title\n- Various Breads\n\n### List of Ingredients\n- Not provided in the image.\n\n### Step-by-Step Instructions\n- Not provided in the image.\n\n### Cuisine Type\n- Traditional American\n\n### Dish Type\n- Breads\n\n### Relevant Tags or Metadata\n- Cookbook: Things Mother Used to Make\n- Category: Breads\n- Page Numbers: 1-11 (various recipes listed)\n\n### Recipes Included\n1. Bannocks\n2. Boston Brown Bread\n3. Brown Bread (Baked)\n4. Coffee Cakes\n5. Corn Meal Gems\n6. Cream of Tartar Biscuits\n7. Crullers\n8. Delicious Dip Toast\n9. Doughnuts\n10. Fried Bread\n11. German Toast\n12. Soft Gingerbread\n13. Huckleberry Cake\n14. Quick Graham Bread\n15. Graham Bread (Raised Over Night)\n16. Graham Muffins\n17. Sour Milk Griddle Cakes\n18. Sweet Milk Griddle Cakes\n19. Jenny Lind Tea Cak

In [65]:
# Store the filtered recipes as JSON

output_file = os.path.join(directory, 'receipes.json')
with open(output_file, 'w') as f:
  json.dump(filtered_recipes, f)

### Data Embeddings


In [67]:
EMBEDDING_MODEL = "text-embedding-3-large"
recipe_text = [recipe['recipe_info'] for recipe in filtered_recipes]
embedding_response = client.embeddings.create(
    input=recipe_text,
    model=EMBEDDING_MODEL
)

In [71]:
embedding_response.data[0].embedding

[0.0028593551833182573,
 -0.03267834335565567,
 -0.017896737903356552,
 0.011200731620192528,
 -0.015896335244178772,
 -0.04144347086548805,
 0.007474789395928383,
 0.008917829021811485,
 0.005126988049596548,
 0.05295724421739578,
 0.009543908759951591,
 0.0002951453789137304,
 0.02522646076977253,
 -0.0035846922546625137,
 0.011864988133311272,
 -0.0024566021747887135,
 0.012475797906517982,
 0.00011428831203375012,
 -0.03090699575841427,
 -0.028784429654479027,
 -0.004134421236813068,
 0.012521608732640743,
 0.0332280732691288,
 -0.00956681463867426,
 0.02525700069963932,
 0.006092831492424011,
 -0.024310246109962463,
 -0.009895125404000282,
 -0.006692189257591963,
 0.02400483936071396,
 0.012323095463216305,
 -0.00956681463867426,
 -0.007795465178787708,
 -0.011506136506795883,
 -0.01809525117278099,
 -0.008284113369882107,
 0.010253976099193096,
 0.06523452699184418,
 0.03857266157865524,
 0.01751498132944107,
 0.029563212767243385,
 0.001456400495953858,
 0.008116140030324459,
 0

In [72]:
embeddings = [data.embedding for data in embedding_response.data]
embeddings


In [76]:
# Convert the embeddings to a NumPy array
embedding_matrix = np.array(embeddings)
embedding_matrix

array([[ 0.00285936, -0.03267834, -0.01789674, ..., -0.00692888,
        -0.01795782,  0.01471289],
       [-0.00698836, -0.04248798, -0.0177557 , ..., -0.00600237,
        -0.03507144,  0.01590157],
       [ 0.00824435, -0.0186461 , -0.02428359, ...,  0.00935515,
        -0.04373872,  0.00654925],
       ...,
       [-0.02872093, -0.0297627 , -0.02386465, ...,  0.00389664,
        -0.01411204,  0.0132065 ],
       [ 0.0143513 , -0.03258373, -0.01450625, ...,  0.0010662 ,
        -0.01809224,  0.01565731],
       [ 0.00025989, -0.0331783 , -0.02229303, ..., -0.00095065,
        -0.01940481,  0.00330912]])

In [78]:
# Verify the embedding matrix
print(f'Generated embeddings for {len(filtered_recipes)} recipes')
print(f'Embedding dimension: {len(embeddings[0])}')

Generated embeddings for 10 recipes
Embedding dimension: 3072


## Retrieval System

In [82]:
!pip install langchain langchain_community faiss-cpu -q

In [85]:
import faiss

In [86]:
print(f'Embedding matrix shape: {embedding_matrix.shape} ')

Embedding matrix shape: (10, 3072) 


In [144]:
index = faiss.IndexFlatL2(embedding_matrix.shape[1])
index.add(embedding_matrix)
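
To make the index above less of a black box: as far as the FAISS documentation describes it, `IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances, closest first. The dependency-free sketch below mimics that behaviour on toy 2-D vectors. The function names and toy data are illustrative only, and the code is not a substitute for FAISS at any realistic scale.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k nearest vectors, closest first."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy data: four 2-D vectors and one query.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
toy_distances, toy_indices = flat_l2_search(toy_vectors, [1.0, 1.0], k=3)
print(toy_distances, toy_indices)
```

The position of each vector in the list plays the role of the FAISS index id, which is why the notebook keeps `metadata` in the same order as `embedding_matrix`.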

In [145]:
# Save the index
faiss.write_index(index, 'faiss_index.faiss')

In [163]:
# Save the metadata
metadata = [{"recipe_info":recipe['recipe_info'], "image_path":recipe['image_path']} for recipe in filtered_recipes]

with open("recipe_metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)

In [169]:
query = "How to make apple cake?"
query_embedding = client.embeddings.create(
    input=[query],
    model=EMBEDDING_MODEL
).data[0].embedding
print(f"The query embedding is {query_embedding}\n")
query_vector = np.array(query_embedding).reshape(1, -1)
print(f"The query vector is {query_vector} \n")

The query embedding is [-0.039059195667505264, -0.03422307223081589, -0.019205646589398384, 0.041002899408340454, -0.019587446004152298, -0.0011526279849931598, 0.020200638100504875, 0.008966491557657719, -0.018361061811447144, 0.018731290474534035, 0.018002402037382126, -0.03426935151219368, -0.041627660393714905, 0.025962332263588905, 0.029224978759884834, -0.004882399458438158, 0.034547023475170135, -0.0019118874333798885, -0.014346386305987835, -0.003757249331101775, 0.009643317200243473, 0.0026754853315651417, 0.003453545505180955, 0.038874078541994095, 0.01177792064845562, -0.004324162844568491, 0.00787315797060728, -0.011963034979999065, 0.024226881563663483, 0.032973550260066986, 0.004766702651977539, -0.007531852927058935, -0.02806801162660122, 0.005249736364930868, -0.03371400758624077, 0.03320494294166565, 0.014948009513318539, 0.012240706942975521, 0.029410092160105705, -0.0006822487921454012, -0.056043464690446854, 0.04070208594202995, 0.01111844927072525, 0.03827245533466

In [170]:
# Search the FAISS index
k = 5
distances, indices = index.search(query_vector, min(k, len(metadata)))
print(f"The distances are {distances} \n")
print(f"The indices are {indices} \n")
# IndexFlatL2 reports squared L2 distances, already ordered closest first.
# reverse=True flips that order, so the closest match ends up last.
results = zip(indices.flatten(), distances.flatten())
sorted_results = sorted(results, key=lambda x: x[1], reverse=True)
sorted_results

The distances are [[1.1230092 1.4160175 1.42567   1.4348071 1.451101 ]] 

The indices are [[2 1 9 5 8]] 



[(np.int64(8), np.float32(1.451101)),
 (np.int64(5), np.float32(1.4348071)),
 (np.int64(9), np.float32(1.42567)),
 (np.int64(1), np.float32(1.4160175)),
 (np.int64(2), np.float32(1.1230092))]
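
The distances in the search output above are easier to read as similarities. Assuming the embedding vectors have unit length (OpenAI documents its embeddings as normalized, though it is worth checking the norms of your own matrix), squared L2 distance and cosine similarity are related by d² = 2 − 2·cos. The sketch below applies that conversion to the recorded pairs and orders them closest first. The helper name `l2_squared_to_cosine` is illustrative.

```python
def l2_squared_to_cosine(distance_sq: float) -> float:
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b)."""
    return 1.0 - distance_sq / 2.0


# (index, squared L2 distance) pairs copied from the search output above.
hits = [(2, 1.1230092), (1, 1.4160175), (9, 1.42567), (5, 1.4348071), (8, 1.451101)]

# Ascending distance puts the closest match first.
ranked = sorted(hits, key=lambda pair: pair[1])
for idx, dist in ranked:
    print(idx, round(l2_squared_to_cosine(dist), 3))
```

Under that assumption, even the closest hit for the apple cake query has a cosine similarity below 0.5. That is consistent with the model later reporting that the retrieved pages contain no apple cake recipe.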

In [171]:

# Print the metadata content
results_metadata = []
for i, dist in sorted_results:
  results_metadata.append(metadata[i]['recipe_info'])

results_metadata

['Here’s the structured information extracted from the recipe image:\n\n### Recipe Title\n- Bannocks\n- Boston Brown Bread\n\n### Ingredients\n\n#### Bannocks\n- 1 Cupful of Thick Sour Milk\n- ½ Cupful of Sugar\n- 1 Egg\n- 2 Cupfuls of Flour\n- ½ Cupful of Indian Meal\n- 1 Teaspoonful of Soda\n- A pinch of Salt\n\n#### Boston Brown Bread\n- 1 Cupful of Rye Meal\n- 1 Cupful of Graham Meal\n- ½ Cupful of Flour\n- 1 Cupful of Indian Meal\n- 1 Cupful of Sweet Milk\n- 1 Cupful of Sour Milk\n- 1 Cupful of Molasses\n- ½ Teaspoonful of Salt\n- 1 Heaping Teaspoonful of Soda\n\n### Step-by-Step Instructions\n\n#### Bannocks\n1. Make the mixture stiff enough to drop from a spoon.\n2. Drop mixture, size of a walnut, into boiling fat.\n3. Serve warm, with maple syrup.\n\n#### Boston Brown Bread\n1. Stir the meals and salt together.\n2. Beat the soda into the molasses until it foams.\n3. Add sour milk, mix all together.\n4. Pour into a tin pail which has been well greased, if you have no brown-bread

In [172]:
# Define a function to query the embeddings
def query_embeddings(query, index, metadata, k=5):

  # Generate the embedding for the query
  query_embedding = client.embeddings.create(
    input=[query],
    model=EMBEDDING_MODEL
  ).data[0].embedding
  print(f"The query embedding is {query_embedding}\n")
  query_vector = np.array(query_embedding).reshape(1, -1)
  print(f"The query vector is {query_vector} \n")

  # Search the FAISS index
  distances, indices = index.search(query_vector, min(k, len(metadata)))
  print(f"The distances are {distances} \n")
  print(f"The indices are {indices} \n")

  # Pair each index with its distance; a smaller L2 distance means a closer match,
  # so sort ascending to put the best match first
  results = zip(indices.flatten(), distances.flatten())
  sorted_results = sorted(results, key=lambda x: x[1])

  # Collect the recipe text for each hit (printed for debugging)
  results_metadata = []
  for i, dist in sorted_results:
    results_metadata.append(metadata[i]['recipe_info'])

  print(results_metadata)
  # Return the results
  return results_metadata

In [173]:
combined_context = "\n".join(results_metadata)
SYSTEM_PROMPT3 = f"""
You are a highly experienced chef who specializes in providing cooking advice.
Your main task is to provide precise and accurate information based on the combined content.
Answer the query directly, using only information from the provided context: {combined_context}
If you don't know the answer, just say that you don't know; don't try to make up an answer.
Your goal is to help the user answer the query: {query}
"""

In [174]:
# Define the generate_response function to make the OpenAI API call

def generate_response(query, system_prompt):

  response =  client.chat.completions.create(
    model = MODEL,
    messages = [
        # Provide the system prompt
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ],
    temperature = 0, # Temperature 0 makes the output more repeatable (not strictly deterministic)
  )

  return response



In [175]:

response = generate_response(query, SYSTEM_PROMPT3)

In [176]:
display(Markdown(response.choices[0].message.content))

I'm sorry, but I don't have a specific recipe for apple cake in the provided information. If you have a different recipe or details, I can help with that!

## RAG System


In [155]:
def rag_system(query, index, metadata, k=5):

  # Retrieval system
  results_metadata = query_embeddings(query, index, metadata, k)
  combined_context = "\n".join(results_metadata)
  SYSTEM_PROMPT3 = f"""
  You are a highly experienced chef who specializes in providing cooking advice.
  Your main task is to provide precise and accurate information based on the combined content.
  Answer the query directly, using only information from the provided context: {combined_context}
  If you don't know the answer, just say that you don't know; don't try to make up an answer.
  Your goal is to help the user answer the query: {query}
  """
  response = generate_response(query, SYSTEM_PROMPT3)
  return response
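
One possible variation on the prompt assembly in `rag_system` keeps the query out of the system prompt and wraps the retrieved text in explicit delimiters, so the model can tell instructions apart from context. This is a sketch under assumptions, not the notebook's method. The function name `build_rag_messages`, the `<context>` tags, and the `max_context_chars` cap are all hypothetical choices.

```python
def build_rag_messages(query: str, contexts: list[str],
                       max_context_chars: int = 12000) -> list[dict]:
    """Build chat messages with retrieved text delimited inside the system prompt."""
    # Join the retrieved pages and cap the total length (a crude character-based limit).
    context_block = "\n\n---\n\n".join(contexts)[:max_context_chars]
    system_prompt = (
        "You are an experienced chef who gives cooking advice.\n"
        "Answer using only the recipes between the <context> tags.\n"
        "If the answer is not there, say that you don't know.\n"
        f"<context>\n{context_block}\n</context>"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]


# Stand-in context strings for illustration.
messages = build_rag_messages(
    "How to make bread?",
    ["Bannocks recipe text", "Boston Brown Bread recipe text"],
)
print(messages[0]["content"])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`, in place of the two-message list built inside `generate_response`.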



In [156]:
query1 = "How to make the best chocolate cake?"
response =rag_system(query1, index, metadata, k=5)
display(Markdown(response.choices[0].message.content))

The query embedding is [-0.0026176702231168747, -0.039649564772844315, -0.01333328802138567, 0.007591654546558857, -0.019321225583553314, 0.022386349737644196, 0.00810068380087614, -0.0053721764124929905, -0.00648054713383317, 0.037438295781612396, -0.00040024492773227394, 0.007104518823325634, -0.05175680294632912, -0.0076080746948719025, 0.00033405059366486967, -0.04030637443065643, 0.021138405427336693, -0.004674313589930534, -0.004466323181986809, -0.015851067379117012, 0.01501910574734211, -0.02482750080525875, -0.011702204123139381, 0.034022871404886246, 0.0030596500728279352, 0.006770639214664698, 0.02259434014558792, -0.0006677591009065509, -0.03686905652284622, 0.05679236352443695, 0.04457565397024155, 0.0018924401374533772, -0.016223261132836342, -0.014712593518197536, -0.029118673875927925, 0.01690196804702282, -0.004737257957458496, -0.0019444377394393086, 0.034767258912324905, 0.014417028054594994, -0.005730686243623495, 4.985957275494002e-05, 0.002579356310889125, 0.01078

I'm sorry, but I don't have the information on how to make the best chocolate cake.

In [164]:
query2 = "How to make bread ?"
response =rag_system(query2, index, metadata, k=5)
display(Markdown(response.choices[0].message.content))

The query embedding is [-0.0218611229211092, -0.03577050939202309, -0.007010084111243486, 0.016789736226201057, -0.03975868597626686, -0.038380056619644165, 0.043525297194719315, 0.023867521435022354, 0.008991863578557968, 0.0020571735221892595, -0.0028480389155447483, -0.0124507462605834, -0.015177232213318348, -0.004114347044378519, 0.029812859371304512, -0.019559303298592567, 0.01152140274643898, -0.008425640873610973, -0.013540109619498253, 0.005745314992964268, -0.036287494003772736, 0.02456914447247982, 0.00838871393352747, -0.011718349531292915, 0.008868771605193615, 0.01152755692601204, 0.004828279837965965, -0.012315344996750355, -0.04101422429084778, 0.04190048575401306, -0.0026510918978601694, 0.020039362832903862, -0.03892166167497635, -0.00872106198221445, -0.01027817465364933, 0.010949024930596352, -0.013663201592862606, 0.006443861406296492, -0.0036558296997100115, 0.0168882105499506, -0.018340693786740303, 0.007662471383810043, -0.009822734631597996, -0.0168266631662845

To make bread, you can follow a basic recipe. Here’s a simple method based on traditional bread-making techniques:

### Basic Bread Recipe

#### Ingredients
- 4 cups of all-purpose flour
- 1 packet (2 1/4 teaspoons) of active dry yeast
- 1 1/2 cups of warm water (about 110°F or 43°C)
- 2 tablespoons of sugar
- 1 teaspoon of salt
- 2 tablespoons of olive oil (optional)

#### Instructions
1. **Activate the Yeast**: In a small bowl, combine warm water, sugar, and yeast. Let it sit for about 5-10 minutes until it becomes frothy.
   
2. **Mix Dry Ingredients**: In a large mixing bowl, combine flour and salt.

3. **Combine Ingredients**: Make a well in the center of the flour mixture and add the yeast mixture and olive oil. Mix until a dough forms.

4. **Knead the Dough**: Transfer the dough to a floured surface and knead for about 8-10 minutes until smooth and elastic.

5. **First Rise**: Place the dough in a greased bowl, cover it with a clean cloth, and let it rise in a warm place for about 1-2 hours, or until it has doubled in size.

6. **Shape the Dough**: Punch down the risen dough to release air. Shape it into a loaf or divide it into rolls.

7. **Second Rise**: Place the shaped dough into a greased loaf pan or on a baking sheet. Cover and let it rise again for about 30-60 minutes.

8. **Preheat Oven**: Preheat your oven to 375°F (190°C).

9. **Bake**: Bake the bread for 25-30 minutes, or until it sounds hollow when tapped on the bottom and is golden brown.

10. **Cool**: Remove the bread from the oven and let it cool on a wire rack before slicing.

Enjoy your homemade bread!