# Retrieval-Augmented Generation (RAG)

## Intro and Environment

Large language models usually give great answers, but they are limited to the data they were trained on. Over time that knowledge becomes incomplete or outdated, and the model can generate answers that are just plain wrong, with no source to check them against. Retrieval-Augmented Generation (RAG) addresses both problems, the missing sources and the outdated data, by supplying the model with relevant documents retrieved at query time.

![Alt text](LLM_RAG.png)

Steps in the RAG Pipeline
1. Prompt Input: User provides a query/prompt. Example: "What are the effects of climate change on agriculture in South Asia?"

2. Query Embedding: The prompt is converted into a vector (embedding) using a query encoder. The encoder is often a pre-trained model such as BERT or MiniLM, loaded for example through the sentence-transformers library.

3. Document Retrieval: The query vector is used to search a vector database or index (e.g., FAISS, Weaviate, Elasticsearch) that contains pre-embedded documents. The system retrieves the top-k most relevant documents (e.g., top 5 or top 10), usually via similarity search (e.g., cosine similarity).

4. Context Construction: The retrieved documents are compiled into a context window, formatted as input for the LLM. The context can optionally include source citations or document metadata.

5. Generation (LLM Stage): The LLM receives the original prompt + retrieved context and generates a response. The LLM can now "ground" its answer in actual retrieved data, improving factual accuracy.

6. Output Response: The model returns a final answer, possibly citing or referencing the documents used.

![Alt text](RAG_Process.png)
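
Steps 3 and 4 (retrieval and context construction) can be illustrated without any external service. The following is a minimal sketch that uses hand-made three-dimensional vectors in place of real embeddings; the vectors, the documents, and the helper names `cosine_similarity` and `retrieve_top_k` are illustrative assumptions, not part of the pipeline built below.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (assumed values, not produced by a real encoder)
docs = ["Malbec from Mendoza", "Riesling from Mosel", "Malbec from Cahors"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.0, 0.0]

# Step 3: retrieve the two closest documents
top = retrieve_top_k(query_vec, doc_vecs, k=2)

# Step 4: assemble the retrieved documents into a context string
context = "\n".join(f"[{idx}] {docs[idx]}" for idx in top)
print(context)
```

A real system replaces the toy vectors with encoder output and the linear scan with an indexed search, but the ranking idea is the same.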

Below I will implement a simple RAG pipeline using the `Top Rated Wine` dataset. Let's first set up the environment.

In [1]:
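# !pip install -q langchain "openai<1.0"  # the active cells below use the legacy openai.ChatCompletion interface
# !pip install rich rich-theme-manager pandas python-dotenv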

In [2]:
import rich
from rich.console import Console
from rich_theme_manager import Theme, ThemeManager
import pathlib
from rich.style import Style

THEMES = [
    Theme(
        name="dark",
        description="Dark mode theme",
        tags=["dark"],
        styles={
            "repr.own": Style(color="#e87d3e", bold=True),      # Class names
            "repr.tag_name": "dim cyan",                        # Adjust tag names 
            "repr.call": "bright_yellow",                       # Function calls and other symbols
            "repr.str": "bright_green",                         # String representation
            "repr.number": "bright_red",                        # Numbers
            "repr.none": "dim white",                           # None
            "repr.attrib_name": Style(color="#e87d3e", bold=True),    # Attribute names
            "repr.attrib_value": "bright_blue",                 # Attribute values
            "default": "bright_white on black"                  # Default text and background
        },
    ),
    Theme(
        name="light",
        description="Light mode theme",
        styles={
            "repr.own": Style(color="#22863a", bold=True),          # Class names
            "repr.tag_name": Style(color="#00bfff", bold=True),     # Adjust tag names 
            "repr.call": Style(color="#ffff00", bold=True),         # Function calls and other symbols
            "repr.str": Style(color="#008080", bold=True),          # String representation
            "repr.number": Style(color="#ff6347", bold=True),       # Numbers
            "repr.none": Style(color="#808080", bold=True),         # None
            "repr.attrib_name": Style(color="#ffff00", bold=True),  # Attribute names
            "repr.attrib_value": Style(color="#008080", bold=True), # Attribute values
            "default": Style(color="#000000", bgcolor="#ffffff"),   # Default text and background
        },
    ),
]

# Directory where rich_theme_manager stores the theme files
theme_dir = pathlib.Path("themes").expanduser()
theme_dir.mkdir(parents=True, exist_ok=True)

theme_manager = ThemeManager(theme_dir=theme_dir, themes=THEMES)
theme_manager.list_themes()

dark = theme_manager.get("dark")
theme_manager.preview_theme(dark)

# Create a console with the dark theme
console = Console(theme=dark)

import warnings

# Suppress warnings
warnings.filterwarnings('ignore')

## Loading Data

In [3]:
import pandas as pd
data = pd.read_csv("top_rated_wines.csv")
data.head()

| Unnamed: 0 | name | region | variety | rating | notes |
|---|---|---|---|---|---|
| 0 | 3 Rings Reserve Shiraz 2004 | Barossa Valley, Barossa, South Australia, Aust... | Red Wine | 96.0 | Vintage Comments : Classic Barossa vintage con... |
| 1 | Abreu Vineyards Cappella 2007 | Napa Valley, California | Red Wine | 96.0 | Cappella is a proprietary blend of two clones ... |
| 2 | Abreu Vineyards Cappella 2010 | Napa Valley, California | Red Wine | 98.0 | Cappella is one of the oldest vineyard sites i... |
| 3 | Abreu Vineyards Howell Mountain 2008 | Howell Mountain, Napa Valley, California | Red Wine | 96.0 | When David purchased this Howell Mountain prop... |
| 4 | Abreu Vineyards Howell Mountain 2009 | Howell Mountain, Napa Valley, California | Red Wine | 98.0 | As a set of wines, it is hard to surpass the f... |


In [4]:
data = data.query("variety.notna()").reset_index(drop=True).to_dict(orient="records")
console.print(data[:2])


## Encode using Vector Embedding

In [5]:
# %pip install qdrant-client sentence-transformers

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

# create the vector database client
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

# Create the embedding encoder
encoder = SentenceTransformer("all-MiniLM-L6-v2")



In [6]:
# Create collection to store the wine rating data
collection_name = 'top_wines'

qdrant.recreate_collection(collection_name = collection_name, 
                           vectors_config = models.VectorParams(size = encoder.get_sentence_embedding_dimension(), distance = models.Distance.COSINE))

True

## Loading the data into the vector database

We will use the collection created above: for every wine, we encode the `notes` column into an embedding vector and store it in the vector database, with the full record attached as payload. Qdrant indexes the data in the background as we load it, which is what allows quick retrieval later.

In [7]:
# vectorize

qdrant.upload_points(
    collection_name = collection_name,
    points = [models.PointStruct(
        id = idx,
        vector = encoder.encode(doc["notes"]).tolist(),
        payload = doc
    ) for idx, doc in enumerate(data)] # data is the variable holding all the wines
)
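
The cell above calls `encoder.encode` once per wine. As a possible variant, all notes can be encoded in one batched call and then zipped with their records. The sketch below keeps the encoder behind a callable so the point-building logic can be tried with a stand-in; `build_points`, `encode_batch`, and `fake_encode` are names invented here, not part of Qdrant or sentence-transformers.

```python
def build_points(docs, encode_batch, text_key="notes"):
    """Return one {id, vector, payload} dict per document.

    encode_batch is any callable that maps a list of strings to a list of
    vectors (a hypothetical parameter introduced for this sketch).
    """
    texts = [doc[text_key] for doc in docs]
    vectors = encode_batch(texts)  # one call for the whole dataset
    return [
        {"id": idx, "vector": vector, "payload": doc}
        for idx, (doc, vector) in enumerate(zip(docs, vectors))
    ]

def fake_encode(texts):
    """Stand-in encoder: text length and number of spaces as a 2-d vector."""
    return [[float(len(text)), float(text.count(" "))] for text in texts]

sample = [
    {"name": "Wine A", "notes": "dark fruit"},
    {"name": "Wine B", "notes": "crisp and dry"},
]
points = build_points(sample, fake_encode)
print(points[0]["vector"], points[1]["vector"])
```

With the notebook's objects, `encode_batch` could be `lambda texts: encoder.encode(texts, batch_size=64).tolist()`, and each dict can be turned into a point with `models.PointStruct(**point)`. The batch size of 64 is an arbitrary choice.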

In [8]:
console.print(qdrant.get_collection(collection_name = collection_name))

## Retrieve semantically relevant data based on the user's query

Once the data is loaded into the vector database and the indexing process is done, we can start using our simple RAG system.

In [9]:
user_prompt = "Suggest me an amazing Malbec wine from Argentina"

#### Encoding the user's query
We will use the same encoder that we used to encode the document data to encode the query of the user. This way we can search results based on semantic similarity.



In [10]:
query_vector = encoder.encode(user_prompt).tolist()

#### Search similar rows
We can now take the embedding of the user's query and use it to find the most similar rows in the vector database.

In [11]:
hits = qdrant.search(collection_name = collection_name, query_vector = query_vector, limit = 3)

In [12]:
from rich.console import Console
from rich.text import Text
from rich.table import Table

table = Table(title="Retrieval Results", show_lines=True)

table.add_column("Name", style="#e87d3e")
table.add_column("Region", style="bright_red")
table.add_column("Variety", style="green")
table.add_column("Rating", style="yellow")
table.add_column("Notes", style="#89ddff")
table.add_column("Score", style="#a6accd")

for hit in hits:
    table.add_row(
        hit.payload["name"],
        hit.payload["region"],
        hit.payload["variety"],
        str(hit.payload["rating"]),
        f'{hit.payload["notes"][:50]}...',
        f"{hit.score:.4f}"
    )

console.print(table)
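
Similarity search returns up to `limit` hits even when none of them is a close match. One optional safeguard is to drop hits below a minimum score before they reach the prompt. The sketch below uses a small stand-in `Hit` class that exposes the same `payload` and `score` attributes read in the table above; `filter_hits` is a hypothetical helper, and the `min_score` value of 0.5 is an arbitrary assumption that would need tuning on real queries.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Stand-in for a search result, with only the attributes used here."""
    payload: dict
    score: float

def filter_hits(hits, min_score=0.5):
    """Keep only the hits whose similarity score reaches min_score."""
    return [hit for hit in hits if hit.score >= min_score]

example_hits = [
    Hit({"name": "Wine A"}, 0.62),
    Hit({"name": "Wine B"}, 0.48),
    Hit({"name": "Wine C"}, 0.55),
]
kept = filter_hits(example_hits)
kept_names = [hit.payload["name"] for hit in kept]
print(kept_names)
```

Qdrant's `search` call also accepts a `score_threshold` argument, which applies a comparable cut on the database side.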

#### Augment the prompt to the LLM with retrieved data
In this simple example, we take the top 3 results and pass them as-is in the prompt to the generation LLM.
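
Passing the raw payloads works, but the retrieved records can also be rendered into a numbered, human-readable context block first. The following is a sketch of that alternative; `format_context` is a hypothetical helper, the example records are made up, and the field names follow the dataset columns shown earlier (`name`, `region`, `variety`, `rating`, `notes`).

```python
def format_context(payloads):
    """Render retrieved wine records as a numbered list, one line per wine."""
    lines = []
    for number, wine in enumerate(payloads, start=1):
        lines.append(
            f"{number}. {wine['name']} ({wine['variety']}, {wine['region']}), "
            f"rating {wine['rating']}: {wine['notes']}"
        )
    return "\n".join(lines)

# Made-up records with the same keys as the dataset rows
example_payloads = [
    {"name": "Wine A", "region": "Mendoza, Argentina", "variety": "Red Wine",
     "rating": 96.0, "notes": "Dark fruit and spice."},
    {"name": "Wine B", "region": "Napa Valley, California", "variety": "Red Wine",
     "rating": 98.0, "notes": "Dense and structured."},
]
context = format_context(example_payloads)
print(context)
```

In the notebook, the input would be `[hit.payload for hit in hits]`.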

#### Generate reply to the user's query

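We will call a hosted LLM through an OpenAI-compatible chat API. The active cell below uses the Mixtral model served by Together AI; the commented-out cells show the same request against OpenAI and xAI models.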
 

In [13]:
from dotenv import load_dotenv
load_dotenv()

False
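
`load_dotenv()` returned `False` here, which typically means that no `.env` file was found or that it set no variables, so any API key has to come from the process environment. The following is a minimal sketch of reading a key defensively instead of writing it into the notebook; the helper `require_env` and the variable name `DEMO_API_KEY` are assumptions for illustration.

```python
import os

def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it in the shell or add it to a .env file."
        )
    return value

# Demonstration only: a placeholder value, not a real key
os.environ["DEMO_API_KEY"] = "placeholder-not-a-real-key"
api_key = require_env("DEMO_API_KEY")
print(len(api_key))
```

Keeping keys out of the notebook source also keeps them out of version control and shared exports.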

First let's try without Retrieval. We can ask the LLM to recommend based only on the user prompt.

In [14]:
# # Now time to connect to the large language model
# from openai import OpenAI
# from rich.panel import Panel
# import os

# OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # read the key from the environment; never hard-code secrets
# client = OpenAI(api_key=OPENAI_API_KEY)
# completion = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages= [
#         {"role": "system", "content": "You are chatbot, a wine specialist. Your top priority is to help guide users into selecting amazing wine and guide them with their requests."},
#         {"role": "user", "content": user_prompt},
#         {"role": "assistant", "content": "Here is my wine recommendation:"}
#     ]
# )

# response_text = Text(completion.choices[0].message.content)
# styled_panel = Panel(
#     response_text,
#     title="Wine Recommendation without Retrieval",
#     expand=False,
#     border_style="bold green",
#     padding=(1, 1)
# )

# console.print(styled_panel)


In [15]:
# # %pip install xai_sdk

# import xai_sdk
# from rich.panel import Panel
# import os
# import asyncio

# async def main():
#     x_AI_key = os.environ["XAI_API_KEY"]  # read the key from the environment (variable name is an assumption)
#     client = xai_sdk.Client(api_key=x_AI_key)

#     # Make a request to Grok
#     completion = client.chat.completions.create(
#         model="grok-3",  # Specify Grok model (check xAI docs for exact model name)
#         messages=[
#             {"role": "system", "content": "You are a wine specialist chatbot. Your top priority is to guide users in selecting amazing wines and assist with their requests."},
#             {"role": "user", "content": user_prompt},
#             {"role": "assistant", "content": "Here is my wine recommendation:"}
#         ]
#     )

#     # Extract response
#     response_text = Text(completion.choices[0].message.content)

#     # Display in a styled panel
#     styled_panel = Panel(
#         response_text,
#         title="Wine Recommendation without Retrieval",
#         expand=False,
#         border_style="bold green",
#         padding=(1, 1)
#     )

#     console.print(styled_panel)


# # Run the async function
# # asyncio.run(main())
# await main()


In [16]:
# from openai import OpenAI
# import xai_sdk

# import os
# XAI_API_KEY = os.environ["XAI_API_KEY"]  # read the key from the environment (variable name is an assumption)
# client = OpenAI(
#   api_key = XAI_API_KEY,
#   base_url = "https://api.x.ai/v1",
# )

# completion = client.chat.completions.create(
#   model="grok-3",
#   messages=[
#         {"role": "system", "content": "You are a wine specialist chatbot. Your top priority is to guide users in selecting amazing wines and assist with their requests."},
#         {"role": "user", "content": user_prompt},
#         {"role": "assistant", "content": "Here is my wine recommendation:"}
#     ]
# )

# # Extract response
# response_text = Text(completion.choices[0].message.content)

# # Display in a styled panel
# styled_panel = Panel(
#     response_text,
#     title="Wine Recommendation without Retrieval",
#     expand=False,
#     border_style="bold green",
#     padding=(1, 1)
# )

# console.print(styled_panel)

In [17]:

from rich.panel import Panel
from rich.text import Text
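import os
import openai  # legacy (<1.0) SDK interface, pointed at Together's OpenAI-compatible API

# Set Together AI credentials; the key is read from the environment
# (the variable name TOGETHER_API_KEY is an assumption) instead of being hard-coded
openai.api_key = os.environ["TOGETHER_API_KEY"]
openai.api_base = "https://api.together.xyz/v1"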

# Create the final prompt using retrieved context (optional)
# Example: just use user prompt for now
completion = openai.ChatCompletion.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are chatbot, a wine specialist. Your top priority is to help guide users into selecting amazing wine and guide them with their requests."},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": "Here is my wine recommendation:"}
    ]
)

# Display output in rich console
response_text = Text(completion.choices[0].message["content"])
styled_panel = Panel(
    response_text,
    title="Wine Recommendation without Retrieval",
    expand=False,
    border_style="bold green",
    padding=(1, 1)
)

console.print(styled_panel)

Now, let's add the retrieval results. The recommendation above sounds great; however, we don't have this wine in our inventory or on our menu. Moreover, wines may have become available that were not part of the LLM's pre-training data.
We will run the same query with the retrieval results and get recommendations better suited to our business needs.

In [20]:
# define a variable to hold the search results
search_results = [hit.payload for hit in hits]

completion = openai.ChatCompletion.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are chatbot, a wine specialist. Your top priority is to help guide users into selecting amazing wine and guide them with their requests."},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": str(search_results)}
    ]
)

response_text = Text(completion.choices[0].message.content)
styled_panel = Panel(
    response_text,
    title="Wine Recommendation with Retrieval",
    expand=False,
    border_style="bold green",
    padding=(1, 1)
)

console.print(styled_panel)
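
The cell above passes the retrieved payloads as an `assistant` message. Another common arrangement, shown here as a sketch rather than as this notebook's method, places the retrieved context in the `user` turn and instructs the model to recommend only from it. `build_messages` is a hypothetical helper, and the wording of the system prompt is an assumption.

```python
def build_messages(user_prompt, search_results):
    """Build a chat message list that carries retrieved records as context."""
    context = "\n".join(f"- {result}" for result in search_results)
    return [
        {
            "role": "system",
            "content": "You are a wine specialist. "
                       "Recommend only wines listed in the provided context.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {user_prompt}",
        },
    ]

# Made-up record standing in for the retrieved payloads
messages = build_messages(
    "Suggest me an amazing Malbec wine from Argentina",
    [{"name": "Wine A", "variety": "Red Wine"}],
)
print(messages[1]["content"])
```

The returned list can be passed as the `messages` argument of the chat completion call used above, with `search_results` taken from the retrieval step.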