# Simple RAG Implementation

Based on [Alfredo Deza's GitHub Repository](https://github.com/alfredodeza/learn-retrieval-augmented-generation).

In this notebook we will build a simple RAG application based on a structured CSV file of wine ratings. We will:
* Load the small dataset.
* Encode one of the columns as vector embeddings.
* **R**etrieve some of the rows based on a query using semantic similarity.
* **A**ugment the prompt to the LLM with the retrieved data.
* **G**enerate a reply to the user's query based on the retrieved rows.

In [15]:
from rich.pretty import pprint
from rich.theme import Theme
from rich.console import Console
from rich.panel import Panel
from rich.text import Text
from rich.table import Table  # needed for the search results table below

custom_theme = Theme({
    "repr.own": "bright_yellow",            # Class names
    "repr.tag_name": "bright_yellow",       # Adjust tag names which might still be purple
    "repr.call": "bright_yellow",           # Function calls and other symbols
    "repr.str": "bright_green",             # String representation
    "repr.number": "bright_red",            # Numbers
    "repr.attrib_name": "bright_yellow",    # Attribute names
    "repr.attrib_value": "bright_blue"      # Attribute values
})

# Create a console that applies the custom theme to everything it prints

console = Console(theme=custom_theme)

## Loading the Dataset

Since the data is in a simple, small and structured CSV file, we can load it using Pandas.

In [2]:
import warnings

# Suppress warnings
warnings.filterwarnings('ignore')

In [3]:
import pandas as pd

data = (
    pd
    .read_csv('data/top_rated_wines.csv')
    .query('variety.notna()')
    .reset_index(drop=True)
    .to_dict('records')
)
data[:2]

[{'name': '3 Rings Reserve Shiraz 2004',
  'region': 'Barossa Valley, Barossa, South Australia, Australia',
  'variety': 'Red Wine',
  'rating': 96.0,
  'notes': 'Vintage Comments : Classic Barossa vintage conditions. An average wet Spring followed by extreme heat in early February. Occasional rainfall events kept the vines in good balance up to harvest in late March 2004. Very good quality coupled with good average yields. More than 30 months in wood followed by six months tank maturation of the blend prior to bottling, July 2007. '},
 {'name': 'Abreu Vineyards Cappella 2007',
  'region': 'Napa Valley, California',
  'variety': 'Red Wine',
  'rating': 96.0,
  'notes': 'Cappella is a proprietary blend of two clones of Cabernet Sauvignon with Cabernet Franc, Petit Verdot and Merlot. The gravelly soil at Cappella produces fruit that is very elegant in structure. The resulting wine exhibits beautiful purity of fruit with fine grained and lengthy tannins. '}]
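
To make the shape of `data` concrete, here is a minimal sketch that produces a similar list of records with only the standard library. It is not the notebook's method: the `SAMPLE_CSV` text and the `load_records` helper are hypothetical, and the empty-string check only roughly approximates what `.query('variety.notna()')` does in pandas.

```python
import csv
import io

# Hypothetical three-row sample standing in for data/top_rated_wines.csv.
SAMPLE_CSV = """name,region,variety,rating,notes
Wine A,Napa Valley,Red Wine,96.0,Elegant structure
Wine B,Mendoza,,95.0,Variety is missing in this row
Wine C,Barossa Valley,White Wine,94.0,Crisp and bright
"""

def load_records(csv_text):
    """Return one dict per row, skipping rows whose variety is empty."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["variety"]:  # plays the role of .query('variety.notna()')
            continue
        row["rating"] = float(row["rating"])  # pandas infers numeric types itself
        records.append(row)
    return records

records = load_records(SAMPLE_CSV)
print(records[0])
```

Each record is a plain dictionary keyed by column name, which is why it can later be stored unchanged as a payload in the vector database.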

## Encode using Vector Embedding

We will use [Qdrant](https://qdrant.tech/), a popular open-source vector database, and [SentenceTransformer](https://sbert.net/), a popular library for encoding text into embeddings.

In [4]:
from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

# create the vector database client
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

# Create the embedding encoder
encoder = SentenceTransformer('all-MiniLM-L6-v2') # Model to create embeddings

In [5]:
# Create collection to store the wine rating data
qdrant.recreate_collection(
    collection_name="top_wines",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is determined by the embedding model
        distance=models.Distance.COSINE
    )
)

True
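
The collection above is configured with cosine distance, so search results are scored by cosine similarity, where a higher score means a closer match. As a minimal sketch of what that score measures, the hypothetical `cosine_similarity` helper below computes it in plain Python on toy vectors; it illustrates the formula and is not Qdrant's internal implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Vectors pointing in the same direction score close to 1 regardless of their length, and unrelated (orthogonal) vectors score 0.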

### Loading the data into the vector database

We will go over the `notes` column of the wine dataset, encode each entry into an embedding vector, and store it in the collection we created above. The data is indexed for quick retrieval in the background as we load it.

This step will take a few seconds (less than a minute on my laptop).

In [6]:
# vectorize!
qdrant.upload_points(
    collection_name="top_wines",
    points=[
        models.PointStruct(
            id=idx,
            vector=encoder.encode(doc["notes"]).tolist(),
            payload=doc
        ) for idx, doc in enumerate(data) # data is the variable holding all the wines
    ]
)

## **R**etrieve semantically relevant data based on the user's query

Once the data is loaded into the vector database and the indexing process is done, we can start using our simple RAG system.

In [7]:
user_prompt = "Suggest me an amazing Malbec wine from Argentina"

### Encoding the user's query

We will encode the user's query with the same encoder that we used for the document data.
Because the query and the documents then share one embedding space, we can rank results by semantic similarity.

In [8]:
query_vector = encoder.encode(user_prompt).tolist()

### Search similar rows

We can now take the embedding of the user's query and use it to find the most similar rows in the vector database.

In [16]:
# Search time for awesome wines!

hits = qdrant.search(
    collection_name="top_wines",
    query_vector=query_vector,
    limit=3
)
table = Table(title="Search Results")

table.add_column("Name", style="cyan")
table.add_column("Region", style="magenta")
table.add_column("Variety", style="green")
table.add_column("Rating", style="yellow")
table.add_column("Score", style="red")

for hit in hits:
    table.add_row(
        hit.payload["name"],
        hit.payload["region"],
        hit.payload["variety"],
        str(hit.payload["rating"]),
        f"{hit.score:.4f}"
    )

console.print(table)
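
Conceptually, the search ranks stored vectors by their similarity to the query vector and returns the best `limit` matches. The sketch below shows that idea as an exhaustive scan in plain Python. The `top_k` helper, the 2-dimensional vectors, and the payloads are all hypothetical, and a vector database typically avoids a full scan on larger collections by using an approximate nearest-neighbour index.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vector, points, k=3):
    """Return the k (score, payload) pairs with the highest cosine similarity."""
    scored = [(cosine_similarity(query_vector, vec), payload) for vec, payload in points]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Hypothetical 2-dimensional "embeddings" with made-up payloads.
points = [
    ([1.0, 0.0], {"name": "Wine A"}),
    ([0.7, 0.7], {"name": "Wine B"}),
    ([0.0, 1.0], {"name": "Wine C"}),
    ([-1.0, 0.0], {"name": "Wine D"}),
]
results = top_k([0.9, 0.1], points, k=2)
for score, payload in results:
    print(f"{payload['name']}: {score:.4f}")
```

The scan touches every stored vector, so its cost grows linearly with the collection size, which is acceptable for a toy example but is the reason dedicated indexes exist.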

## **A**ugment the prompt to the LLM with retrieved data

In this simple example, we take the top 3 results and pass them as-is in the prompt to the generation LLM.

In [10]:
# define a variable to hold the search results
search_results = [hit.payload for hit in hits]
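
The notebook passes the payloads to the LLM as a raw Python list. As an optional variation, the retrieved rows could first be rendered into a compact text block. The sketch below assumes records with the same keys as the dataset rows shown earlier; `format_context` and the sample record are hypothetical and not part of the source notebook.

```python
def format_context(results):
    """Render retrieved wine records as one numbered, human-readable block."""
    lines = []
    for number, wine in enumerate(results, start=1):
        lines.append(
            f"{number}. {wine['name']} ({wine['region']}), "
            f"{wine['variety']}, rated {wine['rating']}: {wine['notes'].strip()}"
        )
    return "\n".join(lines)

# Hypothetical record with the same keys as the dataset rows.
sample_results = [
    {
        "name": "Example Malbec 2010",
        "region": "Mendoza, Argentina",
        "variety": "Red Wine",
        "rating": 95.0,
        "notes": "Dark fruit and firm tannins. ",
    }
]
context = format_context(sample_results)
print(context)
```

A numbered, one-line-per-wine layout keeps the prompt short and makes it easier to check which retrieved row the model drew on.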

## **G**enerate reply to the user's query

We will use one of the most popular generative AI LLMs from [OpenAI](https://platform.openai.com/docs/models). 

In [17]:
# Connect to the OpenAI API (the client reads the OPENAI_API_KEY environment variable)
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are chatbot, a wine specialist. Your top priority is to help guide users into selecting amazing wine and guide them with their requests."},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": str(search_results)}
    ]
)

response_text = Text(completion.choices[0].message.content)
styled_panel = Panel(
    response_text,
    title="Wine Recommendation",
    expand=False,
    border_style="bold green",
    padding=(1, 1)
)

console.print(styled_panel)
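
The cell above supplies the retrieved rows as an `assistant` message. One alternative layout, offered only as a sketch and not taken from the source notebook, folds the rows into the user turn together with an explicit instruction to rely on them. The `build_messages` helper and its sample inputs are hypothetical; the resulting list has the same shape as the `messages` argument used above.

```python
def build_messages(system_prompt, user_prompt, search_results):
    """Build a chat message list with the retrieved rows folded into the user turn."""
    context = "\n".join(f"- {row}" for row in search_results)
    augmented_prompt = (
        f"{user_prompt}\n\n"
        "Base your answer only on the following wines:\n"
        f"{context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": augmented_prompt},
    ]

# Hypothetical inputs; in the notebook these would be user_prompt and search_results.
messages = build_messages(
    "You are a wine specialist.",
    "Suggest me an amazing Malbec wine from Argentina",
    [{"name": "Example Malbec 2010", "rating": 95.0}],
)
print(messages[1]["content"])
```

Which layout gives better answers depends on the model, so it is worth comparing both on a few queries before settling on one.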