Packages that are required for VectorDB and Embeddings

- !pip3 install qdrant_client

- !pip3 install -U sentence-transformers
    - sentence-transformers depends on a compatible PyTorch version, so I used Python 3.9.6

- !pip3 install openai

# Import Libraries

In [41]:
# Python Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Optional Packages
from typing import List, Tuple, Dict
from tqdm import tqdm

# Vector database (Qdrant) client
from qdrant_client import models, QdrantClient

# Sentence embeddings
from sentence_transformers import SentenceTransformer

# OpenAI client, used to call OpenAI models
import openai

# OpenAI api key
from API import api_key

# Import Dataset

In [42]:
# Reading Wine Recommendation Dataset
dataset = pd.read_csv("/Volumes/Transcend/UpGrad Machine Learning and AI/Deep Reinforcement Learning/Neural_Networks/deep_learning_sans/RAG/top_rated_wines.csv")

dataset.head()

Unnamed: 0,name,region,variety,rating,notes
0,3 Rings Reserve Shiraz 2004,"Barossa Valley, Barossa, South Australia, Aust...",Red Wine,96.0,Vintage Comments : Classic Barossa vintage con...
1,Abreu Vineyards Cappella 2007,"Napa Valley, California",Red Wine,96.0,Cappella is a proprietary blend of two clones ...
2,Abreu Vineyards Cappella 2010,"Napa Valley, California",Red Wine,98.0,Cappella is one of the oldest vineyard sites i...
3,Abreu Vineyards Howell Mountain 2008,"Howell Mountain, Napa Valley, California",Red Wine,96.0,When David purchased this Howell Mountain prop...
4,Abreu Vineyards Howell Mountain 2009,"Howell Mountain, Napa Valley, California",Red Wine,98.0,"As a set of wines, it is hard to surpass the f..."


In [43]:
dataset.isnull().any() # We see there are null values

# Count the records whose variety is null
print("Column(Variety): No.of Records having Null Values: ", dataset[dataset["variety"].isnull()].shape[0])

# Drop the records whose variety is null
dataset = dataset[~dataset["variety"].isnull()]

dataset.head()

Column(Variety): No.of Records having Null Values:  18


Unnamed: 0,name,region,variety,rating,notes
0,3 Rings Reserve Shiraz 2004,"Barossa Valley, Barossa, South Australia, Aust...",Red Wine,96.0,Vintage Comments : Classic Barossa vintage con...
1,Abreu Vineyards Cappella 2007,"Napa Valley, California",Red Wine,96.0,Cappella is a proprietary blend of two clones ...
2,Abreu Vineyards Cappella 2010,"Napa Valley, California",Red Wine,98.0,Cappella is one of the oldest vineyard sites i...
3,Abreu Vineyards Howell Mountain 2008,"Howell Mountain, Napa Valley, California",Red Wine,96.0,When David purchased this Howell Mountain prop...
4,Abreu Vineyards Howell Mountain 2009,"Howell Mountain, Napa Valley, California",Red Wine,98.0,"As a set of wines, it is hard to surpass the f..."


In [44]:
help(dataset.to_dict)

Help on method to_dict in module pandas.core.frame:

to_dict(orient: "Literal['dict', 'list', 'series', 'split', 'tight', 'records', 'index']" = 'dict', *, into: 'type[MutableMappingT] | MutableMappingT' = <class 'dict'>, index: 'bool' = True) -> 'MutableMappingT | list[MutableMappingT]' method of pandas.core.frame.DataFrame instance
    Convert the DataFrame to a dictionary.
    
    The type of the key-value pairs can be customized with the parameters
    (see below).
    
    Parameters
    ----------
    orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}
        Determines the type of the values of the dictionary.
    
        - 'dict' (default) : dict like {column -> {index -> value}}
        - 'list' : dict like {column -> [values]}
        - 'series' : dict like {column -> Series(values)}
        - 'split' : dict like
          {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
        - 'tight' : dict like
          {'index' -> [index], 

In [45]:
# Qdrant stores each point's payload as a dictionary, so convert every row to a dict
data_dict = dataset.to_dict(orient="records")
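
For intuition, the `records` orientation yields one dictionary per row, keyed by column name. The sketch below mimics that shape with the standard library only. The `to_records` helper is hypothetical (it is not part of pandas), and the two columns hold illustrative values copied from the preview above.

```python
# Column-oriented data, roughly how a DataFrame holds it (illustrative values).
columns = {
    "name": ["3 Rings Reserve Shiraz 2004", "Abreu Vineyards Cappella 2007"],
    "rating": [96.0, 96.0],
}


def to_records(cols):
    """Turn {column -> [values]} into [{column -> value}, ...], one dict per row."""
    keys = list(cols)
    # zip(*values) walks the columns in parallel, producing one tuple per row.
    return [dict(zip(keys, row)) for row in zip(*cols.values())]


records = to_records(columns)
print(records[0])
```

Each of these row dictionaries is what later becomes the `payload` of a point in the vector database.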

# Create Embeddings

### 1. Load the pretrained Sentence Transformer Model

In [46]:
model = SentenceTransformer("all-MiniLM-L6-v2")

# Example Sentences
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
embeddings.shape 

(3, 384)
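
Each sentence becomes a 384-dimensional vector, and sentences with similar meaning should end up pointing in similar directions. A common way to measure that is cosine similarity, which is also the distance the collection in this notebook is configured with. The sketch below is a plain-Python version. The 3-dimensional vectors are made-up stand-ins for real embeddings, chosen only to show the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of the three example sentences.
weather = [0.9, 0.1, 0.0]   # "The weather is lovely today."
sunny = [0.8, 0.2, 0.1]     # "It's so sunny outside!"
stadium = [0.0, 0.2, 0.9]   # "He drove to the stadium."

print(cosine_similarity(weather, sunny))    # close to 1: similar direction
print(cosine_similarity(weather, stadium))  # close to 0: nearly unrelated
```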

### 2. Create the Vector Database Client

In [48]:
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

qdrant

<qdrant_client.qdrant_client.QdrantClient at 0x324600a00>

### 3. Create Collections to store Wine Rating Information

In [49]:
qdrant.recreate_collection(
    collection_name= "top_wines",
    vectors_config= models.VectorParams(
        size= model.get_sentence_embedding_dimension(), # Vector size is defined by used model
        distance = models.Distance.COSINE        
    )
)


# Embed each wine's tasting notes and upload them with the full record as payload
qdrant.upload_points(
    collection_name= "top_wines",
    points= [
        models.PointStruct(
            id= idx,
            vector= model.encode(doc["notes"]).tolist(),
            payload = doc
        ) for idx, doc in enumerate(data_dict)
    ]
)
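
The upload above calls `model.encode` once per wine, which is simple but can be slow on larger datasets. Since `model.encode` also accepts a list of sentences (as in the example cell earlier), one option is to encode the notes in batches. The helper below is a hypothetical, stdlib-only sketch of the batching step. The name `batched` and the batch size are assumptions, and the encoding call itself appears only as a comment.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder notes; in the notebook these would be doc["notes"] for each record.
notes = [f"note {i}" for i in range(5)]
batches = list(batched(notes, 2))
print(batches)

# Each batch could then be embedded in a single call, for example:
#   vectors = model.encode(batch)
```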



### 4. A Sample Vector database search

In [52]:
def get_wine_recommendations(query: str, top_k: int = 5) -> List[Dict]:
    query_vector = model.encode(query).tolist()
    search_result = qdrant.search(
        collection_name="top_wines",
        query_vector=query_vector,
        limit=top_k
    )
    return [hit.payload for hit in search_result]


get_wine_recommendations(query = "A wine from Mendroza Argentina")



[{'name': 'Bruno Giacosa Barolo Le Rocche del Falletto Riserva 2001',
  'region': 'Barolo, Piedmont, Italy',
  'variety': 'Red Wine',
  'rating': 97.0,
  'notes': '"Darker and more backward than the Falletto in both its aromas and flavors, Giacosa\'s staggering 2001 Barolo Riserva Le Rocche del Falletto offers an explosive nose of spices, menthol, minerals, smoke and scorched earth followed by waves of sweet fruit that coat the palate in a potent mix of finesse and sheer power, with fine tannins, and a lingering balsamic note on the finish. This complex, multi-dimensional wine will require considerable patience and will age gracefully for several decades. Made from the oldest vines at Falletto, the 2001 Barolo Riserva Le Rocche del Falletto is another towering achievement from Bruno Giacosa. An Azienda Agricola Falletto di Bruno Giacosa bottling. To be released in 2007. Anticipated maturity: 2013-2031." - Wine Advocate'},
 {'name': 'La Vizcaina La Vitoriana Tinto 2018',
  'region': 'Bi
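
Conceptually, the search embeds the query, scores every stored vector against it, and returns the payloads of the closest ones. The sketch below shows that idea as a brute-force loop in plain Python. It is not how Qdrant is implemented, and the 2-dimensional vectors and wine names are made up. It also shows why a result can look off-topic: nearest-neighbour search always returns the `top_k` closest points, whether or not any of them is a good match.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(points: List[Tuple[List[float], Dict]],
           query_vector: List[float],
           top_k: int = 5) -> List[Dict]:
    """Return the payloads of the top_k points most similar to query_vector."""
    ranked = sorted(points,
                    key=lambda point: cosine_similarity(query_vector, point[0]),
                    reverse=True)
    return [payload for _, payload in ranked[:top_k]]


# (vector, payload) pairs standing in for the uploaded points.
points = [
    ([1.0, 0.0], {"name": "Wine A"}),
    ([0.0, 1.0], {"name": "Wine B"}),
    ([0.7, 0.7], {"name": "Wine C"}),
]

results = search(points, query_vector=[0.9, 0.1], top_k=2)
print([hit["name"] for hit in results])
```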

# Create a RAG with LLM and Qdrant

In [56]:
user_prompt = "Suggest me an amazing Malbec wine from Argentina"

wine_recommendations = get_wine_recommendations(query = user_prompt)



In [54]:
openai.api_key = api_key

### 1. LLM Response 

In [60]:
from typing import Optional

def generate_response(user_prompt: str, recommendations: Optional[List[Dict]] = None) -> str:
    # Instruction message and user question, shared by the plain and the RAG call
    messages = [
        {"role": "developer", "content": "You are a chatbot and wine specialist. Your top priority is to help users select amazing wines and to guide them with their requests."},
        {"role": "user", "content": user_prompt},
    ]

    # With RAG: append the retrieved wines so the model can ground its answer in them
    if recommendations is not None:
        messages.append({"role": "assistant", "content": str(recommendations)})

    completion = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )

    return completion.choices[0].message.content
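
Passing `str(recommendations)` sends the raw Python list of dictionaries to the model. One possible alternative is to format the retrieved wines as a short numbered list first, which also lets long tasting notes be trimmed. The helper below is a hypothetical sketch. The name `format_recommendations`, the 200-character limit, and the line layout are assumptions, and the sample record reuses the truncated preview shown earlier.

```python
from typing import Dict, List


def format_recommendations(recommendations: List[Dict], max_notes_chars: int = 200) -> str:
    """Render retrieved wine payloads as a compact numbered list for the prompt."""
    lines = []
    for position, wine in enumerate(recommendations, start=1):
        # Trim long tasting notes so the prompt stays small.
        notes = str(wine.get("notes", ""))[:max_notes_chars]
        lines.append(
            f"{position}. {wine.get('name', 'Unknown')} "
            f"({wine.get('region', 'Unknown region')}), "
            f"rating {wine.get('rating', 'n/a')}: {notes}"
        )
    return "\n".join(lines)


sample = [
    {
        "name": "Abreu Vineyards Cappella 2007",
        "region": "Napa Valley, California",
        "variety": "Red Wine",
        "rating": 96.0,
        "notes": "Cappella is a proprietary blend of two clones ...",
    }
]

context = format_recommendations(sample)
print(context)
```

The resulting string could be passed in place of `str(recommendations)`. Whether that improves the answers would need to be checked against the actual model.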


### 2. LLM Response without RAG results

In [61]:
generate_response(user_prompt = "Suggest me an amazing Malbec wine from Argentina",
                recommendations= None)

"I recommend trying Alamos Malbec from Mendoza, Argentina. It offers a rich and velvety texture with flavors of dark fruits, spices, and a hint of vanilla. It's a great representation of the bold and fruity characteristics that Malbec wines from Argentina are known for. Enjoy!"

### 3. LLM Response including RAG results

In [63]:
print(generate_response(user_prompt = "Suggest me an amazing Malbec wine from Argentina",
                recommendations= wine_recommendations))

Here are some amazing Malbec wines from Argentina that you may enjoy:

1. Catena Zapata Argentino Vineyard Malbec 2004: This wine received a rating of 98 points and is described as remarkably fragrant and complex aromatically with aromas of wood smoke, clove, and black cherry. It is recommended for additional cellaring for 10 years or more.
   
2. Bodega Colome Altura Maxima Malbec 2012: This Malbec from Salta, Argentina, received a rating of 96 points and is known for being a wine of distinction crafted by winemaker Thibaut Delmotte. It embodies two extremes and challenges convention in the modern viticultural world.

3. Catena Zapata Adrianna Vineyard Malbec 2004: This Malbec from Argentina achieved a rating of 97 points and offers aromas of wood smoke, black cherry, and blackberry liqueur. It is described as opulent and full-flavored, making it a pleasure to enjoy now or allow to evolve for a decade.

4. Catena Zapata Nicasia Vineyard Malbec 2004: With a rating of 96 points, this Ma