Packages required for the vector database and embeddings

- !pip3 install qdrant_client

- !pip3 install -U sentence-transformers
    - sentence-transformers depends on PyTorch, and the PyTorch version constrains the Python version, so I used Python 3.9.6

- !pip3 install openai
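
Before running the notebook, it can help to confirm that the three packages above are importable in the active environment. The sketch below is only an illustration: the helper name `missing_packages` is hypothetical, and the list uses the import names (for example `sentence_transformers`), which differ from the pip names.

```python
import importlib.util
import sys

# Import names, not pip names: "sentence-transformers" is imported as "sentence_transformers".
REQUIRED = ["qdrant_client", "sentence_transformers", "openai"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
print("Missing packages:", missing_packages(REQUIRED))
```

An empty list means all three packages were found; anything listed still needs a `pip3 install`.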

# Import Libraries

In [1]:
# Python Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Optional Packages
from typing import List, Tuple, Dict
from tqdm import tqdm

# Vector database client packages
from qdrant_client import models, QdrantClient

# Word Embeddings 
from sentence_transformers import SentenceTransformer

# OpenAI - to utilize openAI Models
import openai

# OpenAI api key
from API import api_key



# Import Dataset

In [3]:
# Reading Wine Recommendation Dataset
dataset = pd.read_csv("/Volumes/Transcend/UpGrad Machine Learning and AI/Deep Reinforcement Learning/Neural_Networks/deep_learning_sans/RAG/top_rated_wines.csv")

dataset.head()

Unnamed: 0,name,region,variety,rating,notes
0,3 Rings Reserve Shiraz 2004,"Barossa Valley, Barossa, South Australia, Aust...",Red Wine,96.0,Vintage Comments : Classic Barossa vintage con...
1,Abreu Vineyards Cappella 2007,"Napa Valley, California",Red Wine,96.0,Cappella is a proprietary blend of two clones ...
2,Abreu Vineyards Cappella 2010,"Napa Valley, California",Red Wine,98.0,Cappella is one of the oldest vineyard sites i...
3,Abreu Vineyards Howell Mountain 2008,"Howell Mountain, Napa Valley, California",Red Wine,96.0,When David purchased this Howell Mountain prop...
4,Abreu Vineyards Howell Mountain 2009,"Howell Mountain, Napa Valley, California",Red Wine,98.0,"As a set of wines, it is hard to surpass the f..."


In [4]:
dataset.isnull().any() # Check for null values: the "variety" column has some

# Count the records whose "variety" is null
print("Column(Variety): No.of Records having Null Values: ", dataset[dataset["variety"].isnull()].shape[0])

# Drop the records whose "variety" is null
dataset = dataset[~dataset["variety"].isnull()]

dataset.head()

Column(Variety): No.of Records having Null Values:  18


Unnamed: 0,name,region,variety,rating,notes
0,3 Rings Reserve Shiraz 2004,"Barossa Valley, Barossa, South Australia, Aust...",Red Wine,96.0,Vintage Comments : Classic Barossa vintage con...
1,Abreu Vineyards Cappella 2007,"Napa Valley, California",Red Wine,96.0,Cappella is a proprietary blend of two clones ...
2,Abreu Vineyards Cappella 2010,"Napa Valley, California",Red Wine,98.0,Cappella is one of the oldest vineyard sites i...
3,Abreu Vineyards Howell Mountain 2008,"Howell Mountain, Napa Valley, California",Red Wine,96.0,When David purchased this Howell Mountain prop...
4,Abreu Vineyards Howell Mountain 2009,"Howell Mountain, Napa Valley, California",Red Wine,98.0,"As a set of wines, it is hard to surpass the f..."


In [5]:
help(dataset.to_dict)

Help on method to_dict in module pandas.core.frame:

to_dict(orient: "Literal['dict', 'list', 'series', 'split', 'tight', 'records', 'index']" = 'dict', *, into: 'type[MutableMappingT] | MutableMappingT' = <class 'dict'>, index: 'bool' = True) -> 'MutableMappingT | list[MutableMappingT]' method of pandas.core.frame.DataFrame instance
    Convert the DataFrame to a dictionary.
    
    The type of the key-value pairs can be customized with the parameters
    (see below).
    
    Parameters
    ----------
    orient : str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}
        Determines the type of the values of the dictionary.
    
        - 'dict' (default) : dict like {column -> {index -> value}}
        - 'list' : dict like {column -> [values]}
        - 'series' : dict like {column -> Series(values)}
        - 'split' : dict like
          {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
        - 'tight' : dict like
          {'index' -> [index], 

In [6]:
# The vector database accepts each record as a dictionary, so convert the DataFrame into a list of row dictionaries
data_dict = dataset.to_dict(orient="records")
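
To make the shape of `data_dict` concrete, here is a minimal plain-Python sketch of the `records` orientation: one dictionary per row, keyed by column name. The helper `to_records` and the toy values are assumptions for illustration only; the notebook itself relies on pandas.

```python
# Toy columns standing in for the wine DataFrame (values are made up for illustration).
columns = {
    "name": ["Wine A", "Wine B"],
    "rating": [96.0, 98.0],
}


def to_records(cols):
    """Mimic the shape of DataFrame.to_dict(orient="records"): one dict per row."""
    keys = list(cols)
    return [dict(zip(keys, row)) for row in zip(*cols.values())]


records = to_records(columns)
print(records)
```

Each element of the list later becomes the payload of one point in the vector database.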

# Create Embeddings

### 1. Load the pretrained Sentence Transformer Model

In [7]:
model = SentenceTransformer("all-MiniLM-L6-v2")

# Example Sentences
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
embeddings.shape 

(3, 384)
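
Each sentence is mapped to a 384-dimensional vector, and sentences with similar meaning should end up with vectors that point in similar directions. The sketch below shows the cosine similarity measure on hand-made 3-dimensional vectors; the vectors are invented for illustration and are not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for the three example sentences (not real embeddings).
weather = [0.9, 0.1, 0.0]
sunny = [0.8, 0.2, 0.1]
stadium = [0.0, 0.2, 0.9]

print("weather vs sunny:  ", round(cosine_similarity(weather, sunny), 3))
print("weather vs stadium:", round(cosine_similarity(weather, stadium), 3))
```

With real embeddings, one would expect the two weather sentences to score higher against each other than against the stadium sentence.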

### 2. Create the Vector Database Client

In [8]:
qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

qdrant

<qdrant_client.qdrant_client.QdrantClient at 0x1070fa910>

### 3. Create Collections to store Wine Rating Information

In [9]:
qdrant.recreate_collection(
    collection_name= "top_wines",
    vectors_config= models.VectorParams(
        size= model.get_sentence_embedding_dimension(), # Vector size is defined by used model
        distance = models.Distance.COSINE        
    )
)


# Embed each wine's tasting notes and upload the points
qdrant.upload_points(
    collection_name="top_wines",
    points=[
        models.PointStruct(
            id=idx,
            vector=model.encode(doc["notes"]).tolist(),
            payload=doc
        ) for idx, doc in enumerate(data_dict)
    ]
)
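
The upload cell encodes one note per call. Since `model.encode()` also accepts a list of sentences (as in the example-sentences cell), encoding in batches is a possible alternative for larger datasets. The following is a rough sketch under that assumption: `batched`, `encode_in_batches`, and `fake_encode` are hypothetical names, and `fake_encode` is a stand-in so the example runs without the model.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(texts, encode, batch_size=64):
    """Encode `texts` batch by batch.

    `encode` is any callable that takes a list of strings and returns
    one vector per string, in the same order.
    """
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(encode(batch))
    return vectors


def fake_encode(batch):
    """Stand-in encoder for illustration: a 1-d 'vector' holding the text length."""
    return [[float(len(text))] for text in batch]


notes = ["a", "bb", "ccc", "dddd", "eeeee"]
vectors = encode_in_batches(notes, fake_encode, batch_size=2)
print(vectors)
```

In the notebook, `model.encode` would presumably take the place of `fake_encode`, with the resulting vectors paired with their payloads by index.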



### 4. A Sample Vector database search

In [10]:
search_hits = qdrant.search(
    collection_name= "top_wines",
    query_vector= model.encode("A wine from Mendroza Argentina").tolist(),
    limit = 4
)


for hit in search_hits:
    print(hit.payload, "score:", hit.score)


{'name': 'Bruno Giacosa Barolo Le Rocche del Falletto Riserva 2001', 'region': 'Barolo, Piedmont, Italy', 'variety': 'Red Wine', 'rating': 97.0, 'notes': '"Darker and more backward than the Falletto in both its aromas and flavors, Giacosa\'s staggering 2001 Barolo Riserva Le Rocche del Falletto offers an explosive nose of spices, menthol, minerals, smoke and scorched earth followed by waves of sweet fruit that coat the palate in a potent mix of finesse and sheer power, with fine tannins, and a lingering balsamic note on the finish. This complex, multi-dimensional wine will require considerable patience and will age gracefully for several decades. Made from the oldest vines at Falletto, the 2001 Barolo Riserva Le Rocche del Falletto is another towering achievement from Bruno Giacosa. An Azienda Agricola Falletto di Bruno Giacosa bottling. To be released in 2007. Anticipated maturity: 2013-2031." - Wine Advocate'} score: 0.5781563591590686
{'name': 'La Vizcaina La Vitoriana Tinto 2018', 
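
Conceptually, the search above scores every stored vector against the query vector with cosine similarity and returns the `limit` highest-scoring payloads. The sketch below shows that idea as a brute-force loop over toy 2-dimensional vectors. It is not Qdrant's implementation, and the function name `top_k_search` and the toy data are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_search(points, query_vector, limit):
    """Return the `limit` best (score, payload) pairs, highest score first.

    `points` is a list of (vector, payload) tuples.
    """
    scored = [(cosine_similarity(vec, query_vector), payload) for vec, payload in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy collection: 2-d vectors with made-up payloads.
points = [
    ([1.0, 0.0], {"name": "Wine A"}),
    ([0.0, 1.0], {"name": "Wine B"}),
    ([0.7, 0.7], {"name": "Wine C"}),
]

for score, payload in top_k_search(points, [0.9, 0.1], limit=2):
    print(payload, "score:", round(score, 3))
```

Because the score only measures closeness between embeddings, the top hit is the nearest vector, not necessarily a wine that matches the region named in the query.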



# Create a RAG with LLM and Qdrant

In [11]:
user_prompt = "Suggest me an amazing Malbec wine from Argentina"

hits = qdrant.search(
    collection_name="top_wines",
    query_vector=model.encode(user_prompt).tolist(),
    limit=3
)
for hit in hits:
    print(hit.payload, "score:", hit.score)


# Define a variable to hold the search results
search_results = [hit.payload for hit in hits]

{'name': 'Catena Zapata Argentino Vineyard Malbec 2004', 'region': 'Argentina', 'variety': 'Red Wine', 'rating': 98.0, 'notes': '"The single-vineyard 2004 Malbec Argentino Vineyard spent 17 months in new French oak. Remarkably fragrant and complex aromatically, it offers up aromas of wood smoke, creosote, pepper, clove, black cherry, and blackberry. Made in a similar, elegant style, it is the most structured of the three single vineyard wines, needing a minimum of a decade of additional cellaring. It should easily prove to be a 25-40 year wine. It is an exceptional achievement in Malbec. When all is said and done, Catena Zapata is the Argentina winery of reference – the standard of excellence for comparing all others. The brilliant, forward-thinking Nicolas Catena remains in charge, with his daughter, Laura, playing an increasingly large role. The Catena Zapata winery is an essential destination for fans of both architecture and wine in Mendoza. It is hard to believe, given the surge i
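
The payloads in `search_results` are full dictionaries with long tasting notes. One option, sketched below, is to turn them into a compact numbered text block before handing them to the LLM. The function `format_context` and the 300-character cutoff are assumptions made for illustration; the notebook itself passes `str(search_results)` unchanged.

```python
def format_context(results, max_note_chars=300):
    """Render retrieved wine payloads as a short numbered list of text lines."""
    lines = []
    for i, wine in enumerate(results, start=1):
        notes = str(wine.get("notes", ""))[:max_note_chars]  # truncate long notes
        lines.append(
            f"{i}. {wine.get('name')} ({wine.get('region')}), "
            f"rating {wine.get('rating')}: {notes}"
        )
    return "\n".join(lines)


# Made-up payload with the same keys as the dataset rows.
sample_results = [
    {"name": "Wine A", "region": "Argentina", "rating": 98.0, "notes": "Dark fruit and spice."},
]
print(format_context(sample_results))
```

Truncating the notes keeps the prompt short, at the cost of dropping detail the model might otherwise quote.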



In [13]:
openai.api_key = api_key
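
The notebook imports the key from a local `API` module. A common alternative is to read it from an environment variable so the key never lives in a source file. The sketch below assumes the variable is named `OPENAI_API_KEY`, and the helper `load_api_key` is hypothetical.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise a clear error if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API.")
    return key


# Usage (requires the variable to be set in the shell that launched the notebook):
# openai.api_key = load_api_key()
```

Failing early with an explicit message is easier to debug than an authentication error raised later by the API call.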

### 1. LLM Response without RAG results

In [38]:
completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "developer", "content": "You are chatbot, a wine specialist. Your top priority is to help guide users into selecting amazing wine and guide them with their requests."},
        {"role": "user", "content": "Suggest me an amazing Malbec wine from Argentina"},
    ]
)

In [39]:
print(completion.choices[0].message.content)

I recommend trying the Catena Malbec from Argentina. It is one of the most renowned Malbec wines from the region, known for its rich flavors of dark fruits, spice, and velvety texture. Catena wines are highly regarded for their quality and expression of the Malbec grape. Enjoy it with grilled meats or hearty dishes for a truly delightful experience. Cheers!


### 2. LLM Response including RAG results

In [35]:
# Send the same request to the OpenAI model, this time including the retrieved search results
completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "developer", "content": "You are chatbot, a wine specialist. Your top priority is to help guide users into selecting amazing wine and guide them with their requests."},
        {"role": "user", "content": "Suggest me an amazing Malbec wine from Argentina"},
        {"role": "assistant", "content": str(search_results)}
    ]
)

In [36]:
print(completion.choices[0].message.content)

One amazing Malbec wine from Argentina that I would recommend is the Catena Zapata Argentino Vineyard Malbec. It received a rating of 98.0 and is highly acclaimed for its complex aromas of wood smoke, pepper, black cherry, and blackberry. This wine spent 17 months in new French oak and is known for its elegance and structure. It is recommended to cellar this wine for a minimum of a decade for optimal enjoyment. Another excellent choice is the Bodega Colome Altura Maxima Malbec from Salta, Argentina, which received a rating of 96.0. This wine is crafted by winemaker Thibaut Delmotte and is considered a wine of distinction with international acclaim. The Malbec from the Altura Maxima Vineyard challenges conventions and offers a unique expression of the traditional grape variety. Both of these wines are fantastic options for experiencing the best of Malbec from Argentina.
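
The cell above passes the retrieved wines as an `assistant` message. A frequently used alternative is to place the retrieved context in the user message next to the question, so the model reads it as supplied reference material rather than as its own earlier reply. The sketch below only builds the message list and makes no API call; `build_rag_messages` and the prompt wording are assumptions, and the `developer` role is kept as the default only to match the cells above.

```python
def build_rag_messages(instructions, user_prompt, search_results, instruction_role="developer"):
    """Build a chat message list with the retrieved context embedded in the user turn."""
    context = "\n".join(str(result) for result in search_results)
    return [
        {"role": instruction_role, "content": instructions},
        {
            "role": "user",
            "content": f"Context (retrieved wines):\n{context}\n\nQuestion: {user_prompt}",
        },
    ]


messages = build_rag_messages(
    "You are a wine specialist.",
    "Suggest me an amazing Malbec wine from Argentina",
    [{"name": "Wine A", "rating": 98.0}],  # made-up payload for illustration
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list could then be passed as the `messages` argument of `openai.chat.completions.create`, exactly as in the cells above.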


In [37]:
search_results

[{'name': 'Catena Zapata Argentino Vineyard Malbec 2004',
  'region': 'Argentina',
  'variety': 'Red Wine',
  'rating': 98.0,
  'notes': '"The single-vineyard 2004 Malbec Argentino Vineyard spent 17 months in new French oak. Remarkably fragrant and complex aromatically, it offers up aromas of wood smoke, creosote, pepper, clove, black cherry, and blackberry. Made in a similar, elegant style, it is the most structured of the three single vineyard wines, needing a minimum of a decade of additional cellaring. It should easily prove to be a 25-40 year wine. It is an exceptional achievement in Malbec. When all is said and done, Catena Zapata is the Argentina winery of reference – the standard of excellence for comparing all others. The brilliant, forward-thinking Nicolas Catena remains in charge, with his daughter, Laura, playing an increasingly large role. The Catena Zapata winery is an essential destination for fans of both architecture and wine in Mendoza. It is hard to believe, given th