## Building A RAG System with Free GPU, MongoDB and Open Source Models
### Step 1: Installing Libraries
The shell command sequence below installs libraries for leveraging open-source large language models (LLMs), embedding models, and database interaction functionalities. These libraries simplify the development of a RAG system, reducing the complexity to a small amount of code:

* PyMongo: A Python driver for MongoDB that lets you connect to a cluster and query the data stored in its collections and documents.
* Pandas: Provides data structures for efficient data processing and analysis in Python.
* Hugging Face datasets: Provides access to audio, vision, and text datasets.
* Hugging Face Accelerate: Abstracts the complexity of writing code that leverages hardware accelerators such as GPUs. Accelerate is used in this implementation to run the Mistral model on GPU resources.
* Hugging Face Transformers: Provides access to a vast collection of pre-trained models.
* Hugging Face Sentence Transformers: Provides access to sentence, text, and image embeddings.
* bitsandbytes: Provides the 4-bit quantization used later to load the language model with a reduced memory footprint.

In [None]:
!pip install -qq datasets transformers accelerate bitsandbytes sentence_transformers "pymongo[srv]"

### Step 2: Data sourcing and preparation

The data used in this tutorial is the [Hieu-Pham/kaggle_food_recipes](https://huggingface.co/datasets/Hieu-Pham/kaggle_food_recipes) dataset hosted on Hugging Face.

In [None]:
# Load Dataset
from datasets import load_dataset
import pandas as pd

# https://huggingface.co/datasets/Hieu-Pham/kaggle_food_recipes
dataset = load_dataset("Hieu-Pham/kaggle_food_recipes")

# Convert the dataset to a pandas dataframe
dataset_df = pd.DataFrame(dataset["train"])

dataset_df.head(5)

In [None]:
print(dataset_df.isnull().sum())

In [None]:
# Data Preparation
#https://huggingface.co/docs/datasets/v2.19.0/how_to
# Remove data points where the Instructions field is missing
dataset_df = dataset_df.dropna(subset=["Instructions"])
print("\nNumber of missing values in each column after removal:")
print(dataset_df.isnull().sum())

In [None]:
dataset_df["Instructions"].head()

In [None]:
dataset_df = dataset_df[:100]

### Step 3: Generating embeddings
The steps in the code snippets are as follows:
1. Import the SentenceTransformer class to access the embedding models.
2. Load the embedding model using the SentenceTransformer constructor to instantiate the gte-large embedding model.
3. Define the get_embedding function, which takes a text string as input and returns a list of floats representing the embedding. The function first checks if the input text is not empty (after stripping whitespace). If the text is empty, it returns an empty list. Otherwise, it generates an embedding using the loaded model.
4. Generate embeddings by applying the get_embedding function to the “Instructions” column of the dataset_df DataFrame, producing one embedding per recipe. The resulting list of embeddings is assigned to a new column named embedding.

<i>Note: It’s not necessary to chunk the recipe instructions, as we can ensure that the text length remains within a manageable range.</i>
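
Because get_embedding returns an empty list for blank text, and the vector index created in Step 5 expects 1024-dimensional vectors, it can be useful to check the embeddings before ingesting them. The sketch below is one possible check, not part of the original notebook: `find_invalid_embeddings` is a hypothetical helper, and the toy vectors use 4 dimensions purely for readability.

```python
def find_invalid_embeddings(embeddings, expected_dim=1024):
    """Return the positions of embeddings whose length differs from expected_dim."""
    return [i for i, vec in enumerate(embeddings) if len(vec) != expected_dim]


# Toy data: 4-dimensional vectors stand in for real 1024-dimensional ones.
# The empty list mimics what get_embedding returns for blank text.
sample_embeddings = [[0.1, 0.2, 0.3, 0.4], [], [0.5, 0.6, 0.7, 0.8]]

invalid_positions = find_invalid_embeddings(sample_embeddings, expected_dim=4)
print(invalid_positions)
```

On the real DataFrame, the same helper could be applied to `dataset_df["embedding"].tolist()` to find rows worth dropping or re-embedding.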


In [None]:
#embed instructions
from sentence_transformers import SentenceTransformer

# https://huggingface.co/thenlper/gte-large
embedding_model = SentenceTransformer("thenlper/gte-large")


def get_embedding(text: str) -> list[float]:
    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []

    embedding = embedding_model.encode(text)

    return embedding.tolist()

In [None]:
dataset_df["embedding"] = dataset_df["Instructions"].apply(get_embedding)
dataset_df.head()

In [None]:
# drop columns not required
dataset_df = dataset_df.drop(columns=["Unnamed: 0","Image_Name", "Ingredients"])

In [None]:
dataset_df.head()

### Step 4: Database setup and connection
MongoDB acts as both an operational and a vector database. It efficiently stores, queries, and retrieves vector embeddings, which simplifies database maintenance and management and reduces cost.

To create a new MongoDB database, set up a database cluster:

1. Head over to the official MongoDB site and register for a [free MongoDB Atlas account](https://www.mongodb.com/cloud/atlas/register?utm_campaign=devrel&utm_source=community&utm_medium=cta&utm_content=Partner%20Cookbook&utm_term=richmond.alake), or, if you are an existing user, sign in to [MongoDB Atlas](https://account.mongodb.com/account/login?utm_campaign=devrel&utm_source=community&utm_medium=cta&utm_content=Partner%20Cookbook&utm_term=richmond.alakee).

2. Select the ‘Database’ option on the left-hand pane, which will navigate to the Database Deployment page, where there is a deployment specification of any existing cluster. Create a new database cluster by clicking on the “+Create” button.

3. Select all the applicable configurations for the database cluster. Once all the configuration options are selected, click the “Create Cluster” button to deploy the newly created cluster. MongoDB also enables the creation of free clusters on the “Shared Tab”.

<i>Note: Don’t forget to whitelist the IP for the Python host or 0.0.0.0/0 for any IP when creating proof of concepts.</i>

4. After successfully creating and deploying the cluster, the cluster becomes accessible on the ‘Database Deployment’ page.

5. Click on the “Connect” button of the cluster to view the option to set up a connection to the cluster via various language drivers.

Connecting from Python requires the cluster’s URI (uniform resource identifier), also called the connection string.
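
Since the URI contains database credentials, one common approach is to keep it out of the notebook and read it from an environment variable. The following is a minimal sketch under that assumption: the variable name `MONGO_URI`, the helper names, and the example connection string are all hypothetical.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def get_mongo_uri(env=os.environ, var_name="MONGO_URI"):
    """Read the connection string from an environment-like mapping."""
    uri = env.get(var_name, "").strip()
    if not uri:
        raise ValueError(f"{var_name} is not set")
    return uri


def redact_mongo_uri(uri):
    """Hide the password so the URI can be printed or logged safely."""
    parts = urlsplit(uri)
    userinfo, _, hosts = parts.netloc.rpartition("@")
    user, sep, _password = userinfo.partition(":")
    if not sep:
        return uri  # no password present, nothing to hide
    return urlunsplit(parts._replace(netloc=f"{user}:***@{hosts}"))


# Hypothetical connection string, used only for illustration
example_env = {
    "MONGO_URI": "mongodb+srv://alice:s3cret@cluster0.example.mongodb.net/?retryWrites=true"
}
example_uri = get_mongo_uri(example_env)
redacted = redact_mongo_uri(example_uri)
print(redacted)
```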

### 4.1 Database and Collection Setup
Before moving forward, ensure the following prerequisites are met:

* Database cluster set up on MongoDB Atlas
* Obtained the URI to your cluster

For assistance with database cluster setup and obtaining the URI, [refer to our guide for setting up a MongoDB cluster](https://www.mongodb.com/docs/guides/atlas/cluster/) and getting your [connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/).

Once you have created a cluster, create the database and collection within the MongoDB Atlas cluster by clicking + Create Database in the cluster overview page.

Here is a guide for creating a [database and collection](https://www.mongodb.com/basics/create-database)

The database will be named recipes.

The collection will be named instructions.

### Step 5: Create a Vector Search Index
At this point make sure that your vector index is created via MongoDB Atlas.

This next step is mandatory for conducting efficient and accurate vector-based searches based on the vector embeddings stored within the documents in the instructions collection.

A Vector Search Index lets MongoDB traverse the documents efficiently and retrieve those whose embeddings are most similar to the query embedding.

Go here to read more about [MongoDB Vector Search Index](https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/).

```
{
 "fields": [{
     "numDimensions": 1024,
     "path": "embedding",
     "similarity": "cosine",
     "type": "vector"
   }]
}
```

The 1024 value of the numDimensions field corresponds to the dimension of the vector generated by the gte-large embedding model. If you use the gte-base or gte-small embedding models, the numDimensions value in the vector search index must be set to 768 or 384, respectively.
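
To keep the index definition consistent with the chosen embedding model, the definition can be generated rather than typed by hand. This is a sketch only: `build_vector_index_definition` is a hypothetical helper, and the dimension table simply restates the values given above.

```python
import json

# Output dimensions of the gte embedding models, as stated above
GTE_DIMENSIONS = {
    "thenlper/gte-large": 1024,
    "thenlper/gte-base": 768,
    "thenlper/gte-small": 384,
}


def build_vector_index_definition(model_name, path="embedding", similarity="cosine"):
    """Build a vector search index definition matching the model's output size."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": GTE_DIMENSIONS[model_name],
                "similarity": similarity,
            }
        ]
    }


definition = build_vector_index_definition("thenlper/gte-large")
# The printed JSON can be pasted into the Atlas index editor
print(json.dumps(definition, indent=2))
```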

### Step 6: Establish Data Connection
The code snippet below uses PyMongo to create a MongoDB client object, which represents the connection to the cluster and provides access to its databases and collections.

In [None]:
import os

import pymongo
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi


def get_mongo_client(mongo_uri):
    """Establish connection to the MongoDB."""
    try:
        client = MongoClient(mongo_uri, server_api=ServerApi('1'))
        # MongoClient connects lazily, so ping the server to confirm the connection
        client.admin.command("ping")
        print("Connection to MongoDB successful")
        return client
    except pymongo.errors.ConnectionFailure as e:
        print(f"Connection failed: {e}")
        return None


# Read the connection string from the MONGO_URI environment variable
mongo_uri = os.environ.get("MONGO_URI")

if not mongo_uri:
    raise ValueError("MONGO_URI not set in environment variables")

mongo_client = get_mongo_client(mongo_uri)

# Ingest data into MongoDB
db = mongo_client["recipes"]
collection = db["instructions"]

In [None]:
# Delete any existing records in the collection
#collection.delete_many({})

Ingesting data into a MongoDB collection from a pandas DataFrame is straightforward: convert the DataFrame into a list of dictionaries, then pass those records to the collection's insert_many method.

In [None]:
# https://www.mongodb.com/docs/atlas/atlas-vector-search/tutorials/vector-search-quick-start/
documents = dataset_df.to_dict("records")
collection.insert_many(documents)
print("Data ingestion into MongoDB completed")
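
To make the shape of the data passed to insert_many concrete, the sketch below mimics the "records" orientation in plain Python: each row becomes one dictionary, and each dictionary becomes one MongoDB document. The column values are toy data invented for illustration.

```python
# Column-oriented toy data, similar in spirit to a DataFrame
columns = {
    "Title": ["Kimchi Toast", "Vegetable Soup"],
    "Instructions": ["Mix cream cheese and kimchi.", "Simmer the vegetables."],
}

# Row-oriented records: one dictionary per row, keyed by column name
records = [dict(zip(columns.keys(), row)) for row in zip(*columns.values())]

for record in records:
    print(record)
```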


### What is vector search?
Vector search is a capability that enables semantic search, where data is searched based on its meaning. This technique employs machine learning models, often called encoders, to transform text, audio, images, or other types of data into high-dimensional vectors. These vectors capture the semantic meaning of the data, which can then be searched to find similar content based on vectors being “near” one another in a high-dimensional space.

This can be a great complement to traditional keyword-based search techniques. It is also attracting a great deal of interest because it can augment the capabilities of large language models (LLMs) by providing ground truth outside of what the LLMs “know.” In search use cases, this allows you to find relevant results even when the exact wording isn't known. This technique can be useful in a variety of contexts, such as natural language processing and recommendation systems.
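
The idea of vectors being “near” one another can be illustrated with cosine similarity, the same similarity measure configured in the index definition in Step 5. The sketch below ranks a few documents against a query; the 3-dimensional vectors are made up for illustration and do not come from a real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (real ones have hundreds of dimensions)
toy_documents = {
    "grilled chicken salad": [0.9, 0.1, 0.0],
    "chocolate cake": [0.0, 0.2, 0.9],
    "vegetable soup": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]  # stands in for an embedded "light lunch" query

# Sort document names from most to least similar to the query
ranked = sorted(
    toy_documents,
    key=lambda name: cosine_similarity(query_vector, toy_documents[name]),
    reverse=True,
)
print(ranked)
```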

#### Benefits of vector search with MongoDB
1. Efficiency: By storing the vectors together with the original data, you avoid the need to sync data between your application database and your vector store at both query and write time.
2. Consistency: Storing the vectors with the data ensures that the vectors are always associated with the correct data. This can be important in situations where the vector generation process might change over time. By storing the vectors, you can be sure that you always have the correct vector for a given piece of data.
3. Simplicity: Storing vectors with the data simplifies the overall architecture of your application. You don't need to maintain a separate service or database for the vectors, reducing the complexity and potential points of failure in your system.
4. Scalability: With the power of MongoDB Atlas, vector search on MongoDB scales horizontally and vertically, allowing you to power the most demanding workloads.

### Step 7: Perform Vector Search on User Queries
The following step implements a function that returns a vector search result by generating a query embedding and defining a MongoDB aggregation pipeline.

The pipeline, consisting of the $vectorSearch and $project stages, executes queries using the generated vector and formats the results to include only the required information, such as the title, ingredients, and instructions, while incorporating a search score for [each result](https://huggingface.co/learn/cookbook/en/rag_with_hugging_face_gemma_mongodb).

In [None]:
def vector_search(user_query, collection):
    """
    Perform a vector search in the MongoDB collection based on the user query.

    Args:
    user_query (str): The user's query string.
    collection (MongoCollection): The MongoDB collection to search.

    Returns:
    list: A list of matching documents.
    """

    # Generate embedding for the user query
    query_embedding = get_embedding(user_query)

    # get_embedding returns an empty list for empty text
    if not query_embedding:
        return []

    # Define the vector search pipeline
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding",
                "numCandidates": 100,  # Number of candidate matches to consider (100 sample)
                "limit": 6,  # Return top 6 matches
            }
        },
        {
            "$project": {
                "_id": 0,  # Exclude the _id field
                "Title": 1,  # Include the title field
                "Cleaned_Ingredients": 1,
                "Instructions": 1,  # Include the Instructions field
                "score": {"$meta": "vectorSearchScore"},  # Include the search score
            }
        },
    ]


    # Execute the search
    results = collection.aggregate(pipeline)
    return list(results)
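
The pipeline used by vector_search can also be built by a separate function, which makes it possible to inspect or test the stages without a database connection. This is a sketch of that variation, not the notebook's own code: `build_vector_search_pipeline` is a hypothetical helper, and it adds a check that limit does not exceed numCandidates, which Atlas Vector Search requires.

```python
def build_vector_search_pipeline(query_embedding, index_name="vector_index",
                                 num_candidates=100, limit=6):
    """Return the $vectorSearch and $project stages as a list of dictionaries."""
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "queryVector": query_embedding,
                "path": "embedding",
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "Title": 1,
                "Cleaned_Ingredients": 1,
                "Instructions": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# A short placeholder vector stands in for a real 1024-dimensional embedding
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3])
print([next(iter(stage)) for stage in pipeline])
```

With a live collection, the returned list could be passed directly to `collection.aggregate`.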

### Step 8: Handling user queries

In [None]:
def get_search_result(query, collection):

    get_knowledge = vector_search(query, collection)

    search_result = ""
    for result in get_knowledge:
        search_result += (
            f"Title: {result.get('Title', 'N/A')}, "
            f"Ingredients: {result.get('Cleaned_Ingredients', 'N/A')}, "
            f"Instructions: {result.get('Instructions', 'N/A')}\n"
        )
    return search_result

In [None]:
# Conduct query with retrieval of sources
query = "Which is the best recipe for a light and healthy meal for lunch?"
source_information = get_search_result(query, collection)


In [None]:
from huggingface_hub import notebook_login

# Login to Hugging Face
notebook_login()

### Understanding Mistral 7B
Mistral 7B is a 7.3 billion parameter language model. According to Mistral AI, it outperforms the 13 billion parameter Llama 2 model on all reported benchmarks and the 34 billion parameter Llama 1 model on many of them.

We will use BitsAndBytes to define a 4-bit quantization configuration with the NF4 data type and load the model in 4-bit precision. This reduces the memory footprint so that the model can run on [Google Colab](https://www.datacamp.com/tutorial/mistral-7b-tutorial) or consumer GPUs.
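
A rough back-of-the-envelope calculation shows why 4-bit loading matters on a free GPU. The sketch below estimates the memory needed for the weights alone, assuming every one of the 7.3 billion parameters is stored at the given precision. That is a simplification: it ignores quantization metadata, layers that stay in higher precision, activations, and the generation cache, so actual usage will be higher.

```python
def estimate_weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7.3e9  # parameter count quoted above for Mistral 7B

fp16_gb = estimate_weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = estimate_weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: about {fp16_gb:.1f} GB")
print(f"4-bit weights:  about {nf4_gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 14.6 GB to roughly 3.65 GB, a fourfold reduction.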

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig
import torch
import time

model_name = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    llm_int8_enable_fp32_cpu_offload=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_name, 
                                             device_map="auto", 
                                             quantization_config=bnb_config)
                                             
# The tokenizer has no weights to quantize, so no quantization config is passed
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

In [None]:
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

In [None]:
messages = [
    {"role": "user", 
     "content": f"""
     Answer the query by using the source information.
     #Query
    {query}

    #Source Information
    {source_information}
    Give only one recipe.
     """}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
device = "cuda" 
model_inputs = encodeds.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0].split('[/INST]')[-1])

Based on the source information provided, here's the Kimchi Toast recipe:

Ingredients:
- 4 oz. cream cheese, room temperature
- ¾ cup finely chopped kimchi; plus more for serving (optional)
- 2 scallions, thinly sliced
- 1 cup cilantro leaves with tender stems
- ½ lime
- Kosher salt
- 4 (¾"-thick) slices country-style bread, grilled or toasted
- Chili oil and toasted white sesame seeds (for serving)

Instructions:
1. Mix cream cheese and kimchi in a medium bowl.
2. Toss scallions and cilantro in a small bowl to combine.
3. Squeeze in juice from lime, season with salt, and toss again.
4. Smear kimchi cream cheese over each slice of bread.
5. Top with scallion salad.
6. Drizzle with chili oil.
7. Sprinkle with sesame seeds.

Enjoy your light and healthy lunch of Kimchi Toast!</s>
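
The printed answer above still ends with the end-of-sequence marker `</s>`. A small post-processing step can remove it along with the prompt. The sketch below shows one way to do this on a plain string: `extract_answer` is a hypothetical helper, and the sample text is a shortened stand-in for the real decoded output.

```python
def extract_answer(decoded_text, inst_end="[/INST]", eos="</s>"):
    """Keep only the text after the last [/INST] tag and drop a trailing </s>."""
    answer = decoded_text.split(inst_end)[-1].strip()
    if answer.endswith(eos):
        answer = answer[: -len(eos)]
    return answer.strip()


# Shortened stand-in for the decoded model output
sample_decoded = (
    "<s> [INST] Answer the query by using the source information. [/INST] "
    "Enjoy your light and healthy lunch of Kimchi Toast!</s>"
)

answer = extract_answer(sample_decoded)
print(answer)
```

In the notebook, the same function could be applied to `decoded[0]` in place of the `split('[/INST]')` call.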