[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sujee/mongodb-atlas-vector-search/blob/main/lab-4-rag/rag-10k-b-query-open-embeddings-mistral-llm.ipynb)

# RAG 10k Query: Open Embeddings with Mistral LLM

Here is the overall RAG pipeline. In this notebook, we will do steps (2), (3) and (4).
- Step 1: Populate embeddings. This is already done in this notebook: [rag-10k-a-populate-embeddings-mistral.ipynb](https://github.com/sujee/mongodb-atlas-vector-search/blob/main/lab-4-rag/rag-10k-a-populate-embeddings-mistral.ipynb)
- 👉 Step 2: Calculate the embedding for the user query
- 👉 Step 3: Send the query to Atlas to retrieve relevant documents
- 👉 Step 4: Send the query and the relevant documents (returned in the step above) to the LLM and get answers to our query
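The four steps above can be sketched end to end in plain Python. This is a toy illustration, not the notebook's implementation: the bag-of-words `embed` function, the in-memory `store` list and the prompt format are all hypothetical stand-ins for the Hugging Face embedding model, the Atlas collection and LlamaIndex's prompt template.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an embedding model: a bag-of-words count vector.
# (The notebook itself uses BAAI/bge-small-en-v1.5 through LlamaIndex.)
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1 (done in the populate notebook): embed and store document chunks.
chunks = [
    "Uber revenue for 2021 was reported in the annual report.",
    "Lyft stock-based compensation is described in the notes.",
    "Uber completed its IPO in 2019.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Step 2: embed the query. Step 3: rank stored chunks by similarity.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str, context: list[str]) -> str:
    # Step 4: the retrieved chunks and the query go to the LLM together.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

print(build_prompt("When did Uber IPO?", retrieve("When did Uber IPO?")))
```

In the real pipeline, Atlas does the ranking in step 3 and Mistral answers the prompt in step 4.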

![image missing](https://raw.githubusercontent.com/sujee/mongodb-atlas-vector-search/main/images/rag-1.svg)

### What you need to run this notebook

- A (free) MongoDB Atlas account and its connection credentials
- A Mistral API key

### This lab depends on:

- We assume the PDF documents have already been processed, and their embeddings calculated and loaded into Atlas. Refer to this notebook: [rag-10k-a-populate-embeddings-mistral.ipynb](https://github.com/sujee/mongodb-atlas-vector-search/blob/main/lab-4-rag/rag-10k-a-populate-embeddings-mistral.ipynb)

### The Stack

- Language: Python
- Vector database: Atlas
- Embedding Model: open source embedding model (runs locally)
- LLM: Mistral (access via API)


### How to run

This notebook can be run on Google Colab or in a standalone Python development environment. Click here to run it on Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sujee/mongodb-atlas-vector-search/blob/main/lab-4-rag/rag-10k-b-query-open-embeddings-mistral-llm.ipynb)



## Step-1: Make sure documents are loaded in Atlas

This is done in this notebook: [rag-10k-a-populate-embeddings-local.ipynb](https://github.com/sujee/mongodb-atlas-vector-search/blob/main/lab-4-rag/rag-10k-a-populate-embeddings-local.ipynb)

Please complete this first.

## Step-2: Configuration

We will set up some common configurations here.

In [1]:
# We will keep all global variables in an object to not pollute the global namespace.
class MyConfig(object):
    pass

MY_CONFIG = MyConfig()

MY_CONFIG.DB_NAME = 'rag1'
MY_CONFIG.COLLECTION_NAME = '10k_local'
MY_CONFIG.EMBEDDING_ATTRIBUTE = 'embedding_local'
MY_CONFIG.INDEX_NAME = 'idx_embedding_local'

## Embedding settings
## Option 1 : small model - about 133 MB size
## Option 2 : large model - about 1.34 GB
## See Step-12 for more details

MY_CONFIG.EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"


## Step-3: Load Configuration

We need to configure the following:
- Atlas connection credentials
- Mistral API key

### Option 3A - If running on Colab

- Click on the 'Colab secrets' icon (🔑) in the left pane, and create the following secrets:
   - `ATLAS_URI`
   - `MISTRAL_API_KEY`
- Make sure `Notebook access` is toggled on for each secret
- See the screenshot below for an example


![](https://raw.githubusercontent.com/sujee/mongodb-atlas-vector-search/main/images/colab-secret-3.png)


### Option 3B - If running on local python environment

- Set up your local Python environment following this [setup guide](https://github.com/sujee/mongodb-atlas-vector-search/blob/main/setup-python-env.md)
- Create a file named `.env` in the same directory as the notebook
- Add the following settings:

```text
ATLAS_URI=mongodb+srv://<username>:<password>@sandbox.....
MISTRAL_API_KEY=xyz
```
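For reference, the `.env` file is just `KEY=VALUE` lines. The hypothetical `parse_env` helper below is a minimal sketch of how such a file maps to a dictionary; it is not python-dotenv's implementation, and `dotenv_values()` handles much more (quoting rules, `export` prefixes, variable interpolation). The sample values are placeholders.

```python
# Hypothetical minimal .env reader: KEY=VALUE per line, '#' comments and blank lines ignored.
# python-dotenv's dotenv_values() is more thorough; this only illustrates the file format.
def parse_env(text: str) -> dict[str, str]:
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # partition() splits on the FIRST '=', so values may themselves contain '='
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

sample = """
# Atlas + Mistral settings (placeholder values)
ATLAS_URI=mongodb+srv://user:pass@cluster0.example.mongodb.net
MISTRAL_API_KEY=xyz
"""
env = parse_env(sample)
print(sorted(env))  # ['ATLAS_URI', 'MISTRAL_API_KEY']
```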


## Step-4: Determine Runtime Environment

This code figures out whether we are running in Google Colab or in a local environment. We use this later to decide whether to install packages.

In [2]:
# are we running in Colab?
import os

if os.getenv("COLAB_RELEASE_TAG"):
    print("Running in Colab")
    MY_CONFIG.RUNNING_IN_COLAB = True
else:
    print("NOT running in Colab")
    MY_CONFIG.RUNNING_IN_COLAB = False

Running in Colab
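The same check can be wrapped in a small function. Taking the environment as an optional argument is our own design choice (the notebook reads `os.environ` directly); it makes the logic easy to test without Colab.

```python
import os
from typing import Mapping, Optional

def running_in_colab(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the Colab marker variable used in Step-4 is set and non-empty.

    By default this reads the real process environment.
    """
    env = os.environ if environ is None else environ
    # Same signal as the notebook cell above: Colab sets COLAB_RELEASE_TAG.
    return bool(env.get("COLAB_RELEASE_TAG"))

print(running_in_colab())
```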


## Step-5: Install dependencies (if necessary)

We will install the required libraries in cloud environments like Google Colab. For local environments, we assume the dependencies are already set up.

In [3]:
if MY_CONFIG.RUNNING_IN_COLAB:
    !pip install \
                pymongo==4.6.2 \
                llama-index \
                llama-index-embeddings-huggingface \
                llama-index-llms-mistralai \
                llama-index-vector-stores-mongodb \
                transformers==4.38.2 \
                torch==2.2.1


## Step-6: Basic Setup

### 6.1 - Check if we have a GPU

In [4]:
## Check if GPU is enabled
import os
import torch

## To disable GPU and experiment, uncomment the following line
## Normally, you would want to use GPU, if one is available.
# os.environ["CUDA_VISIBLE_DEVICES"]=""

print ("using CUDA/GPU: ", torch.cuda.is_available())

for i in range(torch.cuda.device_count()):
    print("device ", i , torch.cuda.get_device_properties(i).name)

using CUDA/GPU:  True
device  0 Tesla T4


### 6.2 - Logging

In [5]:
## Set up logging. To see more logging, set the level to DEBUG

import sys
import logging

# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.basicConfig(stream=sys.stdout, level=logging.WARNING)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Step-7: Load Configurations

In [6]:
## Load settings based on where we are running
##  - if running on Google Colab, load from secrets
##  - if running locally use dotenv

if MY_CONFIG.RUNNING_IN_COLAB:
    from google.colab import userdata
    MY_CONFIG.ATLAS_URI = userdata.get('ATLAS_URI')
    MY_CONFIG.MISTRAL_API_KEY = userdata.get('MISTRAL_API_KEY')
    # MY_CONFIG.OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
else:
    import os, sys
    from dotenv import find_dotenv, dotenv_values

    this_dir = os.path.abspath('')
    parent_dir = os.path.dirname(this_dir)
    sys.path.append (os.path.abspath (parent_dir))

    config = dotenv_values(find_dotenv())
    # debug
    # print (config)
    MY_CONFIG.ATLAS_URI = config.get('ATLAS_URI')
    MY_CONFIG.MISTRAL_API_KEY = config.get("MISTRAL_API_KEY")
## --- end load config

## If you just want to quickly set the config manually, you can do so here.
# MY_CONFIG.ATLAS_URI = ''
# MY_CONFIG.MISTRAL_API_KEY = ''

if MY_CONFIG.ATLAS_URI:
    print ("✅ config ATLAS_URI found")
else:
    raise Exception ("❌ 'ATLAS_URI' is not set.  Please set it above to continue...")


if MY_CONFIG.MISTRAL_API_KEY:
    print ("✅ config MISTRAL_API_KEY found")
else:
    raise Exception ("❌ 'MISTRAL_API_KEY' is not set.  Please set it above to continue...")

✅ config ATLAS_URI found
✅ config MISTRAL_API_KEY found
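If you prefer to report every missing setting at once instead of failing on the first one, a variant like the hypothetical `check_required` below works. It accepts any mapping, so with this notebook's config object you could call `check_required(vars(MY_CONFIG))`.

```python
from typing import Iterable, Mapping

REQUIRED_KEYS = ("ATLAS_URI", "MISTRAL_API_KEY")

def check_required(config: Mapping[str, object], required: Iterable[str] = REQUIRED_KEYS) -> None:
    """Raise a single error listing every missing or empty setting."""
    required = list(required)
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ValueError("❌ missing settings: " + ", ".join(missing))
    for key in required:
        print(f"✅ config {key} found")
```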


## Step-8: Initialize Atlas Client

If this step fails, make sure 'connect from anywhere' is enabled in your Atlas network access settings.

![](https://raw.githubusercontent.com/sujee/mongodb-atlas-vector-search/main/images/atlas-connect-2.png)

In [7]:
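import pymongo

mongodb_client = pymongo.MongoClient(MY_CONFIG.ATLAS_URI)
# MongoClient connects lazily; a 'ping' forces a round trip so connection problems surface here
mongodb_client.admin.command('ping')
print ('✅ Connected to Atlas instance!')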

✅ Connected to Atlas instance!
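A malformed connection string is a common cause of failures at this step. The hypothetical helper below sanity-checks the URI with the standard library's `urllib.parse.urlsplit` and returns a summary with the password masked, which is safer to print than the raw `ATLAS_URI`. It is a sketch that assumes a single-host `mongodb+srv://` or `mongodb://` string like the one in the `.env` example; the hostname in the usage line is made up.

```python
from urllib.parse import urlsplit

def describe_atlas_uri(uri: str) -> str:
    """Return a log-safe summary of a MongoDB connection string (password hidden)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    user = parts.username or "<no user>"
    # Never echo parts.password; mask it instead.
    return f"{parts.scheme}://{user}:***@{parts.hostname}"

print(describe_atlas_uri("mongodb+srv://alice:s3cret@sandbox.abcde.mongodb.net/?retryWrites=true"))
# mongodb+srv://alice:***@sandbox.abcde.mongodb.net
```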


## Step-9 : Setup Embeddings Model

We will use an open-source embedding model from Hugging Face (`BAAI/bge-small-en-v1.5`, set in Step-2), which runs locally.

In [8]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name = MY_CONFIG.EMBEDDING_MODEL
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [9]:
## testing
embeddings = Settings.embed_model.get_text_embedding("Hello world!")
print ('embedding len : ', len(embeddings))
print ('first few embeddings : ', embeddings[:10])

embedding len :  384
first few embeddings :  [-0.0032757227309048176, -0.011690807528793812, 0.041559189558029175, -0.03814816102385521, 0.024183066561818123, 0.01364425290375948, 0.011117850430309772, 0.04811973124742508, 0.02140951342880726, 0.01417492888867855]
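The query embedding must have the same dimension as the vectors stored by the populate notebook (384 here for `BAAI/bge-small-en-v1.5`), or the vector search cannot compare them. The sketch below hard-codes that dimension as an assumption and adds a quick check. It also returns the L2 norm: as far as we know, LlamaIndex's `HuggingFaceEmbedding` normalizes embeddings by default, and for unit-length vectors the dot product equals cosine similarity. You could call `check_query_vector(embeddings)` on the vector from the test cell above.

```python
import math
from typing import Sequence

EXPECTED_DIM = 384  # output size of BAAI/bge-small-en-v1.5, per the test cell above

def check_query_vector(vec: Sequence[float], expected_dim: int = EXPECTED_DIM) -> float:
    """Validate a query embedding's length and return its L2 norm."""
    if len(vec) != expected_dim:
        raise ValueError(f"embedding has {len(vec)} dims, index expects {expected_dim}")
    return math.sqrt(sum(x * x for x in vec))

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # For unit-length vectors, the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))
```

A norm close to 1.0 suggests the vectors are normalized.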


## Step-10: Setup LLM

Our LLM of choice is Mistral (`mistral-large-latest`), accessed via API.

In [10]:
from llama_index.llms.mistralai import MistralAI
from llama_index.core import Settings

llm = MistralAI(model="mistral-large-latest", temperature=0.1, api_key=MY_CONFIG.MISTRAL_API_KEY)

Settings.llm = llm
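The `temperature=0.1` argument keeps the model's answers close to deterministic, which suits factual questions about filings. Conceptually, sampling temperature divides the model's token scores (logits) before the softmax. The sketch below shows that effect on made-up logits; it illustrates the general idea only, not Mistral's server-side sampling.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        # Real APIs may treat 0 specially; this sketch simply rejects it.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `0.1`, nearly all probability lands on the top-scoring token, so repeated runs tend to give the same answer.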

In [11]:
## Testing
resp = llm.complete("The capital of the United States is ")
print (resp)

The capital of the United States is Washington, D.C. It was founded in 1790 and is located on the east coast of the country, between the states of Maryland and Virginia. The city is named after George Washington, the first president of the United States. It is known for its iconic landmarks, such as the White House, the Capitol Building, and the Lincoln Memorial.


## Step-11: Connect LlamaIndex and MongoDB Atlas

Let's define MongoDB Atlas as our vector store. This is where the indexed data is stored and where our queries run.

In [12]:
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex


vector_store = MongoDBAtlasVectorSearch(mongodb_client = mongodb_client,
                                        db_name = MY_CONFIG.DB_NAME,
                                        collection_name = MY_CONFIG.COLLECTION_NAME,
                                        index_name = MY_CONFIG.INDEX_NAME,
                                        embedding_key = MY_CONFIG.EMBEDDING_ATTRIBUTE,
                                        ## the following fields are left at their default values
                                        # text_key = 'text', metadata_key = 'metadata',
                                        )
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store, storage_context=storage_context)
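Under the hood, a vector store like this one asks Atlas for the nearest neighbours of the query embedding. The sketch below builds an Atlas `$vectorSearch` aggregation pipeline by hand to show the moving parts; the exact query LlamaIndex issues may differ by package version. The defaults mirror `MY_CONFIG.INDEX_NAME` and `MY_CONFIG.EMBEDDING_ATTRIBUTE`, the projected `text`/`metadata` fields assume the default keys noted in the cell above, `k=2` is assumed to match LlamaIndex's default `similarity_top_k`, and the candidate-oversampling factor is our own heuristic. With pymongo you could run it as `mongodb_client[MY_CONFIG.DB_NAME][MY_CONFIG.COLLECTION_NAME].aggregate(pipeline)`.

```python
from typing import Optional, Sequence

def build_vector_search_pipeline(query_vector: Sequence[float],
                                 index_name: str = "idx_embedding_local",
                                 path: str = "embedding_local",
                                 k: int = 2,
                                 num_candidates: Optional[int] = None) -> list[dict]:
    """Build an Atlas $vectorSearch aggregation pipeline as plain dicts."""
    if num_candidates is None:
        num_candidates = k * 10  # heuristic: consider more candidates than we return
    return [
        {
            "$vectorSearch": {
                "index": index_name,           # name of the Atlas vector search index
                "path": path,                  # document field holding the stored embeddings
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": k,                    # number of documents to return
            }
        },
        # keep the chunk text and metadata, plus the similarity score Atlas computed
        {"$project": {"text": 1, "metadata": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```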

## Step-12: Query Data / Ask Questions

Now that we have everything set up, let's ask some questions.

These are the PDF documents we have loaded into Atlas; you can download and inspect them.

- [10k/lyft_2021.pdf](https://raw.githubusercontent.com/sujee/mongodb-atlas-vector-search/main/data/10k/lyft_2021.pdf)
- [10k/uber_2021.pdf](https://raw.githubusercontent.com/sujee/mongodb-atlas-vector-search/main/data/10k/uber_2021.pdf)

In [13]:
%%time

from pprint import pprint

response = index.as_query_engine().query("What was Uber's revenue?")
print (response)
print()
pprint(response, indent=4)

Uber's total revenue for the year ended December 31, 2021, was $17,455 million. This revenue is disaggregated into various offerings and geographical regions. The revenue from Mobility was $6,953 million, Delivery revenue was $8,362 million, Freight revenue was $2,132 million, and All Other revenue was $8 million for the same period. The revenue is recognized based on the location where the transaction occurred.

Response(response="Uber's total revenue for the year ended December 31, 2021, "
                  'was $17,455 million. This revenue is disaggregated into '
                  'various offerings and geographical regions. The revenue '
                  'from Mobility was $6,953 million, Delivery revenue was '
                  '$8,362 million, Freight revenue was $2,132 million, and All '
                  'Other revenue was $8 million for the same period. The '
                  'revenue is recognized based on the location where the '
                  'transaction occurred.',
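The `source_nodes` in each response show which chunks the answer was built from, and a compact citation list makes them easier to scan. The hypothetical `format_sources` helper below works on plain `(metadata, score)` pairs; with a real LlamaIndex response you could feed it `((n.node.metadata, n.score) for n in response.source_nodes)`. The `file_name` and `page_label` keys match the metadata printed in these outputs; the score in the test data is illustrative, not a real result.

```python
from typing import Iterable, Mapping, Optional, Tuple

def format_sources(hits: Iterable[Tuple[Mapping[str, str], Optional[float]]]) -> list[str]:
    """Turn (metadata, score) pairs into short 'file p.page (score)' citations."""
    lines = []
    for metadata, score in hits:
        name = metadata.get("file_name", "?")
        page = metadata.get("page_label", "?")
        score_txt = f"{score:.3f}" if score is not None else "n/a"
        lines.append(f"{name} p.{page} (score {score_txt})")
    return lines
```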

In [14]:
%%time

response = index.as_query_engine().query("How much money did Lyft make in 2020?")
print (response)
print()
pprint(response, indent=4)

Lyft's revenue for the year 2020 was $2,364,681 (in thousands). This means they made $2.36 billion in 2020.

Response(response="Lyft's revenue for the year 2020 was $2,364,681 (in "
                  'thousands). This means they made $2.36 billion in 2020.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='73c0821d-d0be-49c7-b3cb-10729aaea1c4', embedding=None, metadata={'page_label': '58', 'file_name': 'lyft_2021.pdf', 'file_path': '/content/data/10k/lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'creation_date': '2024-03-20', 'last_modified_date': '2024-03-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='708cd071-70b1-42ca-9076-2c469deaa545', node_type=<ObjectType.DOCU

In [15]:
%%time

## The answer to this question doesn't exist in the Lyft_10k filing!
## Let's see what we get back
response = index.as_query_engine().query("How much money did Lyft make in 2018?")
print (response)
print()
pprint(response, indent=4)

The provided context does not include financial data for Lyft in 2018.

Response(response='The provided context does not include financial data for '
                  'Lyft in 2018.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='82725934-1ff2-41dd-9853-daf6f5b6c521', embedding=None, metadata={'page_label': '79', 'file_name': 'lyft_2021.pdf', 'file_path': '/content/data/10k/lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'creation_date': '2024-03-20', 'last_modified_date': '2024-03-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='a2dcc1a8-c7c0-4cea-97cc-8b069688f142', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '79', 'file_name': 'lyft_2021.pdf', 'f

In [16]:
%%time

response = index.as_query_engine().query("When did Uber go IPO?")
print (response)
print()
pprint(response, indent=4)

Uber went public on May 14, 2019.

Response(response='Uber went public on May 14, 2019.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='bc48762b-d3ff-482f-86e7-64a23666a248', embedding=None, metadata={'page_label': '119', 'file_name': 'uber_2021.pdf', 'file_path': '/content/data/10k/uber_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1880483, 'creation_date': '2024-03-20', 'last_modified_date': '2024-03-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='8cfaf251-79c1-476f-9876-fb82da2f044a', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '119', 'file_name': 'uber_2021.pdf', 'file_path': '/content/data/10k/uber_2021.pdf', 'file_type': 'application/pdf', 'file_size': 18

In [17]:
%%time

response = index.as_query_engine().query("What were the Stock-based compensation for Lyft?")
print (response)
print()
pprint(response, indent=4)

In the year ending December 31, 2021, the stock-based compensation for Lyft was $721,710. For the year ending December 31, 2020, it was $565,807. And in the year ending December 31, 2019, it was $1,599,311.

Response(response='In the year ending December 31, 2021, the stock-based '
                  'compensation for Lyft was $721,710. For the year ending '
                  'December 31, 2020, it was $565,807. And in the year ending '
                  'December 31, 2019, it was $1,599,311.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='ffc0f805-8304-4491-ba12-606480129964', embedding=None, metadata={'page_label': '82', 'file_name': 'lyft_2021.pdf', 'file_path': '/content/data/10k/lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'creation_date': '2024-03-20', 'last_modified_date': '2024-03-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys
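Lyft's 10-K reports financial statement amounts in thousands (the 2020 revenue answer above says so explicitly), so the stock-based compensation figures in this answer are most likely in thousands as well, even though the LLM dropped the unit. A small, hypothetical conversion helper makes the scale obvious:

```python
def thousands_to_readable(value_in_thousands: int) -> str:
    """Convert a figure reported 'in thousands' into a $ million / $ billion string."""
    dollars = value_in_thousands * 1_000
    if dollars >= 1e9:
        return f"${dollars / 1e9:.2f} billion"
    return f"${dollars / 1e6:.1f} million"

for figure in (721_710, 565_807, 1_599_311):
    print(figure, "->", thousands_to_readable(figure))
```

Under that assumption, $721,710 (in thousands) is roughly $721.7 million. Checking units against the source page is a good habit when asking about table data.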

## Try your own queries

Inspect the PDFs, and ask away.

Here are a few things to try:

- Ask questions about data in tables. Are we getting accurate answers?
- Ask vague questions, e.g. "which company is more environmentally friendly". What do we get back?
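To try several questions in one go, a small loop helps. The hypothetical `run_queries` helper below takes any callable that maps a question to an answer, so you could pass `index.as_query_engine().query` from Step-11; here a stand-in `fake_query` is used so the sketch runs without Atlas or Mistral. It also times each call, similar to what `%%time` reports per cell.

```python
import time
from typing import Callable, Iterable

def run_queries(query_fn: Callable[[str], object],
                questions: Iterable[str]) -> list[tuple[str, str, float]]:
    """Ask each question via query_fn and return (question, answer, seconds) tuples."""
    results = []
    for question in questions:
        start = time.perf_counter()
        answer = query_fn(question)
        elapsed = time.perf_counter() - start
        results.append((question, str(answer), elapsed))  # str() yields the answer text
    return results

# Stand-in query function so the sketch runs without Atlas or Mistral.
def fake_query(question: str) -> str:
    return f"(no index) you asked: {question}"

for q, a, secs in run_queries(fake_query, ["What was Uber's revenue?"]):
    print(f"Q: {q}\nA: {a}\n({secs:.3f}s)\n")
```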

In [18]:
%%time

# response = index.as_query_engine().query("YOUR QUERY GOES HERE")
# print (response)
# print()
# pprint(response, indent=4)

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.2 µs
