<a href="https://colab.research.google.com/github/rkulesza/bigdata/blob/main/08-vector-rag-mongo/text_english.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Atlas Vector Search - Retrieval-Augmented Generation (RAG)

This notebook is a companion to the [Retrieval-Augmented Generation (RAG)](https://www.mongodb.com/docs/atlas/atlas-vector-search/rag/#get-started) tutorial. Refer to that page for setup instructions and detailed explanations.

This notebook shows how to implement RAG with Atlas Vector Search using open-source models from Hugging Face.


In [1]:
%pip install --quiet --upgrade pymongo sentence_transformers einops langchain langchain_community pypdf huggingface_hub


In [2]:
from sentence_transformers import SentenceTransformer

# Load the embedding model (https://huggingface.co/nomic-ai/nomic-embed-text-v1).
# trust_remote_code=True is needed because the model uses custom modeling code.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Define a function to generate embeddings
def get_embedding(data):
    """Generates vector embeddings for the given data."""

    embedding = model.encode(data)
    return embedding.tolist()


A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-bert-2048:
- configuration_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-bert-2048:
- modeling_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


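The model card for `nomic-embed-text-v1` asks for a task instruction prefix on every input, for example `search_document: ` for text you store and `search_query: ` for questions embedded at query time, while `get_embedding` above embeds raw text. Below is a minimal sketch, assuming you want to follow that recommendation; the helper `add_task_prefix` and the `TASK_PREFIXES` mapping are hypothetical, not part of Sentence Transformers or the tutorial.

```python
# Task prefixes described on the nomic-embed-text-v1 model card
TASK_PREFIXES = {
    "document": "search_document: ",  # text stored in the collection
    "query": "search_query: ",        # questions embedded at query time
}

def add_task_prefix(text, task):
    """Hypothetical helper: prepend the Nomic task instruction for `task`."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PREFIXES[task] + text

# Example: the same helper is used for stored chunks and for queries
print(add_task_prefix("MongoDB announced new AI features.", "document"))
print(add_task_prefix("AI technology", "query"))
```

If you adopt the prefixes, apply them to both the stored chunks and the queries (for example `get_embedding(add_task_prefix(doc.page_content, "document"))`), and re-embed any documents already in the collection so that all vectors are produced the same way.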

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/12236/pdf")
data = loader.load()

# Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
documents = text_splitter.split_documents(data)
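`RecursiveCharacterTextSplitter` tries a list of separators (by default paragraph breaks, line breaks, spaces, then individual characters) so that each chunk stays within `chunk_size` characters, and carries up to `chunk_overlap` characters of context into the next chunk. To show only how those two numbers interact, here is a simplified sliding-window splitter. It is not LangChain's algorithm, since it ignores separators entirely, and the function name is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=400, chunk_overlap=20):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores paragraph, line, and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(1000))
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])  # 3 [400, 400, 240]
```

With `chunk_size=400` and `chunk_overlap=20`, each window starts 380 characters after the previous one, so text near a boundary appears at the end of one chunk and the start of the next.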

In [4]:
# Prepare documents for insertion
docs_to_insert = [{
    "text": doc.page_content,
    "embedding": get_embedding(doc.page_content)
} for doc in documents]
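The list comprehension above calls `get_embedding` once per chunk. `SentenceTransformer.encode` also accepts a list of strings (and a `batch_size` argument), so all chunks can be embedded in one call. The sketch below shows that pattern; `build_documents` is a hypothetical helper, and `fake_encode` is a stand-in encoder so the example runs without downloading the model.

```python
def build_documents(texts, encode_batch):
    """Pair each text chunk with its embedding using one batched call.

    `encode_batch` takes a list of strings and returns one vector per string,
    e.g. lambda texts: model.encode(texts, batch_size=32).tolist()
    """
    embeddings = encode_batch(texts)
    if len(embeddings) != len(texts):
        raise ValueError("encoder returned a different number of vectors")
    return [{"text": t, "embedding": e} for t, e in zip(texts, embeddings)]

def fake_encode(texts):
    # Stand-in encoder for illustration: a 2-dimensional "embedding" per text
    return [[float(len(t)), float(t.count(" "))] for t in texts]

docs = build_documents(["hello world", "vector search"], fake_encode)
print(docs)
```

In the notebook you would call it as `build_documents([doc.page_content for doc in documents], lambda texts: model.encode(texts).tolist())`.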

In [6]:
from pymongo import MongoClient

# Connect to your Atlas cluster (replace "URI" with your connection string)
client = MongoClient("URI")
collection = client["rag_db"]["test"]

# Insert documents into the collection
result = collection.insert_many(docs_to_insert)
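`MongoClient("URI")` above is a placeholder for your Atlas connection string. To avoid pasting credentials into the notebook, you could read the string from an environment variable instead. This is a sketch under that assumption; both the variable name `MONGODB_URI` and the helper `get_mongodb_uri` are hypothetical.

```python
import os

def get_mongodb_uri(var_name="MONGODB_URI"):
    """Return the connection string stored in `var_name` (name is an assumption)."""
    uri = os.environ.get(var_name)
    if not uri:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Atlas connection string"
        )
    return uri

# Usage (assumed):
# from pymongo import MongoClient
# client = MongoClient(get_mongodb_uri())
```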

In [7]:
from pymongo.operations import SearchIndexModel
import time

# Create your index model, then create the search index
index_name="vector_index"
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "numDimensions": 768,
        "path": "embedding",
        "similarity": "cosine"
      }
    ]
  },
  name = index_name,
  type = "vectorSearch"
)
collection.create_search_index(model=search_index_model)

# Wait for the initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate = lambda index: index.get("queryable") is True

while True:
    indices = list(collection.list_search_indexes(index_name))
    if len(indices) and predicate(indices[0]):
        break
    time.sleep(5)
print(index_name + " is ready for querying.")

Polling to check if the index is ready. This may take up to a minute.
vector_index is ready for querying.
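The polling loop above waits indefinitely if the index never becomes queryable, for example when the build fails. Below is a sketch of the same check with a deadline. It accepts any callable that returns a list of index descriptions, so in the notebook you would pass `lambda: list(collection.list_search_indexes(index_name))`; the function `wait_until_queryable` and its default timeout are assumptions.

```python
import time

def wait_until_queryable(list_indexes, timeout=120.0, interval=5.0):
    """Poll until the first index reports queryable=True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        indices = list_indexes()
        if indices and indices[0].get("queryable") is True:
            return indices[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index not queryable after {timeout} seconds")
        time.sleep(interval)

# Simulated responses: no index yet, then building, then ready
responses = iter([
    [],
    [{"name": "vector_index", "queryable": False}],
    [{"name": "vector_index", "queryable": True}],
])
ready = wait_until_queryable(lambda: next(responses), timeout=10, interval=0)
print(ready["name"])  # vector_index
```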


In [8]:
# Define a function to run vector search queries
def get_query_results(query):
    """Gets results from a vector search query."""

    query_embedding = get_embedding(query)
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding",
                "exact": True,  # exact nearest-neighbor (ENN) search over all indexed vectors
                "limit": 5
            }
        }, {
            "$project": {
                "_id": 0,
                "text": 1
            }
        }
    ]

    results = collection.aggregate(pipeline)
    return list(results)

# Test the function with a sample query
import pprint
pprint.pprint(get_query_results("AI technology"))

[{'text': 'artificial intelligence, in our offerings or partnerships; the '
          'growth and expansion of the market for database products and our '
          'ability to penetrate that\n'
          'market; our ability to integrate acquired businesses and '
          'technologies successfully or achieve the expected benefits of such '
          'acquisitions; our ability to'},
 {'text': 'more of our customers. We also see a tremendous opportunity to win '
          'more legacy workloads, as AI has now become a catalyst to modernize '
          'these\n'
          "applications. MongoDB's document-based architecture is particularly "
          'well-suited for the variety and scale of data required by '
          'AI-powered applications.\xa0\n'
          'We are confident MongoDB  will be a substantial beneficiary of this '
          'next wave of application development."'},
 {'text': 'MongoDB  continues to expand its AI ecosystem with the announcement '
          'of the Mong
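`get_query_results` sets `"exact": True`, which makes Atlas run exact nearest-neighbor (ENN) search by comparing the query against every indexed vector. For larger collections, approximate (ANN) search is the usual choice: omit `exact` or set it to `false`, and pass `numCandidates`, which cannot be smaller than `limit`. The sketch below builds either variant and also projects the similarity score with `{"$meta": "vectorSearchScore"}`; the helper name and its defaults are hypothetical.

```python
def build_vector_search_pipeline(query_vector, limit=5, num_candidates=None,
                                 index="vector_index", path="embedding"):
    """Build a $vectorSearch aggregation pipeline (hypothetical helper).

    num_candidates=None -> exact (ENN) search, as in get_query_results.
    num_candidates=int  -> approximate (ANN) search over that many candidates.
    """
    stage = {
        "index": index,
        "queryVector": query_vector,
        "path": path,
        "limit": limit,
    }
    if num_candidates is None:
        stage["exact"] = True
    else:
        if num_candidates < limit:
            raise ValueError("num_candidates must be >= limit")
        stage["exact"] = False
        stage["numCandidates"] = num_candidates
    return [
        {"$vectorSearch": stage},
        # Return the chunk text plus the similarity score for each match
        {"$project": {"_id": 0, "text": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]

print(build_vector_search_pipeline([0.1, 0.2, 0.3], num_candidates=100))
```

In the notebook, `collection.aggregate(build_vector_search_pipeline(get_embedding(query), num_candidates=100))` would run an ANN query against the same index and path as `get_query_results`.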

In [20]:
import os
from huggingface_hub import InferenceClient

# Specify search query, retrieve relevant documents, and convert to string
query = "What are MongoDB's latest AI announcements?"
context_docs = get_query_results(query)
context_string = " ".join([doc["text"] for doc in context_docs])

# Construct prompt for the LLM using the retrieved documents as the context
prompt = f"""Use the following pieces of context to answer the question at the end.
    {context_string}
    Question: {query}
"""

# Authenticate to Hugging Face and access the model
# (replace "TOKEN" with your Hugging Face access token)
os.environ["HF_TOKEN"] = "TOKEN"
llm = InferenceClient(
    "mistralai/Mistral-7B-Instruct-v0.3",
    token = os.getenv("HF_TOKEN"))

# Prompt the LLM (this code varies depending on the model you use)
output = llm.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=150  # caps the reply length; longer answers are cut off
)
print(output.choices[0].message.content)

MongoDB is a developer-centric, cloud-native database platform built by developers for developers. Its mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. It provides a unified and consistent user experience with integrated services for addressing the requirements of modern applications. The platform offers a high-performance database with features like faster reads and updates, as well as faster bulk inserts and time series queries. MongoDB has been downloaded by tens of thousands of customers in over 100 countries, and it also offers services like Atlas Stream Processing for building sophisticated, event-driven applications with real-time data. MongoDB also pays attention to the
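The prompt above joins the retrieved chunks with single spaces, so chunk boundaries disappear, and it gives the model no instruction for questions the context cannot answer. Below is a sketch of an alternative prompt builder that numbers each chunk and adds such an instruction; `build_prompt` and its wording are illustrative assumptions, not the tutorial's prompt.

```python
def build_prompt(query, docs):
    """Assemble a RAG prompt from retrieved chunks (hypothetical helper).

    Each chunk goes on its own numbered line so the model can tell them apart.
    """
    context = "\n".join(
        f"[{i}] {doc['text']}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Use the following pieces of context to answer the question at the end.\n"
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
    )

print(build_prompt("What are MongoDB's latest AI announcements?",
                   [{"text": "chunk one"}, {"text": "chunk two"}]))
```

To try it, pass `messages=[{"role": "user", "content": build_prompt(query, context_docs)}]` to `llm.chat_completion` in place of the f-string prompt.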
