# MongoDB Vector Search - Retrieval-Augmented Generation (RAG)

This notebook is a companion to the [Retrieval-Augmented Generation (RAG)](https://www.mongodb.com/docs/atlas/atlas-vector-search/rag/#get-started) tutorial. Refer to the page for set-up instructions and detailed explanations.

This notebook takes you through how to implement RAG with MongoDB Vector Search by using open-source models from Hugging Face.

<a target="_blank" href="https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/use-cases/rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [2]:
pip install --quiet --upgrade pymongo sentence_transformers einops langchain langchain_community langchain-text-splitters pypdf huggingface_hub

In [6]:
import os

# Specify your Hugging Face access token
os.environ["HF_TOKEN"] = "<token>"

In [7]:
from sentence_transformers import SentenceTransformer

# Load the embedding model (https://huggingface.co/nomic-ai/nomic-embed-text-v1)
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Define a function to generate embeddings
def get_embedding(data):
    """Generates vector embeddings for the given data."""

    embedding = model.encode(data)
    return embedding.tolist()

In [8]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/12236/pdf")
data = loader.load()

# Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
documents = text_splitter.split_documents(data)
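
To make `chunk_size` and `chunk_overlap` concrete, the following is a minimal sketch of fixed-size character chunking with overlap. It is a simplified stand-in, not the algorithm that `RecursiveCharacterTextSplitter` uses (which also tries to break on separators such as paragraph and line boundaries). The `chunk_text` helper and the sample string are hypothetical.

```python
def chunk_text(text, chunk_size=400, chunk_overlap=20):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; it ignores separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 100  # 1,000 characters of made-up text
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when it is shorter than the overlap.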

In [9]:
# Prepare documents for insertion
docs_to_insert = [{
    "text": doc.page_content,
    "embedding": get_embedding(doc.page_content)
} for doc in documents]

In [13]:
from pymongo import MongoClient

# Replace the ENTIRE string below with your actual connection URI
MONGO_URI = "mongodb+srv://your_atlas_user:YourSecurePassword@cluster0.abcde.mongodb.net/?retryWrites=true&w=majority"

client = MongoClient(MONGO_URI)

ConfigurationError: The DNS query name does not exist: _mongodb._tcp.cluster0.abcde.mongodb.net.

In [11]:
from pymongo import MongoClient

# Connect to your MongoDB cluster
# IMPORTANT: Replace '<YOUR_MONGODB_CONNECTION_STRING>' with your actual MongoDB Atlas connection string.
# Ensure your IP address is in the IP access list of your MongoDB Atlas project.
client = MongoClient("<YOUR_MONGODB_CONNECTION_STRING>")
collection = client["rag_db"]["test"]

# Insert documents into the collection
result = collection.insert_many(docs_to_insert)

ServerSelectionTimeoutError: <your_mongodb_connection_string>:27017: [Errno -2] Name or service not known (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 69337011762be37eee0473d3, topology_type: Unknown, servers: [<ServerDescription ('<your_mongodb_connection_string>', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('<your_mongodb_connection_string>:27017: [Errno -2] Name or service not known (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

In [None]:
from pymongo.operations import SearchIndexModel
import time

# Create your index model, then create the search index
index_name="vector_index"
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "numDimensions": 768,
        "path": "embedding",
        "similarity": "cosine"
      }
    ]
  },
  name = index_name,
  type = "vectorSearch"
)
collection.create_search_index(model=search_index_model)

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
while True:
    indices = list(collection.list_search_indexes(index_name))
    if indices and indices[0].get("queryable") is True:
        break
    time.sleep(5)
print(index_name + " is ready for querying.")
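
The polling loop above runs until the index reports `queryable`, so it never returns if the index fails to build. One option is to bound the wait. The following is a minimal sketch, assuming a hypothetical `wait_until` helper and arbitrary timeout values. The `fake_is_queryable` function stands in for the `list_search_indexes` call so that the example runs without a cluster.

```python
import time


def wait_until(check, timeout_s=120.0, interval_s=5.0):
    """Call check() repeatedly until it returns True or timeout_s elapses.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for the real readiness check; reports "ready" on the third poll.
polls = {"count": 0}


def fake_is_queryable():
    polls["count"] += 1
    return polls["count"] >= 3


ready = wait_until(fake_is_queryable, timeout_s=1.0, interval_s=0.01)
print("ready" if ready else "timed out")
```

Against a live cluster, the check could be something like `lambda: any(ix.get("queryable") for ix in collection.list_search_indexes(index_name))`.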

In [None]:
# Define a function to run vector search queries
def get_query_results(query):
    """Gets results from a vector search query."""

    query_embedding = get_embedding(query)
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding",
                "exact": True,
                "limit": 5
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1
            }
        }
    ]

    results = collection.aggregate(pipeline)
    return list(results)

# Test the function with a sample query
import pprint
pprint.pprint(get_query_results("AI technology"))
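
Conceptually, an exact search (`"exact": True`) with `cosine` similarity compares the query vector against every stored vector and keeps the `limit` best matches. The sketch below illustrates that idea in plain Python on toy 3-dimensional vectors. It is not how MongoDB implements `$vectorSearch`, and the `cosine_similarity` and `top_k` helpers and the sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, docs, k=5):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_vector, doc["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy data standing in for the 768-dimensional embeddings in the collection
toy_docs = [
    {"text": "doc A", "embedding": [1.0, 0.0, 0.0]},
    {"text": "doc B", "embedding": [0.0, 1.0, 0.0]},
    {"text": "doc C", "embedding": [0.9, 0.1, 0.0]},
]
results = top_k([1.0, 0.0, 0.0], toy_docs, k=2)
print([doc["text"] for doc in results])
```

Scanning every vector gives exact results but costs time proportional to the collection size, which is why exact search suits small collections like the one in this notebook.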

In [None]:
from huggingface_hub import InferenceClient

# Specify search query, retrieve relevant documents, and convert to string
query = "What are MongoDB's latest AI announcements?"
context_docs = get_query_results(query)
context_string = " ".join([doc["text"] for doc in context_docs])

# Construct prompt for the LLM using the retrieved documents as the context
prompt = f"""Use the following pieces of context to answer the question at the end.
    {context_string}
    Question: {query}
"""

# Use a model from Hugging Face
llm = InferenceClient(
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    provider = "fireworks-ai",
    token = os.getenv("HF_TOKEN"))

# Prompt the LLM (this code varies depending on the model you use)
output = llm.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=150
)
print(output.choices[0].message.content)
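
The prompt above joins every retrieved chunk into one string. If the retrieved text grows, it may help to cap how much context goes into the prompt. The following is a minimal sketch of that idea; the `build_prompt` helper, the `max_context_chars` cap, and the sample documents are hypothetical and not part of the tutorial.

```python
def build_prompt(query, context_docs, max_context_chars=4000):
    """Assemble a RAG prompt from retrieved documents.

    Hypothetical helper; max_context_chars is an arbitrary illustrative cap.
    """
    passages = []
    used = 0
    for doc in context_docs:
        text = doc["text"].strip()
        if used + len(text) > max_context_chars:
            break  # stop before the context grows past the cap
        passages.append(text)
        used += len(text)

    context = "\n\n".join(passages)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {query}\n"
    )


# Made-up documents in the same shape that get_query_results returns
example_docs = [{"text": "First retrieved chunk."}, {"text": "Second retrieved chunk."}]
print(build_prompt("What does the context say?", example_docs))
```

Because the vector search returns the most similar chunks first, truncating from the end drops the least relevant context.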

# Task
Update the MongoDB connection string in cell `3nw8BGOMReRb`, then verify that the notebook connects to the MongoDB cluster and inserts data without errors.

## Locate and Update Connection String

### Subtask:
Navigate to the code cell where the `MongoClient` is initialized (cell `3nw8BGOMReRb`) and replace the placeholder `<YOUR_MONGODB_CONNECTION_STRING>` with your actual MongoDB Atlas connection string. Ensure it includes the correct hostname and authentication details. A `mongodb+srv://` connection string does not take a port.


### Subtask Instructions

To update the connection string, please follow these steps:

1.  **Navigate to Cell `3nw8BGOMReRb`**: This cell initializes the `MongoClient`.
2.  **Locate the Line**: Find the line `client = MongoClient("<YOUR_MONGODB_CONNECTION_STRING>")`.
3.  **Replace Placeholder**: Substitute `<YOUR_MONGODB_CONNECTION_STRING>` with your actual MongoDB Atlas connection string. Keep the connection string enclosed in double quotes.
4.  **Verify Authentication**: Confirm that your connection string includes the necessary authentication details (username, password, etc.). If you are connecting from a new IP address, add it to the IP access list of your MongoDB Atlas project.

After making this change, rerun cell `3nw8BGOMReRb` and any subsequent cells that depend on this connection.
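
Before rerunning the cell, it can help to catch an unreplaced placeholder early. The following is a rough sketch that uses only the standard library. `check_connection_string` is a hypothetical helper, and its checks are illustrative, not a full validation of the MongoDB connection string format.

```python
from urllib.parse import urlsplit


def check_connection_string(uri):
    """Return a list of likely problems with a MongoDB connection string."""
    problems = []
    if "<" in uri or ">" in uri:
        problems.append("contains an unreplaced <placeholder>")
    if uri != uri.strip():
        problems.append("has leading or trailing whitespace")

    parts = urlsplit(uri.strip())
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append("scheme must be mongodb:// or mongodb+srv://")
    if not parts.netloc:
        problems.append("missing host")
    elif "@" not in parts.netloc:
        # Only relevant for username/password authentication
        problems.append("no credentials found before the host")
    return problems


print(check_connection_string("<YOUR_MONGODB_CONNECTION_STRING>"))
print(check_connection_string("mongodb+srv://user:pass@cluster.example.net/"))
```

An empty list means none of these basic checks failed. It does not guarantee that the cluster is reachable or that the credentials are valid.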

## Rerun Connection and Insertion Cells

### Subtask:
After updating the connection string, rerun the cell that initializes the `MongoClient` and inserts the documents. This attempts to connect to your MongoDB cluster and insert the prepared documents.


#### Instructions
1. Rerun the code cell `3nw8BGOMReRb`.
2. Observe the output of the cell. If the connection succeeds and the data is inserted, the cell finishes without any output and no `ServerSelectionTimeoutError` appears.
3. If an error still occurs, re-check the connection string for typos or missing components, and ensure your IP address is in the IP access list in MongoDB Atlas before rerunning the cell.

## Verify Connection and Data Insertion

### Subtask:
Observe the output of the rerun cells to ensure there are no connection errors and that the data insertion completes successfully. If an error persists, double-check your connection string for any typos or missing components.


### Action Required: Correct MongoDB Connection String

The `3nw8BGOMReRb` cell failed with a `ServerSelectionTimeoutError`, indicating that the Python driver could not connect to your MongoDB cluster. This is typically due to an incorrect connection string or a network configuration issue (for example, your IP address is missing from the Atlas IP access list).

1.  **Verify Connection String**: Go back to the `3nw8BGOMReRb` cell and re-examine the placeholder `<YOUR_MONGODB_CONNECTION_STRING>`. Replace it with your actual MongoDB Atlas connection string. Ensure there are no typos, missing characters, or extra spaces.
    *   **Example format**: `"mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/?retryWrites=true&w=majority&appName=<app-name>"`
2.  **IP Access List**: Log in to your [MongoDB Atlas](https://cloud.mongodb.com/) account. Navigate to your cluster, then to "Network Access" under the "Security" tab. Ensure that your current IP address is in the IP Access List. For development purposes, you can add `0.0.0.0/0` to allow access from anywhere (though this is not recommended for production environments).
3.  **Rerun Cells**: After updating your connection string and verifying network access, rerun the following cells in order:
    *   `3nw8BGOMReRb` (data insertion)
    *   `3gx-Fqp9ReRc` (search index creation)
    *   `yOHFJbUkReRd` (vector search query)
    *   `08Xo9G5NReRd` (LLM interaction)

Once `3nw8BGOMReRb` executes successfully, it finishes without an error (the cell prints nothing when the insertion succeeds), and you can proceed to the next steps of creating the search index and performing vector search queries.

### Action Required: Correct MongoDB Connection String and Verify Insertion

The `3nw8BGOMReRb` cell previously failed with a `ServerSelectionTimeoutError`, indicating that the Python driver could not connect to your MongoDB cluster. This is typically due to an incorrect connection string or a network configuration issue (for example, your IP address is missing from the Atlas IP access list).

**Instructions:**

1.  **Review Output of `3nw8BGOMReRb`**:
    *   **If `ServerSelectionTimeoutError` persists**: The connection issue is still present. Re-examine the placeholder `<YOUR_MONGODB_CONNECTION_STRING>` in the `3nw8BGOMReRb` cell and replace it with your actual MongoDB Atlas connection string, checking for typos and missing components. Then log in to your [MongoDB Atlas](https://cloud.mongodb.com/) account, navigate to "Network Access" under the "Security" tab for your cluster, and ensure your current IP address is in the IP Access List. For development, you can temporarily add `0.0.0.0/0`. After making changes, rerun the cell.
    *   **If `3nw8BGOMReRb` executes successfully**: The cell prints nothing on success, so confirm the insertion through the `result` variable returned by `collection.insert_many`. For example, `len(result.inserted_ids)` gives the number of documents inserted.

2.  **Rerun Subsequent Cells**: After `3nw8BGOMReRb` executes successfully and you have verified the insertion, rerun the following cells in order:
    *   `3gx-Fqp9ReRc` (search index creation)
    *   `yOHFJbUkReRd` (vector search query)
    *   `08Xo9G5NReRd` (LLM interaction)

Once `3nw8BGOMReRb` executes successfully, you can proceed to the next steps of creating the search index and performing vector search queries.
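
The insertion check described above can be written as a small helper. This is a minimal sketch: `summarize_insert` is a hypothetical function, and `fake_result` is a stand-in for the object returned by `collection.insert_many` so that the example runs without a cluster.

```python
from types import SimpleNamespace


def summarize_insert(result, expected_count):
    """Compare the number of inserted IDs with the number of documents sent."""
    inserted = len(result.inserted_ids)
    if inserted == expected_count:
        return f"Inserted all {inserted} documents."
    return f"Inserted {inserted} of {expected_count} documents."


# Stand-in for the result of collection.insert_many(docs_to_insert)
fake_result = SimpleNamespace(inserted_ids=["id1", "id2", "id3"])
print(summarize_insert(fake_result, expected_count=3))
```

In the notebook, the equivalent call would be `summarize_insert(result, len(docs_to_insert))`.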

## Final Task

### Subtask:
Confirm that the MongoDB cluster details have been successfully verified and that data can be inserted without errors.


## Summary:

### Q&A
The connection to the MongoDB cluster has not yet been verified, and no data has been inserted. The initial connection attempt failed with a `ServerSelectionTimeoutError`. The agent has provided instructions for the user to troubleshoot the connection issue, update the connection string, and verify data insertion manually.

### Data Analysis Key Findings
*   The initial attempt to connect to the MongoDB cluster and insert data resulted in a `ServerSelectionTimeoutError`, indicating a connection failure.
*   The agent successfully generated and delivered detailed markdown instructions to the user on how to update the MongoDB Atlas connection string in cell `3nw8BGOMReRb`.
*   Instructions included steps to verify authentication details, ensure the IP address is in the MongoDB Atlas IP access list, and rerun the relevant cells after updating the connection string.
*   Troubleshooting guidance was provided for the `ServerSelectionTimeoutError`, advising the user to check the connection string for typos and confirm network access settings (IP access list).
*   The agent outlined how the user could verify successful data insertion by checking the `result.inserted_ids` attribute after the connection issue is resolved.

### Insights or Next Steps
*   The user must follow the provided troubleshooting steps, update the MongoDB connection string, and ensure their IP address is in the MongoDB Atlas IP access list to resolve the `ServerSelectionTimeoutError`.
*   After resolving the connection, the user should re-run cell `3nw8BGOMReRb` and subsequent cells to confirm successful data insertion and proceed with the rest of the workflow (search index creation, vector search, LLM interaction).
