# MongoDB for Agents and RAG

**Retrieval-augmented generation (RAG)** is an architecture used to augment large language models (LLMs) with additional data so that they can generate more accurate responses.

**Memory**: Agents can remember previous interactions. Memory can be short-term (for the current session) or long-term (persisted across sessions).

**Tools**: Agents can use **vector search** as a tool to retrieve relevant information and implement RAG. Vector search returns the documents whose embeddings are closest in distance to the embedding of the user's query.
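To make "closest in distance" concrete, here is a minimal pure-Python sketch of ranking by cosine similarity. The toy documents, their 3-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are illustrative assumptions, not part of the notebook or of MongoDB's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, documents, k=2):
    """Return the titles of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, doc["embedding"]), doc["title"])
        for doc in documents
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]

# Hypothetical toy embeddings; real embedding models return far longer vectors.
toy_documents = [
    {"title": "robots", "embedding": [0.9, 0.1, 0.0]},
    {"title": "dinosaurs", "embedding": [0.0, 0.9, 0.1]},
    {"title": "androids", "embedding": [0.8, 0.2, 0.1]},
]

print(top_k([1.0, 0.0, 0.0], toy_documents))
```

A vector database performs the same kind of ranking, but over indexed vectors and at a much larger scale.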

In [None]:
!pip install -q -r requirements.txt

We will use:
- [MongoDB Atlas](https://cloud.mongodb.com) for the sample database `sample_mflix`.
- [OpenAI](https://platform.openai.com/) or [Voyage AI](https://www.voyageai.com/) for embedding models. The collection `sample_mflix.embedded_movies` has embeddings generated with both OpenAI and Voyage models.
- [Groq](https://console.groq.com/) for LLMs.

Set `MONGO_CONNECTION_STRING`, `OPENAI_API_KEY`, and `GROQ_API_KEY` in your `.env` file.

Follow the steps in:

- [Mongo docs-create database](https://www.mongodb.com/resources/products/fundamentals/create-database)

- [Mongo docs-cluster setup](https://www.mongodb.com/resources/products/fundamentals/mongodb-cluster-setup)

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

if not os.environ.get("MONGO_CONNECTION_STRING"):
    print("Connection string for MONGO is not set. Please check your .env file.")
else:
    print("MONGO_CONNECTION_STRING loaded successfully.")

if not os.environ.get("OPENAI_API_KEY"):
    print("API KEY for OPENAI is not set. Please check your .env file.")
else:
    print("OPENAI_API_KEY loaded successfully.")

if not os.environ.get("GROQ_API_KEY"):
    print("API key for Groq is not set. Please check your .env file.")
else:
    print("GROQ_API_KEY loaded successfully.")



### Connect to the MongoDB Cluster

Connect to the MongoDB cluster with `mongo_client`, then select the database `sample_mflix` and the collection `embedded_movies`.

In [None]:
import pymongo

MONGO_CONNECTION_STRING = os.environ.get("MONGO_CONNECTION_STRING")
mongo_client = pymongo.MongoClient(MONGO_CONNECTION_STRING)

try:
    mongo_client.admin.command('ping')
    print("✅ Connected successfully!")
except Exception as e:
    print("❌ Connection failed:", e)

db = mongo_client["sample_mflix"]
embedded_movies_collection = db["embedded_movies"]

### Test embedding models with openai_client
Test the OpenAI embedding models. We can also check which organization is linked to the API key.

In [None]:
from openai import OpenAI

model_3small = "text-embedding-3-small"
model_ada002 = "text-embedding-ada-002"
openai_client = OpenAI()

print(openai_client.organization)
print(openai_client.models.list())

# Define a function to generate embeddings
def get_embedding(text, model):
   """Generates vector embeddings for the given text."""

   embedding = openai_client.embeddings.create(input = [text], model=model).data[0].embedding
   return embedding

# Generate an embedding
embedding_3small = get_embedding(text = "foo", model = model_3small)
print(embedding_3small)


embedding_ada002 = get_embedding(text = "foo", model = model_ada002)
print(embedding_ada002)

### Create an index for sample_mflix.embedded_movies

Parameters:
- **`fields`**: A list describing which field(s) in the documents should be indexed.
- **`type`**: Set to `"vector"` to indicate this is a vector search index.
- **`path`**: The document field containing the embedding vectors (here, `"plot_embedding"`).
- **`similarity`**: The distance metric used to compare vectors—commonly `"cosine"`, `"dotProduct"`, or `"euclidean"`.
- **`numDimensions`**: The dimensionality of the embeddings (here, `1536`, matching the model output).

The index is given a custom **name** (`"vector_index_plot"`) and a **type** (`"vectorSearch"`). If `type` is omitted, it defaults to `"search"`.
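As a small sketch of how the parameters above fit together, the helper below builds the `definition` dictionary and rejects unsupported similarity names. The function `build_vector_index_definition` is a hypothetical convenience for illustration, not part of PyMongo.

```python
VALID_SIMILARITIES = {"cosine", "dotProduct", "euclidean"}

def build_vector_index_definition(path, num_dimensions, similarity="cosine"):
    """Build the `definition` dict for a vector search index (hypothetical helper)."""
    if similarity not in VALID_SIMILARITIES:
        raise ValueError(f"similarity must be one of {sorted(VALID_SIMILARITIES)}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "similarity": similarity,
                "numDimensions": num_dimensions,
            }
        ]
    }

definition = build_vector_index_definition("plot_embedding", 1536)
print(definition)
```

The resulting dictionary has the same shape as the `definition` argument passed to `SearchIndexModel` in this notebook.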

After submitting the index definition with `create_search_index()`, the script polls MongoDB at regular intervals to check when the index becomes *queryable*. Once the index has fully synchronized, a confirmation message is printed indicating that it’s ready for use.
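The polling step can also be bounded by a timeout so the notebook does not hang if the index never becomes queryable. The following is a minimal sketch; `wait_until` and the stand-in check `fake_index_is_queryable` are hypothetical names used only for illustration.

```python
import time

def wait_until(check, timeout_s=60.0, interval_s=5.0):
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Stand-in for "is the index queryable?": succeeds on the third call.
attempts = {"n": 0}

def fake_index_is_queryable():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until(fake_index_is_queryable, timeout_s=1.0, interval_s=0.01)
print(ready, attempts["n"])
```

In this notebook, the check could be a function that calls `list_search_indexes()` on the collection and inspects the `queryable` field of the returned index.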


[mongo docs - create_search_index](https://www.mongodb.com/docs/php-library/current/reference/method/MongoDBCollection-createSearchIndex/)


In [None]:
from pymongo.operations import SearchIndexModel
import time

# Create the index model, then create the search index
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "path": "plot_embedding",
        "similarity": "cosine",
        "numDimensions": 1536
      }
    ]
  },
  name="vector_index_plot",
  type="vectorSearch"
)
result = embedded_movies_collection.create_search_index(model=search_index_model)

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate = lambda index: index.get("queryable") is True

while True:
  indices = list(embedded_movies_collection.list_search_indexes(result))
  if len(indices) and predicate(indices[0]):
    break
  time.sleep(5)
print(result + " is ready for querying.")

### Test the search index

Test the search index created above. We get embeddings for natural-language queries from the OpenAI API and then search `sample_mflix.embedded_movies`.

In MongoDB, an **aggregation pipeline** is a sequence of stages that process and transform documents step by step.
Each stage performs an operation (such as filtering, projecting, grouping, or searching) and passes the results to the next stage — similar to a data processing pipeline.

Pipelines are defined as Python lists of dictionaries, where each dictionary represents a stage (e.g., `$match`, `$project`, `$group`, `$vectorSearch`). This approach allows complex queries, transformations, and search operations to be performed efficiently on the database side.
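As an illustration of stages chained in a list, the sketch below builds a three-stage pipeline: `$vectorSearch`, then `$project`, then an optional `$match` on the projected score. The helper name and the `num_candidates` and `min_score` parameters are assumptions introduced here. The sketch relies on `$vectorSearch` requiring `numCandidates` for approximate search, that is, when `exact` is not `True`.

```python
def build_vector_search_pipeline(query_vector, index_name, path, fields,
                                 limit=5, exact=False, num_candidates=100,
                                 min_score=None):
    """Build an aggregation pipeline as a list of stage dictionaries."""
    search_stage = {
        "index": index_name,
        "queryVector": query_vector,
        "path": path,
        "limit": limit,
    }
    if exact:
        search_stage["exact"] = True  # exhaustive (exact) search
    else:
        # Approximate search needs a candidate pool at least as large as limit
        search_stage["numCandidates"] = max(num_candidates, limit)

    projection = {field: 1 for field in fields}
    projection["_id"] = 0
    projection["score"] = {"$meta": "vectorSearchScore"}

    pipeline = [{"$vectorSearch": search_stage}, {"$project": projection}]
    if min_score is not None:
        # Each stage receives the output of the previous one, so `score` exists here
        pipeline.append({"$match": {"score": {"$gte": min_score}}})
    return pipeline

pipeline = build_vector_search_pipeline(
    [0.1, 0.2, 0.3], "vector_index_plot", "plot_embedding",
    ["title", "plot"], min_score=0.9,
)
for stage in pipeline:
    print(next(iter(stage)))
```

The order matters: `$match` can filter on `score` only because the preceding `$project` stage added that field.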


In [None]:
# Generate embedding for the search query
query1_embedding = get_embedding(text = "Show me movies that tell the story of artificial intelligence becoming dangerous.", model = model_ada002 )
query2_embedding = get_embedding(text = "Find a movie about dinosaurs coming back to life.", model = model_ada002 )
query3_embedding = get_embedding(text = "Which movies are about time travel or alternate realities?", model = model_ada002 )

In [None]:
def run_vector_search(collection, query_vector, index_name,
                      path, fields, limit=5, exact=True):
    """
    Executes a vector similarity search in MongoDB Atlas using a defined search index.
    """
    # Build projection dynamically
    projection = {field: 1 for field in fields}
    projection["_id"] = 0
    projection["score"] = {"$meta": "vectorSearchScore"}

    # Build aggregation pipeline
    pipeline = [
        {
            "$vectorSearch": {
                "index": index_name,
                "queryVector": query_vector,
                "path": path,
                "exact": exact,
                "limit": limit
            }
        },
        {"$project": projection}
    ]

    # Execute the search
    results = list(collection.aggregate(pipeline))
    return results


In [None]:
print(run_vector_search(collection = embedded_movies_collection, query_vector = query1_embedding, index_name="vector_index_plot",
                      path="plot_embedding", fields = ["title", "plot"], limit=5, exact=True, ))


#### Code Example

```python
# Sample vector search pipeline (uses the index, embedding, and collection defined above)
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index_plot",
            "queryVector": query1_embedding,
            "path": "plot_embedding",
            "exact": True,
            "limit": 5
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": 1,
            "plot": 1,
            "score": {
                "$meta": "vectorSearchScore"
            }
        }
    }
]

# Execute the search
results = embedded_movies_collection.aggregate(pipeline)

# Print results
for i in results:
    print(i)
```

### Agents and Tools

In [None]:
from langchain_core.tools import tool

@tool
def check_mflix_similarity(plot: str) -> list[dict]:
    """
    Check if the generated plot is similar
    to an existing movie plot in the MFlix vector store.
    """
    query_embedding = get_embedding(text = plot, model = model_ada002)
    results = run_vector_search(collection = embedded_movies_collection, query_vector = query_embedding, index_name="vector_index_plot",
                      path="plot_embedding", fields = ["title", "plot"], limit=5, exact=True, )
    return results




In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import MessagesPlaceholder

plot_finder_system_prompt = """
Your role is to read the user's plot and help the user check its similarity to MFlix movies.

Return the plot found by the similarity check, or a question that helps the user refine the plot for further searches.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", plot_finder_system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

In [None]:
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.chat_models import init_chat_model


toolkit = [check_mflix_similarity]
llm = init_chat_model("llama-3.1-8b-instant", model_provider="groq")
agent = create_openai_tools_agent(llm, toolkit, prompt)

agent_executor = AgentExecutor(agent=agent, tools=toolkit, verbose=True)
result = agent_executor.invoke({"input": "A man trapped on Mars."})

print(result['output'])
print(result)

In [None]:
result = agent_executor.invoke({"input": "Story of the great fire of 1871."})

print(result['output'])
print(result)

### Short-term memory

The next steps create a database `langchain_db` and a collection `langchain_db.rag_with_memory` that acts as a **short-term memory store** for previous user–LLM interactions.
This setup lets the system retain recent context and conversation history across multiple queries, improving the coherence of responses.

In [None]:
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# Use the text-embedding-ada-002 or voyage-3-large embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create the vector store
vector_store_history = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string=MONGO_CONNECTION_STRING,
    embedding=embedding_model,
    namespace="langchain_db.rag_with_memory"
)



vector_store_history.create_vector_search_index(
   dimensions = 1536
)

In [None]:
from pymongo import MongoClient
from langchain_core.documents import Document

query = {
    "awards.wins": {
        "$gt": 8,
        "$lt": 10
    }
}

projection = {"plot": 1, "title": 1, "directors": 1, "_id": 0}

# Fetch the data
movie_data = embedded_movies_collection.find(query, projection).limit(1000) # Limiting for a manageable example

# --- Create the list of LangChain Documents ---
docs = []

for movie in movie_data:
    # 1. Create the content (the plot)
    content = movie.get('plot', '')

    # 2. Create the metadata (the title and source)
    # Adding the directors list to the metadata is useful for context/retrieval
    metadata = {
        "title": movie.get('title'),
        "source": f"MFlix Movie Plot: {movie.get('title')}",
        "directors": movie.get('directors')
    }

    # 3. Create the Document object and append it to the list
    if content: # Ensure we only add documents with a plot
        doc = Document(page_content=content, metadata=metadata)
        docs.append(doc)

# Example: Check the first document
if docs:
    print(f"Successfully loaded {len(docs)} movie plots.")
    print("--- Example Document ---")
    print(docs[0])
else:
    print("No documents found matching the criteria (movies with 9 award wins and a plot).")

vector_store_history.add_documents(docs)

### Standalone Question Generation

Define a **prompt template** and a simple **processing chain** for rephrasing follow-up questions into standalone ones.


When users ask context-dependent questions in a conversation, this step ensures each query can be understood independently of prior turns—useful for retrieval and context indexing.

- **`standalone_system_prompt`** instructs the model to reformulate a follow-up question using the chat history, without answering it.
- **`ChatPromptTemplate`** structures the prompt, combining:
  - A system message with task instructions.
  - A `MessagesPlaceholder` for inserting the chat history dynamically.
  - The human message (`{question}`) containing the user’s latest input.
- **`StrOutputParser`** converts the model’s output into a plain string.

- The resulting `question_chain` pipes the prompt through the language model (`llm`) and parser, producing a clean standalone question ready for embedding or retrieval.
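To show the message order the prompt template produces, here is a plain-Python sketch that assembles the same three parts by hand. The function `build_standalone_messages` is a hypothetical stand-in for illustration. It does not call LangChain or the model.

```python
def build_standalone_messages(system_prompt, history, question):
    """Assemble messages in the order: system, chat history, latest human input."""
    return [("system", system_prompt), *history, ("human", question)]

example_history = [
    ("human", "What movies has Christopher Nolan directed?"),
    ("ai", "Inception, Interstellar, Oppenheimer"),
]

messages = build_standalone_messages(
    "Rephrase the follow-up question to be a standalone question.",
    example_history,
    "Who was the lead actor in those films?",
)
for role, content in messages:
    print(f"{role}: {content}")
```

Because the history sits between the instructions and the latest question, the model has what it needs to resolve references such as "those films".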

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import MessagesPlaceholder

# Create a prompt to generate standalone questions from follow-up questions
standalone_system_prompt = """
Given a chat history and a follow-up question, rephrase the follow-up question to be a standalone question.
Do NOT answer the question, just reformulate it if needed, otherwise return it as is.
Only return the final standalone question.
"""

standalone_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", standalone_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)
# Parse output as a string
parse_output = StrOutputParser()

question_chain = standalone_question_prompt | llm | parse_output

In [None]:
history = [
    ("human", "What movies has Christopher Nolan directed?"),
    ("ai", "Here’s a list of major Christopher Nolan–directed films: Inception, Interstellar, Oppenheimer"),
]
followup_question = "Who was the lead actor in those films?"



result = question_chain.invoke({
    "history": history,
    "question": followup_question
})

print("Standalone question:")
print(result)

In [None]:
history = [
    ("human", "I’m in the mood for a movie about space exploration, but not too scary or dark. Something exciting and maybe a little inspiring."),
    ("ai", "Here’s a list of movies 1. Wall-E (2008) – If you want something lighter, this animated film mixes space adventure with charm, heart, and environmental themes. 2. The Martian (2015) – A gripping survival story with humor and optimism as an astronaut is stranded on Mars. Very inspiring and scientifically grounded. 3. Gravity (2013) – Visually breathtaking survival story in space. A bit intense at times, but overall more thrilling than dark.")
]
followup_question = "I want to find only the movies released before 2020."

result = question_chain.invoke({
    "history": history,
    "question": followup_question
})

print("Standalone question:")
print(result)

### Session-Based Chat History

Define a helper function to manage **session-specific chat memory** using MongoDB.

The `get_session_history()` function returns a `MongoDBChatMessageHistory` object that connects to the `langchain_db.rag_with_memory` collection.
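The idea of session-scoped history can be sketched with an in-memory store keyed by `session_id`. The class `InMemorySessionStore` and its methods are hypothetical and only mimic the behavior conceptually. They are not the `MongoDBChatMessageHistory` API.

```python
from collections import defaultdict

class InMemorySessionStore:
    """Keeps a separate message list per session (illustrative stand-in)."""

    def __init__(self):
        self._messages = defaultdict(list)

    def append(self, session_id, role, content):
        """Record one message for the given session."""
        self._messages[session_id].append((role, content))

    def history(self, session_id):
        """Return a copy of the messages recorded for the given session."""
        return list(self._messages.get(session_id, []))

store = InMemorySessionStore()
store.append("user_1", "human", "Show me movies about dangerous AI.")
store.append("user_1", "ai", "Here are a few candidates...")
store.append("user_2", "human", "Find a movie about dinosaurs.")

print(store.history("user_1"))
print(store.history("user_2"))
```

Each session sees only its own messages, which is what allows a follow-up question in one session to be resolved against that session's earlier turns.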

In [None]:
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory

def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
    return MongoDBChatMessageHistory(
        connection_string=MONGO_CONNECTION_STRING,
        session_id=session_id,
        database_name="langchain_db",
        collection_name="rag_with_memory"
    )

`RunnablePassthrough.assign` passes the input dictionary through unchanged and adds extra keys computed from it.
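A plain-Python analogue can make this behavior easier to picture. The `assign` function below is a hypothetical sketch, not LangChain code: it copies the input dictionary and adds one key per supplied function.

```python
def assign(**key_functions):
    """Return a step that copies its input dict and adds computed keys."""
    def step(inputs):
        outputs = dict(inputs)  # leave the caller's dictionary untouched
        for key, fn in key_functions.items():
            outputs[key] = fn(inputs)
        return outputs
    return step

# Stand-in for "rephrase the question, retrieve documents, join their text"
add_context = assign(context=lambda x: f"docs about {x['question']}")

print(add_context({"question": "Mars", "history": []}))
```

The original keys (`question`, `history`) remain available to later steps, alongside the new `context` key.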

In [None]:
from langchain_core.runnables import RunnablePassthrough

# Create a retriever
retriever = vector_store_history.as_retriever(search_type="similarity", search_kwargs={ "k": 5 })

# Create a retriever chain that processes the question with history and retrieves documents
retriever_chain = RunnablePassthrough.assign(context=question_chain | retriever | (lambda docs: "\n\n".join([d.page_content for d in docs])))

In [None]:
# Create a prompt template that includes the retrieved context and chat history
rag_system_prompt = """Answer the question based only on the following context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rag_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)

In [None]:
# Build the RAG chain
from langchain_core.runnables.history import RunnableWithMessageHistory

rag_chain = (
    retriever_chain
    | rag_prompt
    | llm
    | parse_output
)

# Wrap the chain with message history
rag_with_memory = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

In [None]:
# First question
response_1 = rag_with_memory.invoke(
    {"question": "Show me movies that tell the story of artificial intelligence becoming dangerous?"},
    {"configurable": {"session_id": "user_1"}}
)
print(response_1)

In [None]:
# Follow-up question that references the previous question
response_2 = rag_with_memory.invoke(
    {"question": "Who was the lead actor in those films?"},
    {"configurable": {"session_id": "user_1"}}
)
print(response_2)

More details can be found in:

- [Mongo docs - create embeddings](https://github.com/mongodb/docs-notebooks/blob/main/create-embeddings/openai-new-data.ipynb)
- [Mongo docs - LangChain](https://www.mongodb.com/docs/atlas/ai-integrations/langchain/get-started/#std-label-langchain-get-started)
