# RAG with MongoDB Atlas and VertexAI Agent Engine using Langchain

**Vertex AI Agent Engine with Langchain** is a powerful duo for building and deploying generative AI applications. Agent Engine (formerly known as Reasoning Engine) is a managed service in Vertex AI that provides a secure, scalable runtime environment for your workload. Langchain provides the tools to design your application logic, while Agent Engine provides the environment to run it. Because Agent Engine can connect to external data sources, we can use MongoDB Atlas as the data store behind the agent.

At the core lies **MongoDB Atlas Vector Search**. It excels at searching unstructured data using vector embeddings, allowing you to find similar information even if phrased differently. This empowers your AI to grasp the true meaning behind user queries. Langchain then steps in, providing a user-friendly framework to design your application logic. Here, you can leverage Langchain's flexibility to seamlessly integrate MongoDB Atlas Vector Search, enabling your AI to retrieve highly relevant data based on semantic similarity. Finally, Vertex AI Agent Engine provides a secure and scalable environment to run your creations. This trio simplifies development, offering a pre-built foundation and tools to focus on building innovative solutions. With MongoDB Atlas Vector Search's semantic understanding, your generative AI applications can deliver superior results and user experiences.

In this notebook we will cover how to build a RAG application and deploy it as an endpoint using Agent Engine, MongoDB Atlas, and Vertex AI.

First, we will install all the required dependencies and restart the kernel.

In [15]:
!pip install --upgrade --quiet \
    "google-cloud-aiplatform[langchain,agent_engines]" requests datasets pymongo certifi langchain langchain-community langchain-mongodb langchain-google-vertexai langchain_google_genai beautifulsoup4 google-api-python-client

In [19]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)



{'status': 'ok', 'restart': True}

## Ingest data
To begin the setup, we will load a dataset into MongoDB Atlas. For convenience, we use an existing Hugging Face dataset of MongoDB embeddings. Run the code below to import the *MongoDB/subset_arxiv_papers_with_embeddings* dataset as `ds` and load it into MongoDB Atlas.




## Create vector search index on the newly created MongoDB collection


// To do: add code for creating atlas vector search index on the collection
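
One possible way to create the index is sketched below. It assumes a recent PyMongo release whose `SearchIndexModel` accepts `type="vectorSearch"`, cosine similarity, and 768-dimensional vectors (the default output size of `text-embedding-004`). The index name `vector_index` and the field `embedding` are taken from the query tools later in this notebook; the helper names `build_vector_index_definition` and `create_vector_index` are made up for illustration.

```python
def build_vector_index_definition(path="embedding", num_dimensions=768, similarity="cosine"):
    """Build an Atlas Vector Search index definition for a single vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,                      # document field holding the embedding
                "numDimensions": num_dimensions,   # must match the embedding model output size
                "similarity": similarity,          # assumption: cosine similarity
            }
        ]
    }


def create_vector_index(collection, index_name="vector_index"):
    """Create the index on a PyMongo collection that lives in an Atlas cluster."""
    # Imported lazily so the definition helper can be used without PyMongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=build_vector_index_definition(),
        name=index_name,
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)


definition = build_vector_index_definition()
print(definition)
```

To use it, call `create_vector_index` once per collection (for example on `sample_starwars_embeddings` and `sample_startrek_embeddings`) after the documents have been written. Atlas builds the index asynchronously, so queries may return no results until the build finishes.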

In [1]:
import requests
from bs4 import BeautifulSoup
from google.cloud import aiplatform
from pymongo import MongoClient
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel
import certifi
from googleapiclient import discovery
from IPython.display import display, Markdown
from langchain.agents.format_scratchpad.tools import format_to_tool_messages
from langchain_core import prompts
from langchain.memory import ChatMessageHistory
from vertexai.preview import reasoning_engines

In [None]:

# Scrape the website content
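def scrape_website(url):
    # Fail fast on network problems or HTTP error pages instead of embedding them
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    content = ' '.join([p.text for p in soup.find_all('p')])
    return content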

# Split the content into chunks of 1000 characters
def split_into_chunks(text, chunk_size=1000):
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

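def get_text_embeddings(chunks, batch_size=50):
    # Embed in batches because a single request is limited in texts and tokens;
    # batch_size=50 is a conservative assumption, adjust it to your quota.
    model = TextEmbeddingModel.from_pretrained("text-embedding-004")
    vectors = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors.extend(embedding.values for embedding in model.get_embeddings(batch))
    return vectors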

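def write_to_mongoDB(embeddings, chunks, db_name, coll_name):
    # Replace "MongoDB URI" with your Atlas connection string
    client = MongoClient("MongoDB URI", tlsCAFile=certifi.where())
    collection = client[db_name][coll_name]

    # Store each chunk next to its embedding; one bulk insert avoids a round trip per document
    documents = [
        {"chunk": chunk, "embedding": embedding}
        for chunk, embedding in zip(chunks, embeddings)
    ]
    if documents:
        collection.insert_many(documents)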




content = scrape_website("https://en.wikipedia.org/wiki/Star_Wars")
chunks = split_into_chunks(content)
embeddings_starwars = get_text_embeddings(chunks)
write_to_mongoDB(embeddings_starwars, chunks, "REASONING-ENGINE", "sample_starwars_embeddings")

content = scrape_website("https://en.wikipedia.org/wiki/Star_Trek")
chunks = split_into_chunks(content)
embeddings_startrek = get_text_embeddings(chunks)
write_to_mongoDB(embeddings_startrek, chunks, "REASONING-ENGINE", "sample_startrek_embeddings")
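
The ingestion code above cuts the text into fixed, non-overlapping 1000-character chunks, so a sentence that straddles a boundary is split between two documents. A common variation, which this notebook does not use, is to let neighboring chunks overlap. The sketch below shows the idea; the function name `split_with_overlap` and the default overlap of 100 characters are assumptions for illustration.

```python
def split_with_overlap(text, chunk_size=1000, overlap=100):
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the previous chunk already reaches the end of the text,
    # so no trailing chunk consists only of repeated characters.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample_chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(sample_chunks)
```

With `overlap=0` the function behaves like `split_into_chunks`. Overlap increases the number of stored documents, so it trades storage and embedding cost for better context at chunk boundaries.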


### Initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/apis/enableflow?apiid=aiplatform.googleapis.com).

In [2]:
PROJECT_ID = "gcp-pov"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}
STAGING_BUCKET = "gs://vshanbh01"  # @param {type:"string"}

import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=STAGING_BUCKET)

### Import Agent Engine library

In [3]:
from vertexai import agent_engines
from vertexai.preview.reasoning_engines import LangchainAgent

### 1. Define Model

As you build your Agent Engine agent from the bottom up, the first component is the generative model that the agent uses. We use "gemini-1.5-pro", which was the latest model when this notebook was created. This is the LLM that drives the agent in front of the RAG tools.

In [4]:
model = "gemini-1.5-pro-001"

### 2. Define function to read from MongoDB Atlas using Langchain

The second component of your agent is its tools: functions that let the generative model interact with MongoDB Atlas. We use Langchain to run vector queries against MongoDB Atlas. Each function takes a `query` string, converts it into an embedding with a Vertex AI text embedding model, retrieves the stored chunks most similar to the query, and uses them to answer the question. The query must be embedded with the same model that produced the stored embeddings; otherwise the similarity scores are not meaningful.

In [5]:
def star_wars_query_tool(
    query: str
):
    """
    Retrieves vectors from a MongoDB database and uses them to answer a question related to Star Wars.

    Args:
        query: The question to be answered about Star Wars.

    Returns:
        A dictionary containing the response to the question.
    """
    from langchain.chains import ConversationalRetrievalChain
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_google_vertexai import VertexAIEmbeddings, ChatVertexAI
    from langchain.memory import ConversationBufferWindowMemory
    from langchain.prompts import PromptTemplate
    from pymongo import MongoClient


    prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Do not return any answers from your own knowledge. Respond only in 2 or 3 sentences.


    {context}


    Question: {question}
    """
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )

    # Add your connection string in srv format below in place of URI
    client = MongoClient("Update your URI here")
    db = client["REASONING-ENGINE"]

    # Use the same embedding model as the ingestion step so query and stored vectors are comparable
    embeddings = VertexAIEmbeddings(model_name="text-embedding-004")

    vs = MongoDBAtlasVectorSearch(
        collection=db["sample_starwars_embeddings"],
        embedding=embeddings,
        index_name="vector_index",
        embedding_key="embedding",
        text_key="chunk",
    )

    llm = ChatVertexAI(
        model_name="gemini-pro",
        convert_system_message_to_human=True,
        max_output_tokens=1000,
    )
    retriever = vs.as_retriever(
        search_type="mmr", search_kwargs={"k": 10, "lambda_mult": 0.25}
    )
    memory = ConversationBufferWindowMemory(
        memory_key="chat_history", k=5, return_messages=True
    )
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        combine_docs_chain_kwargs={"prompt": PROMPT},
    )
    # invoke() replaces the deprecated direct call on the chain
    response = conversation_chain.invoke({"question": query})

    return response

In [6]:
def star_trek_query_tool(
    query: str
):
    """
    Retrieves vectors from a MongoDB database and uses them to answer a question related to Star Trek.

    Args:
        query: The question to be answered about Star Trek.

    Returns:
        A dictionary containing the response to the question.
    """
    from langchain.chains import ConversationalRetrievalChain
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_google_vertexai import VertexAIEmbeddings, ChatVertexAI
    from langchain.memory import ConversationBufferWindowMemory
    from langchain.prompts import PromptTemplate
    from pymongo import MongoClient


    prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Do not return any answers from your own knowledge. Respond only in 2 or 3 sentences.


    {context}


    Question: {question}
    """
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )

    # Add your connection string in srv format below in place of URI
    client = MongoClient("Update Your URI here")
    db = client["REASONING-ENGINE"]

    # Use the same embedding model as the ingestion step so query and stored vectors are comparable
    embeddings = VertexAIEmbeddings(model_name="text-embedding-004")

    vs = MongoDBAtlasVectorSearch(
        collection=db["sample_startrek_embeddings"],
        embedding=embeddings,
        index_name="vector_index",
        embedding_key="embedding",
        text_key="chunk",
    )

    llm = ChatVertexAI(
        model_name="gemini-pro",
        convert_system_message_to_human=True,
        max_output_tokens=1000,
    )
    retriever = vs.as_retriever(
        search_type="mmr", search_kwargs={"k": 10, "lambda_mult": 0.25}
    )
    memory = ConversationBufferWindowMemory(
        memory_key="chat_history", k=5, return_messages=True
    )
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        combine_docs_chain_kwargs={"prompt": PROMPT},
    )
    # invoke() replaces the deprecated direct call on the chain
    response = conversation_chain.invoke({"question": query})

    return response
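
Both tools configure the retriever with `search_type="mmr"`, `k=10`, and `lambda_mult=0.25`. As a rough illustration of what maximal marginal relevance (MMR) does with those parameters, the sketch below re-implements the selection step in plain Python. It is not the Langchain or Atlas implementation; the names `cosine` and `mmr_select` and the toy two-dimensional vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            # Reward similarity to the query, penalize similarity to already chosen documents.
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Document 1 is a near duplicate of document 0; document 2 points elsewhere.
docs = [[1.0, 0.0], [1.0, 0.1], [0.6, 0.8]]
query = [1.0, 0.0]
diverse = mmr_select(query, docs, k=2, lambda_mult=0.25)
relevant = mmr_select(query, docs, k=2, lambda_mult=1.0)
print(diverse, relevant)
```

With `lambda_mult=0.25` the selection is `[0, 2]`: the near-duplicate document is skipped in favor of a less similar one. With `lambda_mult=1.0` the selection is `[0, 1]`, which is plain relevance ranking. A low value such as the 0.25 used in the tools therefore favors variety among the chunks passed to the LLM.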

### 3. Define agent for calling your function

The third component of your agent involves adding a reasoning layer, which helps your agent use the tools that you provided to help the end user achieve a higher-level goal.


In [7]:
# Initialize session history
store = {}


def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

In [8]:
agent = LangchainAgent(
    model=model,
    chat_history=get_session_history,
    model_kwargs={"temperature": 0},
    tools=[star_wars_query_tool, star_trek_query_tool],
    agent_executor_kwargs={"return_intermediate_steps": True},
)




Test your agent with a sample query.

In [9]:
response = agent.query(
    input="Who was the antagonist in Star wars and is played by whom? ",
    config={"configurable": {"session_id": "demo"}},
)

display(Markdown(response["output"]))



The main antagonist in the Star Wars series is Darth Vader, a dark lord of the Sith. He was originally played by David Prowse in the original trilogy, and later voiced by James Earl Jones. In the prequel trilogy, he appears as Anakin Skywalker, and was played by Hayden Christensen. 


In [12]:
response = agent.query(
    input="Which episode does David Prowse becomes darth vader? ",
    config={"configurable": {"session_id": "demo"}},
)

In [None]:
display(Markdown(response["output"]))

### 4. Deploy the agent

Now that you've specified a model, tools, and reasoning for your agent and tested it out, you're ready to deploy your agent as a remote service in Vertex AI!

In [10]:
remote_agent = agent_engines.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[agent_engines,langchain]",
        "cloudpickle==3.0.0",
        "pydantic>=2.10",
        "requests",
        "langchain-mongodb",
        "pymongo",
        "langchain-google-vertexai"
    ],
)

INFO:vertexai.agent_engines:Identified the following requirements: {'google-cloud-aiplatform': '1.84.0', 'cloudpickle': '3.0.0'}
INFO:vertexai.agent_engines:The final list of requirements: ['google-cloud-aiplatform[agent_engines,langchain]', 'cloudpickle==3.0.0', 'pydantic>=2.10', 'requests', 'langchain-mongodb', 'pymongo', 'langchain-google-vertexai']
INFO:vertexai.agent_engines:Using bucket vshanbh01
INFO:vertexai.agent_engines:Wrote to gs://vshanbh01/agent_engine/agent_engine.pkl
INFO:vertexai.agent_engines:Writing to gs://vshanbh01/agent_engine/requirements.txt
INFO:vertexai.agent_engines:Creating in-memory tarfile of extra_packages
INFO:vertexai.agent_engines:Writing to gs://vshanbh01/agent_engine/dependencies.tar.gz
INFO:vertexai.agent_engines:Creating AgentEngine
INFO:vertexai.agent_engines:Create AgentEngine backing LRO: projects/787220387490/locations/us-central1/reasoningEngines/6930485672662794240/operations/8689450392997593088
INFO:vertexai.agent_engines:View progress and l

### Grant Discovery Engine Editor access to Agent Engine service account


Before you send queries to your remote agent, you'll need to grant the Discovery Engine Editor role to the Agent Engine (Reasoning Engine) service account.

After you've completed this step, your remote agent will be able to retrieve documents from the data store that you created in Vertex AI Search:

In [None]:
# Retrieve the project number associated with your project ID
service = discovery.build("cloudresourcemanager", "v1")
request = service.projects().get(projectId=PROJECT_ID)
response = request.execute()
project_number = response["projectNumber"]
project_number

'787220387490'

In [None]:
# --condition=None skips the interactive prompt that gcloud shows when the
# project policy already contains conditional bindings
!gcloud projects add-iam-policy-binding {PROJECT_ID} \
    --member=serviceAccount:service-{project_number}@gcp-sa-aiplatform-re.iam.gserviceaccount.com \
    --role=roles/discoveryengine.editor \
    --condition=None


### Test your remotely deployed agent
With all of the core components of your agent in place, you can send prompts to your remotely deployed agent and check that it works as expected.

In [11]:
from vertexai.preview import reasoning_engines


REASONING_ENGINE_RESOURCE_NAME = "projects/787220387490/locations/us-central1/reasoningEngines/6930485672662794240"

remote_agent = reasoning_engines.ReasoningEngine(REASONING_ENGINE_RESOURCE_NAME)

response = remote_agent.query(
    input="tell me about episode 1 of star wars",
    config={"configurable": {"session_id": "demo"}},
)
print(response["output"])

response = remote_agent.query(
    input="Who was the main character in this series",
    config={"configurable": {"session_id": "demo"}},
)
print(response["output"])



Star Wars: Episode I – The Phantom Menace was the first film installment released as part of the prequel trilogy. It was released on May 19, 1999. The main plot lines involve the return of Darth Sidious, the Jedi's discovery of young Anakin Skywalker, and the invasion of Naboo by the Trade Federation. 

The main character in Star Wars is Luke Skywalker. He is a young farm boy who dreams of adventure and becomes a Jedi Knight. He fights against the evil Galactic Empire alongside his friends, Princess Leia and Han Solo. 



In [12]:
response = remote_agent.query(
    input="what is the episode 1 of star trek?",
    config={"configurable": {"session_id": "demo"}},
)
print(response["output"])

Episode 1 of Star Trek is called "The Man Trap". It was first aired on September 8, 1966. The story involves the Enterprise crew investigating the disappearance of a crew on a scientific outpost. It turns out that the crew members were killed by a creature that can take on someone else's form after it kills them. 

