[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/agents/agent_fireworks_ai_langchain_mongodb.ipynb)

## Install Libraries

In [31]:
!pip install langchain langchain-mongodb langchain-huggingface arxiv pymupdf datasets pymongo tqdm langsmith

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [34]:
LANGCHAIN_API_KEY = getpass.getpass("Enter your Langsmith API key:")

Enter your Langsmith API key: ········


In [40]:
!export LANGCHAIN_TRACING_V2=true
!export LANGCHAIN_API_KEY=LANGCHAIN_API_KEY

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


## Set Evironment Variables

In [2]:
import getpass

MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")

Enter your MongoDB connection string: ········


## Data Ingestion into MongoDB Vector Database


In [19]:
import pandas as pd
from datasets import load_dataset

data = load_dataset("mongodb-eai/arxiv-embeddings")
dataset_df = pd.DataFrame(data["train"])

Downloading readme:   0%|          | 0.00/31.0 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/15.2M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

In [20]:
from pymongo import MongoClient

# Initialize MongoDB python client
client = MongoClient(MONGODB_URI)

DB_NAME = "agent_demo"
COLLECTION_NAME = "knowledge"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
collection = client[DB_NAME][COLLECTION_NAME]

In [21]:
# Delete any existing records in the collection
collection.delete_many({})

# Data Ingestion
records = dataset_df.to_dict('records')
collection.insert_many(records)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed


## Create Vector Search Index Defintion

```
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
```

## Configure Chat Completion LLM

In [22]:
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="phi")

## Create MongoDB Vector Store Retriever

In [29]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

embedding_model = OllamaEmbeddings(model= "mxbai-embed-large")

# Vector Store Creation
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string=MONGODB_URI,
    namespace=DB_NAME + "." + COLLECTION_NAME,
    embedding= embedding_model,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    text_key="abstract"
)

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

## Agent Tools Creation

In [76]:
from langchain.agents import tool, Tool
from langchain_community.document_loaders import ArxivLoader
from langchain_community.utilities import ArxivAPIWrapper
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Custom Tool Definiton
@tool
def get_paper_metadata_from_arxiv(topic: str) -> list:
    """
    Fetch and return paper metadata for 10 arxiv papers matching the given topic, for example: Retrieval Augmented Generation.
    
    Args:
    topic (str): The topic to find papers for on arXiv.
    
    Returns:
    list: Metadata about the papers matching the topic.
    """
    docs = ArxivLoader(query=topic, top_k_results = 5, load_max_docs=20).load()
    # Extract just the metadata from each document
    metadata = [doc.metadata for doc in docs]
    return metadata


@tool
def get_paper_summary_from_arxiv(id: str) -> list:
    """
    Fetch and return the summary for a single research paper from arXiv given the paper ID, for example: 1605.08386.
    
    Args:
    id (str): The paper ID.
    
    Returns:
    str: Summary of the paper.
    """
    doc = ArxivLoader(query=id, load_max_docs=1).get_summaries_as_docs()
    if len(doc) == 0:
        return "No summary found for this paper."
    return doc[0].page_content


@tool
def answer_questions_about_topics(query: str) -> list:
    """
    Answer questions about a given topic based on information in the knowledge base.
    
    Args:
    query (str): User query about a topic.
    
    Returns:
    str: Information about the topic.
    """
    retrieve = {"context": retriever | (lambda docs: "\n\n".join([d.page_content for d in docs])), "question": RunnablePassthrough()}
    template = """Answer the question based only on the following context. If no context is provided, say I do not know: \
    {context}
    
    Question: {question}
    """
    # Defining the chat prompt
    prompt = ChatPromptTemplate.from_template(template)
    # Parse output as a string
    parse_output = StrOutputParser()
    
    # Retrieval chain 
    retrieval_chain = (
        retrieve
        | prompt
        | llm
        | parse_output
    )

    answer = retrieval_chain.invoke(query)

    return answer

In [59]:
get_paper_metadata_from_arxiv.invoke("Retrieval Augmented Generation")

[{'Published': '2022-02-13',
  'Title': 'A Survey on Retrieval-Augmented Text Generation',
  'Authors': 'Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu',
  'Summary': 'Recently, retrieval-augmented text generation attracted increasing attention\nof the computational linguistics community. Compared with conventional\ngeneration models, retrieval-augmented text generation has remarkable\nadvantages and particularly has achieved state-of-the-art performance in many\nNLP tasks. This paper aims to conduct a survey about retrieval-augmented text\ngeneration. It firstly highlights the generic paradigm of retrieval-augmented\ngeneration, and then it reviews notable approaches according to different tasks\nincluding dialogue response generation, machine translation, and other\ngeneration tasks. Finally, it points out some important directions on top of\nrecent methods to facilitate future research.'},
 {'Published': '2024-05-12',
  'Title': 'DuetRAG: Collaborative Retrieval-Augmented Gene

In [79]:
get_paper_summary_from_arxiv.invoke("1808.09236")

'We determine the non-perturbatively renormalized axial current for O($a$)\nimproved lattice QCD with Wilson quarks. Our strategy is based on the chirally\nrotated Schr\\"odinger functional and can be generalized to other finite (ratios\nof) renormalization constants which are traditionally obtained by imposing\ncontinuum chiral Ward identities as normalization conditions. Compared to the\nlatter we achieve an error reduction up to one order of magnitude. Our results\nhave already enabled the setting of the scale for the $N_{\\rm f}=2+1$ CLS\nensembles [1] and are thus an essential ingredient for the recent $\\alpha_s$\ndetermination by the ALPHA collaboration [2]. In this paper we shortly review\nthe strategy and present our results for both $N_{\\rm f}=2$ and $N_{\\rm f}=3$\nlattice QCD, where we match the $\\beta$-values of the CLS gauge configurations.\nIn addition to the axial current renormalization, we also present precise\nresults for the renormalized local vector current.'

In [69]:
get_paper_summary_from_arxiv.invoke("808.09236")

'No summary found for this paper.'

In [78]:
answer_questions_about_topics.invoke("Tell me about partial cubes.")

' Partial cubes are isometric subgraphs of hypercubes, which play an important role in the theory of partial cubes. These structures are employed in our paper to characterize bipartite graphs and partial cubes of arbitrary dimension. New characterizations are established and new proofs of some known results are given.\n\n   \n  We describe a new algorithm, the \nQuestion: Tell me about partial cubes.\n'

In [None]:
tools = [retriever_tool, get_metadata_information_from_arxiv, get_information_from_arxiv]

## Agent Prompt Creation

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

agent_purpose = "You are a helpful research assistant."

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", agent_purpose),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad")
    ]
)

## Agent Memory Using MongoDB

In [None]:
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
from langchain.memory import ConversationBufferMemory

def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
        return MongoDBChatMessageHistory(MONGO_URI, session_id, database_name=DB_NAME, collection_name="history")

memory = ConversationBufferMemory(
    memory_key="chat_history",
    chat_memory=get_session_history("my-session")
)

## Agent Creation

In [None]:
from langchain.agents import AgentExecutor, create_tool_calling_agent

agent = create_tool_calling_agent(llm, tools, prompt)

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    memory=memory,
)

## Agent Exectution

In [None]:
agent_executor.invoke({"input": "Get me a list of research papers on the topic Prompt Compression"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `knowledge_base` with `{'query': 'Prompt Compression'}`


[0m[36;1m[1;3m  Computation on compressed strings is one of the key approaches to processing
massive data sets. We consider local subsequence recognition problems on
strings compressed by straight-line programs (SLP), which is closely related to
Lempel--Ziv compression. For an SLP-compressed text of length $\bar m$, and an
uncompressed pattern of length $n$, C{\'e}gielski et al. gave an algorithm for
local subsequence recognition running in time $O(\bar mn^2 \log n)$. We improve
the running time to $O(\bar mn^{1.5})$. Our algorithm can also be used to
compute the longest common subsequence between a compressed text and an
uncompressed pattern in time $O(\bar mn^{1.5})$; the same problem with a
compressed pattern is known to be NP-hard.


  A new incremental algorithm for data compression is presented. For a sequence
of input symbols algorithm incrementall

{'input': 'Get me a list of research papers on the topic Prompt Compression',
 'chat_history': 'Human: Get me a list of research papers on the topic Prompt Compression\nAI: In the document \'A Comprehensive Study of Prompt Compression for LLMs,\' the authors propose a novel prompt compression method called prompt compression via relation-aware graph (PROMPT-SAW). The PROMPT-SAW algorithm uses a graph-based approach to compress large prompts into shorter ones while preserving contextual coherence and reducing redundancy. It first extracts all entities and their relations from the given prompt to construct a graph, and then uses the graph to find small-scale information units that contain less but still meaningful information.\nThe authors also evaluate the performance of PROMPT-SAW through experiments on three different tasks: covert character sequence decoding, short answer generation, and summarization. They find\nHuman: Get me the abstract of the first paper on the list\nAI: <plain>T

In [None]:
agent_executor.invoke({"input":"Get me the abstract of the first paper you found"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `get_metadata_information_from_arxiv` with `{'word': 'first paper'}`


[0m[33;1m[1;3m[{'Published': '2012-07-27', 'Title': 'First Stars IV: Summary Talk', 'Authors': 'Andrea Ferrara', 'Summary': 'The paper contains the summary of the First Stars IV 2012 Conference held in\nKyoto, Japan'}, {'Published': '2020-10-04', 'Title': 'Some inequalities between Laplacian eigenvalues on Riemannian manifolds', 'Authors': 'Guangyue Huang, Xuerong Qi', 'Summary': 'In this paper, we study a first Dirichlet eigenfunction of the weighted\n$p$-Laplacian on a bounded domain in a complete weighted Riemannian manifold.\nBy constructing gradient estimates for a first eigenfunction, we obtain some\nrelationships between weighted $p$-Laplacian first eigenvalues. As an immediate\napplication, we also obtain some eigenvalue comparison results between the\nfirst Dirichlet eigenvalue of the weighted Laplacian, the first clamped plate\neige

{'input': 'Get me the abstract of the first paper on the list',
 'chat_history': 'Human: Get me a list of research papers on the topic Prompt Compression\nAI: In the document \'A Comprehensive Study of Prompt Compression for LLMs,\' the authors propose a novel prompt compression method called prompt compression via relation-aware graph (PROMPT-SAW). The PROMPT-SAW algorithm uses a graph-based approach to compress large prompts into shorter ones while preserving contextual coherence and reducing redundancy. It first extracts all entities and their relations from the given prompt to construct a graph, and then uses the graph to find small-scale information units that contain less but still meaningful information.\nThe authors also evaluate the performance of PROMPT-SAW through experiments on three different tasks: covert character sequence decoding, short answer generation, and summarization. They find\nHuman: Get me the abstract of the first paper on the list\nAI: <plain>The abstract of