[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/agents/agent_fireworks_ai_langchain_mongodb.ipynb)


## Install Libraries


In [98]:
!pip install langchain-fireworks langchain-huggingface langchain-community langchain-mongodb arxiv pymupdf datasets pymongo tqdm


## Set Environment Variables


In [93]:
import getpass
import os

MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")

In [94]:
os.environ["FIREWORKS_API_KEY"] = getpass.getpass("Enter Fireworks API key:")

## Ingest Data into MongoDB Vector Database


In [95]:
import pandas as pd
from datasets import load_dataset

data = load_dataset("mongodb-eai/arxiv-embeddings")
dataset_df = pd.DataFrame(data["train"])



In [96]:
dataset_df.head()

Unnamed: 0,id,submitter,authors,title,comments,journal-ref,doi,report-no,categories,license,abstract,versions,update_date,authors_parsed,embedding
0,704.0001,Pavel Nadolsky,"C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-...",Calculation of prompt diphoton production cros...,"37 pages, 15 figures; published version","Phys.Rev.D76:013009,2007",10.1103/PhysRevD.76.013009,ANL-HEP-PR-07-12,hep-ph,,A fully differential calculation in perturba...,"[{'version': 'v1', 'created': 'Mon, 2 Apr 2007...",1227657600000,"[[Balázs, C., ], [Berger, E. L., ], [Nadolsky,...","[0.2324569076, -0.894839108, -0.242858842, 0.1..."
1,704.0002,Louis Theran,Ileana Streinu and Louis Theran,Sparsity-certifying Graph Decompositions,To appear in Graphs and Combinatorics,,,,math.CO cs.CG,http://arxiv.org/licenses/nonexclusive-distrib...,"We describe a new algorithm, the $(k,\ell)$-...","[{'version': 'v1', 'created': 'Sat, 31 Mar 200...",1229126400000,"[[Streinu, Ileana, ], [Theran, Louis, ]]","[0.6949232221, 0.3588359952, 0.1817755997, 0.7..."
2,704.0003,Hongjun Pan,Hongjun Pan,The evolution of the Earth-Moon system based o...,"23 pages, 3 figures",,,,physics.gen-ph,,The evolution of Earth-Moon system is descri...,"[{'version': 'v1', 'created': 'Sun, 1 Apr 2007...",1200182400000,"[[Pan, Hongjun, ]]","[0.1294624656, 1.1964389086, 0.8928941488, -0...."
3,704.0004,David Callan,David Callan,A determinant of Stirling cycle numbers counts...,11 pages,,,,math.CO,,We show that a determinant of Stirling cycle...,"[{'version': 'v1', 'created': 'Sat, 31 Mar 200...",1179878400000,"[[Callan, David, ]]","[-0.0994227678, -0.364127785, 0.5390082002, -0..."
4,704.0005,Alberto Torchinsky,Wael Abu-Shammala and Alberto Torchinsky,From dyadic $\Lambda_{\alpha}$ to $\Lambda_{\a...,,"Illinois J. Math. 52 (2008) no.2, 681-689",,,math.CA math.FA,,In this paper we show how to compute the $\L...,"[{'version': 'v1', 'created': 'Mon, 2 Apr 2007...",1381795200000,"[[Abu-Shammala, Wael, ], [Torchinsky, Alberto, ]]","[0.0711007342, 0.5356642008, 0.5095595121, 0.4..."


In [97]:
from pymongo import MongoClient

# Initialize the MongoDB Python client
client = MongoClient(MONGODB_URI)

DB_NAME = "agent_demo"
COLLECTION_NAME = "knowledge"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
collection = client[DB_NAME][COLLECTION_NAME]

In [6]:
# Delete any existing records in the collection
collection.delete_many({})

# Data Ingestion
records = dataset_df.to_dict("records")
collection.insert_many(records)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed
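For larger datasets, it may be preferable to send the records in several smaller `insert_many` calls rather than one. The following is a minimal sketch of that idea, not part of the original notebook: the helper name `batched_records` and the batch size of 1000 are assumptions chosen for illustration, and the stand-in data replaces `dataset_df.to_dict("records")` so the snippet runs on its own.

```python
from typing import Iterator


def batched_records(records: list, batch_size: int = 1000) -> Iterator[list]:
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start : start + batch_size]


# Stand-in data (assumption); in the notebook this would be the `records` list
sample_records = [{"id": i} for i in range(2500)]
batch_sizes = [len(batch) for batch in batched_records(sample_records)]

# In the notebook, each batch would be written with:
#   for batch in batched_records(records):
#       collection.insert_many(batch)
```

With the stand-in data above, the helper produces two full batches and one partial batch.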


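## Create Vector Search Index Definition

Create an Atlas Vector Search index named `vector_index` on the `agent_demo.knowledge` collection using the definition below. The `numDimensions` value must match the dimensionality of the stored `embedding` vectors and of the embedding model used for queries.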

```
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
```
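The `"similarity": "cosine"` setting means documents are ranked by how closely the direction of their `embedding` vector matches the direction of the query vector. As a rough illustration of the underlying measure (this is not Atlas's implementation or its exact scoring formula), here is a plain-Python sketch; `cosine_similarity` is a hypothetical helper introduced only for this example.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score -1. The length check mirrors why `numDimensions` has to agree with the embedding model's output size.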


## Create MongoDB Vector Store Retriever


In [101]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

embedding_model = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")

# Vector Store Creation
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string=MONGODB_URI,
    namespace=DB_NAME + "." + COLLECTION_NAME,
    embedding=embedding_model,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    text_key="abstract",
)

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

In [102]:
from langchain_fireworks import ChatFireworks

llm = ChatFireworks(
    model="accounts/fireworks/models/firefunction-v1", temperature=0.0, max_tokens=1024
)

## Create Agent Tools


In [137]:
from langchain.tools import tool
from langchain_community.document_loaders import ArxivLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough


@tool
def get_paper_metadata_from_arxiv(topic: str) -> list:
    """
    Fetch and return paper metadata for up to 5 arXiv papers matching the given topic, for example: Retrieval Augmented Generation.

    Args:
    topic (str): The topic to find papers for on arXiv.

    Returns:
    list: Metadata about the papers matching the topic.
    """
    docs = ArxivLoader(query=topic, load_max_docs=5).load()
    # Extract just the metadata from each document
    metadata = [doc.metadata for doc in docs]
    return metadata


@tool
def get_paper_summary_from_arxiv(id: str) -> str:
    """
    Fetch and return the summary for a single research paper from arXiv given the paper ID, for example: 1605.08386.

    Args:
    id (str): The paper ID.

    Returns:
    str: Summary of the paper.
    """
    doc = ArxivLoader(query=id, load_max_docs=1).get_summaries_as_docs()
    if len(doc) == 0:
        return "No summary found for this paper."
    return doc[0].page_content


@tool
def answer_questions_about_topics(query: str) -> str:
    """
    Answer questions about a given topic based on information in the knowledge base.

    Args:
    query (str): User query about a topic.

    Returns:
    str: Information about the topic.
    """
    retrieve = {
        "context": retriever
        | (lambda docs: "\n\n".join([d.page_content for d in docs])),
        "question": RunnablePassthrough(),
    }
    template = """Answer the question based only on the following context. If no context is provided, say I do not know: \
    {context}

    Question: {question}
    """
    # Defining the chat prompt
    prompt = ChatPromptTemplate.from_template(template)
    # Parse output as a string
    parse_output = StrOutputParser()
    # Retrieval chain
    retrieval_chain = retrieve | prompt | llm | parse_output

    answer = retrieval_chain.invoke(query)

    return answer

In [110]:
get_paper_metadata_from_arxiv.invoke("Retrieval Augmented Generation")

[{'Published': '2022-02-13',
  'Title': 'A Survey on Retrieval-Augmented Text Generation',
  'Authors': 'Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu',
  'Summary': 'Recently, retrieval-augmented text generation attracted increasing attention\nof the computational linguistics community. Compared with conventional\ngeneration models, retrieval-augmented text generation has remarkable\nadvantages and particularly has achieved state-of-the-art performance in many\nNLP tasks. This paper aims to conduct a survey about retrieval-augmented text\ngeneration. It firstly highlights the generic paradigm of retrieval-augmented\ngeneration, and then it reviews notable approaches according to different tasks\nincluding dialogue response generation, machine translation, and other\ngeneration tasks. Finally, it points out some important directions on top of\nrecent methods to facilitate future research.'},
 {'Published': '2024-05-12',
  'Title': 'DuetRAG: Collaborative Retrieval-Augmented Gene

In [111]:
get_paper_summary_from_arxiv.invoke("1808.09236")

'We determine the non-perturbatively renormalized axial current for O($a$)\nimproved lattice QCD with Wilson quarks. Our strategy is based on the chirally\nrotated Schr\\"odinger functional and can be generalized to other finite (ratios\nof) renormalization constants which are traditionally obtained by imposing\ncontinuum chiral Ward identities as normalization conditions. Compared to the\nlatter we achieve an error reduction up to one order of magnitude. Our results\nhave already enabled the setting of the scale for the $N_{\\rm f}=2+1$ CLS\nensembles [1] and are thus an essential ingredient for the recent $\\alpha_s$\ndetermination by the ALPHA collaboration [2]. In this paper we shortly review\nthe strategy and present our results for both $N_{\\rm f}=2$ and $N_{\\rm f}=3$\nlattice QCD, where we match the $\\beta$-values of the CLS gauge configurations.\nIn addition to the axial current renormalization, we also present precise\nresults for the renormalized local vector current.'

In [113]:
get_paper_summary_from_arxiv.invoke("808.09236")

'No summary found for this paper.'
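The second call fails because `808.09236` is not a well-formed arXiv identifier. A cheap format check before calling the tool can catch such inputs early. The sketch below is an assumption-laden illustration rather than part of the notebook: `looks_like_arxiv_id` is a hypothetical helper, it only covers the new-style `YYMM.NNNN` / `YYMM.NNNNN` scheme with an optional version suffix, and it does not verify that the paper exists.

```python
import re

# New-style arXiv identifiers: four digits, a dot, then four or five digits,
# optionally followed by a version suffix such as "v2".
# Old-style identifiers (e.g. "hep-th/9901001") are deliberately not handled.
ARXIV_ID_PATTERN = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def looks_like_arxiv_id(paper_id: str) -> bool:
    """Return True if `paper_id` has the shape of a new-style arXiv identifier."""
    return ARXIV_ID_PATTERN.fullmatch(paper_id.strip()) is not None


valid_check = looks_like_arxiv_id("1808.09236")
invalid_check = looks_like_arxiv_id("808.09236")
```

A check like this could run before `ArxivLoader` is invoked so the tool returns a clear message for malformed IDs instead of an empty search result.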

In [120]:
answer_questions_about_topics.invoke("What are partial cubes?")

"Partial cubes are isometric subgraphs of hypercubes. They are characterized by structures on a graph defined by means of semicubes, and Djokovi\\'{c}'s and Winkler's relations. These structures are employed in the paper to characterize bipartite graphs and partial cubes of arbitrary dimension."
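Inside `answer_questions_about_topics`, the retrieved documents are joined with blank lines and substituted into the prompt template before the LLM is called. The sketch below mimics only that assembly step in plain Python so the resulting prompt text can be inspected. `FakeDoc`, `TEMPLATE`, and `build_prompt` are hypothetical stand-ins, not LangChain APIs, and the template whitespace is simplified relative to the notebook's.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document; only page_content is modeled."""

    page_content: str


# Simplified copy of the tool's template (assumption: whitespace tidied)
TEMPLATE = (
    "Answer the question based only on the following context. "
    "If no context is provided, say I do not know: {context}\n\n"
    "Question: {question}\n"
)


def build_prompt(docs: list, question: str) -> str:
    # Same joining rule as the lambda in the tool: one blank line between abstracts
    context = "\n\n".join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, question=question)


example_prompt = build_prompt(
    [FakeDoc("Abstract A."), FakeDoc("Abstract B.")], "What are partial cubes?"
)
```

Printing `example_prompt` shows what the model receives: the instruction, the concatenated abstracts, and then the question. When the retriever returns nothing, the context is empty, which is the case the "say I do not know" instruction is meant to cover.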

In [116]:
tools = [
    get_paper_metadata_from_arxiv,
    get_paper_summary_from_arxiv,
    answer_questions_about_topics,
]

## Basic Agent


In [133]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import AgentExecutor, create_tool_calling_agent

instructions = """You are a helpful research assistant.
When answering questions about a particular topic, only use the answer_questions_about_topics tool.
DO NOT use your own knowledge."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        ("user", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)

agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

In [129]:
agent_executor.invoke({"input": "Give me papers on the topic prompt compression."})



> Entering new AgentExecutor chain...

Invoking: `get_paper_metadata_from_arxiv` with `{'topic': 'prompt compression'}`


[{'Published': '2024-03-30', 'Title': 'PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression', 'Authors': 'Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang', 'Summary': "Large language models (LLMs) have shown exceptional abilities for multiple\ndifferent natural language processing tasks. While prompting is a crucial tool\nfor LLM inference, we observe that there is a significant cost associated with\nexceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead\nto sub-standard results in terms of readability and interpretability of the\ncompressed prompt, with a detrimental impact on prompt utility. To address\nthis, we propose PROMPT-SAW: Prompt compresSion via Relation AWare graphs, an\neffective strategy for prompt compressi

{'input': 'Give me papers on the topic prompt compression.',
 'output': 'Here are some papers on the topic of prompt compression:\n\n1. "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression" by Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang\n2. "Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression" by Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu\n3. "Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt" by Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava\n4. "PromptCIR: Blind Compressed Image Restoration with Prompt Learning" by Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen\n5. "Learning to Compress Prompt in Natural Language Formats" by Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, X

In [123]:
agent_executor.invoke({"input": "Give me a summary of paper id 2401.00002"})



> Entering new AgentExecutor chain...

Invoking: `get_paper_summary_from_arxiv` with `{'id': '2401.00002'}`


We show evidence of particle acceleration at GEV energies associated directly
with protons from the prompt emission of a long-duration M6-class solar flare
on July 17, 2023, rather than from protons acceleration by shocks from its
associated Coronal Mass Ejection (CME), which erupted with a speed of 1342
km/s. Solar Energetic Particles (SEP) accelerated by the blast have reached
Earth, up to an almost S3 (strong) category of a radiation storm on the NOAA
scale. Also, we show a temporal correlation between the fast rising of GOES-16
proton and muon excess at ground level in the count rate of the New-Tupi muon
detector at the central SAA region. A Monte Carlo spectral analysis based on
muon excess at New-Tupi is consistent with the acceleration of electrons and
protons (ions) up to relativistic energies (GeV energy range) in the impulsive
p

{'input': 'Give me a summary of paper id 2401.00002',
 'output': 'The paper with ID 2401.00002 is about the discovery of particle acceleration at GEV energies associated directly with protons from the prompt emission of a long-duration M6-class solar flare on July 17, 2023. The paper also discusses the correlation between the fast rising of GOES-16 proton and muon excess at ground level in the count rate of the New-Tupi muon detector at the central SAA region. Additionally, the paper presents two marginal particle excesses (with low confidence) at ground-level detectors in correlation with the solar flare prompt emission.'}

In [136]:
agent_executor.invoke({"input": "Tell me about Retrieval Augmented Generation."})



> Entering new AgentExecutor chain...

Invoking: `answer_questions_about_topics` with `{'query': 'Retrieval Augmented Generation'}`


I do not know. The provided context does not mention anything about "Retrieval Augmented Generation." <|end|>

I apologize, but I don't have any information about Retrieval Augmented Generation at the moment. Please try again later.

> Finished chain.


{'input': 'Tell me about Retrieval Augmented Generation.',
 'output': "I apologize, but I don't have any information about Retrieval Augmented Generation at the moment. Please try again later."}

## Agent Memory Using MongoDB


In [None]:
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
from langchain.memory import ConversationBufferMemory


def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
    return MongoDBChatMessageHistory(
        MONGODB_URI, session_id, database_name=DB_NAME, collection_name="history"
    )


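# return_messages=True returns message objects, which MessagesPlaceholder requires
memory = ConversationBufferMemory(
    memory_key="chat_history",
    chat_memory=get_session_history("my-session"),
    return_messages=True,
)

## Agent Creation


In [None]:
from langchain.agents import AgentExecutor, create_tool_calling_agent

# Rebuild the prompt with a chat_history slot so earlier turns reach the model
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", instructions),
        MessagesPlaceholder("chat_history"),
        ("user", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)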

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    memory=memory,
)

## Agent Execution


In [None]:
agent_executor.invoke(
    {"input": "Get me a list of research papers on the topic Prompt Compression"}
)

In [None]:
agent_executor.invoke({"input": "Get me the abstract of the first paper you found"})