## Install Libraries

Install all the required libraries with a single command.


In [13]:
!pip install -qU langchain langchainhub langchain-community langchain-fireworks langchain-huggingface langchain-mongodb arxiv pymupdf datasets pymongo

## Setup Pre-requisites

You should have obtained the MongoDB connection string and Fireworks API Key during the first hands-on breakout. Set those here when prompted.


In [2]:
import getpass
import os

MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")

In [3]:
os.environ["FIREWORKS_API_KEY"] = getpass.getpass("Enter Fireworks API key:")

## Ingest Data into MongoDB Atlas

We will use MongoDB Atlas as the vector store/knowledge base for one of the agent tools. But first, we need to create the knowledge base.


In [4]:
# Download a dataset consisting of a subset of arXiv papers along with their embedded abstracts.
# Available at https://huggingface.co/datasets/mongodb-eai/arxiv-embeddings

import pandas as pd
from datasets import load_dataset

data = load_dataset("mongodb-eai/arxiv-embeddings")
dataset_df = pd.DataFrame(data["train"])



In [5]:
# Preview the dataset
dataset_df.head()

Unnamed: 0,id,submitter,authors,title,comments,journal-ref,doi,report-no,categories,license,abstract,versions,update_date,authors_parsed,embedding
0,704.0001,Pavel Nadolsky,"C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-...",Calculation of prompt diphoton production cros...,"37 pages, 15 figures; published version","Phys.Rev.D76:013009,2007",10.1103/PhysRevD.76.013009,ANL-HEP-PR-07-12,hep-ph,,A fully differential calculation in perturba...,"[{'version': 'v1', 'created': 'Mon, 2 Apr 2007...",1227657600000,"[[Balázs, C., ], [Berger, E. L., ], [Nadolsky,...","[0.2324569076, -0.894839108, -0.242858842, 0.1..."
1,704.0002,Louis Theran,Ileana Streinu and Louis Theran,Sparsity-certifying Graph Decompositions,To appear in Graphs and Combinatorics,,,,math.CO cs.CG,http://arxiv.org/licenses/nonexclusive-distrib...,"We describe a new algorithm, the $(k,\ell)$-...","[{'version': 'v1', 'created': 'Sat, 31 Mar 200...",1229126400000,"[[Streinu, Ileana, ], [Theran, Louis, ]]","[0.6949232221, 0.3588359952, 0.1817755997, 0.7..."
2,704.0003,Hongjun Pan,Hongjun Pan,The evolution of the Earth-Moon system based o...,"23 pages, 3 figures",,,,physics.gen-ph,,The evolution of Earth-Moon system is descri...,"[{'version': 'v1', 'created': 'Sun, 1 Apr 2007...",1200182400000,"[[Pan, Hongjun, ]]","[0.1294624656, 1.1964389086, 0.8928941488, -0...."
3,704.0004,David Callan,David Callan,A determinant of Stirling cycle numbers counts...,11 pages,,,,math.CO,,We show that a determinant of Stirling cycle...,"[{'version': 'v1', 'created': 'Sat, 31 Mar 200...",1179878400000,"[[Callan, David, ]]","[-0.0994227678, -0.364127785, 0.5390082002, -0..."
4,704.0005,Alberto Torchinsky,Wael Abu-Shammala and Alberto Torchinsky,From dyadic $\Lambda_{\alpha}$ to $\Lambda_{\a...,,"Illinois J. Math. 52 (2008) no.2, 681-689",,,math.CA math.FA,,In this paper we show how to compute the $\L...,"[{'version': 'v1', 'created': 'Mon, 2 Apr 2007...",1381795200000,"[[Abu-Shammala, Wael, ], [Torchinsky, Alberto, ]]","[0.0711007342, 0.5356642008, 0.5095595121, 0.4..."


In [6]:
# Initialize MongoDB client and other variables for the vector store

from pymongo import MongoClient

# Initialize MongoDB Python client
client = MongoClient(MONGODB_URI)

# Name of the database -- Change if needed or leave as is
DB_NAME = "agents_workshop"
# Name of the collection -- Change if needed or leave as is
COLLECTION_NAME = "knowledge"
# Name of the vector search index -- Change if needed or leave as is
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
collection = client[DB_NAME][COLLECTION_NAME]

In [7]:
# Delete any existing records in the collection
collection.delete_many({})
# Ingest data into the collection
records = dataset_df.to_dict("records")
# Bulk insert multiple records with a single `insert_many` call
collection.insert_many(records)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed
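
The vector search index defined in the next section expects every `embedding` to have 1024 dimensions, so it can help to sanity-check the records before or after ingestion. The following is a minimal sketch under that assumption; the helper name `find_bad_embeddings` and the toy `sample` records are hypothetical and not part of the workshop code.

```python
def find_bad_embeddings(records, expected_dims=1024, field="embedding"):
    """Return the positions of records whose embedding is missing or the wrong length."""
    bad = []
    for position, record in enumerate(records):
        vector = record.get(field)
        if vector is None or len(vector) != expected_dims:
            bad.append(position)
    return bad


# Toy records standing in for `dataset_df.to_dict("records")`
sample = [
    {"id": "a", "embedding": [0.1] * 1024},
    {"id": "b", "embedding": [0.1] * 3},
    {"id": "c"},
]
print(find_bad_embeddings(sample))
```

In the notebook, the same check could be run as `find_bad_embeddings(records)`; an empty list would mean every record matches the index definition.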


## Create Vector Search Index Definition

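Create the vector search index in the Atlas UI on the `knowledge` collection, using the definition below and the index name `vector_index` (the value of `ATLAS_VECTOR_SEARCH_INDEX_NAME`). Follow the Atlas Vector Search documentation for step-by-step instructions.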

```
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
```
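
To make the `similarity` setting concrete, here is a minimal pure-Python sketch of cosine-similarity ranking over an `embedding` field. The helper names (`cosine_similarity`, `top_k`) and the toy 3-dimensional vectors are illustrative assumptions. Atlas runs its own nearest-neighbor search over the real 1024-dimensional vectors, so treat this brute-force loop only as an illustration of the scoring.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=5):
    """Rank documents (dicts with an `embedding` list) by similarity to `query`."""
    scored = [(cosine_similarity(query, d["embedding"]), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Toy documents; real embeddings would have 1024 dimensions
docs = [
    {"title": "a", "embedding": [1.0, 0.0, 0.0]},
    {"title": "b", "embedding": [0.0, 1.0, 0.0]},
    {"title": "c", "embedding": [0.9, 0.1, 0.0]},
]
best = top_k([1.0, 0.0, 0.0], docs, k=2)
print([d["title"] for d in best])
```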


## Create MongoDB Vector Store Retriever

Now that we have ingested data into a MongoDB Collection, let's use it to create a vector store retriever.


In [None]:
# Embedding model to use for the vector store -- DO NOT CHANGE
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")

In [10]:
from langchain_mongodb import MongoDBAtlasVectorSearch

# Create a MongoDBAtlas vector store object
# Reference docs: https://api.python.langchain.com/en/latest/_modules/langchain_mongodb/vectorstores.html#MongoDBAtlasVectorSearch
# Use the `from_connection_string` method of the MongoDBAtlasVectorSearch class.
# Arguments: connection_string, namespace, embedding, index_name, text_key
vector_store = # INSERT CODE HERE

# Construct a retriever from the vector store
# Reference docs: https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/vectorstore/
# Set the search type for the retriever to `similarity` and `k` to 5.
retriever = # INSERT CODE HERE
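
One possible way to fill in the two placeholders, following the hints in the comments, is sketched below. The function names (`make_namespace`, `build_vector_store`, `build_retriever`) are hypothetical, and `text_key="abstract"` is an assumption based on the dataset storing each paper's text in the `abstract` column. The `langchain_mongodb` import is deferred into the function body so the sketch can be defined without a live cluster.

```python
def make_namespace(db_name, collection_name):
    """Atlas namespaces take the form `<database>.<collection>`."""
    return f"{db_name}.{collection_name}"


def build_vector_store(
    connection_string,
    embedding_model,
    db_name="agents_workshop",
    collection_name="knowledge",
    index_name="vector_index",
):
    # Imported lazily: requires the `langchain-mongodb` package at call time.
    from langchain_mongodb import MongoDBAtlasVectorSearch

    return MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=connection_string,
        namespace=make_namespace(db_name, collection_name),
        embedding=embedding_model,
        index_name=index_name,
        # Assumption: the embedded text lives in the `abstract` field.
        text_key="abstract",
    )


def build_retriever(vector_store, k=5):
    # Similarity search that returns the `k` closest documents.
    return vector_store.as_retriever(
        search_type="similarity", search_kwargs={"k": k}
    )
```

With these helpers, the cell could read `vector_store = build_vector_store(MONGODB_URI, embedding_model)` followed by `retriever = build_retriever(vector_store)`.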



## Instantiate Chat Completion LLM

Instantiate the chat completion LLM to use as the "brain" of our agent and for any of the tools if required.

We will use Fireworks AI's open-source AND free `firefunction-v1` model.


In [11]:
from langchain_fireworks import ChatFireworks

# Reference docs: https://python.langchain.com/v0.1/docs/integrations/chat/fireworks/
# Create an instance of `ChatFireworks` with the model `accounts/fireworks/models/firefunction-v1`, temperature set to 0.0 and max tokens set to 1024
llm = # INSERT CODE HERE
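
A possible completion for the placeholder above, using the values given in the comment. The names `LLM_SETTINGS` and `build_llm` are hypothetical, and the import is deferred so the sketch can be defined without the `langchain-fireworks` package or an API key.

```python
# Values taken from the instructions in the cell above
LLM_SETTINGS = {
    "model": "accounts/fireworks/models/firefunction-v1",
    "temperature": 0.0,
    "max_tokens": 1024,
}


def build_llm(settings=None):
    # Imported lazily: requires `langchain-fireworks` and FIREWORKS_API_KEY at call time.
    from langchain_fireworks import ChatFireworks

    return ChatFireworks(**(settings or LLM_SETTINGS))
```

The cell could then read `llm = build_llm()`.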

## Create Agent Tools

Define tools for the agent to use.


In [14]:
from langchain.tools import tool
from langchain_community.document_loaders import ArxivLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


@tool
def get_paper_metadata_from_arxiv(topic: str) -> list:
    """
    Fetch and return paper metadata for up to 5 arXiv papers matching the given topic, for example: Retrieval Augmented Generation.

    Args:
    topic (str): The topic to find papers for on arXiv.

    Returns:
    list: Metadata about the papers matching the topic.
    """
    docs = ArxivLoader(query=topic, load_max_docs=5).load()
    # Extract just the metadata from each document
    metadata = [doc.metadata for doc in docs]
    return metadata


@tool
def get_paper_summary_from_arxiv(id: str) -> str:
    """
    Fetch and return the summary for a single research paper from arXiv given the paper ID, for example: 1605.08386.

    Args:
    id (str): The paper ID.

    Returns:
    str: Summary of the paper.
    """
    # Reference docs:
    # https://python.langchain.com/v0.1/docs/integrations/document_loaders/arxiv/
    # https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.arxiv.ArxivLoader.html
    # Create a tool that uses the ArxivLoader to return summaries given a paper ID (`id`).
    # HINT: Use the `get_summaries_as_docs` method of `ArxivLoader`
    # Handle the case where the paper ID is invalid i.e. number of docs returned from `ArxivLoader` are 0.
    # INSERT CODE HERE


@tool
def answer_questions_about_topics(query: str) -> str:
    """
    Answer questions about a given topic based on information in the knowledge base.

    Args:
    query (str): User query about a topic.

    Returns:
    str: Information about the topic.
    """
    # Reference docs: https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/
    # Follow the example in the reference docs above to create a tool that uses the Fireworks LLM we instantiated above to answer questions based on content in our MongoDB Atlas knowledge base.
    # Create a RAG chain similar to the one in the example and return the response of the `invoke` call with `query` as an argument
    # INSERT CODE HERE
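
The two unfinished tool bodies could be completed along the following lines. This is a sketch under stated assumptions, not the workshop's reference solution: the helper names (`summary_from_docs`, `format_docs`, `fetch_summary`, `build_rag_chain`), the error message, and the prompt wording are all hypothetical. The LangChain imports are deferred so the pure helpers can be exercised on their own.

```python
def summary_from_docs(docs, paper_id):
    """Return the first document's text, or an error message if nothing was found."""
    if len(docs) == 0:
        return f"No paper found with ID {paper_id}. Please check the ID and try again."
    return docs[0].page_content


def format_docs(docs):
    """Join retrieved documents into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


def fetch_summary(paper_id):
    # Imported lazily: requires `langchain-community` and `arxiv` at call time.
    from langchain_community.document_loaders import ArxivLoader

    docs = ArxivLoader(query=paper_id).get_summaries_as_docs()
    return summary_from_docs(docs, paper_id)


def build_rag_chain(retriever, llm):
    # Imported lazily: requires `langchain-core` at call time.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    # Hypothetical prompt wording; adjust to taste.
    prompt = ChatPromptTemplate.from_template(
        "Answer the question based only on the following context. "
        "If the context is empty, say you don't know.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    )
    # Retrieve -> format context -> fill prompt -> call LLM -> plain string
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

Inside the tools, `get_paper_summary_from_arxiv` could then `return fetch_summary(id)`, and `answer_questions_about_topics` could `return build_rag_chain(retriever, llm).invoke(query)`.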

## Test out the tools

Test the tools individually to make sure each one returns the expected response. Tools are LangChain Runnables, so we can call the `invoke` method on them.


In [None]:
get_paper_metadata_from_arxiv.invoke("Retrieval Augmented Generation")

In [None]:
get_paper_summary_from_arxiv.invoke("1808.09236")

In [None]:
# Invalid paper ID: the tool should handle this case instead of failing
get_paper_summary_from_arxiv.invoke("808.09236")

In [None]:
answer_questions_about_topics.invoke("What are partial cubes?")

In [19]:
# Create the list of tools
tools = [
    get_paper_metadata_from_arxiv,
    get_paper_summary_from_arxiv,
    answer_questions_about_topics,
]

## Create a Basic Tool-calling Agent


In [20]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools.render import render_text_description

system_message = f"""Answer the following questions as best you can.
You can answer directly if the user is greeting you or similar.
Otherwise, you have access to the following tools:

{render_text_description(tools)}
"""

# # CoT prompt
# system_message = f"""Given a question, write out in a step-by-step manner your reasoning
# for how you will solve the problem to be sure that your conclusion is correct.
# Avoid simply stating the correct answer at the outset. You can answer directly if the user
# is greeting you or similar. Otherwise, you have access to the following tools:

# {render_text_description(tools)}

# Begin!
# """

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_message),
        ("human", "{input}"),
        # Placeholders fill up a **list** of messages
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)

agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

In [21]:
agent_executor.invoke({"input": "Give me papers on the topic prompt compression."})



> Entering new AgentExecutor chain...

Invoking: `get_paper_metadata_from_arxiv` with `{'topic': 'prompt compression'}`


[{'Published': '2024-03-30', 'Title': 'PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression', 'Authors': 'Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang', 'Summary': "Large language models (LLMs) have shown exceptional abilities for multiple\ndifferent natural language processing tasks. While prompting is a crucial tool\nfor LLM inference, we observe that there is a significant cost associated with\nexceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead\nto sub-standard results in terms of readability and interpretability of the\ncompressed prompt, with a detrimental impact on prompt utility. To address\nthis, we propose PROMPT-SAW: Prompt compresSion via Relation AWare graphs, an\neffective strategy for prompt compressi

{'input': 'Give me papers on the topic prompt compression.',
 'output': 'Here are some papers on the topic of prompt compression:\n\n1. "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression" by Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang. Published on 2024-03-30.\n\n2. "Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression" by Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu. Published on 2024-02-25.\n\n3. "Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt" by Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava. Published on 2023-10-10.\n\n4. "PromptCIR: Blind Compressed Image Restoration with Prompt Learning" by Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen. Published on 2024-04-26.\n\n5. "Learnin

## Bonus: Without using `create_tool_calling_agent`


In [22]:
from langchain.agents.output_parsers.tools import ToolsAgentOutputParser
from langchain.agents.format_scratchpad.tools import (
    format_to_tool_messages,
)

llm_with_tools = llm.bind_tools(tools)

agent = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_to_tool_messages(x["intermediate_steps"])
    )
    | prompt
    | llm_with_tools
    | ToolsAgentOutputParser()
)

agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)
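
The chain above only decides the next step; the executor is what runs the tools and feeds results back through `intermediate_steps`. The following is a conceptual pure-Python sketch of that loop, not LangChain's actual implementation. The names `run_agent_loop` and `fake_agent_step`, the tuple-based decision format, and the `lookup` tool are all hypothetical stand-ins.

```python
def run_agent_loop(agent_step, tools, user_input, max_iterations=5):
    """Conceptual executor loop.

    `agent_step(user_input, intermediate_steps)` is a hypothetical callable that
    returns ("call", tool_name, tool_input) or ("finish", answer).
    """
    intermediate_steps = []
    for _ in range(max_iterations):
        decision = agent_step(user_input, intermediate_steps)
        if decision[0] == "finish":
            return {"input": user_input, "output": decision[1]}
        _, tool_name, tool_input = decision
        # Run the requested tool and record (action, observation) for the next turn
        observation = tools[tool_name](tool_input)
        intermediate_steps.append(((tool_name, tool_input), observation))
    return {"input": user_input, "output": "Agent stopped: iteration limit reached."}


def fake_agent_step(user_input, intermediate_steps):
    # Stand-in for the LLM: call the tool once, then answer from its observation.
    if not intermediate_steps:
        return ("call", "lookup", "prompt compression")
    _, observation = intermediate_steps[-1]
    return ("finish", f"Found: {observation}")


result = run_agent_loop(
    fake_agent_step,
    {"lookup": lambda topic: f"3 papers on {topic}"},
    "Give me papers on the topic prompt compression.",
)
print(result["output"])
```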

In [None]:
agent_executor.invoke({"input": "Give me papers on the topic prompt compression."})

## Create a ReAct Agent


In [24]:
from langchain.agents import create_react_agent
from langchain import hub

prompt = hub.pull("hwchase17/react")
prompt.pretty_print()

Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}


In [25]:
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    # Message sent back to the model when its output cannot be parsed
    handle_parsing_errors=(
        "Check your output. Make an observation in order to determine whether or not "
        "you have the final answer. If you do, use the exact characters `Final Answer` and exit."
    ),
)
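
A ReAct agent depends on the model sticking to the text format in the prompt, which is why a parsing-error message is configured above. The sketch below shows, in simplified form, how one model turn in that format might be parsed. It is an illustration only and not LangChain's parser; the names `ACTION_PATTERN` and `parse_react_output` are hypothetical.

```python
import re

# Captures the tool name after "Action:" and the argument after "Action Input:"
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)", re.DOTALL
)


def parse_react_output(text):
    """Parse one model turn written in the ReAct text format.

    Returns ("finish", answer) or ("action", tool_name, tool_input).
    Raises ValueError when the text follows neither shape.
    """
    if "Final Answer:" in text:
        return ("finish", text.split("Final Answer:")[-1].strip())
    match = ACTION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Could not parse model output: {text!r}")
    return ("action", match.group(1).strip(), match.group(2).strip())


turn = (
    "I need to use the get_paper_summary_from_arxiv tool.\n\n"
    "Action: get_paper_summary_from_arxiv\n"
    "Action Input: 1808.09236"
)
print(parse_react_output(turn))
```

When parsing fails, an executor can send a corrective message back to the model as the next observation instead of stopping, which is the role of `handle_parsing_errors`.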

In [26]:
agent_executor.invoke({"input": "Give me the summary for the paper 1808.09236."})



> Entering new AgentExecutor chain...
I need to use the get_paper_summary_from_arxiv tool to get the summary for the paper with ID 1808.09236.

Action: get_paper_summary_from_arxiv
Action Input: 1808.09236
We determine the non-perturbatively renormalized axial current for O($a$)
improved lattice QCD with Wilson quarks. Our strategy is based on the chirally
rotated Schr\"odinger functional and can be generalized to other finite (ratios
of) renormalization constants which are traditionally obtained by imposing
continuum chiral Ward identities as normalization conditions. Compared to the
latter we achieve an error reduction up to one order of magnitude. Our results
have already enabled the setting of the scale for the $N_{\rm f}=2+1$ CLS
ensembles [1] and are thus an essential ingredient for the recent $\alpha_s$
determination by the ALPHA collaboration [2]. In this paper we shortly review
the strategy and present our results for both $N_{\rm f}=2$ a

{'input': 'Give me the summary for the paper 1808.09236.',
 'output': 'The summary for the paper with ID 1808.09236 is: "We determine the non-perturbatively renormalized axial current for O($a$) improved lattice QCD with Wilson quarks. Our strategy is based on the chirally rotated Schr\\"odinger functional and can be generalized to other finite (ratios of) renormalization constants which are traditionally obtained by imposing continuum chiral Ward identities as normalization conditions. Compared to the latter we achieve an error reduction up to one order of magnitude. Our results have already enabled the setting of the scale for the $N_{\rm f}=2+1$ CLS ensembles [1] and are thus an essential ingredient for the recent $\\alpha_s$ determination by the ALPHA collaboration [2]. In this paper we shortly review the strategy and present our results for both $N_{\rm f}=2$ and $N_{\rm f}=3$ lattice QCD, where we match the $\\beta$-values of the CLS gauge configurations. In addition to the axial

## Add Memory to Agents using MongoDB


In [27]:
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory


def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
    return MongoDBChatMessageHistory(
        MONGODB_URI, session_id, database_name=DB_NAME, collection_name="history"
    )

In [28]:
system_message = f"""Answer the following questions as best you can.
You can answer directly if the user is greeting you or similar.
Otherwise, you have access to the following tools:

{render_text_description(tools)}
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_message),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
        # Placeholders fill up a **list** of messages
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)

agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

In [29]:
from langchain_core.runnables.history import RunnableWithMessageHistory

agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
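
Conceptually, the history wrapper loads the messages stored for a session, passes them in as `chat_history`, and appends the new exchange afterwards. The sketch below imitates that flow in plain Python with an in-memory store. `InMemoryHistoryStore`, `invoke_with_history`, and `echo` are hypothetical stand-ins for the MongoDB-backed history and the agent, not LangChain classes.

```python
from collections import defaultdict


class InMemoryHistoryStore:
    """Hypothetical stand-in for MongoDB-backed history, keyed by session ID."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def get(self, session_id):
        return self._sessions[session_id]


def invoke_with_history(run, store, user_input, session_id):
    """Load history, call `run`, then append the new human/AI turn."""
    history = store.get(session_id)
    output = run({"input": user_input, "chat_history": list(history)})
    history.append(("human", user_input))
    history.append(("ai", output))
    return output


def echo(payload):
    # Stand-in for the agent: reports how much history it was given
    return f"seen {len(payload['chat_history'])} earlier messages"


store = InMemoryHistoryStore()
first = invoke_with_history(echo, store, "Get me papers on Prompt Compression.", "my-session")
second = invoke_with_history(echo, store, "What is the title of the first paper?", "my-session")
other = invoke_with_history(echo, store, "Hello", "another-session")
print(first, second, other, sep="\n")
```

Reusing the same `session_id` is what lets a follow-up question refer to the earlier answer; a different ID starts from an empty history.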

In [30]:
agent_with_chat_history.invoke(
    {"input": "Get me papers on Prompt Compression."},
    config={"configurable": {"session_id": "my-session"}},
)



> Entering new AgentExecutor chain...

Invoking: `get_paper_metadata_from_arxiv` with `{'topic': 'Prompt Compression'}`


[{'Published': '2024-03-30', 'Title': 'PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression', 'Authors': 'Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang', 'Summary': "Large language models (LLMs) have shown exceptional abilities for multiple\ndifferent natural language processing tasks. While prompting is a crucial tool\nfor LLM inference, we observe that there is a significant cost associated with\nexceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead\nto sub-standard results in terms of readability and interpretability of the\ncompressed prompt, with a detrimental impact on prompt utility. To address\nthis, we propose PROMPT-SAW: Prompt compresSion via Relation AWare graphs, an\neffective strategy for prompt compressi

{'input': 'Get me papers on Prompt Compression.',
 'chat_history': [],
 'output': 'Here are some papers on Prompt Compression:\n\n1. "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression" by Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang. Published on 2024-03-30.\n\nSummary: This paper proposes PROMPT-SAW, an effective strategy for prompt compression over task-agnostic and task-aware prompts. PROMPT-SAW uses the prompt\'s textual information to build a graph, later extracts key information elements in the graph to come up with the compressed prompt. The authors also propose GSM8K-AUG, an extended version of the existing GSM8k benchmark for task-agnostic prompts in order to provide a comprehensive evaluation platform.\n\n2. "Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression" by Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu. Published on 2024-

In [31]:
agent_with_chat_history.invoke(
    {"input": "What is the title of the first paper you found?"},
    config={"configurable": {"session_id": "my-session"}},
)



> Entering new AgentExecutor chain...
The title of the first paper is "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression".

> Finished chain.


{'input': 'What is the title of the first paper you found?',
 'chat_history': [HumanMessage(content='Get me papers on Prompt Compression.'),
  AIMessage(content='Here are some papers on Prompt Compression:\n\n1. "PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression" by Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang. Published on 2024-03-30.\n\nSummary: This paper proposes PROMPT-SAW, an effective strategy for prompt compression over task-agnostic and task-aware prompts. PROMPT-SAW uses the prompt\'s textual information to build a graph, later extracts key information elements in the graph to come up with the compressed prompt. The authors also propose GSM8K-AUG, an extended version of the existing GSM8k benchmark for task-agnostic prompts in order to provide a comprehensive evaluation platform.\n\n2. "Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression" by Xinze Li, Zheng