# Building an Agent with Wikipedia, Arxiv, and RAG Tools using LangChain and GROQ LLM

In this notebook, we build a multi-tool AI agent using LangChain and a LLaMA3 model served by GROQ. The agent answers questions by choosing among three tools: Wikipedia search, Arxiv search, and a custom retriever (RAG) built over the LangSmith documentation.

### Create Tools 

#### Tool for Wikipedia

In [1]:
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Create wrapper for Wikipedia API
wiki_api_wrapper = WikipediaAPIWrapper(
    top_k_results=1,
    doc_content_chars_max=250
)
# Create function to query Wikipedia
wiki_func = WikipediaQueryRun(api_wrapper=wiki_api_wrapper)
wiki_func.name

'wikipedia'

#### Tool for Arxiv

In [2]:
from langchain_community.tools import ArxivQueryRun
from langchain_community.utilities import ArxivAPIWrapper

# Create wrapper for Arxiv API
arxiv_api_wrapper = ArxivAPIWrapper(
    top_k_results=1,
    doc_content_chars_max=250
)
# Create function to query Arxiv
arxiv_func = ArxivQueryRun(api_wrapper=arxiv_api_wrapper)
print(arxiv_func.name)

arxiv
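Both wrappers above are configured with `top_k_results=1` and `doc_content_chars_max=250`, so each tool call returns at most one result, trimmed to roughly 250 characters. The sketch below imitates that behaviour in plain Python without calling either API. The function `format_results` and the sample records are hypothetical stand-ins for illustration, not part of LangChain.

```python
# Hypothetical stand-in for what an API wrapper does with its search hits.
# Illustration only: it does not call Wikipedia or Arxiv.

def format_results(results, top_k_results=1, doc_content_chars_max=250):
    """Keep the first `top_k_results` hits and trim the joined text."""
    kept = results[:top_k_results]
    joined = "\n\n".join(
        f"Page: {r['title']}\nSummary: {r['summary']}" for r in kept
    )
    return joined[:doc_content_chars_max]


# Made-up sample data for the demonstration.
sample_results = [
    {"title": "Example A", "summary": "x" * 400},
    {"title": "Example B", "summary": "y" * 400},
]

text = format_results(sample_results)
print(len(text))             # capped by doc_content_chars_max
print("Example B" in text)   # the second hit is dropped by top_k_results=1
```

Small limits like these keep the text handed to the LLM short, at the cost of cutting summaries mid-sentence.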


#### Custom Tool (RAG Tool)

In [3]:
from langchain_community.document_loaders import WebBaseLoader
# Create a web loader for a specific URL
web_loader = WebBaseLoader(
    web_path="https://docs.smith.langchain.com/"
)
docs = web_loader.load()
# Print the loaded documents
for doc in docs:
    print(doc.page_content[:50])  
    print(doc.metadata)  

USER_AGENT environment variable not set, consider setting it to identify your requests.







Get started with LangSmith | 🦜️🛠️ LangSmith


{'source': 'https://docs.smith.langchain.com/', 'title': 'Get started with LangSmith | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for building production-grade LLM applications.', 'language': 'en'}


In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a text splitter to split documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
# Split the loaded documents into smaller chunks
split_docs = text_splitter.split_documents(docs)

# Print the first chunk of the first document
print(split_docs[0].page_content[:50])

Get started with LangSmith | 🦜️🛠️ LangSmith
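To make `chunk_size=1000` and `chunk_overlap=200` concrete, here is a minimal fixed-window chunker in plain Python. It is a simplification and an assumption for illustration: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. The name `chunk_with_overlap` is hypothetical.

```python
# Simplified fixed-window chunker (illustration only).
# Unlike RecursiveCharacterTextSplitter, it never looks for separators.

def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into windows of `chunk_size` sharing `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 2500 characters of dummy text.
sample_text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(sample_text)

print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next one.
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.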


In [5]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Create an embeddings model using Hugging Face
embeddings_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create a vector store using FAISS with the embeddings model
vector_store = FAISS.from_documents(
    split_docs,
    embeddings_model
)
# Print the number of documents in the vector store
print(len(vector_store.index_to_docstore_id))

# Create a retriever from the vector store
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2}
)

# Display the retriever; its repr shows the tags and search_kwargs
retriever



6


VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001BCA2A31D50>, search_kwargs={'k': 2})
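With `search_type="similarity"` and `k=2`, the retriever returns the two chunks whose embeddings are closest to the query embedding. The toy example below shows the idea with hand-made three-dimensional vectors and cosine similarity. This is an assumption for illustration: the real embeddings come from the Hugging Face model, and the FAISS index may rank by a different distance metric. The names `toy_index` and `top_k` are hypothetical.

```python
import math

# Toy nearest-neighbour search over hand-made vectors (illustration only).

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the `k` vectors most similar to `query_vec`."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Made-up "embeddings" for four chunks.
toy_index = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.9, 0.1, 0.0],
    "chunk-2": [0.0, 1.0, 0.0],
    "chunk-3": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

print(top_k(query, toy_index, k=2))
```

Raising `k` gives the LLM more context per question but also more text to read, so a small value such as 2 is a reasonable starting point for a six-chunk store.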

In [None]:
from langchain.tools.retriever import create_retriever_tool

# Create a retriever tool using the retriever
retriever_tool = create_retriever_tool(
    retriever=retriever,
    name="LangSmithSearch",
    description="A tool to retrieve from the LangSmith knowledge base."
)

# Display the retriever tool
retriever_tool

Tool(name='LangSmithSearch', description='A tool to retrieve from the LangSmith knowledge base.', args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x000001BCCEF9E3B0>, retriever=VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001BCA2A31D50>, search_kwargs={'k': 2}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_separator='\n\n', response_format='content'), coroutine=functools.partial(<function _aget_relevant_documents at 0x000001BCCF310820>, retriever=VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001BCA2A31D50>, search_kwargs={'k': 2}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, temp

#### Combine the Tools

In [7]:
tools = [wiki_func, arxiv_func, retriever_tool]
# Print each tool's name, description, and argument schema
for tool in tools:
    print(tool.name)
    print(tool.description)
    print(tool.args_schema)
    print() 

wikipedia
A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
<class 'langchain_community.tools.wikipedia.tool.WikipediaQueryInput'>

arxiv
A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering, and Economics from scientific articles on arxiv.org. Input should be a search query.
<class 'langchain_community.tools.arxiv.tool.ArxivInput'>

LangSmithSearch
A tool to retrieve from the LangSmith knowledge base.
<class 'langchain_core.tools.retriever.RetrieverInput'>
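The names printed above matter because a tool-calling model selects a tool by name and supplies arguments that fit its schema. The following conceptual sketch shows that dispatch step with stub functions that return canned strings. It is not how LangChain implements it internally; `tool_registry`, `run_tool_call`, and the `fake_*` functions are hypothetical.

```python
# Conceptual sketch of name-based tool dispatch (illustration only).
# The stubs return canned strings instead of calling any real service.

def fake_wikipedia(query):
    return f"[wikipedia stub] summary for {query!r}"


def fake_arxiv(query):
    return f"[arxiv stub] paper matching {query!r}"


def fake_langsmith_search(query):
    return f"[LangSmithSearch stub] docs chunk about {query!r}"


# Keys mirror the tool names printed by the notebook cell.
tool_registry = {
    "wikipedia": fake_wikipedia,
    "arxiv": fake_arxiv,
    "LangSmithSearch": fake_langsmith_search,
}


def run_tool_call(tool_call):
    """Look up the requested tool by name and run it with its query."""
    name = tool_call["name"]
    if name not in tool_registry:
        return f"Unknown tool: {name}"
    return tool_registry[name](tool_call["args"]["query"])


# A tool call shaped like the ones shown in an agent trace.
result = run_tool_call({"name": "LangSmithSearch", "args": {"query": "tracing"}})
print(result)
```

Because the model only sees each tool's name and description, a specific description makes it more likely that the right tool is chosen.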



#### Create model

In [8]:
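# Create the Groq chat model using the Groq API
import os

from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load GROQ_API_KEY from a local .env file, if one exists
load_dotenv()

groq_api_key = os.getenv("GROQ_API_KEY")
if groq_api_key is None:
    # Fail early with a clear message instead of an obscure error later
    raise RuntimeError("GROQ_API_KEY is not set; add it to your environment or .env file")

llm = ChatGroq(
    model_name="Llama3-8b-8192",
    api_key=groq_api_key
)
llm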

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001BCC1BC1F90>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001BCC1BC3400>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

#### Prompt Template for Agent

In [17]:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI agent that can search Wikipedia and Arxiv."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])
prompt


ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], input_types={'agent_scratchpad': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageChunk')], typing.Annotated[langchain_core.messages.system.SystemMessageChunk, Tag(tag='SystemMessageChunk')], typing.Ann

#### Create the Agent

In [19]:
from langchain.agents import AgentExecutor, create_tool_calling_agent

agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
agent

RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: message_formatter(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], input_types={'agent_scratchpad': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageCh

In [20]:
# Wrap with executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor

AgentExecutor(verbose=True, agent=RunnableMultiActionAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: message_formatter(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], input_types={'agent_scratchpad': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[l

In [21]:
agent_executor.invoke(
    {
        "input": "What is LangSmith? What are some of its features? How can I use it to test my LLMs?"
    }
)   



> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'LangSmith'}`


Page: Will Smith
Summary: Willard Carroll  Smith II (born September 25, 1968) is an American actor, rapper, and film producer. He has received multiple accolades, including an Academy Award, a Golden Globe Award, a BAFTA Award, and four Grammy Awards

Invoking: `arxiv` with `{'query': 'langsmith'}`


Published: 2020-10-09
Title: Langsmith: An Interactive Academic Text Revision System
Authors: Takumi Ito, Tatsuki Kuribayashi, Masatoshi Hidaka, Jun Suzuki, Kentaro Inui
Summary: Despite the current diversity and inclusion initiatives in the academic

It seems that you've called the tool to retrieve information about LangSmith, and it yielded a research paper about the same topic.

Based on the information provided, it appears that LangSmith is an interactive academic text revision system. It's likely a tool des

{'input': 'What is LangSmith? What are some of its features? How can I use it to test my LLMs?',
 'output': "It seems that you've called the tool to retrieve information about LangSmith, and it yielded a research paper about the same topic.\n\nBased on the information provided, it appears that LangSmith is an interactive academic text revision system. It's likely a tool designed to help improve the quality and accuracy of academic papers.\n\nAs you're interested in testing your LLMs, LangSmith might be a useful resource for you. You could try using it to generate text, revise text, or even use it as a dataset to train your models.\n\nSince you've already called the tools to gather information about LangSmith, I'll respond directly to your question. LangSmith seems like a fascinating tool, and I'm sure it has many features that could be useful for testing and training your LLMs. If you have any specific questions about how to use LangSmith or how to integrate it with your LLMs, feel fre
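In the run above, the agent queries `wikipedia` and `arxiv` but never calls `LangSmithSearch`, so its answer describes an unrelated Arxiv paper. One plausible contributing factor (an assumption, not verified here) is that the system message names only Wikipedia and Arxiv. The sketch below builds a system message that lists every tool and nudges the model toward the retriever for LangSmith questions. `ToolInfo` and `build_system_message` are hypothetical helpers, and the shortened descriptions are placeholders.

```python
from collections import namedtuple

# Lightweight stand-in for a tool's name and description (illustration only).
ToolInfo = namedtuple("ToolInfo", ["name", "description"])

tool_infos = [
    ToolInfo("wikipedia", "General questions about people, places, and facts."),
    ToolInfo("arxiv", "Questions answered by scientific articles on arxiv.org."),
    ToolInfo("LangSmithSearch", "A tool to retrieve from the LangSmith knowledge base."),
]


def build_system_message(tools):
    """List every available tool so none is hidden from the model."""
    lines = ["You are a helpful AI agent. You can use these tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    lines.append("For any question about LangSmith, use LangSmithSearch first.")
    return "\n".join(lines)


system_message = build_system_message(tool_infos)
print(system_message)
```

The resulting string could replace the text of the `("system", ...)` entry in the prompt template. Whether this changes the agent's tool choice depends on the model, so it is worth re-running the same question and reading the verbose trace.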