We will explore the following strategies:

- Better Prompting
- Agentic RAG with Tools


#### Install the dependencies

In [78]:
# ! pip install langchain-community tiktoken langchain-openai langchainhub chromadb langchain langgraph tavily-python langchain-groq sentence-transformers beautifulsoup4

In [79]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool

#### Enter GROQ API Key

In [80]:
import os

os.environ["GROQ_API_KEY"] = "your_api_key"

#### Load Connection to LLM


In [81]:
llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

#### Download data

In [82]:
# Blog posts on LLM agents, prompt engineering, and adversarial attacks on LLMs

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
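
As a rough illustration of what `chunk_size` and `chunk_overlap` control, the sketch below slides a fixed window over a list of tokens. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter works recursively on separators and counts tokens with tiktoken). `split_into_chunks` is a hypothetical helper, and whitespace-separated words stand in for tokens.

```python
def split_into_chunks(tokens, chunk_size=250, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    Hypothetical helper for illustration only; consecutive windows
    share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace-separated words stand in for tokens (an assumption of this sketch)
words = "a b c d e f g h i j".split()
print(split_into_chunks(words, chunk_size=4, chunk_overlap=0))
print(split_into_chunks(words, chunk_size=4, chunk_overlap=2))
```

With `chunk_overlap=0`, as in the cell above, neighbouring chunks share no tokens, so a sentence that straddles a boundary is cut in two. A small overlap trades some index size for that continuity.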

#### Use a Sentence-Transformer model

In [83]:
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

#### Create Vector Index

In [84]:
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embedding_model
)
retriever = vectorstore.as_retriever()
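
Conceptually, the retriever embeds the question and returns the chunks whose vectors are closest to it. The sketch below shows that idea with plain Python lists. It is a toy under stated assumptions: `cosine_similarity` and `top_k` are hypothetical helpers, cosine similarity is chosen for illustration (the distance metric Chroma actually uses may differ), and `k=4` is an assumed default.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Three toy 2-d "embeddings"; the query points along the first axis
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```

Note that nearest-neighbour search always returns something, even when no chunk is relevant to the question. That is why the prompt has to tell the model what to do with unhelpful context.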

## Better Prompting for Consistent Results

#### Problematic RAG Prompt

In [85]:
prompt = """You are an assistant for question-answering tasks.
            Give an answer to the following question with the context provided

            Question:
            {question}

            Context:
            {context}

            Answer:
         """

prompt_template = ChatPromptTemplate.from_template(prompt)

In [86]:
# Post-processing: join the retrieved chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

#### The answer to the question below is not in the indexed documents, yet the LLM still answers from its pretraining knowledge

In [88]:
question = "When was the research paper: Attention Is All You Need was pubilshed?"
docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
docs = format_docs(docs)

# Chain
rag_chain = prompt_template | llm | StrOutputParser()

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

The research paper "Attention Is All You Need" was published in 2017. It was written by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. The paper introduced the Transformer model, which relies entirely on self-attention mechanisms to process input sequences. Although the provided context does not mention this specific paper, it is a well-known publication in the field of natural language processing and machine learning.


#### Better RAG Prompt

In [89]:
prompt = """You are an assistant for question-answering tasks.
            Use the following pieces of retrieved context to answer the question.
            If no context is present or if you don't know the answer, just say that you don't know.
            Do not make up the answer unless it is there in the provided context.

            Question:
            {question}

            Context:
            {context}

            Answer:
         """

prompt_template = ChatPromptTemplate.from_template(prompt)

#### With the updated prompt, the LLM declines to answer instead of falling back on its pretraining knowledge, which is the behavior we want here

In [90]:
question = "When was the research paper: Attention Is All You Need was pubilshed?"
docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
docs = format_docs(docs)

# Chain
rag_chain = prompt_template | llm | StrOutputParser()

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

I don't know. The provided context does not mention the research paper "Attention Is All You Need" or its publication date.
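
Once the model reliably says "I don't know", that refusal can be detected in code and used as a signal to try another source. The sketch below is one simple way to do that, assuming the refusal is phrased roughly as in the output above. `looks_like_refusal` and `REFUSAL_MARKERS` are hypothetical names, and plain substring matching is a heuristic that other phrasings will slip past.

```python
# Phrases treated as a refusal; an assumption based on the output shown above
REFUSAL_MARKERS = ("i don't know", "i do not know")


def looks_like_refusal(answer: str) -> bool:
    """Heuristically decide whether the model declined to answer."""
    # Lowercase and normalize curly apostrophes before matching
    normalized = answer.lower().replace("\u2019", "'")
    return any(marker in normalized for marker in REFUSAL_MARKERS)


grounded = "I don't know. The provided context does not mention the paper."
ungrounded = 'The research paper "Attention Is All You Need" was published in 2017.'
print(looks_like_refusal(grounded), looks_like_refusal(ungrounded))
```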


## Agentic RAG with Tools

#### Enter Tavily Search Tool API Key


In [91]:
os.environ['TAVILY_API_KEY'] = "your_api_key"

#### Setup Search Tool

In [92]:
@tool
def search_web(query: str) -> list:
    """Search the web for a query."""
    tavily_tool = TavilySearchResults(max_results=3,
                                      search_depth='advanced',
                                      max_tokens=10000)
    results = tavily_tool.invoke(query)
    return [doc['content'] for doc in results]
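
The list comprehension above assumes every result is a dict with a `content` key. A slightly more defensive version is sketched below, assuming the search tool may occasionally hand back something else (for instance an error string when the request fails). `extract_contents` is a hypothetical helper, not part of the Tavily integration.

```python
def extract_contents(results):
    """Return the non-empty 'content' strings from a list of search results.

    Anything that is not a list of dicts yields an empty list, so the
    caller can treat a failed search the same way as "nothing found".
    """
    if not isinstance(results, list):
        return []
    return [
        item["content"]
        for item in results
        if isinstance(item, dict) and item.get("content")
    ]


print(extract_contents([{"content": "published in 2017"}, {"url": "no content here"}]))
print(extract_contents("HTTPError('401 Unauthorized')"))
```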

In [93]:
question

'When was the research paper: Attention Is All You Need was pubilshed?'

In [94]:
search_web(question)

['An illustration of main components of the transformer model from the paper "Attention Is All You Need" [1] is a 2017 landmark [2] [3] research paper in machine learning authored by eight scientists working at Google.The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al. [4] It is considered a',
 'published in 2017 an influential research paper titled "Attention Is All You Need" at the Neural Information Processing Systems (NeurIPS) conference that introduced the Transformer architecture, a novel Neural Network (NN) model for Natural Language Processing (NLP) tasks. With the Transformer, researchers created a simple network architecture based only on attention mechanisms, without recurrence and convolutions. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key." [3] To ca

#### Bind Tools to LLM

In [95]:
tools = [search_web]

llm_with_tools = llm.bind_tools(tools, tool_choice="auto")  # "auto" lets the model decide whether to call a tool

#### Better RAG Prompt with Tool Calling

In [96]:
prompt = """You are an assistant for question-answering tasks.
            Use the following pieces of retrieved context to answer the question.
            If no context is present or if you don't know the answer,
            check and see if you can use the tools available to you to get the answer.

            Question:
            {question}

            Context:
            {context}

            Answer:
         """

prompt_template = ChatPromptTemplate.from_template(prompt)

In [97]:
question = "When was the research paper: Attention Is All You Need was pubilshed?"
docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
docs = format_docs(docs)

rag_chain = prompt_template | llm_with_tools

response = rag_chain.invoke({"context": docs, "question": question})
print(response)


content='<function=search_web{"query": "publication date of research paper Attention Is All You Need"}</function>' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 1102, 'total_tokens': 1125, 'completion_time': 0.083636364, 'prompt_time': 0.058822957, 'queue_time': 0.261708122, 'total_time': 0.142459321}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None} id='run-7d39daac-a13a-42b4-b278-ba21e8ce9b5c-0' usage_metadata={'input_tokens': 1102, 'output_tokens': 23, 'total_tokens': 1125}
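
In the printed response above, the tool request appears as plain text inside `content` (`<function=search_web{...}</function>`) and no `tool_calls` entry is shown, so code that only inspects `response.tool_calls` would miss it. Below is a minimal sketch of one way to recover such a call, assuming the text always follows that exact `<function=NAME{JSON}</function>` shape. `parse_inline_tool_call` is a hypothetical helper and not part of LangChain or Groq.

```python
import json
import re

# Matches <function=NAME{...JSON...}</function>, the shape seen in the output above
_INLINE_CALL = re.compile(r"<function=(\w+)\s*(\{.*?\})\s*</function>", re.DOTALL)


def parse_inline_tool_call(content):
    """Return {'name': ..., 'args': ...} if content embeds a tool call, else None."""
    match = _INLINE_CALL.search(content)
    if match is None:
        return None
    name, raw_args = match.groups()
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return None  # malformed arguments: treat as no tool call
    return {"name": name, "args": args}


leaked = ('<function=search_web{"query": "publication date of research paper '
          'Attention Is All You Need"}</function>')
print(parse_inline_tool_call(leaked))
print(parse_inline_tool_call("A plain answer with no tool call."))
```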


In [98]:
question = "What are the types of agent memory?"
docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
docs = format_docs(docs)

rag_chain = prompt_template | llm_with_tools

response = rag_chain.invoke({"context": docs, "question": question})
print(response)

content='The provided context does not explicitly mention the types of agent memory. However, it does mention "memory" as a component of generative agents, which combines LLM with memory, planning, and reflection mechanisms. Additionally, it mentions a "Memory stream" which is a long-term memory module that records a comprehensive list of agents\' experience in natural language.\n\nThe context also discusses human brain memory, mentioning the following types:\n\n1. Sensory Memory: \n   - Iconic memory (visual)\n   - Echoic memory (auditory)\n   - Haptic memory (touch)\n\nSince the question asks about the types of agent memory and the provided context does not give a clear answer, I will not provide any additional information.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 146, 'prompt_tokens': 1181, 'total_tokens': 1327, 'completion_time': 0.530909091, 'prompt_time': 0.117476252, 'queue_time': 0.024661195999999996, 'total_time': 0.648385343}, 'model_name'

#### Simple Agentic RAG with Tool Calls

In [99]:
def agentic_rag(question, context):
    """Answer from the retrieved context, falling back to web search via a tool call."""
    tool_call_map = {'search_web': search_web}
    # Use the context passed in rather than the global `docs`
    response = rag_chain.invoke({"context": context, "question": question})

    # Non-empty content means the model answered from the retrieved context
    if response.content:
        print('Answer is in retrieved context')
        answer = response.content

    # Empty content plus a tool call means the model wants to search the web
    elif response.tool_calls:
        print('Answer not in context, trying to use tools')
        tool_call = response.tool_calls[0]
        selected_tool = tool_call_map[tool_call["name"].lower()]
        print(f"Calling tool: {tool_call['name']}")
        tool_output = selected_tool.invoke(tool_call["args"])
        # Re-run the chain with the web results as the new context
        context = '\n\n'.join(tool_output)
        response = rag_chain.invoke({'context': context, 'question': question})
        answer = response.content

    # The model returned neither content nor a tool call
    else:
        answer = 'No answer found'

    print(answer)

In [100]:
question = "What are the types of agent memory?"
docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
docs = format_docs(docs)

agentic_rag(question, docs)

Answer is in retrieved context
The provided context does not explicitly mention the types of agent memory. However, it does mention "memory" as a component of generative agents, which includes a "memory stream" that is a long-term memory module. Additionally, the context provides information on human memory types, but it is unclear if these types apply to agent memory. To provide a more accurate answer, I would need to search for information on agent memory types. 

<function=search_web{"query": "types of agent memory"}</function>


In [101]:
question = "When was the research paper: Attention Is All You Need was pubilshed?"

docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
docs = format_docs(docs)

agentic_rag(question, docs)

Answer not in context, trying to use tools
Calling tool: search_web
The research paper "Attention Is All You Need" was published in 2017, as indicated by the arXiv identifier "arXiv:1706.03762" which corresponds to June 2017, and it was presented at the 31st International Conference on Neural Information Processing Systems (NIPS'17).
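
The control flow of `agentic_rag` can be exercised without API keys by passing the chain and tools in as arguments and substituting stubs. The sketch below does that under stated assumptions: `answer_with_fallback`, `FakeChain`, and `fake_search` are hypothetical names, the stub only mimics the `.content` and `.tool_calls` attributes used above, and tools are plain callables rather than LangChain tool objects. It also checks `tool_calls` before `content`, which differs from the order used in `agentic_rag`.

```python
from types import SimpleNamespace


def answer_with_fallback(question, context, chain, tools):
    """Answer from context; if the model requests a tool, run it and retry once.

    chain: object with .invoke(dict) returning something with
           .content (str) and .tool_calls (list of dicts).
    tools: mapping of tool name -> callable(**args) returning a list of strings.
    """
    response = chain.invoke({"context": context, "question": question})

    if response.tool_calls:
        call = response.tool_calls[0]
        tool_fn = tools.get(call["name"].lower())
        if tool_fn is None:
            return "No answer found"  # the model asked for a tool we do not have
        new_context = "\n\n".join(tool_fn(**call["args"]))
        response = chain.invoke({"context": new_context, "question": question})

    return response.content or "No answer found"


class FakeChain:
    """Stub for `prompt_template | llm_with_tools`: requests a search when context is empty."""

    def invoke(self, inputs):
        if not inputs["context"]:
            call = {"name": "search_web", "args": {"query": inputs["question"]}}
            return SimpleNamespace(content="", tool_calls=[call])
        return SimpleNamespace(content=f"Answer based on: {inputs['context']}", tool_calls=[])


def fake_search(query):
    """Stub web search returning one canned result."""
    return [f"result for {query}"]


print(answer_with_fallback("q", "", FakeChain(), {"search_web": fake_search}))
print(answer_with_fallback("q", "ctx", FakeChain(), {"search_web": fake_search}))
```

Passing the chain in as a parameter also removes the dependency on the global `rag_chain`, which makes the function easier to reuse with a different prompt or model.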
