<div>
<img src="https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg" width="300">
</div>

# Demo 8.4: Agentic RAG

In this demo, we'll explore how an Agentic RAG (Retrieval-Augmented Generation) system can be created using LangChain and Gemini.

A RAG system fetches relevant documents from an external knowledge base (for this demo, a PDF document) and passes them to a language model. Agentic RAG adds reasoning and multi-step decision-making, allowing the model to act like an agent and refine its responses. The agent may also execute additional tools (for this demo, web search using Serper) before producing a final answer.

[LangChain](https://github.com/langchain-ai/langchain) is an open-source framework used to develop applications based on large language models. In LangChain, a chain strings together a series of components that are executed in order (like a pipeline).
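
The pipeline idea can be sketched in plain Python. The `Step` class below is hypothetical and is not LangChain's implementation; it only illustrates how overloading the `|` operator lets components be strung together and executed in order.

```python
class Step:
    """A hypothetical pipeline component wrapping a single function."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Composing two steps yields a new step that runs them in order.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Three toy components: clean the text, split it into words, count the words.
strip_text = Step(str.strip)
split_words = Step(str.split)
count_words = Step(len)

pipeline = strip_text | split_words | count_words
word_count = pipeline.invoke("  retrieval augmented generation  ")
print(word_count)
```

Each component only needs to accept the previous component's output, which is what makes it easy to swap in a different loader, model or parser.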

Serper is a Google Search API that allows real-time web searches using data from Google. It’s often used in AI agent systems, chatbots, and RAG pipelines for real-time access to information from the web.

Components of the system:
* A PDF loader that reads in the text of the PDF document

* A simple vector store that holds a vector representation of each chunk of the document; FAISS (Facebook AI Similarity Search) is a library that enables this

* A retriever to fetch relevant documents based on a user query

* A language model (Gemini) to process the query

* Vector search and web search tools to be used by the agent

* A LangChain agent that analyses the query, plans its retrieval and reasoning steps, and then combines the retrieved information to answer the question, either by searching the document or by performing a web search

INSTRUCTIONS:

- Run the cells
- Observe and understand the results

In [None]:
!pip install --upgrade langchain langchain_community langchain-google-genai pypdf faiss-cpu
# worked with langchain==0.3.25, langchain_community==0.3.23, langchain-google-genai==2.1.4, pypdf==5.4.0, faiss-cpu==1.11.0



In [None]:
from google import genai

# for pdf files:
from langchain_community.document_loaders import PyPDFLoader
# for text files:
# from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.base import Embeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS # vector stores live in langchain_community in recent releases
from langchain_community.utilities import GoogleSerperAPIWrapper # for performing web search
from langchain.agents import Tool # class for tool creation
from langchain.tools import tool # enables tool wrapping from a function
from langchain.tools.render import render_text_description_and_args
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import AgentExecutor
from langchain.schema.runnable import RunnablePassthrough
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.agents.format_scratchpad import format_log_to_str

1. Have the file gemini_key.txt (used in Lab 3.2.3) in the same folder as this notebook.

2. Sign up for a free Serper account at https://serper.dev/
   
3. Create a new API key at https://serper.dev/api-key. Copy-paste it into an empty text file called 'serper_key.txt'.

Running the next cell will then read in the keys and assign them to variables.

In [None]:
geminifilename = r'gemini_key.txt' # this file contains a single line containing your Gemini API key only
try:
    with open(geminifilename, 'r') as f:
        geminikey = f.read().strip()
except FileNotFoundError:
    print("'%s' file not found" % geminifilename)

serperfilename = r'serper_key.txt' # this file contains a single line containing your Serper API key only
try:
    with open(serperfilename, 'r') as f:
        serperkey = f.read().strip()
except FileNotFoundError:
    print("'%s' file not found" % serperfilename)

## Reading a pdf file into a vector store
Have "Course overview slide.pdf" from Module 0 in the same folder as this notebook.

In [None]:
pdf_files = ["Course overview slide.pdf"] #using one document for now, can be extended if using a paid Gemini account

documents = []

for pdf_file in pdf_files:
    loader = PyPDFLoader(pdf_file) # can use TextLoader if reading text files
    docs = loader.load()
    documents.extend(docs)

In [None]:
documents

[Document(metadata={'producer': 'PyPDF', 'creator': 'Google', 'creationdate': '', 'title': 'Course overview slide.pptx', 'source': 'Course overview slide.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='© 2022 Institute of Data1 1\nFoundational skills\n• Programming for Data Science (Python)\n• Maths and Statistics for Data Science\nCore Data Science and AI skills\n• Exploratory Data Analysis (EDA) and data wrangling\n• Data visualisation\n• Database access\n• Application Programming Interfaces (APIs)\n• Supervised learning (Regression and Classification)\n• Unsupervised learning (Clustering and \nDimensionality reduction)\n• Decision Trees and Ensemble Methods\n• Natural Language Processing (NLP) and LLMs\n• Artificial Intelligence & Deep learning\n• Generative AI \n• Cloud computing & Machine learning deployment\n• Data science industry practices\nApplying Data Science in \nindustry\n•Applying data science on different data \nstructures and domains\n•Defining a da

In [None]:
len(documents)

1

The text is then split into chunks, each of which will be converted into a vector. Having a non-zero value of `chunk_overlap` can help preserve context at chunk boundaries.

In [None]:
# split documents

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
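
To see what `chunk_overlap` does, here is a minimal sketch that is independent of LangChain. The function `split_with_overlap` is hypothetical and uses fixed-size character windows, whereas `RecursiveCharacterTextSplitter` first tries to split on separators such as newlines, so its chunk boundaries will generally differ.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks sharing `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "Supervised learning covers regression and classification."
no_overlap = split_with_overlap(sample, chunk_size=20, chunk_overlap=0)
with_overlap = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)

for chunk in with_overlap:
    print(repr(chunk))
```

With an overlap of 5, the last five characters of each chunk are repeated at the start of the next one, so a phrase cut at a boundary still appears whole in at least one chunk when it is short enough.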


In [None]:
documents

[Document(metadata={'producer': 'PyPDF', 'creator': 'Google', 'creationdate': '', 'title': 'Course overview slide.pptx', 'source': 'Course overview slide.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='© 2022 Institute of Data1 1\nFoundational skills\n• Programming for Data Science (Python)\n• Maths and Statistics for Data Science\nCore Data Science and AI skills\n• Exploratory Data Analysis (EDA) and data wrangling\n• Data visualisation\n• Database access\n• Application Programming Interfaces (APIs)\n• Supervised learning (Regression and Classification)\n• Unsupervised learning (Clustering and \nDimensionality reduction)\n• Decision Trees and Ensemble Methods\n• Natural Language Processing (NLP) and LLMs'),
 Document(metadata={'producer': 'PyPDF', 'creator': 'Google', 'creationdate': '', 'title': 'Course overview slide.pptx', 'source': 'Course overview slide.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='• Artificial Intelligence & Deep learn

In [None]:
len(documents) # this should be at most 5, due to rate limits on Gemini's embedding API

3

Next, these chunks will be embedded as numerical vectors. For this we will use a Google Gemini embedding model. The following cell shows how a sample sentence is encoded.

In [None]:
client = genai.Client(api_key=geminikey)

result = client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents="Here is a sample sentence to be encoded as a vector.")

print(result.embeddings)

[ContentEmbedding(values=[-0.021102482, 0.017272899, 0.026589524, -0.06426501, 0.0013448372, -0.011096641, 0.024506137, -0.0048867515, 0.04925546, 0.0152401235, -0.019089045, -0.02765348, 0.012880644, 0.017667916, 0.1239164, 0.009019512, 0.0077062338, 0.019099379, 0.0063787703, -0.02574759, -0.0067446292, -0.0012019322, 0.0075493082, -0.015887082, 0.0024043093, -0.03204151, 0.03321089, 0.018802723, 0.026272679, -0.014342566, -0.0038752356, 0.016667696, 0.032453395, -0.0022884882, -0.010365401, -0.0019003312, 0.036442734, 0.014119206, 0.021265734, 0.013461521, -0.008164951, -0.021147953, -0.0024661967, -0.036623493, -0.00024103023, 0.028712375, 0.0058087907, -0.018305311, 0.01805461, 0.03316174, -0.0035838757, -0.00450171, -0.012465403, -0.14968765, 0.0055459505, -0.0082166325, -0.0038568343, -0.015740115, 0.033392254, 0.00076819985, -0.003069475, 0.039658662, -0.02786392, -0.012367489, 0.022291884, -0.014413216, 0.012418294, -0.011510544, -0.0074004363, -0.019570341, -0.0067159273, 0.0

In [None]:
result.embeddings[0].values

[-0.021102482,
 0.017272899,
 0.026589524,
 -0.06426501,
 0.0013448372,
 -0.011096641,
 0.024506137,
 -0.0048867515,
 0.04925546,
 0.0152401235,
 -0.019089045,
 -0.02765348,
 0.012880644,
 0.017667916,
 0.1239164,
 0.009019512,
 0.0077062338,
 0.019099379,
 0.0063787703,
 -0.02574759,
 -0.0067446292,
 -0.0012019322,
 0.0075493082,
 -0.015887082,
 0.0024043093,
 -0.03204151,
 0.03321089,
 0.018802723,
 0.026272679,
 -0.014342566,
 -0.0038752356,
 0.016667696,
 0.032453395,
 -0.0022884882,
 -0.010365401,
 -0.0019003312,
 0.036442734,
 0.014119206,
 0.021265734,
 0.013461521,
 -0.008164951,
 -0.021147953,
 -0.0024661967,
 -0.036623493,
 -0.00024103023,
 0.028712375,
 0.0058087907,
 -0.018305311,
 0.01805461,
 0.03316174,
 -0.0035838757,
 -0.00450171,
 -0.012465403,
 -0.14968765,
 0.0055459505,
 -0.0082166325,
 -0.0038568343,
 -0.015740115,
 0.033392254,
 0.00076819985,
 -0.003069475,
 0.039658662,
 -0.02786392,
 -0.012367489,
 0.022291884,
 -0.014413216,
 0.012418294,
 -0.011510544,
 -0.0

In [None]:
len(result.embeddings[0].values)

3072

The following code defines a custom embedding class that integrates Google Gemini's embedding model into LangChain's framework, which expects an Embeddings interface with `embed_documents` and `embed_query` methods.

In [None]:
class GeminiEmbeddings(Embeddings):
    def __init__(self, api_key, model_name="models/gemini-embedding-exp-03-07"):
        self.client = genai.Client(api_key=api_key)
        self.model_name = model_name

    def embed_documents(self, texts): # embeds multiple documents as vectors
        return [self.client.models.embed_content(model=self.model_name, contents=t).embeddings[0].values for t in texts]

    def embed_query(self, text): # embeds a single query string as a vector
        return self.client.models.embed_content(model=self.model_name, contents=text).embeddings[0].values

embeddings = GeminiEmbeddings(api_key=geminikey)
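
When experimenting without an API key or while waiting out rate limits, a stand-in with the same two methods can be handy for checking that the rest of the pipeline is wired correctly. The class below is a hypothetical sketch: `HashEmbeddings` is not part of LangChain or Gemini, its vectors carry no semantic meaning, and to plug it into LangChain it would probably need to subclass `Embeddings` like the class above.

```python
import hashlib
import math


class HashEmbeddings:
    """Hypothetical offline stand-in exposing embed_documents and embed_query."""

    def __init__(self, dimensions=8):
        # SHA-256 yields 32 bytes, so this sketch supports at most 32 dimensions.
        self.dimensions = dimensions

    def _embed(self, text):
        # Derive deterministic bytes from the text, then scale them to [-1, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = [(byte - 127.5) / 127.5 for byte in digest[: self.dimensions]]
        norm = math.sqrt(sum(x * x for x in raw))
        return [x / norm for x in raw]  # normalise to unit length

    def embed_documents(self, texts):
        # One vector per input text, in the same order.
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        # A query is embedded the same way as a document.
        return self._embed(text)


stub = HashEmbeddings(dimensions=8)
vectors = stub.embed_documents(["first chunk", "second chunk"])
print(len(vectors), len(vectors[0]))
```

Because the same text always maps to the same vector, results are reproducible, but similarity scores from this stub say nothing about meaning.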

In [None]:
# create vectorstore from the documents and embeddings model - do not run this cell more than once per minute
vectorstore = FAISS.from_documents(documents, embeddings)

# optionally saving the vectorstore
# vectorstore.save_local("vectorstore.db")

Now that we have a vectorstore, we can query it to retrieve relevant chunks based on input text. The following retrieves the two chunks in the store that are most similar to the question about foundational skills.

In [None]:
results = vectorstore.similarity_search_with_score("what are the foundational skills listed in the document?", k=2)

for doc, score in results:
    print("Text:", doc.page_content)
    print("Score:", score) # L2 score - the smaller, the more similar
    print("\n")

Text: © 2022 Institute of Data1 1
Foundational skills
• Programming for Data Science (Python)
• Maths and Statistics for Data Science
Core Data Science and AI skills
• Exploratory Data Analysis (EDA) and data wrangling
• Data visualisation
• Database access
• Application Programming Interfaces (APIs)
• Supervised learning (Regression and Classification)
• Unsupervised learning (Clustering and 
Dimensionality reduction)
• Decision Trees and Ensemble Methods
• Natural Language Processing (NLP) and LLMs
Score: 0.61617523


Text: Consulting, Questioning, Critical Thinking, Problem Solving, Documenting, Presenting, Story-telling
Learning how to learn effectively framework
Minimal Viable Learning (MVL), Multimodal learning, Learn-Create cycle
Score: 0.73513484
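
The idea behind this search can be sketched as a brute-force comparison: compute the distance from the query vector to every stored vector and keep the closest ones. The helpers `l2_distance` and `top_k` and the two-dimensional vectors are invented for illustration; FAISS uses optimised index structures rather than this loop.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vector, stored, k=2):
    """Return the k (text, distance) pairs closest to the query, nearest first."""
    scored = [(text, l2_distance(query_vector, vector)) for text, vector in stored]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-dimensional "embeddings", invented for illustration only.
stored = [
    ("foundational skills", [1.0, 0.0]),
    ("soft skills", [0.0, 1.0]),
    ("core skills", [0.8, 0.3]),
]
nearest = top_k([0.9, 0.1], stored, k=2)

for text, distance in nearest:
    print(text, round(distance, 3))
```

As in the output above, a smaller distance means the stored text is closer to the query.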




In [None]:
# load llm
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", api_key = geminikey)

# create retriever
retriever = vectorstore.as_retriever()

Using `llm` and `retriever`, we define a `vector_search` function that answers a query from the contents of the vector store:

In [None]:
# define vector search

def vector_search(query):
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    return qa_chain.invoke(query)
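
Conceptually, a retrieval QA step fetches the most relevant chunks and places them in the prompt ahead of the question. The sketch below shows only that prompt-building step; `build_prompt` and the wording of the template are hypothetical and are not what `RetrievalQA` uses internally.

```python
def build_prompt(question, chunks):
    """Place retrieved chunks before the question so the model answers from them."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Chunks as they might come back from the retriever (shortened for illustration).
retrieved = [
    "Foundational skills\n• Programming for Data Science (Python)",
    "• Maths and Statistics for Data Science",
]
prompt_text = build_prompt("What are the foundational skills?", retrieved)
print(prompt_text)
```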

Example usage:

In [None]:
vector_search("what are the foundational skills listed in the document?")

{'query': 'what are the foundational skills listed in the document?',
 'result': 'The foundational skills listed in the document are:\n* Programming for Data Science (Python)\n* Maths and Statistics for Data Science'}

Asking the same question via a web search gives an irrelevant response, because the search engine has no access to the document:

In [None]:
web_search = GoogleSerperAPIWrapper(serper_api_key = serperkey)

In [None]:
web_search.run("what are the foundational skills listed in the document?")

'Mastery of foundational literacy skills such as phonics and phonological awareness, including phonemic awareness, supports long-term educational achievement. Missing: document? | Show results with:document?. Reading: Foundational Skills · History/Social Studies · Science & Technical Subjects · Writing. Decoding: learning to read words by recognizing and stringing together sounds Encoding: using letter sounds to write Automaticity: decoding that is done so ... The five skills they identified as critical were phonological awareness, phonics, fluency, vocabulary, and comprehension. Missing: document? | Show results with:document?. The table below shows what is included in foundational skills by grade. Please note that there are many very important parts of literacy ... Missing: document? | Show results with:document?. Development, Effective Expression, Content Knowledge, and, as discussed in this paper, Foundational Skills. In addition, it is fundamental to achieving one of ... It helps 

## Setting up the agent

With the `vector_search` function and the `web_search` wrapper in place, each is wrapped as a tool for the agent.

In [None]:
# here tool (from langchain.tools) enables tool wrapping from a defined function

@tool # this is shorthand for vector_search_tool = tool(vector_search_tool)
def vector_search_tool(query):
    """Tool for searching the vector store."""
    return vector_search(query)

@tool # this is shorthand for web_search_tool = tool(web_search_tool)
def web_search_tool(query):
    """Tool for performing web search."""
    return web_search.run(query)
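
The decorator shorthand mentioned in the comments can be seen in isolation with a toy example. `SimpleTool` and `simple_tool` are hypothetical names, and this is a simplified sketch rather than how LangChain's `tool` is implemented; it shows a decorator replacing a function with an object that keeps the function's name and docstring.

```python
class SimpleTool:
    """Hypothetical minimal tool: a callable plus a name and a description."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.description = (func.__doc__ or "").strip()

    def run(self, query):
        return self.func(query)


def simple_tool(func):
    """Decorator that wraps a plain function in a SimpleTool."""
    return SimpleTool(func)


@simple_tool  # same as: echo_tool = simple_tool(echo_tool)
def echo_tool(query):
    """Tool that returns the query unchanged."""
    return query


print(echo_tool.name, "-", echo_tool.description)
print(echo_tool.run("hello"))
```

The docstring matters because it becomes the description the agent reads when deciding which tool to call.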


In [None]:
# define tools for the agent - here Tool is a class for tool creation from langchain.agents

tools = [Tool(name="VectorStoreSearch", func=vector_search_tool,
            description="Searches a vector store for relevant documents based on a query."),
         Tool(name="WebSearch", func=web_search_tool,
            description="Performs a web search for real-time information.")]

Next, a system prompt for the agent is defined, including the format of the desired response.

In [None]:
system_prompt = """Respond as accurately as possible. You have access to the following tools: {tools}
Always try the \"VectorStoreSearch\" tool first. Use \"WebSearch\" only if the vector store does not contain the required information.
Please provide an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or {tool_names}
Provide only ONE action at a time.
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
"""

In [None]:
human_prompt = """{input}
{agent_notes}"""
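
Note the two kinds of braces in the prompt strings above. Assuming the prompt template follows the conventions of Python's `str.format` (which is my understanding of its default behaviour), single braces such as `{tool_names}` mark variables to be filled in, while doubled braces `{{` and `}}` produce literal braces for the JSON example. A plain-Python sketch of the same convention:

```python
# A shortened template in the style of the system prompt above.
template = """Valid "action" values: "Final Answer" or {tool_names}
{{
  "action": "Final Answer"
}}"""

# Single braces are substituted; doubled braces collapse to literal braces.
rendered = template.format(tool_names="VectorStoreSearch, WebSearch")
print(rendered)
```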

Then we create a prompt template comprising the above.

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", human_prompt),
    ]
)

The prompt is then partially filled in with information about the tools the agent can use:

In [None]:

prompt = prompt.partial(
    tools = render_text_description_and_args(list(tools)), # a string description of all tools for use in an agent prompt
    tool_names = ", ".join([t.name for t in tools]),
)


The following creates the RAG chain:

In [None]:
# creating the RAG chain
chain = (
    RunnablePassthrough.assign(
        agent_notes=lambda x: format_log_to_str(x["intermediate_steps"]),
    )
    | prompt
    | llm
    | JSONAgentOutputParser()  # JSON used here so that the output could possibly be used by other agents
)

`RunnablePassthrough.assign` passes the input dictionary through unchanged while adding a new key to it. Here the key "agent_notes" is assigned the list of agent actions and observations so far ("intermediate_steps"), formatted as a single string.
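
The formatting of intermediate steps into a string can be sketched as follows. This is a simplified stand-in for `format_log_to_str`: the function `format_steps` is hypothetical, and each step is assumed to be a pair of plain strings (the agent's logged text and the tool's observation) rather than LangChain's own action objects.

```python
def format_steps(intermediate_steps):
    """Turn (agent_log, observation) pairs into a single scratchpad string."""
    notes = ""
    for agent_log, observation in intermediate_steps:
        notes += agent_log
        notes += f"\nObservation: {observation}\nThought: "
    return notes


# One completed step: what the agent asked for and what the tool returned.
steps = [
    ('{"action": "VectorStoreSearch", "action_input": "core skills"}',
     "EDA, data visualisation, database access"),
]
agent_notes = format_steps(steps)
print(agent_notes)
```

On the first call there are no steps yet, so the notes are empty and the model sees only the question. On later calls the notes remind the model of what it has already tried.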

In [None]:
prompt

ChatPromptTemplate(input_variables=['agent_notes', 'input'], input_types={}, partial_variables={'tools': "VectorStoreSearch(tool_input: 'str', callbacks: 'Callbacks' = None) -> 'str' - Searches a vector store for relevant documents based on a query., args: {'tool_input': {'type': 'string'}}\nWebSearch(tool_input: 'str', callbacks: 'Callbacks' = None) -> 'str' - Performs a web search for real-time information., args: {'tool_input': {'type': 'string'}}", 'tool_names': 'VectorStoreSearch, WebSearch'}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['tool_names', 'tools'], input_types={}, partial_variables={}, template='Respond as accurately as possible. You have access to the following tools: {tools}\nAlways try the "VectorStoreSearch" tool first. Use "WebSearch" only if the vector store does not contain the required information.\nPlease provide an action key (tool name) and an action_input key (tool input).\nValid "action" values: "Final Answer" or {tool_name

Finally, an `AgentExecutor` object is created from the chain and tools defined above.

In [None]:
agent_executor = AgentExecutor(
    agent=chain,
    tools=tools,
    handle_parsing_errors=True,
    verbose=True
)
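
The loop that an agent executor runs can be sketched in plain Python: ask the model for the next action, run the chosen tool, record the observation, and repeat until the model returns a final answer. Everything named below (`run_agent`, `scripted_plan`, `toy_tools`) is hypothetical, and the scripted function stands in for the LLM so the example runs without any API key.

```python
import json


def run_agent(question, plan, tools, max_iterations=5):
    """Call `plan` repeatedly, running the chosen tool until a final answer appears."""
    steps = []  # (action, observation) pairs gathered so far
    for _ in range(max_iterations):
        action = json.loads(plan(question, steps))
        if action["action"] == "Final Answer":
            return action["action_input"]
        observation = tools[action["action"]](action["action_input"])
        steps.append((action, observation))
    return "Stopped: iteration limit reached"


def scripted_plan(question, steps):
    """Stand-in for the LLM: try the document first, then the web, then answer."""
    if len(steps) == 0:
        return json.dumps({"action": "VectorStoreSearch", "action_input": question})
    if len(steps) == 1 and steps[0][1] is None:
        return json.dumps({"action": "WebSearch", "action_input": question})
    return json.dumps({"action": "Final Answer", "action_input": steps[-1][1]})


toy_tools = {
    "VectorStoreSearch": lambda q: None,  # pretend the document has no answer
    "WebSearch": lambda q: "answer from the web",
}
answer = run_agent("What is coding?", scripted_plan, toy_tools)
print(answer)
```

The iteration limit is a safeguard against a model that never produces a final answer.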

## Invoking the agent

Note the tasks executed by the agent as it is given the following question to answer:

In [None]:
agent_executor.invoke({"input": "What are the core data science and AI skills from the document?"})



> Entering new AgentExecutor chain...
I need to see the document to answer your question. I will start by searching for it.
```json
{
  "action": "VectorStoreSearch",
  "action_input": {
    "tool_input": "core data science and AI skills"
  }
}
```[0m[36;1m[1;3m{'query': 'core data science and AI skills', 'result': 'According to the provided text, core Data Science and AI skills include:\n\n*   Exploratory Data Analysis (EDA) and data wrangling\n*   Data visualisation\n*   Database access\n*   Application Programming Interfaces (APIs)\n*   Supervised learning (Regression and Classification)\n*   Unsupervised learning (Clustering and Dimensionality reduction)\n*   Decision Trees and Ensemble Methods\n*   Natural Language Processing (NLP) and LLMs\n*   Artificial Intelligence & Deep learning\n*   Generative AI\n*   Cloud computing & Machine learning deployment\n*   Data science industry practices'}[0m[32;1m[1;3m```json
{
  "action": "Final Answer",
  "action_

{'input': 'What are the core data science and AI skills from the document?',
 'output': 'The core data science and AI skills, according to the document, include:\n\n*   Exploratory Data Analysis (EDA) and data wrangling\n*   Data visualisation\n*   Database access\n*   Application Programming Interfaces (APIs)\n*   Supervised learning (Regression and Classification)\n*   Unsupervised learning (Clustering and Dimensionality reduction)\n*   Decision Trees and Ensemble Methods\n*   Natural Language Processing (NLP) and LLMs\n*   Artificial Intelligence & Deep learning\n*   Generative AI\n*   Cloud computing & Machine learning deployment\n*   Data science industry practices'}

Next we ask a question that is not answered in the document.

In [None]:
agent_executor.invoke({"input": "What is the difference between programming and coding?"})



> Entering new AgentExecutor chain...
```json
{
  "action": "VectorStoreSearch",
  "action_input": "difference between programming and coding"
}
```[0m[36;1m[1;3m{'query': 'difference between programming and coding', 'result': 'The provided context does not explicitly define the difference between programming and coding.'}[0m[32;1m[1;3m```json
{
  "action": "WebSearch",
  "action_input": "difference between programming and coding"
}
```[0m[33;1m[1;3mCoding means when someone else tells you what to do. Programming means you do what you want. The development you do what you want to do even if ... Coding is a computer programming language that helps to communicate with a computer. Computers do not understand human languages. Coding allows ... In this article, I'll explain the basic differences between coding and programming and how they work collaboratively to develop apps and sites. The primary difference between coding vs. programming is the level of resp

{'input': 'What is the difference between programming and coding?',
 'output': 'While the terms are often used interchangeably, coding is generally considered a subset of programming. Coding refers to the act of writing code in a specific language to instruct a computer to perform a task. Programming, on the other hand, is a broader process that involves planning, designing, developing, testing, and maintaining software. Programming encompasses problem-solving, algorithm design, and data structure implementation, while coding is primarily focused on translating those designs into executable code.'}

Feel free to ask additional questions!

## Summary

An agentic RAG system is created here using LangChain. It takes a PDF document as context, and the agent falls back to web search if it cannot answer a question from the document. In practice, a document this short could simply be included in the prompt to an LLM, but the same approach applies readily to cases where the additional context is vast.



---

© 2025 Institute of Data