<a href="https://colab.research.google.com/github/laplezeda/agents/blob/main/Agentic_RAG_using_LlamaIndex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Agentic RAG in LlamaIndex


## Part 0: Loading libraries

In [None]:
!pip install llama-index llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q

In [None]:
import os
os.environ["OPENAI_API_KEY"] = ""
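
The empty string above is a placeholder for your own key. As a minimal sketch of one way to avoid pasting the key into the notebook, the helper below prompts for it only when the environment variable is missing. `ensure_api_key` and its `prompt_fn` parameter are hypothetical names introduced here for illustration; they are not part of LlamaIndex or the OpenAI client.

```python
import os
from getpass import getpass


def ensure_api_key(name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key stored in the environment, prompting only if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        # prompt_fn defaults to getpass, which hides the typed value
        key = prompt_fn(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = key
    return key


# In the notebook, call this instead of assigning an empty string:
# ensure_api_key()
```

Passing a different `prompt_fn` makes the helper easy to exercise without an interactive prompt.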

Next, log in to Hugging Face to use its serverless Inference API.

In [None]:
from huggingface_hub import login

login()

## Part 1: Simple RAG Systems

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Load document
reader = SimpleDirectoryReader(input_files=["state.pdf"])
documents = reader.load_data()
print(f"Loaded {len(documents)} document(s).")

# Split into chunks
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Set up LLM and embedding model
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Create vector index
vector_index = VectorStoreIndex(nodes)

# Create query engine
query_engine = vector_index.as_query_engine()

Loaded 26 document(s).


#### 1.1 Inspecting the vector store

In [None]:
# Access the vector store data directly
vector_store = vector_index.vector_store

# Get embedding dictionary and node dictionary
embedding_dict = vector_store.data.embedding_dict
node_dict = vector_store.data.text_id_to_ref_doc_id

print(f"Number of embeddings: {len(embedding_dict)}")
print(f"Number of node references: {len(node_dict)}")

# Show first few embeddings
for i, (node_id, embedding) in enumerate(list(embedding_dict.items())[:3]):
    print(f"\n--- Embedding {i} ---")
    print(f"Node ID: {node_id}")
    print(f"Embedding dimension: {len(embedding)}")
    print(f"First 10 values: {embedding[:10]}")

Number of embeddings: 27
Number of node references: 27

--- Embedding 0 ---
Node ID: a997ad8c-500d-4827-b2b4-d5d068a60b0f
Embedding dimension: 1536
First 10 values: [-0.014133124612271786, -0.01401845458894968, -0.013387767598032951, -0.021386027336120605, 0.006475293077528477, 0.013058089651167393, -0.02142902836203575, 0.01010532770305872, -0.03012964315712452, -0.024381790310144424]

--- Embedding 1 ---
Node ID: 945aeda2-1ab4-4d5f-9a23-3bcaeae2f6ca
Embedding dimension: 1536
First 10 values: [-0.014858749695122242, -0.02170269563794136, -0.003049110062420368, -0.019040387123823166, -0.007666334044188261, 0.02011367306113243, -0.028435129672288895, 0.010872256010770798, -0.02122877538204193, -0.02844906970858574]

--- Embedding 2 ---
Node ID: 5b92bddc-68d8-4c70-9c07-2784b57de187
Embedding dimension: 1536
First 10 values: [0.0016929084667935967, -0.027508273720741272, -0.004747966304421425, -0.03599747642874718, -0.007251191884279251, 0.01682875119149685, -0.01629817672073841, 0.026596

#### 1.2 Asking questions to the RAG system

In [None]:
# Query the document
response = query_engine.query("Who is Lareina Yee?")
print(response)

Lareina Yee is a Senior partner and McKinsey Global Institute director.


#### 1.3 Checking if the responses make sense

In [None]:
print(len(response.source_nodes))

2


In [None]:
# Print out each source node
print("Source nodes:")
print("=" * 50)

for i, node in enumerate(response.source_nodes):
    print(f"Node {i+1}:")
    print(f"Score: {node.score}")
    print(f"Text: {node.text}")
    print(f"Metadata: {node.metadata}")
    print("-" * 30)

Source nodes:
Node 1:
Score: 0.7230227134327099
Text: The state of AI  
March 2025
Alex Singla  
Alexander Sukharevsky  
Lareina Yee  
Michael Chui  
Bryce Hall
How organizations are rewiring to capture value
Metadata: {'page_label': '1', 'file_name': 'state.pdf', 'file_path': 'state.pdf', 'file_type': 'application/pdf', 'file_size': 5564174, 'creation_date': '2025-07-01', 'last_modified_date': '2025-07-01'}
------------------------------
Node 2:
Score: 0.7062869479601049
Text: McKinsey commentary
Lareina Yee
Senior partner and McKinsey Global Institute director
Although we remain in the early stages of gen AI, we’re beginning to get a glimpse into 
the ways the technology is affecting the workforce. A common fear about the technology 
is that it will be a job killer, as organizations offload tasks historically done by employees to 
increasingly powerful AI platforms. But our survey suggests that this is not necessarily the 
case. In fact, a plurality of respondents anticipate no immed
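
The `Score` values above are similarity scores between the query embedding and each node embedding. The sketch below illustrates the idea on toy vectors, assuming cosine similarity and a top-k of 2 (which matches the two source nodes returned above). The helper names and the three-dimensional embeddings are made up for illustration; this is not the library's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, embedding_dict, k=2):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [
        (node_id, cosine_similarity(query_embedding, emb))
        for node_id, emb in embedding_dict.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-in for the embedding_dict inspected in section 1.1
toy_embeddings = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.6, 0.8, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], toy_embeddings, k=2)
for node_id, score in results:
    print(node_id, round(score, 3))
```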

In [None]:
# Ask more questions
response2 = query_engine.query("What are the main findings about AI adoption?")
print(response2)

The main findings about AI adoption include an increase in reported AI use among organizations, with 78 percent of respondents stating their organizations use AI in at least one business function. The IT and marketing and sales functions are the most common areas where AI is utilized. Additionally, organizations are now using AI in more business functions compared to previous surveys, with most respondents reporting AI use in more than one business function on average.


In [None]:
response3 = query_engine.query("What does the document say about AI risks?")
print(response3)

The document mentions that organizations are actively working to mitigate risks related to inaccuracy, cybersecurity, intellectual property infringement, and privacy when using generative AI. It also notes that larger organizations are reported to be addressing more risks compared to other organizations, particularly focusing on managing potential cybersecurity and privacy risks.


## Part 2: Agentic RAG

Let's now upgrade the previously defined RAG system into an Agentic RAG system.

In [None]:
!pip install --upgrade datasets
!pip install --upgrade huggingface-hub



#### 2.1: Loading the data

In [None]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["state.pdf"])
documents = reader.load_data()

print(f"Loaded {len(documents)} document(s).")


Loaded 26 document(s).


#### 2.2: Breaking the data into chunks

In [None]:
from llama_index.core.node_parser import SentenceSplitter

# chunk_size is measured in tokens; 1024 is a reasonable default
splitter = SentenceSplitter(chunk_size=1024)
# Create nodes from documents
nodes = splitter.get_nodes_from_documents(documents)
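
To make the chunking step more concrete, here is a deliberately simplified sketch of sentence-aware splitting. It is an illustration under stated assumptions, not `SentenceSplitter`'s algorithm: it counts words rather than tokens, uses a naive regex to find sentence boundaries, and does not overlap adjacent chunks. `split_into_chunks` is a hypothetical helper.

```python
import re


def split_into_chunks(text, chunk_size=50):
    """Greedily pack whole sentences into chunks of roughly `chunk_size` words.

    A single sentence longer than `chunk_size` is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five. Six seven eight nine. Ten."
for chunk in split_into_chunks(sample, chunk_size=5):
    print(repr(chunk))
```

Keeping sentences intact means a retrieved chunk is less likely to start or end mid-thought, which is the main reason to prefer a sentence-aware splitter over fixed-width slicing.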

#### 2.3 Define the LLM and the Embedding Model

In [None]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# LLM model
Settings.llm = OpenAI(model="gpt-3.5-turbo")
# embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

#### 2.4 Create the vector index and summary index

In [None]:
from llama_index.core import SummaryIndex, VectorStoreIndex

# summary index
summary_index = SummaryIndex(nodes)
# vector store index
vector_index = VectorStoreIndex(nodes)

#### 2.4 Create the vector query engine and summary query engine

In [None]:
# summary query engine
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

# vector query engine
vector_query_engine = vector_index.as_query_engine()
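
The two engines differ in how much of the document they read: the vector engine answers from the few most similar chunks, while the summary engine with `response_mode="tree_summarize"` combines all chunks hierarchically. The sketch below shows that bottom-up idea with a stand-in summarizer. It is a toy under stated assumptions: `tree_summarize`, `fan_in`, and `join_summarizer` are hypothetical names, and the real engine calls the LLM and packs chunks by context-window size rather than using a fixed fan-in.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Merge groups of `fan_in` texts level by level until one summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # Summarize each group of neighbours to build the next level of the tree
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def join_summarizer(texts):
    """Stand-in for an LLM call: it only shows how the texts were grouped."""
    return "(" + " + ".join(texts) + ")"


print(tree_summarize(["a", "b", "c"], join_summarizer))
```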

#### 2.5 Convert the summary and vector query engines into tools

In [None]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the State of AI paper."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the State of AI paper."
    ),
)

#### 2.6 Define a router query engine that selects between the two tools

In [None]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
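
Conceptually, the router reads each tool's description, picks one tool for the incoming query, and forwards the query to it. The sketch below mimics that flow without any LLM: a keyword-overlap function stands in for `LLMSingleSelector`. `ToyTool`, `keyword_selector`, and `route` are hypothetical names for illustration only, and keyword overlap is a swapped-in heuristic, not how the real selector decides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: List[ToyTool]) -> int:
    """Pick the tool whose description shares the most words with the query."""
    query_words = set(query.lower().split())
    scores = [
        len(query_words & set(tool.description.lower().split()))
        for tool in tools
    ]
    return scores.index(max(scores))  # ties go to the first tool


def route(query, tools, selector=keyword_selector):
    """Select one tool and run the query against it."""
    index = selector(query, tools)
    return index, tools[index].run(query)


tools = [
    ToyTool("Useful for summarization questions", lambda q: "summary"),
    ToyTool("Useful for retrieving specific context", lambda q: "vector"),
]
print(route("Give me a summarization of the report", tools))
print(route("I need specific context about Lareina Yee", tools))
```

The sketch also shows why the tool descriptions matter: they are the only information the selector has when choosing between engines.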

#### 2.7 Test whether the query engine works

In [None]:
response = query_engine.query("Who is Lareina Yee according to the document?")
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the document, which would be necessary to identify who Lareina Yee is..
Lareina Yee is one of the authors listed in the document.


#### 2.8 Convert the query engine into a tool

In [None]:
# Create tool wrapper around router
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="state_of_ai_report_assistant",
    description="Answers questions based on the McKinsey 2025 State of AI report.",
    return_direct=False,
)


#### 2.9 Define system prompt


In [None]:
system_prompt = """
You are a helpful assistant specialized in answering questions using the 'State of AI' March 2025 report by McKinsey.
Your task is to:

1. Use the Summary Tool when the user asks for high-level insights, trends, survey findings, or general understanding
   (e.g., "What are the top AI adoption practices?" or "Summarize the report's key findings").

2. Use the Vector Tool when the user is asking for specific statistics, organizational practices, exhibit-based
   evidence, or detailed examples
   (e.g., "What percentage of companies track AI KPIs?" or "What are the risks companies are mitigating?").

Refer only to the content of the report. If the user's query is outside this context, politely decline or redirect.

Check your answer multiple times to make sure it is actually relevant and mentioned in the document.

Examples of summary queries:
- "How are companies restructuring to adopt GenAI?"
- "What does the report say about workforce reskilling?"

Examples of specific/vector queries:
- "What percentage of companies have a roadmap for GenAI adoption?"
- "Who is responsible for AI governance in large firms?"

Always explain clearly, referencing exact statistics, frameworks, or concepts when relevant. Be concise and insightful.
"""

#### 2.10 Define the agent

In [None]:
from llama_index.core.agent.workflow import AgentWorkflow

query_engine_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[query_engine_tool],
    llm=Settings.llm,
    system_prompt=system_prompt,
)
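
The agent's `run` method is awaited later in this notebook, which works because Jupyter and Colab allow top-level `await`. In a plain Python script you would need to start an event loop yourself. The sketch below shows that pattern with `asyncio.run`; `fake_agent_run` is a hypothetical stand-in for `query_engine_agent.run` so the example runs without an API key.

```python
import asyncio


async def fake_agent_run(question: str) -> str:
    """Hypothetical stand-in for `query_engine_agent.run`; it only echoes the question."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"(answer to: {question})"


async def main() -> str:
    return await fake_agent_run("Who is Lareina Yee?")


if __name__ == "__main__":
    # asyncio.run creates the event loop that a notebook provides implicitly
    print(asyncio.run(main()))
```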


#### 2.11 Setup agent observability using Arize Phoenix

In [None]:
!pip install llama-index-callbacks-arize-phoenix arize-phoenix

In [None]:
import os

from llama_index.core import set_global_handler

# Paste your Phoenix API key here
PHOENIX_API_KEY = ""
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)


#### 2.12 Run the agent and analyze responses

In [None]:
# In Jupyter/Colab, you can use await directly
question = "Who is Yareina Lee according to the document? Where is she mentioned in the document and in what context?"
response = await query_engine_agent.run(question)
print(response)

Selecting query engine 1: The question 'Yareina Lee' is more likely to be related to retrieving specific context from the State of AI paper rather than summarization questions..
Lareina Yee is a director of the McKinsey Global Institute and a senior partner in the Bay Area office, as mentioned in the document.


#### 2.13 Equip the agent with multiple tools

In [None]:
!pip install llama-index-tools-arxiv llama-index-tools-wikipedia duckduckgo-search
!pip install llama-index-tools-brave-search



#### 2.14 Add the new tools (arXiv, Brave Search)


In [None]:
# Import additional tools
from llama_index.tools.arxiv import ArxivToolSpec
from llama_index.tools.brave_search import BraveSearchToolSpec


In [None]:
# Create arXiv tools
arxiv_tool = ArxivToolSpec()
arxiv_tools = arxiv_tool.to_tool_list()

# Create Brave Search tools (paste your Brave Search API key)
brave_search_tool_spec = BraveSearchToolSpec(api_key="")
brave_search_tools = brave_search_tool_spec.to_tool_list()


In [None]:

# Build the tool list for the enhanced agent.
# Use extend(), not append(), so each tool is added individually rather than as a nested list.
enhanced_tools = [query_engine_tool]  # Start with McKinsey report tool
enhanced_tools.extend(brave_search_tools)  # Add all brave search tools
enhanced_tools.extend(arxiv_tools)  # Add all arxiv tools
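
The choice of `extend` matters because `to_tool_list()` returns a list. The short sketch below shows the difference using placeholder strings in place of the real tool objects; the names are illustrative only.

```python
report_tool = "report_tool"       # placeholder for query_engine_tool
search_tools = ["brave_search"]   # placeholder for brave_search_tools
paper_tools = ["arxiv_query"]     # placeholder for arxiv_tools

# append() adds the whole list as a single element, producing a nested list
nested = [report_tool]
nested.append(search_tools)

# extend() adds each element of the list individually, keeping the list flat
flat = [report_tool]
flat.extend(search_tools)
flat.extend(paper_tools)

print(nested)
print(flat)
```

A nested list would hand the agent a list object where it expects a tool, so the flat form is the one to pass as `tools_or_functions`.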

#### 2.15 Define the enhanced agent with all tools


In [None]:
# Create new enhanced agent
enhanced_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=enhanced_tools,
    llm=Settings.llm,
    system_prompt="""You are an AI research assistant with access to:
    1. The McKinsey 2025 State of AI report
    2. Web search capabilities
    3. ArXiv research paper search

    Use these tools to provide comprehensive, well-researched answers. When discussing AI trends,
    combine insights from the McKinsey report with recent research and web findings.""",
)


#### 2.16 Battle-test the agent with multiple questions


In [None]:
# Test questions that can benefit from multiple tools

# Question 1: Combine McKinsey insights with recent research
question1 = """According to the McKinsey report, what are the main organizational changes companies are making for AI adoption?
Can you also search for recent research papers on AI governance and organizational transformation to provide additional context?"""

print("Question 1: Organizational changes and governance")
print("=" * 50)
response1 = await enhanced_agent.run(question1)
print(response1)
print("\n" + "="*80 + "\n")

Question 1: Organizational changes and governance
Selecting query engine 0: Summarization questions related to the State of AI paper would likely cover the main organizational changes companies are making for AI adoption..
### Recent Research Papers on AI Governance and Organizational Transformation:

#### AI Governance Research Papers:
1. **[AI And Organizational Change: Dynamics And Management Strategies](https://www.researchgate.net/publication/380929689_AI_And_Organizational_Change_Dynamics_And_Management_Strategies)**
   - This study investigates the dynamics of AI-induced organizational change, focusing on effective change management strategies, employee adaptation, and cultural transformation.

2. **[AI in Organizational Change Management — Case Studies, Best Practices, Ethical Implications, and Future Technological Trajectories](https://medium.com/@adnanmasood/ai-in-organizational-change-management-case-studies-best-practices-ethical-implications-and-179be4ec

In [None]:
# Question 2: Workflow Redesign and Implementation
question2 = """What does the McKinsey report say about workflow redesign for AI implementation?
Search ArXiv for papers on business process automation with AI and find current web articles about workflow transformation."""

print("Question 2: Workflow Redesign")
response2 = await enhanced_agent.run(question2)
print(response2)


Question 2: Workflow Redesign
Selecting query engine 0: Workflow redesign for AI implementation may involve summarizing key points and recommendations from the State of AI paper..
### ArXiv Papers on Business Process Automation with AI:
1. **[D3BA: A Tool for Optimizing Business Processes Using Non-Deterministic Planning](http://arxiv.org/pdf/2001.02619v2):**
   - This paper introduces D3BA, a tool for optimizing business processes using AI planning. It focuses on composing services to automate subtasks within complex business processes.

2. **[Can Artificial Intelligence Transform DevOps?](http://arxiv.org/pdf/2206.00225v1):**
   - Explores the connection between DevOps and AI, highlighting how AI can enhance DevOps processes such as testing, coding, releasing, monitoring, and system improvement.

3. **[Impact of Artificial Intelligence on Businesses](http://arxiv.org/pdf/1905.02092v1):**
   - Discusses the integration of AI in business processes and its impact on r

In [None]:
# Question 3: Risk management and future trends
question3 = """Based on the McKinsey report, what are the key risks organizations are addressing with gen AI?
Can you search the web for recent academic research on AI risk mitigation and compare with the report's findings?"""

print("Question 3: Risk management")
response3 = await enhanced_agent.run(question3)
print(response3)


Question 3: Risk management
Selecting query engine 0: The choice 'Useful for summarization questions related to the State of AI paper' is most relevant as it focuses on summarizing key risks organizations are addressing with gen AI, which aligns with the question asked..
### Recent Academic Research on AI Risk Mitigation:
1. **FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation**:
   - This research introduces the concept of Data Residual Matching to facilitate data generation and mitigate data information vanishing in dataset distillation tasks. The method significantly improves computational efficiency and achieves superior performance across multiple dataset benchmarks.
   - [Read more](http://arxiv.org/pdf/2506.24125v1)

2. **Scaling Human Judgment in Community Notes with LLMs**:
   - The paper proposes an open ecosystem where both humans and LLMs can write notes, with human raters serving as the ultimate evaluator of helpful notes. This appr

In [None]:
# Question that forces usage of all three tools
comprehensive_question = """Who is Lareina Yee in the McKinsey document and what are her views on AI's workforce impact?

After finding information about her from the document, please:
1. Search the web using Brave Search for recent articles, interviews, or news about Lareina Yee and her work on AI
2. Search ArXiv for any research papers she may have authored or co-authored related to AI, workforce transformation, or economic impact
3. Provide a comprehensive profile combining insights from all three sources about her expertise and contributions to AI research"""

print("Question: Comprehensive profile of Lareina Yee")
print("=" * 60)
print("This question should force the agent to use:")
print("1. Query Engine - to find info about Lareina Yee in the McKinsey document")
print("2. Brave Search - to find recent web articles/news about her")
print("3. ArXiv Search - to find any academic papers she's authored")
print("=" * 60)

response = await enhanced_agent.run(comprehensive_question)
print(response)

Question: Comprehensive profile of Lareina Yee
This question should force the agent to use:
1. Query Engine - to find info about Lareina Yee in the McKinsey document
2. Brave Search - to find recent web articles/news about her
3. ArXiv Search - to find any academic papers she's authored
Selecting query engine 1: The question is asking for specific context related to 'Lareina Yee', which would require retrieving specific information from the State of AI paper..
### Lareina Yee Profile:

#### McKinsey Global Institute Director:
Lareina Yee is a Senior Partner and the Director of the McKinsey Global Institute, where she leads research on AI and frontier technologies, advising companies on growth and transformation.

#### Views on AI's Workforce Impact:
Lareina Yee's work focuses on understanding the impact of AI on the workforce, particularly in terms of transformation and economic implications.

#### Recent Web Findings:
1. **[McKinsey Article: Superagency in the Workp