# Agentic RAG in LlamaIndex


## Part 0: Loading libraries

In [2]:
!pip install llama-index llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q

In [3]:
!pip install llama-index-llms-gemini llama-index-embeddings-huggingface google-generativeai sentence-transformers



In [None]:
import os
os.environ["OPENAI_API_KEY"] = ""
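Hard-coding keys in a notebook is easy to leak. As a minimal sketch, assuming the key has already been exported in the shell or loaded from a `.env` file, a small helper can fail early when a key is missing. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Demo with a hypothetical variable name; real keys should come from the shell or a .env file.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

Calling `require_env("OPENAI_API_KEY")` or `require_env("GOOGLE_API_KEY")` at the top of the notebook would surface a missing key before any model call is made.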

Next, log in to Hugging Face to use its serverless Inference API.

In [4]:
from huggingface_hub import login

login()


## Part 1: Simple RAG Systems

In [None]:
import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.gemini import Gemini  # Gemini LLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # local Hugging Face embeddings

# Set your Gemini API key.
# Prefer setting it outside the notebook (e.g., in a .env file or your shell):
#   export GOOGLE_API_KEY="YOUR_API_KEY"
# Setting it inline works for a demo but is less secure:
os.environ["GOOGLE_API_KEY"] = ""

# Load document
reader = SimpleDirectoryReader(input_files=["state_of_AI.pdf"])
documents = reader.load_data()
print(f"Loaded {len(documents)} document(s).")

# Split into chunks
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Set up LLM and embedding model
# LLM: Use Gemini, specifying a model like "models/gemini-pro" or "models/gemini-2.5-flash"
Settings.llm = Gemini(model="models/gemini-2.5-flash")

# Embedding Model: Use HuggingFaceEmbedding for BGE-small-en
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5" # This is the standard BGE-small-en model name
)

# Create vector index
vector_index = VectorStoreIndex(nodes)

# Create query engine
query_engine = vector_index.as_query_engine()

Loaded 26 document(s).




In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Load document (same PDF as in the Gemini cell above)
reader = SimpleDirectoryReader(input_files=["state_of_AI.pdf"])
documents = reader.load_data()
print(f"Loaded {len(documents)} document(s).")

# Split into chunks
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

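# Set up LLM and embedding model (OpenAI alternative to the Gemini/BGE setup above).
# Running this cell overrides those Settings; the 384-dimensional embeddings
# in the recorded outputs below come from the BGE setup.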
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Create vector index
vector_index = VectorStoreIndex(nodes)

# Create query engine
query_engine = vector_index.as_query_engine()

Loaded 26 document(s).


#### 1.1 Inspecting the vector store

In [6]:
# Access the vector store data directly
vector_store = vector_index.vector_store

# Get embedding dictionary and node dictionary
embedding_dict = vector_store.data.embedding_dict
node_dict = vector_store.data.text_id_to_ref_doc_id

print(f"Number of embeddings: {len(embedding_dict)}")
print(f"Number of node references: {len(node_dict)}")

# Show first few embeddings
for i, (node_id, embedding) in enumerate(list(embedding_dict.items())[:3]):
    print(f"\n--- Embedding {i} ---")
    print(f"Node ID: {node_id}")
    print(f"Embedding dimension: {len(embedding)}")
    print(f"First 10 values: {embedding[:10]}")

Number of embeddings: 27
Number of node references: 27

--- Embedding 0 ---
Node ID: 37433a3d-10f0-4e62-b871-442e5522532c
Embedding dimension: 384
First 10 values: [-0.0662464052438736, -0.027124036103487015, 0.03070172294974327, -0.002462257631123066, 0.03332288935780525, 0.030975334346294403, -0.022839121520519257, 0.04124998301267624, 0.0035174640361219645, -0.006849117111414671]

--- Embedding 1 ---
Node ID: 7538d787-0c9b-4df6-b35e-3ebad09db9f9
Embedding dimension: 384
First 10 values: [0.006492024287581444, -0.029223959892988205, 0.005867324769496918, -0.05067090690135956, 0.01823529414832592, 0.043149806559085846, -0.0011607427150011063, 0.017698295414447784, 0.03303244709968567, -0.005703154020011425]

--- Embedding 2 ---
Node ID: 1dcd7c99-c62e-4b80-a1bc-422652d7102a
Embedding dimension: 384
First 10 values: [0.026413721963763237, -0.015026451088488102, 0.013218114152550697, -0.05975547432899475, 0.049674130976200104, -0.005162168759852648, 0.03932492807507515, 0.032075706869363
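To make the role of these vectors concrete, here is a minimal sketch of similarity-based retrieval over an embedding dictionary shaped like the one above (node ID mapped to a list of floats). It uses plain cosine similarity on toy 3-dimensional vectors; the node IDs, the vectors, and `k=2` are assumptions for illustration, and the library's own retriever may differ in its details.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, embedding_dict, k=2):
    """Return the k (node_id, score) pairs most similar to the query, best first."""
    scored = [
        (node_id, cosine_similarity(query_embedding, embedding))
        for node_id, embedding in embedding_dict.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for the real 384-dimensional embeddings.
toy_embeddings = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.0, 1.0, 0.0],
    "node-c": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.1, 0.0], toy_embeddings, k=2)
for node_id, score in results:
    print(f"{node_id}: {score:.3f}")
```

The same `top_k` function could be applied to `embedding_dict` from the cell above once a query has been embedded with the same embedding model.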

#### 1.2 Asking questions to the RAG system

In [7]:
# Query the document
response = query_engine.query("Who is Lareina Yee?")
print(response)

Lareina Yee is one of the individuals associated with "The state of AI" document, published in March 2025.


#### 1.3 Checking if the responses make sense

In [8]:
print(len(response.source_nodes))

2


In [9]:
# Print out each source node
print("Source nodes:")
print("=" * 50)

for i, node in enumerate(response.source_nodes):
    print(f"Node {i+1}:")
    print(f"Score: {node.score}")
    print(f"Text: {node.text}")
    print(f"Metadata: {node.metadata}")
    print("-" * 30)

Source nodes:
Node 1:
Score: 0.47403650633360583
Text: The state of AI  
March 2025
Alex Singla  
Alexander Sukharevsky  
Lareina Yee  
Michael Chui  
Bryce Hall
How organizations are rewiring to capture value
Metadata: {'page_label': '1', 'file_name': 'state_of_AI.pdf', 'file_path': 'state_of_AI.pdf', 'file_type': 'application/pdf', 'file_size': 5564174, 'creation_date': '2025-10-24', 'last_modified_date': '2025-10-24'}
------------------------------
Node 2:
Score: 0.4416760088006249
Text: McKinsey & Company
[Chart values omitted: the rest of this chunk is a column of percentage labels extracted from an exhibit.]
Personal experience with gen AI tools, in 2023, first half of 2024, and second half of 2024,¹ % of respondents
Respondents are much more likely now than in 2023 and
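The second source node is mostly chart labels rather than prose, which is worth detecting automatically when checking whether retrieved context makes sense. The following is a rough heuristic sketch, not part of LlamaIndex: it flags text in which a large share of whitespace-separated tokens are numbers. The function names and the 0.5 threshold are assumptions chosen for illustration.

```python
def numeric_token_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that are plain numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0
    numeric = sum(
        1 for token in tokens
        if token.replace(".", "", 1).replace(",", "").isdigit()
    )
    return numeric / len(tokens)


def looks_like_chart_residue(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat a chunk as chart residue when most of its tokens are numbers."""
    return numeric_token_ratio(text) >= threshold


title_page = "The state of AI March 2025 Lareina Yee"
chart_chunk = "McKinsey & Company 16 14 8 18 40 4 16 26"

print(looks_like_chart_residue(title_page))
print(looks_like_chart_residue(chart_chunk))
```

In the loop above, the same check could be run on `node.text` for each source node to decide which chunks deserve a closer look.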

In [10]:
# Ask more questions
response2 = query_engine.query("What are the main findings about AI adoption?")
print(response2)

Organizations are generally in the early stages of adopting and scaling generative AI (gen AI) solutions, with only a small percentage of executives describing their rollouts as mature. Many have yet to experience organization-wide, bottom-line impact from gen AI use.

Key findings include:
*   There are 12 identified adoption and scaling practices for gen AI, all of which show positive correlations with EBIT impact.
*   Tracking well-defined Key Performance Indicators (KPIs) for gen AI solutions is the practice with the most significant impact on the bottom line. For larger organizations, establishing a clearly defined road map to drive gen AI adoption also has a substantial impact.
*   Overall, less than one-third of organizations are following most of these 12 adoption and scaling practices. For instance, less than one in five organizations are tracking KPIs for gen AI solutions.
*   Larger organizations are more likely to implement these best practices compared to smaller organizat

In [11]:
response3 = query_engine.query("What does the document say about AI risks?")
print(response3)

Organizations are actively working to mitigate risks associated with generative AI. There is an increasing focus on addressing risks such as inaccuracy, intellectual property infringement, and privacy. Many organizations are intensifying their efforts to manage these and other gen-AI-related risks.

Specifically, organizations are more likely than in early 2024 to be actively managing risks related to inaccuracy, cybersecurity, and intellectual property infringement. These three are among the risks most frequently cited as having caused negative consequences for organizations. A survey indicates that 47 percent of organizations have experienced at least one negative consequence from gen AI use, an increase from 44 percent in early 2024.

Other identified risks include regulatory compliance, personal/individual privacy, explainability, workforce/labor displacement, equity and fairness, organizational reputation, national security, physical safety, environmental impact, and political sta

## Part 2: Agentic RAG

Let's now upgrade the previously defined RAG system into an Agentic RAG system.

In [12]:
!pip install --upgrade datasets
!pip install --upgrade huggingface-hub



#### 2.1: Loading the data

In [13]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["state_of_AI.pdf"])
documents = reader.load_data()

print(f"Loaded {len(documents)} document(s).")


Loaded 26 document(s).


#### 2.2: Breaking the data into chunks

In [14]:
from llama_index.core.node_parser import SentenceSplitter

# chunk_size of 1024 is a good default value
splitter = SentenceSplitter(chunk_size=1024)
# Create nodes from documents
nodes = splitter.get_nodes_from_documents(documents)
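To illustrate the idea behind sentence-aware chunking, here is a simplified sketch that packs whole sentences into chunks under a size budget. It is not the `SentenceSplitter` implementation: this version counts characters rather than tokens and applies no overlap between chunks, and the function name `split_into_chunks` is hypothetical.

```python
import re


def split_into_chunks(text: str, max_chars: int = 80) -> list:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Gen AI adoption is rising. Few firms track KPIs. "
    "Larger firms move faster. Risk management is maturing."
)
chunks = split_into_chunks(sample, max_chars=50)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

A sentence longer than the budget is kept whole in this sketch; a production splitter would need a fallback for that case.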

#### 2.3 Define the LLM and the Embedding Model

In [15]:
from llama_index.core import Settings
from llama_index.llms.gemini import Gemini
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# LLM: Use Gemini, specifying a model like "models/gemini-pro" or "models/gemini-2.5-flash"
Settings.llm = Gemini(model="models/gemini-2.5-flash")

# Embedding Model: Use HuggingFaceEmbedding for BGE-small-en
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5" # This is the standard BGE-small-en model name
)

  Settings.llm = Gemini(model="models/gemini-2.5-flash")


#### 2.4 Create the vector index and summary index

In [16]:
from llama_index.core import SummaryIndex, VectorStoreIndex

# summary index
summary_index = SummaryIndex(nodes)
# vector store index
vector_index = VectorStoreIndex(nodes)

#### 2.4 Create the vector query engine and summary query engine

In [17]:
# summary query engine
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

# vector query engine
vector_query_engine = vector_index.as_query_engine()
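The `tree_summarize` response mode can be pictured as summarizing groups of chunks, then summarizing those summaries, until a single answer remains. The sketch below shows that recursive shape with a toy `summarize` function that only brackets its inputs; in the real engine each of those calls would be an LLM request, and the grouping size (`fan_in`) is an assumption made for illustration.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Repeatedly summarize groups of `fan_in` items until one summary remains."""
    level = list(chunks)
    if not level:
        return ""
    while len(level) > 1:
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def toy_summarize(texts):
    """Stand-in for an LLM call: bracket the inputs so the tree shape stays visible."""
    return "(" + " + ".join(texts) + ")"


result = tree_summarize(["A", "B", "C"], toy_summarize, fan_in=2)
print(result)
```

Because the groups within one level are independent, they can be summarized concurrently, which is the kind of work `use_async=True` is meant to speed up.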

#### 2.5 Convert the summary and vector query engines into tools

In [18]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the State of AI paper."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the State of AI paper."
    ),
)

#### 2.6 Define a router query engine over both tools

In [19]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
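Conceptually, a single-selector router shows the query and the tool descriptions to a chooser, gets back one index, and forwards the query to that tool. The sketch below mimics that flow with a keyword-overlap chooser standing in for the LLM; `ToyTool`, `select_tool`, and `keyword_chooser` are hypothetical names, and the real selector relies on an LLM prompt rather than word overlap.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    name: str
    description: str
    run: Callable[[str], str]


def keyword_chooser(query: str, descriptions: List[str]) -> int:
    """Stand-in for the LLM selector: pick the description sharing the most words with the query."""
    query_words = set(query.lower().split())
    overlaps = [len(query_words & set(d.lower().split())) for d in descriptions]
    return overlaps.index(max(overlaps))


def select_tool(query: str, tools: List[ToyTool], choose) -> ToyTool:
    """Ask the chooser for one index and return the matching tool."""
    index = choose(query, [tool.description for tool in tools])
    return tools[index]


tools = [
    ToyTool("summary", "summarization questions about the whole report", lambda q: "summary answer"),
    ToyTool("vector", "retrieving specific facts from the report", lambda q: "specific answer"),
]

picked_summary = select_tool("give me a summarization of the whole report", tools, keyword_chooser)
picked_vector = select_tool("retrieving specific facts on kpi tracking", tools, keyword_chooser)
print(picked_summary.name, picked_vector.name)
```

The takeaway for the real router is that the tool descriptions do the work: the clearer the contrast between them, the more reliable the selection.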

#### 2.7 Test whether the query engine works

In [20]:
response = query_engine.query("Who is Lareina Yee according to the document?")
print(str(response))

Selecting query engine 1: The question 'Who is Lareina Yee according to the document?' asks for a specific piece of information or detail about an individual from the document. This aligns directly with 'retrieving specific context' rather than summarizing the entire paper..
Lareina Yee is one of the authors of "The state of AI" document.


#### 2.8 Convert the query engine into a tool

In [21]:
# Create tool wrapper around router
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="state_of_ai_report_assistant",
    description="Answers questions based on the McKinsey 2025 State of AI report.",
    return_direct=False,
)


#### 2.9 Define system prompt


In [22]:
system_prompt = """
You are a helpful assistant specialized in answering questions using the 'State of AI' March 2025 report by McKinsey.
Your task is to:

1. Use the Summary Tool when the user asks for high-level insights, trends, survey findings, or general understanding
   (e.g., "What are the top AI adoption practices?" or "Summarize the report's key findings").

2. Use the Vector Tool when the user is asking for specific statistics, organizational practices, exhibit-based
   evidence, or detailed examples
   (e.g., "What percentage of companies track AI KPIs?" or "What are the risks companies are mitigating?").

Refer only to the content of the report. If the user's query is outside this context, politely decline or redirect.

Check your answer multiple times to make sure it is actually relevant and mentioned in the document.

Examples of summary queries:
- "How are companies restructuring to adopt GenAI?"
- "What does the report say about workforce reskilling?"

Examples of specific/vector queries:
- "What percentage of companies have a roadmap for GenAI adoption?"
- "Who is responsible for AI governance in large firms?"

Always explain clearly, referencing exact statistics, frameworks, or concepts when relevant. Be concise and insightful.
"""

#### 2.10 Define the agent

In [23]:
from llama_index.core.agent.workflow import AgentWorkflow

query_engine_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[query_engine_tool],
    llm=Settings.llm,
    system_prompt=system_prompt,
)


#### 2.11 Set up agent observability using Arize Phoenix

In [24]:
!pip install llama-index-callbacks-arize-phoenix arize-phoenix


In [None]:
import llama_index.core
import os

PHOENIX_API_KEY = ""
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)


#### 2.12 Run the agent and analyze responses

In [30]:
# In Jupyter/Colab, you can use await directly
question = "Who is Lareina Yee according to the document? Where is she mentioned in the document and in what context?"
response = await query_engine_agent.run(question)
print(response)

InvalidStateError: Result is not set.
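The run above ended in an `InvalidStateError`, and this notebook does not diagnose its cause. As a hedged sketch only, the helper below retries an async call a few times and prints each failure, which helps only if the error is transient. `run_with_retries` and `flaky_agent_call` are hypothetical names, and the flaky function merely simulates a failing first attempt.

```python
import asyncio


async def run_with_retries(make_coro, attempts=3, delay_seconds=0.0):
    """Await make_coro() up to `attempts` times, re-raising the last error if all fail."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await make_coro()
        except Exception as exc:  # broad on purpose: the failure type is not known in advance
            last_error = exc
            print(f"Attempt {attempt} failed: {type(exc).__name__}: {exc}")
            await asyncio.sleep(delay_seconds)
    raise last_error


calls = {"count": 0}


async def flaky_agent_call():
    """Simulated agent call that fails once and then succeeds."""
    calls["count"] += 1
    if calls["count"] < 2:
        raise asyncio.InvalidStateError("Result is not set.")
    return "ok"


# In a plain script, asyncio.run drives the coroutine; in Jupyter/Colab use `await` instead.
result = asyncio.run(run_with_retries(flaky_agent_call, attempts=3))
print(result)
```

In the notebook, the equivalent call would be `await run_with_retries(lambda: query_engine_agent.run(question))`, assuming the agent defined above.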

#### 2.13 Equip the agent with multiple tools

In [None]:
!pip install llama-index-tools-arxiv llama-index-tools-wikipedia duckduckgo-search
!pip install llama-index-tools-brave-search




#### 2.14 Add the new tools (arXiv, Brave Search)


In [None]:
# Import additional tools
from llama_index.tools.arxiv import ArxivToolSpec
from llama_index.tools.brave_search import BraveSearchToolSpec


In [None]:
# Create arXiv tools

arxiv_tool = ArxivToolSpec()

arxiv_tools = arxiv_tool.to_tool_list()


# Create Brave Search tool

brave_search_tool_spec = BraveSearchToolSpec(api_key="")
brave_search_tools = brave_search_tool_spec.to_tool_list()


In [None]:

# Build the combined tool list. extend() adds each tool individually,
# whereas append() would nest the whole list as a single element.
enhanced_tools = [query_engine_tool]  # Start with McKinsey report tool
enhanced_tools.extend(brave_search_tools)  # Add all brave search tools
enhanced_tools.extend(arxiv_tools)  # Add all arxiv tools
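The difference between `extend` and `append` matters here because each tool spec returns a list of tools. A minimal illustration with placeholder strings instead of real tool objects (the names are hypothetical):

```python
base = ["report_tool"]
search_tools = ["brave_search"]
paper_tools = ["arxiv_query"]

# append() adds the list itself as one element, producing a nested list.
nested = list(base)
nested.append(search_tools)

# extend() adds each element of the list, keeping the result flat.
flat = list(base)
flat.extend(search_tools)
flat.extend(paper_tools)

print(nested)
print(flat)
```

An agent expects a flat list of tools, so the nested form would hand it a list where a tool object is expected.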

#### 2.15 Define the enhanced agent with all tools


In [None]:
# Create new enhanced agent
enhanced_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=enhanced_tools,
    llm=Settings.llm,
    system_prompt="""You are an AI research assistant with access to:
    1. The McKinsey 2025 State of AI report
    2. Web search capabilities
    3. ArXiv research paper search

    Use these tools to provide comprehensive, well-researched answers. When discussing AI trends,
    combine insights from the McKinsey report with recent research and web findings.""",
)


#### 2.16 Battle-test the agent with multiple questions


In [None]:
# Test questions that can benefit from multiple tools

# Question 1: Combine McKinsey insights with recent research
question1 = """According to the McKinsey report, what are the main organizational changes companies are making for AI adoption?
Can you also search for recent research papers on AI governance and organizational transformation to provide additional context?"""

print("Question 1: Organizational changes and governance")
print("=" * 50)
response1 = await enhanced_agent.run(question1)
print(response1)
print("\n" + "="*80 + "\n")

Question 1: Organizational changes and governance
Selecting query engine 0: Summarization questions related to the State of AI paper would likely cover the main organizational changes companies are making for AI adoption..
### Recent Research Papers on AI Governance and Organizational Transformation:

#### AI Governance Research Papers:
1. **[AI And Organizational Change: Dynamics And Management Strategies](https://www.researchgate.net/publication/380929689_AI_And_Organizational_Change_Dynamics_And_Management_Strategies)**
   - This study investigates the dynamics of AI-induced organizational change, focusing on effective change management strategies, employee adaptation, and cultural transformation.

2. **[AI in Organizational Change Management — Case Studies, Best Practices, Ethical Implications, and Future Technological Trajectories](https://medium.com/@adnanmasood/ai-in-organizational-change-management-case-studies-best-practices-ethical-implications-and-179be4ec

In [None]:
# Question 2: Workflow Redesign and Implementation
question2 = """What does the McKinsey report say about workflow redesign for AI implementation?
Search ArXiv for papers on business process automation with AI and find current web articles about workflow transformation."""

print("Question 2: Workflow Redesign")
response2 = await enhanced_agent.run(question2)
print(response2)


Question 2: Workflow Redesign
Selecting query engine 0: Workflow redesign for AI implementation may involve summarizing key points and recommendations from the State of AI paper..
### ArXiv Papers on Business Process Automation with AI:
1. **[D3BA: A Tool for Optimizing Business Processes Using Non-Deterministic Planning](http://arxiv.org/pdf/2001.02619v2):**
   - This paper introduces D3BA, a tool for optimizing business processes using AI planning. It focuses on composing services to automate subtasks within complex business processes.

2. **[Can Artificial Intelligence Transform DevOps?](http://arxiv.org/pdf/2206.00225v1):**
   - Explores the connection between DevOps and AI, highlighting how AI can enhance DevOps processes such as testing, coding, releasing, monitoring, and system improvement.

3. **[Impact of Artificial Intelligence on Businesses](http://arxiv.org/pdf/1905.02092v1):**
   - Discusses the integration of AI in business processes and its impact on r

In [None]:
# Question 3: Risk management and future trends
question3 = """Based on the McKinsey report, what are the key risks organizations are addressing with gen AI?
Can you search the web for recent academic research on AI risk mitigation and compare with the report's findings?"""

print("Question 3: Risk management")
response3 = await enhanced_agent.run(question3)
print(response3)


Question 3: Risk management
Selecting query engine 0: The choice 'Useful for summarization questions related to the State of AI paper' is most relevant as it focuses on summarizing key risks organizations are addressing with gen AI, which aligns with the question asked..
### Recent Academic Research on AI Risk Mitigation:
1. **FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation**:
   - This research introduces the concept of Data Residual Matching to facilitate data generation and mitigate data information vanishing in dataset distillation tasks. The method significantly improves computational efficiency and achieves superior performance across multiple dataset benchmarks.
   - [Read more](http://arxiv.org/pdf/2506.24125v1)

2. **Scaling Human Judgment in Community Notes with LLMs**:
   - The paper proposes an open ecosystem where both humans and LLMs can write notes, with human raters serving as the ultimate evaluator of helpful notes. This appr

In [None]:
# Question that forces usage of all three tools
comprehensive_question = """Who is Lareina Yee in the McKinsey document and what are her views on AI's workforce impact?

After finding information about her from the document, please:
1. Search the web using Brave Search for recent articles, interviews, or news about Lareina Yee and her work on AI
2. Search ArXiv for any research papers she may have authored or co-authored related to AI, workforce transformation, or economic impact
3. Provide a comprehensive profile combining insights from all three sources about her expertise and contributions to AI research"""

print("Question: Comprehensive profile of Lareina Yee")
print("=" * 60)
print("This question should force the agent to use:")
print("1. Query Engine - to find info about Lareina Yee in the McKinsey document")
print("2. Brave Search - to find recent web articles/news about her")
print("3. ArXiv Search - to find any academic papers she's authored")
print("=" * 60)

response = await enhanced_agent.run(comprehensive_question)
print(response)

Question: Comprehensive profile of Lareina Yee
This question should force the agent to use:
1. Query Engine - to find info about Lareina Yee in the McKinsey document
2. Brave Search - to find recent web articles/news about her
3. ArXiv Search - to find any academic papers she's authored
Selecting query engine 1: The question is asking for specific context related to 'Lareina Yee', which would require retrieving specific information from the State of AI paper..
### Lareina Yee Profile:

#### McKinsey Global Institute Director:
Lareina Yee is a Senior Partner and the Director of the McKinsey Global Institute, where she leads research on AI and frontier technologies, advising companies on growth and transformation.

#### Views on AI's Workforce Impact:
Lareina Yee's work focuses on understanding the impact of AI on the workforce, particularly in terms of transformation and economic implications.

#### Recent Web Findings:
1. **[McKinsey Article: Superagency in the Workp