# Demo #8: Agentic RAG - Autonomous Query Planning and Tool Selection

## Objective
Demonstrate an autonomous agent that dynamically plans retrieval strategies and selects appropriate tools to answer complex queries. This demo extends beyond internal knowledge bases to include **internet search** and **arXiv academic paper access**.

## Core Concepts
- **Agentic workflow**: Thought → Action → Observation loop (ReAct framework)
- **Dynamic tool selection**: Agent chooses which knowledge base or external source to query
- **Multi-step reasoning**: Agent decomposes complex questions into sub-tasks
- **Query decomposition and planning**: Autonomous strategy formulation
- **External tool integration**: Internet search, arXiv research papers, and more

## What is Agentic RAG?

Traditional RAG is **passive**: it retrieves from a single, fixed knowledge base and generates an answer.

**Agentic RAG** is **active**: it uses an LLM-powered agent that:
1. **Analyzes** the query to understand requirements
2. **Plans** which knowledge sources to consult (internal + external)
3. **Executes** multi-step retrieval strategies
4. **Adapts** based on retrieved information
5. **Synthesizes** information from multiple sources

### Static RAG vs. Agentic RAG

**Static RAG:**
```
Query → Single Vector DB → Retrieve Top-K → Generate
```
- ❌ No query analysis
- ❌ Fixed retrieval strategy
- ❌ Single knowledge source
- ❌ No multi-hop reasoning
- ❌ Limited to internal knowledge base
- ❌ Cannot access current information

**Agentic RAG:**
```
Query → Agent Analyzes
          ↓
    Plans retrieval strategy
          ↓
    Selects Tool(s): Internal KB | Internet | arXiv | APIs
          ↓
    Tool 1 → Retrieve → Observation
          ↓
    Needs more info?
          ↓
    Tool 2 → Retrieve → Observation
          ↓
    Synthesize all observations → Generate
```
- ✅ Intelligent query understanding
- ✅ Dynamic strategy selection
- ✅ Multiple knowledge sources (internal + external)
- ✅ Multi-hop reasoning
- ✅ Access to current information via internet
- ✅ Integration with academic research (arXiv)
- ✅ Extensible to any API or tool

## ReAct Framework

The agent uses the **ReAct** (Reasoning + Acting) pattern:

```
Loop:
  1. Thought: "I need information about X"
  2. Action: Use tool Y with query Z
  3. Observation: Retrieved information
  4. Thought: "This partially answers the question, but I need more about A"
  5. Action: Use tool W with query B
  6. Observation: Additional information
  7. Thought: "Now I have enough information"
  8. Final Answer: Synthesized response
```
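The loop above can be sketched in plain Python. This is a minimal illustration, not the LlamaIndex implementation: `scripted_policy` is a hypothetical stand-in for the LLM, and the two entries in `TOOLS` are toy functions rather than real query engines.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for real tools: each maps a query string to an observation.
TOOLS: Dict[str, Callable[[str], str]] = {
    "ml_kb": lambda q: f"[ml_kb] notes on {q}",
    "finance_kb": lambda q: f"[finance_kb] notes on {q}",
}


def scripted_policy(question: str, observations: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input).

    `action` is a tool name, or "finish" once enough has been gathered.
    """
    if len(observations) == 0:
        return ("I need ML background first", "ml_kb", question)
    if len(observations) == 1:
        return ("I still need the finance side", "finance_kb", question)
    return ("Now I have enough information", "finish", " | ".join(observations))


def react_loop(
    question: str,
    policy: Callable[[str, List[str]], Tuple[str, str, str]] = scripted_policy,
    max_iterations: int = 10,
) -> Tuple[str, List[str]]:
    """Run Thought -> Action -> Observation until the policy finishes or the cap is hit."""
    observations: List[str] = []
    trace: List[str] = []
    for _ in range(max_iterations):
        thought, action, action_input = policy(question, observations)
        trace.append(f"Thought: {thought}")
        if action == "finish":
            return action_input, trace
        trace.append(f"Action: {action}({action_input!r})")
        observation = TOOLS[action](action_input)
        trace.append(f"Observation: {observation}")
        observations.append(observation)
    return "Stopped: iteration limit reached", trace


answer, trace = react_loop("portfolio optimization")
print("\n".join(trace))
print("Final Answer:", answer)
```

The iteration cap plays the same role as the `max_iterations` setting used later in this demo: it bounds the loop if the policy never decides it has enough information.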

## Use Cases

1. **Cross-domain queries**: "How can ML improve financial portfolio management?"
   - Queries ML knowledge base
   - Queries finance knowledge base
   - Synthesizes both

2. **Multi-hop reasoning**: "Compare X vs. Y considering factors A, B, and C"
   - Retrieves info about X
   - Retrieves info about Y
   - Retrieves info about factors A, B, C
   - Performs comparative analysis

3. **Current information**: "What are the latest developments in GPT-4?"
   - Uses internet search for current information
   - Accesses recent news and updates
   - Synthesizes current state

4. **Academic research**: "Find recent papers on Retrieval-Augmented Generation"
   - Searches arXiv for relevant papers
   - Fetches specific papers for details
   - Summarizes research findings

5. **Hybrid queries**: "Explain RL theory and find recent research applications"
   - Internal knowledge for fundamentals
   - arXiv for latest research
   - Comprehensive synthesis

## Available Tools in This Demo

1. **machine_learning_knowledge**: Internal ML concepts database
2. **finance_knowledge**: Internal finance and trading database
3. **internet_search**: DuckDuckGo search for current information
4. **arxiv_search**: Search academic papers on arXiv.org
5. **arxiv_fetch_paper**: Fetch metadata and the abstract of a specific paper by arXiv ID

## Setup: Install Dependencies and Load Environment

In [None]:
# Install required packages
# Run this cell if packages are not already installed
# !pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-azure-openai
# !pip install python-dotenv

In [None]:
import os
import json
from typing import List, Dict
from dotenv import load_dotenv
import warnings
warnings.filterwarnings('ignore')

# Load environment variables
load_dotenv()

# Verify Azure OpenAI credentials
required_vars = [
    'AZURE_OPENAI_API_KEY',
    'AZURE_OPENAI_ENDPOINT',
    'AZURE_OPENAI_API_VERSION',
    'AZURE_OPENAI_DEPLOYMENT_NAME',
    'AZURE_OPENAI_EMBEDDING_DEPLOYMENT'
]

missing_vars = [var for var in required_vars if not os.getenv(var)]
if missing_vars:
    print(f"❌ Missing environment variables: {', '.join(missing_vars)}")
    print("\nPlease create a .env file with:")
    for var in missing_vars:
        print(f"{var}=<your_value>")
else:
    print("✅ All required environment variables are set")
    print(f"   Endpoint: {os.getenv('AZURE_OPENAI_ENDPOINT')}")
    print(f"   Deployment: {os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}")
    print(f"   Embedding: {os.getenv('AZURE_OPENAI_EMBEDDING_DEPLOYMENT')}")

## Step 1: Initialize Azure OpenAI Components

In [None]:
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import Settings

# Initialize Azure OpenAI LLM
azure_llm = AzureOpenAI(
    model="gpt-4",
    deployment_name=os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME'),
    api_key=os.getenv('AZURE_OPENAI_API_KEY'),
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
    api_version=os.getenv('AZURE_OPENAI_API_VERSION'),
    temperature=0.0  # Low-variance output for more consistent comparisons (not strictly deterministic)
)

# Initialize Azure OpenAI Embedding Model
azure_embed = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name=os.getenv('AZURE_OPENAI_EMBEDDING_DEPLOYMENT'),
    api_key=os.getenv('AZURE_OPENAI_API_KEY'),
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
    api_version=os.getenv('AZURE_OPENAI_API_VERSION'),
)

# Set global defaults
Settings.llm = azure_llm
Settings.embed_model = azure_embed
Settings.chunk_size = 512
Settings.chunk_overlap = 50

print("✅ Azure OpenAI components initialized")
print(f"   LLM: {azure_llm.model}")
print(f"   Embeddings: {azure_embed.model_name}")  # embedding classes expose the model as `model_name`

## Step 2: Load and Index Machine Learning Knowledge Base

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Define paths
ml_data_path = "./data/ml_concepts"

print("📁 Loading Machine Learning documents...")

# Load ML documents
ml_documents = SimpleDirectoryReader(
    input_dir=ml_data_path,
    recursive=False,
    required_exts=[".md"]
).load_data()

print(f"   Loaded {len(ml_documents)} ML documents")
for doc in ml_documents:
    print(f"   - {doc.metadata.get('file_name', 'Unknown')}")

# Parse into chunks
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
ml_nodes = parser.get_nodes_from_documents(ml_documents)
print(f"\n   Parsed into {len(ml_nodes)} chunks")

# Create vector index
print("\n🔍 Creating ML vector index...")
ml_index = VectorStoreIndex(ml_nodes, embed_model=azure_embed)
print("✅ ML index created")

## Step 3: Load and Index Finance Knowledge Base

In [None]:
# Define paths
finance_data_path = "./data/finance_docs"

print("📁 Loading Finance documents...")

# Load Finance documents
finance_documents = SimpleDirectoryReader(
    input_dir=finance_data_path,
    recursive=False,
    required_exts=[".md"]
).load_data()

print(f"   Loaded {len(finance_documents)} Finance documents")
for doc in finance_documents:
    print(f"   - {doc.metadata.get('file_name', 'Unknown')}")

# Parse into chunks
finance_nodes = parser.get_nodes_from_documents(finance_documents)
print(f"\n   Parsed into {len(finance_nodes)} chunks")

# Create vector index
print("\n🔍 Creating Finance vector index...")
finance_index = VectorStoreIndex(finance_nodes, embed_model=azure_embed)
print("✅ Finance index created")

## Step 4: Create Query Engine Tools for Agent

We wrap each knowledge base in a tool that the agent can use. Each tool has:
- **name**: Identifier for the tool
- **description**: Helps the agent decide when to use this tool
- **query_engine**: The actual retrieval system
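To make the role of the description concrete, here is a rough sketch of how tool metadata might be rendered into text that an LLM reads before choosing a tool. `SimpleTool` and `render_tool_prompt` are hypothetical names for illustration only; the prompt LlamaIndex actually builds is different.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical minimal tool record: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def render_tool_prompt(tools: List[SimpleTool]) -> str:
    """Format the tool list as text an LLM could read when choosing a tool."""
    lines = ["You can use the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


tools = [
    SimpleTool("machine_learning_knowledge", "ML algorithms and theory.", lambda q: "ml:" + q),
    SimpleTool("finance_knowledge", "Portfolio and market concepts.", lambda q: "fin:" + q),
]
prompt = render_tool_prompt(tools)
print(prompt)
```

Because the description is the only signal the model sees about a tool, vague or overlapping descriptions tend to produce poor tool choices.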

In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Create query engines
ml_query_engine = ml_index.as_query_engine(
    similarity_top_k=3,
    llm=azure_llm
)

finance_query_engine = finance_index.as_query_engine(
    similarity_top_k=3,
    llm=azure_llm
)

# Wrap in tools with descriptive metadata
ml_tool = QueryEngineTool(
    query_engine=ml_query_engine,
    metadata=ToolMetadata(
        name="machine_learning_knowledge",
        description=(
            "Provides expert knowledge about machine learning algorithms, concepts, and techniques. "
            "Use this tool for questions about: neural networks, gradient boosting, random forests, "
            "support vector machines, k-means clustering, deep learning, reinforcement learning, "
            "model training, optimization, and ML theory."
        )
    )
)

finance_tool = QueryEngineTool(
    query_engine=finance_query_engine,
    metadata=ToolMetadata(
        name="finance_knowledge",
        description=(
            "Provides information about financial concepts, investment strategies, and market analysis. "
            "Use this tool for questions about: portfolio management, diversification, risk management, "
            "quantitative trading, technical analysis, market indicators, investment strategies, "
            "and financial markets."
        )
    )
)

print("✅ Query engine tools created:")
print(f"   1. {ml_tool.metadata.name}")
print(f"      → {ml_tool.metadata.description[:80]}...")
print(f"\n   2. {finance_tool.metadata.name}")
print(f"      → {finance_tool.metadata.description[:80]}...")

## Step 4.5: Create Internet Search and arXiv Tools

In addition to our internal knowledge bases, we'll add tools that can access external information:
- **Internet Search**: For current information and topics outside our knowledge bases
- **arXiv Search**: For recent research papers and academic publications
- **arXiv Fetch**: To retrieve the metadata and abstract of a specific arXiv paper

These tools demonstrate the extensibility of the agentic framework.
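Agents often pass arXiv identifiers in inconsistent forms (a bare ID, an `arXiv:` prefix, or a full URL), so the fetch tool needs some ID cleaning. The sketch below shows one possible approach with regular expressions. `normalize_arxiv_id` is a hypothetical helper, not part of the `arxiv` package, and it assumes the two public identifier schemes: new-style `YYMM.NNNNN` and old-style `archive/YYMMNNN`.

```python
import re

# New-style IDs look like 2301.12345, optionally with a version suffix such as v2.
_NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")
# Old-style IDs look like hep-th/9901001 or math.GT/0309136.
_OLD_STYLE = re.compile(r"[a-z\-]+(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?")


def normalize_arxiv_id(raw: str) -> str:
    """Extract an arXiv ID from a bare ID, an 'arXiv:' prefixed ID, or a URL."""
    text = raw.strip()
    # Try the new-style pattern first so 'cs.AI/2301.12345' yields '2301.12345'.
    for pattern in (_NEW_STYLE, _OLD_STYLE):
        match = pattern.search(text)
        if match:
            return match.group(0)
    raise ValueError(f"Could not find an arXiv ID in: {raw!r}")


for example in [
    "arXiv:2301.12345",
    "https://arxiv.org/abs/2301.12345v2",
    "cs.AI/2301.12345",
    "hep-th/9901001",
]:
    print(example, "->", normalize_arxiv_id(example))
```

Unlike splitting on `/` and keeping the last segment, this keeps the archive prefix that old-style identifiers require.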

In [None]:
# Install additional required packages for web search and arXiv
# !pip install duckduckgo-search arxiv

In [None]:
from llama_index.core.tools import FunctionTool
from typing import Optional
import json

# Import web search libraries
try:
    from duckduckgo_search import DDGS
    import arxiv
    print("✅ Web search and arXiv libraries loaded")
except ImportError as e:
    print(f"⚠️  Import error: {e}")
    print("   Please install: pip install duckduckgo-search arxiv")

# Define internet search function
def internet_search(query: str, max_results: int = 3) -> str:
    """
    Search the internet for current information using DuckDuckGo.
    
    Args:
        query: The search query
        max_results: Maximum number of results to return (default: 3)
    
    Returns:
        Formatted string with search results
    """
    try:
        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=max_results))
        
        if not results:
            return f"No results found for query: {query}"
        
        formatted_results = f"Internet search results for '{query}':\n\n"
        for i, result in enumerate(results, 1):
            title = result.get('title', 'No title')
            snippet = result.get('body', 'No description')
            url = result.get('href', 'No URL')
            formatted_results += f"{i}. **{title}**\n"
            formatted_results += f"   {snippet}\n"
            formatted_results += f"   Source: {url}\n\n"
        
        return formatted_results
    except Exception as e:
        return f"Error performing internet search: {str(e)}"

# Define arXiv search function
def arxiv_search(query: str, max_results: int = 3) -> str:
    """
    Search arXiv for research papers.
    
    Args:
        query: The search query
        max_results: Maximum number of papers to return (default: 3)
    
    Returns:
        Formatted string with paper information
    """
    try:
        search = arxiv.Search(
            query=query,
            max_results=max_results,
            sort_by=arxiv.SortCriterion.Relevance
        )
        
        # Search.results() is deprecated in recent arxiv releases; use a Client instead
        papers = list(arxiv.Client().results(search))
        
        if not papers:
            return f"No arXiv papers found for query: {query}"
        
        formatted_results = f"arXiv search results for '{query}':\n\n"
        for i, paper in enumerate(papers, 1):
            formatted_results += f"{i}. **{paper.title}**\n"
            formatted_results += f"   Authors: {', '.join([author.name for author in paper.authors])}\n"
            formatted_results += f"   Published: {paper.published.strftime('%Y-%m-%d')}\n"
            formatted_results += f"   arXiv ID: {paper.entry_id.split('/')[-1]}\n"
            formatted_results += f"   Summary: {paper.summary[:300]}...\n"
            formatted_results += f"   URL: {paper.entry_id}\n\n"
        
        return formatted_results
    except Exception as e:
        return f"Error searching arXiv: {str(e)}"

# Define arXiv fetch function
def arxiv_fetch_paper(arxiv_id: str) -> str:
    """
    Fetch the metadata and abstract of a specific arXiv paper by its ID.
    
    Args:
        arxiv_id: The arXiv paper ID (e.g., '2301.12345' or 'cs.AI/2301.12345')
    
    Returns:
        Formatted string with detailed paper information
    """
    try:
        # Clean the arxiv_id
        arxiv_id = arxiv_id.replace('arxiv:', '').replace('arXiv:', '')
        arxiv_id = arxiv_id.split('/')[-1]  # Handle URLs
        
        search = arxiv.Search(id_list=[arxiv_id])
        # Client.results() returns a generator; take the first match if there is one
        paper = next(arxiv.Client().results(search), None)
        
        if not paper:
            return f"No paper found with arXiv ID: {arxiv_id}"
        
        result = f"**arXiv Paper Details**\n\n"
        result += f"**Title:** {paper.title}\n\n"
        result += f"**Authors:** {', '.join([author.name for author in paper.authors])}\n\n"
        result += f"**Published:** {paper.published.strftime('%Y-%m-%d')}\n\n"
        result += f"**Updated:** {paper.updated.strftime('%Y-%m-%d')}\n\n"
        result += f"**arXiv ID:** {paper.entry_id.split('/')[-1]}\n\n"
        result += f"**Categories:** {', '.join(paper.categories)}\n\n"
        result += f"**Abstract:**\n{paper.summary}\n\n"
        result += f"**PDF URL:** {paper.pdf_url}\n\n"
        result += f"**Primary Category:** {paper.primary_category}\n"
        
        if paper.comment:
            result += f"\n**Comments:** {paper.comment}\n"
        
        if paper.journal_ref:
            result += f"\n**Journal Reference:** {paper.journal_ref}\n"
        
        if paper.doi:
            result += f"\n**DOI:** {paper.doi}\n"
        
        return result
    except Exception as e:
        return f"Error fetching arXiv paper: {str(e)}"

print("✅ Search functions defined: internet_search, arxiv_search, arxiv_fetch_paper")

In [None]:
# Create FunctionTool wrappers for the search functions

internet_search_tool = FunctionTool.from_defaults(
    fn=internet_search,
    name="internet_search",
    description=(
        "Search the internet for current information, news, and topics not covered in the knowledge bases. "
        "Use this tool when you need up-to-date information, current events, recent developments, "
        "or information about topics outside the ML and Finance domains. "
        "Particularly useful for: latest news, current market conditions, recent events, "
        "general knowledge queries, and fact-checking."
    )
)

arxiv_search_tool = FunctionTool.from_defaults(
    fn=arxiv_search,
    name="arxiv_search",
    description=(
        "Search arXiv.org for academic research papers in various fields including machine learning, "
        "AI, physics, mathematics, finance, and more. Use this tool when you need recent research papers, "
        "academic publications, or scholarly articles. Returns paper titles, authors, abstracts, and arXiv IDs. "
        "Useful for: latest research, academic insights, state-of-the-art methods, research trends."
    )
)

arxiv_fetch_tool = FunctionTool.from_defaults(
    fn=arxiv_fetch_paper,
    name="arxiv_fetch_paper",
    description=(
        "Fetch the metadata of a specific arXiv paper by its ID. Use this tool when you have an arXiv ID "
        "and want detailed information about that specific paper, including the full abstract, "
        "authors, categories, and publication details. The paper body is not downloaded. "
        "Provide the arXiv ID (e.g., '2301.12345')."
    )
)

print("✅ External tools created:")
print(f"   3. {internet_search_tool.metadata.name}")
print(f"      → {internet_search_tool.metadata.description[:80]}...")
print(f"\n   4. {arxiv_search_tool.metadata.name}")
print(f"      → {arxiv_search_tool.metadata.description[:80]}...")
print(f"\n   5. {arxiv_fetch_tool.metadata.name}")
print(f"      → {arxiv_fetch_tool.metadata.description[:80]}...")

## Step 5: Create ReAct Agent

The **ReActAgent** implements the Reasoning + Acting pattern:
- **Reasoning**: Agent thinks about what information it needs
- **Acting**: Agent uses tools to gather information
- **Loop**: Continues until it has enough information to answer

We enable `verbose=True` to see the agent's internal reasoning process.
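The reasoning text an agent emits has to be parsed back into a tool call before anything can run. The sketch below parses one step, assuming a plain-text layout with `Thought:`, `Action:`, and `Action Input:` fields where the input is a JSON object. That layout and the `parse_react_step` helper are assumptions for illustration; the exact format depends on the library version and prompt.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# Assumed step layout: Thought, then a tool name, then a JSON object of arguments.
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>[\w\-]+)\s*"
    r"Action Input:\s*(?P<input>\{.*\})",
    re.DOTALL,
)


def parse_react_step(text: str) -> Tuple[str, Optional[str], Optional[Dict[str, Any]]]:
    """Split one step into (thought, tool_name, tool_kwargs).

    Returns (text, None, None) when no tool call is present, for example
    when the model is giving its final answer.
    """
    match = STEP_PATTERN.search(text)
    if match is None:
        return text.strip(), None, None
    return (
        match.group("thought"),
        match.group("action"),
        json.loads(match.group("input")),
    )


step = (
    "Thought: I need ML background.\n"
    "Action: machine_learning_knowledge\n"
    'Action Input: {"input": "gradient boosting"}'
)
print(parse_react_step(step))
```

A production parser also has to cope with malformed JSON and missing fields, which this sketch does not attempt.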

In [None]:
from llama_index.core.agent import ReActAgent

# Create ReAct agent with all tools
agent_tools = [ml_tool, finance_tool, internet_search_tool, arxiv_search_tool, arxiv_fetch_tool]
agent = ReActAgent.from_tools(
    tools=agent_tools,
    llm=azure_llm,
    verbose=True,  # Show reasoning process
    max_iterations=10  # Increased for more complex queries with external tools
)

print("✅ ReAct Agent initialized with extended capabilities")
# Count the list we passed in; agent introspection methods vary across library versions
print(f"   Available tools: {len(agent_tools)}")
print(f"   Internal Knowledge: {ml_tool.metadata.name}, {finance_tool.metadata.name}")
print(f"   External Tools: {internet_search_tool.metadata.name}, {arxiv_search_tool.metadata.name}, {arxiv_fetch_tool.metadata.name}")
print(f"   Max iterations: 10")
print(f"   Verbose: True (will show reasoning)")

## Step 6: Create Static RAG Baseline for Comparison

To demonstrate the advantages of Agentic RAG, we'll create a static RAG system that simply combines both knowledge bases into one index. This represents the traditional approach.

In [None]:
print("🔍 Creating static RAG baseline (combined index)...")

# Combine all nodes
combined_nodes = ml_nodes + finance_nodes
print(f"   Combined {len(combined_nodes)} chunks from both domains")

# Create combined index
combined_index = VectorStoreIndex(combined_nodes, embed_model=azure_embed)
static_query_engine = combined_index.as_query_engine(
    similarity_top_k=3,
    llm=azure_llm
)

print("✅ Static RAG baseline created")

## Test Scenario 1: Simple Single-Domain Query

**Query**: "Explain gradient boosting."

This is a straightforward ML question. The agent should:
1. Identify that this is about machine learning
2. Use the ML knowledge tool
3. Retrieve relevant information
4. Provide an answer

Expected: Agent uses only the ML tool.

In [None]:
print("="*80)
print("TEST SCENARIO 1: Simple Single-Domain Query")
print("="*80)

query_1 = "Explain gradient boosting."
print(f"\n📝 Query: {query_1}")
print("\n" + "="*80)
print("AGENTIC RAG (with tool selection):")
print("="*80)

# Run agent (verbose=True will show reasoning)
agent_response_1 = agent.chat(query_1)

print("\n" + "="*80)
print("AGENT'S FINAL ANSWER:")
print("="*80)
print(agent_response_1.response)

In [None]:
# Compare with static RAG
print("\n" + "="*80)
print("STATIC RAG (combined index):")
print("="*80)

static_response_1 = static_query_engine.query(query_1)
print(static_response_1.response)

# Show retrieved sources
print("\n📚 Sources used:")
for i, node in enumerate(static_response_1.source_nodes, 1):
    filename = node.metadata.get('file_name', 'Unknown')
    score = node.score
    print(f"   {i}. {filename} (score: {score:.3f})")

### Analysis: Scenario 1

**What to observe:**
- Agent's reasoning process (Thought → Action → Observation)
- Tool selection (should choose ML tool)
- Both approaches should provide good answers for single-domain queries
- Agent might be more targeted in tool use

## Test Scenario 2: Cross-Domain Query

**Query**: "How can machine learning be applied to stock market prediction and portfolio management?"

This requires information from BOTH domains:
- **ML**: How ML models work, what algorithms are suitable
- **Finance**: Portfolio management concepts, market dynamics

Expected behavior:
- Agent should recognize the cross-domain nature
- Use ML tool to get ML information
- Use Finance tool to get financial context
- Synthesize information from both

In [None]:
print("\n\n")
print("="*80)
print("TEST SCENARIO 2: Cross-Domain Query")
print("="*80)

query_2 = "How can machine learning be applied to stock market prediction and portfolio management?"
print(f"\n📝 Query: {query_2}")
print("\n" + "="*80)
print("AGENTIC RAG (with multi-tool usage):")
print("="*80)

# Clear chat history so earlier scenarios do not influence tool choice
agent.reset()

# Run agent - should use both tools
agent_response_2 = agent.chat(query_2)

print("\n" + "="*80)
print("AGENT'S FINAL ANSWER:")
print("="*80)
print(agent_response_2.response)

In [None]:
# Compare with static RAG
print("\n" + "="*80)
print("STATIC RAG (combined index):")
print("="*80)

static_response_2 = static_query_engine.query(query_2)
print(static_response_2.response)

# Show retrieved sources
print("\n📚 Sources used:")
for i, node in enumerate(static_response_2.source_nodes, 1):
    filename = node.metadata.get('file_name', 'Unknown')
    score = node.score
    print(f"   {i}. {filename} (score: {score:.3f})")

### Analysis: Scenario 2

**Key differences:**

**Agentic RAG:**
- ✅ Explicitly queries both ML and Finance tools
- ✅ Systematic retrieval from each domain
- ✅ Explicit coverage of both aspects whenever the agent calls both tools
- ✅ Clear reasoning trail showing multi-step process

**Static RAG:**
- ⚠️ Relies on vector similarity alone
- ⚠️ May miss one domain if embedding similarity is skewed
- ⚠️ Top-3 chunks might all come from one domain
- ⚠️ No guarantee of balanced coverage

The agent's ability to **plan** and **execute** multi-step retrieval makes comprehensive answers more likely, although the LLM can still skip a tool.
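The top-k skew described above is easy to reproduce with toy vectors. The sketch below uses hand-made 2-D "embeddings" (entirely hypothetical values, where axis 0 loosely means ML and axis 1 loosely means finance) and compares one combined top-3 search against separate per-domain searches.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical chunks: name -> (domain, toy embedding).
CHUNKS: Dict[str, Tuple[str, List[float]]] = {
    "ml_boosting": ("ml", [1.0, 0.0]),
    "ml_neural_nets": ("ml", [0.95, 0.1]),
    "ml_rl": ("ml", [0.9, 0.2]),
    "fin_portfolio": ("finance", [0.1, 1.0]),
    "fin_risk": ("finance", [0.0, 1.0]),
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec: List[float], k: int, domain: Optional[str] = None) -> List[str]:
    """Return the k most similar chunk names, optionally restricted to one domain."""
    scored = [
        (cosine(query_vec, vec), name)
        for name, (chunk_domain, vec) in CHUNKS.items()
        if domain is None or chunk_domain == domain
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# A cross-domain query whose embedding happens to lean toward the ML axis.
query = [0.8, 0.4]

static_hits = top_k(query, k=3)
per_domain_hits = top_k(query, k=2, domain="ml") + top_k(query, k=2, domain="finance")

print("Combined index top-3:", static_hits)
print("Per-domain retrieval:", per_domain_hits)
```

With these toy values the combined search returns only ML chunks, while querying each domain separately always surfaces finance material as well.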

## Test Scenario 3: Complex Multi-Hop Query

**Query**: "Compare the risk-adjusted returns of portfolio strategies using reinforcement learning versus traditional diversification. Consider both the Sharpe ratio and maximum drawdown."

This is a complex query requiring:
1. **Reinforcement learning** knowledge (ML domain)
2. **Portfolio diversification** strategies (Finance domain)
3. **Risk metrics** like Sharpe ratio and max drawdown (Finance domain)
4. **Synthesis** of ML approaches to finance problems

Expected: Agent should:
- Break down into sub-questions
- Query ML tool for RL concepts
- Query Finance tool for diversification strategies
- Query Finance tool again for risk metrics
- Perform comparative analysis

In [None]:
print("\n\n")
print("="*80)
print("TEST SCENARIO 3: Complex Multi-Hop Query")
print("="*80)

query_3 = (
    "Compare the risk-adjusted returns of portfolio strategies using reinforcement learning "
    "versus traditional diversification. Consider both the Sharpe ratio and maximum drawdown."
)
print(f"\n📝 Query: {query_3}")
print("\n" + "="*80)
print("AGENTIC RAG (with multi-step reasoning):")
print("="*80)

# Clear chat history so earlier scenarios do not influence tool choice
agent.reset()

# Run agent - should use multiple queries across tools
agent_response_3 = agent.chat(query_3)

print("\n" + "="*80)
print("AGENT'S FINAL ANSWER:")
print("="*80)
print(agent_response_3.response)

In [None]:
# Compare with static RAG
print("\n" + "="*80)
print("STATIC RAG (combined index):")
print("="*80)

static_response_3 = static_query_engine.query(query_3)
print(static_response_3.response)

# Show retrieved sources
print("\n📚 Sources used:")
for i, node in enumerate(static_response_3.source_nodes, 1):
    filename = node.metadata.get('file_name', 'Unknown')
    score = node.score
    print(f"   {i}. {filename} (score: {score:.3f})")

### Analysis: Scenario 3

**Agent's advantages become clear:**

1. **Decomposition**: Agent breaks complex query into manageable sub-questions
2. **Strategic retrieval**: Queries specific tools for specific information
3. **Iterative refinement**: Can make follow-up queries if initial information is insufficient
4. **Synthesis**: Combines information from multiple sources coherently

**Static RAG limitations:**
- Must rely on single query embedding matching multiple concepts
- Top-K retrieval might miss important aspects
- No ability to "realize" information is missing and query again
- Less systematic coverage of complex multi-part questions
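The "realize information is missing and query again" step can be approximated with a simple coverage check. In the sketch below, keyword matching is a crude stand-in for the LLM's judgment; `missing_aspects`, the aspect names, and the keyword lists are all hypothetical.

```python
from typing import Dict, List


def missing_aspects(required: Dict[str, List[str]], retrieved_text: str) -> List[str]:
    """Return the aspects for which none of the keywords appear in the retrieved text."""
    text = retrieved_text.lower()
    return [
        aspect
        for aspect, keywords in required.items()
        if not any(keyword.lower() in text for keyword in keywords)
    ]


# Aspects the Scenario 3 query needs, each with a few indicative keywords.
required = {
    "reinforcement learning": ["reinforcement learning", "policy", "reward"],
    "diversification": ["diversification", "asset allocation"],
    "risk metrics": ["sharpe", "drawdown"],
}

first_pass = "Reinforcement learning agents learn a policy that maximizes reward."
gaps = missing_aspects(required, first_pass)

# Each uncovered aspect becomes a follow-up query for the next iteration.
follow_ups = [f"Explain {aspect} for portfolio strategies" for aspect in gaps]
print(gaps)
print(follow_ups)
```

A single top-k lookup has no equivalent of `gaps`: it returns whatever matched best and stops.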

## Test Scenario 4: External Knowledge - Internet Search

**Query**: "What are the latest developments in GPT-4 and how do they compare to Claude 3?"

This query requires **current information** that is NOT in our internal knowledge bases:
- Latest developments in GPT-4 (not covered by our static internal documents)
- Information about Claude 3 (outside our domain)
- Comparison of current models

Expected behavior:
- Agent should recognize this requires external information
- Use the internet_search tool to find current information
- Synthesize findings into a comprehensive answer

In [None]:
print("\n\n")
print("="*80)
print("TEST SCENARIO 4: External Knowledge - Internet Search")
print("="*80)

query_4 = "What are the latest developments in GPT-4 and how do they compare to Claude 3?"
print(f"\n📝 Query: {query_4}")
print("\n" + "="*80)
print("AGENTIC RAG (with internet search capability):")
print("="*80)

# Clear chat history so earlier scenarios do not influence tool choice
agent.reset()

# Run agent - should use internet search tool
agent_response_4 = agent.chat(query_4)

print("\n" + "="*80)
print("AGENT'S FINAL ANSWER:")
print("="*80)
print(agent_response_4.response)

### Analysis: Scenario 4

**Key observations:**

1. **Knowledge Gap Recognition**: Agent recognizes that internal knowledge bases don't contain this information
2. **Tool Selection**: Chooses internet_search tool for current information
3. **External Integration**: Successfully queries external sources
4. **Synthesis**: Combines information from web search into coherent answer

**This demonstrates:**
- ✅ Agent can handle queries beyond internal knowledge
- ✅ Seamless integration of external information sources
- ✅ No need to maintain constantly updated internal databases
- ✅ Access to publicly indexed web content (search-result snippets, not full pages)

## Test Scenario 5: Academic Research - arXiv Search and Fetch

**Query**: "Find recent research papers on Retrieval-Augmented Generation and summarize the key findings."

This requires:
- Searching academic literature on arXiv
- Finding recent RAG papers
- Potentially fetching specific papers for details
- Synthesizing research findings

Expected behavior:
- Agent uses arxiv_search tool to find relevant papers
- May use arxiv_fetch_paper for detailed information
- Synthesizes academic findings into accessible summary

In [None]:
print("\n\n")
print("="*80)
print("TEST SCENARIO 5: Academic Research - arXiv Search")
print("="*80)

query_5 = "Find recent research papers on Retrieval-Augmented Generation and summarize the key findings."
print(f"\n📝 Query: {query_5}")
print("\n" + "="*80)
print("AGENTIC RAG (with arXiv search capability):")
print("="*80)

# Clear chat history so earlier scenarios do not influence tool choice
agent.reset()

# Run agent - should use arXiv search tool
agent_response_5 = agent.chat(query_5)

print("\n" + "="*80)
print("AGENT'S FINAL ANSWER:")
print("="*80)
print(agent_response_5.response)

### Analysis: Scenario 5

**Academic Research Integration:**

1. **Research Discovery**: Agent uses arXiv search to find relevant academic papers
2. **Multi-Paper Analysis**: Can search and compare multiple research papers
3. **Detailed Fetching**: Can use arxiv_fetch_paper for specific papers if needed
4. **Academic Synthesis**: Summarizes research findings in accessible language

**This demonstrates:**
- ✅ Access to cutting-edge research
- ✅ Staying current with the latest academic developments
- ✅ Integration of academic preprints (arXiv submissions are not necessarily peer-reviewed)
- ✅ Bridge between academic research and practical applications

**Use cases:**
- Literature reviews
- Staying current with research trends
- Finding state-of-the-art methods
- Academic research support

## Test Scenario 6: Hybrid Multi-Tool Query

**Query**: "How is reinforcement learning currently being applied in algorithmic trading? Include both theoretical foundations from our knowledge base and recent research developments from arXiv."

This is a **complex hybrid query** requiring:
1. **Internal ML knowledge**: RL fundamentals, algorithms
2. **Internal Finance knowledge**: Algorithmic trading concepts
3. **External Research**: Recent arXiv papers on RL in trading
4. **Synthesis**: Combine theoretical foundations with cutting-edge research

Expected behavior:
- Agent uses ML tool for RL theory
- Agent uses Finance tool for trading concepts
- Agent uses arXiv search for recent research
- Comprehensive synthesis of all sources

In [None]:
print("\n\n")
print("="*80)
print("TEST SCENARIO 6: Hybrid Multi-Tool Query (Internal + External)")
print("="*80)

query_6 = (
    "How is reinforcement learning currently being applied in algorithmic trading? "
    "Include both theoretical foundations from our knowledge base and recent research "
    "developments from arXiv."
)
print(f"\n📝 Query: {query_6}")
print("\n" + "="*80)
print("AGENTIC RAG (orchestrating multiple tools):")
print("="*80)

# Clear chat history so earlier scenarios do not influence tool choice
agent.reset()

# Run agent - should use ML, Finance, and arXiv tools
agent_response_6 = agent.chat(query_6)

print("\n" + "="*80)
print("AGENT'S FINAL ANSWER:")
print("="*80)
print(agent_response_6.response)

### Analysis: Scenario 6

**Ultimate Agent Orchestration:**

This scenario demonstrates the **full power of Agentic RAG**:

1. **Multi-Source Integration**:
   - Internal ML knowledge for RL fundamentals
   - Internal Finance knowledge for trading context
   - External arXiv for latest research
   
2. **Intelligent Orchestration**:
   - Agent decides which tools to use and in what order
   - Recognizes when internal knowledge is sufficient vs. when external research is needed
   - Synthesizes information from 3+ different sources
   
3. **Comprehensive Coverage**:
   - Theory + Practice + Research
   - Historical + Current
   - Internal + External

**This is impossible with static RAG:**
- ❌ Static RAG cannot access external sources
- ❌ Cannot distinguish between different types of information needs
- ❌ Cannot orchestrate multiple specialized tools
- ❌ Limited to what's in the vector database

**Agentic RAG excels:**
- ✅ Dynamic tool selection based on query requirements
- ✅ Seamless integration of internal and external sources
- ✅ Comprehensive answers combining multiple perspectives
- ✅ Extensible to any number of specialized tools

## Comparative Analysis: Agent Behavior Across Scenarios

In [None]:
print("="*80)
print("COMPREHENSIVE COMPARATIVE ANALYSIS SUMMARY")
print("="*80)

summary = {
    "Scenario 1: Simple Single-Domain": {
        "Query": "Explain gradient boosting",
        "Tools Used": "ML knowledge only",
        "Static RAG Performance": "Good (simple query)",
        "Advantage": "Minimal - both work well"
    },
    "Scenario 2: Cross-Domain": {
        "Query": "ML for stock market and portfolio management",
        "Tools Used": "ML + Finance tools",
        "Static RAG Performance": "May miss one domain",
        "Advantage": "Agent queries each domain explicitly, so both are more likely covered"
    },
    "Scenario 3: Complex Multi-Hop": {
        "Query": "Compare RL vs traditional diversification (risk-adjusted)",
        "Tools Used": "ML + Finance (multiple queries)",
        "Static RAG Performance": "Likely incomplete coverage",
        "Advantage": "Agent can decompose, retrieve iteratively, synthesize"
    },
    "Scenario 4: External Knowledge": {
        "Query": "Latest developments in GPT-4 vs Claude 3",
        "Tools Used": "Internet search",
        "Static RAG Performance": "Impossible - no access to external info",
        "Advantage": "Agent accesses current information beyond knowledge base"
    },
    "Scenario 5: Academic Research": {
        "Query": "Recent RAG research papers and findings",
        "Tools Used": "arXiv search + fetch",
        "Static RAG Performance": "Impossible - no access to academic databases",
        "Advantage": "Agent integrates cutting-edge research"
    },
    "Scenario 6: Hybrid Multi-Tool": {
        "Query": "RL in algorithmic trading (theory + recent research)",
        "Tools Used": "ML + Finance + arXiv (orchestrated)",
        "Static RAG Performance": "Impossible - limited to internal KB only",
        "Advantage": "Comprehensive synthesis of internal + external sources"
    }
}

for scenario, details in summary.items():
    print(f"\n{scenario}:")
    for key, value in details.items():
        print(f"  {key}: {value}")

print("\n" + "="*80)
print("KEY INSIGHT:")
print("="*80)
print("Scenarios 1-3: Agentic RAG provides better tool selection within internal KB")
print("Scenarios 4-6: Agentic RAG unlocks entirely new capabilities impossible with static RAG")
print("\nThe extensibility to external tools is the true game-changer!")

## Visualizing the Agentic Workflow

Let's trace through the agent's decision-making for the complex query:

In [None]:
print("="*80)
print("AGENTIC RAG WORKFLOW VISUALIZATION")
print("="*80)

workflow = """
Query: "Compare RL vs traditional diversification (risk-adjusted)"
    ↓
┌─────────────────────────────────────────────────────────────┐
│ AGENT REASONING (Iteration 1)                               │
├─────────────────────────────────────────────────────────────┤
│ Thought: "I need to understand reinforcement learning       │
│           in the context of portfolio management"           │
│                                                              │
│ Action: Use machine_learning_knowledge tool                 │
│         Query: "reinforcement learning for portfolio mgmt"  │
└─────────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────────┐
│ OBSERVATION 1                                                │
├─────────────────────────────────────────────────────────────┤
│ Retrieved: RL algorithms (Q-learning, DQN, Policy Gradient) │
│            RL advantages: adaptability, non-linear learning │
│            RL challenges: non-stationarity, overfitting     │
└─────────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────────┐
│ AGENT REASONING (Iteration 2)                               │
├─────────────────────────────────────────────────────────────┤
│ Thought: "Good, now I need information about traditional    │
│           diversification strategies and risk metrics"      │
│                                                              │
│ Action: Use finance_knowledge tool                          │
│         Query: "portfolio diversification and risk metrics" │
└─────────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────────┐
│ OBSERVATION 2                                                │
├─────────────────────────────────────────────────────────────┤
│ Retrieved: Diversification strategies (MPT, risk parity)    │
│            Sharpe Ratio = (Return - RFR) / StdDev           │
│            Maximum Drawdown = largest peak-to-trough decline│
└─────────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────────┐
│ AGENT REASONING (Iteration 3)                               │
├─────────────────────────────────────────────────────────────┤
│ Thought: "I have sufficient information from both domains   │
│           to perform a comprehensive comparison"            │
│                                                              │
│ Action: Generate final answer                               │
└─────────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────────┐
│ FINAL ANSWER                                                 │
├─────────────────────────────────────────────────────────────┤
│ Synthesized comparison covering:                            │
│  • RL approach characteristics                              │
│  • Traditional diversification characteristics              │
│  • Sharpe ratio implications                                │
│  • Maximum drawdown considerations                          │
│  • Comparative analysis                                     │
└─────────────────────────────────────────────────────────────┘
"""

print(workflow)
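
The trace above can be reproduced in miniature with a plain-Python loop. The sketch below is not the LlamaIndex `ReActAgent` implementation: it assumes a hypothetical `scripted_policy` function standing in for the LLM and two stub tools that return canned strings, so only the Thought → Action → Observation control flow is real.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stub tools (hypothetical): each returns a canned observation instead of querying an index.
TOOLS: Dict[str, Callable[[str], str]] = {
    "machine_learning_knowledge": lambda q: "RL algorithms: Q-learning, DQN, Policy Gradient",
    "finance_knowledge": lambda q: "Diversification: MPT, risk parity; Sharpe ratio; max drawdown",
}

# (thought, tool name or None, tool query or None)
Step = Tuple[str, Optional[str], Optional[str]]


def scripted_policy(observations: List[str]) -> Step:
    """Stand-in for the LLM: chooses the next step from how much has been observed."""
    if len(observations) == 0:
        return ("Need RL background", "machine_learning_knowledge",
                "reinforcement learning for portfolio mgmt")
    if len(observations) == 1:
        return ("Need diversification and risk metrics", "finance_knowledge",
                "portfolio diversification and risk metrics")
    return ("Enough information from both domains", None, None)


def run_react(policy: Callable[[List[str]], Step], max_iterations: int = 5) -> Tuple[str, List[str]]:
    """Run Thought -> Action -> Observation until the policy stops or the cap is hit."""
    observations: List[str] = []
    trace: List[str] = []
    for i in range(1, max_iterations + 1):
        thought, tool_name, tool_query = policy(observations)
        trace.append(f"Thought {i}: {thought}")
        if tool_name is None:  # the policy decided it has enough to answer
            return "Synthesis of: " + " | ".join(observations), trace
        trace.append(f"Action {i}: {tool_name}({tool_query!r})")
        observations.append(TOOLS[tool_name](tool_query))
        trace.append(f"Observation {i}: {observations[-1]}")
    return "Stopped: iteration limit reached", trace


answer, trace = run_react(scripted_policy)
print("\n".join(trace))
print(answer)
```

The `max_iterations` cap plays the same role here as in a real agent: it guarantees the loop ends even if the policy never decides it is done.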

## Key Advantages of Agentic RAG

### 1. **Intelligent Tool Selection**
- Agent analyzes query semantics
- Selects appropriate knowledge source(s)
- Can be more precise than embedding-based retrieval alone, because the agent reasons about which source fits the query

### 2. **Multi-Step Reasoning**
- Can break complex queries into sub-tasks
- Retrieves information iteratively
- Builds comprehensive understanding step-by-step

### 3. **Cross-Domain Synthesis**
- Naturally handles queries spanning multiple domains
- Ensures coverage of all relevant aspects
- Better than hoping single retrieval captures everything

### 4. **Adaptability**
- Can adjust strategy based on retrieved information
- Can make follow-up queries if needed
- Dynamic rather than fixed retrieval path

### 5. **Transparency**
- Reasoning process is visible (with `verbose=True`)
- Can see which tools were used and why
- Easier to debug and understand system behavior

### 6. **Extensibility**
- Easy to add new tools (web search, calculators, APIs)
- Agent picks up a new tool from its name and description, with no retraining
- Scales better than monolithic systems
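
To make the extensibility point concrete, here is a minimal sketch of description-driven tool selection. It assumes a keyword-overlap scorer as a crude stand-in for the LLM, which reads the same descriptions when it chooses a tool; the `Tool` class, `select_tool` function, tool descriptions, and the `arxiv_search` name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def select_tool(query: str, tools: List[Tool]) -> Tool:
    """Pick the tool whose description shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        return len(query_words & set(tool.description.lower().split()))

    return max(tools, key=overlap)


tools = [
    Tool("machine_learning_knowledge", "machine learning algorithms models training",
         lambda q: "ML docs"),
    Tool("finance_knowledge", "finance trading portfolio risk markets",
         lambda q: "Finance docs"),
]
print(select_tool("how is portfolio risk measured", tools).name)

# Adding a tool is a single append; the selection logic is untouched.
tools.append(Tool("arxiv_search", "recent academic research papers arxiv",
                  lambda q: "arXiv results"))
print(select_tool("recent research papers on retrieval", tools).name)
```

Because selection depends only on the descriptions, the quality of those descriptions largely determines how well a new tool gets used.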

## When to Use Agentic RAG

### ✅ Use Agentic RAG when:
1. **Multiple Knowledge Sources**: You have distinct, specialized knowledge bases
2. **Complex Queries**: Questions require multi-step reasoning or synthesis
3. **Cross-Domain**: Queries span multiple topics or domains
4. **Dynamic Needs**: Need to adapt retrieval strategy based on query
5. **Interpretability**: Want to understand the reasoning process
6. **Tool Integration**: Need to combine retrieval with other tools (calculators, APIs)

### ⚠️ Consider Static RAG when:
1. **Simple Queries**: Straightforward single-topic questions
2. **Speed Critical**: Need fastest possible response (agent adds latency)
3. **Cost Sensitive**: Agent makes multiple LLM calls (higher cost)
4. **Single Source**: Only one homogeneous knowledge base
5. **Predictable Patterns**: All queries follow similar pattern

### 💡 Hybrid Approach:
- Use static RAG for simple, common queries
- Route complex queries to agentic system
- Best of both worlds!

## Limitations and Considerations

### Challenges with Agentic RAG:

1. **Latency**
   - Multiple LLM calls (reasoning + tool use)
   - Slower than single-pass retrieval
   - May require optimization for production

2. **Cost**
   - More LLM token usage
   - Multiple embedding operations
   - Can be roughly 3-5x more expensive per query, depending on how many iterations the agent runs

3. **Complexity**
   - More moving parts
   - Harder to debug when things go wrong
   - Requires careful prompt engineering

4. **Reliability**
   - Agent might not always choose optimal tool
   - Can get stuck in loops (`max_iterations` needed)
   - More failure modes than static systems

5. **Token Limits**
   - Reasoning traces consume context window
   - May need careful management of conversation history
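
The latency, cost, and token-limit points share one cause: every iteration is another LLM call, and each call resends the trace so far. The sketch below is a back-of-the-envelope token model under that assumption. The function names are hypothetical and all token counts are illustrative, not measurements from this notebook.

```python
def static_rag_tokens(prompt_tokens: int, context_tokens: int, answer_tokens: int) -> int:
    """One LLM call: prompt plus retrieved context in, answer out."""
    return prompt_tokens + context_tokens + answer_tokens


def agentic_rag_tokens(
    prompt_tokens: int,
    context_tokens: int,
    answer_tokens: int,
    tool_calls: int,
    reasoning_tokens: int = 100,
) -> int:
    """One reasoning call per tool use plus a final call; each call resends the trace."""
    total = 0
    trace = 0  # tokens accumulated from earlier thoughts and observations
    for _ in range(tool_calls):
        total += prompt_tokens + trace + reasoning_tokens  # input plus generated thought/action
        trace += reasoning_tokens + context_tokens         # thought and observation are kept
    total += prompt_tokens + trace + answer_tokens          # final synthesis call
    return total


static = static_rag_tokens(prompt_tokens=300, context_tokens=1000, answer_tokens=300)
agentic = agentic_rag_tokens(prompt_tokens=300, context_tokens=1000, answer_tokens=300,
                             tool_calls=2)
print(f"static={static} agentic={agentic} ratio={agentic / static:.2f}")
```

Under this model the cost grows faster than linearly in the number of tool calls, because later calls carry all earlier observations. That is also why long traces put pressure on the context window.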

### Mitigation Strategies:
- Use caching for common queries
- Implement query classification (route simple queries to static RAG)
- Optimize tool descriptions for better selection
- Monitor and limit `max_iterations`
- Implement fallbacks for agent failures
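
Two of these strategies, caching and fallbacks, can be combined in a thin wrapper around the agent call. This is a minimal sketch, assuming exact-match caching is acceptable and that agent failures surface as exceptions. `make_guarded_query`, `flaky_agent`, and `static_rag` are hypothetical names, with the last two standing in for a real agent and a real static query engine.

```python
from functools import lru_cache
from typing import Callable


def make_guarded_query(
    agent_fn: Callable[[str], str],
    fallback_fn: Callable[[str], str],
    cache_size: int = 128,
) -> Callable[[str], str]:
    """Wrap an agent call with an in-memory cache and a static-RAG fallback."""

    @lru_cache(maxsize=cache_size)
    def guarded(query: str) -> str:
        try:
            return agent_fn(query)
        except Exception:
            # Agent failed (for example, it hit its iteration cap): degrade gracefully.
            return fallback_fn(query)

    return guarded


calls = {"agent": 0}


def flaky_agent(query: str) -> str:
    """Stand-in agent that fails on queries containing the word 'loop'."""
    calls["agent"] += 1
    if "loop" in query:
        raise RuntimeError("max iterations reached")
    return f"agent answer for: {query}"


def static_rag(query: str) -> str:
    """Stand-in for a single-pass static query engine."""
    return f"static answer for: {query}"


ask = make_guarded_query(flaky_agent, static_rag)
print(ask("explain gradient boosting"))
print(ask("explain gradient boosting"))  # served from the cache; the agent is not called again
print(ask("query that makes the agent loop"))
print("agent calls:", calls["agent"])
```

Note that this wrapper also caches fallback answers, and that `lru_cache` matches only identical query strings. A production cache would likely normalize queries and expire entries that depend on external sources.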

## Extensions and Advanced Patterns

### 1. **Add More Tools**
```python
# Web search tool
web_search_tool = QueryEngineTool(...)

# Calculator tool for numerical operations
calculator_tool = FunctionTool.from_defaults(...)

# External API tool
api_tool = QueryEngineTool(...)

agent = ReActAgent.from_tools(
    tools=[ml_tool, finance_tool, web_search_tool, calculator_tool, api_tool],
    llm=azure_llm
)
```

### 2. **Sub-Agents Pattern**
Create specialized agents for each domain:
```python
ml_agent = ReActAgent.from_tools([ml_tool_1, ml_tool_2], ...)
finance_agent = ReActAgent.from_tools([finance_tool_1, finance_tool_2], ...)

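# Wrap each sub-agent as a tool so the meta-agent can call it
# (illustrative names and descriptions; assumes QueryEngineTool is imported)
ml_agent_tool = QueryEngineTool.from_defaults(
    query_engine=ml_agent, name="ml_agent", description="Answers machine learning questions"
)
finance_agent_tool = QueryEngineTool.from_defaults(
    query_engine=finance_agent, name="finance_agent", description="Answers finance questions"
)

# Meta-agent that delegates to sub-agents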
meta_agent = ReActAgent.from_tools(
    tools=[ml_agent_tool, finance_agent_tool],
    llm=azure_llm
)
```

### 3. **Memory-Augmented Agents**
```python
# Add conversation memory
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
agent = ReActAgent.from_tools(
    tools=[ml_tool, finance_tool],
    llm=azure_llm,
    memory=memory  # Maintains context across queries
)
```

### 4. **Custom Tool Creation**
```python
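from statistics import mean, stdev
from typing import List

from llama_index.core.tools import FunctionTool

def calculate_sharpe_ratio(returns: List[float], risk_free_rate: float) -> float:
    """Calculate Sharpe Ratio: (mean return - risk-free rate) / std dev of returns."""
    # Assumes per-period returns and a per-period risk-free rate on the same scale
    volatility = stdev(returns)  # sample standard deviation; needs at least two returns
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return (mean(returns) - risk_free_rate) / volatility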

sharpe_tool = FunctionTool.from_defaults(fn=calculate_sharpe_ratio)
```

### 5. **Hybrid Routing**
```python
def smart_query(query: str):
    # Classify query complexity
    if is_simple(query):
        return static_query_engine.query(query)
    else:
        return agent.chat(query)
```
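
The routing function above leaves `is_simple` undefined. One possible heuristic, sketched here as an assumption rather than the notebook's method, treats short queries without comparison or recency keywords as simple. The `COMPLEX_MARKERS` set and the `max_words` threshold are hypothetical and would need tuning, or replacing with a trained classifier or an LLM call.

```python
import re

# Hypothetical markers of multi-hop or external-information queries.
COMPLEX_MARKERS = {"compare", "versus", "vs", "latest", "recent"}


def is_simple(query: str, max_words: int = 8) -> bool:
    """Heuristic: short queries with no comparison or recency markers count as simple."""
    words = re.findall(r"[a-z0-9\-]+", query.lower())
    if len(words) > max_words:
        return False
    return not any(word in COMPLEX_MARKERS for word in words)


for q in [
    "Explain gradient boosting",
    "Compare RL vs traditional diversification (risk-adjusted)",
    "Latest developments in GPT-4 vs Claude 3",
]:
    print(f"{q!r} -> {'static' if is_simple(q) else 'agent'}")
```

A keyword heuristic is cheap and adds no LLM call, which matters because the point of routing is to avoid agent overhead on easy queries. Its weakness is that misrouted complex queries silently get a weaker answer.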

## Conclusion

### Key Takeaways

1. **Agentic RAG represents a paradigm shift** from passive retrieval to active reasoning

2. **The ReAct pattern** (Reasoning + Acting) enables systematic multi-step information gathering

3. **Tool-based architecture** makes systems modular and extensible to **any information source**

4. **Agent's planning capability** ensures comprehensive coverage of complex queries

5. **External tool integration** (internet search, arXiv, APIs) unlocks capabilities impossible with static RAG

6. **Trade-offs exist**: Higher latency and cost vs. better handling of complex queries and access to external information

7. **Best practice**: Use a hybrid approach, with static RAG for simple queries and agentic RAG for complex queries and external information needs

### When Agentic RAG Shines
- ✅ Cross-domain questions requiring multiple knowledge bases
- ✅ Multi-hop reasoning required
- ✅ Multiple specialized knowledge sources
- ✅ Need for interpretable reasoning
- ✅ Integration with external tools/APIs
- ✅ **Current information beyond knowledge base cutoff**
- ✅ **Academic research and literature reviews**
- ✅ **Hybrid internal-external information synthesis**

### Tools Demonstrated in This Notebook

**Internal Knowledge:**
1. Machine Learning concepts
2. Finance and trading knowledge

**External Knowledge:**
3. Internet search (DuckDuckGo) - current information
4. arXiv search - academic papers
5. arXiv fetch - specific paper details

**Extensible to:**
- Any API (weather, stock prices, news, etc.)
- Databases and data warehouses
- Custom calculators and processors
- Other AI models and services

### The Future of RAG

As LLM reasoning capabilities improve and costs decrease, **agentic approaches will become increasingly practical**. The ability to:
- Plan retrieval strategies dynamically
- Adapt to retrieved information
- Synthesize from multiple sources (internal + external)
- Integrate diverse tools seamlessly
- Access current information and research

...represents the next evolution of RAG systems.

### Impact of External Tools

The addition of internet search and arXiv integration demonstrates that **Agentic RAG is not limited by knowledge base boundaries**:

- **Static RAG**: Constrained to pre-indexed documents
- **Agentic RAG**: Can reach live web search, academic databases, APIs, and more

This fundamentally changes what's possible:
- Less need to re-index internal knowledge bases for every new development
- Access to current information at query time
- Can verify facts against multiple sources
- Bridge internal knowledge with external research

### Further Exploration
- Experiment with different agent types (OpenAI Function Calling Agent, Structured Planning Agent)
- Add custom tools for your domain (calculators, databases, proprietary APIs)
- Integrate more external sources (Google Scholar, Wikipedia, news APIs)
- Implement agent memory for multi-turn conversations
- Explore sub-agent hierarchies for very complex domains
- Build evaluation frameworks to measure agent decision quality
- Add authentication for accessing private/proprietary tools
- Implement caching strategies for expensive external calls
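
As a starting point for the evaluation item above, one simple measure of decision quality is whether the agent called the tools a human would expect. The sketch below assumes hand-labelled expected tool sets. The `cases` data and the `arxiv_search` tool name are hypothetical, and in practice the `used` set would come from the agent's logged tool calls.

```python
from typing import Dict, Set


def tool_selection_scores(expected: Set[str], used: Set[str]) -> Dict[str, float]:
    """Precision and recall of the tools an agent called against a labelled expected set."""
    hits = len(expected & used)
    precision = hits / len(used) if used else 0.0
    recall = hits / len(expected) if expected else 1.0
    return {"precision": precision, "recall": recall}


# Hypothetical labelled cases, not results from this notebook.
cases = [
    {
        "query": "Explain gradient boosting",
        "expected": {"machine_learning_knowledge"},
        "used": {"machine_learning_knowledge"},
    },
    {
        "query": "RL in algorithmic trading (theory + recent research)",
        "expected": {"machine_learning_knowledge", "finance_knowledge", "arxiv_search"},
        "used": {"machine_learning_knowledge", "arxiv_search"},
    },
]

for case in cases:
    scores = tool_selection_scores(case["expected"], case["used"])
    print(f"{case['query']}: precision={scores['precision']:.2f} recall={scores['recall']:.2f}")
```

Low recall flags queries where the agent skipped a relevant source, while low precision flags unnecessary tool calls that add latency and cost. Neither measures answer quality, which needs a separate check.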