<a href="https://colab.research.google.com/github/shrikantvarma/AgenticAI/blob/main/Adaptive_RAG_final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Adaptive/Hybrid RAG: Traditional + Agentic RAG

**A Practical Comparison with ChromaDB and OpenAI**

This notebook explores **Adaptive/Hybrid Retrieval-Augmented Generation (RAG)**, demonstrating how it can intelligently combine the strengths of **Traditional RAG** and **Agentic RAG** approaches. By routing queries based on their complexity, Adaptive RAG aims to optimize the balance between **latency** (speed) and **quality** (accuracy and depth of analysis) for different types of user questions.



We use an **Admin trigger troubleshooting use case** to illustrate these concepts, leveraging:
-   **ChromaDB**: A vector database for efficient storage and retrieval of troubleshooting knowledge.
-   **OpenAI GPT-4o-mini**: The language model that generates responses and carries out the diagnostic steps.

The scenario is as follows:
1. A trigger in the system has failed.
2. A troubleshooting guide explains how to diagnose trigger problems.
3. The trigger's settings and execution logs are available to the LLM.

---

## Understanding the RAG Approaches

-   **Traditional RAG**:
    -   **Process**: Simple, single-step approach. Retrieves relevant documents from a vector database and uses an LLM to generate an answer based *only* on the retrieved context.
    -   **Characteristics**: Generally **fast** and **low cost** due to minimal LLM interaction. Provides **generic answers** without analyzing specific system states.

-   **Agentic RAG**:
    -   **Process**: A multi-step reasoning process that involves multiple LLM calls. It retrieves relevant information from the troubleshooting guides *and* analyzes provided state data (like trigger settings and logs) to form a diagnosis. The LLM acts as an "agent" to plan and execute checks.
    -   **Characteristics**: **Thorough**, and provides a **specific solution**. However, it is **slower** and **more costly** because of the additional steps and LLM calls.

-   **Adaptive RAG**:
    -   **Process**: Introduces an intelligent router that assesses the **complexity** of the incoming user query. Simple queries are directed to the faster Traditional RAG path, while complex queries requiring deeper analysis are sent to the more thorough Agentic RAG path.
    -   **Characteristics**: Aims to achieve the **best balance** between speed and quality by using the most appropriate method for each query type.

---

## Demonstration and Comparison

The notebook demonstrates these approaches by:
1.  Setting up a knowledge base in ChromaDB and simulating trigger state data.
2.  Implementing and running both Traditional and Agentic RAG methods on a troubleshooting query.
3.  Comparing their performance metrics (Latency, LLM Calls) and output quality.
4.  Implementing a simple Hybrid RAG router and testing how it routes different types of queries.

---

## Key Takeaways

-   No single RAG method is ideal for all situations; there's an inherent **latency vs quality tradeoff**.
-   Traditional RAG is effective for quick information retrieval (simple queries).
-   Agentic RAG is powerful for complex problem-solving requiring state analysis.
-   **Hybrid (Adaptive) RAG** provides a practical answer to this tradeoff by choosing an approach per query.
-   Effective **query complexity assessment** is fundamental to a successful Hybrid RAG implementation.

## Comparison Metrics and Summary

The demonstration highlights the differences between the RAG approaches based on the following metrics observed during the runs:

-   **Quality**: The relevance, specificity, and depth of the generated answer.
-   **Latency**: The time taken to generate a response.
-   **LLM Calls**: The number of times the language model is invoked.
-   **Cost**: An estimate based on the number of LLM calls and assumed token usage.
-   **Complexity Handled**: The type of queries each method is best suited for.
-   **State Analysis**: Whether the method incorporates analysis of the provided system state.
-   **Steps**: The number and nature of the steps involved in generating a response.
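
As a rough illustration of how a per-query cost estimate could be derived from the number of LLM calls, the sketch below multiplies the call count by an assumed per-call token budget. The token counts and the per-million-token prices are assumptions chosen for illustration, not values stated in this notebook; check current OpenAI pricing before relying on them.

```python
# All four constants are assumptions for illustration, not measured values.
ASSUMED_INPUT_TOKENS = 500       # hypothetical prompt size per call
ASSUMED_OUTPUT_TOKENS = 200      # hypothetical completion size per call
ASSUMED_PRICE_IN_PER_M = 0.15    # assumed USD per 1M input tokens
ASSUMED_PRICE_OUT_PER_M = 0.60   # assumed USD per 1M output tokens


def estimate_cost_cents(llm_calls: int) -> float:
    """Estimate the cost in US cents for a given number of LLM calls."""
    per_call_usd = (
        ASSUMED_INPUT_TOKENS * ASSUMED_PRICE_IN_PER_M
        + ASSUMED_OUTPUT_TOKENS * ASSUMED_PRICE_OUT_PER_M
    ) / 1_000_000
    return llm_calls * per_call_usd * 100


for name, calls in [("Traditional RAG", 1), ("Agentic RAG", 6)]:
    print(f"{name}: {calls} call(s) -> ~{estimate_cost_cents(calls):.2f} cents")
```

Because every call is given the same budget here, the estimate scales linearly with the call count. Real prompts differ in size (the synthesis step is longer than a one-sentence check), so treat the result as an order-of-magnitude figure.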

Here is a summary table comparing the approaches with metrics from the runs:

| Characteristic     | Traditional RAG                 | Agentic RAG                     | Adaptive RAG                      |
| :----------------- | :------------------------------ | :------------------------------ | :------------------------------ |
| **Quality**        | Generic answers                 | Specific diagnosis              | Varies based on routing         |
| **Latency**        | 3766 ms                        | 14316 ms                       | 8721 ms (average)              |
| **LLM Calls**      | 1                               | 6                               | ~3.5 (average)                  |
| **Estimated Cost** | 0.02 cents                         | 0.12 cents              | 0.07 cents                          |
| **Complexity Handled** | Simple queries                  | Complex queries                 | Routes based on query complexity |
| **State Analysis** | No                              | Yes                             | Yes (when routed to Agentic)    |
| **Steps**          | Single retrieval and generation | Multi-step reasoning and checks | Routes to appropriate method    |
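
The Adaptive RAG column is an average over whichever path each query takes, so it depends on the share of queries routed to each method. A minimal sketch of that weighted average follows; `expected_metric` and its `simple_fraction` parameter (the share of queries sent to Traditional RAG) are hypothetical names for this example.

```python
def expected_metric(simple_fraction: float, traditional: float, agentic: float) -> float:
    """Weighted average of a per-query metric across the two routing paths."""
    if not 0.0 <= simple_fraction <= 1.0:
        raise ValueError("simple_fraction must be between 0 and 1")
    return simple_fraction * traditional + (1.0 - simple_fraction) * agentic


# LLM calls per query: 1 for Traditional RAG, 6 for Agentic RAG
even_split = expected_metric(0.5, traditional=1, agentic=6)
mostly_complex = expected_metric(1 / 3, traditional=1, agentic=6)

print(f"50% simple queries: {even_split:.1f} LLM calls on average")
print(f"33% simple queries: {mostly_complex:.1f} LLM calls on average")
```

An even split gives 3.5 calls per query, while a workload dominated by complex queries pulls the average toward the Agentic figure. The same function applies to latency or cost.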

In [None]:
# ============================================================================
# CELL 1: Install Dependencies
# ============================================================================

!pip install chromadb openai python-dotenv -q

print("✓ Dependencies installed")

✓ Dependencies installed


In [None]:
# ============================================================================
# CELL 2: Setup and Imports
# ============================================================================

import os
import json
import time
from typing import Dict, List, Any
import chromadb
from chromadb.config import Settings
from openai import OpenAI

from google.colab import userdata
openai_api_key = userdata.get('OPENAI_API_KEY').strip()
client = OpenAI(api_key=openai_api_key)


print("✓ Imports loaded")
print("✓ OpenAI client initialized")

✓ Imports loaded
✓ OpenAI client initialized


In [None]:
# ============================================================================
# CELL 3: Create Knowledge Base
# ============================================================================

# Troubleshooting guide as a single string
troubleshooting_guide_text = """Trigger Disabled Issue: If trigger is not firing, first check if the trigger is enabled.
Go to Settings > Automation > Triggers and look for the toggle switch. It should be ON/green.
Disabled triggers never fire regardless of conditions.

Condition Matching: Triggers only fire when conditions match ticket data.
With ALL logic, every condition must be true. With ANY logic, at least one condition must be true.
Compare your trigger conditions against the actual ticket field values carefully.

Logic Types Explained: ALL logic means every condition must match (AND).
For example, status=new AND priority=high means both must be true.
ANY logic means at least one condition must match (OR).
Common mistake: using ALL when you meant ANY.

Execution Logs: Check trigger logs to see which tickets were evaluated,
whether the trigger fired, and the specific reason it didn't fire.
Logs are found in Settings > Automation > Logs and show execution history with error details.
"""

# Split the guide into individual documents based on paragraphs
troubleshooting_docs_content = troubleshooting_guide_text.strip().split('\n\n')

# Create a list of document dictionaries with IDs and metadata
troubleshooting_docs = [
    {
        "id": f"doc_{i+1}",
        "content": content,
        "metadata": {"category": "troubleshooting"} # Using a generic category for now
    }
    for i, content in enumerate(troubleshooting_docs_content)
]


# Trigger state (actual settings and logs)
trigger_state = {
    "trigger_settings": {
        "id": "trigger_001",
        "name": "Auto-assign high priority tickets",
        "enabled": True,
        "conditions": [
            {"field": "status", "operator": "equals", "value": "new"},
            {"field": "priority", "operator": "equals", "value": "high"}
        ],
        "logic": "ALL",
        "actions": ["assign_to_team_a"]
    },
    "execution_logs": [
        {
            "ticket_id": "TKT_123",
            "timestamp": "2025-01-15T10:30:00Z",
            "fired": False,
            "reason": "Condition mismatch: priority is 'medium', expected 'high'"
        },
        {
            "ticket_id": "TKT_124",
            "timestamp": "2025-01-15T11:15:00Z",
            "fired": True,
            "actions_executed": ["Assigned to Team A"]
        },
        {
            "ticket_id": "TKT_125",
            "timestamp": "2025-01-15T14:22:00Z",
            "fired": False,
            "reason": "Condition mismatch: status is 'open', expected 'new'"
        }
    ],
    "recent_tickets": [
        {"id": "TKT_123", "status": "new", "priority": "medium"},
        {"id": "TKT_124", "status": "new", "priority": "high"},
        {"id": "TKT_125", "status": "open", "priority": "high"}
    ]
}

print("✓ Knowledge base created")
print(f"  - {len(troubleshooting_docs)} troubleshooting documents")
print(f"  - Trigger state with {len(trigger_state['execution_logs'])} logs")

✓ Knowledge base created
  - 4 troubleshooting documents
  - Trigger state with 3 logs
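
The ALL/ANY rules described in the guide can also be checked deterministically, without an LLM. The sketch below is not part of the notebook's pipeline: `evaluate_trigger` is a hypothetical helper that supports only the `equals` operator used in the sample data, and the conditions and tickets are copied from `trigger_state`.

```python
from typing import Dict, List


def evaluate_trigger(ticket: Dict[str, str], conditions: List[Dict[str, str]], logic: str) -> bool:
    """Return True if the ticket satisfies the conditions under ALL (AND) or ANY (OR) logic."""
    results = []
    for cond in conditions:
        if cond["operator"] != "equals":
            raise ValueError(f"Unsupported operator: {cond['operator']}")
        results.append(ticket.get(cond["field"]) == cond["value"])
    if logic == "ALL":
        return all(results)
    if logic == "ANY":
        return any(results)
    raise ValueError(f"Unknown logic type: {logic}")


conditions = [
    {"field": "status", "operator": "equals", "value": "new"},
    {"field": "priority", "operator": "equals", "value": "high"},
]
tickets = [
    {"id": "TKT_123", "status": "new", "priority": "medium"},
    {"id": "TKT_124", "status": "new", "priority": "high"},
    {"id": "TKT_125", "status": "open", "priority": "high"},
]

for ticket in tickets:
    fires_all = evaluate_trigger(ticket, conditions, "ALL")
    fires_any = evaluate_trigger(ticket, conditions, "ANY")
    print(f"{ticket['id']}: ALL={fires_all}, ANY={fires_any}")
```

Under ALL logic only TKT_124 matches, which agrees with the `fired` values in the execution logs. Under ANY logic all three tickets would match, which is the "ALL when you meant ANY" mistake the guide warns about.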


In [None]:
# ============================================================================
# CELL 4: Setup ChromaDB Vector Database
# ============================================================================

# Initialize ChromaDB client
chroma_client = chromadb.Client(Settings(
    anonymized_telemetry=False,
    allow_reset=True
))

# Reset to start fresh
chroma_client.reset()

# Create collection
collection = chroma_client.create_collection(
    name="trigger_troubleshooting",
    metadata={"description": "Admin trigger troubleshooting knowledge base"}
)

# Add documents to collection
collection.add(
    documents=[doc["content"] for doc in troubleshooting_docs],
    ids=[doc["id"] for doc in troubleshooting_docs],
    metadatas=[doc["metadata"] for doc in troubleshooting_docs]
)

print("✓ ChromaDB initialized")
print(f"  - Collection: {collection.name}")
print(f"  - Documents: {collection.count()}")

✓ ChromaDB initialized
  - Collection: trigger_troubleshooting
  - Documents: 4


In [None]:
# ============================================================================
# CELL 5: Helper Functions for LLM and Retrieval
# ============================================================================

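def call_llm(prompt: str, model: str = "gpt-4o-mini", max_tokens: int = 500) -> str:
    """Send a single-turn prompt to the OpenAI Chat Completions API and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3  # low temperature for more consistent diagnostic output
    )
    # message.content can be None, so normalize to a string to honor the return type
    return response.choices[0].message.content or ""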

def retrieve_docs(query: str, n_results: int = 2) -> List[Dict]:
    """Retrieve relevant documents from ChromaDB"""
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )

    docs = []
    for i in range(len(results['ids'][0])):
        docs.append({
            'id': results['ids'][0][i],
            'content': results['documents'][0][i],
            'metadata': results['metadatas'][0][i],
            'distance': results['distances'][0][i] if 'distances' in results else None
        })
    return docs

print("✓ Helper functions ready")

✓ Helper functions ready
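
`collection.query` accepts a list of query texts and returns one list of hits per query, which is why `retrieve_docs` indexes `[0]` throughout. The sketch below uses a hand-built dictionary with made-up IDs and distances to show that shape. `flatten_hits` and its `max_distance` filter are hypothetical and are not used elsewhere in the notebook.

```python
from typing import Any, Dict, List, Optional


def flatten_hits(
    results: Dict[str, Any],
    query_index: int = 0,
    max_distance: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Convert a Chroma-style query result into a flat list of hits for one query."""
    distances = results.get("distances")
    hits = []
    for i, doc_id in enumerate(results["ids"][query_index]):
        distance = distances[query_index][i] if distances else None
        if max_distance is not None and distance is not None and distance > max_distance:
            continue  # skip hits that are farther away than the cutoff
        hits.append({
            "id": doc_id,
            "content": results["documents"][query_index][i],
            "distance": distance,
        })
    return hits


# Mock result for a single query text; all values are illustrative only
mock_results = {
    "ids": [["doc_1", "doc_4"]],
    "documents": [["first document text", "second document text"]],
    "distances": [[0.42, 0.97]],
}

print(flatten_hits(mock_results))
print(flatten_hits(mock_results, max_distance=0.5))
```

A distance cutoff is one way to avoid passing weakly related paragraphs to the LLM, but a sensible value depends on the embedding model and distance metric, so it would need tuning.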


In [None]:
# ============================================================================
# CELL 6: Traditional RAG Implementation
# ============================================================================

class TraditionalRAG:
    """
    Traditional RAG: Single retrieval + single generation
    - Retrieve relevant docs from vector DB
    - Generate answer in one LLM call
    - Fast but generic (no state analysis)
    """

    def answer(self, query: str) -> Dict:
        start_time = time.time()
        steps = []

        # Step 1: Retrieve relevant docs
        docs = retrieve_docs(query, n_results=2)
        steps.append(f"Retrieved {len(docs)} documents from ChromaDB")

        # Step 2: Generate answer
        context = "\n\n".join([doc['content'] for doc in docs])
        prompt = f"""You are a helpful admin assistant. Answer the user's question based on this troubleshooting guide.

Question: {query}

Troubleshooting Guide:
{context}

Provide a helpful answer:"""

        answer = call_llm(prompt)
        steps.append("Generated answer with LLM")

        elapsed = time.time() - start_time

        return {
            "answer": answer,
            "method": "Traditional RAG",
            "retrieval_calls": 1,
            "llm_calls": 1,
            "docs_retrieved": len(docs),
            "latency_ms": round(elapsed * 1000, 2),
            "steps": steps,
            "retrieved_docs": docs
        }

trad_rag = TraditionalRAG()
print("✓ Traditional RAG ready")

✓ Traditional RAG ready
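
The classes in this notebook measure latency with `time.time()`. For interval timing, `time.perf_counter()` is generally a better fit because it is monotonic and high resolution. A small sketch of a reusable timer follows; `timed` is a hypothetical helper name, and the `sleep` call stands in for retrieval plus an LLM call.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(metrics: Dict[str, float], key: str = "latency_ms") -> Iterator[None]:
    """Store the elapsed time of the with-block in metrics[key], in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed calls are still measured
        metrics[key] = round((time.perf_counter() - start) * 1000, 2)


metrics: Dict[str, float] = {}
with timed(metrics):
    time.sleep(0.05)  # stand-in for retrieval plus an LLM call

print(metrics)
```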


In [None]:
# ============================================================================
# CELL 7: Agentic RAG Implementation
# ============================================================================

class AgenticRAG:
    """
    Agentic RAG: Multi-step reasoning with state analysis
    - Retrieve troubleshooting guide
    - Create diagnostic plan (LLM call #1)
    - Execute state checks (LLM calls #2-5)
    - Synthesize diagnosis (LLM call #6)
    - Slower but provides specific diagnosis
    """

    def answer(self, query: str, state: Dict = None) -> Dict:
        start_time = time.time()
        steps = []
        llm_calls = 0
        retrieval_calls = 0

        if state is None:
            state = trigger_state

        # Step 1: Retrieve troubleshooting docs
        docs = retrieve_docs(query, n_results=3)
        steps.append(f"Retrieved {len(docs)} documents from ChromaDB")
        retrieval_calls += 1

        context = "\n\n".join([doc['content'] for doc in docs])

        # Step 2: Create diagnostic plan (LLM call #1)
        plan_prompt = f"""Based on the user's question and troubleshooting guide, what specific checks should we run?

Question: {query}

Guide:
{context}

Return ONLY a JSON array of 4 specific checks to run against the trigger state.
Example format: ["check one", "check two", "check three", "check four"]
DO NOT include any other text or formatting outside the JSON array."""

        plan_response = call_llm(plan_prompt, max_tokens=200)
        llm_calls += 1
        steps.append("Created diagnostic plan")

        try:
            diagnostic_plan = json.loads(plan_response)
        except (json.JSONDecodeError, TypeError) as e:
            print(f"Error: Diagnostic plan could not be parsed. Response was: {plan_response}")
            print(f"Parsing error: {e}")
            # Fallback plan if parsing fails
            diagnostic_plan = ["Check enabled status", "Analyze execution logs", "Compare conditions with tickets", "Verify logic type"]
            steps.append("Using fallback diagnostic plan due to parsing error")

        # Step 3: Execute the diagnostic checks
        # Note: the four checks below are fixed; diagnostic_plan is not used to select them
        findings = []

        # Check 1: Is enabled?
        check_prompt = f"""Check if trigger is enabled.

Trigger settings: {json.dumps(state['trigger_settings'], indent=2)}

Answer in one sentence: Is the trigger enabled?"""
        finding = call_llm(check_prompt, max_tokens=50)
        findings.append(f"Enabled status: {finding}")
        llm_calls += 1
        steps.append("Check #1: Verified enabled status")

        # Check 2: Analyze logs
        logs_prompt = f"""Analyze these execution logs.

Logs: {json.dumps(state['execution_logs'], indent=2)}

Answer in 2-3 sentences: What do the logs show about trigger firing?"""
        finding = call_llm(logs_prompt, max_tokens=100)
        findings.append(f"Log analysis: {finding}")
        llm_calls += 1
        steps.append("Check #2: Analyzed execution logs")

        # Check 3: Compare conditions
        conditions_prompt = f"""Compare trigger conditions against actual tickets.

Conditions: {json.dumps(state['trigger_settings']['conditions'], indent=2)}
Recent tickets: {json.dumps(state['recent_tickets'], indent=2)}

Answer in 2-3 sentences: Which tickets match the conditions?"""
        finding = call_llm(conditions_prompt, max_tokens=150)
        findings.append(f"Condition matching: {finding}")
        llm_calls += 1
        steps.append("Check #3: Compared conditions vs tickets")

        # Check 4: Verify logic type
        logic_prompt = f"""Explain the logic type.

Logic type: {state['trigger_settings']['logic']}
Conditions: {json.dumps(state['trigger_settings']['conditions'], indent=2)}

Answer in 1-2 sentences: What does this logic type mean?"""
        finding = call_llm(logic_prompt, max_tokens=100)
        findings.append(f"Logic type: {finding}")
        llm_calls += 1
        steps.append("Check #4: Verified logic type")


        # Step 4: Synthesize final diagnosis (LLM call #6)
        synthesis_prompt = f"""Based on all the findings, provide a specific diagnosis.

User question: {query}

Findings:
{chr(10).join(f'{i+1}. {f}' for i, f in enumerate(findings))}

Provide a specific, actionable answer that:
1. Explains if the trigger is working correctly or not
2. Gives evidence from the actual state
3. Explains why certain tickets didn't fire
4. Suggests concrete next steps if needed"""

        answer = call_llm(synthesis_prompt, max_tokens=600)
        llm_calls += 1
        steps.append("Synthesized final diagnosis")

        elapsed = time.time() - start_time

        return {
            "answer": answer,
            "method": "Agentic RAG",
            "retrieval_calls": retrieval_calls,
            "llm_calls": llm_calls,
            "docs_retrieved": len(docs),
            "latency_ms": round(elapsed * 1000, 2),
            "steps": steps,
            "findings": findings,
            "retrieved_docs": docs
        }

agentic_rag = AgenticRAG()
print("✓ Agentic RAG ready")

✓ Agentic RAG ready
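
Models sometimes wrap a JSON array in Markdown code fences or surrounding prose even when told not to, which makes a bare `json.loads` fail. One possible mitigation is to extract the first bracketed array before parsing. `parse_plan` and `FALLBACK_PLAN` are hypothetical names for this sketch, which is not wired into `AgenticRAG`.

```python
import json
import re
from typing import List

FALLBACK_PLAN = [
    "Check enabled status",
    "Analyze execution logs",
    "Compare conditions with tickets",
    "Verify logic type",
]


def parse_plan(response: str) -> List[str]:
    """Extract a JSON array of strings from an LLM response, or return the fallback plan."""
    # Greedy match from the first '[' to the last ']' in the response
    match = re.search(r"\[.*\]", response or "", flags=re.DOTALL)
    if match:
        try:
            plan = json.loads(match.group(0))
        except json.JSONDecodeError:
            plan = None
        if isinstance(plan, list) and plan and all(isinstance(item, str) for item in plan):
            return plan
    return list(FALLBACK_PLAN)


fence = "`" * 3  # built indirectly to avoid a literal fence inside this example
wrapped = f'{fence}json\n["check enabled", "read logs"]\n{fence}'

print(parse_plan(wrapped))
print(parse_plan("Sorry, I cannot produce a plan."))
```

The first call recovers the two-item plan despite the fences; the second has no array and falls back to the fixed plan.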


In [None]:
# ============================================================================
# CELL 8: Hybrid RAG Router
# ============================================================================

class HybridRAG:
    """
    Hybrid RAG: Intelligent routing
    - Simple queries → Traditional RAG
    - Complex queries → Agentic RAG
    """

    def __init__(self, complexity_threshold: float = 0.5):
        self.traditional = TraditionalRAG()
        self.agentic = AgenticRAG()
        self.threshold = complexity_threshold

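    def assess_complexity(self, query: str) -> float:
        """Assess query complexity with keyword heuristics (0.0 = simple, 1.0 = complex).

        Matching is by plain substring, so phrasing matters: "isn't firing" does
        not contain "not firing" and falls through to the default score of 0.5.
        """
        query_lower = query.lower()

        # Simple lookup-style queries (checked first, so they take precedence)
        if any(word in query_lower for word in ["where", "what is", "how to find"]):
            return 0.3

        # Complex diagnostic queries
        if any(word in query_lower for word in ["why", "not working", "not firing", "issue"]):
            return 0.8

        # No keyword matched: with the default threshold of 0.5, answer() sends
        # this score to Agentic RAG because only scores below the threshold go to Traditional RAG
        return 0.5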

    def answer(self, query: str, verbose: bool = True) -> Dict:
        complexity = self.assess_complexity(query)

        if verbose:
            print(f"\n{'='*70}")
            print(f"Query: {query}")
            print(f"{'='*70}")
            print(f"Complexity: {complexity:.2f} (threshold: {self.threshold})")

        if complexity < self.threshold:
            if verbose:
                print("→ Routing to: Traditional RAG (simple query)\n")
            result = self.traditional.answer(query)
        else:
            if verbose:
                print("→ Routing to: Agentic RAG (complex query)\n")
            result = self.agentic.answer(query)

        result['complexity'] = complexity
        return result

hybrid = HybridRAG(complexity_threshold=0.5)
print("✓ Hybrid RAG router ready")

✓ Hybrid RAG router ready
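
The keyword list in `assess_complexity` uses substring matching, so a query such as "My trigger isn't firing, what's wrong?" matches nothing and receives the default score. The sketch below shows one possible alternative using regular expressions that tolerate contractions. The patterns and the choice to check failure keywords before lookup keywords are assumptions made for this example, not the notebook's method.

```python
import re

SIMPLE_PATTERNS = [r"\bwhere\b", r"\bwhat is\b", r"\bhow to find\b"]
COMPLEX_PATTERNS = [
    r"\bwhy\b",
    r"\bissue\b",
    r"(?:\bnot|n't)\s+(?:working|firing)\b",  # "not firing", "isn't firing", "won't working"
]


def assess_complexity_regex(query: str) -> float:
    """Score a query: 0.8 for failure or diagnosis wording, 0.3 for lookups, else 0.5."""
    text = query.lower().replace("\u2019", "'")  # normalize curly apostrophes
    # Failure wording is checked first so "where ... not working" is treated as complex
    if any(re.search(pattern, text) for pattern in COMPLEX_PATTERNS):
        return 0.8
    if any(re.search(pattern, text) for pattern in SIMPLE_PATTERNS):
        return 0.3
    return 0.5


for q in [
    "Where can I find triggers?",
    "My trigger isn't firing, what's wrong?",
    "Why did trigger fire for some tickets but not others?",
]:
    print(f"{assess_complexity_regex(q):.1f}  {q}")
```

Keyword rules remain brittle however they are written. A small classifier or a cheap LLM call could replace them, at the price of extra latency on every query.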


In [None]:
# ============================================================================
# CELL 9: Run Comparison Demo
# ============================================================================

def compare_methods(query: str):
    """Compare Traditional vs Agentic RAG"""

    print("\n" + "="*70)
    print("COMPARISON: Traditional RAG vs Agentic RAG")
    print("="*70)
    print(f"Query: {query}\n")

    # Traditional
    print("─"*70)
    print("METHOD 1: Traditional RAG")
    print("─"*70)
    trad_result = trad_rag.answer(query)

    print(f"\nSteps:")
    for i, step in enumerate(trad_result['steps'], 1):
        print(f"  {i}. {step}")

    print(f"\nAnswer:\n{trad_result['answer']}")
    print(f"\n📊 Performance:")
    print(f"   Retrieval calls: {trad_result['retrieval_calls']}")
    print(f"   LLM calls: {trad_result['llm_calls']}")
    print(f"   Latency: {trad_result['latency_ms']}ms")

    # Agentic
    print("\n" + "─"*70)
    print("METHOD 2: Agentic RAG")
    print("─"*70)
    agentic_result = agentic_rag.answer(query)

    print(f"\nSteps:")
    for i, step in enumerate(agentic_result['steps'], 1):
        print(f"  {i}. {step}")

    print(f"\nAnswer:\n{agentic_result['answer']}")
    print(f"\n📊 Performance:")
    print(f"   Retrieval calls: {agentic_result['retrieval_calls']}")
    print(f"   LLM calls: {agentic_result['llm_calls']}")
    print(f"   Latency: {agentic_result['latency_ms']}ms")

    # Analysis
    print("\n" + "="*70)
    print("TRADEOFF ANALYSIS")
    print("="*70)
    speedup = agentic_result['latency_ms'] / trad_result['latency_ms']
    call_ratio = agentic_result['llm_calls'] / trad_result['llm_calls']
    print(f"⚡ Speed: Traditional is {speedup:.1f}x faster")
    print(f"🤖 LLM calls: Agentic uses {call_ratio:.0f}x as many")
    print("🎯 Quality: Agentic provides specific diagnosis with state analysis")
    print(f"💰 Cost: Agentic makes {call_ratio:.0f}x as many LLM calls, so token cost scales roughly with that")

# Run comparison
compare_methods("Why isn't my trigger firing?")


COMPARISON: Traditional RAG vs Agentic RAG
Query: Why isn't my trigger firing?

──────────────────────────────────────────────────────────────────────
METHOD 1: Traditional RAG
──────────────────────────────────────────────────────────────────────

Steps:
  1. Retrieved 2 documents from ChromaDB
  2. Generated answer with LLM

Answer:
If your trigger isn't firing, there are a couple of key areas to check:

1. **Trigger Status**: First, ensure that your trigger is enabled. You can do this by navigating to **Settings > Automation > Triggers** and looking for the toggle switch next to your trigger. It should be ON (green). If it's OFF, simply switch it to ON.

2. **Condition Matching**: Next, review the conditions set for your trigger. Triggers only activate when the conditions match the ticket data. If you're using ALL logic, every condition must be true for the trigger to fire. If you're using ANY logic, at least one condition must be true. Double-check that the conditions you have set

In [None]:
# ============================================================================
# CELL 10: Test Hybrid Router
# ============================================================================

test_queries = [
    "Where can I find triggers?",
    "My trigger isn't firing, what's wrong?",
    "Why did trigger fire for some tickets but not others?",
]

print("\n" + "="*70)
print("HYBRID RAG ROUTER TEST")
print("="*70)

for query in test_queries:
    result = hybrid.answer(query, verbose=True)
    print(f"\nMethod chosen: {result['method']}")
    print(f"Final Answer:\n{result['answer']}")  # Show the full generated answer
    print(f"Performance: {result['llm_calls']} LLM calls, {result['latency_ms']}ms")
    print()


HYBRID RAG ROUTER TEST

Query: Where can I find triggers?
Complexity: 0.30 (threshold: 0.5)
→ Routing to: Traditional RAG (simple query)


Method chosen: Traditional RAG
Final Answer:
You can find triggers by navigating to Settings > Automation > Triggers. Here, you can view all your triggers and check if they are enabled. Make sure the toggle switch for each trigger is ON (green) to ensure they fire correctly. If you need to check the execution history or any issues with triggers not firing, you can look at the execution logs by going to Settings > Automation > Logs. This will show you which tickets were evaluated and any error details if a trigger didn't fire.
Performance: 1 LLM calls, 2162.87ms


Query: My trigger isn't firing, what's wrong?
Complexity: 0.50 (threshold: 0.5)
→ Routing to: Agentic RAG (complex query)


Method chosen: Agentic RAG
Final Answer:
### Diagnosis of Trigger Issue

1. **Is the Trigger Working Correctly?**
   - Yes, the trigger is functioning correctly based

## Hybrid RAG Router Test Performance

Based on the last run of the Hybrid RAG router test (Cell 10), here is the performance for each query:

| Query                                        | Complexity | Method Chosen     | LLM Calls | Latency (ms) |
| :------------------------------------------- | :--------- | :---------------- | :-------- | :----------- |
| Where can I find triggers?                   | 0.30       | Traditional RAG   | 1         | 2162.87      |
| My trigger isn't firing, what's wrong?       | 0.50       | Agentic RAG       | 6         | 20274.25     |
| Why did trigger fire for some tickets but not others? | 0.80       | Agentic RAG       | 6         | 17854.34     |

## Average Performance Comparison of RAG Approaches

Based on the comparison run (Cell 9) and the hybrid test run (Cell 10), here is an estimated average performance comparison across the three RAG approaches demonstrated in this notebook.

*Note: Averages for Hybrid RAG are based on the sample queries tested and depend heavily on the complexity routing.*

| RAG Approach      | Average LLM Calls | Average Latency (ms) |
| :---------------- | :---------------- | :------------------- |
| Traditional RAG   | 1                 | ~3863                |
| Agentic RAG       | 6                 | ~19064 (Average of two runs in Cell 10) |
| Adaptive (Hybrid) | ~4.3 (Average of LLM calls in Cell 10) | ~13430 (Average of latencies in Cell 10) |
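
The Cell 10 averages can be recomputed directly from the per-query results table. The sketch below hard-codes those three rows; the variable names are illustrative.

```python
from statistics import mean

# Rows copied from the Cell 10 results table: (method, llm_calls, latency_ms)
runs = [
    ("Traditional RAG", 1, 2162.87),
    ("Agentic RAG", 6, 20274.25),
    ("Agentic RAG", 6, 17854.34),
]

# Hybrid averages cover every query, whichever path it was routed to
hybrid_calls = mean(calls for _, calls, _ in runs)
hybrid_latency = mean(latency for _, _, latency in runs)

# Agentic-only average over the two queries routed to Agentic RAG
agentic_latency = mean(latency for method, _, latency in runs if method == "Agentic RAG")

print(f"Hybrid: {hybrid_calls:.1f} LLM calls, {hybrid_latency:.0f} ms on average")
print(f"Agentic only: {agentic_latency:.0f} ms on average")
```

With only three queries these averages are sensitive to run-to-run variation in API latency, so they indicate scale and should not be read as a benchmark.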