# TestAgentWorkflow.ipynb

## Comprehensive Testing for MultiAgentWorkflow (workflow.py)

This notebook tests the complete multi-agent workflow system that orchestrates the RAG pipeline for the FastAPI application.

### Test Coverage:
1. **Workflow Initialization** - Component setup and configuration
2. **Supervisor Routing** - Agent selection based on query characteristics  
3. **Supabase Fallback Methods** - BM25, Vector, and Hybrid search fallbacks
4. **Complete Query Processing** - End-to-end workflow execution
5. **Error Handling** - Robustness and edge cases
6. **Performance Metrics** - Timing and resource usage

---

**Created**: August 15, 2025  
**Purpose**: Validate MultiAgentWorkflow functionality before API integration  
**Dependencies**: Supabase, OpenAI, RAG Tools

In [1]:
# Install required packages
print("📦 INSTALLING REQUIRED PACKAGES")
print("=" * 50)

import subprocess
import sys

# List of required packages for the workflow testing
required_packages = [
    "openai>=1.0.0",
    "langchain-openai", 
    "langchain-core",
    "supabase>=2.0.0",
    "python-dotenv",
    "pydantic>=2.0.0",
    "nest-asyncio",  # For async support in Jupyter
    "numpy<2.0.0,>=1.26.0",
    "pandas",  # May be useful for results analysis
]

def install_package(package):
    """Install a package using pip."""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        return True
    except subprocess.CalledProcessError:
        return False

print("🔄 Installing packages...")

for package in required_packages:
    print(f"   Installing {package}...", end="")
    if install_package(package):
        print(" ✅")
    else:
        print(f" ❌ (failed)")

print("\n✅ Package installation completed!")
print("⚠️  Note: If any installations failed, you may need to install them manually")
print("   Example: pip install openai langchain-openai supabase python-dotenv")

📦 INSTALLING REQUIRED PACKAGES
🔄 Installing packages...
 ✅
   Installing langchain-openai...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing langchain-core...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing supabase>=2.0.0...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing python-dotenv...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing pydantic>=2.0.0...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing nest-asyncio...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing numpy<2.0.0,>=1.26.0...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅
   Installing pandas...


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


 ✅

✅ Package installation completed!
⚠️  Note: If any installations failed, you may need to install them manually
   Example: pip install openai langchain-openai supabase python-dotenv



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
import os
import sys
import asyncio
import logging
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Any, Optional

# Setup paths for imports
print("🔧 SETTING UP TEST ENVIRONMENT")
print("=" * 50)

# Get the current working directory
current_path = Path.cwd()
print(f"📁 Current Directory: {current_path}")

# Navigate to app directory 
app_dir = current_path.parent
project_root = app_dir.parent

print(f"📁 App Directory: {app_dir}")
print(f"📁 Project Root: {project_root}")

# Add necessary paths to Python path
for path in [str(project_root), str(app_dir), str(app_dir / 'agents'), str(app_dir / 'tools'), str(app_dir / 'rag')]:
    if path not in sys.path:
        sys.path.insert(0, path)
        print(f"📎 Added to path: {path}")

print("✅ Python paths configured")



🔧 SETTING UP TEST ENVIRONMENT
📁 Current Directory: /Users/foohm/github/cuttlefish4/app/api
📁 App Directory: /Users/foohm/github/cuttlefish4/app
📁 Project Root: /Users/foohm/github/cuttlefish4
📎 Added to path: /Users/foohm/github/cuttlefish4
📎 Added to path: /Users/foohm/github/cuttlefish4/app
📎 Added to path: /Users/foohm/github/cuttlefish4/app/agents
📎 Added to path: /Users/foohm/github/cuttlefish4/app/tools
📎 Added to path: /Users/foohm/github/cuttlefish4/app/rag
✅ Python paths configured


In [3]:
# Load environment variables
print("\n🌍 LOADING ENVIRONMENT")
print("=" * 50)

try:
    from dotenv import load_dotenv
    
    # Try to load .env from project root
    env_file = project_root / ".env"
    if env_file.exists():
        load_dotenv(str(env_file))
        print(f"✅ Environment loaded from: {env_file}")
    else:
        load_dotenv()
        print("⚠️  .env file not found in project root, using system environment")
        
    # Check required environment variables
    required_vars = ['OPENAI_API_KEY', 'SUPABASE_URL', 'SUPABASE_KEY', 'CUTTLEFISH_HOME']
    missing_vars = []
    
    for var in required_vars:
        if os.environ.get(var):
            print(f"✅ {var}: {'*' * 10}...{os.environ.get(var)[-4:]}")
        else:
            missing_vars.append(var)
            print(f"❌ {var}: Missing")
    
    if missing_vars:
        print(f"⚠️  Missing required variables: {', '.join(missing_vars)}")
        print("   Tests may fail without these variables")
    else:
        print("✅ All required environment variables found")
        
except ImportError:
    print("⚠️  python-dotenv not installed, using system environment variables")
except Exception as e:
    print(f"⚠️  Environment loading error: {e}")

# Set CUTTLEFISH_HOME to project root
if 'CUTTLEFISH_HOME' not in os.environ:
    os.environ['CUTTLEFISH_HOME'] = str(project_root)
    print(f"🏠 Set CUTTLEFISH_HOME to: {project_root}")

# Add project root to Python path
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))
    print(f"📎 Added project root to path: {project_root}")

# Add app directory to Python path for direct imports
if str(app_dir) not in sys.path:
    sys.path.insert(0, str(app_dir))
    print(f"📎 Added app dir to path: {app_dir}")



🌍 LOADING ENVIRONMENT
✅ Environment loaded from: /Users/foohm/github/cuttlefish4/.env
✅ OPENAI_API_KEY: **********...IzAA
✅ SUPABASE_URL: **********...e.co
✅ SUPABASE_KEY: **********...PhMQ
✅ CUTTLEFISH_HOME: **********...ish4
✅ All required environment variables found


In [4]:
# Import all required components
print("\n📦 IMPORTING COMPONENTS")
print("=" * 50)

# Core workflow import
try:
    from workflow import MultiAgentWorkflow
    print("✅ MultiAgentWorkflow imported")
except ImportError as e:
    print(f"❌ MultiAgentWorkflow import failed: {e}")
    print("   Trying alternative import paths...")
    try:
        from api.workflow import MultiAgentWorkflow
        print("✅ MultiAgentWorkflow imported (alternative path)")
    except ImportError as e2:
        print(f"❌ Alternative import also failed: {e2}")

# Import models for validation
try:
    from models import (
        MultiAgentRAGRequest, MultiAgentRAGResponse, 
        DebugRoutingRequest, DebugRoutingResponse,
        RetrievalMetadata, RetrievedContext, RelevantTicket
    )
    print("✅ Pydantic models imported")
except ImportError as e:
    print(f"⚠️  Pydantic models import failed: {e}")
    print("   Will create mock data structures")

# Import supporting components
try:
    from langchain_openai import ChatOpenAI
    print("✅ LangChain OpenAI imported")
except ImportError as e:
    print(f"❌ LangChain OpenAI import failed: {e}")

# Test framework imports
try:
    import json
    import time
    from unittest.mock import Mock, patch
    print("✅ Testing utilities imported")
except ImportError as e:
    print(f"⚠️  Some testing utilities unavailable: {e}")

print("✅ Import phase completed")


📦 IMPORTING COMPONENTS
✅ MultiAgentWorkflow imported
✅ Pydantic models imported
✅ LangChain OpenAI imported
✅ Testing utilities imported
✅ Import phase completed


## Test 1: MultiAgentWorkflow Initialization

Test the initialization of the workflow system, including:
- LLM setup (GPT-4o for supervisor/response, GPT-4o-mini for RAG)
- Vectorstore connection (Qdrant fallback to Supabase)
- Agent initialization (Supervisor, ResponseWriter, and retrieval agents)
- RAG tools integration

In [5]:
# Test 1: MultiAgentWorkflow Initialization
print("\n🧪 TEST 1: MULTIAGENTWORKFLOW INITIALIZATION")
print("=" * 60)

workflow = None
initialization_results = {
    'workflow_creation': False,
    'llm_initialization': False,
    'vectorstore_setup': False,
    'agents_initialization': False,
    'rag_tools_setup': False,
    'errors': []
}

try:
    print("🔄 Creating MultiAgentWorkflow instance...")
    start_time = time.time()
    
    # Initialize the workflow
    workflow = MultiAgentWorkflow()
    initialization_time = time.time() - start_time
    
    print(f"✅ MultiAgentWorkflow created in {initialization_time:.2f}s")
    initialization_results['workflow_creation'] = True
    
    # Test LLM initialization
    print("\n🔍 Checking LLM initialization...")
    if hasattr(workflow, 'supervisor_llm') and workflow.supervisor_llm:
        print(f"   ✅ Supervisor LLM: {workflow.supervisor_llm.model_name}")
        initialization_results['llm_initialization'] = True
    else:
        print("   ❌ Supervisor LLM not initialized")
        
    if hasattr(workflow, 'rag_llm') and workflow.rag_llm:
        print(f"   ✅ RAG LLM: {workflow.rag_llm.model_name}")
    else:
        print("   ❌ RAG LLM not initialized")
        
    if hasattr(workflow, 'response_writer_llm') and workflow.response_writer_llm:
        print(f"   ✅ Response Writer LLM: {workflow.response_writer_llm.model_name}")
    else:
        print("   ❌ Response Writer LLM not initialized")
    
    # Test vectorstore setup
    print(f"\n🔍 Checking vectorstore setup...")
    if hasattr(workflow, 'vectorstore'):
        if workflow.vectorstore:
            print(f"   ✅ Vectorstore connected: {type(workflow.vectorstore).__name__}")
            initialization_results['vectorstore_setup'] = True
        else:
            print("   ⚠️  No vectorstore connected - using Supabase fallbacks")
            initialization_results['vectorstore_setup'] = True  # This is expected
    
    # Test RAG tools setup
    print(f"\n🔍 Checking RAG tools...")
    if hasattr(workflow, 'rag_tools') and workflow.rag_tools:
        print(f"   ✅ RAG tools initialized: {type(workflow.rag_tools).__name__}")
        initialization_results['rag_tools_setup'] = True
    else:
        print("   ❌ RAG tools not initialized")
    
    # Test agent initialization
    print(f"\n🔍 Checking agent initialization...")
    agents_status = {}
    
    for agent_name in ['supervisor_agent', 'response_writer_agent', 'bm25_agent', 'contextual_compression_agent', 'ensemble_agent']:
        if hasattr(workflow, agent_name):
            agent = getattr(workflow, agent_name)
            if agent:
                print(f"   ✅ {agent_name}: {type(agent).__name__}")
                agents_status[agent_name] = True
            else:
                print(f"   ⚠️  {agent_name}: None (expected for Supabase fallbacks)")
                agents_status[agent_name] = 'expected_none'
        else:
            print(f"   ❌ {agent_name}: Missing attribute")
            agents_status[agent_name] = False
    
    # At least supervisor and response writer should be initialized
    if agents_status.get('supervisor_agent') and agents_status.get('response_writer_agent'):
        initialization_results['agents_initialization'] = True
    
    print(f"\n📊 INITIALIZATION SUMMARY:")
    for key, status in initialization_results.items():
        if key != 'errors':
            status_icon = "✅" if status else "❌"
            print(f"   {status_icon} {key.replace('_', ' ').title()}: {status}")
    
    if all(v for k, v in initialization_results.items() if k != 'errors'):
        print("\n🎉 ALL INITIALIZATION TESTS PASSED! (we do not need QDrant so Supabase fallback is expected)")
    else:
        print(f"\n⚠️  Some initialization components failed - check details above")

except Exception as e:
    error_msg = f"Initialization failed: {str(e)}"
    print(f"❌ {error_msg}")
    initialization_results['errors'].append(error_msg)
    import traceback
    traceback.print_exc()

2025-08-18 19:03:00,249 - MultiAgentWorkflow - INFO - ✅ LLMs initialized



🧪 TEST 1: MULTIAGENTWORKFLOW INITIALIZATION
🔄 Creating MultiAgentWorkflow instance...


2025-08-18 19:03:01,014 - SupabaseRetriever_bugs - INFO - ✅ Connection to bugs table successful
2025-08-18 19:03:01,360 - SupabaseRetriever_pcr - INFO - ✅ Connection to pcr table successful
2025-08-18 19:03:01,361 - MultiAgentWorkflow - INFO - ✅ Connected to Supabase retrievers (bugs & pcr)
2025-08-18 19:03:01,362 - MultiAgentWorkflow - INFO - ✅ Vectorstore and RAG tools initialized
2025-08-18 19:03:01,373 - MultiAgentWorkflow - INFO - ✅ Agents initialized
2025-08-18 19:03:01,373 - MultiAgentWorkflow - INFO - ✅ Multi-agent workflow initialized


✅ MultiAgentWorkflow created in 1.27s

🔍 Checking LLM initialization...
   ✅ Supervisor LLM: gpt-4o
   ✅ RAG LLM: gpt-4o-mini
   ✅ Response Writer LLM: gpt-4o

🔍 Checking vectorstore setup...
   ⚠️  No vectorstore connected - using Supabase fallbacks

🔍 Checking RAG tools...
   ✅ RAG tools initialized: RAGTools

🔍 Checking agent initialization...
   ✅ supervisor_agent: SupervisorAgent
   ✅ response_writer_agent: ResponseWriterAgent
   ⚠️  bm25_agent: None (expected for Supabase fallbacks)
   ⚠️  contextual_compression_agent: None (expected for Supabase fallbacks)
   ⚠️  ensemble_agent: None (expected for Supabase fallbacks)

📊 INITIALIZATION SUMMARY:
   ✅ Workflow Creation: True
   ✅ Llm Initialization: True
   ✅ Vectorstore Setup: True
   ✅ Agents Initialization: True
   ✅ Rag Tools Setup: True

🎉 ALL INITIALIZATION TESTS PASSED! (we do not need QDrant so Supabase fallback is expected)


In [6]:
# Initialize summary_data for tracking workflow test results
from datetime import datetime

summary_data = {
    'components_tested': [],
    'test_results': {},
    'recommendations': [],
    'overall_status': 'PENDING',
    'start_time': datetime.now().isoformat(),
    'workflow_tests': {
        'initialization': False,
        'routing': False,
        'fallback_methods': False,
        'query_processing': False,
        'error_handling': False,
        'websearch_routing': False,
        'websearch_integration': False
    }
}

print("✅ Initialized summary_data for comprehensive workflow test tracking")
print("   This will collect results from all workflow tests including WebSearch integration")


✅ Initialized summary_data for comprehensive workflow test tracking
   This will collect results from all workflow tests including WebSearch integration


## Test 2: Supervisor Routing Decisions

Test the supervisor agent's routing logic with different query types:
- **Production incidents** → ContextualCompression (urgent)  
- **User can wait** → Ensemble (comprehensive)
- **JIRA ticket references** → BM25 (exact match)
- **General queries** → ContextualCompression (default)

In [7]:
# Test 2: Supervisor Routing Decisions
print("\n🧪 TEST 2: SUPERVISOR ROUTING DECISIONS")
print("=" * 60)

# Test scenarios for routing
test_scenarios = [
    {
        'name': 'Production Incident',
        'query': 'database connection timeout causing login failures',
        'user_can_wait': False,
        'production_incident': True,
        'expected_reasoning': 'production incident',
        'expected_agent': 'ContextualCompression'
    },
    {
        'name': 'User Can Wait',
        'query': 'authentication error patterns in recent tickets', 
        'user_can_wait': True,
        'production_incident': False,
        'expected_reasoning': 'comprehensive analysis',
        'expected_agent': 'Ensemble'
    },
    {
        'name': 'JIRA Ticket Reference',
        'query': 'HBASE-12345 connection timeout issue details',
        'user_can_wait': False,
        'production_incident': False,
        'expected_reasoning': 'specific ticket',
        'expected_agent': 'BM25'
    },
    {
        'name': 'General Query',
        'query': 'Java OutOfMemoryError troubleshooting',
        'user_can_wait': False,
        'production_incident': False,
        'expected_reasoning': 'default routing',
        'expected_agent': 'ContextualCompression'
    }
]

routing_results = []

if workflow:
    print("🔄 Testing supervisor routing decisions...")
    
    for i, scenario in enumerate(test_scenarios, 1):
        try:
            print(f"\n📋 Scenario {i}: {scenario['name']}")
            print(f"   Query: '{scenario['query'][:60]}...'")
            print(f"   user_can_wait: {scenario['user_can_wait']}, production_incident: {scenario['production_incident']}")
            
            # Test routing decision (this is async)
            start_time = time.time()
            
            # For async testing in Jupyter, we need to handle the event loop
            try:
                # Try to get existing loop
                loop = asyncio.get_event_loop()
                if loop.is_running():
                    # Create task for running loop
                    import nest_asyncio
                    nest_asyncio.apply()
                    result = await workflow.get_routing_decision(
                        query=scenario['query'],
                        user_can_wait=scenario['user_can_wait'],
                        production_incident=scenario['production_incident']
                    )
                else:
                    result = loop.run_until_complete(
                        workflow.get_routing_decision(
                            query=scenario['query'],
                            user_can_wait=scenario['user_can_wait'],
                            production_incident=scenario['production_incident']
                        )
                    )
            except RuntimeError:
                # No existing loop, create new one
                result = asyncio.run(
                    workflow.get_routing_decision(
                        query=scenario['query'],
                        user_can_wait=scenario['user_can_wait'],
                        production_incident=scenario['production_incident']
                    )
                )
            
            routing_time = time.time() - start_time
            
            decision = result.get('routing_decision', 'Unknown')
            reasoning = result.get('routing_reasoning', 'No reasoning provided')
            
            print(f"   🎯 Decision: {decision} ({routing_time:.2f}s)")
            print(f"   💭 Reasoning: {reasoning[:100]}...")
            
            # Validate against expected results
            decision_correct = decision == scenario['expected_agent']
            reasoning_relevant = any(keyword in reasoning.lower() 
                                   for keyword in scenario['expected_reasoning'].split())
            
            result_entry = {
                'scenario': scenario['name'],
                'query': scenario['query'],
                'expected_agent': scenario['expected_agent'],
                'actual_agent': decision,
                'decision_correct': decision_correct,
                'reasoning_relevant': reasoning_relevant,
                'routing_time': routing_time,
                'full_reasoning': reasoning
            }
            
            routing_results.append(result_entry)
            
            if decision_correct:
                print(f"   ✅ Routing decision correct")
            else:
                print(f"   ❌ Expected {scenario['expected_agent']}, got {decision}")
                
        except Exception as e:
            print(f"   ❌ Routing test failed: {str(e)}")
            routing_results.append({
                'scenario': scenario['name'],
                'error': str(e)
            })
            import traceback
            traceback.print_exc()
    
    # Summary of routing tests
    print(f"\n📊 ROUTING TEST SUMMARY:")
    print("-" * 40)
    
    correct_decisions = sum(1 for r in routing_results if r.get('decision_correct', False))
    total_tests = len([r for r in routing_results if 'error' not in r])
    
    print(f"✅ Correct routing decisions: {correct_decisions}/{total_tests}")
    
    if total_tests > 0:
        avg_routing_time = sum(r.get('routing_time', 0) for r in routing_results if 'routing_time' in r) / total_tests
        print(f"⏱️  Average routing time: {avg_routing_time:.2f}s")
    
    # Show any errors
    errors = [r for r in routing_results if 'error' in r]
    if errors:
        print(f"❌ Failed tests: {len(errors)}")
        for error in errors:
            print(f"   - {error['scenario']}: {error['error']}")
    
else:
    print("❌ Workflow not initialized - skipping routing tests")

2025-08-18 19:03:01,393 - MultiAgentWorkflow - INFO - Getting routing decision for: 'database connection timeout causing login failures...'



🧪 TEST 2: SUPERVISOR ROUTING DECISIONS
🔄 Testing supervisor routing decisions...

📋 Scenario 1: Production Incident
   Query: 'database connection timeout causing login failures...'
   user_can_wait: False, production_incident: True
🧠 Supervisor Agent analyzing query: 'database connection timeout causing login failures'
   user_can_wait: False, production_incident: True


2025-08-18 19:03:02,783 - MultiAgentWorkflow - INFO - Getting routing decision for: 'authentication error patterns in recent tickets...'


✅ Supervisor decision: ContextualCompression - The query is about a production incident and the user cannot wait, so a fast semantic search is needed.
   Analysis time: 1.39s
   🎯 Decision: ContextualCompression (1.39s)
   💭 Reasoning: The query is about a production incident and the user cannot wait, so a fast semantic search is need...
   ✅ Routing decision correct

📋 Scenario 2: User Can Wait
   Query: 'authentication error patterns in recent tickets...'
   user_can_wait: True, production_incident: False
🧠 Supervisor Agent analyzing query: 'authentication error patterns in recent tickets'
   user_can_wait: True, production_incident: False


2025-08-18 19:03:04,418 - MultiAgentWorkflow - INFO - Getting routing decision for: 'HBASE-12345 connection timeout issue details...'


✅ Supervisor decision: Ensemble - The query is complex and user_can_wait=True, allowing for a comprehensive search to analyze authentication error patterns in recent tickets.
   Analysis time: 1.63s
   🎯 Decision: Ensemble (1.63s)
   💭 Reasoning: The query is complex and user_can_wait=True, allowing for a comprehensive search to analyze authenti...
   ✅ Routing decision correct

📋 Scenario 3: JIRA Ticket Reference
   Query: 'HBASE-12345 connection timeout issue details...'
   user_can_wait: False, production_incident: False
🧠 Supervisor Agent analyzing query: 'HBASE-12345 connection timeout issue details'
   user_can_wait: False, production_incident: False


2025-08-18 19:03:05,631 - MultiAgentWorkflow - INFO - Getting routing decision for: 'Java OutOfMemoryError troubleshooting...'


✅ Supervisor decision: BM25 - The query contains a specific ticket reference 'HBASE-12345', which is best handled by the BM25 agent for fast keyword-based search.
   Analysis time: 1.21s
   🎯 Decision: BM25 (1.21s)
   💭 Reasoning: The query contains a specific ticket reference 'HBASE-12345', which is best handled by the BM25 agen...
   ✅ Routing decision correct

📋 Scenario 4: General Query
   Query: 'Java OutOfMemoryError troubleshooting...'
   user_can_wait: False, production_incident: False
🧠 Supervisor Agent analyzing query: 'Java OutOfMemoryError troubleshooting'
   user_can_wait: False, production_incident: False
✅ Supervisor decision: ContextualCompression - The query is a general troubleshooting question and the user cannot wait, so a fast semantic search is appropriate.
   Analysis time: 2.57s
   🎯 Decision: ContextualCompression (2.57s)
   💭 Reasoning: The query is a general troubleshooting question and the user cannot wait, so a fast semantic search ...
   ✅ Routing decision

## Test 3: Supabase Fallback Methods

Test the Supabase-based fallback retrieval methods:
- **BM25 Fallback** - keyword/text search
- **Vector Fallback** - semantic similarity search  
- **Hybrid Fallback** - combined vector + keyword search

These fallbacks are used when Qdrant vectorstore is not available.

In [8]:
# Test 3: Supabase Fallback Methods
print("\n🧪 TEST 3: SUPABASE FALLBACK METHODS")
print("=" * 60)

fallback_results = {}

if workflow:
    # Create test state for fallback methods
    test_state = {
        'query': 'authentication error',
        'user_can_wait': False,
        'production_incident': False,
        'routing_decision': None,
        'routing_reasoning': None,
        'retrieved_contexts': [],
        'retrieval_method': None,
        'retrieval_metadata': {},
        'final_answer': None,
        'relevant_tickets': [],
        'messages': []
    }
    
    # Test BM25 fallback
    print("🔄 Testing Supabase BM25 fallback...")
    try:
        start_time = time.time()
        
        # Since these are async methods, handle event loop
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                import nest_asyncio
                nest_asyncio.apply()
                bm25_result = await workflow._supabase_bm25_fallback(test_state.copy())
            else:
                bm25_result = loop.run_until_complete(workflow._supabase_bm25_fallback(test_state.copy()))
        except RuntimeError:
            bm25_result = asyncio.run(workflow._supabase_bm25_fallback(test_state.copy()))
        
        bm25_time = time.time() - start_time
        
        print(f"   ✅ BM25 fallback completed in {bm25_time:.2f}s")
        print(f"   📄 Retrieved contexts: {len(bm25_result.get('retrieved_contexts', []))}")
        print(f"   🔍 Method: {bm25_result.get('retrieval_method', 'Unknown')}")
        
        fallback_results['bm25'] = {
            'success': True,
            'contexts': len(bm25_result.get('retrieved_contexts', [])),
            'time': bm25_time,
            'method': bm25_result.get('retrieval_method', 'Unknown'),
            'metadata': bm25_result.get('retrieval_metadata', {})
        }
        
    except Exception as e:
        print(f"   ❌ BM25 fallback failed: {str(e)}")
        fallback_results['bm25'] = {'success': False, 'error': str(e)}
    
    # Test Vector fallback
    print("\n🔄 Testing Supabase Vector fallback...")
    try:
        start_time = time.time()
        
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                import nest_asyncio
                nest_asyncio.apply()
                vector_result = await workflow._supabase_vector_fallback(test_state.copy())
            else:
                vector_result = loop.run_until_complete(workflow._supabase_vector_fallback(test_state.copy()))
        except RuntimeError:
            vector_result = asyncio.run(workflow._supabase_vector_fallback(test_state.copy()))
        
        vector_time = time.time() - start_time
        
        print(f"   ✅ Vector fallback completed in {vector_time:.2f}s")
        print(f"   📄 Retrieved contexts: {len(vector_result.get('retrieved_contexts', []))}")
        print(f"   🔍 Method: {vector_result.get('retrieval_method', 'Unknown')}")
        
        fallback_results['vector'] = {
            'success': True,
            'contexts': len(vector_result.get('retrieved_contexts', [])),
            'time': vector_time,
            'method': vector_result.get('retrieval_method', 'Unknown'),
            'metadata': vector_result.get('retrieval_metadata', {})
        }
        
    except Exception as e:
        print(f"   ❌ Vector fallback failed: {str(e)}")
        fallback_results['vector'] = {'success': False, 'error': str(e)}
    
    # Test Hybrid fallback
    print("\n🔄 Testing Supabase Hybrid fallback...")
    try:
        start_time = time.time()
        
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                import nest_asyncio
                nest_asyncio.apply()
                hybrid_result = await workflow._supabase_hybrid_fallback(test_state.copy())
            else:
                hybrid_result = loop.run_until_complete(workflow._supabase_hybrid_fallback(test_state.copy()))
        except RuntimeError:
            hybrid_result = asyncio.run(workflow._supabase_hybrid_fallback(test_state.copy()))
        
        hybrid_time = time.time() - start_time
        
        print(f"   ✅ Hybrid fallback completed in {hybrid_time:.2f}s")
        print(f"   📄 Retrieved contexts: {len(hybrid_result.get('retrieved_contexts', []))}")
        print(f"   🔍 Method: {hybrid_result.get('retrieval_method', 'Unknown')}")
        
        fallback_results['hybrid'] = {
            'success': True,
            'contexts': len(hybrid_result.get('retrieved_contexts', [])),
            'time': hybrid_time,
            'method': hybrid_result.get('retrieval_method', 'Unknown'),
            'metadata': hybrid_result.get('retrieval_metadata', {})
        }
        
    except Exception as e:
        print(f"   ❌ Hybrid fallback failed: {str(e)}")
        fallback_results['hybrid'] = {'success': False, 'error': str(e)}
    
    # Fallback results summary
    print(f"\n📊 FALLBACK METHODS SUMMARY:")
    print("-" * 40)
    
    for method, results in fallback_results.items():
        if results.get('success'):
            print(f"✅ {method.upper()}: {results['contexts']} contexts in {results['time']:.2f}s")
        else:
            print(f"❌ {method.upper()}: Failed - {results.get('error', 'Unknown error')}")
    
    # Check if any method retrieved results
    successful_methods = sum(1 for r in fallback_results.values() if r.get('success') and r.get('contexts', 0) > 0)
    if successful_methods > 0:
        print(f"\n🎉 {successful_methods}/3 fallback methods successfully retrieved contexts!")
    else:
        print(f"\n⚠️  No fallback methods retrieved contexts - check RAG tools setup")

else:
    print("❌ Workflow not initialized - skipping fallback tests")

2025-08-18 19:03:08,235 - RAGTools - INFO - ✅ RAG tools initialized successfully
2025-08-18 19:03:08,236 - SupabaseRetriever_bugs - INFO - Direct keyword search for: 'authentication error...' in bugs



🧪 TEST 3: SUPABASE FALLBACK METHODS
🔄 Testing Supabase BM25 fallback...


2025-08-18 19:03:09,286 - SupabaseRetriever_bugs - INFO - Direct keyword search returned 10 results
2025-08-18 19:03:09,287 - RAGTools - INFO - Keyword search (bugs): 10 results for 'authentication error...'
2025-08-18 19:03:09,287 - MultiAgentWorkflow - INFO - Supabase BM25 fallback: 10 results
2025-08-18 19:03:09,288 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'authentication error...' in bugs
2025-08-18 19:03:09,290 - SupabaseRetriever_bugs - INFO - Parameters: k=10, similarity_threshold=0.2, filters=None


   ✅ BM25 fallback completed in 1.07s
   📄 Retrieved contexts: 10
   🔍 Method: Supabase_BM25

🔄 Testing Supabase Vector fallback...


2025-08-18 19:03:10,200 - SupabaseRetriever_bugs - INFO - Processing 30 candidates for similarity calculation
2025-08-18 19:03:10,214 - SupabaseRetriever_bugs - INFO - Calculated 30 similarities, 17 above threshold 0.2
2025-08-18 19:03:10,214 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.2479', '0.2604', '0.2196']
2025-08-18 19:03:10,214 - SupabaseRetriever_bugs - INFO - Direct vector search returned 10 results (from 30 candidates)
2025-08-18 19:03:10,214 - RAGTools - INFO - Vector search (bugs): 10 results for 'authentication error...'
2025-08-18 19:03:10,215 - MultiAgentWorkflow - INFO - Supabase vector fallback: 10 results
2025-08-18 19:03:10,215 - SupabaseRetriever_bugs - INFO - Direct hybrid search for: 'authentication error...' in bugs
2025-08-18 19:03:10,215 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'authentication error...' in bugs
2025-08-18 19:03:10,215 - SupabaseRetriever_bugs - INFO - Parameters: k=20, similarity_threshold=0.2, filters=None


   ✅ Vector fallback completed in 0.93s
   📄 Retrieved contexts: 10
   🔍 Method: Supabase_Vector

🔄 Testing Supabase Hybrid fallback...


2025-08-18 19:03:12,130 - SupabaseRetriever_bugs - INFO - Processing 60 candidates for similarity calculation
2025-08-18 19:03:12,153 - SupabaseRetriever_bugs - INFO - Calculated 60 similarities, 34 above threshold 0.2
2025-08-18 19:03:12,154 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.2478', '0.2604', '0.2196']
2025-08-18 19:03:12,154 - SupabaseRetriever_bugs - INFO - Direct vector search returned 20 results (from 60 candidates)
2025-08-18 19:03:12,155 - SupabaseRetriever_bugs - INFO - Direct keyword search for: 'authentication error...' in bugs
2025-08-18 19:03:13,144 - SupabaseRetriever_bugs - INFO - Direct keyword search returned 20 results
2025-08-18 19:03:13,145 - SupabaseRetriever_bugs - INFO - Direct hybrid search returned 10 results
2025-08-18 19:03:13,145 - RAGTools - INFO - Hybrid search (bugs): 10 results for 'authentication error...'
2025-08-18 19:03:13,146 - MultiAgentWorkflow - INFO - Supabase hybrid fallback: 10 results


   ✅ Hybrid fallback completed in 2.93s
   📄 Retrieved contexts: 10
   🔍 Method: Supabase_Hybrid

📊 FALLBACK METHODS SUMMARY:
----------------------------------------
✅ BM25: 10 contexts in 1.07s
✅ VECTOR: 10 contexts in 0.93s
✅ HYBRID: 10 contexts in 2.93s

🎉 3/3 fallback methods successfully retrieved contexts!


## Test 4: Complete Query Processing

Test the end-to-end workflow processing with real queries:
- Full supervisor → retrieval → response writer pipeline
- Performance metrics and timing
- Response format validation
- Context quality assessment

In [9]:
# Test 4: Complete Query Processing 
print("\n🧪 TEST 4: COMPLETE QUERY PROCESSING")
print("=" * 60)

# Test queries for end-to-end processing
end_to_end_queries = [
    {
        'query': 'authentication error in login system',
        'user_can_wait': False,
        'production_incident': False,
        'expected_elements': ['authentication', 'login', 'error']
    },
    {
        'query': 'HBASE-12345 connection timeout troubleshooting',
        'user_can_wait': False, 
        'production_incident': False,
        'expected_elements': ['HBASE-12345', 'connection', 'timeout']
    },
    {
        'query': 'Java OutOfMemoryError heap space issues',
        'user_can_wait': True,
        'production_incident': False,
        'expected_elements': ['Java', 'OutOfMemoryError', 'heap']
    }
]

processing_results = []

if workflow:
    print("🔄 Testing complete query processing...")
    
    for i, test_case in enumerate(end_to_end_queries, 1):
        try:
            print(f"\n📋 Query {i}: '{test_case['query']}'")
            print(f"   Parameters: user_can_wait={test_case['user_can_wait']}, production_incident={test_case['production_incident']}")
            
            start_time = time.time()
            
            # Process query through complete workflow
            try:
                loop = asyncio.get_event_loop()
                if loop.is_running():
                    import nest_asyncio
                    nest_asyncio.apply()
                    result = await workflow.process_query(
                        query=test_case['query'],
                        user_can_wait=test_case['user_can_wait'],
                        production_incident=test_case['production_incident']
                    )
                else:
                    result = loop.run_until_complete(
                        workflow.process_query(
                            query=test_case['query'],
                            user_can_wait=test_case['user_can_wait'],
                            production_incident=test_case['production_incident']
                        )
                    )
            except RuntimeError:
                result = asyncio.run(
                    workflow.process_query(
                        query=test_case['query'],
                        user_can_wait=test_case['user_can_wait'],
                        production_incident=test_case['production_incident']
                    )
                )
            
            processing_time = time.time() - start_time
            
            # Analyze results
            print(f"   ⏱️  Processing time: {processing_time:.2f}s")
            print(f"   🎯 Routing: {result.get('routing_decision', 'Unknown')}")
            print(f"   🔍 Retrieval: {result.get('retrieval_method', 'Unknown')}")
            print(f"   📄 Contexts: {len(result.get('retrieved_contexts', []))}")
            print(f"   📝 Answer length: {len(result.get('final_answer', ''))}")
            
            # Validate response structure
            required_fields = [
                'query', 'final_answer', 'routing_decision', 'routing_reasoning',
                'retrieval_method', 'retrieved_contexts', 'retrieval_metadata',
                'timestamp', 'total_processing_time'
            ]
            
            missing_fields = [field for field in required_fields if field not in result]
            structure_valid = len(missing_fields) == 0
            
            # Check content quality
            final_answer = result.get('final_answer', '')
            content_relevant = any(element.lower() in final_answer.lower() 
                                 for element in test_case['expected_elements'])
            
            has_contexts = len(result.get('retrieved_contexts', [])) > 0
            
            # Store results
            test_result = {
                'query': test_case['query'],
                'processing_time': processing_time,
                'workflow_time': result.get('total_processing_time', 0),
                'routing_decision': result.get('routing_decision'),
                'retrieval_method': result.get('retrieval_method'),
                'contexts_count': len(result.get('retrieved_contexts', [])),
                'answer_length': len(final_answer),
                'structure_valid': structure_valid,
                'missing_fields': missing_fields,
                'content_relevant': content_relevant,
                'has_contexts': has_contexts,
                'full_result': result
            }
            
            processing_results.append(test_result)
            
            # Print validation results
            if structure_valid:
                print(f"   ✅ Response structure: Valid")
            else:
                print(f"   ❌ Missing fields: {missing_fields}")
                
            if content_relevant:
                print(f"   ✅ Content relevance: Good")
            else:
                print(f"   ⚠️  Content relevance: Could not verify")
                
            if has_contexts:
                print(f"   ✅ Context retrieval: Success")
            else:
                print(f"   ❌ Context retrieval: No contexts found")
            
            # Show first part of answer
            print(f"   💬 Answer preview: {final_answer[:100]}...")
                
        except Exception as e:
            print(f"   ❌ Query processing failed: {str(e)}")
            processing_results.append({
                'query': test_case['query'],
                'error': str(e),
                'processing_time': time.time() - start_time
            })
            import traceback
            traceback.print_exc()
    
    # Overall processing summary
    print(f"\n📊 COMPLETE PROCESSING SUMMARY:")
    print("-" * 50)
    
    successful_queries = [r for r in processing_results if 'error' not in r]
    failed_queries = [r for r in processing_results if 'error' in r]
    
    print(f"✅ Successful queries: {len(successful_queries)}/{len(processing_results)}")
    print(f"❌ Failed queries: {len(failed_queries)}")
    
    if successful_queries:
        avg_processing_time = sum(r['processing_time'] for r in successful_queries) / len(successful_queries)
        avg_contexts = sum(r['contexts_count'] for r in successful_queries) / len(successful_queries)
        avg_answer_length = sum(r['answer_length'] for r in successful_queries) / len(successful_queries)
        
        print(f"⏱️  Average processing time: {avg_processing_time:.2f}s")
        print(f"📄 Average contexts retrieved: {avg_contexts:.1f}")
        print(f"📝 Average answer length: {avg_answer_length:.0f} characters")
        
        structure_valid_count = sum(1 for r in successful_queries if r.get('structure_valid'))
        content_relevant_count = sum(1 for r in successful_queries if r.get('content_relevant'))
        contexts_found_count = sum(1 for r in successful_queries if r.get('has_contexts'))
        
        print(f"🏗️  Structure validation: {structure_valid_count}/{len(successful_queries)}")
        print(f"🎯 Content relevance: {content_relevant_count}/{len(successful_queries)}")
        print(f"📚 Context retrieval: {contexts_found_count}/{len(successful_queries)}")
    
    if failed_queries:
        print(f"\n❌ FAILED QUERIES:")
        for failed in failed_queries:
            print(f"   - '{failed['query'][:50]}...': {failed['error']}")

else:
    print("❌ Workflow not initialized - skipping end-to-end tests")

2025-08-18 19:03:13,174 - MultiAgentWorkflow - INFO - Processing query: 'authentication error in login system...'



🧪 TEST 4: COMPLETE QUERY PROCESSING
🔄 Testing complete query processing...

📋 Query 1: 'authentication error in login system'
   Parameters: user_can_wait=False, production_incident=False
🧠 Supervisor Agent analyzing query: 'authentication error in login system'
   user_can_wait: False, production_incident: False


2025-08-18 19:03:14,484 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'authentication error in login system...' in bugs
2025-08-18 19:03:14,485 - SupabaseRetriever_bugs - INFO - Parameters: k=10, similarity_threshold=0.2, filters=None


✅ Supervisor decision: ContextualCompression - The query is a general troubleshooting question related to an authentication error, and the user cannot wait, making speed critical.
   Analysis time: 1.31s


2025-08-18 19:03:15,256 - SupabaseRetriever_bugs - INFO - Processing 30 candidates for similarity calculation
2025-08-18 19:03:15,270 - SupabaseRetriever_bugs - INFO - Calculated 30 similarities, 16 above threshold 0.2
2025-08-18 19:03:15,270 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.2022', '0.2315', '0.2099']
2025-08-18 19:03:15,270 - SupabaseRetriever_bugs - INFO - Direct vector search returned 10 results (from 30 candidates)
2025-08-18 19:03:15,271 - RAGTools - INFO - Vector search (bugs): 10 results for 'authentication error in login system...'
2025-08-18 19:03:15,271 - MultiAgentWorkflow - INFO - Supabase vector fallback: 10 results


✍️  ResponseWriter Agent  generating response...


2025-08-18 19:03:19,886 - MultiAgentWorkflow - INFO - Query processed successfully in 6.71s
2025-08-18 19:03:19,887 - MultiAgentWorkflow - INFO - Processing query: 'HBASE-12345 connection timeout troubleshooting...'


✅ ResponseWriter completed in 4.61s
   Generated response: 870 characters
   Relevant tickets: 10
   ⏱️  Processing time: 6.71s
   🎯 Routing: ContextualCompression
   🔍 Retrieval: Supabase_Vector
   📄 Contexts: 10
   📝 Answer length: 870
   ✅ Response structure: Valid
   ✅ Content relevance: Good
   ✅ Context retrieval: Success
   💬 Answer preview: Based on your query regarding an "authentication error in login system," it appears that none of the...

📋 Query 2: 'HBASE-12345 connection timeout troubleshooting'
   Parameters: user_can_wait=False, production_incident=False
🧠 Supervisor Agent analyzing query: 'HBASE-12345 connection timeout troubleshooting'
   user_can_wait: False, production_incident: False


2025-08-18 19:03:21,113 - SupabaseRetriever_bugs - INFO - Direct keyword search for: 'HBASE-12345 connection timeout troubleshooting...' in bugs


✅ Supervisor decision: BM25 - The query contains a specific ticket reference 'HBASE-12345', which is best handled by the BM25 agent for fast keyword-based search.
   Analysis time: 1.23s


2025-08-18 19:03:21,806 - SupabaseRetriever_bugs - INFO - Direct keyword search returned 10 results
2025-08-18 19:03:21,807 - RAGTools - INFO - Keyword search (bugs): 10 results for 'HBASE-12345 connection timeout troubleshooting...'
2025-08-18 19:03:21,807 - MultiAgentWorkflow - INFO - Supabase BM25 fallback: 10 results


✍️  ResponseWriter Agent  generating response...


2025-08-18 19:03:27,582 - MultiAgentWorkflow - INFO - Query processed successfully in 7.70s
2025-08-18 19:03:27,584 - MultiAgentWorkflow - INFO - Processing query: 'Java OutOfMemoryError heap space issues...'


✅ ResponseWriter completed in 5.77s
   Generated response: 1522 characters
   Relevant tickets: 10
   ⏱️  Processing time: 7.70s
   🎯 Routing: BM25
   🔍 Retrieval: Supabase_BM25
   📄 Contexts: 10
   📝 Answer length: 1522
   ✅ Response structure: Valid
   ✅ Content relevance: Good
   ✅ Context retrieval: Success
   💬 Answer preview: Based on your query regarding "HBASE-12345 connection timeout troubleshooting," it appears that none...

📋 Query 3: 'Java OutOfMemoryError heap space issues'
   Parameters: user_can_wait=True, production_incident=False
🧠 Supervisor Agent analyzing query: 'Java OutOfMemoryError heap space issues'
   user_can_wait: True, production_incident: False


2025-08-18 19:03:28,686 - SupabaseRetriever_bugs - INFO - Direct hybrid search for: 'Java OutOfMemoryError heap space issues...' in bugs
2025-08-18 19:03:28,686 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'Java OutOfMemoryError heap space issues...' in bugs
2025-08-18 19:03:28,687 - SupabaseRetriever_bugs - INFO - Parameters: k=20, similarity_threshold=0.2, filters=None


✅ Supervisor decision: Ensemble - The user can wait, and the query is complex, requiring thorough analysis of Java OutOfMemoryError heap space issues.
   Analysis time: 1.10s


2025-08-18 19:03:29,792 - SupabaseRetriever_bugs - INFO - Processing 60 candidates for similarity calculation
2025-08-18 19:03:29,815 - SupabaseRetriever_bugs - INFO - Calculated 60 similarities, 34 above threshold 0.2
2025-08-18 19:03:29,816 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.4497', '0.2525', '0.2674']
2025-08-18 19:03:29,816 - SupabaseRetriever_bugs - INFO - Direct vector search returned 20 results (from 60 candidates)
2025-08-18 19:03:29,817 - SupabaseRetriever_bugs - INFO - Direct keyword search for: 'Java OutOfMemoryError heap space issues...' in bugs
2025-08-18 19:03:30,598 - SupabaseRetriever_bugs - INFO - Direct keyword search returned 20 results
2025-08-18 19:03:30,598 - SupabaseRetriever_bugs - INFO - Direct hybrid search returned 10 results
2025-08-18 19:03:30,599 - RAGTools - INFO - Hybrid search (bugs): 10 results for 'Java OutOfMemoryError heap space issues...'
2025-08-18 19:03:30,599 - MultiAgentWorkflow - INFO - Supabase hybrid fallback: 10 resul

✍️  ResponseWriter Agent  generating response...


2025-08-18 19:03:37,724 - MultiAgentWorkflow - INFO - Query processed successfully in 10.14s


✅ ResponseWriter completed in 7.12s
   Generated response: 1353 characters
   Relevant tickets: 10
   ⏱️  Processing time: 10.14s
   🎯 Routing: Ensemble
   🔍 Retrieval: Supabase_Hybrid
   📄 Contexts: 10
   📝 Answer length: 1353
   ✅ Response structure: Valid
   ✅ Content relevance: Good
   ✅ Context retrieval: Success
   💬 Answer preview: Based on your query regarding "Java OutOfMemoryError heap space issues," the retrieved JIRA ticket [...

📊 COMPLETE PROCESSING SUMMARY:
--------------------------------------------------
✅ Successful queries: 3/3
❌ Failed queries: 0
⏱️  Average processing time: 8.18s
📄 Average contexts retrieved: 10.0
📝 Average answer length: 1248 characters
🏗️  Structure validation: 3/3
🎯 Content relevance: 3/3
📚 Context retrieval: 3/3


## Test 5: Error Handling and Edge Cases

Test the workflow's robustness with various error conditions:
- Invalid queries and parameters
- Missing environment variables
- Network failures and timeouts
- Empty retrieval results
- Malformed responses

In [10]:
# Test 5: Error Handling and Edge Cases
print("\n🧪 TEST 5: ERROR HANDLING AND EDGE CASES")
print("=" * 60)

error_test_cases = [
    {
        'name': 'Empty Query',
        'query': '',
        'user_can_wait': False,
        'production_incident': False,
        'expected_behavior': 'should handle gracefully'
    },
    {
        'name': 'Very Long Query',
        'query': 'authentication error ' * 100,  # 2000+ characters
        'user_can_wait': False,
        'production_incident': False,
        'expected_behavior': 'should truncate or handle large input'
    },
    {
        'name': 'Special Characters',
        'query': 'SQL injection; DROP TABLE users; -- authentication error',
        'user_can_wait': False,
        'production_incident': False,
        'expected_behavior': 'should sanitize input'
    },
    {
        'name': 'Unicode Characters',
        'query': '认证错误 🔒 authentication πρόβλημα',
        'user_can_wait': False,
        'production_incident': False,
        'expected_behavior': 'should handle unicode'
    }
]

error_results = []

if workflow:
    print("🔄 Testing error handling and edge cases...")
    
    for i, test_case in enumerate(error_test_cases, 1):
        try:
            print(f"\n📋 Test {i}: {test_case['name']}")
            print(f"   Query length: {len(test_case['query'])} characters")
            print(f"   Expected: {test_case['expected_behavior']}")
            
            start_time = time.time()
            
            # Test with potentially problematic input
            try:
                loop = asyncio.get_event_loop()
                if loop.is_running():
                    import nest_asyncio
                    nest_asyncio.apply()
                    result = await workflow.process_query(
                        query=test_case['query'],
                        user_can_wait=test_case['user_can_wait'],
                        production_incident=test_case['production_incident']
                    )
                else:
                    result = loop.run_until_complete(
                        workflow.process_query(
                            query=test_case['query'],
                            user_can_wait=test_case['user_can_wait'],
                            production_incident=test_case['production_incident']
                        )
                    )
            except RuntimeError:
                result = asyncio.run(
                    workflow.process_query(
                        query=test_case['query'],
                        user_can_wait=test_case['user_can_wait'],
                        production_incident=test_case['production_incident']
                    )
                )
            
            processing_time = time.time() - start_time
            
            # Check if workflow handled the edge case
            has_result = result is not None
            has_answer = bool(result.get('final_answer', '') if has_result else False)
            answer_length = len(result.get('final_answer', '')) if has_result else 0
            
            print(f"   ✅ Processing completed in {processing_time:.2f}s")
            print(f"   📝 Generated answer: {answer_length} characters")
            if has_answer:
                print(f"   💬 Answer preview: {result['final_answer'][:100]}...")
            
            error_results.append({
                'test_case': test_case['name'],
                'query_length': len(test_case['query']),
                'success': True,
                'processing_time': processing_time,
                'has_answer': has_answer,
                'answer_length': answer_length
            })
            
        except Exception as e:
            processing_time = time.time() - start_time
            print(f"   ❌ Error occurred: {str(e)}")
            
            # Check if error is expected/handled gracefully
            error_handled_gracefully = any(keyword in str(e).lower() 
                                         for keyword in ['validation', 'invalid', 'empty'])
            
            error_results.append({
                'test_case': test_case['name'],
                'query_length': len(test_case['query']),
                'success': False,
                'error': str(e),
                'processing_time': processing_time,
                'error_handled_gracefully': error_handled_gracefully
            })
    
    # Test empty results fallback
    print(f"\n🔄 Testing empty results fallback...")
    try:
        # Create a state that would result in empty results
        empty_state = {
            'query': 'nonexistent_query_12345_abcdef',
            'user_can_wait': False,
            'production_incident': False,
            'routing_decision': None,
            'routing_reasoning': None,
            'retrieved_contexts': [],
            'retrieval_method': None,
            'retrieval_metadata': {},
            'final_answer': None,
            'relevant_tickets': [],
            'messages': []
        }
        
        empty_result = workflow._empty_results_fallback(empty_state, 'Test_Empty_Fallback')
        print(f"   ✅ Empty results fallback working")
        print(f"   🔍 Method: {empty_result.get('retrieval_method', 'Unknown')}")
        print(f"   📄 Contexts: {len(empty_result.get('retrieved_contexts', []))}")
        
        error_results.append({
            'test_case': 'Empty Results Fallback',
            'success': True,
            'processing_time': 0.0,
            'has_answer': False,
            'contexts_count': len(empty_result.get('retrieved_contexts', []))
        })
        
    except Exception as e:
        print(f"   ❌ Empty results fallback failed: {str(e)}")
        error_results.append({
            'test_case': 'Empty Results Fallback',
            'success': False,
            'error': str(e)
        })
    
    # Error handling summary
    print(f"\n📊 ERROR HANDLING SUMMARY:")
    print("-" * 40)
    
    successful_tests = [r for r in error_results if r.get('success')]
    failed_tests = [r for r in error_results if not r.get('success')]
    
    print(f"✅ Handled gracefully: {len(successful_tests)}/{len(error_results)}")
    print(f"❌ Unhandled errors: {len(failed_tests)}")
    
    if successful_tests:
        avg_time = sum(r.get('processing_time', 0) for r in successful_tests) / len(successful_tests)
        print(f"⏱️  Average error handling time: {avg_time:.2f}s")
        
        with_answers = sum(1 for r in successful_tests if r.get('has_answer'))
        print(f"📝 Generated answers: {with_answers}/{len(successful_tests)}")
    
    if failed_tests:
        print(f"\n❌ UNHANDLED ERRORS:")
        for failed in failed_tests:
            error_type = "graceful" if failed.get('error_handled_gracefully') else "unexpected"
            print(f"   - {failed['test_case']}: {error_type} - {failed.get('error', 'Unknown error')[:50]}...")
    
    print(f"\n🛡️  ROBUSTNESS ASSESSMENT:")
    success_rate = len(successful_tests) / len(error_results) * 100
    if success_rate >= 80:
        print(f"   🎉 Excellent: {success_rate:.1f}% of edge cases handled")
    elif success_rate >= 60:
        print(f"   ✅ Good: {success_rate:.1f}% of edge cases handled")
    else:
        print(f"   ⚠️  Needs improvement: {success_rate:.1f}% of edge cases handled")

else:
    print("❌ Workflow not initialized - skipping error handling tests")

2025-08-18 19:03:37,760 - MultiAgentWorkflow - INFO - Processing query: '...'



🧪 TEST 5: ERROR HANDLING AND EDGE CASES
🔄 Testing error handling and edge cases...

📋 Test 1: Empty Query
   Query length: 0 characters
   Expected: should handle gracefully
🧠 Supervisor Agent analyzing query: ''
   user_can_wait: False, production_incident: False


2025-08-18 19:03:39,133 - SupabaseRetriever_bugs - INFO - Direct vector search for: '...' in bugs
2025-08-18 19:03:39,134 - SupabaseRetriever_bugs - INFO - Parameters: k=10, similarity_threshold=0.2, filters=None


✅ Supervisor decision: ContextualCompression - The query does not mention service status/outages or specific ticket references, and the user cannot wait, making ContextualCompression the best choice for fast semantic search.
   Analysis time: 1.37s


2025-08-18 19:03:40,180 - SupabaseRetriever_bugs - INFO - Processing 30 candidates for similarity calculation
2025-08-18 19:03:40,193 - SupabaseRetriever_bugs - INFO - Calculated 30 similarities, 0 above threshold 0.2
2025-08-18 19:03:40,204 - SupabaseRetriever_bugs - INFO - Direct vector search returned 0 results (from 30 candidates)
2025-08-18 19:03:40,205 - RAGTools - INFO - Vector search (bugs): 0 results for '...'
2025-08-18 19:03:40,205 - MultiAgentWorkflow - INFO - Supabase vector fallback: 0 results


✍️  ResponseWriter Agent  generating response...


2025-08-18 19:03:44,676 - MultiAgentWorkflow - INFO - Query processed successfully in 6.92s
2025-08-18 19:03:44,677 - MultiAgentWorkflow - INFO - Processing query: 'authentication error authentication error authenti...'


✅ ResponseWriter completed in 4.47s
   Generated response: 1041 characters
   Relevant tickets: 0
   ✅ Processing completed in 6.92s
   📝 Generated answer: 1041 characters
   💬 Answer preview: Thank you for your query. Based on the information provided, it appears that no relevant JIRA ticket...

📋 Test 2: Very Long Query
   Query length: 2100 characters
   Expected: should truncate or handle large input
🧠 Supervisor Agent analyzing query: 'authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentication error authentic

2025-08-18 19:03:45,992 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'authentication error authentication error authenti...' in bugs
2025-08-18 19:03:45,992 - SupabaseRetriever_bugs - INFO - Parameters: k=10, similarity_threshold=0.2, filters=None


✅ Supervisor decision: ContextualCompression - The query is a general troubleshooting question about an authentication error, and the user cannot wait, making speed critical.
   Analysis time: 1.31s


2025-08-18 19:03:46,829 - SupabaseRetriever_bugs - INFO - Processing 30 candidates for similarity calculation
2025-08-18 19:03:46,842 - SupabaseRetriever_bugs - INFO - Calculated 30 similarities, 1 above threshold 0.2
2025-08-18 19:03:46,843 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.2073']
2025-08-18 19:03:46,843 - SupabaseRetriever_bugs - INFO - Direct vector search returned 1 results (from 30 candidates)
2025-08-18 19:03:46,844 - RAGTools - INFO - Vector search (bugs): 1 results for 'authentication error authentication error authenti...'
2025-08-18 19:03:46,844 - MultiAgentWorkflow - INFO - Supabase vector fallback: 1 results


✍️  ResponseWriter Agent  generating response...


2025-08-18 19:03:51,724 - MultiAgentWorkflow - INFO - Query processed successfully in 7.05s
2025-08-18 19:03:51,726 - MultiAgentWorkflow - INFO - Processing query: 'SQL injection; DROP TABLE users; -- authentication...'


✅ ResponseWriter completed in 4.88s
   Generated response: 1204 characters
   Relevant tickets: 1
   ✅ Processing completed in 7.05s
   📝 Generated answer: 1204 characters
   💬 Answer preview: It seems that your query is focused on an "authentication error," but the retrieved JIRA ticket [JBI...

📋 Test 3: Special Characters
   Query length: 56 characters
   Expected: should sanitize input
🧠 Supervisor Agent analyzing query: 'SQL injection; DROP TABLE users; -- authentication error'
   user_can_wait: False, production_incident: False


2025-08-18 19:03:53,160 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'SQL injection; DROP TABLE users; -- authentication...' in bugs
2025-08-18 19:03:53,160 - SupabaseRetriever_bugs - INFO - Parameters: k=10, similarity_threshold=0.2, filters=None


✅ Supervisor decision: ContextualCompression - The query involves a general troubleshooting question related to SQL injection and authentication error, and the user cannot wait, making ContextualCompression the best choice for fast semantic search.
   Analysis time: 1.43s


2025-08-18 19:03:53,963 - SupabaseRetriever_bugs - INFO - Processing 30 candidates for similarity calculation
2025-08-18 19:03:53,975 - SupabaseRetriever_bugs - INFO - Calculated 30 similarities, 1 above threshold 0.2
2025-08-18 19:03:53,976 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.2529']
2025-08-18 19:03:53,976 - SupabaseRetriever_bugs - INFO - Direct vector search returned 1 results (from 30 candidates)
2025-08-18 19:03:53,976 - RAGTools - INFO - Vector search (bugs): 1 results for 'SQL injection; DROP TABLE users; -- authentication...'
2025-08-18 19:03:53,977 - MultiAgentWorkflow - INFO - Supabase vector fallback: 1 results


✍️  ResponseWriter Agent  generating response...


2025-08-18 19:03:59,708 - MultiAgentWorkflow - INFO - Query processed successfully in 7.98s
2025-08-18 19:03:59,712 - MultiAgentWorkflow - INFO - Processing query: '认证错误 🔒 authentication πρόβλημα...'


✅ ResponseWriter completed in 5.73s
   Generated response: 1360 characters
   Relevant tickets: 1
   ✅ Processing completed in 7.99s
   📝 Generated answer: 1360 characters
   💬 Answer preview: Based on your query regarding "SQL injection; DROP TABLE users; -- authentication error," it seems y...

📋 Test 4: Unicode Characters
   Query length: 30 characters
   Expected: should handle unicode
🧠 Supervisor Agent analyzing query: '认证错误 🔒 authentication πρόβλημα'
   user_can_wait: False, production_incident: False


2025-08-18 19:04:01,157 - SupabaseRetriever_bugs - INFO - Direct vector search for: '认证错误 🔒 authentication πρόβλημα...' in bugs
2025-08-18 19:04:01,159 - SupabaseRetriever_bugs - INFO - Parameters: k=10, similarity_threshold=0.2, filters=None


✅ Supervisor decision: ContextualCompression - The query involves an authentication problem, which is a general troubleshooting question and the user cannot wait.
   Analysis time: 1.44s


2025-08-18 19:04:02,085 - SupabaseRetriever_bugs - INFO - Processing 30 candidates for similarity calculation
2025-08-18 19:04:02,096 - SupabaseRetriever_bugs - INFO - Calculated 30 similarities, 3 above threshold 0.2
2025-08-18 19:04:02,097 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.2005', '0.2008', '0.2167']
2025-08-18 19:04:02,097 - SupabaseRetriever_bugs - INFO - Direct vector search returned 3 results (from 30 candidates)
2025-08-18 19:04:02,098 - RAGTools - INFO - Vector search (bugs): 3 results for '认证错误 🔒 authentication πρόβλημα...'
2025-08-18 19:04:02,098 - MultiAgentWorkflow - INFO - Supabase vector fallback: 3 results


✍️  ResponseWriter Agent  generating response...


2025-08-18 19:04:07,092 - MultiAgentWorkflow - INFO - Query processed successfully in 7.38s


✅ ResponseWriter completed in 4.99s
   Generated response: 949 characters
   Relevant tickets: 3
   ✅ Processing completed in 7.38s
   📝 Generated answer: 949 characters
   💬 Answer preview: It seems that your query "认证错误 🔒 authentication πρόβλημα" is related to authentication issues, possi...

🔄 Testing empty results fallback...
   ✅ Empty results fallback working
   🔍 Method: Test_Empty_Fallback
   📄 Contexts: 0

📊 ERROR HANDLING SUMMARY:
----------------------------------------
✅ Handled gracefully: 5/5
❌ Unhandled errors: 0
⏱️  Average error handling time: 5.87s
📝 Generated answers: 4/5

🛡️  ROBUSTNESS ASSESSMENT:
   🎉 Excellent: 100.0% of edge cases handled


## Test 6: WebSearch Routing Decisions

Testing supervisor routing accuracy for WebSearch queries.



In [11]:
# Test WebSearch routing decisions
from test_websearch_workflow import test_websearch_routing_only
summary_data = test_websearch_routing_only(workflow, summary_data)


2025-08-18 19:04:22,798 - MultiAgentWorkflow - INFO - Getting routing decision for: 'Is GitHub down?...'


🧠 TESTING: WebSearch Routing Decisions

🔍 Routing Test 1: 'Is GitHub down?'
🧠 Supervisor Agent analyzing query: 'Is GitHub down?'
   user_can_wait: False, production_incident: True


2025-08-18 19:04:24,186 - MultiAgentWorkflow - INFO - Getting routing decision for: 'AWS Lambda outage...'


✅ Supervisor decision: WebSearch - The query mentions a service status/outage ('Is GitHub down?'), which is best handled by WebSearch for real-time information.
   Analysis time: 1.39s
   Expected: WebSearch
   Got: WebSearch
   Reasoning: The query mentions a service status/outage ('Is GitHub down?'), which is best handled by WebSearch f...
   ✅ Correct: True

🔍 Routing Test 2: 'AWS Lambda outage'
🧠 Supervisor Agent analyzing query: 'AWS Lambda outage'
   user_can_wait: False, production_incident: True


2025-08-18 19:04:25,384 - MultiAgentWorkflow - INFO - Getting routing decision for: 'Spring Boot security vulnerability...'


✅ Supervisor decision: WebSearch - The query mentions an AWS outage, which is a service status check requiring real-time information.
   Analysis time: 1.20s
   Expected: WebSearch
   Got: WebSearch
   Reasoning: The query mentions an AWS outage, which is a service status check requiring real-time information....
   ✅ Correct: True

🔍 Routing Test 3: 'Spring Boot security vulnerability'
🧠 Supervisor Agent analyzing query: 'Spring Boot security vulnerability'
   user_can_wait: True, production_incident: False


2025-08-18 19:04:26,639 - MultiAgentWorkflow - INFO - Getting routing decision for: 'How to configure HBase cluster...'


✅ Supervisor decision: Ensemble - The query is complex and user_can_wait=True, allowing for a comprehensive search to cover all aspects of the security vulnerability.
   Analysis time: 1.25s
   Expected: WebSearch
   Got: Ensemble
   Reasoning: The query is complex and user_can_wait=True, allowing for a comprehensive search to cover all aspect...
   ✅ Correct: False

🔍 Routing Test 4: 'How to configure HBase cluster'
🧠 Supervisor Agent analyzing query: 'How to configure HBase cluster'
   user_can_wait: True, production_incident: False
✅ Supervisor decision: Ensemble - The query is a complex question about configuring an HBase cluster, and the user can wait for comprehensive results, making Ensemble the best choice for thorough analysis.
   Analysis time: 1.54s
   Expected: BM25
   Got: Ensemble
   Reasoning: The query is a complex question about configuring an HBase cluster, and the user can wait for compre...
   ✅ Correct: False

📊 ROUTING SUMMARY:
   Correct routing decisions: 2/4
  

## Test 7: WebSearch Workflow Integration

Testing complete end-to-end WebSearch workflow integration.

In [12]:
# Test complete WebSearch workflow integration
import asyncio
from test_websearch_workflow import test_websearch_workflow_integration

# Run async workflow integration tests
summary_data = await test_websearch_workflow_integration(workflow, summary_data)


2025-08-18 19:05:49,616 - MultiAgentWorkflow - INFO - Processing query: 'Is GitHub down right now?...'


🔗 TESTING: WebSearch Workflow Integration
🧪 Running 4 WebSearch workflow integration tests...

📋 Test 1: GitHub service status inquiry
   Query: 'Is GitHub down right now?'
   Expected routing: WebSearch
🧠 Supervisor Agent analyzing query: 'Is GitHub down right now?'
   user_can_wait: False, production_incident: True
✅ Supervisor decision: WebSearch - The query mentions a service status check for GitHub, which is best handled by WebSearch for real-time information on outages or downtime.
   Analysis time: 1.20s
✍️  ResponseWriter Agent [PRODUCTION INCIDENT] generating response...


2025-08-18 19:06:30,749 - MultiAgentWorkflow - INFO - Query processed successfully in 41.13s
2025-08-18 19:06:30,750 - MultiAgentWorkflow - INFO - Processing query: 'AWS Lambda outage today...'


✅ ResponseWriter completed in 3.65s
   Generated response: 527 characters
   Relevant tickets: 0
   📍 Actual routing: WebSearch
   🔧 Retrieval method: WebSearch
   📊 Retrieved contexts: 10
   📝 Answer length: 527 chars
   ⏱️  Processing time: 41.13s
   ✅ Routing correct: True
   🌐 WebSearch used: True
   🔗 Web sources found: True (3 sources)
   📊 Quality metrics: Contexts=True, Answer=True, Time=False
   🎯 Overall: ✅ EXCELLENT

📋 Test 2: AWS Lambda service outage check
   Query: 'AWS Lambda outage today'
   Expected routing: WebSearch
🧠 Supervisor Agent analyzing query: 'AWS Lambda outage today'
   user_can_wait: False, production_incident: True
✅ Supervisor decision: WebSearch - The query mentions an AWS Lambda outage, which is a service status check and requires real-time information.
   Analysis time: 1.23s
✍️  ResponseWriter Agent [PRODUCTION INCIDENT] generating response...


2025-08-18 19:07:11,550 - MultiAgentWorkflow - INFO - Query processed successfully in 40.80s
2025-08-18 19:07:11,551 - MultiAgentWorkflow - INFO - Processing query: 'Docker Hub registry down...'


✅ ResponseWriter completed in 5.40s
   Generated response: 800 characters
   Relevant tickets: 0
   📍 Actual routing: WebSearch
   🔧 Retrieval method: WebSearch
   📊 Retrieved contexts: 10
   📝 Answer length: 800 chars
   ⏱️  Processing time: 40.80s
   ✅ Routing correct: True
   🌐 WebSearch used: True
   🔗 Web sources found: True (3 sources)
   📊 Quality metrics: Contexts=True, Answer=True, Time=False
   🎯 Overall: ✅ EXCELLENT

📋 Test 3: Docker Hub registry status
   Query: 'Docker Hub registry down'
   Expected routing: WebSearch
🧠 Supervisor Agent analyzing query: 'Docker Hub registry down'
   user_can_wait: True, production_incident: False
✅ Supervisor decision: WebSearch - The query mentions a service status/outage ('Docker Hub registry down'), which is best handled by WebSearch for real-time information.
   Analysis time: 1.29s
✍️  ResponseWriter Agent  generating response...


2025-08-18 19:07:54,560 - MultiAgentWorkflow - INFO - Query processed successfully in 43.01s
2025-08-18 19:07:54,562 - MultiAgentWorkflow - INFO - Processing query: 'Latest security vulnerability in Java Spring Boot...'


✅ ResponseWriter completed in 5.21s
   Generated response: 970 characters
   Relevant tickets: 0
   📍 Actual routing: WebSearch
   🔧 Retrieval method: WebSearch
   📊 Retrieved contexts: 10
   📝 Answer length: 970 chars
   ⏱️  Processing time: 43.01s
   ✅ Routing correct: True
   🌐 WebSearch used: True
   🔗 Web sources found: True (3 sources)
   📊 Quality metrics: Contexts=True, Answer=True, Time=False
   🎯 Overall: ✅ EXCELLENT

📋 Test 4: Security research query
   Query: 'Latest security vulnerability in Java Spring Boot'
   Expected routing: WebSearch
🧠 Supervisor Agent analyzing query: 'Latest security vulnerability in Java Spring Boot'
   user_can_wait: True, production_incident: False


2025-08-18 19:07:56,195 - SupabaseRetriever_bugs - INFO - Direct hybrid search for: 'Latest security vulnerability in Java Spring Boot...' in bugs
2025-08-18 19:07:56,195 - SupabaseRetriever_bugs - INFO - Direct vector search for: 'Latest security vulnerability in Java Spring Boot...' in bugs
2025-08-18 19:07:56,196 - SupabaseRetriever_bugs - INFO - Parameters: k=20, similarity_threshold=0.2, filters=None


✅ Supervisor decision: Ensemble - The query is a research-type question about the latest security vulnerability in Java Spring Boot, and the user can wait for comprehensive results.
   Analysis time: 1.63s


2025-08-18 19:07:57,301 - SupabaseRetriever_bugs - INFO - Processing 60 candidates for similarity calculation
2025-08-18 19:07:57,317 - SupabaseRetriever_bugs - INFO - Calculated 60 similarities, 43 above threshold 0.2
2025-08-18 19:07:57,318 - SupabaseRetriever_bugs - INFO - Result similarities: ['0.3205', '0.2810', '0.2917']
2025-08-18 19:07:57,318 - SupabaseRetriever_bugs - INFO - Direct vector search returned 20 results (from 60 candidates)
2025-08-18 19:07:57,318 - SupabaseRetriever_bugs - INFO - Direct keyword search for: 'Latest security vulnerability in Java Spring Boot...' in bugs
2025-08-18 19:07:58,546 - SupabaseRetriever_bugs - INFO - Direct keyword search returned 20 results
2025-08-18 19:07:58,546 - SupabaseRetriever_bugs - INFO - Direct hybrid search returned 10 results
2025-08-18 19:07:58,547 - RAGTools - INFO - Hybrid search (bugs): 10 results for 'Latest security vulnerability in Java Spring Boot...'
2025-08-18 19:07:58,548 - MultiAgentWorkflow - INFO - Supabase hybri

✍️  ResponseWriter Agent  generating response...


2025-08-18 19:08:03,574 - MultiAgentWorkflow - INFO - Query processed successfully in 9.01s


✅ ResponseWriter completed in 5.03s
   Generated response: 777 characters
   Relevant tickets: 10
   📍 Actual routing: Ensemble
   🔧 Retrieval method: Supabase_Hybrid
   📊 Retrieved contexts: 10
   📝 Answer length: 777 chars
   ⏱️  Processing time: 9.01s
   ✅ Routing correct: False
   🌐 WebSearch used: False
   🔗 Web sources found: False (0 sources)
   📊 Quality metrics: Contexts=True, Answer=True, Time=True
   🎯 Overall: ❌ POOR

📊 WEBSEARCH WORKFLOW INTEGRATION SUMMARY:
   Total tests: 4
   Successful tests: 4/4
   Correct routing: 3/4
   WebSearch actually used: 3/4
   Excellent results: 3/4
   Average contexts per query: 10.0
   Average processing time: 33.49s
   Average answer length: 768 chars
   Overall Integration Status: ✅ EXCELLENT

📋 DETAILED RESULTS BY TEST TYPE:
   status_check: 1/1 successful
   outage_check: 1/1 successful
   service_status: 1/1 successful
   research: 1/1 successful



## Final Summary and Test Report

Generate a comprehensive test report with all results and recommendations.

In [13]:
# Final Test Report
print("\n📋 COMPREHENSIVE TEST REPORT")
print("=" * 70)
print(f"📅 Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 70)

# Collect all results
test_categories = {
    'initialization': initialization_results if 'initialization_results' in locals() else {},
    'routing': routing_results if 'routing_results' in locals() else [],
    'fallbacks': fallback_results if 'fallback_results' in locals() else {},
    'end_to_end': processing_results if 'processing_results' in locals() else [],
    'error_handling': error_results if 'error_results' in locals() else []
}

# Overall Statistics
total_tests = 0
passed_tests = 0

print("\n📊 OVERALL TEST STATISTICS:")
print("-" * 50)

# Initialization results
init_passed = sum(1 for k, v in test_categories['initialization'].items() if v and k != 'errors')
init_total = len([k for k in test_categories['initialization'].keys() if k != 'errors'])
if init_total > 0:
    print(f"🔧 Initialization: {init_passed}/{init_total} components")
    total_tests += init_total
    passed_tests += init_passed

# Routing results  
routing_passed = sum(1 for r in test_categories['routing'] if r.get('decision_correct', False))
routing_total = len([r for r in test_categories['routing'] if 'error' not in r])
if routing_total > 0:
    print(f"🎯 Routing Decisions: {routing_passed}/{routing_total} correct")
    total_tests += routing_total
    passed_tests += routing_passed

# Fallback results
fallback_passed = sum(1 for r in test_categories['fallbacks'].values() if r.get('success') and r.get('contexts', 0) > 0)
fallback_total = len(test_categories['fallbacks'])
if fallback_total > 0:
    print(f"🔄 Fallback Methods: {fallback_passed}/{fallback_total} successful")
    total_tests += fallback_total
    passed_tests += fallback_passed

# End-to-end results
e2e_passed = len([r for r in test_categories['end_to_end'] if 'error' not in r])
e2e_total = len(test_categories['end_to_end'])
if e2e_total > 0:
    print(f"🚀 End-to-End Processing: {e2e_passed}/{e2e_total} successful")
    total_tests += e2e_total
    passed_tests += e2e_passed

# Error handling results
error_passed = sum(1 for r in test_categories['error_handling'] if r.get('success'))
error_total = len(test_categories['error_handling'])
if error_total > 0:
    print(f"🛡️  Error Handling: {error_passed}/{error_total} handled gracefully")
    total_tests += error_total
    passed_tests += error_passed

# Overall score
if total_tests > 0:
    overall_score = (passed_tests / total_tests) * 100
    print(f"\n🎯 OVERALL SUCCESS RATE: {passed_tests}/{total_tests} ({overall_score:.1f}%)")
    
    if overall_score >= 90:
        print("🏆 EXCELLENT - Workflow is production ready")
    elif overall_score >= 80:
        print("✅ GOOD - Workflow is functional with minor issues")
    elif overall_score >= 70:
        print("⚠️  ACCEPTABLE - Workflow needs some improvements")
    else:
        print("❌ NEEDS WORK - Workflow requires significant fixes")
else:
    print("❌ NO TESTS EXECUTED - Check test environment setup")

# Performance Metrics
print(f"\n⏱️  PERFORMANCE METRICS:")
print("-" * 30)

if test_categories['routing']:
    avg_routing_time = sum(r.get('routing_time', 0) for r in test_categories['routing']) / len(test_categories['routing'])
    print(f"🎯 Average routing time: {avg_routing_time:.2f}s")

if test_categories['end_to_end']:
    successful_e2e = [r for r in test_categories['end_to_end'] if 'error' not in r]
    if successful_e2e:
        avg_e2e_time = sum(r.get('processing_time', 0) for r in successful_e2e) / len(successful_e2e)
        avg_contexts = sum(r.get('contexts_count', 0) for r in successful_e2e) / len(successful_e2e)
        print(f"🚀 Average end-to-end time: {avg_e2e_time:.2f}s")
        print(f"📄 Average contexts retrieved: {avg_contexts:.1f}")

if test_categories['fallbacks']:
    successful_fallbacks = [r for r in test_categories['fallbacks'].values() if r.get('success')]
    if successful_fallbacks:
        avg_fallback_time = sum(r.get('time', 0) for r in successful_fallbacks) / len(successful_fallbacks)
        print(f"🔄 Average fallback time: {avg_fallback_time:.2f}s")

# Key Findings and Recommendations
print(f"\n🔍 KEY FINDINGS:")
print("-" * 25)

findings = []

# Check initialization issues
if test_categories['initialization'].get('errors'):
    findings.append("❌ Initialization errors detected - check environment variables")

# Check routing accuracy
if test_categories['routing']:
    routing_accuracy = (routing_passed / max(routing_total, 1)) * 100
    if routing_accuracy < 80:
        findings.append(f"⚠️  Routing accuracy is {routing_accuracy:.1f}% - review supervisor logic")

# Check retrieval success
if test_categories['fallbacks']:
    retrieval_success = (fallback_passed / max(fallback_total, 1)) * 100
    if retrieval_success < 70:
        findings.append(f"⚠️  Retrieval success is {retrieval_success:.1f}% - check Supabase connection")

# Check error handling
if test_categories['error_handling']:
    error_handling_rate = (error_passed / max(error_total, 1)) * 100
    if error_handling_rate < 80:
        findings.append(f"⚠️  Error handling rate is {error_handling_rate:.1f}% - improve robustness")

# Positive findings
if not findings:
    findings.append("✅ All systems functioning within acceptable parameters")

if overall_score >= 90:
    findings.append("🎉 Workflow exceeds quality thresholds for production deployment")

for finding in findings:
    print(f"   {finding}")

# Recommendations
print(f"\n💡 RECOMMENDATIONS:")
print("-" * 25)

recommendations = []

# Performance recommendations
if test_categories['end_to_end']:
    successful_e2e = [r for r in test_categories['end_to_end'] if 'error' not in r]
    if successful_e2e:
        avg_e2e_time = sum(r.get('processing_time', 0) for r in successful_e2e) / len(successful_e2e)
        if avg_e2e_time > 10:
            recommendations.append("⚡ Consider caching or optimization for >10s processing times")

# Error handling recommendations
if test_categories['error_handling']:
    failed_errors = [r for r in test_categories['error_handling'] if not r.get('success')]
    if failed_errors:
        recommendations.append("🛡️  Add validation and sanitization for edge cases")

# Fallback recommendations
if test_categories['fallbacks']:
    failed_fallbacks = [k for k, r in test_categories['fallbacks'].items() if not r.get('success')]
    if failed_fallbacks:
        recommendations.append(f"🔄 Fix fallback methods: {', '.join(failed_fallbacks)}")

# General recommendations
if overall_score < 100:
    recommendations.append("🔍 Address failing test cases before production deployment")

if not recommendations:
    recommendations.append("✅ Workflow is ready for production deployment")

recommendations.append("📊 Monitor performance metrics in production environment")
recommendations.append("🔄 Run these tests regularly as part of CI/CD pipeline")

for rec in recommendations:
    print(f"   {rec}")

print(f"\n📝 TEST COMPLETION SUMMARY:")
print(f"   📅 Test session completed at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"   ⏱️  Total test execution time: ~{total_tests * 0.5:.1f} seconds (estimated)")
print(f"   🏷️  Workflow version: MultiAgentWorkflow API Integration Testing")
print(f"   🎯 Recommended next step: {'Deploy to staging environment' if overall_score >= 80 else 'Address failing tests'}")

print("=" * 70)


📋 COMPREHENSIVE TEST REPORT
📅 Generated: 2025-08-18 19:08:40

📊 OVERALL TEST STATISTICS:
--------------------------------------------------
🔧 Initialization: 5/5 components
🎯 Routing Decisions: 4/4 correct
🔄 Fallback Methods: 3/3 successful
🚀 End-to-End Processing: 3/3 successful
🛡️  Error Handling: 5/5 handled gracefully

🎯 OVERALL SUCCESS RATE: 20/20 (100.0%)
🏆 EXCELLENT - Workflow is production ready

⏱️  PERFORMANCE METRICS:
------------------------------
🎯 Average routing time: 1.70s
🚀 Average end-to-end time: 8.18s
📄 Average contexts retrieved: 10.0
🔄 Average fallback time: 1.64s

🔍 KEY FINDINGS:
-------------------------
   ✅ All systems functioning within acceptable parameters
   🎉 Workflow exceeds quality thresholds for production deployment

💡 RECOMMENDATIONS:
-------------------------
   ✅ Workflow is ready for production deployment
   📊 Monitor performance metrics in production environment
   🔄 Run these tests regularly as part of CI/CD pipeline

📝 TEST COMPLETION SUMMARY: