# LLMMultiRoundRouter - Inference

This notebook demonstrates how to use the **LLMMultiRoundRouter** for multi-round query processing.

## Overview

LLMMultiRoundRouter uses LLM prompts for both query decomposition and routing decisions.
Unlike KNN-based routers, it requires no training; it relies on LLM reasoning directly.

**Pipeline**:
1. **Decompose + Route**: LLM breaks query into sub-queries AND routes each in one step
2. **Execute**: Call routed model APIs for each sub-query
3. **Aggregate**: LLM combines responses into final answer
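
The three stages above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: `LLMCall`, `parse_routing_plan`, `run_pipeline`, `fake_llm`, the prompt wording, and the model names are all hypothetical, and the LLM is replaced by canned replies so the cell runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a chat-completion call: (model_name, prompt) -> reply text.
LLMCall = Callable[[str, str], str]


def parse_routing_plan(plan_text: str, known_models: List[str]) -> List[Tuple[str, str]]:
    """Parse lines shaped like '<sub-query>: <model-name>' into (sub_query, model) pairs."""
    plan = []
    for raw_line in plan_text.splitlines():
        line = raw_line.strip().lstrip("•-* ")
        if ":" not in line:
            continue
        # Split on the last colon so colons inside the sub-query are preserved.
        sub_query, model = line.rsplit(":", 1)
        sub_query = sub_query.strip().strip('"')
        model = model.strip()
        if sub_query and model in known_models:
            plan.append((sub_query, model))
    return plan


def run_pipeline(query: str, model_descriptions: Dict[str, str],
                 base_model: str, call_llm: LLMCall) -> str:
    """Decompose + route, execute each sub-query, then aggregate."""
    models = list(model_descriptions)
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in model_descriptions.items())

    # Step 1: a single prompt asks for sub-queries and a model for each.
    plan_prompt = (
        "Split the query into sub-queries and pick a model for each.\n"
        f"Models:\n{catalog}\n"
        "Answer with one '<sub-query>: <model-name>' per line.\n"
        f"Query: {query}"
    )
    plan = parse_routing_plan(call_llm(base_model, plan_prompt), models)
    if not plan:
        # Nothing parseable came back: send the whole query to the first model.
        plan = [(query, models[0])]

    # Step 2: each sub-query goes to the model it was routed to.
    answers = [(sub_query, call_llm(model, sub_query)) for sub_query, model in plan]

    # Step 3: the base model merges the partial answers.
    merged = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers)
    agg_prompt = f"Combine these partial answers into one reply to: {query}\n{merged}"
    return call_llm(base_model, agg_prompt)


def fake_llm(model: str, prompt: str) -> str:
    """Canned replies so the sketch runs without network access."""
    if prompt.startswith("Split"):
        return ('"What causes climate change?": small-model\n'
                '"What solutions exist?": large-model')
    if prompt.startswith("Combine"):
        return prompt.split("\n", 1)[1]  # echo the merged partial answers
    return f"[{model}] answer to: {prompt}"


descriptions = {"small-model": "fast, general knowledge",
                "large-model": "stronger reasoning"}
print(run_pipeline("What are the causes of climate change and solutions?",
                   descriptions, "base-model", fake_llm))
```

Because routing and decomposition share one prompt, the quality of the model descriptions in the catalog directly shapes which model each sub-query is sent to.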

**Key Features**:
- No training required (LLM-based reasoning)
- Single-step decomposition and routing
- Model descriptions guide routing decisions
- Supports both local vLLM and API-based inference

## 1. Environment Setup

In [None]:
# For Google Colab
import os

if 'COLAB_GPU' in os.environ:
    !git clone https://github.com/ulab-uiuc/LLMRouter.git
    %cd LLMRouter
    !pip install -e .
    !pip install pyyaml openai

In [None]:
import os
import sys
from pathlib import Path

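# The notebook is assumed to sit two levels below the repository root.
# In Colab, the clone cell above has already changed into the repository root.
if 'COLAB_GPU' in os.environ:
    PROJECT_ROOT = Path.cwd()
else:
    PROJECT_ROOT = Path.cwd().parent.parent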
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

os.chdir(PROJECT_ROOT)
print(f"Working directory: {os.getcwd()}")

In [None]:
from llmrouter.utils import setup_environment
import yaml

setup_environment()
print("Environment setup complete!")

## 2. Configuration

LLMMultiRoundRouter accepts the following configuration parameters:

| Parameter | Description | Required |
|-----------|-------------|----------|
| `llm_data` | LLM candidates with descriptions (a file path set under `data_path`) | Yes |
| `base_model` | LLM for decomposition/aggregation | Yes |
| `api_endpoint` | API endpoint for execution | Yes |
| `use_local_llm` | Use vLLM for local inference | No |
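
Before writing the YAML file, it can help to check the config dict against the table above. The helper below is a hedged sketch: `find_config_problems` is a hypothetical name, not part of `llmrouter`, and it assumes `llm_data` is nested under `data_path` (as in the config cell that follows) and that `use_local_llm` defaults to `False` when omitted.

```python
from typing import Any, Dict, List


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []

    # Assumption: llm_data is a path stored under the data_path mapping.
    data_path = config.get("data_path") or {}
    if not data_path.get("llm_data"):
        problems.append("data_path.llm_data is missing")

    # Required string-valued parameters from the table.
    for key in ("base_model", "api_endpoint"):
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} must be a non-empty string")

    # Optional flag; assumed to default to False when absent.
    if not isinstance(config.get("use_local_llm", False), bool):
        problems.append("use_local_llm must be a boolean")

    return problems


example_config = {
    "data_path": {"llm_data": "data/example_data/llm_candidates/default_llm.json"},
    "base_model": "Qwen/Qwen2.5-3B-Instruct",
    "api_endpoint": "https://api.openai.com/v1",
}
print(find_config_problems(example_config) or "No problems found")
```

Collecting every problem instead of raising on the first one makes it easier to fix a config in a single pass.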

In [None]:
# Create configuration for LLMMultiRoundRouter
llm_multi_config = {
    "data_path": {
        "query_data_test": "data/example_data/query_data/default_query_test.jsonl",
        "routing_data_test": "data/example_data/routing_data/default_routing_test_data.jsonl",
        "llm_data": "data/example_data/llm_candidates/default_llm.json"
    },
    "base_model": "Qwen/Qwen2.5-3B-Instruct",
    "use_local_llm": False,
    "api_endpoint": os.environ.get("API_ENDPOINT", "https://api.openai.com/v1")
}

# Save config
CONFIG_PATH = "configs/model_config_train/llmmultiroundrouter_temp.yaml"
os.makedirs(os.path.dirname(CONFIG_PATH), exist_ok=True)

with open(CONFIG_PATH, 'w') as f:
    yaml.dump(llm_multi_config, f, default_flow_style=False)

print("Configuration:")
print("=" * 50)
print(yaml.dump(llm_multi_config, default_flow_style=False))

## 3. Initialize Router

In [None]:
from llmrouter.models.llmmultiroundrouter import LLMMultiRoundRouter

try:
    router = LLMMultiRoundRouter(yaml_path=CONFIG_PATH)
    print("Router initialized successfully!")
    print(f"Base model: {router.base_model}")
    print(f"Use local LLM: {router.use_local_llm}")
    print(f"Number of LLM candidates: {len(router.llm_data)}")
except Exception as e:
    print(f"Error initializing router: {e}")

In [None]:
# Display LLM candidates with their descriptions
print("Available LLM Candidates:")
print("=" * 60)

for name, info in router.llm_data.items():
    print(f"\n{name}:")
    if 'description' in info:
        print(f"  Description: {info['description'][:100]}...")
    if 'size' in info:
        print(f"  Size: {info['size']}B parameters")
    if 'capabilities' in info:
        print(f"  Capabilities: {info['capabilities']}")

## 4. Chat Mode (Simple Queries)

For simple queries, pass a string and get a string response.

In [None]:
# Chat mode - simple string input/output
query = "What are the main causes of climate change and what solutions exist?"

print(f"Query: {query}")
print("=" * 60)

try:
    response = router.route_single(query)
    print(f"\nResponse:\n{response}")
except Exception as e:
    print(f"Error: {e}")
    print("\nNote: LLMMultiRoundRouter requires API access for LLM calls.")

## 5. Evaluation Mode (With Metrics)

For evaluation, pass a dict with `query`, `task_name`, and `ground_truth`; the router returns a dict containing the response and its metrics.
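
When several evaluation results are collected, their metrics can be tallied with a small helper. The sketch below assumes the result fields read by the evaluation cell in this section (`success`, `prompt_tokens`, `completion_tokens`, `task_performance`); `summarize_results` is a hypothetical helper, not a router method, and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate success rate, token usage, and mean task performance."""
    total = len(results)
    succeeded = sum(1 for r in results if r.get("success", False))
    # Only results that carry a score contribute to the mean.
    scores = [r["task_performance"] for r in results
              if r.get("task_performance") is not None]
    return {
        "num_queries": total,
        "success_rate": succeeded / total if total else 0.0,
        "prompt_tokens": sum(r.get("prompt_tokens", 0) for r in results),
        "completion_tokens": sum(r.get("completion_tokens", 0) for r in results),
        "mean_task_performance": sum(scores) / len(scores) if scores else None,
    }


# Illustrative data only; real values come from router.route_single or route_batch.
sample_results = [
    {"success": True, "prompt_tokens": 10, "completion_tokens": 5, "task_performance": 1.0},
    {"success": False, "prompt_tokens": 7, "completion_tokens": 0},
]
print(summarize_results(sample_results))
```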

In [None]:
# Evaluation mode - dict input with metrics
eval_query = {
    "query": "What is the largest planet in our solar system?",
    "task_name": "trivia",
    "ground_truth": "Jupiter"
}

print(f"Query: {eval_query['query']}")
print(f"Task: {eval_query['task_name']}")
print(f"Ground Truth: {eval_query['ground_truth']}")
print("=" * 60)

try:
    result = router.route_single(eval_query)
    
    print(f"\nResponse: {result.get('response', 'N/A')}")
    print(f"Success: {result.get('success', False)}")
    print(f"Prompt Tokens: {result.get('prompt_tokens', 0)}")
    print(f"Completion Tokens: {result.get('completion_tokens', 0)}")
    if 'task_performance' in result:
        print(f"Task Performance: {result['task_performance']:.2f}")
except Exception as e:
    print(f"Error: {e}")

## 6. Batch Processing

In [None]:
# Batch processing
batch_queries = [
    {"query": "Explain quantum computing."},
    {"query": "What is the difference between AI and ML?"},
    {"query": "How does blockchain technology work?"},
]

print(f"Processing {len(batch_queries)} queries...")
print("=" * 60)

try:
    results = router.route_batch(batch_queries)
    
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Query: {result.get('query', 'N/A')[:50]}...")
        print(f"   Success: {result.get('success', False)}")
        response = result.get('response', 'N/A')
        print(f"   Response: {response[:100] if response else 'N/A'}...")
except Exception as e:
    print(f"Error: {e}")

## 7. Task-Specific Routing

LLMMultiRoundRouter supports task-specific prompts for evaluation.
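
To make the idea of a task-specific prompt concrete, here is a rough sketch of how a multiple-choice question might be formatted and scored. It is an assumption for illustration only: `format_choices`, `extract_choice`, and `score_choice` are hypothetical helpers, and the router's actual prompt templates and scoring may differ.

```python
import re
import string
from typing import List, Optional


def format_choices(question: str, choices: List[str]) -> str:
    """Append lettered options (A, B, C, ...) and an answer instruction."""
    lettered = [f"{string.ascii_uppercase[i]}) {choice}"
                for i, choice in enumerate(choices)]
    return question + "\n" + "\n".join(lettered) + "\nAnswer with a single letter."


def extract_choice(response: str, num_choices: int) -> Optional[str]:
    """Return the first standalone uppercase option letter in the response, if any."""
    letters = string.ascii_uppercase[:num_choices]
    match = re.search(rf"\b([{letters}])\b", response)
    return match.group(1) if match else None


def score_choice(response: str, ground_truth: str, num_choices: int) -> float:
    """1.0 when the extracted letter equals the ground truth, else 0.0."""
    return 1.0 if extract_choice(response, num_choices) == ground_truth else 0.0


prompt = format_choices("What is the capital of Australia?",
                        ["Sydney", "Melbourne", "Canberra", "Perth"])
print(prompt)
print(score_choice("The answer is C.", "C", 4))
```

The letter extraction is deliberately naive: a response that begins with the article "A" would be read as option A, so a stricter answer format is worth enforcing in the prompt.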

In [None]:
# Multiple choice task
mc_query = {
    "query": "What is the capital of Australia? A) Sydney B) Melbourne C) Canberra D) Perth",
    "task_name": "commonsense_qa",
    "choices": ["Sydney", "Melbourne", "Canberra", "Perth"],
    "ground_truth": "C"
}

print("Multiple Choice Query:")
print(f"Question: {mc_query['query']}")
print(f"Ground Truth: {mc_query['ground_truth']}")
print("=" * 60)

try:
    results = router.route_batch([mc_query], task_name="commonsense_qa")
    result = results[0]
    
    print(f"\nResponse: {result.get('response', 'N/A')}")
    if 'task_performance' in result:
        print(f"Correct: {result['task_performance'] == 1.0}")
except Exception as e:
    print(f"Error: {e}")

## 8. Understanding the Pipeline

Let's examine how LLMMultiRoundRouter works.

In [None]:
print("LLMMultiRoundRouter Pipeline:")
print("=" * 60)

print("""
┌─────────────────────────────────────────────────────────────┐
│                    Input Query                              │
│  "What are the causes of climate change and solutions?"    │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Step 1: Decompose + Route                      │
│  LLM breaks query into sub-queries AND routes each          │
│                                                             │
│  Output format: <sub-query>: <model-name>                   │
│  • "What causes climate change?": Qwen-7B                   │
│  • "What solutions exist for climate change?": Llama-70B    │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Step 2: Execute Sub-queries                    │
│  Call routed model API for each sub-query                   │
│                                                             │
│  • Qwen-7B → "Greenhouse gases, deforestation..."           │
│  • Llama-70B → "Renewable energy, carbon capture..."        │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              Step 3: Aggregate Responses                    │
│  LLM combines sub-responses into coherent final answer      │
│                                                             │
│  "Climate change is caused by greenhouse gases...           │
│   Solutions include renewable energy and..."                │
└─────────────────────────────────────────────────────────────┘
""")

print("\nKey Advantages:")
print("• No training required - uses LLM reasoning directly")
print("• Model descriptions guide intelligent routing decisions")
print("• Single-step decomposition and routing for efficiency")
print("• Flexible aggregation based on task type")

## 9. Comparison with KNNMultiRoundRouter

In [None]:
print("LLMMultiRoundRouter vs KNNMultiRoundRouter:")
print("=" * 60)

comparison = """
| Feature              | LLMMultiRoundRouter     | KNNMultiRoundRouter     |
|----------------------|-------------------------|-------------------------|
| Training Required    | No                      | Yes (KNN fitting)       |
| Routing Method       | LLM reasoning           | KNN on embeddings       |
| Model Selection      | Based on descriptions   | Based on similarity     |
| Decomposition        | Same LLM call           | Separate LLM call       |
| Flexibility          | High (prompt-based)     | Medium (learned)        |
| Inference Cost       | Higher (more LLM calls) | Lower (KNN is fast)     |
| Cold Start           | Works immediately       | Needs training data     |
"""
print(comparison)

## 10. File-Based Inference

Load queries from a file and save results.

In [None]:
import json

# Load queries from a JSONL file
def load_queries_from_file(file_path):
    """Load queries from a JSONL file."""
    queries = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                queries.append(json.loads(line))
    return queries

# Save results to a JSONL file
def save_results_to_file(results, output_path):
    """Save routing results to a JSONL file."""
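    output_dir = os.path.dirname(output_path)
    if output_dir:  # dirname is '' for a bare filename, and os.makedirs('') raises
        os.makedirs(output_dir, exist_ok=True)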
    with open(output_path, 'w', encoding='utf-8') as f:
        for result in results:
            f.write(json.dumps(result, ensure_ascii=False) + '\n')
    print(f"Results saved to: {output_path}")

# Example: Load from default query file
QUERY_FILE = "data/example_data/query_data/default_query_test.jsonl"
OUTPUT_FILE = "outputs/llmmultiroundrouter_results.jsonl"

if os.path.exists(QUERY_FILE):
    # Load queries
    file_queries = load_queries_from_file(QUERY_FILE)
    print(f"Loaded {len(file_queries)} queries from: {QUERY_FILE}")
    
    # Route queries (limit to 5 for demo due to API costs)
    try:
        file_results = router.route_batch(file_queries[:5])
        print(f"Routed {len(file_results)} queries")
        
        # Save results
        save_results_to_file(file_results, OUTPUT_FILE)
        
        # Show sample results
        print("\nSample results:")
        for i, result in enumerate(file_results[:3], 1):
            print(f"  {i}. {result.get('query', '')[:40]}...")
            print(f"     Success: {result.get('success', False)}")
    except Exception as e:
        print(f"Error during batch routing: {e}")
else:
    print(f"Query file not found: {QUERY_FILE}")
    print("Create a JSONL file with format: {\"query\": \"Your question\"}")

## Summary

**LLMMultiRoundRouter** provides:
- Zero-shot multi-round routing (no training)
- LLM-based decomposition and routing in one step
- Model description-guided routing decisions
- Flexible aggregation for different task types

**Use Cases**:
- Quick prototyping without training data
- Complex queries requiring expert routing decisions
- When model descriptions are more reliable than embeddings
- Low-volume, high-quality routing needs

**Requirements**:
- API access for LLM calls
- Optional: vLLM for local inference
- Model descriptions in llm_data (recommended)