# SmallestLLM Router - Inference

This notebook demonstrates the **SmallestLLM** baseline router.

## Overview

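SmallestLLM is a simple baseline router that always routes queries to the smallest model in the candidate pool, regardless of query content.
In practice it acts as a lower bound on routing performance and an upper bound on cost efficiency, on the assumption that the smallest model is both the weakest and the cheapest candidate.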

**Key Characteristics**:
- No training required (deterministic baseline)
- Always selects the smallest model by parameter size
- Useful for cost-efficiency benchmarking
- Negligible routing overhead (the decision needs no model inference)
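
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the candidate names are hypothetical, and it assumes each candidate carries a `size` field in billions of parameters (either a number or a string such as `"7B"`), similar to the `size` field this notebook reads when listing candidates.

```python
import re

# Hypothetical candidate pool; names and sizes are illustrative only.
candidates = {
    "model-a": {"size": "7B"},
    "model-b": {"size": "70B"},
    "model-c": {"size": 1.5},
    "model-d": {},  # no size recorded
}

def parse_size(raw):
    """Return the size in billions of parameters, or None if it cannot be parsed."""
    if isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    if isinstance(raw, str):
        match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*[bB]?\s*", raw)
        if match:
            return float(match.group(1))
    return None

def pick_smallest(pool):
    """Pick the candidate with the smallest known size; skip entries without one."""
    sized = {name: parse_size(info.get("size")) for name, info in pool.items()}
    known = {name: size for name, size in sized.items() if size is not None}
    if not known:
        raise ValueError("no candidate has a usable size")
    return min(known, key=known.get)

smallest = pick_smallest(candidates)
print(smallest)
```

Candidates without a usable size are skipped rather than treated as size zero, so a missing field cannot accidentally win the selection.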

## 1. Environment Setup

In [None]:
# For Google Colab: Clone repository and install dependencies
import os

if 'COLAB_GPU' in os.environ:
    !git clone https://github.com/ulab-uiuc/LLMRouter.git
    %cd LLMRouter
    !pip install -e .
    !pip install pyyaml

In [None]:
import os
import sys
from pathlib import Path

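# Use the nearest directory containing `configs/` as the project root, so the cell
# also works in Colab (already inside the repo) and when it is re-run.
_cwd = Path.cwd()
PROJECT_ROOT = next((p for p in (_cwd, *_cwd.parents) if (p / "configs").is_dir()),
                    _cwd.parent.parent)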
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

os.chdir(PROJECT_ROOT)
print(f"Working directory: {os.getcwd()}")

In [None]:
from llmrouter.models.smallest_llm import SmallestLLMRouter
from llmrouter.utils import setup_environment
import yaml

setup_environment()
print("Environment setup complete!")

## 2. Configuration

The SmallestLLM router requires only data paths; it has no hyperparameters.

| Parameter | Description |
|-----------|-------------|
| `llm_data` | Path to LLM candidate metadata |
| `routing_data_test` | Path to test routing data |
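
Before loading the real file, it can help to confirm that a config actually defines both parameters. The sketch below is an assumption-laden example: the `data_path` nesting and the file paths are hypothetical, so the helper searches the whole nested dict instead of relying on a particular layout.

```python
REQUIRED_KEYS = ("llm_data", "routing_data_test")

def find_keys(node, wanted, found=None):
    """Recursively collect the first occurrence of each wanted key in a nested dict."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted and key not in found:
                found[key] = value
            find_keys(value, wanted, found)
    return found

# Hypothetical config contents; the nesting and paths are placeholders.
example_config = {
    "data_path": {
        "llm_data": "path/to/llm_candidates.json",
        "routing_data_test": "path/to/routing_test.jsonl",
    }
}

found = find_keys(example_config, REQUIRED_KEYS)
missing = [key for key in REQUIRED_KEYS if key not in found]
print("missing keys:", missing)
```

The same check can be run on the dict returned by `yaml.safe_load` in the next cell.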

In [None]:
CONFIG_PATH = "configs/model_config_train/smallest_llm.yaml"

with open(CONFIG_PATH, 'r') as f:
    config = yaml.safe_load(f)

print("Current Configuration:")
print("=" * 50)
print(yaml.dump(config, default_flow_style=False))

## 3. Initialize Router

In [None]:
router = SmallestLLMRouter(yaml_path=CONFIG_PATH)

print("Router initialized successfully!")
print(f"Number of LLM candidates: {len(router.llm_data)}")

In [None]:
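# Display available LLM candidates sorted by size (display only; the router makes its own choice)
print("Available LLM Candidates (by size):")
print("=" * 60)

def size_key(size):
    """Sort key: numeric sizes (e.g. 7 or "7B") as floats, unknown sizes last."""
    try:
        return float(str(size).strip().rstrip("bB"))
    except ValueError:
        return float("inf")

llm_list = [(name, info.get('size', 'unknown')) for name, info in router.llm_data.items()]
llm_list_sorted = sorted(llm_list, key=lambda x: size_key(x[1]))

for i, (name, size) in enumerate(llm_list_sorted, 1):
    known = size_key(size) != float("inf")
    marker = " <- SMALLEST" if i == 1 and known else ""
    label = f"{size_key(size):g}B parameters" if known else "size unknown"
    print(f"{i}. {name}: {label}{marker}")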

## 4. Query Routing

SmallestLLM always routes to the smallest model, regardless of query complexity.

In [None]:
EXAMPLE_QUERIES = [
    {"query": "What is 2 + 2?"},  # Simple
    {"query": "Explain the theory of general relativity."},  # Medium
    {"query": "Prove P != NP."},  # Complex
]

print("Routing Results:")
print("=" * 60)

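for i, query in enumerate(EXAMPLE_QUERIES, 1):
    result = router.route_single(query)
    text = query['query']
    # Add an ellipsis only when the query is actually truncated
    preview = text if len(text) <= 50 else text[:50] + "..."
    print(f"{i}. {preview}")
    print(f"   Routed to: {result['model_name']}")
    print()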
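
Because the routing decision ignores the query, every result should name the same model. A small check like the one below makes that property explicit. It is a sketch run on hypothetical results that only mimic the `model_name` field used in the cell above; `tiny-model` is a placeholder name.

```python
from collections import Counter

def routing_targets(results):
    """Count how many queries were routed to each model."""
    return Counter(result["model_name"] for result in results)

# Hypothetical routing results; "tiny-model" is a placeholder name.
mock_results = [
    {"query": "What is 2 + 2?", "model_name": "tiny-model"},
    {"query": "Explain the theory of general relativity.", "model_name": "tiny-model"},
    {"query": "Prove P != NP.", "model_name": "tiny-model"},
]

targets = routing_targets(mock_results)
is_constant = len(targets) == 1
print(f"Distinct targets: {len(targets)}, constant routing: {is_constant}")
```

For a learned router the same check would normally report more than one target, which makes it a quick way to tell the baseline apart from other routers.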

## 5. Batch Routing

In [None]:
from collections import Counter

# Route the first 10 test queries
test_queries = router.routing_data_test[:10]

print(f"Routing {len(test_queries)} test queries...")
results = router.route_batch(batch=test_queries)

print("\nRouting Distribution:")
model_counts = Counter(r['model_name'] for r in results)
for model, count in model_counts.most_common():
    print(f"  {model}: {count} ({100*count/len(results):.1f}%)")

## 6. Evaluation

In [None]:
from llmrouter.evaluator import Evaluator

evaluator = Evaluator(router=router)
metrics = evaluator.eval()

print("\nEvaluation Results:")
print("=" * 50)
for metric_name, value in metrics.items():
    if isinstance(value, float):
        print(f"{metric_name}: {value:.4f}")
    else:
        print(f"{metric_name}: {value}")

## 7. File-Based Inference

Load queries from a custom file and save results.

In [None]:
import json

# Load queries from a JSONL file
def load_queries_from_file(file_path):
    """Load queries from a JSONL file."""
    queries = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                queries.append(json.loads(line))
    return queries

# Save results to a JSONL file
def save_results_to_file(results, output_path):
    """Save routing results to a JSONL file."""
    output_dir = os.path.dirname(output_path)
    if output_dir:  # a bare filename has no directory to create
        os.makedirs(output_dir, exist_ok=True)
    with open(output_path, 'w', encoding='utf-8') as f:
        for result in results:
            f.write(json.dumps(result, ensure_ascii=False) + '\n')
    print(f"Results saved to: {output_path}")

# Example: Load from your own query file
QUERY_FILE = "data/example_data/query_data/default_query_test.jsonl"
OUTPUT_FILE = "outputs/smallest_llm_results.jsonl"

if os.path.exists(QUERY_FILE):
    # Load queries from file
    file_queries = load_queries_from_file(QUERY_FILE)
    print(f"Loaded {len(file_queries)} queries from: {QUERY_FILE}")
    
    # Route queries using route_batch
    file_results = router.route_batch(batch=file_queries[:10])
    print(f"Routed {len(file_results)} queries")
    
    # Save results to file
    save_results_to_file(file_results, OUTPUT_FILE)
    
    # Show sample results
    print("\nSample results:")
    for i, result in enumerate(file_results[:3], 1):
        print(f"  {i}. {result.get('query', '')[:40]}... -> {result['model_name']}")
else:
    print(f"Query file not found: {QUERY_FILE}")
    print("Create a JSONL file with format: {\"query\": \"Your question\"}")
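
If no query file is available yet, one can be created in the expected format: one JSON object per line, each with a `query` field. The sketch below writes two of the example queries from this notebook to a temporary file and reads them back with the same skip-blank-lines logic as `load_queries_from_file`. The file name `sample_queries.jsonl` is arbitrary.

```python
import json
import os
import tempfile

sample_queries = [
    {"query": "What is 2 + 2?"},
    {"query": "Explain the theory of general relativity."},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_queries.jsonl")

    # Write one JSON object per line, plus a trailing blank line
    with open(sample_path, "w", encoding="utf-8") as f:
        for record in sample_queries:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
        f.write("\n")

    # Read the file back, skipping blank lines
    with open(sample_path, "r", encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(f"Round trip preserved {len(loaded)} queries: {loaded == sample_queries}")
```

To route a real file, write it to a persistent location and point `QUERY_FILE` at that path.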

## Summary

**SmallestLLM Router**:
- Always routes to the smallest model
- No training required (deterministic baseline)
- Serves as a practical lower bound on performance and an upper bound on cost efficiency
- Useful for comparing against learned routing methods

**Use Cases**:
- Baseline comparison for routing research
- Cost-critical applications with simple queries
- Latency-sensitive scenarios