# Routing Strategies in langchain-fused-model

This notebook demonstrates all available routing strategies and how to choose the right one for your use case.

## Available Strategies

1. **PRIORITY** - Route to the highest-priority available model
2. **COST_AWARE** - Route to the lowest-cost model
3. **ROUND_ROBIN** - Distribute requests evenly across models
4. **LEAST_USED** - Prefer the model with the fewest requests
5. **Custom** - Define your own strategy function

## Setup

In [None]:
import os
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_fused_model import MultiModelManager, ModelConfig, RoutingStrategy

# Set your API keys
# os.environ["OPENAI_API_KEY"] = "your-openai-key"
# os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# Create three models for testing
models = [
    ChatOpenAI(model="gpt-4", temperature=0.7),
    ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
    ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0.7),
]

print(f"Created {len(models)} models for testing")

## 1. Priority-Based Routing

Routes to the highest-priority model that is currently available. Best for:
- Preferring premium models with fallback to cheaper alternatives
- Quality-first approach with cost-effective fallbacks
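
As a rough illustration of the selection rule (not the library's actual implementation), priority routing can be thought of as picking the highest-priority model among those that are not currently rate-limited. The function `pick_by_priority` and the `available` index list are hypothetical names used only for this sketch.

```python
def pick_by_priority(priorities, available):
    """Return the index of the available model with the highest priority.

    priorities: one int per model (higher = more preferred).
    available: indices of models assumed not to be rate-limited right now.
    """
    if not available:
        raise ValueError("no models available")
    return max(available, key=lambda idx: priorities[idx])


priorities = [100, 50, 10]  # illustrative values for three models

first_choice = pick_by_priority(priorities, [0, 1, 2])  # every model available
fallback = pick_by_priority(priorities, [1, 2])         # model 0 is rate-limited
print(first_choice, fallback)
```

Under these assumptions the top model takes every request until it drops out of the available set, at which point the next-highest priority takes over.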

In [None]:
# Configure priorities (higher = more preferred)
priority_configs = [
    ModelConfig(priority=100, max_rpm=10),  # GPT-4 - highest priority, low limit
    ModelConfig(priority=50, max_rpm=60),   # GPT-3.5 - medium priority
    ModelConfig(priority=10, max_rpm=120),  # Claude - lowest priority
]

priority_manager = MultiModelManager(
    models=models,
    model_configs=priority_configs,
    strategy=RoutingStrategy.PRIORITY,
    default_fallback=True
)

print("\n=== Priority-Based Routing ===")
print("Model priorities: GPT-4 (100) > GPT-3.5 (50) > Claude (10)")

# Make several requests
for i in range(5):
    response = priority_manager.invoke(f"What is {i+1} + {i+1}?")
    print(f"Request {i+1}: {response.content[:50]}")

# Check which models were used
# (note: _usage_tracker is a private attribute, so it may change between releases)
stats = priority_manager._usage_tracker.get_all_stats()
print("\nUsage distribution:")
for idx, stat in stats.items():
    print(f"  Model {idx}: {stat.total_requests} requests")

## 2. Cost-Aware Routing

Routes to the lowest-cost available model. Best for:
- Cost optimization
- Budget-conscious applications
- High-volume workloads
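
To make the trade-off concrete, here is a minimal sketch of the arithmetic behind a per-1k-token rate and of a cheapest-first selection rule. The helpers `estimate_cost` and `pick_cheapest` are hypothetical and are not part of the library; the rates are illustrative only.

```python
def estimate_cost(total_tokens, cost_per_1k_tokens):
    """Estimated spend for a given number of tokens at a per-1k-token rate."""
    return total_tokens / 1000 * cost_per_1k_tokens


def pick_cheapest(costs, available):
    """Return the index of the cheapest model among the available ones."""
    if not available:
        raise ValueError("no models available")
    return min(available, key=lambda idx: costs[idx])


costs = [0.03, 0.002, 0.015]  # illustrative per-1k-token rates

cheapest = pick_cheapest(costs, [0, 1, 2])
next_cheapest = pick_cheapest(costs, [0, 2])  # cheapest model unavailable

# Projected spend for one million tokens on each model.
projected = [estimate_cost(1_000_000, rate) for rate in costs]
print(cheapest, next_cheapest, projected)
```

At high volume the gap between rates dominates the bill, which is why this strategy suits workloads where every model already meets the quality bar.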

In [None]:
# Configure costs (per 1k tokens)
cost_configs = [
    ModelConfig(cost_per_1k_tokens=0.03, max_rpm=60),   # GPT-4 - expensive
    ModelConfig(cost_per_1k_tokens=0.002, max_rpm=60),  # GPT-3.5 - cheap
    ModelConfig(cost_per_1k_tokens=0.015, max_rpm=60),  # Claude - medium
]

cost_manager = MultiModelManager(
    models=models,
    model_configs=cost_configs,
    strategy=RoutingStrategy.COST_AWARE,
    default_fallback=True
)

print("\n=== Cost-Aware Routing ===")
print("Model costs: GPT-4 ($0.03) > Claude ($0.015) > GPT-3.5 ($0.002)")

# Make several requests
for i in range(5):
    response = cost_manager.invoke("What is the color of the sky?")
    print(f"Request {i+1}: {response.content[:50]}")

# Check which models were used
stats = cost_manager._usage_tracker.get_all_stats()
print("\nUsage distribution:")
for idx, stat in stats.items():
    cost = cost_configs[idx].cost_per_1k_tokens
    print(f"  Model {idx} (${cost}/1k): {stat.total_requests} requests")

## 3. Round-Robin Routing

Distributes requests evenly across models. Best for:
- Load balancing
- Testing multiple models
- Avoiding rate limits on any single model
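
The rotation itself can be sketched in a few lines. This is an assumption about how such a router might behave, not the library's code: `RoundRobinPicker` is a hypothetical class that cycles through model indices and skips any that are currently unavailable.

```python
class RoundRobinPicker:
    """Cycle through model indices, skipping any that are unavailable."""

    def __init__(self, num_models):
        self.num_models = num_models
        self._next = 0  # index to try first on the next call

    def pick(self, available):
        available = set(available)
        # Walk at most one full lap starting from the remembered position.
        for offset in range(self.num_models):
            idx = (self._next + offset) % self.num_models
            if idx in available:
                self._next = (idx + 1) % self.num_models
                return idx
        raise ValueError("no models available")


picker = RoundRobinPicker(3)
sequence = [picker.pick([0, 1, 2]) for _ in range(6)]
skipping = [picker.pick([0, 2]) for _ in range(4)]  # model 1 is rate-limited
print(sequence, skipping)
```

Because the picker only remembers a position, not a history, a model that was skipped does not get extra requests later to make up for it.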

In [None]:
# Simple configs for round-robin
rr_configs = [
    ModelConfig(max_rpm=60),
    ModelConfig(max_rpm=60),
    ModelConfig(max_rpm=60),
]

rr_manager = MultiModelManager(
    models=models,
    model_configs=rr_configs,
    strategy=RoutingStrategy.ROUND_ROBIN
)

print("\n=== Round-Robin Routing ===")
print("Requests will be distributed evenly across all models")

# Make several requests
for i in range(9):
    response = rr_manager.invoke(f"Count to {i+1}")
    print(f"Request {i+1}: {response.content[:30]}...")

# Check distribution
stats = rr_manager._usage_tracker.get_all_stats()
print("\nUsage distribution (should be roughly equal):")
for idx, stat in stats.items():
    print(f"  Model {idx}: {stat.total_requests} requests")

## 4. Least-Used Routing

Routes to the model with the fewest total requests. Best for:
- Balancing usage over time
- Avoiding overuse of any single model
- Dynamic load distribution
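
A minimal sketch of the idea, assuming the router keeps a running request count per model: `pick_least_used` is a hypothetical helper, and breaking ties by lowest index is a choice made for this example rather than documented library behavior.

```python
def pick_least_used(request_counts, available):
    """Return the available model index with the fewest recorded requests.

    Ties go to the earliest index in `available`, because min() keeps the
    first minimum it encounters.
    """
    if not available:
        raise ValueError("no models available")
    return min(available, key=lambda idx: request_counts.get(idx, 0))


# Starting from zero, the pattern looks like round-robin.
counts = {0: 0, 1: 0, 2: 0}
order = []
for _ in range(6):
    idx = pick_least_used(counts, [0, 1, 2])
    counts[idx] += 1
    order.append(idx)

# A model that has fallen behind receives requests until it catches up.
lagging = {0: 3, 1: 3, 2: 0}
catch_up = []
for _ in range(3):
    idx = pick_least_used(lagging, [0, 1, 2])
    lagging[idx] += 1
    catch_up.append(idx)
print(order, catch_up)
```

The second loop shows the practical difference from round-robin: the selection depends on accumulated history, so a model that sat out for a while is favored until the totals even out.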

In [None]:
# Simple configs for least-used
lu_configs = [
    ModelConfig(max_rpm=60),
    ModelConfig(max_rpm=60),
    ModelConfig(max_rpm=60),
]

lu_manager = MultiModelManager(
    models=models,
    model_configs=lu_configs,
    strategy=RoutingStrategy.LEAST_USED
)

print("\n=== Least-Used Routing ===")
print("Each request goes to the model with fewest total requests")

# Make several requests
for i in range(9):
    response = lu_manager.invoke(f"What is {i}?")
    print(f"Request {i+1}: {response.content[:30]}...")

# Check distribution
stats = lu_manager._usage_tracker.get_all_stats()
print("\nUsage distribution (should be balanced):")
for idx, stat in stats.items():
    print(f"  Model {idx}: {stat.total_requests} requests")

## 5. Custom Strategy

Define your own routing logic. Best for:
- Complex business logic
- Custom optimization criteria
- Specialized use cases

In [None]:
def success_rate_strategy(models, configs, usage_stats, available_models):
    """
    Custom strategy: prefer the model with the highest success rate.
    Models with no recorded requests are skipped, so the first available
    model is used until at least one model has a nonzero success rate.
    """
    best_model = available_models[0]
    best_rate = 0.0
    
    for idx in available_models:
        stats = usage_stats.get(idx)
        if stats and stats.total_requests > 0:
            success_rate = stats.successful_requests / stats.total_requests
            if success_rate > best_rate:
                best_rate = success_rate
                best_model = idx
    
    return best_model

custom_manager = MultiModelManager(
    models=models,
    model_configs=rr_configs,
    strategy=success_rate_strategy
)

print("\n=== Custom Strategy (Success Rate) ===")
print("Routes to model with highest success rate")

# Make several requests
for i in range(6):
    response = custom_manager.invoke(f"Hello {i}")
    print(f"Request {i+1}: {response.content[:30]}...")

# Check distribution and success rates
stats = custom_manager._usage_tracker.get_all_stats()
print("\nUsage and success rates:")
for idx, stat in stats.items():
    if stat.total_requests > 0:
        success_rate = stat.successful_requests / stat.total_requests * 100
        print(f"  Model {idx}: {stat.total_requests} requests, {success_rate:.1f}% success")
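
One thing to watch with the strategy above: a model with no recorded requests is never compared, so the first model to succeed tends to keep winning while it stays available. The variant below is a sketch that tries every untested model once before ranking by success rate. It assumes the same callable signature as above and stats objects exposing `total_requests` and `successful_requests`; `FakeStats` is a stand-in used only so the example runs without API calls.

```python
from dataclasses import dataclass


@dataclass
class FakeStats:
    """Stand-in for the tracker's per-model stats, for illustration only."""
    total_requests: int = 0
    successful_requests: int = 0


def explore_then_exploit(models, configs, usage_stats, available_models):
    """Try every untested model once, then prefer the best success rate."""
    best_model, best_rate = None, -1.0
    for idx in available_models:
        stats = usage_stats.get(idx)
        if stats is None or stats.total_requests == 0:
            return idx  # no history yet: give this model a chance
        rate = stats.successful_requests / stats.total_requests
        if rate > best_rate:
            best_model, best_rate = idx, rate
    return best_model


history = {0: FakeStats(10, 7), 1: FakeStats(4, 4)}
untested_pick = explore_then_exploit(None, None, history, [0, 1, 2])

history[2] = FakeStats(5, 1)  # model 2 now has a (poor) track record
best_pick = explore_then_exploit(None, None, history, [0, 1, 2])
print(untested_pick, best_pick)
```

Starting `best_rate` at -1.0 also means a model whose requests have all failed can still be returned when it is the only one available.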

## Strategy Comparison

Let's compare all strategies side by side:

In [None]:
import pandas as pd

# Create a comparison table
comparison_data = {
    "Strategy": ["PRIORITY", "COST_AWARE", "ROUND_ROBIN", "LEAST_USED", "Custom"],
    "Best For": [
        "Quality-first with fallback",
        "Cost optimization",
        "Load balancing",
        "Usage balancing",
        "Custom logic"
    ],
    "Distribution": [
        "Uneven (by priority)",
        "Uneven (by cost)",
        "Even rotation",
        "Balanced over time",
        "Depends on logic"
    ],
    "Use Case": [
        "Premium model with cheap fallback",
        "High-volume, budget-conscious",
        "Testing, rate limit avoidance",
        "Long-running applications",
        "Complex requirements"
    ]
}

df = pd.DataFrame(comparison_data)
print("\n=== Strategy Comparison ===")
print(df.to_string(index=False))

## Choosing the Right Strategy

### Use PRIORITY when:
- You want to prefer specific models (e.g., GPT-4) but have fallbacks
- Quality is more important than cost
- You have tiered model access (premium → standard → basic)

### Use COST_AWARE when:
- Cost optimization is your primary goal
- You're processing high volumes
- All models meet your quality requirements

### Use ROUND_ROBIN when:
- You want even distribution across models
- You are testing multiple models side by side
- You want to avoid rate limits on any single provider

### Use LEAST_USED when:
- You want balanced usage over time
- You are running long-lived applications
- You want to avoid overusing any single model

### Use Custom when:
- You have specific business logic
- You need to combine multiple factors (cost + quality + availability)
- You have unique optimization criteria
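
As one possible way to combine factors, the sketch below scores each available model by a weighted mix of cheapness and priority. It assumes the custom-strategy signature used earlier in this notebook and configs that expose `priority` and `cost_per_1k_tokens`. `FakeConfig` and `make_weighted_strategy` are hypothetical names, and the weights are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class FakeConfig:
    """Stand-in exposing the two config fields used in this notebook."""
    priority: int
    cost_per_1k_tokens: float


def make_weighted_strategy(cost_weight=0.5, priority_weight=0.5):
    """Build a strategy function that trades off cost against priority."""

    def strategy(models, configs, usage_stats, available_models):
        # Normalize against the available models only; guard against zeros.
        max_cost = max(configs[i].cost_per_1k_tokens for i in available_models) or 1.0
        max_prio = max(configs[i].priority for i in available_models) or 1

        def score(idx):
            cheapness = 1.0 - configs[idx].cost_per_1k_tokens / max_cost
            preference = configs[idx].priority / max_prio
            return cost_weight * cheapness + priority_weight * preference

        return max(available_models, key=score)

    return strategy


configs = [FakeConfig(100, 0.03), FakeConfig(50, 0.002), FakeConfig(10, 0.015)]

quality_first = make_weighted_strategy(cost_weight=0.1, priority_weight=0.9)
budget_first = make_weighted_strategy(cost_weight=0.9, priority_weight=0.1)

quality_pick = quality_first(None, configs, {}, [0, 1, 2])
budget_pick = budget_first(None, configs, {}, [0, 1, 2])
print(quality_pick, budget_pick)
```

Availability is handled implicitly here: only indices passed in `available_models` are scored, so a rate-limited model simply drops out of the comparison.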

## Conclusion

This notebook demonstrated:
- The four built-in routing strategies and custom strategy functions
- When to use each strategy
- How to implement a custom strategy
- How the strategies compare side by side

Choose the strategy that best fits your use case, or combine multiple managers with different strategies for different parts of your application!