Model Routing System

System Architecture Overview

The Provider Model Routing System is a multi-layered routing infrastructure that lets multiple providers offer the same AI models and automatically selects a provider for each request based on real-time metrics, user preferences, and system health.

Core Components Structure

┌─────────────────────────────────────────────────────────────────┐
│                    USER REQUEST (Model ID)                      │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              RELAY HANDLER                                      │
│  • Request validation                                           │
│  • Channel selection with routing                               │
│  • Circuit breaker checking                                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│         CHANNEL SERVICE                                         │
│  get_routed_channel()                                           │
│  ├── Try Provider Model Routing (STEP 1)                        │
│  └── Fallback to Legacy Channel Routing (STEP 2)                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│      MODEL ROUTING SERVICE                                      │
│                                                                 │
│  route_request()                                                │
│  ├─► 1. get_model_providers()      [Provider Discovery]         │
│  ├─► 2. load_user_preferences()    [User Prefs Loading]         │
│  ├─► 3. load_routing_config()      [Model Config Loading]       │
│  ├─► 4. score_providers()          [Intelligent Scoring]        │
│  └─► 5. select_provider()          [Final Selection]            │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│           ROUTING DECISION                                      │
│  • Selected Provider + Channel                                  │
│  • Fallback Providers (ordered)                                 │
│  • Routing Reason & Score                                       │
│  • Strategy Used                                                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│      CIRCUIT BREAKER CHECK                                      │
│  should_allow_request()                                         │
│  • Closed → Allow                                               │
│  • Open → Try Fallback                                          │
│  • Half-Open → Limited Allow                                    │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              EXECUTE REQUEST ON SELECTED CHANNEL                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│        METRICS RECORDING                                        │
│  record_request()                                               │
│  • Latency tracking                                             │
│  • Success/failure counting                                     │
│  • Token usage                                                  │
│  • Quality score calculation                                    │
│  • Circuit breaker state update                                 │
└─────────────────────────────────────────────────────────────────┘
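
The two-step behavior of the CHANNEL SERVICE box can be sketched as follows. This is a minimal illustration, not the actual service code: `get_routed_channel_sketch`, `RoutedVia`, and the closure parameters are hypothetical stand-ins for the real lookups, which would query the database.

```rust
/// Which routing path produced the channel (hypothetical type).
#[derive(Debug, PartialEq)]
enum RoutedVia {
    ProviderModelRouting,
    LegacyChannelRouting,
}

/// Tries provider model routing first and falls back to legacy channel
/// routing only when the first step yields no channel.
fn get_routed_channel_sketch<P, L>(
    provider_routing: P,
    legacy_routing: L,
) -> Option<(u32, RoutedVia)>
where
    P: FnOnce() -> Option<u32>,
    L: FnOnce() -> Option<u32>,
{
    // STEP 1: provider model routing.
    if let Some(channel_id) = provider_routing() {
        return Some((channel_id, RoutedVia::ProviderModelRouting));
    }
    // STEP 2: legacy channel routing.
    legacy_routing().map(|channel_id| (channel_id, RoutedVia::LegacyChannelRouting))
}
```

Because the legacy lookup is passed as a closure, it is only evaluated when step 1 finds nothing.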

Database Schema Architecture

1. provider_models (Source of Truth)

Provider-submitted model definitions

├── Model Info: model_id, model_name, description
├── Provider: provider_id, provider_name, channel_id
├── Pricing: pricing_prompt, pricing_completion, pricing_image
├── Specs: context_length, modality, supported_parameters
└── Status: status (0=pending, 1=approved, 2=rejected)
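
The integer status codes above map naturally onto an enum. The sketch below assumes the column is read as a small integer; `ModelStatus`, `from_code`, and `is_routable` are illustrative names, not taken from the codebase.

```rust
/// Approval state of a provider-submitted model (codes from the schema above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelStatus {
    Pending,
    Approved,
    Rejected,
}

impl ModelStatus {
    /// Converts the stored code; unknown codes are rejected rather than guessed.
    fn from_code(code: i16) -> Option<Self> {
        match code {
            0 => Some(ModelStatus::Pending),
            1 => Some(ModelStatus::Approved),
            2 => Some(ModelStatus::Rejected),
            _ => None,
        }
    }

    /// Only approved models are eligible for routing (status = 1).
    fn is_routable(self) -> bool {
        self == ModelStatus::Approved
    }
}
```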

2. provider_model_metrics (Real-time Performance)

Live performance tracking per provider-model-channel

├── Cumulative: total_requests, successful_requests, failed_requests
├── Latency: avg/p50/p95/p99/min/max_latency_ms
├── Time Windows: last_hour, last_24h metrics
├── Circuit Breaker: circuit_state, consecutive_failures/successes
├── Quality: quality_score (0.0-1.0)
└── Token Throughput: total tokens, avg_tokens_per_second

3. model_routing_config (Per-Model Configuration)

Admin-configurable routing rules per model

├── Weights: latency_weight, success_rate_weight, price_weight
├── Strategy: default_strategy (performance/cost/balanced/round_robin)
├── Fallback: enable_auto_fallback, max_fallback_attempts
└── Circuit: failure_threshold, recovery_timeout_seconds

4. user_routing_preferences (User Preferences)

Per-user routing customization

├── Strategy: default_strategy
├── Providers: preferred_providers[], blocked_providers[]
├── Limits: max_price_per_million_tokens, min_success_rate, max_latency_ms
└── Requirements: require_streaming, require_function_calling

5. routing_decision_logs (Audit Trail)

Complete history of routing decisions

├── Decision: selected_provider_id, routing_strategy, routing_reason
├── Candidates: candidates_count, candidates_json
├── Fallback: fallback_providers[], is_fallback_request
└── Performance: routing_duration_us

Detailed Process Flow

Phase 1: Request Initiation

User/API Request
     │
     ├─► Model ID: "deepseek-chat"
     ├─► User ID: 12345
     └─► Optional: RoutingPreferences { strategy: "performance" }

Phase 2: Provider Discovery

SELECT provider_id, channel_id, provider_name, pricing, metrics, quality_score
FROM provider_models pm
LEFT JOIN provider_model_metrics pmm ON (...)
LEFT JOIN channels c ON pm.channel_id = c.id
WHERE pm.model_id = 'deepseek-chat' AND pm.status = 1
ORDER BY quality_score DESC

Output: List of ProviderCandidate structs

ProviderCandidate {
    provider_id: 5,
    channel_id: 23,
    provider_name: "Provider A",
    price_per_million_prompt: 2.50,
    price_per_million_completion: 10.00,
    success_rate: 0.98,
    avg_latency_ms: 450,
    quality_score: 0.92,
    circuit_state: Closed,
}

Phase 3: Configuration Loading

Model Config:

RoutingConfig {
    canonical_model_id: "deepseek-chat",
    latency_weight: 0.3,
    success_rate_weight: 0.4,
    price_weight: 0.2,
    provider_priority_weight: 0.1,
    default_strategy: "balanced",
}

User Preferences (merged with request prefs):

RoutingPreferences {
    strategy: Performance,
    prefer_providers: [5, 8],
    avoid_providers: [3],
    max_price: Some(15.0),
    min_success_rate: Some(0.95),
}

Phase 4: Intelligent Scoring

Strategy: Performance

score = success_rate * 0.4 + latency_score * 0.3 + quality_score * 0.1 + priority_bonus

Strategy: Cost

price_score = 1.0 - (avg_price / 100.0)
score = price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1

Strategy: Balanced

perf_score = performance_score(candidate)
cost_score = cost_score(candidate)
score = perf_score * perf_weight + cost_score * cost_weight
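
The three formulas can be sketched in Rust as below. Several details are assumptions because the text does not define them: `latency_score` is taken to be a linear falloff against a hypothetical 1000 ms ceiling, `price_score` is clamped to [0, 1] so prices above $100/M cannot produce a negative score, and `perf_weight`, `cost_weight`, and `priority_bonus` are passed in by the caller.

```rust
/// Inputs to scoring; `avg_price` is in USD per million tokens.
struct Candidate {
    success_rate: f64,
    avg_latency_ms: f64,
    quality_score: f64,
    avg_price: f64,
}

/// Assumed normalization ceiling; not specified in the document.
const LATENCY_CEILING_MS: f64 = 1000.0;

/// Assumed mapping: 0 ms scores 1.0, the ceiling or slower scores 0.0.
fn latency_score(latency_ms: f64) -> f64 {
    (1.0 - latency_ms / LATENCY_CEILING_MS).clamp(0.0, 1.0)
}

/// Strategy: Performance.
fn performance_score(c: &Candidate, priority_bonus: f64) -> f64 {
    c.success_rate * 0.4
        + latency_score(c.avg_latency_ms) * 0.3
        + c.quality_score * 0.1
        + priority_bonus
}

/// Strategy: Cost. The clamp is an added safeguard, not part of the formula.
fn cost_score(c: &Candidate) -> f64 {
    let price_score = (1.0 - c.avg_price / 100.0).clamp(0.0, 1.0);
    price_score * 0.6 + c.success_rate * 0.3 + c.quality_score * 0.1
}

/// Strategy: Balanced.
fn balanced_score(c: &Candidate, perf_weight: f64, cost_weight: f64, priority_bonus: f64) -> f64 {
    performance_score(c, priority_bonus) * perf_weight + cost_score(c) * cost_weight
}
```

For a candidate with 98% success, 450 ms latency, quality 0.92, and no priority bonus, the performance score under these assumptions is 0.392 + 0.165 + 0.092 = 0.649.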

Phase 5: Provider Selection

Filter by preferences:
- Remove avoided providers
- Check max_price threshold
- Check min_success_rate
- Check max_latency_ms
Boost preferred providers:
- Apply 50% score boost to preferred providers
Weighted random selection:
- Sort by score (descending)
- Take top 3 candidates
- Weighted random selection (prevents provider starvation)
Prepare fallback chain:
- Remaining candidates become fallback providers (up to 3)
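
A minimal sketch of the four selection steps follows. The struct and function names are hypothetical, and the random draw is passed in as `roll` (a value in [0, 1)) so the example stays deterministic and free of external crates; a real implementation would draw it from a random number generator.

```rust
#[derive(Debug, Clone)]
struct Scored {
    provider_id: u32,
    score: f64,
    avg_price: f64,
    success_rate: f64,
    avg_latency_ms: f64,
}

struct Prefs {
    prefer: Vec<u32>,
    avoid: Vec<u32>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
    max_latency_ms: Option<f64>,
}

/// Returns the selected provider and up to 3 ordered fallbacks.
fn select_provider(mut candidates: Vec<Scored>, prefs: &Prefs, roll: f64) -> Option<(Scored, Vec<Scored>)> {
    // 1. Filter by preferences.
    candidates.retain(|c| {
        !prefs.avoid.contains(&c.provider_id)
            && prefs.max_price.map_or(true, |m| c.avg_price <= m)
            && prefs.min_success_rate.map_or(true, |m| c.success_rate >= m)
            && prefs.max_latency_ms.map_or(true, |m| c.avg_latency_ms <= m)
    });
    // 2. Apply the 50% boost to preferred providers.
    for c in candidates.iter_mut() {
        if prefs.prefer.contains(&c.provider_id) {
            c.score *= 1.5;
        }
    }
    // 3. Sort by score, descending, and consider the top 3.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let top = candidates.len().min(3);
    if top == 0 {
        return None;
    }
    // 4. Weighted random pick: each candidate owns a slice of [0, total)
    //    proportional to its score.
    let total: f64 = candidates[..top].iter().map(|c| c.score).sum();
    let mut threshold = roll * total;
    let mut chosen = top - 1; // default covers floating-point rounding at the edge
    for (i, c) in candidates[..top].iter().enumerate() {
        if threshold < c.score {
            chosen = i;
            break;
        }
        threshold -= c.score;
    }
    let winner = candidates.remove(chosen);
    // Remaining candidates, still ordered by score, form the fallback chain.
    candidates.truncate(3);
    Some((winner, candidates))
}
```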

Phase 6: Circuit Breaker Check

┌──────────────────────────────────────────────┐
│         Circuit State Machine                │
├──────────────────────────────────────────────┤
│                                              │
│  CLOSED ──────────────► OPEN                 │
│    ▲        (5 failures)    │                │
│    │                        │                │
│    │                    (60s timeout)        │
│    │                        │                │
│    │                        ▼                │
│    └───── HALF-OPEN ◄────────                │
│        (3 successes)                         │
│                                              │
└──────────────────────────────────────────────┘

States:
• CLOSED: Normal operation (all requests pass)
• OPEN: Block all requests, try fallbacks
• HALF-OPEN: Allow limited test requests
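
The state machine can be sketched as a small struct. This is an illustration using the thresholds shown in the diagram (5 failures, 60 s timeout, 3 successes); the field and method layout is assumed, the cap on concurrent half-open test requests is omitted for brevity, and reopening on any half-open failure follows the usual circuit breaker pattern rather than confirmed code.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    consecutive_successes: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    success_threshold: u32,
    recovery_timeout: Duration,
}

impl CircuitBreaker {
    fn new() -> Self {
        Self {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            consecutive_successes: 0,
            opened_at: None,
            failure_threshold: 5,
            success_threshold: 3,
            recovery_timeout: Duration::from_secs(60),
        }
    }

    /// OPEN blocks until the recovery timeout elapses, then moves to HALF-OPEN.
    fn should_allow_request(&mut self, now: Instant) -> bool {
        if self.state == CircuitState::Open {
            match self.opened_at {
                Some(t) if now.saturating_duration_since(t) >= self.recovery_timeout => {
                    self.state = CircuitState::HalfOpen;
                    self.consecutive_successes = 0;
                }
                _ => return false,
            }
        }
        true
    }

    /// Enough consecutive successes in HALF-OPEN close the circuit.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        if self.state == CircuitState::HalfOpen {
            self.consecutive_successes += 1;
            if self.consecutive_successes >= self.success_threshold {
                self.state = CircuitState::Closed;
            }
        }
    }

    /// CLOSED opens at the failure threshold; HALF-OPEN reopens on any failure.
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_successes = 0;
        self.consecutive_failures += 1;
        let trip = match self.state {
            CircuitState::Closed => self.consecutive_failures >= self.failure_threshold,
            CircuitState::HalfOpen => true,
            CircuitState::Open => false,
        };
        if trip {
            self.state = CircuitState::Open;
            self.opened_at = Some(now);
        }
    }
}
```

Passing `now` explicitly keeps the transitions testable without sleeping.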

Circuit Breaker Decision:

If primary provider circuit is OPEN → Try fallback providers
If all circuits OPEN → Fallback to legacy routing
If circuit is CLOSED or HALF-OPEN → Proceed
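
The three decision rules amount to walking the fallback chain until a non-open circuit is found. A hedged sketch, with `Decision` and `decide` as illustrative names and providers represented as `(provider_id, state)` pairs:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Primary(u32),
    Fallback(u32),
    LegacyRouting,
}

/// Picks the first provider whose circuit is not OPEN, starting with the
/// primary and then the fallbacks in their given order.
fn decide(primary: (u32, CircuitState), fallbacks: &[(u32, CircuitState)]) -> Decision {
    if primary.1 != CircuitState::Open {
        return Decision::Primary(primary.0);
    }
    fallbacks
        .iter()
        .find(|(_, state)| *state != CircuitState::Open)
        .map(|(id, _)| Decision::Fallback(*id))
        // Every circuit is OPEN: hand over to legacy routing.
        .unwrap_or(Decision::LegacyRouting)
}
```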

Phase 7: Request Execution

Request sent to selected channel:

Channel {
    id: 23,
    provider_id: 5,
    base_url: "https://api.provider-a.com/v1",
    key: "encrypted_key",
    status: 1, // active
}

Phase 8: Metrics Recording

After request completion:

ProviderMetricsService::record_request(
    provider_id: 5,
    model_id: "deepseek-chat",
    channel_id: 23,
    latency_ms: 450,
    success: true,
    prompt_tokens: 1500,
    completion_tokens: 300,
)

Metrics Update Process:

Record in memory buffer (fast)
Periodic aggregation (every 60 seconds)
Database update via update_provider_metrics() SQL function
Quality score recalculation
Circuit breaker state evaluation
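
The buffer-then-aggregate steps might look like the sketch below. It is single-threaded for clarity (a real buffer would sit behind a lock or channel), uses the nearest-rank method for the percentile, and all type and method names are assumptions rather than the actual ProviderMetricsService API.

```rust
use std::collections::HashMap;

/// (provider_id, model_id, channel_id)
type Key = (u32, String, u32);

#[derive(Clone, Copy)]
struct Sample {
    latency_ms: u64,
    success: bool,
    tokens: u64,
}

#[derive(Debug, PartialEq)]
struct Aggregate {
    requests: usize,
    success_rate: f64,
    avg_latency_ms: f64,
    p95_latency_ms: u64,
    total_tokens: u64,
}

#[derive(Default)]
struct MetricsBuffer {
    samples: HashMap<Key, Vec<Sample>>,
}

impl MetricsBuffer {
    /// Fast path: append to the in-memory buffer only.
    fn record(&mut self, key: Key, sample: Sample) {
        self.samples.entry(key).or_default().push(sample);
    }

    /// Periodic path: empty the buffer and summarize each key for the DB flush.
    fn drain_aggregates(&mut self) -> HashMap<Key, Aggregate> {
        self.samples
            .drain()
            .map(|(key, samples)| (key, aggregate(&samples)))
            .collect()
    }
}

/// Nearest-rank percentile over a non-empty, ascending slice.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    let rank = ((p * sorted.len() as f64).ceil() as usize).clamp(1, sorted.len());
    sorted[rank - 1]
}

/// Summarizes a non-empty batch (buffer entries are created on first push).
fn aggregate(samples: &[Sample]) -> Aggregate {
    let mut latencies: Vec<u64> = samples.iter().map(|s| s.latency_ms).collect();
    latencies.sort_unstable();
    let n = samples.len();
    let successes = samples.iter().filter(|s| s.success).count();
    Aggregate {
        requests: n,
        success_rate: successes as f64 / n as f64,
        avg_latency_ms: latencies.iter().sum::<u64>() as f64 / n as f64,
        p95_latency_ms: percentile(&latencies, 0.95),
        total_tokens: samples.iter().map(|s| s.tokens).sum(),
    }
}
```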

Routing Strategies Explained

1. Performance Strategy

Goal: Maximize speed and reliability

Scoring Formula:

score = success_rate × 0.4 + latency_score × 0.3 + quality_score × 0.1 + priority_bonus

Best for:

Real-time applications
Latency-sensitive workloads
Production critical paths

Example (illustrative scores):

Provider A: 98% success, 450ms → Score: 0.89
Provider B: 95% success, 800ms → Score: 0.78
Winner: Provider A

2. Cost Strategy

Goal: Minimize costs

Scoring Formula:

price_score = 1.0 - (avg_price / 100.0)
score = price_score × 0.6 + success_rate × 0.3 + quality_score × 0.1

Best for:

Batch processing
Development/testing
Cost-conscious applications

Example (illustrative scores):

Provider A: $5/M → Score: 0.92
Provider B: $12/M → Score: 0.78
Winner: Provider A (cheaper)

3. Balanced Strategy (Default)

Goal: Optimize all factors

Scoring Formula:

Combined = performance_score × perf_weight + cost_score × cost_weight

Best for:

General purpose applications
Mixed workloads
Most production scenarios

4. Round-Robin Strategy

Goal: Equal distribution

Behavior:

All providers get equal score
Rotate through providers sequentially
No performance consideration

Best for:

Load distribution testing
Provider evaluation
Ensuring provider diversity
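
Sequential rotation can be sketched with a shared atomic counter. This is one possible approach under the assumption that the rotation state lives in process memory; `RoundRobin` and `pick` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Rotates through a provider list without looking at scores.
struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns the next provider in sequence, or None for an empty list.
    fn pick<'a, T>(&self, providers: &'a [T]) -> Option<&'a T> {
        if providers.is_empty() {
            return None;
        }
        // fetch_add wraps on overflow, so the counter never panics.
        let index = self.next.fetch_add(1, Ordering::Relaxed) % providers.len();
        providers.get(index)
    }
}
```

The atomic counter lets concurrent requests share one rotation without a lock.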

Key Advantages

1. Intelligent Provider Selection

Real-time metrics-based routing

Automatically routes to best-performing providers
Adapts to changing provider performance
No manual intervention required

Multi-dimensional scoring

Considers latency, success rate, cost, and quality
Configurable weights per model
Strategy-based optimization

2. High Availability & Fault Tolerance

Circuit breaker pattern

Failed Provider → Circuit Opens → Automatic Fallback
↓
Health Recovery → Circuit Half-Opens → Test Requests
↓
Success → Circuit Closes → Full Traffic Restoration

Automatic fallback chains

Up to 3 fallback providers per request
Ordered by score
Seamless failover on provider failure

No single point of failure

Multiple providers for same model
Instant failover without retries
Graceful degradation

3. Cost Optimization

Price-aware routing

Cost strategy prioritizes cheaper providers
Price thresholds per user
Balance cost vs performance

Provider competition

Multiple providers compete on price
Market-driven pricing
Automatic selection of best value

4. Performance Tracking

Comprehensive metrics

Latency: avg, p50, p95, p99, min, max
Success Rate: overall, last_hour, last_24h
Quality Score: calculated from success + latency + experience
Token Throughput: tokens/second tracking

Historical data

All-time cumulative metrics
Time-windowed metrics (hourly, daily)
Trend analysis capability

5. User Empowerment

Customizable preferences

UserPreferences {
    strategy: "performance",           // Choose optimization goal
    prefer_providers: [1, 5],          // Favorite providers
    avoid_providers: [3],              // Blacklist problematic ones
    max_price: 15.0,                   // Budget control
    min_success_rate: 0.95,            // Quality threshold
    max_latency_ms: 5000,              // Latency requirement
}

Per-request overrides

Can override preferences per API call
Flexible for different use cases
Maintains user defaults
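
One way to combine stored defaults with per-request overrides is a field-by-field merge in which any value set on the request wins. The sketch below assumes every field is optional; the struct layout and the `merge` function are illustrative, not the actual types.

```rust
#[derive(Debug, Clone, PartialEq, Default)]
struct RoutingPreferences {
    strategy: Option<String>,
    prefer_providers: Option<Vec<u32>>,
    avoid_providers: Option<Vec<u32>>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
    max_latency_ms: Option<u64>,
}

/// Request-level values take precedence; unset fields fall back to the
/// user's stored defaults, which are left unmodified.
fn merge(defaults: &RoutingPreferences, overrides: &RoutingPreferences) -> RoutingPreferences {
    RoutingPreferences {
        strategy: overrides.strategy.clone().or_else(|| defaults.strategy.clone()),
        prefer_providers: overrides
            .prefer_providers
            .clone()
            .or_else(|| defaults.prefer_providers.clone()),
        avoid_providers: overrides
            .avoid_providers
            .clone()
            .or_else(|| defaults.avoid_providers.clone()),
        max_price: overrides.max_price.or(defaults.max_price),
        min_success_rate: overrides.min_success_rate.or(defaults.min_success_rate),
        max_latency_ms: overrides.max_latency_ms.or(defaults.max_latency_ms),
    }
}
```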

6. Provider Ecosystem Benefits

Fair provider exposure

Weighted random selection prevents dominance
Quality providers get more traffic
New providers can compete

Transparent performance

Real metrics visible to admin
Quality score based on actual performance
Accountability for providers

7. Operational Excellence

Complete audit trail

routing_decision_logs:
- Every routing decision logged
- Full candidate list with scores
- Debugging and analytics
- 7-day retention (configurable)

Admin control

• Manual circuit breaker control
• Per-model routing configuration
• Provider approval workflow
• Analytics dashboard

8. Scalability

Efficient data structures

In-memory metrics buffering
Periodic batch updates to database
Minimal per-request overhead

Distributed-ready

Stateless routing decisions
Database-backed state
Redis-compatible circuit breakers

9. Developer Experience

Simple API integration

// Automatic routing - just pass model ID
let channel = ChannelService::get_routed_channel(
    &pool, "default", "deepseek-chat", user_id, None
).await?;

Simulation endpoint

POST /api/routing/simulate
{
  "model_id": "deepseek-chat",
  "preferences": { "strategy": "cost" }
}

10. Business Intelligence

Rich analytics

• Provider selection rates
• Strategy distribution
• Model usage patterns
• Cost analysis
• Performance trends

Performance Characteristics

Routing Decision Speed

Average: < 10ms
P99: < 50ms
Includes: DB queries + scoring + selection

Metrics Update

Memory buffer: ~1μs per record
DB flush: Every 60s (async, non-blocking)
Impact on request latency: Negligible (recording is asynchronous)

Database Queries

Provider lookup: Single JOIN query with indexes
Config loading: Cached or single query
Metrics aggregation: Periodic batch operation

Security & Isolation

Data Isolation

Provider models completely separate from legacy channels
No mixing of provider_models and abilities tables
Clear separation of routing logic

Access Control

Providers can only manage their own models
Admin approval required for model visibility
User-level routing preferences isolated

API Key Management

Encrypted channel keys
Provider-owned API keys
Rotation support via provider_api_keys table

Future Enhancements

Planned Improvements

ML-based routing
- Predict provider performance
- Learn from user patterns
- Adaptive weight tuning
Geographic routing
- Provider location awareness
- Latency-based geo selection
- Regional failover
Advanced analytics
- Provider comparison dashboards
- Cost forecasting
- Performance predictions
Enhanced fallback strategies
- Intelligent retry with backoff
- Cross-model fallbacks
- Dynamic strategy switching

Configuration Examples

Example 1: High-Performance Setup

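INSERT INTO model_routing_config
    (canonical_model_id, latency_weight, success_rate_weight,
     price_weight, provider_priority_weight, default_strategy)
VALUES (
    'deepseek-chat',
    0.35,  -- latency_weight (high)
    0.45,  -- success_rate_weight (high)
    0.10,  -- price_weight (low)
    0.10,  -- provider_priority_weight
    'performance'
);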

Example 2: Cost-Optimized Setup

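INSERT INTO model_routing_config
    (canonical_model_id, latency_weight, success_rate_weight,
     price_weight, provider_priority_weight, default_strategy)
VALUES (
    'deepseek-chat-v3.1',
    0.15,  -- latency_weight (low)
    0.35,  -- success_rate_weight (medium)
    0.40,  -- price_weight (high)
    0.10,  -- provider_priority_weight
    'cost'
);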
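
In both configurations the four weights sum to 1.0. Whether the system enforces this is not stated, so the check below is only a sketch of a validation an admin tool could run before inserting a row; `RoutingWeights` and `validate` are hypothetical names.

```rust
struct RoutingWeights {
    latency: f64,
    success_rate: f64,
    price: f64,
    provider_priority: f64,
}

impl RoutingWeights {
    /// Assumed rule: weights are non-negative and sum to 1.0 (within rounding).
    fn validate(&self) -> Result<(), String> {
        let all = [self.latency, self.success_rate, self.price, self.provider_priority];
        if all.iter().any(|w| *w < 0.0) {
            return Err("weights must be non-negative".to_string());
        }
        let sum: f64 = all.iter().sum();
        if (sum - 1.0).abs() > 1e-9 {
            return Err(format!("weights sum to {sum}, expected 1.0"));
        }
        Ok(())
    }
}
```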

Example 3: User Cost Control

UserRoutingPreferences {
    default_strategy: "cost",
    max_price_per_million_tokens: 10.0,  // Max $10/M
    min_success_rate: 0.90,              // Must maintain 90%+
    preferred_providers: [1, 5, 8],      // Try these first
}

Provider Routing System

1. High-Level System Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                         CLIENT APPLICATION                                 │
│                    (Web/Mobile/API Consumer)                               │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │ HTTP Request
                             │ POST /v1/chat/completions
                             │ { "model": "deepseek-chat", "messages": [...] }
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                      ACTIX-WEB HTTP SERVER                                 │
│                    (backend/src/routes/)                                   │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    RELAY HANDLER LAYER                                     │
│               (backend/src/relay/handlers.rs)                              │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ 1. Authentication & Authorization                                    │  │
│  │ 2. Rate Limiting & Quotas                                            │  │
│  │ 3. Model Validation                                                  │  │
│  │ 4. select_channel_with_routing() ──────────────────────┐             │  │
│  └────────────────────────────────────────────────────────│─────────────┘  │
└───────────────────────────────────────────────────────────│────────────────┘
                                                            │
                             ┌──────────────────────────────┘
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    ROUTING DECISION ENGINE                                 │
│               (backend/src/services/)                                      │
│                                                                            │
│  ┌───────────────────────────┐  ┌──────────────────────────────┐           │
│  │   ModelRoutingService     │  │    ChannelService            │           │
│  │   • route_request()       │◄─┤    • get_routed_channel()    │           │
│  │   • score_providers()     │  │    • Provider model routing  │           │
│  │   • select_provider()     │  │    • Legacy channel routing  │           │
│  └───────────┬───────────────┘  └──────────────────────────────┘           │
│              │                                                             │
│              ├──► load_user_preferences()                                  │
│              ├──► load_routing_config()                                    │
│              └──► get_model_providers() ──┐                                │
└───────────────────────────────────────────│────────────────────────────────┘
                                            │
                             ┌──────────────┘
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                       DATABASE LAYER (PostgreSQL)                          │
│                                                                            │
│  ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────────────┐   │
│  │ provider_models  │  │ provider_model_  │  │ model_routing_config    │   │
│  │ • Model catalog  │  │   metrics        │  │ • Routing weights       │   │
│  │ • Pricing info   │  │ • Performance    │  │ • Default strategies    │   │
│  │ • Provider link  │  │ • Circuit state  │  │ • Fallback config       │   │
│  └──────────────────┘  └──────────────────┘  └─────────────────────────┘   │
│                                                                            │
│  ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────────────┐   │
│  │ user_routing_    │  │ routing_decision_│  │ channels                │   │
│  │   preferences    │  │   logs           │  │ • Channel configs       │   │
│  │ • User settings  │  │ • Audit trail    │  │ • API keys              │   │
│  └──────────────────┘  └──────────────────┘  └─────────────────────────┘   │
└────────────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                    SCORING & SELECTION ENGINE                             │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │              CANDIDATE PROVIDERS (Example)                          │  │
│  │                                                                     │  │
│  │  A: deepseek-chat    │ B: deepseek-chat    │ C: deepseek-chat       │  │
│  │  • Price: $2.50/M    │ • Price: $3.00/M    │ • Price: $2.00/M       │  │
│  │  • Latency: 450ms    │ • Latency: 600ms    │ • Latency: 800ms       │  │
│  │  • Success: 98%      │ • Success: 97%      │ • Success: 95%         │  │
│  │  • Quality: 0.92     │ • Quality: 0.88     │ • Quality: 0.85        │  │
│  │  • Circuit: Closed   │ • Circuit: Closed   │ • Circuit: Half-Open   │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                             │                                             │
│                             ▼                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                    SCORING PROCESS                                  │  │
│  │                                                                     │  │
│  │  Strategy: "Performance"                                            │  │
│  │                                                                     │  │
│  │  Provider A Score = 0.98×0.4 + latency_score×0.3 + 0.92×0.1         │  │
│  │                   = 0.392 + 0.165 + 0.092 = 0.649                   │  │
│  │                                                                     │  │
│  │  Provider B Score = 0.97×0.4 + latency_score×0.3 + 0.88×0.1         │  │
│  │                   = 0.388 + 0.140 + 0.088 = 0.616                   │  │
│  │                                                                     │  │
│  │  Provider C Score = 0.95×0.4 + latency_score×0.3 + 0.85×0.1         │  │
│  │                   = 0.380 + 0.120 + 0.085 = 0.585                   │  │
│  │                   (Circuit Half-Open: Lower priority)               │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                             │                                             │
│                             ▼                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                   SELECTION RESULT                                  │  │
│  │                                                                     │  │
│  │              WINNER: Provider A (Score: 0.649)                      │  │
│  │              Fallback: Provider B (Score: 0.616)                    │  │
│  │              Fallback: Provider C (Score: 0.585)                    │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    CIRCUIT BREAKER CHECK                                   │
│               (backend/src/services/circuit_breaker.rs)                    │
│                                                                            │
│     should_allow_request(Provider A, "deepseek-chat", channel_id) ?        │
│                                                                            │
│     Circuit: CLOSED → Allow Request                                        │
│     Circuit: OPEN   → Try Fallback Provider B                              │
│     Circuit: HALF_OPEN → Allow (limited)                                   │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    EXECUTE REQUEST                                         │
│                                                                            │
│  Channel ID: 23                                                            │
│  Provider: Provider A                                                      │
│  Base URL: https://api.provider-a.com/v1                                   │
│  API Key: [encrypted]                                                      │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │  Forward Request to Provider                                         │  │
│  │  ├─► Add API key authentication                                      │  │
│  │  ├─► Transform request format                                        │  │
│  │  ├─► Handle streaming/non-streaming                                  │  │
│  │  └─► Track latency & tokens                                          │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │
                   ┌─────────┴─────────┐
                   │                   │
                ✅ SUCCESS        ❌ FAILURE
                   │                   │
                   ▼                   ▼
┌────────────────────────────┐  ┌────────────────────────────┐
│  Record Success Metrics    │  │  Record Failure Metrics    │
│  • Latency: 450ms          │  │  • Increment failure count │
│  • Tokens: 1500 + 300      │  │  • Update circuit state    │
│  • Update quality score    │  │  • Try fallback provider   │
│  • Circuit: record_success │  │  • Circuit: record_failure │
└────────────────────────────┘  └────────────────────────────┘
                   │                   │
                   └─────────┬─────────┘
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    METRICS UPDATE PIPELINE                                 │
│               (backend/src/services/provider_metrics.rs)                   │
│                                                                            │
│  Step 1: Memory Buffer (Immediate)                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ METRICS_BUFFER (In-Memory HashMap)                                   │  │
│  │ Key: (provider_id=5, model_id="deepseek-chat", channel_id=23)        │  │
│  │ Value: [ {latency: 450, success: true, tokens: ...}, ... ]           │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                             │                                              │
│  Step 2: Periodic Aggregation (Every 60s)                                  │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Aggregate metrics from buffer                                        │  │
│  │ • Calculate avg, p50, p95, p99 latency                               │  │
│  │ • Calculate success rate                                             │  │
│  │ • Sum token counts                                                   │  │
│  │ • Compute quality score                                              │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                             │                                              │
│  Step 3: Database Flush (Batch)                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ UPDATE provider_model_metrics SET                                    │  │
│  │   total_requests = total_requests + 1,                               │  │
│  │   avg_latency_ms = (avg_latency_ms * 0.9 + 450 * 0.1),               │  │
│  │   quality_score = calculate_provider_quality_score(...),             │  │
│  │   circuit_state = ...                                                │  │
│  │ WHERE provider_id=5 AND model_id='deepseek-chat' AND channel_id=23   │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    RETURN RESPONSE TO CLIENT                               │
│                                                                            │
│  HTTP 200 OK                                                               │
│  {                                                                         │
│    "id": "chatcmpl-...",                                                   │
│    "model": "deepseek-chat",                                               │
│    "choices": [...],                                                       │
│    "usage": { "prompt_tokens": 1500, "completion_tokens": 300 }            │
│  }                                                                         │
└────────────────────────────────────────────────────────────────────────────┘
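
The avg_latency_ms update in the database flush step of the diagram is an exponential moving average with a 0.1 smoothing factor. A small sketch of the same arithmetic in Rust, with `update_avg_latency` as an illustrative name:

```rust
/// Weight given to the newest sample (0.1 in the diagram's UPDATE statement).
const SMOOTHING: f64 = 0.1;

/// avg_latency_ms = avg_latency_ms * 0.9 + sample * 0.1
fn update_avg_latency(previous_avg_ms: f64, sample_ms: f64) -> f64 {
    previous_avg_ms * (1.0 - SMOOTHING) + sample_ms * SMOOTHING
}
```

With a previous average of 500 ms and a new 450 ms sample, the result is 500 × 0.9 + 450 × 0.1 = 495 ms, so a single outlier moves the average by only a tenth of its distance from the current value.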

2. Circuit Breaker State Machine

                    ╔═══════════════════════════════╗
                    ║    CIRCUIT STATE MACHINE      ║
                    ╚═══════════════════════════════╝

┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│     ┌──────────────────────────────────────────────────────────┐     │
│     │                    CLOSED STATE                          │     │
│     │                 (Normal Operation)                       │     │
│     │  • All requests allowed                                  │     │
│     │  • failure_count = 0                                     │     │
│     │  • Tracking consecutive failures                         │     │
│     └──────────────┬────────────────────────────────────┬──────┘     │
│                    │                                    │            │
│        Success     │                         Failure    │            │
│        (reset      │                         (increment)│            │
│        counter)    │                                    │            │
│                    │                                    │            │
│                    │         ┌──────────────────────────┘            │
│                    │         │                                       │
│                    │         │ 5 consecutive failures                │
│                    │         │ (threshold reached)                   │
│                    │         ▼                                       │
│     ┌──────────────┴────────────────────────────────────────────┐    │
│     │                     OPEN STATE                            │    │
│     │                 (Blocking Requests)                       │    │
│     │  • All requests BLOCKED                                   │    │
│     │  • opened_at = current_timestamp                          │    │
│     │  • Return error / try fallback                            │    │
│     │  • Wait for recovery_timeout (60 seconds)                 │    │
│     └──────────────┬────────────────────────────────────────────┘    │
│                    │                                                 │
│                    │ Wait 60 seconds                                 │
│                    │ (recovery_timeout expired)                      │
│                    │                                                 │
│                    ▼                                                 │
│     ┌───────────────────────────────────────────────────────────┐    │
│     │                  HALF-OPEN STATE                          │    │
│     │                  (Testing Recovery)                       │    │
│     │  • Limited requests allowed (max 3)                       │    │
│     │  • half_open_requests = 0                                 │    │
│     │  • Testing if provider recovered                          │    │
│     └──────────────┬─────────────────────────────┬──────────────┘    │
│                    │                             │                   │
│        Success     │                  Failure    │                   │
│        (3 times)   │                  (any)      │                   │
│                    │                             │                   │
│                    ▼                             ▼                   │
│     ┌──────────────────────────┐   ┌───────────────────────────┐     │
│     │   Back to CLOSED         │   │   Back to OPEN            │     │
│     │   (Provider recovered)   │   │   (Still failing)         │     │
│     │   • Reset counters       │   │   • Reset timeout         │     │
│     │   • Full traffic resume  │   │   • Wait another 60s      │     │
│     └──────────────────────────┘   └───────────────────────────┘     │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Configuration:
• failure_threshold = 5          (failures to open circuit)
• success_threshold = 3          (successes to close circuit)
• recovery_timeout = 60 seconds  (wait before testing)
• half_open_max_requests = 3     (test request limit)
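
The state machine and configuration above can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the class name, the injectable clock, and the choice to move OPEN to HALF-OPEN lazily inside should_allow_request() are assumptions (the data flow timeline suggests the real system does that transition in a background task).

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class CircuitBreaker:
    # Defaults mirror the configuration values listed above.
    failure_threshold: int = 5
    success_threshold: int = 3
    recovery_timeout: float = 60.0  # seconds
    half_open_max_requests: int = 3
    clock: Callable[[], float] = time.monotonic  # assumption: injectable for tests
    state: State = State.CLOSED
    failure_count: int = 0
    success_count: int = 0
    half_open_requests: int = 0
    opened_at: float = 0.0

    def should_allow_request(self) -> bool:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                return False  # still blocking
            # recovery_timeout expired: start testing the provider
            self.state = State.HALF_OPEN
            self.half_open_requests = 0
            self.success_count = 0
        if self.state is State.HALF_OPEN:
            if self.half_open_requests >= self.half_open_max_requests:
                return False  # test request budget used up
            self.half_open_requests += 1
        return True

    def record_success(self) -> None:
        self.failure_count = 0  # failures must be consecutive
        if self.state is State.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = State.CLOSED
                self.success_count = 0

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._open()  # any failure while testing reopens the circuit
        elif self.state is State.CLOSED:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self._open()

    def _open(self) -> None:
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failure_count = 0
```

Passing a fake clock (for example `CircuitBreaker(clock=lambda: fake_now)`) lets the 60-second wait be exercised in tests without sleeping.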

3. Scoring Algorithm Comparison

╔════════════════════════════════════════════════════════════════════════╗
║                 ROUTING STRATEGY SCORING ALGORITHMS                    ║
╚════════════════════════════════════════════════════════════════════════╝

┌──────────────────────────────────────────────────────────────────────────┐
│                        PERFORMANCE STRATEGY                              │
│  Goal: Maximize speed and reliability                                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ score = success_rate × 0.4                                      │     │
│  │       + latency_score × 0.3                                     │     │
│  │       + quality_score × 0.1                                     │     │
│  │       + priority_bonus                                          │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Where:                                                                  │
│  • success_rate: 0.0 - 1.0 (higher is better)                            │
│  • latency_score = 1.0 - (latency_ms / 30000) (lower latency = higher)   │
│  • quality_score: historical quality metric (0.0 - 1.0)                  │
│  • priority_bonus: channel_priority / 100 (max 0.2)                      │
│                                                                          │
│  Example:                                                                │
│  Provider with 98% success, 450ms latency, quality 0.92, priority 10     │
│  score = 0.98×0.4 + (1-450/30000)×0.3 + 0.92×0.1 + 0.1                   │
│        = 0.392 + 0.296 + 0.092 + 0.1 = 0.880                             │
└──────────────────────────────────────────────────────────────────────────┘
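
As a rough sketch, the performance formula translates directly into a small function. The function name and argument order are illustrative, and clamping latency_score at 0.0 for latencies above 30000 ms is an assumption that the formula box does not state.

```python
def performance_score(success_rate: float, latency_ms: float,
                      quality_score: float, channel_priority: float) -> float:
    """Score a provider with the PERFORMANCE strategy weights shown above."""
    # Assumption: clamp so latencies above 30000 ms do not go negative.
    latency_score = max(0.0, 1.0 - latency_ms / 30000)
    # priority_bonus is channel_priority / 100, capped at 0.2.
    priority_bonus = min(channel_priority / 100, 0.2)
    return (success_rate * 0.4
            + latency_score * 0.3
            + quality_score * 0.1
            + priority_bonus)


# Worked example from the box: 98% success, 450 ms, quality 0.92, priority 10.
example = performance_score(0.98, 450, 0.92, 10)
```

For the example inputs the unrounded value is about 0.8795, which the box reports as 0.880.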

┌──────────────────────────────────────────────────────────────────────────┐
│                           COST STRATEGY                                  │
│  Goal: Minimize cost while maintaining quality                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ avg_price = (prompt_price + completion_price) / 2               │     │
│  │ price_score = 1.0 - (avg_price / 100.0)                         │     │
│  │ score = price_score × 0.6                                       │     │
│  │       + success_rate × 0.3                                      │     │
│  │       + quality_score × 0.1                                     │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Where:                                                                  │
│  • prompt_price: $ per million prompt tokens                             │
│  • completion_price: $ per million completion tokens                     │
│  • price_score: normalized inverse price (cheaper = higher)              │
│                                                                          │
│  Example:                                                                │
│  Provider with $2.50 prompt, $10.00 completion, 97% success, quality 0.9 │
│  avg_price = (2.50 + 10.00) / 2 = $6.25                                  │
│  price_score = 1.0 - (6.25 / 100) = 0.9375                               │
│  score = 0.9375×0.6 + 0.97×0.3 + 0.9×0.1 = 0.944                         │
└──────────────────────────────────────────────────────────────────────────┘
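
The cost formula can be sketched the same way. The function name is illustrative, and flooring price_score at 0.0 for average prices above $100 per million tokens is an assumption rather than documented behavior.

```python
def cost_score(prompt_price: float, completion_price: float,
               success_rate: float, quality_score: float) -> float:
    """Score a provider with the COST strategy weights shown above.

    Prices are in dollars per million tokens.
    """
    avg_price = (prompt_price + completion_price) / 2
    # Assumption: floor at 0.0 so very expensive providers do not score negative.
    price_score = max(0.0, 1.0 - avg_price / 100.0)
    return price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1


# Worked example from the box: $2.50 prompt, $10.00 completion, 97%, quality 0.9.
example = cost_score(2.50, 10.00, 0.97, 0.9)
```

The unrounded result for the example is about 0.9435, shown in the box as 0.944.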

┌──────────────────────────────────────────────────────────────────────────┐
│                         BALANCED STRATEGY                                │
│  Goal: Optimize all factors with configurable weights                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ perf_score = performance_score(candidate, config)               │     │
│  │ cost_score = cost_score(candidate, config)                      │     │
│  │                                                                 │     │
│  │ total_weight = latency_w + success_w + price_w + priority_w     │     │
│  │ perf_weight = (latency_w + success_w) / total_weight            │     │
│  │ cost_weight = price_w / total_weight                            │     │
│  │                                                                 │     │
│  │ score = perf_score × perf_weight + cost_score × cost_weight     │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Default weights (can be configured per model):                          │
│  • latency_weight: 0.3                                                   │
│  • success_rate_weight: 0.4                                              │
│  • price_weight: 0.2                                                     │
│  • provider_priority_weight: 0.1                                         │
│                                                                          │
│  Example:                                                                │
│  Using default weights:                                                  │
│  perf_weight = (0.3 + 0.4) / 1.0 = 0.7                                   │
│  cost_weight = 0.2 / 1.0 = 0.2                                           │
│  score = 0.880×0.7 + 0.944×0.2 = 0.616 + 0.189 = 0.805                   │
│  (priority_w only enters total_weight, so the weights sum to 0.9)        │
└──────────────────────────────────────────────────────────────────────────┘
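
A minimal sketch of the balanced blend follows, taking the two sub-scores as inputs. RoutingWeights is a hypothetical container whose field names follow the default weights listed in the box; the zero-weight guard is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingWeights:
    # Defaults copied from the "Default weights" list above.
    latency_weight: float = 0.3
    success_rate_weight: float = 0.4
    price_weight: float = 0.2
    provider_priority_weight: float = 0.1


def balanced_score(perf_score: float, cost_score: float,
                   weights: RoutingWeights = RoutingWeights()) -> float:
    """Blend performance and cost scores using the BALANCED strategy formula."""
    total_weight = (weights.latency_weight + weights.success_rate_weight
                    + weights.price_weight + weights.provider_priority_weight)
    if total_weight <= 0:
        raise ValueError("routing weights must sum to a positive value")
    perf_weight = (weights.latency_weight + weights.success_rate_weight) / total_weight
    cost_weight = weights.price_weight / total_weight
    return perf_score * perf_weight + cost_score * cost_weight


# Worked example from the box, using the two earlier example scores.
example = balanced_score(0.880, 0.944)
```

With the defaults this gives roughly 0.8048, matching the 0.805 in the box. Because the priority weight appears only in the denominator, the blended score tops out at 0.9 even when both inputs are 1.0.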

┌──────────────────────────────────────────────────────────────────────────┐
│                       ROUND-ROBIN STRATEGY                               │
│  Goal: Equal distribution across all providers                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ All candidates receive equal score = 1.0                        │     │
│  │ Selection: index = counter % provider_count                     │     │
│  │ counter = (counter + 1) % usize::MAX                            │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Behavior:                                                               │
│  Request 1 → Provider A                                                  │
│  Request 2 → Provider B                                                  │
│  Request 3 → Provider C                                                  │
│  Request 4 → Provider A (cycle repeats)                                  │
│                                                                          │
│  Note: No performance consideration, purely sequential distribution      │
└──────────────────────────────────────────────────────────────────────────┘
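
The round-robin rule is simple enough to show in full. This sketch assumes a single shared counter guarded by a lock; the class name is illustrative. Python integers do not overflow, so the `% usize::MAX` wrap from the formula is unnecessary here.

```python
import threading
from typing import Sequence, TypeVar

T = TypeVar("T")


class RoundRobinSelector:
    """Cycle through providers in order, ignoring all metrics."""

    def __init__(self) -> None:
        self._counter = 0
        self._lock = threading.Lock()  # assumption: selector is shared across threads

    def select(self, providers: Sequence[T]) -> T:
        if not providers:
            raise ValueError("no providers available")
        with self._lock:
            index = self._counter % len(providers)
            self._counter += 1
        return providers[index]


selector = RoundRobinSelector()
picks = [selector.select(["A", "B", "C"]) for _ in range(4)]
```

Four consecutive selections over providers A, B, C yield A, B, C, A, matching the behavior listed in the box.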

4. Data Flow Timeline

Time  │ Component               │ Action
──────┼─────────────────────────┼────────────────────────────────────────────
0ms   │ Client                  │ POST /v1/chat/completions
      │                         │ { "model": "deepseek-chat", "messages": [...] }
──────┼─────────────────────────┼────────────────────────────────────────────
1ms   │ Relay Handler           │ Validate request, extract model_id
      │                         │ Check authentication & rate limits
──────┼─────────────────────────┼────────────────────────────────────────────
2ms   │ Channel Service         │ Call get_routed_channel("deepseek-chat", user_id)
      │                         │ Try provider model routing first
──────┼─────────────────────────┼────────────────────────────────────────────
3ms   │ Model Routing Service   │ Query provider_models table
      │                         │ SELECT * FROM provider_models WHERE model_id='deepseek-chat'
      │                         │ JOIN provider_model_metrics
      │                         │ Found 3 candidates
──────┼─────────────────────────┼────────────────────────────────────────────
4ms   │ Model Routing Service   │ Load user preferences (if exists)
      │                         │ SELECT * FROM user_routing_preferences WHERE user_id=...
──────┼─────────────────────────┼────────────────────────────────────────────
5ms   │ Model Routing Service   │ Load model routing config
      │                         │ SELECT * FROM model_routing_config WHERE model_id='deepseek-chat'
──────┼─────────────────────────┼────────────────────────────────────────────
6ms   │ Model Routing Service   │ Score 3 candidates using "balanced" strategy
      │                         │ Provider A: 0.880
      │                         │ Provider B: 0.805
      │                         │ Provider C: 0.750
──────┼─────────────────────────┼────────────────────────────────────────────
7ms   │ Model Routing Service   │ Apply user preferences filters
      │                         │ Boost preferred providers (+50%)
      │                         │ Remove blocked providers
──────┼─────────────────────────┼────────────────────────────────────────────
8ms   │ Model Routing Service   │ Weighted random selection from top 3
      │                         │ Selected: Provider A (channel_id=23)
      │                         │ Fallbacks: [Provider B, Provider C]
──────┼─────────────────────────┼────────────────────────────────────────────
9ms   │ Circuit Breaker         │ Check should_allow_request(provider_id=5, model="deepseek-chat",
      │                         │   channel_id=23)
      │                         │ Circuit state: CLOSED
      │                         │ Allow request
──────┼─────────────────────────┼────────────────────────────────────────────
10ms  │ Relay Handler           │ Get channel details from channels table
      │                         │ Channel 23: base_url, api_key
──────┼─────────────────────────┼────────────────────────────────────────────
11ms  │ Routing Decision Log    │ Async: INSERT INTO routing_decision_logs
      │                         │ (non-blocking, happens in background)
──────┼─────────────────────────┼────────────────────────────────────────────
12ms  │ Relay Handler           │ Transform request for provider API
      │                         │ Add Authorization: Bearer [api_key]
      │                         │ Adjust model name if needed
──────┼─────────────────────────┼────────────────────────────────────────────
15ms  │ HTTP Client             │ POST https://api.provider-a.com/v1/chat/completions
      │                         │ Start latency timer
──────┼─────────────────────────┼────────────────────────────────────────────
...   │ Provider A              │ Processing request...
──────┼─────────────────────────┼────────────────────────────────────────────
465ms │ HTTP Client             │ Response received from Provider A
      │                         │ Status: 200 OK
      │                         │ Latency: 450ms (15ms → 465ms)
──────┼─────────────────────────┼────────────────────────────────────────────
466ms │ Relay Handler           │ Parse response
      │                         │ Extract usage: prompt_tokens=1500, completion_tokens=300
──────┼─────────────────────────┼────────────────────────────────────────────
467ms │ Provider Metrics        │ record_request(provider_id=5, model="deepseek-chat", 
      │                         │   channel_id=23, latency=450, success=true,
      │                         │   prompt_tokens=1500, completion_tokens=300)
      │                         │ → Stored in memory buffer (non-blocking)
──────┼─────────────────────────┼────────────────────────────────────────────
468ms │ Circuit Breaker         │ record_success(provider_id=5, model="deepseek-chat", 
      │                         │   channel_id=23)
      │                         │ → success_count++, failure_count=0
──────┼─────────────────────────┼────────────────────────────────────────────
469ms │ Billing Service         │ post_consume_quota() (async, non-blocking)
      │                         │ Deduct quota from user balance
──────┼─────────────────────────┼────────────────────────────────────────────
470ms │ Relay Handler           │ Return response to client
      │                         │ HTTP 200 OK with completion
──────┼─────────────────────────┼────────────────────────────────────────────
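
The preference and selection steps at 7ms and 8ms in the timeline can be sketched as follows. The function name and signature are hypothetical, scores are assumed to be positive, and using each adjusted score directly as its selection weight is an assumption about how "weighted random" is implemented.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple


def select_provider(scores: Dict[str, float],
                    preferred: Iterable[str] = (),
                    blocked: Iterable[str] = (),
                    top_n: int = 3,
                    rng=random) -> Tuple[Optional[str], List[str]]:
    """Return (selected provider, ordered fallbacks) or (None, []) if none remain."""
    preferred, blocked = set(preferred), set(blocked)
    # Remove blocked providers, boost preferred ones by 50%.
    adjusted = {
        name: score * (1.5 if name in preferred else 1.0)
        for name, score in scores.items()
        if name not in blocked
    }
    if not adjusted:
        return None, []
    # Keep the top N by adjusted score, best first.
    ranked = sorted(adjusted, key=adjusted.get, reverse=True)[:top_n]
    weights = [adjusted[name] for name in ranked]
    # Weighted random pick: higher scores are more likely, not guaranteed.
    selected = rng.choices(ranked, weights=weights, k=1)[0]
    fallbacks = [name for name in ranked if name != selected]
    return selected, fallbacks


selected, fallbacks = select_provider(
    {"Provider A": 0.880, "Provider B": 0.805, "Provider C": 0.750}
)
```

Because the pick is randomized, Provider A is the most likely choice for these scores but not the only possible one; the remaining candidates become the fallback list in score order.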

Background Tasks (run every 60 seconds):
──────┼─────────────────────────┼────────────────────────────────────────────
60s   │ Metrics Aggregator      │ Aggregate metrics from memory buffer
      │                         │ Calculate avg, p50, p95, p99 latency
      │                         │ Calculate success rate for last hour
──────┼─────────────────────────┼────────────────────────────────────────────
61s   │ Metrics Aggregator      │ Batch update to provider_model_metrics table
      │                         │ UPDATE provider_model_metrics SET ...
      │                         │ Recalculate quality scores
──────┼─────────────────────────┼────────────────────────────────────────────
62s   │ Circuit Breaker         │ recover_circuit_breakers()
      │                         │ Check if any OPEN circuits can move to HALF_OPEN
──────┼─────────────────────────┼────────────────────────────────────────────
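
The aggregation step at 60s could look roughly like this. The record shape (latency, success flag) and the nearest-rank percentile method are assumptions; the source only states which statistics are computed.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list (pct is an integer 1..100)."""
    # Integer ceiling division avoids floating-point rank errors.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def aggregate(records: List[Tuple[float, bool]]) -> Optional[Dict[str, float]]:
    """Summarize buffered (latency_ms, success) records for one provider/model."""
    if not records:
        return None
    latencies = sorted(latency for latency, _ in records)
    successes = sum(1 for _, ok in records if ok)
    return {
        "total_requests": len(records),
        "success_rate": successes / len(records),
        "avg_latency_ms": mean(latencies),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "p99_latency_ms": percentile(latencies, 99),
    }


# 100 synthetic requests with latencies 1..100 ms; the slowest one failed.
summary = aggregate([(float(ms), ms != 100) for ms in range(1, 101)])
```

The resulting dictionary maps naturally onto the columns of provider_model_metrics that the batch UPDATE at 61s would write.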

5. Database Schema Relationships

┌─────────────────────────────────────────────────────────────────────────┐
│                     DATABASE SCHEMA RELATIONSHIPS                       │
└─────────────────────────────────────────────────────────────────────────┘

                              ┌──────────────┐
                              │    users     │
                              │ (providers)  │
                              ├──────────────┤
                              │ id (PK)      │◄────────┐
                              │ username     │         │
                              │ is_provider  │         │ provider_id (FK)
                              │ provider_    │         │
                              │   status     │         │
                              └──────┬───────┘         │
                                     │                 │
                  ┌──────────────────┼─────────────────┼──────────────┐
                  │                  │                 │              │
                  │ provider_id (FK) │                 │              │
                  ▼                  ▼                 │              │
       ┌──────────────────┐  ┌──────────────┐          │              │
       │  provider_       │  │  channels    │          │              │
       │    models        │  ├──────────────┤          │              │
       ├──────────────────┤  │ id (PK)      │◄────┐    │              │
       │ id (PK)          │  │ provider_id  │     │    │              │
       │ model_id         │  │   (FK)       │     │    │              │
       │ provider_id (FK) ├─►│ base_url     │     │    │              │
       │ channel_id (FK)  ├──┤ key          │     │    │              │
       │ model_name       │  │ status       │     │    │              │
       │ pricing_prompt   │  └──────────────┘     │    │              │
       │ pricing_         │                       │    │              │
       │   completion     │                       │    │              │
       │ context_length   │                       │    │              │
       │ status           │    channel_id (FK)    │    │              │
       │ quality_score    │         │             │    │              │
       └──────┬───────────┘         │             │    │              │
              │                     │             │    │              │
              │ (provider_id,       │             │    │              │
              │  model_id,          │             │    │              │
              │  channel_id)        │             │    │              │
              │                     │             │    │              │
              ▼                     │             │    │              │
       ┌──────────────────┐         │             │    │              │
       │  provider_model_ │         │             │    │              │
       │    metrics       │◄────────┘             │    │              │
       ├──────────────────┤                       │    │              │
       │ id (PK)          │                       │    │              │
       │ channel_id (FK)  ├───────────────────────┘    │              │
       │ model_id         │                            │              │
       │ provider_id (FK) ├────────────────────────────┘              │
       │ total_requests   │                                           │
       │ success_rate_    │                                           │
       │   last_hour      │                                           │
       │ avg_latency_ms   │                                           │
       │ quality_score    │                                           │
       │ circuit_state    │                                           │
       └──────────────────┘                                           │
                                                                      │
              ┌───────────────────────────────────────────────────────┘
              │
              │ user_id (FK)
              ▼
       ┌──────────────────┐
       │  user_routing_   │
       │    preferences   │
       ├──────────────────┤
       │ id (PK)          │
       │ user_id (FK)     │
       │ default_strategy │
       │ preferred_       │
       │   providers      │
       │ blocked_         │
       │   providers      │
       │ max_price        │
       │ min_success_rate │
       └──────────────────┘

       ┌──────────────────┐
       │  model_routing_  │
       │     config       │
       ├──────────────────┤
       │ id (PK)          │
       │ canonical_       │
       │   model_id       │
       │ latency_weight   │
       │ success_rate_    │
       │   weight         │
       │ price_weight     │
       │ default_strategy │
       └──────────────────┘

       ┌──────────────────┐
       │  routing_        │
       │    decision_logs │
       ├──────────────────┤
       │ id (PK)          │
       │ request_id       │
       │ user_id          │
       │ model_id         │
       │ selected_        │
       │   provider_id    │
       │ selected_        │
       │   channel_id     │
       │ routing_strategy │
       │ routing_reason   │
       │ candidates_json  │
       │ created_at       │
       └──────────────────┘

Legend:
PK = Primary Key
FK = Foreign Key
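
To make the relationship between provider_models and provider_model_metrics concrete, here is a small in-memory SQLite sketch of the candidate lookup. Column types, the LEFT JOIN, and the sample rows are assumptions for illustration; only the table and column names come from the diagram.

```python
import sqlite3

SCHEMA = """
CREATE TABLE provider_models (
    id INTEGER PRIMARY KEY,
    model_id TEXT NOT NULL,
    provider_id INTEGER NOT NULL,
    channel_id INTEGER NOT NULL,
    pricing_prompt REAL,
    pricing_completion REAL
);
CREATE TABLE provider_model_metrics (
    id INTEGER PRIMARY KEY,
    provider_id INTEGER NOT NULL,
    model_id TEXT NOT NULL,
    channel_id INTEGER NOT NULL,
    success_rate_last_hour REAL,
    avg_latency_ms REAL,
    circuit_state TEXT
);
"""

# Metrics are matched on the (provider_id, model_id, channel_id) triple.
CANDIDATES_SQL = """
SELECT pm.provider_id, pm.channel_id, pm.pricing_prompt, pm.pricing_completion,
       m.success_rate_last_hour, m.avg_latency_ms, m.circuit_state
FROM provider_models AS pm
LEFT JOIN provider_model_metrics AS m
       ON m.provider_id = pm.provider_id
      AND m.model_id = pm.model_id
      AND m.channel_id = pm.channel_id
WHERE pm.model_id = ?
ORDER BY pm.provider_id
"""


def find_candidates(conn: sqlite3.Connection, model_id: str) -> list:
    return conn.execute(CANDIDATES_SQL, (model_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Sample rows are made up for the demonstration.
conn.executemany(
    "INSERT INTO provider_models (model_id, provider_id, channel_id, "
    "pricing_prompt, pricing_completion) VALUES (?, ?, ?, ?, ?)",
    [("deepseek-chat", 5, 23, 2.5, 10.0), ("deepseek-chat", 6, 31, 1.0, 4.0),
     ("other-model", 5, 24, 3.0, 9.0)],
)
conn.execute(
    "INSERT INTO provider_model_metrics (provider_id, model_id, channel_id, "
    "success_rate_last_hour, avg_latency_ms, circuit_state) "
    "VALUES (5, 'deepseek-chat', 23, 0.98, 450, 'closed')"
)
candidates = find_candidates(conn, "deepseek-chat")
```

A LEFT JOIN is used so that a newly registered provider with no metrics row yet still shows up as a candidate, with NULL metric columns.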

6. Advantages Visualization

╔══════════════════════════════════════════════════════════════════════════╗
║               KEY ADVANTAGES OF PROVIDER ROUTING SYSTEM                  ║
╚══════════════════════════════════════════════════════════════════════════╝

┌────────────────────────────────────────────────────────────────────────┐
│ 1. INTELLIGENT SELECTION                                               │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Traditional:                    Provider Routing:                    │
│   ┌──────────┐                   ┌──────────┐                          │
│   │ Request  │                   │ Request  │                          │
│   └────┬─────┘                   └────┬─────┘                          │
│        │                              │                                │
│        │ Fixed                        │ Intelligent                    │
│        │ Config                       │ Selection                      │
│        ▼                              ▼                                │
│   ┌──────────┐                   ┌──────────┐                          │
│   │ Channel  │                   │ Best     │ ← Based on:              │
│   │ (static) │                   │ Provider │   • Performance          │
│   └──────────┘                   └──────────┘   • Cost                 │
│                                                  • User prefs          │
│                                                  • Real-time metrics   │
│   Result: Fixed,                 Result: Dynamic,                      │
│           no optimization                always optimized              │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│ 2. HIGH AVAILABILITY                                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Without Routing:               With Provider Routing:                │
│   ┌──────────┐                   ┌──────────┐                          │
│   │ Provider │                   │ Provider │                          │
│   │    A     │ ← Request         │    A     │ ← Request                │
│   └────┬─────┘                   └────┬─────┘                          │
│        │                              │                                │
│        │ FAILS                        │ FAILS                          │
│        ▼                              ▼                                │
│   ┌──────────┐                   ┌──────────┐                          │
│   │  ERROR   │                   │ Circuit  │                          │
│   │ RETURNED │                   │ Breaker  │                          │
│   └──────────┘                   │  Opens   │                          │
│                                  └────┬─────┘                          │
│   User sees error                     │ Auto                           │
│                                       │ Fallback                       │
│                                       ▼                                │
│                                  ┌──────────┐                          │
│                                  │ Provider │                          │
│                                  │    B     │ ← Retry                  │
│                                  └────┬─────┘                          │
│                                       │                                │
│                                       │ SUCCESS                        │
│                                       ▼                                │
│                                  ┌──────────┐                          │
│                                  │ Response │                          │
│                                  │ Returned │                          │
│                                  └──────────┘                          │
│                                                                        │
│   Uptime: ~99.5%                 Uptime: ~99.99%                       │
└────────────────────────────────────────────────────────────────────────┘
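
The failover path on the right-hand side can be sketched as a loop over the selected provider and its fallbacks. The function and callback names here are hypothetical; `send` stands in for the HTTP call and `is_allowed` for the circuit breaker check.

```python
from typing import Callable, Dict, Sequence, Tuple


def send_with_fallback(providers: Sequence[str],
                       send: Callable[[str], str],
                       is_allowed: Callable[[str], bool] = lambda p: True
                       ) -> Tuple[str, str]:
    """Try each provider in order; return (provider, response) for the first success."""
    errors: Dict[str, str] = {}
    for provider in providers:
        if not is_allowed(provider):
            errors[provider] = "circuit open"  # skip without sending
            continue
        try:
            return provider, send(provider)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors[provider] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")


def fake_send(provider: str) -> str:
    # Stand-in for the real request: Provider A fails, Provider B succeeds.
    if provider == "Provider A":
        raise ConnectionError("upstream timeout")
    return f"response from {provider}"


used, response = send_with_fallback(["Provider A", "Provider B"], fake_send)
```

In a fuller version each failure would also be reported to the circuit breaker so that repeated errors open the circuit for that provider.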

┌────────────────────────────────────────────────────────────────────────┐
│ 3. COST OPTIMIZATION                                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Fixed Provider:                Provider Routing (Cost Strategy):     │
│                                                                        │
│   Provider A: $10/M              Provider A: $10/M → Score: 0.70       │
│   (only option)                  Provider B: $5/M  → Score: 0.85       │
│                                  Provider C: $12/M → Score: 0.65       │
│   1M tokens = $10                                                      │
│   10M tokens = $100              Provider B selected (cheapest)        │
│   100M tokens = $1,000           1M tokens = $5                        │
│                                  10M tokens = $50                      │
│                                  100M tokens = $500                    │
│                                                                        │
│   Monthly cost (100M): $1,000    Monthly cost (100M): $500             │
│                                  SAVINGS: 50%                          │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│ 4. PERFORMANCE TRACKING                                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   No Metrics:                    Provider Routing Metrics:             │
│   • Unknown performance          • Real-time latency (avg, p50-p99)    │
│   • No visibility                • Success rate (hourly, daily)        │
│   • Can't optimize               • Quality score (0.0-1.0)             │
│   • Blind to issues              • Circuit breaker state               │
│                                  • Token throughput                    │
│                                  • Historical trends                   │
│                                                                        │
│   Dashboard:                     Dashboard:                            │
│   ┌──────────────┐               ┌──────────────────────────────────┐  │
│   │              │               │ Provider A: 450ms avg, 98%       │  │
│   │   No Data    │               │ Provider B: 650ms avg, 97%       │  │
│   │              │               │ Provider C: 900ms avg, 92%       │  │
│   │              │               │                                  │  │
│   └──────────────┘               │ Trending: Provider A improving   │  │
│                                  │ Alert: Provider C degraded       │  │
│                                  └──────────────────────────────────┘  │
│                                                                        │
│   Result: Reactive               Result: Proactive                     │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│ 5. USER EMPOWERMENT                                                    │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Fixed Config:                  User Preferences:                     │
│   • No control                   • Choose strategy (perf/cost/balanced)│
│   • One size fits all            • Set preferred providers             │
│   • Can't avoid bad providers    • Block problematic providers         │
│   • No budget control            • Set price limits ($X per M tokens)  │
│                                  • Set quality thresholds              │
│                                  • Per-request overrides               │
│                                                                        │
│   User A (needs speed):          User A preferences:                   │
│   ┌────────────────┐             ┌──────────────────────────────────┐  │
│   │ Gets random    │             │ strategy: "performance"          │  │
│   │ slow provider  │             │ min_success_rate: 0.99           │  │
│   │ Frustrated     │             │ max_latency_ms: 1000             │  │
│   └────────────────┘             └──────────────────────────────────┘  │
│                                  → Gets fastest, most reliable         │
│   User B (budget-conscious):     User B preferences:                   │
│   ┌────────────────┐             ┌──────────────────────────────────┐  │
│   │ Pays high      │             │ strategy: "cost"                 │  │
│   │ prices         │             │ max_price: 7.0                   │  │
│   │ Expensive      │             │ min_success_rate: 0.95           │  │
│   └────────────────┘             └──────────────────────────────────┘  │
│                                  → Gets cheapest within budget         │
└────────────────────────────────────────────────────────────────────────┘
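
User B's limits above amount to a hard filter applied before scoring. A minimal sketch, assuming hypothetical Candidate and UserPreferences types and assuming max_price is compared against the average of prompt and completion prices:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    avg_price: float      # dollars per million tokens
    success_rate: float   # 0.0 - 1.0


@dataclass(frozen=True)
class UserPreferences:
    # None means "no limit set"; field names follow user_routing_preferences.
    max_price: Optional[float] = None
    min_success_rate: Optional[float] = None


def apply_preferences(candidates: List[Candidate],
                      prefs: UserPreferences) -> List[Candidate]:
    """Drop candidates that violate the user's price or reliability limits."""
    kept = []
    for c in candidates:
        if prefs.max_price is not None and c.avg_price > prefs.max_price:
            continue
        if prefs.min_success_rate is not None and c.success_rate < prefs.min_success_rate:
            continue
        kept.append(c)
    return kept


# User B from the box: max_price 7.0, min_success_rate 0.95. Sample data is made up.
user_b = UserPreferences(max_price=7.0, min_success_rate=0.95)
pool = [Candidate("A", 10.0, 0.98), Candidate("B", 5.0, 0.97), Candidate("C", 4.0, 0.92)]
eligible = apply_preferences(pool, user_b)
```

With this sample pool only provider B survives: A exceeds the price limit and C falls below the success-rate threshold.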

7. Comparison: Before vs After

╔══════════════════════════════════════════════════════════════════════════╗
║          BEFORE PROVIDER ROUTING   vs   AFTER PROVIDER ROUTING           ║
╚══════════════════════════════════════════════════════════════════════════╝

┌───────────────────────┬─────────────────────┬─────────────────────────────┐
│ METRIC                │ BEFORE              │ AFTER                       │
├───────────────────────┼─────────────────────┼─────────────────────────────┤
│ Provider Selection    │ Manual/Random       │ Intelligent (metrics-based) │
│ Optimization          │ None                │ Real-time, multi-dimensional│
│ Availability          │ ~99.5%              │ ~99.99%                     │
│ Cost Optimization     │ No                  │ Yes (up to 50% savings)     │
│ Failover Time         │ Manual (minutes)    │ Automatic (milliseconds)    │
│ Performance Tracking  │ None                │ Comprehensive               │
│ User Control          │ None                │ Full (preferences)          │
│ Provider Diversity    │ Limited             │ Multiple per model          │
│ Quality Assurance     │ Manual              │ Automated (circuit breaker) │
│ Audit Trail           │ None                │ Complete logging            │
│ Admin Visibility      │ None                │ Full dashboard              │
│ Scalability           │ Limited             │ Highly scalable             │
└───────────────────────┴─────────────────────┴─────────────────────────────┘
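
The Availability row can be sanity-checked with simple arithmetic. The sketch below assumes that provider outages are statistically independent and that the failover step itself never fails. Both are idealizations that the table does not state, so treat the result as an upper bound rather than a measured figure. The helper name `combined_availability` is hypothetical.

```python
def combined_availability(single: float, providers: int) -> float:
    """Probability that at least one of `providers` independent providers is up.

    The model is unreachable only when every provider is down at once,
    which happens with probability (1 - single) ** providers.
    """
    return 1.0 - (1.0 - single) ** providers


for n in (1, 2, 3):
    print(n, f"{combined_availability(0.995, n):.6f}")
```

Under these assumptions, two providers at 99.5% each are jointly available about 99.9975% of the time, which is consistent with the "~99.99%" figure in the table. Correlated outages (a shared upstream, for example) would lower the real number.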

BEFORE: Simple but Inflexible
┌──────────────────────────────────────────────────────────────────────┐
│  Request → Channel (fixed) → Provider → Response                     │
│                                                                      │
│  Problems:                                                           │
│  • Single point of failure                                           │
│  • No optimization                                                   │
│  • No visibility                                                     │
│  • Manual intervention needed                                        │
└──────────────────────────────────────────────────────────────────────┘

AFTER: Intelligent & Resilient
┌──────────────────────────────────────────────────────────────────────┐
│  Request → Routing Engine → Best Provider (scored) → Response        │
│             ↓                      ↓                                 │
│          Metrics              Circuit Breaker                        │
│          User Prefs           Fallback Chain                         │
│          Config               Quality Tracking                       │
│                                                                      │
│  Benefits:                                                           │
│  • Automatic failover                                                │
│  • Cost & performance optimization                                   │
│  • Complete visibility                                               │
│  • Self-healing system                                               │
└──────────────────────────────────────────────────────────────────────┘
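
The fallback chain in the AFTER diagram amounts to walking providers in score order and skipping any whose circuit breaker is open. The sketch below is a minimal illustration under assumptions: the `CircuitBreaker` class, its consecutive-failure threshold and cooldown, the `route_with_fallback` helper, and the `call` callback are all hypothetical and may not match the actual state machine or thresholds.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker that opens after `threshold` consecutive failures."""

    def __init__(self, threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allows_request(self):
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


def route_with_fallback(ranked_providers, breakers, call):
    """Try providers best-first; return (provider_name, response) or raise."""
    errors = []
    for name in ranked_providers:
        breaker = breakers.setdefault(name, CircuitBreaker())
        if not breaker.allows_request():
            continue  # breaker is open: skip without sending traffic
        try:
            response = call(name)
        except Exception as exc:
            breaker.record_failure()
            errors.append((name, exc))
            continue  # fall through to the next provider in the chain
        breaker.record_success()
        return name, response
    raise RuntimeError(f"all providers failed or were skipped: {errors}")
```

A caller would pass the provider names already sorted by score, a shared `breakers` dictionary that persists across requests, and a function that performs the upstream request. Keeping breaker state outside the function is what lets repeated failures on one provider take it out of rotation for later requests.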
