knoxchat/labs

Model Routing System

System Architecture Overview

The Provider Model Routing System is a multi-layered routing infrastructure that lets multiple providers offer the same AI model and automatically selects a provider for each request based on real-time metrics, user preferences, and provider health (circuit breaker state).

Core Components Structure

┌─────────────────────────────────────────────────────────────────┐
│                    USER REQUEST (Model ID)                      │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              RELAY HANDLER                                      │
│  • Request validation                                           │
│  • Channel selection with routing                               │
│  • Circuit breaker checking                                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│         CHANNEL SERVICE                                         │
│  get_routed_channel()                                           │
│  ├── Try Provider Model Routing (STEP 1)                        │
│  └── Fallback to Legacy Channel Routing (STEP 2)                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│      MODEL ROUTING SERVICE                                      │
│                                                                 │
│  route_request()                                                │
│  ├─► 1. get_model_providers()      [Provider Discovery]         │
│  ├─► 2. load_user_preferences()    [User Prefs Loading]         │
│  ├─► 3. load_routing_config()      [Model Config Loading]       │
│  ├─► 4. score_providers()          [Intelligent Scoring]        │
│  └─► 5. select_provider()          [Final Selection]            │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│           ROUTING DECISION                                      │
│  • Selected Provider + Channel                                  │
│  • Fallback Providers (ordered)                                 │
│  • Routing Reason & Score                                       │
│  • Strategy Used                                                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│      CIRCUIT BREAKER CHECK                                      │
│  should_allow_request()                                         │
│  • Closed → Allow                                               │
│  • Open → Try Fallback                                          │
│  • Half-Open → Limited Allow                                    │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              EXECUTE REQUEST ON SELECTED CHANNEL                │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────────────┐
│        METRICS RECORDING                                         │
│  record_request()                                                │
│  • Latency tracking                                              │
│  • Success/failure counting                                      │
│  • Token usage                                                   │
│  • Quality score calculation                                     │
│  • Circuit breaker state update                                  │
└──────────────────────────────────────────────────────────────────┘
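
The two-step order inside get_routed_channel() shown in the diagram can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name get_routed_channel_sketch, the RoutedVia enum, and the closure parameters are hypothetical stand-ins for the real services.

```rust
/// Which routing path produced the channel (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum RoutedVia {
    ProviderModelRouting,
    LegacyChannelRouting,
}

/// Both routing steps are passed in as closures so the sketch stays
/// independent of the real services; each returns a channel id on success.
fn get_routed_channel_sketch(
    provider_routing: impl Fn(&str) -> Option<u32>,
    legacy_routing: impl Fn(&str) -> Option<u32>,
    model_id: &str,
) -> Option<(u32, RoutedVia)> {
    // STEP 1: try provider model routing first.
    if let Some(channel_id) = provider_routing(model_id) {
        return Some((channel_id, RoutedVia::ProviderModelRouting));
    }
    // STEP 2: fall back to legacy channel routing.
    legacy_routing(model_id).map(|id| (id, RoutedVia::LegacyChannelRouting))
}
```

Returning the path alongside the channel id makes it easy to log how often the legacy fallback is still being used.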

Database Schema Architecture

1. provider_models (Source of Truth)

Provider-submitted model definitions

├── Model Info: model_id, model_name, description
├── Provider: provider_id, provider_name, channel_id
├── Pricing: pricing_prompt, pricing_completion, pricing_image
├── Specs: context_length, modality, supported_parameters
└── Status: status (0=pending, 1=approved, 2=rejected)

2. provider_model_metrics (Real-time Performance)

Live performance tracking per provider-model-channel

├── Cumulative: total_requests, successful_requests, failed_requests
├── Latency: avg/p50/p95/p99/min/max_latency_ms
├── Time Windows: last_hour, last_24h metrics
├── Circuit Breaker: circuit_state, consecutive_failures/successes
├── Quality: quality_score (0.0-1.0)
└── Token Throughput: total tokens, avg_tokens_per_second

3. model_routing_config (Per-Model Configuration)

Admin-configurable routing rules per model

├── Weights: latency_weight, success_rate_weight, price_weight
├── Strategy: default_strategy (performance/cost/balanced/round_robin)
├── Fallback: enable_auto_fallback, max_fallback_attempts
└── Circuit: failure_threshold, recovery_timeout_seconds

4. user_routing_preferences (User Preferences)

Per-user routing customization

├── Strategy: default_strategy
├── Providers: preferred_providers[], blocked_providers[]
├── Limits: max_price_per_million_tokens, min_success_rate, max_latency_ms
└── Requirements: require_streaming, require_function_calling

5. routing_decision_logs (Audit Trail)

Complete history of routing decisions

├── Decision: selected_provider_id, routing_strategy, routing_reason
├── Candidates: candidates_count, candidates_json
├── Fallback: fallback_providers[], is_fallback_request
└── Performance: routing_duration_us

Detailed Process Flow

Phase 1: Request Initiation

User/API Request
     │
     ├─► Model ID: "deepseek-chat"
     ├─► User ID: 12345
     └─► Optional: RoutingPreferences { strategy: "performance" }

Phase 2: Provider Discovery

SELECT provider_id, channel_id, provider_name, pricing, metrics, quality_score
FROM provider_models pm
LEFT JOIN provider_model_metrics pmm ON (...)
LEFT JOIN channels c ON pm.channel_id = c.id
WHERE pm.model_id = 'deepseek-chat' AND pm.status = 1
ORDER BY quality_score DESC NULLS LAST  -- LEFT JOIN rows without metrics would otherwise sort first in PostgreSQL

Output: List of ProviderCandidate structs

ProviderCandidate {
    provider_id: 5,
    channel_id: 23,
    provider_name: "Provider A",
    price_per_million_prompt: 2.50,
    price_per_million_completion: 10.00,
    success_rate: 0.98,
    avg_latency_ms: 450,
    quality_score: 0.92,
    circuit_state: Closed,
}

Phase 3: Configuration Loading

Model Config:

RoutingConfig {
    canonical_model_id: "deepseek-chat",
    latency_weight: 0.3,
    success_rate_weight: 0.4,
    price_weight: 0.2,
    provider_priority_weight: 0.1,
    default_strategy: "balanced",
}

User Preferences (merged with request prefs):

RoutingPreferences {
    strategy: Performance,
    prefer_providers: [5, 8],
    avoid_providers: [3],
    max_price: Some(15.0),
    min_success_rate: Some(0.95),
}

Phase 4: Intelligent Scoring

Strategy: Performance

score = success_rate * 0.4 + latency_score * 0.3 + quality_score * 0.1 + priority_bonus

Strategy: Cost

price_score = 1.0 - (avg_price / 100.0)
score = price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1

Strategy: Balanced

perf_score = performance_score(candidate)
cost_score = cost_score(candidate)
score = perf_score * perf_weight + cost_score * cost_weight
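
The three formulas above can be written out as plain functions. This is a sketch under stated assumptions: the README does not define latency_score, so the linear mapping below (0 ms gives 1.0, 1000 ms or more gives 0.0) is an assumption, as are the Candidate struct and the clamp on price_score, which keeps the cost formula from going negative above $100/M.

```rust
/// Hypothetical candidate type holding only the fields the formulas use.
#[derive(Debug, Clone)]
struct Candidate {
    success_rate: f64,   // 0.0..=1.0
    avg_latency_ms: f64,
    quality_score: f64,  // 0.0..=1.0
    avg_price: f64,      // USD per million tokens
    priority_bonus: f64,
}

/// ASSUMPTION: linear latency score, 0 ms -> 1.0, >= 1000 ms -> 0.0.
fn latency_score(avg_latency_ms: f64) -> f64 {
    (1.0 - avg_latency_ms / 1000.0).clamp(0.0, 1.0)
}

/// Performance strategy: success rate, latency, quality, plus priority bonus.
fn performance_score(c: &Candidate) -> f64 {
    c.success_rate * 0.4
        + latency_score(c.avg_latency_ms) * 0.3
        + c.quality_score * 0.1
        + c.priority_bonus
}

/// Cost strategy: the clamp is an addition not stated in the formula.
fn cost_score(c: &Candidate) -> f64 {
    let price_score = (1.0 - c.avg_price / 100.0).clamp(0.0, 1.0);
    price_score * 0.6 + c.success_rate * 0.3 + c.quality_score * 0.1
}

/// Balanced strategy: weighted mix of the two scores above.
fn balanced_score(c: &Candidate, perf_weight: f64, cost_weight: f64) -> f64 {
    performance_score(c) * perf_weight + cost_score(c) * cost_weight
}
```

With a candidate at 98% success, 450 ms, quality 0.92, and no priority bonus, performance_score under this latency assumption gives 0.392 + 0.165 + 0.092 = 0.649.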

Phase 5: Provider Selection

  1. Filter by preferences:

    • Remove avoided providers
    • Check max_price threshold
    • Check min_success_rate
    • Check max_latency_ms
  2. Boost preferred providers:

    • Apply 50% score boost to preferred providers
  3. Weighted random selection:

    • Sort by score (descending)
    • Take top 3 candidates
    • Weighted random selection (prevents provider starvation)
  4. Prepare fallback chain:

    • Remaining candidates become fallback providers (up to 3)
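
The four selection steps above can be sketched in one function. The Scored and Prefs types are hypothetical, and the random draw is passed in as roll (a number in [0, 1)) so the example needs no external crate and stays deterministic; a real implementation would presumably draw it from a random number generator.

```rust
#[derive(Debug, Clone)]
struct Scored {
    provider_id: u32,
    score: f64,
    avg_price: f64,
    success_rate: f64,
    avg_latency_ms: f64,
}

struct Prefs {
    prefer_providers: Vec<u32>,
    avoid_providers: Vec<u32>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
    max_latency_ms: Option<f64>,
}

/// Returns (selected provider id, ordered fallback provider ids).
fn select_provider(mut candidates: Vec<Scored>, prefs: &Prefs, roll: f64) -> Option<(u32, Vec<u32>)> {
    // 1. Filter by preferences.
    candidates.retain(|c| {
        !prefs.avoid_providers.contains(&c.provider_id)
            && prefs.max_price.map_or(true, |p| c.avg_price <= p)
            && prefs.min_success_rate.map_or(true, |r| c.success_rate >= r)
            && prefs.max_latency_ms.map_or(true, |l| c.avg_latency_ms <= l)
    });
    // 2. Apply the 50% boost to preferred providers.
    for c in candidates.iter_mut() {
        if prefs.prefer_providers.contains(&c.provider_id) {
            c.score *= 1.5;
        }
    }
    // 3. Sort by score (descending) and pick among the top 3, weighted by score.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let top = candidates.len().min(3);
    if top == 0 {
        return None;
    }
    let total: f64 = candidates[..top].iter().map(|c| c.score).sum();
    let mut target = roll * total;
    let mut picked = top - 1; // guards against rounding at the upper edge
    for (i, c) in candidates[..top].iter().enumerate() {
        if target < c.score {
            picked = i;
            break;
        }
        target -= c.score;
    }
    let selected = candidates.remove(picked);
    // 4. Remaining candidates, still in score order, form the fallback chain (up to 3).
    let fallbacks: Vec<u32> = candidates.iter().take(3).map(|c| c.provider_id).collect();
    Some((selected.provider_id, fallbacks))
}
```

Because the pick is proportional to score, the top provider wins most often while the second and third still receive some traffic.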

Phase 6: Circuit Breaker Check

┌──────────────────────────────────────────────┐
│         Circuit State Machine                │
├──────────────────────────────────────────────┤
│                                              │
│  CLOSED ──────────────► OPEN                 │
│    ▲        (5 failures)    │                │
│    │                        │                │
│    │                    (60s timeout)        │
│    │                        │                │
│    │                        ▼                │
│    └───── HALF-OPEN ◄────────                │
│        (3 successes)                         │
│                                              │
└──────────────────────────────────────────────┘

States:
• CLOSED: Normal operation (all requests pass)
• OPEN: Block all requests, try fallbacks
• HALF-OPEN: Allow limited test requests

Circuit Breaker Decision:

  • If primary provider circuit is OPEN → Try fallback providers
  • If all circuits OPEN → Fallback to legacy routing
  • If circuit is CLOSED or HALF-OPEN → Proceed
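
A compact sketch of the state machine and the decision rules above, using the thresholds from the diagram (5 failures, 60 s timeout, 3 successes). The struct layout and the first_allowed helper are hypothetical, time is passed in as seconds to keep the example deterministic, and two behaviours are assumptions: a failure while HALF-OPEN reopens the circuit, and the limit on concurrent HALF-OPEN test requests is not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    consecutive_successes: u32,
    opened_at_secs: u64,
    failure_threshold: u32,     // 5 in the diagram
    recovery_timeout_secs: u64, // 60 in the diagram
    success_threshold: u32,     // 3 in the diagram
}

impl CircuitBreaker {
    fn new() -> Self {
        Self {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            consecutive_successes: 0,
            opened_at_secs: 0,
            failure_threshold: 5,
            recovery_timeout_secs: 60,
            success_threshold: 3,
        }
    }

    /// OPEN becomes HALF-OPEN once the recovery timeout has elapsed.
    fn should_allow_request(&mut self, now_secs: u64) -> bool {
        if self.state == CircuitState::Open
            && now_secs.saturating_sub(self.opened_at_secs) >= self.recovery_timeout_secs
        {
            self.state = CircuitState::HalfOpen;
            self.consecutive_successes = 0;
        }
        self.state != CircuitState::Open
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        if self.state == CircuitState::HalfOpen {
            self.consecutive_successes += 1;
            if self.consecutive_successes >= self.success_threshold {
                self.state = CircuitState::Closed;
            }
        }
    }

    fn record_failure(&mut self, now_secs: u64) {
        self.consecutive_failures += 1;
        let trip = match self.state {
            CircuitState::Closed => self.consecutive_failures >= self.failure_threshold,
            CircuitState::HalfOpen => true, // ASSUMPTION: a failed probe reopens the circuit
            CircuitState::Open => false,
        };
        if trip {
            self.state = CircuitState::Open;
            self.opened_at_secs = now_secs;
        }
    }
}

/// Index of the first provider (primary first, then fallbacks) whose circuit
/// allows a request; None means every circuit is OPEN, i.e. use legacy routing.
fn first_allowed(breakers: &mut [CircuitBreaker], now_secs: u64) -> Option<usize> {
    breakers.iter_mut().position(|cb| cb.should_allow_request(now_secs))
}
```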

Phase 7: Request Execution

Request sent to selected channel:

Channel {
    id: 23,
    provider_id: 5,
    base_url: "https://api.provider-a.com/v1",
    key: "encrypted_key",
    status: 1, // active
}

Phase 8: Metrics Recording

After request completion:

ProviderMetricsService::record_request(
    5,               // provider_id
    "deepseek-chat", // model_id
    23,              // channel_id
    450,             // latency_ms
    true,            // success
    1500,            // prompt_tokens
    300,             // completion_tokens
);

Metrics Update Process:

  1. Record in memory buffer (fast)
  2. Periodic aggregation (every 60 seconds)
  3. Database update via update_provider_metrics() SQL function
  4. Quality score recalculation
  5. Circuit breaker state evaluation
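
Steps 1 and 2 of this process (buffer in memory, then aggregate periodically) can be sketched as below. The MetricsBuffer, Sample, and Aggregate types are hypothetical, the percentile uses the nearest-rank method (an assumption, since the README does not say how percentiles are computed), and the database write in step 3 is left out.

```rust
use std::collections::HashMap;

/// (provider_id, model_id, channel_id)
type MetricsKey = (u32, String, u32);

struct Sample {
    latency_ms: u64,
    success: bool,
    tokens: u64,
}

#[derive(Debug)]
struct Aggregate {
    total: usize,
    success_rate: f64,
    avg_latency_ms: f64,
    p95_latency_ms: u64,
    total_tokens: u64,
}

#[derive(Default)]
struct MetricsBuffer {
    samples: HashMap<MetricsKey, Vec<Sample>>,
}

impl MetricsBuffer {
    /// Step 1: cheap in-memory append on the request path.
    fn record_request(&mut self, key: MetricsKey, sample: Sample) {
        self.samples.entry(key).or_default().push(sample);
    }

    /// Step 2: drain the buffer and aggregate per key (called periodically).
    fn flush(&mut self) -> HashMap<MetricsKey, Aggregate> {
        self.samples
            .drain()
            .filter(|(_, samples)| !samples.is_empty())
            .map(|(key, samples)| {
                let total = samples.len();
                let mut latencies: Vec<u64> = samples.iter().map(|s| s.latency_ms).collect();
                latencies.sort_unstable();
                // Nearest-rank percentile: ceil(0.95 * n) as a 1-based rank.
                let rank = ((0.95 * total as f64).ceil() as usize).clamp(1, total);
                let successes = samples.iter().filter(|s| s.success).count();
                let agg = Aggregate {
                    total,
                    success_rate: successes as f64 / total as f64,
                    avg_latency_ms: latencies.iter().sum::<u64>() as f64 / total as f64,
                    p95_latency_ms: latencies[rank - 1],
                    total_tokens: samples.iter().map(|s| s.tokens).sum(),
                };
                (key, agg)
            })
            .collect()
    }
}
```

In a concurrent server the buffer would need to sit behind a lock or a channel; that is omitted here to keep the aggregation logic visible.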

Routing Strategies Explained

1. Performance Strategy

Goal: Maximize speed and reliability

Scoring Formula:

score = success_rate × 0.4 + latency_score × 0.3 + quality_score × 0.1 + priority_bonus

Best for:

  • Real-time applications
  • Latency-sensitive workloads
  • Production critical paths

Example:

Provider A: 98% success, 450ms → Score: 0.89
Provider B: 95% success, 800ms → Score: 0.78
Winner: Provider A (scores are illustrative; the latency_score and priority_bonus inputs are not shown)

2. Cost Strategy

Goal: Minimize costs

Scoring Formula:

price_score = 1.0 - (avg_price / 100.0)
score = price_score × 0.6 + success_rate × 0.3 + quality_score × 0.1

Best for:

  • Batch processing
  • Development/testing
  • Cost-conscious applications

Example:

Provider A: $5/M → Score: 0.92
Provider B: $12/M → Score: 0.78
Winner: Provider A (cheaper)

3. Balanced Strategy (Default)

Goal: Optimize all factors

Scoring Formula:

Combined = performance_score × perf_weight + cost_score × cost_weight

Best for:

  • General purpose applications
  • Mixed workloads
  • Most production scenarios

4. Round-Robin Strategy

Goal: Equal distribution

Behavior:

  • All providers get equal score
  • Rotate through providers sequentially
  • No performance consideration

Best for:

  • Load distribution testing
  • Provider evaluation
  • Ensuring provider diversity
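
Sequential rotation needs only a shared counter. A minimal sketch, assuming the candidate list is passed in a stable order; the RoundRobin type is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Rotates through providers in order, ignoring scores entirely.
struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns the next provider id, or None when there are no candidates.
    fn pick(&self, provider_ids: &[u32]) -> Option<u32> {
        if provider_ids.is_empty() {
            return None;
        }
        // fetch_add returns the previous value, so the first pick is index 0.
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        Some(provider_ids[n % provider_ids.len()])
    }
}
```

The atomic counter lets several request handlers share one rotation without a lock.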

Key Advantages

1. Intelligent Provider Selection

Real-time metrics-based routing

  • Automatically routes to best-performing providers
  • Adapts to changing provider performance
  • No manual intervention required

Multi-dimensional scoring

  • Considers latency, success rate, cost, and quality
  • Configurable weights per model
  • Strategy-based optimization

2. High Availability & Fault Tolerance

Circuit breaker pattern

Failed Provider → Circuit Opens → Automatic Fallback
↓
Health Recovery → Circuit Half-Opens → Test Requests
↓
Success → Circuit Closes → Full Traffic Restoration

Automatic fallback chains

  • Up to 3 fallback providers per request
  • Ordered by score
  • Seamless failover on provider failure
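
Walking the fallback chain can be sketched as a loop over the ordered provider list. The function name execute_with_fallback and the send closure (which stands in for forwarding the request to a provider) are hypothetical; max_fallback_attempts mirrors the field of the same name in model_routing_config.

```rust
/// Tries the primary provider, then up to `max_fallback_attempts` fallbacks
/// in score order. Returns the provider that succeeded together with its
/// response, or the last error if every attempt failed.
fn execute_with_fallback<T, E>(
    primary: u32,
    fallbacks: &[u32],
    max_fallback_attempts: usize,
    mut send: impl FnMut(u32) -> Result<T, E>,
) -> Result<(u32, T), E> {
    let mut last_err = match send(primary) {
        Ok(response) => return Ok((primary, response)),
        Err(e) => e,
    };
    for &provider_id in fallbacks.iter().take(max_fallback_attempts) {
        match send(provider_id) {
            Ok(response) => return Ok((provider_id, response)),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```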

No single point of failure

  • Multiple providers for same model
  • Instant failover without retries
  • Graceful degradation

3. Cost Optimization

Price-aware routing

  • Cost strategy prioritizes cheaper providers
  • Price thresholds per user
  • Balance cost vs performance

Provider competition

  • Multiple providers compete on price
  • Market-driven pricing
  • Automatic selection of best value

4. Performance Tracking

Comprehensive metrics

Latency: avg, p50, p95, p99, min, max
Success Rate: overall, last_hour, last_24h
Quality Score: calculated from success + latency + experience
Token Throughput: tokens/second tracking

Historical data

  • All-time cumulative metrics
  • Time-windowed metrics (hourly, daily)
  • Trend analysis capability

5. User Empowerment

Customizable preferences

UserPreferences {
    strategy: "performance",           // Choose optimization goal
    prefer_providers: [1, 5],          // Favorite providers
    avoid_providers: [3],              // Blacklist problematic ones
    max_price: 15.0,                   // Budget control
    min_success_rate: 0.95,            // Quality threshold
    max_latency_ms: 5000,              // Latency requirement
}

Per-request overrides

  • Can override preferences per API call
  • Flexible for different use cases
  • Maintains user defaults
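
One way to combine stored defaults with a per-request override is to let each request field win when it is set and fall back to the user default otherwise. This is a sketch of that idea only: the field set is trimmed, the merge_preferences function is hypothetical, and the README does not specify the actual merge rule.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
struct RoutingPreferences {
    strategy: Option<String>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
    max_latency_ms: Option<u64>,
}

/// Request-level values take precedence; unset fields inherit the user default.
/// The stored defaults are only read, never modified.
fn merge_preferences(
    defaults: &RoutingPreferences,
    request: Option<&RoutingPreferences>,
) -> RoutingPreferences {
    match request {
        None => defaults.clone(),
        Some(req) => RoutingPreferences {
            strategy: req.strategy.clone().or_else(|| defaults.strategy.clone()),
            max_price: req.max_price.or(defaults.max_price),
            min_success_rate: req.min_success_rate.or(defaults.min_success_rate),
            max_latency_ms: req.max_latency_ms.or(defaults.max_latency_ms),
        },
    }
}
```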

6. Provider Ecosystem Benefits

Fair provider exposure

  • Weighted random selection prevents dominance
  • Quality providers get more traffic
  • New providers can compete

Transparent performance

  • Real metrics visible to admin
  • Quality score based on actual performance
  • Accountability for providers

7. Operational Excellence

Complete audit trail

routing_decision_logs:
- Every routing decision logged
- Full candidate list with scores
- Debugging and analytics
- 7-day retention (configurable)

Admin control

• Manual circuit breaker control
• Per-model routing configuration
• Provider approval workflow
• Analytics dashboard

8. Scalability

Efficient data structures

  • In-memory metrics buffering
  • Periodic batch updates to database
  • Minimal per-request overhead

Distributed-ready

  • Stateless routing decisions
  • Database-backed state
  • Redis-compatible circuit breakers

9. Developer Experience

Simple API integration

// Automatic routing - just pass model ID
let channel = ChannelService::get_routed_channel(
    &pool, "default", "deepseek-chat", user_id, None
).await?;

Simulation endpoint

POST /api/routing/simulate
{
  "model_id": "deepseek-chat",
  "preferences": { "strategy": "cost" }
}

10. Business Intelligence

Rich analytics

• Provider selection rates
• Strategy distribution
• Model usage patterns
• Cost analysis
• Performance trends

Performance Characteristics

Routing Decision Speed

  • Average: < 10ms
  • P99: < 50ms
  • Includes: DB queries + scoring + selection

Metrics Update

  • Memory buffer: ~1μs per record
  • DB flush: Every 60s (async, non-blocking)
  • Impact on request: negligible beyond the in-memory buffer write (recording is asynchronous)

Database Queries

  • Provider lookup: Single JOIN query with indexes
  • Config loading: Cached or single query
  • Metrics aggregation: Periodic batch operation

Security & Isolation

Data Isolation

  • Provider models are completely separate from legacy channels
  • No mixing of provider_models and abilities tables
  • Clear separation of routing logic

Access Control

  • Providers can only manage their own models
  • Admin approval required for model visibility
  • User-level routing preferences are isolated

API Key Management

  • Encrypted channel keys
  • Provider-owned API keys
  • Rotation support via provider_api_keys table

Future Enhancements

Planned Improvements

  1. ML-based routing

    • Predict provider performance
    • Learn from user patterns
    • Adaptive weight tuning
  2. Geographic routing

    • Provider location awareness
    • Latency-based geo selection
    • Regional failover
  3. Advanced analytics

    • Provider comparison dashboards
    • Cost forecasting
    • Performance predictions
  4. Enhanced fallback strategies

    • Intelligent retry with backoff
    • Cross-model fallbacks
    • Dynamic strategy switching

Configuration Examples

Example 1: High-Performance Setup

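-- Column names are assumed to follow the RoutingConfig fields shown in Phase 3
INSERT INTO model_routing_config (
    canonical_model_id, latency_weight, success_rate_weight,
    price_weight, provider_priority_weight, default_strategy
) VALUES (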
    'deepseek-chat',
    0.35,  -- latency_weight (high)
    0.45,  -- success_rate_weight (high)
    0.10,  -- price_weight (low)
    0.10,  -- provider_priority_weight
    'performance'
);

Example 2: Cost-Optimized Setup

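-- Column names are assumed to follow the RoutingConfig fields shown in Phase 3
INSERT INTO model_routing_config (
    canonical_model_id, latency_weight, success_rate_weight,
    price_weight, provider_priority_weight, default_strategy
) VALUES (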
    'deepseek-chat-v3.1',
    0.15,  -- latency_weight (low)
    0.35,  -- success_rate_weight (medium)
    0.40,  -- price_weight (high)
    0.10,  -- provider_priority_weight
    'cost'
);
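
In both setups above the four weights add up to 1.0. The README does not state this as a rule, so the check below treats it as an assumption; the RoutingWeights struct and validate_weights function are hypothetical and only illustrate how a configuration could be sanity-checked before it is inserted.

```rust
struct RoutingWeights {
    latency_weight: f64,
    success_rate_weight: f64,
    price_weight: f64,
    provider_priority_weight: f64,
}

/// ASSUMPTION: each weight lies in 0.0..=1.0 and the four weights sum to 1.0.
fn validate_weights(w: &RoutingWeights) -> Result<(), String> {
    let all = [
        w.latency_weight,
        w.success_rate_weight,
        w.price_weight,
        w.provider_priority_weight,
    ];
    if all.iter().any(|x| !(0.0..=1.0).contains(x)) {
        return Err("each weight must be between 0.0 and 1.0".to_string());
    }
    let sum: f64 = all.iter().sum();
    // Small tolerance because decimal weights are not exact in floating point.
    if (sum - 1.0).abs() > 1e-6 {
        return Err(format!("weights sum to {sum}, expected 1.0"));
    }
    Ok(())
}
```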

Example 3: User Cost Control

UserRoutingPreferences {
    default_strategy: "cost",
    max_price_per_million_tokens: 10.0,  // Max $10/M
    min_success_rate: 0.90,              // Must maintain 90%+
    preferred_providers: [1, 5, 8],      // Try these first
}

Provider Routing System

1. High-Level System Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                         CLIENT APPLICATION                                 │
│                    (Web/Mobile/API Consumer)                               │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │ HTTP Request
                             │ POST /v1/chat/completions
                             │ { "model": "deepseek-chat", "messages": [...] }
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                      ACTIX-WEB HTTP SERVER                                 │
│                    (backend/src/routes/)                                   │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    RELAY HANDLER LAYER                                     │
│               (backend/src/relay/handlers.rs)                              │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ 1. Authentication & Authorization                                    │  │
│  │ 2. Rate Limiting & Quotas                                            │  │
│  │ 3. Model Validation                                                  │  │
│  │ 4. select_channel_with_routing() ──────────────────────┐             │  │
│  └────────────────────────────────────────────────────────│─────────────┘  │
└───────────────────────────────────────────────────────────│────────────────┘
                                                            │
                             ┌──────────────────────────────┘
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    ROUTING DECISION ENGINE                                 │
│               (backend/src/services/)                                      │
│                                                                            │
│  ┌───────────────────────────┐  ┌──────────────────────────────┐           │
│  │   ModelRoutingService     │  │    ChannelService            │           │
│  │   • route_request()       │◄─┤    • get_routed_channel()    │           │
│  │   • score_providers()     │  │    • Provider model routing  │           │
│  │   • select_provider()     │  │    • Legacy channel routing  │           │
│  └───────────┬───────────────┘  └──────────────────────────────┘           │
│              │                                                             │
│              ├──► load_user_preferences()                                  │
│              ├──► load_routing_config()                                    │
│              └──► get_model_providers() ──┐                                │
└───────────────────────────────────────────│────────────────────────────────┘
                                            │
                             ┌──────────────┘
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                       DATABASE LAYER (PostgreSQL)                          │
│                                                                            │
│  ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────────────┐   │
│  │ provider_models  │  │ provider_model_  │  │ model_routing_config    │   │
│  │ • Model catalog  │  │   metrics        │  │ • Routing weights       │   │
│  │ • Pricing info   │  │ • Performance    │  │ • Default strategies    │   │
│  │ • Provider link  │  │ • Circuit state  │  │ • Fallback config       │   │
│  └──────────────────┘  └──────────────────┘  └─────────────────────────┘   │
│                                                                            │
│  ┌──────────────────┐  ┌──────────────────┐  ┌─────────────────────────┐   │
│  │ user_routing_    │  │ routing_decision_│  │ channels                │   │
│  │   preferences    │  │   logs           │  │ • Channel configs       │   │
│  │ • User settings  │  │ • Audit trail    │  │ • API keys              │   │
│  └──────────────────┘  └──────────────────┘  └─────────────────────────┘   │
└────────────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                    SCORING & SELECTION ENGINE                             │
│                                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │              CANDIDATE PROVIDERS (Example)                          │  │
│  │                                                                     │  │
│  │  A: deepseek-chat    │ B: deepseek-chat    │ C: deepseek-chat       │  │
│  │  • Price: $2.50/M    │ • Price: $3.00/M    │ • Price: $2.00/M       │  │
│  │  • Latency: 450ms    │ • Latency: 600ms    │ • Latency: 800ms       │  │
│  │  • Success: 98%      │ • Success: 97%      │ • Success: 95%         │  │
│  │  • Quality: 0.92     │ • Quality: 0.88     │ • Quality: 0.85        │  │
│  │  • Circuit: Closed   │ • Circuit: Closed   │ • Circuit: Half-Open   │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                             │                                             │
│                             ▼                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                    SCORING PROCESS                                  │  │
│  │                                                                     │  │
│  │  Strategy: "Performance"                                            │  │
│  │                                                                     │  │
│  │  Provider A Score = 0.98×0.4 + latency_score×0.3 + 0.92×0.1         │  │
│  │                   = 0.392 + 0.165 + 0.092 = 0.649                   │  │
│  │                                                                     │  │
│  │  Provider B Score = 0.97×0.4 + latency_score×0.3 + 0.88×0.1         │  │
│  │                   = 0.388 + 0.140 + 0.088 = 0.616                   │  │
│  │                                                                     │  │
│  │  Provider C Score = 0.95×0.4 + latency_score×0.3 + 0.85×0.1         │  │
│  │                   = 0.380 + 0.120 + 0.085 = 0.585                   │  │
│  │                   (Circuit Half-Open: Lower priority)               │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                             │                                             │
│                             ▼                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                   SELECTION RESULT                                  │  │
│  │                                                                     │  │
│  │              WINNER: Provider A (Score: 0.649)                      │  │
│  │              Fallback: Provider B (Score: 0.616)                    │  │
│  │              Fallback: Provider C (Score: 0.585)                    │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    CIRCUIT BREAKER CHECK                                   │
│               (backend/src/services/circuit_breaker.rs)                    │
│                                                                            │
│     should_allow_request(Provider A, "deepseek-chat", channel_id) ?        │
│                                                                            │
│     Circuit: CLOSED → Allow Request                                        │
│     Circuit: OPEN   → Try Fallback Provider B                              │
│     Circuit: HALF_OPEN → Allow (limited)                                   │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    EXECUTE REQUEST                                         │
│                                                                            │
│  Channel ID: 23                                                            │
│  Provider: Provider A                                                      │
│  Base URL: https://api.provider-a.com/v1                                   │
│  API Key: [encrypted]                                                      │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │  Forward Request to Provider                                         │  │
│  │  ├─► Add API key authentication                                      │  │
│  │  ├─► Transform request format                                        │  │
│  │  ├─► Handle streaming/non-streaming                                  │  │
│  │  └─► Track latency & tokens                                          │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────┬───────────────────────────────────────────────┘
                             │
                   ┌─────────┴─────────┐
                   │                   │
                ✅ SUCCESS        ❌ FAILURE
                   │                   │
                   ▼                   ▼
┌────────────────────────────┐  ┌────────────────────────────┐
│  Record Success Metrics    │  │  Record Failure Metrics    │
│  • Latency: 450ms          │  │  • Increment failure count │
│  • Tokens: 1500 + 300      │  │  • Update circuit state    │
│  • Update quality score    │  │  • Try fallback provider   │
│  • Circuit: record_success │  │  • Circuit: record_failure │
└────────────────────────────┘  └────────────────────────────┘
                   │                   │
                   └─────────┬─────────┘
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    METRICS UPDATE PIPELINE                                 │
│               (backend/src/services/provider_metrics.rs)                   │
│                                                                            │
│  Step 1: Memory Buffer (Immediate)                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ METRICS_BUFFER (In-Memory HashMap)                                   │  │
│  │ Key: (provider_id=5, model_id="deepseek-chat", channel_id=23)        │  │
│  │ Value: [ {latency: 450, success: true, tokens: ...}, ... ]           │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                             │                                              │
│  Step 2: Periodic Aggregation (Every 60s)                                  │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Aggregate metrics from buffer                                        │  │
│  │ • Calculate avg, p50, p95, p99 latency                               │  │
│  │ • Calculate success rate                                             │  │
│  │ • Sum token counts                                                   │  │
│  │ • Compute quality score                                              │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                             │                                              │
│  Step 3: Database Flush (Batch)                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ UPDATE provider_model_metrics SET                                    │  │
│  │   total_requests = total_requests + 1,                               │  │
│  │   avg_latency_ms = (avg_latency_ms * 0.9 + 450 * 0.1),               │  │
│  │   quality_score = calculate_provider_quality_score(...),             │  │
│  │   circuit_state = ...                                                │  │
│  │ WHERE provider_id=5 AND model_id='deepseek-chat' AND channel_id=23   │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    RETURN RESPONSE TO CLIENT                               │
│                                                                            │
│  HTTP 200 OK                                                               │
│  {                                                                         │
│    "id": "chatcmpl-...",                                                   │
│    "model": "deepseek-chat",                                               │
│    "choices": [...],                                                       │
│    "usage": { "prompt_tokens": 1500, "completion_tokens": 300 }            │
│  }                                                                         │
└────────────────────────────────────────────────────────────────────────────┘
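
The UPDATE statement in Step 3 of the diagram above smooths avg_latency_ms as an exponential moving average with a factor of 0.1 (old average times 0.9 plus new sample times 0.1). A minimal sketch of that arithmetic; the function name ema_latency and the alpha parameter are illustrative.

```rust
/// Exponential moving average: alpha is the weight given to the new sample.
/// With alpha = 0.1 this matches `avg_latency_ms * 0.9 + sample * 0.1`.
fn ema_latency(previous_avg_ms: f64, sample_ms: f64, alpha: f64) -> f64 {
    previous_avg_ms * (1.0 - alpha) + sample_ms * alpha
}
```

For example, a previous average of 500 ms and a new 450 ms sample give 500 × 0.9 + 450 × 0.1 = 495 ms, so a single slow or fast request moves the average by only a tenth of the difference.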

2. Circuit Breaker State Machine

                    ╔═══════════════════════════════╗
                    ║    CIRCUIT STATE MACHINE      ║
                    ╚═══════════════════════════════╝

┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│     ┌──────────────────────────────────────────────────────────┐     │
│     │                    CLOSED STATE                          │     │
│     │                 (Normal Operation)                       │     │
│     │  • All requests allowed                                  │     │
│     │  • failure_count = 0                                     │     │
│     │  • Tracking consecutive failures                         │     │
│     └──────────────┬────────────────────────────────────┬──────┘     │
│                    │                                    │            │
│        Success     │                         Failure    │            │
│        (reset      │                         (increment)│            │
│        counter)    │                                    │            │
│                    │                                    │            │
│                    │         ┌──────────────────────────┘            │
│                    │         │                                       │
│                    │         │ 5 consecutive failures                │
│                    │         │ (threshold reached)                   │
│                    │         ▼                                       │
│     ┌──────────────┴────────────────────────────────────────────┐    │
│     │                     OPEN STATE                            │    │
│     │                 (Blocking Requests)                       │    │
│     │  • All requests BLOCKED                                   │    │
│     │  • opened_at = current_timestamp                          │    │
│     │  • Return error / try fallback                            │    │
│     │  • Wait for recovery_timeout (60 seconds)                 │    │
│     └──────────────┬────────────────────────────────────────────┘    │
│                    │                                                 │
│                    │ Wait 60 seconds                                 │
│                    │ (recovery_timeout expired)                      │
│                    │                                                 │
│                    ▼                                                 │
│     ┌───────────────────────────────────────────────────────────┐    │
│     │                  HALF-OPEN STATE                          │    │
│     │                  (Testing Recovery)                       │    │
│     │  • Limited requests allowed (max 3)                       │    │
│     │  • half_open_requests = 0                                 │    │
│     │  • Testing if provider recovered                          │    │
│     └──────────────┬─────────────────────────────┬──────────────┘    │
│                    │                             │                   │
│        Success     │                  Failure    │                   │
│        (3 times)   │                  (any)      │                   │
│                    │                             │                   │
│                    ▼                             ▼                   │
│     ┌──────────────────────────┐   ┌───────────────────────────┐     │
│     │   Back to CLOSED         │   │   Back to OPEN            │     │
│     │   (Provider recovered)   │   │   (Still failing)         │     │
│     │   • Reset counters       │   │   • Reset timeout         │     │
│     │   • Full traffic resumes │   │   • Wait another 60s      │     │
│     └──────────────────────────┘   └───────────────────────────┘     │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Configuration:
• failure_threshold = 5          (failures to open circuit)
• success_threshold = 3          (successes to close circuit)
• recovery_timeout = 60 seconds  (wait before testing)
• half_open_max_requests = 3     (test request limit)
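
The state machine above can be sketched in Rust as follows. This is a minimal, single-threaded illustration, not the project's actual implementation: the names `CircuitBreaker`, `allow_request`, `record_success` and `record_failure`, and the use of caller-supplied timestamps in seconds, are assumptions. Only the four configuration values come from the list above.

```rust
/// Circuit states from the diagram above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open { opened_at: u64 },
    HalfOpen,
}

/// Hypothetical breaker; thresholds mirror the configuration list.
struct CircuitBreaker {
    state: State,
    failure_count: u32,
    success_count: u32,
    half_open_requests: u32,
    failure_threshold: u32,      // 5
    success_threshold: u32,      // 3
    recovery_timeout: u64,       // 60 seconds
    half_open_max_requests: u32, // 3
}

impl CircuitBreaker {
    fn new() -> Self {
        CircuitBreaker {
            state: State::Closed,
            failure_count: 0,
            success_count: 0,
            half_open_requests: 0,
            failure_threshold: 5,
            success_threshold: 3,
            recovery_timeout: 60,
            half_open_max_requests: 3,
        }
    }

    /// Decide whether a request may pass at time `now` (seconds).
    fn allow_request(&mut self, now: u64) -> bool {
        if let State::Open { opened_at } = self.state {
            if now.saturating_sub(opened_at) < self.recovery_timeout {
                return false; // still blocking
            }
            // recovery_timeout expired: start testing the provider
            self.state = State::HalfOpen;
            self.half_open_requests = 0;
            self.success_count = 0;
        }
        match self.state {
            State::Closed => true,
            State::HalfOpen if self.half_open_requests < self.half_open_max_requests => {
                self.half_open_requests += 1;
                true
            }
            _ => false,
        }
    }

    fn record_success(&mut self) {
        match self.state {
            State::Closed => self.failure_count = 0,
            State::HalfOpen => {
                self.success_count += 1;
                if self.success_count >= self.success_threshold {
                    self.state = State::Closed;
                    self.failure_count = 0;
                }
            }
            State::Open { .. } => {}
        }
    }

    fn record_failure(&mut self, now: u64) {
        match self.state {
            State::Closed => {
                self.failure_count += 1;
                if self.failure_count >= self.failure_threshold {
                    self.state = State::Open { opened_at: now };
                }
            }
            // any failure while testing reopens the circuit
            State::HalfOpen => self.state = State::Open { opened_at: now },
            State::Open { .. } => {}
        }
    }
}
```

Passing `now` in explicitly keeps the sketch deterministic. A production breaker would more likely read a monotonic clock and guard its state with a lock or atomics.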

3. Scoring Algorithm Comparison

╔════════════════════════════════════════════════════════════════════════╗
║                 ROUTING STRATEGY SCORING ALGORITHMS                    ║
╚════════════════════════════════════════════════════════════════════════╝

┌──────────────────────────────────────────────────────────────────────────┐
│                        PERFORMANCE STRATEGY                              │
│  Goal: Maximize speed and reliability                                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ score = success_rate × 0.4                                      │     │
│  │       + latency_score × 0.3                                     │     │
│  │       + quality_score × 0.1                                     │     │
│  │       + priority_bonus                                          │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Where:                                                                  │
│  • success_rate: 0.0 - 1.0 (higher is better)                            │
│  • latency_score = 1.0 - (latency_ms / 30000) (lower latency = higher)   │
│  • quality_score: historical quality metric (0.0 - 1.0)                  │
│  • priority_bonus: channel_priority / 100 (max 0.2)                      │
│                                                                          │
│  Example:                                                                │
│  Provider with 98% success, 450ms latency, quality 0.92, priority 10     │
│  score = 0.98×0.4 + (1-450/30000)×0.3 + 0.92×0.1 + 0.1                   │
│        = 0.392 + 0.296 + 0.092 + 0.1 = 0.880                             │
└──────────────────────────────────────────────────────────────────────────┘
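
The performance formula can be written as a small Rust function. This is a sketch under assumptions: the `Candidate` struct and its field names are illustrative, and clamping `latency_score` to the range 0.0 to 1.0 is an added guard for latencies above 30 seconds that the box does not specify.

```rust
/// Inputs for one candidate; field names are illustrative.
struct Candidate {
    success_rate: f64,     // 0.0 - 1.0
    avg_latency_ms: f64,
    quality_score: f64,    // 0.0 - 1.0
    channel_priority: f64,
}

/// Performance score as given in the formula box.
fn performance_score(c: &Candidate) -> f64 {
    // Clamping is an assumption; the source only gives 1.0 - latency_ms / 30000.
    let latency_score = (1.0 - c.avg_latency_ms / 30_000.0).clamp(0.0, 1.0);
    // The box caps the priority bonus at 0.2.
    let priority_bonus = (c.channel_priority / 100.0).min(0.2);
    c.success_rate * 0.4 + latency_score * 0.3 + c.quality_score * 0.1 + priority_bonus
}
```

With the example inputs (98% success, 450 ms, quality 0.92, priority 10) this evaluates to 0.8795, which the box rounds to 0.880.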

┌──────────────────────────────────────────────────────────────────────────┐
│                           COST STRATEGY                                  │
│  Goal: Minimize cost while maintaining quality                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ avg_price = (prompt_price + completion_price) / 2               │     │
│  │ price_score = 1.0 - (avg_price / 100.0)                         │     │
│  │ score = price_score × 0.6                                       │     │
│  │       + success_rate × 0.3                                      │     │
│  │       + quality_score × 0.1                                     │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Where:                                                                  │
│  • prompt_price: $ per million prompt tokens                             │
│  • completion_price: $ per million completion tokens                     │
│  • price_score: normalized inverse price (cheaper = higher)              │
│                                                                          │
│  Example:                                                                │
│  Provider with $2.50 prompt, $10.00 completion, 97% success, quality 0.9 │
│  avg_price = (2.50 + 10.00) / 2 = $6.25                                  │
│  price_score = 1.0 - (6.25 / 100) = 0.9375                               │
│  score = 0.9375×0.6 + 0.97×0.3 + 0.9×0.1 = 0.944                         │
└──────────────────────────────────────────────────────────────────────────┘
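
A corresponding sketch for the cost formula follows. The function signature is an assumption, and flooring `price_score` at 0.0 is an added guard for average prices above $100 per million tokens, which the box does not address.

```rust
/// Cost score as given in the formula box.
/// Prices are in dollars per million tokens.
fn cost_score(
    prompt_price: f64,
    completion_price: f64,
    success_rate: f64,
    quality_score: f64,
) -> f64 {
    let avg_price = (prompt_price + completion_price) / 2.0;
    // The floor at 0.0 is an assumption, not part of the documented formula.
    let price_score = (1.0 - avg_price / 100.0).max(0.0);
    price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1
}
```

For the example provider ($2.50 prompt, $10.00 completion, 97% success, quality 0.9) the result is 0.9435, shown as 0.944 in the box.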

┌──────────────────────────────────────────────────────────────────────────┐
│                         BALANCED STRATEGY                                │
│  Goal: Optimize all factors with configurable weights                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ perf_score = performance_score(candidate, config)               │     │
│  │ cost_score = cost_score(candidate, config)                      │     │
│  │                                                                 │     │
│  │ total_weight = latency_w + success_w + price_w + priority_w     │     │
│  │ perf_weight = (latency_w + success_w) / total_weight            │     │
│  │ cost_weight = price_w / total_weight                            │     │
│  │                                                                 │     │
│  │ score = perf_score × perf_weight + cost_score × cost_weight     │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Default weights (can be configured per model):                          │
│  • latency_weight: 0.3                                                   │
│  • success_rate_weight: 0.4                                              │
│  • price_weight: 0.2                                                     │
│  • provider_priority_weight: 0.1                                         │
│                                                                          │
│  Example:                                                                │
│  Using default weights:                                                  │
│  perf_weight = (0.3 + 0.4) / 1.0 = 0.7                                   │
│  cost_weight = 0.2 / 1.0 = 0.2                                           │
│  score = 0.880×0.7 + 0.944×0.2 = 0.616 + 0.189 = 0.805                   │
└──────────────────────────────────────────────────────────────────────────┘
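
The blending step can be sketched as below, taking the two strategy scores as precomputed inputs. The `Weights` struct and the zero-total guard are assumptions; the default values are the ones listed in the box.

```rust
/// Per-model weights; defaults are taken from the box above.
struct Weights {
    latency: f64,
    success_rate: f64,
    price: f64,
    provider_priority: f64,
}

impl Default for Weights {
    fn default() -> Self {
        Weights { latency: 0.3, success_rate: 0.4, price: 0.2, provider_priority: 0.1 }
    }
}

/// Blend precomputed performance and cost scores.
fn balanced_score(perf_score: f64, cost_score: f64, w: &Weights) -> f64 {
    let total = w.latency + w.success_rate + w.price + w.provider_priority;
    // Guarding against a zero total is an assumption, not from the source.
    if total <= 0.0 {
        return 0.0;
    }
    let perf_weight = (w.latency + w.success_rate) / total;
    let cost_weight = w.price / total;
    perf_score * perf_weight + cost_score * cost_weight
}
```

Note that the priority weight appears only in `total_weight`, so with the defaults `perf_weight + cost_weight` is 0.9 rather than 1.0. The example inputs 0.880 and 0.944 give 0.8048, shown as 0.805 in the box.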

┌──────────────────────────────────────────────────────────────────────────┐
│                       ROUND-ROBIN STRATEGY                               │
│  Goal: Equal distribution across all providers                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Formula:                                                                │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │ All candidates receive equal score = 1.0                        │     │
│  │ Selection: index = counter % provider_count                     │     │
│  │ counter = (counter + 1) % usize::MAX                            │     │
│  └─────────────────────────────────────────────────────────────────┘     │
│                                                                          │
│  Behavior:                                                               │
│  Request 1 → Provider A                                                  │
│  Request 2 → Provider B                                                  │
│  Request 3 → Provider C                                                  │
│  Request 4 → Provider A (cycle repeats)                                  │
│                                                                          │
│  Note: No performance consideration, purely sequential distribution      │
└──────────────────────────────────────────────────────────────────────────┘
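
A thread-safe variant of the round-robin counter might look like this. It is a sketch, not the project's code: the `RoundRobin` type is hypothetical, and it swaps the explicit `% usize::MAX` update from the box for `AtomicUsize::fetch_add`, which wraps around on overflow by itself.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin selector shared across request handlers.
struct RoundRobin {
    counter: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        RoundRobin { counter: AtomicUsize::new(0) }
    }

    /// Returns the index of the next provider, or None if there are none.
    fn next_index(&self, provider_count: usize) -> Option<usize> {
        if provider_count == 0 {
            return None;
        }
        // fetch_add returns the previous value and wraps on overflow.
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        Some(n % provider_count)
    }
}
```

With three providers, successive calls yield indices 0, 1, 2, 0, matching the A, B, C, A sequence in the box.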

4. Data Flow Timeline

Time  │ Component               │ Action
──────┼─────────────────────────┼────────────────────────────────────────────
0ms   │ Client                  │ POST /v1/chat/completions
      │                         │ { "model": "deepseek-chat", "messages": [...] }
──────┼─────────────────────────┼────────────────────────────────────────────
1ms   │ Relay Handler           │ Validate request, extract model_id
      │                         │ Check authentication & rate limits
──────┼─────────────────────────┼────────────────────────────────────────────
2ms   │ Channel Service         │ Call get_routed_channel("deepseek-chat", user_id)
      │                         │ Try provider model routing first
──────┼─────────────────────────┼────────────────────────────────────────────
3ms   │ Model Routing Service   │ Query provider_models table
      │                         │ SELECT * FROM provider_models WHERE model_id='deepseek-chat'
      │                         │ JOIN provider_model_metrics
      │                         │ Found 3 candidates
──────┼─────────────────────────┼────────────────────────────────────────────
4ms   │ Model Routing Service   │ Load user preferences (if exists)
      │                         │ SELECT * FROM user_routing_preferences WHERE user_id=...
──────┼─────────────────────────┼────────────────────────────────────────────
5ms   │ Model Routing Service   │ Load model routing config
      │                         │ SELECT * FROM model_routing_config WHERE model_id='deepseek-chat'
──────┼─────────────────────────┼────────────────────────────────────────────
6ms   │ Model Routing Service   │ Score 3 candidates using "balanced" strategy
      │                         │ Provider A: 0.880
      │                         │ Provider B: 0.805
      │                         │ Provider C: 0.750
──────┼─────────────────────────┼────────────────────────────────────────────
7ms   │ Model Routing Service   │ Apply user preferences filters
      │                         │ Boost preferred providers (+50%)
      │                         │ Remove blocked providers
──────┼─────────────────────────┼────────────────────────────────────────────
8ms   │ Model Routing Service   │ Weighted random selection from top 3
      │                         │ Selected: Provider A (channel_id=23)
      │                         │ Fallbacks: [Provider B, Provider C]
──────┼─────────────────────────┼────────────────────────────────────────────
9ms   │ Circuit Breaker         │ Check should_allow_request(Provider A, "deepseek-chat", 23)
      │                         │ Circuit state: CLOSED
      │                         │ Allow request
──────┼─────────────────────────┼────────────────────────────────────────────
10ms  │ Relay Handler           │ Get channel details from channels table
      │                         │ Channel 23: base_url, api_key
──────┼─────────────────────────┼────────────────────────────────────────────
11ms  │ Routing Decision Log    │ Async: INSERT INTO routing_decision_logs
      │                         │ (non-blocking, happens in background)
──────┼─────────────────────────┼────────────────────────────────────────────
12ms  │ Relay Handler           │ Transform request for provider API
      │                         │ Add Authorization: Bearer [api_key]
      │                         │ Adjust model name if needed
──────┼─────────────────────────┼────────────────────────────────────────────
15ms  │ HTTP Client             │ POST https://api.provider-a.com/v1/chat/completions
      │                         │ Start latency timer
──────┼─────────────────────────┼────────────────────────────────────────────
...   │ Provider A              │ Processing request...
──────┼─────────────────────────┼────────────────────────────────────────────
465ms │ HTTP Client             │ Response received from Provider A
      │                         │ Status: 200 OK
      │                         │ Latency: 450ms (15ms → 465ms)
──────┼─────────────────────────┼────────────────────────────────────────────
466ms │ Relay Handler           │ Parse response
      │                         │ Extract usage: prompt_tokens=1500, completion_tokens=300
──────┼─────────────────────────┼────────────────────────────────────────────
467ms │ Provider Metrics        │ record_request(provider_id=5, model="deepseek-chat", 
      │                         │   channel_id=23, latency=450, success=true,
      │                         │   prompt_tokens=1500, completion_tokens=300)
      │                         │ → Stored in memory buffer (non-blocking)
──────┼─────────────────────────┼────────────────────────────────────────────
468ms │ Circuit Breaker         │ record_success(provider_id=5, model="deepseek-chat", 
      │                         │   channel_id=23)
      │                         │ → success_count++, failure_count=0
──────┼─────────────────────────┼────────────────────────────────────────────
469ms │ Billing Service         │ post_consume_quota() (async, non-blocking)
      │                         │ Deduct quota from user balance
──────┼─────────────────────────┼────────────────────────────────────────────
470ms │ Relay Handler           │ Return response to client
      │                         │ HTTP 200 OK with completion
──────┼─────────────────────────┼────────────────────────────────────────────
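
The preference filtering at 7ms and the weighted selection at 8ms could be sketched as follows. Several details are assumptions: the `Scored` type, applying the +50% boost as a multiplication by 1.5, weighting the top three in proportion to their scores, and passing the random number in as `roll` so that the example needs no external crate.

```rust
/// A scored candidate; `provider_id` values are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    provider_id: u64,
    score: f64,
}

/// Drop blocked providers, boost preferred ones by 50%, sort best-first.
fn apply_preferences(mut c: Vec<Scored>, preferred: &[u64], blocked: &[u64]) -> Vec<Scored> {
    c.retain(|s| !blocked.contains(&s.provider_id));
    for s in c.iter_mut() {
        if preferred.contains(&s.provider_id) {
            s.score *= 1.5;
        }
    }
    c.sort_by(|a, b| b.score.total_cmp(&a.score));
    c
}

/// Pick from the top three in proportion to score.
/// `roll` is a caller-supplied random number in [0, 1).
fn weighted_pick(sorted: &[Scored], roll: f64) -> Option<&Scored> {
    let top = &sorted[..sorted.len().min(3)];
    let total: f64 = top.iter().map(|s| s.score).sum();
    if top.is_empty() || total <= 0.0 {
        return top.first();
    }
    let mut threshold = roll * total;
    for s in top {
        if threshold < s.score {
            return Some(s);
        }
        threshold -= s.score;
    }
    top.last() // guards against floating-point drift
}
```

The candidates that were not picked, kept in sorted order, would form the fallback list shown in the timeline.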

Background Tasks (run every 60 seconds):
──────┼─────────────────────────┼────────────────────────────────────────────
60s   │ Metrics Aggregator      │ Aggregate metrics from memory buffer
      │                         │ Calculate avg, p50, p95, p99 latency
      │                         │ Calculate success rate for last hour
──────┼─────────────────────────┼────────────────────────────────────────────
61s   │ Metrics Aggregator      │ Batch update to provider_model_metrics table
      │                         │ UPDATE provider_model_metrics SET ...
      │                         │ Recalculate quality scores
──────┼─────────────────────────┼────────────────────────────────────────────
62s   │ Circuit Breaker         │ recover_circuit_breakers()
      │                         │ Check if any OPEN circuits can move to HALF_OPEN
──────┼─────────────────────────┼────────────────────────────────────────────
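
The aggregation step at 60s could compute its latency statistics roughly as below. The nearest-rank percentile method and the names `percentile`, `summarize` and `LatencySummary` are assumptions; the source only states which statistics are calculated.

```rust
/// Nearest-rank percentile over an ascending-sorted slice.
fn percentile(sorted_ms: &[u64], p: f64) -> Option<u64> {
    if sorted_ms.is_empty() {
        return None;
    }
    let n = sorted_ms.len();
    let rank = (p * n as f64 / 100.0).ceil() as usize;
    Some(sorted_ms[rank.clamp(1, n) - 1])
}

/// Summary of one aggregation window.
#[derive(Debug, PartialEq)]
struct LatencySummary {
    avg_ms: f64,
    p50_ms: u64,
    p95_ms: u64,
    p99_ms: u64,
}

/// Sort the buffered samples and derive avg, p50, p95 and p99.
fn summarize(mut samples_ms: Vec<u64>) -> Option<LatencySummary> {
    if samples_ms.is_empty() {
        return None;
    }
    samples_ms.sort_unstable();
    let avg_ms = samples_ms.iter().sum::<u64>() as f64 / samples_ms.len() as f64;
    Some(LatencySummary {
        avg_ms,
        p50_ms: percentile(&samples_ms, 50.0)?,
        p95_ms: percentile(&samples_ms, 95.0)?,
        p99_ms: percentile(&samples_ms, 99.0)?,
    })
}

/// Fraction of successful requests; 0.0 when nothing was recorded.
fn success_rate(successes: u64, total: u64) -> f64 {
    if total == 0 { 0.0 } else { successes as f64 / total as f64 }
}
```

Sorting a copy of the buffer once per window keeps the request path free of this work, in line with the non-blocking `record_request` call in the timeline.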

5. Database Schema Relationships

┌─────────────────────────────────────────────────────────────────────────┐
│                     DATABASE SCHEMA RELATIONSHIPS                       │
└─────────────────────────────────────────────────────────────────────────┘

                              ┌──────────────┐
                              │    users     │
                              │ (providers)  │
                              ├──────────────┤
                              │ id (PK)      │◄────────┐
                              │ username     │         │
                              │ is_provider  │         │ provider_id (FK)
                              │ provider_    │         │
                              │   status     │         │
                              └──────┬───────┘         │
                                     │                 │
                  ┌──────────────────┼─────────────────┼──────────────┐
                  │                  │                 │              │
                  │ provider_id (FK) │                 │              │
                  ▼                  ▼                 │              │
       ┌──────────────────┐  ┌──────────────┐          │              │
       │  provider_       │  │  channels    │          │              │
       │    models        │  ├──────────────┤          │              │
       ├──────────────────┤  │ id (PK)      │◄────┐    │              │
       │ id (PK)          │  │ provider_id  │     │    │              │
       │ model_id         │  │   (FK)       │     │    │              │
       │ provider_id (FK) ├─►│ base_url     │     │    │              │
       │ channel_id (FK)  ├──┤ key          │     │    │              │
       │ model_name       │  │ status       │     │    │              │
       │ pricing_prompt   │  └──────────────┘     │    │              │
       │ pricing_         │                       │    │              │
       │   completion     │                       │    │              │
       │ context_length   │                       │    │              │
       │ status           │    channel_id (FK)    │    │              │
       │ quality_score    │         │             │    │              │
       └──────┬───────────┘         │             │    │              │
              │                     │             │    │              │
              │ (provider_id,       │             │    │              │
              │  model_id,          │             │    │              │
              │  channel_id)        │             │    │              │
              │                     │             │    │              │
              ▼                     │             │    │              │
       ┌──────────────────┐         │             │    │              │
       │  provider_model_ │         │             │    │              │
       │    metrics       │◄────────┘             │    │              │
       ├──────────────────┤                       │    │              │
       │ id (PK)          │                       │    │              │
       │ provider_id (FK) ├───────────────────────┘    │              │
       │ model_id         │                            │              │
       │ channel_id (FK)  ├────────────────────────────┘              │
       │ total_requests   │                                           │
       │ success_rate_    │                                           │
       │   last_hour      │                                           │
       │ avg_latency_ms   │                                           │
       │ quality_score    │                                           │
       │ circuit_state    │                                           │
       └──────────────────┘                                           │
                                                                      │
              ┌───────────────────────────────────────────────────────┘
              │
              │ user_id (FK)
              ▼
       ┌──────────────────┐
       │  user_routing_   │
       │    preferences   │
       ├──────────────────┤
       │ id (PK)          │
       │ user_id (FK)     │
       │ default_strategy │
       │ preferred_       │
       │   providers      │
       │ blocked_         │
       │   providers      │
       │ max_price        │
       │ min_success_rate │
       └──────────────────┘

       ┌──────────────────┐
       │  model_routing_  │
       │     config       │
       ├──────────────────┤
       │ id (PK)          │
       │ canonical_       │
       │   model_id       │
       │ latency_weight   │
       │ success_rate_    │
       │   weight         │
       │ price_weight     │
       │ default_strategy │
       └──────────────────┘

       ┌──────────────────┐
       │  routing_        │
       │    decision_logs │
       ├──────────────────┤
       │ id (PK)          │
       │ request_id       │
       │ user_id          │
       │ model_id         │
       │ selected_        │
       │   provider_id    │
       │ selected_        │
       │   channel_id     │
       │ routing_strategy │
       │ routing_reason   │
       │ candidates_json  │
       │ created_at       │
       └──────────────────┘

Legend:
PK = Primary Key
FK = Foreign Key
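
The composite key `(provider_id, model_id, channel_id)` that links `provider_models` to `provider_model_metrics` can be mirrored in Rust as shown below. This is an in-memory illustration only: the struct names, the field types and the `lookup` helper are assumptions, and only the column names come from the diagram.

```rust
use std::collections::HashMap;

/// Composite key shared by provider_models and provider_model_metrics.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct RouteKey {
    provider_id: i64,
    model_id: String,
    channel_id: i64,
}

/// Subset of provider_model_metrics columns; types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct ProviderModelMetrics {
    total_requests: u64,
    success_rate_last_hour: f64,
    avg_latency_ms: f64,
    quality_score: f64,
    circuit_state: String,
}

/// In-memory stand-in for a keyed lookup against the metrics table.
fn lookup<'a>(
    table: &'a HashMap<RouteKey, ProviderModelMetrics>,
    provider_id: i64,
    model_id: &str,
    channel_id: i64,
) -> Option<&'a ProviderModelMetrics> {
    table.get(&RouteKey { provider_id, model_id: model_id.to_string(), channel_id })
}
```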

6. Advantages Visualization

╔══════════════════════════════════════════════════════════════════════════╗
║               KEY ADVANTAGES OF PROVIDER ROUTING SYSTEM                  ║
╚══════════════════════════════════════════════════════════════════════════╝

┌────────────────────────────────────────────────────────────────────────┐
│ 1. INTELLIGENT SELECTION                                               │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Traditional:                    Provider Routing:                    │
│   ┌──────────┐                   ┌──────────┐                          │
│   │ Request  │                   │ Request  │                          │
│   └────┬─────┘                   └────┬─────┘                          │
│        │                              │                                │
│        │ Fixed                        │ Intelligent                    │
│        │ Config                       │ Selection                      │
│        ▼                              ▼                                │
│   ┌──────────┐                   ┌──────────┐                          │
│   │ Channel  │                   │ Best     │ ← Based on:              │
│   │ (static) │                   │ Provider │   • Performance          │
│   └──────────┘                   └──────────┘   • Cost                 │
│                                                  • User prefs          │
│                                                  • Real-time metrics   │
│   Result: Fixed,                 Result: Dynamic,                      │
│           no optimization                metrics-driven                │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│ 2. HIGH AVAILABILITY                                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Without Routing:               With Provider Routing:                │
│   ┌──────────┐                   ┌──────────┐                          │
│   │ Provider │                   │ Provider │                          │
│   │    A     │ ← Request         │    A     │ ← Request                │
│   └────┬─────┘                   └────┬─────┘                          │
│        │                              │                                │
│        │ FAILS                        │ FAILS                          │
│        ▼                              ▼                                │
│   ┌──────────┐                   ┌──────────┐                          │
│   │  ERROR   │                   │ Circuit  │                          │
│   │ RETURNED │                   │ Breaker  │                          │
│   └──────────┘                   │  Opens   │                          │
│                                  └────┬─────┘                          │
│   User sees error                     │ Auto                           │
│                                       │ Fallback                       │
│                                       ▼                                │
│                                  ┌──────────┐                          │
│                                  │ Provider │                          │
│                                  │    B     │ ← Retry                  │
│                                  └────┬─────┘                          │
│                                       │                                │
│                                       │ SUCCESS                        │
│                                       ▼                                │
│                                  ┌──────────┐                          │
│                                  │ Response │                          │
│                                  │ Returned │                          │
│                                  └──────────┘                          │
│                                                                        │
│   Uptime: ~99.5%                 Uptime: ~99.99%                       │
└────────────────────────────────────────────────────────────────────────┘
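
The automatic fallback in the right-hand column can be sketched as a loop over the selected provider and its fallbacks. The function name and the `send` closure, which stands in for the real upstream HTTP call, are hypothetical. Circuit breaker bookkeeping is left out to keep the example short.

```rust
/// Try the selected provider first, then each fallback in order.
/// `send` is a placeholder for the real upstream call.
fn send_with_fallback<F>(providers: &[&str], mut send: F) -> Result<(String, String), String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut last_err = String::from("no providers available");
    for &p in providers {
        match send(p) {
            // Return the provider that answered along with its response body.
            Ok(body) => return Ok((p.to_string(), body)),
            Err(e) => last_err = format!("{p}: {e}"),
        }
    }
    Err(last_err)
}
```

The caller only sees an error when every provider in the list has failed. That is the behaviour the diagram contrasts with the single-provider case.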

┌────────────────────────────────────────────────────────────────────────┐
│ 3. COST OPTIMIZATION                                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Fixed Provider:                Provider Routing (Cost Strategy):     │
│                                                                        │
│   Provider A: $10/M              Provider A: $10/M → Score: 0.70       │
│   (only option)                  Provider B: $5/M  → Score: 0.85       │
│                                  Provider C: $12/M → Score: 0.65       │
│   1M tokens = $10                                                      │
│   10M tokens = $100              Provider B selected (cheapest)        │
│   100M tokens = $1,000           1M tokens = $5                        │
│                                  10M tokens = $50                      │
│                                  100M tokens = $500                    │
│                                                                        │
│   At 100M tokens/month: $1,000   At 100M tokens/month: $500            │
│                                  SAVINGS: 50%                          │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│ 4. PERFORMANCE TRACKING                                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   No Metrics:                    Provider Routing Metrics:             │
│   • Unknown performance          • Real-time latency (avg, p50-p99)    │
│   • No visibility                • Success rate (hourly, daily)        │
│   • Can't optimize               • Quality score (0.0-1.0)             │
│   • Blind to issues              • Circuit breaker state               │
│                                  • Token throughput                    │
│                                  • Historical trends                   │
│                                                                        │
│   Dashboard:                     Dashboard:                            │
│   ┌──────────────┐               ┌──────────────────────────────────┐  │
│   │              │               │ Provider A: 450ms avg, 98%       │  │
│   │   No Data    │               │ Provider B: 650ms avg, 97%       │  │
│   │              │               │ Provider C: 900ms avg, 92%       │  │
│   │              │               │                                  │  │
│   └──────────────┘               │ Trending: Provider A improving   │  │
│                                  │ Alert: Provider C degraded       │  │
│                                  └──────────────────────────────────┘  │
│                                                                        │
│   Result: Reactive               Result: Proactive                     │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│ 5. USER EMPOWERMENT                                                    │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   Fixed Config:                  User Preferences:                     │
│   • No control                   • Choose strategy (perf/cost/balanced)│
│   • One size fits all            • Set preferred providers             │
│   • Can't avoid bad providers    • Block problematic providers         │
│   • No budget control            • Set price limits ($X per M tokens)  │
│                                  • Set quality thresholds              │
│                                  • Per-request overrides               │
│                                                                        │
│   User A (needs speed):          User A preferences:                   │
│   ┌────────────────┐             ┌──────────────────────────────────┐  │
│   │ Gets random    │             │ strategy: "performance"          │  │
│   │ slow provider  │             │ min_success_rate: 0.99           │  │
│   │ Frustrated     │             │ max_latency_ms: 1000             │  │
│   └────────────────┘             └──────────────────────────────────┘  │
│                                  → Gets fastest, most reliable         │
│   User B (budget-conscious):     User B preferences:                   │
│   ┌────────────────┐             ┌──────────────────────────────────┐  │
│   │ Pays high      │             │ strategy: "cost"                 │  │
│   │ prices         │             │ max_price: 7.0                   │  │
│   │ Expensive      │             │ min_success_rate: 0.95           │  │
│   └────────────────┘             └──────────────────────────────────┘  │
│                                  → Gets cheapest within budget         │
└────────────────────────────────────────────────────────────────────────┘
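The preference keys shown for User A and User B map onto a filter-then-rank step. The following is a minimal sketch, not the repository's implementation: the `Preferences`, `Provider`, `Strategy`, and `select_provider` names are hypothetical, `max_price` is assumed to be in dollars per million tokens, and the "balanced" strategy is left out because the diagram does not say how it weighs cost against speed.

```rust
// Hypothetical types for illustration; field names mirror the preference
// keys in the diagram, but none of this is taken from the repository.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    Performance,
    Cost,
}

#[derive(Debug, Clone)]
struct Preferences {
    strategy: Strategy,
    min_success_rate: f64,
    max_latency_ms: Option<u64>,
    max_price: Option<f64>, // assumed unit: dollars per million tokens
    blocked: Vec<String>,   // provider names the user has blocked
}

#[derive(Debug, Clone)]
struct Provider {
    name: String,
    success_rate: f64,
    latency_ms: u64,
    price: f64,
}

/// Drop providers that violate any user limit, then pick the best
/// remaining one according to the chosen strategy.
fn select_provider<'a>(providers: &'a [Provider], prefs: &Preferences) -> Option<&'a Provider> {
    let eligible = providers.iter().filter(|p| {
        !prefs.blocked.contains(&p.name)
            && p.success_rate >= prefs.min_success_rate
            && prefs.max_latency_ms.map_or(true, |max| p.latency_ms <= max)
            && prefs.max_price.map_or(true, |max| p.price <= max)
    });
    match prefs.strategy {
        // "performance": lowest latency among eligible providers
        Strategy::Performance => eligible.min_by_key(|p| p.latency_ms),
        // "cost": lowest price among eligible providers
        Strategy::Cost => eligible.min_by(|a, b| a.price.total_cmp(&b.price)),
    }
}
```

Filtering before ranking means a hard limit such as `max_price: 7.0` can never be traded away for a better score. If no provider passes the filters, the function returns `None` and the caller decides whether to relax the limits or fail the request.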

7. Comparison: Before vs After

╔══════════════════════════════════════════════════════════════════════════╗
║          BEFORE PROVIDER ROUTING   vs   AFTER PROVIDER ROUTING           ║
╚══════════════════════════════════════════════════════════════════════════╝

┌───────────────────────────────────────────────────────────────────────────┐
│ METRIC                │ BEFORE              │ AFTER                       │
├───────────────────────┼─────────────────────┼─────────────────────────────┤
│ Provider Selection    │ Manual/Random       │ Intelligent (metrics-based) │
│ Optimization          │ None                │ Real-time, multi-dimensional│
│ Availability          │ ~99.5%              │ ~99.99%                     │
│ Cost Optimization     │ No                  │ Yes (up to 50% savings)     │
│ Failover Time         │ Manual (minutes)    │ Automatic (milliseconds)    │
│ Performance Tracking  │ None                │ Comprehensive               │
│ User Control          │ None                │ Full (preferences)          │
│ Provider Diversity    │ Limited             │ Multiple per model          │
│ Quality Assurance     │ Manual              │ Automated (circuit breaker) │
│ Audit Trail           │ None                │ Complete logging            │
│ Admin Visibility      │ None                │ Full dashboard              │
│ Scalability           │ Limited             │ Highly scalable             │
└───────────────────────────────────────────────────────────────────────────┘
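The table does not say how the availability figures were obtained. One way to see why redundancy moves the number this far is the short calculation below, which assumes providers fail independently (an assumption, not something the source states); `combined_availability` is a hypothetical helper.

```rust
/// Probability that at least one of `n` providers is up, assuming each is
/// available with probability `single` and failures are independent.
fn combined_availability(single: f64, n: u32) -> f64 {
    // All n providers must be down at once for the model to be unavailable.
    1.0 - (1.0 - single).powi(n as i32)
}
```

Under that assumption, two providers at 99.5% each give 1 - 0.005² = 99.9975%, in line with the ~99.99% row. Correlated outages (for example, a shared upstream) would lower the real figure.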

BEFORE: Simple but Inflexible
┌──────────────────────────────────────────────────────────────────────┐
│  Request → Channel (fixed) → Provider → Response                     │
│                                                                      │
│  Problems:                                                           │
│  • Single point of failure                                           │
│  • No optimization                                                   │
│  • No visibility                                                     │
│  • Manual intervention needed                                        │
└──────────────────────────────────────────────────────────────────────┘

AFTER: Intelligent & Resilient
┌──────────────────────────────────────────────────────────────────────┐
│  Request → Routing Engine → Best Provider (scored) → Response        │
│             ↓                      ↓                                 │
│          Metrics              Circuit Breaker                        │
│          User Prefs           Fallback Chain                         │
│          Config               Quality Tracking                       │
│                                                                      │
│  Benefits:                                                           │
│  • Automatic failover                                                │
│  • Cost & performance optimization                                   │
│  • Complete visibility                                               │
│  • Self-healing system                                               │
└──────────────────────────────────────────────────────────────────────┘
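The "AFTER" flow combines scoring, a circuit breaker, and a fallback chain. A minimal sketch of how those three pieces can fit together is shown here; the `Candidate` type, the `route` function, and the `call` closure are hypothetical stand-ins, and the circuit breaker is reduced to a precomputed `circuit_open` flag.

```rust
// Hypothetical types for illustration; not taken from the repository.
#[derive(Debug, Clone)]
struct Candidate {
    name: String,
    score: f64,         // higher is better
    circuit_open: bool, // true = breaker tripped, skip this provider
}

/// Try providers from best score to worst, skipping any whose circuit is
/// open, and return the first successful response with the provider name.
fn route<F>(mut candidates: Vec<Candidate>, mut call: F) -> Result<(String, String), String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    // Best provider first; the rest of the list is the fallback chain.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut last_error = String::from("no provider available");
    for c in candidates.iter().filter(|c| !c.circuit_open) {
        match call(c.name.as_str()) {
            Ok(response) => return Ok((c.name.clone(), response)),
            // Remember the failure and fall through to the next provider.
            Err(e) => last_error = format!("{}: {}", c.name, e),
        }
    }
    Err(last_error)
}
```

Because a failed call simply moves on to the next candidate, failover needs no manual step. A fuller version would also record each outcome so the breaker state and the scores stay current.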
