The Provider Model Routing System is a multi-layered routing infrastructure that lets multiple providers offer the same AI models and automatically selects a provider for each request based on real-time metrics, user preferences, and system health.
┌─────────────────────────────────────────────────────────────────┐
│ USER REQUEST (Model ID) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ RELAY HANDLER │
│ • Request validation │
│ • Channel selection with routing │
│ • Circuit breaker checking │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CHANNEL SERVICE │
│ get_routed_channel() │
│ ├── Try Provider Model Routing (STEP 1) │
│ └── Fallback to Legacy Channel Routing (STEP 2) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ MODEL ROUTING SERVICE │
│ │
│ route_request() │
│ ├─► 1. get_model_providers() [Provider Discovery] │
│ ├─► 2. load_user_preferences() [User Prefs Loading] │
│ ├─► 3. load_routing_config() [Model Config Loading] │
│ ├─► 4. score_providers() [Intelligent Scoring] │
│ └─► 5. select_provider() [Final Selection] │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ROUTING DECISION │
│ • Selected Provider + Channel │
│ • Fallback Providers (ordered) │
│ • Routing Reason & Score │
│ • Strategy Used │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CIRCUIT BREAKER CHECK │
│ should_allow_request() │
│ • Closed → Allow │
│ • Open → Try Fallback │
│ • Half-Open → Limited Allow │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ EXECUTE REQUEST ON SELECTED CHANNEL │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ METRICS RECORDING │
│ record_request() │
│ • Latency tracking │
│ • Success/failure counting │
│ • Token usage │
│ • Quality score calculation │
│ • Circuit breaker state update │
└──────────────────────────────────────────────────────────────────┘
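The two-step lookup in the Channel Service box (provider model routing first, legacy channel routing second) can be sketched as follows. This is a minimal illustration, not the actual implementation: `RoutedChannel`, `get_routed_channel_sketch`, and the closure-based lookups are hypothetical stand-ins for the real service calls and database queries.

```rust
/// Which routing path produced the channel (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum RoutedChannel {
    ProviderModel { channel_id: i64 },
    Legacy { channel_id: i64 },
}

/// STEP 1: try provider model routing. STEP 2: fall back to legacy routing.
/// Both lookups are passed in as closures so the sketch stays self-contained.
fn get_routed_channel_sketch(
    provider_routing: impl Fn(&str) -> Option<i64>,
    legacy_routing: impl Fn(&str) -> Option<i64>,
    model_id: &str,
) -> Option<RoutedChannel> {
    provider_routing(model_id)
        .map(|channel_id| RoutedChannel::ProviderModel { channel_id })
        .or_else(|| {
            // Only consulted when provider model routing found nothing.
            legacy_routing(model_id).map(|channel_id| RoutedChannel::Legacy { channel_id })
        })
}
```

Returning the path taken alongside the channel makes it easy to log whether a request was served by the new routing layer or by the legacy fallback.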
Provider-submitted model definitions
├── Model Info: model_id, model_name, description
├── Provider: provider_id, provider_name, channel_id
├── Pricing: pricing_prompt, pricing_completion, pricing_image
├── Specs: context_length, modality, supported_parameters
└── Status: status (0=pending, 1=approved, 2=rejected)
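The numeric status codes above map naturally onto an enum. The sketch below assumes the column is read as an `i16`; the type name `ProviderModelStatus` and the helper `is_routable` are illustrative names, not taken from the codebase.

```rust
use std::convert::TryFrom;

/// Approval state of a provider-submitted model (0=pending, 1=approved, 2=rejected).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderModelStatus {
    Pending,
    Approved,
    Rejected,
}

impl TryFrom<i16> for ProviderModelStatus {
    type Error = String;

    fn try_from(raw: i16) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Self::Pending),
            1 => Ok(Self::Approved),
            2 => Ok(Self::Rejected),
            other => Err(format!("unknown provider model status: {other}")),
        }
    }
}

impl ProviderModelStatus {
    /// Only approved models take part in routing (the discovery query filters on status = 1).
    fn is_routable(self) -> bool {
        self == Self::Approved
    }
}
```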
Live performance tracking per provider-model-channel
├── Cumulative: total_requests, successful_requests, failed_requests
├── Latency: avg/p50/p95/p99/min/max_latency_ms
├── Time Windows: last_hour, last_24h metrics
├── Circuit Breaker: circuit_state, consecutive_failures/successes
├── Quality: quality_score (0.0-1.0)
└── Token Throughput: total tokens, avg_tokens_per_second
Admin-configurable routing rules per model
├── Weights: latency_weight, success_rate_weight, price_weight
├── Strategy: default_strategy (performance/cost/balanced/round_robin)
├── Fallback: enable_auto_fallback, max_fallback_attempts
└── Circuit: failure_threshold, recovery_timeout_seconds
Per-user routing customization
├── Strategy: default_strategy
├── Providers: preferred_providers[], blocked_providers[]
├── Limits: max_price_per_million_tokens, min_success_rate, max_latency_ms
└── Requirements: require_streaming, require_function_calling
Complete history of routing decisions
├── Decision: selected_provider_id, routing_strategy, routing_reason
├── Candidates: candidates_count, candidates_json
├── Fallback: fallback_providers[], is_fallback_request
└── Performance: routing_duration_us
User/API Request
│
├─► Model ID: "deepseek-chat"
├─► User ID: 12345
└─► Optional: RoutingPreferences { strategy: "performance" }
SELECT provider_id, channel_id, provider_name, pricing, metrics, quality_score
FROM provider_models pm
LEFT JOIN provider_model_metrics pmm ON (...)
LEFT JOIN channels c ON pm.channel_id = c.id
WHERE pm.model_id = 'deepseek-chat' AND pm.status = 1
ORDER BY quality_score DESC
Output: List of ProviderCandidate structs
ProviderCandidate {
provider_id: 5,
channel_id: 23,
provider_name: "Provider A",
price_per_million_prompt: 2.50,
price_per_million_completion: 10.00,
success_rate: 0.98,
avg_latency_ms: 450,
quality_score: 0.92,
circuit_state: Closed,
}
Model Config:
RoutingConfig {
canonical_model_id: "deepseek-chat",
latency_weight: 0.3,
success_rate_weight: 0.4,
price_weight: 0.2,
provider_priority_weight: 0.1,
default_strategy: "balanced",
}
User Preferences (merged with request prefs):
RoutingPreferences {
strategy: Performance,
prefer_providers: [5, 8],
avoid_providers: [3],
max_price: Some(15.0),
min_success_rate: Some(0.95),
}

score = success_rate * 0.4 + latency_score * 0.3 + quality_score * 0.1 + priority_bonus

price_score = 1.0 - (avg_price / 100.0)
score = price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1

perf_score = performance_score(candidate)
cost_score = cost_score(candidate)
score = perf_score * perf_weight + cost_score * cost_weight

- Filter by preferences:
  - Remove avoided providers
  - Check max_price threshold
  - Check min_success_rate
  - Check max_latency_ms
- Boost preferred providers:
  - Apply 50% score boost to preferred providers
- Weighted random selection:
  - Sort by score (descending)
  - Take top 3 candidates
  - Weighted random selection (prevents provider starvation)
- Prepare fallback chain:
  - Remaining candidates become fallback providers (up to 3)
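The four selection steps above can be sketched in one function. This is an illustration under stated assumptions: `Scored`, `Prefs`, and `select_provider` are hypothetical names, the 50% boost is modeled as multiplying the score by 1.5, and the random draw is passed in as `roll` (a value in [0, 1)) so the sketch stays deterministic and free of external crates.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    provider_id: i64,
    score: f64,
    avg_price: f64,
    success_rate: f64,
    avg_latency_ms: f64,
}

#[derive(Default)]
struct Prefs {
    prefer_providers: Vec<i64>,
    avoid_providers: Vec<i64>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
    max_latency_ms: Option<f64>,
}

/// Returns the selected candidate plus the ordered fallback chain (at most 3).
fn select_provider(
    mut candidates: Vec<Scored>,
    prefs: &Prefs,
    roll: f64,
) -> Option<(Scored, Vec<Scored>)> {
    // 1. Filter by preferences.
    candidates.retain(|c| {
        !prefs.avoid_providers.contains(&c.provider_id)
            && prefs.max_price.map_or(true, |p| c.avg_price <= p)
            && prefs.min_success_rate.map_or(true, |s| c.success_rate >= s)
            && prefs.max_latency_ms.map_or(true, |l| c.avg_latency_ms <= l)
    });
    // 2. Boost preferred providers by 50%.
    for c in candidates.iter_mut() {
        if prefs.prefer_providers.contains(&c.provider_id) {
            c.score *= 1.5;
        }
    }
    // 3. Sort descending and draw from the top 3, weighted by score.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let top = candidates.len().min(3);
    if top == 0 {
        return None;
    }
    let total: f64 = candidates[..top].iter().map(|c| c.score).sum();
    let mut threshold = roll * total;
    let mut picked = top - 1; // guards against floating-point drift
    for (i, c) in candidates[..top].iter().enumerate() {
        if threshold < c.score {
            picked = i;
            break;
        }
        threshold -= c.score;
    }
    // 4. Remaining candidates, still in score order, form the fallback chain.
    let selected = candidates.remove(picked);
    candidates.truncate(3);
    Some((selected, candidates))
}
```

Because the draw is weighted rather than always taking the top score, the second and third candidates still receive a share of traffic proportional to their scores.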
┌──────────────────────────────────────────────┐
│ Circuit State Machine │
├──────────────────────────────────────────────┤
│ │
│ CLOSED ──────────────► OPEN │
│ ▲ (5 failures) │ │
│ │ │ │
│ │ (60s timeout) │
│ │ │ │
│ │ ▼ │
│ └───── HALF-OPEN ◄──────── │
│ (3 successes) │
│ │
└──────────────────────────────────────────────┘
States:
• CLOSED: Normal operation (all requests pass)
• OPEN: Block all requests, try fallbacks
• HALF-OPEN: Allow limited test requests
Circuit Breaker Decision:
- If primary provider circuit is OPEN → Try fallback providers
- If all circuits OPEN → Fallback to legacy routing
- If circuit is CLOSED or HALF-OPEN → Proceed
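The decision rules above amount to walking the provider chain (primary first, then fallbacks in order) and taking the first provider whose circuit is not OPEN. A minimal sketch, with `Dispatch` and `choose_dispatch` as hypothetical names:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

#[derive(Debug, PartialEq, Eq)]
enum Dispatch {
    /// Send the request to this provider.
    Provider(i64),
    /// Every circuit in the chain is OPEN: hand over to legacy routing.
    LegacyRouting,
}

/// `chain` holds the primary provider followed by its ordered fallbacks,
/// each paired with the current circuit state.
fn choose_dispatch(chain: &[(i64, CircuitState)]) -> Dispatch {
    chain
        .iter()
        .find(|(_, state)| *state != CircuitState::Open)
        .map(|(provider_id, _)| Dispatch::Provider(*provider_id))
        .unwrap_or(Dispatch::LegacyRouting)
}
```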
Request sent to selected channel:
Channel {
id: 23,
provider_id: 5,
base_url: "https://api.provider-a.com/v1",
key: "encrypted_key",
status: 1 (active)
}
After request completion:
ProviderMetricsService::record_request(
provider_id: 5,
model_id: "deepseek-chat",
channel_id: 23,
latency_ms: 450,
success: true,
prompt_tokens: 1500,
completion_tokens: 300,
)
Metrics Update Process:
- Record in memory buffer (fast)
- Periodic aggregation (every 60 seconds)
- Database update via update_provider_metrics() SQL function
- Quality score recalculation
- Circuit breaker state evaluation
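The first two steps of this process (buffering in memory, then aggregating periodically) might look like the sketch below. It is an assumption-laden illustration: `MetricsBuffer`, `Sample`, and `Aggregate` are hypothetical types, the percentile uses the nearest-rank method, and the real service presumably wraps the buffer in a lock and flushes it from a background task.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
struct Sample {
    latency_ms: u64,
    success: bool,
    tokens: u64,
}

/// (provider_id, model_id, channel_id)
type MetricsKey = (i64, String, i64);

#[derive(Debug, PartialEq)]
struct Aggregate {
    requests: usize,
    success_rate: f64,
    avg_latency_ms: f64,
    p95_latency_ms: u64,
    total_tokens: u64,
}

#[derive(Default)]
struct MetricsBuffer {
    samples: HashMap<MetricsKey, Vec<Sample>>,
}

impl MetricsBuffer {
    /// Cheap per-request path: append one sample to the in-memory buffer.
    fn record(&mut self, key: MetricsKey, sample: Sample) {
        self.samples.entry(key).or_default().push(sample);
    }

    /// Periodic path: drain the buffer and compute one aggregate per key.
    fn flush(&mut self) -> HashMap<MetricsKey, Aggregate> {
        self.samples
            .drain()
            .map(|(key, samples)| {
                let n = samples.len(); // always >= 1: keys are created by record()
                let mut latencies: Vec<u64> = samples.iter().map(|s| s.latency_ms).collect();
                latencies.sort_unstable();
                // Nearest-rank percentile: smallest value covering 95% of samples.
                let rank = ((0.95 * n as f64).ceil() as usize).clamp(1, n);
                let successes = samples.iter().filter(|s| s.success).count();
                let agg = Aggregate {
                    requests: n,
                    success_rate: successes as f64 / n as f64,
                    avg_latency_ms: latencies.iter().sum::<u64>() as f64 / n as f64,
                    p95_latency_ms: latencies[rank - 1],
                    total_tokens: samples.iter().map(|s| s.tokens).sum(),
                };
                (key, agg)
            })
            .collect()
    }
}
```

The aggregates returned by `flush` are what a batch database update would consume; the buffer is empty afterwards, so each sample is counted once.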
Goal: Maximize speed and reliability
Scoring Formula:
score = success_rate × 0.4 + latency_score × 0.3 + quality_score × 0.1 + priority_bonus
Best for:
- Real-time applications
- Latency-sensitive workloads
- Production critical paths
Example:
Provider A: 98% success, 450ms → Score: 0.89
Provider B: 95% success, 800ms → Score: 0.78
Winner: Provider A
Goal: Minimize costs
Scoring Formula:
price_score = 1.0 - (avg_price / 100.0)
score = price_score × 0.6 + success_rate × 0.3 + quality_score × 0.1
Best for:
- Batch processing
- Development/testing
- Cost-conscious applications
Example:
Provider A: $5/M → Score: 0.92
Provider B: $12/M → Score: 0.78
Winner: Provider A (cheaper)
Goal: Optimize all factors
Scoring Formula:
Combined = performance_score × perf_weight + cost_score × cost_weight
Best for:
- General purpose applications
- Mixed workloads
- Most production scenarios
Goal: Equal distribution
Behavior:
- All providers get equal score
- Rotate through providers sequentially
- No performance consideration
Best for:
- Load distribution testing
- Provider evaluation
- Ensuring provider diversity
Real-time metrics-based routing
- Automatically routes to best-performing providers
- Adapts to changing provider performance
- No manual intervention required
Multi-dimensional scoring
- Considers latency, success rate, cost, and quality
- Configurable weights per model
- Strategy-based optimization
Circuit breaker pattern
Failed Provider → Circuit Opens → Automatic Fallback
↓
Health Recovery → Circuit Half-Opens → Test Requests
↓
Success → Circuit Closes → Full Traffic Restoration
Automatic fallback chains
- Up to 3 fallback providers per request
- Ordered by score
- Seamless failover on provider failure
No single point of failure
- Multiple providers for same model
- Instant failover without retries
- Graceful degradation
Price-aware routing
- Cost strategy prioritizes cheaper providers
- Price thresholds per user
- Balance cost vs performance
Provider competition
- Multiple providers compete on price
- Market-driven pricing
- Automatic selection of best value
Comprehensive metrics
Latency: avg, p50, p95, p99, min, max
Success Rate: overall, last_hour, last_24h
Quality Score: calculated from success + latency + experience
Token Throughput: tokens/second tracking
Historical data
- All-time cumulative metrics
- Time-windowed metrics (hourly, daily)
- Trend analysis capability
Customizable preferences
UserPreferences {
strategy: "performance", // Choose optimization goal
prefer_providers: [1, 5], // Favorite providers
avoid_providers: [3], // Blacklist problematic ones
max_price: 15.0, // Budget control
min_success_rate: 0.95, // Quality threshold
max_latency_ms: 5000, // Latency requirement
}
Per-request overrides
- Can override preferences per API call
- Flexible for different use cases
- Maintains user defaults
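One way to implement per-request overrides is to model every preference as optional and let the request value win whenever it is set. The sketch below assumes this field-by-field merge; the struct layout and `merge_preferences` are illustrative, not the project's actual types.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
struct RoutingPreferences {
    strategy: Option<String>,
    prefer_providers: Option<Vec<i64>>,
    avoid_providers: Option<Vec<i64>>,
    max_price: Option<f64>,
    min_success_rate: Option<f64>,
    max_latency_ms: Option<u64>,
}

/// Request-level values take precedence; anything left unset falls back to
/// the stored user default. The stored defaults are never modified.
fn merge_preferences(
    defaults: &RoutingPreferences,
    request: &RoutingPreferences,
) -> RoutingPreferences {
    RoutingPreferences {
        strategy: request.strategy.clone().or_else(|| defaults.strategy.clone()),
        prefer_providers: request
            .prefer_providers
            .clone()
            .or_else(|| defaults.prefer_providers.clone()),
        avoid_providers: request
            .avoid_providers
            .clone()
            .or_else(|| defaults.avoid_providers.clone()),
        max_price: request.max_price.or(defaults.max_price),
        min_success_rate: request.min_success_rate.or(defaults.min_success_rate),
        max_latency_ms: request.max_latency_ms.or(defaults.max_latency_ms),
    }
}
```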
Fair provider exposure
- Weighted random selection prevents dominance
- Quality providers get more traffic
- New providers can compete
Transparent performance
- Real metrics visible to admin
- Quality score based on actual performance
- Accountability for providers
Complete audit trail
routing_decision_logs:
- Every routing decision logged
- Full candidate list with scores
- Debugging and analytics
- 7-day retention (configurable)
Admin control
• Manual circuit breaker control
• Per-model routing configuration
• Provider approval workflow
• Analytics dashboard
Efficient data structures
- In-memory metrics buffering
- Periodic batch updates to database
- Minimal per-request overhead
Distributed-ready
- Stateless routing decisions
- Database-backed state
- Redis-compatible circuit breakers
Simple API integration
// Automatic routing - just pass model ID
let channel = ChannelService::get_routed_channel(
&pool, "default", "deepseek-chat", user_id, None
).await?;
Simulation endpoint
POST /api/routing/simulate
{
"model_id": "deepseek-chat",
"preferences": { "strategy": "cost" }
}
Rich analytics
• Provider selection rates
• Strategy distribution
• Model usage patterns
• Cost analysis
• Performance trends
- Average: < 10ms
- P99: < 50ms
- Includes: DB queries + scoring + selection
- Memory buffer: ~1μs per record
- DB flush: Every 60s (async, non-blocking)
- Impact on request: Negligible (recording is asynchronous)
- Provider lookup: Single JOIN query with indexes
- Config loading: Cached or single query
- Metrics aggregation: Periodic batch operation
- Provider models completely separate from legacy channels
- No mixing of provider_models and abilities tables
- Clear separation of routing logic

- Providers can only manage their own models
- Admin approval required for model visibility
- User-level routing preferences isolated

- Encrypted channel keys
- Provider-owned API keys
- Rotation support via provider_api_keys table
- ML-based routing
  - Predict provider performance
  - Learn from user patterns
  - Adaptive weight tuning
- Geographic routing
  - Provider location awareness
  - Latency-based geo selection
  - Regional failover
- Advanced analytics
  - Provider comparison dashboards
  - Cost forecasting
  - Performance predictions
- Enhanced fallback strategies
  - Intelligent retry with backoff
  - Cross-model fallbacks
  - Dynamic strategy switching
INSERT INTO model_routing_config VALUES (
'deepseek-chat',
0.35, -- latency_weight (high)
0.45, -- success_rate_weight (high)
0.10, -- price_weight (low)
0.10, -- provider_priority_weight
'performance'
);

INSERT INTO model_routing_config VALUES (
'deepseek-chat-v3.1',
0.15, -- latency_weight (low)
0.35, -- success_rate_weight (medium)
0.40, -- price_weight (high)
0.10, -- provider_priority_weight
'cost'
);

UserRoutingPreferences {
default_strategy: "cost",
max_price_per_million_tokens: 10.0, // Max $10/M
min_success_rate: 0.90, // Must maintain 90%+
preferred_providers: [1, 5, 8], // Try these first
}

┌────────────────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATION │
│ (Web/Mobile/API Consumer) │
└────────────────────────────┬───────────────────────────────────────────────┘
│ HTTP Request
│ POST /v1/chat/completions
│ { "model": "deepseek-chat", "messages": [...] }
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ ACTIX-WEB HTTP SERVER │
│ (backend/src/routes/) │
└────────────────────────────┬───────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ RELAY HANDLER LAYER │
│ (backend/src/relay/handlers.rs) │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ 1. Authentication & Authorization │ │
│ │ 2. Rate Limiting & Quotas │ │
│ │ 3. Model Validation │ │
│ │ 4. select_channel_with_routing() ──────────────────────┐ │ │
│ └────────────────────────────────────────────────────────│─────────────┘ │
└───────────────────────────────────────────────────────────│────────────────┘
│
┌──────────────────────────────┘
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ ROUTING DECISION ENGINE │
│ (backend/src/services/) │
│ │
│ ┌───────────────────────────┐ ┌──────────────────────────────┐ │
│ │ ModelRoutingService │ │ ChannelService │ │
│ │ • route_request() │◄─┤ • get_routed_channel() │ │
│ │ • score_providers() │ │ • Provider model routing │ │
│ │ • select_provider() │ │ • Legacy channel routing │ │
│ └───────────┬───────────────┘ └──────────────────────────────┘ │
│ │ │
│ ├──► load_user_preferences() │
│ ├──► load_routing_config() │
│ └──► get_model_providers() ──┐ │
└───────────────────────────────────────────│────────────────────────────────┘
│
┌──────────────┘
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ DATABASE LAYER (PostgreSQL) │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌─────────────────────────┐ │
│ │ provider_models │ │ provider_model_ │ │ model_routing_config │ │
│ │ • Model catalog │ │ metrics │ │ • Routing weights │ │
│ │ • Pricing info │ │ • Performance │ │ • Default strategies │ │
│ │ • Provider link │ │ • Circuit state │ │ • Fallback config │ │
│ └──────────────────┘ └──────────────────┘ └─────────────────────────┘ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌─────────────────────────┐ │
│ │ user_routing_ │ │ routing_decision_│ │ channels │ │
│ │ preferences │ │ logs │ │ • Channel configs │ │
│ │ • User settings │ │ • Audit trail │ │ • API keys │ │
│ └──────────────────┘ └──────────────────┘ └─────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ SCORING & SELECTION ENGINE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CANDIDATE PROVIDERS (Example) │ │
│ │ │ │
│ │ A: deepseek-chat │ B: deepseek-chat │ C: deepseek-chat │ │
│ │ • Price: $2.50/M │ • Price: $3.00/M │ • Price: $2.00/M │ │
│ │ • Latency: 450ms │ • Latency: 600ms │ • Latency: 800ms │ │
│ │ • Success: 98% │ • Success: 97% │ • Success: 95% │ │
│ │ • Quality: 0.92 │ • Quality: 0.88 │ • Quality: 0.85 │ │
│ │ • Circuit: Closed │ • Circuit: Closed │ • Circuit: Half-Open │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SCORING PROCESS │ │
│ │ │ │
│ │ Strategy: "Performance" │ │
│ │ │ │
│ │ Provider A Score = 0.98×0.4 + (1-450/30000)×0.3 + 0.92×0.1 │ │
│ │ = 0.392 + 0.296 + 0.092 = 0.780 │ │
│ │ │ │
│ │ Provider B Score = 0.97×0.4 + (1-600/30000)×0.3 + 0.88×0.1 │ │
│ │ = 0.388 + 0.294 + 0.088 = 0.770 │ │
│ │ │ │
│ │ Provider C Score = 0.95×0.4 + (1-800/30000)×0.3 + 0.85×0.1 │ │
│ │ = 0.380 + 0.292 + 0.085 = 0.757 │ │
│ │ (Circuit Half-Open: Lower priority) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SELECTION RESULT │ │
│ │ │ │
│ │ WINNER: Provider A (Score: 0.780) │ │
│ │ Fallback: Provider B (Score: 0.770) │ │
│ │ Fallback: Provider C (Score: 0.757) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ CIRCUIT BREAKER CHECK │
│ (backend/src/services/circuit_breaker.rs) │
│ │
│ should_allow_request(Provider A, "deepseek-chat", channel_id) ? │
│ │
│ Circuit: CLOSED → Allow Request │
│ Circuit: OPEN → Try Fallback Provider B │
│ Circuit: HALF_OPEN → Allow (limited) │
└────────────────────────────┬───────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ EXECUTE REQUEST │
│ │
│ Channel ID: 23 │
│ Provider: Provider A │
│ Base URL: https://api.provider-a.com/v1 │
│ API Key: [encrypted] │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Forward Request to Provider │ │
│ │ ├─► Add API key authentication │ │
│ │ ├─► Transform request format │ │
│ │ ├─► Handle streaming/non-streaming │ │
│ │ └─► Track latency & tokens │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└────────────────────────────┬───────────────────────────────────────────────┘
│
┌─────────┴─────────┐
│ │
✅ SUCCESS ❌ FAILURE
│ │
▼ ▼
┌────────────────────────────┐ ┌────────────────────────────┐
│ Record Success Metrics │ │ Record Failure Metrics │
│ • Latency: 450ms │ │ • Increment failure count │
│ • Tokens: 1500 + 300 │ │ • Update circuit state │
│ • Update quality score │ │ • Try fallback provider │
│ • Circuit: record_success │ │ • Circuit: record_failure │
└────────────────────────────┘ └────────────────────────────┘
│ │
└─────────┬─────────┘
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ METRICS UPDATE PIPELINE │
│ (backend/src/services/provider_metrics.rs) │
│ │
│ Step 1: Memory Buffer (Immediate) │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ METRICS_BUFFER (In-Memory HashMap) │ │
│ │ Key: (provider_id=5, model_id="deepseek-chat", channel_id=23) │ │
│ │ Value: [ {latency: 450, success: true, tokens: ...}, ... ] │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ Step 2: Periodic Aggregation (Every 60s) │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Aggregate metrics from buffer │ │
│ │ • Calculate avg, p50, p95, p99 latency │ │
│ │ • Calculate success rate │ │
│ │ • Sum token counts │ │
│ │ • Compute quality score │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ Step 3: Database Flush (Batch) │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ UPDATE provider_model_metrics SET │ │
│ │ total_requests = total_requests + 1, │ │
│ │ avg_latency_ms = (avg_latency_ms * 0.9 + 450 * 0.1), │ │
│ │ quality_score = calculate_provider_quality_score(...), │ │
│ │ circuit_state = ... │ │
│ │ WHERE provider_id=5 AND model_id='deepseek-chat' AND channel_id=23 │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ RETURN RESPONSE TO CLIENT │
│ │
│ HTTP 200 OK │
│ { │
│ "id": "chatcmpl-...", │
│ "model": "deepseek-chat", │
│ "choices": [...], │
│ "usage": { "prompt_tokens": 1500, "completion_tokens": 300 } │
│ } │
└────────────────────────────────────────────────────────────────────────────┘
╔═══════════════════════════════╗
║ CIRCUIT STATE MACHINE ║
╚═══════════════════════════════╝
┌──────────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ CLOSED STATE │ │
│ │ (Normal Operation) │ │
│ │ • All requests allowed │ │
│ │ • failure_count = 0 │ │
│ │ • Tracking consecutive failures │ │
│ └──────────────┬────────────────────────────────────┬──────┘ │
│ │ │ │
│ Success │ Failure │ │
│ (reset │ (increment)│ │
│ counter) │ │ │
│ │ │ │
│ │ ┌──────────────────────────┘ │
│ │ │ │
│ │ │ 5 consecutive failures │
│ │ │ (threshold reached) │
│ │ ▼ │
│ ┌──────────────┴────────────────────────────────────────────┐ │
│ │ OPEN STATE │ │
│ │ (Blocking Requests) │ │
│ │ • All requests BLOCKED │ │
│ │ • opened_at = current_timestamp │ │
│ │ • Return error / try fallback │ │
│ │ • Wait for recovery_timeout (60 seconds) │ │
│ └──────────────┬────────────────────────────────────────────┘ │
│ │ │
│ │ Wait 60 seconds │
│ │ (recovery_timeout expired) │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ HALF-OPEN STATE │ │
│ │ (Testing Recovery) │ │
│ │ • Limited requests allowed (max 3) │ │
│ │ • half_open_requests = 0 │ │
│ │ • Testing if provider recovered │ │
│ └──────────────┬─────────────────────────────┬──────────────┘ │
│ │ │ │
│ Success │ Failure │ │
│ (3 times) │ (any) │ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────┐ ┌───────────────────────────┐ │
│ │ Back to CLOSED │ │ Back to OPEN │ │
│ │ (Provider recovered) │ │ (Still failing) │ │
│ │ • Reset counters │ │ • Reset timeout │ │
│ │ • Full traffic resume │ │ • Wait another 60s │ │
│ └──────────────────────────┘ └───────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
Configuration:
• failure_threshold = 5 (failures to open circuit)
• success_threshold = 3 (successes to close circuit)
• recovery_timeout = 60 seconds (wait before testing)
• half_open_max_requests = 3 (test request limit)
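The state machine and configuration above can be sketched as a small struct. This is a simplified, single-threaded illustration using the documented defaults; the field names and the choice to pass `now` explicitly (which keeps the timeout testable) are assumptions, and the real `circuit_breaker.rs` may differ.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State { Closed, Open, HalfOpen }

struct CircuitBreaker {
    state: State,
    consecutive_failures: u32,
    consecutive_successes: u32,
    half_open_requests: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    success_threshold: u32,
    recovery_timeout: Duration,
    half_open_max_requests: u32,
}

impl CircuitBreaker {
    fn new() -> Self {
        Self {
            state: State::Closed,
            consecutive_failures: 0,
            consecutive_successes: 0,
            half_open_requests: 0,
            opened_at: None,
            failure_threshold: 5,
            success_threshold: 3,
            recovery_timeout: Duration::from_secs(60),
            half_open_max_requests: 3,
        }
    }

    fn should_allow_request(&mut self, now: Instant) -> bool {
        // OPEN -> HALF-OPEN once the recovery timeout has expired.
        if self.state == State::Open {
            let timeout = self.recovery_timeout;
            if self.opened_at.map_or(false, |t| now.duration_since(t) >= timeout) {
                self.state = State::HalfOpen;
                self.half_open_requests = 0;
                self.consecutive_successes = 0;
            }
        }
        match self.state {
            State::Closed => true,
            State::Open => false,
            State::HalfOpen => {
                // Admit only a limited number of test requests.
                let allowed = self.half_open_requests < self.half_open_max_requests;
                if allowed {
                    self.half_open_requests += 1;
                }
                allowed
            }
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        if self.state == State::HalfOpen {
            self.consecutive_successes += 1;
            if self.consecutive_successes >= self.success_threshold {
                self.state = State::Closed; // provider recovered
                self.opened_at = None;
            }
        }
    }

    fn record_failure(&mut self, now: Instant) {
        self.consecutive_successes = 0;
        match self.state {
            State::HalfOpen => self.open(now), // any failure reopens the circuit
            State::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.open(now);
                }
            }
            State::Open => {}
        }
    }

    fn open(&mut self, now: Instant) {
        self.state = State::Open;
        self.opened_at = Some(now); // restarts the recovery timeout
        self.consecutive_failures = 0;
    }
}
```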
╔════════════════════════════════════════════════════════════════════════╗
║ ROUTING STRATEGY SCORING ALGORITHMS ║
╚════════════════════════════════════════════════════════════════════════╝
┌──────────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE STRATEGY │
│ Goal: Maximize speed and reliability │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Formula: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ score = success_rate × 0.4 │ │
│ │ + latency_score × 0.3 │ │
│ │ + quality_score × 0.1 │ │
│ │ + priority_bonus │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Where: │
│ • success_rate: 0.0 - 1.0 (higher is better) │
│ • latency_score = 1.0 - (latency_ms / 30000) (lower latency = higher) │
│ • quality_score: historical quality metric (0.0 - 1.0) │
│ • priority_bonus: channel_priority / 100 (max 0.2) │
│ │
│ Example: │
│ Provider with 98% success, 450ms latency, quality 0.92, priority 10 │
│ score = 0.98×0.4 + (1-450/30000)×0.3 + 0.92×0.1 + 0.1 │
│ = 0.392 + 0.296 + 0.092 + 0.1 = 0.880 │
└──────────────────────────────────────────────────────────────────────────┘
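The performance formula above translates directly into a function. The sketch below follows the stated definitions; clamping `latency_score` to the range 0.0 to 1.0 (so latencies above 30 seconds do not go negative) is an assumption not stated in the source, and the function signature is illustrative.

```rust
/// Performance strategy score, following the formula in the box above.
/// `channel_priority` is the raw channel priority (e.g. 10).
fn performance_score(
    success_rate: f64,
    latency_ms: f64,
    quality_score: f64,
    channel_priority: f64,
) -> f64 {
    // Lower latency gives a higher score; 30 000 ms maps to 0.
    // The clamp is an assumption for out-of-range latencies.
    let latency_score = (1.0 - latency_ms / 30_000.0).clamp(0.0, 1.0);
    // Priority bonus is capped at 0.2.
    let priority_bonus = (channel_priority / 100.0).clamp(0.0, 0.2);
    success_rate * 0.4 + latency_score * 0.3 + quality_score * 0.1 + priority_bonus
}
```

Calling `performance_score(0.98, 450.0, 0.92, 10.0)` reproduces the worked example: 0.392 + 0.2955 + 0.092 + 0.1 = 0.8795, which rounds to 0.880.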
┌──────────────────────────────────────────────────────────────────────────┐
│ COST STRATEGY │
│ Goal: Minimize cost while maintaining quality │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Formula: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ avg_price = (prompt_price + completion_price) / 2 │ │
│ │ price_score = 1.0 - (avg_price / 100.0) │ │
│ │ score = price_score × 0.6 │ │
│ │ + success_rate × 0.3 │ │
│ │ + quality_score × 0.1 │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Where: │
│ • prompt_price: $ per million prompt tokens │
│ • completion_price: $ per million completion tokens │
│ • price_score: normalized inverse price (cheaper = higher) │
│ │
│ Example: │
│ Provider with $2.50 prompt, $10.00 completion, 97% success, quality 0.9 │
│ avg_price = (2.50 + 10.00) / 2 = $6.25 │
│ price_score = 1.0 - (6.25 / 100) = 0.9375 │
│ score = 0.9375×0.6 + 0.97×0.3 + 0.9×0.1 = 0.944 │
└──────────────────────────────────────────────────────────────────────────┘
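The cost formula can be sketched the same way. Prices are in dollars per million tokens, as in the box above; clamping `price_score` at zero for average prices above $100/M is an assumption, and the signature is illustrative.

```rust
/// Cost strategy score, following the formula in the box above.
fn cost_score(
    prompt_price: f64,
    completion_price: f64,
    success_rate: f64,
    quality_score: f64,
) -> f64 {
    let avg_price = (prompt_price + completion_price) / 2.0;
    // Cheaper providers score higher; the lower bound of 0 is an assumption
    // for prices above $100 per million tokens.
    let price_score = (1.0 - avg_price / 100.0).max(0.0);
    price_score * 0.6 + success_rate * 0.3 + quality_score * 0.1
}
```

With the example inputs, `cost_score(2.50, 10.00, 0.97, 0.9)` gives 0.5625 + 0.291 + 0.09 = 0.9435, matching the 0.944 shown above after rounding.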
┌──────────────────────────────────────────────────────────────────────────┐
│ BALANCED STRATEGY │
│ Goal: Optimize all factors with configurable weights │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Formula: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ perf_score = performance_score(candidate, config) │ │
│ │ cost_score = cost_score(candidate, config) │ │
│ │ │ │
│ │ total_weight = latency_w + success_w + price_w + priority_w │ │
│ │ perf_weight = (latency_w + success_w) / total_weight │ │
│ │ cost_weight = price_w / total_weight │ │
│ │ │ │
│ │ score = perf_score × perf_weight + cost_score × cost_weight │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Default weights (can be configured per model): │
│ • latency_weight: 0.3 │
│ • success_rate_weight: 0.4 │
│ • price_weight: 0.2 │
│ • provider_priority_weight: 0.1 │
│ │
│ Example: │
│ Using default weights: │
│ perf_weight = (0.3 + 0.4) / 1.0 = 0.7 │
│ cost_weight = 0.2 / 1.0 = 0.2 │
│ score = 0.880×0.7 + 0.944×0.2 = 0.616 + 0.189 = 0.805 │
└──────────────────────────────────────────────────────────────────────────┘
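The balanced formula combines the two component scores using normalized weights. A minimal sketch, assuming the component scores are computed elsewhere and passed in; `RoutingWeights`, `balanced_score`, and the zero-weight guard are illustrative additions.

```rust
/// Per-model weights, with the documented defaults.
struct RoutingWeights {
    latency: f64,
    success_rate: f64,
    price: f64,
    provider_priority: f64,
}

impl Default for RoutingWeights {
    fn default() -> Self {
        Self { latency: 0.3, success_rate: 0.4, price: 0.2, provider_priority: 0.1 }
    }
}

/// Balanced strategy: blend performance and cost scores by normalized weight.
fn balanced_score(perf_score: f64, cost_score: f64, w: &RoutingWeights) -> f64 {
    let total_weight = w.latency + w.success_rate + w.price + w.provider_priority;
    if total_weight <= 0.0 {
        return 0.0; // assumption: treat an all-zero config as "no score"
    }
    let perf_weight = (w.latency + w.success_rate) / total_weight;
    let cost_weight = w.price / total_weight;
    perf_score * perf_weight + cost_score * cost_weight
}
```

With the default weights, `balanced_score(0.880, 0.944, &RoutingWeights::default())` yields 0.616 + 0.1888 = 0.8048. Note that the priority weight enters `total_weight` but neither term, so under the formula as written the balanced score tops out at 0.9 for the default configuration.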
┌──────────────────────────────────────────────────────────────────────────┐
│ ROUND-ROBIN STRATEGY │
│ Goal: Equal distribution across all providers │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Formula: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ All candidates receive equal score = 1.0 │ │
│ │ Selection: index = counter % provider_count │ │
│ │ counter = (counter + 1) % usize::MAX │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Behavior: │
│ Request 1 → Provider A │
│ Request 2 → Provider B │
│ Request 3 → Provider C │
│ Request 4 → Provider A (cycle repeats) │
│ │
│ Note: No performance consideration, purely sequential distribution │
└──────────────────────────────────────────────────────────────────────────┘
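Round-robin selection needs only a shared counter. The sketch below uses an atomic counter so it can be shared across threads; `fetch_add` wraps on overflow, which plays the role of the `% usize::MAX` step in the formula above. The `RoundRobin` type is a hypothetical name for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Shared rotation counter for the round-robin strategy.
struct RoundRobin {
    counter: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { counter: AtomicUsize::new(0) }
    }

    /// Returns the next provider in rotation, or `None` for an empty list.
    fn next<'a, T>(&self, providers: &'a [T]) -> Option<&'a T> {
        if providers.is_empty() {
            return None;
        }
        // fetch_add returns the previous value and wraps around on overflow.
        let ticket = self.counter.fetch_add(1, Ordering::Relaxed);
        providers.get(ticket % providers.len())
    }
}
```

With providers `["A", "B", "C"]`, four successive calls return A, B, C, and then A again, as in the behavior listed above.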
Time │ Component │ Action
──────┼─────────────────────────┼────────────────────────────────────────────
0ms │ Client │ POST /v1/chat/completions
│ │ { "model": "deepseek-chat", "messages": [...] }
──────┼─────────────────────────┼────────────────────────────────────────────
1ms │ Relay Handler │ Validate request, extract model_id
│ │ Check authentication & rate limits
──────┼─────────────────────────┼────────────────────────────────────────────
2ms │ Channel Service │ Call get_routed_channel("deepseek-chat", user_id)
│ │ Try provider model routing first
──────┼─────────────────────────┼────────────────────────────────────────────
3ms │ Model Routing Service │ Query provider_models table
│ │ SELECT * FROM provider_models WHERE model_id='deepseek-chat'
│ │ JOIN provider_model_metrics
│ │ Found 3 candidates
──────┼─────────────────────────┼────────────────────────────────────────────
4ms │ Model Routing Service │ Load user preferences (if exists)
│ │ SELECT * FROM user_routing_preferences WHERE user_id=...
──────┼─────────────────────────┼────────────────────────────────────────────
5ms │ Model Routing Service │ Load model routing config
│ │ SELECT * FROM model_routing_config WHERE model_id='deepseek-chat'
──────┼─────────────────────────┼────────────────────────────────────────────
6ms │ Model Routing Service │ Score 3 candidates using "balanced" strategy
│ │ Provider A: 0.880
│ │ Provider B: 0.805
│ │ Provider C: 0.750
──────┼─────────────────────────┼────────────────────────────────────────────
7ms │ Model Routing Service │ Apply user preferences filters
│ │ Boost preferred providers (+50%)
│ │ Remove blocked providers
──────┼─────────────────────────┼────────────────────────────────────────────
8ms │ Model Routing Service │ Weighted random selection from top 3
│ │ Selected: Provider A (channel_id=23)
│ │ Fallbacks: [Provider B, Provider C]
──────┼─────────────────────────┼────────────────────────────────────────────
9ms │ Circuit Breaker │ Check should_allow_request(Provider A, "deepseek-chat", 23)
│ │ Circuit state: CLOSED
│ │ Allow request
──────┼─────────────────────────┼────────────────────────────────────────────
10ms │ Relay Handler │ Get channel details from channels table
│ │ Channel 23: base_url, api_key
──────┼─────────────────────────┼────────────────────────────────────────────
11ms │ Routing Decision Log │ Async: INSERT INTO routing_decision_logs
│ │ (non-blocking, happens in background)
──────┼─────────────────────────┼────────────────────────────────────────────
12ms │ Relay Handler │ Transform request for provider API
│ │ Add Authorization: Bearer [api_key]
│ │ Adjust model name if needed
──────┼─────────────────────────┼────────────────────────────────────────────
15ms │ HTTP Client │ POST https://api.provider-a.com/v1/chat/completions
│ │ Start latency timer
──────┼─────────────────────────┼────────────────────────────────────────────
... │ Provider A │ Processing request...
──────┼─────────────────────────┼────────────────────────────────────────────
465ms │ HTTP Client │ Response received from Provider A
│ │ Status: 200 OK
│ │ Latency: 450ms (15ms → 465ms)
──────┼─────────────────────────┼────────────────────────────────────────────
466ms │ Relay Handler │ Parse response
│ │ Extract usage: prompt_tokens=1500, completion_tokens=300
──────┼─────────────────────────┼────────────────────────────────────────────
467ms │ Provider Metrics │ record_request(provider_id=5, model="deepseek-chat",
│ │ channel_id=23, latency=450, success=true,
│ │ prompt_tokens=1500, completion_tokens=300)
│ │ → Stored in memory buffer (non-blocking)
──────┼─────────────────────────┼────────────────────────────────────────────
468ms │ Circuit Breaker │ record_success(provider_id=5, model="deepseek-chat",
│ │ channel_id=23)
│ │ → success_count++, failure_count=0
──────┼─────────────────────────┼────────────────────────────────────────────
469ms │ Billing Service │ post_consume_quota() (async, non-blocking)
│ │ Deduct quota from user balance
──────┼─────────────────────────┼────────────────────────────────────────────
470ms │ Relay Handler │ Return response to client
│ │ HTTP 200 OK with completion
──────┼─────────────────────────┼────────────────────────────────────────────
Background Tasks (run every 60 seconds):
──────┼─────────────────────────┼────────────────────────────────────────────
60s │ Metrics Aggregator │ Aggregate metrics from memory buffer
│ │ Calculate avg, p50, p95, p99 latency
│ │ Calculate success rate for last hour
──────┼─────────────────────────┼────────────────────────────────────────────
61s │ Metrics Aggregator │ Batch update to provider_model_metrics table
│ │ UPDATE provider_model_metrics SET ...
│ │ Recalculate quality scores
──────┼─────────────────────────┼────────────────────────────────────────────
62s │ Circuit Breaker │ recover_circuit_breakers()
│ │ Check if any OPEN circuits can move to HALF_OPEN
──────┼─────────────────────────┼────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────────────────┐
│ DATABASE SCHEMA RELATIONSHIPS │
└─────────────────────────────────────────────────────────────────────────┘
┌──────────────┐
│ users │
│ (providers) │
├──────────────┤
│ id (PK) │◄────────┐
│ username │ │
│ is_provider │ │ provider_id (FK)
│ provider_ │ │
│ status │ │
└──────┬───────┘ │
│ │
┌──────────────────┼─────────────────┼──────────────┐
│ │ │ │
│ provider_id (FK) │ │ │
▼ ▼ │ │
┌──────────────────┐ ┌──────────────┐ │ │
│ provider_ │ │ channels │ │ │
│ models │ ├──────────────┤ │ │
├──────────────────┤ │ id (PK) │◄────┐ │ │
│ id (PK) │ │ provider_id │ │ │ │
│ model_id │ │ (FK) │ │ │ │
│ provider_id (FK) ├─►│ base_url │ │ │ │
│ channel_id (FK) ├──┤ key │ │ │ │
│ model_name │ │ status │ │ │ │
│ pricing_prompt │ └──────────────┘ │ │ │
│ pricing_ │ │ │ │
│ completion │ │ │ │
│ context_length │ │ │ │
│ status │ channel_id (FK) │ │ │
│ quality_score │ │ │ │ │
└──────┬───────────┘ │ │ │ │
│ │ │ │ │
│ (provider_id, │ │ │ │
│ model_id, │ │ │ │
│ channel_id) │ │ │ │
│ │ │ │ │
▼ │ │ │ │
┌──────────────────┐ │ │ │ │
│ provider_model_ │ │ │ │ │
│ metrics │◄────────┘ │ │ │
├──────────────────┤ │ │ │
│ id (PK) │ │ │ │
│ provider_id (FK) ├───────────────────────┘ │ │
│ model_id │ │ │
│ channel_id (FK) ├────────────────────────────┘ │
│ total_requests │ │
│ success_rate_ │ │
│ last_hour │ │
│ avg_latency_ms │ │
│ quality_score │ │
│ circuit_state │ │
└──────────────────┘ │
│
┌───────────────────────────────────────────────────────┘
│
│ user_id (FK)
▼
┌──────────────────┐
│ user_routing_ │
│ preferences │
├──────────────────┤
│ id (PK) │
│ user_id (FK) │
│ default_strategy │
│ preferred_ │
│ providers │
│ blocked_ │
│ providers │
│ max_price │
│ min_success_rate │
└──────────────────┘
┌──────────────────┐
│ model_routing_ │
│ config │
├──────────────────┤
│ id (PK) │
│ canonical_ │
│ model_id │
│ latency_weight │
│ success_rate_ │
│ weight │
│ price_weight │
│ default_strategy │
└──────────────────┘
┌──────────────────┐
│ routing_ │
│ decision_logs │
├──────────────────┤
│ id (PK) │
│ request_id │
│ user_id │
│ model_id │
│ selected_ │
│ provider_id │
│ selected_ │
│ channel_id │
│ routing_strategy │
│ routing_reason │
│ candidates_json │
│ created_at │
└──────────────────┘
Legend:
PK = Primary Key
FK = Foreign Key
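The link between provider_models and provider_model_metrics runs through the composite key (provider_id, model_id, channel_id). The sketch below mirrors a subset of the columns from the diagram as Python dataclasses and joins the two tables on that key. The field types, the `join_metrics` helper, and the sample values in the usage note are assumptions made for illustration; the diagram does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderModel:
    id: int
    model_id: str          # type assumed
    provider_id: int
    channel_id: int
    pricing_prompt: float  # type and unit assumed
    pricing_completion: float
    status: str


@dataclass(frozen=True)
class ProviderModelMetrics:
    provider_id: int
    model_id: str
    channel_id: int
    total_requests: int
    success_rate_last_hour: float
    avg_latency_ms: float
    quality_score: float
    circuit_state: str


def join_metrics(models, metrics):
    """Pair each provider_models row with its metrics row, or None."""
    by_key = {(m.provider_id, m.model_id, m.channel_id): m for m in metrics}
    return [
        (pm, by_key.get((pm.provider_id, pm.model_id, pm.channel_id)))
        for pm in models
    ]
```

A provider model with no traffic yet has no metrics row, so the join returns None for it and the caller has to decide how to score an unmeasured provider.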
╔══════════════════════════════════════════════════════════════════════════╗
║ KEY ADVANTAGES OF PROVIDER ROUTING SYSTEM ║
╚══════════════════════════════════════════════════════════════════════════╝
┌────────────────────────────────────────────────────────────────────────┐
│ 1. INTELLIGENT SELECTION │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Traditional: Provider Routing: │
│ ┌──────────┐ ┌──────────┐ │
│ │ Request │ │ Request │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ │ Fixed │ Intelligent │
│ │ Config │ Selection │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Channel │ │ Best │ ← Based on: │
│ │ (static) │ │ Provider │ • Performance │
│ └──────────┘ └──────────┘ • Cost │
│ • User prefs │
│ • Real-time metrics │
│ Result: Fixed, Result: Dynamic, │
│ no optimization always optimized │
└────────────────────────────────────────────────────────────────────────┘
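One way to turn the selection criteria above into a single score is a weighted sum over normalized metrics, using the three weights that model_routing_config stores (latency_weight, success_rate_weight, price_weight). The sketch below is an assumption about how such scoring could work, not the system's actual formula: the `Candidate` type, the min-max normalization, and the weight values in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    avg_latency_ms: float
    success_rate: float  # 0.0 to 1.0
    price_per_m: float   # price per million tokens


def _inverse_norm(value, lo, hi):
    """Map the lowest value to 1.0 and the highest to 0.0 (lower is better)."""
    return 1.0 if hi == lo else (hi - value) / (hi - lo)


def score_providers(cands, latency_weight, success_rate_weight, price_weight):
    """Return {name: score in [0, 1]} for each candidate (assumed formula)."""
    if not cands:
        return {}
    total = latency_weight + success_rate_weight + price_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    lat = [c.avg_latency_ms for c in cands]
    price = [c.price_per_m for c in cands]
    scores = {}
    for c in cands:
        raw = (
            latency_weight * _inverse_norm(c.avg_latency_ms, min(lat), max(lat))
            + success_rate_weight * c.success_rate
            + price_weight * _inverse_norm(c.price_per_m, min(price), max(price))
        )
        scores[c.name] = raw / total
    return scores
```

With three candidates at 450 ms / 98% / $10, 650 ms / 97% / $5, and 900 ms / 92% / $12, latency-heavy weights such as (0.5, 0.4, 0.1) rank the first highest, while price-heavy weights such as (0.1, 0.2, 0.7) rank the second highest.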
┌────────────────────────────────────────────────────────────────────────┐
│ 2. HIGH AVAILABILITY │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Without Routing: With Provider Routing: │
│ ┌──────────┐ ┌──────────┐ │
│ │ Provider │ │ Provider │ │
│ │ A │ ← Request │ A │ ← Request │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ │ FAILS │ FAILS │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ ERROR │ │ Circuit │ │
│ │ RETURNED │ │ Breaker │ │
│ └──────────┘ │ Opens │ │
│ └────┬─────┘ │
│ User sees error │ Auto │
│ │ Fallback │
│ ▼ │
│ ┌──────────┐ │
│ │ Provider │ │
│ │ B │ ← Retry │
│ └────┬─────┘ │
│ │ │
│ │ SUCCESS │
│ ▼ │
│ ┌──────────┐ │
│ │ Response │ │
│ │ Returned │ │
│ └──────────┘ │
│ │
│ Uptime: ~99.5% Uptime: ~99.99% │
└────────────────────────────────────────────────────────────────────────┘
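The failover path in the box above can be sketched as a small state machine plus a loop over the ordered fallback list. This is a simplified illustration under stated assumptions: the failure threshold, the recovery timeout, the `call_with_fallback` helper, and the injectable `clock` are hypothetical, and the half-open "limited allow" is reduced to allowing requests until the first success or failure.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.recovery_seconds = recovery_seconds    # assumed value
        self.clock = clock
        self.state = State.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.opened_at = None

    def should_allow_request(self):
        # Closed and half-open allow traffic; open sends callers to a fallback.
        return self.state is not State.OPEN

    def record_success(self):
        self.success_count += 1
        self.failure_count = 0
        if self.state is State.HALF_OPEN:
            self.state = State.CLOSED

    def record_failure(self):
        self.failure_count += 1
        if (self.state is State.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = State.OPEN
            self.opened_at = self.clock()

    def recover(self):
        """Periodic check: move OPEN to HALF_OPEN once the timeout has passed."""
        if (self.state is State.OPEN
                and self.clock() - self.opened_at >= self.recovery_seconds):
            self.state = State.HALF_OPEN


def call_with_fallback(providers, breakers, send):
    """Try providers in order, skipping any whose circuit is open."""
    for name in providers:
        breaker = breakers[name]
        if not breaker.should_allow_request():
            continue
        try:
            result = send(name)
        except Exception:  # broad on purpose: any failure counts in this sketch
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers unavailable")
```

The client sees an error only when every provider in the list has failed or is open, which is the availability argument the diagram makes.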
┌────────────────────────────────────────────────────────────────────────┐
│ 3. COST OPTIMIZATION │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Fixed Provider: Provider Routing (Cost Strategy): │
│ │
│ Provider A: $10/M Provider A: $10/M → Score: 0.70 │
│ (only option) Provider B: $5/M → Score: 0.85 │
│ Provider C: $12/M → Score: 0.65 │
│ 1M tokens = $10 │
│ 10M tokens = $100 Provider B selected (cheapest) │
│ 100M tokens = $1,000 1M tokens = $5 │
│ 10M tokens = $50 │
│ 100M tokens = $500 │
│ │
│ Monthly cost (100M tokens): $1,000 Monthly cost (100M tokens): $500 │
│ SAVINGS: 50% │
└────────────────────────────────────────────────────────────────────────┘
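The arithmetic in the box above can be reproduced directly. The sketch assumes a single blended price per million tokens, as the box does, although provider_models stores separate prompt and completion prices. The function names are hypothetical.

```python
def token_cost(price_per_m_tokens, tokens):
    """Cost in dollars for a token volume at a flat per-million price."""
    return price_per_m_tokens * tokens / 1_000_000


def cost_comparison(prices, fixed_provider, tokens):
    """Compare staying on one provider with routing to the cheapest one."""
    cheapest = min(prices, key=prices.get)
    fixed = token_cost(prices[fixed_provider], tokens)
    routed = token_cost(prices[cheapest], tokens)
    return cheapest, fixed, routed, 1 - routed / fixed


prices = {"Provider A": 10.0, "Provider B": 5.0, "Provider C": 12.0}
cheapest, fixed, routed, savings = cost_comparison(prices, "Provider A", 100_000_000)
print(f"{cheapest}: ${routed:,.0f} vs ${fixed:,.0f} ({savings:.0%} saved)")
```

The 50% figure depends on this specific price gap; it is the saving for this example, not a guaranteed rate.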
┌────────────────────────────────────────────────────────────────────────┐
│ 4. PERFORMANCE TRACKING │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ No Metrics: Provider Routing Metrics: │
│ • Unknown performance • Real-time latency (avg, p50/p95/p99) │
│ • No visibility • Success rate (hourly, daily) │
│ • Can't optimize • Quality score (0.0-1.0) │
│ • Blind to issues • Circuit breaker state │
│ • Token throughput │
│ • Historical trends │
│ │
│ Dashboard: Dashboard: │
│ ┌──────────────┐ ┌──────────────────────────────────┐ │
│ │ │ │ Provider A: 450ms avg, 98% │ │
│ │ No Data │ │ Provider B: 650ms avg, 97% │ │
│ │ │ │ Provider C: 900ms avg, 92% │ │
│ │ │ │ │ │
│ └──────────────┘ │ Trending: Provider A improving │ │
│ │ Alert: Provider C degraded │ │
│ └──────────────────────────────────┘ │
│ │
│ Result: Reactive Result: Proactive │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ 5. USER EMPOWERMENT │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Fixed Config: User Preferences: │
│ • No control • Choose strategy (perf/cost/balanced)│
│ • One size fits all • Set preferred providers │
│ • Can't avoid bad providers • Block problematic providers │
│ • No budget control • Set price limits ($X per M tokens) │
│ • Set quality thresholds │
│ • Per-request overrides │
│ │
│ User A (needs speed): User A preferences: │
│ ┌────────────────┐ ┌──────────────────────────────────┐ │
│ │ Gets random │ │ strategy: "performance" │ │
│ │ slow provider │ │ min_success_rate: 0.99 │ │
│ │ Frustrated │ │ max_latency_ms: 1000 │ │
│ └────────────────┘ └──────────────────────────────────┘ │
│ → Gets fastest, most reliable │
│ User B (budget-conscious): User B preferences: │
│ ┌────────────────┐ ┌──────────────────────────────────┐ │
│ │ Pays high │ │ strategy: "cost" │ │
│ │ prices │ │ max_price: 7.0 │ │
│ │ Expensive │ │ min_success_rate: 0.95 │ │
│ └────────────────┘ └──────────────────────────────────┘ │
│ → Gets cheapest within budget │
└────────────────────────────────────────────────────────────────────────┘
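The preference fields above act in two stages: hard filters (blocked providers, price cap, minimum success rate, maximum latency) remove candidates, then the strategy orders whatever remains. The sketch below illustrates that split under stated assumptions: `Prefs`, `Offer`, `eligible`, and `pick` are hypothetical names, and the "balanced" strategy is left out because it needs a weighted score rather than a single sort key.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prefs:
    default_strategy: str = "cost"
    blocked_providers: List[str] = field(default_factory=list)
    max_price: Optional[float] = None
    min_success_rate: Optional[float] = None
    max_latency_ms: Optional[float] = None


@dataclass
class Offer:
    provider: str
    price_per_m: float
    success_rate: float
    avg_latency_ms: float


def eligible(offers, prefs):
    """Apply the hard filters; unset limits (None) are ignored."""
    return [
        o for o in offers
        if o.provider not in prefs.blocked_providers
        and (prefs.max_price is None or o.price_per_m <= prefs.max_price)
        and (prefs.min_success_rate is None
             or o.success_rate >= prefs.min_success_rate)
        and (prefs.max_latency_ms is None
             or o.avg_latency_ms <= prefs.max_latency_ms)
    ]


def pick(offers, prefs):
    """Return the best eligible offer for the strategy, or None."""
    pool = eligible(offers, prefs)
    if not pool:
        return None
    if prefs.default_strategy == "cost":
        return min(pool, key=lambda o: o.price_per_m)
    if prefs.default_strategy == "performance":
        return min(pool, key=lambda o: o.avg_latency_ms)
    raise ValueError(f"strategy not covered by this sketch: {prefs.default_strategy}")
```

Thresholds that are too strict can empty the pool, so `pick` returns None and the caller must decide whether to relax the limits or fail the request.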
╔══════════════════════════════════════════════════════════════════════════╗
║ BEFORE PROVIDER ROUTING vs AFTER PROVIDER ROUTING ║
╚══════════════════════════════════════════════════════════════════════════╝
┌───────────────────────────────────────────────────────────────────────────┐
│ METRIC │ BEFORE │ AFTER │
├───────────────────────┼─────────────────────┼─────────────────────────────┤
│ Provider Selection │ Manual/Random │ Intelligent (metrics-based) │
│ Optimization │ None │ Real-time, multi-dimensional│
│ Availability │ ~99.5% │ ~99.99% │
│ Cost Optimization │ No │ Yes (up to 50% savings) │
│ Failover Time │ Manual (minutes) │ Automatic (milliseconds) │
│ Performance Tracking │ None │ Comprehensive │
│ User Control │ None │ Full (preferences) │
│ Provider Diversity │ Limited │ Multiple per model │
│ Quality Assurance │ Manual │ Automated (circuit breaker) │
│ Audit Trail │ None │ Complete logging │
│ Admin Visibility │ None │ Full dashboard │
│ Scalability │ Limited │ Highly scalable │
└───────────────────────────────────────────────────────────────────────────┘
BEFORE: Simple but Inflexible
┌──────────────────────────────────────────────────────────────────────┐
│ Request → Channel (fixed) → Provider → Response │
│ │
│ Problems: │
│ • Single point of failure │
│ • No optimization │
│ • No visibility │
│ • Manual intervention needed │
└──────────────────────────────────────────────────────────────────────┘
AFTER: Intelligent & Resilient
┌──────────────────────────────────────────────────────────────────────┐
│ Request → Routing Engine → Best Provider (scored) → Response │
│ ↓ ↓ │
│ Metrics Circuit Breaker │
│ User Prefs Fallback Chain │
│ Config Quality Tracking │
│ │
│ Benefits: │
│ • Automatic failover │
│ • Cost & performance optimization │
│ • Complete visibility │
│ • Self-healing system │
└──────────────────────────────────────────────────────────────────────┘