Provider System
M31 Autonomous (M31A) supports multiple LLM providers with automatic fallback, model caching, capability detection, and SSE streaming.
Source: internal/provider/interface.go

    type LLMProvider interface {
        Name() string
        APIKey() string
        FetchModels(ctx context.Context) ([]types.ModelInfo, error)
        CachedModels() []types.ModelInfo
        ChatCompletionStream(ctx context.Context, req ChatRequest) (*types.StreamIterator, error)
        EstimateCost(modelID string, usage types.Usage) float64
        HealthCheck(ctx context.Context) types.HealthStatus
        GetModel(id string) (*types.ModelInfo, error)
    }

    type ChatRequest struct {
        Model            string           `json:"model"`
        Messages         []types.Message  `json:"messages"`
        MaxTokens        int              `json:"max_tokens,omitempty"`
        Tools            []ToolDefinition `json:"tools,omitempty"`
        ReasoningEnabled bool             `json:"reasoning_enabled,omitempty"`
    }

Source: internal/provider/registry.go
The Registry manages multiple providers with thread-safe operations:
| Method | Description |
|---|---|
| `Register(name, provider)` | Add a provider |
| `SetActive(name)` | Switch active provider |
| `TrySetActive(name)` | Atomic set (prevents TOCTOU races) |
| `RollbackActive(from, to)` | Revert on health check failure |
| `Active()` | Current active provider name |
| `ActiveProvider()` | Current active provider instance |
| `Get(name)` | Look up provider by name |
| `List()` | All registered provider names (sorted) |
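
The "atomic set" behaviour can be illustrated with a small sketch. This is not the code from `registry.go`: providers are reduced to plain names, and the compare-and-swap reading of `RollbackActive` (revert only if `from` is still active) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Registry is a simplified stand-in: it tracks provider names only.
type Registry struct {
	mu        sync.RWMutex
	providers map[string]struct{}
	active    string
}

func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]struct{})}
}

func (r *Registry) Register(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.providers[name] = struct{}{}
}

// TrySetActive performs the existence check and the switch under a single
// write lock, so nothing can change between the check and the use.
func (r *Registry) TrySetActive(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.providers[name]; !ok {
		return false
	}
	r.active = name
	return true
}

// RollbackActive (assumed semantics) reverts to `to` only when `from` is
// still the active provider and `to` is registered.
func (r *Registry) RollbackActive(from, to string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.providers[to]; !ok || r.active != from {
		return false
	}
	r.active = to
	return true
}

func (r *Registry) Active() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.active
}

// List returns all registered names in sorted order.
func (r *Registry) List() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.providers))
	for name := range r.providers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	r.Register("zen")
	r.Register("openrouter")
	fmt.Println(r.TrySetActive("zen"), r.Active(), r.List())
}
```

Doing the check and the write under one lock is what closes the time-of-check/time-of-use window; a separate `Get` followed by `SetActive` would leave it open.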
Source: internal/provider/openrouter/client.go
- Aggregates 300+ models from multiple providers
- Default base URL: `https://openrouter.ai/api/v1`
- Sends `HTTP-Referer` and `X-Title` headers
- Custom base URL supported for proxies
Source: internal/provider/zen/client.go
- OpenCode Zen API gateway
- Default base URL: `https://opencode.ai/zen/v1`
- Default context length support
Source: internal/provider/nvidia/client.go
- NVIDIA Inference Microservices (NIM) API
- Default base URL: `https://integrate.api.nvidia.com/v1`
- Full `LLMProvider` interface implementation
- Completion-only model filtering (skips `codellama`, `starcoder`, etc. from chat UI)
- Retry logic with exponential backoff (max 2 retries) for transient errors (500, 502, 503)
- Rate limit (429), unauthorized (401), payment required (402), service unavailable (503) handling
- Health check latency classification: live (<500ms), slow (<2s), degraded (>2s)
See NVIDIA NIM Provider for detailed setup and configuration.
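
The latency thresholds and retry policy listed above can be sketched as follows. The function names and `baseDelay` are hypothetical (the source does not state a starting delay), and treating exactly 2s as degraded is an assumption, since the text only gives `<2s` and `>2s`.

```go
package main

import (
	"fmt"
	"time"
)

// maxRetries mirrors the "max 2 retries" stated in the text.
const maxRetries = 2

// baseDelay is a hypothetical starting delay chosen for illustration.
const baseDelay = 500 * time.Millisecond

// classifyLatency maps a health check round trip to a status label.
func classifyLatency(d time.Duration) string {
	switch {
	case d < 500*time.Millisecond:
		return "live"
	case d < 2*time.Second:
		return "slow"
	default:
		return "degraded"
	}
}

// isTransient reports whether a status code is in the retried set.
func isTransient(status int) bool {
	return status == 500 || status == 502 || status == 503
}

// backoffDelay doubles the delay on each attempt (attempt is zero-based).
func backoffDelay(attempt int) time.Duration {
	return baseDelay << uint(attempt)
}

func main() {
	for _, d := range []time.Duration{120 * time.Millisecond, 900 * time.Millisecond, 3 * time.Second} {
		fmt.Printf("%v -> %s\n", d, classifyLatency(d))
	}
	for attempt := 0; attempt < maxRetries; attempt++ {
		fmt.Printf("retry %d after %v (503 transient: %v)\n",
			attempt+1, backoffDelay(attempt), isTransient(503))
	}
}
```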
Source: internal/provider/base_client.go
Shared HTTP transport used by all provider implementations:
| Setting | Value |
|---|---|
| Max idle connections | 100 |
| Max idle per host | 10 |
| Idle timeout | 90 seconds |
| Dial timeout | 30 seconds (HTTPDialTimeout) |
Provides common operations: APIKey() (masked), EstimateCost(), GetModel(), CachedModels(), MakeIterator().
Source: internal/provider/cache.go
TTL-based in-memory cache with stale-while-revalidate semantics:
| Setting | Default | Description |
|---|---|---|
| TTL | 5 minutes (`ModelCacheTTL`) | Fresh cache lifetime |
| Stale TTL | 24 hours (`StaleCacheTTL`) | Fallback cache lifetime |
When fresh data expires, stale entries are served while a background refresh runs. Uses golang.org/x/sync/singleflight to prevent thundering herd on cache miss.
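
A minimal sketch of the freshness decision behind stale-while-revalidate, using only the two TTL values from the table. The `Freshness` type and `classifyAge` are hypothetical names, and measuring both TTLs from the time the entry was fetched is an assumption. The background refresh and the `singleflight` de-duplication are left out to keep the sketch dependency-free.

```go
package main

import (
	"fmt"
	"time"
)

const (
	ModelCacheTTL = 5 * time.Minute
	StaleCacheTTL = 24 * time.Hour
)

// Freshness describes how a cached entry may be used.
type Freshness int

const (
	Fresh   Freshness = iota // serve directly
	Stale                    // serve, and refresh in the background
	Expired                  // do not serve; fetch synchronously
)

// classifyAge decides how to treat an entry of the given age.
func classifyAge(age time.Duration) Freshness {
	switch {
	case age < ModelCacheTTL:
		return Fresh
	case age < StaleCacheTTL:
		return Stale
	default:
		return Expired
	}
}

func main() {
	labels := map[Freshness]string{Fresh: "fresh", Stale: "stale", Expired: "expired"}
	for _, age := range []time.Duration{time.Minute, 10 * time.Minute, 25 * time.Hour} {
		fmt.Printf("age %v -> %s\n", age, labels[classifyAge(age)])
	}
}
```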
Source: internal/provider/model_metadata.go
Enriches raw model data with additional metadata:
- Tokenizer family detection
- Variant classification ("thinking", "fast", "extended", "vision")
Source: internal/provider/capabilities.go
Heuristic inference of model capabilities from model ID patterns:
| Capability | Detection Heuristic |
|---|---|
| Tool Use | claude, gpt, gemini, deepseek, qwen, llama, mistral, command-r, command-a |
| Reasoning | /o1, /o3, /o4 patterns + "reason" or "thinking" in ID |
| Vision | "vision" or "multimodal" in ID |
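
The heuristics in the table amount to substring checks on the model ID. The sketch below assumes case-insensitive matching and uses hypothetical names (`Capabilities`, `InferCapabilities`); the real matching rules in `capabilities.go` may be stricter.

```go
package main

import (
	"fmt"
	"strings"
)

// Capabilities holds the three inferred flags from the table.
type Capabilities struct {
	ToolUse   bool
	Reasoning bool
	Vision    bool
}

// toolUseFamilies lists the model families named in the table.
var toolUseFamilies = []string{
	"claude", "gpt", "gemini", "deepseek", "qwen",
	"llama", "mistral", "command-r", "command-a",
}

// containsAny reports whether s contains at least one of the patterns.
func containsAny(s string, patterns ...string) bool {
	for _, p := range patterns {
		if strings.Contains(s, p) {
			return true
		}
	}
	return false
}

// InferCapabilities guesses capabilities from the model ID alone.
func InferCapabilities(modelID string) Capabilities {
	id := strings.ToLower(modelID)
	return Capabilities{
		ToolUse:   containsAny(id, toolUseFamilies...),
		Reasoning: containsAny(id, "/o1", "/o3", "/o4", "reason", "thinking"),
		Vision:    containsAny(id, "vision", "multimodal"),
	}
}

func main() {
	for _, id := range []string{
		"anthropic/claude-3.5-sonnet",
		"openai/o3-mini",
		"meta-llama/llama-3.2-11b-vision-instruct",
	} {
		fmt.Printf("%s -> %+v\n", id, InferCapabilities(id))
	}
}
```

Because the detection is pattern-based, a model whose ID does not follow these naming conventions will be misclassified.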
Source: internal/provider/fallback.go
When the active provider degrades, M31A automatically switches to a healthy alternative:

    flowchart TD
        Fail[Active Provider Fails] --> FFP[FindFallbackProvider]
        FFP --> Collect[Collect Candidates<br/>exclude current]
        Collect --> Health[Parallel Health Checks<br/>10s timeout]
        Health --> Check{First Live?}
        Check -->|Yes| Switch[TrySetActive]
        Check -->|No Slow| Slow[Use Slow Fallback]
        Check -->|None| Error[ErrProviderUnreachable]
        style Fail fill:#ffcdd2
        style Switch fill:#c8e6c9
        style Slow fill:#fff9c4

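
The selection flow can be sketched as below. This is a simplification under stated assumptions: `Checker`, `static`, and the `HealthStatus` values are hypothetical, the function waits for all checks to finish and then picks deterministically by sorted name (the real code may take the first live responder), and it creates its own context where a real implementation would more likely accept a parent one.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

var ErrProviderUnreachable = errors.New("no reachable fallback provider")

// HealthCheckTimeout matches the 10s timeout in the flowchart.
const HealthCheckTimeout = 10 * time.Second

type HealthStatus string

const (
	Live HealthStatus = "live"
	Slow HealthStatus = "slow"
	Down HealthStatus = "down"
)

// Checker is a hypothetical health probe; it is expected to honor ctx.
type Checker func(ctx context.Context) HealthStatus

// static returns a Checker with a fixed result, for demos and tests.
func static(s HealthStatus) Checker {
	return func(context.Context) HealthStatus { return s }
}

// FindFallbackProvider probes every candidate except current in parallel,
// then prefers a live provider, falls back to a slow one, else errors.
func FindFallbackProvider(current string, candidates map[string]Checker) (name, reason string, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), HealthCheckTimeout)
	defer cancel()

	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]HealthStatus)
	)
	for n, check := range candidates {
		if n == current {
			continue
		}
		wg.Add(1)
		go func(n string, check Checker) {
			defer wg.Done()
			status := check(ctx)
			mu.Lock()
			results[n] = status
			mu.Unlock()
		}(n, check)
	}
	wg.Wait()

	names := make([]string, 0, len(results))
	for n := range results {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if results[n] == Live {
			return n, "fallback_live", nil
		}
	}
	for _, n := range names {
		if results[n] == Slow {
			return n, "fallback_slow", nil
		}
	}
	return "", "", ErrProviderUnreachable
}

func main() {
	name, reason, err := FindFallbackProvider("openrouter", map[string]Checker{
		"openrouter": static(Down),
		"zen":        static(Slow),
		"nvidia":     static(Live),
	})
	fmt.Println(name, reason, err)
}
```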
FindFallbackWithRetryAfter() handles HTTP 429 responses:
- Extracts `Retry-After` header value
- Caps wait at 120 seconds (`MaxRetryAfterWait`)
- Returns `FallbackAfterWait` struct for async scheduling (avoids blocking the Bubble Tea event loop)
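
A sketch of the header handling described above. `parseRetryAfter` is a hypothetical helper, not the function from `fallback.go`; it accepts both forms the HTTP specification allows for `Retry-After` (delay in seconds or an HTTP date) and applies the 120-second cap.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// MaxRetryAfterWait is the cap stated in the text.
const MaxRetryAfterWait = 120 * time.Second

// parseRetryAfter returns the capped wait and whether the value was valid.
func parseRetryAfter(value string) (time.Duration, bool) {
	value = strings.TrimSpace(value)
	if value == "" {
		return 0, false
	}
	// Form 1: non-negative integer number of seconds.
	if secs, err := strconv.Atoi(value); err == nil {
		if secs < 0 {
			return 0, false
		}
		// Compare before multiplying to avoid overflow on huge values.
		if secs > int(MaxRetryAfterWait/time.Second) {
			return MaxRetryAfterWait, true
		}
		return time.Duration(secs) * time.Second, true
	}
	// Form 2: HTTP date; wait until that moment, never a negative time.
	if t, err := http.ParseTime(value); err == nil {
		wait := time.Until(t)
		if wait < 0 {
			wait = 0
		}
		if wait > MaxRetryAfterWait {
			wait = MaxRetryAfterWait
		}
		return wait, true
	}
	return 0, false
}

func main() {
	for _, v := range []string{"30", "600", "soon"} {
		wait, ok := parseRetryAfter(v)
		fmt.Printf("%q -> %v (valid: %v)\n", v, wait, ok)
	}
}
```

Returning the duration instead of sleeping keeps the caller free to schedule the retry asynchronously, which is the point of the `FallbackAfterWait` struct mentioned above.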

    type FallbackEvent struct {
        From   string `json:"from"`
        To     string `json:"to"`
        Reason string `json:"reason"` // "fallback_live", "fallback_slow", "rate_limited"
    }

Source: internal/provider/sse.go
Server-Sent Events parser for streaming LLM responses:
- Parses `data: [DONE]` sentinel for stream completion
- Handles `delta` chunks for text content
- Handles `tool_call` chunks for native tool invocation
- Detects truncated streams (`ErrStreamTruncated`)
- Enforces `MaxLLMResponseBytes` (1 MB) to prevent OOM
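
The framing rules in this list can be sketched with a line scanner. This is an illustration, not the parser in `sse.go`: `readSSE`, `parseSSEString`, and `ErrResponseTooLarge` are hypothetical names, payloads are returned as raw strings instead of decoded chunks, and counting only `data:` payload bytes toward the limit is an assumption.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// MaxLLMResponseBytes is the 1 MB limit stated in the text.
const MaxLLMResponseBytes = 1 << 20

var (
	ErrStreamTruncated  = errors.New("stream ended before [DONE]")
	ErrResponseTooLarge = errors.New("response exceeds size limit") // hypothetical
)

// readSSE collects data payloads until the [DONE] sentinel arrives.
func readSSE(r io.Reader) ([]string, error) {
	scanner := bufio.NewScanner(r)
	// Allow long lines; anything beyond the limit is rejected anyway.
	scanner.Buffer(make([]byte, 0, 64*1024), MaxLLMResponseBytes+1024)

	var payloads []string
	total := 0
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, other fields
		}
		data := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if data == "[DONE]" {
			return payloads, nil
		}
		total += len(data)
		if total > MaxLLMResponseBytes {
			return payloads, ErrResponseTooLarge
		}
		payloads = append(payloads, data)
	}
	if err := scanner.Err(); err != nil {
		return payloads, err
	}
	// The connection closed without the sentinel.
	return payloads, ErrStreamTruncated
}

// parseSSEString is a convenience wrapper for demos and tests.
func parseSSEString(body string) ([]string, error) {
	return readSSE(strings.NewReader(body))
}

func main() {
	got, err := parseSSEString("data: {\"delta\":\"Hi\"}\n\ndata: [DONE]\n\n")
	fmt.Println(got, err)
}
```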

    type StreamIterator struct {
        Next  func() (*StreamChunk, error)
        Close func() error
    }

The workflow engine's `consumeStreamWithTools()` reads chunks and:
- Accumulates text deltas into a `strings.Builder`
- Collects `tool_call` chunks by index into `toolCallBuilder` structs
- Finalizes into `ToolCall` objects with parsed JSON arguments
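
The accumulation steps can be sketched as follows. The field layouts of `toolCallDelta`, `toolCallBuilder`, and `ToolCall` are assumptions made for illustration; only the type names `toolCallBuilder` and `ToolCall` come from the text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// toolCallDelta is a hypothetical shape for one streamed fragment.
type toolCallDelta struct {
	Index     int
	ID        string
	Name      string
	Arguments string // partial JSON text
}

// toolCallBuilder gathers the fragments that share one index.
type toolCallBuilder struct {
	id   string
	name string
	args strings.Builder
}

// ToolCall is the finalized call with decoded arguments.
type ToolCall struct {
	ID        string
	Name      string
	Arguments map[string]any
}

// assembleToolCalls merges fragments by index and parses the arguments.
func assembleToolCalls(deltas []toolCallDelta) ([]ToolCall, error) {
	builders := make(map[int]*toolCallBuilder)
	for _, d := range deltas {
		b, ok := builders[d.Index]
		if !ok {
			b = &toolCallBuilder{}
			builders[d.Index] = b
		}
		if d.ID != "" {
			b.id = d.ID
		}
		if d.Name != "" {
			b.name = d.Name
		}
		b.args.WriteString(d.Arguments)
	}

	indexes := make([]int, 0, len(builders))
	for i := range builders {
		indexes = append(indexes, i)
	}
	sort.Ints(indexes)

	calls := make([]ToolCall, 0, len(indexes))
	for _, i := range indexes {
		b := builders[i]
		raw := b.args.String()
		if raw == "" {
			raw = "{}" // a call without arguments
		}
		var args map[string]any
		if err := json.Unmarshal([]byte(raw), &args); err != nil {
			return nil, fmt.Errorf("tool call %d (%s): %w", i, b.name, err)
		}
		calls = append(calls, ToolCall{ID: b.id, Name: b.name, Arguments: args})
	}
	return calls, nil
}

func main() {
	calls, err := assembleToolCalls([]toolCallDelta{
		{Index: 0, ID: "call_1", Name: "read_file", Arguments: `{"pa`},
		{Index: 0, Arguments: `th":"a.go"}`},
	})
	fmt.Println(calls, err)
}
```

Parsing is deferred until the stream ends because an individual fragment is rarely valid JSON on its own.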
Source: internal/tui/provider_registration.go
Provider registration happens at startup via a centralized factory:

    registry := provider.NewRegistry()
    if cfg.Provider.OpenRouter.APIKey != "" {
        RegisterProvider(registry, cfg, "openrouter", apiKey, version)
    }
    if cfg.Provider.Zen.APIKey != "" {
        RegisterProvider(registry, cfg, "zen", apiKey, version)
    }
    if cfg.Provider.Nvidia.APIKey != "" {
        RegisterProvider(registry, cfg, "nvidia", apiKey, version)
    }
    if cfg.Provider.Default != "" {
        registry.SetActive(cfg.Provider.Default)
    }

Supported providers: openrouter, zen, nvidia.
Source: internal/provider/common.go
Shared helpers:
- HTTP response body reading with size limits
- Error sanitization (caps at `MaxProviderErrorChars = 200`)
- Retry-After header parsing
- Rate limit detection (HTTP 429)
- `HTTPStatusError` typed error with `IsRetryable()` method for retry classification
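
A typed error of this kind might look like the sketch below. The struct fields, `IsRateLimited`, and `sanitize` are hypothetical. Treating only 500, 502, and 503 as retryable is an assumption borrowed from the NVIDIA retry policy described earlier; the real `IsRetryable()` may classify differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// MaxProviderErrorChars is the cap stated in the text.
const MaxProviderErrorChars = 200

// HTTPStatusError carries the status code and a provider message.
type HTTPStatusError struct {
	StatusCode int
	Body       string
}

func (e *HTTPStatusError) Error() string {
	return fmt.Sprintf("provider returned HTTP %d: %s", e.StatusCode, sanitize(e.Body))
}

// IsRetryable reports whether the failure is transient (assumed set).
func (e *HTTPStatusError) IsRetryable() bool {
	switch e.StatusCode {
	case http.StatusInternalServerError,
		http.StatusBadGateway,
		http.StatusServiceUnavailable:
		return true
	}
	return false
}

// IsRateLimited detects HTTP 429.
func (e *HTTPStatusError) IsRateLimited() bool {
	return e.StatusCode == http.StatusTooManyRequests
}

// sanitize truncates a message to MaxProviderErrorChars characters,
// cutting on rune boundaries so multi-byte text stays valid.
func sanitize(msg string) string {
	runes := []rune(msg)
	if len(runes) > MaxProviderErrorChars {
		return string(runes[:MaxProviderErrorChars])
	}
	return msg
}

func main() {
	var err error = &HTTPStatusError{StatusCode: 503, Body: "upstream unavailable"}

	// errors.As recovers the typed error even if it has been wrapped.
	var statusErr *HTTPStatusError
	if errors.As(err, &statusErr) {
		fmt.Println(statusErr, "retryable:", statusErr.IsRetryable())
	}
}
```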