Provider System

M31 Autonomous (M31A) supports multiple LLM providers with automatic fallback, model caching, capability detection, and SSE streaming.

Provider Interface

Source: internal/provider/interface.go

type LLMProvider interface {
    Name() string
    APIKey() string
    FetchModels(ctx context.Context) ([]types.ModelInfo, error)
    CachedModels() []types.ModelInfo
    ChatCompletionStream(ctx context.Context, req ChatRequest) (*types.StreamIterator, error)
    EstimateCost(modelID string, usage types.Usage) float64
    HealthCheck(ctx context.Context) types.HealthStatus
    GetModel(id string) (*types.ModelInfo, error)
}

ChatRequest

type ChatRequest struct {
    Model            string           `json:"model"`
    Messages         []types.Message  `json:"messages"`
    MaxTokens        int              `json:"max_tokens,omitempty"`
    Tools            []ToolDefinition `json:"tools,omitempty"`
    ReasoningEnabled bool             `json:"reasoning_enabled,omitempty"`
}

Registry

Source: internal/provider/registry.go

The Registry manages multiple providers with thread-safe operations:

Method	Description
`Register(name, provider)`	Add a provider
`SetActive(name)`	Switch active provider
`TrySetActive(name)`	Atomic set (prevents TOCTOU races)
`RollbackActive(from, to)`	Revert on health check failure
`Active()`	Current active provider name
`ActiveProvider()`	Current active provider instance
`Get(name)`	Look up provider by name
`List()`	All registered provider names (sorted)

Implementations

OpenRouter

Source: internal/provider/openrouter/client.go

Aggregates 300+ models from multiple providers
Default base URL: https://openrouter.ai/api/v1
Sends HTTP-Referer and X-Title headers
Custom base URL supported for proxies

Zen

Source: internal/provider/zen/client.go

OpenCode Zen API gateway
Default base URL: https://opencode.ai/zen/v1
Default context length support

NVIDIA NIM

Source: internal/provider/nvidia/client.go

NVIDIA Inference Microservices (NIM) API
Default base URL: https://integrate.api.nvidia.com/v1
Full LLMProvider interface implementation
Completion-only model filtering (skips codellama, starcoder, etc. from chat UI)
Retry logic with exponential backoff (max 2 retries) for transient errors (500, 502, 503)
Rate limit (429), unauthorized (401), payment required (402), service unavailable (503) handling
Health check latency classification: live (<500ms), slow (<2s), degraded (>2s)

See NVIDIA NIM Provider for detailed setup and configuration.

Base Client

Source: internal/provider/base_client.go

Shared HTTP transport used by all provider implementations:

Setting	Value
Max idle connections	100
Max idle per host	10
Idle timeout	90 seconds
Dial timeout	30 seconds (`HTTPDialTimeout`)

Provides common operations: APIKey() (masked), EstimateCost(), GetModel(), CachedModels(), MakeIterator().

Model Cache

Source: internal/provider/cache.go

TTL-based in-memory cache with stale-while-revalidate semantics:

Setting	Default	Description
TTL	5 minutes (`ModelCacheTTL`)	Fresh cache lifetime
Stale TTL	24 hours (`StaleCacheTTL`)	Fallback cache lifetime

When fresh data expires, stale entries are served while a background refresh runs. Uses golang.org/x/sync/singleflight to prevent thundering herd on cache miss.

Model Metadata

Source: internal/provider/model_metadata.go

Enriches raw model data with additional metadata:

Tokenizer family detection
Variant classification ("thinking", "fast", "extended", "vision")

Capability Detection

Source: internal/provider/capabilities.go

Heuristic inference of model capabilities from model ID patterns:

Capability	Detection Heuristic
Tool Use	claude, gpt, gemini, deepseek, qwen, llama, mistral, command-r, command-a
Reasoning	/o1, /o3, /o4 patterns + "reason" or "thinking" in ID
Vision	"vision" or "multimodal" in ID

Automatic Fallback

Source: internal/provider/fallback.go

When the active provider degrades, M31A automatically switches to a healthy alternative:

Health Check Flow

flowchart TD
    Fail[Active Provider Fails] --> FFP[FindFallbackProvider]
    FFP --> Collect[Collect Candidates<br/>exclude current]
    Collect --> Health[Parallel Health Checks<br/>10s timeout]
    Health --> Check{First Live?}
    Check -->|Yes| Switch[TrySetActive]
    Check -->|No Slow| Slow[Use Slow Fallback]
    Check -->|None| Error[ErrProviderUnreachable]

    style Fail fill:#ffcdd2
    style Switch fill:#c8e6c9
    style Slow fill:#fff9c4

Retry-After Awareness

FindFallbackWithRetryAfter() handles HTTP 429 responses:

Extracts Retry-After header value
Caps wait at 120 seconds (MaxRetryAfterWait)
Returns FallbackAfterWait struct for async scheduling (avoids blocking the Bubble Tea event loop)

Fallback Event

type FallbackEvent struct {
    From   string `json:"from"`
    To     string `json:"to"`
    Reason string `json:"reason"` // "fallback_live", "fallback_slow", "rate_limited"
}

SSE Streaming

Source: internal/provider/sse.go

Server-Sent Events parser for streaming LLM responses:

Parses data: [DONE] sentinel for stream completion
Handles delta chunks for text content
Handles tool_call chunks for native tool invocation
Detects truncated streams (ErrStreamTruncated)
Enforces MaxLLMResponseBytes (1 MB) to prevent OOM

Stream Iterator

type StreamIterator struct {
    Next  func() (*StreamChunk, error)
    Close func() error
}

The workflow engine's consumeStreamWithTools() reads chunks and:

Accumulates text deltas into a strings.Builder
Collects tool_call chunks by index into toolCallBuilder structs
Finalizes into ToolCall objects with parsed JSON arguments

Provider Registration

Source: internal/tui/provider_registration.go

Provider registration happens at startup via a centralized factory:

registry := provider.NewRegistry()
if cfg.Provider.OpenRouter.APIKey != "" {
    RegisterProvider(registry, cfg, "openrouter", apiKey, version)
}
if cfg.Provider.Zen.APIKey != "" {
    RegisterProvider(registry, cfg, "zen", apiKey, version)
}
if cfg.Provider.Nvidia.APIKey != "" {
    RegisterProvider(registry, cfg, "nvidia", apiKey, version)
}
if cfg.Provider.Default != "" {
    registry.SetActive(cfg.Provider.Default)
}

Supported providers: openrouter, zen, nvidia.

Common Utilities

Source: internal/provider/common.go

Shared helpers:

HTTP response body reading with size limits
Error sanitization (caps at MaxProviderErrorChars = 200)
Retry-After header parsing
Rate limit detection (HTTP 429)
HTTPStatusError typed error with IsRetryable() method for retry classification

Uh oh!

Provider System

Provider System

Provider Interface

ChatRequest

Registry

Implementations

OpenRouter

Zen

NVIDIA NIM

Base Client

Model Cache

Model Metadata

Capability Detection

Automatic Fallback

Health Check Flow

Retry-After Awareness

Fallback Event

SSE Streaming

Stream Iterator

Provider Registration

Common Utilities

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally