Provider System

Eshan Roy edited this page Jun 16, 2026 · 4 revisions

M31 Autonomous (M31A) supports multiple LLM providers with automatic fallback, model caching, capability detection, and SSE streaming.

Provider Interface

Source: internal/provider/interface.go

type LLMProvider interface {
    Name() string
    APIKey() string
    FetchModels(ctx context.Context) ([]types.ModelInfo, error)
    CachedModels() []types.ModelInfo
    ChatCompletionStream(ctx context.Context, req ChatRequest) (*types.StreamIterator, error)
    EstimateCost(modelID string, usage types.Usage) float64
    HealthCheck(ctx context.Context) types.HealthStatus
    GetModel(id string) (*types.ModelInfo, error)
}

ChatRequest

type ChatRequest struct {
    Model            string           `json:"model"`
    Messages         []types.Message  `json:"messages"`
    MaxTokens        int              `json:"max_tokens,omitempty"`
    Tools            []ToolDefinition `json:"tools,omitempty"`
    ReasoningEnabled bool             `json:"reasoning_enabled,omitempty"`
}
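
The `omitempty` tags mean that optional fields are dropped from the request body when left at their zero value. The following is a minimal sketch of that behavior. `Message` and `ToolDefinition` are hypothetical stand-ins here, since this page does not show the fields of `types.Message` or `ToolDefinition`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical stand-in for types.Message.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ToolDefinition is a hypothetical stand-in; its real fields are not shown on this page.
type ToolDefinition struct {
	Name string `json:"name"`
}

// ChatRequest mirrors the struct above, with the stand-in types substituted.
type ChatRequest struct {
	Model            string           `json:"model"`
	Messages         []Message        `json:"messages"`
	MaxTokens        int              `json:"max_tokens,omitempty"`
	Tools            []ToolDefinition `json:"tools,omitempty"`
	ReasoningEnabled bool             `json:"reasoning_enabled,omitempty"`
}

// encode returns the JSON body that would be sent for a request.
func encode(req ChatRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	body, err := encode(ChatRequest{
		Model:    "example/model",
		Messages: []Message{{Role: "user", Content: "hi"}},
	})
	if err != nil {
		panic(err)
	}
	// MaxTokens, Tools, and ReasoningEnabled are omitted because they are unset.
	fmt.Println(body)
}
```

Only `model` and `messages` appear in the encoded body unless the optional fields are set.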

Registry

Source: internal/provider/registry.go

The Registry manages multiple providers with thread-safe operations:

| Method | Description |
| --- | --- |
| Register(name, provider) | Add a provider |
| SetActive(name) | Switch active provider |
| TrySetActive(name) | Atomic set (prevents TOCTOU races) |
| RollbackActive(from, to) | Revert on health check failure |
| Active() | Current active provider name |
| ActiveProvider() | Current active provider instance |
| Get(name) | Look up provider by name |
| List() | All registered provider names (sorted) |
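
To illustrate why TrySetActive avoids a time-of-check/time-of-use race, here is a minimal sketch of a mutex-guarded registry. It is not the code in registry.go: the `Provider` interface is reduced to one method, `stub` is a hypothetical test provider, and the boolean return value of `TrySetActive` is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Provider is a hypothetical stand-in for LLMProvider, reduced to one method.
type Provider interface{ Name() string }

// stub is a hypothetical provider used only for this example.
type stub string

func (s stub) Name() string { return string(s) }

// Registry is an illustrative thread-safe registry, not the M31A implementation.
type Registry struct {
	mu        sync.RWMutex
	providers map[string]Provider
	active    string
}

func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]Provider)}
}

func (r *Registry) Register(name string, p Provider) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.providers[name] = p
}

// TrySetActive checks that the provider exists and switches to it while
// holding one write lock, so nothing can change between the check and the set.
func (r *Registry) TrySetActive(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.providers[name]; !ok {
		return false
	}
	r.active = name
	return true
}

// Active returns the name of the active provider.
func (r *Registry) Active() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.active
}

// List returns all registered provider names in sorted order.
func (r *Registry) List() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.providers))
	for name := range r.providers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	r.Register("zen", stub("zen"))
	r.Register("openrouter", stub("openrouter"))
	fmt.Println(r.TrySetActive("missing"), r.TrySetActive("zen"), r.Active(), r.List())
}
```

A caller that first calls Get(name) and then SetActive(name) leaves a window in which another goroutine could change the registry; doing both steps under one lock closes that window.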

Implementations

OpenRouter

Source: internal/provider/openrouter/client.go

  • Aggregates 300+ models from multiple providers
  • Default base URL: https://openrouter.ai/api/v1
  • Sends HTTP-Referer and X-Title headers
  • Custom base URL supported for proxies

Zen

Source: internal/provider/zen/client.go

  • OpenCode Zen API gateway
  • Default base URL: https://opencode.ai/zen/v1
  • Default context length support

Base Client

Source: internal/provider/base_client.go

Shared HTTP transport used by all provider implementations:

| Setting | Value |
| --- | --- |
| Max idle connections | 100 |
| Max idle per host | 10 |
| Idle timeout | 90 seconds |
| Dial timeout | 30 seconds (HTTPDialTimeout) |

Provides common operations: APIKey() (masked), EstimateCost(), GetModel(), CachedModels(), MakeIterator().

Model Cache

Source: internal/provider/cache.go

TTL-based in-memory cache with stale-while-revalidate semantics:

| Setting | Default | Description |
| --- | --- | --- |
| TTL | 5 minutes (ModelCacheTTL) | Fresh cache lifetime |
| Stale TTL | 24 hours (StaleCacheTTL) | Fallback cache lifetime |

When the fresh TTL expires, stale entries are still served while a background refresh runs. The cache uses golang.org/x/sync/singleflight so that concurrent cache misses trigger a single fetch instead of a thundering herd.
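
The two TTLs split an entry's life into three phases: fresh, stale, and expired. The sketch below models only that decision and deliberately leaves out the singleflight deduplication and the background goroutine. `lookup` is a hypothetical helper, and it assumes both TTLs are measured from the time of the last successful fetch.

```go
package main

import (
	"fmt"
	"time"
)

// Constant names and values are taken from the table above.
const (
	ModelCacheTTL = 5 * time.Minute
	StaleCacheTTL = 24 * time.Hour
)

// lookup (hypothetical) decides what to do with an entry of the given age.
// Assumption: both TTLs count from the last successful fetch.
//   - fresh:   serve from cache, no refresh
//   - stale:   serve from cache and start a refresh
//   - expired: do not serve; the caller must wait for a fetch
func lookup(age time.Duration) (serve, refresh bool) {
	switch {
	case age < ModelCacheTTL:
		return true, false
	case age < StaleCacheTTL:
		return true, true
	default:
		return false, true
	}
}

func main() {
	for _, age := range []time.Duration{time.Minute, time.Hour, 48 * time.Hour} {
		serve, refresh := lookup(age)
		fmt.Printf("age=%v serve=%v refresh=%v\n", age, serve, refresh)
	}
}
```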

Model Metadata

Source: internal/provider/model_metadata.go

Enriches raw model data with additional metadata:

  • Tokenizer family detection
  • Variant classification ("thinking", "fast", "extended", "vision")

Capability Detection

Source: internal/provider/capabilities.go

Heuristic inference of model capabilities from model ID patterns:

| Capability | Detection Heuristic |
| --- | --- |
| Tool Use | claude, gpt, gemini, deepseek, qwen, llama, mistral, command-r, command-a |
| Reasoning | /o1, /o3, /o4 patterns, plus "reason" or "thinking" in ID |
| Vision | "vision" or "multimodal" in ID |
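
The heuristics in the table can be sketched as plain substring checks on the model ID. This is an illustration under stated assumptions, not the code in capabilities.go: it assumes case-insensitive matching, treats every listed reasoning pattern as sufficient on its own, and uses the hypothetical names `Capabilities`, `Detect`, and `containsAny`.

```go
package main

import (
	"fmt"
	"strings"
)

// Capabilities is a hypothetical result type for this sketch.
type Capabilities struct {
	ToolUse   bool
	Reasoning bool
	Vision    bool
}

// toolUseFamilies lists the model families from the table above.
var toolUseFamilies = []string{
	"claude", "gpt", "gemini", "deepseek", "qwen",
	"llama", "mistral", "command-r", "command-a",
}

// containsAny reports whether s contains at least one of the substrings.
func containsAny(s string, subs ...string) bool {
	for _, sub := range subs {
		if strings.Contains(s, sub) {
			return true
		}
	}
	return false
}

// Detect infers capabilities from the model ID alone.
// Assumption: matching is case-insensitive and any single pattern suffices.
func Detect(modelID string) Capabilities {
	id := strings.ToLower(modelID)
	return Capabilities{
		ToolUse:   containsAny(id, toolUseFamilies...),
		Reasoning: containsAny(id, "/o1", "/o3", "/o4", "reason", "thinking"),
		Vision:    containsAny(id, "vision", "multimodal"),
	}
}

func main() {
	for _, id := range []string{
		"openai/o3-mini",
		"anthropic/claude-3.5-sonnet",
		"meta-llama/llama-3.2-11b-vision-instruct",
	} {
		fmt.Printf("%s: %+v\n", id, Detect(id))
	}
}
```

Because detection relies on naming patterns only, a model whose ID does not follow these conventions will be reported without the capability even if it supports it.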

Automatic Fallback

Source: internal/provider/fallback.go

When the active provider degrades, M31A automatically switches to a healthy alternative:

Health Check Flow

Active provider fails
    │
    ▼
FindFallbackProvider()
    ├── Collect candidate providers (exclude current)
    ├── Parallel health checks (10s timeout)
    ├── Pick first "live" or "slow" provider
    └── Commit switch via TrySetActive()
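
The flow above can be sketched with one goroutine per candidate and a shared timeout. Several details are assumptions: the `checker` interface and `fake` type are hypothetical, "down" is an invented status name (this page only names "live" and "slow"), and "first" is taken to mean first in candidate order rather than first to respond. The final TrySetActive() commit is left out.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type HealthStatus string

const (
	Live HealthStatus = "live"
	Slow HealthStatus = "slow"
	Down HealthStatus = "down" // assumption: not named on this page
)

const healthCheckTimeout = 10 * time.Second

// checker is a hypothetical subset of LLMProvider.
type checker interface {
	Name() string
	HealthCheck(ctx context.Context) HealthStatus
}

// fake is a hypothetical provider with a fixed status. A real
// HealthCheck should honor ctx so the timeout actually bounds the wait.
type fake struct {
	name   string
	status HealthStatus
}

func (f fake) Name() string                             { return f.name }
func (f fake) HealthCheck(context.Context) HealthStatus { return f.status }

// pickFallback checks every provider except current in parallel and
// returns the first one, in candidate order, that is live or slow.
func pickFallback(current string, all []checker) (string, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), healthCheckTimeout)
	defer cancel()

	var candidates []checker
	for _, p := range all {
		if p.Name() != current {
			candidates = append(candidates, p)
		}
	}

	// Each goroutine writes only to its own slot, so no lock is needed.
	results := make([]HealthStatus, len(candidates))
	var wg sync.WaitGroup
	for i, p := range candidates {
		wg.Add(1)
		go func(i int, p checker) {
			defer wg.Done()
			results[i] = p.HealthCheck(ctx)
		}(i, p)
	}
	wg.Wait()

	for i, st := range results {
		if st == Live || st == Slow {
			return candidates[i].Name(), true
		}
	}
	return "", false
}

func main() {
	providers := []checker{fake{"openrouter", Down}, fake{"zen", Slow}}
	fmt.Println(pickFallback("openrouter", providers))
}
```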

Retry-After Awareness

FindFallbackWithRetryAfter() handles HTTP 429 responses:

  • Extracts Retry-After header value
  • Caps wait at 120 seconds (MaxRetryAfterWait)
  • Returns FallbackAfterWait struct for async scheduling (avoids blocking the Bubble Tea event loop)
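
As a rough illustration of the parsing and capping steps, the sketch below reads a Retry-After value and clamps it to MaxRetryAfterWait. `parseRetryAfter` and `retryAfterSeconds` are hypothetical helpers. Support for the HTTP-date form of the header is an assumption, since this page does not say which forms M31A accepts.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// MaxRetryAfterWait is named on this page; the value follows the list above.
const MaxRetryAfterWait = 120 * time.Second

// parseRetryAfter (hypothetical) converts a Retry-After header into a wait
// duration. It accepts delay-seconds and, as an assumption, an HTTP-date.
// The second result is false when the header is missing or unparseable.
func parseRetryAfter(header string, now time.Time) (time.Duration, bool) {
	header = strings.TrimSpace(header)
	if header == "" {
		return 0, false
	}
	var wait time.Duration
	if secs, err := strconv.Atoi(header); err == nil {
		wait = time.Duration(secs) * time.Second
	} else if t, err := http.ParseTime(header); err == nil {
		wait = t.Sub(now)
	} else {
		return 0, false
	}
	if wait < 0 {
		wait = 0 // a date in the past means retry immediately
	}
	if wait > MaxRetryAfterWait {
		wait = MaxRetryAfterWait
	}
	return wait, true
}

// retryAfterSeconds is a convenience wrapper that uses the current time.
func retryAfterSeconds(header string) (int, bool) {
	wait, ok := parseRetryAfter(header, time.Now())
	return int(wait / time.Second), ok
}

func main() {
	for _, h := range []string{"30", "600", "soon"} {
		secs, ok := retryAfterSeconds(h)
		fmt.Printf("%q -> %d seconds (ok=%v)\n", h, secs, ok)
	}
}
```

Returning the duration instead of sleeping lets the caller schedule the retry asynchronously, which matches the non-blocking design described above.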

Fallback Event

type FallbackEvent struct {
    From   string `json:"from"`
    To     string `json:"to"`
    Reason string `json:"reason"` // "fallback_live", "fallback_slow", "rate_limited"
}

SSE Streaming

Source: internal/provider/sse.go

Server-Sent Events parser for streaming LLM responses:

  • Parses data: [DONE] sentinel for stream completion
  • Handles delta chunks for text content
  • Handles tool_call chunks for native tool invocation
  • Detects truncated streams (ErrStreamTruncated)
  • Enforces MaxLLMResponseBytes (1 MB) to prevent OOM
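
A minimal sketch of the sentinel, truncation, and size-limit checks follows. It only extracts raw `data:` payloads and does not decode delta or tool_call chunks. `readSSE`, `parseString`, and `ErrResponseTooLarge` are hypothetical names; `ErrStreamTruncated` and `MaxLLMResponseBytes` are taken from the list above.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// MaxLLMResponseBytes is 1 MB, as stated above.
const MaxLLMResponseBytes = 1 << 20

var (
	ErrStreamTruncated = errors.New("stream ended before [DONE]")
	// ErrResponseTooLarge is a hypothetical error name for this sketch.
	ErrResponseTooLarge = errors.New("response exceeds MaxLLMResponseBytes")
)

// readSSE collects the payload of every "data:" line until the [DONE]
// sentinel. It reports ErrStreamTruncated if the stream ends without it.
func readSSE(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	// Allow single lines up to the response limit (the default is 64 KB).
	sc.Buffer(make([]byte, 0, 64*1024), MaxLLMResponseBytes)

	var payloads []string
	total := 0
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank separators, comments, and other fields
		}
		data := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if data == "[DONE]" {
			return payloads, nil
		}
		total += len(data)
		if total > MaxLLMResponseBytes {
			return payloads, ErrResponseTooLarge
		}
		payloads = append(payloads, data)
	}
	if err := sc.Err(); err != nil {
		return payloads, err
	}
	return payloads, ErrStreamTruncated
}

// parseString is a convenience wrapper for string input.
func parseString(s string) ([]string, error) {
	return readSSE(strings.NewReader(s))
}

func main() {
	chunks, err := parseString("data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n")
	fmt.Println(len(chunks), err)

	_, err = parseString("data: {\"delta\":\"Hel\"}\n")
	fmt.Println(errors.Is(err, ErrStreamTruncated))
}
```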

Stream Iterator

type StreamIterator struct {
    Next  func() (*StreamChunk, error)
    Close func() error
}

The workflow engine's consumeStreamWithTools() reads chunks and:

  1. Accumulates text deltas into a strings.Builder
  2. Collects tool_call chunks by index into toolCallBuilder structs
  3. Finalizes into ToolCall objects with parsed JSON arguments
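
The accumulation steps for tool calls can be sketched as follows. The `toolCallDelta` shape, the `ToolCall` fields, and the `assemble` function are assumptions made for illustration; only the `toolCallBuilder` name and the index-based grouping come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// toolCallDelta is a hypothetical shape for one streamed fragment.
type toolCallDelta struct {
	Index        int
	ID           string
	Name         string
	ArgsFragment string
}

// toolCallBuilder accumulates the fragments of one tool call.
type toolCallBuilder struct {
	id, name string
	args     strings.Builder
}

// ToolCall is a hypothetical finalized call with parsed arguments.
type ToolCall struct {
	ID        string
	Name      string
	Arguments map[string]any
}

// assemble groups fragments by index, concatenates the argument text,
// and parses each result as a JSON object.
func assemble(deltas []toolCallDelta) ([]ToolCall, error) {
	builders := map[int]*toolCallBuilder{}
	for _, d := range deltas {
		b, ok := builders[d.Index]
		if !ok {
			b = &toolCallBuilder{}
			builders[d.Index] = b
		}
		if d.ID != "" {
			b.id = d.ID
		}
		if d.Name != "" {
			b.name = d.Name
		}
		b.args.WriteString(d.ArgsFragment)
	}

	// Emit calls in index order so the result is deterministic.
	indexes := make([]int, 0, len(builders))
	for i := range builders {
		indexes = append(indexes, i)
	}
	sort.Ints(indexes)

	calls := make([]ToolCall, 0, len(indexes))
	for _, i := range indexes {
		b := builders[i]
		raw := b.args.String()
		if raw == "" {
			raw = "{}" // assumption: a call without arguments is valid
		}
		var args map[string]any
		if err := json.Unmarshal([]byte(raw), &args); err != nil {
			return nil, fmt.Errorf("tool call %d: %w", i, err)
		}
		calls = append(calls, ToolCall{ID: b.id, Name: b.name, Arguments: args})
	}
	return calls, nil
}

func main() {
	calls, err := assemble([]toolCallDelta{
		{Index: 0, ID: "call_1", Name: "read_file", ArgsFragment: `{"pa`},
		{Index: 0, ArgsFragment: `th":"main.go"}`},
	})
	fmt.Println(calls, err)
}
```

Arguments are parsed only after the stream ends because a single fragment is rarely valid JSON on its own.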

Provider Registration

Source: internal/tui/provider_registration.go

Provider registration happens at startup in main.go:

registry := provider.NewRegistry()
// Register each provider only when its API key is configured.
if key := cfg.Provider.OpenRouter.APIKey; key != "" {
    RegisterProvider(registry, cfg, "openrouter", key, version)
}
if key := cfg.Provider.Zen.APIKey; key != "" {
    RegisterProvider(registry, cfg, "zen", key, version)
}
// Activate the configured default provider, if any.
if cfg.Provider.Default != "" {
    registry.SetActive(cfg.Provider.Default)
}

Reasoning Support

Source: internal/provider/reasoning.go

Extended thinking / reasoning mode support:

  • Detects models that support reasoning via capability flags
  • ReasoningEnabled flag in ChatRequest
  • Thinking duration tracked in StreamChunk.ThinkingDuration
  • TUI renders thinking blocks with configurable opacity

Common Utilities

Source: internal/provider/common.go

Shared helpers:

  • HTTP response body reading with size limits
  • Error sanitization (truncates messages to MaxProviderErrorChars = 200 characters)
  • Retry-After header parsing
  • Rate limit detection (HTTP 429)
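
Two of these helpers are sketched below: a size-limited body read and error truncation. Both function names (`readBody`, `sanitizeError`) are hypothetical, and the sketch assumes the 200-character cap counts runes rather than bytes. Any further sanitization that common.go performs is not shown.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// MaxProviderErrorChars is named on this page; the value follows the list above.
const MaxProviderErrorChars = 200

// readBody (hypothetical) reads at most max bytes and returns an error
// when the body is larger. Reading max+1 bytes makes an oversized body detectable.
func readBody(r io.Reader, max int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > max {
		return nil, fmt.Errorf("response body exceeds %d bytes", max)
	}
	return data, nil
}

// sanitizeError (hypothetical) trims whitespace and caps the message length.
// Assumption: the cap counts runes, so multi-byte characters are not split.
func sanitizeError(msg string) string {
	runes := []rune(strings.TrimSpace(msg))
	if len(runes) > MaxProviderErrorChars {
		runes = runes[:MaxProviderErrorChars]
	}
	return string(runes)
}

func main() {
	body, err := readBody(strings.NewReader("ok"), 16)
	fmt.Println(string(body), err)

	_, err = readBody(strings.NewReader(strings.Repeat("x", 32)), 16)
	fmt.Println(err)

	fmt.Println(len(sanitizeError(strings.Repeat("e", 500))))
}
```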
