NVIDIA NIM Provider

Eshan Roy edited this page Jun 18, 2026 · 1 revision

M31 Autonomous (M31A) supports NVIDIA NIM (Inference Microservices) as a third LLM provider alongside OpenRouter and Zen.

Overview

Source: internal/provider/nvidia/client.go

NVIDIA NIM provides access to NVIDIA's hosted AI models via an OpenAI-compatible API. The integration includes:

  • Full LLMProvider interface implementation
  • Completion-only model filtering (skips non-chat models)
  • Retry logic with exponential backoff
  • Health check with latency classification
  • First-run wizard and settings UI integration

Configuration

API Key

The NVIDIA API key is resolved in this order:

  1. Environment variable: M31A_NVIDIA_API_KEY
  2. Standard fallback: NVIDIA_API_KEY
  3. OS keychain: m31a/nvidia
  4. Config file: provider.nvidia.api_key
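
The lookup order above can be sketched as a first-non-empty-wins chain. This is a minimal illustration, not the code in internal/config/loader.go: the function name resolveAPIKey, its injected getenv and keychain parameters, and the returned source label are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey returns the first non-empty key in the documented order:
// M31A_NVIDIA_API_KEY, NVIDIA_API_KEY, OS keychain entry m31a/nvidia,
// then the config file value. getenv and keychain are injected so the
// chain can be exercised without touching the real environment; both
// parameters and the "source" label are assumptions made for this sketch.
func resolveAPIKey(
	getenv func(string) string,
	keychain func(string) (string, error),
	configKey string,
) (string, string) {
	for _, name := range []string{"M31A_NVIDIA_API_KEY", "NVIDIA_API_KEY"} {
		if v := getenv(name); v != "" {
			return v, "env:" + name
		}
	}
	if keychain != nil {
		// A keychain error is treated as "no entry" and the chain continues.
		if v, err := keychain("m31a/nvidia"); err == nil && v != "" {
			return v, "keychain"
		}
	}
	if configKey != "" {
		return configKey, "config"
	}
	return "", ""
}

func main() {
	noKeychain := func(string) (string, error) {
		return "", fmt.Errorf("no keychain entry")
	}
	key, source := resolveAPIKey(os.Getenv, noKeychain, "nvapi-from-config")
	fmt.Printf("resolved key from %s (%d chars)\n", source, len(key))
}
```

Returning the source alongside the key is only a convenience for debugging which level supplied the value.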

Config File

[provider]
default = "nvidia"

[provider.nvidia]
api_key = "nvapi-..."
nvidia_base_url = "https://integrate.api.nvidia.com/v1"

Environment Variables

Variable               Description
M31A_NVIDIA_API_KEY    Primary API key variable
NVIDIA_API_KEY         Fallback API key variable

Features

Model Catalog

Fetches available models from the NVIDIA NIM API. Completion-only models (e.g., codellama, starcoder) are automatically filtered out of the chat UI.
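
How the client recognizes a completion-only model is not described here. The sketch below assumes a simple substring match against a marker list; the list, the function names, and the example model IDs are hypothetical, and only the two marker strings come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// completionOnlyMarkers is a hypothetical marker list. The two entries are
// the examples named in the text, not the client's actual rule set.
var completionOnlyMarkers = []string{"codellama", "starcoder"}

// isCompletionOnly reports whether a model ID contains one of the markers,
// ignoring case.
func isCompletionOnly(modelID string) bool {
	id := strings.ToLower(modelID)
	for _, m := range completionOnlyMarkers {
		if strings.Contains(id, m) {
			return true
		}
	}
	return false
}

// chatModels returns the IDs that should remain visible in the chat UI,
// preserving their original order.
func chatModels(ids []string) []string {
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if !isCompletionOnly(id) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Placeholder IDs for illustration only.
	ids := []string{"example/chat-model", "example/codellama-7b", "example/starcoder-3b"}
	fmt.Println(chatModels(ids))
}
```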

Streaming

Standard SSE streaming compatible with the OpenAI chat completions format. Supports text deltas, tool call chunks, and usage tracking.
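
As a rough illustration of that format, the following sketch reads an OpenAI-style SSE stream, concatenates the text deltas, and keeps the last reported token total. It deliberately ignores tool call chunks, and the type and function names (chunk, readStream, parseSSE) are hypothetical rather than taken from client.go.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk mirrors only the fields of an OpenAI-style stream chunk that this
// sketch needs: text deltas and the optional usage object.
type chunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
	Usage *struct {
		TotalTokens int `json:"total_tokens"`
	} `json:"usage"`
}

// readStream collects text deltas until "data: [DONE]" and returns the text
// plus the last total_tokens value seen (0 if none was reported).
// bufio.Scanner's default line limit applies, which is fine for a sketch.
func readStream(r io.Reader) (string, int, error) {
	var sb strings.Builder
	tokens := 0
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, and other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return sb.String(), tokens, fmt.Errorf("decode chunk: %w", err)
		}
		for _, ch := range c.Choices {
			sb.WriteString(ch.Delta.Content)
		}
		if c.Usage != nil {
			tokens = c.Usage.TotalTokens
		}
	}
	return sb.String(), tokens, sc.Err()
}

// parseSSE is a convenience wrapper for in-memory input.
func parseSSE(s string) (string, int, error) {
	return readStream(strings.NewReader(s))
}

func main() {
	// Hand-written sample stream, not captured API output.
	sample := strings.Join([]string{
		`data: {"choices":[{"delta":{"content":"Hel"}}]}`,
		`data: {"choices":[{"delta":{"content":"lo"}}]}`,
		`data: {"choices":[],"usage":{"total_tokens":7}}`,
		`data: [DONE]`,
	}, "\n\n")
	text, tokens, err := parseSSE(sample)
	fmt.Println(text, tokens, err)
}
```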

Retry Logic

Transient server errors (HTTP 500, 502, 503) trigger an automatic retry with exponential backoff, up to a maximum of 2 retries. Non-retryable errors (401 unauthorized, 402 payment required, 429 rate limited) are returned to the caller immediately.
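
A minimal sketch of that policy follows. The status codes and the retry limit come from the text; the base delay, the doubling factor, and the names doWithRetry, backoff, and noSleep are assumptions. Transport-level errors are returned without retrying here because the text only describes HTTP status handling.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// maxRetries follows the text: at most 2 retries after the first attempt.
const maxRetries = 2

// retryable reports whether a status is one of the transient errors named
// in the text (500, 502, 503). Everything else, including 401, 402, and
// 429, is returned to the caller immediately.
func retryable(status int) bool {
	switch status {
	case http.StatusInternalServerError, http.StatusBadGateway, http.StatusServiceUnavailable:
		return true
	}
	return false
}

// backoff doubles the base delay for each retry: base, 2*base, 4*base, ...
// The doubling factor is an assumption; the text only says "exponential".
func backoff(base time.Duration, retry int) time.Duration {
	return base << retry
}

// noSleep is a stand-in for time.Sleep when no real waiting is wanted.
func noSleep(time.Duration) {}

// doWithRetry runs attempt up to 1+maxRetries times and returns the last
// status. A non-nil error from attempt is returned without retrying.
func doWithRetry(attempt func() (int, error), base time.Duration, sleep func(time.Duration)) (int, error) {
	for try := 0; ; try++ {
		status, err := attempt()
		if err != nil || !retryable(status) || try == maxRetries {
			return status, err
		}
		sleep(backoff(base, try))
	}
}

func main() {
	// Simulated responses: two transient failures, then success.
	responses := []int{503, 502, 200}
	i := 0
	status, err := doWithRetry(func() (int, error) {
		s := responses[i]
		i++
		return s, nil
	}, 10*time.Millisecond, time.Sleep)
	fmt.Println(status, err, "after", i, "attempts")
}
```

Injecting the sleep function keeps the loop testable without real delays.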

Health Check

Health check latency is classified into three levels:

Classification    Latency                Meaning
Live              < 500 ms               Healthy
Slow              500 ms to 2 seconds    Degraded but functional
Degraded          > 2 seconds            May affect performance
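
The thresholds map naturally onto a small classification function. The sketch below is an illustration only: the Health type and classify function are hypothetical names, and placing a latency of exactly 2 seconds in the Degraded bucket is an assumption, since the table does not assign that boundary.

```go
package main

import (
	"fmt"
	"time"
)

// Health is a hypothetical type naming the three documented levels.
type Health string

const (
	Live     Health = "Live"
	Slow     Health = "Slow"
	Degraded Health = "Degraded"
)

// classify maps a measured latency onto the documented levels.
// Assumption: exactly 500 ms counts as Slow and exactly 2 s as Degraded.
func classify(latency time.Duration) Health {
	switch {
	case latency < 500*time.Millisecond:
		return Live
	case latency < 2*time.Second:
		return Slow
	default:
		return Degraded
	}
}

func main() {
	for _, d := range []time.Duration{120 * time.Millisecond, 900 * time.Millisecond, 3 * time.Second} {
		fmt.Printf("%v -> %s\n", d, classify(d))
	}
}
```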

Model Cache

Uses the shared provider model cache with TTL-based invalidation:

Setting      Default      Description
TTL          5 minutes    Fresh cache lifetime
Stale TTL    24 hours     Fallback cache lifetime
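
One plausible reading of the two lifetimes is a three-state cache entry, sketched below. The interpretation is an assumption: a fresh entry is served directly, a stale entry is used only as a fallback when a refetch fails, and anything older is discarded. The names cacheState and stateOf are hypothetical and do not come from the shared cache implementation.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults taken from the table above.
const (
	freshTTL = 5 * time.Minute
	staleTTL = 24 * time.Hour
)

// cacheState is a hypothetical classification of a cached model list by age.
type cacheState int

const (
	fresh   cacheState = iota // serve directly, no network call
	stale                     // refetch; fall back to the cached list on failure
	expired                   // too old to use even as a fallback
)

func (s cacheState) String() string {
	return [...]string{"fresh", "stale", "expired"}[s]
}

// stateOf classifies a cache entry by the time elapsed since it was stored.
func stateOf(age time.Duration) cacheState {
	switch {
	case age < freshTTL:
		return fresh
	case age < staleTTL:
		return stale
	default:
		return expired
	}
}

func main() {
	for _, age := range []time.Duration{time.Minute, time.Hour, 25 * time.Hour} {
		fmt.Printf("age %v -> %s\n", age, stateOf(age))
	}
}
```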

First-Run Setup

The first-run wizard includes NVIDIA NIM as a provider option with:

  • Provider-specific API key placeholder (nvapi-...)
  • NVIDIA key validation (checks nvapi- prefix)
  • Model list fetching and display
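
The prefix check mentioned in the list can be as small as the sketch below. The function name looksLikeNvidiaKey is hypothetical, and trimming surrounding whitespace before the check is an assumption about how pasted input might be handled. A passing check says nothing about whether the key is actually valid with the API.

```go
package main

import (
	"fmt"
	"strings"
)

// nvidiaKeyPrefix is the prefix named in the text.
const nvidiaKeyPrefix = "nvapi-"

// looksLikeNvidiaKey reports whether the input starts with "nvapi-" after
// trimming surrounding whitespace. It is a format hint only.
func looksLikeNvidiaKey(key string) bool {
	return strings.HasPrefix(strings.TrimSpace(key), nvidiaKeyPrefix)
}

func main() {
	for _, k := range []string{"nvapi-example", "sk-example", ""} {
		fmt.Printf("%q -> %v\n", k, looksLikeNvidiaKey(k))
	}
}
```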

Settings UI

The settings editor includes NVIDIA NIM in the 3-provider list:

  • Provider selection (OpenRouter, Zen, NVIDIA)
  • API key configuration with secure storage
  • Health check status display
  • Short name abbreviation: NVIDIA

Registration

Provider registration is centralized in internal/tui/provider_registration.go:

case "nvidia":
    return nvidia.New(apiKey, nvidia.Options{
        BaseURL:           cfg.Provider.NvidiaBaseURL,
        Version:           version,
        DefaultContextLen: types.DefaultContextLength,
    })

Source Files

File                                     Purpose
internal/provider/nvidia/client.go       NVIDIA NIM client implementation
internal/config/types.go                 NvidiaAPIKey, NvidiaBaseURL config fields
internal/config/loader.go                API key resolution from env/keychain/config
internal/tui/provider_registration.go    Centralized provider factory
internal/tui/firstrun_model.go           First-run wizard integration
internal/tui/settings_model.go           Settings UI integration