Model Providers

Jacob Centner edited this page Apr 10, 2026 · 3 revisions

Sentinel uses a pluggable model provider system for LLM operations (judge, semantic-drift, test-coherence, synthesis). Three providers are built in. Your choice of model determines which Capability Tiers to configure.

Provider comparison

| Provider | Local? | Cost | Setup complexity | Best for |
|---|---|---|---|---|
| Ollama | Yes | Free (hardware only) | Low | Default, privacy-sensitive repos |
| OpenAI-compatible | No | Per-token | Medium | Higher capability analysis |
| Azure | No | Per-token | Medium | Enterprise environments |

Ollama (default)

Runs models locally via Ollama. No data leaves your machine.

[sentinel]
provider = "ollama"
model = "qwen3.5:4b"
ollama_url = "http://localhost:11434"

Setup

ollama serve               # start the server first (skip if Ollama already runs as a background service)
ollama pull qwen3.5:4b     # in a second terminal: fetch the default model (~2.5 GB)

Recommended models

| Model | Size | Capability tier | Use case |
|---|---|---|---|
| qwen3.5:4b | ~2.5 GB | basic | Default: judge + basic detectors |
| qwen3.5:8b | ~5 GB | basic–standard | Better judgment quality |
| qwen3:14b | ~9 GB | standard | Synthesis + enhanced analysis |

OpenAI-compatible

Works with the OpenAI API and any compatible endpoint (LiteLLM, vLLM, etc.).

[sentinel]
provider = "openai"
model = "gpt-4o-mini"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"
model_capability = "standard"

Set the API key in your environment:

export OPENAI_API_KEY=sk-...
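
Note that api_key_env holds the name of an environment variable, not the key itself. The sketch below illustrates that indirection under stated assumptions: resolve_api_key is a hypothetical helper written for this page, not part of Sentinel, and Sentinel's actual lookup and error handling may differ.

```python
import os


def resolve_api_key(api_key_env: str) -> str:
    """Look up the API key in the environment variable named by api_key_env.

    Hypothetical helper for illustration; not Sentinel's real code.
    """
    key = os.environ.get(api_key_env, "").strip()
    if not key:
        # Fail early with the variable name so the misconfiguration is obvious.
        raise RuntimeError(f"environment variable {api_key_env!r} is not set or is empty")
    return key
```

With the config above, a call such as resolve_api_key("OPENAI_API_KEY") would return whatever value was exported in the shell, so the secret never has to appear in the config file.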

Azure

Works with Azure AI / Azure OpenAI endpoints.

[sentinel]
provider = "azure"
model = "gpt-5.4-nano"
api_base = "https://your-resource.services.ai.azure.com"
api_key_env = "AZURE_API_KEY"
model_capability = "standard"

Running without an LLM

Sentinel works without any LLM provider. Use:

sentinel scan /repo --skip-judge --skip-llm

Or in config:

[sentinel]
skip_judge = true
skip_llm = true

This runs only deterministic and heuristic detectors. The two fully LLM-dependent detectors (semantic-drift, test-coherence) produce no findings. The deterministic sub-checks of docs-drift (stale refs, dep drift) still run; only its optional LLM doc-code comparison is skipped.

Health checks

Sentinel checks provider health before using the provider. If the provider is unreachable, the LLM-assisted detectors and the judge step are skipped gracefully.

Run sentinel doctor to verify provider connectivity.
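
For a quick manual probe of a local Ollama server, something like the following can help. It is a minimal sketch and not how Sentinel itself performs the check: ollama_is_reachable is a hypothetical helper, and it assumes the default ollama_url from the config above. It queries /api/tags, the Ollama endpoint that lists locally available models.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url (hypothetical helper)."""
    try:
        # /api/tags lists local models; any 200 response means the server is up.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False
```

A False result here corresponds to the situation described above, where the LLM-assisted steps would be skipped.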

Retry logic

Cloud providers (OpenAI, Azure) automatically retry on:

  • Rate limits (429)
  • Server errors (500, 502, 503, 504)
  • Timeouts and connection errors

Retries use exponential backoff with 2 retries and respect Retry-After headers.
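
The retry behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not Sentinel's implementation: the names call_with_retries, backoff_delay, and send are hypothetical, the 1-second base delay is a guess, and Retry-After is assumed to be already parsed into seconds (the header may also carry an HTTP date, which this sketch does not handle).

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUS = {429, 500, 502, 503, 504}
MAX_RETRIES = 2  # matches the "2 retries" stated above


def backoff_delay(attempt: int, retry_after: Optional[float] = None, base: float = 1.0) -> float:
    """Seconds to wait before the next try; a server-provided Retry-After wins."""
    if retry_after is not None:
        return retry_after
    return base * (2 ** attempt)  # 1s, 2s, ... (assumed base delay)


def call_with_retries(
    send: Callable[[], Tuple[int, Optional[float], str]],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() up to 1 + MAX_RETRIES times.

    send() returns (status, retry_after_seconds_or_None, body).
    """
    for attempt in range(MAX_RETRIES + 1):
        try:
            status, retry_after, body = send()
        except (TimeoutError, ConnectionError):
            if attempt == MAX_RETRIES:
                raise  # retries exhausted: surface the network error
            sleep(backoff_delay(attempt))
            continue
        if status not in RETRYABLE_STATUS or attempt == MAX_RETRIES:
            return status, body  # success, non-retryable error, or retries exhausted
        sleep(backoff_delay(attempt, retry_after))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the sketch easy to test without real waiting.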
