🪖 trooper

A drop-in proxy that keeps your AI conversations alive.
When your cloud LLM quota runs out, trooper silently falls back to a local Ollama model — full conversation context intact. Your app notices nothing.

Works with Claude, OpenAI, Groq, Mistral, Together — any provider, any app.


How it works

Your App  →  http://localhost:3000  →  Claude / GPT / Groq ✅
                                    →  quota hit ⚡
                                    →  Ollama (local) 🪖 seamless fallback

Trooper is a zero-code-change drop-in — just point your base URL at trooper. One environment variable swap and you're protected.


Demo

Start trooper and watch the fallback happen in real time:

2026/04/21 08:24:10 🪖  Trooper proxy starting on http://localhost:3000
2026/04/21 08:24:10     Primary  : https://api.anthropic.com/v1/messages
2026/04/21 08:24:10     Fallback : http://localhost:11434/api/chat (qwen2.5:3b)
2026/04/21 08:24:48 📥 POST /v1/messages (stream=false)
2026/04/21 08:24:50 ⚠️  Primary 400 — falling back to local model
2026/04/21 08:24:50 🪖  Routing to local model: qwen2.5:3b

Full conversation context preserved — Ollama picks up exactly where Claude left off:

{
  "content": [{
    "text": "You just told me that your favorite food is pizza.",
    "type": "text"
  }],
  "model": "qwen2.5:3b",
  "id": "trooper-fallback"
}

The app never knew Claude went down. 🪖
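
You can reproduce this yourself: send a multi-turn conversation through trooper while the primary is out of quota. A minimal sketch with the Claude SDK (the model name matches the curl example under Usage; the exact reply wording will vary by model):

import anthropic

client = anthropic.Anthropic(
    api_key="your-key",
    base_url="http://localhost:3000",  # trooper, not the provider
)

# The full message history rides along with every request, so whichever
# backend answers (Claude or the local fallback) sees all prior turns.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "My favorite food is pizza."},
        {"role": "assistant", "content": "Noted! Pizza is a great choice."},
        {"role": "user", "content": "What did I just tell you?"},
    ],
)
print(response.content[0].text)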


Quickstart

Docker (recommended)

git clone https://github.com/shouvik12/trooper
cd trooper

cp .env.example .env
# edit .env — set PRIMARY_API_KEY and choose your provider

docker compose up

Local

# Prerequisites: Go 1.22+, Ollama running locally
ollama pull qwen2.5:3b
ollama serve

export PRIMARY_API_KEY=sk-ant-...
go run main.go

Trooper starts on http://localhost:3000.


Usage

Just change your base URL — nothing else:

Python + Claude SDK:

import anthropic

client = anthropic.Anthropic(
    api_key="your-key",
    base_url="http://localhost:3000",  # 👈 only change
)

Python + OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    api_key="your-key",
    base_url="http://localhost:3000",  # 👈 only change
)
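
Once the base URL points at trooper, every call goes through it unchanged. For example (the model name is illustrative; use whatever your primary provider serves):

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)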

curl:

curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $PRIMARY_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Configuration

Variable              Default                                   Description
PRIMARY_URL           https://api.anthropic.com/v1/messages    Your LLM provider endpoint
PRIMARY_API_KEY       (required)                                API key for the primary provider
PRIMARY_AUTH_HEADER   x-api-key                                 x-api-key for Claude; Authorization for OpenAI/Groq/others
FALLBACK_URL          http://localhost:11434/api/chat           Local Ollama endpoint
FALLBACK_MODEL        qwen2.5:3b                                Local model to fall back to
QUOTA_STATUS_CODES    429,402,529                               HTTP status codes that trigger fallback
TROOPER_PORT          3000                                      Port trooper listens on

Provider examples

Claude (default):

PRIMARY_URL=https://api.anthropic.com/v1/messages
PRIMARY_API_KEY=sk-ant-...
PRIMARY_AUTH_HEADER=x-api-key

OpenAI:

PRIMARY_URL=https://api.openai.com/v1/chat/completions
PRIMARY_API_KEY=sk-...
PRIMARY_AUTH_HEADER=Authorization

Groq:

PRIMARY_URL=https://api.groq.com/openai/v1/chat/completions
PRIMARY_API_KEY=gsk_...
PRIMARY_AUTH_HEADER=Authorization

Mistral:

PRIMARY_URL=https://api.mistral.ai/v1/chat/completions
PRIMARY_API_KEY=...
PRIMARY_AUTH_HEADER=Authorization

Fallback behavior

Primary status           Trooper action
200 OK                   Pass the response through
429 Too Many Requests    Fall back to the local model
402 Payment Required     Fall back to the local model
529 Overloaded           Fall back to the local model
401 Unauthorized         Return the error (a bad key is not masked)
Network error            Fall back to the local model
Any other error          Pass through as-is

Fallback responses include an X-Trooper-Fallback: <model> header, so you can detect them if needed.
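
For example, a quick check with Python's requests library (the payload mirrors the curl example above):

import requests

resp = requests.post(
    "http://localhost:3000/v1/messages",
    headers={"Content-Type": "application/json", "x-api-key": "your-key"},
    json={
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

# The header is only set when trooper served the request from the local model.
fallback_model = resp.headers.get("X-Trooper-Fallback")
if fallback_model:
    print(f"Served by local fallback: {fallback_model}")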


Recommended local models

Model         Size    Quality                  Pull command
qwen2.5:3b    1.9GB   Fast, lightweight        ollama pull qwen2.5:3b
llama3.1:8b   4.7GB   Best all-rounder         ollama pull llama3.1:8b
mistral:7b    4.1GB   Strong reasoning         ollama pull mistral:7b
gemma2:9b     5.5GB   Google's best mid-size   ollama pull gemma2:9b

Features

  • ✅ Works with any LLM provider — Claude, GPT, Groq, Mistral, Together
  • ✅ Zero code changes in your app — just redirect the base URL
  • ✅ Full conversation context preserved across the switch
  • ✅ Streaming support — Ollama responses re-emitted as SSE (see the sketch after this list)
  • ✅ Configurable fallback trigger codes
  • ✅ Single Go binary — tiny Docker image (~10MB)
  • ✅ 401 errors surface properly — bad keys aren't silently masked
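
A minimal streaming sketch with the Claude SDK, assuming the Claude defaults from the Configuration section. The stream arrives as SSE either way; on a quota error the events come from the local model instead of the primary:

import anthropic

client = anthropic.Anthropic(
    api_key="your-key",
    base_url="http://localhost:3000",
)

# Tokens print as they arrive. On a quota error, trooper re-emits
# the local model's reply over the same SSE interface, so this loop
# needs no changes.
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Tell me a joke."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)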

Roadmap

  • Hand back to primary when quota resets
  • Metrics endpoint — see fallback frequency
  • Multiple fallback models with priority order
  • Web UI for live routing visibility
  • LM Studio support
  • MCP server integration: native Model Context Protocol support to connect Claude directly to your agentic workflows

License

MIT
