Drop-in replacement for OpenAI & Claude APIs. Lower cost. Lower latency. No code changes.
Just swap one URL:
```python
import openai

client = openai.OpenAI(
    api_key="your-aibadgr-key",
    base_url="https://aibadgr.com/v1",  # ← this
)
```

That's it. Everything else stays the same.
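If you want to switch endpoints without touching code, the same swap can be driven by environment variables. This is a minimal sketch, not part of AI Badgr itself: `client_kwargs` and the `LLM_BASE_URL` override are hypothetical names, while `BADGR_API_KEY` is the variable used in the verification step of this README.

```python
import os

# Default endpoint, taken from the snippet above.
AIBADGR_BASE_URL = "https://aibadgr.com/v1"


def client_kwargs(env=None):
    """Build keyword arguments for openai.OpenAI() from environment variables.

    LLM_BASE_URL is a hypothetical override for pointing the same code at a
    different endpoint; it is an assumption of this sketch, not an AI Badgr feature.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env["BADGR_API_KEY"],
        "base_url": env.get("LLM_BASE_URL", AIBADGR_BASE_URL),
    }
```

With the `openai` package installed, `client = openai.OpenAI(**client_kwargs())` then builds the client from whatever the environment specifies.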
LLM API bills compound fast. At scale, 429s and retry storms make it worse. AI Badgr runs on community GPU workers — giving you a cheaper, faster endpoint that's compatible with every OpenAI/Claude client you're already using.
Real numbers, not estimates. Upload your OpenAI or Anthropic usage export at aibadgr.com/bill-calculator and see your exact savings replayed against your real workload.
1. Get your API key → aibadgr.com/signup
2. Swap the base URL
| Provider | Old | New |
|---|---|---|
| OpenAI | api.openai.com | aibadgr.com |
| Claude | api.anthropic.com | aibadgr.com |
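If your base URLs live in config files rather than code, the mapping in the table can be applied programmatically. The sketch below is an illustration only: `HOST_MAP` and `swap_base_url` are hypothetical names, and the function replaces only the host, leaving the scheme and path (such as `/v1`) as they were.

```python
from urllib.parse import urlsplit, urlunsplit

# Old host -> new host, copied from the table above.
HOST_MAP = {
    "api.openai.com": "aibadgr.com",
    "api.anthropic.com": "aibadgr.com",
}


def swap_base_url(url):
    """Return url with a known provider host replaced; other URLs pass through."""
    parts = urlsplit(url)
    new_host = HOST_MAP.get(parts.hostname)
    if new_host is None:
        return url  # not a provider URL we know about
    # Preserve an explicit port if one was given.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, `swap_base_url("https://api.openai.com/v1")` returns `"https://aibadgr.com/v1"`, matching the `base_url` shown at the top of this README.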
3. Verify your savings in 60 seconds
```bash
export OPENAI_API_KEY="sk-..."
export BADGR_API_KEY="your-aibadgr-key"
python examples/badgr-verify-demo.sh
```

Output:

```
OpenAI: ttfb=410ms total=2.9s in=312 out=128 est_cost=$0.0009
Badgr: ttfb=180ms total=2.1s in=312 out=128 est_cost=$0.0004
Diff: -230ms ttfb, -0.8s total, -56% cost
```
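The `Diff` line follows directly from the two rows above it. Here is a small sketch of that arithmetic using the sample numbers from the output; the `compare` function and dictionary keys are illustrative names, not the demo script's actual code.

```python
def compare(baseline, candidate):
    """Return (ttfb delta in ms, total delta in s, cost change in percent)."""
    ttfb_ms = candidate["ttfb_ms"] - baseline["ttfb_ms"]
    total_s = candidate["total_s"] - baseline["total_s"]
    cost_pct = (candidate["cost"] - baseline["cost"]) / baseline["cost"] * 100
    return ttfb_ms, total_s, cost_pct


# Sample values copied from the output above.
openai_run = {"ttfb_ms": 410, "total_s": 2.9, "cost": 0.0009}
badgr_run = {"ttfb_ms": 180, "total_s": 2.1, "cost": 0.0004}

ttfb_ms, total_s, cost_pct = compare(openai_run, badgr_run)
print(f"Diff: {ttfb_ms:+d}ms ttfb, {total_s:+.1f}s total, {cost_pct:+.0f}% cost")
```

The cost change is measured relative to the baseline: (0.0004 - 0.0009) / 0.0009 is about -55.6%, which rounds to the -56% shown.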
| Library | Works? |
|---|---|
| OpenAI SDK (Python + JS) | ✅ |
| Anthropic SDK | ✅ |
| LangChain | ✅ |
| LlamaIndex | ✅ |
| LiteLLM | ✅ |
| Vercel AI SDK | ✅ |
| Haystack | ✅ |
| AutoGen / CrewAI | ✅ |
| Dify, RAGFlow, Open WebUI | ✅ |
See docs/examples.md for copy-paste snippets.
Per-token. No subscriptions. No minimums.
| Model | Input / 1M | Output / 1M |
|---|---|---|
| Llama 3.1 8B | $0.20 | $0.20 |
| Mistral Nemo | $0.30 | $0.60 |
| Llama 3.3 70B | $0.65 | $1.20 |
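Per-token pricing makes a request's cost easy to estimate from its token counts. The sketch below hard-codes the rates from the table; `PRICES_PER_1M` and `estimate_cost` are hypothetical names, and the dictionary keys are the table's display labels, not necessarily the model IDs the API expects.

```python
# (input, output) price in USD per 1M tokens, copied from the table above.
PRICES_PER_1M = {
    "Llama 3.1 8B": (0.20, 0.20),
    "Mistral Nemo": (0.30, 0.60),
    "Llama 3.3 70B": (0.65, 1.20),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request at the listed per-token rates."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, one million input tokens plus one million output tokens on Llama 3.1 8B comes to $0.20 + $0.20 = $0.40.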
```bash
curl -s https://aibadgr.com/tools/migrate-to-aibadgr.py | python - ./src
```

Preview first with `--dry-run`:

```bash
curl -O https://aibadgr.com/tools/migrate-to-aibadgr.py
python migrate-to-aibadgr.py ./src --dry-run
```

Run the worker on any CUDA-capable machine. While it's idle, AI Badgr automatically lists it on Vast.ai, RunPod, Thunder Compute, and 7 other marketplaces. When a job arrives, it instantly unlists. Zero configuration.
```bash
docker run -d --name aibadgr-worker --restart unless-stopped --gpus all \
  -e API_BASE_URL="https://aibadgr.com" \
  michaelmanleyx/aibadgr-worker:latest
```

The worker auto-detects GPU memory and picks the best model tier.
- Getting Started
- Concepts — architecture, tiers, encryption, incident radar
- Examples — LangChain, Anthropic, TypeScript, RAG
Found a bug, want an integration, or hitting 429s? Open an issue — the retry issue template is specifically for rate-limit pain. PRs welcome. See CONTRIBUTING.md.