Cut your Claude API bill by 60% with intelligent model routing.
A drop-in replacement for the Anthropic Python SDK that routes each request to the cheapest Claude model that can handle it — without sacrificing quality.
Most developers default to Sonnet — or worse, Opus — for every Claude API call. The price difference is dramatic:
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative cost |
|---|---|---|---|
| Haiku | $0.25 | $1.25 | 1× |
| Sonnet | $3.00 | $15.00 | 12× |
| Opus | $15.00 | $75.00 | 60× |
Most tasks — classification, lookups, simple Q&A, short summaries — work just as well on Haiku. The result is monthly bills that should be $80 turning into $400.
Nimer Optimizer analyzes each prompt — length, content type, code presence — and routes it to the cheapest model that can handle it. Same quality, fraction of the cost.
pip install nimerfrom nimer import OptimizedClaude
client = OptimizedClaude(
anthropic_api_key="sk-ant-...",
nimer_api_key="nm_...", # optional — enables dashboard logging
)
response = client.messages.create(
max_tokens=512,
messages=[{"role": "user", "content": "Translate 'good morning' to Arabic."}],
)Three lines of config. Automatic routing. Real savings.
Change one import:
- from anthropic import Anthropic
+ from nimer import OptimizedClaude as AnthropicEverything else — messages.create, system prompts, streaming, tool use, multi-modal content — works identically.
Deterministic, explainable rules. No AI deciding which AI to use.
| Condition | Routed to |
|---|---|
| Total input > 5,000 chars | Sonnet |
| Code request + last message > 500 chars | Opus |
| Last message < 200 chars | Haiku |
| Everything else | Sonnet |
You can:
- Override per call:
client.messages.create(model="claude-opus-4-7", auto_route=False, ...) - Tune thresholds: pass a custom
Routerinstance with different limits - Inspect decisions: every routing call is logged to your dashboard with the chosen model and estimated savings
If you set nimer_api_key, the SDK sends:
- Token counts (input + output)
- Which model was selected
- Estimated savings (USD)
- Timestamp
The SDK never sends:
- Your prompts
- Your responses
- Your Anthropic API key
This is enforced in code, not policy. Read nimer/logger.py — it's 80 lines.
| Plan | Price | Includes |
|---|---|---|
| Free | $0/mo | 1,000 requests/month, basic routing, 7-day analytics |
| Pro | $29/mo | 100K requests, advanced routing, 90-day analytics, budget alerts |
| Scale | $99/mo | Unlimited requests, custom routing rules, team features |
The SDK itself is open source and free forever. The paid tiers add the dashboard, analytics, and managed routing rules.
🚧 Alpha — public launch in 6 weeks.
- Core SDK with rule-based routing
- Cost & savings calculations
- Async metadata logging
- Multi-modal content support
- Backend API
- Dashboard UI
- Closed beta (week 5)
- Public launch (week 6)
Follow @bynimer for weekly progress updates, or join the waitlist at nimer.dev.
git clone https://github.com/nimer-dev/optimizer-sdk
cd optimizer-sdk
pip install -e ".[dev]"
pytestThe router is the most important piece — tests/test_router.py covers the routing rules.
Is this actually a drop-in replacement?
Yes. The client.messages.create(...) signature matches the Anthropic SDK. The only addition is auto_route=True, which defaults to on. Set it to False and pass model= to keep the original behavior.
What if your routing picks the wrong model?
Override on a per-call basis with auto_route=False, model="...". The default is optimized for cost; you keep full control when you need it.
How is this different from Helicone or Langfuse? Those are full observability platforms — they log everything and offer rich tracing. Nimer focuses on one thing: cost-aware routing for Claude. If you need full LLM observability, use those. If you want to cut your bill in half with three lines of code, use this.
Will this work with streaming, tool use, or vision? Yes. The SDK forwards everything to the underlying Anthropic client unchanged.
Why Claude only? Because narrow beats broad in v1. Once we're great at Claude, we'll consider OpenAI and Gemini.
Where are you based? Saudi Arabia. Building globally.
Built by Nimer — a service engineer who taught himself Python in two months, watched his Claude API bill climb past $400/month, and built a router to fix it. Now turning it into a product, in public.
⭐ Star this repo if you've ever overspent on Claude API.
MIT — see LICENSE.