TokenCost

Know exactly what you spend on AI — and automatically spend less.

A local proxy that sits between your tools and the LLM APIs. It tracks every token, calculates real costs, and optimizes requests on the fly. Works with Claude Code, VS Code, Claude Desktop, OpenClaw, and 15+ API providers (OpenAI, Groq, Mistral, DeepSeek, xAI, Perplexity, Cerebras, Together, Fireworks, Cohere, OpenRouter, Ollama, Amazon Bedrock, Google Gemini, and more).

Why

If you use Claude Code or VS Code AI tools heavily, you're probably spending $2–10/day without knowing exactly where it goes. This tool:

Shows you a real-time breakdown by tool, model, session, and task type
Automatically reduces your bill by 50–80% via prompt caching and smart model routing
Works locally — your prompts never leave your machine

What it does

📊 Live Dashboard

Full spend analytics at http://localhost:8082/dashboard:

Cost by day / week / month — trend chart with daily breakdown
By task type — Coding vs Shell vs Agent vs Planning vs Web
By model — Claude Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-4o, etc. (218 models total)
By source — Claude Code, Claude Desktop, VS Code Extensions, OpenClaw, GitHub Copilot (usage only), API providers
Cache analytics — hit rate, money saved vs. what you'd pay without caching
Session view — every conversation: tokens in/out, cost, tools used
RAW Logs — last 500 requests with full prompt preview
Health grade A–F — scores your efficiency and tells you what to fix

🍎 macOS Menu Bar Widget

Always-visible cost tracker in your menu bar. Click to see:

Today's spend + health grade
7-day trend chart with hover tooltips
Breakdown by task, model, cache, tools
Smart Routing optimizer stats
One-click sync from local logs

⚡ Automatic Optimizations

The proxy applies these silently on every request:

Optimization	What it does	Typical savings
Prompt Cache	Auto-tags large system prompts and user messages for caching	60–90% on repeat reads
Smart Routing	Routes simple requests to cheaper models (Fable→Haiku, Opus→Sonnet, etc.)	5–25× cheaper per request
Thinking Budget	Caps `budget_tokens` for extended thinking based on complexity	80–90% on thinking tokens
Message Trim	Removes old messages when context exceeds 50k tokens	Prevents runaway costs
Session Cap	Limits history to 40 messages per session	Prevents VS Code "prompt too long" errors
Deduplication	Returns cached response for identical requests within 5s	100% on accidental double-sends

Smart Routing scores each prompt 0–10 (length, keywords, code presence) and silently downgrades model when complexity is low. You get the same response, cheaper.

Install

Requirements: Python 3.9+ · macOS (Monterey 12+) or Windows 10/11

macOS

Step 1 — first time only:

cd ~ && git clone https://github.com/mr-beaver/tokencost && cd tokencost && bash onbording.sh

The setup script installs everything and adds a tokencost command to your terminal.

Step 2 — every time after (restart / update):

Open Terminal and type:

tokencost

First install adds this command automatically. Open a new terminal tab after step 1 for it to appear.

The setup script:

Creates a Python virtualenv and installs dependencies
Imports your full Claude usage history from local logs
Sets ANTHROPIC_BASE_URL=http://localhost:8082 in ~/.zshrc and macOS launchd
Adds tokencost alias for quick restarts
Starts the proxy and opens the dashboard

Windows

Step 1 — first time only (PowerShell):

cd $HOME; git clone https://github.com/mr-beaver/tokencost; cd tokencost; powershell -ExecutionPolicy Bypass -File onbording.ps1

Step 2 — every time after: double-click tokencost.bat in the repo folder, or run the same onbording.ps1 command again. Pick option 1 to start, 2 to disable.

The Windows setup script (onbording.ps1):

Creates a Python virtualenv and installs dependencies (requirements.txt)
Imports your full Claude usage history from local logs (%APPDATA%\Claude, %APPDATA%\Code, ~/.claude)
Sets ANTHROPIC_BASE_URL=http://localhost:8082 as a User environment variable (picked up by Claude Code, VS Code, Claude Desktop, and new terminals)
Registers autostart at logon — a Scheduled Task if you run it as Administrator (this also adds the unattended 5-minute auto-sync task), otherwise a no-admin launcher in your Startup folder
Starts the proxy and opens the dashboard

Auto-sync needs admin. Without elevation the proxy still autostarts and captures live traffic in real time; historical logs sync whenever you run setup or click Sync now in the dashboard. Run onbording.ps1 from an Administrator PowerShell if you want the unattended 5-minute sync as well.

The macOS menu-bar widget is not available on Windows — use the web dashboard at http://localhost:8082/dashboard.

Updating (Windows): open the dashboard, click the version badge, and copy the Update command — it's shell-agnostic (runs in cmd.exe or PowerShell), does git pull, and restarts the proxy non-interactively. Or just re-run tokencost.bat.

After install, all your Claude Code and VS Code requests flow through the proxy automatically.

macOS Menu Bar Widget (optional)

The pre-built app is included in the repo — onbording.sh installs and launches it automatically.

To install manually:

cp -R menubar/TokenCostBar.app ~/Applications/
open ~/Applications/TokenCostBar.app

Build from source (requires Xcode Command Line Tools):

cd menubar && bash build.sh
mv TokenCostBar.app ~/Applications/
open ~/Applications/TokenCostBar.app

Supported Sources

Local applications (auto-tracked from local logs)

Paths are resolved per-platform — macOS ~/Library/Application Support/..., Windows %APPDATA%\..., Linux ~/.config/....

Claude Code / Claude CLI — ~/.claude/projects/**/*.jsonl (all platforms)
Claude Desktop — local-agent-mode-sessions/**/*.jsonl under Claude (macOS ~/Library/Application Support/Claude, Windows %APPDATA%\Claude)
OpenClaw — ~/.openclaw/agents/**/*.jsonl
VS Code Extensions (Cline, Roo Code, Kilo Code, IBM Bob) — globalStorage/*/tasks/ui_messages.json under Code (macOS ~/Library/Application Support/Code, Windows %APPDATA%\Code)
GitHub Copilot — usage tracking from VS Code logs (model + requests, pricing unavailable)

Sync cadence: every 5 minutes on macOS (launchd) and on admin Windows (Scheduled Task); on non-admin Windows, on each setup run or via Sync now in the dashboard.

API Providers (real-time via proxy — set base URL)

Set the base URL for each provider you want to track:

export ANTHROPIC_BASE_URL=http://localhost:8082            # Claude (auto-set by onbording.sh)
export OPENAI_BASE_URL=http://localhost:8082/openai        # OpenAI
export GROQ_API_BASE=http://localhost:8082/groq            # Groq
export MISTRAL_API_BASE=http://localhost:8082/mistral      # Mistral
export DEEPSEEK_API_BASE=http://localhost:8082/deepseek    # DeepSeek

On Windows, ANTHROPIC_BASE_URL is set for you by onbording.ps1; set additional providers with PowerShell, e.g.:

[Environment]::SetEnvironmentVariable("OPENAI_BASE_URL", "http://localhost:8082/openai", "User")

Supported: Anthropic · OpenAI · Groq · Mistral · DeepSeek · xAI · Perplexity · Cerebras · Together · Fireworks · Cohere · OpenRouter · Ollama · Amazon Bedrock · Google Gemini

218 models in the pricing database.

How it works

Your tool (Claude Code / VS Code / etc.)
        ↓
  localhost:8082  ← proxy runs here
        ↓  scores complexity, applies optimizations, logs tokens+cost
  api.anthropic.com / api.openai.com / etc.
        ↓
  Response back through proxy → your tool

The proxy is transparent — your tools see it as the real API. Zero changes to your workflow.

Chaining behind another proxy

TokenCost forwards every Anthropic path transparently (/v1/* and /api/oauth/*), so it can sit downstream of another Anthropic proxy (e.g. a context-compression layer) that points its upstream at TokenCost:

Your tool → other proxy (compress) → localhost:8082 (TokenCost logs cost) → api.anthropic.com

Point the upstream proxy's Anthropic target at http://localhost:8082 (subscription-usage polls pass through untouched and aren't logged).

Smart Routing details

When enabled (setup script → option 1), the proxy scores the last user message:

Score	Original model	Routes to	Savings
0–2	Fable 5	Haiku	~50× cheaper
0–2	Opus 4.8	Haiku	~25× cheaper
0–2	Sonnet 4.6	Haiku	~5× cheaper
3–5	Fable 5	Sonnet	~17× cheaper
3–5	Opus 4.8	Sonnet	~5× cheaper
6–10	any	unchanged	—

Simple questions (what is X, explain Y), short messages, and tool-chain intermediates score 0–2. Long coding tasks with keywords like implement, refactor, debug score 6+.

Supported Claude models: Fable 5 ($10/$50M), Opus 4.8 ($5/$25M), Sonnet 4.6 ($3/$15M), Haiku 4.5 ($1/$5M) — plus legacy 3.x versions.

Fable 5 specifics: Fable is only downrouted for very simple requests (score ≤2). For complex tasks (score 3+), Fable is preserved to take advantage of its reasoning capabilities. This means expensive long-horizon tasks stay on Fable, but throwaway pings and quick questions automatically use Haiku.

Does switching models break the cache? No. The cache key is based on request content, not the model — so when the proxy routes Opus → Haiku, the cheaper model still gets the 90% cache discount. The only nuance: different models have different input prices, so the absolute dollar value of the discount varies. But Haiku's base price is so much lower that the combined savings (cheaper model + cache discount) always exceed staying on the original model with cache alone. High-complexity requests that benefit most from caching (score 6–10) are never routed away from the original model.

Stats API

curl http://localhost:8082/stats?period=today   # today's stats (JSON)
curl http://localhost:8082/stats?period=7d      # last 7 days
curl http://localhost:8082/stats?period=30d     # last 30 days
curl http://localhost:8082/raw-logs?limit=100   # recent requests

Data & Privacy

Everything stored locally in tracker.db (SQLite)
Nothing is sent to any external service
The proxy only reads your token usage metadata — not the content of your conversations (prompt previews are stored locally, never transmitted)
Stop the proxy anytime via the setup script's option 2 (onbording.sh on macOS, tokencost.bat on Windows)

Stop / Uninstall

macOS:

bash onbording.sh   # choose option 2 — stops proxy, removes env vars

Windows: run tokencost.bat (or onbording.ps1) and choose option 2 — stops the proxy, removes the autostart entry (Scheduled Task / Startup launcher), and clears ANTHROPIC_BASE_URL.

⚠️ Don't kill the proxy process manually if you're using Claude Code — Claude routes through it and the call will fail. Always use the setup script's option 2.

Project structure

tokencost/
├── proxy.py           — FastAPI proxy, port 8082
├── optimizer.py       — Request optimizations (cache, routing, trim)
├── db.py              — SQLite schema, cost calculations, analytics
├── import_history.py  — Import historical logs from Claude CLI, Desktop, OpenClaw
├── projects.py        — Project/session tracking
├── dashboard.html     — Web dashboard UI
├── onbording.sh       — Setup / start / stop script (macOS)
├── onbording.ps1      — Setup / start / stop script (Windows)
├── tokencost.bat      — Windows launcher (double-click)
├── requirements.txt   — Python dependencies
└── menubar/           — macOS SwiftUI menu bar app
    ├── build.sh
    └── Sources/TokenCostBar/
        ├── MenuBarView.swift
        └── StatsModel.swift

FAQ

Does switching models break the prompt cache?

No — and this is the most important thing to understand about how the optimizations interact.

Anthropic's prompt caching works at the model level: each model maintains its own isolated KV cache for a given prefix. Haiku and Opus have completely different architectures and incompatible internal representations — there is no cross-family KV sharing. When the proxy routes a request from Opus to Haiku, Haiku builds its own cache entry for that prefix. The next Haiku request with the same prefix reads from Haiku's cache at 0.1× input cost.

The one real cost: the first request after a model switch pays a cache write (1.25× input cost) rather than a cache read (0.1× input cost). If you were switching models constantly — every request — that write overhead would accumulate. But the routing logic prevents this:

tool_result turns with no user text → scored 10, never routed. Active tool chains stay on the original model.
Any request where the last 4 messages contain tool_use or tool_result blocks → +2 to score.
Complex keywords (implement, refactor, debug, code blocks, file extensions) → score 6–10, original model preserved.

In practice, Haiku handles isolated simple questions ("what is X", short Q&A), while Opus stays on multi-turn coding tasks. Both models warm up their own cache entries and stay on them. The write overhead on the first routing switch is a one-time cost; everything after that is cheap reads.

Net result: prompt cache savings + model routing savings compound rather than cancel. Verified in optimizer.py — has_recent_tool_results() and the tool_result type check in complexity_score() are the guards that make this work correctly.

How much can you save?

Benchmarks below use Opus 4.8 base pricing with a typical vibe-coding session mix: 35% simple Q&A (score 0–2 → Haiku), 25% medium refactors (score 3–5 → Sonnet), 40% complex tasks (score 6–10 → Opus). System prompt ~3k tokens reused across the session, 15% duplicate requests filtered.

Savings from $10k/month

Optimization	Saves	Rate	How
Prompt cache	$1,200–1,800	12–18%	System prompt reads at 0.1× cost
Smart routing	$1,600–2,400	16–24%	60% of requests → Haiku/Sonnet
Thinking budget	$300–600	3–6%	Caps `budget_tokens` by complexity
Dedup + trim	$100–200	1–2%	5s dedup window + 50k context trim
Total	$3,800–5,200	38–52%	Combined effect

Savings by spend level

 $30k ┤
 $25k ┤                                                    ◉ Max savings
 $20k ┤
 $15k ┤
 $10k ┤                                         ◉
  $5k ┤                              ◉
  $2k ┤               ◉
  $0k ┼────────────────────────────────────────────────────
       $1k           $5k          $10k        $20k        $50k
                          Monthly spend

At $10k/month: saves ~$3,800–5,200 → $45–62k/year.
At $50k/month: saves ~$19k–26k.

Usage mix assumptions

Score	Routes to	Mix
0–2 (explain, what is, short Q&A)	Haiku	35%
3–5 (medium refactors, review)	Sonnet	25%
6–10 (implement, debug, architect)	Opus	40%

Results vary by usage pattern. Heavy Opus users with long system prompts see the largest gains — prompt cache + smart routing compound. Minimal savings if you already route manually or use Haiku exclusively.

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TokenCost

Why

What it does

📊 Live Dashboard

🍎 macOS Menu Bar Widget

⚡ Automatic Optimizations

Install

macOS

Windows

macOS Menu Bar Widget (optional)

Supported Sources

Local applications (auto-tracked from local logs)

API Providers (real-time via proxy — set base URL)

How it works

Chaining behind another proxy

Smart Routing details

Stats API

Data & Privacy

Stop / Uninstall

Project structure

FAQ

Does switching models break the prompt cache?

How much can you save?

Savings from $10k/month

Savings by spend level

Usage mix assumptions

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
img		img
menubar		menubar
.gitignore		.gitignore
.mailmap		.mailmap
CLAUDE.md		CLAUDE.md
README.md		README.md
RELEASE.md		RELEASE.md
VERSION		VERSION
dashboard.html		dashboard.html
db.py		db.py
import_history.py		import_history.py
info.md		info.md
onbording.ps1		onbording.ps1
onbording.sh		onbording.sh
optimizer.py		optimizer.py
projects.py		projects.py
proxy.py		proxy.py
requirements.txt		requirements.txt
tokencost.bat		tokencost.bat

Folders and files

Latest commit

History

Repository files navigation

TokenCost

Why

What it does

📊 Live Dashboard

🍎 macOS Menu Bar Widget

⚡ Automatic Optimizations

Install

macOS

Windows

macOS Menu Bar Widget (optional)

Supported Sources

Local applications (auto-tracked from local logs)

API Providers (real-time via proxy — set base URL)

How it works

Chaining behind another proxy

Smart Routing details

Stats API

Data & Privacy

Stop / Uninstall

Project structure

FAQ

Does switching models break the prompt cache?

How much can you save?

Savings from $10k/month

Savings by spend level

Usage mix assumptions

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages