Know exactly what your AI features cost, per user, per endpoint, per day.
Note: TokenLedger is in active development (v0.x). The API is stabilizing but may have breaking changes before v1.0. Pin your version in requirements.
TokenLedger is a self-hosted LLM cost analytics solution that runs on your existing Postgres database. Zero external dependencies, complete data ownership, works with Supabase out of the box.
Every startup building AI features lacks cost attribution:
- π "Which users are costing us the most?" β No idea
- π― "What's our cost per feature?" β Can't tell you
- π "Which endpoint is burning through tokens?" β Who knows
Existing solutions (Helicone, LangSmith, Langfuse) are either:
- SaaS β Your data leaves your infrastructure
- Heavy β Require significant setup and infrastructure
- Expensive β Per-seat pricing adds up fast
TokenLedger is different:
- β Postgres-native β Works with your existing database (Supabase, Neon, RDS)
- β Self-hosted β Your data never leaves your infrastructure
- β Zero overhead β 2-line integration, async batching
- β Cost-aware β Automatic cost calculation with up-to-date pricing
pip install tokenledgerimport tokenledger
import openai
# Configure once
tokenledger.configure(database_url="postgresql://...")
tokenledger.patch_openai()
# That's it! All calls are now tracked
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)Every OpenAI call is now automatically logged to your Postgres database with:
- Token counts (input, output, cached)
- Cost in USD
- Latency
- Model used
- User ID (if provided)
- Full request/response metadata
Streaming calls are also automatically tracked:
# Streaming works seamlessly
for chunk in openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}],
stream=True,
stream_options={"include_usage": True} # Optional: get token counts
):
print(chunk.choices[0].delta.content or "", end="")
# Event is logged after stream completesimport tokenledger
import anthropic
tokenledger.configure(database_url="postgresql://...")
tokenledger.patch_anthropic()
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-5-latest",
messages=[{"role": "user", "content": "Hello!"}]
)import tokenledger
from google import genai
tokenledger.configure(database_url="postgresql://...")
tokenledger.patch_google()
client = genai.Client(api_key="...")
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Hello!"
)Know exactly who is spending money and which features are driving costs:
from tokenledger import attribution
# Context manager - all calls inside are attributed
with attribution(user_id="user_123", feature="summarize", team="ml"):
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Summarize this..."}]
)
# Decorator - attribute entire functions
@attribution(feature="chat", cost_center="CC-001")
def handle_chat(user_id: str, message: str):
with attribution(user_id=user_id): # Contexts nest and merge
return client.chat.completions.create(...)Query your costs by any dimension:
SELECT feature, team, SUM(cost_usd) as cost
FROM token_ledger_events
WHERE timestamp >= NOW() - INTERVAL '7 days'
GROUP BY feature, team
ORDER BY cost DESC;When using streaming/lazy responses (common with frameworks like pydantic-ai, LangChain),
the LLM API call may happen after the context manager exits. Use persistent=True mode:
from tokenledger import attribution, clear_attribution
# Problem: Context exits before stream is consumed
async with attribution(user_id="user123"):
response = await framework.stream(...) # Returns lazy response
# Context exits here!
async for chunk in response: # API call happens here, context is gone!
yield chunk
# Solution: Use persistent mode
async with attribution(user_id="user123", feature="chat", persistent=True):
response = await framework.stream(...)
async for chunk in response: # Context still active!
yield chunk
clear_attribution() # Explicitly clear when doneTokenLedger includes a beautiful React dashboard:
# Start with Docker
docker compose up
# Open http://localhost:3000Or run the API server standalone:
pip install tokenledger[server]
python -m tokenledger.serverimport tokenledger
tokenledger.configure(
# Database connection
database_url="postgresql://user:pass@localhost/db",
# App identification
app_name="my-app",
environment="production",
# Performance tuning
batch_size=100, # Events per batch write
flush_interval_seconds=5, # How often to flush
async_mode=True, # Background logging
# Sampling for high-volume apps
sample_rate=1.0, # 1.0 = log everything
)from tokenledger.queries import TokenLedgerQueries
queries = TokenLedgerQueries()
# Get cost summary
summary = queries.get_cost_summary(days=30)
print(f"Last 30 days: ${summary.total_cost:.2f}")
print(f"Total requests: {summary.total_requests}")
# Cost by model
models = queries.get_costs_by_model(days=30)
for m in models:
print(f"{m.model}: ${m.total_cost:.2f} ({m.total_requests} requests)")
# Cost by user
users = queries.get_costs_by_user(days=30)
for u in users[:5]:
print(f"{u.user_id}: ${u.total_cost:.2f}")
# Daily trends
daily = queries.get_daily_costs(days=7)
for d in daily:
print(f"{d.date}: ${d.total_cost:.2f}")-- Daily costs by model
SELECT
DATE(timestamp) as date,
model,
SUM(cost_usd) as total_cost,
COUNT(*) as requests
FROM token_ledger_events
WHERE timestamp >= NOW() - INTERVAL '30 days'
GROUP BY DATE(timestamp), model
ORDER BY date DESC, total_cost DESC;
-- Top 10 users by cost
SELECT
user_id,
SUM(cost_usd) as total_cost,
COUNT(*) as requests
FROM token_ledger_events
WHERE timestamp >= NOW() - INTERVAL '30 days'
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 10;
-- Projected monthly cost
SELECT
(SUM(cost_usd) / 7) * 30 as projected_monthly
FROM token_ledger_events
WHERE timestamp >= NOW() - INTERVAL '7 days';from fastapi import FastAPI
from tokenledger.middleware import FastAPIMiddleware
app = FastAPI()
app.add_middleware(FastAPIMiddleware)
# User ID from X-User-ID header is automatically trackedfrom flask import Flask
from tokenledger.middleware import TokenLedger
app = Flask(__name__)
TokenLedger(app)from tokenledger import track_cost
# Track manually if you need to
track_cost(
input_tokens=150,
output_tokens=500,
model="gpt-4o",
user_id="user_123",
)TokenLedger works perfectly with Supabase:
-
Get your connection string from Supabase Dashboard β Settings β Database
-
Run the migrations:
DATABASE_URL="postgresql://postgres:password@db.xxx.supabase.co:5432/postgres" tokenledger db init- Configure TokenLedger:
tokenledger.configure(
database_url="postgresql://postgres:password@db.xxx.supabase.co:5432/postgres"
)tokenledger/
βββ tokenledger/ # Python package
β βββ __init__.py # Main exports
β βββ config.py # Configuration
β βββ tracker.py # Core tracking logic
β βββ pricing.py # LLM pricing data
β βββ queries.py # Analytics queries
β βββ decorators.py # @track_llm decorator
β βββ middleware.py # FastAPI/Flask middleware
β βββ server.py # Dashboard API server
β βββ interceptors/ # SDK patches
β βββ openai.py
β βββ anthropic.py
β βββ google.py
βββ dashboard/ # React dashboard
βββ migrations/ # SQL migrations
βββ examples/ # Usage examples
TokenLedger includes up-to-date pricing (January 2026) for 74+ models across 3 providers:
| Model Family | Input/1M | Output/1M | Notes |
|---|---|---|---|
| GPT-5 (5.2, 5.1, 5, mini, nano) | $0.05-1.75 | $0.40-14.00 | Cached input support |
| GPT-5 Pro | $15.00 | $120.00 | Premium reasoning |
| GPT-4.1 (4.1, mini, nano) | $0.10-2.00 | $0.40-8.00 | 1M context window |
| GPT-4o (4o, 4o-mini) | $0.15-2.50 | $0.60-10.00 | 128K context |
| O-Series (o1, o3, o4-mini) | $1.10-20.00 | $4.40-80.00 | Reasoning models |
| Audio (Whisper, TTS) | $0.003-0.012/min | - | Per-minute billing |
| Images (DALL-E 3, GPT-Image) | $0.04-0.12/image | - | Per-image billing |
| Model Family | Input/1M | Output/1M | Notes |
|---|---|---|---|
| Claude 4.5 (Opus, Sonnet, Haiku) | $1.00-5.00 | $5-25 | Latest generation |
| Claude 4 (Opus, Sonnet) | $3.00-15.00 | $15-75 | Prompt caching |
| Claude 3.7 (Sonnet) | $3.00 | $15.00 | Prompt caching |
| Claude 3.5 (Sonnet, Haiku) | $0.80-3.00 | $4-15 | Prompt caching |
| Claude 3 (Opus, Sonnet, Haiku) | $0.25-15.00 | $1.25-75 | Legacy |
| Model Family | Input/1M | Output/1M | Notes |
|---|---|---|---|
| Gemini 3 (Pro, Flash preview) | $0.50-2.00 | $4-12 | Latest preview |
| Gemini 2.5 (Pro, Flash, Lite) | $0.10-1.25 | $0.40-10 | Production ready |
| Gemini 2.0 (Flash, Lite) | $0.075-0.10 | $0.30-0.40 | Fast inference |
- Mistral (pricing data included, interceptor planned)
- Custom/self-hosted models
# Clone the repo
git clone https://github.com/yourusername/tokenledger
cd tokenledger
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Start local development
docker compose up postgres
python -m tokenledger.server- Alerts & notifications (budget thresholds)
- Cost allocation tags (feature, team, project, cost_center)
- Team/project grouping via attribution context
- Google Gemini support
- OpenAI audio/image API tracking
- pydantic-ai framework compatibility
- OpenAI streaming support
- Anthropic streaming support
- Google streaming support
- Grafana integration
- CLI for querying
- More LLM providers (Mistral, Cohere)
- TimescaleDB optimization guide
TokenLedger is licensed under the Elastic License 2.0 (ELv2).
What this means:
- β Free to use β Use TokenLedger in your projects, even commercial ones
- β Modify freely β Fork it, extend it, make it yours
- β Self-host β Run it on your own infrastructure
- β No SaaS β You cannot offer TokenLedger as a hosted/managed service
This license protects the project while keeping it free for the community.
Contributions are welcome! Please read our Contributing Guide first.
Built with β€οΈ for the AI startup community