Know what your LLM calls actually cost.
tokonomix is a lightweight Python library for token counting and cost management across every major LLM provider. It replaces guesswork with exact numbers — so you can track spending, set budgets, and compare providers before committing to a model.
```python
from tokonomix import estimate_cost, calculate_cost, compare_models

# How much will this prompt cost?
est = estimate_cost("Explain quantum computing in simple terms", model="gpt-4o")
print(f"Input: {est.estimated_input_tokens} tokens, ${est.estimated_input_cost:.6f}")

# Track exact costs from API responses
usage = calculate_cost("claude-sonnet-4-20250514", input_tokens=1500, output_tokens=800)
print(f"Total: ${usage.total_cost:.6f}")

# Which model is cheapest for this prompt?
results = compare_models("Your long prompt here...", output_tokens=2000)
for r in results[:5]:
    print(f"  {r['model']:<30} ${r['total_cost']:.6f}")
```

Every team using LLM APIs has the same question: "How much is this costing us?"
Existing solutions are either unmaintained (tokencost hasn't been updated in 7 months), locked behind a signup wall, or buried inside massive frameworks. tokonomix is none of those things. It's a focused library that does one job well.
What you get:
- Accurate pricing for 40+ models across OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, and Cohere
- Proper handling of cached input tokens, thinking/reasoning tokens, and batch pricing
- Token counting via tiktoken (with a fallback estimator when tiktoken isn't installed)
- Cost tracking with decorators and context managers
- Budget management with threshold alerts
- Cross-provider comparison to find the cheapest model for any input
- A CLI for quick estimates without writing code
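
The pricing features listed above all reduce to per-million-token arithmetic. The sketch below is a hypothetical helper, not tokonomix's code: `call_cost` and its parameters are illustrative names. It assumes that cached tokens are a subset of the input tokens billed at a cheaper cached rate, that thinking/reasoning tokens are already counted in the output tokens and billed at the output rate, and that batch pricing is a flat fractional discount on the whole call. Providers differ on each of these points, so treat it as a model of the idea rather than of any provider's bill.

```python
from decimal import Decimal
from typing import Optional

PER_MILLION = Decimal(1_000_000)


def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_million: Decimal,
    output_per_million: Decimal,
    cached_tokens: int = 0,
    cached_input_per_million: Optional[Decimal] = None,
    batch_discount: Decimal = Decimal(0),
) -> Decimal:
    """Hypothetical cost formula for one call (not tokonomix's implementation)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    # Assumption: cached tokens are part of input_tokens and billed at the cached rate.
    cached_rate = input_per_million if cached_input_per_million is None else cached_input_per_million
    uncached = input_tokens - cached_tokens
    input_cost = (uncached * input_per_million + cached_tokens * cached_rate) / PER_MILLION
    # Assumption: reasoning/thinking tokens are already included in output_tokens.
    output_cost = output_tokens * output_per_million / PER_MILLION
    # Assumption: batch pricing is a flat fractional discount on the whole call.
    return (input_cost + output_cost) * (1 - batch_discount)


# Illustrative rates: $2.50/M input, $1.25/M cached input, $10.00/M output.
standard = call_cost(2_000, 400, Decimal("2.50"), Decimal("10.00"),
                     cached_tokens=1_000, cached_input_per_million=Decimal("1.25"))
batched = call_cost(2_000, 400, Decimal("2.50"), Decimal("10.00"),
                    cached_tokens=1_000, cached_input_per_million=Decimal("1.25"),
                    batch_discount=Decimal("0.5"))
print(standard)  # 0.00775
print(batched)   # 0.003875
```

Because every rate stays a `Decimal`, the results compare equal to the exact values `Decimal("0.00775")` and `Decimal("0.003875")` rather than to nearby floats.
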
```shell
pip install tokonomix
```

For token counting with tiktoken (recommended):

```shell
pip install "tokonomix[tiktoken]"
```

For the CLI:

```shell
pip install "tokonomix[cli]"
```

Everything:

```shell
pip install "tokonomix[all]"
```

```python
from tokonomix import count_tokens, count_message_tokens

count_tokens("Hello, world!", model="gpt-4o")
# 4

count_message_tokens(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "What is 2+2?"},
    ],
    model="gpt-4o",
)
# 18
```

```python
from tokonomix import estimate_cost

est = estimate_cost("Your prompt text here", model="claude-sonnet-4-20250514")
print(est.estimated_input_tokens)     # 5
print(est.estimated_input_cost)       # Decimal('0.000015')
print(est.estimated_max_output_cost)  # Decimal('0.240000')
```

```python
from tokonomix import calculate_cost

# After getting token counts from the API response:
usage = calculate_cost(
    model="gpt-4o",
    input_tokens=1500,
    output_tokens=800,
    cached_tokens=500,  # prompt caching discount
)
print(usage.total_cost)   # Decimal('0.010875')
print(usage.input_cost)   # Decimal('0.002875')
print(usage.output_cost)  # Decimal('0.008000')
```

```python
from tokonomix import CostTracker

with CostTracker() as tracker:
    # After each API call, record the usage:
    tracker.record("gpt-4o", input_tokens=500, output_tokens=200)
    tracker.record("gpt-4o", input_tokens=300, output_tokens=150)
    tracker.record("claude-sonnet-4-20250514", input_tokens=1000, output_tokens=500)

    print(tracker.total_cost)     # Decimal('0.013325')
    print(tracker.by_model())     # {'claude-sonnet-4-20250514': ..., 'gpt-4o': ...}
    print(tracker.by_provider())  # {'anthropic': ..., 'openai': ...}
    print(tracker.summary())
```

```python
import openai
from tokonomix import track_cost

@track_cost(model="gpt-4o")
def ask_gpt(prompt: str) -> dict:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return {
        "text": response.choices[0].message.content,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
    }

result = ask_gpt("What is Python?")
print(ask_gpt.get_last_usage().total_cost)
print(ask_gpt.get_total_cost())
```

```python
from tokonomix import Budget, BudgetExceededError, calculate_cost

budget = Budget(limit=5.00, period="daily")
budget.on_threshold(0.8, lambda b: print(f"Warning: {b.utilization:.0%} of daily budget used"))

# In your API call loop:
try:
    usage = calculate_cost("gpt-4o", input_tokens=1000, output_tokens=500)
    budget.record(usage.total_cost)
except BudgetExceededError:
    print("Daily budget exceeded, switching to cheaper model")
```

```python
from tokonomix import compare_models, cheapest_model, format_comparison

# Compare all models
results = compare_models("Your prompt text", output_tokens=1000)
print(format_comparison(results, top_n=10))

# Find the cheapest model with a context window of at least 128,000 tokens
model = cheapest_model("Your prompt text", min_context_window=128000)
print(f"Use {model.model_id}: ${model.input_per_million}/M input")
```

```python
from tokonomix import get_model, list_models, find_models, Provider

model = get_model("gpt-4o")
print(model.input_per_million)         # Decimal('2.50')
print(model.cached_input_per_million)  # Decimal('1.25')
print(model.context_window)            # 128000

# List all Anthropic models
for m in list_models(Provider.ANTHROPIC):
    print(f"{m.model_id}: ${m.input_per_million}/M in, ${m.output_per_million}/M out")

# Search models
for m in find_models("claude"):
    print(m.model_id)
```

```shell
# Estimate cost for a prompt
tokonomix estimate "What is the meaning of life?" -m gpt-4o

# Estimate from a file
tokonomix estimate @prompt.txt -m claude-sonnet-4-20250514

# Compare costs across all providers
tokonomix compare "Your prompt" -n 10

# Filter by provider
tokonomix compare "Your prompt" -p openai,anthropic

# List all models
tokonomix models

# List models for a specific provider
tokonomix models -p google

# Get detailed pricing
tokonomix price gpt-4.1

# Find the cheapest model
tokonomix cheapest "Your prompt" -c 128000
```

| Provider | Models | Cached Pricing | Thinking Tokens |
|---|---|---|---|
| OpenAI | GPT-4.1, GPT-4o, o1, o3, o4-mini, embeddings | Yes | Yes (o-series) |
| Anthropic | Claude Opus 4, Sonnet 4, 3.7/3.5 Sonnet, 3.5 Haiku | Yes | Yes (Opus 4) |
| Google | Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro/Flash | Yes | Yes (2.5 series) |
| Mistral | Large, Small, Codestral, Pixtral | No | No |
| DeepSeek | Chat (V3), Reasoner (R1) | Yes | Yes (Reasoner) |
| xAI | Grok-2, Grok-3, Grok-3 Mini | No | Yes (Mini) |
| Cohere | Command R+, Command R, embeddings | No | No |
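
Because every provider above quotes prices per million tokens, comparing them for a given input reduces to applying one cost formula per model and sorting. The sketch below illustrates that idea only; it is not how `compare_models` is implemented. `Price`, `PRICES`, and `rank_models` are hypothetical names, and the model names and rates are placeholders, not real provider pricing.

```python
from decimal import Decimal
from typing import NamedTuple


class Price(NamedTuple):
    input_per_million: Decimal
    output_per_million: Decimal
    context_window: int


# Placeholder names and rates for illustration only; not real provider pricing.
PRICES = {
    "model-a": Price(Decimal("2.50"), Decimal("10.00"), 128_000),
    "model-b": Price(Decimal("0.15"), Decimal("0.60"), 128_000),
    "model-c": Price(Decimal("0.10"), Decimal("0.40"), 32_000),
}


def rank_models(input_tokens: int, output_tokens: int,
                min_context_window: int = 0) -> list[tuple[str, Decimal]]:
    """Return (model, cost) pairs, cheapest first, skipping models with too small a context window."""
    ranked = []
    for name, price in PRICES.items():
        if price.context_window < min_context_window:
            continue
        cost = (input_tokens * price.input_per_million
                + output_tokens * price.output_per_million) / Decimal(1_000_000)
        ranked.append((name, cost))
    return sorted(ranked, key=lambda pair: pair[1])


for name, cost in rank_models(1_000, 1_000):
    print(f"{name:<10} ${cost:.6f}")
```

This sketch uses one token count for every model. A real comparison would count tokens per model, since providers tokenize the same text differently.
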
Prices are verified against official provider pricing pages. If you notice a discrepancy, please open an issue.
Model pricing changes frequently. When prices change:
- Update the relevant entries in `src/tokonomix/models.py`
- Run the tests to verify consistency
- Submit a PR
We aim to update prices within 48 hours of provider announcements.
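
The consistency checks themselves are not described here. As one hypothetical example, a pricing update could be guarded by a test along these lines. It assumes a dict-based table keyed by model ID (tokonomix's actual `models.py` layout may differ), and `PRICING`, `pricing_problems`, and the field names are illustrative. The "cached price must not exceed the base input price" rule assumes the cached field holds a cache-read price, since some providers charge more than the base rate to write to the cache.

```python
from decimal import Decimal

# Hypothetical table layout (USD per million tokens); tokonomix's models.py may differ.
PRICING = {
    "example-large": {"input": Decimal("2.50"), "cached_input": Decimal("1.25"), "output": Decimal("10.00")},
    "example-small": {"input": Decimal("0.15"), "cached_input": None, "output": Decimal("0.60")},
}


def pricing_problems(table: dict) -> list[str]:
    """Return human-readable problems found in a pricing table (empty list if consistent)."""
    problems = []
    for model_id, price in table.items():
        for field in ("input", "output"):
            value = price.get(field)
            if not isinstance(value, Decimal):
                problems.append(f"{model_id}: {field} must be a Decimal, got {type(value).__name__}")
            elif value <= 0:
                problems.append(f"{model_id}: {field} must be positive")
        cached = price.get("cached_input")
        if cached is None:
            continue  # this model has no cached-input price
        if not isinstance(cached, Decimal) or cached < 0:
            problems.append(f"{model_id}: cached_input must be a non-negative Decimal")
        elif isinstance(price.get("input"), Decimal) and cached > price["input"]:
            problems.append(f"{model_id}: cached_input is higher than input")
    return problems


def test_pricing_table_is_consistent() -> None:
    # pytest collects this function; on failure the assert shows the list of problems.
    assert pricing_problems(PRICING) == []
```
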
- Token counting uses tiktoken for accurate BPE tokenization of OpenAI models. For non-OpenAI models, which use their own tokenizers, tiktoken's `o200k_base` encoding provides a reasonable approximation. If tiktoken isn't installed, a word-based heuristic kicks in.
- Pricing is stored as `Decimal` values to avoid floating-point rounding issues: $2.50 per million tokens is exactly $0.0000025 per token, not $0.0000024999999999.
- The tracker is thread-safe and uses monotonic timestamps for period-based budgets.
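
The last two notes can be illustrated together. Below is a minimal sketch, not tokonomix's tracker, of a running cost total that keeps amounts as `Decimal`, serializes updates with a `threading.Lock`, and measures its period with `time.monotonic`, a clock that system clock adjustments cannot move backwards. The class name `PeriodTotal` and its injectable `clock` parameter are hypothetical.

```python
import threading
import time
from decimal import Decimal
from typing import Callable


class PeriodTotal:
    """Hypothetical thread-safe running cost total for one budget period (not tokonomix's tracker)."""

    def __init__(self, period_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._period = period_seconds
        self._clock = clock            # defaults to time.monotonic, which never goes backwards
        self._lock = threading.Lock()  # serializes the read-modify-write of the total
        self._start = clock()
        self._total = Decimal(0)

    def _roll_over(self) -> None:
        # Caller must hold the lock. Starts a fresh period once the current one has elapsed.
        now = self._clock()
        if now - self._start >= self._period:
            self._start = now
            self._total = Decimal(0)

    def add(self, cost: Decimal) -> Decimal:
        with self._lock:
            self._roll_over()
            self._total += cost
            return self._total

    @property
    def total(self) -> Decimal:
        with self._lock:
            self._roll_over()
            return self._total


# Usage: 4 threads each record 1,000 calls of 1,000 input tokens at $2.50/M.
per_call = Decimal("2.50") / Decimal(1_000_000) * 1_000  # exactly 0.0025, no float rounding
day = PeriodTotal(period_seconds=86_400)


def worker() -> None:
    for _ in range(1_000):
        day.add(per_call)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(day.total)  # exactly 10 (4 x 1,000 x 0.0025)
```

For brevity, this sketch restarts the window at the first call after it expires rather than at fixed boundaries. A calendar-aligned "daily" budget (resetting at midnight) would need wall-clock dates in addition to the monotonic clock.
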
Contributions are welcome — especially pricing updates, new provider support, and bug fixes.
```shell
git clone https://github.com/zbhatti/tokonomix.git
cd tokonomix
pip install -e ".[all]"
pip install pytest ruff mypy
pytest
```

Part of the stef41 LLM toolkit, a set of open-source tools for every stage of the LLM lifecycle:
| Project | What it does |
|---|---|
| datacrux | Training data quality — dedup, PII, contamination |
| castwright | Synthetic instruction data generation |
| datamix | Dataset mixing & curriculum optimization |
| toksight | Tokenizer analysis & comparison |
| trainpulse | Training health monitoring |
| ckpt | Checkpoint inspection, diffing & merging |
| quantbench | Quantization quality analysis |
| infermark | Inference benchmarking |
| modeldiff | Behavioral regression testing |
| vibesafe | AI-generated code safety scanner |
| injectionguard | Prompt injection detection |
License: Apache 2.0