-
Notifications
You must be signed in to change notification settings - Fork 2
Codex Cost Model – Marginal Token Counting
- Date: 2026-05-25
- Status: Accepted
- Issue: #973
- Amends: Provider Plugin Contract (adds a provider-level token-convention rule for OpenAI-family providers)
Codex (OpenAI) and Claude Code (Anthropic) report token usage differently:
| Field | Anthropic (Claude Code) | OpenAI (Codex) |
|---|---|---|
input_tokens |
Non-cached portion only | Total (inclusive of cached) |
cache_read_input_tokens / cached_input_tokens
|
Cached portion (separate, non-overlapping) | Cached portion (subset of input_tokens) |
Budi's cost calculator (ModelPricing::calculate_cost_cents in crates/budi-core/src/provider.rs) was designed around Anthropic's convention: it treats input_tokens and cache_read_tokens as non-overlapping categories. It charges input_tokens × input_rate plus cache_read_tokens × cache_read_rate.
The Codex parser (crates/budi-core/src/providers/codex.rs) passed OpenAI's raw input_tokens (which includes cached) alongside cached_input_tokens as cache_read_tokens. This caused double-counting: the cached portion was charged once at full input_rate (as part of input_tokens) and again at cache_read_rate.
On heavily-cached sessions (typical for Codex, which re-sends the full conversation context each turn), this produced costs approximately 29× higher than the actual API invoice:
| Source | Codex cost | Sessions |
|---|---|---|
| ccusage (marginal-token) | $41 | 20 |
| budi (pre-fix, double-counted) | $1,205 | 6,001 messages |
The 29× ratio matches the arithmetic: with ~95% cache hits, charging the full input_tokens at input_rate instead of only the non-cached 5% inflates the input cost by ~20×, which propagates to the total.
The issue title frames this as "marginal-token vs full-context." On closer inspection, it is neither — it is a token-convention mismatch between providers. OpenAI's actual billing is:
cost = (input_tokens - cached_input_tokens) × input_rate
+ cached_input_tokens × cache_read_rate
+ output_tokens × output_rate
Budi's cost calculator already implements exactly this formula — it just expects input_tokens to mean the non-cached portion (Anthropic's convention). The fix is in the parser, not the calculator.
In parse_token_count(), after extracting raw input_tokens and cached_input_tokens from either last_token_usage or total_token_usage, apply:
let input_tokens = input_tokens.saturating_sub(cached_input_tokens);This converts from OpenAI's convention (overlapping) to budi's internal convention (non-overlapping), which matches Anthropic's API and is what calculate_cost_cents expects.
Option 3 from the issue ("both, with --cost-model flag") is rejected. There is one correct cost: what the provider actually bills. The pre-fix behavior was not a "full-context" cost model — it was a double-counting bug. A flag to choose between "correct" and "wrong" is not useful.
Rows already ingested with the double-counted cost are stored with cost_cents_ingested reflecting the buggy calculation. Per the Model Pricing – Embedded Baseline and Runtime Refresh §5 Rule B, known-at-the-time rows are never automatically rewritten. Users who want corrected historical Codex costs should run budi db import --force after upgrading.
- Codex costs now match actual OpenAI invoices for cached-context sessions.
- The cost calculator's assumption (non-overlapping
input_tokens/cache_read_tokens) is documented as a provider-level contract, making it explicit for future providers.
- A visible step change in Codex cost reporting after upgrade. Users may see Codex costs drop ~29× for cached sessions. This is correct — the old number was wrong.
- Historical Codex rows retain the inflated cost until the user runs
budi db import --force.
- Claude Code, Copilot CLI, Copilot Chat, and Cursor providers are unaffected. Anthropic already reports non-overlapping categories. Copilot and Cursor do not currently report cached-input breakdowns.
- No budi-cloud changes needed — cloud consumes
cost_centsfrom daily rollups regardless of how the underlying provider parsed tokens. - No schema migration — the fix is parser-only.
Any provider whose upstream API reports input_tokens inclusive of cached_input_tokens (OpenAI-family convention) must subtract cached from input in its parse_file implementation before constructing ParsedMessage. The cost calculator's contract is: input_tokens is the non-cached (full-rate) portion; cache_read_tokens is the cached (discounted) portion; they do not overlap.
- Model Pricing – Embedded Baseline and Runtime Refresh: pricing manifest and immutability rules
- Provider Plugin Contract: provider shape contract
-
tmp-research/ccusage-output-comparison.md: the head-to-head comparison that surfaced the discrepancy
Last verified against code on 2026-05-25.
budi · Issues · Releases · app.getbudi.dev · getbudi.dev
Start here
ADRs — Data & privacy
ADRs — Ingestion
ADRs — Pricing
- Model Pricing – Embedded Baseline and Runtime Refresh
- Custom Team Pricing and Effective Cost
- Codex Cost Model – Marginal-Token Counting
ADRs — Provider contracts
Operational references
- Daemon Lifecycle and Autostart
- Provider Plugin Contract
- Cloud Sync Mechanics
- Statusline Integration
- Operations and Observability
- Release and Versioning
Ecosystem