Codex Cost Model – Marginal Token Counting

Date: 2026-05-25
Status: Accepted
Issue: #973
Amends: Provider Plugin Contract (adds a provider-level token-convention rule for OpenAI-family providers)

Context

Codex (OpenAI) and Claude Code (Anthropic) report token usage differently:

| Field | Anthropic (Claude Code) | OpenAI (Codex) |
| --- | --- | --- |
| `input_tokens` | Non-cached portion only | Total (inclusive of cached) |
| `cache_read_input_tokens` / `cached_input_tokens` | Cached portion (separate, non-overlapping) | Cached portion (subset of `input_tokens`) |

Budi's cost calculator (ModelPricing::calculate_cost_cents in crates/budi-core/src/provider.rs) was designed around Anthropic's convention: it treats input_tokens and cache_read_tokens as non-overlapping categories. It charges input_tokens × input_rate plus cache_read_tokens × cache_read_rate.

The Codex parser (crates/budi-core/src/providers/codex.rs) passed OpenAI's raw input_tokens (which includes the cached portion) through unchanged, and passed cached_input_tokens as cache_read_tokens. This caused double-counting: the cached portion was charged once at the full input_rate (as part of input_tokens) and again at cache_read_rate.
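The double-counting can be illustrated with a minimal sketch. The `Rates` struct, the function names, and the per-million-token rate units below are assumptions made for illustration; they are not the actual `ModelPricing::calculate_cost_cents` signature or budi's pricing data.

```rust
/// Hypothetical rates in cents per million tokens (not budi's pricing data).
#[derive(Debug, Clone, Copy)]
pub struct Rates {
    pub input: f64,
    pub cache_read: f64,
    pub output: f64,
}

/// Sketch of a calculator that assumes `input_tokens` and
/// `cache_read_tokens` are non-overlapping categories.
pub fn cost_cents(
    rates: Rates,
    input_tokens: u64,
    cache_read_tokens: u64,
    output_tokens: u64,
) -> f64 {
    (input_tokens as f64 * rates.input
        + cache_read_tokens as f64 * rates.cache_read
        + output_tokens as f64 * rates.output)
        / 1_000_000.0
}

/// Pre-fix behavior: OpenAI's cached-inclusive total is passed straight
/// through, so cached tokens are charged at both rates.
pub fn codex_cost_pre_fix(rates: Rates, raw_input: u64, cached: u64, output: u64) -> f64 {
    cost_cents(rates, raw_input, cached, output)
}

/// Post-fix behavior: only the non-cached remainder is charged at the
/// full input rate.
pub fn codex_cost_fixed(rates: Rates, raw_input: u64, cached: u64, output: u64) -> f64 {
    cost_cents(rates, raw_input.saturating_sub(cached), cached, output)
}
```

With made-up rates of 100 (input) and 10 (cache read) cents per million tokens, a turn reporting 1,000,000 input tokens of which 950,000 are cached costs 109.5 cents under the pre-fix path and 14.5 cents under the fixed path.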

Impact

On heavily cached sessions (typical for Codex, which re-sends the full conversation context each turn), this produced costs approximately 29× higher than the actual API invoice:

| Source | Codex cost | Sample size |
| --- | --- | --- |
| ccusage (marginal-token) | $41 | 20 sessions |
| budi (pre-fix, double-counted) | $1,205 | 6,001 messages |

The 29× ratio is roughly consistent with the arithmetic: with ~95% cache hits, only the non-cached ~5% of input_tokens should be charged at input_rate, so charging all of them inflates the full-rate input charge by ~20× (1 / 0.05), and that inflation propagates to the total.
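The inflation arithmetic can be written out as a small sketch. Both function names and the `cache_discount` parameter (the ratio cache_read_rate / input_rate) are assumptions introduced here. The overall ratio on a real invoice also depends on output tokens and the actual rates, which this section does not state.

```rust
/// Inflation of the full-rate input charge alone: all input tokens are
/// charged at input_rate instead of only the non-cached fraction.
/// Returns None when the ratio is undefined (cache_hit outside [0, 1)).
pub fn full_rate_inflation(cache_hit: f64) -> Option<f64> {
    if !(0.0..1.0).contains(&cache_hit) {
        return None;
    }
    Some(1.0 / (1.0 - cache_hit))
}

/// Inflation of the whole input-side cost (full-rate plus cache-read charge).
/// `cache_discount` is a hypothetical parameter: cache_read_rate / input_rate.
pub fn input_side_inflation(cache_hit: f64, cache_discount: f64) -> Option<f64> {
    if !(0.0..=1.0).contains(&cache_hit) || cache_discount.is_nan() || cache_discount < 0.0 {
        return None;
    }
    // Double-counted: every input token at full rate, plus cached tokens again
    // at the discounted rate.
    let double_counted = 1.0 + cache_hit * cache_discount;
    // Correct: non-cached tokens at full rate, cached tokens at the discounted rate.
    let correct = (1.0 - cache_hit) + cache_hit * cache_discount;
    if correct <= 0.0 {
        return None;
    }
    Some(double_counted / correct)
}
```

`full_rate_inflation(0.95)` evaluates to approximately 20, the figure quoted above. `input_side_inflation` shows how the cache-read charge, which is the same in both the double-counted and the correct calculation, changes the overall ratio.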

Why this is a parser bug, not a cost-model choice

The issue title frames this as "marginal-token vs full-context." On closer inspection, it is neither — it is a token-convention mismatch between providers. OpenAI's actual billing is:

cost = (input_tokens - cached_input_tokens) × input_rate
     + cached_input_tokens × cache_read_rate
     + output_tokens × output_rate

Budi's cost calculator already implements exactly this formula — it just expects input_tokens to mean the non-cached portion (Anthropic's convention). The fix is in the parser, not the calculator.

Decision

1. Subtract cached from input in the Codex parser

In parse_token_count(), after extracting raw input_tokens and cached_input_tokens from either last_token_usage or total_token_usage, apply:

let input_tokens = input_tokens.saturating_sub(cached_input_tokens);

This converts from OpenAI's convention (overlapping) to budi's internal convention (non-overlapping), which matches Anthropic's API and is what calculate_cost_cents expects.
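One reason to prefer `saturating_sub` here, sketched under the assumption that the counts are `u64`: plain subtraction on unsigned integers panics in debug builds and wraps in release builds when the result would be negative, whereas `saturating_sub` clamps to zero. The helper names below are hypothetical and are not part of the Codex parser.

```rust
/// Non-cached input tokens, clamped at zero if a malformed record reports
/// more cached tokens than total input tokens.
pub fn non_cached_input(raw_input_tokens: u64, cached_input_tokens: u64) -> u64 {
    raw_input_tokens.saturating_sub(cached_input_tokens)
}

/// Stricter alternative: surface the malformed record instead of clamping.
/// Returns None when cached exceeds the reported total.
pub fn strict_non_cached_input(raw_input_tokens: u64, cached_input_tokens: u64) -> Option<u64> {
    raw_input_tokens.checked_sub(cached_input_tokens)
}
```

Clamping keeps ingestion going on odd input at the price of hiding it. The `checked_sub` variant would suit a parser that prefers to log or skip such records.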

2. No new flags or cost-model modes

Option 3 from the issue ("both, with --cost-model flag") is rejected. There is one correct cost: what the provider actually bills. The pre-fix behavior was not a "full-context" cost model — it was a double-counting bug. A flag to choose between "correct" and "wrong" is not useful.

3. Existing data requires re-import

Rows ingested before the fix carry a cost_cents_ingested value computed with the double-counting bug. Under §5 Rule B of Model Pricing – Embedded Baseline and Runtime Refresh, known-at-the-time rows are never automatically rewritten. Users who want corrected historical Codex costs should run budi db import --force after upgrading.

Consequences

Positive

Codex costs now match actual OpenAI invoices for cached-context sessions.
The cost calculator's assumption (non-overlapping input_tokens / cache_read_tokens) is documented as a provider-level contract, making it explicit for future providers.

Negative

A visible step change in Codex cost reporting after upgrade. Users may see Codex costs drop ~29× for cached sessions. This is correct — the old number was wrong.
Historical Codex rows retain the inflated cost until the user runs budi db import --force.

Neutral

Claude Code, Copilot CLI, Copilot Chat, and Cursor providers are unaffected. Anthropic already reports non-overlapping categories. Copilot and Cursor do not currently report cached-input breakdowns.
No budi-cloud changes needed — cloud consumes cost_cents from daily rollups regardless of how the underlying provider parsed tokens.
No schema migration — the fix is parser-only.

Provider token-convention rule

Any provider whose upstream API reports input_tokens inclusive of cached_input_tokens (OpenAI-family convention) must subtract cached from input in its parse_file implementation before constructing ParsedMessage. The cost calculator's contract is: input_tokens is the non-cached (full-rate) portion; cache_read_tokens is the cached (discounted) portion; they do not overlap.
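One way to make this rule explicit per provider is sketched below. The `TokenConvention` enum and `InternalUsage` struct are hypothetical names introduced for illustration. The rule itself only requires that each provider's parse_file perform the subtraction before constructing ParsedMessage, whose real fields are not shown in this document.

```rust
/// How an upstream API reports cached input tokens (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenConvention {
    /// input_tokens already excludes cached tokens (Anthropic-style).
    NonOverlapping,
    /// input_tokens includes cached tokens (OpenAI-family).
    CachedInclusive,
}

/// Counts in the calculator's contract: input_tokens and cache_read_tokens
/// do not overlap (hypothetical stand-in for the relevant ParsedMessage fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InternalUsage {
    pub input_tokens: u64,
    pub cache_read_tokens: u64,
    pub output_tokens: u64,
}

impl TokenConvention {
    /// Convert upstream counts to the non-overlapping internal convention.
    pub fn to_internal(self, input_tokens: u64, cached_tokens: u64, output_tokens: u64) -> InternalUsage {
        let input_tokens = match self {
            TokenConvention::NonOverlapping => input_tokens,
            TokenConvention::CachedInclusive => input_tokens.saturating_sub(cached_tokens),
        };
        InternalUsage {
            input_tokens,
            cache_read_tokens: cached_tokens,
            output_tokens,
        }
    }
}
```

Under this sketch, an OpenAI-style report of 1,000 input tokens with 950 cached and an Anthropic-style report of 50 input tokens with 950 cached normalize to the same `InternalUsage`, which is the property the calculator relies on.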

References

Model Pricing – Embedded Baseline and Runtime Refresh: pricing manifest and immutability rules
Provider Plugin Contract: provider shape contract
tmp-research/ccusage-output-comparison.md: the head-to-head comparison that surfaced the discrepancy

Last verified against code on 2026-05-25.
