Skip to content

Codex Cost Model – Marginal Token Counting

Ivan Seredkin edited this page May 25, 2026 · 2 revisions

Codex Cost Model — Marginal-Token Counting

  • Date: 2026-05-25
  • Status: Accepted
  • Issue: #973
  • Amends: Provider Plugin Contract (adds a provider-level token-convention rule for OpenAI-family providers)

Context

Codex (OpenAI) and Claude Code (Anthropic) report token usage differently:

Field Anthropic (Claude Code) OpenAI (Codex)
input_tokens Non-cached portion only Total (inclusive of cached)
cache_read_input_tokens / cached_input_tokens Cached portion (separate, non-overlapping) Cached portion (subset of input_tokens)

Budi's cost calculator (ModelPricing::calculate_cost_cents in crates/budi-core/src/provider.rs) was designed around Anthropic's convention: it treats input_tokens and cache_read_tokens as non-overlapping categories. It charges input_tokens × input_rate plus cache_read_tokens × cache_read_rate.

The Codex parser (crates/budi-core/src/providers/codex.rs) passed OpenAI's raw input_tokens (which includes cached) alongside cached_input_tokens as cache_read_tokens. This caused double-counting: the cached portion was charged once at full input_rate (as part of input_tokens) and again at cache_read_rate.

Impact

On heavily-cached sessions (typical for Codex, which re-sends the full conversation context each turn), this produced costs approximately 29× higher than the actual API invoice:

Source Codex cost Sessions
ccusage (marginal-token) $41 20
budi (pre-fix, double-counted) $1,205 6,001 messages

The 29× ratio matches the arithmetic: with ~95% cache hits, charging the full input_tokens at input_rate instead of only the non-cached 5% inflates the input cost by ~20×, which propagates to the total.

Why this is a parser bug, not a cost-model choice

The issue title frames this as "marginal-token vs full-context." On closer inspection, it is neither — it is a token-convention mismatch between providers. OpenAI's actual billing is:

cost = (input_tokens - cached_input_tokens) × input_rate
     + cached_input_tokens × cache_read_rate
     + output_tokens × output_rate

Budi's cost calculator already implements exactly this formula — it just expects input_tokens to mean the non-cached portion (Anthropic's convention). The fix is in the parser, not the calculator.

Decision

1. Subtract cached from input in the Codex parser

In parse_token_count(), after extracting raw input_tokens and cached_input_tokens from either last_token_usage or total_token_usage, apply:

let input_tokens = input_tokens.saturating_sub(cached_input_tokens);

This converts from OpenAI's convention (overlapping) to budi's internal convention (non-overlapping), which matches Anthropic's API and is what calculate_cost_cents expects.

2. No new flags or cost-model modes

Option 3 from the issue ("both, with --cost-model flag") is rejected. There is one correct cost: what the provider actually bills. The pre-fix behavior was not a "full-context" cost model — it was a double-counting bug. A flag to choose between "correct" and "wrong" is not useful.

3. Existing data requires re-import

Rows already ingested with the double-counted cost are stored with cost_cents_ingested reflecting the buggy calculation. Per the Model Pricing – Embedded Baseline and Runtime Refresh §5 Rule B, known-at-the-time rows are never automatically rewritten. Users who want corrected historical Codex costs should run budi db import --force after upgrading.

Consequences

Positive

  • Codex costs now match actual OpenAI invoices for cached-context sessions.
  • The cost calculator's assumption (non-overlapping input_tokens / cache_read_tokens) is documented as a provider-level contract, making it explicit for future providers.

Negative

  • A visible step change in Codex cost reporting after upgrade. Users may see Codex costs drop ~29× for cached sessions. This is correct — the old number was wrong.
  • Historical Codex rows retain the inflated cost until the user runs budi db import --force.

Neutral

  • Claude Code, Copilot CLI, Copilot Chat, and Cursor providers are unaffected. Anthropic already reports non-overlapping categories. Copilot and Cursor do not currently report cached-input breakdowns.
  • No budi-cloud changes needed — cloud consumes cost_cents from daily rollups regardless of how the underlying provider parsed tokens.
  • No schema migration — the fix is parser-only.

Provider token-convention rule

Any provider whose upstream API reports input_tokens inclusive of cached_input_tokens (OpenAI-family convention) must subtract cached from input in its parse_file implementation before constructing ParsedMessage. The cost calculator's contract is: input_tokens is the non-cached (full-rate) portion; cache_read_tokens is the cached (discounted) portion; they do not overlap.

References


Last verified against code on 2026-05-25.

Clone this wiki locally