fix(codex): show non-cached input tokens #1067
Codex usage events report OpenAI-style input tokens that include cached prompt tokens. The native all-agent and Codex reports were forwarding that inclusive value directly, so days with heavy prompt caching could show an Input column that looked close to Cache Read even though cost calculation had already treated cached tokens separately. Normalise Codex report output to expose non-cached input tokens while preserving cachedInputTokens, totalTokens, and the raw internal values used for pricing. Apply the same normalisation when Codex rows are folded into the all-agent report, and cover the behaviour with regression tests for both report paths. Fixes #1056
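The normalisation described above can be sketched in Rust as follows. This is a minimal illustration, not the PR's actual code: `RawCodexUsage`, `DisplayedUsage`, and `normalise` are hypothetical names, and the sample figures (100 inclusive input, 90 cached, 105 total) are assumed values chosen to match the numbers asserted in the regression test.

```rust
/// Raw usage as reported by Codex: `input_tokens` already includes cached tokens.
/// Hypothetical struct for illustration; not the crate's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RawCodexUsage {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub total_tokens: u64,
}

/// What a report row would show after normalisation (also hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DisplayedUsage {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub total_tokens: u64,
}

/// Converts the inclusive input count into a non-cached count for display.
/// Cached and total token counts pass through unchanged.
pub fn normalise(raw: RawCodexUsage) -> DisplayedUsage {
    DisplayedUsage {
        // saturating_sub returns 0 instead of underflowing if cached > input.
        input_tokens: raw.input_tokens.saturating_sub(raw.cached_input_tokens),
        cached_input_tokens: raw.cached_input_tokens,
        total_tokens: raw.total_tokens,
    }
}
```

With the assumed sample values, `normalise` turns an inclusive input of 100 with 90 cached tokens into a displayed input of 10, while the cached count (90) and total (105) are left as they were.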
📝 Walkthrough
Compute Codex non-cached input tokens (saturating subtraction of cached tokens from the inclusive input count) and use that value in JSON output and AllRow generation; update tests to assert non-cached daily and per-model inputTokens while preserving cachedInputTokens.
Changes
- Codex non-cached input token reporting
- AllRow integration and testing
- Integration test updates
@coderabbitai review Please review this PR. The key change is that Codex input token display now separates non-cached input from cached input while preserving total token and pricing semantics.
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | ccusage-guide | 665500d | Commit Preview URL, Branch Preview URL | May 19 2026, 04:25 PM |
Review

Minor observations

1. Redundant guard in `non_cached_input_tokens`

    // rust/crates/ccusage/src/adapter/codex.rs (new function)
    pub(crate) fn non_cached_input_tokens(input_tokens: u64, cached_input_tokens: u64) -> u64 {
        input_tokens.saturating_sub(cached_input_tokens.min(input_tokens))
    }

The `.min(input_tokens)` guard is redundant because `saturating_sub` already clamps at zero. The existing code uses the simpler form:

    // Already in the codebase – consistent pattern
    let non_cached_input = usage.input_tokens.saturating_sub(usage.cached_input_tokens);

Simplifying to a plain `saturating_sub` would keep the two consistent.

2. `CodexModelUsage` still derives `Serialize`

    // rust/crates/ccusage/src/types.rs
    #[derive(Debug, Clone, Default, Serialize)]
    #[serde(rename_all = "camelCase")]
    pub(crate) struct CodexModelUsage {
        pub(crate) input_tokens: u64, // ← still serializes as raw inclusive value
        ...
    }

Now that report output is normalised, serializing this struct directly would still emit the raw inclusive value.

3. Small test coverage gap

The new test does not assert the totals value:

    // Missing assertion in the new test:
    assert_eq!(report["totals"]["inputTokens"], 10);

Since the totals path was also patched, it should be covered directly.

Summary

The PR is in good shape; the three items above are nits/minor suggestions, not blockers.
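The first observation, that the `.min(input_tokens)` clamp adds nothing on top of `saturating_sub`, can be checked with a small sketch. The function names `with_guard`, `without_guard`, and `forms_agree` are made up for this illustration; only the two expressions themselves come from the review.

```rust
/// Form with the extra clamp, as quoted in the review.
pub fn with_guard(input_tokens: u64, cached_input_tokens: u64) -> u64 {
    input_tokens.saturating_sub(cached_input_tokens.min(input_tokens))
}

/// Simplified form: `saturating_sub` already returns 0 instead of underflowing.
pub fn without_guard(input_tokens: u64, cached_input_tokens: u64) -> u64 {
    input_tokens.saturating_sub(cached_input_tokens)
}

/// Compares both forms over a small grid that includes the edge cases
/// (zero, cached larger than input, and u64::MAX).
pub fn forms_agree() -> bool {
    let samples = [0u64, 1, 10, 90, 100, u64::MAX];
    samples.iter().all(|&input| {
        samples
            .iter()
            .all(|&cached| with_guard(input, cached) == without_guard(input, cached))
    })
}
```

When cached tokens exceed input tokens, the guarded form subtracts `input_tokens` from itself and the plain form saturates, so both return 0; in every other case the clamp is a no-op.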
🧹 Nitpick comments (1)

rust/crates/ccusage/src/adapter/codex.rs (1)

717-725: ⚡ Quick win: Add an explicit totals input assertion for regression completeness.

Please also assert `report["totals"]["inputTokens"]` (expected `10`) so the totals path patched at Line 490 is directly covered.

✅ Suggested test addition

      assert_eq!(report["daily"][0]["inputTokens"], 10);
      assert_eq!(report["daily"][0]["cachedInputTokens"], 90);
      assert_eq!(report["daily"][0]["totalTokens"], 105);
      assert_eq!(report["daily"][0]["models"]["gpt-5"]["inputTokens"], 10);
      assert_eq!(
          report["daily"][0]["models"]["gpt-5"]["cachedInputTokens"],
          90
      );
    + assert_eq!(report["totals"]["inputTokens"], 10);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust/crates/ccusage/src/adapter/codex.rs` around lines 717 - 725, Add an explicit assertion that verifies the totals path: after the existing per-day and per-model assertions on the `report` JSON, add an `assert_eq!` that checks `report["totals"]["inputTokens"]` equals 10 so the totals path patched at the earlier change (around the totals handling at Line 490) is directly covered; locate the assertion block using the `report` variable in this test in codex.rs and insert the totals assertion alongside the other `assert_eq!` checks for `daily` and `models`.
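Why the totals path deserves its own assertion can be shown with a dependency-free sketch. Everything here is hypothetical: `Row`, `display_row`, and `totals` are illustrative names, and building totals by summing already-normalised rows is an assumption about one reasonable design, not a statement of how the crate computes them.

```rust
/// One report row (hypothetical stand-in for a daily or per-model entry).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Row {
    pub input_tokens: u64,
    pub cached_input_tokens: u64,
    pub total_tokens: u64,
}

/// Normalises a raw row so `input_tokens` excludes cached tokens.
pub fn display_row(raw: Row) -> Row {
    Row {
        input_tokens: raw.input_tokens.saturating_sub(raw.cached_input_tokens),
        ..raw
    }
}

/// Builds totals by summing rows that have already been normalised.
/// Assumption: totals are derived from display rows, not from raw rows.
pub fn totals(rows: &[Row]) -> Row {
    rows.iter().fold(Row::default(), |acc, r| Row {
        input_tokens: acc.input_tokens + r.input_tokens,
        cached_input_tokens: acc.cached_input_tokens + r.cached_input_tokens,
        total_tokens: acc.total_tokens + r.total_tokens,
    })
}
```

If totals were instead accumulated from raw rows, the per-day value would read 10 while the total still read 100, which is the kind of divergence an explicit totals assertion would catch.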
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4a5d53d2-703d-41ec-8ff9-d2f50cee7b40
📒 Files selected for processing (3)
rust/crates/ccusage/src/adapter/all.rs
rust/crates/ccusage/src/adapter/codex.rs
rust/crates/ccusage/src/main.rs
Address CodeRabbit review feedback by covering the totals path that was normalised with the Codex row and per-model JSON output. This keeps the regression test aligned with the display bug fixed in #1067. Also simplify the non-cached input helper to match the existing pricing code and remove direct Serialize support from CodexModelUsage so future user-facing JSON must go through the normalised serializer.
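The design choice in this commit, removing direct serialization so that user-facing JSON has to pass through a normalising step, can be sketched without any dependencies. This is an illustration under stated assumptions: `ModelUsage` and `to_report_json` are hypothetical names, the `outputTokens` field is assumed, and the hand-written `format!` stands in for whatever serde-based helper the crate actually uses.

```rust
/// Raw per-model usage kept for pricing. It deliberately has no general-purpose
/// serialization, so callers cannot emit the inclusive input count by accident.
/// Hypothetical, simplified stand-in for `CodexModelUsage`.
pub struct ModelUsage {
    input_tokens: u64, // inclusive of cached tokens
    cached_input_tokens: u64,
    output_tokens: u64,
}

impl ModelUsage {
    pub fn new(input_tokens: u64, cached_input_tokens: u64, output_tokens: u64) -> Self {
        Self { input_tokens, cached_input_tokens, output_tokens }
    }

    /// The only route to user-facing JSON in this sketch: it always reports
    /// non-cached input. Hand-rolled formatting keeps the example self-contained.
    pub fn to_report_json(&self) -> String {
        let non_cached = self.input_tokens.saturating_sub(self.cached_input_tokens);
        format!(
            "{{\"inputTokens\":{},\"cachedInputTokens\":{},\"outputTokens\":{}}}",
            non_cached, self.cached_input_tokens, self.output_tokens
        )
    }
}
```

Because the raw fields are private and the struct exposes no other serializer, any new report path has to call the normalising method, which is the property the commit message describes.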
@coderabbitai addressed your review feedback in 9a4afc2. Changes made:
Validation after the update:
@coderabbitai review
The fix is correct and complete, and the regression tests now cover all patched paths. (ʳ ´•̥̥̥ ω•̥̥̥`)っ🐇 [approve]
ccusage performance comparison
PR SHA:
This compares the Rust PR release binary against the configured base package on the same CI runner.
Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one
Cached bunx execution performance
Runs the same large fixture through
Fixtures: Claude
Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.
Fixtures: Claude
Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.
Fixtures: Claude
Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.
Fixtures: Claude
Artifact size
Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
Resolve the all-agent adapter test conflict by keeping both the main branch's parallel loader coverage and this branch's Codex non-cached input token regression import.

Validation after the merge:
- direnv exec . pnpm run format
- direnv exec . pnpm typecheck
- direnv exec . pnpm run test
@coderabbitai pushed 665500d to merge latest main and resolve the all-agent adapter test conflict. Conflict resolution kept both the main branch's parallel loader test coverage and this PR's Codex non-cached input token regression import. Validation after the merge:
@coderabbitai review
The merge state shows (ฅ•ᴗ•ฅ) 🐇
✅ Actions performed: Review triggered.
Summary
Root Cause
Codex usage events use OpenAI-style input token counts where cached prompt tokens are included in `input_tokens`. The native reports forwarded that inclusive value to the user-facing Input column while also showing cached tokens separately, which made heavy cache-read days look like far more paid input than the reported cost supported.
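The mismatch between the Input column and the reported cost can be made concrete with a small worked sketch. The rates below are invented integer units chosen only to make the arithmetic easy to follow; they are not real prices, and `Rates`, `cost`, and `cost_if_input_column_were_billed` are hypothetical names rather than the crate's API.

```rust
/// Hypothetical per-token prices in arbitrary integer units (not real rates).
pub struct Rates {
    pub input: u64,
    pub cached_input: u64,
    pub output: u64,
}

/// Cost with cached tokens priced separately, as the pricing path is described:
/// only the non-cached part of the inclusive input is billed at the input rate.
pub fn cost(input_inclusive: u64, cached: u64, output: u64, rates: &Rates) -> u64 {
    let non_cached = input_inclusive.saturating_sub(cached);
    non_cached * rates.input + cached * rates.cached_input + output * rates.output
}

/// What a reader would infer if the inclusive Input column were all paid input.
pub fn cost_if_input_column_were_billed(
    input_inclusive: u64,
    cached: u64,
    output: u64,
    rates: &Rates,
) -> u64 {
    input_inclusive * rates.input + cached * rates.cached_input + output * rates.output
}
```

With 100 inclusive input tokens, 90 cached, 5 output, and assumed rates of 10, 1, and 40 units, the separately priced cost is 10×10 + 90×1 + 5×40 = 390 units, whereas reading the inclusive column as paid input suggests 100×10 + 90×1 + 5×40 = 1290 units. Showing 10 in the Input column keeps the displayed tokens in line with the first figure.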
Validation
Fixes #1056
Summary by cubic
Show non-cached Codex input tokens in Codex and all-agent reports so the Input column reflects paid input and matches cost. Cached reads stay visible, pricing is unchanged, and per-model rows and totals are normalized. Fixes #1056.
Bug Fixes
- Subtract `cachedInputTokens` from Codex `inputTokens` in group rows, per-model entries, and totals (including all-agent Codex rows).
- Preserve `cachedInputTokens`, `totalTokens`, and `costUSD` semantics.

Refactors
- Remove `Serialize` from `CodexModelUsage` and serialize via a normalizing helper so all user-facing JSON shows non-cached input.

Written for commit 665500d. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Tests