fix(agents): fall back to total token counts #1071
Some coding-agent logs carry only aggregate total token fields when provider-specific input/output/cache token fields are absent or incomplete. Those records were either skipped or under-counted, which made reports show zero tokens or omit the unclassified remainder.

Add a shared total-token fallback helper and apply it across the Rust agent adapters. When output tokens are missing, the helper assigns the missing total remainder to output so the row remains visible and cost calculation has billable tokens. When output tokens are already known, the remaining total is kept as extra total tokens so report totals preserve the source aggregate without changing the displayed input/output split.

Cover the fallback behavior with targeted tests for Amp, OpenCode, Droid, Codebuff, Copilot, Gemini, Kilo, Kimi, OpenClaw, pi-agent, Qwen, and the shared helper.
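The two branches described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `TokenUsage` struct is a simplified stand-in for the crate's usage type, the `(usage, extra_total_tokens, total)` argument order is inferred from the call quoted later in the review, and counting the existing extra tokens as part of the known sum is an assumption.

```rust
/// Simplified, hypothetical stand-in for the crate's raw usage struct.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
}

impl TokenUsage {
    /// Sum of every classified token field.
    pub fn known_total(&self) -> u64 {
        self.input_tokens
            .saturating_add(self.output_tokens)
            .saturating_add(self.cache_creation_input_tokens)
            .saturating_add(self.cache_read_input_tokens)
    }
}

/// Sketch of the fallback: returns the (possibly adjusted) usage and the
/// extra total tokens that could not be classified.
pub fn apply_total_token_fallback(
    mut usage: TokenUsage,
    extra_total_tokens: u64,
    total: u64,
) -> (TokenUsage, u64) {
    // Assumption: already-tracked extra tokens count toward the known sum.
    let known = usage.known_total().saturating_add(extra_total_tokens);
    let remainder = total.saturating_sub(known);
    if remainder == 0 {
        // The aggregate adds nothing beyond the known parts.
        return (usage, extra_total_tokens);
    }
    if usage.output_tokens == 0 {
        // Output is missing: attribute the remainder to output so the row
        // stays visible and has billable tokens.
        usage.output_tokens = remainder;
        (usage, extra_total_tokens)
    } else {
        // Output is known: keep the displayed split, track the rest as extra.
        (usage, extra_total_tokens.saturating_add(remainder))
    }
}
```

With a record that carries only `total = 500`, this sketch yields `output_tokens = 500` and no extra tokens; with `input = 100`, `output = 50`, `total = 500`, the split is unchanged and 350 tokens are tracked as extra.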
@coderabbitai review

Please review the total-token fallback behavior across the coding-agent adapters. This PR intentionally keeps known input/output/cache fields unchanged and only uses aggregate totals when parts are missing or incomplete.
📝 Walkthrough
This PR introduces a shared token-fallback mechanism across the ccusage crate.
Changes
Total Token Fallback Integration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | ccusage-guide | 1e7f103 | Commit Preview URL, Branch Preview URL | May 19 2026, 07:50 PM |
✅ Core Utility (
| Adapter | Fallback field read | Extra key |
|---|---|---|
| Amp | `tokens.total` | - |
| OpenCode | `tokens.total` | - |
| Kilo | `tokens.total` | - |
| Pi | `usage.totalTokens` | dedup key updated ✅ |
| Kimi | `token_usage.total` | dedup key updated ✅ |
| Codebuff | `totalTokens` / `total_tokens` / `total` | dedup key updated ✅ |
| Copilot | `gen_ai.usage.total_tokens` / `total.token_count` | - |
| Gemini | `tokens.total` (+ new `total_tokens` alias) | - |
| Droid | `totalTokens` | - |
| OpenClaw | `totalTokens` | `total_tokens` floor corrected ✅ |
| Qwen | `totalTokenCount` | - |
Gemini minor enhancement: parse_tokens now also checks total_tokens as a key alias — sensible expansion.
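A key-alias lookup of the kind described for Gemini might look like the sketch below. The function name `read_total_tokens`, the flat `HashMap` representation of the `tokens` object, and the order in which the two keys are tried are all assumptions made for illustration; the real `parse_tokens` is not shown in this conversation.

```rust
use std::collections::HashMap;

/// Hypothetical helper: read the aggregate total from an already-parsed
/// `tokens` object, trying `total` first and then the `total_tokens` alias.
pub fn read_total_tokens(tokens: &HashMap<String, u64>) -> Option<u64> {
    ["total", "total_tokens"]
        .iter()
        .find_map(|key| tokens.get(*key).copied())
}
```

Trying the keys in a fixed order keeps the result deterministic when a log line happens to carry both spellings.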
OpenClaw total_tokens floor:
`let total_tokens = total_tokens.max(crate::total_usage_tokens(raw_usage) + extra_total_tokens);`
Correctly ensures the stored total is never less than the sum of parsed parts after fallback. Good defensive move.
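The floor can be illustrated in isolation. In this sketch `floored_total` is a hypothetical free function, and `parts_total` stands in for whatever `crate::total_usage_tokens(raw_usage)` returns; saturating addition is used here only to keep the example overflow-safe.

```rust
/// Hypothetical helper: the stored total is the larger of the reported
/// aggregate and the sum of the parsed parts plus the extra total tokens.
pub fn floored_total(reported_total: u64, parts_total: u64, extra_total_tokens: u64) -> u64 {
    reported_total.max(parts_total.saturating_add(extra_total_tokens))
}
```

If the log reports no total but the parts sum to 500, the stored total becomes 500; if the log reports 1000 and the parts sum to 150, the reported 1000 is kept.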
🔍 One Edge Case Worth Noting
In Copilot, reasoning is passed as the extra_total_tokens argument and also receives the return value:
`let (usage, reasoning) = apply_total_token_fallback(usage, reasoning, total);`
In the rare scenario where reasoning > 0 and total > known and output_tokens > 0, the returned reasoning will be original_reasoning + overflow. This means reasoning_output_tokens in the candidate would silently absorb unattributed tokens. It's an unlikely combination and doesn't affect the common fallback path (all parts = 0, only total present), but it's worth keeping in mind for future Copilot log format changes.
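The edge case can be reproduced with a small stand-alone sketch. Everything here is hypothetical: `Usage` has only two fields, `fallback` is a reduced version of the helper's described behavior, and treating the reasoning count as part of the known sum is an assumption.

```rust
/// Reduced, hypothetical usage type for this illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Reduced sketch of the fallback helper's two branches.
pub fn fallback(mut usage: Usage, extra: u64, total: u64) -> (Usage, u64) {
    let known = usage
        .input_tokens
        .saturating_add(usage.output_tokens)
        .saturating_add(extra);
    let overflow = total.saturating_sub(known);
    if overflow == 0 {
        return (usage, extra);
    }
    if usage.output_tokens == 0 {
        usage.output_tokens = overflow;
        (usage, extra)
    } else {
        (usage, extra.saturating_add(overflow))
    }
}

/// Mirrors the Copilot call shape: reasoning goes in as the extra argument
/// and is overwritten by the returned extra value.
pub fn copilot_reasoning_after_fallback(input: u64, output: u64, reasoning: u64, total: u64) -> u64 {
    let usage = Usage { input_tokens: input, output_tokens: output };
    let (_usage, reasoning) = fallback(usage, reasoning, total);
    reasoning
}
```

Under these assumptions, a record with input 100, output 50, reasoning 20 and total 500 has a known sum of 170, so the returned reasoning becomes 20 + 330 = 350. In the common path (all parts zero, only a total), the remainder goes to output and reasoning stays 0.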
✅ Tests
All 11 adapters have a targeted regression test following the same structure: write a record with only total tokens, assert output_tokens == total and extra_total_tokens == 0. The two unit tests in utils.rs cover the two main code paths of the helper directly. Coverage is solid.
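The regression-test structure described above can be sketched as follows. `RawRecord`, `LoadedRow`, and `load_record` are hypothetical names invented for this illustration; the real adapter tests write provider-specific log files, which are not shown in this conversation.

```rust
/// Hypothetical parsed log record where any field may be absent.
#[derive(Debug, Default, Clone, Copy)]
pub struct RawRecord {
    pub input: Option<u64>,
    pub output: Option<u64>,
    pub total: Option<u64>,
}

/// Hypothetical loaded row as it would appear in a report.
#[derive(Debug, PartialEq, Eq)]
pub struct LoadedRow {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub extra_total_tokens: u64,
}

/// Sketch of an adapter applying the total-token fallback while loading.
pub fn load_record(record: RawRecord) -> LoadedRow {
    let input_tokens = record.input.unwrap_or(0);
    let mut output_tokens = record.output.unwrap_or(0);
    let mut extra_total_tokens = 0;
    let known = input_tokens.saturating_add(output_tokens);
    let remainder = record.total.unwrap_or(0).saturating_sub(known);
    if remainder > 0 {
        if output_tokens == 0 {
            output_tokens = remainder;
        } else {
            extra_total_tokens = remainder;
        }
    }
    LoadedRow { input_tokens, output_tokens, extra_total_tokens }
}

#[cfg(test)]
mod total_only_fallback_tests {
    use super::*;

    #[test]
    fn total_only_record_falls_back_to_output() {
        // A record with only an aggregate total, as in the adapter tests.
        let row = load_record(RawRecord { total: Some(500), ..RawRecord::default() });
        assert_eq!(row.output_tokens, 500);
        assert_eq!(row.extra_total_tokens, 0);
    }
}
```

Keeping the same shape for every adapter means a format-specific parsing bug shows up as a single failing test rather than a silent zero-token row.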
Overall: The fallback logic is correct, non-destructive to existing known fields, and uniformly applied. The dedup-key updates in codebuff, kimi, and pi are a correctness win. LGTM. 🐇
✅ Actions performed
Review triggered.
Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rust/crates/ccusage/src/adapter/kimi.rs (1)
337-378: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Cost calculation doesn't include `extra_total_tokens`, unlike other adapters.
In `kimi_entry_to_loaded`, the `usage` struct passed to `calculate_kimi_cost` doesn't account for `extra_total_tokens`. Compare with OpenCode (lines 66-71) and Qwen (lines 121-126), which add `extra_total_tokens` to `output_tokens` for billing purposes.
When a record has partial breakdowns (e.g., input=100, output=50, total=500), the 350 extra tokens would be stored but not billed.
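The billing gap can be made concrete with a small arithmetic sketch. The function names and the integer per-token prices (in hypothetical micro-USD) are assumptions for illustration only; the crate's actual pricing types and `calculate_kimi_cost` are not reproduced here.

```rust
/// Output tokens used for billing: the displayed output plus the
/// unclassified extra total tokens.
pub fn billable_output_tokens(output_tokens: u64, extra_total_tokens: u64) -> u64 {
    output_tokens.saturating_add(extra_total_tokens)
}

/// Hypothetical cost in micro-USD with integer per-token prices.
pub fn cost_micro_usd(
    input_tokens: u64,
    billable_output: u64,
    input_price: u64,
    output_price: u64,
) -> u64 {
    input_tokens
        .saturating_mul(input_price)
        .saturating_add(billable_output.saturating_mul(output_price))
}
```

For the record in the comment (input=100, output=50, extra=350) and made-up prices of 1 and 2 micro-USD per token, billing only the displayed output gives 100 + 50 × 2 = 200, while including the extra tokens gives 100 + 400 × 2 = 900.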
Proposed fix

     fn kimi_entry_to_loaded(
         entry: KimiUsageEntry,
         tz: Option<&JiffTimeZone>,
         mode: CostMode,
         pricing: &PricingMap,
     ) -> LoadedEntry {
         let usage = TokenUsageRaw {
             input_tokens: entry.input_tokens,
             output_tokens: entry.output_tokens,
             cache_creation_input_tokens: entry.cache_creation_tokens,
             cache_read_input_tokens: entry.cache_read_tokens,
             speed: None,
         };
    -    let cost = calculate_kimi_cost(&entry, mode, pricing, usage);
    +    let cost_usage = TokenUsageRaw {
    +        output_tokens: usage.output_tokens.saturating_add(entry.extra_total_tokens),
    +        ..usage
    +    };
    +    let cost = calculate_kimi_cost(&entry, mode, pricing, cost_usage);

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust/crates/ccusage/src/adapter/kimi.rs` around lines 337 - 378, The cost calculation in kimi_entry_to_loaded omits entry.extra_total_tokens when building the TokenUsageRaw passed to calculate_kimi_cost; update TokenUsageRaw construction in kimi_entry_to_loaded so output_tokens includes entry.extra_total_tokens (e.g., output_tokens: entry.output_tokens + entry.extra_total_tokens) before calling calculate_kimi_cost, keeping all other fields (input_tokens, cache_creation_input_tokens, cache_read_input_tokens, speed) the same.
📒 Files selected for processing (13)
rust/crates/ccusage/src/adapter/amp.rs
rust/crates/ccusage/src/adapter/codebuff.rs
rust/crates/ccusage/src/adapter/copilot.rs
rust/crates/ccusage/src/adapter/droid.rs
rust/crates/ccusage/src/adapter/gemini.rs
rust/crates/ccusage/src/adapter/kilo.rs
rust/crates/ccusage/src/adapter/kimi.rs
rust/crates/ccusage/src/adapter/openclaw.rs
rust/crates/ccusage/src/adapter/opencode/parser.rs
rust/crates/ccusage/src/adapter/pi.rs
rust/crates/ccusage/src/adapter/qwen/parser.rs
rust/crates/ccusage/src/main.rs
rust/crates/ccusage/src/utils.rs
ccusage performance comparison
PR SHA:
This compares the Rust PR release binary against the configured base package on the same CI runner.
Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one
Cached bunx execution performance
Runs the same large fixture through
Fixtures: Claude
Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.
Fixtures: Claude
Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.
Fixtures: Claude
Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.
Fixtures: Claude
Artifact size
Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
Summary by cubic
Add a shared fallback to use aggregate total token counts when input/output/cache fields are missing, so usage rows aren’t dropped and costs are accurate.
`apply_total_token_fallback`: if output tokens are missing, assign the remainder from `total`; otherwise store it as `extra_total_tokens`.
`extra_total_tokens` (added to output at billing time) while keeping the displayed split unchanged.
Written for commit 1e7f103. Summary will update on new commits.