fix(agents): fall back to total token counts #1071

Merged
ryoppippi merged 1 commit into main from codex/agent-token-fallback on May 19, 2026

@ryoppippi (Owner) commented May 19, 2026

Summary

  • add a shared total-token fallback for agent adapters when only aggregate totals are present
  • apply it across Amp, OpenCode, Droid, Codebuff, Copilot, Gemini, Kilo, Kimi, OpenClaw, pi-agent, and Qwen
  • include targeted regression tests for each adapter fallback path

Testing

  • pnpm run format
  • cargo test --manifest-path rust/Cargo.toml -p ccusage total_tokens -- --nocapture
  • cargo clippy --manifest-path rust/Cargo.toml -p ccusage --all-targets -- -D warnings
  • pnpm typecheck
  • pnpm run test

Summary by cubic

Add a shared fallback to use aggregate total token counts when input/output/cache fields are missing, so usage rows aren’t dropped and costs are accurate.

  • Bug Fixes
    • Introduced apply_total_token_fallback: if output tokens are missing, assign the remainder from total; otherwise store it as extra_total_tokens.
    • Applied across adapters: Amp, OpenCode, Droid, Codebuff, Copilot, Gemini, Kilo, Kimi, OpenClaw, pi-agent, Qwen.
    • Cost calculations now include extra_total_tokens (added to output at billing time) while keeping the displayed split unchanged.
    • Added targeted regression tests for each adapter and the shared helper.

Written for commit 1e7f103.

Summary by CodeRabbit

  • New Features

    • Added fallback token counting when detailed token breakdowns are unavailable from AI providers, improving accuracy and completeness of token usage tracking.
    • Enhanced token usage parsing across multiple AI provider integrations (Amp, Copilot, Gemini, Kilo, Kimi, and others) to handle missing or incomplete token data.
  • Bug Fixes

    • Fixed cost calculations to accurately reflect total token usage when detailed component data is absent from provider responses.
  • Tests

    • Added unit tests verifying fallback behavior for all updated provider adapters.

Some coding-agent logs carry only aggregate total-token fields, with the provider-specific input/output/cache token fields absent or incomplete. Those records were either skipped or under-counted, so reports showed zero tokens or omitted the unclassified remainder.

Add a shared total-token fallback helper and apply it across the Rust agent adapters. When output tokens are missing, the helper assigns the missing total remainder to output so the row remains visible and cost calculation has billable tokens. When output tokens are already known, the remaining total is kept as extra total tokens so report totals preserve the source aggregate without changing the displayed input/output split.
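
The two branches described above can be sketched as follows. This is a minimal illustration, not the merged implementation: `TokenUsageRaw` here is a simplified stand-in (the real struct also has a `speed` field and may differ otherwise), and the function body is inferred from the description in this PR rather than copied from `utils.rs`.

```rust
/// Hypothetical, simplified stand-in for the crate's `TokenUsageRaw`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TokenUsageRaw {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
}

/// Sketch of the fallback: returns the adjusted usage and the updated
/// extra-total count. Assumed behavior, based on the PR description.
pub fn apply_total_token_fallback(
    mut usage: TokenUsageRaw,
    extra_total_tokens: u64,
    total_tokens: u64,
) -> (TokenUsageRaw, u64) {
    // Everything that is already attributed to some bucket.
    let known = usage
        .input_tokens
        .saturating_add(usage.output_tokens)
        .saturating_add(usage.cache_creation_input_tokens)
        .saturating_add(usage.cache_read_input_tokens)
        .saturating_add(extra_total_tokens);
    // Saturating: a total smaller than the known parts yields 0 (no-op).
    let missing = total_tokens.saturating_sub(known);
    if missing == 0 {
        return (usage, extra_total_tokens);
    }
    if usage.output_tokens == 0 {
        // Output unknown: assign the remainder so the row stays visible.
        usage.output_tokens = missing;
        (usage, extra_total_tokens)
    } else {
        // Output known: keep the split, carry the remainder separately.
        (usage, extra_total_tokens.saturating_add(missing))
    }
}
```

Under this sketch, a record with only a total of 500 comes back with `output_tokens == 500` and no extra tokens, while a record with input 100, output 50, and total 500 keeps its split and returns 350 extra total tokens.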

Cover the fallback behavior with targeted tests for Amp, OpenCode, Droid, Codebuff, Copilot, Gemini, Kilo, Kimi, OpenClaw, pi-agent, Qwen, and the shared helper.

@ryoppippi (Owner, Author) commented

@coderabbitai review

Please review the total-token fallback behavior across the coding-agent adapters. This PR intentionally keeps known input/output/cache fields unchanged and only uses aggregate totals when parts are missing or incomplete.

coderabbitai Bot commented May 19, 2026

Walkthrough

This PR introduces a shared token-fallback mechanism across the ccusage crate. A new apply_total_token_fallback utility computes missing token counts from total-token metadata when per-component breakdowns are incomplete. This utility is then applied to eleven adapter modules, each updated to read total-token fields, invoke the fallback, and propagate extra_total_tokens into cost calculations and storage.

Changes

Total Token Fallback Integration

| Layer / File(s) | Summary |
| --- | --- |
| Core fallback utility and re-export: rust/crates/ccusage/src/utils.rs, rust/crates/ccusage/src/main.rs | New apply_total_token_fallback(usage, extra_total_tokens, total_tokens) function computes adjusted token usage by filling unknown output tokens or accumulating excess into extra totals. Returns (TokenUsageRaw, u64) tuple. Includes unit tests for both fill and accumulate branches. Utility is re-exported from main.rs for adapter use. |
| Adapter implementations: rust/crates/ccusage/src/adapter/{amp,codebuff,copilot,droid,gemini,kilo,kimi,openclaw,opencode,pi,qwen}.rs, rust/crates/ccusage/src/adapter/opencode/parser.rs | Each of eleven adapters is updated with the same pattern: import apply_total_token_fallback, read total-token field(s) from incoming data, call the fallback to compute (usage, extra_total_tokens), update cost calculation by adding extra_total_tokens to output tokens before pricing, store extra_total_tokens on LoadedEntry instead of forcing zero, and add a test verifying fallback behavior when per-component token fields are missing. Changes also include updating skip conditions and deduplication keys to account for extra_total_tokens. |
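
To illustrate why skip conditions and deduplication keys need to account for extra_total_tokens, here is a small sketch. Both helper names (`is_empty_usage`, `dedup_key`), the key layout, and the simplified `TokenUsageRaw` are hypothetical and chosen for illustration only; the adapters' actual conditions and keys are not shown in this conversation.

```rust
/// Hypothetical, simplified stand-in for the crate's `TokenUsageRaw`.
#[derive(Debug, Clone, Copy, Default)]
pub struct TokenUsageRaw {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
}

/// Hypothetical skip check: a row counts as empty only when no bucket and
/// no extra total carries tokens.
pub fn is_empty_usage(usage: &TokenUsageRaw, extra_total_tokens: u64) -> bool {
    usage.input_tokens == 0
        && usage.output_tokens == 0
        && usage.cache_creation_input_tokens == 0
        && usage.cache_read_input_tokens == 0
        && extra_total_tokens == 0
}

/// Hypothetical dedup key: two records that differ only in their extra
/// total tokens must not collapse into one.
pub fn dedup_key(message_id: &str, usage: &TokenUsageRaw, extra_total_tokens: u64) -> String {
    format!(
        "{}:{}:{}:{}:{}:{}",
        message_id,
        usage.input_tokens,
        usage.output_tokens,
        usage.cache_creation_input_tokens,
        usage.cache_read_input_tokens,
        extra_total_tokens
    )
}
```

If the extra count were left out of either helper, a record whose only signal is an unattributed remainder would be skipped as empty or merged with an otherwise identical record.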

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • ryoppippi/ccusage#1031: Kimi adapter token parsing now applies apply_total_token_fallback to read token_usage.total and propagate extra_total_tokens.
  • ryoppippi/ccusage#1029: Codebuff adapter token parsing is extended with apply_total_token_fallback logic for missing token components.
  • ryoppippi/ccusage#1021: Gemini adapter's token computation is refactored to use the new apply_total_token_fallback utility for consistent fallback behavior.

A token breaks down with grace,
When parts go missing from their place,
The fallback reads the total sum,
And fills the gaps where parts are dumb,
Eleven adapters dance as one. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(agents): fall back to total token counts' directly and clearly summarizes the main change: implementing a fallback mechanism to handle total token counts across multiple agent adapters when component-level token fields are missing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@cloudflare-workers-and-pages

Deploying with Cloudflare Workers

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! | ccusage-guide | 1e7f103 | Commit Preview URL, Branch Preview URL | May 19 2026, 07:50 PM |

coderabbitai Bot commented May 19, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR diff
gh pr view 1071 --repo ryoppippi/ccusage --json title,body,files,additions,deletions

# Get the list of changed files
gh pr diff 1071 --repo ryoppippi/ccusage --name-only

Length of output: 2539


🏁 Script executed:

#!/bin/bash
# Get the full diff
gh pr diff 1071 --repo ryoppippi/ccusage

Length of output: 44225


@ryoppippi — I've reviewed the full diff. The implementation is clean and consistently applied. Here's my analysis:


✅ Core Utility (utils.rs)

The apply_total_token_fallback function is well-designed:

known = Σ(usage fields) + extra_total_tokens
missing = total_tokens - known   (saturating)
  • No-op when missing == 0 — preserves all known fields unchanged, as intended.
  • total < known — handled gracefully by saturating_sub (returns 0 → no-op). Silent acceptance of inconsistent data is reasonable here.
  • total == 0 — same saturating path; no spurious filling.
  • Fill priority: puts missing into output_tokens when it's zero, otherwise spills into extra_total_tokens. This correctly avoids overwriting a known output count.

The two unit tests in utils.rs cover both branches cleanly.


📋 Adapter-by-Adapter Notes

All adapters follow the same pattern — no deviations spotted.

| Adapter | Fallback field read | Extra key |
| --- | --- | --- |
| Amp | tokens.total | |
| OpenCode | tokens.total | |
| Kilo | tokens.total | |
| Pi | usage.totalTokens | dedup key updated ✅ |
| Kimi | token_usage.total | dedup key updated ✅ |
| Codebuff | totalTokens / total_tokens / total | dedup key updated ✅ |
| Copilot | gen_ai.usage.total_tokens / total.token_count | |
| Gemini | tokens.total (+ new total_tokens alias) | |
| Droid | totalTokens | |
| OpenClaw | totalTokens | total_tokens floor corrected ✅ |
| Qwen | totalTokenCount | |

Gemini minor enhancement: parse_tokens now also checks total_tokens as a key alias — sensible expansion.

OpenClaw total_tokens floor:

let total_tokens = total_tokens.max(crate::total_usage_tokens(raw_usage) + extra_total_tokens);

Correctly ensures the stored total is never less than the sum of parsed parts after fallback. Good defensive move.
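
A small sketch of that floor, under stated assumptions: `total_usage_tokens` below is a stand-in assumed to sum the four token buckets (the real `crate::total_usage_tokens` is not shown here), `TokenUsageRaw` is simplified, and `floored_total` is a hypothetical wrapper name.

```rust
/// Hypothetical, simplified stand-in for the crate's `TokenUsageRaw`.
#[derive(Debug, Clone, Copy, Default)]
pub struct TokenUsageRaw {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
}

/// Stand-in for `crate::total_usage_tokens`; assumed to sum all buckets.
pub fn total_usage_tokens(usage: TokenUsageRaw) -> u64 {
    usage
        .input_tokens
        .saturating_add(usage.output_tokens)
        .saturating_add(usage.cache_creation_input_tokens)
        .saturating_add(usage.cache_read_input_tokens)
}

/// The stored total is the larger of the reported total and the sum of the
/// parsed parts plus extra tokens, so it can never undercut the parts.
pub fn floored_total(reported_total: u64, usage: TokenUsageRaw, extra_total_tokens: u64) -> u64 {
    reported_total.max(total_usage_tokens(usage).saturating_add(extra_total_tokens))
}
```

With input 100 and output 50, a reported total of 120 is raised to 150, while a reported total of 500 with 350 extra tokens stays at 500.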


🔍 One Edge Case Worth Noting

In Copilot, reasoning is passed as the extra_total_tokens argument and also receives the return value:

let (usage, reasoning) = apply_total_token_fallback(usage, reasoning, total);

In the rare scenario where reasoning > 0 and total > known and output_tokens > 0, the returned reasoning will be original_reasoning + overflow. This means reasoning_output_tokens in the candidate would silently absorb unattributed tokens. It's an unlikely combination and doesn't affect the common fallback path (all parts = 0, only total present), but it's worth keeping in mind for future Copilot log format changes.
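
The edge case can be reproduced with concrete numbers. The sketch below re-implements the helper in compact form from the description in this review (an assumption, not the code in `utils.rs`), uses a simplified `TokenUsageRaw`, and wraps the quoted call in a hypothetical function, `copilot_reasoning_after_fallback`.

```rust
/// Hypothetical, simplified stand-in for the crate's `TokenUsageRaw`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TokenUsageRaw {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
}

/// Compact sketch of the fallback as described in this review.
pub fn apply_total_token_fallback(
    mut usage: TokenUsageRaw,
    extra_total_tokens: u64,
    total_tokens: u64,
) -> (TokenUsageRaw, u64) {
    let known = usage
        .input_tokens
        .saturating_add(usage.output_tokens)
        .saturating_add(usage.cache_creation_input_tokens)
        .saturating_add(usage.cache_read_input_tokens)
        .saturating_add(extra_total_tokens);
    let missing = total_tokens.saturating_sub(known);
    if missing > 0 && usage.output_tokens == 0 {
        usage.output_tokens = missing;
        return (usage, extra_total_tokens);
    }
    (usage, extra_total_tokens.saturating_add(missing))
}

/// Mirrors the Copilot call shape: `reasoning` goes in as the extra count
/// and is rebound to the returned extra count.
pub fn copilot_reasoning_after_fallback(usage: TokenUsageRaw, reasoning: u64, total: u64) -> u64 {
    let (_usage, reasoning) = apply_total_token_fallback(usage, reasoning, total);
    reasoning
}
```

With input 100, output 50, reasoning 20, and total 500, the known sum is 170 and the overflow is 330, so the returned reasoning value is 350 rather than 20. When output is 0 instead, the overflow lands in output and reasoning stays at 20.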


✅ Tests

All 11 adapters have a targeted regression test following the same structure: write a record with only total tokens, assert output_tokens == total and extra_total_tokens == 0. The two unit tests in utils.rs cover the two main code paths of the helper directly. Coverage is solid.


Overall: The fallback logic is correct, non-destructive to existing known fields, and uniformly applied. The dedup-key updates in codebuff, kimi, and pi are a correctness win. LGTM. 🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rust/crates/ccusage/src/adapter/kimi.rs (1)

337-378: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cost calculation doesn't include extra_total_tokens, unlike other adapters.

In kimi_entry_to_loaded, the usage struct passed to calculate_kimi_cost doesn't account for extra_total_tokens. Compare with OpenCode (lines 66-71) and Qwen (lines 121-126) which add extra_total_tokens to output_tokens for billing purposes.

When a record has partial breakdowns (e.g., input=100, output=50, total=500), the 350 extra tokens would be stored but not billed.

Proposed fix
 fn kimi_entry_to_loaded(
     entry: KimiUsageEntry,
     tz: Option<&JiffTimeZone>,
     mode: CostMode,
     pricing: &PricingMap,
 ) -> LoadedEntry {
     let usage = TokenUsageRaw {
         input_tokens: entry.input_tokens,
         output_tokens: entry.output_tokens,
         cache_creation_input_tokens: entry.cache_creation_tokens,
         cache_read_input_tokens: entry.cache_read_tokens,
         speed: None,
     };
-    let cost = calculate_kimi_cost(&entry, mode, pricing, usage);
+    let cost_usage = TokenUsageRaw {
+        output_tokens: usage.output_tokens.saturating_add(entry.extra_total_tokens),
+        ..usage
+    };
+    let cost = calculate_kimi_cost(&entry, mode, pricing, cost_usage);
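
The billing effect of the proposed change can be checked with the numbers from the comment (input=100, output=50, 350 extra). This is a sketch under assumptions: `TokenUsageRaw` is simplified (no `speed` field), `billable_usage` and `cost_micro_usd` are hypothetical helpers, and the flat per-token prices are made up for illustration and are not Kimi's actual pricing.

```rust
/// Hypothetical, simplified stand-in for the crate's `TokenUsageRaw`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct TokenUsageRaw {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
}

/// Builds the usage handed to pricing: extra total tokens are folded into
/// output, while the displayed split (the original `usage`) is untouched.
pub fn billable_usage(usage: TokenUsageRaw, extra_total_tokens: u64) -> TokenUsageRaw {
    TokenUsageRaw {
        output_tokens: usage.output_tokens.saturating_add(extra_total_tokens),
        ..usage
    }
}

/// Hypothetical flat pricing in micro-dollars per token; cache buckets are
/// ignored to keep the arithmetic easy to follow.
pub fn cost_micro_usd(usage: TokenUsageRaw, input_price: u64, output_price: u64) -> u64 {
    usage
        .input_tokens
        .saturating_mul(input_price)
        .saturating_add(usage.output_tokens.saturating_mul(output_price))
}
```

At illustrative prices of 1 and 2 micro-dollars per input and output token, pricing the displayed split gives 100 + 50 × 2 = 200, whereas pricing the billable usage gives 100 + 400 × 2 = 900; the difference is the 350 extra tokens that would otherwise go unbilled.
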
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/ccusage/src/adapter/kimi.rs` around lines 337 - 378, The cost
calculation in kimi_entry_to_loaded omits entry.extra_total_tokens when building
the TokenUsageRaw passed to calculate_kimi_cost; update TokenUsageRaw
construction in kimi_entry_to_loaded so output_tokens includes
entry.extra_total_tokens (e.g., output_tokens: entry.output_tokens +
entry.extra_total_tokens) before calling calculate_kimi_cost, keeping all other
fields (input_tokens, cache_creation_input_tokens, cache_read_input_tokens,
speed) the same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aef96618-190c-43a5-88f7-e34ea5005f14

📥 Commits

Reviewing files that changed from the base of the PR and between 7e20024 and 1e7f103.

📒 Files selected for processing (13)
  • rust/crates/ccusage/src/adapter/amp.rs
  • rust/crates/ccusage/src/adapter/codebuff.rs
  • rust/crates/ccusage/src/adapter/copilot.rs
  • rust/crates/ccusage/src/adapter/droid.rs
  • rust/crates/ccusage/src/adapter/gemini.rs
  • rust/crates/ccusage/src/adapter/kilo.rs
  • rust/crates/ccusage/src/adapter/kimi.rs
  • rust/crates/ccusage/src/adapter/openclaw.rs
  • rust/crates/ccusage/src/adapter/opencode/parser.rs
  • rust/crates/ccusage/src/adapter/pi.rs
  • rust/crates/ccusage/src/adapter/qwen/parser.rs
  • rust/crates/ccusage/src/main.rs
  • rust/crates/ccusage/src/utils.rs

pkg-pr-new Bot commented May 19, 2026

ccusage

npx https://pkg.pr.new/ryoppippi/ccusage@1071

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-darwin-arm64@1071

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-darwin-x64@1071

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-linux-arm64@1071

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-linux-x64@1071

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-win32-arm64@1071

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-win32-x64@1071

commit: 1e7f103

@github-actions

ccusage performance comparison

PR SHA: 1e7f1038163a
Base SHA: 7e200245fa60

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | 7e200245fa60 | 652.2ms | 577.5ms | 31.8ms | 3 |
| PR pkg.pr.new | 1e7f1038163a | 698.3ms | 2.261s | 32.2ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 7e200245fa60; PR package: 1e7f1038163a. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.

| Command | Input | Base median | PR median | PR vs base | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 697.0ms | 694.7ms | 1.00x | 1.44 GiB/s | 1.45 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 824.7ms | 833.2ms | 0.99x | 1.22 GiB/s | 1.21 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 685.5ms | 1.47 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 655.0ms | 1.54 GiB/s | 1 |
| claude --offline --json | Workspace release binary | 1.01 GiB | 705.8ms | 1.43 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 820.5ms | 1.23 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 793.4ms | 1.27 GiB/s | 1 |
| codex --offline --json | Workspace release binary | 1.01 GiB | 805.2ms | 1.25 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.

| Command | Input | Base median | PR median | PR vs base | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 30.2ms | 4.5ms | 6.70x | 0.05 MiB/s | 0.34 MiB/s |
| claude session --offline --json | 0.00 MiB | 30.1ms | 4.6ms | 6.56x | 0.05 MiB/s | 0.34 MiB/s |
| codex daily --offline --json | 0.00 MiB | 30.2ms | 4.3ms | 7.07x | 0.03 MiB/s | 0.20 MiB/s |
| codex session --offline --json | 0.00 MiB | 29.6ms | 4.2ms | 7.04x | 0.03 MiB/s | 0.20 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.

| Command | Input | Base median | PR median | PR vs base | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 687.2ms | 703.3ms | 0.98x | 1.47 GiB/s | 1.43 GiB/s |
| codex --offline --json | 1.01 GiB | 822.0ms | 793.5ms | 1.04x | 1.22 GiB/s | 1.27 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 9.45 KiB | 9.44 KiB | -0.00 KiB | 1.00x |
| installed native package binary | 3288.24 KiB | 3288.24 KiB | +0.00 KiB | 1.00x |
| Rust release binary rust/target/release/ccusage | - | 2955.68 KiB | - | - |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions

ccusage performance comparison

PR SHA: 1e7f1038163a
Base SHA: 7e200245fa60

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | 7e200245fa60 | 497.4ms | 412.2ms | 35.9ms | 3 |
| PR pkg.pr.new | 1e7f1038163a | 719.8ms | 907.0ms | 34.9ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 7e200245fa60; PR package: 1e7f1038163a. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.

| Command | Input | Base median | PR median | PR vs base | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 692.6ms | 694.9ms | 1.00x | 1.45 GiB/s | 1.45 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 887.4ms | 886.3ms | 1.00x | 1.13 GiB/s | 1.14 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 694.1ms | 1.45 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 658.2ms | 1.53 GiB/s | 1 |
| claude --offline --json | Workspace release binary | 1.01 GiB | 723.5ms | 1.39 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 875.3ms | 1.15 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 842.4ms | 1.20 GiB/s | 1 |
| codex --offline --json | Workspace release binary | 1.01 GiB | 840.3ms | 1.20 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.

| Command | Input | Base median | PR median | PR vs base | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 33.0ms | 36.7ms | 0.90x | 0.05 MiB/s | 0.04 MiB/s |
| claude session --offline --json | 0.00 MiB | 32.7ms | 32.8ms | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| codex daily --offline --json | 0.00 MiB | 32.4ms | 32.9ms | 0.98x | 0.03 MiB/s | 0.03 MiB/s |
| codex session --offline --json | 0.00 MiB | 32.8ms | 33.0ms | 0.99x | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.

| Command | Input | Base median | PR median | PR vs base | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 775.5ms | 691.7ms | 1.12x | 1.30 GiB/s | 1.46 GiB/s |
| codex --offline --json | 1.01 GiB | 897.4ms | 890.6ms | 1.01x | 1.12 GiB/s | 1.13 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 9.45 KiB | 9.44 KiB | -0.00 KiB | 1.00x |
| installed native package binary | 3288.24 KiB | 3288.24 KiB | +0.00 KiB | 1.00x |
| Rust release binary rust/target/release/ccusage | - | 2955.68 KiB | - | - |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@ryoppippi ryoppippi merged commit 4bc6794 into main May 19, 2026
39 checks passed
@ryoppippi ryoppippi deleted the codex/agent-token-fallback branch May 19, 2026 20:22