fix(codex): skip replayed parent token history in thread_spawn subagent sessions by pullfrog[bot] · Pull Request #1218 · ryoppippi/ccusage

pullfrog · 2026-06-06T13:49:02Z

Summary

Fixes #950 — Massive token overcounting for Codex subagent sessions (91x inflation).

When OpenAI Codex spawns subagent threads via thread_spawn, the subagent rollout JSONL files contain a full replay of the parent thread's token usage history, re-timestamped to the subagent creation time. This caused usage to be reported up to 91x higher than actual.

Root Cause (3-layer inflation)

- Parent history replay: Subagent files replay the parent's full token usage history, with timestamps set to the subagent creation time.
- Duplicate entries: ~47% of replayed entries are exact duplicates within the same file.
- Multiple subagents: 12 subagents each independently replay the same parent history.

Fix

Detects thread_spawn subagent sessions by scanning for the thread_spawn byte pattern in the file prefix. For subagent sessions, a pre-scan identifies the replay timestamp pattern (≥2 token_count entries sharing the same second). In the main parse loop, all token_count entries matching the confirmed replay timestamp are skipped.
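
The pre-scan and skip steps described above can be sketched roughly as follows. This is a minimal illustration and not the code in parser.rs: the `Entry` struct, `find_replay_second`, and `counted_tokens` are hypothetical names, and the sketch assumes each entry carries an ISO-8601 timestamp whose first 19 characters identify the second.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one rollout line (not ccusage's real type).
#[derive(Debug, Clone)]
struct Entry {
    /// Assumed ISO-8601 format, e.g. "2026-06-06T13:49:02.123Z".
    timestamp: String,
    is_token_count: bool,
    tokens: u64,
}

/// Truncate a timestamp to whole-second precision ("YYYY-MM-DDTHH:MM:SS").
fn second_of(timestamp: &str) -> &str {
    timestamp.get(..19).unwrap_or(timestamp)
}

/// Pre-scan: return the first second (in file order) that is shared by
/// at least two token_count entries, if any.
fn find_replay_second(entries: &[Entry]) -> Option<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for entry in entries.iter().filter(|e| e.is_token_count) {
        let second = second_of(&entry.timestamp);
        let count = counts.entry(second).or_insert(0);
        *count += 1;
        if *count >= 2 {
            return Some(second.to_string());
        }
    }
    None
}

/// Main pass: sum token_count entries, skipping those in the replay second.
fn counted_tokens(entries: &[Entry]) -> u64 {
    let replay = find_replay_second(entries);
    entries
        .iter()
        .filter(|e| e.is_token_count)
        .filter(|e| replay.as_deref() != Some(second_of(&e.timestamp)))
        .map(|e| e.tokens)
        .sum()
}
```

A heuristic like this would also skip two genuine entries that happen to land in the same second, which is presumably why the PR applies it only to files already identified as thread_spawn subagent sessions.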

Changes

parser.rs: Added is_codex_subagent_session(), detect_subagent_replay_second(), and replay skip logic in visit_codex_session_file
loader.rs: Added two test fixtures:
- Single subagent with replayed parent history (2 replay + 2 real entries)
- Three subagent files each replaying the same parent history

All 215 existing tests pass unchanged.

via Pullfrog, using DeepSeek Pro (free via Pullfrog for OSS)

Summary by cubic

Fixes #950. Corrects token overcounting in Codex subagent sessions by skipping replayed parent history in thread_spawn rollouts and preserving cumulative baselines, preventing up to 91x inflation.

Bug Fixes
- Detect subagent files by scanning for thread_spawn.
- Pre-scan for the replay second (≥2 token_count lines in the same second).
- Skip token_count entries with that timestamp; stop after the first non-replay line.
- Preserve replayed total_token_usage as the baseline so the first real subagent entry is counted as a delta.
- Added tests for single/multiple subagent files and the baseline-only case.

Dependencies
- Pin actions/checkout@v6 and pullfrog/pullfrog@v0 to commit SHAs in the workflow.

Written for commit a668d2c. Summary will update on new commits.

cloudflare-workers-and-pages · 2026-06-06T13:49:32Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	ccusage-guide	`a668d2c`	Commit Preview URL Branch Preview URL	Jun 08 2026, 09:21 PM

ryoppippi · 2026-06-06T13:49:34Z

@pullfrog rebase main

pullfrog · 2026-06-06T13:49:36Z

Rebased pullfrog/950-fix-codex-subagent-replay-overcounting onto main. Resolved a merge conflict in .github/workflows/pullfrog.yml (pinned commit hashes vs unpinned with zizmor ignore comments — kept the pinned hashes, which supersede the unpinned approach). Force-pushed the rebased branch.

Task list (5/5 completed)

- Fetch latest main branch
- Rebase current branch onto main
- Resolve merge conflict in .github/workflows/pullfrog.yml
- Push rebased branch
- Report results

…nt sessions

When Codex spawns subagent threads via thread_spawn, the subagent rollout JSONL files contain a full replay of the parent thread's token usage history, re-timestamped to the subagent creation time. This caused usage to be reported up to 91x higher than actual.

Detect subagent sessions by scanning for the thread_spawn byte pattern, then pre-scan the file for the replay timestamp pattern (>=2 token_count entries with the same second). In the main parse loop, skip all token_count entries matching the confirmed replay timestamp.

Fixes #950
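
The byte-pattern detection mentioned in this commit message might look something like the sketch below. It is only an illustration under stated assumptions: `looks_like_subagent_session` and `contains_marker` are hypothetical names (the PR's own function is is_codex_subagent_session()), and the 8 KiB prefix length is a guess, since the PR does not say how many bytes are scanned.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Assumed prefix size; the PR does not state the real value.
const PREFIX_LEN: usize = 8 * 1024;
/// Marker bytes searched for in the file prefix.
const MARKER: &[u8] = b"thread_spawn";

/// Plain byte-substring search, avoiding JSON parsing for the detection step.
fn contains_marker(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` would panic, so an empty needle is rejected up front.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Read at most PREFIX_LEN bytes from the start of the file and look for the marker.
fn looks_like_subagent_session(path: &Path) -> io::Result<bool> {
    let mut prefix = Vec::with_capacity(PREFIX_LEN);
    File::open(path)?
        .take(PREFIX_LEN as u64)
        .read_to_end(&mut prefix)?;
    Ok(contains_marker(&prefix, MARKER))
}
```

Scanning only a bounded prefix keeps the check cheap for the large majority of files that are not subagent sessions, at the cost of missing a marker that appears later in the file.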

pkg-pr-new · 2026-06-06T14:18:44Z

ccusage

npx https://pkg.pr.new/ccusage@1218

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1218

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1218

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1218

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1218

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1218

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1218

commit: a668d2c

github-actions · 2026-06-06T14:22:40Z

ccusage performance comparison

PR SHA: 29c8a7300373
Base SHA: bee4a26e6cf5

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`bee4a26e6cf5`	1.331s	572.1ms	29.3ms	3
PR pkg.pr.new	`29c8a73`	490.3ms	479.3ms	29.5ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: bee4a26e6cf5; PR package: 29c8a73. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	548.6ms	549.9ms	1.00x	298.83 MiB	305.83 MiB	1.02x	1.84 GiB/s	1.83 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	367.3ms	365.6ms	1.00x	79.58 MiB	71.45 MiB	0.90x	2.74 GiB/s	2.75 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	536.8ms	1.88 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	513.6ms	1.96 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	358.7ms	2.81 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	343.4ms	2.93 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.1ms	28.0ms	1.04x	43.61 MiB	43.73 MiB	1.00x	0.05 MiB/s	0.06 MiB/s
`claude session --offline --json`	28.2ms	28.2ms	1.00x	43.48 MiB	43.61 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	27.4ms	27.2ms	1.01x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	28.2ms	28.0ms	1.01x	43.61 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	551.7ms	541.2ms	1.02x	326.20 MiB	283.58 MiB	0.87x	1.82 GiB/s	1.86 GiB/s
`codex --offline --json`	1.01 GiB	358.7ms	350.4ms	1.02x	78.83 MiB	76.70 MiB	0.97x	2.81 GiB/s	2.87 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	14.35 KiB	14.35 KiB	+0.00 KiB	1.00x
installed native package binary	3289.62 KiB	3289.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-06T14:22:53Z

ccusage performance comparison

PR SHA: 29c8a7300373
Base SHA: bee4a26e6cf5

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`bee4a26e6cf5`	602.3ms	613.6ms	32.4ms	3
PR pkg.pr.new	`29c8a73`	547.0ms	620.5ms	32.7ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: bee4a26e6cf5; PR package: 29c8a73. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	558.3ms	555.7ms	1.00x	300.33 MiB	321.70 MiB	1.07x	1.80 GiB/s	1.81 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	370.7ms	366.0ms	1.01x	81.58 MiB	71.58 MiB	0.88x	2.72 GiB/s	2.75 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	548.7ms	1.83 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	541.7ms	1.86 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	364.5ms	2.76 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	337.7ms	2.98 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.0ms	4.0ms	7.29x	43.73 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.39 MiB/s
`claude session --offline --json`	29.3ms	3.8ms	7.67x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.40 MiB/s
`codex daily --offline --json`	29.3ms	3.8ms	7.78x	43.73 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s
`codex session --offline --json`	28.4ms	3.6ms	7.86x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.24 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	564.4ms	536.2ms	1.05x	300.45 MiB	305.95 MiB	1.02x	1.78 GiB/s	1.88 GiB/s
`codex --offline --json`	1.01 GiB	375.4ms	336.2ms	1.12x	74.08 MiB	69.83 MiB	0.94x	2.68 GiB/s	2.99 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	14.35 KiB	14.35 KiB	+0.00 KiB	1.00x
installed native package binary	3289.62 KiB	3289.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-06T15:23:21Z

ccusage performance comparison

PR SHA: a614b36e3b05
Base SHA: bee4a26e6cf5

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`bee4a26e6cf5`	627.3ms	667.9ms	30.4ms	3
PR pkg.pr.new	`a614b36`	923.2ms	608.3ms	30.6ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: bee4a26e6cf5; PR package: a614b36. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	538.2ms	539.8ms	1.00x	327.70 MiB	316.83 MiB	0.97x	1.87 GiB/s	1.86 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	365.3ms	360.8ms	1.01x	80.83 MiB	81.58 MiB	1.01x	2.76 GiB/s	2.79 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	529.8ms	1.90 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	517.5ms	1.95 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	359.3ms	2.80 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	328.0ms	3.07 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.4ms	3.9ms	7.48x	-	2.70 MiB	-	0.05 MiB/s	0.39 MiB/s
`claude session --offline --json`	29.3ms	3.9ms	7.58x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.40 MiB/s
`codex daily --offline --json`	28.4ms	3.6ms	7.98x	43.48 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.24 MiB/s
`codex session --offline --json`	28.3ms	3.6ms	7.84x	-	2.70 MiB	-	0.03 MiB/s	0.24 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	532.8ms	523.5ms	1.02x	316.45 MiB	313.45 MiB	0.99x	1.89 GiB/s	1.92 GiB/s
`codex --offline --json`	1.01 GiB	361.5ms	328.3ms	1.10x	80.45 MiB	79.33 MiB	0.99x	2.79 GiB/s	3.07 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	14.35 KiB	14.35 KiB	+0.00 KiB	1.00x
installed native package binary	3289.62 KiB	3289.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-06T15:23:51Z

ccusage performance comparison

PR SHA: a614b36e3b05
Base SHA: bee4a26e6cf5

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`bee4a26e6cf5`	575.5ms	565.3ms	30.4ms	3
PR pkg.pr.new	`a614b36`	635.9ms	650.4ms	31.6ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: bee4a26e6cf5; PR package: a614b36. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	545.4ms	544.0ms	1.00x	319.45 MiB	307.70 MiB	0.96x	1.85 GiB/s	1.85 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	364.5ms	364.5ms	1.00x	76.58 MiB	79.70 MiB	1.04x	2.76 GiB/s	2.76 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	536.4ms	1.88 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	506.2ms	1.99 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	360.8ms	2.79 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	339.2ms	2.97 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	28.4ms	28.0ms	1.01x	43.61 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.06 MiB/s
`claude session --offline --json`	28.3ms	28.7ms	0.99x	43.61 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	28.6ms	28.7ms	1.00x	43.48 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	28.8ms	29.1ms	0.99x	-	43.48 MiB	-	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	552.7ms	551.0ms	1.00x	313.83 MiB	304.70 MiB	0.97x	1.82 GiB/s	1.83 GiB/s
`codex --offline --json`	1.01 GiB	363.4ms	356.1ms	1.02x	73.70 MiB	78.58 MiB	1.07x	2.77 GiB/s	2.83 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	14.35 KiB	14.35 KiB	+0.00 KiB	1.00x
installed native package binary	3289.62 KiB	3289.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

When thread_spawn replay entries are skipped, keep their cumulative total_token_usage as the parser baseline. Without that baseline, Codex logs that only provide total_token_usage on the first real subagent entry are counted as the full replayed cumulative total instead of the post-replay delta. Add a regression fixture that skips two replayed cumulative entries and verifies the following real entry is reported as a 100 input-token delta rather than the 1600-token cumulative total.
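
The baseline behaviour described in this commit message can be illustrated with a small sketch. `UsageTracker` and its methods are hypothetical and do not mirror the parser's real types; the 100-token delta and the 1600-token cumulative total come from the message above, while the intermediate replayed totals (1000 and 1500) are assumed for illustration.

```rust
/// Hypothetical tracker for cumulative total_token_usage values.
#[derive(Default)]
struct UsageTracker {
    /// Last cumulative total seen, whether replayed or real.
    baseline_total: u64,
    /// Tokens actually attributed to this session.
    counted: u64,
}

impl UsageTracker {
    /// Replayed entry: remember its cumulative total, but count nothing.
    fn observe_replay(&mut self, cumulative_total: u64) {
        self.baseline_total = cumulative_total;
    }

    /// Real entry: count only the growth since the last seen total.
    fn observe_real(&mut self, cumulative_total: u64) -> u64 {
        let delta = cumulative_total.saturating_sub(self.baseline_total);
        self.baseline_total = cumulative_total;
        self.counted += delta;
        delta
    }
}

/// Walk through the regression scenario: two replayed entries, then one real one.
fn replay_then_real_delta() -> u64 {
    let mut tracker = UsageTracker::default();
    tracker.observe_replay(1000); // assumed replayed cumulative total
    tracker.observe_replay(1500); // assumed replayed cumulative total
    tracker.observe_real(1600) // first real subagent entry
}
```

With the baseline preserved, the first real entry contributes 100 tokens. A tracker that never saw the replayed totals would start from zero and attribute the full 1600 to the subagent.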

github-actions · 2026-06-08T21:34:22Z

ccusage performance comparison

PR SHA: a668d2c61adc
Base SHA: ae2881ffb48f

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`ae2881ffb48f`	578.3ms	546.3ms	31.3ms	3
PR pkg.pr.new	`a668d2c`	746.2ms	517.2ms	32.8ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: ae2881ffb48f; PR package: a668d2c. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	569.6ms	544.5ms	1.05x	326.83 MiB	325.33 MiB	1.00x	1.77 GiB/s	1.85 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	373.0ms	377.3ms	0.99x	81.08 MiB	83.33 MiB	1.03x	2.70 GiB/s	2.67 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	556.7ms	1.81 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	520.7ms	1.93 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	367.9ms	2.74 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	344.3ms	2.92 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.3ms	30.3ms	1.00x	43.61 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	30.0ms	30.3ms	0.99x	43.48 MiB	43.73 MiB	1.01x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	30.0ms	30.3ms	0.99x	43.48 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	30.1ms	29.9ms	1.01x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	554.8ms	556.1ms	1.00x	288.20 MiB	307.45 MiB	1.07x	1.81 GiB/s	1.81 GiB/s
`codex --offline --json`	1.01 GiB	363.6ms	371.8ms	0.98x	81.08 MiB	77.83 MiB	0.96x	2.77 GiB/s	2.71 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	14.35 KiB	14.50 KiB	+0.15 KiB	0.99x
installed native package binary	3289.62 KiB	3289.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-08T21:34:26Z

ccusage performance comparison

PR SHA: a668d2c61adc
Base SHA: ae2881ffb48f

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`ae2881ffb48f`	1.171s	868.2ms	34.0ms	3
PR pkg.pr.new	`a668d2c`	888.3ms	850.5ms	34.4ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: ae2881ffb48f; PR package: a668d2c. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	560.2ms	570.3ms	0.98x	301.95 MiB	309.95 MiB	1.03x	1.80 GiB/s	1.77 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	375.3ms	382.9ms	0.98x	81.20 MiB	78.33 MiB	0.96x	2.68 GiB/s	2.63 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	561.3ms	1.79 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	539.6ms	1.87 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	373.2ms	2.70 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	348.8ms	2.89 GiB/s	1
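
As a rough reading of the table above, wrapper overhead can be estimated as the package wrapper median minus the installed native binary median. The sketch below is illustrative only: the function names are hypothetical and not part of ccusage, and each row is a single sample, so the result is directional at best.

```rust
/// Estimated wrapper overhead in milliseconds: wrapper median minus native median.
/// Hypothetical helper for illustration; not part of the ccusage codebase.
fn wrapper_overhead_ms(wrapper_ms: f64, native_ms: f64) -> f64 {
    wrapper_ms - native_ms
}

/// The same overhead expressed as a percentage of the native binary's median.
fn wrapper_overhead_pct(wrapper_ms: f64, native_ms: f64) -> f64 {
    wrapper_overhead_ms(wrapper_ms, native_ms) / native_ms * 100.0
}

/// Format a value with one decimal place, e.g. "21.7".
fn fmt1(x: f64) -> String {
    format!("{:.1}", x)
}
```

With the medians reported above, `fmt1(wrapper_overhead_ms(561.3, 539.6))` gives "21.7" for the claude row and `fmt1(wrapper_overhead_ms(373.2, 348.8))` gives "24.4" for the codex row, which is roughly 4% and 7% of the native median respectively.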

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.5ms	4.2ms	7.28x	43.73 MiB	2.83 MiB	0.06x	0.05 MiB/s	0.37 MiB/s
`claude session --offline --json`	31.5ms	4.5ms	7.05x	43.61 MiB	2.83 MiB	0.06x	0.05 MiB/s	0.35 MiB/s
`codex daily --offline --json`	31.1ms	4.0ms	7.75x	43.48 MiB	2.83 MiB	0.07x	0.03 MiB/s	0.21 MiB/s
`codex session --offline --json`	30.8ms	4.0ms	7.63x	43.48 MiB	2.83 MiB	0.07x	0.03 MiB/s	0.21 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	561.0ms	533.0ms	1.05x	320.45 MiB	324.20 MiB	1.01x	1.79 GiB/s	1.89 GiB/s
`codex --offline --json`	1.01 GiB	370.3ms	346.7ms	1.07x	75.95 MiB	80.20 MiB	1.06x	2.72 GiB/s	2.90 GiB/s
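
The ratio columns in the table above are consistent with two simple quotients: "PR vs base" as base median divided by PR median (higher is better), and "PR/base RSS" as PR peak RSS divided by base peak RSS (lower is better). This is an assumption inferred from the reported numbers, not taken from the benchmark script, and the helper names below are hypothetical.

```rust
/// "PR vs base" column, assumed to be base median / PR median.
/// Values above 1.0 mean the PR run was faster than the base run.
fn speed_ratio(base_ms: f64, pr_ms: f64) -> f64 {
    base_ms / pr_ms
}

/// "PR/base RSS" column, assumed to be PR peak RSS / base peak RSS.
/// Values below 1.0 mean the PR run used less memory.
fn rss_ratio(base_mib: f64, pr_mib: f64) -> f64 {
    pr_mib / base_mib
}

/// Format a ratio the way the table prints it, e.g. "1.05x".
fn fmt_ratio(ratio: f64) -> String {
    format!("{:.2}x", ratio)
}
```

For the claude row, `fmt_ratio(speed_ratio(561.0, 533.0))` yields "1.05x" and `fmt_ratio(rss_ratio(320.45, 324.20))` yields "1.01x", matching the table. Rows with very small medians may differ in the last digit because the printed medians are themselves rounded.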

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	14.35 KiB	14.50 KiB	+0.15 KiB	0.99x
installed native package binary	3289.62 KiB	3289.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

pullfrog Bot requested a review from ryoppippi June 6, 2026 13:49

pullfrog Bot mentioned this pull request Jun 6, 2026

Bug: Massive token overcounting for Codex subagent sessions (91x inflation) #950

Closed

pullfrog Bot added 4 commits June 6, 2026 13:57

chore: fix clippy lints

63d5773

revert: undo treefmt formatting of pullfrog.yml

2434521

chore: suppress zizmor unpinned-uses in pullfrog workflow

a614b36

pullfrog Bot force-pushed the pullfrog/950-fix-codex-subagent-replay-overcounting branch from 29c8a73 to a614b36 Compare June 6, 2026 14:08

ryoppippi merged commit 22e5944 into main Jun 8, 2026
37 checks passed

ryoppippi deleted the pullfrog/950-fix-codex-subagent-replay-overcounting branch June 8, 2026 21:35

ryoppippi merged 5 commits into main from pullfrog/950-fix-codex-subagent-replay-overcounting


