
fix(codex): sort usage files before loading #1152

Merged: ryoppippi merged 1 commit into main from codex/deterministic-codex-loading on May 25, 2026.

@ryoppippi (Owner) commented May 25, 2026

Sorts Codex session files after recursive discovery so event loading no longer depends on filesystem read_dir ordering.

This matches Claude usage file ordering and keeps single-threaded and parallel Codex event loading on the same deterministic baseline.
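A minimal sketch of the discover-then-sort step described above, assuming session files are *.jsonl files somewhere under a root directory. The helpers collect_jsonl_files and sorted_session_files are hypothetical and are not the ccusage functions; only the sort call (sort_by_cached_key keyed on path.to_string_lossy().into_owned()) is taken from this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical helper (not ccusage's collect_usage_files): recursively collect
// *.jsonl files under `root`. fs::read_dir yields entries in a platform- and
// filesystem-dependent order, so `out` is unordered when this returns.
fn collect_jsonl_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_jsonl_files(&path, out)?;
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("jsonl") {
            out.push(path);
        }
    }
    Ok(())
}

// Discover first, then sort with the same key as the PR, so every caller
// (single-threaded or parallel) starts from one stable sequence.
fn sorted_session_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    collect_jsonl_files(root, &mut files)?;
    files.sort_by_cached_key(|path| path.to_string_lossy().into_owned());
    Ok(files)
}

fn main() -> io::Result<()> {
    // Build a small throwaway tree in the system temp directory.
    let root = std::env::temp_dir().join(format!("codex-sort-sketch-{}", std::process::id()));
    for dir in ["2026/05", "2026/04"] {
        fs::create_dir_all(root.join(dir))?;
    }
    for file in ["2026/05/b.jsonl", "2026/05/a.jsonl", "2026/04/c.jsonl", "notes.txt"] {
        fs::write(root.join(file), "{}\n")?;
    }

    for path in sorted_session_files(&root)? {
        println!("{}", path.strip_prefix(&root).unwrap().display());
    }
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

Because the sort runs only after discovery has finished, whatever order read_dir happens to produce never reaches the loaders.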

Fixes #1105

Testing:

  • direnv exec . env -u CFLAGS -u CPPFLAGS -u LDFLAGS cargo test --manifest-path rust/Cargo.toml -p ccusage --bin ccusage tests::loads_codex_token_count_events_in_parallel -- --exact
  • direnv exec . pnpm run format
  • direnv exec . env -u CFLAGS -u CPPFLAGS -u LDFLAGS cargo test --manifest-path rust/Cargo.toml --workspace
  • direnv exec . env -u CFLAGS -u CPPFLAGS -u LDFLAGS cargo clippy --manifest-path rust/Cargo.toml --workspace --all-targets -- -D warnings

Summary by cubic

Sort Codex usage files by path before loading to remove dependence on filesystem order and match the Claude loader, fixing #1105. Event processing is now deterministic across single-threaded and parallel modes.

Written for commit 81969e4.

Summary by CodeRabbit

  • Bug Fixes
    • Improved reliability of event loading by ensuring consistent ordering when processing Codex usage data.

Sort Codex session files after recursive discovery so single-threaded and parallel event loading operate from a stable path order instead of filesystem read_dir order.

This matches the Claude loader ordering and makes the existing parallel loader regression test deterministic without changing token parsing, dedupe keys, or aggregation semantics.

Fixes #1105
@coderabbitai (bot) commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e1f6b65a-eaaf-4df5-85c3-f42bb44b72c9

📥 Commits

Reviewing files that changed between the PR base (bf84d4b) and 81969e4.

📒 Files selected for processing (1)
  • rust/crates/ccusage/src/adapter/codex/loader.rs

📝 Walkthrough

This PR fixes nondeterministic Codex session file ordering by adding an explicit sort on collected file paths before event loading. The change aligns the Codex loader with the Claude loader's behavior and resolves test flakiness caused by filesystem read order variation.

Changes

Deterministic File Ordering

  • Layer: File sorting before event loading
    File: rust/crates/ccusage/src/adapter/codex/loader.rs
    Summary: Collected Codex session files are sorted by their string path representation before single-threaded or parallel event loading, ensuring a deterministic order prior to deduplication.
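The sketch below illustrates why a sorted file list gives the single-threaded and parallel paths the same baseline. It rests on stated assumptions: load_file, load_sequential, and load_parallel are hypothetical, and files are assigned to workers round-robin rather than by size (as the real chunk_file_indexes_by_size does, judging by its name). Workers return (index, events) pairs, and the results are put back in index order before flattening.

```rust
use std::thread;

// Hypothetical stand-in for parsing one session file into events.
fn load_file(path: &str) -> Vec<String> {
    vec![format!("{path}:event-1"), format!("{path}:event-2")]
}

// Single-threaded baseline: walk the sorted list in order.
fn load_sequential(files: &[String]) -> Vec<String> {
    files.iter().flat_map(|path| load_file(path)).collect()
}

// Parallel sketch: split file *indexes* across workers (round-robin here, an
// assumption), then reassemble results by index.
fn load_parallel(files: &[String], workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    let mut chunks: Vec<Vec<usize>> = vec![Vec::new(); workers];
    for index in 0..files.len() {
        chunks[index % workers].push(index);
    }

    let mut results: Vec<(usize, Vec<String>)> = thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&index| (index, load_file(&files[index])))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    });

    // Restore the original (sorted) file order before flattening.
    results.sort_by_key(|(index, _)| *index);
    results.into_iter().flat_map(|(_, events)| events).collect()
}

fn main() {
    let mut files: Vec<String> = ["c.jsonl", "a.jsonl", "b.jsonl"].iter().map(|s| s.to_string()).collect();
    files.sort(); // the shared deterministic baseline
    let expected = load_sequential(&files);
    for workers in 1..=4 {
        assert_eq!(load_parallel(&files, workers), expected);
    }
    println!("{expected:?}");
}
```

Reassembling by index only reproduces the single-threaded output if the index order itself is stable, which is exactly what sorting the file list provides.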

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • ryoppippi/ccusage#989: Both PRs address Codex session aggregation correctness—this PR makes file load order deterministic before deduplication, while the related PR adds global token-usage event fingerprint deduplication.

Poem

🐰 A sorted path through session trees,
No more chaos in the breeze,
Files align in order true,
Tests pass red and tests pass blue,
Deterministic victory! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: sorting Codex usage files before loading to ensure deterministic file ordering.
  • Linked Issues check: ✅ Passed. The code change directly addresses issue #1105 by sorting collected Codex session files before loading, making file ordering deterministic and resolving the test failure.
  • Out of Scope Changes check: ✅ Passed. The single-line change is narrowly scoped to sorting files before loading, directly addressing the deterministic ordering issue without introducing unrelated modifications.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@ryoppippi (Owner, Author)

@coderabbitai review

@cubic-dev-ai review

@cubic-dev-ai (bot) commented May 25, 2026

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@cloudflare-workers-and-pages (bot)

Deploying with Cloudflare Workers

The latest updates on your project.

  • ccusage-guide (latest commit 81969e4): ✅ Deployment successful! Commit and branch preview URLs available. Updated May 25 2026, 06:44 PM (UTC).

@coderabbitai (bot) commented May 25, 2026

@ryoppippi: This is a clean, well-scoped fix. Here's the full review:


Review: fix(codex): sort usage files before loading

Overall: ✅ Correct and ready to merge.

What the change does

files.sort_by_cached_key(|path| path.to_string_lossy().into_owned());

collect_usage_files uses read_dir under the hood, and read_dir yields entries in an unspecified, platform- and filesystem-dependent order. Sorting before dispatch ensures that both the single-threaded path (flat_map in order) and the parallel path (chunk_file_indexes_by_size → index-reassembly) see a stable, reproducible file sequence. This also keeps dedupe_codex_events deterministic when the same event appears in more than one file.
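To see why file order matters for deduplication, here is a small sketch that assumes a first-wins strategy: the first event seen for a given key is kept and later duplicates are dropped. The Event type, SessionFile alias, and load_and_dedupe are hypothetical stand-ins, not dedupe_codex_events itself. Sorting the file list first makes the surviving copy the same no matter which order the files were discovered in.

```rust
use std::collections::HashSet;

// Hypothetical shapes: (file path, [(dedupe key, tokens)]) and a flat event.
type SessionFile = (String, Vec<(String, u64)>);

#[derive(Debug, Clone, PartialEq)]
struct Event {
    file: String, // session file the event was read from
    key: String,  // dedupe fingerprint
    tokens: u64,
}

// Sort by path, flatten file by file, then keep the first event per key
// (first-wins is an assumption of this sketch).
fn load_and_dedupe(mut files: Vec<SessionFile>) -> Vec<Event> {
    files.sort_by(|a, b| a.0.cmp(&b.0));
    let mut seen = HashSet::new();
    files
        .into_iter()
        .flat_map(|(file, events)| {
            events
                .into_iter()
                .map(move |(key, tokens)| Event { file: file.clone(), key, tokens })
        })
        .filter(|event| seen.insert(event.key.clone()))
        .collect()
}

fn main() {
    let a: SessionFile = ("sessions/a.jsonl".to_string(), vec![("evt-1".to_string(), 10)]);
    let b: SessionFile = (
        "sessions/b.jsonl".to_string(),
        vec![("evt-1".to_string(), 99), ("evt-2".to_string(), 5)],
    );
    // Same files, opposite discovery order: the deduplicated result is identical.
    let forward = load_and_dedupe(vec![a.clone(), b.clone()]);
    let reversed = load_and_dedupe(vec![b, a]);
    assert_eq!(forward, reversed);
    println!("{forward:?}");
}
```

Without the sort_by call, the reversed input would keep the 99-token copy of evt-1 from b.jsonl instead of the 10-token copy from a.jsonl.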

Implementation notes

  • sort_by_cached_key is the right choice — the key (String from to_string_lossy().into_owned()) is computed once per element and cached, avoiding a heap allocation per comparison. ✅
  • into_owned() is necessary because to_string_lossy() returns Cow<'_, str>, which can't be cached without converting to an owned value. ✅
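  • Consistency with other loaders: Droid and Codebuff use files.sort(). Note that PathBuf's Ord compares paths component by component rather than as raw OsStr bytes, so files.sort() and the string sort used here agree for most paths but can differ in edge cases (for example, a/b sorts before a-b by components but after it as a string). Both orders are deterministic, and the explicit string sort here is fine and arguably more readable.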
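The allocation point in the first note can be checked directly. This sketch (count_key_calls is a hypothetical helper) counts how many times the String key is built when the same paths are sorted with sort_by_key and with sort_by_cached_key. The standard library documents that sort_by_cached_key calls the key function at most once per element, whereas sort_by_key recomputes keys during comparisons.

```rust
use std::cell::Cell;
use std::path::PathBuf;

// Sort the same paths two ways and count how often the String key is built.
// Returns (sorted paths, calls under sort_by_key, calls under sort_by_cached_key).
fn count_key_calls(names: &[&str]) -> (Vec<PathBuf>, usize, usize) {
    let paths: Vec<PathBuf> = names.iter().map(|name| PathBuf::from(*name)).collect();
    let calls = Cell::new(0usize);
    // Same key as the PR: an owned String built from the lossy path text.
    let key = |path: &PathBuf| {
        calls.set(calls.get() + 1);
        path.to_string_lossy().into_owned()
    };

    let mut by_key = paths.clone();
    by_key.sort_by_key(key); // recomputes (and reallocates) keys per comparison
    let per_comparison = calls.replace(0);

    let mut cached = paths;
    cached.sort_by_cached_key(key); // builds each key at most once
    let per_element = calls.get();

    assert_eq!(by_key, cached); // same order, different allocation cost
    (cached, per_comparison, per_element)
}

fn main() {
    let (sorted, by_key_calls, cached_calls) = count_key_calls(&["e", "b", "d", "a", "c"]);
    println!("sorted: {sorted:?}");
    println!("sort_by_key built the key {by_key_calls} times");
    println!("sort_by_cached_key built the key {cached_calls} times");
}
```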

Minor observation (non-blocking)

files.sort() would be a marginally lighter alternative, since it avoids the per-element String allocation. Note, however, that PathBuf: Ord compares component by component, so its order matches the string order for most paths but not for all of them:

files.sort();

The current approach is correct and idiomatic Rust, so this suggestion is optional.
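Whether files.sort() and the string sort always agree can be checked with a short sketch. The helpers both_orders and display_all are hypothetical; the sketch sorts the same paths with PathBuf's Ord and with the string key used in this PR. For a-b and a/b the two orders disagree: the component comparison sees a as a prefix of a-b, while the byte comparison sees '-' (0x2D) sort before '/' (0x2F). For date-structured paths such as 2026/05/a.jsonl, the two orders coincide.

```rust
use std::path::PathBuf;

// Render paths as strings so the two orders are easy to compare.
fn display_all(paths: &[PathBuf]) -> Vec<String> {
    paths.iter().map(|p| p.display().to_string()).collect()
}

// Hypothetical comparison: (order from PathBuf's Ord, order from the string key).
fn both_orders(names: &[&str]) -> (Vec<String>, Vec<String>) {
    let paths: Vec<PathBuf> = names.iter().map(|name| PathBuf::from(*name)).collect();

    let mut by_components = paths.clone();
    by_components.sort(); // PathBuf: Ord compares component by component

    let mut by_string = paths;
    by_string.sort_by_cached_key(|path| path.to_string_lossy().into_owned()); // byte-wise on the text

    (display_all(&by_components), display_all(&by_string))
}

fn main() {
    let (by_components, by_string) = both_orders(&["a-b", "a/b"]);
    println!("sort():               {by_components:?}");
    println!("sort_by_cached_key(): {by_string:?}");
}
```

Either choice is deterministic; the difference only matters if two loaders are expected to produce the exact same order.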

Test coverage

The referenced integration test (loads_codex_token_count_events_in_parallel) exercises the parallel path. One thing worth noting is that there's no multi-file test asserting that events appear in a specific deterministic order after the sort — but the existing test is sufficient to catch regressions in the parallel loading path, and the deduplication logic provides a natural correctness check.

Verdict: LGTM 🐇

@cubic-dev-ai (bot) left a review

No issues found across 1 file

@pkg-pr-new (bot) commented May 25, 2026

ccusage

npx https://pkg.pr.new/ryoppippi/ccusage@1152

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-darwin-arm64@1152

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-darwin-x64@1152

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-linux-arm64@1152

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-linux-x64@1152

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-win32-arm64@1152

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/ryoppippi/ccusage/@ccusage/ccusage-win32-x64@1152

commit: 81969e4

@github-actions (Contributor)

ccusage performance comparison

PR SHA: 81969e48b8a5
Base SHA: bf84d4bc1ce8

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
|---|---|---|---|---|---|
| Base pkg.pr.new | bf84d4bc1ce8 | 951.5ms | 418.5ms | 34.6ms | 3 |
| PR pkg.pr.new | 81969e48b8a5 | 421.6ms | 454.2ms | 35.5ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: bf84d4bc1ce8; PR package: 81969e48b8a5. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
|---|---|---|---|---|---|---|---|---|---|
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 503.7ms | 500.7ms | 1.01x | 267.20 MiB | 251.45 MiB | 0.94x | 2.00 GiB/s | 2.01 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 383.5ms | 363.1ms | 1.06x | 66.33 MiB | 54.08 MiB | 0.82x | 2.63 GiB/s | 2.77 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
|---|---|---|---|---|---|
| claude --offline --json | Package wrapper | 1.01 GiB | 499.0ms | 2.02 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 467.1ms | 2.16 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 363.6ms | 2.77 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 330.4ms | 3.05 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
|---|---|---|---|---|---|---|---|---|---|
| claude daily --offline --json | 0.00 MiB | 32.3ms | 4.4ms | 7.38x | - | 2.83 MiB | - | 0.05 MiB/s | 0.35 MiB/s |
| claude session --offline --json | 0.00 MiB | 32.5ms | 4.3ms | 7.50x | - | 2.83 MiB | - | 0.05 MiB/s | 0.36 MiB/s |
| codex daily --offline --json | 0.00 MiB | 31.9ms | 4.0ms | 7.97x | 43.61 MiB | 2.83 MiB | 0.06x | 0.03 MiB/s | 0.21 MiB/s |
| codex session --offline --json | 0.00 MiB | 31.9ms | 4.1ms | 7.76x | - | 2.83 MiB | - | 0.03 MiB/s | 0.21 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
|---|---|---|---|---|---|---|---|---|---|
| claude --offline --json | 1.01 GiB | 497.4ms | 467.6ms | 1.06x | 249.58 MiB | 265.08 MiB | 1.06x | 2.02 GiB/s | 2.15 GiB/s |
| codex --offline --json | 1.01 GiB | 360.9ms | 325.4ms | 1.11x | 60.20 MiB | 55.20 MiB | 0.92x | 2.79 GiB/s | 3.09 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
|---|---|---|---|---|
| packed ccusage-*.tgz | 14.25 KiB | 14.25 KiB | -0.00 KiB | 1.00x |
| installed native package binary | 3289.49 KiB | 3289.49 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions (Contributor)

ccusage performance comparison

PR SHA: 81969e48b8a5
Base SHA: bf84d4bc1ce8

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
|---|---|---|---|---|---|
| Base pkg.pr.new | bf84d4bc1ce8 | 421.4ms | 434.7ms | 36.3ms | 3 |
| PR pkg.pr.new | 81969e48b8a5 | 699.2ms | 591.5ms | 35.8ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: bf84d4bc1ce8; PR package: 81969e48b8a5. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
|---|---|---|---|---|---|---|---|---|---|
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 514.4ms | 505.5ms | 1.02x | 259.70 MiB | 262.20 MiB | 1.01x | 1.96 GiB/s | 1.99 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 374.0ms | 378.7ms | 0.99x | 65.70 MiB | 57.95 MiB | 0.88x | 2.69 GiB/s | 2.66 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
|---|---|---|---|---|---|
| claude --offline --json | Package wrapper | 1.01 GiB | 496.4ms | 2.03 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 480.3ms | 2.10 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 369.9ms | 2.72 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 343.2ms | 2.93 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
|---|---|---|---|---|---|---|---|---|---|
| claude daily --offline --json | 0.00 MiB | 32.7ms | 32.8ms | 1.00x | - | - | - | 0.05 MiB/s | 0.05 MiB/s |
| claude session --offline --json | 0.00 MiB | 33.0ms | 32.3ms | 1.02x | 43.61 MiB | - | - | 0.05 MiB/s | 0.05 MiB/s |
| codex daily --offline --json | 0.00 MiB | 32.7ms | 32.6ms | 1.00x | 43.48 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| codex session --offline --json | 0.00 MiB | 32.8ms | 32.3ms | 1.02x | - | 43.73 MiB | - | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
|---|---|---|---|---|---|---|---|---|---|
| claude --offline --json | 1.01 GiB | 502.5ms | 496.4ms | 1.01x | 268.95 MiB | 272.83 MiB | 1.01x | 2.00 GiB/s | 2.03 GiB/s |
| codex --offline --json | 1.01 GiB | 367.9ms | 364.9ms | 1.01x | 63.95 MiB | 63.70 MiB | 1.00x | 2.74 GiB/s | 2.76 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
|---|---|---|---|---|
| packed ccusage-*.tgz | 14.25 KiB | 14.25 KiB | -0.00 KiB | 1.00x |
| installed native package binary | 3289.49 KiB | 3289.49 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

ryoppippi merged commit 63cf17e into main on May 25, 2026 (41 checks passed).
ryoppippi deleted the codex/deterministic-codex-loading branch on May 25, 2026 at 19:01.

Development

Successfully merging this pull request may close these issues:
  • Codex loader file ordering is nondeterministic, causing Rust test failure