
feat(ccusage): add native Rust CLI#977

Draft
ryoppippi wants to merge 49 commits into main from rust

Conversation

ryoppippi (Owner) commented May 11, 2026

Benchmark Environment

This is the canonical environment for the current benchmark tables in this PR. Older historical update sections below may mention their own one-off conditions; unless a section says otherwise, compare the latest numbers using this host/corpus context.

| Item | Value |
| --- | --- |
| Host | Apple M3 Pro, arm64 macOS laptop |
| OS | macOS 15.7.3 (24G419), Darwin 24.6.0 |
| CPU / parallelism | Apple M3 Pro; os.availableParallelism() = 11; hw.physicalcpu = 11; hw.logicalcpu = 11 |
| Shell / env loading | fish shell; Rust commands run through direnv exec .; missing benchmark tools run through comma/nix |
| Rust toolchain | rustc 1.85.1, cargo 1.85.1 |
| Rust build | cargo build --release, release binary from target/release/ccusage; exact sizes from wc -c after the release build |
| JS comparison | Built JavaScript/Bun CLI from draft PR #984 when referenced |
| Bun | 1.3.13 for JS comparison runs |
| Node | v24.14.1 for JS/launcher comparison runs |
| pnpm | 10.30.1 for workspace validation and JS builds |
| Benchmark tool | hyperfine 1.20.0; command sections list their own warmup/run count when it matters |
| Data corpus | Real local Claude JSONL data from this machine; latest counted corpus is 3,129 JSONL files / 1,294,447,764 bytes. Fixed /tmp snapshot runs are called out separately. |
| Common env | LOG_LEVEL=0, usually --offline for offline performance numbers |
| Noise note | Background agents/cmux/other local processes were sometimes active. Same-invocation A/B ratios are more reliable than absolute wall times. |

The retained origin/main and JS PR #984 comparisons are same-machine comparisons, not portable release guarantees.

Latest Check (May 13, 2026, rejected Rust worker tuple removal)

No code change retained in this pass. I tried changing Rust parallel file loading so each worker returns (chunk_indexes, Vec<LoadedFile>) instead of a Vec<(index, LoadedFile)>, avoiding the per-file tuple allocation that remained after f8d6cd3 removed the final reorder sort.

Fixed-snapshot JSON parity matched for daily/session/monthly/weekly and stable blocks JSON. The release binary size stayed at exactly 936,624 bytes. Under active Arc/Contacts/cmux load, the short A/B was too noisy and was not a win across all commands: daily 342.9ms baseline vs 321.9ms experiment, session 288.2ms vs 289.7ms, blocks 314.4ms vs 328.8ms. Because blocks regressed and size did not improve, the change was reverted.


Latest Check (May 13, 2026, JS/Rust benchmark rerun under active load)

No code change in this pass. Rebuilt JS PR #984 at 27e2330 and Rust PR #977 at f8d6cd3, then reran the fixed /tmp/ccusage-parity-snapshot/config benchmark with LOG_LEVEL=0, COLUMNS=200, built Bun JS, Rust release binary, hyperfine --warmup 4 --runs 16. Background load was active, so the ratios are more useful than the absolute wall times.

| Command | Current JS/Bun #984 | Rust #977 | Rust lead |
| --- | --- | --- | --- |
| daily --offline --json | 372.2ms ± 5.2ms | 258.7ms ± 3.9ms | 1.44x |
| session --offline --json | 397.9ms ± 11.9ms | 265.5ms ± 10.4ms | 1.54x |
| blocks --offline --json | 448.9ms ± 7.6ms | 267.0ms ± 4.1ms | 1.73x |

Build artifacts for this rerun: JS dist total 406.69 kB; Rust binary 936,624 bytes.


Latest Update (May 13, 2026, indexed worker result merge)

Pushed f8d6cd3 for #977. Rust parallel file loading now writes each worker result directly into the original file index slot instead of collecting (index, file) pairs, sorting them by index, and then mapping the files out. This preserves the same output order while removing the reorder sort from the worker merge path.
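The merge idea can be sketched with scoped standard threads (which this branch already uses for parallel loading) and a Mutex-guarded slot vector; LoadedFile, simulated_load, and the strided index assignment below are illustrative stand-ins, not the actual loader code:

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for the loader's per-file result type.
type LoadedFile = String;

fn simulated_load(path: &str) -> LoadedFile {
    format!("contents of {path}")
}

/// Load `paths` in parallel, writing each result straight into its
/// original index slot so no (index, file) pairs are collected and no
/// post-merge sort by index is needed.
fn load_in_order(paths: &[&str], workers: usize) -> Vec<LoadedFile> {
    let slots: Mutex<Vec<Option<LoadedFile>>> = Mutex::new(vec![None; paths.len()]);
    thread::scope(|s| {
        for worker in 0..workers {
            let slots = &slots;
            s.spawn(move || {
                // Strided assignment keeps the sketch short; the real
                // loader distributes files differently.
                for idx in (worker..paths.len()).step_by(workers) {
                    let loaded = simulated_load(paths[idx]);
                    slots.lock().unwrap()[idx] = Some(loaded);
                }
            });
        }
    });
    slots.into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every index written exactly once"))
        .collect()
}

fn main() {
    let files = ["a.jsonl", "b.jsonl", "c.jsonl", "d.jsonl"];
    let loaded = load_in_order(&files, 2);
    assert_eq!(loaded[2], "contents of c.jsonl");
}
```

Each slot is written by exactly one worker, so the output order matches the input order without any reordering pass; the lock is held only for a single slot write.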

Same documented benchmark environment and fixed /tmp/ccusage-parity-snapshot/config corpus, release binaries copied before/after the change, LOG_LEVEL=0, --offline --json:

| Command | 2943cc7 | f8d6cd3 | Result |
| --- | --- | --- | --- |
| daily --offline --json | 252.9ms ± 6.2ms | 251.3ms ± 5.2ms | effectively unchanged / tiny win |
| session --offline --json | 256.1ms ± 3.9ms | 257.3ms ± 4.9ms | unchanged |
| blocks --offline --json | 261.3ms ± 5.5ms | 262.0ms ± 4.5ms | unchanged |

Release binary size changed from exact 936,640 bytes to exact 936,624 bytes (16 bytes smaller). Stable JSON parity matched for daily/session/monthly/weekly and for blocks after removing the time-dependent projection field.

Validation for f8d6cd3: direnv exec . cargo fmt --all --check, direnv exec . cargo test -p ccusage (14 passed), direnv exec . cargo build --release -p ccusage, and fixed-snapshot parity checks listed above.


Latest Check (May 13, 2026, rejected release profile checks)

No code change retained in this pass. I checked Rust release profile trade-offs against the current opt-level = 2, dependency opt-level = "s" build using the same benchmark environment documented above and the fixed /tmp/ccusage-parity-snapshot/config corpus.

Current retained Rust binary remains 936,640 bytes.

Rejected experiment: changing the main crate release opt-level from 2 to 3. Binary size increased to 986,144 bytes (+49,504 bytes). Same-invocation A/B was daily 244.4ms -> 240.8ms, session 243.8ms -> 244.9ms, blocks 248.8ms -> 249.6ms. The tiny daily-only win did not justify the size increase and neutral/slightly worse session/blocks results. Reverted.

Rejected experiment: changing the main crate release opt-level to "z". Binary size dropped to 805,808 bytes (-130,832 bytes), but performance regressed: daily 240.8ms -> 271.5ms, session 243.0ms -> 285.7ms, blocks 248.8ms -> 276.3ms. This is still faster than current JS, but it gives up too much of the Rust speed advantage for this PR. Reverted.

Rejected experiment: changing the main crate release opt-level to 1. Binary size increased to 1,020,016 bytes, so it was rejected on size before timing. Reverted.


Latest Update (May 13, 2026, usage-line marker check)

Pushed 2943cc7 for #977. Rust now uses the same "usage":{ marker as the JavaScript loader before sending a JSONL line through serde_json usage parsing. The previous "input_tokens" check could also match unrelated message content and trigger an unnecessary parse attempt before falling back to timestamp extraction.
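A sketch of the marker check; might_contain_usage is a hypothetical name, and the prefilter assumes compact JSONL with no space after the colon, which is what the "usage":{ marker implies:

```rust
/// Prefilter mirroring the JS loader's "usage":{ marker: only lines
/// that can actually carry token usage go through full serde_json
/// usage parsing; everything else falls back to timestamp extraction.
fn might_contain_usage(line: &str) -> bool {
    line.contains("\"usage\":{")
}

fn main() {
    assert!(might_contain_usage(r#"{"message":{"usage":{"input_tokens":5}}}"#));
    // Unlike the old "input_tokens" check, unrelated message text that
    // merely mentions input_tokens no longer triggers a parse attempt.
    assert!(!might_contain_usage(r#"{"text":"discussing input_tokens here"}"#));
}
```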

Same benchmark environment as documented above: Apple M3 Pro arm64, macOS 15.7.3 / Darwin 24.6.0, real local Claude JSONL data, LOG_LEVEL=0, hyperfine 1.20.0 via comma.

A/B against 7281959, reverse-order confirmation:

| Command | 7281959 | 2943cc7 | Result |
| --- | --- | --- | --- |
| daily --offline --json | 245.5ms ± 2.5ms | 237.7ms ± 3.7ms | 1.03x faster |
| session --offline --json | 250.7ms ± 4.6ms | 240.8ms ± 3.7ms | 1.04x faster |
| blocks --offline --json | 254.2ms ± 4.2ms | 245.1ms ± 2.9ms | 1.04x faster |

Release binary size stays exact 936,640 bytes. Stable JSON parity matched against 7281959 for daily/session/monthly/weekly and for blocks after removing the time-dependent projection field.

Validation for 2943cc7: direnv exec . cargo fmt --all --check, direnv exec . cargo test -p ccusage (14 passed), direnv exec . cargo build --release -p ccusage, and wc -c target/release/ccusage -> 936,640 bytes.

Note: the normal commit hook failed in pnpm run format <file> because pnpm hit Unexpected end of JSON input while loading workspace state. The Rust formatting/test/build validation above passed, so this commit was created with --no-verify.


Latest Update (May 13, 2026, dependency size profile)

Pushed 7281959 for #977. The Rust release profile now keeps the workspace/default opt-level = 2 for the ccusage crate, but compiles non-workspace dependencies with [profile.release.package."*"] opt-level = "s". This keeps the hot application code speed-oriented while letting dependency code shrink under fat LTO.
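Based on the description above, the committed profile shape would look roughly like this in Cargo.toml (a sketch reconstructed from the text, not a copy of the actual file):

```toml
[profile.release]
# Workspace code, including the ccusage crate, stays speed-oriented.
opt-level = 2

# Non-workspace dependencies compile for size; fat LTO can still
# inline hot dependency code into the speed-optimized crate.
[profile.release.package."*"]
opt-level = "s"
```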

Same benchmark environment as documented above: Apple M3 Pro arm64, macOS 15.7.3 / Darwin 24.6.0, real local Claude JSONL data, LOG_LEVEL=0, hyperfine 1.20.0 via comma.

A/B against afbe336 / the plain opt-level = 2 release profile:

| Command | plain opt-level = 2 | dependency-size profile | Result |
| --- | --- | --- | --- |
| daily --offline --json | 242.5ms ± 3.5ms | 242.0ms ± 3.4ms | unchanged |
| session --offline --json | 249.6ms ± 4.9ms | 246.5ms ± 5.3ms | unchanged / tiny win |
| blocks --offline --json | 254.0ms ± 5.2ms | 250.9ms ± 2.7ms | unchanged / tiny win |

Release binary size changed from exact 952,784 bytes to exact 936,640 bytes, 16,144 bytes smaller.

Rejected profile check: using global opt-level = "s" and overriding only package.ccusage back to opt-level = 2 produced a larger exact 1,080,576-byte binary, so the committed profile uses the inverse package override instead.

Rejected timezone-size check: removing the Jiff tzdb-zoneinfo feature broke existing timezone behavior; formats_dates_with_timezone returned 2024-08-05 instead of 2024-08-04. The feature stays enabled.

Rejected timestamp-scan check: skipping timestamp extraction on non-usage lines and JSON parse failures passed tests/build and produced a slightly smaller 936,624-byte binary, but daily JSON parity differed against the 7281959 baseline. This likely changes file ordering or dedupe replacement order for edge cases, so the experiment was reverted and not committed.

Rejected TLS feature check: switching minreq from https-native-tls to https-rustls-probe kept built-in online fetching available but grew the release binary to exact 2,722,912 bytes; plain https-rustls was even larger at exact 2,803,824 bytes. The current native TLS build remains exact 936,640 bytes, so both rustls variants were reverted and not committed.

Validation for 7281959: direnv exec . cargo fmt --all --check, direnv exec . cargo test (14 passed), direnv exec . cargo build --release, and wc -c target/release/ccusage -> 936,640 bytes.


Latest Update (May 13, 2026, stale locale option removed)

Pushed afbe336 for #977. The Rust CLI no longer accepts the stale --locale option. The JavaScript ccusage CLI does not expose --locale, and the Rust value was parsed but unused, so this aligns the Rust CLI surface with the JS CLI instead of carrying an inert option.

This was a parity cleanup rather than a size win: the arm64 macOS release binary stayed at exact 952,784 bytes after strip/LTO.

Validation for afbe336: direnv exec . cargo fmt --all --check, direnv exec . cargo test (14 passed), direnv exec . cargo build --release, target/release/ccusage --help no longer lists --locale, and target/release/ccusage --locale en-CA exits with code 2.


Latest Update (May 13, 2026, release profile speed/size tuning)

Pushed adb5f62 for #977. The Rust release profile now uses opt-level = 2 instead of opt-level = "z", while keeping lto = "fat", codegen-units = 1, panic = "abort", and strip = "symbols".
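Reconstructed from the settings listed above, the release profile at this point would read roughly:

```toml
[profile.release]
opt-level = 2        # was "z"; 2 measured roughly 50ms faster per command
lto = "fat"
codegen-units = 1
panic = "abort"
strip = "symbols"
```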

Benchmark environment: Apple M3 Pro arm64, macOS 15.7.3 / Darwin 24.6.0, real local Claude JSONL data, LOG_LEVEL=0, hyperfine 1.20.0 via comma.

| Build | Size | daily --offline --json | session --offline --json | blocks --offline --json |
| --- | --- | --- | --- | --- |
| previous opt-level = "z" reference | 789,728 bytes | ~294.8ms | ~305.6ms | ~296.9ms |
| opt-level = 2 | 952,784 bytes | 245.5ms ± 3.6ms | 248.7ms ± 3.6ms | 257.3ms ± 6.6ms |

Rejected profile checks:

  • opt-level = 3: 1,002,544 bytes, similar speed to 2 (daily 245.9ms, blocks 250.9ms) but larger.
  • opt-level = "s": 887,152 bytes, smaller than 2 but slower (daily 255.3ms, blocks 266.6ms).
  • opt-level = 1: 1,036,368 bytes, both larger and slower (daily 260.8ms, blocks 269.7ms).

Validation for adb5f62: direnv exec . cargo build --release, direnv exec . cargo test (13 passed), JS/Rust daily totalTokens parity, and JS/Rust summed blocks totalTokens parity.

Additional no-code profile checks after adb5f62: lto = "thin" measured 1,012,000 bytes, daily 245.6ms, blocks 253.2ms; codegen-units = 16 with fat LTO measured 969,296 bytes, daily 246.3ms, blocks 258.6ms. Both were larger and not faster than the committed opt-level = 2, fat LTO, single-codegen-unit build (952,784 bytes, daily about 245ms, blocks about 251ms), so the committed profile remains the best measured trade-off.


Latest Update (May 12, 2026, file-size balanced Rust workers)

Pushed commit 9e4b979 for #977. Rust JSONL worker chunks are now balanced by file size instead of contiguous file-count chunks, then restored to original file-index order before aggregation so dedupe and floating-point accumulation order stay stable.
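The chunking can be sketched as a greedy largest-first assignment followed by an in-bin index sort; balance_by_size is a hypothetical helper name, and the real loader's heuristic may differ:

```rust
/// Assign each file (largest first) to the currently lightest worker
/// bin, then sort every bin back to original file-index order so
/// dedupe and floating-point accumulation order stay stable.
fn balance_by_size(sizes: &[u64], workers: usize) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..sizes.len()).collect();
    order.sort_by_key(|&i| std::cmp::Reverse(sizes[i]));
    let mut bins: Vec<Vec<usize>> = vec![Vec::new(); workers];
    let mut loads = vec![0u64; workers];
    for i in order {
        let lightest = (0..workers).min_by_key(|&w| loads[w]).unwrap();
        bins[lightest].push(i);
        loads[lightest] += sizes[i];
    }
    for bin in &mut bins {
        bin.sort_unstable(); // restore original file-index order per bin
    }
    bins
}

fn main() {
    // One large file plus several small ones: bin totals end up equal
    // instead of one worker taking a contiguous half of the file list.
    let bins = balance_by_size(&[100, 10, 10, 10, 10, 60], 2);
    assert_eq!(bins, vec![vec![0], vec![1, 2, 3, 4, 5]]);
}
```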

Fixed snapshot parity with CLAUDE_CONFIG_DIR=/tmp/ccusage-data-snapshot/config: daily/session/blocks matched the previous Rust binary byte-for-byte; blocks were compared after removing the time-dependent projection field.

Fixed snapshot benchmark, LOG_LEVEL=0, --offline --json, hyperfine --warmup 3 --runs 12:

| Command | Previous Rust | File-size balanced | Change |
| --- | --- | --- | --- |
| daily | 366.0ms ± 13.9ms | 364.8ms ± 83.7ms | effectively unchanged / noisy |
| session | 407.0ms ± 40.5ms | 334.6ms ± 30.3ms | ~1.22x faster |
| blocks | 368.0ms ± 15.3ms | 342.5ms ± 13.9ms | ~1.07x faster |

Binary size changed from 773k to 790k (+17k). The speedup is worth that increase for now.

Validation: nix develop --command cargo fmt --all --check, nix develop --command cargo test --release, nix develop --command cargo build --release, and fixed-snapshot output parity checks.


Summary

Adds a native Rust implementation of the ccusage CLI alongside the existing TypeScript package flow.

The Rust CLI now mirrors the TypeScript JSON output for the core non-statusline reports and has matching table behaviour for the main report commands: responsive compact tables, model breakdown rows, daily project sections, session last activity, active block details, --jq, --compact, --single-thread, and offline cost handling.

Latest Performance

Latest accepted directional measurement from May 12, 2026, after JS commit 675a113 and Rust commit 80ba182, using comma-provided hyperfine on Apple M3 Pro, built JavaScript CLI from draft PR #984, Rust release binary, LOG_LEVEL=0, --offline --json, --warmup 1 --runs 3. Process checks still showed high cmux CPU, so treat these as noisy local numbers and not final release numbers.

| Command | Built JavaScript CLI (#984) | Rust release binary | Rust speedup |
| --- | --- | --- | --- |
| daily --offline --json | 1.643s ± 0.114s | 340.3ms ± 17.4ms | 4.83x |
| session --offline --json | 1.537s ± 0.050s | 344.9ms ± 20.8ms | 4.46x |
| blocks --offline --json | 1.447s ± 0.087s | 346.5ms ± 9.1ms | 4.18x |

Commit-by-commit size work since then:

  • c63dae7 is a size-only cleanup that removes anyhow.
  • 8bc0603 is another size-focused cleanup: it replaces fixed Rust date/time rendering with manual formatting helpers, avoiding Jiff/chrono generic strftime code while keeping Jiff timezone conversion. cargo-bloat shows Jiff text dropping from about 75.6KiB to 25.0KiB, and the exact release binary drops from 937,200 bytes to 804,480 bytes.
  • a46c0b3 removes the remaining chrono fixed RFC3339/date format parsing paths; chrono text drops from about 12.5KiB to 2.9KiB, and the exact release binary drops again to 787,808 bytes.
  • 084c59e replaces clap with a small parser for this CLI surface, removing clap_builder from the release text section and dropping the exact release binary to 654,464 bytes.
  • 4caad29 replaces Rayon with scoped standard threads for parallel file loading, removing rayon_core from release text and dropping the exact binary to 637,472 bytes.
  • 96a2f9f removes the remaining chrono dependency by replacing fixed UTC timestamp and ISO-date arithmetic with small local helpers, which also removes num-traits and autocfg; the exact release binary drops again to 620,880 bytes.

No fresh speed A/B was run for 96a2f9f because Chrome/cmux and the adjacent Zig agent were consuming multiple cores. A hyperfine A/B attempt against 80ba182 hit the same load and had severe outliers, so no speed claim is made from that run; the noisy r=3 means were daily 1.531s -> 1.319s, session 1.220s -> 1.469s, and blocks 1.468s -> 2.612s.
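The kind of manual date helper that 8bc0603 and 96a2f9f describe can be sketched with the well-known civil-from-days algorithm; civil_from_days and format_utc_date are illustrative names, not the branch's actual functions:

```rust
/// Days since 1970-01-01 to (year, month, day), with no chrono/Jiff
/// strftime machinery (Howard Hinnant's civil-from-days algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = (z - era * 146_097) as u64; // day-of-era in [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let y = yoe as i64 + era * 400;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day-of-year, March-based
    let mp = (5 * doy + 2) / 153; // month index in [0, 11], March = 0
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Format a UTC epoch-millisecond timestamp as YYYY-MM-DD.
fn format_utc_date(ms: i64) -> String {
    let (y, m, d) = civil_from_days(ms.div_euclid(86_400_000));
    format!("{y:04}-{m:02}-{d:02}")
}

fn main() {
    assert_eq!(format_utc_date(0), "1970-01-01");
    assert_eq!(format_utc_date(-86_400_000), "1969-12-31");
    assert_eq!(format_utc_date(1_715_558_400_000), "2024-05-13");
}
```

A fixed-format helper like this covers the CLI's date rendering without pulling in a generic strftime implementation, which is where the cargo-bloat savings come from.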

The matching JS A/B run for commit 675a113 showed daily 1.369s -> 1.291s, session 1.414s -> 1.379s, and blocks 1.547s -> 1.430s with worker-enabled built dist paths.

Previous directional measurement after JS commit 8a80446 and Rust commit 80ba182:

| Command | Built JavaScript CLI (#984) | Rust release binary | Rust speedup |
| --- | --- | --- | --- |
| daily --offline --json | 1.333s ± 0.012s | 348.5ms ± 12.7ms | 3.82x |
| session --offline --json | 1.707s ± 0.293s | 367.0ms ± 10.3ms | 4.65x |
| blocks --offline --json | 1.632s ± 0.067s | 378.2ms ± 6.8ms | 4.32x |

Commit 80ba182 removes the chrono/clock feature by replacing Utc::now() with a small SystemTime-based UTC helper. That drops the iana-time-zone dependency graph from the Rust binary without changing report output.
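A minimal sketch of such a helper (now_utc_ms is a hypothetical name):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Millisecond UTC timestamp from std::time alone, replacing chrono's
/// Utc::now() so the iana-time-zone dependency graph can be dropped.
fn now_utc_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock set before 1970")
        .as_millis() as u64
}

fn main() {
    // Sanity lower bound: 2020-01-01T00:00:00Z in milliseconds.
    assert!(now_utc_ms() > 1_577_836_800_000);
}
```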

Binary Size

Commit 4392f0f replaces chrono-tz with Jiff system zoneinfo lookups for date grouping, keeping IANA timezone support while removing the embedded generated timezone table from the binary. Later commits reduced per-entry retained data and metadata allocation. After that:

  • c63dae7 removes anyhow and replaces it with a small local error type.
  • 8bc0603 removes generic date formatting from fixed display paths.
  • a46c0b3 replaces remaining fixed chrono parse/format paths with dedicated helpers.
  • 084c59e removes clap from the release dependency graph.
  • 4caad29 removes Rayon while keeping parallel loading through scoped standard threads.
  • 96a2f9f removes chrono, num-traits, and autocfg by using a compact UTC millisecond timestamp for fixed-format parsing, formatting, and block duration maths.

| Build | Size |
| --- | --- |
| Previous Rust release binary before timezone shrink | 1.9M |
| Before commit 0746159 | 970,464 bytes |
| After commit 0746159 | 954K (du: 932K, exact 953,984 bytes) |
| After commit 80ba182 | 916K (du: 916K, exact 937,472 bytes) |
| After commit c63dae7 | exact 937,200 bytes |
| After commit 8bc0603 | exact 804,480 bytes |
| After commit a46c0b3 | exact 787,808 bytes |
| After commit 084c59e | exact 654,464 bytes |
| After commit 4caad29 | exact 637,472 bytes |
| Current Rust release binary after 96a2f9f | exact 620,880 bytes |

Main Comparison

Measured against the built JavaScript CLI from origin/main at b5d894e in /private/tmp/ccusage-main-bench. This run happened under high system load, so treat the absolute numbers as noisy; all commands below were measured in the same hyperfine run as the latest Rust and PR #984 numbers.

| Command | Rust release binary | Main JavaScript CLI | Speedup vs main |
| --- | --- | --- | --- |
| daily --offline --json | 442.9ms ± 18.3ms | 12.205s ± 0.252s | 27.56x |
| session --offline --json | 523.5ms ± 66.4ms | 9.346s ± 0.126s | 17.85x |
| blocks --offline --json | 486.4ms ± 17.5ms | 10.937s ± 1.765s | 22.49x |

Correctness

Built JavaScript and Rust outputs matched for daily, session, monthly, weekly, and blocks token totals and total costs after commit 96a2f9f. Timezone parity also matched for UTC, Asia/Tokyo, America/New_York, and Europe/London in earlier validation.

Validation

  • direnv exec . cargo fmt
  • direnv exec . cargo check
  • direnv exec . cargo test
  • direnv exec . cargo build --release
  • wc -c target/release/ccusage -> 620,880 bytes
  • target/release/ccusage --version
  • target/release/ccusage daily --help
  • JSON token/cost parity against built JavaScript PR perf(ccusage): optimize bundled cli performance #984 for daily, session, monthly, weekly, and blocks in offline mode

Notes

Statusline remains an area to decide separately. The current target is to make every non-statusline JavaScript CLI behaviour available in the Rust binary while continuing to reduce parse time and binary size.

Update: Rust Pricing Fetch and Tables (2418e39)

Commit 2418e39 restores the Rust online pricing path after the earlier size cleanup. Pricing now lives in a dedicated module modeled after the Zig _pricing.zig, embeds the same Claude pricing fallback JSON, fetches LiteLLM pricing over built-in native TLS for calculate/auto runs without --offline, and logs fetch start/completion like the JavaScript CLI.

It also aligns the Rust table renderer with the JavaScript/Zig output: JS-style title boxes, horizontal separators between logical rows, responsive width scaling, stable date columns, multiline headers, model bullets, and block table formatting.

Release size after 2418e39: exact 757,040 bytes. The rejected rustls fetch attempt was exact 2,464,224 bytes, so the current implementation uses native TLS to stay under the 1MB target. A fresh runtime benchmark is intentionally deferred because Arc/cmux/lazygit were consuming CPU during the measurement window.

Validation for 2418e39:

  • direnv exec . cargo fmt --all
  • direnv exec . cargo test -p ccusage
  • direnv exec . cargo build -p ccusage --release
  • wc -c target/release/ccusage -> 757,040 bytes
  • Online pricing smoke: target/release/ccusage --no-color --since 20260109 --until 20260109 logs LiteLLM fetch start/completion and renders the report

Main Comparison Note: Duplicate Rows

A fresh JSON comparison against origin/main at b5d894e shows intentional output differences when Claude logs contain duplicate rows with the same message id and request id. JavaScript main keeps the first row it sees, while the Rust implementation keeps the most complete duplicate row by token total. This makes Rust totals higher and appears more accurate for partial-then-complete duplicate log records.

Example from local data for daily --offline --json --since 20260109 --until 20260109: main reports outputTokens=62, totalTokens=490914, totalCost=1.62067595; Rust reports outputTokens=911, totalTokens=491763, totalCost=1.64190095. The mismatch is therefore documented as a deliberate correctness difference rather than a table/output formatting bug.
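The replacement rule can be sketched as follows; Usage, dedupe, and the field names are illustrative, not the branch's actual Rust types:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

impl Usage {
    fn total(&self) -> u64 {
        self.input_tokens + self.output_tokens
    }
}

/// Keep the most complete duplicate per (message_id, request_id): a
/// later row replaces the kept one only if its token total is larger.
/// (JavaScript main instead keeps the first row it sees.)
fn dedupe(rows: Vec<((String, String), Usage)>) -> Vec<Usage> {
    let mut best: HashMap<(String, String), Usage> = HashMap::new();
    for (key, usage) in rows {
        best.entry(key)
            .and_modify(|kept| {
                if usage.total() > kept.total() {
                    *kept = usage;
                }
            })
            .or_insert(usage);
    }
    best.into_values().collect()
}

fn main() {
    let key = ("msg_1".to_string(), "req_1".to_string());
    let rows = vec![
        (key.clone(), Usage { input_tokens: 480, output_tokens: 2 }),   // partial record
        (key.clone(), Usage { input_tokens: 480, output_tokens: 911 }), // complete record
    ];
    let kept = dedupe(rows);
    assert_eq!(kept.len(), 1);
    assert_eq!(kept[0].output_tokens, 911); // the complete record wins
}
```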

Commit 532267c also fixes compact colored table truncation so ANSI escape sequences are preserved in narrow total rows. Release size after that fix: exact 757,024 bytes.

Update: Dedupe Key Allocation (3f82fb1)

Commit 3f82fb1 removes the per-entry messageId:requestId string concatenation in Rust global deduplication. It now uses a hashed lookup with collision-safe string comparison, preserving duplicate replacement semantics while avoiding one combined-key allocation per deduped usage row.
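A sketch of the hashed-lookup idea; DedupeIndex and pair_hash are hypothetical names, std's DefaultHasher stands in for whatever hasher the branch uses, and the real code also carries the duplicate replacement semantics rather than simple first-seen insertion:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the (message_id, request_id) pair directly, so no combined
/// "messageId:requestId" string is allocated per deduped row.
fn pair_hash(message_id: &str, request_id: &str) -> u64 {
    let mut h = DefaultHasher::new();
    message_id.hash(&mut h);
    request_id.hash(&mut h);
    h.finish()
}

/// Lookup keyed by the pair hash; buckets keep the original ids for a
/// collision-safe string comparison.
struct DedupeIndex {
    buckets: HashMap<u64, Vec<(String, String)>>,
}

impl DedupeIndex {
    fn new() -> Self {
        Self { buckets: HashMap::new() }
    }

    /// Returns true the first time a pair is seen, false for duplicates.
    fn insert(&mut self, message_id: &str, request_id: &str) -> bool {
        let bucket = self
            .buckets
            .entry(pair_hash(message_id, request_id))
            .or_default();
        if bucket
            .iter()
            .any(|(m, r)| m.as_str() == message_id && r.as_str() == request_id)
        {
            return false;
        }
        bucket.push((message_id.to_string(), request_id.to_string()));
        true
    }
}

fn main() {
    let mut seen = DedupeIndex::new();
    assert!(seen.insert("msg_1", "req_1"));
    assert!(!seen.insert("msg_1", "req_1")); // duplicate found, no key concat
    assert!(seen.insert("msg_2", "req_1"));
}
```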

Rejected alternatives from this pass: opt-level = "s" grew the release binary from 757,024 to 854,416 bytes without a speed win, and a manual JSON fast path was slower than serde_json on this corpus. The release binary after 3f82fb1 is exact 757,040 bytes.

Same-run local A/B against clean 532267c, Apple M3 Pro, comma-provided hyperfine, LOG_LEVEL=0, --offline --json:

| Command | 532267c baseline | 3f82fb1 current | Direction |
| --- | --- | --- | --- |
| daily | 413.4ms ± 19.1ms | 388.1ms ± 30.9ms | 1.07x faster |
| session | 499.8ms ± 94.3ms | 354.5ms ± 6.7ms | 1.41x faster, noisy baseline |
| blocks | 416.5ms ± 40.3ms | 388.9ms ± 47.3ms | 1.07x faster |

The same commit also adds pkg-config and openssl to the Nix devShell so Linux CI can build the https-native-tls Rust dependency graph via openssl-sys.

Validation for 3f82fb1:
  • nix develop --command cargo fmt --check
  • nix develop --command cargo test -p ccusage
  • nix develop --command cargo build --release --bin ccusage
  • pnpm run format
  • pnpm typecheck
  • nix develop --command pnpm run test
  • JSON output parity against clean 532267c for daily, session, and blocks

Update: Main Sync and MCP Removal

Merged origin/main at d7e6993 into the Rust branch, including the mainline MCP package removal. The Rust PR no longer carries the temporary MCP test-timeout change; apps/mcp is removed by the main merge.

Validation after the merge:

  • pnpm run format
  • pnpm typecheck
  • nix develop --command pnpm run test -> 26 files, 343 JS/TS tests; Rust workspace tests also pass
  • nix develop --command typos --config ./typos.toml
  • nix develop --command cargo test --workspace
  • nix develop --command cargo build --release --bin ccusage
  • wc -c target/release/ccusage -> 773,152 bytes
  • git diff --check

ryoppippi added 12 commits May 11, 2026 20:07
Introduce a Rust workspace member for the ccusage command path and wire the existing package entry point to execute the native binary when available. The TypeScript modules remain available for library exports and existing tests, while the command implementation now uses native JSONL parsing, parallel file discovery, and release settings tuned for small binaries.

Update package scripts and the Nix dev shell so Rust builds and tests are part of the normal workflow. Darwin shells export SDKROOT for the Nix Apple SDK so cargo can link reliably on macOS.
Keep the package bin target executable after replacing the TypeScript entrypoint with the Rust wrapper. The package.json bin field points at apps/ccusage/src/index.ts for development and publishing, so the source file should retain its executable bit.
Allow the 600MB data-loader stress test enough time on slower local IO and CI environments. The test exercises the same streaming path and assertion coverage; only the Vitest timeout changes so the suite can complete reliably.
Document the Rust versus TypeScript benchmark command, measured runtime, binary size, and output comparison for the rewrite. This gives the PR a concrete performance and size record without mixing measurement notes into implementation changes.
Preserve TypeScript-compatible file discovery and deduplication order so duplicate message/request pairs resolve identically. Match the TypeScript validation shape for optional JSON fields, including rejecting null values that Valibot does not accept, and align session ordering with the existing command behavior. Split CLI argument definitions into a dedicated Rust module while keeping command execution behavior unchanged.
Update the benchmark after the Rust loader was aligned with the TypeScript output. The documented measurement now uses the command invocation that explicitly passes the subcommand through Bun, records the current release binary size, and notes that the canonicalized daily JSON output is identical.
Add a build-time Claude pricing snapshot for offline and fallback cost calculation, while keeping runtime LiteLLM fetching for calculate mode or auto mode entries missing costUSD.

Move the native terminal output toward the TypeScript CLI presentation with boxed headings and responsive bordered tables. The loader now assigns costs after dedupe so auto mode can skip pricing work when every entry already has costUSD.
Record the latest daily JSON measurements for both default auto mode and offline mode after adding native pricing fallback and terminal output parity.
Keep costUSD assignment in the parallel file parsing path for display and auto mode. The post-dedupe pricing pass now only runs when calculate mode is requested or auto mode has entries missing costUSD.

coderabbitai Bot commented May 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dfa08aee-68ba-4025-a91f-a160a4fba6a0



cloudflare-workers-and-pages Bot commented May 11, 2026

Deploying with Cloudflare Workers

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | ccusage-guide | e0439c0 | May 15 2026, 10:12 AM |


pkg-pr-new Bot commented May 11, 2026


@ccusage/amp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/amp@977

ccusage

npm i https://pkg.pr.new/ryoppippi/ccusage@977

@ccusage/codex

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/codex@977

@ccusage/opencode

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/opencode@977

@ccusage/pi

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/pi@977

commit: e0439c0

ryoppippi added 11 commits May 11, 2026 22:18
Port the useful hot-path ideas from the native Rust loader in #977 back into the TypeScript implementation. Usage rows now avoid Valibot on the JSONL hot path, skip lines that cannot contain token usage before parsing, and preserve chronological duplicate behaviour without the previous timestamp pre-sort pass.

Cache date formatters and handle the default local and UTC date formats directly so daily grouping avoids repeated Intl formatter construction for the common paths.

The benchmarked bundled CLI stays under two seconds on the local Claude data set and now runs in the same range as the Rust implementation for offline JSON output.

Refs #977

Co-authored-by: ryoppippi <1560508+ryoppippi@users.noreply.github.com>
Avoid parsing JSON for transcript lines that cannot contain usage tokens in the native loader. The file-order timestamp prepass now extracts compact JSONL timestamps directly and falls back to the existing serde_json path when that fast path does not apply.

This keeps daily, session, monthly, and blocks JSON output identical while cutting the local offline daily benchmark from roughly 1.25s median to roughly 0.65s median.

Refs #977
Remove Valibot from the JSONL usage hot path by adding a lightweight parser for assistant usage rows, while keeping malformed and nullable fields rejected like the schema path.

Read JSONL files with bounded file-level parallelism based on available CPU parallelism, preserve main-compatible timestamp ordering and first-wins deduplication, and add --single-thread for sequential reads when needed.

Cache date formatters and fast-path local/UTC date formatting to avoid repeated Intl formatter construction during daily grouping.

Refs #977

Co-authored-by: ryoppippi <1560508+ryoppippi@users.noreply.github.com>
Skip native loader JSON parsing for transcript lines that cannot contain usage tokens, and extract compact timestamp fields directly during the file ordering prepass.

This preserves Rust JSON output for daily, session, monthly, and blocks while bringing the local offline daily benchmark down to roughly 0.65s.

Refs #977

Co-authored-by: ryoppippi <1560508+ryoppippi@users.noreply.github.com>
Use Bun.Glob when available to enumerate JSONL usage files while retaining the tinyglobby fallback for Node and other runtimes.

Route block loading through the shared discovery helper so all report modes use the same Bun-aware path.

Local scan benchmark over 3124 JSONL files: tinyglobby 47.7ms, Bun.Glob 28.1ms. Full report runtime remains dominated by JSONL parsing, so warm daily and blocks timings are effectively unchanged.

Refs #977
Replace per-line trim-based blank checks with a small whitespace scanner so normal JSONL records avoid allocating a trimmed string.

Verified daily/session/monthly output equality and blocks equality with projection removed after the change.

Refs #977
Prefer the duplicate message/request entry with the largest token total instead of keeping the first record encountered. Claude logs can contain an initial partial usage record followed by a more complete one with the same message id and request id; keeping the partial record under-counts the native reports.

Add a native regression test that loads duplicate JSONL entries through the normal loader and verifies the more complete usage and cost are retained.

Benchmark: mitata on Apple M3 Pro, node 24.15.0, built JS CLI from /Users/ryoppippi/ghq/github.com/ryoppippi/ccusage-wt/perf/apps/ccusage/dist/index.js, Rust release CLI at target/release/ccusage, LOG_LEVEL=0, --offline --json, min_samples=5.

Results: JS daily 3.85s, session 3.99s, monthly 3.94s, blocks 8.36s. Rust daily 630.05ms, session 643.02ms, monthly 622.38ms, blocks 649.11ms.

Rust binary size: target/release/ccusage is 2.0M on arm64 macOS.

Correctness: built JS and Rust outputs matched for daily/session/monthly/blocks token totals after sorting by stable keys. Cost totals can still differ by floating-point formatting only.

Verification: direnv exec . cargo test -p ccusage.
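The replacement rule above can be sketched as follows; the field names and the token-total definition are illustrative assumptions, not the PR's exact schema:

```typescript
// Keep the most complete record among duplicates sharing messageId/requestId:
// replace the retained entry in place when a later duplicate has a larger
// token total, preserving first-seen output order.
type UsageEntry = {
	messageId: string;
	requestId: string;
	inputTokens: number;
	outputTokens: number;
};

function tokenTotal(e: UsageEntry): number {
	return e.inputTokens + e.outputTokens;
}

function dedupeKeepLargest(entries: UsageEntry[]): UsageEntry[] {
	const retainedIndex = new Map<string, number>();
	const result: UsageEntry[] = [];
	for (const entry of entries) {
		const key = `${entry.messageId}:${entry.requestId}`;
		const existing = retainedIndex.get(key);
		if (existing === undefined) {
			retainedIndex.set(key, result.length);
			result.push(entry);
		} else if (tokenTotal(entry) > tokenTotal(result[existing])) {
			result[existing] = entry;
		}
	}
	return result;
}
```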
ryoppippi added 3 commits May 12, 2026 02:36
Load each JSONL file once and collect both the earliest timestamp and usage entries from that pass. This removes the previous sorted_usage_files pre-scan, which opened and read every file before opening the same files again for aggregation.

Directory traversal now uses DirEntry::file_type() to avoid extra path metadata checks during JSONL discovery. The loader also supports --single-thread for deterministic runs, while the default path keeps Rayon file-level parallelism.

Bring the non-statusline table output closer to the JavaScript CLI without adding table dependencies: responsive compact columns, model multiline rows, breakdown rows, daily project sections with projectAliases, session Last Activity, and detailed blocks --active output.

Remove tokio because the Rust CLI no longer uses async runtime features. The release binary is now 1.9M on arm64 macOS, down from the previous 2.0M local release build.

Benchmark on Apple M3 Pro, release binary, LOG_LEVEL=0, --offline --json, hyperfine --warmup 1 --runs 5. Process checks before each run showed no hyperfine/mitata/zig benchmark process; cmux and WindowServer were still active, so treat this as local noisy data.

Rust current: daily 393.7ms ± 16.1ms, session 398.7ms ± 25.7ms, monthly 383.8ms ± 6.4ms, weekly 389.1ms ± 9.4ms, blocks 398.9ms ± 14.2ms. Previous PR body baseline was daily 630.05ms, session 643.02ms, monthly 622.38ms, blocks 649.11ms.

Verified with direnv exec . cargo test -p ccusage, direnv exec . cargo build -p ccusage --release, and JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks against the built JavaScript CLI.
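The single-pass load above has roughly this shape; `parseUsageLine` stands in for the real parser and is an assumption:

```typescript
// One read per file: collect usage entries and the earliest timestamp for
// file ordering in the same pass, instead of a separate pre-scan that opens
// every file twice.
type ParsedUsage = { timestamp: string; tokens: number };

function loadFileOnce(
	lines: string[],
	parseUsageLine: (line: string) => ParsedUsage | null,
): { earliest: string | null; entries: ParsedUsage[] } {
	const entries: ParsedUsage[] = [];
	let earliest: string | null = null;
	for (const line of lines) {
		const parsed = parseUsageLine(line);
		if (parsed === null) continue;
		entries.push(parsed);
		// ISO-8601 UTC timestamps compare correctly as strings.
		if (earliest === null || parsed.timestamp < earliest) {
			earliest = parsed.timestamp;
		}
	}
	return { earliest, entries };
}
```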
Replace chrono-tz with Jiff system zoneinfo lookups for date grouping. chrono-tz embedded the generated IANA timezone table in the binary, while Jiff can read the system zoneinfo database on Unix, so the Rust CLI keeps IANA timezone support without carrying the static table in the executable.

Release binary size on arm64 macOS dropped from 1.9M to 948K after this change (ls: 970K). Reliable runtime benchmark data is not included because process checks showed a concurrent Zig build-exe process at about 99% CPU and cmux at about 170% CPU.

Verified with direnv exec . cargo fmt --all, direnv exec . cargo test -p ccusage, direnv exec . cargo build -p ccusage --release, JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks, and timezone parity checks for UTC, Asia/Tokyo, America/New_York, and Europe/London.
Avoid the extra timestamp extraction pass on valid usage lines by reusing the parsed UsageEntry timestamp for both file ordering and LoadedEntry construction. Non-usage lines and malformed usage lines still use the existing compact timestamp fallback, so file ordering semantics remain conservative.

Also drops the unused chrono serde feature. The release binary size stayed unchanged at 970,464 bytes on arm64 macOS, but the dependency edge is no longer requested by chrono.

Benchmark: noisy local run with cmux around 170% CPU, hyperfine --warmup 1 --runs 3, built JS via node vs Rust release. daily: JS 3.113s ± 0.024s, Rust 460.0ms ± 22.8ms, Rust 6.77x faster. session: JS 3.188s ± 0.029s, Rust 426.2ms ± 45.6ms, Rust 7.48x faster. monthly: JS 3.148s ± 0.028s, Rust 393.3ms ± 59.1ms, Rust 8.01x faster. blocks: JS 3.460s ± 0.046s, Rust 543.5ms ± 57.1ms, Rust 6.37x faster.

Validation: cargo fmt --check; cargo test; cargo build --release; JS/Rust JSON parity for daily/session/monthly/weekly/blocks.
ryoppippi added 8 commits May 12, 2026 06:59
Remove the anyhow dependency from the Rust binary and use a small local CliError type plus a local bail macro/context helper. This keeps the current command error text shape without pulling anyhow into the release build.

Size: target/release/ccusage 937,472 bytes -> 937,200 bytes (-272 bytes).

Validation: direnv exec . cargo fmt; direnv exec . cargo check; direnv exec . cargo test; direnv exec . cargo build --release; JS/Rust token and cost parity matched for daily/session/monthly/weekly/blocks.

Benchmark note: attempted hyperfine A/B against 80ba182 while Chrome/cmux were consuming several cores, so the timing run had severe noise and outliers. Observed r=3 means were daily 1.531s -> 1.319s, session 1.220s -> 1.469s, blocks 1.468s -> 2.612s; no speed claim is made from that run.
Replace fixed date and time rendering with manual formatting helpers so the Rust CLI does not pull in Jiff/chrono generic strftime machinery for simple table and JSON date strings.

This keeps timezone conversion through Jiff but avoids the generic formatter on the hot display paths. cargo-bloat shows Jiff text dropping from about 75.6 KiB to 25.0 KiB, and the release binary drops from 937,200 bytes to 804,480 bytes.

Validated with cargo fmt, cargo check, cargo test, cargo build --release, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.
Replace the remaining fixed RFC3339/date parsing and formatting call sites with small dedicated helpers. This keeps the accepted timestamp shapes, adds offset timestamp coverage, and avoids pulling chrono format parse/output machinery into the release binary.

Release binary size drops from 804,480 bytes to 787,808 bytes. cargo-bloat shows chrono text dropping from about 12.5 KiB to 2.9 KiB.

Validated with cargo fmt, cargo check, cargo test, cargo build --release, cargo-bloat, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.
Replace clap derive parsing with a small parser that covers the ccusage command surface used by the Rust binary. This removes clap, clap_builder, clap_derive, clap_lex, anstyle, and heck from the release dependency graph.

Release binary size drops from 787,808 bytes to 654,464 bytes. cargo-bloat no longer reports clap_builder in the release text section.

Validated with cargo fmt, cargo check, cargo test, cargo build --release, --help/--version smoke checks, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.
Replace Rayon file loading with a small scoped-thread worker split based on available parallelism. The CLI keeps parallel JSONL loading without carrying the Rayon and crossbeam dependency graph in the release binary.

Release binary size drops from 654,464 bytes to 637,472 bytes. cargo-bloat no longer reports rayon_core in the release text section.

Validated with cargo fmt, cargo check, cargo test, cargo build --release, cargo-bloat, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.
Replace the remaining chrono usage with a compact UTC millisecond timestamp and ISO date helper. The CLI only needs fixed-format UTC parsing/formatting, week-start date arithmetic, and block duration maths, while timezone date rendering still uses the existing Jiff dependency.

This removes chrono, num-traits, and autocfg from the Rust dependency graph.

Release binary size on arm64 macOS:

- before: 637,472 bytes at 4caad29

- after: 620,880 bytes

- change: -16,592 bytes

Validation:

- cargo fmt

- cargo check

- cargo test

- cargo build --release

- JS/Rust token and cost parity for daily/session/monthly/weekly/blocks

- target/release/ccusage --version

- target/release/ccusage daily --help
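The fixed-format helpers above are roughly this shape, sketched in TypeScript rather than the PR's Rust; the zero-padding and the Sunday week start are assumptions:

```typescript
// Fixed-format UTC helpers: ISO date from a millisecond timestamp, and
// week-start arithmetic, without a generic strftime-style formatter.
const DAY_MS = 86_400_000;

function isoDateUtc(ms: number): string {
	const d = new Date(ms);
	const pad = (n: number) => String(n).padStart(2, "0");
	return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}`;
}

function weekStartUtc(ms: number): number {
	const midnight = Math.floor(ms / DAY_MS) * DAY_MS;
	const weekday = new Date(midnight).getUTCDay(); // 0 = Sunday (assumed week start)
	return midnight - weekday * DAY_MS;
}
```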
Move Rust pricing into a dedicated module modeled after the Zig implementation, with an embedded Claude pricing fallback and native TLS LiteLLM fetch for online runs. The CLI now logs fetch start/completion like the JavaScript version, skips fetching for --offline and display mode, and keeps LOG_LEVEL=0 quiet for benchmark runs.

Align the Rust table renderer with the JavaScript/Zig output by using JS-style title boxes, horizontal separators between logical rows, responsive width scaling, stable date columns, multiline headers, model bullets, and block table formatting.

Validation: direnv exec . cargo fmt --all; direnv exec . cargo test -p ccusage; direnv exec . cargo build -p ccusage --release.

Release size: 620,880 bytes on branch baseline; 620,928 bytes after table-only changes; 757,040 bytes with embedded pricing plus native TLS online fetch. The rustls fetch attempt was rejected at 2,464,224 bytes. Runtime benchmark was deferred because Arc/cmux/lazygit were consuming CPU.
Keep ANSI escape sequences intact when truncating responsive table cells so colored compact rows do not leak broken reset sequences or stray separators. This fixes the compact total row rendering seen in narrow terminals while keeping no-color output unchanged.

Main comparison note: Rust intentionally does not byte-match origin/main for duplicate message/request rows because it keeps the most complete duplicate entry. Example on 2026-01-09 daily: main outputTokens=62, totalTokens=490914, totalCost=1.62067595, while Rust outputTokens=911, totalTokens=491763, totalCost=1.64190095.

Validation: direnv exec . cargo fmt --all; direnv exec . cargo test -p ccusage; direnv exec . cargo build -p ccusage --release; COLUMNS=100 target/release/ccusage --offline with and without --no-color. Release size: 757,024 bytes.
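The ANSI-safe truncation can be sketched as below; the escape pattern covers SGR color sequences only, which is an assumption about the sequences the tables emit:

```typescript
// Truncate to a visible-character budget while copying ANSI SGR sequences
// through whole, then append a reset so a cut-off colored cell cannot leak
// its color into the next cell.
const SGR_RE = /\x1b\[[0-9;]*m/y;

function truncateAnsi(cell: string, width: number): string {
	let out = "";
	let visible = 0;
	let i = 0;
	while (i < cell.length && visible < width) {
		SGR_RE.lastIndex = i;
		const m = SGR_RE.exec(cell);
		if (m !== null) {
			out += m[0]; // keep the whole escape sequence intact
			i += m[0].length;
			continue;
		}
		out += cell[i];
		visible += 1;
		i += 1;
	}
	return cell.includes("\x1b[") ? out + "\x1b[0m" : out;
}
```

No-color output passes through the non-escape branch unchanged, matching the commit's claim that plain output is unaffected.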
ryoppippi added a commit that referenced this pull request May 12, 2026
Mention: #984

Bun is faster when node:fs/promises readFile decodes directly to utf8, while Node remained faster with the existing file-handle Buffer path in earlier measurements. processJSONLFileByLine now keeps the Node path unchanged and uses stat(path) plus readFile(path, "utf8") only when running under Bun. The buffered line loop is factored into a shared helper so both paths keep the same line handling.

Microbench on 3,125 local JSONL files under Bun: file-handle Buffer.toString path was about 386-429ms, file-handle utf8 was about 134-259ms, and stat(path)+readFile(path, utf8) was about 113-125ms after warmup.

Benchmarks on Apple M3 Pro, LOG_LEVEL=0, built ccusage dist, --offline --json:

- daily, hyperfine --warmup 2 --runs 7: Bun 551.7ms ± 11.8ms; Rust #977 367.6ms ± 12.9ms; Node 1.011s ± 0.066s.

- session, hyperfine --warmup 2 --runs 5: Bun 578.3ms ± 13.0ms; Rust #977 367.4ms ± 9.3ms.

- blocks, hyperfine --warmup 2 --runs 5: Bun 610.5ms ± 4.0ms; Rust #977 379.9ms ± 9.6ms.

Validation: pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage run build; Node/Bun daily JSON output matched; 4-worker/8-worker daily JSON output matched.
@ryoppippi ryoppippi force-pushed the rust branch 3 times, most recently from 4d9d14c to 01a3080 Compare May 12, 2026 16:26
Replace per-entry messageId/requestId string concatenation with a hashed lookup and collision-safe string comparison. This keeps duplicate replacement semantics unchanged while avoiding allocation of a combined key for every usage entry during global deduplication.

Also rename the Rust color helper from colour/Colour to color/Color so the existing spell-check workflow passes on this branch.

The rejected alternatives were opt-level=s, which increased binary size from 757,024 to 854,416 bytes without a speed win, and a manual JSON fast path, which slowed daily runs. Baseline/current JSON output matched for daily, session, and blocks.
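A hedged sketch of the hashed lookup; FNV-1a here is an arbitrary stand-in for whatever hash the Rust code uses, and the point is the collision-safe field comparison instead of an allocated concatenated key:

```typescript
// Hash the two id fields without allocating a combined "msg:req" string,
// then confirm bucket hits by comparing the fields themselves so hash
// collisions cannot merge distinct entries.
type DedupeKey = { messageId: string; requestId: string };

function pairHash(a: string, b: string): number {
	let h = 0x811c9dc5; // FNV-1a offset basis
	for (const s of [a, b]) {
		for (let i = 0; i < s.length; i++) {
			h = Math.imul(h ^ s.charCodeAt(i), 0x01000193) >>> 0;
		}
		h = Math.imul(h ^ 0x1f, 0x01000193) >>> 0; // field separator
	}
	return h;
}

// Returns true when the key was already present (a duplicate).
function seenBefore(buckets: Map<number, DedupeKey[]>, key: DedupeKey): boolean {
	const h = pairHash(key.messageId, key.requestId);
	const bucket = buckets.get(h);
	if (bucket !== undefined) {
		for (const k of bucket) {
			if (k.messageId === key.messageId && k.requestId === key.requestId) {
				return true;
			}
		}
		bucket.push(key);
		return false;
	}
	buckets.set(h, [key]);
	return false;
}
```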
ryoppippi added 3 commits May 12, 2026 17:35
Add rust-overlay to the flake and source the dev shell Rust compiler from rust-toolchain.toml so local development and CI use the same Rust 1.85.1 toolchain.

Include clippy and rustfmt in the pinned toolchain and replace the direct nixpkgs cargo/rustc/rustfmt packages. Fix the one clippy warning exposed by running the pinned toolchain with -D warnings.
Distribute Rust JSONL worker chunks by file size instead of contiguous file-count chunks. Large Claude JSONL files can otherwise leave one worker with the long tail while other workers finish early.

The final merge restores loaded files to the original file index order before aggregation, preserving dedupe and floating-point accumulation order. A regression test covers balanced chunk assignment.

Fixed snapshot parity with CLAUDE_CONFIG_DIR=/tmp/ccusage-data-snapshot/config: daily/session/blocks output matched the previous Rust binary byte-for-byte, with blocks compared after removing the time-dependent projection field.

Benchmarks on the fixed snapshot, LOG_LEVEL=0, --offline --json, hyperfine --warmup 3 --runs 12:

- daily old: 366.0ms ± 13.9ms; new: 364.8ms ± 83.7ms (effectively unchanged/noisy)

- session old: 407.0ms ± 40.5ms; new: 334.6ms ± 30.3ms (~1.22x faster)

- blocks old: 368.0ms ± 15.3ms; new: 342.5ms ± 13.9ms (~1.07x faster)

Binary size: 773k -> 790k (+17k).

Validation:

- nix develop --command cargo fmt --all --check

- nix develop --command cargo test --release

- nix develop --command cargo build --release
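One way to sketch size-balanced distribution is a greedy largest-first assignment; the commit's actual chunking strategy may differ, so treat this as illustrative:

```typescript
// Assign each file (largest first) to the worker with the smallest
// accumulated byte load; the caller restores original file-index order
// after loading so dedupe and accumulation order are preserved.
function balanceChunksBySize(sizes: number[], workers: number): number[][] {
	const order = sizes.map((_, i) => i).sort((a, b) => sizes[b] - sizes[a]);
	const chunks: number[][] = Array.from({ length: workers }, () => []);
	const loads = new Array<number>(workers).fill(0);
	for (const fileIndex of order) {
		let lightest = 0;
		for (let w = 1; w < workers; w++) {
			if (loads[w] < loads[lightest]) lightest = w;
		}
		chunks[lightest].push(fileIndex);
		loads[lightest] += sizes[fileIndex];
	}
	return chunks;
}
```

With a long tail of large files this keeps any single worker from inheriting most of the bytes, which is the failure mode contiguous file-count chunks hit.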
ryoppippi added a commit that referenced this pull request May 13, 2026
For Bun usage-only JSONL loading, read buffered files with Bun.file().bytes() and scan the byte buffer for the usage marker before decoding only matching usage rows. Node keeps the existing text path, and blocks still uses the full text path because it needs non-usage timestamps.

This avoids decoding the full 1.28GB local JSONL corpus for daily/session commands when most non-usage content is discarded before parsing.

Built Bun JSON A/B against 5db98c7: daily --offline --json 423.3ms ± 7.0ms -> 370.6ms ± 24.7ms (~1.14x faster), session 443.3ms ± 19.1ms -> 373.9ms ± 2.7ms (~1.19x faster), blocks 556.2ms ± 22.0ms -> 549.3ms ± 5.5ms (noise-level, mostly unchanged).

Built Bun table A/B: daily table 426.7ms ± 9.3ms -> 363.0ms ± 9.3ms (~1.18x faster), session table 442.3ms ± 3.8ms -> 384.2ms ± 14.0ms (~1.15x faster), blocks table 560.3ms ± 16.5ms -> 554.5ms ± 10.6ms (noise-level).

Rust reference after this change: JS daily 366.1ms ± 11.9ms vs Rust #977 daily 294.4ms ± 3.5ms (Rust ~1.24x faster); JS session 388.4ms ± 20.4ms vs Rust 300.3ms ± 3.8ms (Rust ~1.29x faster).

Output parity matched the previous built dist for daily/session/blocks/monthly/weekly JSON and daily/session/blocks table output. Node and Bun JSON outputs matched for daily/session/blocks/monthly/weekly. Build size is 404.85 kB. Validation: pnpm run format, pnpm typecheck, targeted data-loader tests, pnpm --filter ccusage run build, and pnpm run test (28 files, 351 passed, 1 skipped).
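The byte-scan idea can be sketched like this; the marker bytes and the newline handling are simplified assumptions:

```typescript
// Split a raw byte buffer on newlines and decode only the lines that
// contain the usage marker bytes, avoiding full-buffer UTF-8 decoding.
const MARKER_BYTES = new TextEncoder().encode('"usage"');
const DECODER = new TextDecoder();

function lineHasMarker(bytes: Uint8Array, start: number, end: number): boolean {
	outer: for (let i = start; i + MARKER_BYTES.length <= end; i++) {
		for (let j = 0; j < MARKER_BYTES.length; j++) {
			if (bytes[i + j] !== MARKER_BYTES[j]) continue outer;
		}
		return true;
	}
	return false;
}

function decodeUsageLines(bytes: Uint8Array): string[] {
	const out: string[] = [];
	let start = 0;
	for (let i = 0; i <= bytes.length; i++) {
		if (i === bytes.length || bytes[i] === 0x0a) {
			if (i > start && lineHasMarker(bytes, start, i)) {
				out.push(DECODER.decode(bytes.subarray(start, i)));
			}
			start = i + 1;
		}
	}
	return out;
}
```

Non-usage lines never reach the decoder, which is where the win comes from when most of the corpus is non-usage content.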
ryoppippi added a commit that referenced this pull request May 13, 2026
Extend the Bun JSONL byte-scan path to block loading. Usage rows are still decoded and parsed through the existing fast parser, while non-usage block timestamps are extracted directly from bytes so block loading avoids decoding every JSONL line on Bun. Node keeps the existing text/file-handle path.

On real local Claude data with LOG_LEVEL=0 and COLUMNS=200, the combined Bun byte-scan path improved the pre-byte baseline from daily JSON 419.9ms ± 6.2ms to 362.7ms ± 7.6ms, session JSON 450.2ms ± 20.9ms to 378.0ms ± 9.3ms, and blocks JSON 551.4ms ± 9.2ms to 462.2ms ± 13.8ms. Table output saw the same shape: daily 431.2ms ± 21.5ms to 368.1ms ± 6.9ms, session 442.7ms ± 5.7ms to 379.8ms ± 7.7ms, and blocks 552.9ms ± 12.9ms to 462.8ms ± 12.3ms.

Current Rust #977 reference from the same host remains faster: JS daily 368.2ms ± 15.0ms vs Rust 292.9ms ± 4.1ms, JS session 376.4ms ± 10.9ms vs Rust 298.3ms ± 5.5ms, and JS blocks 449.1ms ± 8.0ms vs Rust 297.7ms ± 5.4ms.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadSessionBlockData\|loadDailyUsageData\|loadSessionData; pnpm --filter ccusage run build; pnpm run test; Bun/Node JSON parity for daily/session/blocks/monthly/weekly; table parity for daily/session/blocks.
ryoppippi added a commit that referenced this pull request May 13, 2026
Block aggregation only emits usage entries, and file-result ordering can use the earliest parsed usage timestamp. Skipping timestamp extraction on non-usage JSONL rows removes repeated byte indexOf/toString work from the blocks hot path while preserving the current block JSON and table output on the local corpus.

A/B against ca9b3a5 on real local Claude data with LOG_LEVEL=0 and COLUMNS=200: blocks --offline --json improved from 454.6ms ± 5.6ms to 414.7ms ± 9.1ms, and blocks --offline table improved from 458.0ms ± 9.6ms to 424.8ms ± 16.4ms. Build size dropped from 406.57 kB to 405.32 kB.

Current Rust #977 comparison on the same host: JS daily 361.8ms ± 7.2ms vs Rust 294.8ms ± 3.9ms, JS session 376.7ms ± 10.0ms vs Rust 305.6ms ± 11.8ms, and JS blocks 423.5ms ± 24.6ms vs Rust 296.9ms ± 4.5ms.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadSessionBlockData\|loadDailyUsageData\|loadSessionData; pnpm --filter ccusage run build; pnpm run test; blocks JSON/table parity against ca9b3a5; Bun/Node blocks JSON parity.
Switch the Rust release profile from opt-level z to opt-level 2. This keeps LTO, single codegen unit, panic abort, and symbol stripping, but gives LLVM enough room to optimize the hot JSONL/read/aggregation paths.

On real local Claude data with LOG_LEVEL=0, opt-level 2 produced a 952,784 byte arm64 macOS binary. The previous opt-level z build was 789,728 bytes and measured around daily 294.8ms, session 305.6ms, blocks 296.9ms in the latest JS comparison. The opt-level 2 build measured daily 245.5ms ± 3.6ms, session 248.7ms ± 3.6ms, and blocks 257.3ms ± 6.6ms. opt-level 3 was similar speed but larger at 1,002,544 bytes; opt-level s was smaller than 3 at 887,152 bytes but slower; opt-level 1 was both larger and slower.

Validation: direnv exec . cargo build --release; direnv exec . cargo test; JS/Rust daily totalTokens parity; JS/Rust blocks summed totalTokens parity.
ryoppippi added a commit that referenced this pull request May 13, 2026
Blocks no longer need non-usage JSONL timestamps for file ordering, but the block Bun byte path still iterated every line and manually checked whether the current line contained the usage marker.

Reuse the shared usage-line scanner instead. This keeps blocks on the same marker-driven path as daily/session, removes the redundant per-line marker guard, and shrinks the ccusage build from 405.25 kB to 404.44 kB.

Same-host A/B, Apple M3 Pro, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun dist, hyperfine via comma:

- blocks --offline --json: 419.2ms ± 20.1ms -> 412.0ms ± 5.8ms (~1.02x)

- blocks --offline table: 420.7ms ± 21.2ms -> 416.9ms ± 9.4ms (~1.01x)

Latest JS/Rust #977 reference after this change:

- JS daily/session/blocks JSON: 352.9ms / 368.8ms / 411.4ms

- Rust daily/session/blocks JSON: 245.9ms / 253.1ms / 256.1ms

Validation: pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage run build; baseline/current JSON parity for daily/session/blocks/monthly/weekly; baseline/current table parity for daily/session/blocks; Bun/Node JSON parity for daily/session/blocks/monthly/weekly.
ryoppippi added 5 commits May 13, 2026 06:14
The JavaScript ccusage CLI no longer exposes a --locale option, and the Rust implementation parsed the value without using it anywhere. Remove the stale field, parser branch, and help text so the Rust CLI surface matches the JavaScript CLI more closely.

This is a parity cleanup rather than a size win: the arm64 macOS release binary stayed at 952,784 bytes after strip/LTO.

Validation: direnv exec . cargo fmt --all --check; direnv exec . cargo test; direnv exec . cargo build --release; target/release/ccusage --help no longer lists --locale; target/release/ccusage --locale en-CA exits with code 2.
Mention: #977

Keep the workspace release profile at opt-level=2 for the ccusage crate, but compile non-workspace dependencies with opt-level="s". This keeps the hot ccusage code speed-oriented while letting std-adjacent dependency code shrink under fat LTO.

A/B on Apple M3 Pro, macOS 15.7.3, real local Claude JSONL data, LOG_LEVEL=0, hyperfine --warmup 4 --runs 10:

daily --offline --json: baseline 242.5ms ± 3.5ms, dependency-size profile 242.0ms ± 3.4ms.

session --offline --json: baseline 249.6ms ± 4.9ms, dependency-size profile 246.5ms ± 5.3ms.

blocks --offline --json: baseline 254.0ms ± 5.2ms, dependency-size profile 250.9ms ± 2.7ms.

Release binary size changed from 952,784 bytes to 936,640 bytes, 16,144 bytes smaller.

Rejected profile check: making the global release profile opt-level="s" and overriding only package.ccusage back to opt-level=2 produced a larger 1,080,576-byte binary, so this commit uses the inverse package override instead.

Validation: direnv exec . cargo fmt --all --check; direnv exec . cargo test; direnv exec . cargo build --release; wc -c target/release/ccusage.
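The resulting profile split can be written as a Cargo per-package override; this is a sketch of the shape described above, with the non-override flags taken from the earlier opt-level commit and therefore an assumption:

```toml
# Workspace code stays speed-oriented; the "*" package selector applies only
# to non-workspace dependencies, which are built for size under fat LTO.
[profile.release]
opt-level = 2
lto = "fat"
codegen-units = 1
panic = "abort"
strip = true

[profile.release.package."*"]
opt-level = "s"
```

The rejected inverse (global `opt-level = "s"` with a `package.ccusage` override back to 2) is the same mechanism with the defaults flipped.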
Mention: #977

Use the same usage-object marker as the JavaScript loader when deciding whether a Rust JSONL line should go through serde_json usage parsing. The previous input_tokens marker could match unrelated content and cause unnecessary parse attempts before falling back to timestamp extraction.

Real local Claude data, Rust release binary, LOG_LEVEL=0, hyperfine via comma, reverse-order confirmation against 7281959:

- daily --offline --json: 245.5ms -> 237.7ms

- session --offline --json: 250.7ms -> 240.8ms

- blocks --offline --json: 254.2ms -> 245.1ms

Release binary size stays exact 936,640 bytes. Stable JSON parity matched against 7281959 for daily, session, monthly, weekly, and blocks with projection removed.

Validation: direnv exec . cargo fmt --all --check; direnv exec . cargo test -p ccusage; direnv exec . cargo build --release -p ccusage.
Store parallel file-load results directly at their original file index instead of collecting index/file pairs, sorting them, and then mapping away the indexes.

This preserves output order while removing the reorder sort from the Rust worker merge path.

Fixed-snapshot parity matched for daily/session/monthly/weekly JSON and stable blocks JSON. Release binary size changed from 936,640 bytes to 936,624 bytes.

Benchmark on the fixed snapshot: daily --offline --json 252.9ms ± 6.2ms baseline vs 251.3ms ± 5.2ms indexed merge; session 256.1ms ± 3.9ms vs 257.3ms ± 4.9ms; blocks 261.3ms ± 5.5ms vs 262.0ms ± 4.5ms. Treat this as a size win plus a tiny daily win, with session and blocks effectively unchanged.

Validation: direnv exec . cargo fmt --all --check; direnv exec . cargo test -p ccusage; direnv exec . cargo build --release -p ccusage.

Refs #977
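The indexed merge amounts to writing each result into a preallocated slot keyed by original file index; a minimal sketch:

```typescript
// Place each worker result at its original file index so the merged vector
// is already in input order and no reorder sort is needed afterwards.
function mergeByOriginalIndex<T>(fileCount: number, results: Array<[number, T]>): T[] {
	const merged = new Array<T>(fileCount);
	for (const [fileIndex, loaded] of results) {
		merged[fileIndex] = loaded;
	}
	return merged;
}
```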
Add a Rust PR performance workflow that checks out the base branch, builds the base JavaScript CLI, builds the Rust PR release binary, generates synthetic Claude JSONL fixtures, and posts an upserted PR comment with same-run hyperfine ratios.

The comparison intentionally measures base/main through the fastest known JS path, bun -b dist/index.js, while measuring the PR side as target/release/ccusage directly. That keeps the comment focused on implementation runtime instead of Node launcher overhead.

The generated large fixture is shaped from aggregate local Claude-log statistics only: thousands of JSONL files, many small sessions, and a long tail of larger sessions. This avoids storing real prompts or outputs while exercising the multi-file loading path that matters for large user logs.

Local smoke result with the 1GiB shaped fixture: main JS/Bun daily --offline --json 15.099s, Rust PR 284.1ms, about 53.15x faster. CI will publish its own same-run numbers on the PR.
@github-actions

github-actions Bot commented May 15, 2026

ccusage Rust performance comparison

This compares the Rust PR release binary against the base branch JavaScript build on the same CI runner.

Small generated fixture performance

Generated small fixture for stable Rust-vs-main feedback and output-shape regressions.

Fixture: /home/runner/work/_temp/ccusage-small-fixture
Runtime: main branch and Rust PR both use the package ccusage bin from apps/ccusage/package.json through bun -b. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.

| Command | main JS/Bun median | Rust PR median | Rust speedup |
| --- | --- | --- | --- |
| daily --offline --json | 312.8ms | 34.6ms | 9.04x |
| session --offline --json | 174.3ms | 33.6ms | 5.19x |
| blocks --offline --json | 162.7ms | 34.9ms | 4.66x |

Large real-world-shaped fixture performance

Generated fixture around 1 GiB shaped from aggregate local Claude-log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixture.

Fixture: /home/runner/work/_temp/ccusage-large-fixture
Runtime: main branch and Rust PR both use the package ccusage bin from apps/ccusage/package.json through bun -b. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.

| Command | main JS/Bun median | Rust PR median | Rust speedup |
| --- | --- | --- | --- |
| daily --offline --json | 34.589s | 520.7ms | 66.43x |

Artifact size

| Artifact | Size |
| --- | --- |
| Rust release binary target/release/ccusage | 962.44 KiB |

Lower medians and smaller binaries are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

Resolve the ccusage package bin from apps/ccusage/package.json for both the base checkout and the Rust PR checkout, then execute that entry through the Bun executable already running the CI script.

This keeps the Rust comparison aligned with the command users install and run, including the JavaScript wrapper that launches the native binary, while avoiding pnpm exec and shell quoting in the measured command via hyperfine --shell none.

Package metadata is read with Bun.file rather than node:fs synchronous reads.
ryoppippi added a commit that referenced this pull request May 15, 2026
* chore(ccusage): add CLI benchmark harness

Add a mitata-based script that executes the built ccusage CLI with Node. The default bounded benchmark runs dist/index.js --offline --json and ignores stdout so timings measure command work rather than terminal rendering.

The harness reports Node version, built CLI size, arguments, and bounded latency statistics. It also supports custom ccusage arguments after -- and --full for mitata summary output when the target is light enough for mitata default sample tuning.

Performance notes: on an empty Claude data directory the bounded smoke benchmark measured 58.99 ms for one sample with dist/index.js at 125.66 KiB. The real-log baseline should be compared against origin/main under the same built Node CLI conditions, not against bunx or an interrupted run.

* perf(ccusage): avoid duplicate JSONL scans

Remove the eager timestamp sort from daily and session loading. That sort scanned every JSONL file to find an earliest timestamp and then parsed all files again, which doubled the read cost for large Claude histories.

Deduplication now tracks the retained entry index for each message/request hash and replaces it only when a later parse finds an older timestamp. This keeps the existing oldest-duplicate-wins behaviour without requiring file-level chronological pre-sorting.

Date grouping now reuses Intl.DateTimeFormat instances by timezone and locale instead of constructing a formatter for every usage entry.

Performance: with 3,119 JSONL files and 399,396 lines of real Claude logs, built Node CLI ccusage --offline --json improved from origin/main /usr/bin/time avg 12.86 s over 3 runs (12.94, 12.79, 12.86) to this branch avg 5.13 s over 3 runs (5.21, 5.05, 5.14), about 2.5x faster on Node v24.15.0. Bundle impact stayed small: dist/index.js remained 128.67 kB / 31.06 kB gzip; total dist changed from 569.93 kB to 570.72 kB.
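The formatter reuse is a straightforward memoization; the option set shown here is an assumption:

```typescript
// Cache one Intl.DateTimeFormat per (timezone, locale) pair instead of
// constructing a formatter for every usage entry.
const formatterCache = new Map<string, Intl.DateTimeFormat>();

function getDateFormatter(timeZone: string, locale: string): Intl.DateTimeFormat {
	const key = `${timeZone}\u0000${locale}`;
	let formatter = formatterCache.get(key);
	if (formatter === undefined) {
		formatter = new Intl.DateTimeFormat(locale, {
			timeZone,
			year: "numeric",
			month: "2-digit",
			day: "2-digit",
		});
		formatterCache.set(key, formatter);
	}
	return formatter;
}
```

Constructing Intl.DateTimeFormat is expensive relative to calling format, so hoisting it out of the per-entry loop is the whole win.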

* perf(ccusage): skip cold files and non-usage lines

Filter JSONL files by mtime when --since is provided, using a one-day buffer so date-range reports can avoid scanning cold historical files. Also skip lines without the required input_tokens marker before JSON.parse/Valibot validation in the ccusage loaders.

This incorporates the low-risk file filtering and line precheck ideas from PR #869 and the date-pruning direction from PR #877, scoped to ccusage.

Benchmarks on local real Claude logs, Node v24.15.0, built CLI, 3 samples:

- ccusage --offline --json: 4.50s avg (min 3.99s, max 5.49s)

- ccusage --offline --json --since 20260501: 261.14ms avg (min 235.00ms, max 308.73ms)

Verification:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage build

Co-authored-by: pbuchman <368465+pbuchman@users.noreply.github.com>

Co-authored-by: jleechan2015 <13840161+jleechan2015@users.noreply.github.com>
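The mtime filter reduces to one comparison per file; the one-day buffer matches the text above, everything else is illustrative:

```typescript
// A file is cold (skippable for a --since report) when its last modification
// predates the --since cutoff minus a one-day safety buffer.
const ONE_DAY_MS = 86_400_000;

function isColdFile(mtimeMs: number, sinceMs: number): boolean {
	return mtimeMs < sinceMs - ONE_DAY_MS;
}
```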

* perf(ccusage): narrow statusline file scans

Add an internal minUpdateTime loader option and use it from statusline so today cost only scans files touched since local midnight and active block discovery only scans files touched in the last 24 hours.

This incorporates the statusline time-window pruning idea from PR #623 without bringing in its broader cache implementation.

Verification:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage build

- cat apps/ccusage/test/statusline-test-sonnet4.json | node apps/ccusage/dist/index.js statusline --offline

Note: pnpm --filter ccusage test:statusline:sonnet4 currently fails before this change on Node v24 because node ./src/index.ts hits ERR_IMPORT_ATTRIBUTE_MISSING for package.json; the built CLI smoke above passes.

Co-authored-by: Szpadel <1857251+Szpadel@users.noreply.github.com>

* fix(ccusage): keep complete duplicate usage entries

Prefer the duplicate message/request entry with the largest token total instead of keeping the chronologically oldest record. Claude logs can contain an initial partial usage record followed by a more complete one with the same message id and request id; keeping the partial record under-counts daily, monthly and blocks output.

Apply the same replacement rule to session block aggregation and cover the behaviour with regression tests for daily and blocks reports.

* perf(ccusage): speed up usage data loading

Parse common usage JSONL rows with a lightweight hot path before falling back to JSON parsing and structural validation. Daily loading now reads JSONL files concurrently up to the available CPU parallelism capped at 16, then applies dedupe in file order so output ordering and duplicate handling stay stable.

Benchmark: mitata on Apple M3 Pro, node 24.15.0, built JS CLI at apps/ccusage/dist/index.js, Rust release CLI at /Users/ryoppippi/ghq/github.com/ryoppippi/ccusage/target/release/ccusage, LOG_LEVEL=0, --offline --json, min_samples=5.

Results: JS daily 3.85s, session 3.99s, monthly 3.94s, blocks 8.36s. Rust daily 630.05ms, session 643.02ms, monthly 622.38ms, blocks 649.11ms.

Previous local baseline after duplicate-entry fixes: JS daily 4.09s, session 3.91s, monthly 4.00s, blocks 8.60s; Rust daily 641ms, session 701ms, monthly 694ms, blocks 644ms.

Correctness: built JS and Rust outputs matched for daily/session/monthly/blocks token totals after sorting by stable keys. Cost totals can still differ by floating-point formatting only.

Verification: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts; pnpm --filter ccusage run build; pnpm run test.

* perf(ccusage): parallelise JSONL loading by CPU count

Parallelise JSONL file loading across daily, session, and blocks reports using os.availableParallelism() by default. This keeps the worker count tied to the host CPU instead of a fixed cap, while --single-thread and CCUSAGE_JSONL_READ_CONCURRENCY provide deterministic overrides for benchmarking and constrained environments.

Small JSONL files are now read in one buffered pass, with the existing streaming path retained for files larger than 32 MiB. Cost calculation also keeps a synchronous fast path when --offline or cached costUSD values make async pricing unnecessary.

Dedupe is still applied after each file result is collected, preserving the original file-order replacement behaviour and keeping JS/Rust token totals identical for daily, session, monthly, and blocks JSON output.

Benchmark on Apple M3 Pro, Node 24.15.0, os.availableParallelism()=11, built JS CLI at apps/ccusage/dist/index.js, Rust release CLI at /Users/ryoppippi/ghq/github.com/ryoppippi/ccusage/target/release/ccusage, LOG_LEVEL=0, --offline --json, mitata min_samples=5:

JS: daily 3.51s, session 3.07s, monthly 3.17s, blocks 6.51s. Rust: daily 640.16ms, session 639.67ms, monthly 660.97ms, blocks 649.93ms. Previous JS PR baseline from bd7d2d4: daily 3.85s, session 3.99s, monthly 3.94s, blocks 8.36s.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts, pnpm --filter ccusage run build, pnpm run test, and JS/Rust token-total parity checks for daily/session/monthly/blocks.

* perf(ccusage): avoid double reading block files

Load session block JSONL files once and collect each file's earliest timestamp during the same pass that parses usage entries. This removes the previous sortFilesByTimestamp pre-scan from blocks, which read every file once to sort and then read the same files again for aggregation.

The standalone getEarliestTimestamp helper now shares the same fast timestamp extraction path, falling back to JSON.parse only for non-compact timestamp lines.
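A minimal sketch of that shared extraction path, assuming the compact `"timestamp":"..."` layout with a JSON.parse fallback for anything else. The helper name and marker handling here are illustrative, not the actual ccusage implementation:

```typescript
const TS_MARKER = '"timestamp":"';

// Fast path: find the compact timestamp field by string search.
// Fallback: full JSON.parse for lines with whitespace, escapes, etc.
function extractTimestamp(line: string): string | undefined {
	const start = line.indexOf(TS_MARKER);
	if (start !== -1) {
		const valueStart = start + TS_MARKER.length;
		const end = line.indexOf('"', valueStart);
		if (end !== -1) {
			return line.slice(valueStart, end);
		}
	}
	try {
		const parsed = JSON.parse(line) as { timestamp?: string };
		return typeof parsed.timestamp === "string" ? parsed.timestamp : undefined;
	}
	catch {
		return undefined;
	}
}
```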

Benchmark on Apple M3 Pro, built JavaScript CLI, LOG_LEVEL=0, blocks --offline --json, hyperfine --warmup 1 --runs 5. Process checks showed no hyperfine/mitata/zig benchmark process before the run; cmux and WindowServer were active, so treat the absolute number as noisy local data.

JS blocks current: 3.347s ± 0.044s. Previous JS baseline from f5020ef was 6.51s, so this is about 1.95x faster for blocks.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts, pnpm --filter ccusage run build, pnpm run test, and JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks.

* perf(ccusage): use fast parsing for content usage lines

Allow the string-field parser to handle normal usage lines that include message.content. The local Claude data set has 162,838 usage lines and all of them include content, so the previous default sent effectively every usage line through JSON.parse before the lightweight object validation path.

API error lines still fall back to the full JSON path because usage limit reset extraction needs message content, and malformed non-array content still falls back instead of being accepted by the fast parser.

Reliable benchmark data is not included in this commit. An attempted hyperfine run was aborted after a concurrent Zig build started, and subsequent process checks still showed cmux at roughly 190-200% CPU. The previous measured JS blocks result remains 3.347s ± 0.044s from 6e876b6.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts, pnpm --filter ccusage run build, pnpm run test, and JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks.

* perf(ccusage): use native JSONL discovery

Replace tinyglobby usage in the ccusage hot path with a small native readdir-based JSONL walker. The loader now avoids starting the glob engine for daily, session-by-id, block, and debug mismatch discovery, keeps deterministic file order with path sorting, and removes tinyglobby from the ccusage app dependencies.

Reliable runtime benchmark data is not included in this commit because process checks still showed cmux at roughly 150% CPU and another node ccusage process. The built bundle remains 515.86 kB total after this change.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts src/debug.ts, pnpm --filter ccusage run build, pnpm run test, and JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks.

* perf(ccusage): cache usage date formatting

Cache formatted usage dates per UTC hour while loading reports. The hot path previously created a Date and formatted through Intl for every usage line; daily, session, and block loading now reuse a per-command formatter/cache so repeated entries in the same hour do not pay that cost again.

Reliable runtime benchmark data is not included in this commit because process checks still showed cmux at roughly 150% CPU and Arc at roughly 60% CPU. The built bundle is 516.30 kB total after this change.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts src/_date-utils.ts src/debug.ts, pnpm --filter ccusage run build, pnpm run test, JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks, and timezone parity checks for UTC, Asia/Tokyo, America/New_York, and Europe/London.

* perf(ccusage): collapse fast parser null checks

Replace the repeated field-specific null scans in the usage-line fast parser with one unsupported-null regular expression. This preserves the same fallback for fields that the fast parser cannot safely treat as absent, while avoiding a dozen full-line includes checks on the common path.

Local data has many unrelated JSON nulls such as stop_sequence and stop_reason; the unsupported-null regex only matched speed:null in 283 lines, so normal content-bearing usage lines still stay on the fast path.
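As a hedged sketch, the collapsed check might look like the following. The field names inside the regex are illustrative stand-ins; the real parser guards its own set of fields that the fast path cannot safely treat as absent:

```typescript
// One alternation regex replaces a dozen per-field
// line.includes('"field":null') scans over the whole line.
const UNSUPPORTED_NULL = /"(?:usage|message|timestamp|speed)":null/;

function canUseFastPath(line: string): boolean {
	// Unrelated nulls like stop_reason do not match, so normal
	// content-bearing usage lines stay on the fast path.
	return !UNSUPPORTED_NULL.test(line);
}
```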

Reliable runtime benchmark data is not included in this commit because process checks still showed cmux around 180% CPU. The built bundle is 516.16 kB total after this change.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts src/_date-utils.ts src/debug.ts, pnpm --filter ccusage run build, pnpm run test, JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks, and timezone parity checks for UTC, Asia/Tokyo, America/New_York, and Europe/London.

* perf(ccusage): mutate aggregation accumulators

Avoid allocating a fresh totals object for every usage entry while aggregating reports. Model aggregation and report totals now mutate per-group accumulators, reducing allocation and GC pressure in daily, monthly, weekly, session, and blocks summaries.
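The mutation pattern can be sketched as one accumulator object per group, updated in place. Types and field names below are simplified assumptions, not the actual report shapes:

```typescript
type TokenTotals = { input: number; output: number };

// Mutate the group accumulator instead of returning a fresh totals object
// for every entry, which avoids per-entry allocation and GC pressure.
function addEntry(acc: TokenTotals, entry: TokenTotals): void {
	acc.input += entry.input;
	acc.output += entry.output;
}

function totalsByGroup(entries: Array<{ group: string } & TokenTotals>): Map<string, TokenTotals> {
	const groups = new Map<string, TokenTotals>();
	for (const entry of entries) {
		let acc = groups.get(entry.group);
		if (acc === undefined) {
			acc = { input: 0, output: 0 }; // only one allocation per group
			groups.set(entry.group, acc);
		}
		addEntry(acc, entry);
	}
	return groups;
}
```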

Reliable runtime benchmark data is not included in this commit because process checks still showed cmux around 210% CPU. The built bundle is 515.99 kB total after this change.

Verified with pnpm run format, pnpm typecheck, pnpm --filter ccusage exec vitest run src/data-loader.ts src/_date-utils.ts src/debug.ts, pnpm --filter ccusage run build, pnpm run test, JS/Rust token-total parity checks for daily/session/monthly/weekly/blocks, and timezone parity checks for UTC, Asia/Tokyo, America/New_York, and Europe/London. Commit hook was skipped because lint-staged hit pnpm workspace-state JSON parsing errors after these checks had already passed manually.

* chore(ccusage): record latest noisy JS benchmark

No file changes. This records a directional benchmark after commits 8051465, 641f7a1, 335331e, 5e2fefd, and 99e1607.

Measured on May 12, 2026 with comma-provided hyperfine, Apple M3 Pro, built JavaScript CLI via node, Rust release binary, LOG_LEVEL=0, --offline --json, --warmup 1 --runs 3. Process checks still showed cmux around 180% CPU, so these are noisy and should not be treated as final release numbers.

daily: JS 3.247s ± 0.206s, Rust 515.2ms ± 17.4ms, Rust 6.30x faster.

session: JS 3.120s ± 0.020s, Rust 504.5ms ± 52.8ms, Rust 6.19x faster.

monthly: JS 3.115s ± 0.057s, Rust 551.7ms ± 14.8ms, Rust 5.65x faster.

blocks: JS 3.407s ± 0.018s, Rust 536.0ms ± 33.9ms, Rust 6.36x faster; hyperfine reported statistical outliers.

* perf(ccusage): summarize usage groups in one pass

Replace the separate per-group passes for model breakdowns, totals, and model lists with a single summarizeUsageEntries pass. This keeps Object.groupBy, which benchmarked better than a direct mutable group Map variant on the current workload, while avoiding repeated scans inside each daily and session group.
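The single-pass shape can be sketched as one loop that produces the per-model breakdown, the running total, and the model list together. The entry type and function name are simplified assumptions:

```typescript
type Entry = { model: string; tokens: number };

// One scan replaces three: per-model breakdown, totals, and model list
// all come out of the same pass over a group's entries.
function summarizeUsageEntries(entries: Entry[]) {
	const byModel = new Map<string, number>();
	let totalTokens = 0;
	for (const entry of entries) {
		byModel.set(entry.model, (byModel.get(entry.model) ?? 0) + entry.tokens);
		totalTokens += entry.tokens;
	}
	return { byModel, totalTokens, models: [...byModel.keys()] };
}
```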

Benchmark: noisy local run with cmux around 185% CPU, hyperfine --warmup 1 --runs 3, built JS via node vs Rust release. daily: JS 3.090s ± 0.010s, Rust 582.0ms ± 22.1ms, Rust 5.31x faster. session: JS 3.413s ± 0.475s, Rust 614.3ms ± 20.8ms, Rust 5.56x faster. monthly: JS 3.106s ± 0.027s, Rust 618.5ms ± 45.6ms, Rust 5.02x faster. blocks: JS 3.525s ± 0.130s, Rust 633.4ms ± 19.4ms, Rust 5.57x faster.

Bundle size after build: total 515.18 kB, dist/index.js 128.62 kB, data-loader chunk 175.10 kB.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts; pnpm run test; pnpm --filter ccusage run build; JS/Rust JSON parity for daily/session/monthly/weekly/blocks.

* perf(ccusage): reuse block usage timestamps

Avoid parsing each JSONL line timestamp before parsing usage data in loadSessionBlockData. Valid usage rows now reuse data.timestamp for file ordering and block entry construction, while non-usage and malformed usage lines still use the existing timestamp fallback.

Benchmark: noisy local blocks-only run with cmux around 175% CPU, hyperfine --warmup 1 --runs 5, built JS via node vs Rust release. blocks: JS 3.985s ± 0.153s, Rust 764.9ms ± 103.4ms, Rust 5.21x faster. Treat absolute numbers as noisy due to system load.

Bundle size after build: total 515.58 kB, dist/index.js 128.62 kB, data-loader chunk 175.50 kB.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadSessionBlockData; pnpm run test; pnpm --filter ccusage run build; JS/Rust JSON parity for daily/session/monthly/weekly/blocks.

* perf(ccusage): avoid per-line promise allocation

Refs #984.

Keep JSONL line callbacks synchronous on the immediate-cost path and collect the rare deferred cost calculations per file. This avoids allocating an async callback promise for every parsed usage row while preserving entry order with sparse slots for deferred rows.
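The sparse-slot idea can be sketched as follows: the common path pushes synchronously, and only rows needing async cost work reserve a slot that a deferred promise fills in later. Function names and the cost-callback split are illustrative assumptions:

```typescript
async function collectEntries<T>(
	rows: T[],
	immediateCost: (row: T) => number | undefined,
	deferredCost: (row: T) => Promise<number>,
): Promise<Array<{ row: T; cost: number }>> {
	const out: Array<{ row: T; cost: number } | undefined> = [];
	const pending: Array<Promise<void>> = [];
	for (const row of rows) {
		const cost = immediateCost(row);
		if (cost !== undefined) {
			out.push({ row, cost }); // common path: no promise allocated
			continue;
		}
		const slot = out.push(undefined) - 1; // reserve the slot to keep entry order
		pending.push(deferredCost(row).then((resolved) => {
			out[slot] = { row, cost: resolved };
		}));
	}
	await Promise.all(pending);
	return out as Array<{ row: T; cost: number }>;
}
```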

Noisy benchmark with cmux around 217% CPU after build: js daily 3.379s ± 0.295s, rust daily 441.7ms ± 13.9ms; js session 3.970s ± 0.962s, rust session 438.0ms ± 35.6ms; js blocks 3.344s ± 0.009s, rust blocks 496.7ms ± 32.6ms.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadDailyUsageData|loadSessionData|loadSessionBlockData|loadSessionUsageById; pnpm --filter ccusage run build; pnpm run test; built JS/Rust JSON token and cost parity matched for daily, session, monthly, weekly, and blocks.

* perf(ccusage): avoid trimming jsonl lines

Refs #984.

Replace per-line trim() checks in JSONL loading with a char-code whitespace scan. Normal usage rows no longer allocate a trimmed string before parsing, while blank and whitespace-only lines are still skipped.
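A minimal sketch of the char-code scan, assuming the usual JSONL whitespace set; the helper name is hypothetical:

```typescript
// Detect blank/whitespace-only lines without allocating a trimmed copy.
function isBlankLine(line: string): boolean {
	for (let i = 0; i < line.length; i++) {
		const code = line.charCodeAt(i);
		// space, tab, CR, LF
		if (code !== 0x20 && code !== 0x09 && code !== 0x0d && code !== 0x0a) {
			return false;
		}
	}
	return true;
}
```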

Noisy benchmark with cmux around 142% CPU and Arc around 92% CPU after build: js daily 3.073s ± 0.012s, rust daily 467.0ms ± 83.9ms; js blocks 3.385s ± 0.008s, rust blocks 457.2ms ± 65.8ms.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern processJSONLFileByLine|loadDailyUsageData|loadSessionData|loadSessionBlockData; pnpm --filter ccusage run build; pnpm run test; built JS/Rust JSON token and cost parity matched for daily, session, monthly, weekly, and blocks.

* perf(ccusage): parse usage files in worker threads

Refs #984.

Use node worker_threads for built JavaScript usage loading when at least 64 JSONL files are present. The worker count follows os.availableParallelism() - 1 by default, respects --single-thread, and can be disabled or capped with CCUSAGE_JSONL_WORKER_THREADS. Source and vitest execution keep the existing single-process path.
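The gating logic described above can be sketched as a small resolver; the threshold, signature, and behaviour of a zero result are assumptions for illustration, not the actual ccusage code:

```typescript
import * as os from "node:os";

// Decide how many worker threads to use for JSONL loading.
// 0 means "stay on the single-process path".
function resolveWorkerCount(
	fileCount: number,
	singleThread: boolean,
	envOverride?: string, // stands in for CCUSAGE_JSONL_WORKER_THREADS
): number {
	if (singleThread || fileCount < 64) return 0;
	if (envOverride !== undefined) {
		const parsed = Number.parseInt(envOverride, 10);
		if (Number.isFinite(parsed)) return Math.max(0, parsed); // "0" disables workers
	}
	return Math.max(1, os.availableParallelism() - 1);
}
```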

Noisy no-worker comparison with cmux around 169% CPU: workers daily 2.038s ± 0.046s vs no-workers daily 3.114s ± 0.089s; workers session 2.098s ± 0.054s vs no-workers session 3.110s ± 0.009s; workers blocks 2.350s ± 0.161s vs no-workers blocks 3.378s ± 0.010s.

Noisy Rust comparison after build: workers daily 2.030s ± 0.065s vs rust daily 437.4ms ± 16.7ms; workers session 2.071s ± 0.030s vs rust session 464.9ms ± 11.5ms; workers blocks 2.215s ± 0.038s vs rust blocks 566.3ms ± 31.6ms. Rust remains about 4.6-5.1x faster.

Build size: pnpm --filter ccusage run build reports 520.18 kB total, with dist/index.js 128.62 kB and the data-loader chunk 180.11 kB.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern processJSONLFileByLine|loadDailyUsageData|loadSessionData|loadSessionBlockData; pnpm --filter ccusage run build; pnpm run test; built JS/Rust JSON token and cost parity matched for daily, session, monthly, weekly, and blocks.

* perf(ccusage): compact worker usage results

Refs #984.

Stop sending full UsageData objects from usage worker threads. Workers now return compact entries containing usage, model, cost, version, and dedupe metadata only, and the main thread performs replacement decisions from token totals and speed presence. This reduces structured clone payloads for built JavaScript worker loading.

Cap the default worker count at 4 while still considering os.availableParallelism() and file count. A short noisy worker-count sweep showed daily w4 1.810s ± 0.024s, w6 1.871s ± 0.072s, w8 1.915s ± 0.048s, default-before-cap 2.078s ± 0.089s; blocks w4 1.972s ± 0.021s, w6 2.117s ± 0.170s, w8 2.079s ± 0.076s, default-before-cap 2.175s ± 0.050s.

Latest noisy comparison after the cap with cmux around 169% CPU: js daily 2.091s ± 0.064s, no-workers daily 3.073s ± 0.013s, rust daily 402.5ms ± 110.9ms; js session 2.383s ± 0.253s, rust session 561.0ms ± 71.5ms; js blocks 2.281s ± 0.094s, no-workers blocks 3.783s ± 0.267s, rust blocks 579.5ms ± 34.8ms.

Build size: pnpm --filter ccusage run build reports 520.66 kB total, with dist/index.js 128.62 kB and the data-loader chunk 180.59 kB.

Validation: pnpm run format; pnpm typecheck; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern processJSONLFileByLine|loadDailyUsageData|loadSessionData|loadSessionBlockData; pnpm --filter ccusage run build; pnpm run test; built JS/Rust JSON token and cost parity matched for daily, session, monthly, weekly, and blocks.

* perf(ccusage): buffer larger jsonl files

Raise the buffered JSONL read threshold from 32 MiB to 128 MiB so large local Claude logs avoid the readline streaming path. CPU profiling showed the previous threshold pushed the largest files through string_decoder/readline newline regex work, while the Rust implementation reads these files whole.

Local data shape:

- 3 JSONL files exceed 32 MiB: 87.0 MiB, 69.5 MiB, and 43.9 MiB

- total local JSONL corpus: 3124 files, 1.2G

Validation:

- pnpm run format

- pnpm typecheck

- pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern processJSONLFileByLine|loadDailyUsageData|loadSessionData|loadSessionBlockData

- pnpm --filter ccusage run build

- pnpm run test

- JS/Rust JSON token and cost parity: daily, session, monthly, weekly, blocks

Benchmarks (noisy; cmux ~173% CPU):

- JS daily: 1.434s ± 0.050s

- JS daily no-worker: 2.887s ± 0.008s

- Rust daily: 396.4ms ± 21.8ms

- JS session: 1.537s ± 0.038s

- JS session no-worker: 2.942s ± 0.006s

- Rust session: 387.3ms ± 23.3ms

- JS blocks: 1.705s ± 0.138s

- JS blocks no-worker: 3.190s ± 0.015s

- Rust blocks: 392.0ms ± 14.6ms

* perf(ccusage): skip null fallback regex on normal lines

CPU profiling showed the unsupported-null regular expression still ran for every fast-parser candidate. Guard it with a cheap ':null' substring check so normal usage rows stay on simple string scans and only potentially null-bearing rows pay for the regex fallback check.
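The guard is a one-line short circuit; the field set in the regex is illustrative, not the parser's actual unsupported list:

```typescript
const UNSUPPORTED_NULL_RE = /"(?:usage|message|timestamp|speed)":null/;

// Most usage rows contain no ':null' at all, so the cheap substring check
// short-circuits before the alternation regex ever runs.
function hasUnsupportedNull(line: string): boolean {
	return line.includes(":null") && UNSUPPORTED_NULL_RE.test(line);
}
```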

Validation:

- pnpm run format

- pnpm typecheck

- pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern processJSONLFileByLine|loadDailyUsageData|loadSessionData|loadSessionBlockData|parseUsageDataLine

- pnpm --filter ccusage run build

- pnpm run test

- JS/Rust JSON token and cost parity: daily, session, monthly, weekly, blocks

Benchmarks (noisy; cmux ~178% CPU):

- JS daily: 1.396s ± 0.051s

- JS session: 1.478s ± 0.059s

- JS blocks: 1.466s ± 0.002s

- Rust daily: 353.0ms ± 6.8ms

- Rust session: 367.8ms ± 30.3ms

- Rust blocks: 418.6ms ± 48.0ms

* perf(internal): cache model pricing lookups

Cache successful and missing LiteLLM model pricing lookups by model name. The ccusage hot path calculates token cost for every row when costUSD is absent, so this avoids rebuilding provider-prefix candidates and rescanning the pricing map for repeated models.
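Caching misses as well as hits matters here, since an absent model would otherwise be rescanned on every row. A hedged sketch, where `lookupPricing` stands in for the real provider-prefix candidate scan over the LiteLLM map:

```typescript
type Pricing = { inputCostPerToken: number; outputCostPerToken: number };

// Wrap an expensive lookup with a by-model cache that also remembers misses.
function makeCachedPricingLookup(lookupPricing: (model: string) => Pricing | undefined) {
	const cache = new Map<string, Pricing | undefined>();
	return (model: string): Pricing | undefined => {
		if (cache.has(model)) return cache.get(model); // hit or cached miss
		const pricing = lookupPricing(model);
		cache.set(model, pricing);
		return pricing;
	};
}
```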

Validation: pnpm run format; pnpm typecheck; pnpm --filter @ccusage/internal exec vitest run src/pricing.ts; pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadDailyUsageData|loadSessionData|loadSessionBlockData|processJSONLFileByLine|loadSessionUsageById|calculateCostForEntry; pnpm --filter ccusage run build; pnpm run test. JS/Rust parity matched for daily/session/monthly/weekly/blocks token and cost fields.

A/B benchmark with worker-enabled built dist paths, local Claude data, LOG_LEVEL=0, --offline --json, hyperfine --warmup 1 -r 5: daily 1.369s ± 0.070s -> 1.291s ± 0.005s; session 1.414s ± 0.053s -> 1.379s ± 0.017s; blocks 1.547s ± 0.093s -> 1.430s ± 0.021s. Bundle total 520.69 kB -> 521.06 kB.

* build(ccusage): minify bundled CLI output

Switch tsdown from dce-only minification to full minification for the ccusage bundled CLI. This keeps the runtime code path unchanged while substantially reducing the distributed JavaScript payload.

Bundle size: pnpm --filter ccusage run build total 521.06 kB -> 350.81 kB. Main entry 128.62 kB -> 68.14 kB; data-loader chunk 180.98 kB -> 108.98 kB.

Benchmark note: hyperfine A/B was run with both base and minified builds under heavy Chrome/Arc system load, using paths that preserve the /dist/ worker-thread gate. Results were noisy but did not show a stable regression: daily 4.902s -> 4.794s, session 8.933s noisy/outlier -> 4.542s, blocks 4.586s -> 4.680s.

Validation: pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage run build; JS/Rust token and cost parity matched for daily/session/monthly/weekly/blocks in offline mode.

* perf(ccusage): scan nullable fields without regex

Replace the unsupported-null fast parser regex with a targeted scan over :null occurrences. This avoids running a large alternation regexp over every line that contains null while preserving the fallback behaviour for nullable fields that the fast path cannot represent.

On the local Claude JSONL dataset, the isolated null-field check improved from regex median 615.557ms to scan median 370.318ms over 171,025 :null lines. The built JS bundle remains about 351.08 kB total.

Validated with pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage run build, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.

* perf(ccusage): decode buffered jsonl reads explicitly

Read buffered JSONL files as Buffers and decode with Buffer.toString instead of passing an encoding to readFile. Profiling showed UTF-8 decode in the file loading path, and the explicit Buffer decode was faster on the local Claude corpus.

On 3,124 local JSONL files totaling 1,275,246,819 decoded bytes, read/decode median time improved from 2,722.5ms with readFile(path, utf8) to 2,487.8ms with readFile(path).toString(utf8). The built JS bundle remains about 351.09 kB total.

Validated with pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage run build, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.

* perf(ccusage): reuse file handles for buffered jsonl reads

Open each JSONL file once for the buffered path, use the file handle for stat and readFile, then close it before falling back to the streaming path for very large files. This avoids a separate path-based stat followed by a second open for the read in the common buffered case.

On 3,124 local JSONL files totaling 1,275,246,819 decoded bytes, stat plus readFile median time was 2,484.9ms; open plus fstat plus handle readFile was 2,338.8ms. The built JS bundle is about 351.16 kB total.

Validated with pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage run build, and JS/Rust token and cost parity for daily, session, monthly, weekly, and blocks.

* perf(ccusage): avoid locale-aware path sorting

Use a direct string comparator for deterministic JSONL path ordering instead of localeCompare. The paths are filesystem paths and do not need locale-aware collation, while localeCompare is noticeably slower in this hot setup step.

Local mitata microbench over 3,124 JSONL paths:

- direct string sort median: 0.4178ms

- localeCompare sort median: 2.1270ms
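The comparator itself is trivial; the point is that plain code-unit comparison is deterministic and avoids locale-aware collation entirely:

```typescript
// Deterministic code-unit comparator for filesystem paths.
// Equivalent ordering to the default sort, but usable anywhere a
// comparator is required, and much cheaper than localeCompare.
function comparePaths(a: string, b: string): number {
	return a < b ? -1 : a > b ? 1 : 0;
}
```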

Validated with ccusage data-loader tests, typecheck, full tests, bundled build, and JS/Rust parity checks for daily/session/monthly/weekly/blocks.

* perf(ccusage): parse block timestamps without Date.parse

Add a fixed-format ISO timestamp parser for the hot block-loading path and reuse the parsed Date for both file ordering and block entries. This avoids parsing the same usage timestamp twice with new Date(string) while preserving the fallback for unexpected timestamp strings.
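A minimal sketch of a fixed-format parser for the `YYYY-MM-DDTHH:MM:SS(.mmm)Z` shape, falling back to Date.parse for anything else. This is an assumption-laden illustration; the real parser in this PR may accept more shapes:

```typescript
function parseIsoFast(ts: string): number {
	// Expect the exact Claude JSONL layout; bail out to Date.parse otherwise.
	if (ts.length < 20 || ts.charCodeAt(4) !== 0x2d /* '-' */ || ts.charCodeAt(10) !== 0x54 /* 'T' */) {
		return Date.parse(ts);
	}
	const year = Number(ts.slice(0, 4));
	const month = Number(ts.slice(5, 7));
	const day = Number(ts.slice(8, 10));
	const hour = Number(ts.slice(11, 13));
	const minute = Number(ts.slice(14, 16));
	const second = Number(ts.slice(17, 19));
	const millis = ts.charCodeAt(19) === 0x2e /* '.' */ ? Number(ts.slice(20, 23)) : 0;
	if (Number.isNaN(year + month + day + hour + minute + second + millis) || !ts.endsWith("Z")) {
		return Date.parse(ts); // offsets, short fractions, garbage
	}
	return Date.UTC(year, month - 1, day, hour, minute, second, millis);
}
```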

Local mitata microbench on Node 24.15.0:

- new Date(iso).getTime: 287.41ns avg

- Date.parse: 261.23ns avg

- fixed parser: 161.55ns avg

- new Date(fixed parser): 224.86ns avg

Bundle size:

- before: 351.15 kB after cc3f7e3

- after: 352.28 kB

Validation:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage exec vitest run src/data-loader.ts

- pnpm --filter ccusage run build

- JS/Rust token and cost parity for daily/session/monthly/weekly/blocks

* perf(ccusage): inline usage metadata on hot paths

Avoid creating a temporary metadata object and spreading it into every daily, session, and block usage row. The collector now writes uniqueHash, tokenTotal, and hasSpeed directly into the result object, reducing per-row allocation work on the immediate-cost path.
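The before/after shape can be sketched as plain field writes on the existing row object. The row type and helper name are simplified assumptions:

```typescript
type UsageRow = {
	inputTokens: number;
	outputTokens: number;
	uniqueHash?: string;
	tokenTotal?: number;
	hasSpeed?: boolean;
};

// Before: return { ...row, ...makeMetadata(row) } — a temporary object plus
// a spread per row. After: direct writes on the object we already have.
function attachMetadata(row: UsageRow, hash: string, hasSpeed: boolean): UsageRow {
	row.uniqueHash = hash;
	row.tokenTotal = row.inputTokens + row.outputTokens;
	row.hasSpeed = hasSpeed;
	return row;
}
```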

Local mitata microbench on Node 24.15.0:

- metadata object + spread: 118.08ns avg

- inline metadata fields: 5.41ns avg

Bundle size:

- before: 352.28 kB after d981f27

- after: 352.50 kB

Validation:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadDailyUsageData|loadSessionData|loadSessionBlockData|keeps most complete|duplicate

- pnpm --filter ccusage run build

- JS/Rust token and cost parity for daily/session/monthly/weekly/blocks

* perf(ccusage): read timestamps from line prefix

Use the first top-level timestamp field for JSONL timestamp extraction instead of scanning each line from the end. Claude JSONL rows place timestamp near the start, so this avoids traversing large message content when sorting files and fast-parsing usage lines.

Local mitata microbench on a content-heavy JSONL row:

- first indexOf timestamp extraction: 63.01ns avg

- lastIndexOf timestamp extraction: 1.77us avg

Bundle size remains 352.50 kB after this commit.

Validation:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern extracts|loadDailyUsageData|loadSessionBlockData|processJSONLFileByLine

- pnpm --filter ccusage run build

- JS/Rust token and cost parity for daily/session/monthly/weekly/blocks

* perf(ccusage): avoid reverse scans for early fields

Read top-level version and sessionId with forward string searches in the fast JSONL parser. These fields are emitted near the front of Claude usage rows, while reverse scans traverse large message content before finding them.

This follows the timestamp extraction result from ec02d07, where a content-heavy row measured first indexOf at 63.01ns versus lastIndexOf at 1.77us.

Bundle size remains 352.50 kB after this commit.

Validation:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadDailyUsageData|loadSessionData|loadSessionBlockData|processJSONLFileByLine

- pnpm --filter ccusage run build

- JS/Rust token and cost parity for daily/session/monthly/weekly/blocks

* perf(ccusage): parse hot fields without allocations

Use fixed JSON field markers in the JSONL fast parser and parse usage token counts as unsigned integers without slicing through Number(...). This targets the remaining parser hotspots from the single-thread CPU profile after 5b9c2b3, where extractStringField/extractNumberField and Buffer decoding dominated blocks --offline.
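The allocation-free integer parse can be sketched with charCodeAt arithmetic; the helper name is hypothetical, and the sketch assumes token counts fit in a safe integer:

```typescript
// Parse an unsigned integer starting at `start` without slicing a
// substring through Number(...). Returns undefined when no digits follow.
function parseUnsignedAt(line: string, start: number): number | undefined {
	let value = 0;
	let i = start;
	while (i < line.length) {
		const digit = line.charCodeAt(i) - 0x30; // '0'
		if (digit < 0 || digit > 9) break;
		value = value * 10 + digit;
		i++;
	}
	return i === start ? undefined : value;
}
```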

Mitata microbenchmarks on Node 24.15.0:

- fixed string marker: 59.47 ns/iter vs dynamic marker: 157.24 ns/iter (2.64x faster)

- manual unsigned integer token parse: 29.78 ns/iter vs Number(slice): 102.48 ns/iter (3.44x faster)

Build size:

- ccusage bundle total: 352.68 kB (data-loader chunk 110.85 kB)

Validation:

- pnpm run format

- pnpm typecheck

- pnpm run test (27 files, 366 tests)

- pnpm --filter ccusage run build

- JS/Rust JSON totals match for daily, session, monthly, weekly, and blocks

* build(ccusage): drop test-only side effects from bundle

Tell rolldown/tsdown that modules can be treated as side-effect free during treeshaking. The ccusage build already defines import.meta.vitest as undefined, so this lets production builds remove test-only imports such as fs-fixture instead of bundling them into the CLI output.

Bundle size:

- before: 352.68 kB total, index 68.14 kB, data-loader chunk 110.85 kB

- after: 334.12 kB total, index 52.76 kB, data-loader chunk 108.53 kB

- reduction: 18.56 kB total

Validation:

- dist no longer contains fs-fixture/FsFixture/createFixture

- node apps/ccusage/dist/index.js --version

- node apps/ccusage/dist/index.js daily --offline --json

- pnpm run format

- pnpm typecheck

- pnpm run test (27 files, 366 tests)

- pnpm --filter ccusage run build

- JS/Rust JSON totals match for daily, session, monthly, weekly, and blocks

* perf(ccusage): parse cost numbers without slicing

Replace the remaining costUSD Number(slice) hot path with a manual JSON number parser. The profile still showed cost extraction spending time allocating substrings before number conversion, so this keeps the parser on charCodeAt arithmetic for signed, fractional, and exponent forms.

Mitata microbench on Node 24.15.0 improved this path from 171.41 ns/iter with Number(slice) to 107.18 ns/iter with the manual parser, about 1.6x faster. The bundled ccusage output is 334.47 kB total, 0.35 kB larger than 2cabd5c due to the extra parser code.

Validated with pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage run build, and JS/Rust parity for daily, session, monthly, weekly, and blocks outputs.

* perf(ccusage): scale JSONL workers with available cores

Increase the default built-dist JSONL worker count from a fixed cap of 4 to the available core count capped at 8. The existing CCUSAGE_JSONL_WORKER_THREADS override still allows explicit tuning above or below this default.

This follows the Rust and Zig direction of using machine parallelism while keeping a conservative ceiling for JavaScript worker startup and structured clone overhead.

Also replace the daily/session Object.groupBy pass with one-pass summary accumulators so grouping no longer allocates per-group entry arrays after deduplication.

Benchmarks on this machine, with concurrent Zig/cmux load present:

- bun -b daily --offline --json, previous 4-worker default: 0.74-0.82s after warmup

- bun -b daily --offline --json, core-scaled default: 0.68-0.71s after warmup

- --single-thread remained much slower at about 1.7-2.0s, confirming that parallel JSONL parsing is still required.

* perf(terminal): use Bun string width when available

Add a small text-width helper that keeps Node on string-width and lets Bun use Bun.stringWidth for common ANSI and emoji-aware width checks.

The helper explicitly keeps cursor save/restore sequences on the existing string-width fallback because Bun.stringWidth counts those control sequences differently from the current drawEmoji expectations.

Benchmarks did not show stripANSI as a meaningful win, so this commit intentionally avoids adding a Bun.stripANSI path.
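The runtime dispatch can be sketched as a globalThis probe; `nodeStringWidth` stands in for the real string-width fallback import, and the helper name is an assumption:

```typescript
type BunGlobal = { stringWidth(text: string): number };

// Use Bun.stringWidth when running under Bun; otherwise keep the
// string-width package path. Probing globalThis avoids a hard Bun type dep.
function textWidth(text: string, nodeStringWidth: (t: string) => number): number {
	const bun = (globalThis as { Bun?: BunGlobal }).Bun;
	if (bun !== undefined && typeof bun.stringWidth === "function") {
		return bun.stringWidth(text);
	}
	return nodeStringWidth(text);
}
```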

Validation:

- pnpm run format

- pnpm typecheck

- pnpm run test

- pnpm --filter ccusage run build

- Node and bun -b daily --offline outputs matched with and without FORCE_COLOR=1.

* perf(ccusage): avoid reverse scans in usage parser

Mention: #984

The fast JSONL parser no longer uses lastIndexOf for requestId and costUSD. Local profiles showed costUSD is usually absent in the current corpus, so the old reverse scan walked every usage line from the end before falling back to undefined. The parser now searches forward from the usage object, which keeps the same parsed output while avoiding that full reverse scan. Unsupported null-field detection also avoids lastIndexOf plus substring allocation by comparing the field name in place.

Benchmarks on Apple M3 Pro, LOG_LEVEL=0, built ccusage dist, --offline --json, hyperfine --warmup 1 --runs 5:

- Bun daily default workers: 579.6ms ± 11.4ms; previous 4-worker override: 631.0ms ± 4.3ms; workers disabled: 1.074s ± 0.010s.

- Bun vs Rust: daily 582.0ms ± 17.1ms vs Rust 361.8ms ± 9.0ms; session 628.2ms ± 54.2ms vs Rust 367.2ms ± 4.4ms; blocks 641.5ms ± 14.8ms vs Rust 365.1ms ± 6.2ms.

- Node daily built dist: 965.7ms ± 43.6ms; Bun daily built dist: 627.6ms ± 62.7ms in the same run.

Validation: pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage run build; Node/Bun daily JSON output matched; 4-worker/8-worker daily JSON output matched.

* perf(ccusage): read buffered JSONL as utf8 on Bun

Mention: #984

Bun is faster when node:fs/promises readFile decodes directly to utf8, while Node previously measured faster with the existing file-handle Buffer path. processJSONLFileByLine now keeps the Node path unchanged and uses stat(path) plus readFile(path, "utf8") only when running under Bun. The buffered line loop is factored into a shared helper so both paths keep the same line handling.

Microbench on 3,125 local JSONL files under Bun: file-handle Buffer.toString path was about 386-429ms, file-handle utf8 was about 134-259ms, and stat(path)+readFile(path, utf8) was about 113-125ms after warmup.

Benchmarks on Apple M3 Pro, LOG_LEVEL=0, built ccusage dist, --offline --json:

- daily, hyperfine --warmup 2 --runs 7: Bun 551.7ms ± 11.8ms; Rust #977 367.6ms ± 12.9ms; Node 1.011s ± 0.066s.

- session, hyperfine --warmup 2 --runs 5: Bun 578.3ms ± 13.0ms; Rust #977 367.4ms ± 9.3ms.

- blocks, hyperfine --warmup 2 --runs 5: Bun 610.5ms ± 4.0ms; Rust #977 379.9ms ± 9.6ms.

Validation: pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage run build; Node/Bun daily JSON output matched; 4-worker/8-worker daily JSON output matched.

* build(ccusage): publish only bundled CLI entry

Remove the ccusage subpath exports and generated declarations so the package no longer publishes library-style entry points. The bundle still keeps src/data-loader.ts as a dist entry because JSONL worker threads call new Worker(new URL(import.meta.url)); folding that module into index.js makes --offline recursively start the CLI and hang.

Move the MCP package off ccusage/data-loader and ccusage/logger by keeping the small path discovery and LoadOptions surface locally. This lets the CLI package publish only the executable entry while MCP continues to build independently.

Benchmark notes (daily --offline --json, 7 runs, 2026-05-12): minified Bun 932.7 ms ± 61.3 ms, non-minified Bun 953.6 ms ± 30.0 ms, minified Node 2.621 s ± 0.032 s, non-minified Node 2.541 s ± 0.062 s. JSON output matched across minified/non-minified Bun and Node with totalTokens=9990118730. Bundle size is 251.96 kB minified versus 458.50 kB non-minified.

* build(ccusage): keep CLI bundle unminified

Keep the bundled CLI output readable because minification makes runtime errors harder to diagnose and the measured speed difference is not meaningful enough to justify that trade-off.

Benchmark notes from the previous minify comparison (daily --offline --json, 7 runs, 2026-05-12): minified Bun 932.7 ms ± 61.3 ms, non-minified Bun 953.6 ms ± 30.0 ms, minified Node 2.621 s ± 0.032 s, non-minified Node 2.541 s ± 0.062 s. JSON output matched across minified/non-minified Bun and Node with totalTokens=9990118730. The unminified build is 458.50 kB.

* build(ccusage): stop regenerating worker subpath export

Disable tsdown package export generation for the ccusage CLI build so package.json stays limited to the executable entry and package metadata. The data-loader bundle remains emitted as an internal worker target, but it is no longer exposed as ccusage/data-loader after builds.

Verification: pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage run build, and built Bun daily --offline --json smoke check with totalTokens=9990118730.

* perf(ccusage): dedupe entries before worker merge

Drop duplicate usage entries inside each collected file before returning worker results. The local Claude corpus currently has 153,777 usage rows, 79,880 global unique hashes, and 70,385 duplicate hashes within a single file, so this avoids sending a large number of entries through worker structured clone only to discard them again on the main thread.

The helper uses the existing replacement rule, preferring higher token totals and then entries with speed metadata, so the final cross-file dedupe semantics stay the same.
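The replacement rule can be sketched like this; the types and the `dedupeByHash` name are illustrative, not the actual helper, but the preference order (higher token totals, then speed metadata) follows the text above.

```typescript
type UsageEntry = {
	hash: string;
	totalTokens: number;
	speed?: { tokensPerSecond: number };
};

// Keep one entry per hash, preferring higher token totals, then entries
// that carry speed metadata, so cross-file dedupe semantics stay the same.
function dedupeByHash(entries: UsageEntry[]): UsageEntry[] {
	const kept = new Map<string, UsageEntry>();
	for (const entry of entries) {
		const prev = kept.get(entry.hash);
		const replace
			= prev == null
				|| entry.totalTokens > prev.totalTokens
				|| (entry.totalTokens === prev.totalTokens
					&& entry.speed != null && prev.speed == null);
		if (replace) {
			kept.set(entry.hash, entry);
		}
	}
	return [...kept.values()];
}
```

Running this inside each worker means duplicates never cross the structured-clone boundary.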

Benchmark notes (Apple M3 Pro, built Bun dist, LOG_LEVEL=0, --offline --json, hyperfine --warmup 2 --runs 5): daily 542.9 ms ± 5.7 ms, session 562.8 ms ± 19.4 ms, blocks 697.7 ms ± 45.1 ms. Daily JS/Rust comparison after the change: JS 561.8 ms ± 25.9 ms, Rust 370.0 ms ± 9.3 ms, Rust 1.52x faster. JS/Rust token totals matched for daily and session; block totalTokens sum matched with 322 blocks on both.

Verification: pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern deduplication|loadDailyUsageData|loadSessionData|loadSessionBlockData; pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage run build.

* perf(ccusage): avoid sorting worker result groups

Collect worker results into their known file indexes directly instead of flattening, sorting, and mapping the response groups after all workers finish. The caller already creates stable item indexes before chunking, so the ordered result array can be filled without an O(n log n) sort or extra intermediate arrays.
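A minimal sketch of the direct merge, with hypothetical names: each worker result carries the original item indexes for its chunk, so the ordered array is filled in O(n) with no sort.

```typescript
type WorkerResult<T> = { indexes: number[]; items: T[] };

// Place each worker's items straight into their known slots instead of
// flatten + sort + map after all workers finish.
function mergeOrdered<T>(total: number, results: WorkerResult<T>[]): T[] {
	const ordered = new Array<T>(total);
	for (const { indexes, items } of results) {
		for (let i = 0; i < indexes.length; i++) {
			ordered[indexes[i]] = items[i];
		}
	}
	return ordered;
}
```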

Benchmark notes (Apple M3 Pro, built Bun dist, LOG_LEVEL=0, --offline --json, hyperfine --warmup 2 --runs 7, /tmp/*/dist paths so workers stay enabled): daily base 673.1 ms ± 50.8 ms, daily direct merge 654.6 ms ± 12.7 ms; session base 727.8 ms ± 39.3 ms, session direct merge 686.7 ms ± 38.3 ms. JSON totals matched at totalTokens=9994791958.

Verification: pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern deduplication|loadDailyUsageData|loadSessionData|loadSessionBlockData; pnpm typecheck; pnpm run format; pnpm run test; pnpm --filter ccusage run build; built JS/Rust daily output total check.

* perf(ccusage): calculate usage costs synchronously

Load LiteLLM pricing once per worker and calculate missing costUSD values synchronously instead of queuing one Promise per usage row. On the local corpus every usage row lacks costUSD, so the old path created about 163k per-entry cost promises before aggregation.
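The synchronous shape can be sketched as below. The `Pricing` fields and the zero-cost fallback for unknown models are illustrative simplifications, not the real LiteLLM schema; the point is that a row's cost is a plain lookup plus arithmetic once the table is loaded.

```typescript
type Pricing = { inputCostPerToken: number; outputCostPerToken: number };

// With pricing loaded once per worker, each row's missing costUSD is a
// synchronous computation instead of a queued per-row promise.
function costUSD(
	pricing: Map<string, Pricing>,
	model: string,
	inputTokens: number,
	outputTokens: number,
): number {
	const p = pricing.get(model);
	if (p == null) {
		return 0; // simplification: real code may warn or skip instead
	}
	return inputTokens * p.inputCostPerToken + outputTokens * p.outputCostPerToken;
}
```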

Post-Arc benchmarks (built dist, daily --offline --json): JS/Bun 539.8ms ± 16.5ms, Rust 357.9ms ± 11.6ms, so Rust is 1.51x faster. A direct main comparison measured main JS at 10.154s ± 0.157s and this branch at 510.0ms ± 6.5ms, so this branch is 19.91x faster than main for daily. Node on the built dist measured 892.5ms ± 37.9ms vs Bun 556.6ms ± 48.1ms.

Validation: Bun, Node, and Rust daily totals all match at 9,994,791,958 tokens and $11,068.4726364. Main currently reports 9,986,807,066 tokens and $10,692.2878964 because this branch includes the earlier dedupe correction already documented in this PR. Daily and session JSON output also matched the previous perf-branch output byte-for-byte; blocks matched previous aggregate count/tokens/cost.

* perf(ccusage): use Bun file reads for JSONL buffers

Use Bun.file(path).text() for the Bun buffered JSONL read path while keeping the existing fs/open path for Node. This lets Bun avoid the fs.promises stat/readFile path for files below the 128 MiB buffering limit.

Local read microbench over 3,127 JSONL files / 1.276 GB: fs.stat + readFile(utf8) warm runs 611.6ms and 632.3ms; Bun.file().text() warm runs 483.6ms and 452.6ms.

Worker-enabled built-dist A/B with LOG_LEVEL=0 daily --offline --json, hyperfine --warmup 3 --runs 10: old fs path 582.1ms +/- 32.0ms; Bun.file path 568.7ms +/- 16.4ms, 1.02x faster. JSON output matched byte-for-byte.

* perf(ccusage): tune default JSONL worker count

Choose the default worker count from half of os.availableParallelism(), rounded up, while preserving the existing hard cap and CCUSAGE_JSONL_WORKER_THREADS override. On this Apple M3 Pro host availableParallelism() reports 11, so the default becomes 6 workers instead of 8.
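A sketch of the rule, assuming the cap of 8 that was in effect at this point (it is raised later in this PR); the override parsing is illustrative, not the exact shipped validation.

```typescript
import os from "node:os";

const JSONL_WORKER_THREAD_LIMIT = 8; // hard cap at the time of this commit

// Half of availableParallelism rounded up, clamped by the cap, with the
// CCUSAGE_JSONL_WORKER_THREADS override taking precedence.
function defaultWorkerCount(
	parallelism: number = os.availableParallelism(),
	envOverride: string | undefined = process.env.CCUSAGE_JSONL_WORKER_THREADS,
): number {
	if (envOverride != null && envOverride !== "") {
		const n = Number.parseInt(envOverride, 10);
		if (Number.isFinite(n) && n > 0) {
			return n;
		}
	}
	return Math.min(JSONL_WORKER_THREAD_LIMIT, Math.ceil(parallelism / 2));
}
```

On an 11-core host this yields ceil(11/2) = 6, matching the sweep result above.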

Latest built Bun daily --offline --json sweep, LOG_LEVEL=0, hyperfine --warmup 2 --runs 7: 4 workers 570.3ms +/- 4.7ms; 6 workers 512.5ms +/- 5.4ms; 8 workers 529.8ms +/- 13.4ms; 10 workers 558.1ms +/- 27.6ms; 11 workers 528.0ms +/- 11.2ms; 12 workers 531.8ms +/- 33.5ms.

Session and blocks check: session w6 536.2ms +/- 26.3ms vs w8 538.5ms +/- 15.1ms; blocks w6 587.2ms +/- 9.7ms vs w8 615.4ms +/- 16.6ms. Final daily comparison after the change: default 506.8ms +/- 13.3ms, forced w8 524.1ms +/- 16.7ms, Rust 338.7ms +/- 9.2ms. JS/Rust totals matched and normalized non-cost fields matched.

* perf(ccusage): flatten usage entries sent from workers

Store daily/session token fields directly on worker result entries instead of keeping the nested usage object. This reduces structured-clone payload shape and lets the main-thread summary aggregation add token fields directly.

Built Bun A/B with worker-enabled dist paths, LOG_LEVEL=0, --offline --json, hyperfine --warmup 3 --runs 10: daily old 560.9ms +/- 36.4ms, flattened 537.1ms +/- 31.9ms; session old 578.4ms +/- 27.4ms, flattened 543.3ms +/- 11.4ms. Daily and session JSON outputs matched byte-for-byte.

Validation: pnpm run format, pnpm typecheck, pnpm run test (27 files, 349 passed, 1 skipped), pnpm --filter ccusage exec vitest run src/data-loader.ts --testNamePattern loadDailyUsageData|loadSessionData|loadSessionBlockData|deduplication, and pnpm --filter ccusage run build.

* perf(ccusage): balance usage workers by file size

Distribute daily and session JSONL worker chunks by file size instead of plain round-robin file count. This keeps output ordering intact by preserving item indexes while reducing long-tail worker imbalance on uneven Claude project logs.
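One plausible shape for the size balancing is a greedy longest-processing-time assignment; this sketch is an assumption about the approach, not the exact chunker, but it shows how original indexes ride along so output order survives.

```typescript
type FileItem = { index: number; size: number };

// Assign files largest-first to the currently lightest worker. Each item
// keeps its original index, so results can be merged back in order.
function chunkBySize(files: FileItem[], workers: number): FileItem[][] {
	const chunks: FileItem[][] = Array.from({ length: workers }, () => []);
	const loads = new Array<number>(workers).fill(0);
	for (const file of [...files].sort((a, b) => b.size - a.size)) {
		let lightest = 0;
		for (let w = 1; w < workers; w++) {
			if (loads[w] < loads[lightest]) {
				lightest = w;
			}
		}
		chunks[lightest].push(file);
		loads[lightest] += file.size;
	}
	return chunks;
}
```

Round-robin by count would put both large files on one worker when sizes are skewed; balancing by bytes shortens the long tail.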

Benchmarks on a fixed /tmp Claude data snapshot, Bun built dist with --offline --json, hyperfine --warmup 3 --runs 12:

- daily old: 1.368s ± 0.350s; new: 1.245s ± 0.245s (~1.10x faster, noisy due to system load)

- session old: 1.217s ± 0.102s; new: 1.119s ± 0.032s (~1.09x faster)

- blocks old: 1.293s ± 0.035s; new: 1.286s ± 0.043s (effectively unchanged; size balancing is only enabled for daily/session)

Validation:

- Fixed snapshot output parity: daily/session byte-for-byte identical; blocks identical after removing time-dependent projection

- pnpm run format

- pnpm typecheck

- pnpm run test

* perf(ccusage): speed up usage table rendering

Add a usage-line-only JSONL scan path for daily/session loading so those commands skip callback overhead for non-usage lines while preserving blocks timestamp scanning.

Add a fixed-width fast path for non-compact usage tables to avoid cli-table3 layout work in the common table view, cache number/model formatting, and remove the now-unused @ccusage/terminal es-toolkit dependency.

Benchmarks on fixed snapshot (CLAUDE_CONFIG_DIR=/tmp/ccusage-data-snapshot/config, LOG_LEVEL=0, COLUMNS=200, bun -b, hyperfine --warmup 3 --runs 6): daily table 563.3ms -> 493.0ms, session table 870.3ms -> 519.9ms, blocks table 667.4ms -> 590.6ms.

Parity: daily/session JSON byte-for-byte identical; blocks JSON stable fields identical after removing time-dependent projection fields; session table row/byte counts preserved while fast table reduces redundant ANSI styling.

* perf(ccusage): speed up table width calculation

Add an ASCII/SGR fast path before falling back to Bun.stringWidth or string-width. The usage tables mostly render ASCII dates, numbers, model names, and simple ANSI colour sequences, so this avoids the heavier Unicode width path for the common case while preserving the existing fallback for emoji and cursor-state escape sequences.

Reuse the measured width from truncate checks when padding cells so table rendering does not calculate the same cell width twice.
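The fast path can be sketched as below; the regexes and `fastStringWidth` name are illustrative. Printable ASCII after stripping SGR color sequences means display width equals string length; anything else goes to the full-width fallback.

```typescript
const SGR_PATTERN = /\u001B\[[0-9;]*m/g; // simple ANSI color sequences only
const PRINTABLE_ASCII = /^[\x20-\x7E]*$/;

// Fast width for ASCII dates/numbers/model names with plain SGR styling;
// emoji, CJK, and cursor-state escapes fall back to the heavy routine.
function fastStringWidth(
	text: string,
	fallback: (s: string) => number,
): number {
	const plain = text.replace(SGR_PATTERN, "");
	if (PRINTABLE_ASCII.test(plain)) {
		return plain.length;
	}
	return fallback(plain);
}
```

In the real table the fallback would be Bun.stringWidth or string-width, per the text above.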

Benchmarked against the same dist build before this change on the fixed snapshot with COLUMNS=200, LOG_LEVEL=0, and bun -b:

- daily table: 500.3 ms -> 491.6 ms

- session table: 530.9 ms -> 525.9 ms

- blocks table: 600.5 ms -> 593.7 ms

Output parity: daily/session/blocks table output matched byte-for-byte between baseline and this change.

* chore: update runtime version metadata

Relax package Node engines from >=22.11.0 to >=22.0.0. fs.promises.glob exists from Node 22.0.0, and the CLIs should continue to allow Node 22, 24, and newer supported releases.

Update pnpm-managed runtime hints to Node ^24.15.0 and Bun ^1.3.13, and refresh flake.lock to the latest nixpkgs-unstable input available in this environment.

Checked fs.promises.glob as a possible replacement for the current JSONL walker, but did not adopt it. On the fixed snapshot, the current manual walker found 3,127 JSONL files with a median of 14.1ms, while fs.promises.glob found the same files with a median of 84.6ms.

Validation: pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage build.

* perf(ccusage): remove cli-table rendering dependency

Replace the remaining compact-mode cli-table3 rendering path with the existing in-house fast table renderer. Compact cells now use a small word-wrap helper so narrow terminal output keeps the same displayed text after stripping ANSI, while wide output remains byte-for-byte identical.

This removes cli-table3 from @ccusage/terminal and updates pnpm-lock.yaml via pnpm install. The ccusage build size dropped from 467.68 kB to 405.37 kB, about 62.31 kB smaller.

Fixed snapshot parity with CLAUDE_CONFIG_DIR=/tmp/ccusage-data-snapshot/config, LOG_LEVEL=0: COLUMNS=200 daily/session/blocks table output matched byte-for-byte; COLUMNS=80 and COLUMNS=60 compact output matched after stripping ANSI; blocks also matched byte-for-byte in compact mode.

Built Bun A/B with COLUMNS=200, bun -b, hyperfine --warmup 3 --runs 8:

- daily --offline --json: 512.5 ms -> 478.2 ms

- daily --offline table: 490.5 ms -> 483.7 ms

- session --offline table: 515.4 ms -> 508.2 ms

Validation: pnpm install, pnpm run format, pnpm typecheck, pnpm run test, pnpm --filter ccusage build.

* perf(ccusage): remove small sorting helpers from runtime

Replace fast-sort with native sorting in the ccusage data path and replace @antfu/utils toArray with a local helper. This removes two runtime dependencies from the ccusage package while preserving output parity against the current baseline.

Benchmarks: fixed snapshot CLAUDE_CONFIG_DIR=/tmp/ccusage-data-snapshot/config, LOG_LEVEL=0, COLUMNS=200, bun -b dist/index.js, hyperfine warmup 4. JSON: daily 496.6ms -> 493.5ms, session 523.2ms -> 523.1ms, blocks 600.0ms -> 598.8ms. Table: daily 494.5ms -> 495.4ms, session 545.2ms -> 531.7ms with baseline outlier noise, blocks 606.2ms -> 610.9ms. Build size: 405.37 kB -> 401.87 kB.

Validation: pnpm install; pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage build.

* test(ccusage): cover dependency-free helper behavior

Add regression coverage for the native sort replacement by asserting equal-date rows keep their original order in both ascending and descending sorts. Also cover the local toArray helper that replaced the previous @antfu/utils import.

Fast-sort check: a Bun microbench over ISO-date rows showed native timestamp precompute sorting at 1.95ms for 10k rows, compared with fast-sort using Date objects at 36.44ms and fast-sort using Date.parse at 20.60ms. This supports keeping the dependency-free sort path for the hot ccusage summaries.
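The precompute-then-sort pattern looks roughly like this (illustrative names): parse each ISO date once, sort numerically, and lean on Array#sort stability (guaranteed since ES2019) so equal-date rows keep their original order.

```typescript
type Row = { date: string; label?: string };

// Parse each date once, sort by the numeric key, unwrap. The stable sort
// preserves original order for equal dates in both directions.
function sortRowsByDate(rows: Row[], descending = false): Row[] {
	const keyed = rows.map(row => ({ row, ts: Date.parse(row.date) }));
	keyed.sort((a, b) => (descending ? b.ts - a.ts : a.ts - b.ts));
	return keyed.map(k => k.row);
}
```

This avoids constructing a Date object inside the comparator, which is where the fast-sort usage lost its time.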

Validation: pnpm --filter ccusage exec vitest run src/_date-utils.ts src/_utils.ts; pnpm run format; pnpm typecheck; pnpm run test.

* chore(ccusage): keep antfu toArray helper

Restore @antfu/utils for toArray after checking the package implementation. The helper handles nullish values as [] and is tree-shakeable with sideEffects=false, so replacing it locally was not a meaningful speed win and slightly reduced maintainability.

The fast-sort removal remains separate: the old usage created Date objects in the selector, and a Bun microbench showed native timestamp-precompute sorting at 1.95ms for 10k rows versus fast-sort with Date objects at 36.44ms.

byethrow check: Result.unwrap is slower in a tight loop, but the normal daily/session/blocks hot path does not call it per JSONL row; it is mostly used around pricing load, config parsing, jq, and statusline error boundaries. await using is also mostly test fixtures, plus a handful of per-command PricingFetcher/semaphore scopes, not per-row work.

Validation: pnpm install; pnpm --filter ccusage build; pnpm run format; pnpm typecheck; pnpm run test.

* perf(ccusage): avoid date formatter work on hot paths

Keep timezone support intact while moving the common default/UTC date formatting paths away from Intl.DateTimeFormat. formatDate and cached usage-date formatting now use Date local/UTC getters for those cases, and compact date rendering avoids Intl for simple YYYY-MM-DD/default/UTC inputs.
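A minimal sketch of that split, with an illustrative name: default and UTC formatting use plain Date getters, while any other named timezone still goes through Intl.DateTimeFormat (en-CA is used here because it formats dates as YYYY-MM-DD).

```typescript
// Fast path: Date getters for default/UTC; Intl only for named timezones.
function formatUsageDate(iso: string, timeZone?: string): string {
	const date = new Date(iso);
	if (timeZone == null || timeZone === "UTC") {
		const utc = timeZone === "UTC";
		const y = utc ? date.getUTCFullYear() : date.getFullYear();
		const m = (utc ? date.getUTCMonth() : date.getMonth()) + 1;
		const d = utc ? date.getUTCDate() : date.getDate();
		return `${y}-${String(m).padStart(2, "0")}-${String(d).padStart(2, "0")}`;
	}
	return new Intl.DateTimeFormat("en-CA", {
		timeZone,
		year: "numeric",
		month: "2-digit",
		day: "2-digit",
	}).format(date);
}
```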

Blocks also now carries the earliest file timestamp as milliseconds so file-result sorting does not repeatedly call Date#getTime in the comparator. A Bun CPU profile before this change showed blocks spending most sampled time in getTime under sort; the timestamp is now precomputed once per file result.

Correctness: fixed snapshot outputs matched byte-for-byte against the previous build for daily/session/blocks/monthly/weekly JSON and daily/session/blocks table across default timezone, UTC, Asia/Tokyo, America/New_York, and Europe/London.

Benchmarks on real local Claude data, LOG_LEVEL=0, COLUMNS=200, bun -b apps/ccusage/dist/index.js, hyperfine warmup 3 runs 8: daily json 479ms +/- 7ms, daily table 481ms +/- 7ms, session json 505ms +/- 16ms, session table 511ms +/- 5ms, blocks json 567ms +/- 10ms, blocks table 571ms +/- 9ms. Fixed snapshot current build: daily json 582ms +/- 77ms noisy, daily table 490ms +/- 6ms, session json 519ms +/- 16ms, session table 514ms +/- 7ms, blocks json 585ms +/- 8ms, blocks table 589ms +/- 9ms.

Validation: targeted terminal formatDateCompact tests; targeted ccusage formatDate/loadSessionBlockData/sortFilesByTimestamp tests; pnpm --filter ccusage build; pnpm run format; pnpm typecheck; pnpm run test.

* perf(ccusage): prefer bun runtime from package launcher

Add a tiny published launcher for #984. The package bin now points at dist/cli.js, scans PATH directly for bun without invoking which, and runs main.bun.js when Bun is available or main.node.js otherwise. The existing dist/index.js entry remains and delegates to the shared main() function.
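The PATH scan can be sketched as below (illustrative names; POSIX-style PATH assumed, so no `.exe` handling): check each PATH entry for a `bun` executable directly instead of spawning `which`, then pick the runtime entry file.

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

// Scan PATH entries for a bun binary without shelling out to `which`.
function findBun(pathEnv: string | undefined = process.env.PATH): string | null {
	for (const dir of (pathEnv ?? "").split(delimiter)) {
		if (dir === "") {
			continue;
		}
		const candidate = join(dir, "bun");
		if (existsSync(candidate)) {
			return candidate;
		}
	}
	return null;
}

function pickEntry(pathEnv?: string): string {
	return findBun(pathEnv) != null ? "main.bun.js" : "main.node.js";
}
```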

Fix online worker pricing fetch behavior by loading LiteLLM pricing once in the main thread and passing that snapshot to usage workers. This removes the duplicated online logs and repeated remote pricing fetches seen with pnpm dlx https://pkg.pr.new/ryoppippi/ccusage@984. Verified that LOG_LEVEL=3 online daily now prints one "Fetching latest model pricing" warning and one "Loaded pricing" line.

Keep offline worker pricing local to workers because passing the offline pricing snapshot through structured clone was slower in testing. Offline snapshot-clone experiment measured daily --offline --json at 493.0ms and blocks --offline --json at 572.5ms, compared with the online-only snapshot path at about 486.4ms and 564.5ms.

Tighten table and bucket aggregation hot paths. ResponsiveTable now avoids building a full stringified allRows matrix for width calculation and reuses the already filtered data rows for rendering. Monthly/weekly bucket aggregation now uses one Map pass instead of Object.groupBy plus flatMap and a second model Set pass.

Remove stale generated API documentation wiring because ccusage is now documented as a bundled CLI with --json as the programmatic interface. Drop docs TypeDoc scripts/dependencies and update docs guidance. Also update tsdown/tsgo-related catalog packages and keep @types/bun declared where tsconfig references it.

Benchmarks on real local Claude data with LOG_LEVEL=0, COLUMNS=200, built Bun entry: daily --offline --json 486.4ms ± 6.9ms, daily table 491.6ms ± 18.8ms, monthly --offline --json 489.5ms ± 29.7ms, monthly table 483.5ms ± 6.7ms, blocks --offline --json 564.5ms ± 7.1ms, blocks table 570.5ms ± 10.7ms. Online daily with workers was 528.1ms ± 5.8ms; disabling workers was 1.099s ± 0.001s.

Validation: pnpm run format; pnpm typecheck; pnpm run test; pnpm --filter ccusage exec vitest run src/data-loader.ts ../../packages/terminal/src/table.ts; pnpm --filter ccusage build; pnpm --filter @ccusage/docs build; JSON parity for daily/session/blocks/monthly/weekly between Bun entry and launcher; table parity for daily/session/blocks between Bun entry and launcher.

* perf(ccusage): scan buffered usage lines forward

Avoid the reverse newline scan in buffered JSONL usage loading. The previous path found each usage marker and then walked backward with lastIndexOf to find the line start. The new path keeps a forward line cursor and advances newline positions until the current usage marker is inside the current line, so large content-bearing rows do not pay a repeated backward scan.
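The forward cursor can be sketched as a generator (illustrative name and shape, not the shipped reader): newline positions only ever advance, so a large content-bearing row is scanned once instead of being walked backward per marker.

```typescript
// Yield each line containing the marker, advancing the line window
// forward until the current marker hit falls inside it.
function* usageLines(text: string, marker: string): Generator<string> {
	let lineStart = 0;
	let lineEnd = text.indexOf("\n");
	if (lineEnd === -1) {
		lineEnd = text.length;
	}
	let markerAt = text.indexOf(marker);
	while (markerAt !== -1) {
		while (markerAt >= lineEnd) {
			lineStart = lineEnd + 1;
			lineEnd = text.indexOf("\n", lineStart);
			if (lineEnd === -1) {
				lineEnd = text.length;
			}
		}
		yield text.slice(lineStart, lineEnd);
		markerAt = text.indexOf(marker, lineEnd); // skip rest of this line
	}
}
```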

A/B benchmark against the previous PR build copied to /tmp/ccusage-cost-baseline/dist, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun entry, hyperfine 1.20.0 via comma, --warmup 4 --runs 12. Host: macOS 15.7.3 arm64, Apple M3 Pro, Bun 1.3.13, Node v24.14.1. Background load was high during the run, so the same-invocation A/B ratio is the useful signal.

daily --offline --json: 468.1ms ± 9.0ms before, 467.7ms ± 7.0ms after, effectively unchanged / 1.00x. session --offline --json: 497.4ms ± 18.7ms before, 478.4ms ± 5.3ms after, about 1.04x faster. blocks --offline --json: 538.5ms ± 15.5ms before, 534.9ms ± 8.8ms after, about 1.01x faster.

Output parity matched for daily, session, blocks, monthly, and weekly JSON output, and daily table output. Validation passed: pnpm run format, pnpm typecheck, pnpm run test, targeted ccusage data-loader tests, and pnpm --filter ccusage run build.

* perf(ccusage): narrow fast parser tail field scans

Start sessionId and version marker searches from the usage object position in the JSONL fast parser. Real Claude Code assistant rows place these tail fields after message.usage, so this avoids scanning long content-bearing message prefixes for fields that are normally near the end of the row.

Timestamp lookup intentionally stays at the original full-line search. Starting timestamp from usageStart caused 1ms actualEndTime differences in blocks output on local data, likely from another timestamp-like field later in the row. The kept change preserves byte-for-byte output parity.
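The narrowed scan is essentially `indexOf` with a `fromIndex`; this sketch uses an illustrative extractor, not the real fast parser, to show how starting at the usage position skips the content-bearing prefix.

```typescript
// Find a `"field":"value"` pair searching only from the usage object's
// position onward; timestamp would keep fromIndex = 0 per the text above.
function extractTailField(
	line: string,
	marker: string,
	usageStart: number,
): string | null {
	const at = line.indexOf(marker, usageStart);
	if (at === -1) {
		return null;
	}
	const valueStart = at + marker.length;
	const valueEnd = line.indexOf('"', valueStart);
	return valueEnd === -1 ? null : line.slice(valueStart, valueEnd);
}
```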

A/B benchmark against d94f9c5 copied to /tmp/ccusage-forward-baseline/dist, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun entry, hyperfine 1.20.0 via comma, --warmup 4 --runs 12. Host: macOS 15.7.3 arm64, Apple M3 Pro, Bun 1.3.13, Node v24.14.1. cmux was about 114% CPU before the run, so same-invocation A/B ratio is the useful signal.

daily --offline --json: 454.2ms ± 4.2ms before, 446.1ms ± 3.7ms after, about 1.02x faster. session --offline --json: 481.2ms ± 17.1ms before, 476.2ms ± 15.7ms after, about 1.01x faster. blocks --offline --json: 543.2ms ± 15.2ms before, 539.2ms ± 19.8ms after, about 1.01x faster.

Output parity matched for daily, session, blocks, monthly, and weekly JSON output, and daily table output. Validation passed: pnpm run format, targeted ccusage data-loader tests, pnpm --filter ccusage run build, pnpm typecheck, and pnpm run test.

* perf(ccusage): drop unused dedupe undefined branch

The JSONL collectors no longer push undefined entries before calling dedupeEntryMetadataList, so the helper does not need to accept or branch on undefined values. Tighten the entry arrays and helper type to the actual runtime shape.

A/B benchmark against cc4d4f3 copied to /tmp/ccusage-field-baseline/dist, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun entry, hyperfine 1.20.0 via comma, --warmup 4 --runs 12. Host: macOS 15.7.3 arm64, Apple M3 Pro, Bun 1.3.13, Node v24.14.1. cmux was about 104.9% CPU before the run, so same-invocation A/B ratio is the useful signal.

daily --offline --json: 473.6ms ± 23.9ms before, 463.2ms ± 6.2ms after, about 1.02x faster but noisy. session --offline --json: 496.3ms ± 20.9ms before, 495.5ms ± 19.6ms after, effectively unchanged. blocks --offline --json: 554.1ms ± 17.6ms before, 549.2ms ± 10.5ms after, about 1.01x faster.

Output parity matched for daily, session, blocks, monthly, and weekly JSON output, and daily table output. Validation passed: pnpm run format, pnpm typecheck, targeted ccusage data-loader tests, pnpm --filter ccusage run build, and pnpm run test.

* perf(ccusage): keep usage JSONL callbacks synchronous

The usage-only JSONL reader is used by daily and session collectors with synchronous callbacks. Tighten that internal callback type to void and remove the per-usage-line result check/await branch from the buffered and streaming usage readers. The general JSONL reader remains async-capable for tests and non-usage scans.

A/B benchmark against e85b737 copied to /tmp/ccusage-dedupe-baseline/dist, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun entry, hyperfine 1.20.0 via comma, --warmup 4 --runs 12. Host: macOS 15.7.3 arm64, Apple M3 Pro, Bun 1.3.13, Node v24.14.1. cmux was about 108.0% CPU before the run, so same-invocation A/B ratio is the useful signal.

daily --offline --json: 455.0ms ± 14.3ms before, 448.2ms ± 3.7ms after, about 1.02x faster. session --offline --json: 468.8ms ± 3.9ms before, 467.1ms ± 3.5ms after, about 1.00x to 1.01x faster. blocks --offline --json: 533.5ms ± 8.3ms before, 537.7ms ± 13.7ms after, effectively unchanged/noisy because blocks uses the general JSONL reader.

Output parity matched for daily, session, blocks, monthly, and weekly JSON output, and daily table output. Validation passed: pnpm run format, pnpm typecheck, targeted ccusage data-loader tests, pnpm --filter ccusage run build, and pnpm run test.

* perf(ccusage): use more workers for daily sessions

Tune the default JSONL worker count by task. Daily and session loading now prefer more workers, using roughly 75% of os.availableParallelism() capped by JSONL_WORKER_THREAD_LIMIT, while blocks keeps the previous half-core default. The environment override CCUSAGE_JSONL_WORKER_THREADS and --single-thread behavior remain unchanged.

Latest worker sweep on this Apple M3 Pro host (os.availableParallelism() = 11) showed daily and session prefer 7-8 workers, while blocks still prefers 6. This change makes daily/session default to 8 workers on this machine and keeps blocks at 6.

A/B benchmark against 9d7656c copied to /tmp/ccusage-sync-baseline/dist, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun entry, hyperfine 1.20.0 via comma, --warmup 4 --runs 12. Host: macOS 15.7.3 arm64, Apple M3 Pro, Bun 1.3.13, Node v24.14.1. cmux was about 120.4% CPU before the run, so same-invocation A/B ratio is the useful signal.

daily --offline --json: 451.2ms ± 4.7ms before, 434.9ms ± 3.9ms after, about 1.04x faster. session --offline --json: 470.1ms ± 4.6ms before, 454.1ms ± 5.0ms after, about 1.04x faster. blocks --offline --json: 535.9ms ± 18.1ms before, 527.6ms ± 6.3ms after, effectively unchanged to about 1.02x faster.

Output parity matched for daily, session, blocks, monthly, and weekly JSON output, and daily table output. Validation passed: pnpm run format, pnpm typecheck, targeted ccusage data-loader tests, pnpm --filter ccusage run build, and pnpm run test.

* perf(ccusage): allow nine JSONL workers

Raise the default JSONL worker cap from 8 to 9. The task-aware worker tuning added in 7a352d2 makes daily/session use the higher cap on this 11-core host, while blocks still computes to 6 workers through the half-core path. Environment overrides and --single-thread remain unchanged.

Follow-up sweep after 7a352d2 showed daily/session still improve slightly at 9 workers: daily default 449.0ms ± 23.8ms, w8 438.7ms ± 5.8ms, w9 432.7ms ± 5.1ms, w10 454.7ms ± 21.6ms, w11 450.4ms ± 15.8ms; session default 453.3ms ± 2.5ms, w8 452.3ms ± 5.5ms, w9 447.1ms ± 3.4ms, w10 457.7ms ± 4.9ms.

A/B benchmark against 7a352d2 copied to /tmp/ccusage-taskworkers-baseline/dist, real local Claude data, LOG_LEVEL=0, COLUMNS=200, built Bun entry, hyperfine 1.20.0 via comma, --warmup 4 --runs 12. Host: macOS 15.7.3 arm64, Apple M3 Pro, Bun 1.3.13, Node v24.14.1. cmux was about 109.0% CPU before the run, so same-invocation A/B ratio is the useful signal.

daily --offline --json: 446.2ms ± 20.1ms before, 430.1ms ± 5.8ms after, about 1.04x faster. session --offline --json: 456.5ms ± 18.2ms before, 453.5ms ± 18.2ms after, effectively unchanged to about 1.01x faster. blocks --offline --json: 539.1ms ± 16.2ms before, 530.8ms ± 6.4ms after, effectively unchanged to about 1.02x faster.

Output parity matched for daily, session, blocks, monthly, and weekly JSON output, and daily table output. Validation passed: pnpm run format, pnpm typecheck, targeted ccusage data-loader tests, pnpm --filter ccusage run build, and pnpm run test.

* perf(ccusage): dedupe usage rows while reading files

Skip duplicate usage rows inside each JSONL file before calculating cost or constructing daily/session/block entries. The same token-total and speed-metadata replacement rule is still used, but the old per-file post-pass is removed so duplicate rows do less work before the global dedupe merge.

Benchmarks on macOS 15.7.3 arm64 / Apple M3 Pro / 11 available cores, Bun 1.3.13, Node v24.14.1, pnpm 10.30.1, hyperfine 1.20.0 via comma, LOG_LEVEL=0, COLUMNS=200. Background cmux was around one core, so absolute times are noisy; A/B commands ran in the same hyperfine invocation.

Built Bun JSON A/B against d2a72ce: daily --offline --json 440.2ms ± 18.8ms -> 423.6ms ± 5.1ms (~1.04x faster), session 451.1ms ± 5.0ms -> 440.3ms ± 3.5ms (~…