
fix(pricing): do not charge GLM cache creation #1235

Merged
ryoppippi merged 3 commits into main from codex/fix-glm-cache-create
Jun 8, 2026

Conversation

@ryoppippi

@ryoppippi ryoppippi commented Jun 8, 2026

Owner

Fixes the Z.AI GLM built-in fallback pricing added in #1225 so cache creation tokens are not charged in offline cost calculations.

Z.AI lists cached input storage for these GLM models as free, and the corresponding LiteLLM direct Z.AI entries use zero cache creation cost. The previous fallback used the generic cache creation assumption and could overstate GLM costs when cache creation tokens were present.

Testing:

  • direnv exec . cargo test --manifest-path rust/Cargo.toml --workspace embedded_pricing_includes_z_ai_glm_models_for_offline_reports
  • direnv exec . pnpm run format
  • direnv exec . pnpm run test
  • direnv exec . pnpm typecheck
  • pre-push hook: clippy, treefmt, gitleaks, cargo test

Summary by cubic

Stop charging cache creation for Z.AI GLM models and embed direct zai/* entries from LiteLLM so offline pricing matches Z.AI rates and context limits. Also patch GLM 4.5 variant gaps so cached-input reads and free cache creation are applied correctly, fixing overstated cost reports.

  • Bug Fixes
    • Set GLM cache_create to 0.0 in fallback pricing (4.5/4.6/4.7/5/5-turbo/5.1).
    • Embed zai/* pricing snapshot and override zai/glm-4.5 and its variants (-x, -air, -airx, v, -4-32b-0414-128k, -flash) with zero cache creation and official cache-read rates; treat zai/* as embedded to retain upstream context limits.

Written for commit 43271a9. Summary will update on new commits.


Summary by CodeRabbit

  • Bug Fixes

    • Defaulted GLM pricing to a cache creation rate of 0.0 and removed the per-model overrides; zai-prefixed GLM entries are now treated as embedded so their pricing is retained.
  • Tests

    • Expanded unit tests to validate the 0.0 cache creation rate across multiple GLM variants and added assertions for zai/glm-4.5 variants (including context limits).

Z.AI lists cached input storage for these GLM models as free, and the LiteLLM entries that expose the direct Z.AI pricing use zero cache creation cost.

The built-in fallback was using the generic input * 1.25 cache creation assumption, which could overstate offline costs for GLM cache creation tokens. Set the built-in GLM cache creation rate to zero and cover all six fallback models in the offline pricing test.
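
A minimal sketch of the overstatement described above. The `Rates` and `Usage` types, the function names, and the numbers are assumptions made for illustration; they are not the types in pricing.rs or Z.AI's published rates.

```rust
/// Per-token rates in USD. Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rates {
    input: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Token counts for one usage record. Hypothetical shape.
struct Usage {
    input: u64,
    cache_create: u64,
    cache_read: u64,
}

/// Sum of token counts multiplied by their per-token rates.
fn cost(rates: Rates, usage: &Usage) -> f64 {
    usage.input as f64 * rates.input
        + usage.cache_create as f64 * rates.cache_create
        + usage.cache_read as f64 * rates.cache_read
}

/// The generic assumption: cache creation billed at input * 1.25.
fn generic_rates(input: f64, cache_read: f64) -> Rates {
    Rates { input, cache_create: input * 1.25, cache_read }
}

/// How much the generic assumption overstates cost compared with
/// free cache creation (cache_create = 0.0), all other rates equal.
fn overstatement(generic: Rates, usage: &Usage) -> f64 {
    let free_creation = Rates { cache_create: 0.0, ..generic };
    cost(generic, usage) - cost(free_creation, usage)
}
```

With an illustrative input rate of 0.6e-6 and 1,000,000 cache creation tokens, the generic assumption adds 1,000,000 × 0.6e-6 × 1.25 = 0.75 USD that the zero rate does not. Records without cache creation tokens are unaffected.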
@coderabbitai

coderabbitai Bot commented Jun 8, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 980bd95d-a5ce-4629-b724-ad8aae974b28

📥 Commits

Reviewing files that changed from the base of the PR and between 52b8749 and 43271a9.

📒 Files selected for processing (1)
  • rust/crates/ccusage/src/pricing.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • rust/crates/ccusage/src/pricing.rs

📝 Walkthrough

Set cache_create to zero on the shared glm_base pricing, remove the per-model cache_create overrides so GLM variants inherit zero, treat zai/* names as embedded, and extend tests to assert cache_create == 0.0 for multiple GLM models including zai/glm-4.5.

Changes

GLM Model Pricing Cache-Create Configuration

| Layer / File(s) | Summary |
| --- | --- |
| GLM base and model pricing configuration (rust/crates/ccusage/src/pricing.rs) | Introduce a glm pricing helper with glm_base.cache_create = 0.0; rewire glm-4.5/zai/glm-4.5, glm-4.6, glm-4.7 to use glm_base; remove per-model cache_create for glm-5, glm-5-turbo, glm-5.1 so they inherit 0.0 while keeping other rates. |
| Embedded model detection (rust/crates/ccusage/build.rs) | is_embedded_model updated to treat model names starting with zai/ as embedded so zai/* embedded pricing entries are preserved. |
| GLM pricing unit tests (rust/crates/ccusage/src/pricing.rs) | Unit tests extended to assert cache_create == 0.0 for glm-5.1, glm-5, glm-5-turbo, glm-4.7, glm-4.6, glm-4.5, zai/glm-4.5, and added assertions for zai/glm-4.5-air, zai/glm-4.5-x, zai/glm-4.5v. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I nibbled numbers, soft and neat,
Zeroed cache costs — such a treat.
Base now shares with every model,
Tests hop in to check and coddle.
Hooray — GLM hops light and fleet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(pricing): do not charge GLM cache creation' clearly and concisely summarizes the main change: updating GLM pricing to set cache_create to 0.0 to match Z.AI's published pricing where cached input storage is free.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@ryoppippi

Owner Author

@coderabbitai review
@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 8, 2026


@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 8, 2026


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! | ccusage-guide | 43271a9 | Commit Preview URL, Branch Preview URL | Jun 08 2026, 10:20 PM |

@coderabbitai

coderabbitai Bot commented Jun 8, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR details and diff
gh pr view 1235 --repo ryoppippi/ccusage --json title,body,files,additions,deletions

echo "--- PR DIFF ---"
gh pr diff 1235 --repo ryoppippi/ccusage

Length of output: 5072



PR Review: fix(pricing): do not charge GLM cache creation

Summary: Fixes overstated offline cost calculations for Z.AI GLM models by setting cache_create to 0.0, matching Z.AI's published pricing (cached input storage is free for these models).


Changes Reviewed

rust/crates/ccusage/src/pricing.rs

Fix is correct and minimal.

  • Setting cache_create: 0.0 in glm_base cleanly propagates the zero cost to all base-tier models (glm-4.7, glm-4.6, glm-4.5) via the struct literal default.
  • Removing the explicit cache_create overrides for glm-5, glm-5-turbo, and glm-5.1 is correct — they now inherit 0.0 from glm_base through Rust's struct update syntax (..glm_base).
  • cache_read values (e.g. 0.11e-6, 0.2e-6) are deliberately left non-zero, consistent with Z.AI's pricing where reading from cache incurs a cost but creating/storing it does not.
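
The inheritance described in these bullets can be sketched with Rust's struct update syntax. The `ModelPricing` struct, the function names, and the input/output rates below are assumptions for illustration; only `cache_create: 0.0` and the two cache_read values (0.11e-6, 0.2e-6) are taken from the review text.

```rust
/// Hypothetical pricing record; the real struct in pricing.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ModelPricing {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Shared base entry: cache creation is free, cache reads are not.
fn glm_base() -> ModelPricing {
    ModelPricing {
        input: 0.6e-6,  // illustrative rate
        output: 2.2e-6, // illustrative rate
        cache_create: 0.0,
        cache_read: 0.11e-6,
    }
}

/// A model that overrides some rates and inherits the rest.
/// `cache_create` is not listed, so `..glm_base()` supplies 0.0.
fn glm_5_like() -> ModelPricing {
    ModelPricing {
        input: 1.0e-6,  // illustrative rate
        output: 3.2e-6, // illustrative rate
        cache_read: 0.2e-6,
        ..glm_base()
    }
}
```

Because the override no longer names `cache_create`, a later change to the base value reaches every model built this way without touching each entry.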

Tests are well-expanded.

All 6 GLM models now have explicit assert_eq!(*.cache_create, 0.0) assertions. The addition of glm-4.6 and glm-4.5 test cases (both resolving to glm_base values) also implicitly validates that the prefix/fuzzy find() fallback picks up the corrected base pricing.


One Minor Observation

The glm-4.6 and glm-4.5 test entries assert identical values to glm-4.7 (the glm_base), suggesting they're testing the find() fallback/prefix-match path rather than explicitly registered model entries. This is fine as-is, but a brief comment in the test would help future readers understand why the values match (e.g. // glm-4.6 and glm-4.5 fall back to glm_base pricing via prefix match). Not a blocker.


Overall: Clean, focused, and well-tested. LGTM! 🐇

[approve]

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file


@pullfrog pullfrog Bot left a comment

Contributor


✅ No new issues found.

Reviewed changes — fixes GLM offline cost calculation by setting cache_create to zero to match the Z.AI pricing page, which lists Cached Input Storage as "Limited-time Free" for all GLM text models.

  • Set glm_base.cache_create to 0.0 — matches the upstream Z.AI pricing table where all GLM text models show "Limited-time Free" for Cached Input Storage.
  • Removed per-model cache_create overrides from glm-5, glm-5-turbo, and glm-5.1 — they now inherit 0.0 from glm_base via struct update syntax.
  • Added glm-4.5 and glm-4.6 entries using glm_base pricing, consistent with the other 4.x models.
  • Expanded tests to assert cache_create == 0.0 across all six GLM models.


@pkg-pr-new

pkg-pr-new Bot commented Jun 8, 2026


ccusage

npx https://pkg.pr.new/ccusage@1235

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1235

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1235

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1235

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1235

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1235

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1235

commit: 43271a9

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 400e822d877f
Base SHA: 00b186588a64

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new 00b186588a64 660.8ms 550.6ms 33.3ms 3
PR pkg.pr.new 400e822 717.7ms 636.4ms 34.3ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 400e822. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 585.1ms 576.0ms 1.02x 342.58 MiB 295.95 MiB 0.86x 1.72 GiB/s 1.75 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 378.8ms 383.8ms 0.99x 78.83 MiB 77.33 MiB 0.98x 2.66 GiB/s 2.62 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 570.9ms 1.76 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 550.8ms 1.83 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 377.8ms 2.66 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 346.1ms 2.91 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 30.7ms 30.5ms 1.01x 43.48 MiB 43.73 MiB 1.01x 0.05 MiB/s 0.05 MiB/s
claude session --offline --json 0.00 MiB 30.7ms 30.5ms 1.00x 43.61 MiB - - 0.05 MiB/s 0.05 MiB/s
codex daily --offline --json 0.00 MiB 30.0ms 30.3ms 0.99x 43.61 MiB 43.61 MiB 1.00x 0.03 MiB/s 0.03 MiB/s
codex session --offline --json 0.00 MiB 30.2ms 30.6ms 0.99x - - - 0.03 MiB/s 0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 561.8ms 563.5ms 1.00x 324.45 MiB 317.08 MiB 0.98x 1.79 GiB/s 1.79 GiB/s
codex --offline --json 1.01 GiB 381.5ms 370.5ms 1.03x 81.08 MiB 79.20 MiB 0.98x 2.64 GiB/s 2.72 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 16.83 KiB 16.83 KiB -0.00 KiB 1.00x
installed native package binary 3353.62 KiB 3353.62 KiB +0.00 KiB 1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
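
For reading the ratio columns, a small sketch under two assumptions that the report does not state explicitly: "PR vs base" is the base median divided by the PR median, and "PR/base RSS" is the PR peak RSS divided by the base peak RSS. Both assumptions agree with the first row of the cached bunx table above after rounding to two decimals. The function names are hypothetical.

```rust
/// Speed ratio as assumed for the "PR vs base" column:
/// values above 1.0 mean the PR run was faster than the base run.
fn pr_vs_base(base_median_ms: f64, pr_median_ms: f64) -> f64 {
    base_median_ms / pr_median_ms
}

/// Memory ratio as assumed for the "PR/base RSS" column:
/// values below 1.0 mean the PR run used less peak memory.
fn rss_ratio(base_rss_mib: f64, pr_rss_mib: f64) -> f64 {
    pr_rss_mib / base_rss_mib
}

/// Format a ratio the way the tables display it, e.g. "1.02x".
fn format_ratio(ratio: f64) -> String {
    format!("{:.2}x", ratio)
}
```

Since every large-fixture row comes from a single run, differences of a few percent in either direction are within the runner noise the report warns about.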

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 400e822d877f
Base SHA: 00b186588a64

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new 00b186588a64 1.856s 1.031s 33.0ms 3
PR pkg.pr.new 400e822 967.4ms 835.1ms 32.9ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 400e822. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 569.9ms 569.6ms 1.00x 320.83 MiB 319.08 MiB 0.99x 1.77 GiB/s 1.77 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 387.2ms 378.7ms 1.02x 73.33 MiB 79.33 MiB 1.08x 2.60 GiB/s 2.66 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 582.4ms 1.73 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 547.8ms 1.84 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 373.7ms 2.69 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 343.7ms 2.93 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 31.4ms 4.3ms 7.29x 43.48 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.36 MiB/s
claude session --offline --json 0.00 MiB 31.0ms 4.3ms 7.24x 43.61 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.36 MiB/s
codex daily --offline --json 0.00 MiB 30.8ms 4.1ms 7.60x 43.73 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.21 MiB/s
codex session --offline --json 0.00 MiB 30.7ms 4.0ms 7.67x 43.61 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.21 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 564.8ms 553.7ms 1.02x 320.95 MiB 332.45 MiB 1.04x 1.78 GiB/s 1.82 GiB/s
codex --offline --json 1.01 GiB 373.6ms 345.3ms 1.08x 82.08 MiB 76.45 MiB 0.93x 2.70 GiB/s 2.92 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 16.83 KiB 16.83 KiB -0.00 KiB 1.00x
installed native package binary 3353.62 KiB 3353.62 KiB +0.00 KiB 1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@ryoppippi

Owner Author

Source for the cache creation/storage change: Z.AI official pricing lists the text-model column "Cached Input Storage" as "Limited-time Free" for the GLM models touched here: GLM-5.1, GLM-5, GLM-5-Turbo, GLM-4.7, GLM-4.6, and GLM-4.5.

https://docs.z.ai/guides/overview/pricing

That is why this PR sets the built-in fallback `cache_create` rate to `0.0` while preserving the non-zero `Cached Input` / cache-read rates.

Include direct Z.AI provider entries from the pinned LiteLLM pricing snapshot so offline pricing can reuse upstream-maintained GLM metadata such as context limits instead of relying only on hand-written fallback entries.

Keep a built-in override for zai/glm-4.5 because the pinned LiteLLM entry currently lacks prompt-cache pricing, while Z.AI official pricing lists cached input storage as free and cached input reads at the GLM 4.5 rate.
@ryoppippi

Owner Author

Follow-up change: instead of relying only on handwritten GLM fallback pricing, this now embeds direct Z.AI provider entries from the pinned LiteLLM snapshot by allowing the `zai/` prefix in `build.rs`.

The built-in override remains only where the embedded source is currently insufficient: `zai/glm-4.5` lacks prompt-cache pricing in the pinned LiteLLM data, and `glm-5.1` / `glm-5-turbo` are still covered by fallback aliases when they are absent from the pinned direct Z.AI entries.

@coderabbitai review
@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 8, 2026


@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh pr diff 1235 --repo ryoppippi/ccusage

Length of output: 4148



Follow-up Review: zai/ prefix embedding + zai/glm-4.5 override

The follow-up is clean and well-reasoned. Here's the breakdown:


build.rs: is_embedded_model()

|| model.starts_with("zai/")

✅ Minimal, scoped change. zai/ entries from the pinned LiteLLM snapshot will now be embedded directly, replacing the need for purely handwritten fallbacks where the data exists.
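
A sketch of how such a prefix check could filter snapshot entries at build time. Only the `zai/` condition comes from the diff; the `claude-` prefix is a placeholder for whatever conditions the real `is_embedded_model` already has, and `embedded_names` is a hypothetical helper.

```rust
/// Stand-in for the build-script filter. The `claude-` check is a
/// placeholder assumption; the `zai/` check is the one added in this PR.
fn is_embedded_model(model: &str) -> bool {
    model.starts_with("claude-") || model.starts_with("zai/")
}

/// Keep only the snapshot keys that should be compiled into the binary,
/// preserving their original order.
fn embedded_names<'a>(snapshot_keys: &[&'a str]) -> Vec<&'a str> {
    snapshot_keys
        .iter()
        .copied()
        .filter(|name| is_embedded_model(name))
        .collect()
}
```

The check is a plain string prefix test, so a bare name such as `glm-5` does not match `zai/` and would still depend on the handwritten fallback entries.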


pricing.rs: zai/glm-4.5 explicit override

self.entries.insert("zai/glm-4.5".to_string(), glm_base);

✅ Correct gap-fill. Since the LiteLLM snapshot lacks prompt-cache pricing for zai/glm-4.5, the handwritten entry (using glm_base, which now has cache_create: 0.0 and a non-zero cache_read) is the right fallback. This ensures the zai/-prefixed form gets the same cache-read-capable pricing as the bare glm-4.5 alias.
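
A sketch of the gap-fill, assuming the embedded snapshot entries are loaded into the map before the handwritten override is applied. `ModelPricing` and `with_overrides` are hypothetical; only the `insert` call mirrors the line quoted above.

```rust
use std::collections::HashMap;

/// Hypothetical pricing record; the real struct in pricing.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ModelPricing {
    input: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Apply the handwritten override on top of entries loaded from the snapshot.
fn with_overrides(
    mut entries: HashMap<String, ModelPricing>,
    glm_base: ModelPricing,
) -> HashMap<String, ModelPricing> {
    // `HashMap::insert` replaces any value already stored under the key,
    // so the handwritten entry wins for this one model only.
    entries.insert("zai/glm-4.5".to_string(), glm_base);
    entries
}
```

Other `zai/*` keys are left exactly as the snapshot provides them, which keeps the override limited to the entry that lacks prompt-cache pricing.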


Tests

✅ The context_limit("zai/glm-5") == Some(200_000) assertion is a particularly good addition — it implicitly validates that the build.rs change is actually embedding the LiteLLM snapshot entry for zai/glm-5 (not just hitting a manual fallback), since the context limit comes from the snapshot data, not from the pricing.rs handwritten entries.

context_limit("zai/glm-4.5") == Some(128_000) covers the explicit override path and confirms that the 128k limit for zai/glm-4.5 stays distinct from the 200k limit for zai/glm-5.


Overall: The two-layer approach (prefer embedded LiteLLM snapshot data; handwrite only where the snapshot is deficient) is sound and well-executed. LGTM 🐇

[approve]

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 2 files


Comment thread rust/crates/ccusage/build.rs

@pullfrog pullfrog Bot left a comment

Contributor


✅ No new issues found.

Reviewed changes — embeds Z.AI zai/* LiteLLM entries and adds a built-in fallback for zai/glm-4.5 so all GLM pricing variants work correctly in offline mode.

  • Added zai/ to is_embedded_model in build.rs — LiteLLM entries for zai/glm-5, zai/glm-4.6, zai/glm-4.7 all carry cache_creation_input_token_cost: 0 directly, so no derived fallback is needed.
  • Added zai/glm-4.5 built-in fallback with cache_create: 0.0 — the LiteLLM entry for zai/glm-4.5 lacks an explicit cache-create field, so the built-in fallback covers it.
  • Expanded tests to assert cache_create == 0.0 across all six models, plus zai/glm-4.5 pricing and zai/glm-5 and zai/glm-4.5 context limits.


@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 52b8749689b0
Base SHA: 00b186588a64

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new 00b186588a64 1.220s 1.097s 30.6ms 3
PR pkg.pr.new 52b8749 889.9ms 852.2ms 30.2ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 52b8749. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 544.5ms 528.0ms 1.03x 321.33 MiB 346.45 MiB 1.08x 1.85 GiB/s 1.91 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 362.6ms 368.7ms 0.98x 81.45 MiB 81.58 MiB 1.00x 2.78 GiB/s 2.73 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 541.0ms 1.86 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 514.8ms 1.96 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 367.9ms 2.74 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 344.4ms 2.92 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 28.2ms 28.3ms 1.00x 43.86 MiB 43.73 MiB 1.00x 0.05 MiB/s 0.05 MiB/s
claude session --offline --json 0.00 MiB 28.5ms 28.7ms 0.99x 43.73 MiB 43.61 MiB 1.00x 0.05 MiB/s 0.05 MiB/s
codex daily --offline --json 0.00 MiB 28.3ms 27.9ms 1.01x 43.48 MiB 43.48 MiB 1.00x 0.03 MiB/s 0.03 MiB/s
codex session --offline --json 0.00 MiB 27.9ms 28.3ms 0.99x 43.73 MiB 43.61 MiB 1.00x 0.03 MiB/s 0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 547.5ms 540.0ms 1.01x 322.08 MiB 326.45 MiB 1.01x 1.84 GiB/s 1.86 GiB/s
codex --offline --json 1.01 GiB 375.2ms 385.9ms 0.97x 71.95 MiB 82.83 MiB 1.15x 2.68 GiB/s 2.61 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 16.83 KiB 16.83 KiB +0.00 KiB 1.00x
installed native package binary 3353.62 KiB 3353.74 KiB +0.13 KiB 1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
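
For readers checking the ratio columns, the reported figures appear consistent with two simple formulas: "PR vs base" as base median divided by PR median, and "PR/base RSS" as PR peak RSS divided by base peak RSS. The sketch below is an illustration inferred from the numbers in the large-fixture table, not the benchmark script itself; the function names are hypothetical.

```rust
/// Speedup as it appears in the "PR vs base" column (an inference from the data):
/// base median divided by PR median. Values above 1.0 mean the PR run was faster.
fn speedup(base_ms: f64, pr_ms: f64) -> f64 {
    base_ms / pr_ms
}

/// Memory ratio as it appears in the "PR/base RSS" column (also inferred):
/// PR peak RSS divided by base peak RSS. Values below 1.0 mean the PR used less memory.
fn rss_ratio(base_mib: f64, pr_mib: f64) -> f64 {
    pr_mib / base_mib
}

/// Round to two decimals, matching the precision shown in the tables.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

With the large-fixture rows, `round2(speedup(547.5, 540.0))` reproduces the 1.01x shown for claude, and `round2(rss_ratio(71.95, 82.83))` reproduces the 1.15x RSS ratio shown for codex. With a single run per row, differences of this size are within the runner noise the comment warns about.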

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 52b8749689b0
Base SHA: 00b186588a64

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | 00b186588a64 | 865.1ms | 633.7ms | 32.9ms | 3 |
| PR pkg.pr.new | 52b8749 | 861.9ms | 711.1ms | 31.8ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 52b8749. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 574.7ms | 564.4ms | 1.02x | 324.08 MiB | 310.83 MiB | 0.96x | 1.75 GiB/s | 1.78 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 375.9ms | 388.7ms | 0.97x | 82.20 MiB | 72.20 MiB | 0.88x | 2.68 GiB/s | 2.59 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 554.3ms | 1.82 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 528.0ms | 1.91 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 379.0ms | 2.66 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 353.9ms | 2.84 GiB/s | 1 |
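
One way to read these diagnostics is to subtract the installed-native-binary median from the package-wrapper median for the same command, which gives a rough estimate of the wrapper's cost. The sketch below assumes that interpretation; the helper names are hypothetical, and with one sample per row the result is only directional.

```rust
/// Rough wrapper overhead in milliseconds: package-wrapper median minus
/// installed-native-binary median for the same command and fixture.
fn wrapper_overhead_ms(wrapper_ms: f64, native_ms: f64) -> f64 {
    wrapper_ms - native_ms
}

/// The same overhead expressed as a fraction of the native-binary median.
fn wrapper_overhead_fraction(wrapper_ms: f64, native_ms: f64) -> f64 {
    (wrapper_ms - native_ms) / native_ms
}
```

For the rows in this comment, that is 554.3 - 528.0 = 26.3 ms for claude and 379.0 - 353.9 = 25.1 ms for codex, in both cases a single-digit percentage of the native median.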

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 30.0ms | 4.1ms | 7.25x | 43.73 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.37 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 30.3ms | 4.2ms | 7.21x | 43.61 MiB | 2.83 MiB | 0.06x | 0.05 MiB/s | 0.37 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 30.6ms | 3.9ms | 7.94x | 43.61 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.22 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 30.2ms | 3.8ms | 8.00x | 43.48 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 571.8ms | 540.2ms | 1.06x | 343.45 MiB | 341.45 MiB | 0.99x | 1.76 GiB/s | 1.86 GiB/s |
| `codex --offline --json` | 1.01 GiB | 375.4ms | 351.4ms | 1.07x | 74.95 MiB | 79.70 MiB | 1.06x | 2.68 GiB/s | 2.86 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 16.83 KiB | 16.83 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.62 KiB | 3353.74 KiB | +0.13 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

Keep the broader embedded Z.AI LiteLLM snapshot, but override GLM entries whose pinned LiteLLM records currently omit prompt-cache prices. Without these overrides, ccusage would synthesize cache creation at input * 1.25 and overstate offline costs for GLM 4.5 variants.

The overrides use the official Z.AI pricing table, preserving zero cache creation/storage and the published cached-input read rates for the affected direct Z.AI GLM models.
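
To make the overstatement concrete, here is a minimal sketch of the fallback described in this commit message. It assumes rates are USD per token (as the magnitudes quoted in this thread suggest) and uses hypothetical helper names; it is not the actual ccusage implementation.

```rust
/// Hypothetical helper: choose the cache-creation rate for a pricing record.
/// When the record omits the field (`None`), fall back to `input * 1.25`,
/// which is the synthesized rate the commit message describes.
fn cache_create_rate(input_rate: f64, explicit: Option<f64>) -> f64 {
    explicit.unwrap_or(input_rate * 1.25)
}

/// Cost in USD of writing `tokens` cache-creation tokens at `rate` USD per token.
fn cache_create_cost(tokens: u64, rate: f64) -> f64 {
    tokens as f64 * rate
}
```

With the `zai/glm-4.5-air` input rate of 0.2e-6 quoted in the review in this thread, a record with no cache fields would be charged 0.25e-6 per cache-creation token, or about $0.25 per million tokens. With the explicit `Some(0.0)` override the same tokens cost nothing, which matches the free cache storage stated in the PR description.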
@ryoppippi

Owner Author

Follow-up pushed for the Cubic finding: GLM entries with missing LiteLLM cache fields now get official Z.AI cache overrides while keeping the embedded `zai/` snapshot. @coderabbitai review @cubic-dev-ai review

@coderabbitai

coderabbitai Bot commented Jun 8, 2026


🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
gh pr diff 1235 --repo ryoppippi/ccusage
```

Length of output: 6252



Follow-up Review: GLM cache-field overrides for LiteLLM-snapshot gaps

The change is clean and well-structured. Here's the breakdown:


pricing.rs: glm_pricing closure

```rust
let glm_pricing = |input: f64, output: f64, cache_read: f64| Pricing {
    input,
    output,
    cache_create: 0.0,
    cache_read,
    // ...
};
```

✅ Excellent refactor. Encoding cache_create: 0.0 as an invariant of the closure makes it structurally impossible to accidentally set a non-zero cache-create rate for any new GLM entry. This is a better design than the previous per-model ..glm_base spread.
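
A self-contained sketch of the same pattern, for readers who want to try it outside the repository. The `Pricing` struct here is a simplified stand-in with only four fields (the real struct has more, elided in the excerpt above), and `glm_entries` is a hypothetical wrapper function; the two rates are taken from the review table in this comment.

```rust
/// Simplified stand-in for the pricing record; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Pricing {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Hypothetical helper that builds a couple of GLM entries through one closure.
fn glm_entries() -> Vec<(&'static str, Pricing)> {
    // The closure has no cache_create parameter, so every entry built
    // through it gets cache_create = 0.0 and callers cannot override it.
    let glm_pricing = |input: f64, output: f64, cache_read: f64| Pricing {
        input,
        output,
        cache_create: 0.0,
        cache_read,
    };

    vec![
        ("zai/glm-4.5-x", glm_pricing(2.2e-6, 8.9e-6, 0.45e-6)),
        ("zai/glm-4.5-air", glm_pricing(0.2e-6, 1.1e-6, 0.03e-6)),
    ]
}
```

The design point is that the zero rate lives in one place: adding another GLM entry means calling the closure with three numbers, and there is no argument through which a non-zero cache-creation rate could slip in.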


New zai/ entries (the "Cubic finding")

| Entry | input | output | cache_read | Notes |
| --- | --- | --- | --- | --- |
| `zai/glm-4.5-x` | 2.2e-6 | 8.9e-6 | 0.45e-6 | Enhanced variant |
| `zai/glm-4.5-air` | 0.2e-6 | 1.1e-6 | 0.03e-6 | Efficient variant |
| `zai/glm-4.5-airx` | 1.1e-6 | 4.5e-6 | 0.22e-6 | Enhanced-air variant |
| `zai/glm-4.5v` | 0.6e-6 | 1.8e-6 | 0.11e-6 | Vision variant |
| `zai/glm-4-32b-0414-128k` | 0.1e-6 | 0.1e-6 | 0.0 | Free cache reads |
| `zai/glm-4.5-flash` | 0.0 | 0.0 | 0.0 | Free tier |

✅ All six correctly get cache_create: 0.0 via the closure.


Tests

embedded_pricing_patches_z_ai_glm_entries_without_litellm_cache_rates covers glm-4.5-air, glm-4.5-x, and glm-4.5v with full rate assertions including cache_create == 0.0.

One minor gap: zai/glm-4.5-airx, zai/glm-4-32b-0414-128k, and zai/glm-4.5-flash are not covered by the new test. These are lower-risk (two are zero-cost models, airx follows the same pattern as x/air), so it's not a blocker — but adding assertions for them would complete the picture.
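
A rough sketch of what the missing assertions could look like. The `lookup` function is a hypothetical stand-in for the embedded pricing lookup (the real API is not shown in this thread), `Rates` is a simplified record, and the expected values come from the table above.

```rust
/// Simplified pricing record used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rates {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Hypothetical stand-in for the embedded pricing lookup.
fn lookup(model: &str) -> Option<Rates> {
    let glm = |input: f64, output: f64, cache_read: f64| Rates {
        input,
        output,
        cache_create: 0.0,
        cache_read,
    };
    match model {
        "zai/glm-4.5-airx" => Some(glm(1.1e-6, 4.5e-6, 0.22e-6)),
        "zai/glm-4-32b-0414-128k" => Some(glm(0.1e-6, 0.1e-6, 0.0)),
        "zai/glm-4.5-flash" => Some(glm(0.0, 0.0, 0.0)),
        _ => None,
    }
}

/// Assertions for the three entries the new test does not cover.
fn check_uncovered_glm_entries() {
    let expected = [
        ("zai/glm-4.5-airx", 1.1e-6, 4.5e-6, 0.22e-6),
        ("zai/glm-4-32b-0414-128k", 0.1e-6, 0.1e-6, 0.0),
        ("zai/glm-4.5-flash", 0.0, 0.0, 0.0),
    ];
    for (model, input, output, cache_read) in expected {
        let rates = lookup(model).expect("entry should be embedded");
        assert_eq!(rates.input, input, "{model} input");
        assert_eq!(rates.output, output, "{model} output");
        assert_eq!(rates.cache_read, cache_read, "{model} cache_read");
        assert_eq!(rates.cache_create, 0.0, "{model} cache_create");
    }
}
```

In the real test these checks would go against the crate's own pricing map rather than a local `match`; the table-driven loop is just one way to keep the three cases compact.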


Overall: The glm_pricing closure refactor is a genuine improvement over the previous approach. The gap-filling for LiteLLM-snapshot-deficient entries is correct and well-motivated. LGTM 🐇

[approve]

@pullfrog pullfrog Bot left a comment

Contributor


✅ No new issues found.

Reviewed changes — refactors GLM pricing with a shared closure that bakes cache_create: 0.0, adds six new zai/glm-4.5-* variant entries, and expands test coverage.

  • Extracted glm_pricing closure in put_builtin_pricing() — bakes cache_create: 0.0, cache_read_explicit: true, and all _above_200k: None into every GLM variant, preventing per-model cache-create drift.
  • Added 6 new zai/glm-4.5-* variant entries: zai/glm-4.5-x, zai/glm-4.5-air, zai/glm-4.5-airx, zai/glm-4.5v, zai/glm-4-32b-0414-128k, zai/glm-4.5-flash, all with cache_create: 0.0. Pricing verified against the Z.AI pricing page.
  • Removed explicit per-model cache_create overrides from glm-5, glm-5-turbo, glm-5.1 — they now inherit 0.0 from glm_base via struct update syntax.
  • Added embedded_pricing_patches_z_ai_glm_entries_without_litellm_cache_rates test covering zai/glm-4.5-air, zai/glm-4.5-x, zai/glm-4.5v.
  • Expanded embedded_pricing_includes_z_ai_glm_models_for_offline_reports with cache_create == 0.0 assertions for all models plus context-limit checks for zai/glm-5 and zai/glm-4.5.


@cubic-dev-ai

cubic-dev-ai Bot commented Jun 8, 2026


Follow-up pushed for the Cubic finding: GLM entries with missing LiteLLM cache fields now get official Z.AI cache overrides while keeping the embedded `zai/` snapshot. @coderabbitai review @cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 2 files


@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 43271a93d7d9
Base SHA: 00b186588a64

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | 00b186588a64 | 591.8ms | 709.5ms | 31.1ms | 3 |
| PR pkg.pr.new | 43271a9 | 778.8ms | 1.133s | 32.6ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 43271a9. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 575.1ms | 548.8ms | 1.05x | 313.58 MiB | 305.08 MiB | 0.97x | 1.75 GiB/s | 1.83 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 383.9ms | 383.0ms | 1.00x | 73.20 MiB | 72.45 MiB | 0.99x | 2.62 GiB/s | 2.63 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 569.6ms | 1.77 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 523.7ms | 1.92 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 363.7ms | 2.77 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 355.2ms | 2.83 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 29.5ms | 4.1ms | 7.23x | 43.61 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.38 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 29.6ms | 4.0ms | 7.49x | 43.48 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.39 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 28.7ms | 3.7ms | 7.84x | 43.61 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 29.1ms | 3.7ms | 7.85x | 43.73 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 549.8ms | 508.1ms | 1.08x | 325.20 MiB | 296.58 MiB | 0.91x | 1.83 GiB/s | 1.98 GiB/s |
| `codex --offline --json` | 1.01 GiB | 364.2ms | 333.3ms | 1.09x | 80.45 MiB | 74.33 MiB | 0.92x | 2.76 GiB/s | 3.02 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 16.83 KiB | 16.83 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.62 KiB | 3353.62 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 43271a93d7d9
Base SHA: 00b186588a64

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | 00b186588a64 | 589.0ms | 672.7ms | 33.2ms | 3 |
| PR pkg.pr.new | 43271a9 | 1.017s | 761.3ms | 33.0ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 43271a9. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 574.6ms | 574.6ms | 1.00x | 320.33 MiB | 330.58 MiB | 1.03x | 1.75 GiB/s | 1.75 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 378.2ms | 381.6ms | 0.99x | 82.20 MiB | 78.08 MiB | 0.95x | 2.66 GiB/s | 2.64 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 588.5ms | 1.71 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 531.2ms | 1.90 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 369.8ms | 2.72 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 343.2ms | 2.93 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 31.2ms | 30.9ms | 1.01x | 43.73 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 31.0ms | 30.9ms | 1.00x | 43.48 MiB | 43.48 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 30.1ms | 30.6ms | 0.98x | 43.48 MiB | 43.73 MiB | 1.01x | 0.03 MiB/s | 0.03 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 31.1ms | 30.4ms | 1.02x | 43.48 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 566.5ms | 550.3ms | 1.03x | 331.83 MiB | 313.45 MiB | 0.94x | 1.78 GiB/s | 1.83 GiB/s |
| `codex --offline --json` | 1.01 GiB | 374.1ms | 369.6ms | 1.01x | 82.08 MiB | 73.08 MiB | 0.89x | 2.69 GiB/s | 2.72 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed `ccusage-*.tgz` | 16.83 KiB | 16.83 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.62 KiB | 3353.62 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@ryoppippi ryoppippi merged commit 83816d5 into main Jun 8, 2026
40 checks passed
@ryoppippi ryoppippi deleted the codex/fix-glm-cache-create branch June 8, 2026 22:36