fix(pricing): do not charge GLM cache creation by ryoppippi · Pull Request #1235 · ryoppippi/ccusage

ryoppippi · 2026-06-08T21:35:05Z

Fixes the Z.AI GLM built-in fallback pricing added in #1225 so cache creation tokens are not charged in offline cost calculations.

Z.AI lists cached input storage for these GLM models as free, and the corresponding LiteLLM direct Z.AI entries use zero cache creation cost. The previous fallback used the generic cache creation assumption and could overstate GLM costs when cache creation tokens were present.

Testing:

- `direnv exec . cargo test --manifest-path rust/Cargo.toml --workspace embedded_pricing_includes_z_ai_glm_models_for_offline_reports`
- `direnv exec . pnpm run format`
- `direnv exec . pnpm run test`
- `direnv exec . pnpm typecheck`
- pre-push hook: clippy, treefmt, gitleaks, cargo test

Summary by cubic

Stop charging cache creation for Z.AI GLM models and embed direct zai/* entries from LiteLLM so offline pricing matches Z.AI rates and context limits. Also patch GLM 4.5 variant gaps so cached-input reads and free cache creation are applied correctly, fixing overstated cost reports.

Bug Fixes
- Set GLM cache_create to 0.0 in fallback pricing (4.5/4.6/4.7/5/5-turbo/5.1).
- Embed zai/* pricing snapshot and override zai/glm-4.5 and its variants (-x, -air, -airx, v, -4-32b-0414-128k, -flash) with zero cache creation and official cache-read rates; treat zai/* as embedded to retain upstream context limits.

Written for commit 43271a9. Summary will update on new commits.

Summary by CodeRabbit

Bug Fixes
- Defaulted the GLM cache-creation rate to 0.0 and removed per-model overrides; zai/-prefixed GLM entries are also treated as embedded so their pricing is retained.
Tests
- Expanded unit tests to validate a 0.0 cache-creation rate across multiple GLM variants and added assertions for zai/glm-4.5 variants (including context limits).

Z.AI lists cached input storage for these GLM models as free, and the LiteLLM entries that expose the direct Z.AI pricing use zero cache creation cost. The built-in fallback was using the generic input * 1.25 cache creation assumption, which could overstate offline costs for GLM cache creation tokens. Set the built-in GLM cache creation rate to zero and cover all six fallback models in the offline pricing test.
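To illustrate the overstatement described above, here is a minimal sketch of an offline cost calculation under the old `input * 1.25` assumption and under the corrected zero rate. The `Rates` and `Usage` structs, the `cost` function, and every numeric rate are hypothetical stand-ins chosen for illustration; they are not taken from `pricing.rs`.

```rust
/// Per-token rates in USD. Field names mirror the `cache_create` /
/// `cache_read` names used in this PR; the struct itself is hypothetical.
#[derive(Clone, Copy, Debug)]
struct Rates {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Token counts for one usage record (hypothetical shape).
#[derive(Clone, Copy, Debug)]
struct Usage {
    input: u64,
    output: u64,
    cache_create: u64,
    cache_read: u64,
}

/// Sum of tokens multiplied by their per-token rate, per category.
fn cost(rates: Rates, usage: Usage) -> f64 {
    usage.input as f64 * rates.input
        + usage.output as f64 * rates.output
        + usage.cache_create as f64 * rates.cache_create
        + usage.cache_read as f64 * rates.cache_read
}

fn main() {
    // Illustrative rates only (assumed, not read from the repository).
    let input = 0.6e-6;
    let old = Rates {
        input,
        output: 2.2e-6,
        cache_create: input * 1.25, // generic assumption the PR removes for GLM
        cache_read: 0.11e-6,
    };
    // Same rates, but cache creation is free.
    let new = Rates { cache_create: 0.0, ..old };

    let usage = Usage {
        input: 10_000,
        output: 2_000,
        cache_create: 1_000_000,
        cache_read: 500_000,
    };

    println!("old assumption: ${:.4}", cost(old, usage));
    println!("zero cache_create: ${:.4}", cost(new, usage));
    println!("difference: ${:.4}", cost(old, usage) - cost(new, usage));
}
```

The difference between the two totals is exactly the cache creation term, so sessions that write a lot of cache are the ones whose reported cost changes the most.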

coderabbitai · 2026-06-08T21:35:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 980bd95d-a5ce-4629-b724-ad8aae974b28

📥 Commits

Reviewing files that changed from the base of the PR and between 52b8749 and 43271a9.

📒 Files selected for processing (1)

rust/crates/ccusage/src/pricing.rs

🚧 Files skipped from review as they are similar to previous changes (1)

rust/crates/ccusage/src/pricing.rs

📝 Walkthrough


Zero the shared GLM pricing glm_base.cache_create, remove per-model cache_create so GLM variants inherit zero, treat zai/* names as embedded, and extend tests to assert cache_create == 0.0 for multiple GLM models including zai/glm-4.5.

Changes

GLM Model Pricing Cache-Create Configuration

Layer / File(s)	Summary
GLM base and model pricing configuration `rust/crates/ccusage/src/pricing.rs`	Introduce a glm pricing helper with `glm_base.cache_create = 0.0`; rewire `glm-4.5`/`zai/glm-4.5`, `glm-4.6`, `glm-4.7` to use `glm_base`; remove per-model `cache_create` for `glm-5`, `glm-5-turbo`, `glm-5.1` so they inherit `0.0` while keeping other rates.
Embedded model detection `rust/crates/ccusage/build.rs`	`is_embedded_model` updated to treat model names starting with `zai/` as embedded so `zai/*` embedded pricing entries are preserved.
GLM pricing unit tests `rust/crates/ccusage/src/pricing.rs`	Unit tests extended to assert `cache_create == 0.0` for `glm-5.1`, `glm-5`, `glm-5-turbo`, `glm-4.7`, `glm-4.6`, `glm-4.5`, `zai/glm-4.5`, and added assertions for `zai/glm-4.5-air`, `zai/glm-4.5-x`, `zai/glm-4.5v`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

feat(pricing): add Z.AI GLM model pricing for cost calculation #1201: Addresses the same GLM pricing entries and zai/* embedded classification as discussed in that issue.

Possibly related PRs

ryoppippi/ccusage#1132: Related changes to embedded pricing pipeline and build-time pricing input handling.
ryoppippi/ccusage#657: Prior edits affecting GLM pricing and zai/glm-4.5 handling and tests.

Poem

🐰
I nibbled numbers, soft and neat,
Zeroed cache costs — such a treat.
Base now shares with every model,
Tests hop in to check and coddle.
Hooray — GLM hops light and fleet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(pricing): do not charge GLM cache creation' clearly and concisely summarizes the main change: updating GLM pricing to set cache_create to 0.0 to match Z.AI's published pricing where cached input storage is free.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


ryoppippi · 2026-06-08T21:35:32Z

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-08T21:35:40Z


@ryoppippi I have started the AI code review. It will take a few minutes to complete.

cloudflare-workers-and-pages · 2026-06-08T21:35:50Z

Deploying with Cloudflare Workers


Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	ccusage-guide	`43271a9`	Commit Preview URL Branch Preview URL	Jun 08 2026, 10:20 PM

coderabbitai · 2026-06-08T21:36:24Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR details and diff
gh pr view 1235 --repo ryoppippi/ccusage --json title,body,files,additions,deletions

echo "--- PR DIFF ---"
gh pr diff 1235 --repo ryoppippi/ccusage

Length of output: 5072

PR Review: `fix(pricing): do not charge GLM cache creation`

Summary: Fixes overstated offline cost calculations for Z.AI GLM models by setting cache_create to 0.0, matching Z.AI's published pricing (cached input storage is free for these models).

Changes Reviewed

rust/crates/ccusage/src/pricing.rs

✅ Fix is correct and minimal.

- Setting cache_create: 0.0 in glm_base propagates the zero cost to all base-tier models (glm-4.7, glm-4.6, glm-4.5), which use glm_base unchanged.
- Removing the explicit cache_create overrides for glm-5, glm-5-turbo, and glm-5.1 is correct: they now inherit 0.0 from glm_base through Rust's struct update syntax (..glm_base).
- cache_read values (e.g. 0.11e-6, 0.2e-6) are deliberately left non-zero, consistent with Z.AI's pricing, where reading from cache incurs a cost but creating or storing it does not.
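The inheritance through struct update syntax mentioned above can be sketched as follows. This is an illustrative reconstruction, not the code in `pricing.rs`: the `ModelPricing` struct, the `glm_base()` and `glm_5()` helper functions, and the input/output rates are assumptions; only the field names `cache_create` and `cache_read` and the two example cache-read values come from the review.

```rust
/// Hypothetical pricing entry; rates are per token in USD.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ModelPricing {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Shared base entry. Cache creation is free; cache reads are not.
fn glm_base() -> ModelPricing {
    ModelPricing {
        input: 0.6e-6,  // assumed value, for illustration only
        output: 2.2e-6, // assumed value, for illustration only
        cache_create: 0.0,
        cache_read: 0.11e-6,
    }
}

/// A variant that overrides some rates but names no `cache_create`,
/// so that field is filled in from `glm_base()` by `..glm_base()`.
fn glm_5() -> ModelPricing {
    ModelPricing {
        input: 1.0e-6,  // assumed value, for illustration only
        output: 3.2e-6, // assumed value, for illustration only
        cache_read: 0.2e-6,
        ..glm_base()
    }
}

fn main() {
    println!("base:  {:?}", glm_base());
    println!("glm-5: {:?}", glm_5());
}
```

Because the variant no longer spells out `cache_create`, a future change to the base rate would reach every variant without touching each entry.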

✅ Tests are well-expanded.

All 6 GLM models now have explicit assert_eq!(*.cache_create, 0.0) assertions. The addition of glm-4.6 and glm-4.5 test cases (both resolving to glm_base values) also implicitly validates that the prefix/fuzzy find() fallback picks up the corrected base pricing.

One Minor Observation

The glm-4.6 and glm-4.5 test entries assert identical values to glm-4.7 (the glm_base), suggesting they're testing the find() fallback/prefix-match path rather than explicitly registered model entries. This is fine as-is, but a brief comment in the test would help future readers understand why the values match (e.g. // glm-4.6 and glm-4.5 fall back to glm_base pricing via prefix match). Not a blocker.

Overall: Clean, focused, and well-tested. LGTM! 🐇

[approve]

cubic-dev-ai

No issues found across 1 file


pullfrog

✅ No new issues found.

Reviewed changes — fixes GLM offline cost calculation by setting cache_create to zero to match the Z.AI pricing page, which lists Cached Input Storage as "Limited-time Free" for all GLM text models.

- Set glm_base.cache_create to 0.0: matches the upstream Z.AI pricing table, where all GLM text models show "Limited-time Free" for Cached Input Storage.
- Removed per-model cache_create overrides from glm-5, glm-5-turbo, and glm-5.1: they now inherit 0.0 from glm_base via struct update syntax.
- Added glm-4.5 and glm-4.6 entries using glm_base pricing, consistent with the other 4.x models.
- Expanded tests to assert cache_create == 0.0 across all six GLM models.


pkg-pr-new · 2026-06-08T21:46:15Z


ccusage

npx https://pkg.pr.new/ccusage@1235

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1235

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1235

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1235

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1235

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1235

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1235

commit: 43271a9

github-actions · 2026-06-08T21:50:31Z

ccusage performance comparison

PR SHA: 400e822d877f
Base SHA: 00b186588a64

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`00b186588a64`	660.8ms	550.6ms	33.3ms	3
PR pkg.pr.new	`400e822`	717.7ms	636.4ms	34.3ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 400e822. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	585.1ms	576.0ms	1.02x	342.58 MiB	295.95 MiB	0.86x	1.72 GiB/s	1.75 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	378.8ms	383.8ms	0.99x	78.83 MiB	77.33 MiB	0.98x	2.66 GiB/s	2.62 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	570.9ms	1.76 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	550.8ms	1.83 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	377.8ms	2.66 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	346.1ms	2.91 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.7ms	30.5ms	1.01x	43.48 MiB	43.73 MiB	1.01x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	30.7ms	30.5ms	1.00x	43.61 MiB	-	-	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	30.0ms	30.3ms	0.99x	43.61 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	30.2ms	30.6ms	0.99x	-	-	-	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	561.8ms	563.5ms	1.00x	324.45 MiB	317.08 MiB	0.98x	1.79 GiB/s	1.79 GiB/s
`codex --offline --json`	1.01 GiB	381.5ms	370.5ms	1.03x	81.08 MiB	79.20 MiB	0.98x	2.64 GiB/s	2.72 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	16.83 KiB	16.83 KiB	-0.00 KiB	1.00x
installed native package binary	3353.62 KiB	3353.62 KiB	+0.00 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-08T21:50:39Z

ccusage performance comparison

PR SHA: 400e822d877f
Base SHA: 00b186588a64

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup


Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`00b186588a64`	1.856s	1.031s	33.0ms	3
PR pkg.pr.new	`400e822`	967.4ms	835.1ms	32.9ms	3

Cached bunx execution performance


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 400e822. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	569.9ms	569.6ms	1.00x	320.83 MiB	319.08 MiB	0.99x	1.77 GiB/s	1.77 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	387.2ms	378.7ms	1.02x	73.33 MiB	79.33 MiB	1.08x	2.60 GiB/s	2.66 GiB/s

Package runtime diagnostics


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	582.4ms	1.73 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	547.8ms	1.84 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	373.7ms	2.69 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	343.7ms	2.93 GiB/s	1

Committed fixture performance


Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	31.4ms	4.3ms	7.29x	43.48 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.36 MiB/s
`claude session --offline --json`	31.0ms	4.3ms	7.24x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.36 MiB/s
`codex daily --offline --json`	30.8ms	4.1ms	7.60x	43.73 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.21 MiB/s
`codex session --offline --json`	30.7ms	4.0ms	7.67x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.21 MiB/s

Large real-world-shaped fixture performance


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	564.8ms	553.7ms	1.02x	320.95 MiB	332.45 MiB	1.04x	1.78 GiB/s	1.82 GiB/s
`codex --offline --json`	1.01 GiB	373.6ms	345.3ms	1.08x	82.08 MiB	76.45 MiB	0.93x	2.70 GiB/s	2.92 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	16.83 KiB	16.83 KiB	-0.00 KiB	1.00x
installed native package binary	3353.62 KiB	3353.62 KiB	+0.00 KiB	1.00x


ryoppippi · 2026-06-08T21:54:53Z

Source for the cache creation/storage change: Z.AI official pricing lists the text-model column "Cached Input Storage" as "Limited-time Free" for the GLM models touched here: GLM-5.1, GLM-5, GLM-5-Turbo, GLM-4.7, GLM-4.6, and GLM-4.5.

https://docs.z.ai/guides/overview/pricing

That is why this PR sets the built-in fallback `cache_create` rate to `0.0` while preserving the non-zero `Cached Input` / cache-read rates.

Include direct Z.AI provider entries from the pinned LiteLLM pricing snapshot so offline pricing can reuse upstream-maintained GLM metadata such as context limits instead of relying only on hand-written fallback entries. Keep a built-in override for zai/glm-4.5 because the pinned LiteLLM entry currently lacks prompt-cache pricing, while Z.AI official pricing lists cached input storage as free and cached input reads at the GLM 4.5 rate.

ryoppippi · 2026-06-08T21:59:55Z

Follow-up change: instead of relying only on handwritten GLM fallback pricing, this now embeds direct Z.AI provider entries from the pinned LiteLLM snapshot by allowing the `zai/` prefix in `build.rs`.

The built-in override remains only where the embedded source is currently insufficient: `zai/glm-4.5` lacks prompt-cache pricing in the pinned LiteLLM data, and `glm-5.1` / `glm-5-turbo` are still covered by fallback aliases when they are absent from the pinned direct Z.AI entries.

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-08T22:00:04Z


@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai · 2026-06-08T22:01:05Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh pr diff 1235 --repo ryoppippi/ccusage

Length of output: 4148

Follow-up Review: `zai/` prefix embedding + `zai/glm-4.5` override

The follow-up is clean and well-reasoned. Here's the breakdown:

`build.rs` — `is_embedded_model()`

|| model.starts_with("zai/")

✅ Minimal, scoped change. zai/ entries from the pinned LiteLLM snapshot will now be embedded directly, replacing the need for purely handwritten fallbacks where the data exists.
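As a rough sketch of what such a build-time filter might look like: only the `zai/` arm is confirmed by the diff excerpt above. The other prefixes and the `embedded_names` helper are placeholders invented for illustration and may not match the real `build.rs`.

```rust
/// Decide whether a model name from the pricing snapshot should be embedded
/// into the binary. Only the `zai/` arm is taken from the PR; the other
/// prefixes are hypothetical placeholders.
fn is_embedded_model(model: &str) -> bool {
    model.starts_with("claude-")
        || model.starts_with("gpt-")
        || model.starts_with("zai/")
}

/// Hypothetical helper: keep only the snapshot names that pass the filter.
fn embedded_names<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| is_embedded_model(name))
        .collect()
}

fn main() {
    let snapshot = ["zai/glm-5", "zai/glm-4.5-air", "some-vendor/other-model"];
    println!("{:?}", embedded_names(&snapshot));
}
```

A prefix check keeps the filter cheap and means new `zai/*` entries in a refreshed snapshot are picked up without further code changes.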

`pricing.rs` — `zai/glm-4.5` explicit override

self.entries.insert("zai/glm-4.5".to_string(), glm_base);

✅ Correct gap-fill. Since the LiteLLM snapshot lacks prompt-cache pricing for zai/glm-4.5, the handwritten entry (using glm_base, which now has cache_create: 0.0 and a non-zero cache_read) is the right fallback. This ensures the zai/-prefixed form gets the same cache-read-capable pricing as the bare glm-4.5 alias.

Tests

✅ The context_limit("zai/glm-5") == Some(200_000) assertion is a particularly good addition — it implicitly validates that the build.rs change is actually embedding the LiteLLM snapshot entry for zai/glm-5 (not just hitting a manual fallback), since the context limit comes from the snapshot data, not from the pricing.rs handwritten entries.

✅ context_limit("zai/glm-4.5") == Some(128_000) covers the explicit override path, confirming the 128k vs 200k distinction between the 4.x and 5.x model generations.

Overall: The two-layer approach (prefer embedded LiteLLM snapshot data; handwrite only where the snapshot is deficient) is sound and well-executed. LGTM 🐇
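The two-layer idea can be sketched in a few lines, assuming a map keyed by model name. Everything here is hypothetical except the `entries.insert("zai/glm-4.5".to_string(), glm_base)` call quoted earlier in this review: the `ModelPricing` fields, the `PricingTable` methods, the `build_table` function, and the snapshot rows are invented to show the ordering, not to reproduce `pricing.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down pricing entry; the real struct has more rates.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ModelPricing {
    cache_create: f64,
    cache_read: f64,
    context_limit: Option<u64>,
}

struct PricingTable {
    entries: HashMap<String, ModelPricing>,
}

impl PricingTable {
    /// Layer 1: load whatever the embedded snapshot provides.
    fn from_snapshot(snapshot: &[(&str, ModelPricing)]) -> Self {
        let entries = snapshot
            .iter()
            .map(|(name, pricing)| (name.to_string(), *pricing))
            .collect();
        Self { entries }
    }

    /// Layer 2: overwrite only the entries where the snapshot is deficient.
    fn apply_overrides(&mut self, glm_base: ModelPricing) {
        // HashMap::insert replaces any existing value stored under the key.
        self.entries.insert("zai/glm-4.5".to_string(), glm_base);
    }

    fn get(&self, model: &str) -> Option<&ModelPricing> {
        self.entries.get(model)
    }
}

fn build_table() -> PricingTable {
    // Made-up rows standing in for the pinned LiteLLM snapshot.
    let snapshot = [
        ("zai/glm-5", ModelPricing { cache_create: 0.0, cache_read: 0.2e-6, context_limit: Some(200_000) }),
        // Simulates an upstream entry that lacks prompt-cache pricing.
        ("zai/glm-4.5", ModelPricing { cache_create: 0.0, cache_read: 0.0, context_limit: Some(128_000) }),
    ];
    let glm_base = ModelPricing { cache_create: 0.0, cache_read: 0.11e-6, context_limit: Some(128_000) };

    let mut table = PricingTable::from_snapshot(&snapshot);
    table.apply_overrides(glm_base);
    table
}

fn main() {
    let table = build_table();
    println!("zai/glm-4.5 -> {:?}", table.get("zai/glm-4.5"));
    println!("zai/glm-5   -> {:?}", table.get("zai/glm-5"));
}
```

Applying overrides after loading the snapshot keeps the handwritten list short: an override can be deleted once the upstream entry carries the missing fields.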

[approve]

cubic-dev-ai

1 issue found across 2 files


pullfrog

✅ No new issues found.

Reviewed changes — embeds Z.AI zai/* LiteLLM entries and adds a built-in fallback for zai/glm-4.5 so all GLM pricing variants work correctly in offline mode.

- Added zai/ to is_embedded_model in build.rs: the LiteLLM entries for zai/glm-5, zai/glm-4.6, and zai/glm-4.7 all carry cache_creation_input_token_cost: 0 directly, so no derived fallback is needed.
- Added a zai/glm-4.5 built-in fallback with cache_create: 0.0: the LiteLLM entry for zai/glm-4.5 lacks an explicit cache-create field, so the built-in fallback covers it.
- Expanded tests to assert cache_create == 0.0 across all six models, plus zai/glm-4.5 pricing and the zai/glm-5 and zai/glm-4.5 context limits.


github-actions · 2026-06-08T22:15:26Z

ccusage performance comparison

PR SHA: 52b8749689b0
Base SHA: 00b186588a64

This compares the PR package against the configured base package on the same CI runner.

Package runner startup


Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`00b186588a64`	1.220s	1.097s	30.6ms	3
PR pkg.pr.new	`52b8749`	889.9ms	852.2ms	30.2ms	3

Cached bunx execution performance


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 52b8749. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	544.5ms	528.0ms	1.03x	321.33 MiB	346.45 MiB	1.08x	1.85 GiB/s	1.91 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	362.6ms	368.7ms	0.98x	81.45 MiB	81.58 MiB	1.00x	2.78 GiB/s	2.73 GiB/s

Package runtime diagnostics


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	541.0ms	1.86 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	514.8ms	1.96 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	367.9ms	2.74 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	344.4ms	2.92 GiB/s	1

Committed fixture performance


Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	28.2ms	28.3ms	1.00x	43.86 MiB	43.73 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	28.5ms	28.7ms	0.99x	43.73 MiB	43.61 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	28.3ms	27.9ms	1.01x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	27.9ms	28.3ms	0.99x	43.73 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	547.5ms	540.0ms	1.01x	322.08 MiB	326.45 MiB	1.01x	1.84 GiB/s	1.86 GiB/s
`codex --offline --json`	1.01 GiB	375.2ms	385.9ms	0.97x	71.95 MiB	82.83 MiB	1.15x	2.68 GiB/s	2.61 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	16.83 KiB	16.83 KiB	+0.00 KiB	1.00x
installed native package binary	3353.62 KiB	3353.74 KiB	+0.13 KiB	1.00x


github-actions · 2026-06-08T22:15:55Z

ccusage performance comparison

PR SHA: 52b8749689b0
Base SHA: 00b186588a64

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup


Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`00b186588a64`	865.1ms	633.7ms	32.9ms	3
PR pkg.pr.new	`52b8749`	861.9ms	711.1ms	31.8ms	3

Cached bunx execution performance


Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 52b8749. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	574.7ms	564.4ms	1.02x	324.08 MiB	310.83 MiB	0.96x	1.75 GiB/s	1.78 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	375.9ms	388.7ms	0.97x	82.20 MiB	72.20 MiB	0.88x	2.68 GiB/s	2.59 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	554.3ms	1.82 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	528.0ms	1.91 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	379.0ms	2.66 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	353.9ms	2.84 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.0ms	4.1ms	7.25x	43.73 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.37 MiB/s
`claude session --offline --json`	30.3ms	4.2ms	7.21x	43.61 MiB	2.83 MiB	0.06x	0.05 MiB/s	0.37 MiB/s
`codex daily --offline --json`	30.6ms	3.9ms	7.94x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.22 MiB/s
`codex session --offline --json`	30.2ms	3.8ms	8.00x	43.48 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	571.8ms	540.2ms	1.06x	343.45 MiB	341.45 MiB	0.99x	1.76 GiB/s	1.86 GiB/s
`codex --offline --json`	1.01 GiB	375.4ms	351.4ms	1.07x	74.95 MiB	79.70 MiB	1.06x	2.68 GiB/s	2.86 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	16.83 KiB	16.83 KiB	+0.00 KiB	1.00x
installed native package binary	3353.62 KiB	3353.74 KiB	+0.13 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

Keep the broader embedded Z.AI LiteLLM snapshot, but override GLM entries whose pinned LiteLLM records currently omit prompt-cache prices. Without these overrides, ccusage would synthesize cache creation at input * 1.25 and overstate offline costs for GLM 4.5 variants. The overrides use the official Z.AI pricing table, preserving zero cache creation/storage and the published cached-input read rates for the affected direct Z.AI GLM models.
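A minimal sketch of the overstatement this commit message describes, assuming a simplified record in which a missing cache-creation rate is filled in as input * 1.25. The `Rates` struct and function names are hypothetical and are not the crate's actual types. The input rate of 0.2e-6 per token is the `zai/glm-4.5-air` value quoted in the review table later in this thread.

```rust
/// Hypothetical per-token rates in USD; `None` means the record omits the field.
#[derive(Clone, Copy)]
struct Rates {
    input: f64,
    cache_create: Option<f64>,
}

/// Generic fallback described above: a missing cache-creation rate
/// is synthesized as input * 1.25.
fn effective_cache_create(rates: Rates) -> f64 {
    rates.cache_create.unwrap_or(rates.input * 1.25)
}

/// Cost charged for cache-creation tokens under the effective rate.
fn cache_create_cost(rates: Rates, cache_create_tokens: u64) -> f64 {
    effective_cache_create(rates) * cache_create_tokens as f64
}

fn main() {
    // Record with no cache-creation price, as in the pinned snapshot gap.
    let missing = Rates { input: 0.2e-6, cache_create: None };
    // Same record with an explicit zero override.
    let overridden = Rates { cache_create: Some(0.0), ..missing };

    let tokens = 1_000_000;
    println!("synthesized: ${:.4}", cache_create_cost(missing, tokens));
    println!("overridden:  ${:.4}", cache_create_cost(overridden, tokens));
}
```

With an explicit zero in place, the fallback never fires, so cache creation tokens contribute nothing to the offline total.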

ryoppippi · 2026-06-08T22:20:26Z

Follow-up pushed for the Cubic finding: GLM entries with missing LiteLLM cache fields now get official Z.AI cache overrides while keeping the embedded `zai/` snapshot. @coderabbitai review @cubic-dev-ai review

coderabbitai · 2026-06-08T22:21:43Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh pr diff 1235 --repo ryoppippi/ccusage

Length of output: 6252

Follow-up Review: GLM cache-field overrides for LiteLLM-snapshot gaps

The change is clean and well-structured. Here's the breakdown:

`pricing.rs` — `glm_pricing` closure

let glm_pricing = |input: f64, output: f64, cache_read: f64| Pricing {
    input,
    output,
    cache_create: 0.0,
    cache_read,
    ...
};

✅ Excellent refactor. Baking cache_create: 0.0 into the closure means every GLM entry built through it gets a zero cache-create rate, so a new entry cannot pick up a non-zero rate by omission. This is a better design than the previous per-model ..glm_base spread.

New `zai/` entries (the "Cubic finding")

Entry	input	output	cache_read	Notes
`zai/glm-4.5-x`	2.2e-6	8.9e-6	0.45e-6	Enhanced variant
`zai/glm-4.5-air`	0.2e-6	1.1e-6	0.03e-6	Efficient variant
`zai/glm-4.5-airx`	1.1e-6	4.5e-6	0.22e-6	Enhanced-air variant
`zai/glm-4.5v`	0.6e-6	1.8e-6	0.11e-6	Vision variant
`zai/glm-4-32b-0414-128k`	0.1e-6	0.1e-6	0.0	Free cache reads
`zai/glm-4.5-flash`	0.0	0.0	0.0	Free tier

✅ All six correctly get cache_create: 0.0 via the closure.

Tests

✅ embedded_pricing_patches_z_ai_glm_entries_without_litellm_cache_rates covers glm-4.5-air, glm-4.5-x, and glm-4.5v with full rate assertions including cache_create == 0.0.

One minor gap: zai/glm-4.5-airx, zai/glm-4-32b-0414-128k, and zai/glm-4.5-flash are not covered by the new test. These are lower-risk (two are zero-cost models, airx follows the same pattern as x/air), so it's not a blocker — but adding assertions for them would complete the picture.
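The missing assertions could look roughly like the sketch below. It uses a hypothetical stand-in `Pricing` struct and a hypothetical `builtin_zai_pricing` helper, not the crate's real types or test harness; the rates are the ones listed in the table above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's pricing record; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Pricing {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Builds a table holding only the three entries the review lists as untested.
fn builtin_zai_pricing() -> HashMap<&'static str, Pricing> {
    // The closure fixes cache_create at 0.0 for every entry built through it.
    let glm_pricing = |input: f64, output: f64, cache_read: f64| Pricing {
        input,
        output,
        cache_create: 0.0,
        cache_read,
    };
    HashMap::from([
        ("zai/glm-4.5-airx", glm_pricing(1.1e-6, 4.5e-6, 0.22e-6)),
        ("zai/glm-4-32b-0414-128k", glm_pricing(0.1e-6, 0.1e-6, 0.0)),
        ("zai/glm-4.5-flash", glm_pricing(0.0, 0.0, 0.0)),
    ])
}

fn main() {
    let table = builtin_zai_pricing();
    for (name, pricing) in &table {
        assert_eq!(pricing.cache_create, 0.0, "{name} must not charge cache creation");
    }
    assert_eq!(table["zai/glm-4.5-airx"].cache_read, 0.22e-6);
    println!("checked {} entries", table.len());
}
```

In the real test suite these checks would presumably sit next to the existing assertions for glm-4.5-air, glm-4.5-x, and glm-4.5v.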

Overall: The glm_pricing closure refactor is a genuine improvement over the previous approach. The gap-filling for LiteLLM-snapshot-deficient entries is correct and well-motivated. LGTM 🐇

[approve]

pullfrog

✅ No new issues found.

Reviewed changes — refactors GLM pricing with a shared closure that bakes cache_create: 0.0, adds six new zai/glm-4.5-* variant entries, and expands test coverage.

- Extracted glm_pricing closure in put_builtin_pricing() — bakes cache_create: 0.0, cache_read_explicit: true, and all _above_200k: None into every GLM variant, preventing per-model cache-create drift.
- Added 6 new zai/glm-4.5-* variant entries — zai/glm-4.5-x, zai/glm-4.5-air, zai/glm-4.5-airx, zai/glm-4.5v, zai/glm-4-32b-0414-128k, zai/glm-4.5-flash, all with cache_create: 0.0. Pricing verified against the Z.AI pricing page.
- Removed explicit per-model cache_create overrides from glm-5, glm-5-turbo, glm-5.1 — they now inherit 0.0 from glm_base via struct update syntax.
- Added embedded_pricing_patches_z_ai_glm_entries_without_litellm_cache_rates test covering zai/glm-4.5-air, zai/glm-4.5-x, zai/glm-4.5v.
- Expanded embedded_pricing_includes_z_ai_glm_models_for_offline_reports with cache_create == 0.0 assertions for all models plus context-limit checks for zai/glm-5 and zai/glm-4.5.

cubic-dev-ai · 2026-06-08T22:28:56Z

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 2 files

github-actions · 2026-06-08T22:35:24Z

ccusage performance comparison

PR SHA: 43271a93d7d9
Base SHA: 00b186588a64

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`00b186588a64`	591.8ms	709.5ms	31.1ms	3
PR pkg.pr.new	`43271a9`	778.8ms	1.133s	32.6ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 43271a9. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	575.1ms	548.8ms	1.05x	313.58 MiB	305.08 MiB	0.97x	1.75 GiB/s	1.83 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	383.9ms	383.0ms	1.00x	73.20 MiB	72.45 MiB	0.99x	2.62 GiB/s	2.63 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	569.6ms	1.77 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	523.7ms	1.92 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	363.7ms	2.77 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	355.2ms	2.83 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.5ms	4.1ms	7.23x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.38 MiB/s
`claude session --offline --json`	29.6ms	4.0ms	7.49x	43.48 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.39 MiB/s
`codex daily --offline --json`	28.7ms	3.7ms	7.84x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s
`codex session --offline --json`	29.1ms	3.7ms	7.85x	43.73 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	549.8ms	508.1ms	1.08x	325.20 MiB	296.58 MiB	0.91x	1.83 GiB/s	1.98 GiB/s
`codex --offline --json`	1.01 GiB	364.2ms	333.3ms	1.09x	80.45 MiB	74.33 MiB	0.92x	2.76 GiB/s	3.02 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	16.83 KiB	16.83 KiB	+0.00 KiB	1.00x
installed native package binary	3353.62 KiB	3353.62 KiB	+0.00 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-08T22:35:25Z

ccusage performance comparison

PR SHA: 43271a93d7d9
Base SHA: 00b186588a64

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`00b186588a64`	589.0ms	672.7ms	33.2ms	3
PR pkg.pr.new	`43271a9`	1.017s	761.3ms	33.0ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: 00b186588a64; PR package: 43271a9. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	574.6ms	574.6ms	1.00x	320.33 MiB	330.58 MiB	1.03x	1.75 GiB/s	1.75 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	378.2ms	381.6ms	0.99x	82.20 MiB	78.08 MiB	0.95x	2.66 GiB/s	2.64 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	588.5ms	1.71 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	531.2ms	1.90 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	369.8ms	2.72 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	343.2ms	2.93 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	31.2ms	30.9ms	1.01x	43.73 MiB	43.61 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	31.0ms	30.9ms	1.00x	43.48 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	30.1ms	30.6ms	0.98x	43.48 MiB	43.73 MiB	1.01x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	31.1ms	30.4ms	1.02x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	566.5ms	550.3ms	1.03x	331.83 MiB	313.45 MiB	0.94x	1.78 GiB/s	1.83 GiB/s
`codex --offline --json`	1.01 GiB	374.1ms	369.6ms	1.01x	82.08 MiB	73.08 MiB	0.89x	2.69 GiB/s	2.72 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	16.83 KiB	16.83 KiB	+0.00 KiB	1.00x
installed native package binary	3353.62 KiB	3353.62 KiB	+0.00 KiB	1.00x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

cubic-dev-ai Bot reviewed Jun 8, 2026

pullfrog Bot approved these changes Jun 8, 2026

cubic-dev-ai Bot reviewed Jun 8, 2026

Comment thread rust/crates/ccusage/build.rs

pullfrog Bot approved these changes Jun 8, 2026

cubic-dev-ai Bot reviewed Jun 8, 2026

ryoppippi merged commit 83816d5 into main Jun 8, 2026
40 checks passed

ryoppippi deleted the codex/fix-glm-cache-create branch June 8, 2026 22:36
