fix(pricing): do not charge GLM cache creation #1235
Z.AI lists cached input storage for these GLM models as free, and the LiteLLM entries that expose the direct Z.AI pricing use zero cache creation cost. The built-in fallback was using the generic input * 1.25 cache creation assumption, which could overstate offline costs for GLM cache creation tokens. Set the built-in GLM cache creation rate to zero and cover all six fallback models in the offline pricing test.
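To make the overstatement concrete, here is a minimal Rust sketch of a linear per-token cost model. Everything in it is hypothetical: `Rates`, `Usage`, `cost_usd`, and `glm_rates` are illustrative names, not ccusage's API. The input, output, and cache-read rates are borrowed from the `zai/glm-4.5-x` row quoted later in this thread, and the usage numbers are made up. The sketch compares the generic `input * 1.25` cache creation assumption with a zero rate.

```rust
/// Per-token USD rates. Struct and field names are hypothetical; they only
/// echo the `cache_create` / cache-read terminology used in this PR.
#[derive(Debug, Clone, Copy)]
struct Rates {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

/// Token counts for one request or one aggregated report row.
#[derive(Debug, Clone, Copy)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    cache_creation_tokens: u64,
    cache_read_tokens: u64,
}

/// Linear cost model: each token class is billed at its own per-token rate.
fn cost_usd(rates: &Rates, usage: &Usage) -> f64 {
    usage.input_tokens as f64 * rates.input
        + usage.output_tokens as f64 * rates.output
        + usage.cache_creation_tokens as f64 * rates.cache_create
        + usage.cache_read_tokens as f64 * rates.cache_read
}

/// Illustrative GLM rates with cache creation either derived generically
/// (input * 1.25) or set to zero, as this PR does for the GLM fallback.
fn glm_rates(generic_cache_create: bool) -> Rates {
    let input = 2.2e-6;
    Rates {
        input,
        output: 8.9e-6,
        cache_create: if generic_cache_create { input * 1.25 } else { 0.0 },
        cache_read: 0.45e-6,
    }
}

fn main() {
    let usage = Usage {
        input_tokens: 10_000,
        output_tokens: 2_000,
        cache_creation_tokens: 100_000,
        cache_read_tokens: 50_000,
    };
    let before = cost_usd(&glm_rates(true), &usage);
    let after = cost_usd(&glm_rates(false), &usage);
    println!("generic fallback: ${before:.4}, zero cache creation: ${after:.4}");
}
```

With 100,000 cache creation tokens, the generic assumption adds 100,000 × 2.75e-6 ≈ $0.275 to the row, even though Z.AI lists cached input storage as free; the non-zero cache-read charge is unaffected.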
No actionable comments were generated in the recent review. 🎉
Walkthrough: Zero the shared GLM pricing
Changes: GLM Model Pricing Cache-Create Configuration
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
@coderabbitai review
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | ccusage-guide | 43271a9 | Commit Preview URL, Branch Preview URL | Jun 08 2026, 10:20 PM |
🏁 Script executed:
```bash
#!/bin/bash
# Get PR details and diff
gh pr view 1235 --repo ryoppippi/ccusage --json title,body,files,additions,deletions
echo "--- PR DIFF ---"
gh pr diff 1235 --repo ryoppippi/ccusage
```
Length of output: 5072

PR Review:
✅ No new issues found.
Reviewed changes: fixes GLM offline cost calculation by setting `cache_create` to zero to match the Z.AI pricing page, which lists Cached Input Storage as "Limited-time Free" for all GLM text models.
- Set `glm_base.cache_create` to `0.0`, matching the upstream Z.AI pricing table where all GLM text models show "Limited-time Free" for Cached Input Storage.
- Removed per-model `cache_create` overrides from `glm-5`, `glm-5-turbo`, and `glm-5.1`; they now inherit `0.0` from `glm_base` via struct update syntax.
- Added `glm-4.5` and `glm-4.6` entries using `glm_base` pricing, consistent with the other 4.x models.
- Expanded tests to assert `cache_create == 0.0` across all six GLM models.
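The struct-update pattern described in this review can be sketched roughly as follows. This assumes a simplified `ModelPricing` struct; its fields and all numeric rates are placeholders rather than ccusage's actual definitions or Z.AI's prices. The point is that `..glm_base` copies `cache_create: 0.0` into every entry that does not override it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the built-in pricing struct; only the
/// `cache_create` name and the `glm_base` idea come from the review above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ModelPricing {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
}

fn builtin_glm_pricing() -> HashMap<&'static str, ModelPricing> {
    // Shared base with free cache creation. The numeric rates are
    // placeholders, not Z.AI's published prices.
    let glm_base = ModelPricing {
        input: 1.0e-6,
        output: 2.0e-6,
        cache_create: 0.0,
        cache_read: 0.2e-6,
    };

    let mut table = HashMap::new();
    for name in ["glm-4.5", "glm-4.6", "glm-4.7"] {
        table.insert(name, glm_base);
    }
    // Newer models override only the rates that differ. Struct update syntax
    // (`..glm_base`) copies the remaining fields, so `cache_create: 0.0` is
    // inherited instead of being repeated per model.
    table.insert("glm-5", ModelPricing { input: 1.5e-6, output: 3.0e-6, ..glm_base });
    table.insert("glm-5-turbo", ModelPricing { input: 2.0e-6, output: 4.0e-6, ..glm_base });
    table.insert("glm-5.1", ModelPricing { input: 2.5e-6, output: 5.0e-6, ..glm_base });
    table
}

fn main() {
    let table = builtin_glm_pricing();
    let mut names: Vec<&str> = table.keys().copied().collect();
    names.sort();
    for name in names {
        println!("{name}: cache_create = {}", table[name].cache_create);
    }
}
```

Because `ModelPricing` is `Copy`, `glm_base` can be reused after each struct update without cloning, and a future change to the base rate of cache creation propagates to every entry at once.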
DeepSeek Pro (free via Pullfrog for OSS) | 𝕏
ccusage
@ccusage/ccusage-darwin-arm64
@ccusage/ccusage-darwin-x64
@ccusage/ccusage-linux-arm64
@ccusage/ccusage-linux-x64
@ccusage/ccusage-win32-arm64
@ccusage/ccusage-win32-x64
commit:
ccusage performance comparison
PR SHA:
This compares the PR package against the configured base package on the same CI runner.
Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one
Cached bunx execution performance
Runs the same large fixture through
Fixtures: Claude
Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.
Fixtures: Claude
Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.
Fixtures: Claude
Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.
Fixtures: Claude
Artifact size
Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
ccusage performance comparison
PR SHA:
This compares the Rust PR release binary against the configured base package on the same CI runner.
Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one
Cached bunx execution performance
Runs the same large fixture through
Fixtures: Claude
Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.
Fixtures: Claude
Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.
Fixtures: Claude
Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.
Fixtures: Claude
Artifact size
Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
Source for the cache creation/storage change: Z.AI official pricing lists the text-model column "Cached Input Storage" as "Limited-time Free" for the GLM models touched here: GLM-5.1, GLM-5, GLM-5-Turbo, GLM-4.7, GLM-4.6, and GLM-4.5 (https://docs.z.ai/guides/overview/pricing). That is why this PR sets the built-in fallback `cache_create` rate to `0.0` while preserving the non-zero `Cached Input` / cache-read rates.
Include direct Z.AI provider entries from the pinned LiteLLM pricing snapshot so offline pricing can reuse upstream-maintained GLM metadata such as context limits instead of relying only on hand-written fallback entries. Keep a built-in override for zai/glm-4.5 because the pinned LiteLLM entry currently lacks prompt-cache pricing, while Z.AI official pricing lists cached input storage as free and cached input reads at the GLM 4.5 rate.
Follow-up change: instead of relying only on handwritten GLM fallback pricing, this now embeds direct Z.AI provider entries from the pinned LiteLLM snapshot by allowing the `zai/` prefix in `build.rs`. The built-in override remains only where the embedded source is currently insufficient: `zai/glm-4.5` lacks prompt-cache pricing in the pinned LiteLLM data, and `glm-5.1` / `glm-5-turbo` are still covered by fallback aliases when they are absent from the pinned direct Z.AI entries.
@coderabbitai review
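A build-time prefix filter of the kind described here might look like the sketch below. Only the `zai/` prefix and the `is_embedded_model` name come from this thread; the `claude-` prefix, the `embedded_keys` helper, and the example keys are hypothetical placeholders, not the contents of ccusage's `build.rs` or of the pinned LiteLLM snapshot.

```rust
/// Hypothetical prefix list. Only `zai/` is taken from this PR; `claude-`
/// stands in for whatever prefixes the real build script already embedded.
const EMBEDDED_PREFIXES: &[&str] = &["claude-", "zai/"];

/// Returns true when a LiteLLM key should be compiled into the binary.
fn is_embedded_model(key: &str) -> bool {
    EMBEDDED_PREFIXES.iter().any(|prefix| key.starts_with(*prefix))
}

/// Keep only the snapshot keys that pass the prefix filter, preserving order.
fn embedded_keys<'a>(snapshot_keys: &[&'a str]) -> Vec<&'a str> {
    snapshot_keys
        .iter()
        .copied()
        .filter(|key| is_embedded_model(key))
        .collect()
}

fn main() {
    // Illustrative keys; not a copy of the pinned LiteLLM snapshot.
    let keys = ["zai/glm-4.5", "zai/glm-5", "some-proxy/glm-5", "claude-example"];
    println!("{:?}", embedded_keys(&keys));
}
```

A prefix check keeps the embedded set narrow: direct Z.AI entries (and their upstream context limits) are kept, while keys routed through other providers are dropped unless their prefix is listed.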
🏁 Script executed:
```bash
#!/bin/bash
gh pr diff 1235 --repo ryoppippi/ccusage
```
Length of output: 4148

Follow-up Review:
1 issue found across 2 files
✅ No new issues found.
Reviewed changes: embeds Z.AI `zai/*` LiteLLM entries and adds a built-in fallback for `zai/glm-4.5` so all GLM pricing variants work correctly in offline mode.
- Added `zai/` to `is_embedded_model` in `build.rs`. The LiteLLM entries for `zai/glm-5`, `zai/glm-4.6`, and `zai/glm-4.7` all carry `cache_creation_input_token_cost: 0` directly, so no derived fallback is needed.
- Added a `zai/glm-4.5` built-in fallback with `cache_create: 0.0`. The LiteLLM entry for `zai/glm-4.5` lacks an explicit cache-create field, so the built-in fallback covers it.
- Expanded tests to assert `cache_create == 0.0` across all six models, plus `zai/glm-4.5` pricing and the `zai/glm-5` and `zai/glm-4.5` context limits.
Keep the broader embedded Z.AI LiteLLM snapshot, but override GLM entries whose pinned LiteLLM records currently omit prompt-cache prices. Without these overrides, ccusage would synthesize cache creation at input * 1.25 and overstate offline costs for GLM 4.5 variants. The overrides use the official Z.AI pricing table, preserving zero cache creation/storage and the published cached-input read rates for the affected direct Z.AI GLM models.
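The precedence described in this commit message can be sketched as a small resolver: a built-in override wins, and otherwise the snapshot record is used with cache creation synthesized as `input * 1.25` when its cache field is missing. The field names match LiteLLM's pricing JSON, but `LiteLlmEntry`, `Resolved`, and `resolve` are hypothetical, and the sketch leaves a missing cache-read rate as `None` because the thread does not say how it is defaulted. The `zai/glm-4.5-air` rates come from the table in the review below; the snapshot record with both cache fields missing is an assumed example.

```rust
use std::collections::HashMap;

/// Subset of a LiteLLM pricing record (field names as in LiteLLM's JSON).
#[derive(Debug, Clone, Copy)]
struct LiteLlmEntry {
    input_cost_per_token: f64,
    output_cost_per_token: f64,
    cache_creation_input_token_cost: Option<f64>,
    cache_read_input_token_cost: Option<f64>,
}

/// Hypothetical resolved rates used for cost calculation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Resolved {
    input: f64,
    output: f64,
    cache_create: f64,
    // Left as Option: how a missing read rate is defaulted is not described here.
    cache_read: Option<f64>,
}

/// Built-in override wins; otherwise use the snapshot and synthesize cache
/// creation as input * 1.25 when the snapshot omits it.
fn resolve(key: &str, entry: &LiteLlmEntry, overrides: &HashMap<&str, Resolved>) -> Resolved {
    if let Some(o) = overrides.get(key) {
        return *o;
    }
    Resolved {
        input: entry.input_cost_per_token,
        output: entry.output_cost_per_token,
        cache_create: entry
            .cache_creation_input_token_cost
            .unwrap_or(entry.input_cost_per_token * 1.25),
        cache_read: entry.cache_read_input_token_cost,
    }
}

fn main() {
    // Assumed snapshot record with both prompt-cache fields missing.
    let air = LiteLlmEntry {
        input_cost_per_token: 0.2e-6,
        output_cost_per_token: 1.1e-6,
        cache_creation_input_token_cost: None,
        cache_read_input_token_cost: None,
    };
    let no_overrides = HashMap::new();
    let mut overrides = HashMap::new();
    overrides.insert(
        "zai/glm-4.5-air",
        Resolved { input: 0.2e-6, output: 1.1e-6, cache_create: 0.0, cache_read: Some(0.03e-6) },
    );
    println!("without override: {:?}", resolve("zai/glm-4.5-air", &air, &no_overrides));
    println!("with override:    {:?}", resolve("zai/glm-4.5-air", &air, &overrides));
}
```

Entries whose snapshot record already carries `cache_creation_input_token_cost: 0` resolve to zero without any override, so only the records with missing fields need patching.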
Follow-up pushed for the Cubic finding: GLM entries with missing LiteLLM cache fields now get official Z.AI cache overrides while keeping the embedded `zai/` snapshot.
@coderabbitai review @cubic-dev-ai review
🏁 Script executed:
```bash
#!/bin/bash
gh pr diff 1235 --repo ryoppippi/ccusage
```
Length of output: 6252

Follow-up Review: GLM cache-field overrides for LiteLLM-snapshot gaps

The change is clean and well-structured. Here's the breakdown:
| Entry | input (USD/token) | output (USD/token) | cache_read (USD/token) | Notes |
|---|---|---|---|---|
| `zai/glm-4.5-x` | 2.2e-6 | 8.9e-6 | 0.45e-6 | Enhanced variant |
| `zai/glm-4.5-air` | 0.2e-6 | 1.1e-6 | 0.03e-6 | Efficient variant |
| `zai/glm-4.5-airx` | 1.1e-6 | 4.5e-6 | 0.22e-6 | Enhanced-air variant |
| `zai/glm-4.5v` | 0.6e-6 | 1.8e-6 | 0.11e-6 | Vision variant |
| `zai/glm-4-32b-0414-128k` | 0.1e-6 | 0.1e-6 | 0.0 | Free cache reads |
| `zai/glm-4.5-flash` | 0.0 | 0.0 | 0.0 | Free tier |
✅ All six correctly get `cache_create: 0.0` via the closure.
Tests
✅ `embedded_pricing_patches_z_ai_glm_entries_without_litellm_cache_rates` covers `glm-4.5-air`, `glm-4.5-x`, and `glm-4.5v` with full rate assertions, including `cache_create == 0.0`.
One minor gap: `zai/glm-4.5-airx`, `zai/glm-4-32b-0414-128k`, and `zai/glm-4.5-flash` are not covered by the new test. These are lower-risk (`glm-4.5-flash` is free, `glm-4-32b-0414-128k` has free cache reads, and `airx` follows the same pattern as `x`/`air`), so it's not a blocker, but adding assertions for them would complete the picture.
Overall: The `glm_pricing` closure refactor is a genuine improvement over the previous approach. The gap-filling for LiteLLM-snapshot-deficient entries is correct and well-motivated. LGTM 🐇
[approve]
✅ No new issues found.
Reviewed changes: refactors GLM pricing with a shared closure that bakes in `cache_create: 0.0`, adds six new `zai/` GLM variant entries, and expands test coverage.
- Extracted a `glm_pricing` closure in `put_builtin_pricing()` that bakes `cache_create: 0.0`, `cache_read_explicit: true`, and all `*_above_200k: None` into every GLM variant, preventing per-model cache-create drift.
- Added 6 new variant entries (`zai/glm-4.5-x`, `zai/glm-4.5-air`, `zai/glm-4.5-airx`, `zai/glm-4.5v`, `zai/glm-4-32b-0414-128k`, `zai/glm-4.5-flash`), all with `cache_create: 0.0`. Pricing verified against the Z.AI pricing page.
- Removed explicit per-model `cache_create` overrides from `glm-5`, `glm-5-turbo`, and `glm-5.1`; they now inherit `0.0` from `glm_base` via struct update syntax.
- Added the `embedded_pricing_patches_z_ai_glm_entries_without_litellm_cache_rates` test covering `zai/glm-4.5-air`, `zai/glm-4.5-x`, and `zai/glm-4.5v`.
- Expanded `embedded_pricing_includes_z_ai_glm_models_for_offline_reports` with `cache_create == 0.0` assertions for all models, plus context-limit checks for `zai/glm-5` and `zai/glm-4.5`.
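A sketch of the shared-closure idea, using the per-token rates from the review table above. The `ModelPricing` fields other than `cache_create` and `cache_read_explicit` are hypothetical, and `zai_glm_overrides` is an illustrative name rather than ccusage's function. Iterating over every returned entry in a test would also cover the `airx`, `glm-4-32b-0414-128k`, and `flash` variants that the CodeRabbit review noted as untested.

```rust
/// Hypothetical built-in pricing entry; only `cache_create`,
/// `cache_read_explicit`, and the `*_above_200k` idea come from the review.
#[derive(Debug, Clone, PartialEq)]
struct ModelPricing {
    input: f64,
    output: f64,
    cache_create: f64,
    cache_read: f64,
    cache_read_explicit: bool,
    input_above_200k: Option<f64>,
    output_above_200k: Option<f64>,
}

fn zai_glm_overrides() -> Vec<(&'static str, ModelPricing)> {
    // One closure fixes the fields that must be identical for every GLM
    // override, so no entry can drift to a non-zero cache_create.
    let glm_pricing = |input: f64, output: f64, cache_read: f64| ModelPricing {
        input,
        output,
        cache_create: 0.0,
        cache_read,
        cache_read_explicit: true,
        input_above_200k: None,
        output_above_200k: None,
    };
    // Per-token rates copied from the review table.
    vec![
        ("zai/glm-4.5-x", glm_pricing(2.2e-6, 8.9e-6, 0.45e-6)),
        ("zai/glm-4.5-air", glm_pricing(0.2e-6, 1.1e-6, 0.03e-6)),
        ("zai/glm-4.5-airx", glm_pricing(1.1e-6, 4.5e-6, 0.22e-6)),
        ("zai/glm-4.5v", glm_pricing(0.6e-6, 1.8e-6, 0.11e-6)),
        ("zai/glm-4-32b-0414-128k", glm_pricing(0.1e-6, 0.1e-6, 0.0)),
        ("zai/glm-4.5-flash", glm_pricing(0.0, 0.0, 0.0)),
    ]
}

fn main() {
    for (name, p) in zai_glm_overrides() {
        println!(
            "{name}: in={} out={} read={} create={}",
            p.input, p.output, p.cache_read, p.cache_create
        );
    }
}
```

Taking only the three varying rates as closure arguments means a new GLM variant cannot be added with a non-zero cache creation rate by accident.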

Fixes the Z.AI GLM built-in fallback pricing added in #1225 so cache creation tokens are not charged in offline cost calculations.
Z.AI lists cached input storage for these GLM models as free, and the corresponding LiteLLM direct Z.AI entries use zero cache creation cost. The previous fallback used the generic cache creation assumption and could overstate GLM costs when cache creation tokens were present.
Testing:
Summary by cubic
Stop charging cache creation for Z.AI GLM models and embed direct `zai/*` entries from `LiteLLM` so offline pricing matches Z.AI rates and context limits. Also patch GLM 4.5 variant gaps so cached-input reads and free cache creation are applied correctly, fixing overstated cost reports.
- Set `cache_create` to 0.0 in fallback pricing (4.5/4.6/4.7/5/5-turbo/5.1).
- … `zai/*` pricing snapshot and override `zai/glm-4.5` and its variants (`-x`, `-air`, `-airx`, `v`, `-4-32b-0414-128k`, `-flash`) with zero cache creation and official cache-read rates; treat `zai/*` as embedded to retain upstream context limits.

Written for commit 43271a9. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Tests