
feat(cost): split cache creation pricing by 5m/1h duration#1221

Merged
ryoppippi merged 7 commits into
mainfrom
pullfrog/899-split-cache-creation-pricing
Jun 8, 2026

Conversation

@pullfrog

@pullfrog pullfrog Bot commented Jun 6, 2026

Contributor

Closes #899

Problem

ccusage prices all cache_creation_input_tokens at a single 1.25x base input rate (5-minute cache write), but Claude Code predominantly uses 1-hour caching, which Anthropic prices at 2x base input.

What Claude Code JSONL provides

{
  "usage": {
    "cache_creation_input_tokens": 23566,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 23566
    }
  }
}

Fix

  • Parse cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens from JSONL usage records
  • Price 5m tokens at existing cache_create rate (1.25x), 1h tokens at input * 2.0 (matching Anthropic pricing)
  • Fall back to flat cache_creation_input_tokens when breakdown is absent (older records, non-Claude agents)
  • Fix token count aggregation in 6 locations (total_usage_tokens, TokenCounts::add_usage, usage_token_total, daily_usage_token_total, both accumulator breakdowns)
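The split-and-fall-back rule in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `Usage`, `CacheCreation`, and `Rates` types and both function names are hypothetical stand-ins, rates are per-token values supplied by the caller, and the above-200k tiers are left out for brevity.

```rust
/// Hypothetical stand-in for the `cache_creation` object in a JSONL usage record.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct CacheCreation {
    pub ephemeral_5m_input_tokens: u64,
    pub ephemeral_1h_input_tokens: u64,
}

/// Hypothetical stand-in for the parts of a usage record that matter here.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Usage {
    pub cache_creation_input_tokens: u64,
    pub cache_creation: Option<CacheCreation>,
}

/// Per-token rates; `cache_create` is the existing 5-minute write rate.
#[derive(Clone, Copy, Debug)]
pub struct Rates {
    pub input: f64,
    pub cache_create: f64,
}

/// 1-hour cache writes are priced at 2x base input, per the PR description.
pub const CACHE_1H_RATE_MULTIPLIER: f64 = 2.0;

/// Returns (5m tokens, 1h tokens). Without a breakdown, every cache-write
/// token is treated as a 5-minute write, which matches the old behaviour.
pub fn split_cache_creation(usage: &Usage) -> (u64, u64) {
    match usage.cache_creation {
        Some(c) => (c.ephemeral_5m_input_tokens, c.ephemeral_1h_input_tokens),
        None => (usage.cache_creation_input_tokens, 0),
    }
}

/// Cost of the cache-write tokens only (no tiering in this sketch).
pub fn cache_creation_cost(usage: &Usage, rates: &Rates) -> f64 {
    let (tokens_5m, tokens_1h) = split_cache_creation(usage);
    tokens_5m as f64 * rates.cache_create
        + tokens_1h as f64 * rates.input * CACHE_1H_RATE_MULTIPLIER
}
```

For the sample record shown earlier (0 five-minute tokens, 23566 one-hour tokens), the whole amount is charged at twice the input rate; a record without `cache_creation` is charged entirely at `cache_create`.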

Impact

For users with mixed 5m/1h cache creation (the vast majority of Claude Code users), costs will be ~20-60% higher than previously reported, matching actual billing.
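As a rough check on the size of the change, the two multipliers quoted above (1.25x and 2x) bound how much the cache-write portion of a bill can move. The sketch below is only arithmetic on those two numbers; the function name and the `share_1h` parameter are hypothetical, and the effect on a user's total cost also depends on how much of the bill is cache writes, which this does not model.

```rust
/// Ratio of new to old cache-write cost when a fraction `share_1h`
/// (between 0.0 and 1.0) of the cache creation tokens are 1-hour writes.
/// Both multipliers are relative to the base input rate.
pub fn cache_write_cost_ratio(share_1h: f64) -> f64 {
    const RATE_5M: f64 = 1.25; // old flat rate, still used for 5-minute writes
    const RATE_1H: f64 = 2.0; // new rate for 1-hour writes
    let new_rate = RATE_5M * (1.0 - share_1h) + RATE_1H * share_1h;
    new_rate / RATE_5M
}
```

The ratio works out to 1 + 0.6 * share_1h, so the cache-write portion rises by at most 60%, reached when every cache write uses the 1-hour duration.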

Pullfrog  | View workflow run | via Pullfrog | Using DeepSeek Pro (free via Pullfrog for OSS) | 𝕏


Summary by cubic

Split cache creation pricing into 5-minute and 1-hour durations to match Anthropic prompt caching rates, aligning Claude Code costs with actual billing. Closes #899 by charging 1h cache writes at 2x base input.

  • New Features

    • Add optional usage.cache_creation with ephemeral_5m_input_tokens and ephemeral_1h_input_tokens; parse from Claude Code JSONL and default to None in other adapters.
    • Price 5m tokens at existing cache_create rate (1.25x); price 1h tokens at input * 2.0, including above-200k tiers.
    • Fall back to cache_creation_input_tokens when breakdown is missing (older records, non-Claude agents).
    • Update cost mode guide and formula; add regression tests for duration pricing, fallback, and JSON parsing.
  • Bug Fixes

    • Fix token totals to include the 5m/1h cache creation breakdown by using a single cache_creation_token_count() across summaries, daily reports, and calculators.
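To illustrate what "including above-200k tiers" could mean for the 1-hour rate, here is a hedged sketch. The review mentions a `tiered_cost` helper, but its real signature is not shown in this conversation, so the signature, the threshold constant, and the rule that only tokens beyond the threshold pay the higher rate are all assumptions made for illustration.

```rust
/// Assumed tier boundary, taken from the "above-200k" wording in the summary.
pub const TIER_THRESHOLD_TOKENS: u64 = 200_000;

/// Tokens up to the threshold are charged `base_rate`; any excess is charged
/// `above_rate` when one is defined, otherwise `base_rate` applies throughout.
pub fn tiered_cost(tokens: u64, base_rate: f64, above_rate: Option<f64>) -> f64 {
    match above_rate {
        Some(above) if tokens > TIER_THRESHOLD_TOKENS => {
            let below_cost = TIER_THRESHOLD_TOKENS as f64 * base_rate;
            let excess_cost = (tokens - TIER_THRESHOLD_TOKENS) as f64 * above;
            below_cost + excess_cost
        }
        _ => tokens as f64 * base_rate,
    }
}

/// 1-hour cache-write cost: both tier rates are derived from the matching
/// input rates by applying the 2x multiplier.
pub fn cache_1h_cost(tokens: u64, input: f64, input_above_200k: Option<f64>) -> f64 {
    const CACHE_1H_RATE_MULTIPLIER: f64 = 2.0;
    tiered_cost(
        tokens,
        input * CACHE_1H_RATE_MULTIPLIER,
        input_above_200k.map(|rate| rate * CACHE_1H_RATE_MULTIPLIER),
    )
}
```

Deriving the upper-tier 1-hour rate from the upper-tier input rate keeps the 2x relationship intact in both tiers, which is one plausible reading of the bullet above.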

Written for commit 2d20874. Summary will update on new commits.


pullfrog Bot added 5 commits June 6, 2026 13:49
Claude Code JSONL includes a `cache_creation` breakdown with
`ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens`.
Previously all cache creation tokens were priced at the 5-minute
rate (1.25x base input), but Claude Code predominantly uses 1-hour
caching (2x base input).

- Parse `cache_creation.ephemeral_5m_input_tokens` and
  `ephemeral_1h_input_tokens` from JSONL usage records
- Price 5m tokens at existing cache_create rate, 1h tokens at
  2x base input (matching Anthropic pricing)
- Fall back to flat `cache_creation_input_tokens` when breakdown
  is absent (older records, non-Claude agents)
- Fix token count aggregation to include breakdown tokens

Closes #899
@pullfrog pullfrog Bot requested a review from ryoppippi June 6, 2026 13:54
@pullfrog

pullfrog Bot commented Jun 6, 2026

Contributor Author

@coderabbitai review


@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 6, 2026


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! View logs | ccusage-guide | 2d20874 | Commit Preview URL, Branch Preview URL | Jun 08 2026, 09:13 PM |

@ryoppippi

Owner

@coderabbitai review it

@coderabbitai

coderabbitai Bot commented Jun 6, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR diff
gh pr diff 1221 --repo ryoppippi/ccusage

echo "=== PR FILES CHANGED ==="
gh pr view 1221 --repo ryoppippi/ccusage --json files --jq '.files[].path'

Length of output: 32147


🏁 Script executed:

#!/bin/bash
# Check if TokenUsageRaw derives Copy and if it's serialized anywhere
echo "=== TokenUsageRaw full definition ==="
rg -n "struct TokenUsageRaw" -A 20 rust/crates/ccusage/src/types.rs

echo ""
echo "=== Speed type definition ==="
rg -n "struct Speed|enum Speed" -A 10 rust/crates/ccusage/src/types.rs

echo ""
echo "=== Serialize on TokenUsageRaw ==="
rg -n "Serialize" rust/crates/ccusage/src/types.rs

echo ""
echo "=== Serialize usage on TokenUsageRaw ==="
rg -rn "serde::Serialize|impl Serialize" rust/crates/ccusage/src/

echo ""
echo "=== Claude JSONL parsing of cache_creation field ==="
rg -n "cache_creation" rust/crates/ccusage/src/adapter/claude/ -A 3 -B 3

echo ""
echo "=== CacheCreationRaw test coverage ==="
rg -n "CacheCreationRaw\|ephemeral_5m\|ephemeral_1h" rust/crates/ccusage/src/ -A 3

echo ""
echo "=== cost.rs full function ==="
cat rust/crates/ccusage/src/cost.rs

Length of output: 12468



@ryoppippi: I've reviewed the PR carefully. The core approach is well-designed — leveraging #[serde(default)] on cache_creation: Option<CacheCreationRaw> means Claude Code JSONL gets parsed automatically without explicit adapter code, while every other adapter just sets None. The fallback logic in cost.rs is also correct. Here are the findings:


Review: feat(cost): split cache creation pricing by 5m/1h duration

🟡 Medium — cache_creation_token_count should take &self

rust/crates/ccusage/src/types.rs Line 42

// Current — takes self by value (works only because TokenUsageRaw: Copy)
pub(crate) fn cache_creation_token_count(self) -> u64 {

// Preferred — idiomatic accessor; safe even if Copy is removed later
pub(crate) fn cache_creation_token_count(&self) -> u64 {

This is a read-only accessor that doesn't consume the value. Taking self instead of &self compiles today because TokenUsageRaw: Copy, but it is non-idiomatic: if Copy is ever removed (e.g., when adding a non-Copy field like a String), each call would move the value and any later use of it would fail to compile.
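A small sketch of the borrowing accessor on a type that is deliberately not `Copy`. The struct layout is a simplified guess: the `label` field is hypothetical and only stands in for a future non-Copy field, and the fallback to the flat count when no breakdown is present is an assumption about what the real method returns.

```rust
#[derive(Clone, Copy, Debug, Default)]
pub struct CacheCreationRaw {
    pub ephemeral_5m_input_tokens: u64,
    pub ephemeral_1h_input_tokens: u64,
}

/// Deliberately NOT `Copy`: `label` is a hypothetical non-Copy field.
#[derive(Clone, Debug, Default)]
pub struct TokenUsageRaw {
    pub cache_creation_input_tokens: u64,
    pub cache_creation: Option<CacheCreationRaw>,
    pub label: String,
}

impl TokenUsageRaw {
    /// Borrowing accessor: sum of the 5m/1h breakdown when present,
    /// otherwise the flat cache creation count.
    pub fn cache_creation_token_count(&self) -> u64 {
        match self.cache_creation {
            Some(c) => c.ephemeral_5m_input_tokens + c.ephemeral_1h_input_tokens,
            None => self.cache_creation_input_tokens,
        }
    }
}
```

Because the method takes `&self`, it can be called repeatedly on the same value whether or not the struct is `Copy`; with a by-value `self` receiver on this non-Copy struct, the first call would move the value.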


🟡 Medium — No tests for the new 1h pricing path

The CacheCreationRaw / ephemeral_* tokens have zero test coverage. The existing daily.rs test at Line 567 uses a JSON string without cache_creation, exercising only the fallback. Suggested additions:

  1. A unit test for cache_creation_token_count() with Some(CacheCreationRaw { ephemeral_5m_input_tokens: 100, ephemeral_1h_input_tokens: 200 }) → should return 300.
  2. A calculate_cost_from_tokens test with a cache_creation breakdown to verify the 2× pricing applies to 1h tokens.
  3. A serde round-trip test parsing JSON like {"cache_creation": {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 23566}} to confirm the #[serde(default)] field populates correctly.

🟢 Low — Magic number 2.0 for 1h cache pricing

rust/crates/ccusage/src/cost.rs Line 102

// Current
let cache_create_1h_cost = pricing.input * 2.0;

// Suggested — makes the Anthropic pricing rule self-documenting
const CACHE_1H_RATE_MULTIPLIER: f64 = 2.0; // Anthropic prices 1h cache writes at 2× base input
let cache_create_1h_cost = pricing.input * CACHE_1H_RATE_MULTIPLIER;

🟢 Low — Redundant #[serde(rename)] in CacheCreationRaw

rust/crates/ccusage/src/types.rs Lines 53–56

// Current — rename is a no-op since field name already matches the JSON key
#[serde(rename = "ephemeral_5m_input_tokens", default)]
pub(crate) ephemeral_5m_input_tokens: u64,
#[serde(rename = "ephemeral_1h_input_tokens", default)]
pub(crate) ephemeral_1h_input_tokens: u64,

#[serde(default)] alone is sufficient here:

#[serde(default)]
pub(crate) ephemeral_5m_input_tokens: u64,
#[serde(default)]
pub(crate) ephemeral_1h_input_tokens: u64,

✅ Things done well

  • The serde #[serde(default)] mechanism for auto-parsing the breakdown from Claude JSONL is clean — no adapter-specific parsing code needed.
  • Fallback to (cache_creation_input_tokens, 0) when cache_creation is None is correct and preserves backwards compatibility with older records.
  • tiered_cost is correctly reused for both the 5m and 1h paths, including the above_200k tier.
  • Token count aggregation updated consistently across all 6 sites.
  • CacheCreationRaw: Copy + Default makes struct-update syntax safe throughout the codebase.
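The last point can be illustrated with struct-update syntax. This is a sketch under assumptions: the `input_tokens` and `output_tokens` fields and the `usage_from_other_agent` helper are hypothetical, and it assumes the usage struct itself derives `Default`, which this conversation does not confirm.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct CacheCreationRaw {
    pub ephemeral_5m_input_tokens: u64,
    pub ephemeral_1h_input_tokens: u64,
}

#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct TokenUsageRaw {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_creation: Option<CacheCreationRaw>,
}

/// Hypothetical non-Claude adapter: it never names `cache_creation`,
/// so the field takes its `Default` value, which is `None`.
pub fn usage_from_other_agent(input_tokens: u64, output_tokens: u64) -> TokenUsageRaw {
    TokenUsageRaw {
        input_tokens,
        output_tokens,
        ..TokenUsageRaw::default()
    }
}
```

Constructors written this way keep compiling when a new optional field is added, and records they build take the flat-rate fallback path because the breakdown is absent.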

ryoppippi added 2 commits June 8, 2026 22:05
Add focused regression coverage for Claude Code cache creation duration handling. The tests pin the three important cases from #899: records with 5-minute and 1-hour cache creation breakdowns use separate rates, older records without the breakdown keep the flat cache creation fallback, and serde parses the nested cache_creation usage fields.

Update the cost mode guide so the documented formula matches the new 5m/1h calculation. Also make cache_creation_token_count borrow self and name the 1-hour multiplier so future pricing changes have a single obvious constant to inspect.
…che-creation-pricing

# Conflicts:
#	.github/workflows/pullfrog.yml
@pkg-pr-new

pkg-pr-new Bot commented Jun 8, 2026


ccusage

npx https://pkg.pr.new/ccusage@1221

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1221

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1221

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1221

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1221

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1221

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1221

commit: 2d20874

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 2d20874701b3
Base SHA: cf2d7cd93e75

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | cf2d7cd93e75 | 764.5ms | 495.8ms | 31.9ms | 3 |
| PR pkg.pr.new | 2d20874 | 535.7ms | 667.0ms | 32.5ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: cf2d7cd93e75; PR package: 2d20874. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 563.1ms | 557.4ms | 1.01x | 313.45 MiB | 331.20 MiB | 1.06x | 1.79 GiB/s | 1.81 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 376.3ms | 375.8ms | 1.00x | 82.20 MiB | 79.08 MiB | 0.96x | 2.68 GiB/s | 2.68 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 560.6ms | 1.80 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 535.2ms | 1.88 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 366.8ms | 2.74 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 337.3ms | 2.98 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 29.6ms | 29.7ms | 1.00x | 43.73 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 30.5ms | 30.7ms | 0.99x | 43.48 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 30.2ms | 29.6ms | 1.02x | 43.48 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 29.8ms | 30.1ms | 0.99x | - | 43.48 MiB | - | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 558.7ms | 554.4ms | 1.01x | 308.45 MiB | - | - | 1.80 GiB/s | 1.82 GiB/s |
| `codex --offline --json` | 1.01 GiB | 365.1ms | 367.7ms | 0.99x | - | 82.08 MiB | - | 2.76 GiB/s | 2.74 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 14.50 KiB | 14.50 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3289.62 KiB | 3289.62 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
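The derived columns in these tables appear to follow simple ratios. The sketch below is inferred from the reported figures, not taken from the benchmark script, and all function names are hypothetical: "PR vs base" looks like base median divided by PR median, "PR/base RSS" like PR peak divided by base peak, and throughput like input size divided by the median run time.

```rust
/// "PR vs base": base median over PR median, so values above 1 favour the PR.
pub fn speedup(base_median_ms: f64, pr_median_ms: f64) -> f64 {
    base_median_ms / pr_median_ms
}

/// "PR/base RSS": PR peak memory over base peak memory, lower is better.
pub fn rss_ratio(base_peak_mib: f64, pr_peak_mib: f64) -> f64 {
    pr_peak_mib / base_peak_mib
}

/// Throughput in GiB/s from an input size in GiB and a median in milliseconds.
pub fn throughput_gib_per_s(input_gib: f64, median_ms: f64) -> f64 {
    input_gib / (median_ms / 1000.0)
}

/// Formats a ratio the way the report prints it, e.g. "1.01x".
pub fn format_ratio(ratio: f64) -> String {
    format!("{:.2}x", ratio)
}
```

Applied to the cached bunx Claude row (563.1ms base, 557.4ms PR, 313.45 MiB and 331.20 MiB peak RSS), these formulas reproduce the reported 1.01x and 1.06x. Rows with very small medians may not match exactly, presumably because the report divides unrounded values.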

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

ccusage performance comparison

PR SHA: 2d20874701b3
Base SHA: cf2d7cd93e75

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | cf2d7cd93e75 | 929.7ms | 881.2ms | 31.0ms | 3 |
| PR pkg.pr.new | 2d20874 | 713.1ms | 827.1ms | 31.7ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: cf2d7cd93e75; PR package: 2d20874. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `bunx -p <pkg> ccusage claude --offline --json` | 1.01 GiB | 561.6ms | 551.2ms | 1.02x | 331.95 MiB | 322.33 MiB | 0.97x | 1.79 GiB/s | 1.83 GiB/s |
| `bunx -p <pkg> ccusage codex --offline --json` | 1.01 GiB | 370.5ms | 373.1ms | 0.99x | 71.83 MiB | 70.08 MiB | 0.98x | 2.72 GiB/s | 2.70 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | Package wrapper | 1.01 GiB | 578.1ms | 1.74 GiB/s | 1 |
| `claude --offline --json` | Installed native binary | 1.01 GiB | 533.3ms | 1.89 GiB/s | 1 |
| `codex --offline --json` | Package wrapper | 1.01 GiB | 365.1ms | 2.76 GiB/s | 1 |
| `codex --offline --json` | Installed native binary | 1.01 GiB | 337.3ms | 2.98 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude daily --offline --json` | 0.00 MiB | 30.2ms | 4.1ms | 7.45x | 43.73 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.38 MiB/s |
| `claude session --offline --json` | 0.00 MiB | 29.9ms | 4.1ms | 7.28x | 43.48 MiB | 2.83 MiB | 0.07x | 0.05 MiB/s | 0.38 MiB/s |
| `codex daily --offline --json` | 0.00 MiB | 29.2ms | 3.9ms | 7.56x | 43.48 MiB | 2.83 MiB | 0.07x | 0.03 MiB/s | 0.22 MiB/s |
| `codex session --offline --json` | 0.00 MiB | 29.6ms | 3.8ms | 7.89x | 43.61 MiB | 2.83 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `claude --offline --json` | 1.01 GiB | 553.5ms | 542.3ms | 1.02x | - | 342.70 MiB | - | 1.82 GiB/s | 1.86 GiB/s |
| `codex --offline --json` | 1.01 GiB | 367.9ms | 340.3ms | 1.08x | 79.70 MiB | 81.08 MiB | 1.02x | 2.74 GiB/s | 2.96 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 14.50 KiB | 14.50 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3289.62 KiB | 3289.62 KiB | +0.00 KiB | 1.00x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@ryoppippi ryoppippi merged commit 00b1865 into main Jun 8, 2026
38 checks passed
@ryoppippi ryoppippi deleted the pullfrog/899-split-cache-creation-pricing branch June 8, 2026 21:30

Development

Successfully merging this pull request may close these issues.

Cache creation cost underestimated: 1-hour cache writes priced at 5m rate (1.25x) instead of 2x

1 participant