fix(cost): Opus-accurate context ring + cost ledger + stop button #2273

Open
stansalvatec wants to merge 16 commits into pingdotgg:main from stansalvatec:feat/token-cost-meter

@stansalvatec stansalvatec commented Apr 21, 2026

Summary

Five related fixes stacked on the token-cost-meter branch, now merged together:

  1. Stop button stuck active after model done (96768f1) — client optimistically reconciles session.status in thread.message-sent and guards onInterrupt when the latest turn is in a terminal state.
  2. Bot-review cleanups (76a3495) — dead useInvalidateCostSummary removed, duplicate formatUsd re-exported from @t3tools/shared/pricing, no-op ternaries in sanitizePersistedFile simplified.
  3. Cost ledger over-counting (b027c89) — ProviderRuntimeIngestion now gates recordUsage on the presence of any lastXxxTokens field (the canonical "turn-final" signal). Mid-turn Claude snapshots only flow to the context-window activity, not the ledger. Also normalises the model slug before ledger writes so the byModel breakdown stays stable.
  4. Context-window ring over-reporting (step 1) (b027c89) — both adapters redefined usedTokens as input-side only (input + cache-read + cache-creation for Claude, last.inputTokens + last.cachedInputTokens for Codex). totalProcessedTokens keeps its billing-side semantic. ContextWindowSnapshot now carries cacheCreationInputTokens / lastCacheCreationInputTokens.
  5. Opus-accurate context ring (step 2) (d46b444) — the step-1 fix still over-reported on Opus because we fell back to result.usage (which is session-cumulative across every API call, not per-turn), and the task_progress.usage SDK field only exposes an opaque total_tokens. Now we capture input + cache_read + cache_creation from every SDKAssistantMessage.message.usage (Anthropic-native per-call breakdown) and use it as the top-priority usedTokens source, emit mid-turn ring updates on each assistant frame, and drop the session-cumulative fallback entirely.

Migration: existing ledger files are polluted and can't be repaired in-place. CostTrackerLive writes a .schema-v2 sentinel in the usage dir on boot; when absent it wipes the known ledger files (session_*.json, YYYY-MM.json, alltime.json) and writes the sentinel. Stray non-ledger files are left alone. Bumping LEDGER_SCHEMA_VERSION is the single line needed for future reducer-incompatible changes.

Existing threads: context-window.updated activities sit in the orchestration event log per-thread. The ring reads the latest such activity, so existing threads keep their pre-fix (inflated) values until a new turn lands. New chats → correct immediately. Old threads → self-heal on next turn.

Files

  • apps/server/src/orchestration/Layers/ProviderRuntimeIngestion.ts — turn-final filter + model slug normalisation.
  • apps/server/src/provider/Layers/ClaudeAdapter.ts: usedTokens = input-side; per-call capture from SDKAssistantMessage.message.usage; priority for lastApiCallInputSide.
  • apps/server/src/provider/Layers/CodexAdapter.ts: usedTokens = last.inputTokens + last.cachedInputTokens.
  • apps/server/src/cost/Layers/CostTracker.ts — schema sentinel + first-boot wipe.
  • apps/web/src/lib/contextWindow.ts — carry through cacheCreationInputTokens / lastCacheCreationInputTokens.
  • packages/contracts/src/providerRuntime.ts — JSDoc for ThreadTokenUsageSnapshot (two dimensions, turn-final signal).
  • apps/web/src/store.ts + apps/web/src/components/ChatView.tsx — stop-button reconciliation.
  • apps/web/src/lib/costQuery.ts + apps/server/src/cost/Reducer.ts — bot-review cleanups.

Test plan

  • apps/server cost + adapter + ingestion tests — 206/206 pass (+3 new for Opus fix)
  • apps/web — 908/908 pass
  • packages/shared — 126/126 pass
  • Typecheck clean
  • oxlint clean on changed files
  • Manual: send a message, wait for model done, confirm stop button disappears immediately
  • Manual: Opus multi-call turn — confirm context ring tracks real prompt size, not a number that grows with each turn
  • Manual: long-output Claude turn — confirm ring stays within reasonable bounds (excludes output)
  • Manual: send several turns with Claude — confirm turnCount matches user-visible turn count (not 3-10× higher)
  • Manual: confirm first boot writes .schema-v2 sentinel in <T3CODE_HOME>/<state>/usage/ and wipes existing ledger files; second boot leaves them intact

Migration note for users

On first server boot after merging, the usage ledger at <T3CODE_HOME>/<state>/usage/ is wiped (session + month + all-time files) to clear totals polluted by the pre-fix reducer. A .schema-v2 sentinel is written to prevent re-wipes. Month + all-time totals rebuild from subsequent turns; per-thread session files show $0/0 until a new turn lands.

Existing thread rings may keep their old wrong values until the next turn generates a new activity.

🤖 Generated with Claude Code

Note

Add per-turn cost ledger, accurate context-window ring, and stop-button guard for Claude/Codex

  • Adds a CostTrackerLive service that atomically persists per-session, per-month (local timezone), and all-time cost ledgers as JSON files under usageDir, exposed via GET /api/cost/summary.
  • Adds a pricing.ts module in packages/shared with model pricing for Claude and Codex models, computeTurnCost, and formatUsd for UI display.
  • Reworks ClaudeAdapter and CodexAdapter to report usedTokens as input-side tokens only (not including output), emit mid-turn thread.token-usage.updated events from assistant frames, and include per-class deltas (last* fields) on turn completion.
  • Adds a CostMeter ring in the chat composer footer showing session and month-to-date spend with a per-model breakdown popover, invalidating on each context-window.updated activity.
  • ProviderRuntimeIngestion now records costs to CostTracker only for turn-final token-usage events (those with last* fields); mid-turn snapshots are ignored.
  • Fixes ChatView.onInterrupt to skip dispatching a stop command when the latest turn is already in a terminal state (completed, interrupted, or error).
  • Behavioral Change: usedTokens in token-usage snapshots now reflects input-side tokens only and is no longer capped to maxTokens; totalProcessedTokens carries the full cumulative billing total.

Macroscope summarized bd0fc3b.

Olympicx and others added 11 commits April 21, 2026 19:30
Seed rates for Claude (sonnet-4.6, opus-4.6/4.7/4.5, haiku-4.5) and
Codex (gpt-5.4, 5.3-codex, spark, mini) in USD per 1M tokens.
getPricing() resolves via provider aliases with zero-rate fallback.
computeTurnCost() splits input / cached / output / reasoning spend.

Prep for session + MTD cost meter.

localStorage-persisted zustand store at t3code:cost-store:v1.
Pure reducers accumulate token + USD spend per thread (session)
and per YYYY-MM in local tz (month-to-date). sanitize*() guards
garbage payloads; selectors expose session/month buckets and
avg cost per turn. Tests: 17 pass.

useCostTracking hook observes activeThread activities and records
each new context-window.updated event (with lastXxxTokens deltas)
into the cost store. Seeds seen-set on mount / thread switch so
historical activity is not retroactively charged to this month.
Pure processActivitiesForCost reducer is unit-tested; the hook is
a thin ref+effect wrapper. Tests: 9 pass.

CostMeter mirrors ContextWindowMeter's ring + Popover style.
Fill ratio uses VITE_MONTHLY_BUDGET_USD if set, else a compressed
log scale. Popover shows session/MTD totals, budget %, turn count,
avg cost per turn, and per-model breakdown. Turns destructive
color when over budget.

useCostSummary zustand hook reads sessions + months slices and
recomputes summary; cheap enough to recompute per render since
selector is O(models).

Composer wires useCostTracking side-effect + passes summary to
ComposerFooterPrimaryActions next to ContextWindowMeter.

Let dev mode point at the installed app's "userdata" state for
history continuity, and pave the way for a server-side usage/ JSON
store that both dev and prod reuse.

- deriveServerPaths accepts optional stateSubdir; env wins over the
  default (dev/userdata selection via devUrl).
- Adds usageDir (<stateDir>/usage) to derived paths + ensures it
  exists at startup.
- dev-runner: new --state-subdir flag + --use-userdata shortcut;
  forwards to T3CODE_STATE_SUBDIR. Startup logs warn loudly when
  dev is aimed at userdata.
- Tests: dev-runner env matrix (22 pass), cli-config subdir override
  + usageDir derivation (10 pass).

- Add cacheCreationInputTokens + lastCacheCreationInputTokens to
  ThreadTokenUsageSnapshot. Anthropic charges cache-write at 1.25x
  input; reporting it separately lets the cost meter bill correctly.
- Add optional model field to ThreadTokenUsageUpdatedPayload so the
  server-side cost tracker can resolve pricing without a lookup
  against thread state.

Anthropic bills cache-writes at 1.25x input; OpenAI has no separate
write tier. Model a distinct cacheCreationInputPerMTok rate (with
provider-aware defaults) so the cost meter no longer conflates
cache hits, cache writes, and fresh input.

- ModelPricing gains cacheCreationInputPerMTok; Claude auto-applies
  the 1.25x multiplier, OpenAI defaults to inputPerMTok.
- TurnTokenDeltas + TurnCostBreakdown gain cacheCreation slots; zero
  for providers that don't distinguish the tier.
- computeTurnCost bills each class additively.
- Client extractDeltas reads lastCacheCreationInputTokens; helpers +
  fixtures carry the new field through.
- Tests: +2 cases covering Anthropic cache-write premium and the
  OpenAI default.
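The additive billing described in this commit can be sketched as follows. This is a minimal illustration, not the code in `@t3tools/shared/pricing`: the interface shapes, the `resolveCacheCreationRate` and `computeTurnCostSketch` names, and the example rates are all assumptions, and the reasoning-token class mentioned elsewhere in the PR is left out for brevity.

```typescript
// Hypothetical shapes for illustration; the real ModelPricing and
// TurnTokenDeltas in @t3tools/shared/pricing may differ.
type Provider = "claude" | "codex";

interface ModelPricing {
  inputPerMTok: number;
  cachedInputPerMTok: number;
  outputPerMTok: number;
  // Optional explicit cache-write rate; derived from the provider when absent.
  cacheCreationInputPerMTok?: number;
}

interface TurnTokenDeltas {
  inputTokens: number;
  cachedInputTokens: number;
  cacheCreationInputTokens: number;
  outputTokens: number;
}

// Claude gets the 1.25x cache-write premium; other providers default to the input rate.
function resolveCacheCreationRate(provider: Provider, pricing: ModelPricing): number {
  if (pricing.cacheCreationInputPerMTok !== undefined) {
    return pricing.cacheCreationInputPerMTok;
  }
  return provider === "claude" ? pricing.inputPerMTok * 1.25 : pricing.inputPerMTok;
}

// Bill each token class additively. Rates are USD per 1M tokens, result is USD.
function computeTurnCostSketch(
  provider: Provider,
  pricing: ModelPricing,
  deltas: TurnTokenDeltas,
): number {
  const weighted =
    deltas.inputTokens * pricing.inputPerMTok +
    deltas.cachedInputTokens * pricing.cachedInputPerMTok +
    deltas.cacheCreationInputTokens * resolveCacheCreationRate(provider, pricing) +
    deltas.outputTokens * pricing.outputPerMTok;
  return weighted / 1_000_000;
}

// Made-up rates, not real price points.
const examplePricing: ModelPricing = {
  inputPerMTok: 4,
  cachedInputPerMTok: 0.4,
  outputPerMTok: 20,
};

// A turn that only writes one million tokens to the cache.
const cacheWriteOnly: TurnTokenDeltas = {
  inputTokens: 0,
  cachedInputTokens: 0,
  cacheCreationInputTokens: 1_000_000,
  outputTokens: 0,
};
```

With these made-up rates, the same cache-write-only turn costs more under the "claude" provider than under "codex", which is the distinction the separate rate is meant to capture.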
… usage

The Claude adapter lumped cache_read / cache_creation / fresh input
into a single inputTokens field and emitted no per-turn deltas,
leaving the cost meter silently $0 for every Claude turn and
over-charging cached contexts by ~10x when it did fire. It also
clamped usedTokens at maxTokens on cumulative totals, pinning the
context ring at 100% once totalProcessedTokens exceeded the window.

Changes:
- Extract parseClaudeUsageBreakdown: splits SDK usage into four
  tiers (input / cachedInput / cacheCreationInput / output) with an
  explicit totalTokens.
- normalizeClaudeTokenUsage emits all four tiers and drops the
  min(total, max) cap; callers decide how to render overflow.
- Add buildClaudeTurnCompleteUsage: maintains a per-session
  lastTurnCumulativeUsage accumulator, subtracts from each
  result.usage to produce lastInputTokens / lastCachedInputTokens /
  lastCacheCreationInputTokens / lastOutputTokens deltas for the
  cost tracker. usedTokens prefers the task snapshot (real current
  context) over the cumulative total.
- Context state gains lastTurnCumulativeUsage; initialized at
  session start, advanced on each turn-complete emission.
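A rough sketch of the per-session accumulator described above, assuming a simplified four-tier breakdown. `UsageBreakdown`, `subtractUsage`, and `TurnDeltaTracker` are hypothetical names for illustration and are not the adapter's actual types; the clamp at zero stands in for the negative-delta guards mentioned in the tests.

```typescript
// Hypothetical four-tier breakdown; field names are illustrative only.
interface UsageBreakdown {
  input: number;
  cachedInput: number;
  cacheCreationInput: number;
  output: number;
}

const ZERO_USAGE: UsageBreakdown = {
  input: 0,
  cachedInput: 0,
  cacheCreationInput: 0,
  output: 0,
};

// Per-tier difference, clamped at zero so a shrinking cumulative never
// produces a negative delta.
function subtractUsage(current: UsageBreakdown, previous: UsageBreakdown): UsageBreakdown {
  const delta = (now: number, before: number): number => Math.max(0, now - before);
  return {
    input: delta(current.input, previous.input),
    cachedInput: delta(current.cachedInput, previous.cachedInput),
    cacheCreationInput: delta(current.cacheCreationInput, previous.cacheCreationInput),
    output: delta(current.output, previous.output),
  };
}

// Remembers the cumulative usage seen at the previous turn completion.
class TurnDeltaTracker {
  private lastTurnCumulative: UsageBreakdown = ZERO_USAGE;

  // Returns this turn's deltas and advances the accumulator.
  completeTurn(cumulative: UsageBreakdown): UsageBreakdown {
    const turnDelta = subtractUsage(cumulative, this.lastTurnCumulative);
    this.lastTurnCumulative = cumulative;
    return turnDelta;
  }
}
```

On the first turn the delta equals the cumulative total, since the accumulator starts at zero; later turns report only what was added since the previous completion.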

Tests:
- New ClaudeAdapter.usage.test.ts: 10 unit tests cover parseBreakdown
  semantics, first-turn vs second-turn deltas, clamp behaviour,
  task-snapshot fallback, and negative-delta guards.
- ClaudeAdapter.test.ts updated: three existing cases now assert the
  split tiers + uncapped usedTokens (what the SDK actually reports).
- Full server suite: 894 pass.

Introduces a server-owned cost ledger that writes three atomic JSON
files per recorded turn:
  - session_<threadId>.json  per-thread cumulative
  - YYYY-MM.json             month bucket (local tz)
  - alltime.json             running total since install

Works across dev, installed app, and standalone binaries because
persistence lives next to the server's existing SQLite state at
<T3CODE_HOME>/<state>/usage/. Atomic writes mirror serverSettings:
write .tmp, rename into place; errors log and swallow so
orchestration never blocks on FS failure.

Components:
- types.ts: plain-TS interfaces + local-tz month key helper +
  empty-bucket constructors.
- Reducer.ts: pure deriveTurnDeltas / processTurn / isTurnNoOp /
  sanitizePersistedFile. Prefers lastXxxTokens from the payload
  (Codex + post-fix Claude); falls back to delta-vs-lastCumulative
  for older providers. Zero-cost unknown models still record their
  token usage.
- Services/CostTracker.ts: Effect Context.Service API
  (recordUsage / getSummary / updates stream).
- Layers/CostTracker.ts: FS-backed live layer; semaphore-serialized
  writes; PubSub exposes live updates for WS broadcast.
- shared/pricing: re-export ProviderKind so server consumers don't
  reach into contracts for it.

Tests: 14 pure reducer cases + 5 live-layer cases (record, idempotent
no-op, accumulate, stream emission, zero-summary). All green.

Wire the runtime event stream into the new CostTracker and expose
the ledger over HTTP so web + desktop + standalone binaries all
share the same authoritative cost data.

Server (c11 + c12)
- ProviderRuntimeIngestion now calls CostTracker.recordUsage after
  appending the context-window.updated activity. Errors are logged
  and swallowed so orchestration is never blocked by FS faults.
- Model comes from event.payload.model (set by adapters) with a
  fallback to thread.modelSelection.model.
- CostTrackerLive added to the server composition root + wired into
  test + integration layers (stub mock for server.test.ts).
- New GET /api/cost/summary?threadId=X route returns the freshest
  session + month + all-time summary. CORS handled via the existing
  browserApi layer.

Client (c13)
- Drop zustand + localStorage. The old costStore.ts /
  useCostTracking.ts (plus their tests) are gone — server is now
  source of truth.
- New lib/costQuery.ts: react-query queryOptions + sanitizer for
  the HTTP response, plus formatUsd utility. Invalidation helper
  bumps the cache whenever the active thread receives a new
  context-window.updated activity, so the ring updates within one
  render of the server write.
- ChatComposer replaces useCostTracking/useCostSummary with a
  useQuery subscription and a tiny effect that invalidates on new
  usage activities. Plumbs activeProvider through to the meter.
- CostMeter: rebuild around the new {thread, month, allTime}
  shape. Popover now shows session ⋅ MTD ⋅ all-time and gracefully
  renders "—" for providers without token-usage telemetry (cursor /
  opencode) instead of a misleading $0.

Tests: 913 server pass, 906 web pass (26 old localStorage tests
deleted, replaced by server-owned CostTracker coverage from c10).

When the final `thread.message-sent` (streaming:false) arrives, the
client marks `latestTurn.state` as "completed" but leaves
`session.status === "running"` until the separate `thread.session-set`
event (emitted server-side on `turn.completed`) arrives.  In that gap:

- The stop button stays red because visibility is derived from
  `derivePhase(session)` → `"running"` via `session.status`.
- Clicking it dispatches `thread.turn.interrupt`; the server has no
  active turn so the command is a no-op, and the UI stays stuck until
  the late `thread.session-set` lands.

Fix:

- `store.ts` `thread.message-sent` handler: when the final assistant
  message for the currently active turn arrives and `latestTurn`
  resolves to "completed", optimistically flip `session.status` /
  `orchestrationStatus` to "ready" and clear `activeTurnId`.  The
  later server-sent `thread.session-set` overwrites session via
  `mapSession` and is idempotent over this change.  Interrupted and
  errored turns are excluded (checked via `latestTurn.state ===
  "completed"` and the `activeTurnId === event.turnId` guard).

- `ChatView.tsx` `onInterrupt`: defensive guard — if `latestTurn` is
  already in a terminal state (completed / interrupted / error), skip
  the dispatch.  This closes the small window where a click lands
  before React re-renders the composer.
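The two halves of the fix above can be sketched as pure functions. This is a simplified illustration under assumed shapes: `LatestTurn`, `SessionView`, `shouldDispatchInterrupt`, and `reconcileOnFinalMessage` are hypothetical names, and the real `store.ts` and `ChatView.tsx` logic carries more state than shown here.

```typescript
type TurnState = "running" | "completed" | "interrupted" | "error";

// Hypothetical, trimmed-down views of the client state.
interface LatestTurn {
  turnId: string;
  state: TurnState;
}

interface SessionView {
  status: "running" | "ready";
  activeTurnId: string | null;
}

const TERMINAL_STATES: ReadonlySet<TurnState> = new Set<TurnState>([
  "completed",
  "interrupted",
  "error",
]);

// Guard for the stop button: skip the dispatch once the latest turn is terminal.
function shouldDispatchInterrupt(latestTurn: LatestTurn | null): boolean {
  return latestTurn === null || !TERMINAL_STATES.has(latestTurn.state);
}

// Optimistic reconciliation on the final assistant message. Only a completed
// turn that matches the active turn flips the session to "ready".
function reconcileOnFinalMessage(
  session: SessionView,
  latestTurn: LatestTurn,
  eventTurnId: string,
): SessionView {
  const isActiveTurn = session.activeTurnId === eventTurnId;
  if (isActiveTurn && latestTurn.state === "completed") {
    return { status: "ready", activeTurnId: null };
  }
  return session;
}
```

Interrupted, errored, and mismatched turns return the session unchanged, leaving the later server-sent `thread.session-set` as the authority in those cases.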

Tests:

- Updated the existing replay-batch test: after a final assistant
  `message-sent` for the active turn, `session.status` is now "ready"
  and `activeTurnId` is cleared.
- Added a test that a mismatched turnId (active turn ≠ streaming:false
  message turn) does NOT reconcile — the server's session-set remains
  authoritative.
- Added a test that an interrupted turn's final message does NOT
  reconcile session to "ready".

All 908 web tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@github-actions github-actions Bot added the labels vouch:unvouched (PR author is not yet trusted in the VOUCHED list) and size:XXL (1,000+ changed lines, additions + deletions) Apr 21, 2026
Comment thread apps/web/src/lib/costQuery.ts Outdated
Comment thread apps/web/src/lib/costQuery.ts Outdated
Comment thread apps/server/src/cost/Reducer.ts Outdated

macroscopeapp Bot commented Apr 21, 2026

Approvability

Verdict: Needs human review

This PR introduces a significant new cost tracking feature with server-side ledger, pricing calculations, HTTP endpoints, and UI components. Unresolved HIGH severity review comments identify potential bugs in the Codex adapter token calculation (NaN from undefined addition) that could cause incorrect cost reporting. The billing/cost tracking nature of these changes combined with the identified bugs warrant careful human review.

Address Cursor Bugbot + Macroscope findings on pingdotgg#2273:

- apps/server/src/cost/Reducer.ts: drop the no-op ternaries in
  sanitizePersistedFile (`r.version === 1 ? 1 : 1` and
  `r.kind === expectedKind ? expectedKind : expectedKind`).  Both
  always returned the right-hand value regardless of the stored
  value, so they were silently forcing the expected defaults — which
  is actually the intended sanitize-on-mismatch behaviour.  Simplify
  to the constants directly and add a comment explaining the intent.
  (Macroscope, Reducer.ts:325-326.)

- apps/web/src/lib/costQuery.ts: stop duplicating `formatUsd` and
  instead re-export it from `@t3tools/shared/pricing` (the shared
  package was already a workspace dep and owns computeTurnCost next
  to the formatter).  Keeping the re-export so CostMeter and any
  future consumer continue to import from `~/lib/costQuery` as the
  single cost-UI utility module.  (Cursor, duplicated-function.)

- apps/web/src/lib/costQuery.ts: remove the dead
  `useInvalidateCostSummary` hook.  The ChatComposer calls
  `invalidateCostSummary` directly with its own `useQueryClient`, so
  the hook wrapper was unused surface area.  (Cursor, dead-code.)

Verified: web typecheck clean, web tests 908/908 pass, server cost
tests 19/19 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Olympicx

updated the PR

Two independent bugs in the token-usage pipeline, both user-visible
and both rooted in the same conflation between the context-window
dimension (what fills the ring) and the billing dimension (what
lands in the cost ledger).

## 1. Cost ledger over-counting (CRITICAL)

Claude emits `thread.token-usage.updated` events from three places
per turn: every `task_progress`, every `task_notification`, and the
final `completeTurn`. The mid-turn snapshots carry per-API-call
breakdowns *without* `lastXxxTokens` fields, while the turn-complete
snapshot carries cumulative totals *with* `lastXxx` deltas.

`ProviderRuntimeIngestion` fed every one of these events into
`CostTracker.recordUsage`. For the mid-turn events, the Reducer's
`hasExplicitLast=false` branch subtracts the payload's cumulative
against the session's `lastCumulative` — but what gets stored in
`lastCumulative` between mid-turn events is one API call's
breakdown, not the session running total, so the resulting "deltas"
are arbitrary diffs between per-call snapshots. Net effect: cost
over/undercounted unpredictably every turn, and `turnCount`
inflated by 3–10× because every mid-turn snapshot with any positive
delta bumped it.

Fix: gate `recordUsage` in `ProviderRuntimeIngestion` on the
presence of any `lastXxxTokens` field. Mid-turn snapshots still
flow to the `context-window.updated` activity for the ring, they
just skip the ledger. Codex only emits one snapshot per turn (and
always with `lastXxx`) so it's unaffected.

While here, normalise the model slug (`resolveModelSlugForProvider`)
before passing it to the ledger so aliased/canonical variants
collapse to a single `byModel` key.
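The gate described in the fix above amounts to a presence check on the `lastXxxTokens` fields. The following is a minimal sketch under an assumed, trimmed-down snapshot shape; `TokenUsageSnapshot`, `isTurnFinalSnapshot`, and `routeSnapshot` are hypothetical names, not the actual `ProviderRuntimeIngestion` code.

```typescript
// Hypothetical subset of the token-usage snapshot payload.
interface TokenUsageSnapshot {
  usedTokens: number;
  lastInputTokens?: number;
  lastCachedInputTokens?: number;
  lastCacheCreationInputTokens?: number;
  lastOutputTokens?: number;
}

// A snapshot counts as turn-final when any per-turn delta field is present.
function isTurnFinalSnapshot(snapshot: TokenUsageSnapshot): boolean {
  return (
    snapshot.lastInputTokens !== undefined ||
    snapshot.lastCachedInputTokens !== undefined ||
    snapshot.lastCacheCreationInputTokens !== undefined ||
    snapshot.lastOutputTokens !== undefined
  );
}

// Every snapshot updates the ring; only turn-final snapshots reach the ledger.
function routeSnapshot(snapshot: TokenUsageSnapshot): {
  updateRing: boolean;
  recordInLedger: boolean;
} {
  return { updateRing: true, recordInLedger: isTurnFinalSnapshot(snapshot) };
}
```

Note that the check is on presence, not value: a turn-final snapshot whose deltas are all zero still qualifies, and it is then up to the reducer to treat it as a no-op.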

## 2. Context-window ring over-reporting

Both adapters set `usedTokens = totalTokens`, which for the cost
dimension meant *every* billed token including outputs. But the
ring consumes `usedTokens / maxTokens`, and output tokens are
generated *out* of the model — they don't live in the prompt
window, so including them inflated the ring (especially on long-
output turns). Reasoning tokens have the same property (ephemeral,
not persisted into next-turn context).

Fix: redefine `usedTokens` as the input-side total only
(`input + cache-read + cache-creation`), in both
`normalizeClaudeTokenUsage`/`buildClaudeTurnCompleteUsage` and
`normalizeCodexTokenUsage` (`last.inputTokens +
last.cachedInputTokens` — Codex V2 has no cache-creation tier).
`totalProcessedTokens` keeps the original semantic ("tokens
processed so far", billing-side). Added a contract-level JSDoc on
`ThreadTokenUsageSnapshot` that spells out the two dimensions and
the `lastXxxTokens` "turn-final" signal.
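To make the two dimensions concrete, here is a small sketch that computes both from one usage breakdown. The `ClaudeUsage` shape and the helper names are assumptions for illustration, and the token counts are invented rather than taken from the test suite.

```typescript
// Hypothetical per-call usage breakdown.
interface ClaudeUsage {
  inputTokens: number;
  cachedInputTokens: number;
  cacheCreationInputTokens: number;
  outputTokens: number;
}

// Context-window dimension: only tokens that occupy the prompt.
function contextWindowTokens(usage: ClaudeUsage): number {
  return usage.inputTokens + usage.cachedInputTokens + usage.cacheCreationInputTokens;
}

// Billing dimension: everything processed, output included.
function processedTokens(usage: ClaudeUsage): number {
  return contextWindowTokens(usage) + usage.outputTokens;
}

// Invented numbers for a long-output turn.
const longOutputTurn: ClaudeUsage = {
  inputTokens: 1_000,
  cachedInputTokens: 20_000,
  cacheCreationInputTokens: 2_000,
  outputTokens: 5_000,
};

const ringTokens = contextWindowTokens(longOutputTurn);
const billedTokens = processedTokens(longOutputTurn);
```

The gap between `ringTokens` and `billedTokens` is exactly the output count, which is the amount the ring previously over-reported on such a turn.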

Also: the client's `deriveLatestContextWindowSnapshot` was silently
dropping `cacheCreationInputTokens` / `lastCacheCreationInputTokens`
from the `ContextWindowSnapshot` shape even though the payload
carries them. Wire them through.

## 3. Migration

Existing ledger files are polluted and can't be repaired in-place.
Added a `.schema-v2` sentinel in the usage dir: `CostTrackerLive`
boots, sees no sentinel, wipes only the known ledger files
(`session_*.json`, `YYYY-MM.json`, `alltime.json`) — any stray
files are left alone — writes the sentinel, and subsequent boots
skip. Bumping `LEDGER_SCHEMA_VERSION` is the single line needed
for any future reducer-incompatible change.
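The first-boot decision can be sketched as a pure planning step over a directory listing, which keeps it testable without touching the file system. The function names and regular expressions below are assumptions; only the sentinel name pattern and the three ledger file patterns come from the description above.

```typescript
// Bumping this constant changes the sentinel name and re-triggers the wipe.
const LEDGER_SCHEMA_VERSION = 2;

const sentinelName = (version: number): string => `.schema-v${version}`;

// Patterns for the known ledger files: session_*.json, YYYY-MM.json, alltime.json.
const LEDGER_FILE_PATTERNS: readonly RegExp[] = [
  /^session_.+\.json$/,
  /^\d{4}-\d{2}\.json$/,
  /^alltime\.json$/,
];

function isLedgerFile(name: string): boolean {
  return LEDGER_FILE_PATTERNS.some((pattern) => pattern.test(name));
}

interface MigrationPlan {
  filesToWipe: string[];
  writeSentinel: boolean;
}

// Given the file names in the usage dir, decide what the boot step should do.
function planLedgerMigration(existingFiles: readonly string[]): MigrationPlan {
  if (existingFiles.includes(sentinelName(LEDGER_SCHEMA_VERSION))) {
    // Sentinel present: a previous boot already migrated, so touch nothing.
    return { filesToWipe: [], writeSentinel: false };
  }
  // No sentinel: wipe only recognised ledger files and leave strays alone.
  return { filesToWipe: existingFiles.filter(isLedgerFile), writeSentinel: true };
}
```

A caller would then delete `filesToWipe` and create the sentinel file; that I/O step is omitted here.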

## Tests

- Reworked Claude/Codex adapter assertions for the new input-side
  `usedTokens` semantic (24542 → 23863 for the Claude cumulative
  case, 126 → 120 for Codex, etc.); explanatory comments added.
- New ProviderRuntimeIngestion test: mid-turn snapshot (no
  `lastXxx`) projects into the activity stream but does NOT bump
  the ledger; turn-final snapshot records exactly one turn.
- New CostTrackerLive tests: first boot wipes pre-v2 ledger files
  (including a `.json` stray, which survives); subsequent boot
  with sentinel present leaves ledger files intact.
- Existing ingestion tests retargeted at a temp-dir base so the
  first-boot wipe can't touch the developer's real
  `<cwd>/userdata/usage/` directory.

All 203 server tests pass in the changed files; 908 web tests
pass; 126 shared tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stansalvatec stansalvatec changed the title from "fix(web): stop button stays active after model response completes" to "fix(cost): context ring + cost ledger accuracy + stop button" Apr 21, 2026
const usedTokens = inputSideTokens > 0 ? inputSideTokens : usage.last.totalTokens;
if (usedTokens <= 0) {
  return undefined;
}

Missing undefined guard in Codex token usage fallback

Medium Severity

The old code guarded against usage.last.totalTokens being undefined with an explicit usedTokens === undefined || usedTokens <= 0 check. The new code removed the undefined check. When inputSideTokens is 0, usedTokens falls back to usage.last.totalTokens. If that value is undefined, the guard usedTokens <= 0 evaluates to false (because undefined coerces to NaN, and NaN <= 0 is false), so the function proceeds to return a snapshot with usedTokens: undefined instead of returning undefined to signal no valid usage data.
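One way to restore the guard is to treat the fallback as usable only when it is a defined, finite, positive number. This is a sketch of that idea, not the patch that landed; `isUsableTokenCount` and `usedTokensOrUndefined` are hypothetical helpers.

```typescript
// True only for defined, finite, strictly positive counts.
function isUsableTokenCount(value: number | undefined): value is number {
  return value !== undefined && Number.isFinite(value) && value > 0;
}

// Prefer the input-side sum and fall back to the raw total. Returns undefined
// when neither yields a usable count, signalling "no valid usage data".
function usedTokensOrUndefined(
  inputSideTokens: number,
  lastTotalTokens: number | undefined,
): number | undefined {
  const candidate = inputSideTokens > 0 ? inputSideTokens : lastTotalTokens;
  return isUsableTokenCount(candidate) ? candidate : undefined;
}
```

Because the predicate also rejects `NaN`, a non-numeric input-side sum combined with a missing total ends in `undefined` rather than a snapshot with a bogus `usedTokens`.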

Reviewed by Cursor Bugbot for commit b027c89.

Olympicx and others added 2 commits April 22, 2026 01:11
…accuracy

The earlier switch to input-side `usedTokens` still showed inflated
values for Claude Opus (and any multi-call turn) because the two
signals we trusted are both unreliable sources of current context
size:

1. `result.usage` is **session-cumulative** across every API call on
   the thread, not just this turn. Summing its input-side classes
   grows linearly with turn count — exactly what users saw on Opus,
   which makes many API calls per turn.
2. `task_progress.usage` only carries an opaque SDK
   `total_tokens`; the Anthropic-native per-class breakdown
   (`input_tokens` / `cache_read_input_tokens` /
   `cache_creation_input_tokens`) is **not present** on
   `SDKTaskProgressMessage.usage`. Parsing it always falls through
   to `total_tokens`.

The only source that carries the *exact per-call prompt breakdown*
is `SDKAssistantMessage.message.usage` — that's `BetaUsage` from
the Anthropic API, refreshed on every assistant frame.

Fix:

- New `context.lastApiCallInputSideTokens` tracks `input_tokens +
  cache_read_input_tokens + cache_creation_input_tokens` captured
  from each `SDKAssistantMessage.message.usage`. Refreshed per
  frame, cleared after the turn-completion emission so the next
  turn starts clean.
- `handleAssistantMessage` also emits a
  `thread.token-usage.updated` event on each assistant frame with
  this input-side sum as `usedTokens`, so the mid-turn ring tracks
  real prompt size (not the SDK's opaque total).
- `buildClaudeTurnCompleteUsage` now takes an optional
  `lastApiCallInputSide` and uses it as the top-priority
  `usedTokens` source. Priority:
    1. `lastApiCallInputSide` — exact current context.
    2. `taskSnapshot.usedTokens` — SDK opaque (fallback).
    3. Per-turn *delta* input-side — last-ditch when neither
       above is present. The old session-cumulative fallback has
       been removed; it inflated any multi-call turn.
- `lastUsedTokens` mirrors `usedTokens` when the per-turn input-side
  delta is zero, so we never fall back to the session-cumulative sum.
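The priority order listed above can be sketched as a small selection function. The input shape and the `pickUsedTokens` name are assumptions; in particular, this sketch treats a zero at any level as "absent" and falls through, which is a simplification and may not match the adapter's exact handling.

```typescript
// Hypothetical inputs to the turn-complete usedTokens decision.
interface UsedTokensInput {
  // Input-side sum from the latest SDKAssistantMessage.message.usage, if any.
  lastApiCallInputSide?: number;
  // Opaque SDK figure from the task snapshot, if any.
  taskSnapshotUsedTokens?: number;
  // Per-turn input-side delta, used as the last resort.
  perTurnInputSideDelta: number;
}

const isPositive = (value: number | undefined): value is number =>
  value !== undefined && value > 0;

// 1. per-call input side, 2. task snapshot, 3. per-turn delta.
function pickUsedTokens(input: UsedTokensInput): number {
  const { lastApiCallInputSide, taskSnapshotUsedTokens, perTurnInputSideDelta } = input;
  if (isPositive(lastApiCallInputSide)) {
    return lastApiCallInputSide;
  }
  if (isPositive(taskSnapshotUsedTokens)) {
    return taskSnapshotUsedTokens;
  }
  return perTurnInputSideDelta;
}
```

There is deliberately no branch that reads a session-cumulative total, mirroring the removal of that fallback.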

Tests:

- Updated the "preserves oversized result totals after task
  progress" test: `lastUsedTokens` is now `190_000` (mirrors
  `usedTokens`), not `535_000` (the removed cumulative fallback).
- New `prefers lastApiCallInputSide over the task snapshot for
  usedTokens`: when both are present, per-call wins.
- New `does NOT fall back to cumulative input-side for usedTokens`:
  with a real prior cumulative, fallback now returns the per-turn
  delta, not the session-wide sum.
- New adapter-level test verifying an assistant frame with
  Anthropic-native usage emits a `thread.token-usage.updated`
  event with `usedTokens = input + cache_read + cache_creation`.

Important: existing threads retain their pre-fix `usedTokens`
values in stored `context-window.updated` activities until the
next turn generates a new activity. The ring self-heals on the
first new turn; old turns in-history keep their stale numbers.

Verified: 206/206 targeted server tests pass (3 new), 908/908 web
tests pass, typecheck + oxlint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Local rebuild for personal distribution off the
feat/token-cost-meter branch. Keeps the app bundle identifier
(`com.t3tools.t3code`) untouched so existing auto-update channels
aren't disturbed, but changes the user-facing name, dev launcher
label, and artifact filename.

- apps/desktop/package.json: productName → "T3 by Stan".
- apps/desktop/scripts/electron-launcher.mjs: APP_DISPLAY_NAME
  follows the new name (dev / prod variants).
- scripts/build-desktop-artifact.ts: artifactName →
  `T3-by-Stan-${version}-${arch}.${ext}` so the DMG / zip /
  blockmap files land as `release/T3-by-Stan-0.0.21-arm64.dmg` etc.
- apps/{desktop,server,web}/package.json + bun.lock: version bump
  0.0.20 → 0.0.21.

The legacy user-data migration constant in `apps/desktop/src/main.ts`
(`LEGACY_USER_DATA_DIR_NAME = "T3 Code (Alpha)"`) is intentionally
left alone so this build still picks up data from the prior install.

Built macOS arm64 DMG sits at release/T3-by-Stan-0.0.21-arm64.dmg
(136 MB, unsigned / ad-hoc — Gatekeeper first-launch warning
expected). Signing / notarization not configured; would require
Apple Developer credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// size. Fall back to the raw `last.totalTokens` only when the
// breakdown is zero (defensive — shouldn't happen for any real turn).
const inputSideTokens = inputTokens + cachedInputTokens;
const usedTokens = inputSideTokens > 0 ? inputSideTokens : usage.last.totalTokens;

NaN from undefined addition defeats input-side-only fix

High Severity

In normalizeCodexTokenUsage, inputTokens and cachedInputTokens can be undefined (the code itself checks !== undefined when conditionally spreading them into the snapshot a few lines later). Adding undefined + undefined or number + undefined produces NaN, and NaN > 0 is false, so the fallback to usage.last.totalTokens (which includes output + reasoning tokens) silently kicks in — exactly the over-reporting this PR is meant to fix. The values need a nullish coalesce to zero before addition.
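The suggested nullish coalesce can be sketched like this. `CodexLastUsage` is a hypothetical, trimmed-down shape for `usage.last`, and `codexInputSideTokens` is an illustrative helper rather than the adapter's code.

```typescript
// Hypothetical subset of the Codex per-turn usage payload.
interface CodexLastUsage {
  inputTokens?: number;
  cachedInputTokens?: number;
  totalTokens?: number;
}

// Coalesce each class to zero before adding so a missing field cannot
// turn the whole sum into NaN.
function codexInputSideTokens(last: CodexLastUsage): number {
  return (last.inputTokens ?? 0) + (last.cachedInputTokens ?? 0);
}
```

With this in place, a payload that carries only `inputTokens` still produces a positive input-side sum, so the fallback to `totalTokens` is reserved for the case where both classes are genuinely absent or zero.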

Reviewed by Cursor Bugbot for commit 1790ec5.

kind,
key,
bucket: emptyCostBucket(now),
});

Unused emptyBucketFile function is dead code

Low Severity

The emptyBucketFile helper is defined inside the make generator but never called anywhere. The loadFile function already handles missing files via sanitizePersistedFile, which returns an empty bucket when the raw input is undefined. This is dead code that can be removed.

Reviewed by Cursor Bugbot for commit 1790ec5.

Comment on lines +517 to +520
const usedTokens =
input.lastApiCallInputSide !== undefined && input.lastApiCallInputSide > 0
? input.lastApiCallInputSide
: (input.taskSnapshot?.usedTokens ?? deltaUsedFallback);

🟢 Low Layers/ClaudeAdapter.ts:517

Line 520 uses ?? so when input.taskSnapshot.usedTokens is 0, the code keeps that 0 instead of falling through to deltaUsedFallback. The comment on lines 521–524 states the intent is to "never emit 0 for a turn that clearly had activity", but ?? only falls back on undefined/null, not on 0. If the SDK reports usedTokens: 0 while cumulative indicates activity, usedTokens becomes 0, violating the stated intent. Consider using a ternary that checks > 0 instead of ??.

-  const usedTokens =
-    input.lastApiCallInputSide !== undefined && input.lastApiCallInputSide > 0
-      ? input.lastApiCallInputSide
-      : (input.taskSnapshot?.usedTokens ?? deltaUsedFallback);
+  const usedTokens =
+    input.lastApiCallInputSide !== undefined && input.lastApiCallInputSide > 0
+      ? input.lastApiCallInputSide
+      : (input.taskSnapshot?.usedTokens ?? deltaUsedFallback) || deltaUsedFallback;

Evidence trail:
apps/server/src/provider/Layers/ClaudeAdapter.ts lines 505-525 at REVIEWED_COMMIT. Line 520 shows `(input.taskSnapshot?.usedTokens ?? deltaUsedFallback)` using nullish coalescing. Lines 521-524 contain the comment stating "so we never emit 0 for a turn that clearly had activity". Line 505-506 shows `deltaUsedFallback = lastInputSideTokens > 0 ? lastInputSideTokens : cumulative.totalTokens` which would provide a non-zero fallback when cumulative indicates activity.
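
The difference between `??` and an explicit `> 0` check can be sketched as follows. Both helper names are hypothetical and the values are made up for illustration; this is not the adapter's code, only a sketch of the behaviour the comment describes.

```typescript
// Nullish coalescing: falls back only on undefined/null, so a reported 0 is kept.
function pickWithNullish(snapshotUsed: number | undefined, fallback: number): number {
  return snapshotUsed ?? fallback;
}

// Explicit positive check: falls back on undefined and on 0.
function pickPositive(snapshotUsed: number | undefined, fallback: number): number {
  return snapshotUsed !== undefined && snapshotUsed > 0 ? snapshotUsed : fallback;
}

const deltaUsedFallback = 840; // assumed non-zero because cumulative usage shows activity

console.log(pickWithNullish(0, deltaUsedFallback)); // 0
console.log(pickPositive(0, deltaUsedFallback)); // 840
console.log(pickPositive(undefined, deltaUsedFallback)); // 840
console.log(pickPositive(300, deltaUsedFallback)); // 300
```

The `(a ?? b) || b` form in the suggested diff behaves the same way for these inputs; the explicit comparison just states the "positive or fall back" intent directly.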

Rebuilds the personal T3-by-Stan DMG to pick up the per-call
input-side usedTokens fix (d46b444) so the context ring shows
accurate values on Opus + multi-call turns.

No behavioural change beyond version; bun.lock re-synced.

Artifact: release/T3-by-Stan-0.0.22-arm64.dmg (136 MB, unsigned).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stansalvatec stansalvatec changed the title fix(cost): context ring + cost ledger accuracy + stop button fix(cost): Opus-accurate context ring + cost ledger + stop button Apr 21, 2026

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit bd0fc3b.


function sanitizeNumber(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : 0;
}

Duplicate identical functions in same file

Low Severity

sanitizeNumber and finiteNonNeg are identical functions defined in the same file — both accept unknown, check for a finite non-negative number, and return 0 otherwise. One of them can be removed and all call sites pointed at the surviving function.

Reviewed by Cursor Bugbot for commit bd0fc3b.

thread: null,
month: emptyBucket(),
allTime: emptyBucket(),
};

Stale monthKey in shared singleton constant

Low Severity

EMPTY_COST_SUMMARY computes monthKey via monthKeyNow() once at module-load time and reuses it as a frozen constant. If the browser tab stays open across a month boundary, the placeholder and fallback monthKey becomes stale (e.g., shows "2026-03" in April). Converting EMPTY_COST_SUMMARY to a function or computing monthKey lazily would avoid the stale value.
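
One way to avoid the stale value is to turn the constant into a factory so that `monthKey` is computed per call. The sketch below is an assumption-laden illustration: `monthKeyFor`, `emptyCostSummary`, and the reduced `EmptySummary` shape are hypothetical, and the "YYYY-MM" format is inferred from the "2026-03" example in the comment.

```typescript
// Hypothetical reduced shape; the real summary also carries month/allTime buckets.
interface EmptySummary {
  monthKey: string;
  thread: null;
}

// Formats a date as "YYYY-MM" in local time (format assumed from the comment's example).
function monthKeyFor(date: Date): string {
  const month = String(date.getMonth() + 1).padStart(2, "0");
  return `${date.getFullYear()}-${month}`;
}

// Factory instead of a module-level constant: monthKey is recomputed on every call,
// so a tab left open across a month boundary picks up the new month.
function emptyCostSummary(now: Date = new Date()): EmptySummary {
  return { monthKey: monthKeyFor(now), thread: null };
}

console.log(emptyCostSummary(new Date(2026, 2, 31)).monthKey); // "2026-03"
console.log(emptyCostSummary(new Date(2026, 3, 1)).monthKey); // "2026-04"
```

The trade-off is that callers no longer share one object identity, which may matter if the constant is used as a stable placeholder for memoised selectors.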

Reviewed by Cursor Bugbot for commit bd0fc3b.

Labels

size:XXL 1,000+ changed lines (additions + deletions). vouch:unvouched PR author is not yet trusted in the VOUCHED list.

2 participants