fix(cost): Opus-accurate context ring + cost ledger + stop button #2273
stansalvatec wants to merge 16 commits into pingdotgg:main
Seed rates for Claude (sonnet-4.6, opus-4.6/4.7/4.5, haiku-4.5) and Codex (gpt-5.4, 5.3-codex, spark, mini) in USD per 1M tokens. getPricing() resolves via provider aliases with zero-rate fallback. computeTurnCost() splits input / cached / output / reasoning spend. Prep for session + MTD cost meter.
localStorage-persisted zustand store at t3code:cost-store:v1. Pure reducers accumulate token + USD spend per thread (session) and per YYYY-MM in local tz (month-to-date). sanitize*() guards garbage payloads; selectors expose session/month buckets and avg cost per turn. Tests: 17 pass.
useCostTracking hook observes activeThread activities and records each new context-window.updated event (with lastXxxTokens deltas) into the cost store. Seeds seen-set on mount / thread switch so historical activity is not retroactively charged to this month. Pure processActivitiesForCost reducer is unit-tested; the hook is a thin ref+effect wrapper. Tests: 9 pass.
CostMeter mirrors ContextWindowMeter's ring + Popover style. Fill ratio uses VITE_MONTHLY_BUDGET_USD if set, else a compressed log scale. Popover shows session/MTD totals, budget %, turn count, avg cost per turn, and per-model breakdown. Turns destructive color when over budget. useCostSummary zustand hook reads sessions + months slices and recomputes summary; cheap enough to recompute per render since selector is O(models). Composer wires useCostTracking side-effect + passes summary to ComposerFooterPrimaryActions next to ContextWindowMeter.
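The fill-ratio rule described above could be sketched roughly as follows. The commit only says "a compressed log scale", so the `log1p` formula and the `LOG_SCALE_CAP_USD` constant are assumptions made for illustration, as are the function names.

```typescript
// Hypothetical spend at which the un-budgeted ring reads as full (assumption).
const LOG_SCALE_CAP_USD = 1000;

// Returns a ring fill ratio in [0, 1].
function costFillRatio(spentUsd: number, monthlyBudgetUsd?: number): number {
  const spent = Number.isFinite(spentUsd) && spentUsd > 0 ? spentUsd : 0;
  if (monthlyBudgetUsd !== undefined && monthlyBudgetUsd > 0) {
    // Budget configured: linear fill against the budget, clamped at a full ring.
    return Math.min(spent / monthlyBudgetUsd, 1);
  }
  // No budget: log1p compresses large totals so small spend stays visible.
  return Math.min(Math.log1p(spent) / Math.log1p(LOG_SCALE_CAP_USD), 1);
}

// Drives the "destructive color" state; only meaningful when a budget is set.
function isOverBudget(spentUsd: number, monthlyBudgetUsd?: number): boolean {
  return monthlyBudgetUsd !== undefined && monthlyBudgetUsd > 0 && spentUsd > monthlyBudgetUsd;
}
```

With a budget of 100, a spend of 50 fills half the ring; without a budget the ring grows quickly for the first few dollars and flattens as spend approaches the assumed cap.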
Let dev mode point at the installed app's "userdata" state for history continuity, and pave the way for a server-side usage/ JSON store that both dev and prod reuse.
- deriveServerPaths accepts optional stateSubdir; env wins over the default (dev/userdata selection via devUrl).
- Adds usageDir (<stateDir>/usage) to derived paths + ensures it exists at startup.
- dev-runner: new --state-subdir flag + --use-userdata shortcut; forwards to T3CODE_STATE_SUBDIR. Startup logs warn loudly when dev is aimed at userdata.
- Tests: dev-runner env matrix (22 pass), cli-config subdir override + usageDir derivation (10 pass).
- Add cacheCreationInputTokens + lastCacheCreationInputTokens to ThreadTokenUsageSnapshot. Anthropic charges cache-write at 1.25x input; reporting it separately lets the cost meter bill correctly.
- Add optional model field to ThreadTokenUsageUpdatedPayload so the server-side cost tracker can resolve pricing without a lookup against thread state.
Anthropic bills cache-writes at 1.25x input; OpenAI has no separate write tier. Model a distinct cacheCreationInputPerMTok rate (with provider-aware defaults) so the cost meter no longer conflates cache hits, cache writes, and fresh input.
- ModelPricing gains cacheCreationInputPerMTok; Claude auto-applies the 1.25x multiplier, OpenAI defaults to inputPerMTok.
- TurnTokenDeltas + TurnCostBreakdown gain cacheCreation slots; zero for providers that don't distinguish the tier.
- computeTurnCost bills each class additively.
- Client extractDeltas reads lastCacheCreationInputTokens; helpers + fixtures carry the new field through.
- Tests: +2 cases covering Anthropic cache-write premium and the OpenAI default.
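A minimal sketch of the additive billing described above, assuming simplified shapes for `ModelPricing` and `TurnTokenDeltas`. The `makePricing` helper, the provider labels, and every rate shown are placeholders invented for illustration; they are not the seeded prices from this PR, and the reasoning tier is left out for brevity.

```typescript
type ProviderKind = "claude" | "codex"; // hypothetical labels for this sketch

interface ModelPricing {
  inputPerMTok: number;
  cachedInputPerMTok: number;
  cacheCreationInputPerMTok: number;
  outputPerMTok: number;
}

interface TurnTokenDeltas {
  input: number;
  cachedInput: number;
  cacheCreation: number;
  output: number;
}

// Provider-aware default: Claude cache writes cost 1.25x input, others cost plain input.
function makePricing(
  provider: ProviderKind,
  inputPerMTok: number,
  cachedInputPerMTok: number,
  outputPerMTok: number,
): ModelPricing {
  const cacheCreationInputPerMTok = provider === "claude" ? inputPerMTok * 1.25 : inputPerMTok;
  return { inputPerMTok, cachedInputPerMTok, cacheCreationInputPerMTok, outputPerMTok };
}

// Bills each token class separately, then sums. Rates are USD per 1M tokens.
function computeTurnCost(pricing: ModelPricing, deltas: TurnTokenDeltas) {
  const usd = (ratePerMTok: number, tokens: number) => (ratePerMTok * tokens) / 1_000_000;
  const input = usd(pricing.inputPerMTok, deltas.input);
  const cachedInput = usd(pricing.cachedInputPerMTok, deltas.cachedInput);
  const cacheCreation = usd(pricing.cacheCreationInputPerMTok, deltas.cacheCreation);
  const output = usd(pricing.outputPerMTok, deltas.output);
  return { input, cachedInput, cacheCreation, output, total: input + cachedInput + cacheCreation + output };
}
```

Keeping cache creation as its own slot means a provider without that tier simply reports zero tokens there and the total is unaffected.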
… usage
The Claude adapter lumped cache_read / cache_creation / fresh input into a single inputTokens field and emitted no per-turn deltas, leaving the cost meter silently $0 for every Claude turn and over-charging cached contexts by ~10x when it did fire. It also clamped usedTokens at maxTokens on cumulative totals, pinning the context ring at 100% once totalProcessedTokens exceeded the window.
Changes:
- Extract parseClaudeUsageBreakdown: splits SDK usage into four tiers (input / cachedInput / cacheCreationInput / output) with an explicit totalTokens.
- normalizeClaudeTokenUsage emits all four tiers and drops the min(total, max) cap; callers decide how to render overflow.
- Add buildClaudeTurnCompleteUsage: maintains a per-session lastTurnCumulativeUsage accumulator, subtracts from each result.usage to produce lastInputTokens / lastCachedInputTokens / lastCacheCreationInputTokens / lastOutputTokens deltas for the cost tracker. usedTokens prefers the task snapshot (real current context) over the cumulative total.
- Context state gains lastTurnCumulativeUsage; initialized at session start, advanced on each turn-complete emission.
Tests:
- New ClaudeAdapter.usage.test.ts: 10 unit tests cover parseBreakdown semantics, first-turn vs second-turn deltas, clamp behaviour, task-snapshot fallback, and negative-delta guards.
- ClaudeAdapter.test.ts updated: three existing cases now assert the split tiers + uncapped usedTokens (what the SDK actually reports).
- Full server suite: 894 pass.
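The cumulative-to-delta step can be illustrated with a small sketch. The `UsageTiers` shape and the `diffUsage` / `advanceTurn` names are hypothetical; only the idea (subtract the previous cumulative snapshot, guard against negative deltas, then advance the accumulator) comes from the commit message.

```typescript
interface UsageTiers {
  input: number;
  cachedInput: number;
  cacheCreationInput: number;
  output: number;
}

const ZERO_USAGE: UsageTiers = { input: 0, cachedInput: 0, cacheCreationInput: 0, output: 0 };

// Per-tier difference between two cumulative snapshots; never negative.
function diffUsage(current: UsageTiers, previous: UsageTiers): UsageTiers {
  const delta = (now: number, before: number) => Math.max(0, now - before);
  return {
    input: delta(current.input, previous.input),
    cachedInput: delta(current.cachedInput, previous.cachedInput),
    cacheCreationInput: delta(current.cacheCreationInput, previous.cacheCreationInput),
    output: delta(current.output, previous.output),
  };
}

// Produces this turn's deltas and the accumulator to store for the next turn.
function advanceTurn(previousCumulative: UsageTiers, cumulative: UsageTiers) {
  return { last: diffUsage(cumulative, previousCumulative), nextCumulative: cumulative };
}
```

On the first turn the accumulator is `ZERO_USAGE`, so the deltas equal the cumulative totals; later turns report only what was added since the previous turn-complete emission.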
Introduces a server-owned cost ledger that writes three atomic JSON files per recorded turn:
- session_<threadId>.json: per-thread cumulative
- YYYY-MM.json: month bucket (local tz)
- alltime.json: running total since install
Works across dev, installed app, and standalone binaries because persistence lives next to the server's existing SQLite state at <T3CODE_HOME>/<state>/usage/. Atomic writes mirror serverSettings: write .tmp, rename into place; errors log and swallow so orchestration never blocks on FS failure.
Components:
- types.ts: plain-TS interfaces + local-tz month key helper + empty-bucket constructors.
- Reducer.ts: pure deriveTurnDeltas / processTurn / isTurnNoOp / sanitizePersistedFile. Prefers lastXxxTokens from the payload (Codex + post-fix Claude); falls back to delta-vs-lastCumulative for older providers. Zero-cost unknown models still record their token usage.
- Services/CostTracker.ts: Effect Context.Service API (recordUsage / getSummary / updates stream).
- Layers/CostTracker.ts: FS-backed live layer; semaphore-serialized writes; PubSub exposes live updates for WS broadcast.
- shared/pricing: re-export ProviderKind so server consumers don't reach into contracts for it.
Tests: 14 pure reducer cases + 5 live-layer cases (record, idempotent no-op, accumulate, stream emission, zero-summary). All green.
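The write-then-rename pattern mentioned above can be sketched against a tiny file-system interface. `MinimalFs`, `writeJsonAtomic`, and `createMemoryFs` are hypothetical names for this illustration; the actual layer is built on Effect and is not reproduced here.

```typescript
// Minimal file-system surface so the sketch stays independent of any runtime.
interface MinimalFs {
  writeFile(path: string, data: string): void;
  rename(from: string, to: string): void;
}

// Writes to `<path>.tmp`, then renames into place so readers never observe a
// half-written file. Failures are swallowed and reported as `false`.
function writeJsonAtomic(fs: MinimalFs, path: string, value: unknown): boolean {
  const tmpPath = `${path}.tmp`;
  try {
    fs.writeFile(tmpPath, JSON.stringify(value));
    fs.rename(tmpPath, path);
    return true;
  } catch {
    return false;
  }
}

// In-memory stand-in used only to exercise the sketch.
function createMemoryFs(): MinimalFs & { files: Map<string, string> } {
  const files = new Map<string, string>();
  return {
    files,
    writeFile(path: string, data: string): void {
      files.set(path, data);
    },
    rename(from: string, to: string): void {
      const data = files.get(from);
      if (data === undefined) throw new Error(`missing ${from}`);
      files.delete(from);
      files.set(to, data);
    },
  };
}
```

In a Node process the same interface could be backed by `writeFileSync` and `renameSync` from `node:fs`; a rename within one directory is what makes the swap appear atomic to readers.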
Wire the runtime event stream into the new CostTracker and expose
the ledger over HTTP so web + desktop + standalone binaries all
share the same authoritative cost data.
Server (c11 + c12)
- ProviderRuntimeIngestion now calls CostTracker.recordUsage after
appending the context-window.updated activity. Errors are logged
and swallowed so orchestration is never blocked by FS faults.
- Model comes from event.payload.model (set by adapters) with a
fallback to thread.modelSelection.model.
- CostTrackerLive added to the server composition root + wired into
test + integration layers (stub mock for server.test.ts).
- New GET /api/cost/summary?threadId=X route returns the freshest
session + month + all-time summary. CORS handled via the existing
browserApi layer.
Client (c13)
- Drop zustand + localStorage. The old costStore.ts /
useCostTracking.ts (plus their tests) are gone — server is now
source of truth.
- New lib/costQuery.ts: react-query queryOptions + sanitizer for
the HTTP response, plus formatUsd utility. Invalidation helper
bumps the cache whenever the active thread receives a new
context-window.updated activity, so the ring updates within one
render of the server write.
- ChatComposer replaces useCostTracking/useCostSummary with a
useQuery subscription and a tiny effect that invalidates on new
usage activities. Plumbs activeProvider through to the meter.
- CostMeter: rebuild around the new {thread, month, allTime}
shape. Popover now shows session ⋅ MTD ⋅ all-time and gracefully
renders "—" for providers without token-usage telemetry (cursor /
opencode) instead of a misleading $0.
Tests: 913 server pass, 906 web pass (26 old localStorage tests
deleted, replaced by server-owned CostTracker coverage from c10).
When the final `thread.message-sent` (streaming:false) arrives, the client marks `latestTurn.state` as "completed" but leaves `session.status === "running"` until the separate `thread.session-set` event (emitted server-side on `turn.completed`) arrives. In that gap:
- The stop button stays red because visibility is derived from `derivePhase(session)` → `"running"` via `session.status`.
- Clicking it dispatches `thread.turn.interrupt`; the server has no active turn so the command is a no-op, and the UI stays stuck until the late `thread.session-set` lands.
Fix:
- `store.ts` `thread.message-sent` handler: when the final assistant message for the currently active turn arrives and `latestTurn` resolves to "completed", optimistically flip `session.status` / `orchestrationStatus` to "ready" and clear `activeTurnId`. The later server-sent `thread.session-set` overwrites session via `mapSession` and is idempotent over this change. Interrupted and errored turns are excluded (checked via `latestTurn.state === "completed"` and the `activeTurnId === event.turnId` guard).
- `ChatView.tsx` `onInterrupt`: defensive guard. If `latestTurn` is already in a terminal state (completed / interrupted / error), skip the dispatch. This closes the small window where a click lands before React re-renders the composer.
Tests:
- Updated the existing replay-batch test: after a final assistant `message-sent` for the active turn, `session.status` is now "ready" and `activeTurnId` is cleared.
- Added a test that a mismatched turnId (active turn ≠ streaming:false message turn) does NOT reconcile; the server's session-set remains authoritative.
- Added a test that an interrupted turn's final message does NOT reconcile session to "ready".
All 908 web tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
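The two guards can be modelled as pure functions. This is a simplified sketch: `SessionView`, `MessageSentEvent`, `reconcileSession`, and `shouldDispatchInterrupt` are hypothetical names, and the real store carries more state (for example `orchestrationStatus`) than is shown here.

```typescript
type TurnState = "running" | "completed" | "interrupted" | "error";

interface SessionView {
  status: "running" | "ready";
  activeTurnId: string | null;
}

interface MessageSentEvent {
  turnId: string;
  streaming: boolean;
}

// Optimistically flips the session to "ready" only when the final message
// belongs to the active turn and that turn completed normally.
function reconcileSession(
  session: SessionView,
  latestTurnState: TurnState,
  event: MessageSentEvent,
): SessionView {
  const isFinalForActiveTurn = !event.streaming && session.activeTurnId === event.turnId;
  if (isFinalForActiveTurn && latestTurnState === "completed") {
    return { status: "ready", activeTurnId: null };
  }
  return session; // the later server-sent session-set stays authoritative
}

// Skips the interrupt command once the latest turn is already terminal.
function shouldDispatchInterrupt(latestTurnState: TurnState | undefined): boolean {
  return latestTurnState === undefined || latestTurnState === "running";
}
```

Returning the original session object for every non-matching case keeps interrupted, errored, and mismatched turns untouched, which is what the added tests assert.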
Approvability
Verdict: Needs human review
This PR introduces a significant new cost tracking feature with server-side ledger, pricing calculations, HTTP endpoints, and UI components. Unresolved HIGH severity review comments identify potential bugs in the Codex adapter token calculation (NaN from undefined addition) that could cause incorrect cost reporting. The billing/cost tracking nature of these changes, combined with the identified bugs, warrants careful human review.
Address Cursor Bugbot + Macroscope findings on pingdotgg#2273:
- apps/server/src/cost/Reducer.ts: drop the no-op ternaries in sanitizePersistedFile (`r.version === 1 ? 1 : 1` and `r.kind === expectedKind ? expectedKind : expectedKind`). Both always returned the right-hand value regardless of the stored value, so they were silently forcing the expected defaults, which is actually the intended sanitize-on-mismatch behaviour. Simplify to the constants directly and add a comment explaining the intent. (Macroscope, Reducer.ts:325-326.)
- apps/web/src/lib/costQuery.ts: stop duplicating `formatUsd` and instead re-export it from `@t3tools/shared/pricing` (the shared package was already a workspace dep and owns computeTurnCost next to the formatter). Keeping the re-export so CostMeter and any future consumer continue to import from `~/lib/costQuery` as the single cost-UI utility module. (Cursor, duplicated-function.)
- apps/web/src/lib/costQuery.ts: remove the dead `useInvalidateCostSummary` hook. The ChatComposer calls `invalidateCostSummary` directly with its own `useQueryClient`, so the hook wrapper was unused surface area. (Cursor, dead-code.)
Verified: web typecheck clean, web tests 908/908 pass, server cost tests 19/19 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
updated the PR
Two independent bugs in the token-usage pipeline, both user-visible
and both rooted in the same conflation between the context-window
dimension (what fills the ring) and the billing dimension (what
lands in the cost ledger).
## 1. Cost ledger over-counting (CRITICAL)
Claude emits `thread.token-usage.updated` events from three places
per turn: every `task_progress`, every `task_notification`, and the
final `completeTurn`. The mid-turn snapshots carry per-API-call
breakdowns *without* `lastXxxTokens` fields, while the turn-complete
snapshot carries cumulative totals *with* `lastXxx` deltas.
`ProviderRuntimeIngestion` fed every one of these events into
`CostTracker.recordUsage`. For the mid-turn events, the Reducer's
`hasExplicitLast=false` branch subtracts the payload's cumulative
against the session's `lastCumulative` — but what gets stored in
`lastCumulative` between mid-turn events is one API call's
breakdown, not the session running total, so the resulting "deltas"
are arbitrary diffs between per-call snapshots. Net effect: cost
over/undercounted unpredictably every turn, and `turnCount`
inflated by 3–10× because every mid-turn snapshot with any positive
delta bumped it.
Fix: gate `recordUsage` in `ProviderRuntimeIngestion` on the
presence of any `lastXxxTokens` field. Mid-turn snapshots still
flow to the `context-window.updated` activity for the ring, they
just skip the ledger. Codex only emits one snapshot per turn (and
always with `lastXxx`) so it's unaffected.
While here, normalise the model slug (`resolveModelSlugForProvider`)
before passing it to the ledger so aliased/canonical variants
collapse to a single `byModel` key.
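The turn-final gate described in this section can be sketched as a predicate over the payload. The `TokenUsagePayload` shape is trimmed to the fields named in this PR, and `isTurnFinalSnapshot` / `ingest` are hypothetical names used only for this illustration.

```typescript
interface TokenUsagePayload {
  usedTokens: number;
  lastInputTokens?: number;
  lastCachedInputTokens?: number;
  lastCacheCreationInputTokens?: number;
  lastOutputTokens?: number;
}

// A snapshot counts as turn-final when it carries at least one lastXxxTokens field.
function isTurnFinalSnapshot(payload: TokenUsagePayload): boolean {
  return [
    payload.lastInputTokens,
    payload.lastCachedInputTokens,
    payload.lastCacheCreationInputTokens,
    payload.lastOutputTokens,
  ].some((value) => value !== undefined);
}

// Every snapshot updates the ring; only turn-final snapshots reach the ledger.
function ingest(payloads: TokenUsagePayload[]): { ringUpdates: number; ledgerTurns: number } {
  let ringUpdates = 0;
  let ledgerTurns = 0;
  for (const payload of payloads) {
    ringUpdates += 1;
    if (isTurnFinalSnapshot(payload)) ledgerTurns += 1;
  }
  return { ringUpdates, ledgerTurns };
}
```

Feeding two mid-turn snapshots and one turn-complete snapshot through `ingest` yields three ring updates but a single ledger turn, which is the behaviour the fix is after.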
## 2. Context-window ring over-reporting
Both adapters set `usedTokens = totalTokens`, which for the cost
dimension meant *every* billed token including outputs. But the
ring consumes `usedTokens / maxTokens`, and output tokens are
generated *out* of the model — they don't live in the prompt
window, so including them inflated the ring (especially on long-
output turns). Reasoning tokens have the same property (ephemeral,
not persisted into next-turn context).
Fix: redefine `usedTokens` as the input-side total only
(`input + cache-read + cache-creation`), in both
`normalizeClaudeTokenUsage`/`buildClaudeTurnCompleteUsage` and
`normalizeCodexTokenUsage` (`last.inputTokens +
last.cachedInputTokens` — Codex V2 has no cache-creation tier).
`totalProcessedTokens` keeps the original semantic ("tokens
processed so far", billing-side). Added a contract-level JSDoc on
`ThreadTokenUsageSnapshot` that spells out the two dimensions and
the `lastXxxTokens` "turn-final" signal.
Also: the client's `deriveLatestContextWindowSnapshot` was silently
dropping `cacheCreationInputTokens` / `lastCacheCreationInputTokens`
from the `ContextWindowSnapshot` shape even though the payload
carries them. Wire them through.
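A small sketch of the two dimensions, assuming a flattened `UsageBreakdown` shape with all four classes present. The function names are hypothetical; the split itself (input-side for the ring, everything for billing) follows the description above.

```typescript
interface UsageBreakdown {
  inputTokens: number;
  cachedInputTokens: number;
  cacheCreationInputTokens: number; // stays 0 for providers without this tier
  outputTokens: number;
}

// Context-window dimension: only what occupies the prompt window.
function inputSideTokens(usage: UsageBreakdown): number {
  return usage.inputTokens + usage.cachedInputTokens + usage.cacheCreationInputTokens;
}

// Billing dimension: every processed token, outputs included.
function totalProcessedTokens(usage: UsageBreakdown): number {
  return inputSideTokens(usage) + usage.outputTokens;
}

// Ring fill; intentionally uncapped so callers decide how to render overflow.
function ringRatio(usedTokens: number, maxTokens: number): number {
  return maxTokens > 0 ? usedTokens / maxTokens : 0;
}
```

For a turn with 100 fresh input, 20 cache-read, 5 cache-creation, and 50 output tokens, the ring sees 125 tokens while the billing side sees 175.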
## 3. Migration
Existing ledger files are polluted and can't be repaired in-place.
Added a `.schema-v2` sentinel in the usage dir: `CostTrackerLive`
boots, sees no sentinel, wipes only the known ledger files
(`session_*.json`, `YYYY-MM.json`, `alltime.json`) — any stray
files are left alone — writes the sentinel, and subsequent boots
skip. Bumping `LEDGER_SCHEMA_VERSION` is the single line needed
for any future reducer-incompatible change.
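The sentinel check can be expressed as a pure planning step over a directory listing. The regular expressions and the `planMigration` helper are assumptions for illustration, as is deriving the sentinel name from `LEDGER_SCHEMA_VERSION`; only the file-name shapes and the `.schema-v2` sentinel come from the text above.

```typescript
const LEDGER_SCHEMA_VERSION = 2;

// Assumed naming rule: `.schema-v<version>` (the text only names `.schema-v2`).
function sentinelName(version: number): string {
  return `.schema-v${version}`;
}

// Known ledger file shapes: session_*.json, YYYY-MM.json, alltime.json.
const LEDGER_FILE_PATTERNS: RegExp[] = [/^session_.+\.json$/, /^\d{4}-\d{2}\.json$/, /^alltime\.json$/];

function isLedgerFile(name: string): boolean {
  return LEDGER_FILE_PATTERNS.some((pattern) => pattern.test(name));
}

// Decides what a boot should do given the current contents of the usage dir.
function planMigration(
  entries: string[],
  version: number = LEDGER_SCHEMA_VERSION,
): { wipe: string[]; writeSentinel: boolean } {
  if (entries.includes(sentinelName(version))) {
    return { wipe: [], writeSentinel: false }; // already migrated, leave everything alone
  }
  return { wipe: entries.filter(isLedgerFile), writeSentinel: true };
}
```

Because only files matching the known patterns are selected, a stray `notes.json` in the same directory survives the first-boot wipe.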
## Tests
- Reworked Claude/Codex adapter assertions for the new input-side
`usedTokens` semantic (24542 → 23863 for the Claude cumulative
case, 126 → 120 for Codex, etc.); explanatory comments added.
- New ProviderRuntimeIngestion test: mid-turn snapshot (no
`lastXxx`) projects into the activity stream but does NOT bump
the ledger; turn-final snapshot records exactly one turn.
- New CostTrackerLive tests: first boot wipes pre-v2 ledger files
(including a `.json` stray, which survives); subsequent boot
with sentinel present leaves ledger files intact.
- Existing ingestion tests retargeted at a temp-dir base so the
first-boot wipe can't touch the developer's real
`<cwd>/userdata/usage/` directory.
All 203 server tests pass in the changed files; 908 web tests
pass; 126 shared tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    const usedTokens = inputSideTokens > 0 ? inputSideTokens : usage.last.totalTokens;
    if (usedTokens <= 0) {
      return undefined;
    }
Missing undefined guard in Codex token usage fallback
Medium Severity
The old code guarded against usage.last.totalTokens being undefined with an explicit usedTokens === undefined || usedTokens <= 0 check. The new code removed the undefined check. When inputSideTokens is 0, usedTokens falls back to usage.last.totalTokens. If that value is undefined, the guard usedTokens <= 0 evaluates to false (because undefined coerces to NaN, and NaN <= 0 is false), so the function proceeds to return a snapshot with usedTokens: undefined instead of returning undefined to signal no valid usage data.
Reviewed by Cursor Bugbot for commit b027c89.
…accuracy
The earlier switch to input-side `usedTokens` still showed inflated
values for Claude Opus (and any multi-call turn) because the two
signals we trusted are both unreliable sources of current context
size:
1. `result.usage` is **session-cumulative** across every API call on
the thread, not just this turn. Summing its input-side classes
grows linearly with turn count — exactly what users saw on Opus,
which makes many API calls per turn.
2. `task_progress.usage` only carries an opaque SDK
`total_tokens`; the Anthropic-native per-class breakdown
(`input_tokens` / `cache_read_input_tokens` /
`cache_creation_input_tokens`) is **not present** on
`SDKTaskProgressMessage.usage`. Parsing it always falls through
to `total_tokens`.
The only source that carries the *exact per-call prompt breakdown*
is `SDKAssistantMessage.message.usage` — that's `BetaUsage` from
the Anthropic API, refreshed on every assistant frame.
Fix:
- New `context.lastApiCallInputSideTokens` tracks `input_tokens +
cache_read_input_tokens + cache_creation_input_tokens` captured
from each `SDKAssistantMessage.message.usage`. Refreshed per
frame, cleared after the turn-completion emission so the next
turn starts clean.
- `handleAssistantMessage` also emits a
`thread.token-usage.updated` event on each assistant frame with
this input-side sum as `usedTokens`, so the mid-turn ring tracks
real prompt size (not the SDK's opaque total).
- `buildClaudeTurnCompleteUsage` now takes an optional
`lastApiCallInputSide` and uses it as the top-priority
`usedTokens` source. Priority:
1. `lastApiCallInputSide` — exact current context.
2. `taskSnapshot.usedTokens` — SDK opaque (fallback).
3. Per-turn *delta* input-side — last-ditch when neither
above is present. The old session-cumulative fallback has
been removed; it inflated any multi-call turn.
- `lastUsedTokens` mirrors `usedTokens` when the per-turn input-side
delta is zero, so we never fall back to the session-cumulative sum.
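The priority order for `usedTokens` can be sketched as a chain of fallbacks. `UsedTokensInput` and `pickUsedTokens` are hypothetical names, and this variant treats a zero at any level as "absent", which is an assumption rather than a statement about the adapter's exact behaviour.

```typescript
interface UsedTokensInput {
  lastApiCallInputSide?: number; // exact current context from the last assistant frame
  taskSnapshotUsedTokens?: number; // opaque SDK total, used as a fallback
  perTurnInputSideDelta: number; // last-ditch source when neither above is present
}

// Returns the value only when it is a finite, positive number.
function positive(value: number | undefined): number | undefined {
  return value !== undefined && Number.isFinite(value) && value > 0 ? value : undefined;
}

function pickUsedTokens(input: UsedTokensInput): number {
  return (
    positive(input.lastApiCallInputSide) ??
    positive(input.taskSnapshotUsedTokens) ??
    Math.max(0, input.perTurnInputSideDelta)
  );
}
```

Note that no branch reads a session-cumulative sum, so a long thread with many API calls cannot inflate the result.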
Tests:
- Updated the "preserves oversized result totals after task
progress" test: `lastUsedTokens` is now `190_000` (mirrors
`usedTokens`), not `535_000` (the removed cumulative fallback).
- New `prefers lastApiCallInputSide over the task snapshot for
usedTokens`: when both are present, per-call wins.
- New `does NOT fall back to cumulative input-side for usedTokens`:
with a real prior cumulative, fallback now returns the per-turn
delta, not the session-wide sum.
- New adapter-level test verifying an assistant frame with
Anthropic-native usage emits a `thread.token-usage.updated`
event with `usedTokens = input + cache_read + cache_creation`.
Important: existing threads retain their pre-fix `usedTokens`
values in stored `context-window.updated` activities until the
next turn generates a new activity. The ring self-heals on the
first new turn; old turns in-history keep their stale numbers.
Verified: 206/206 targeted server tests pass (3 new), 908/908 web
tests pass, typecheck + oxlint clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local rebuild for personal distribution off the
feat/token-cost-meter branch. Keeps the app bundle identifier
(`com.t3tools.t3code`) untouched so existing auto-update channels
aren't disturbed, but changes the user-facing name, dev launcher
label, and artifact filename.
- apps/desktop/package.json: productName → "T3 by Stan".
- apps/desktop/scripts/electron-launcher.mjs: APP_DISPLAY_NAME
follows the new name (dev / prod variants).
- scripts/build-desktop-artifact.ts: artifactName →
`T3-by-Stan-${version}-${arch}.${ext}` so the DMG / zip /
blockmap files land as `release/T3-by-Stan-0.0.21-arm64.dmg` etc.
- apps/{desktop,server,web}/package.json + bun.lock: version bump
0.0.20 → 0.0.21.
The legacy user-data migration constant in `apps/desktop/src/main.ts`
(`LEGACY_USER_DATA_DIR_NAME = "T3 Code (Alpha)"`) is intentionally
left alone so this build still picks up data from the prior install.
Built macOS arm64 DMG sits at release/T3-by-Stan-0.0.21-arm64.dmg
(136 MB, unsigned / ad-hoc — Gatekeeper first-launch warning
expected). Signing / notarization not configured; would require
Apple Developer credentials.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    // size. Fall back to the raw `last.totalTokens` only when the
    // breakdown is zero (defensive; shouldn't happen for any real turn).
    const inputSideTokens = inputTokens + cachedInputTokens;
    const usedTokens = inputSideTokens > 0 ? inputSideTokens : usage.last.totalTokens;
NaN from undefined addition defeats input-side-only fix
High Severity
In normalizeCodexTokenUsage, inputTokens and cachedInputTokens can be undefined (the code itself checks !== undefined when conditionally spreading them into the snapshot a few lines later). Adding undefined + undefined or number + undefined produces NaN, and NaN > 0 is false, so the fallback to usage.last.totalTokens (which includes output + reasoning tokens) silently kicks in — exactly the over-reporting this PR is meant to fix. The values need a nullish coalesce to zero before addition.
Reviewed by Cursor Bugbot for commit 1790ec5.
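One way to address both Codex findings (the `NaN` from adding `undefined` and the missing `undefined` guard on the fallback) is sketched below. `CodexLastUsage` and `codexUsedTokens` are hypothetical names; this is an illustration of the suggested guards, not the adapter's actual code.

```typescript
interface CodexLastUsage {
  inputTokens?: number;
  cachedInputTokens?: number;
  totalTokens?: number;
}

// Returns the input-side token count, or undefined when no valid usage exists.
function codexUsedTokens(last: CodexLastUsage): number | undefined {
  // Coalesce before adding: `undefined + 1` is NaN, and `NaN > 0` is false,
  // which would silently route every turn to the totalTokens fallback.
  const inputSide = (last.inputTokens ?? 0) + (last.cachedInputTokens ?? 0);
  const used = inputSide > 0 ? inputSide : last.totalTokens;
  // Explicit undefined / NaN guard: `undefined <= 0` is false, so a bare
  // `used <= 0` check would let an undefined value through.
  if (used === undefined || !Number.isFinite(used) || used <= 0) {
    return undefined;
  }
  return used;
}
```

With the coalescing in place, a payload that reports only `inputTokens` still yields an input-side value instead of falling back to a total that includes output tokens.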
      kind,
      key,
      bucket: emptyCostBucket(now),
    });
Unused emptyBucketFile function is dead code
Low Severity
The emptyBucketFile helper is defined inside the make generator but never called anywhere. The loadFile function already handles missing files via sanitizePersistedFile, which returns an empty bucket when the raw input is undefined. This is dead code that can be removed.
Reviewed by Cursor Bugbot for commit 1790ec5.
    const usedTokens =
      input.lastApiCallInputSide !== undefined && input.lastApiCallInputSide > 0
        ? input.lastApiCallInputSide
        : (input.taskSnapshot?.usedTokens ?? deltaUsedFallback);
🟢 Low Layers/ClaudeAdapter.ts:517
Line 520 uses ?? so when input.taskSnapshot.usedTokens is 0, the code keeps that 0 instead of falling through to deltaUsedFallback. The comment on lines 521–524 states the intent is to "never emit 0 for a turn that clearly had activity", but ?? only falls back on undefined/null, not on 0. If the SDK reports usedTokens: 0 while cumulative indicates activity, usedTokens becomes 0, violating the stated intent. Consider using a ternary that checks > 0 instead of ??.
- const usedTokens =
- input.lastApiCallInputSide !== undefined && input.lastApiCallInputSide > 0
- ? input.lastApiCallInputSide
- : (input.taskSnapshot?.usedTokens ?? deltaUsedFallback);
+ const usedTokens =
+ input.lastApiCallInputSide !== undefined && input.lastApiCallInputSide > 0
+ ? input.lastApiCallInputSide
+ : (input.taskSnapshot?.usedTokens ?? deltaUsedFallback) || deltaUsedFallback;
Evidence trail:
apps/server/src/provider/Layers/ClaudeAdapter.ts lines 505-525 at REVIEWED_COMMIT. Line 520 shows `(input.taskSnapshot?.usedTokens ?? deltaUsedFallback)` using nullish coalescing. Lines 521-524 contain the comment stating "so we never emit 0 for a turn that clearly had activity". Line 505-506 shows `deltaUsedFallback = lastInputSideTokens > 0 ? lastInputSideTokens : cumulative.totalTokens` which would provide a non-zero fallback when cumulative indicates activity.
Rebuilds the personal T3-by-Stan DMG to pick up the per-call input-side usedTokens fix (d46b444) so the context ring shows accurate values on Opus + multi-call turns. No behavioural change beyond version; bun.lock re-synced. Artifact: release/T3-by-Stan-0.0.22-arm64.dmg (136 MB, unsigned). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 5 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit bd0fc3b.
    function sanitizeNumber(value: unknown): number {
      return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : 0;
    }
Duplicate identical functions in same file
Low Severity
sanitizeNumber and finiteNonNeg are identical functions defined in the same file — both accept unknown, check for a finite non-negative number, and return 0 otherwise. One of them can be removed and all call sites pointed at the surviving function.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit bd0fc3b.
      thread: null,
      month: emptyBucket(),
      allTime: emptyBucket(),
    };
Stale monthKey in shared singleton constant
Low Severity
EMPTY_COST_SUMMARY computes monthKey via monthKeyNow() once at module-load time and reuses it as a frozen constant. If the browser tab stays open across a month boundary, the placeholder and fallback monthKey becomes stale (e.g., shows "2026-03" in April). Converting EMPTY_COST_SUMMARY to a function or computing monthKey lazily would avoid the stale value.
Reviewed by Cursor Bugbot for commit bd0fc3b.
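The stale-constant finding suggests computing the month key lazily. A minimal sketch follows, assuming a simplified `CostBucket` with hypothetical fields (`totalUsd`, `turnCount`); `monthKeyFor` and `emptyCostSummary` are likewise illustrative names.

```typescript
interface CostBucket {
  totalUsd: number; // hypothetical field
  turnCount: number; // hypothetical field
}

interface CostSummary {
  monthKey: string;
  thread: CostBucket | null;
  month: CostBucket;
  allTime: CostBucket;
}

// YYYY-MM in the local time zone.
function monthKeyFor(date: Date): string {
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, "0");
  return `${year}-${month}`;
}

function emptyBucket(): CostBucket {
  return { totalUsd: 0, turnCount: 0 };
}

// A factory instead of a module-level constant, so the month key is computed
// at call time and cannot go stale when a tab stays open across a month boundary.
function emptyCostSummary(now: Date = new Date()): CostSummary {
  return { monthKey: monthKeyFor(now), thread: null, month: emptyBucket(), allTime: emptyBucket() };
}
```

Accepting `now` as a parameter also makes the month rollover easy to cover in a unit test without mocking the clock.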


Summary
Five related fixes stacked on the token-cost-meter branch, now merged together:
- Stop button: reconciles `session.status` in `thread.message-sent` and guards `onInterrupt` when the latest turn is in a terminal state.
- Bot-review cleanups: `useInvalidateCostSummary` removed, duplicate `formatUsd` re-exported from `@t3tools/shared/pricing`, no-op ternaries in `sanitizePersistedFile` simplified.
- Cost ledger over-counting: `ProviderRuntimeIngestion` now gates `recordUsage` on the presence of any `lastXxxTokens` field (the canonical "turn-final" signal). Mid-turn Claude snapshots only flow to the context-window activity, not the ledger. Also normalises the model slug before ledger writes so the `byModel` breakdown stays stable.
- Context-window ring over-reporting: redefines `usedTokens` as input-side only (`input + cache-read + cache-creation` for Claude, `last.inputTokens + last.cachedInputTokens` for Codex). `totalProcessedTokens` keeps its billing-side semantic. `ContextWindowSnapshot` now carries `cacheCreationInputTokens` / `lastCacheCreationInputTokens`.
- Opus accuracy: the ring previously relied on `result.usage` (which is session-cumulative across every API call, not per-turn), and the `task_progress.usage` SDK field only exposes an opaque `total_tokens`. Now we capture `input + cache_read + cache_creation` from every `SDKAssistantMessage.message.usage` (Anthropic-native per-call breakdown) and use it as the top-priority `usedTokens` source, emit mid-turn ring updates on each assistant frame, and drop the session-cumulative fallback entirely.

Migration: existing ledger files are polluted and can't be repaired in-place. `CostTrackerLive` writes a `.schema-v2` sentinel in the usage dir on boot; when absent it wipes the known ledger files (`session_*.json`, `YYYY-MM.json`, `alltime.json`) and writes the sentinel. Stray non-ledger files are left alone. Bumping `LEDGER_SCHEMA_VERSION` is the single line needed for future reducer-incompatible changes.

Existing threads: `context-window.updated` activities sit in the orchestration event log per-thread. The ring reads the latest such activity, so existing threads keep their pre-fix (inflated) values until a new turn lands. New chats → correct immediately. Old threads → self-heal on next turn.

Files
- `apps/server/src/orchestration/Layers/ProviderRuntimeIngestion.ts`: turn-final filter + model slug normalisation.
- `apps/server/src/provider/Layers/ClaudeAdapter.ts`: `usedTokens` = input-side; per-call capture from `SDKAssistantMessage.message.usage`; priority for `lastApiCallInputSide`.
- `apps/server/src/provider/Layers/CodexAdapter.ts`: `usedTokens` = `last.inputTokens + last.cachedInputTokens`.
- `apps/server/src/cost/Layers/CostTracker.ts`: schema sentinel + first-boot wipe.
- `apps/web/src/lib/contextWindow.ts`: carry through `cacheCreationInputTokens` / `lastCacheCreationInputTokens`.
- `packages/contracts/src/providerRuntime.ts`: JSDoc for `ThreadTokenUsageSnapshot` (two dimensions, turn-final signal).
- `apps/web/src/store.ts` + `apps/web/src/components/ChatView.tsx`: stop-button reconciliation.
- `apps/web/src/lib/costQuery.ts` + `apps/server/src/cost/Reducer.ts`: bot-review cleanups.

Test plan
- `apps/server` cost + adapter + ingestion tests: 206/206 pass (+3 new for Opus fix)
- `apps/web`: 908/908 pass
- `packages/shared`: 126/126 pass
- `oxlint` clean on changed files
- `turnCount` matches user-visible turn count (not 3-10× higher)
- First boot writes the `.schema-v2` sentinel in `<T3CODE_HOME>/<state>/usage/` and wipes existing ledger files; second boot leaves them intact

Migration note for users
On first server boot after merging, the usage ledger at `<T3CODE_HOME>/<state>/usage/` is wiped (session + month + all-time files) to clear totals polluted by the pre-fix reducer. A `.schema-v2` sentinel is written to prevent re-wipes. Month + all-time totals rebuild from subsequent turns; per-thread session files show $0/0 until a new turn lands.
Existing thread rings may keep their old wrong values until the next turn generates a new activity.
🤖 Generated with Claude Code
Note
Add per-turn cost ledger, accurate context-window ring, and stop-button guard for Claude/Codex
- Adds a `CostTrackerLive` service that atomically persists per-session, per-month (local timezone), and all-time cost ledgers as JSON files under `usageDir`, exposed via `GET /api/cost/summary`.
- Adds a `pricing.ts` module in `packages/shared` with model pricing for Claude and Codex models, `computeTurnCost`, and `formatUsd` for UI display.
- Updates `ClaudeAdapter` and `CodexAdapter` to report `usedTokens` as input-side tokens only (not including output), emit mid-turn `thread.token-usage.updated` events from assistant frames, and include per-class deltas (`last*` fields) on turn completion.
- Adds a `CostMeter` ring in the chat composer footer showing session and month-to-date spend with a per-model breakdown popover, invalidating on each `context-window.updated` activity.
- `ProviderRuntimeIngestion` now records costs to `CostTracker` only for turn-final token-usage events (those with `last*` fields); mid-turn snapshots are ignored.
- Guards `ChatView.onInterrupt` to skip dispatching a stop command when the latest turn is already in a terminal state (completed, interrupted, or error).
- `usedTokens` in token-usage snapshots now reflects input-side tokens only and is no longer capped to `maxTokens`; `totalProcessedTokens` carries the full cumulative billing total.

Macroscope summarized bd0fc3b.