Symptom
Token usage is the primary cost driver for any agent run, but right now there's no first-class way to read token counts or cost. Every caller hand-scrapes `total_cost_usd` from the last `result`-typed entry in `result.messages`.
Live evidence from a security-agent run against shreyaskapale/GOT today:
```
[usage: in=106 out=8480 cost=$0.3013]
```
The cost is right and cumulative across all 22 tool-using turns. But `in=106` / `out=8480` are only the final API call's tokens, not the whole run — the Claude Agent SDK's `SDKResultMessage.usage` carries last-turn numbers while `total_cost_usd` carries the running total. Mixing them as we do now is misleading.
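For illustration, the hand-scrape that produces a line like the one above looks roughly like this. It is a sketch, not the code of any particular caller: the `ResultLikeMessage` type and the `scrapeUsage` helper are hypothetical names, and only the `usage` and `total_cost_usd` fields come from the result message described in this issue.

```typescript
// Hypothetical minimal shape of the entries in `result.messages`.
interface ResultLikeMessage {
  type: string;
  usage?: { input_tokens?: number; output_tokens?: number };
  total_cost_usd?: number;
}

// Scrape usage the way callers do today: take the LAST `result` entry.
function scrapeUsage(messages: readonly ResultLikeMessage[]): string {
  let last: ResultLikeMessage | undefined;
  for (const m of messages) {
    if (m.type === 'result') last = m;
  }
  if (!last) return '[usage: unavailable]';
  const inTok = last.usage?.input_tokens ?? 0;   // last turn only
  const outTok = last.usage?.output_tokens ?? 0; // last turn only
  const cost = last.total_cost_usd ?? 0;         // cumulative for the run
  return `[usage: in=${inTok} out=${outTok} cost=$${cost.toFixed(4)}]`;
}
```

The two token fields and the cost field sit side by side in the output, which is exactly why the mismatch in scope is easy to miss.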
What's already in place
| Surface | State |
| --- | --- |
| Protocol `ca_usage_snapshot` event (`packages/protocol/src/sse-events.ts:54`) | ✅ Defined with `inputTokens`, `outputTokens`, `cacheCreationInputTokens`, `cacheReadInputTokens`, `costUsd` (all optional) |
| Harness forwards `ca_usage_snapshot` (`packages/harness-server/src/services/run-session.ts:73`) | ✅ Wired |
| Engines emit them | ❌ Neither `engine-claude-agent-sdk` nor `engine-gitagent` translates engine output into these events |
| `ChatResult.usage` field | ❌ Doesn't exist |
So the protocol is right and the wire is right; the engines just don't fill in the events.
Where costUsd comes from (the question worth pinning down up front)
We do NOT maintain a price table. Two reasons:
- The cost already arrives computed. The Claude Agent SDK exposes it as `SDKResultMessage.total_cost_usd`; the dollar figure comes from the SDK, since the raw Anthropic Messages API `usage` object carries token counts only. We read it and forward; we don't compute.
- A price table is a maintenance trap. Anthropic changes prices, Bedrock charges differently, regional pricing exists. Anything we hardcode goes stale silently.
For engines whose underlying provider doesn't surface cost, `costUsd` is simply left undefined — the protocol already marks it optional. Tokens (`inputTokens` / `outputTokens`) are universally available and always populated; cost is best-effort and provider-dependent.
This means `engine-claude-agent-sdk` gets cost for free (the Claude Agent SDK reports it), and `engine-gitagent` likely does too (gitclaw uses Anthropic under the hood). Future engines for non-cost-emitting providers just emit tokens, and our SDK will return `usage.costUsd === undefined`.
Proposed change
Three pieces, ~80 LOC total, additive only:
1. `engine-claude-agent-sdk` — translate SDKResultMessage to ca_usage_snapshot
When the SDK emits a `result` message, yield a parallel `{ kind: 'ca_usage_snapshot' }` event populated from:
```ts
// Per-turn snapshot
{
  inputTokens: msg.usage?.input_tokens,
  outputTokens: msg.usage?.output_tokens,
  cacheCreationInputTokens: msg.usage?.cache_creation_input_tokens,
  cacheReadInputTokens: msg.usage?.cache_read_input_tokens,
  costUsd: msg.total_cost_usd, // CUMULATIVE; see note in commit
}
```
Tokens are per-turn (what Anthropic's API returns for that request); cost is cumulative across the whole run (the SDK accumulates it internally). Both fields are shaped exactly as the protocol expects.
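A minimal sketch of that translation as a pure function, which also makes it easy to unit-test. The `ResultMessageLike` and `UsageSnapshotEvent` types and the `toUsageSnapshot` name are assumptions made for this example; the real code would import the SDK's and the protocol's own types instead.

```typescript
// Hypothetical subset of the SDK result message: only the fields read here.
interface ResultMessageLike {
  type: 'result';
  usage?: {
    input_tokens?: number;
    output_tokens?: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
  };
  total_cost_usd?: number;
}

// Hypothetical local copy of the event shape, using the protocol field names
// listed in this issue. All usage fields are optional.
interface UsageSnapshotEvent {
  kind: 'ca_usage_snapshot';
  inputTokens?: number;
  outputTokens?: number;
  cacheCreationInputTokens?: number;
  cacheReadInputTokens?: number;
  costUsd?: number;
}

// Map one result message to one snapshot event. Missing fields stay undefined.
function toUsageSnapshot(msg: ResultMessageLike): UsageSnapshotEvent {
  return {
    kind: 'ca_usage_snapshot',
    inputTokens: msg.usage?.input_tokens,                           // per-turn
    outputTokens: msg.usage?.output_tokens,                         // per-turn
    cacheCreationInputTokens: msg.usage?.cache_creation_input_tokens,
    cacheReadInputTokens: msg.usage?.cache_read_input_tokens,
    costUsd: msg.total_cost_usd,                                    // cumulative
  };
}
```

Inside the engine's message loop, the event would be yielded alongside the existing handling of the `result` message rather than replacing it.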
2. `engine-gitagent` — translate gitclaw's terminal system message
gitclaw emits a `{ type: 'system', subtype: 'session_end' }` with its own usage fields. Forward those into `ca_usage_snapshot`. Same shape. If the field set differs from Claude SDK's (gitclaw might not track cache tokens), the optional fields just go undefined.
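The same idea could be sketched for gitclaw as below. Only the `type: 'system'` / `subtype: 'session_end'` discriminator comes from this issue; the field names inside `usage` are placeholders invented for the example and must be checked against gitclaw's actual payload. `GitclawMessageLike` and `snapshotFromGitclaw` are likewise hypothetical names.

```typescript
// ASSUMPTION: the names inside `usage` are placeholders, not gitclaw's real
// field names. Only `type` and `subtype` are taken from the issue text.
interface GitclawMessageLike {
  type: string;
  subtype?: string;
  usage?: { inputTokens?: number; outputTokens?: number; costUsd?: number };
}

// Hypothetical local copy of the protocol event shape.
interface UsageSnapshotEvent {
  kind: 'ca_usage_snapshot';
  inputTokens?: number;
  outputTokens?: number;
  cacheCreationInputTokens?: number;
  cacheReadInputTokens?: number;
  costUsd?: number;
}

// Returns a snapshot for the terminal message, undefined for anything else.
function snapshotFromGitclaw(msg: GitclawMessageLike): UsageSnapshotEvent | undefined {
  if (msg.type !== 'system' || msg.subtype !== 'session_end') return undefined;
  return {
    kind: 'ca_usage_snapshot',
    inputTokens: msg.usage?.inputTokens,
    outputTokens: msg.usage?.outputTokens,
    // Cache fields are left out: this sketch assumes gitclaw does not track them.
    costUsd: msg.usage?.costUsd,
  };
}
```

Returning `undefined` for non-terminal messages lets the engine loop call the helper on every message and yield only when a snapshot comes back.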
3. SDK — first-class `ChatResult.usage`
Aggregate `ca_usage_snapshot` events seen during the chat handle's drain into a single rollup on the final `ChatResult`:
```ts
interface ChatResult {
  // ... existing fields
  readonly usage: {
    readonly inputTokens: number; // SUM of per-turn input tokens
    readonly outputTokens: number; // SUM of per-turn output tokens
    readonly cacheCreationInputTokens: number;
    readonly cacheReadInputTokens: number;
    readonly costUsd: number | undefined; // FROM the last cumulative snapshot, or undefined if provider didn't supply it
  };
}
```
After this, every caller can do:
```ts
const r = await runTask({...});
console.log(`$${r.usage.costUsd?.toFixed(4) ?? '?'} • ${r.usage.inputTokens + r.usage.outputTokens} tokens`);
```
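The rollup rule (sum the per-turn tokens, take the cost from the latest cumulative snapshot) could be sketched as follows. `UsageSnapshot`, `UsageRollup`, and `rollUpUsage` are hypothetical names for this example; the real aggregator would live in the chat handle's drain loop and use the protocol's own types.

```typescript
// Hypothetical snapshot shape, mirroring the optional protocol fields.
interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  cacheCreationInputTokens?: number;
  cacheReadInputTokens?: number;
  costUsd?: number;
}

// Hypothetical mutable counterpart of the proposed `ChatResult.usage`.
interface UsageRollup {
  inputTokens: number;
  outputTokens: number;
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
  costUsd: number | undefined;
}

function rollUpUsage(snapshots: readonly UsageSnapshot[]): UsageRollup {
  const total: UsageRollup = {
    inputTokens: 0,
    outputTokens: 0,
    cacheCreationInputTokens: 0,
    cacheReadInputTokens: 0,
    costUsd: undefined,
  };
  for (const s of snapshots) {
    // Token counts are per-turn, so they are summed.
    total.inputTokens += s.inputTokens ?? 0;
    total.outputTokens += s.outputTokens ?? 0;
    total.cacheCreationInputTokens += s.cacheCreationInputTokens ?? 0;
    total.cacheReadInputTokens += s.cacheReadInputTokens ?? 0;
    // Cost is cumulative, so the latest defined value replaces earlier ones.
    if (s.costUsd !== undefined) total.costUsd = s.costUsd;
  }
  return total;
}
```

One design choice in this sketch: a snapshot without `costUsd` does not reset a cost seen earlier. Whether that is the desired behavior for mixed-provider runs is an open question.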
Why per-turn snapshots matter (not just a final rollup)
Streaming UIs need the running total in real time — pricing tickers, budget warnings, mid-flight kill switches. The protocol already supports `ca_usage_snapshot` events flowing during the stream; we just need the engines to actually emit them. The final `ChatResult.usage` is the convenience rollup for non-streaming callers.
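As one example of a streaming consumer, a budget kill switch could be sketched like this. The `createBudgetGuard` helper and the `UsageSnapshot` type are hypothetical; the only thing assumed from the protocol is that snapshots carry an optional cumulative `costUsd`.

```typescript
// Hypothetical snapshot shape: only the field this guard reads.
interface UsageSnapshot {
  costUsd?: number;
}

// Returns a function that is fed each snapshot as it streams in and answers
// "should the run be stopped?" based on the latest known cumulative cost.
function createBudgetGuard(limitUsd: number): (s: UsageSnapshot) => boolean {
  let lastCostUsd: number | undefined;
  return (s: UsageSnapshot): boolean => {
    // Snapshots without cost leave the last known value in place.
    if (s.costUsd !== undefined) lastCostUsd = s.costUsd;
    return lastCostUsd !== undefined && lastCostUsd >= limitUsd;
  };
}
```

A caller would invoke the guard on every `ca_usage_snapshot` event and abort the run the first time it returns `true`. For providers that never report cost, the guard never trips, so a token-based limit would be needed instead.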
Definition of done
- `engine-claude-agent-sdk` emits `ca_usage_snapshot` after every `SDKResultMessage`. Covered by a test that mocks `query()` to yield a result message and asserts the snapshot lands in the event stream.
- `engine-gitagent` emits `ca_usage_snapshot` after the `session_end` system message. Test with mocked gitclaw query.
- `ChatResult.usage` exists. Existing aggregator tests use real numbers.
- Live validation: re-run `examples/security-agent.ts` against shreyaskapale/GOT and assert the SUM of per-turn input tokens matches what we'd compute from the raw `result` messages.
- No new dependencies. No price table.
Severity
Medium. Users can already get accurate cost (just `total_cost_usd` from the messages array). What's missing is correct token numbers (last-turn-only today, not cumulative) and an ergonomic API. Worth doing before more callers grow custom usage-scraping logic — currently it's a 1-line scrape; in six months it's everywhere.
Related
- `@computeragent/protocol` (`packages/protocol/src/sse-events.ts:54`) already defines the event correctly
- PLAN.md mentions `ca_usage_snapshot.compute_seconds_substrate` for substrate-level metering (v0.5+) — out of scope here; we're only doing LLM tokens + LLM cost