Token usage + cost telemetry: engines should emit ca_usage_snapshot, ChatResult.usage should be first-class

## Symptom

Token usage is the primary cost driver for any agent run, but right now there's no first-class way to read usage or cost. Every caller hand-scrapes `total_cost_usd` from the last `result`-typed entry in `result.messages`.
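For reference, the scrape each caller carries today looks roughly like the sketch below. The `ScrapedMessage` type and the `scrapeCostUsd` helper are hypothetical names for illustration; only the `result` type tag and the `total_cost_usd` field come from the description above.

```ts
// Hypothetical minimal shape of an entry in `result.messages`.
interface ScrapedMessage {
  type: string;
  total_cost_usd?: number;
}

// Walk backwards to the last result-typed entry and read the cost off it.
function scrapeCostUsd(messages: readonly ScrapedMessage[]): number | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m?.type === 'result') return m.total_cost_usd;
  }
  return undefined;
}
```

Note that this yields cost only; nothing equivalent exists for whole-run token counts.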

Live evidence from a security-agent run against shreyaskapale/GOT today:

```
[usage: in=106 out=8480 cost=$0.3013]
```

The cost is right and cumulative across all 22 tool-using turns. But `in=106` / `out=8480` are only the **final** API call's tokens, not the whole run. The Claude Agent SDK's `SDKResultMessage.usage` carries last-turn numbers while `total_cost_usd` carries the running total, so printing them side by side, as we do now, is misleading.

## What's already in place

| Surface | State |
|---|---|
| Protocol `ca_usage_snapshot` event (`packages/protocol/src/sse-events.ts:54`) | ✅ Defined with `inputTokens`, `outputTokens`, `cacheCreationInputTokens`, `cacheReadInputTokens`, `costUsd` (all optional) |
| Harness forwards `ca_usage_snapshot` (`packages/harness-server/src/services/run-session.ts:73`) | ✅ Wired |
| **Engines emit them** | ❌ **Neither `engine-claude-agent-sdk` nor `engine-gitagent` translates engine output into these events** |
| `ChatResult.usage` field | ❌ Doesn't exist |

So the protocol is right and the wire is right; the engines just don't fill in the events.

## Where costUsd comes from (the question worth pinning down up front)

**We do NOT maintain a price table.** Two reasons:

1. **The SDK already returns cost.** Claude Agent SDK's `SDKResultMessage.total_cost_usd` reports the run's cost directly. (The raw Anthropic Messages API `usage` object carries token counts only, so this figure originates on the SDK side rather than in each API response.) We read it and forward; we don't compute.
2. **A price table is a maintenance trap.** Anthropic changes prices, Bedrock charges differently, regional pricing exists. Anything we hardcode goes stale silently.

For engines whose underlying provider doesn't surface cost, `costUsd` is simply left undefined; the protocol already marks it optional. Tokens (`inputTokens` / `outputTokens`) are universally available and always populated; cost is best-effort and provider-dependent.

This means `engine-claude-agent-sdk` gets cost for free, because the Claude Agent SDK reports it. `engine-gitagent` likely does too (gitclaw uses Anthropic under the hood). Future engines for providers that don't emit cost just emit tokens, and `ChatResult.usage.costUsd` will be `undefined`.

## Proposed change

Three pieces, ~80 LOC total, additive only:

### 1. `engine-claude-agent-sdk`: translate SDKResultMessage to ca_usage_snapshot

When the SDK emits a `result` message, yield a parallel `{ kind: 'ca_usage_snapshot' }` event populated from:

```ts
// Per-turn snapshot
{
  inputTokens: msg.usage?.input_tokens,
  outputTokens: msg.usage?.output_tokens,
  cacheCreationInputTokens: msg.usage?.cache_creation_input_tokens,
  cacheReadInputTokens: msg.usage?.cache_read_input_tokens,
  costUsd: msg.total_cost_usd,   // CUMULATIVE; see note in commit
}
```

Tokens are per-turn (what Anthropic's API returns for *that* request); cost is cumulative across the whole run (the SDK accumulates it internally). Both are shaped exactly as the protocol expects, but consumers must treat them differently: sum the token fields across snapshots, and take `costUsd` from the latest snapshot only.
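A minimal sketch of that translation as a standalone function, assuming a flat event shape. `ResultMessageLike`, `UsageSnapshotEvent`, and `toUsageSnapshot` are illustrative names, not the SDK's or the protocol's actual exports; the real types would be imported from their packages.

```ts
// Assumed local mirror of the fields this issue reads off the SDK's result message.
interface ResultMessageLike {
  type: 'result';
  total_cost_usd?: number | undefined;
  usage?: {
    input_tokens?: number | undefined;
    output_tokens?: number | undefined;
    cache_creation_input_tokens?: number | undefined;
    cache_read_input_tokens?: number | undefined;
  };
}

// Assumed event shape: `kind` plus the optional fields listed for `ca_usage_snapshot`.
interface UsageSnapshotEvent {
  kind: 'ca_usage_snapshot';
  inputTokens?: number | undefined;
  outputTokens?: number | undefined;
  cacheCreationInputTokens?: number | undefined;
  cacheReadInputTokens?: number | undefined;
  costUsd?: number | undefined;
}

// Pure mapping: missing fields stay undefined rather than being defaulted to 0,
// so downstream code can tell "not reported" apart from "zero".
function toUsageSnapshot(msg: ResultMessageLike): UsageSnapshotEvent {
  return {
    kind: 'ca_usage_snapshot',
    inputTokens: msg.usage?.input_tokens,
    outputTokens: msg.usage?.output_tokens,
    cacheCreationInputTokens: msg.usage?.cache_creation_input_tokens,
    cacheReadInputTokens: msg.usage?.cache_read_input_tokens,
    costUsd: msg.total_cost_usd, // cumulative across the run
  };
}
```

Keeping the mapping pure makes it easy to unit-test without mocking the engine's stream.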

### 2. `engine-gitagent`: translate gitclaw's terminal system message

gitclaw emits a `{ type: 'system', subtype: 'session_end' }` message with its own usage fields. Forward those into `ca_usage_snapshot`, using the same shape. If the field set differs from the Claude SDK's (gitclaw might not track cache tokens), the missing optional fields are simply left undefined.

### 3. SDK: first-class `ChatResult.usage`

Aggregate `ca_usage_snapshot` events seen during the chat handle's drain into a single rollup on the final `ChatResult`:

```ts
interface ChatResult {
  // ... existing fields
  readonly usage: {
    readonly inputTokens: number;          // SUM of per-turn input tokens
    readonly outputTokens: number;         // SUM of per-turn output tokens
    readonly cacheCreationInputTokens: number;
    readonly cacheReadInputTokens: number;
    readonly costUsd: number | undefined;  // FROM the last cumulative snapshot, or undefined if provider didn't supply it
  };
}
```

After this, every caller can do:

```ts
const r = await runTask({...});
console.log(`$${r.usage.costUsd?.toFixed(4) ?? '?'} • ${r.usage.inputTokens + r.usage.outputTokens} tokens`);
```
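The rollup rule (sum the per-turn token fields, keep the latest cumulative cost) could be sketched as below. `UsageSnapshot`, `UsageRollup`, and `rollupUsage` are hypothetical names, and treating an absent token field as 0 is an assumption rather than something the protocol specifies.

```ts
// Assumed snapshot payload: every field optional, as in the protocol.
interface UsageSnapshot {
  inputTokens?: number | undefined;
  outputTokens?: number | undefined;
  cacheCreationInputTokens?: number | undefined;
  cacheReadInputTokens?: number | undefined;
  costUsd?: number | undefined;
}

interface UsageRollup {
  readonly inputTokens: number;
  readonly outputTokens: number;
  readonly cacheCreationInputTokens: number;
  readonly cacheReadInputTokens: number;
  readonly costUsd: number | undefined;
}

function rollupUsage(snapshots: readonly UsageSnapshot[]): UsageRollup {
  let inputTokens = 0;
  let outputTokens = 0;
  let cacheCreationInputTokens = 0;
  let cacheReadInputTokens = 0;
  let costUsd: number | undefined;
  for (const s of snapshots) {
    // Token fields are per-turn, so they are summed (absent counts as 0 here).
    inputTokens += s.inputTokens ?? 0;
    outputTokens += s.outputTokens ?? 0;
    cacheCreationInputTokens += s.cacheCreationInputTokens ?? 0;
    cacheReadInputTokens += s.cacheReadInputTokens ?? 0;
    // Cost is cumulative, so the most recent reported value replaces the previous one.
    if (s.costUsd !== undefined) costUsd = s.costUsd;
  }
  return { inputTokens, outputTokens, cacheCreationInputTokens, cacheReadInputTokens, costUsd };
}
```

Summing `costUsd` the same way as tokens would double-count, which is the main mistake this split is meant to prevent.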

## Why per-turn snapshots matter (not just a final rollup)

Streaming UIs need the running total in real time, for pricing tickers, budget warnings, and mid-flight kill switches. The protocol already supports `ca_usage_snapshot` events flowing during the stream; we just need the engines to actually emit them. The final `ChatResult.usage` is the convenience rollup for non-streaming callers.
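As an illustration of the kill-switch case, a streaming consumer could feed each snapshot into a small guard like the sketch below. `CostSnapshot` and `makeBudgetGuard` are hypothetical names; how the callback actually cancels the run (abort signal, closing the stream) is left to the caller.

```ts
// Only the cost field matters for a budget check.
interface CostSnapshot {
  costUsd?: number | undefined;
}

// Returns a function to call once per ca_usage_snapshot. Because costUsd is
// cumulative, each snapshot can be compared against the limit directly.
function makeBudgetGuard(
  limitUsd: number,
  onExceeded: (costUsd: number) => void,
): (snap: CostSnapshot) => void {
  let tripped = false;
  return (snap) => {
    // Snapshots without cost (provider doesn't report it) are ignored.
    if (tripped || snap.costUsd === undefined) return;
    if (snap.costUsd >= limitUsd) {
      tripped = true; // fire at most once per run
      onExceeded(snap.costUsd);
    }
  };
}
```

A final-only rollup cannot support this, since the limit would be checked only after the money is already spent.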

## Definition of done

- `engine-claude-agent-sdk` emits `ca_usage_snapshot` after every `SDKResultMessage`. Covered by a test that mocks `query()` to yield a result message and asserts the snapshot lands in the event stream.
- `engine-gitagent` emits `ca_usage_snapshot` after the `session_end` system message. Covered by a test with a mocked gitclaw query.
- `ChatResult.usage` exists. Existing aggregator tests use real numbers.
- Live validation: re-run `examples/security-agent.ts` against shreyaskapale/GOT and assert the SUM of per-turn input tokens matches what we'd compute from the raw `result` messages.
- No new dependencies. No price table.

## Severity

Medium. Users can already get accurate cost (just `total_cost_usd` from the messages array). What's missing is correct **token** numbers (last-turn-only today, not cumulative) and an ergonomic API. Worth doing before more callers grow custom usage-scraping logic: currently it's a 1-line scrape; in six months it's everywhere.

## Related

- `@computeragent/protocol` (`packages/protocol/src/sse-events.ts:54`) already defines the event correctly
- PLAN.md mentions `ca_usage_snapshot.compute_seconds_substrate` for substrate-level metering (v0.5+). That is out of scope here; we're only doing LLM tokens + LLM cost
