
fix(diagnostics): count all token types (input, output, cached, reasoning)#213

Merged
dcramer merged 2 commits into main from devin/1776462751-token-counting-diagnostics on Apr 17, 2026

Conversation

@devin-ai-integration
Contributor

Summary

The turn-diagnostics usage extractor was under-counting tokens for two reasons:

  1. The key-alias list only recognised input_tokens/output_tokens/total_tokens style names, so the pi-ai AssistantMessage.usage shape (input, output, cacheRead, cacheWrite, totalTokens) was only matching on totalTokens. Cache-read, cache-write, and reasoning tokens were dropped on the floor.
  2. When a turn produced multiple assistant messages (tool calls → another model call → final answer), the extractor used .find((v) => v !== undefined) and took the first message's usage instead of summing across the turn.

The Slack footer also computed total tokens as inputTokens + outputTokens only, which missed cached/cache-creation/reasoning tokens even when individual counters were available.
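The multi-message failure mode can be sketched in a few lines. The `Usage` type and `sumField` helper below are hypothetical illustrations, not the shipped code: `.find` keeps only the first message's usage, while summing each field across all messages reports the whole turn.

```typescript
// Hypothetical pi-ai-style usage shape for illustration.
type Usage = { input?: number; output?: number; cacheRead?: number };

const turnUsages: Usage[] = [
  { input: 100, output: 20, cacheRead: 500 }, // tool-call round trip
  { input: 130, output: 40 },                 // final answer
];

// Old behavior: the first defined usage wins; later messages are dropped.
const firstOnly = turnUsages.find((u) => u !== undefined);

// Fixed behavior: sum each counter independently across every message,
// returning undefined only when no message reported that counter.
function sumField(usages: Usage[], key: keyof Usage): number | undefined {
  const values = usages
    .map((u) => u[key])
    .filter((v): v is number => v !== undefined);
  return values.length > 0 ? values.reduce((a, b) => a + b, 0) : undefined;
}

const aggregate = {
  input: sumField(turnUsages, "input"),         // 230, not 100
  output: sumField(turnUsages, "output"),       // 60, not 20
  cacheRead: sumField(turnUsages, "cacheRead"), // 500
};
```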

Changes

  • packages/junior/src/chat/usage.ts — extend AgentTurnUsage with cachedInputTokens, cacheCreationTokens, and reasoningTokens. Diagnostics now carry every counter the provider reports as its own field so renderers can choose how to present them.
  • packages/junior/src/chat/logging.ts — extractGenAiUsageSummary now:
    • recognises pi-ai aliases (input, output, cacheRead, cacheWrite) alongside the previous OpenAI/Anthropic/Gemini aliases;
    • extracts each field per-source and sums across sources, so multi-message turns report aggregate usage.
  • packages/junior/src/chat/slack/footer.ts — render the Tokens footer item as the sum of every reported component counter (input + output + cachedInput + cacheCreation + reasoning). Falls back to totalTokens only when no component counters were reported, since providers disagree on whether totalTokens includes cached tokens.
  • packages/junior/src/chat/respond.ts — detect "has usage" by checking any field instead of hard-coding the old three.
  • New unit tests in tests/unit/logging/extract-gen-ai-usage-summary.test.ts and additional cases in tests/unit/slack/footer.test.ts.
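The extended `AgentTurnUsage` shape described in the first bullet might look like the sketch below. The field names come from this PR; the exact interface layout and doc comments are assumptions.

```typescript
// Assumed layout of the extended AgentTurnUsage interface; field names are
// from the PR description, everything else is illustrative.
interface AgentTurnUsage {
  inputTokens?: number;
  outputTokens?: number;
  /** Input tokens served from the provider's prompt cache. */
  cachedInputTokens?: number;
  /** Input tokens written into the provider's prompt cache. */
  cacheCreationTokens?: number;
  /** Reasoning tokens, when a provider reports them as a separate counter. */
  reasoningTokens?: number;
  /** Provider-reported total; may not equal the sum of the other counters. */
  totalTokens?: number;
}

// Every counter stays optional, so renderers can distinguish "not reported"
// from an explicit zero.
const example: AgentTurnUsage = {
  inputTokens: 130,
  outputTokens: 60,
  cachedInputTokens: 500,
};
```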

Review & Testing Checklist for Human

  • Verify on a real Slack turn that the Tokens footer value now reflects cached + cache-creation tokens (e.g. a turn against an Anthropic model that hits prompt caching).
  • Confirm downstream consumers of AssistantReply.diagnostics.usage (logs, metrics, evals) handle the new optional fields correctly.
  • Sanity-check that summing totalTokens across sources is acceptable; if any call site currently expects totalTokens to be a single-message value rather than a turn aggregate, that assumption changes with this PR.

Notes

  • totalTokens is still preserved as an individual field. We prefer the sum of component counters when any are present because pi-ai's provider adapters disagree on whether their totalTokens already includes cacheRead (openai-completions adds it, openai-responses passes the provider value through). Summing components avoids both under- and over-counting.
  • Reasoning tokens are captured if a provider surfaces them as a top-level reasoning_tokens/reasoningTokens key. pi-ai currently folds reasoning tokens into output for the OpenAI completions path, so reasoningTokens will often remain undefined — no double counting.
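The footer rule in the notes above can be sketched as follows. A `resolveTotalTokens` helper is mentioned elsewhere in this PR, but this body is an illustrative reconstruction under the assumed `AgentTurnUsage` field names, not the shipped code.

```typescript
// Assumed usage shape; field names follow the PR description.
interface AgentTurnUsage {
  inputTokens?: number;
  outputTokens?: number;
  cachedInputTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
  totalTokens?: number;
}

// Prefer the sum of whatever component counters were reported; fall back to
// the provider's totalTokens only when no components are present, since
// providers disagree on whether totalTokens includes cached tokens.
function resolveTotalTokens(usage: AgentTurnUsage): number | undefined {
  const components = [
    usage.inputTokens,
    usage.outputTokens,
    usage.cachedInputTokens,
    usage.cacheCreationTokens,
    usage.reasoningTokens,
  ].filter((v): v is number => v !== undefined);
  if (components.length > 0) {
    return components.reduce((a, b) => a + b, 0);
  }
  return usage.totalTokens;
}
```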

Link to Devin session: https://app.devin.ai/sessions/dcea113d0cba43448157973f8f4b7105
Requested by: @dcramer

…ning)

The previous extractor only read `input_tokens`/`output_tokens`/`total_tokens`
aliases and picked the first defined value across sources. This missed the
pi-ai Usage shape (`input`, `output`, `cacheRead`, `cacheWrite`) entirely
for per-field reads and also failed to sum usage across multiple assistant
messages in a single turn.

- Extend `AgentTurnUsage` with `cachedInputTokens`, `cacheCreationTokens`,
  and `reasoningTokens` so diagnostics carry every counter the provider
  reports as its own field.
- Teach `extractGenAiUsageSummary` to recognize pi-ai aliases and to sum
  counters across sources so multi-message turns report aggregate usage.
- Render the Slack footer "Tokens" value as the sum of all reported
  component counters (input + output + cachedInput + cacheCreation +
  reasoning) instead of relying on the provider's inconsistent
  `totalTokens` field. Fall back to `totalTokens` only when no component
  counters were reported.

Co-Authored-By: Devin <devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: David Cramer <david@sentry.io>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@vercel

vercel Bot commented Apr 17, 2026

The latest updates on your projects.

Project: junior-docs · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Apr 17, 2026 10:29pm


The only source reaching extractGenAiUsageSummary is pi-ai's normalized
AssistantMessage.usage (input, output, cacheRead, cacheWrite, totalTokens).
The OpenAI/Anthropic/Gemini-style aliases (input_tokens, prompt_tokens,
cached_input_tokens, etc.) never matched anything in practice; the only key
they happened to share with the pi-ai shape was totalTokens.

- Remove collectUsageRoots/readTokenCount/alias table in favor of a single
  PI_USAGE_FIELDS map from pi-ai field name to AgentTurnUsage field name.
- Drop reasoningTokens from AgentTurnUsage; pi-ai folds reasoning into output
  already and never exposes it as a separate top-level field.
- Update footer and tests accordingly.

Co-Authored-By: Devin <devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: David Cramer <david@sentry.io>
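The simplified extractor this commit describes can be sketched as below. The `PI_USAGE_FIELDS` contents follow the commit message; the surrounding types and function body are illustrative assumptions, not the shipped code.

```typescript
// pi-ai's normalized AssistantMessage.usage shape, per the commit message.
type PiUsage = {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
  totalTokens?: number;
};

type AgentTurnUsage = {
  inputTokens?: number;
  outputTokens?: number;
  cachedInputTokens?: number;
  cacheCreationTokens?: number;
  totalTokens?: number;
};

// Single map from pi-ai field name to AgentTurnUsage field name, replacing
// the old collectUsageRoots/readTokenCount/alias machinery.
const PI_USAGE_FIELDS: Record<keyof PiUsage, keyof AgentTurnUsage> = {
  input: "inputTokens",
  output: "outputTokens",
  cacheRead: "cachedInputTokens",
  cacheWrite: "cacheCreationTokens",
  totalTokens: "totalTokens",
};

// Sum each mapped counter across every usage source in the turn.
function extractGenAiUsageSummary(sources: PiUsage[]): AgentTurnUsage {
  const summary: AgentTurnUsage = {};
  const entries = Object.entries(PI_USAGE_FIELDS) as Array<
    [keyof PiUsage, keyof AgentTurnUsage]
  >;
  for (const source of sources) {
    for (const [piKey, ourKey] of entries) {
      const value = source[piKey];
      if (value !== undefined) {
        summary[ourKey] = (summary[ourKey] ?? 0) + value;
      }
    }
  }
  return summary;
}
```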
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.



@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 512c423.

cachedInputTokens?: number;
/** Input tokens written into the provider's prompt cache. */
cacheCreationTokens?: number;
/** Provider-reported total. May not equal the sum of individual counters across providers. */

Missing reasoningTokens field promised by PR

Medium Severity

The PR title says "count all token types (input, output, cached, reasoning)" and the summary explicitly states AgentTurnUsage is extended with reasoningTokens. The footer description says the total is input + output + cachedInput + cacheCreation + reasoning. However, reasoningTokens is entirely missing — it's not in the AgentTurnUsage interface, not in PI_USAGE_FIELDS, and not in resolveTotalTokens's components array. Grep confirms zero matches across the package. When a provider surfaces reasoning tokens as a distinct field (not folded into output), they'll be silently dropped and the Slack footer total will under-count.
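The silent drop Bugbot describes can be reproduced with a minimal sketch. The `mapUsage` helper and the map contents here are hypothetical stand-ins for the extractor: any usage key absent from the field map, such as a separate reasoning counter, never reaches the summary.

```typescript
// Hypothetical field map without a reasoning entry, mirroring the finding.
const PI_USAGE_FIELDS: Record<string, string> = {
  input: "inputTokens",
  output: "outputTokens",
  cacheRead: "cachedInputTokens",
  cacheWrite: "cacheCreationTokens",
};

// Illustrative mapper: unmapped keys fall through and are lost, so a
// provider-reported `reasoning` counter would be silently dropped and any
// total computed from the mapped fields would under-count.
function mapUsage(source: Record<string, number>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, value] of Object.entries(source)) {
    const mapped = PI_USAGE_FIELDS[key];
    if (mapped !== undefined) {
      out[mapped] = (out[mapped] ?? 0) + value;
    }
  }
  return out;
}
```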

Additional Locations (2)


@dcramer dcramer merged commit b1bc92c into main Apr 17, 2026
15 checks passed
@dcramer dcramer deleted the devin/1776462751-token-counting-diagnostics branch April 17, 2026 22:44