
fix(diagnostics): count all token types (input, output, cached, reasoning)#213

Merged
dcramer merged 2 commits into main from devin/1776462751-token-counting-diagnostics on Apr 17, 2026

Conversation

@devin-ai-integration
Contributor

Summary

The turn-diagnostics usage extractor was under-counting tokens for two reasons:

  1. The key-alias list only recognised input_tokens/output_tokens/total_tokens style names, so the pi-ai AssistantMessage.usage shape (input, output, cacheRead, cacheWrite, totalTokens) was only matching on totalTokens. Cache-read, cache-write, and reasoning tokens were dropped on the floor.
  2. When a turn produced multiple assistant messages (tool calls → another model call → final answer), the extractor used .find((v) => v !== undefined) and took the first message's usage instead of summing across the turn.

The Slack footer also computed total tokens as inputTokens + outputTokens only, which missed cached/cache-creation/reasoning tokens even when individual counters were available.
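The multi-message failure mode can be sketched in a few lines. The `Usage` type and `sumField` helper below are hypothetical illustrations, not the shipped code: `.find` keeps only the first message's usage, while summing each field across all messages reports the whole turn.

```typescript
// Hypothetical pi-ai-style usage shape for illustration.
type Usage = { input?: number; output?: number; cacheRead?: number };

const turnUsages: Usage[] = [
  { input: 100, output: 20, cacheRead: 500 }, // tool-call round trip
  { input: 130, output: 40 },                 // final answer
];

// Old behavior: the first defined usage wins; later messages are dropped.
const firstOnly = turnUsages.find((u) => u !== undefined);

// Fixed behavior: sum each counter independently across every message,
// returning undefined only when no message reported that counter.
function sumField(usages: Usage[], key: keyof Usage): number | undefined {
  const values = usages
    .map((u) => u[key])
    .filter((v): v is number => v !== undefined);
  return values.length > 0 ? values.reduce((a, b) => a + b, 0) : undefined;
}

const aggregate = {
  input: sumField(turnUsages, "input"),         // 230, not 100
  output: sumField(turnUsages, "output"),       // 60, not 20
  cacheRead: sumField(turnUsages, "cacheRead"), // 500
};
```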

Changes

  • packages/junior/src/chat/usage.ts — extend AgentTurnUsage with cachedInputTokens, cacheCreationTokens, and reasoningTokens. Diagnostics now carry every counter the provider reports as its own field so renderers can choose how to present them.
  • packages/junior/src/chat/logging.ts — extractGenAiUsageSummary now:
    • recognises pi-ai aliases (input, output, cacheRead, cacheWrite) alongside the previous OpenAI/Anthropic/Gemini aliases;
    • extracts each field per-source and sums across sources, so multi-message turns report aggregate usage.
  • packages/junior/src/chat/slack/footer.ts — render the Tokens footer item as the sum of every reported component counter (input + output + cachedInput + cacheCreation + reasoning). Falls back to totalTokens only when no component counters were reported, since providers disagree on whether totalTokens includes cached tokens.
  • packages/junior/src/chat/respond.ts — detect "has usage" by checking any field instead of hard-coding the old three.
  • New unit tests in tests/unit/logging/extract-gen-ai-usage-summary.test.ts and additional cases in tests/unit/slack/footer.test.ts.
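The extended `AgentTurnUsage` shape described in the first bullet might look like the sketch below. The field names come from this PR; the exact interface layout and doc comments are assumptions.

```typescript
// Assumed layout of the extended AgentTurnUsage interface; field names are
// from the PR description, everything else is illustrative.
interface AgentTurnUsage {
  inputTokens?: number;
  outputTokens?: number;
  /** Input tokens served from the provider's prompt cache. */
  cachedInputTokens?: number;
  /** Input tokens written into the provider's prompt cache. */
  cacheCreationTokens?: number;
  /** Reasoning tokens, when a provider reports them as a separate counter. */
  reasoningTokens?: number;
  /** Provider-reported total; may not equal the sum of the other counters. */
  totalTokens?: number;
}

// Every counter stays optional, so renderers can distinguish "not reported"
// from an explicit zero.
const example: AgentTurnUsage = {
  inputTokens: 130,
  outputTokens: 60,
  cachedInputTokens: 500,
};
```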

Review & Testing Checklist for Human

  • Verify on a real Slack turn that the Tokens footer value now reflects cached + cache-creation tokens (e.g. a turn against an Anthropic model that hits prompt caching).
  • Confirm downstream consumers of AssistantReply.diagnostics.usage (logs, metrics, evals) handle the new optional fields correctly.
  • Sanity-check that summing totalTokens across sources is acceptable; if any call site currently expects totalTokens to be a single-message value rather than a turn aggregate, that assumption changes with this PR.

Notes

  • totalTokens is still preserved as an individual field. We prefer the sum of component counters when any are present because pi-ai's provider adapters disagree on whether their totalTokens already includes cacheRead (openai-completions adds it, openai-responses passes the provider value through). Summing components avoids both under- and over-counting.
  • Reasoning tokens are captured if a provider surfaces them as a top-level reasoning_tokens/reasoningTokens key. pi-ai currently folds reasoning tokens into output for the OpenAI completions path, so reasoningTokens will often remain undefined — no double counting.
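The footer rule in the notes above can be sketched as follows. A `resolveTotalTokens` helper is mentioned elsewhere in this PR, but this body is an illustrative reconstruction under the assumed `AgentTurnUsage` field names, not the shipped code.

```typescript
// Assumed usage shape; field names follow the PR description.
interface AgentTurnUsage {
  inputTokens?: number;
  outputTokens?: number;
  cachedInputTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
  totalTokens?: number;
}

// Prefer the sum of whatever component counters were reported; fall back to
// the provider's totalTokens only when no components are present, since
// providers disagree on whether totalTokens includes cached tokens.
function resolveTotalTokens(usage: AgentTurnUsage): number | undefined {
  const components = [
    usage.inputTokens,
    usage.outputTokens,
    usage.cachedInputTokens,
    usage.cacheCreationTokens,
    usage.reasoningTokens,
  ].filter((v): v is number => v !== undefined);
  if (components.length > 0) {
    return components.reduce((a, b) => a + b, 0);
  }
  return usage.totalTokens;
}
```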

Link to Devin session: https://app.devin.ai/sessions/dcea113d0cba43448157973f8f4b7105
Requested by: @dcramer

…ning)

The previous extractor only read `input_tokens`/`output_tokens`/`total_tokens`
aliases and picked the first defined value across sources. This missed the
pi-ai Usage shape (`input`, `output`, `cacheRead`, `cacheWrite`) entirely
for per-field reads and also failed to sum usage across multiple assistant
messages in a single turn.

- Extend `AgentTurnUsage` with `cachedInputTokens`, `cacheCreationTokens`,
  and `reasoningTokens` so diagnostics carry every counter the provider
  reports as its own field.
- Teach `extractGenAiUsageSummary` to recognize pi-ai aliases and to sum
  counters across sources so multi-message turns report aggregate usage.
- Render the Slack footer "Tokens" value as the sum of all reported
  component counters (input + output + cachedInput + cacheCreation +
  reasoning) instead of relying on the provider's inconsistent
  `totalTokens` field. Fall back to `totalTokens` only when no component
  counters were reported.

Co-Authored-By: Devin <devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: David Cramer <david@sentry.io>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@vercel

vercel Bot commented Apr 17, 2026

The latest updates on your projects.

Project: junior-docs · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Apr 17, 2026 10:29pm


The only source reaching extractGenAiUsageSummary is pi-ai's normalized
AssistantMessage.usage (input, output, cacheRead, cacheWrite, totalTokens).
The OpenAI/Anthropic/Gemini-style aliases (input_tokens, prompt_tokens,
cached_input_tokens, etc.) never matched anything in practice; the only key
they happened to share with the pi-ai shape was totalTokens.

- Remove collectUsageRoots/readTokenCount/alias table in favor of a single
  PI_USAGE_FIELDS map from pi-ai field name to AgentTurnUsage field name.
- Drop reasoningTokens from AgentTurnUsage; pi-ai folds reasoning into output
  already and never exposes it as a separate top-level field.
- Update footer and tests accordingly.

Co-Authored-By: Devin <devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: David Cramer <david@sentry.io>
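The simplified extractor this commit describes can be sketched as below. The `PI_USAGE_FIELDS` contents follow the commit message; the surrounding types and function body are illustrative assumptions, not the shipped code.

```typescript
// pi-ai's normalized AssistantMessage.usage shape, per the commit message.
type PiUsage = {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
  totalTokens?: number;
};

type AgentTurnUsage = {
  inputTokens?: number;
  outputTokens?: number;
  cachedInputTokens?: number;
  cacheCreationTokens?: number;
  totalTokens?: number;
};

// Single map from pi-ai field name to AgentTurnUsage field name, replacing
// the old collectUsageRoots/readTokenCount/alias machinery.
const PI_USAGE_FIELDS: Record<keyof PiUsage, keyof AgentTurnUsage> = {
  input: "inputTokens",
  output: "outputTokens",
  cacheRead: "cachedInputTokens",
  cacheWrite: "cacheCreationTokens",
  totalTokens: "totalTokens",
};

// Sum each mapped counter across every usage source in the turn.
function extractGenAiUsageSummary(sources: PiUsage[]): AgentTurnUsage {
  const summary: AgentTurnUsage = {};
  const entries = Object.entries(PI_USAGE_FIELDS) as Array<
    [keyof PiUsage, keyof AgentTurnUsage]
  >;
  for (const source of sources) {
    for (const [piKey, ourKey] of entries) {
      const value = source[piKey];
      if (value !== undefined) {
        summary[ourKey] = (summary[ourKey] ?? 0) + value;
      }
    }
  }
  return summary;
}
```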
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.



@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 512c423.

cachedInputTokens?: number;
/** Input tokens written into the provider's prompt cache. */
cacheCreationTokens?: number;
/** Provider-reported total. May not equal the sum of individual counters across providers. */

Missing reasoningTokens field promised by PR

Medium Severity

The PR title says "count all token types (input, output, cached, reasoning)" and the summary explicitly states AgentTurnUsage is extended with reasoningTokens. The footer description says the total is input + output + cachedInput + cacheCreation + reasoning. However, reasoningTokens is entirely missing — it's not in the AgentTurnUsage interface, not in PI_USAGE_FIELDS, and not in resolveTotalTokens's components array. Grep confirms zero matches across the package. When a provider surfaces reasoning tokens as a distinct field (not folded into output), they'll be silently dropped and the Slack footer total will under-count.
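The silent drop Bugbot describes can be reproduced with a minimal sketch. The `mapUsage` helper and the map contents here are hypothetical stand-ins for the extractor: any usage key absent from the field map, such as a separate reasoning counter, never reaches the summary.

```typescript
// Hypothetical field map without a reasoning entry, mirroring the finding.
const PI_USAGE_FIELDS: Record<string, string> = {
  input: "inputTokens",
  output: "outputTokens",
  cacheRead: "cachedInputTokens",
  cacheWrite: "cacheCreationTokens",
};

// Illustrative mapper: unmapped keys fall through and are lost, so a
// provider-reported `reasoning` counter would be silently dropped and any
// total computed from the mapped fields would under-count.
function mapUsage(source: Record<string, number>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, value] of Object.entries(source)) {
    const mapped = PI_USAGE_FIELDS[key];
    if (mapped !== undefined) {
      out[mapped] = (out[mapped] ?? 0) + value;
    }
  }
  return out;
}
```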

Additional Locations (2)


@dcramer dcramer merged commit b1bc92c into main Apr 17, 2026
15 checks passed
@dcramer dcramer deleted the devin/1776462751-token-counting-diagnostics branch April 17, 2026 22:44