
Fix token overcounting from streaming duplicate JSONL entries #278

Merged
sirmalloc merged 3 commits into sirmalloc:main from hgreene624:fix/streaming-duplicate-token-count
Apr 8, 2026


@hgreene624
Contributor

Summary

  • Claude Code writes multiple JSONL entries per API call during streaming (intermediate entries have stop_reason: null; the final entry has "end_turn" or "tool_use")
  • getTokenMetrics was summing all entries, inflating the tokens-total widget by ~2.5x
  • Now only counts entries with a truthy stop_reason, with fallback to counting all entries when no stop_reason data is present (backward compatibility with older transcripts)
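To make the overcount concrete, here is a minimal TypeScript sketch of one streamed API call. The `Entry` shape, the token numbers, and the assumption that every intermediate entry repeats the same usage are all illustrative; they are not taken from a real transcript or from the ccstatusline source.

```typescript
// Hypothetical, simplified transcript entry (assumption: the real
// TranscriptLine type in src/types/TokenMetrics.ts has more fields).
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface Entry {
  message: { usage: Usage; stop_reason?: string | null };
}

// Three entries written for ONE API call while streaming: two intermediate
// entries (stop_reason: null) and one final entry. Assumption: each entry
// repeats the same usage numbers.
const oneCall: Entry[] = [
  { message: { usage: { input_tokens: 1000, output_tokens: 50 }, stop_reason: null } },
  { message: { usage: { input_tokens: 1000, output_tokens: 50 }, stop_reason: null } },
  { message: { usage: { input_tokens: 1000, output_tokens: 50 }, stop_reason: 'tool_use' } },
];

// Sum input + output tokens across the given entries.
function total(entries: Entry[]): number {
  return entries.reduce(
    (sum, e) => sum + e.message.usage.input_tokens + e.message.usage.output_tokens,
    0,
  );
}

const naive = total(oneCall); // every entry counted (the bug)
const finalOnly = total(oneCall.filter((e) => Boolean(e.message.stop_reason))); // the fix

console.log(naive, finalOnly); // prints: 3150 1050
```

With three entries per call the naive sum is exactly 3x; real sessions mix calls with different numbers of streaming entries, which is why the observed factor is a non-integer.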

Before/After (real session, 21 API calls)

| Metric | Before (bug) | After (fix) |
|---|---|---|
| JSONL entries counted | 54 | 21 |
| Reported total (tokens) | 2,427,587 | 957,939 |
| Overcount factor | 2.5x | 1.0x |
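The overcount factor is the ratio of the buggy total to the fixed total. A quick check using only the figures from the table:

```typescript
// Figures copied from the before/after table.
const entriesBefore = 54;
const entriesAfter = 21;
const totalBefore = 2_427_587;
const totalAfter = 957_939;

// Ratio of the buggy token total to the fixed one (the "overcount factor").
const tokenFactor = totalBefore / totalAfter;
// Ratio of JSONL entries counted before vs. after the fix.
const entryFactor = entriesBefore / entriesAfter;

console.log(tokenFactor.toFixed(2), entryFactor.toFixed(2)); // prints: 2.53 2.57
```

Both ratios round to the ~2.5x quoted above. That they are close is consistent with intermediate entries carrying usage similar to their final entry, although the PR does not state this directly.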

Changes

  • src/types/TokenMetrics.ts — Added stop_reason to TranscriptLine.message
  • src/utils/jsonl-metrics.ts — Two-pass approach: first scan detects if any entry has stop_reason, then second pass skips intermediate streaming entries
  • src/utils/__tests__/jsonl-metrics.test.ts — Added stopReason param to test helper, two new tests (streaming dedup + legacy fallback)
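The two-pass approach described for jsonl-metrics.ts can be sketched roughly as follows. This is a simplified illustration, not the merged getTokenMetrics code: the `TranscriptLine` shape, the hypothetical `sumTokens` helper, skipping malformed lines, and summing only `input_tokens` and `output_tokens` are all assumptions.

```typescript
// Hypothetical, simplified transcript line; the real TranscriptLine type
// may carry more fields (e.g. cache token counts).
interface TranscriptLine {
  message?: {
    usage?: { input_tokens?: number; output_tokens?: number };
    stop_reason?: string | null;
  };
}

// Sketch of the two-pass filter (hypothetical helper, not getTokenMetrics).
function sumTokens(jsonl: string): number {
  const parsed: TranscriptLine[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const d = JSON.parse(line) as TranscriptLine;
      if (d.message?.usage) parsed.push(d);
    } catch {
      // Assumption: malformed lines are skipped rather than aborting the sum.
    }
  }

  // Pass 1: does this transcript record a final stop_reason anywhere?
  const hasAnyStopReason = parsed.some((d) => Boolean(d.message?.stop_reason));

  // Pass 2: skip entries without a truthy stop_reason only when such data
  // exists; otherwise count every entry (legacy fallback).
  let total = 0;
  for (const d of parsed) {
    if (hasAnyStopReason && !d.message?.stop_reason) continue;
    const usage = d.message!.usage!;
    total += (usage.input_tokens ?? 0) + (usage.output_tokens ?? 0);
  }
  return total;
}

const streaming = [
  '{"message":{"usage":{"input_tokens":100,"output_tokens":10},"stop_reason":null}}',
  '{"message":{"usage":{"input_tokens":100,"output_tokens":10},"stop_reason":"end_turn"}}',
].join('\n');
const legacy = '{"message":{"usage":{"input_tokens":100,"output_tokens":10}}}';

console.log(sumTokens(streaming), sumTokens(legacy)); // prints: 110 110
```

In a transcript that mixes both formats, this filter also skips the lines that lack stop_reason once any line has one; that is the mixed-format case the Codex review flags.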

Test plan

  • New test: streaming entries are deduplicated, only final entries counted
  • New test: legacy transcripts without stop_reason fall back to counting all entries
  • Existing jsonl-metrics.test.ts tests pass (23/23)
  • Full suite: no new failures (pre-existing ESM/Bun compat failures in 7 unrelated files)

🤖 Generated with Claude Code

Claude Code writes multiple JSONL entries per API call during streaming:
intermediate entries have stop_reason: null, and only the final entry has
a string value like "end_turn" or "tool_use". The getTokenMetrics function
was summing all entries, inflating the total by ~2.5x.

Now only counts final entries (those with a truthy stop_reason). Falls
back to counting all entries when no stop_reason data is present, for
backward compatibility with older transcript formats.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4c0eaf6ec


Comment thread src/utils/jsonl-metrics.ts Outdated

```typescript
for (const data of parsedEntries) {
  // Skip intermediate streaming entries when stop_reason data is available
  if (hasAnyStopReason && !data.message!.stop_reason) {
    // ... (remainder of the diff hunk not shown)
```

P2: Preserve entries missing stop_reason in mixed transcripts

When any entry has a truthy stop_reason, the new filter skips all entries where stop_reason is falsy (if (hasAnyStopReason && !data.message!.stop_reason)), which also drops legacy entries where the field is absent. In a mixed-format transcript (e.g., older lines without stop_reason plus newer streaming lines), those legacy usage rows are excluded from totals and contextLength, causing undercounting despite the backward-compatibility goal.
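One possible way to address this comment, sketched under assumptions: treat only an explicit `stop_reason: null` as an intermediate streaming entry, and treat a missing field as a legacy line that still counts. The types and the hypothetical `sumTokensMixed` helper are illustrative; the source does not show how, or whether, the later commits in this PR changed the filter.

```typescript
// Hypothetical, simplified transcript line (same assumptions as elsewhere:
// not the real TranscriptLine type).
interface TranscriptLine {
  message?: {
    usage?: { input_tokens?: number; output_tokens?: number };
    stop_reason?: string | null;
  };
}

function sumTokensMixed(lines: TranscriptLine[]): number {
  // Keep the PR's fallback: with no truthy stop_reason anywhere, count everything.
  const hasAnyStopReason = lines.some((d) => Boolean(d.message?.stop_reason));
  let total = 0;
  for (const d of lines) {
    const usage = d.message?.usage;
    if (!usage) continue;
    // Skip only an explicit null (intermediate streaming entry); a missing
    // field is undefined, marks a legacy line, and is still counted.
    if (hasAnyStopReason && d.message?.stop_reason === null) continue;
    total += (usage.input_tokens ?? 0) + (usage.output_tokens ?? 0);
  }
  return total;
}

const mixed: TranscriptLine[] = [
  { message: { usage: { input_tokens: 200, output_tokens: 20 } } }, // legacy: no stop_reason field
  { message: { usage: { input_tokens: 100, output_tokens: 10 }, stop_reason: null } }, // streaming, intermediate
  { message: { usage: { input_tokens: 100, output_tokens: 10 }, stop_reason: 'end_turn' } }, // streaming, final
];

console.log(sumTokensMixed(mixed)); // prints: 330
```

Keeping the `hasAnyStopReason` guard means a transcript that never records a final stop_reason still counts every line, matching the PR's legacy path. The design relies on the assumption that older formats omit the field rather than writing `null`.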


sirmalloc merged commit ccc1803 into sirmalloc:main on Apr 8, 2026
3 checks passed
@sirmalloc
Owner

Thanks for this; it will be published in the next release.

pcvelz added a commit to pcvelz/ccstatusline-usage that referenced this pull request Apr 10, 2026
- fix: Fix token overcounting from streaming duplicate JSONL entries (sirmalloc#278)
- fix: strip parenthetical suffix from model display name (sirmalloc#283)
- Version bump, README cleanup

Fork adaptation: Model.ts preserves [1m] suffix while adopting upstream's
general parenthetical-strip regex. Model.test.ts updated to match fork
behavior (ResetTimerWidget depends on [1m] for charged-model detection).
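For the fork adaptation, a parenthetical-strip pattern that only matches `(...)` leaves a bracketed `[1m]` marker alone. The sketch below is a hypothetical helper with an assumed regex and assumed inputs; it is not the code in either repository's Model.ts.

```typescript
// Hypothetical helper, not the actual Model.ts code.
// Strips one trailing parenthetical group such as " (1M context)"; because
// the pattern only matches "(...)", a bracketed marker like "[1m]" survives.
function stripParentheticalSuffix(displayName: string): string {
  return displayName.replace(/\s*\([^)]*\)\s*$/, '');
}

console.log(stripParentheticalSuffix('Opus 4.6 (1M context)')); // prints: Opus 4.6
console.log(stripParentheticalSuffix('Opus 4.6[1m] (1M context)')); // prints: Opus 4.6[1m]
console.log(stripParentheticalSuffix('Opus 4.6[1m]')); // prints: Opus 4.6[1m]
```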

Labels: None yet
Projects: None yet
2 participants