Fix streaming dedup: keep last message.id occurrence for accurate tokens and tools #201

Merged
iamtoruk merged 1 commit into main from fix/streaming-dedup
May 3, 2026

iamtoruk commented May 3, 2026

Summary

  • Claude Code writes the same message.id multiple times during streaming (message_start, intermediate, message_stop)
  • Old parser kept the first occurrence, which is partial (often just 1 output token and 0 tool_use blocks)
  • New parser keeps the last occurrence (authoritative: real token counts, all tool_use/MCP blocks) but preserves the first timestamp for date bucketing
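
The keep-last rule with a preserved first timestamp can be sketched as follows. This is a minimal Python illustration, not codeburn's actual parser: the `Entry` type, its field names, the `dedup_keep_last` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    # Hypothetical, simplified view of one streamed write of a message.
    message_id: str
    timestamp: str          # ISO 8601 string as written in the session file
    output_tokens: int
    tool_names: tuple = ()  # names of tool_use/MCP blocks seen in this write


def dedup_keep_last(entries):
    """Collapse repeated message IDs: last write's content, first write's timestamp."""
    merged = {}
    for entry in entries:
        first = merged.get(entry.message_id)
        if first is None:
            merged[entry.message_id] = entry
        else:
            # The later write wins on content; the earliest timestamp is kept
            # so the message stays in the day it started on.
            merged[entry.message_id] = replace(entry, timestamp=first.timestamp)
    return list(merged.values())


# Illustrative values only: a partial first write and a complete last write.
stream = [
    Entry("msg_1", "2026-05-02T23:59:58Z", 1),
    Entry("msg_1", "2026-05-03T00:00:04Z", 412, ("playwright",)),
]
result = dedup_keep_last(stream)
```

Here `result` holds a single entry with 412 output tokens and the `playwright` tool, still stamped `2026-05-02T23:59:58Z`. A keep-first parser would have reported 1 token and no tools for the same input.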

Validation (4 agents, 21,390 real session files)

Metric                    | Finding
Sessions affected         | 40.5% of all files have duplicate message IDs
Output tokens             | +6.3% increase (was undercounted)
Cost                      | +0.77% increase
Tool counts               | ~50% of tools were invisible (only in last occurrence)
MCP tools                 | 100% were dropped (playwright, context7, sequential_thinking)
Cross-midnight timestamps | 2 real cases where wrong timestamp would bucket cost to wrong day
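
A rough way to reproduce the "sessions affected" and "cross-midnight" checks on a single session file is sketched below. It assumes each line of the file is a JSON object with a top-level `timestamp` and a nested `message.id`, and it compares UTC calendar dates only; both are assumptions made for illustration, and `duplicate_id_report` is a hypothetical helper, not part of codeburn.

```python
import json
from collections import defaultdict


def duplicate_id_report(jsonl_lines):
    """Summarize repeated message IDs in one session's JSONL lines."""
    timestamps = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        message = record.get("message")
        message_id = message.get("id") if isinstance(message, dict) else None
        if message_id:
            timestamps[message_id].append(record.get("timestamp", ""))

    duplicated = {mid: ts for mid, ts in timestamps.items() if len(ts) > 1}
    # The first 10 characters of an ISO 8601 timestamp are the calendar date.
    cross_midnight = sorted(
        mid for mid, ts in duplicated.items() if ts[0][:10] != ts[-1][:10]
    )
    return {
        "has_duplicates": bool(duplicated),
        "duplicated_ids": len(duplicated),
        "cross_midnight_ids": cross_midnight,
    }


# Illustrative lines only; real session records carry many more fields.
sample = [
    '{"timestamp": "2026-05-02T23:59:58Z", "message": {"id": "msg_a"}}',
    '{"timestamp": "2026-05-03T00:00:04Z", "message": {"id": "msg_a"}}',
    '{"timestamp": "2026-05-03T00:01:00Z", "message": {"id": "msg_b"}}',
    '{"timestamp": "2026-05-03T00:02:00Z", "type": "summary"}',
]
report = duplicate_id_report(sample)
```

For this sample, `msg_a` is the only duplicated ID and it straddles midnight, which is the case where taking the last write's timestamp would move its cost to the following day.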

Before/After comparison (6 days, Claude only)

Date       | ccusage Out  | Before Out   | After Out    | Fix Delta
2026-04-27 |       57,015 |       57,569 |       58,796 |       +1,227
2026-04-28 |       37,859 |       38,575 |       41,183 |       +2,608
2026-04-29 |      820,522 |      819,252 |      902,037 |      +82,785
2026-04-30 |    2,084,720 |    2,086,109 |    2,167,857 |      +81,748
2026-05-01 |      847,510 |      846,121 |      854,175 |       +8,054
2026-05-02 |      313,781 |      311,896 |      398,355 |      +86,459
TOTALS     |    4,161,407 |    4,159,522 |    4,422,403 |     +262,881

Old codeburn matched ccusage within 0.05% because both had the same keep-first bug.
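
The totals and percentages quoted here can be re-derived from the table rows. The short check below uses only the figures from the table; the variable names are arbitrary.

```python
# (ccusage out, before out, after out) per day, copied from the table above.
rows = {
    "2026-04-27": (57_015, 57_569, 58_796),
    "2026-04-28": (37_859, 38_575, 41_183),
    "2026-04-29": (820_522, 819_252, 902_037),
    "2026-04-30": (2_084_720, 2_086_109, 2_167_857),
    "2026-05-01": (847_510, 846_121, 854_175),
    "2026-05-02": (313_781, 311_896, 398_355),
}

ccusage_total = sum(r[0] for r in rows.values())
before_total = sum(r[1] for r in rows.values())
after_total = sum(r[2] for r in rows.values())

fix_delta = after_total - before_total
# Increase in output tokens relative to the old (keep-first) count.
increase_pct = 100 * fix_delta / before_total
# Gap between old codeburn and ccusage, relative to ccusage.
gap_pct = 100 * abs(ccusage_total - before_total) / ccusage_total

print(fix_delta, round(increase_pct, 1), round(gap_pct, 3))
```

The delta comes out to 262,881 tokens, about 6.3% above the old count, while the old codeburn and ccusage totals differ by roughly 0.045%, consistent with the "within 0.05%" statement.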

Scope

Only affects Claude Code session parsing. Other providers (Gemini, Cursor, Goose, etc.) have their own parsers and are unaffected.

Closes #110

…ession files

Claude Code writes the same message.id multiple times during streaming.
The first write has partial tokens (often 1) and no tool_use blocks.
The last write has authoritative token counts and all tool_use/MCP blocks.

Old behavior kept the first occurrence (keep-first), silently dropping
real output tokens (totals rise 6.3% once fixed) and all MCP tool calls.

New behavior keeps the last occurrence's content but preserves the first
occurrence's timestamp for correct date bucketing.

Validated against 21,390 real session files: 40.5% had duplicate IDs,
output tokens were understated by up to 78% per session.
iamtoruk merged commit 95585fe into main May 3, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

MCP tool calls and tool_use blocks missing from dashboard due to streaming write deduplication