fix: Reconcile token displays and persist metadata-only saves #459
Merged
Conversation
- Status bar T.<n> and Context: N% now derive from LastInputTokens, matching /context's "Current Context Size", so all three surfaces report the same metric (window fill, not cumulative billing). The Context: N% indicator always renders; HIGH/FULL labels still kick in at 75%/90%.
- The /context shortcut now falls back to the tokenizer polyfill when the provider does not return usage in its response, marking the value as ~<n> (estimated). The container's TokenizerService is reused so the shortcut, the optimizer, and the rollover manager share one instance.
- JSONL storage's saveConversationUnlocked now appends a fresh metadata line on every save, including saves with no new entries. The previous early return dropped metadata-only saves (e.g. AddTokenUsage updating token stats after the assistant message was already persisted), causing /conversations to lag by one save event. Added regression test TestJsonlStorage_MetadataOnlyUpdatePersists.
Switch the percentage shown by /context, the status bar Context: N% indicator, and the T.<n> raw counter to use cumulative TotalInputTokens instead of the most-recent prompt size. Users expect the meter to reflect "how much of the 1M-token allowance the session has spent" — e.g. 20K input cumulative on a 1M window now correctly shows 2%. The /context "Current Context Size" line is renamed to "Total Input Tokens" since it now reports cumulative usage. Tests updated to assert the new semantics with realistic numbers (20000 input → 2.0%).
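The cumulative-percentage arithmetic described in this comment is simple, but worth pinning down. A sketch under assumed names (`contextPercent` and `windowSize` are illustrative, not the project's API):

```go
package main

import "fmt"

// contextPercent reports cumulative input-token spend as a share of the
// model's token allowance.
func contextPercent(totalInputTokens, windowSize int) float64 {
	return float64(totalInputTokens) / float64(windowSize) * 100
}

func main() {
	// 20,000 cumulative input tokens on a 1M-token allowance → 2.0%.
	fmt.Printf("Total Input Tokens: %.1f%%\n", contextPercent(20000, 1_000_000))
}
```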
ig-semantic-release-bot pushed a commit that referenced this pull request on Apr 27, 2026:
## [0.104.2](v0.104.1...v0.104.2) (2026-04-27)

### 🐛 Bug Fixes

* **services:** Prevent compact summary truncation at 200-token cap ([#457](#457)) ([36b9612](36b9612)), closes [#454](#454)
* Reconcile token displays and persist metadata-only saves ([#459](#459)) ([8bc8767](8bc8767))

### 📚 Documentation

* Update agents MD ([#458](#458)) ([27bfaea](27bfaea))

### 🧹 Maintenance

* **nix:** Update package to v0.104.1 ([#456](#456)) ([784e4bc](784e4bc))
Summary
- T.<n> and Context: N% now derive from LastInputTokens, matching /context's Current Context Size. All three surfaces report the same metric: window fill, not cumulative billing.
- /context: When the provider does not return usage in its response, the shortcut falls back to the existing TokenizerService polyfill and marks the value as ~<n> (estimated). The container's tokenizer is now created unconditionally and reused by the shortcut, the optimizer, and the rollover manager.
- saveConversationUnlocked previously skipped writing metadata when no new entries were added (the if len(entries) > persistedCount gate). This dropped saves triggered by AddTokenUsage after the assistant message was already persisted, causing /conversations to lag by one save event. Now metadata is appended on every save; the reader already uses the last meta line.
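The estimated-value fallback in the second bullet can be sketched as follows. Everything here is illustrative: the `usage` struct, the `displayTokens` helper, and the chars/4 heuristic are stand-ins for the real provider response and TokenizerService polyfill; the point is preferring provider-reported usage and tagging the fallback with `~` so users can tell it is approximate.

```go
package main

import "fmt"

// usage is a hypothetical provider response; Known is false when the
// provider omitted token usage entirely.
type usage struct {
	InputTokens int
	Known       bool
}

// estimateTokens stands in for the tokenizer polyfill: a rough
// chars/4 heuristic, not the real TokenizerService.
func estimateTokens(prompt string) int {
	return len(prompt) / 4
}

// displayTokens prefers provider-reported usage and falls back to the
// estimate, marking it with "~" and "(estimated)".
func displayTokens(u usage, prompt string) string {
	if u.Known {
		return fmt.Sprintf("%d", u.InputTokens)
	}
	return fmt.Sprintf("~%d (estimated)", estimateTokens(prompt))
}

func main() {
	fmt.Println(displayTokens(usage{InputTokens: 1200, Known: true}, ""))
	// Provider omitted usage: fall back to the estimate.
	fmt.Println(displayTokens(usage{}, "some prompt text of forty characters...."))
}
```

Sharing one tokenizer instance across the shortcut, optimizer, and rollover manager (as the summary notes) avoids both duplicated model loads and divergent estimates between surfaces.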