Summary
`SessionRolloverManager.tokenTriggerFires` and `ConversationOptimizer.OptimizeMessages` both gate on `TokenizerService.EstimateMessagesTokens(entries)`, which counts only message content + role + tool-call arguments + a 3-token overhead. The chat UI and the gateway's reported `prompt_tokens` both include the system prompt and tool-definition JSON in addition to message content, so the two numbers drift apart by several thousand tokens. In a typical chat session the entries-only estimate stays below the trigger threshold (`context_window × auto_at / 100`, i.e. 80% of the window by default) even when the UI shows 100% full, so auto-compact never fires.
Steps to Reproduce
- Run `infer chat` with the default config (`compact.enabled: true`, `compact.auto_at: 80`, jsonl storage enabled).
- Pick a small-context model, e.g. `ollama_cloud/gemma3:4b` (8K window before the recent gemma matcher fix; reproduces on any 8K–32K model).
- Have a back-and-forth conversation that involves tool calls (Read, Grep, Bash, etc.) until `/context` reports ≥80% usage.
- Send another user message.
Expected Behavior
When `LastInputTokens >= context_window × auto_at / 100`, `SessionRolloverManager.PerformRollover` runs before the new user message is appended:
- The optimizer summarises the existing conversation.
- `StartNewConversation` opens a fresh jsonl file.
- The summary message is re-added.
- The new user message lands in the new session.
The UI shows a fresh, low-percentage context bar.
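The ordering above can be sketched as a toy model. `message`, `conversation`, `handleUserMessage` and the `summarise` callback are hypothetical stand-ins, not the real `SessionRolloverManager` API; the point is only that the threshold check and rollover happen before the new user message is appended.

```go
package main

// message and conversation are hypothetical stand-ins for the real
// conversation repository types.
type message struct {
	Role    string
	Content string
}

type conversation struct {
	Messages []message
	FileID   int // stands in for the jsonl file currently being written
}

// handleUserMessage checks the threshold BEFORE appending the new user
// message. summarise is a hypothetical callback standing in for the
// optimizer. It reports whether a rollover happened.
func handleUserMessage(c *conversation, userText string, lastInputTokens, contextWindow, autoAt int, summarise func([]message) string) bool {
	rolledOver := false
	if lastInputTokens >= contextWindow*autoAt/100 {
		summary := summarise(c.Messages) // 1. summarise the existing conversation
		c.FileID++                       // 2. start a new conversation (fresh jsonl file)
		c.Messages = []message{          // 3. re-add the summary (role is illustrative)
			{Role: "assistant", Content: summary},
		}
		rolledOver = true
	}
	// 4. the new user message lands in whichever session is now current
	c.Messages = append(c.Messages, message{Role: "user", Content: userText})
	return rolledOver
}
```

After a rollover the conversation holds exactly two messages (summary, then the new user message), which is what should drive the low-percentage context bar.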
Actual Behavior
- The token trigger silently returns `false` even when the UI says "100% FULL".
- The conversation keeps growing indefinitely; the user must run `/compact` manually.
- `logger.Info("session rollover triggered by token threshold", …)` is never written for typical chat sessions, confirming the trigger condition is unreachable.
Observed evidence: `ollama_cloud/gemma3:4b`, 22 messages, 10 API requests, `LastInputTokens=8539`, displayed `Context Window=8192`. Twenty-two messages after 10 requests means rollover never fired; if it had, the message count would have reset. 8539 is above the threshold of 6553 (8192 × 80 / 100, rounded down) and even above the full 8192-token window, yet the trigger didn't fire because the entries-only estimate stays well below 6553 once tool definitions and the system prompt are excluded.
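The threshold arithmetic can be checked with a small helper. `rolloverThreshold` and `shouldRollover` are hypothetical names, and integer arithmetic is an assumption about the real code; it does reproduce the 6553 figure quoted above.

```go
package main

// rolloverThreshold applies context_window × auto_at / 100. Integer
// arithmetic is an assumption; it matches the 6553 figure quoted above.
func rolloverThreshold(contextWindow, autoAt int) int {
	return contextWindow * autoAt / 100
}

// shouldRollover reports whether a token count reaches the threshold.
func shouldRollover(tokens, contextWindow, autoAt int) bool {
	return tokens >= rolloverThreshold(contextWindow, autoAt)
}
```

With the observed numbers the gateway count clears the threshold by 1986 tokens (8539 − 6553), so a gate fed `LastInputTokens` would have fired.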
Root Cause
- `internal/services/session_rollover_manager.go:198-217` (`tokenTriggerFires`): uses `tokenizer.EstimateMessagesTokens(msgs)` — entries only.
- `internal/services/tokenizer.go:118-131` (`EstimateMessagesTokens`): sums per-message content + 3-token overhead. Does not include system prompts or tool definitions.
- `internal/services/conversation_optimizer.go:80-91` (`OptimizeMessages`): same defect on the inline-optimizer path — both gates use the same flawed estimate.
- The "real" number — gateway-reported `prompt_tokens` — is already stored at `internal/services/conversation.go:415` as `sessionStats.LastInputTokens` and is what `/context` displays via `internal/shortcuts/core.go:115`.
System prompts (built per request in `internal/agent/agent.go`, including git context + working dir context) typically add 1500–3000 tokens. Tool definitions for the 15+ registered tools typically add 3000–5000 tokens. `EstimateToolDefinitionsTokens` already exists at `tokenizer.go:134-157` but is not called by the trigger path.
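The gap can be made concrete. The sketch below uses the low end of the ranges quoted above (1500 system-prompt tokens, 3000 tool-definition tokens) and a made-up entries figure of 3500; none of these are measured values, and `promptParts` and `minFullAtTrigger` are hypothetical names.

```go
package main

// promptParts splits one outgoing request into the pieces the gateway
// counts. The type and field names are hypothetical.
type promptParts struct {
	Entries      int // what EstimateMessagesTokens returns today
	SystemPrompt int // per-request system prompt (git + working-dir context)
	ToolDefs     int // what EstimateToolDefinitionsTokens would return
}

// entriesOnly is the number the current trigger compares.
func (p promptParts) entriesOnly() int { return p.Entries }

// full approximates the gateway's prompt_tokens.
func (p promptParts) full() int { return p.Entries + p.SystemPrompt + p.ToolDefs }

// minFullAtTrigger returns the smallest full prompt size at which an
// entries-only gate can fire: the entries alone must reach the threshold,
// and the fixed overhead rides on top.
func minFullAtTrigger(threshold, systemPrompt, toolDefs int) int {
	return threshold + systemPrompt + toolDefs
}
```

For `promptParts{Entries: 3500, SystemPrompt: 1500, ToolDefs: 3000}` the entries-only count (3500) is under 6553 while the full count (8000) is over it. More generally, even at the low end of both ranges an entries-only gate on an 8192-token window can fire only once the real prompt is at least 11,053 tokens, which is already past the window. That is consistent with the trigger never firing in practice.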
Suggested Fix Direction
Two viable paths:
- (a) Use `LastInputTokens` for the trigger. Read it from the repo's session stats; it's the gateway's authoritative count for what was actually sent. Fall back to the entries-only estimate before the first round-trip when `LastInputTokens == 0`. More accurate because the gateway counts provider-specific reformatting too.
- (b) Augment the estimate. Add system-prompt tokens and tool-definition tokens (`EstimateToolDefinitionsTokens` already exists) into `tokenTriggerFires`. More self-contained but duplicates work the gateway already does.
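Option (a) with its fallback could look roughly like this. `triggerTokens` is a hypothetical helper; in the real code the first argument would come from `sessionStats.LastInputTokens` and the callback would wrap a `TokenizerService` estimate (entries-only, or the augmented estimate from option (b)).

```go
package main

// triggerTokens sketches option (a): prefer the gateway-reported count and
// fall back to a local estimate only before the first round-trip, while
// LastInputTokens is still zero. localEstimate is a callback so the
// tokenizer only runs when it is actually needed.
func triggerTokens(lastInputTokens int, localEstimate func() int) int {
	if lastInputTokens > 0 {
		return lastInputTokens
	}
	return localEstimate()
}
```

`LastInputTokens` describes the previous request, so it does not yet include the newest user message. Also, if the value is not reset when `PerformRollover` starts a new conversation (not verified here), the stale pre-rollover count would re-trigger on the next message and break the "exactly once" acceptance criterion.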
The same fix should be applied to `ConversationOptimizer.OptimizeMessages` so the inline optimizer fires consistently with the rollover manager.
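One way to keep both call sites consistent is a single shared predicate. `tokenGate`, `rolloverManager` and `inlineOptimizer` below are hypothetical, simplified stand-ins, not the real `SessionRolloverManager` or `ConversationOptimizer` types.

```go
package main

// tokenGate is a hypothetical shared predicate: SessionRolloverManager and
// ConversationOptimizer would both hold one, so their thresholds and token
// sources cannot drift apart.
type tokenGate struct {
	ContextWindow int
	AutoAt        int // percent, e.g. 80 for compact.auto_at: 80
}

// ShouldCompact uses the gateway count when present, else the local estimate.
func (g tokenGate) ShouldCompact(lastInputTokens, localEstimate int) bool {
	tokens := lastInputTokens
	if tokens == 0 {
		tokens = localEstimate
	}
	return tokens >= g.ContextWindow*g.AutoAt/100
}

// Simplified stand-ins for the two call sites; not the real types.
type rolloverManager struct{ gate tokenGate }
type inlineOptimizer struct{ gate tokenGate }

func (m rolloverManager) triggerFires(last, est int) bool { return m.gate.ShouldCompact(last, est) }
func (o inlineOptimizer) shouldOptimize(last, est int) bool { return o.gate.ShouldCompact(last, est) }
```

Because both stand-ins delegate to the same `tokenGate` value, a future change to the token source or threshold applies to both paths at once.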
Acceptance Criteria
- A chat session that reaches `LastInputTokens >= context_window × 80%` triggers `PerformRollover` exactly once on the next user message.
- `logger.Info("session rollover triggered by token threshold", …)` is emitted.
- The chat UI's `Current Context Size` drops sharply on the next render after rollover.
- Existing `internal/services/session_rollover_manager_test.go` cases still pass.
- New test case covers the system-prompt + tool-def scenario (entries-only count below threshold, full count above threshold → trigger fires).
Environment
- Storage backend: jsonl (default)
- Affected model observed: `ollama_cloud/gemma3:4b`, but the defect is provider-agnostic.