
[BUG] Auto-compact / session rollover never fires — token-trigger estimate excludes system prompt and tool definitions #453

@edenreich

Description


Summary

`SessionRolloverManager.tokenTriggerFires` and `ConversationOptimizer.OptimizeMessages` both gate on `TokenizerService.EstimateMessagesTokens(entries)`, which counts only message content, role, tool-call arguments, and a 3-token overhead. The chat UI and the gateway's reported `prompt_tokens` both also include the system prompt and the tool-definition JSON, so the two numbers drift apart by several thousand tokens. In a typical chat session the entries-only estimate stays below `context_window × 80 / 100` even when the UI shows the context as 100% full, so auto-compact never fires.

Steps to Reproduce

  1. Run `infer chat` with the default config (`compact.enabled: true`, `compact.auto_at: 80`, jsonl storage enabled).
  2. Pick a small-context model, e.g. `ollama_cloud/gemma3:4b` (8K window before the recent gemma matcher fix; the bug reproduces on any model with an 8K–32K window).
  3. Have a back-and-forth conversation that involves tool calls (Read, Grep, Bash, etc.) until `/context` reports ≥80% usage.
  4. Send another user message.

Expected Behavior

When `LastInputTokens >= context_window × auto_at / 100`, `SessionRolloverManager.PerformRollover` should run before the new user message is appended:

  • The optimizer summarises the existing conversation,
  • StartNewConversation opens a fresh jsonl file,
  • The summary message is re-added,
  • The new user message lands in the new session.

The UI then shows a fresh, low-percentage context bar.
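The expected ordering (roll over first, then append) can be sketched as below. This is a minimal, in-memory illustration, not the project's code: `Message`, `Session`, `summarise`, and `sendUserMessage` are hypothetical stand-ins for the jsonl-backed conversation, the optimizer's summarisation, and `PerformRollover`.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for a conversation entry.
type Message struct {
	Role    string
	Content string
}

// Session is a hypothetical, in-memory stand-in for the jsonl-backed conversation.
type Session struct {
	ID       int
	Messages []Message
}

// summarise is a placeholder for the optimizer's summarisation step.
func summarise(msgs []Message) Message {
	return Message{Role: "system", Content: fmt.Sprintf("summary of %d messages", len(msgs))}
}

// sendUserMessage applies the expected ordering: if the trigger fires, roll over
// BEFORE appending the new user message, so that message lands in the new session.
func sendUserMessage(s *Session, lastInputTokens, contextWindow, autoAt int, text string) *Session {
	threshold := contextWindow * autoAt / 100
	if lastInputTokens >= threshold {
		summary := summarise(s.Messages)         // 1. summarise the existing conversation
		s = &Session{ID: s.ID + 1}               // 2. open a fresh session
		s.Messages = append(s.Messages, summary) // 3. re-add the summary
	}
	s.Messages = append(s.Messages, Message{Role: "user", Content: text}) // 4. append the new message
	return s
}

func main() {
	old := &Session{ID: 1, Messages: make([]Message, 22)}
	s := sendUserMessage(old, 8539, 8192, 80, "next question")
	fmt.Println(s.ID, len(s.Messages)) // 2 2
}
```

With the numbers observed in this report, the new session starts with just the summary and the new user message, which is what should make the context bar drop.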

Actual Behavior

  • The token trigger silently returns false even when the UI says "100% FULL".
  • The conversation keeps growing indefinitely; the user must run /compact manually.
  • `logger.Info("session rollover triggered by token threshold", …)` is never written for typical chat sessions, confirming the trigger condition is unreachable.

Observed evidence: `ollama_cloud/gemma3:4b`, 22 messages, 10 API requests, `LastInputTokens=8539`, displayed Context Window=8192. Twenty-two messages across 10 requests means rollover never fired; had it fired, the message count would have reset. `LastInputTokens` (8539) exceeds the threshold of 6553 (8192 × 80 / 100), yet the trigger did not fire, because the entries-only estimate stays well below 6553 once tool definitions and the system prompt are excluded.
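The threshold arithmetic for this case can be checked directly. The sketch assumes the trigger uses plain integer arithmetic, as the `context_window × auto_at / 100` formula suggests; `rolloverThreshold` is a hypothetical helper name.

```go
package main

import "fmt"

// rolloverThreshold mirrors the formula in this report: context_window × auto_at / 100.
// Go integer division truncates, so 8192 × 80 / 100 = 6553.6 becomes 6553.
func rolloverThreshold(contextWindow, autoAt int) int {
	return contextWindow * autoAt / 100
}

func main() {
	const contextWindow, autoAt, lastInputTokens = 8192, 80, 8539
	threshold := rolloverThreshold(contextWindow, autoAt)
	fmt.Println(threshold)                    // 6553
	fmt.Println(lastInputTokens >= threshold) // true: the gateway count alone should have fired the trigger
}
```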

Root Cause

  • `internal/services/session_rollover_manager.go:198-217` (`tokenTriggerFires`): uses `tokenizer.EstimateMessagesTokens(msgs)` — entries only.
  • `internal/services/tokenizer.go:118-131` (`EstimateMessagesTokens`): sums per-message content + 3-token overhead. Does not include system prompts or tool definitions.
  • `internal/services/conversation_optimizer.go:80-91` (`OptimizeMessages`): same defect on the inline-optimizer path — both gates use the same flawed estimate.
  • The "real" number — gateway-reported `prompt_tokens` — is already stored at `internal/services/conversation.go:415` as `sessionStats.LastInputTokens` and is what `/context` displays via `internal/shortcuts/core.go:115`.
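To make the gap concrete, the sketch below reproduces what the entries-only estimate counts according to this report: content, role, and tool-call arguments, plus a 3-token overhead. Two things are assumptions, not taken from `tokenizer.go`: the ~4-characters-per-token heuristic in `estimateTextTokens`, and applying the 3-token overhead per message. `Message` and `ToolCall` are hypothetical stand-ins.

```go
package main

import "fmt"

// ToolCall and Message are hypothetical stand-ins for the conversation entry types.
type ToolCall struct{ Arguments string }

type Message struct {
	Role      string
	Content   string
	ToolCalls []ToolCall
}

// estimateTextTokens is an ASSUMED heuristic (~4 characters per token, rounded up);
// the real TokenizerService may use a different ratio.
func estimateTextTokens(s string) int {
	return (len(s) + 3) / 4
}

// estimateMessagesTokens sketches what EstimateMessagesTokens is reported to count:
// content + role + tool-call arguments + a 3-token overhead (assumed per message).
// Note what is absent: no system prompt and no tool definitions.
func estimateMessagesTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += estimateTextTokens(m.Content) + estimateTextTokens(m.Role) + 3
		for _, tc := range m.ToolCalls {
			total += estimateTextTokens(tc.Arguments)
		}
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "find the config loader"},
		{Role: "assistant", ToolCalls: []ToolCall{{Arguments: `{"pattern":"LoadConfig"}`}}},
	}
	fmt.Println(estimateMessagesTokens(msgs))
}
```

Whatever the exact per-character ratio, nothing in this loop can account for the several thousand tokens of system prompt and tool schema that the gateway sees.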

System prompts (built per request in `internal/agent/agent.go`, including git context + working dir context) typically add 1500–3000 tokens. Tool definitions for the 15+ registered tools typically add 3000–5000 tokens. `EstimateToolDefinitionsTokens` already exists at `tokenizer.go:134-157` but is not called by the trigger path.
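A full-prompt estimate would add those two missing components to the entries count; this is the augmentation described as option (b) under Suggested Fix Direction. The sketch is self-contained and hypothetical: `ToolDefinition` is a simplified schema shape, `estimateToolDefinitionsTokens` only stands in for the existing `EstimateToolDefinitionsTokens` (whose real signature is not shown here), and the ~4-characters-per-token ratio is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolDefinition is a hypothetical, simplified shape of a registered tool's schema.
type ToolDefinition struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

// estimateTextTokens uses an ASSUMED ~4 characters per token, rounded up.
func estimateTextTokens(s string) int { return (len(s) + 3) / 4 }

// estimateToolDefinitionsTokens is a hypothetical stand-in for the existing
// EstimateToolDefinitionsTokens: it sizes the JSON that would be sent to the model.
func estimateToolDefinitionsTokens(tools []ToolDefinition) int {
	total := 0
	for _, t := range tools {
		b, err := json.Marshal(t)
		if err != nil {
			continue // skip tools that cannot be serialised
		}
		total += estimateTextTokens(string(b))
	}
	return total
}

// estimateFullPromptTokens adds the system prompt and tool definitions to the
// entries-only count, approximating what the gateway reports as prompt_tokens.
func estimateFullPromptTokens(entryTokens int, systemPrompt string, tools []ToolDefinition) int {
	return entryTokens + estimateTextTokens(systemPrompt) + estimateToolDefinitionsTokens(tools)
}

func main() {
	tools := []ToolDefinition{{
		Name:        "Grep",
		Description: "Search file contents",
		Parameters:  map[string]any{"type": "object"},
	}}
	entries := 5000 // hypothetical entries-only estimate
	fmt.Println(estimateFullPromptTokens(entries, "You are a coding assistant.", tools) > entries) // true
}
```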

Suggested Fix Direction

Two viable paths:

  • (a) Use `LastInputTokens` for the trigger. Read it from the conversation repository's session stats; it's the gateway's authoritative count for what was actually sent. Fall back to the entries-only estimate before the first round-trip, when `LastInputTokens == 0`. This is more accurate because the gateway also counts provider-specific reformatting.
  • (b) Augment the estimate. Add system-prompt tokens and tool-definition tokens (`EstimateToolDefinitionsTokens` already exists) into `tokenTriggerFires`. More self-contained but duplicates work the gateway already does.
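Option (a) with its fallback could look roughly like this. `SessionStats` and this `tokenTriggerFires` signature are hypothetical simplifications (the real method lives on `SessionRolloverManager` and reads stats from storage), and treating a non-positive `auto_at` or unknown window as "never trigger" is an assumption of the sketch.

```go
package main

import "fmt"

// SessionStats is a hypothetical stand-in for the stored session stats;
// LastInputTokens holds the gateway-reported prompt_tokens of the last request.
type SessionStats struct {
	LastInputTokens int
}

// tokenTriggerFires sketches option (a): prefer the gateway's authoritative count,
// and fall back to a local estimate only before the first round-trip.
func tokenTriggerFires(stats SessionStats, fallbackEstimate, contextWindow, autoAt int) bool {
	if contextWindow <= 0 || autoAt <= 0 {
		return false // ASSUMPTION: unknown window or non-positive auto_at never triggers
	}
	threshold := contextWindow * autoAt / 100
	used := stats.LastInputTokens
	if used == 0 {
		used = fallbackEstimate // no round-trip yet: use the local estimate
	}
	return used >= threshold
}

func main() {
	// Observed case from this report; 3000 is a hypothetical local estimate.
	fmt.Println(tokenTriggerFires(SessionStats{LastInputTokens: 8539}, 3000, 8192, 80)) // true
	// Before the first round-trip, the local estimate decides.
	fmt.Println(tokenTriggerFires(SessionStats{}, 3000, 8192, 80)) // false
}
```

Routing both the rollover manager and `OptimizeMessages` through one helper of this shape would keep the two gates from disagreeing.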

The same fix should be applied to `ConversationOptimizer.OptimizeMessages` so the inline optimizer fires consistently with the rollover manager.

Acceptance Criteria

  • A chat session that reaches `LastInputTokens >= context_window × 80%` triggers `PerformRollover` exactly once on the next user message.
  • `logger.Info("session rollover triggered by token threshold", …)` is emitted.
  • The chat UI's `Current Context Size` drops sharply on the next render after rollover.
  • Existing `internal/services/session_rollover_manager_test.go` cases still pass.
  • New test case covers the system-prompt + tool-def scenario (entries-only count below threshold, full count above threshold → trigger fires).
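The new test case could be table-driven along these lines. It is written as a plain program so it runs standalone; in the repository it would more naturally be a `func TestX(t *testing.T)` in `session_rollover_manager_test.go`. `fullPromptTriggerFires` is a hypothetical trigger, and the token counts are illustrative values picked from the ranges quoted above for an 8192-token window, not measurements.

```go
package main

import "fmt"

// fullPromptTriggerFires is a hypothetical trigger that counts entries plus
// system-prompt and tool-definition tokens against context_window × auto_at / 100.
func fullPromptTriggerFires(entryTokens, systemPromptTokens, toolDefTokens, contextWindow, autoAt int) bool {
	return entryTokens+systemPromptTokens+toolDefTokens >= contextWindow*autoAt/100
}

type triggerCase struct {
	name                         string
	entries, sysPrompt, toolDefs int
	want                         bool
}

// Illustrative counts; the threshold for 8192 at 80% is 6553.
var cases = []triggerCase{
	{"no system prompt or tools, below threshold", 2000, 0, 0, false},
	{"entries below threshold, full prompt above", 2000, 2000, 4000, true}, // 8000 >= 6553
	{"full prompt still below threshold", 1000, 1500, 3000, false},         // 5500 < 6553
}

// runCases returns the first mismatch, or nil if every case behaves as expected.
func runCases() error {
	for _, c := range cases {
		if got := fullPromptTriggerFires(c.entries, c.sysPrompt, c.toolDefs, 8192, 80); got != c.want {
			return fmt.Errorf("%s: got %v, want %v", c.name, got, c.want)
		}
	}
	return nil
}

func main() {
	if err := runCases(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

The second case is the regression this issue describes: the same 2000 entry tokens on their own would not fire, but the full prompt does.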

Environment

  • Storage backend: jsonl (default)
  • Affected model observed: `ollama_cloud/gemma3:4b`, but the defect is provider-agnostic.

Metadata

  • Assignees: none
  • Labels: bug (Something isn't working), released
