[BUG] Auto-compact / session rollover never fires — token-trigger estimate excludes system prompt and tool definitions #453

## Summary

`SessionRolloverManager.tokenTriggerFires` and `ConversationOptimizer.OptimizeMessages` both gate on `TokenizerService.EstimateMessagesTokens(entries)`, which counts only message content, role, and tool-call arguments plus a 3-token overhead. The chat UI and the gateway-reported `prompt_tokens` also include the system prompt and the tool-definition JSON, so the two numbers drift apart by several thousand tokens. In a typical chat session the entries-only estimate stays below `context_window × 80 / 100` even when the UI shows 100% full, so auto-compact never fires.

### Steps to Reproduce

1. Run `infer chat` with default config (`compact.enabled: true`, `compact.auto_at: 80`, jsonl storage enabled).
2. Pick a small-context model — e.g. `ollama_cloud/gemma3:4b` (8K window before the recent gemma matcher fix; reproduces on any 8K–32K model).
3. Have a back-and-forth conversation that involves tool calls (Read, Grep, Bash, etc.) until `/context` reports ≥80% usage.
4. Send another user message.

### Expected Behavior

When `LastInputTokens >= context_window × auto_at / 100`, `SessionRolloverManager.PerformRollover` runs **before** the new user message is appended:

- The optimizer summarises the existing conversation,
- `StartNewConversation` opens a fresh jsonl file,
- The summary message is re-added,
- The new user message lands in the new session.

UI shows a fresh, low-percentage context bar.

### Actual Behavior

- The token trigger silently returns `false` even when the UI says "100% FULL".
- The conversation keeps growing indefinitely; the user must run `/compact` manually.
- `logger.Info("session rollover triggered by token threshold", …)` is never written for typical chat sessions, confirming the trigger condition is unreachable.

Observed evidence: `ollama_cloud/gemma3:4b`, 22 messages, 10 API requests, `LastInputTokens=8539`, displayed `Context Window=8192`. A count of 22 messages across 10 requests means rollover never fired; if it had, the message count would have reset. 8539 exceeds the threshold of 6553 (8192 × 80 / 100, rounded down), yet the trigger did not fire because the entries-only estimate stays well below 6553 once the tool definitions and system prompt are excluded.
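The arithmetic behind the observed numbers can be checked with a short sketch. The helper name `tokenThreshold` is hypothetical and is not taken from the codebase; it only mirrors the `context_window × auto_at / 100` expression used in this report, assuming integer division.

```go
package main

import "fmt"

// tokenThreshold is a hypothetical helper (not from the codebase) that mirrors
// the expression context_window × auto_at / 100 with integer division.
func tokenThreshold(contextWindow, autoAtPercent int) int {
	return contextWindow * autoAtPercent / 100
}

func main() {
	const contextWindow = 8192   // displayed Context Window
	const autoAt = 80            // compact.auto_at
	const lastInputTokens = 8539 // gateway-reported prompt_tokens from the observation

	threshold := tokenThreshold(contextWindow, autoAt)
	fmt.Println("threshold:", threshold)                         // 6553
	fmt.Println("over threshold by:", lastInputTokens-threshold) // 1986
	fmt.Println("should fire:", lastInputTokens >= threshold)    // true
}
```

With the reported figures the gateway count is 1986 tokens past the threshold, so a trigger keyed on `LastInputTokens` would have fired.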

### Root Cause

- `internal/services/session_rollover_manager.go:198-217` (`tokenTriggerFires`): uses `tokenizer.EstimateMessagesTokens(msgs)`, which covers entries only.
- `internal/services/tokenizer.go:118-131` (`EstimateMessagesTokens`): sums per-message content + 3-token overhead. **Does not include** system prompts or tool definitions.
- `internal/services/conversation_optimizer.go:80-91` (`OptimizeMessages`): same defect on the inline-optimizer path; both gates use the same flawed estimate.
- The "real" number, the gateway-reported `prompt_tokens`, is already stored at `internal/services/conversation.go:415` as `sessionStats.LastInputTokens` and is what `/context` displays via `internal/shortcuts/core.go:115`.

System prompts (built per request in `internal/agent/agent.go`, including git context + working dir context) typically add 1500–3000 tokens. Tool definitions for the 15+ registered tools typically add 3000–5000 tokens. `EstimateToolDefinitionsTokens` already exists at `tokenizer.go:134-157` but is not called by the trigger path.
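To make the size of the gap concrete, here is a minimal sketch that compares the two gates. The `promptBreakdown` type is hypothetical, the entries figure of 3000 is purely illustrative, and the two overheads are the low ends of the ranges quoted above.

```go
package main

import "fmt"

// promptBreakdown is a hypothetical split of one request's prompt tokens.
type promptBreakdown struct {
	Entries      int // what the entries-only estimate sees
	SystemPrompt int // excluded by the current gate
	ToolDefs     int // excluded by the current gate
}

// full returns the sum of all three parts, i.e. roughly what the gateway
// would report as prompt_tokens under this simplified model.
func (p promptBreakdown) full() int {
	return p.Entries + p.SystemPrompt + p.ToolDefs
}

func main() {
	threshold := 8192 * 80 / 100 // 6553

	// Entries value is illustrative; overheads are the low ends of the quoted ranges.
	p := promptBreakdown{Entries: 3000, SystemPrompt: 1500, ToolDefs: 3000}

	fmt.Println("entries-only fires:", p.Entries >= threshold) // false
	fmt.Println("full estimate fires:", p.full() >= threshold) // true
}
```

Even with the smallest quoted overheads, the full count (7500) crosses the threshold while the entries-only count (3000) is less than half of it.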

### Suggested Fix Direction

Two viable paths:

- **(a) Use `LastInputTokens` for the trigger.** Read it from the repo's session stats; it's the gateway's authoritative count for what was actually sent. Fall back to the entries-only estimate before the first round-trip when `LastInputTokens == 0`. More accurate because the gateway counts provider-specific reformatting too.
- **(b) Augment the estimate.** Add system-prompt tokens and tool-definition tokens (`EstimateToolDefinitionsTokens` already exists) into `tokenTriggerFires`. More self-contained but duplicates work the gateway already does.

The same fix should be applied to `ConversationOptimizer.OptimizeMessages` so the inline optimizer fires consistently with the rollover manager.
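Option (a) could look roughly like the sketch below. The function names `effectiveTokens` and `shouldRollover` and their signatures are assumptions made for illustration; the real `tokenTriggerFires` takes its inputs from the manager and the session stats, and its actual signature is not shown in this report.

```go
package main

import "fmt"

// effectiveTokens (hypothetical) prefers the gateway-reported count and falls
// back to the local entries-only estimate before the first round-trip.
func effectiveTokens(lastInputTokens, entriesEstimate int) int {
	if lastInputTokens > 0 {
		return lastInputTokens
	}
	return entriesEstimate
}

// shouldRollover (hypothetical) reports whether the token trigger fires.
func shouldRollover(lastInputTokens, entriesEstimate, contextWindow, autoAt int) bool {
	if contextWindow <= 0 || autoAt <= 0 {
		return false // no usable threshold configured
	}
	threshold := contextWindow * autoAt / 100
	return effectiveTokens(lastInputTokens, entriesEstimate) >= threshold
}

func main() {
	fmt.Println(shouldRollover(8539, 3000, 8192, 80)) // true: gateway count wins
	fmt.Println(shouldRollover(0, 3000, 8192, 80))    // false: fallback, below threshold
	fmt.Println(shouldRollover(0, 7000, 8192, 80))    // true: fallback, above threshold
}
```

Keeping the fallback in one small helper would let the rollover manager and the inline optimizer share the same decision, which is the consistency the previous paragraph asks for.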

### Acceptance Criteria

- A chat session that reaches `LastInputTokens >= context_window × 80%` triggers `PerformRollover` exactly once on the next user message.
- `logger.Info("session rollover triggered by token threshold", …)` is emitted.
- The chat UI's `Current Context Size` drops sharply on the next render after rollover.
- Existing `internal/services/session_rollover_manager_test.go` cases still pass.
- New test case covers the system-prompt + tool-def scenario (entries-only count below threshold, full count above threshold → trigger fires).
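The new test case in the last criterion might be laid out as a table like the one below. This is only a sketch: `fires` stands in for whichever trigger the fix ends up exposing, the token figures are illustrative, and a real test would live in the existing `_test.go` file and use the `testing` package rather than `main`.

```go
package main

import "fmt"

// fires is a stand-in for the trigger under test (hypothetical signature).
func fires(promptTokens, contextWindow, autoAt int) bool {
	return promptTokens >= contextWindow*autoAt/100
}

type triggerCase struct {
	name     string
	entries  int // entries-only estimate
	overhead int // system prompt + tool definitions
	want     bool
}

// Illustrative cases for an 8192-token window at auto_at = 80 (threshold 6553).
var cases = []triggerCase{
	{"entries below, full above", 3000, 4500, true},
	{"full count still below", 1000, 4500, false},
	{"full count exactly at threshold", 2053, 4500, true},
}

// failures returns the names of cases whose outcome differs from want.
func failures() []string {
	var failed []string
	for _, c := range cases {
		if got := fires(c.entries+c.overhead, 8192, 80); got != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(failures()))
}
```

The first row is the regression this issue describes: the entries-only count (3000) alone would not reach 6553, but the full count (7500) does.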

### Environment

- Storage backend: jsonl (default)
- Affected model observed: `ollama_cloud/gemma3:4b`, but the defect is provider-agnostic.
