Description
When using Anton with a local model (e.g. `qwen3.5:35b` via llama.cpp) configured with a 65K token context window, longer chat sessions eventually hit the context limit and produce an unhandled error instead of recovering gracefully.
Error message
❌ What went wrong
The chat context was too large (over 69,000 tokens). This happens when many queries and results are stored.
Expected behavior
Anton's code in `chat.py` has an automatic history summarization mechanism (`_summarize_history()`) that is supposed to trigger when `context_pressure` exceeds 70% (`_CONTEXT_PRESSURE_THRESHOLD = 0.7`). It also catches `ContextOverflowError` to compress older turns reactively.
In practice, with a local model at 65K context, this automatic summarization does not kick in before the limit is exceeded, causing the session to fail rather than recovering transparently.
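For reference, the expected proactive path can be sketched roughly as follows. Only the names `_CONTEXT_PRESSURE_THRESHOLD`, `context_pressure`, and `_summarize_history()` come from the issue; everything else (signatures, the exact pressure formula) is an assumption for illustration, not Anton's actual implementation:

```python
# Hypothetical sketch of the proactive summarization path.
# Only the constant name and threshold value are from chat.py;
# the function signatures here are assumptions.
_CONTEXT_PRESSURE_THRESHOLD = 0.7

def context_pressure(history_tokens: int, context_window: int) -> float:
    """Fraction of the context window already consumed by history."""
    return history_tokens / context_window

def should_summarize(history_tokens: int, context_window: int) -> bool:
    """Return True once pressure exceeds the 70% threshold,
    i.e. the point where _summarize_history() is expected to run."""
    return context_pressure(history_tokens, context_window) > _CONTEXT_PRESSURE_THRESHOLD
```

With a 65K window, this check should fire at roughly 45,500 tokens of history, well before the ~69,000 tokens seen in the error message above.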
Environment
- Model: `qwen3.5:35b` (local, via llama.cpp)
- Context window: ~65,000 tokens
- Trigger: Long chat session with many tool calls and query results accumulated in history
Steps to reproduce
- Configure Anton with a local model (`qwen3.5:35b`) via `/setup`
- Run a long, multi-step session with several data queries and tool calls
- After enough turns, the session fails with the context-too-large error
Possible causes
- The `context_pressure` calculation may not be accurate for local models, so the 70% threshold is never triggered proactively
- `ContextOverflowError` may not be raised or mapped correctly for local backends, so the reactive summarization path is also skipped
- Token counting for local models may differ from what Anton expects
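If the second cause is in play, one possible direction is to translate raw backend errors into Anton's exception type so the existing reactive path fires. This is a hypothetical sketch, not Anton's code: the `ContextOverflowError` name is from the issue, but the mapping function and the loose string match are assumptions (llama.cpp-style servers word overflow errors differently across versions):

```python
class ContextOverflowError(Exception):
    """Raised when the prompt exceeds the model's context window."""

def map_backend_error(message: str) -> Exception:
    """Translate a raw local-backend error message into Anton's
    exception type (hypothetical helper for illustration).

    The substring match is deliberately loose because the exact
    wording varies across llama.cpp server versions.
    """
    lowered = message.lower()
    overflow_markers = ("exceed", "too large", "overflow")
    if "context" in lowered and any(m in lowered for m in overflow_markers):
        return ContextOverflowError(message)
    return RuntimeError(message)
```

Routing local-backend failures through a mapping like this would let the existing `except ContextOverflowError` handler compress older turns instead of surfacing a hard failure.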
Suggested fix
- Ensure token counting works correctly for local models
- Fall back to a character-based estimate if token counts are unavailable from the local model API
- Show a user-friendly recovery message (e.g. "Context too large — summarizing history and continuing…") instead of a hard failure
- Consider adding a `/compact` or `/reset` slash command to let users manually trim history mid-session
Current workarounds
- Start a new Anton session (history is cleared, learned memory is preserved)
- Switch to a model with a larger context window via `/setup`
- Split long tasks into shorter sessions