Context overflow not handled gracefully with local model (qwen3.5:35b, 65K context) #106

@rolandscho

Description

When using Anton with a local model (e.g. qwen3.5:35b via llama.cpp) configured with a 65K-token context window, longer chat sessions eventually exceed the context limit and fail with an unhandled error instead of recovering gracefully.

Error message

❌ What went wrong
The chat context was too large (over 69,000 tokens). This happens when many queries and results are stored.

Expected behavior

Anton's chat.py has an automatic history summarization mechanism (_summarize_history()) that is supposed to trigger proactively when context_pressure exceeds 70% (_CONTEXT_PRESSURE_THRESHOLD = 0.7). It also catches ContextOverflowError to compress older turns reactively.

In practice, with a local model at 65K context, this automatic summarization does not kick in before the limit is exceeded, causing the session to fail rather than recovering transparently.
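The proactive path described above can be sketched roughly as follows. Only _CONTEXT_PRESSURE_THRESHOLD and _summarize_history() are named in this report; the surrounding class structure, attribute names, and the summarization effect are illustrative assumptions, not Anton's actual implementation:

```python
# Hedged sketch of the proactive summarization path. Only the threshold
# constant and _summarize_history() are named in the report; everything
# else here is an illustrative assumption.
_CONTEXT_PRESSURE_THRESHOLD = 0.7

class ChatSession:
    def __init__(self, context_window: int):
        self.context_window = context_window  # e.g. ~65_000 for qwen3.5:35b
        self.history_tokens = 0

    def context_pressure(self) -> float:
        # Fraction of the model's context window consumed by history.
        return self.history_tokens / self.context_window

    def maybe_summarize(self) -> bool:
        # Proactive path: compress history before the limit is hit.
        if self.context_pressure() > _CONTEXT_PRESSURE_THRESHOLD:
            self._summarize_history()
            return True
        return False

    def _summarize_history(self) -> None:
        # Placeholder: a real implementation would replace older turns
        # with a model-generated summary; here we only model the effect.
        self.history_tokens = int(self.history_tokens * 0.3)


session = ChatSession(context_window=65_000)
session.history_tokens = 50_000          # ~77% pressure
assert session.maybe_summarize()         # threshold exceeded, so summarize
assert session.context_pressure() < _CONTEXT_PRESSURE_THRESHOLD
```

The bug described here is that, with a local backend, history_tokens (or its equivalent) apparently never reflects real usage, so the threshold check never fires.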

Environment

  • Model: qwen3.5:35b (local, via llama.cpp)
  • Context window: ~65,000 tokens
  • Trigger: Long chat session with many tool calls and query results accumulated in history

Steps to reproduce

  1. Configure Anton with a local model (qwen3.5:35b) via /setup
  2. Run a long, multi-step session with several data queries and tool calls
  3. After enough turns, the session fails with the context-too-large error

Possible causes

  • The context_pressure calculation may not be accurate for local models, so the 70% threshold is never triggered proactively
  • ContextOverflowError may not be raised or mapped correctly for local backends, so the reactive summarization path is also skipped
  • Token counting for local models may differ from what Anton expects
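If the reactive path is being skipped because local backends raise an opaque error, one option is to translate backend failures into the ContextOverflowError that chat.py already catches. A minimal sketch, assuming the error-message markers below (llama.cpp's actual error text is not documented in this report):

```python
# Sketch: re-classify opaque local-backend errors as the
# ContextOverflowError that Anton's reactive path already handles.
# The message markers are an assumption about llama.cpp's output,
# not a documented contract.
class ContextOverflowError(Exception):
    """Raised when the prompt exceeds the model's context window."""

_OVERFLOW_MARKERS = ("context", "too large", "exceed", "n_ctx")

def map_backend_error(exc: Exception) -> Exception:
    # Returns ContextOverflowError when the message looks like a
    # context overflow, so `except ContextOverflowError` can trigger
    # reactive summarization; otherwise returns the error unchanged.
    msg = str(exc).lower()
    if any(marker in msg for marker in _OVERFLOW_MARKERS):
        return ContextOverflowError(str(exc))
    return exc

try:
    raise RuntimeError("prompt too large for context window (n_ctx=65536)")
except Exception as exc:
    mapped = map_backend_error(exc)

assert isinstance(mapped, ContextOverflowError)
```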

Suggested fix

  • Ensure token counting works correctly for local models
  • Fall back to a character-based estimate when token counts are unavailable from the local model API
  • Show a user-friendly recovery message (e.g. "Context too large — summarizing history and continuing…") instead of a hard failure
  • Consider adding a /compact or /reset slash command to let users manually trim history mid-session
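The character-based fallback suggested above could look like the sketch below. The ~4 characters-per-token ratio is a common rough heuristic for English text, and every name here is illustrative rather than taken from Anton's code:

```python
# Sketch of the suggested fallback: estimate tokens from character
# count when the local backend exposes no tokenizer. The 4-chars-per-
# token ratio is a rough English-text heuristic, padded with a safety
# factor so the 70% threshold fires early rather than late.
CHARS_PER_TOKEN = 4.0
SAFETY_FACTOR = 1.2  # deliberately overestimate usage

def estimate_tokens(text: str, tokenizer=None) -> int:
    if tokenizer is not None:
        # Preferred: exact count from the backend's tokenizer.
        return len(tokenizer.encode(text))
    # Fallback: character-based estimate.
    return int(len(text) / CHARS_PER_TOKEN * SAFETY_FACTOR)

history = "user: show me sales by region\n" * 2_000  # 60,000 chars
tokens = estimate_tokens(history)                    # ~18,000 tokens
pressure = tokens / 65_000                           # ~0.28
assert 0.0 < pressure < 0.7
```

Overestimating is the safer direction here: summarizing a little early costs some history detail, while undercounting reproduces exactly the hard failure this issue describes.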

Current workarounds

  • Start a new Anton session (history is cleared, learned memory is preserved)
  • Switch to a model with a larger context window via /setup
  • Split long tasks into shorter sessions
