Summary
Conversation managers currently rely solely on message count (window_size) to decide when to reduce context. This is a coarse heuristic — a conversation with 10 messages containing large tool results can exceed a model's context window, while 100 short messages may fit comfortably.
Related Issues
This proposal addresses or partially addresses several existing feature requests:
- A shared estimate_tokens() utility and pluggable TokenCounter type for conversation managers
- Proactive reduction via max_context_tokens and a BeforeModelCallEvent hook, preventing ContextWindowOverflowException before it occurs
- per_turn, compactable_after_messages, and hook-based token budget checks enable within-cycle context management
- If model.context_limit is added, it could auto-configure max_context_tokens
- The hook calls apply_management(), which may call reduce_context(), but does not fire a dedicated event for it
Problem
- No token-budget awareness: SlidingWindowConversationManager only checks len(messages) > window_size. There's no way to set a token budget and have context reduction trigger based on estimated token usage.
- No proactive reduction for summarizing manager: SummarizingConversationManager only summarizes reactively after a ContextWindowOverflowException, which means the agent has already failed a model call.
- No micro-compaction: Stale tool results from early in the conversation consume token budget long after they're relevant, but there's no mechanism to replace them with stubs while preserving the toolUse/toolResult pair structure.
- No token estimation utility: There's no shared utility for estimating token counts across conversation managers.
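The micro-compaction idea can be sketched as follows. The message shape assumes Bedrock-style toolUse/toolResult content blocks; the helper name and stub text are hypothetical, not the proposal's actual implementation:

```python
# Hypothetical micro-compaction sketch: stale toolResult payloads are replaced
# with a short stub, but the toolUse/toolResult pair structure the model
# expects is preserved (toolUseId and status stay intact).
STUB = [{"text": "[tool result elided to save context]"}]

def compact_stale_tool_results(messages, keep_last=4):
    """Stub out toolResult payloads in all but the last keep_last messages."""
    compacted = []
    for i, msg in enumerate(messages):
        if i < len(messages) - keep_last:
            new_blocks = []
            for block in msg.get("content", []):
                if "toolResult" in block:
                    result = dict(block["toolResult"])
                    result["content"] = STUB  # keep toolUseId/status, drop payload
                    new_blocks.append({"toolResult": result})
                else:
                    new_blocks.append(block)
            msg = {**msg, "content": new_blocks}
        compacted.append(msg)
    return compacted
```

Recent messages are left untouched, so in-flight tool interactions keep their full results.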
Proposed Solution
New: _token_utils.py
- estimate_tokens(messages) — chars/4 heuristic covering all ContentBlock types (text, toolResult, toolUse, image, document, video, reasoningContent, cachePoint, guardContent, citationsContent)
- TokenCounter type alias for custom token counting functions
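A minimal sketch of what the proposed utility might look like, assuming Bedrock-style message dicts. The chars/4 constant and the names estimate_tokens / TokenCounter come from the proposal; the body is an illustrative guess at one reasonable implementation:

```python
import json
from collections.abc import Callable

# Pluggable counter type: any callable from a message list to a token count.
TokenCounter = Callable[[list], int]

def estimate_tokens(messages: list) -> int:
    """Rough token estimate: total serialized characters divided by 4."""
    total_chars = 0
    for message in messages:
        for block in message.get("content", []):
            if "text" in block:
                total_chars += len(block["text"])
            else:
                # toolUse, toolResult, image, document, video, reasoningContent,
                # cachePoint, guardContent, citationsContent: fall back to the
                # size of the serialized block
                total_chars += len(json.dumps(block, default=str))
    return total_chars // 4
```

Callers who need accuracy (e.g. a real tokenizer) can supply their own TokenCounter instead.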
SlidingWindowConversationManager enhancements
- max_context_tokens: int | None — optional token budget, checked alongside window_size
- token_counter: TokenCounter | None — pluggable token counting function
- compactable_after_messages: int | None — micro-compaction of stale tool results
- Proactive token-budget enforcement via BeforeModelCallEvent hook
- _last_compacted_index tracking to avoid re-scanning already-compacted messages
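The combined trigger can be illustrated with a standalone sketch; should_reduce and the inline estimator are hypothetical helpers, not the manager's actual methods:

```python
# Reduce when either the message-count window or the optional token budget is
# exceeded. The inline estimator is a stand-in chars/4 heuristic over text
# blocks only.
def estimate_tokens(messages) -> int:
    chars = sum(len(b.get("text", "")) for m in messages for b in m.get("content", []))
    return chars // 4

def should_reduce(messages, window_size, max_context_tokens=None, token_counter=None):
    if len(messages) > window_size:
        return True  # existing behavior: message-count window
    if max_context_tokens is not None:
        counter = token_counter or estimate_tokens  # pluggable TokenCounter
        return counter(messages) > max_context_tokens
    return False
```

When max_context_tokens is None the behavior degrades to today's pure message-count check.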
SummarizingConversationManager enhancements
- max_context_tokens: int | None — optional token budget
- proactive_threshold: float — fraction of budget at which proactive summarization triggers
- token_counter: TokenCounter | None — pluggable token counting function
- Proactive summarization via BeforeModelCallEvent hook (only registered when max_context_tokens is set)
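The proactive trigger reduces to a threshold comparison. A sketch, where should_summarize_proactively is a hypothetical helper and the 0.8 default is an assumption, not a value from the proposal:

```python
# Summarize once estimated usage crosses a fraction of the token budget,
# before the model call fails with a context overflow.
def should_summarize_proactively(estimated_tokens, max_context_tokens, proactive_threshold=0.8):
    if max_context_tokens is None:
        return False  # the hook is only registered when a budget is set
    return estimated_tokens >= proactive_threshold * max_context_tokens
```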
Design decisions
- Always uses the heuristic estimator, never the model-reported latest_context_size (stale after reduction → over-reduction spirals)
- Hook calls apply_management() (not reduce_context() directly) to ensure micro-compaction runs first
- _model_call_count only increments when per_turn is enabled (preserves existing semantics)
- Summarizing manager's apply_management is a no-op to prevent double-summarization (hook + finally block)
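The ordering guarantee behind the second decision can be shown with a toy manager; class and method names here are simplified stand-ins, not the SDK's actual hook registry API:

```python
# Toy illustration: the BeforeModelCallEvent hook goes through
# apply_management(), so micro-compaction always runs before reduction.
class SketchManager:
    def __init__(self):
        self.calls = []

    def _compact_stale_tool_results(self, messages):
        self.calls.append("compact")  # micro-compaction first

    def reduce_context(self, messages):
        self.calls.append("reduce")  # window/token reduction second

    def apply_management(self, messages):
        self._compact_stale_tool_results(messages)
        self.reduce_context(messages)

    def on_before_model_call(self, messages):
        # hook calls apply_management(), not reduce_context() directly
        self.apply_management(messages)
```

Calling reduce_context() directly from the hook would skip compaction and could summarize content that a cheap stub replacement would have reclaimed.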
Test Plan
- New tests in test_token_aware_context_management.py
- Lint clean (ruff check), type clean (mypy)