Token Estimation
M31 Autonomous (M31A) estimates token counts for context window management, budget tracking, and context warnings. The estimator supports tiktoken-based counting for OpenAI models and a calibrated fallback for other models.
Source: internal/tokens/estimator.go
```go
type Estimator struct {
	modelID       string
	tokenizer     *tiktoken.Tiktoken // nil for unsupported models
	emaAlpha      float64            // EMA calibration rate (default 0.3)
	emaFactorBits atomic.Uint64      // calibration factor as float64 bits
}
```

| Method | Strategy |
|---|---|
| tiktoken available | `len(tokenizer.Encode(text))`: exact token count |
| tiktoken unavailable | `utf8.RuneCountInString(text) / 4 * 1.3`: rune-based estimate |
| Both | Multiply by the `emaFactor` calibration factor |
The rune-based fallback uses utf8.RuneCountInString() which is O(N) time but O(1) space, avoiding the O(N) allocation of len([]rune(text)).
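The rune-based fallback can be sketched as a standalone function. This is a minimal illustration, not the code in `estimator.go`: the name `estimateFallback`, the use of floating-point division, and rounding to the nearest integer are all assumptions made here.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// estimateFallback is a hypothetical helper that applies the documented
// formula runes / 4 * 1.3 and then the calibration factor.
// Floating-point math and round-to-nearest are assumptions of this sketch.
func estimateFallback(text string, factor float64) int {
	// RuneCountInString walks the string once and allocates nothing.
	runes := utf8.RuneCountInString(text)
	base := float64(runes) / 4 * 1.3
	return int(math.Round(base * factor))
}

func main() {
	// Multi-byte characters count once each, so accented text is not
	// overestimated the way a byte-length heuristic would overestimate it.
	fmt.Println(estimateFallback("Hello, world!", 1.0))
	fmt.Println(estimateFallback("héllo wörld", 1.0))
}
```

Passing the factor as a parameter keeps the sketch free of shared state; in the real estimator the factor comes from the atomically stored calibration value.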
The estimator self-calibrates against actual token counts returned by the API:
```go
func (e *Estimator) Calibrate(estimated, actual int) {
	ratio := float64(actual) / float64(estimated)
	newFactor = emaAlpha * ratio + (1 - emaAlpha) * previousFactor
	// clamped to [0.1, 10.0]
}
```

- `emaAlpha` defaults to 0.3 (`EMACorrectionAlpha`)
- Factor is stored as `uint64` bits for lock-free atomic access
- Uses `CompareAndSwap` for concurrent safety
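The calibration update can be sketched as a compare-and-swap loop over the float64 bit pattern. The type `calibrator`, its constructor, the initial factor of 1.0, and the guard against non-positive inputs are assumptions for illustration; only the EMA formula, the clamp range, and the use of `atomic.Uint64` with `CompareAndSwap` come from the description above.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// calibrator is a hypothetical, stripped-down stand-in for the
// calibration part of Estimator.
type calibrator struct {
	alpha      float64
	factorBits atomic.Uint64 // float64 factor stored as raw bits
}

// newCalibrator starts from a neutral factor of 1.0 (an assumption).
func newCalibrator(alpha float64) *calibrator {
	c := &calibrator{alpha: alpha}
	c.factorBits.Store(math.Float64bits(1.0))
	return c
}

// Factor returns the current calibration factor.
func (c *calibrator) Factor() float64 {
	return math.Float64frombits(c.factorBits.Load())
}

// Calibrate folds one (estimated, actual) sample into the EMA.
func (c *calibrator) Calibrate(estimated, actual int) {
	if estimated <= 0 || actual <= 0 {
		return // assumption: skip samples that give an undefined or zero ratio
	}
	ratio := float64(actual) / float64(estimated)
	for {
		oldBits := c.factorBits.Load()
		prev := math.Float64frombits(oldBits)
		next := c.alpha*ratio + (1-c.alpha)*prev
		next = math.Min(10.0, math.Max(0.1, next)) // clamp to [0.1, 10.0]
		// Retry if another goroutine updated the factor in between.
		if c.factorBits.CompareAndSwap(oldBits, math.Float64bits(next)) {
			return
		}
	}
}

func main() {
	c := newCalibrator(0.3)
	c.Calibrate(100, 200) // the API reported twice the estimate
	fmt.Printf("%.2f\n", c.Factor())
}
```

The loop recomputes the new factor from the freshly loaded value on every retry, so concurrent calls never overwrite each other's updates with a result based on stale data.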
EstimateMessages() accounts for the full cost of a conversation:
| Component | Overhead |
|---|---|
| Per message | +4 tokens (role/separator metadata) |
| Message content | Estimated via `Estimate()` |
| Tool call input JSON | Estimated via `Estimate()` |
| Tool call name | Estimated via `Estimate()` |
This prevents underestimation on tool-heavy conversations (BUG-29 fix).
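The accounting in the table can be sketched as a single loop. The `Message` and `ToolCall` types and the injected `estimate` function are hypothetical stand-ins, since the real message types are not shown here; only the four cost components come from the table.

```go
package main

import "fmt"

// ToolCall and Message are hypothetical shapes used only for this sketch.
type ToolCall struct {
	Name  string
	Input string // raw JSON arguments
}

type Message struct {
	Role      string
	Content   string
	ToolCalls []ToolCall
}

// perMessageOverhead covers role/separator metadata, per the table above.
const perMessageOverhead = 4

// estimateMessages sums the per-message overhead, the content, and the
// name and input JSON of every tool call.
func estimateMessages(msgs []Message, estimate func(string) int) int {
	total := 0
	for _, m := range msgs {
		total += perMessageOverhead + estimate(m.Content)
		for _, tc := range m.ToolCalls {
			total += estimate(tc.Name) + estimate(tc.Input)
		}
	}
	return total
}

func main() {
	// A crude stand-in estimator: one token per four bytes, rounded up.
	estimate := func(s string) int { return (len(s) + 3) / 4 }
	msgs := []Message{
		{Role: "user", Content: "List the files in this directory."},
		{Role: "assistant", ToolCalls: []ToolCall{
			{Name: "list_files", Input: `{"path": "."}`},
		}},
	}
	fmt.Println(estimateMessages(msgs, estimate))
}
```

Without the inner loop, the second message would cost only the 4-token overhead, which is the kind of underestimation the tool-call accounting is meant to prevent.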
When estimated usage exceeds 80% of the model's context window, a warning is shown:

```
⚠ Context at 85% — 19,200 tokens remaining — consider using /compress
```

The warning is styled with a yellow background via Lipgloss.
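The numbers in the example warning are consistent with a 128,000-token window at 108,800 estimated tokens. A minimal sketch of that arithmetic follows; the function name `contextUsage`, its signature, and the handling of an unknown window size are assumptions, and the Lipgloss styling is left out.

```go
package main

import "fmt"

// contextUsage is a hypothetical helper that derives the values shown in
// the warning: percent used, tokens remaining, and whether to warn.
func contextUsage(estimated, contextLength int, threshold float64) (percent, remaining int, warn bool) {
	if contextLength <= 0 {
		return 0, 0, false // assumption: no known window, so no warning
	}
	percent = estimated * 100 / contextLength // integer percent, truncated
	remaining = contextLength - estimated
	warn = float64(estimated) > threshold*float64(contextLength)
	return percent, remaining, warn
}

func main() {
	pct, left, warn := contextUsage(108800, 128000, 0.8)
	fmt.Println(pct, left, warn)
}
```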
Before each LLM request, the workflow engine runs preflightContextCheck():
```go
if float64(estimated) > 0.95*float64(contextLength) {
	return ErrContextExceeded // hard reject
}
if float64(estimated) > 0.80*float64(contextLength) {
	slog.Warn("context usage approaching limit") // soft warning
}
```

```toml
[model]
context_warning_threshold = 0.8    # Warning threshold (0.0-1.0)
default_context_length = 128000    # Fallback when provider doesn't return one
token_ema_alpha = 0.3              # EMA calibration rate
```

M31A uses tiktoken-go for OpenAI-compatible tokenization. Note: tiktoken-go has been unmaintained since 2024; newer encodings (e.g., o200k_base for GPT-4o) may not be recognized, causing a silent fallback to rune counting.