Token Estimation

Eshan Roy edited this page Jun 16, 2026 · 2 revisions

M31 Autonomous (M31A) estimates token counts for context window management, budget tracking, and context warnings. The estimator supports tiktoken-based counting for OpenAI models and a calibrated fallback for other models.

Source: internal/tokens/estimator.go

Estimator

type Estimator struct {
    modelID       string
    tokenizer     *tiktoken.Tiktoken   // nil for unsupported models
    emaAlpha      float64              // EMA calibration rate (default 0.3)
    emaFactorBits atomic.Uint64        // calibration factor as float64 bits
}

Token Counting

Method                Strategy
tiktoken available    len(tokenizer.Encode(text)): exact token count
tiktoken unavailable  float64(utf8.RuneCountInString(text)) / 4 * 1.3: rune-based estimate
Both                  Result is multiplied by the EMA calibration factor (emaFactor)

The rune-based fallback calls utf8.RuneCountInString(), which runs in O(N) time and O(1) space. It counts runes in place instead of building the intermediate slice implied by len([]rune(text)).
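
The fallback arithmetic can be sketched as follows. This is a minimal illustration, not the code in estimator.go: the helper name estimateFallback and the choice to round up are assumptions. The rune count is converted to float64 before dividing, because integer division by 4 would truncate the result.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// estimateFallback is a hypothetical helper, not M31A's actual code.
// It applies the documented heuristic (runes / 4 * 1.3), scales the result
// by the calibration factor, and rounds up (the rounding mode is an assumption).
func estimateFallback(text string, emaFactor float64) int {
	runes := utf8.RuneCountInString(text) // O(N) time, no []rune slice built
	raw := float64(runes) / 4 * 1.3
	return int(math.Ceil(raw * emaFactor))
}

func main() {
	fmt.Println(estimateFallback("hello world!", 1.0)) // 12 runes
	fmt.Println(estimateFallback("héllo", 1.0))        // 5 runes, 6 bytes
}
```

Counting runes rather than bytes keeps multi-byte text such as "héllo" from being overestimated.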

EMA Calibration

The estimator self-calibrates against actual token counts returned by the API:

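func (e *Estimator) Calibrate(estimated, actual int) {
    ratio := float64(actual) / float64(estimated)
    for done := false; !done; {
        old := e.emaFactorBits.Load()
        newFactor := e.emaAlpha*ratio + (1-e.emaAlpha)*math.Float64frombits(old)
        newFactor = math.Max(0.1, math.Min(10.0, newFactor)) // clamped to [0.1, 10.0]
        done = e.emaFactorBits.CompareAndSwap(old, math.Float64bits(newFactor))
    }
}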
  • emaAlpha defaults to 0.3 (EMACorrectionAlpha)
  • Factor is stored as uint64 bits for lock-free atomic access
  • Uses CompareAndSwap for concurrent safety
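
A worked example may help here. With emaAlpha = 0.3, a starting factor of 1.0, and one sample where the API reports 130 tokens against an estimate of 100, the new factor is 0.3 * 1.3 + 0.7 * 1.0 = 1.09. The sketch below reproduces that update on a standalone type. The calibrator type, the initial factor of 1.0, and the guard against non-positive estimates are assumptions made for illustration, not details confirmed by the source.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// calibrator is a hypothetical stand-in for the calibration state of Estimator.
type calibrator struct {
	alpha      float64
	factorBits atomic.Uint64 // float64 factor stored as raw bits
}

func newCalibrator(alpha float64) *calibrator {
	c := &calibrator{alpha: alpha}
	c.factorBits.Store(math.Float64bits(1.0)) // neutral start (assumed)
	return c
}

// Factor decodes the current calibration factor.
func (c *calibrator) Factor() float64 {
	return math.Float64frombits(c.factorBits.Load())
}

// Calibrate folds one (estimated, actual) sample into the EMA factor.
func (c *calibrator) Calibrate(estimated, actual int) {
	if estimated <= 0 || actual < 0 {
		return // assumed guard: such samples give no usable ratio
	}
	ratio := float64(actual) / float64(estimated)
	for {
		old := c.factorBits.Load()
		next := c.alpha*ratio + (1-c.alpha)*math.Float64frombits(old)
		next = math.Max(0.1, math.Min(10.0, next)) // clamp to [0.1, 10.0]
		// Retry if another goroutine changed the factor in the meantime.
		if c.factorBits.CompareAndSwap(old, math.Float64bits(next)) {
			return
		}
	}
}

func main() {
	c := newCalibrator(0.3)
	c.Calibrate(100, 130) // ratio 1.3
	fmt.Printf("%.2f\n", c.Factor())
}
```

Storing the bits in an atomic.Uint64 lets readers call Factor() without taking a lock, and the retry loop ensures a concurrent update is never silently overwritten.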

Message Estimation

EstimateMessages() accounts for the full cost of a conversation:

Component             Overhead
Per message           +4 tokens (role/separator metadata)
Message content       Estimated via Estimate()
Tool call input JSON  Estimated via Estimate()
Tool call name        Estimated via Estimate()

This prevents underestimation on tool-heavy conversations (BUG-29 fix).
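
The accounting in the table can be sketched roughly as below. The Message and ToolCall types, their field names, and the placeholder estimate function are all hypothetical; M31A's real types and Estimate() implementation may differ.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ToolCall and Message are hypothetical types for this sketch.
type ToolCall struct {
	Name      string
	InputJSON string
}

type Message struct {
	Role      string
	Content   string
	ToolCalls []ToolCall
}

const perMessageOverhead = 4 // role/separator metadata

// estimate is a placeholder for Estimator.Estimate: about one token per
// four runes, rounded up.
func estimate(text string) int {
	return (utf8.RuneCountInString(text) + 3) / 4
}

// estimateMessages sums the per-message overhead, the content, and the
// name and input JSON of every tool call.
func estimateMessages(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += perMessageOverhead + estimate(m.Content)
		for _, tc := range m.ToolCalls {
			total += estimate(tc.Name) + estimate(tc.InputJSON)
		}
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "List the files."},
		{Role: "assistant", ToolCalls: []ToolCall{
			{Name: "ls", InputJSON: `{"path":"."}`},
		}},
	}
	fmt.Println(estimateMessages(msgs))
}
```

In this sketch the assistant message has empty content, so counting content alone would charge it only the fixed overhead. Adding the tool call name and input JSON is what closes that gap.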

Context Warning

When estimated usage exceeds 80% of the model's context window (the default context_warning_threshold), M31A displays a warning such as:

⚠ Context at 85% — 19,200 tokens remaining — consider using /compress

The warning is styled with a yellow background via Lipgloss.
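
The numbers in the sample warning are consistent with the default 128,000-token context length: 108,800 estimated tokens is 85% of the window and leaves 19,200. The sketch below shows that arithmetic. The helper name contextUsage and its signature are assumptions made for illustration, and the Lipgloss styling is left out.

```go
package main

import "fmt"

// contextUsage is a hypothetical helper: it reports the used percentage,
// the tokens remaining, and whether usage is above the warning threshold.
func contextUsage(estimated, contextLength int, threshold float64) (pct, remaining int, warn bool) {
	if contextLength <= 0 {
		return 0, 0, false
	}
	pct = estimated * 100 / contextLength // integer percentage, truncated
	remaining = contextLength - estimated
	if remaining < 0 {
		remaining = 0
	}
	warn = float64(estimated) > threshold*float64(contextLength)
	return pct, remaining, warn
}

func main() {
	pct, remaining, warn := contextUsage(108800, 128000, 0.8)
	fmt.Println(pct, remaining, warn)
}
```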

Preflight Check

Before each LLM request, the workflow engine runs preflightContextCheck():

if float64(estimated) > 0.95 * float64(contextLength) {
    return ErrContextExceeded  // hard reject
}
if float64(estimated) > 0.80 * float64(contextLength) {
    slog.Warn("context usage approaching limit")  // soft warning
}
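
For a 128,000-token window these thresholds work out to a hard reject above 121,600 tokens and a soft warning above 102,400. A self-contained version of the check might look like the sketch below. The function signature, the error text, and the extra log attributes are assumptions, since only the two comparisons are shown above.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// ErrContextExceeded is declared here so the sketch compiles on its own;
// the message text is an assumption.
var ErrContextExceeded = errors.New("context length exceeded")

// preflightContextCheck mirrors the documented thresholds: reject above 95%
// of the window, log a warning above 80%.
func preflightContextCheck(estimated, contextLength int) error {
	if float64(estimated) > 0.95*float64(contextLength) {
		return ErrContextExceeded // hard reject
	}
	if float64(estimated) > 0.80*float64(contextLength) {
		slog.Warn("context usage approaching limit",
			"estimated", estimated, "context_length", contextLength) // soft warning
	}
	return nil
}

func main() {
	for _, est := range []int{64000, 110000, 125000} {
		err := preflightContextCheck(est, 128000)
		fmt.Println(est, errors.Is(err, ErrContextExceeded))
	}
}
```

Callers can test for the hard reject with errors.Is and, for example, suggest /compress before retrying.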

Configuration

[model]
context_warning_threshold = 0.8    # Warning threshold (0.0-1.0)
default_context_length = 128000    # Fallback when provider doesn't return one
token_ema_alpha = 0.3              # EMA calibration rate

tiktoken-go Dependency

M31A uses tiktoken-go for OpenAI-compatible tokenization. Note: tiktoken-go has been unmaintained since 2024, so newer encodings (e.g., o200k_base for GPT-4o) may not be recognized. When that happens, the estimator silently falls back to rune counting.
