Token Estimation

M31 Autonomous (M31A) estimates token counts for context window management, budget tracking, and context warnings. The estimator supports tiktoken-based counting for OpenAI models and a calibrated fallback for other models.

Source: internal/tokens/estimator.go

Estimator

type Estimator struct {
    modelID       string
    tokenizer     *tiktoken.Tiktoken   // nil for unsupported models
    emaAlpha      float64              // EMA calibration rate (default 0.3)
    emaFactorBits atomic.Uint64        // calibration factor as float64 bits
}

Token Counting

Condition	Strategy
tiktoken available	`len(tokenizer.Encode(text))` (exact token count)
tiktoken unavailable	`float64(utf8.RuneCountInString(text)) / 4 * 1.3` (rune-based estimate)
Both	Result is multiplied by the `emaFactor` calibration factor

The rune-based fallback uses utf8.RuneCountInString() which is O(N) time but O(1) space, avoiding the O(N) allocation of len([]rune(text)).
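
As a rough illustration of the fallback path, the heuristic can be sketched as below. The helper name estimateFallback, its signature, and the round-to-nearest step are assumptions made for this example; they are not taken from internal/tokens/estimator.go.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// estimateFallback is a hypothetical helper (not the project's API).
// It applies the documented heuristic runes / 4 * 1.3 and then scales
// the result by a calibration factor.
func estimateFallback(text string, factor float64) int {
	// RuneCountInString walks the string once and allocates nothing.
	runes := utf8.RuneCountInString(text)
	est := float64(runes) / 4 * 1.3 * factor
	// Rounding to the nearest integer is an assumption of this sketch.
	return int(math.Round(est))
}

func main() {
	// Multi-byte characters count as one rune each, not one per byte.
	for _, s := range []string{"abcdefgh", "héllo wörld", ""} {
		fmt.Printf("%q: %d bytes, %d runes, ~%d tokens\n",
			s, len(s), utf8.RuneCountInString(s), estimateFallback(s, 1.0))
	}
}
```

Counting runes rather than bytes keeps non-ASCII text from being overestimated just because each character occupies several bytes.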

EMA Calibration

The estimator self-calibrates against actual token counts returned by the API:

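// Simplified: the write-back step is summarized in the final comment.
func (e *Estimator) Calibrate(estimated, actual int) {
    ratio := float64(actual) / float64(estimated)
    previousFactor := math.Float64frombits(e.emaFactorBits.Load())
    newFactor := e.emaAlpha*ratio + (1-e.emaAlpha)*previousFactor
    newFactor = math.Max(0.1, math.Min(10.0, newFactor)) // clamped to [0.1, 10.0]
    // newFactor is stored back as math.Float64bits(newFactor) via CompareAndSwap
}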

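emaAlpha defaults to 0.3 (EMACorrectionAlpha).
The factor is stored as the uint64 bit pattern of a float64, so it can be read and written atomically without a lock.
Updates use CompareAndSwap, so concurrent calibrations do not overwrite each other.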
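
The lock-free update described above can be sketched as a compare-and-swap retry loop. The calibrator type, newCalibrator, the starting factor of 1.0, and the guard against non-positive inputs are assumptions made for this example; only the EMA formula, the [0.1, 10.0] clamp, and the bits-in-a-uint64 storage come from the description above.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// calibrator is a hypothetical stand-in for the relevant Estimator fields.
type calibrator struct {
	alpha      float64
	factorBits atomic.Uint64 // float64 factor stored via math.Float64bits
}

func newCalibrator(alpha float64) *calibrator {
	c := &calibrator{alpha: alpha}
	c.factorBits.Store(math.Float64bits(1.0)) // assumed neutral starting factor
	return c
}

// Factor returns the current calibration factor.
func (c *calibrator) Factor() float64 {
	return math.Float64frombits(c.factorBits.Load())
}

// Calibrate folds one (estimated, actual) sample into the EMA.
func (c *calibrator) Calibrate(estimated, actual int) {
	if estimated <= 0 || actual <= 0 {
		return // assumption: ignore samples that give no usable ratio
	}
	ratio := float64(actual) / float64(estimated)
	for {
		old := c.factorBits.Load()
		next := c.alpha*ratio + (1-c.alpha)*math.Float64frombits(old)
		next = math.Max(0.1, math.Min(10.0, next)) // clamp to [0.1, 10.0]
		// Retry if another goroutine changed the factor in the meantime.
		if c.factorBits.CompareAndSwap(old, math.Float64bits(next)) {
			return
		}
	}
}

func main() {
	c := newCalibrator(0.3)
	c.Calibrate(100, 120) // the API reported 20% more tokens than estimated
	fmt.Printf("factor after one sample: %.3f\n", c.Factor())
}
```

With alpha = 0.3, a single sample moves the factor 30% of the way toward the observed ratio, so one outlier response cannot swing later estimates too far.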

Message Estimation

EstimateMessages() accounts for the full cost of a conversation:

Component	Token cost
Per message	+4 tokens (role/separator metadata)
Message content	Estimated via `Estimate()`
Tool call input JSON	Estimated via `Estimate()`
Tool call name	Estimated via `Estimate()`

This prevents underestimation on tool-heavy conversations (BUG-29 fix).
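
A minimal sketch of the accounting in the table above follows. The Message and ToolCall types, the estimateMessages signature, and the toy estimator are hypothetical and exist only to make the example self-contained; the real EstimateMessages() may be shaped differently.

```go
package main

import "fmt"

// perMessageOverhead is the documented +4 tokens of role/separator metadata.
const perMessageOverhead = 4

// ToolCall and Message are hypothetical types for this sketch.
type ToolCall struct {
	Name  string
	Input string // raw JSON arguments
}

type Message struct {
	Role      string
	Content   string
	ToolCalls []ToolCall
}

// toyEstimate is a placeholder estimator: one token per 4 bytes, rounded up.
func toyEstimate(s string) int { return (len(s) + 3) / 4 }

// estimateMessages sums the overhead, content, and tool-call costs.
func estimateMessages(msgs []Message, estimate func(string) int) int {
	total := 0
	for _, m := range msgs {
		total += perMessageOverhead + estimate(m.Content)
		for _, tc := range m.ToolCalls {
			// Both the tool name and its JSON input consume context.
			total += estimate(tc.Name) + estimate(tc.Input)
		}
	}
	return total
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "12345678"},
		{Role: "assistant", ToolCalls: []ToolCall{{Name: "read", Input: `{"path":"a"}`}}},
	}
	fmt.Println("estimated tokens:", estimateMessages(msgs, toyEstimate))
}
```

In this example the assistant message has no text content, so a count based on content alone would see only its 4-token overhead and miss the tool call entirely.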

Context Warning

When estimated usage exceeds the warning threshold (context_warning_threshold, 80% of the model's context window by default), a warning is shown:

⚠ Context at 85% — 19,200 tokens remaining — consider using /compress

The warning is styled with a yellow background via Lipgloss.
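
The arithmetic behind the warning line can be sketched as follows. The function names contextWarning and groupThousands are hypothetical, the rounding of the percentage is an assumption, and the Lipgloss styling is left out so that the example depends only on the standard library.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// groupThousands formats a non-negative integer with comma separators.
func groupThousands(n int) string {
	s := strconv.Itoa(n)
	for i := len(s) - 3; i > 0; i -= 3 {
		s = s[:i] + "," + s[i:]
	}
	return s
}

// contextWarning is a hypothetical helper. It returns the warning text and
// true when usage exceeds the threshold, or "" and false otherwise.
func contextWarning(used, contextLength int, threshold float64) (string, bool) {
	if contextLength <= 0 {
		return "", false
	}
	usage := float64(used) / float64(contextLength)
	if usage <= threshold {
		return "", false
	}
	remaining := contextLength - used
	if remaining < 0 {
		remaining = 0 // never report a negative remainder
	}
	pct := int(math.Round(usage * 100))
	msg := fmt.Sprintf("⚠ Context at %d%% — %s tokens remaining — consider using /compress",
		pct, groupThousands(remaining))
	return msg, true
}

func main() {
	// 108,800 of 128,000 tokens used is 85% of the window.
	if msg, ok := contextWarning(108800, 128000, 0.8); ok {
		fmt.Println(msg)
	}
}
```

With a 128,000-token window, 85% usage leaves 19,200 tokens, which matches the sample message above.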

Preflight Check

Before each LLM request, the workflow engine runs preflightContextCheck():

if float64(estimated) > 0.95 * float64(contextLength) {
    return ErrContextExceeded  // hard reject
}
if float64(estimated) > 0.80 * float64(contextLength) {
    slog.Warn("context usage approaching limit")  // soft warning
}

Configuration

[model]
context_warning_threshold = 0.8    # Warning threshold (0.0-1.0)
default_context_length = 128000    # Fallback when provider doesn't return one
token_ema_alpha = 0.3              # EMA calibration rate

tiktoken-go Dependency

M31A uses tiktoken-go for OpenAI-compatible tokenization. Note: tiktoken-go has been unmaintained since 2024. Newer encodings (e.g., o200k_base for GPT-4o) may not be recognized, in which case the estimator silently falls back to rune counting.
