Setting model_context_window in config.toml breaks auto-compaction (fill_to_context_window resets token counter) #16068

@brandomagnani

What version of Codex CLI is running?

codex-cli 0.117.0 (also reproduced on 0.116.0)

What subscription do you have?

Pro

Which model were you using?

gpt-5.3-codex

What platform is your computer?

macOS Darwin 25.3.0 (Apple Silicon)

What issue are you seeing?

Setting model_context_window in ~/.codex/config.toml (tested with 300K, 350K, and 400K) causes auto-compaction to fail permanently after the first context overflow. Removing the setting and using the default (272K from server metadata) works correctly.

Reproduction rate: 100% with any custom model_context_window value. 0% with defaults.

Root cause (traced in source)

After the first ContextWindowExceeded error, fill_to_context_window() in protocol/src/protocol.rs:1940-1950 sets last_token_usage.total_tokens to a delta value (context_window - previous_total), which can be near zero:

fn fill_to_context_window(&mut self, context_window: i64) {
    let previous_total = self.total_token_usage.total_tokens;
    let delta = (context_window - previous_total).max(0);
    self.last_token_usage = TokenUsage {
        total_tokens: delta,  // <-- near zero when previous_total ~ context_window
        ..TokenUsage::default()
    };
}

But the compaction trigger at core/src/codex.rs:5878 calls get_total_token_usage(), which reads last_token_usage.total_tokens (from context_manager/history.rs:300-305):

let last_tokens = self.token_info
    .as_ref()
    .map(|info| info.last_token_usage.total_tokens)  // reads the delta, not cumulative
    .unwrap_or(0);

So after any overflow, the compaction check sees ~0 tokens used and never triggers compaction. Every subsequent retry also overflows, creating a permanent crash loop.
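The interaction between the two snippets above can be reproduced in isolation. The following is a minimal sketch, not the Codex source: TokenUsage and TokenUsageInfo are reduced to the single field that matters here, and should_compact is a hypothetical stand-in for the real compaction gate, assuming a threshold of 90% of the context window.

```rust
/// Simplified stand-in for the token usage record (only the relevant field).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TokenUsage {
    pub total_tokens: i64,
}

/// Simplified stand-in for the struct that owns both counters.
#[derive(Debug, Default)]
pub struct TokenUsageInfo {
    pub total_token_usage: TokenUsage,
    pub last_token_usage: TokenUsage,
}

impl TokenUsageInfo {
    /// Same arithmetic as the snippet quoted in this report:
    /// last_token_usage receives only the remaining headroom.
    pub fn fill_to_context_window(&mut self, context_window: i64) {
        let previous_total = self.total_token_usage.total_tokens;
        let delta = (context_window - previous_total).max(0);
        self.last_token_usage = TokenUsage { total_tokens: delta };
    }
}

/// Hypothetical gate (assumption): compact once the value read from
/// last_token_usage reaches 90% of the context window.
pub fn should_compact(info: &TokenUsageInfo, context_window: i64) -> bool {
    let threshold = context_window * 9 / 10;
    info.last_token_usage.total_tokens >= threshold
}
```

With a 400K window and 398K tokens already counted, the gate passes before the overflow handler runs. Afterwards last_token_usage holds only the 2K of headroom, so the same gate no longer fires.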

Why defaults work but custom values don't

The bundled models.json reports context_window: 272000 for gpt-5.3-codex, so that is the window in effect with defaults.

With custom model_context_window = 400000 (the actual model context window per OpenAI docs):

  • Config overrides server metadata (models_manager/model_info.rs:30-35)
  • Compaction threshold becomes 400K * 90% = 360K
  • But remote compaction sends the full history to /responses/compact, which may reject oversized payloads
  • Once overflow happens, fill_to_context_window poisons the token counter permanently

Even model_context_window = 300000 breaks because:

  • Any first overflow (for any reason) triggers fill_to_context_window
  • After that, token counter reads ~0, compaction never fires again
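The threshold arithmetic in the bullets above can be sketched as follows. Both function names are hypothetical, and the 90% factor is taken from the figures in this report rather than from a named constant in the Codex source.

```rust
/// Hypothetical helper: a value from config.toml, when present,
/// takes precedence over the window reported by model metadata.
pub fn effective_context_window(config_override: Option<i64>, metadata_window: i64) -> i64 {
    config_override.unwrap_or(metadata_window)
}

/// Assumed rule from this report: compaction triggers at 90% of the window.
/// Integer arithmetic avoids floating-point rounding.
pub fn compaction_threshold(context_window: i64) -> i64 {
    context_window * 9 / 10
}
```

Under these assumptions, an override of 400000 gives a threshold of 360000 and an override of 300000 gives 270000, both above the 272000 window listed in the bundled metadata or very close to it.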

Additional factors

  1. Remote compaction has no fallback. OpenAI models use remote compaction (compact.rs:50-52). If it fails, the error propagates immediately with no retry and no fallback to local compaction (compact_remote.rs:127-139).

  2. Pre-compaction trim only removes Codex-generated items (compact_remote.rs:287-292). If history is dominated by user/tool content, trimming stops early and an oversized payload is still sent to /responses/compact.

  3. A fuller token estimate exists but is not used for compaction decisions. estimated_token_count is computed at codex.rs:5881-5882 but only logged. The compaction gate at codex.rs:5895 uses total_usage_tokens from get_total_token_usage() which can be stale/poisoned.
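One way to make the gate robust against a stale counter, as point 3 hints, is to feed it the larger of the reported count and the independent estimate. This is a sketch of that idea only. The function names and parameters are hypothetical, and nothing here reflects how codex.rs is actually structured.

```rust
/// Hypothetical: pick the more pessimistic of the two token counts,
/// so that a counter reset to ~0 cannot hide a nearly full context.
pub fn tokens_for_compaction_gate(reported_tokens: i64, estimated_tokens: i64) -> i64 {
    reported_tokens.max(estimated_tokens)
}

/// Hypothetical gate built on the helper above.
pub fn should_compact(reported_tokens: i64, estimated_tokens: i64, threshold: i64) -> bool {
    tokens_for_compaction_gate(reported_tokens, estimated_tokens) >= threshold
}
```

With a reported count of 2K, an estimate of 395K, and a threshold of 360K, this gate would still request compaction.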

What steps can reproduce the bug?

  1. Add model_context_window = 400000 to ~/.codex/config.toml
  2. Start a session that reads many files (e.g., codex exec "Read all source files in this repo")
  3. The agent accumulates context past the threshold
  4. First overflow triggers fill_to_context_window
  5. All subsequent turns see ~0 tokens, compaction never fires, permanent crash loop

Remove model_context_window from config.toml and the same workload completes successfully with compaction working.

What is the expected behavior?

model_context_window should work correctly at any value. After context overflow, get_total_token_usage() should return the actual context size, not a poisoned delta value. Compaction should fire and recover.

Suggested fix

get_total_token_usage() should use total_token_usage.total_tokens (cumulative actual usage) instead of last_token_usage.total_tokens (incremental delta) for the compaction threshold comparison. Alternatively, fill_to_context_window should set last_token_usage.total_tokens = context_window (the full value) rather than the delta.
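Both alternatives can be sketched on a reduced model of the counters. This is an illustration under stated assumptions, not a patch: UsageInfo, Usage, fill_to_context_window_full, and tokens_for_compaction are hypothetical names, and the structs carry only the one field under discussion.

```rust
/// Reduced stand-in for a token usage record.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Usage {
    pub total_tokens: i64,
}

/// Reduced stand-in for the struct holding cumulative and last-turn usage.
#[derive(Debug, Default)]
pub struct UsageInfo {
    pub total_token_usage: Usage,
    pub last_token_usage: Usage,
}

impl UsageInfo {
    /// Second alternative: after an overflow, record the full window
    /// instead of the remaining headroom.
    pub fn fill_to_context_window_full(&mut self, context_window: i64) {
        self.last_token_usage = Usage { total_tokens: context_window };
    }

    /// First alternative: base the compaction comparison on cumulative usage.
    pub fn tokens_for_compaction(&self) -> i64 {
        self.total_token_usage.total_tokens
    }
}
```

Either variant keeps the value seen by the gate at or near the window size after an overflow, so a 90% threshold would be exceeded. Neither has been tested against the Codex codebase.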
