What version of Codex CLI is running?
codex-cli 0.117.0 (also reproduced on 0.116.0)
What subscription do you have?
Pro
Which model were you using?
gpt-5.3-codex
What platform is your computer?
macOS Darwin 25.3.0 (Apple Silicon)
What issue are you seeing?
Setting model_context_window in ~/.codex/config.toml (to any value: 300K, 350K, or 400K) causes auto-compaction to fail permanently after the first context overflow. Removing the setting and using defaults (272K from server metadata) works correctly.
Reproduction rate: 100% with any custom model_context_window value. 0% with defaults.
Root cause (traced in source)
After the first ContextWindowExceeded error, fill_to_context_window() in protocol/src/protocol.rs:1940-1950 sets last_token_usage.total_tokens to a delta value (context_window - previous_total), which can be near zero:
fn fill_to_context_window(&mut self, context_window: i64) {
    let previous_total = self.total_token_usage.total_tokens;
    let delta = (context_window - previous_total).max(0);
    self.last_token_usage = TokenUsage {
        total_tokens: delta, // <-- near zero when previous_total ~ context_window
        ..TokenUsage::default()
    };
}
But the compaction trigger at core/src/codex.rs:5878 calls get_total_token_usage(), which reads last_token_usage.total_tokens (from context_manager/history.rs:300-305):
let last_tokens = self.token_info
    .as_ref()
    .map(|info| info.last_token_usage.total_tokens) // reads the delta, not cumulative
    .unwrap_or(0);
So after any overflow, the compaction check sees ~0 tokens used and never triggers compaction. Every subsequent retry also overflows, creating a permanent crash loop.
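The interaction between the two snippets can be reproduced in isolation. The following is a minimal sketch, not the actual Codex code: TokenUsage is reduced to a single field, and should_compact and simulate_overflow are hypothetical helpers that assume the gate compares the last-turn count against 90% of the window.

```rust
/// Simplified stand-in for the real TokenUsage (assumption: one field only).
#[derive(Debug, Default, Clone, PartialEq)]
struct TokenUsage {
    total_tokens: i64,
}

/// Simplified stand-in for the structure holding both counters.
#[derive(Debug, Default)]
struct TokenUsageInfo {
    total_token_usage: TokenUsage,
    last_token_usage: TokenUsage,
}

impl TokenUsageInfo {
    /// Mirrors the snippet quoted above: stores only the remaining headroom.
    fn fill_to_context_window(&mut self, context_window: i64) {
        let previous_total = self.total_token_usage.total_tokens;
        let delta = (context_window - previous_total).max(0);
        self.last_token_usage = TokenUsage { total_tokens: delta };
    }
}

/// Hypothetical gate: compact once the last-turn count reaches 90% of the window.
fn should_compact(info: &TokenUsageInfo, context_window: i64) -> bool {
    let threshold = context_window * 9 / 10;
    info.last_token_usage.total_tokens >= threshold
}

/// Returns (tokens seen by the gate, whether compaction would fire) after an overflow.
fn simulate_overflow(context_window: i64, previous_total: i64) -> (i64, bool) {
    let mut info = TokenUsageInfo::default();
    info.total_token_usage.total_tokens = previous_total;
    info.fill_to_context_window(context_window);
    (
        info.last_token_usage.total_tokens,
        should_compact(&info, context_window),
    )
}
```

With a 400K window and 398K already used, simulate_overflow(400_000, 398_000) reports 2,000 tokens to the gate, far below the 360K threshold, so compaction does not fire.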
Why defaults work but custom values don't
The bundled models.json reports context_window: 272000 for gpt-5.3-codex. With defaults:
- fill_to_context_window is never reached
With custom model_context_window = 400000 (the actual model context window per OpenAI docs):
- Config overrides server metadata (models_manager/model_info.rs:30-35)
- Compaction threshold becomes 400K * 90% = 360K
- But remote compaction sends the full history to /responses/compact, which may reject oversized payloads
- Once overflow happens, fill_to_context_window poisons the token counter permanently
Even model_context_window = 300000 breaks because:
- Any first overflow (for any reason) triggers fill_to_context_window
- After that, token counter reads ~0, compaction never fires again
Additional factors
- Remote compaction has no fallback. OpenAI models use remote compaction (compact.rs:50-52). If it fails, the error propagates immediately with no retry and no fallback to local compaction (compact_remote.rs:127-139).
- Pre-compaction trim only removes Codex-generated items (compact_remote.rs:287-292). If history is dominated by user/tool content, trimming stops early and an oversized payload is still sent to /responses/compact.
- A fuller token estimate exists but is not used for compaction decisions. estimated_token_count is computed at codex.rs:5881-5882 but only logged. The compaction gate at codex.rs:5895 uses total_usage_tokens from get_total_token_usage(), which can be stale or poisoned.
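As an illustration of the last point, a gate could consult both numbers and act on the larger one. This is only a sketch under assumptions: should_compact_with_estimate and auto_compact_limit are hypothetical names, and estimated_token_count is modeled as an Option because its actual type was not checked.

```rust
/// Hypothetical gate that trusts whichever count is larger, so a poisoned
/// reported total cannot hide a large locally estimated history.
fn should_compact_with_estimate(
    total_usage_tokens: i64,
    estimated_token_count: Option<i64>,
    auto_compact_limit: i64,
) -> bool {
    // Fall back to the reported total when no estimate is available.
    let effective = estimated_token_count
        .map_or(total_usage_tokens, |est| est.max(total_usage_tokens));
    effective >= auto_compact_limit
}
```

With a poisoned total of 2,000 tokens and an estimate of 395,000, this gate would still fire at a 360K limit. Without an estimate it behaves like the current check.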
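Related issues
- context_compacted auto-compaction Codex outputs handoff/summary and stops (CLI + Windows App) #16042 (regression >=0.115 in compaction behavior)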
What steps can reproduce the bug?
- Add model_context_window = 400000 to ~/.codex/config.toml
- Start a session that reads many files (e.g., codex exec "Read all source files in this repo")
- Agent accumulates context past the threshold
- First overflow triggers fill_to_context_window
- All subsequent turns see ~0 tokens, compaction never fires, permanent crash loop
Remove model_context_window from config.toml and the same workload completes successfully with compaction working.
What is the expected behavior?
model_context_window should work correctly at any value. After context overflow, get_total_token_usage() should return the actual context size, not a poisoned delta value. Compaction should fire and recover.
Suggested fix
get_total_token_usage() should use total_token_usage.total_tokens (cumulative actual usage) instead of last_token_usage.total_tokens (incremental delta) for the compaction threshold comparison. Alternatively, fill_to_context_window should set last_token_usage.total_tokens = context_window (the full value) rather than the delta.
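Both alternatives can be sketched against the same simplified types. This is an assumption-laden illustration rather than a patch: the structs are reduced to one field, the _fixed function names are hypothetical, and whether the cumulative total is the right measure across multiple turns has not been verified here.

```rust
/// Simplified stand-in for the real TokenUsage (assumption: one field only).
#[derive(Debug, Default, Clone, PartialEq)]
struct TokenUsage {
    total_tokens: i64,
}

/// Simplified stand-in for the structure holding both counters.
#[derive(Debug, Default)]
struct TokenUsageInfo {
    total_token_usage: TokenUsage,
    last_token_usage: TokenUsage,
}

impl TokenUsageInfo {
    /// Second alternative: record the full window instead of the delta.
    fn fill_to_context_window_fixed(&mut self, context_window: i64) {
        self.last_token_usage = TokenUsage {
            total_tokens: context_window,
        };
    }
}

/// First alternative: read the cumulative total rather than the last-turn value.
fn get_total_token_usage_fixed(info: Option<&TokenUsageInfo>) -> i64 {
    info.map(|i| i.total_token_usage.total_tokens).unwrap_or(0)
}

/// Returns (value under the first alternative, value under the second)
/// for a session that overflowed with `previous_total` tokens recorded.
fn after_overflow_fixed(context_window: i64, previous_total: i64) -> (i64, i64) {
    let mut info = TokenUsageInfo::default();
    info.total_token_usage.total_tokens = previous_total;
    info.fill_to_context_window_fixed(context_window);
    (
        get_total_token_usage_fixed(Some(&info)),
        info.last_token_usage.total_tokens,
    )
}
```

For a 400K window with 398K recorded, the first alternative reports 398,000 and the second reports 400,000. Either value is above the 360K threshold, so compaction would fire.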