What version of the IDE extension are you using?
0.140.0-alpha.2
What subscription do you have?
ChatGPT Plus
Which IDE are you using?
VS Code
What platform is your computer?
Linux 6.12.74+deb13+1-amd64 x86_64 unknown
What issue are you seeing?
Codex stops with this user-facing message:
Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.
The corresponding websocket failure in the logs is:
event.kind=response.failed
error.code=context_length_exceeded
error.message="Your input exceeds the context window of this model. Please adjust your input and try again."
The confusing part is the state immediately before the failure. The post-sampling token usage logs repeatedly show the turn entering a follow-up/continuation state, while the local context-limit flags are still false:
thread_id=019ed7f3-2509-76d3-8975-a5b51ab0411b
turn_id=019ed7fa-e2a2-7a62-a617-0e780a2b06ce
model=gpt-5.5
reasoning_effort=xhigh
total_usage_tokens=282204
estimated_token_count=Some(304934)
auto_compact_scope_tokens=282204
auto_compact_scope_limit=334800
full_context_window_limit=None
full_context_window_limit_reached=false
token_limit_reached=false
model_needs_follow_up=true
has_pending_input=false
needs_follow_up=true
There are several similar rows leading up to the failure:
total_usage_tokens=237867, estimated_token_count=Some(257051), needs_follow_up=true
total_usage_tokens=242488, estimated_token_count=Some(261711), needs_follow_up=true
total_usage_tokens=257185, estimated_token_count=Some(276508), needs_follow_up=true
total_usage_tokens=270057, estimated_token_count=Some(290746), needs_follow_up=true
total_usage_tokens=274825, estimated_token_count=Some(297507), needs_follow_up=true
total_usage_tokens=282204, estimated_token_count=Some(304934), needs_follow_up=true
All of those rows also report:
full_context_window_limit_reached=false
token_limit_reached=false
Then the same turn fails with:
event.kind=response.failed
error.code=context_length_exceeded
message="Your input exceeds the context window of this model. Please adjust your input and try again."
So the failure pattern looks like this:
post sampling token usage:
model_needs_follow_up=true
needs_follow_up=true
token_limit_reached=false
full_context_window_limit_reached=false
then:
event.kind=response.failed
error.code=context_length_exceeded
This looks like the follow-up / remote compaction continuation request is still too large when it is sent, even though the local turn accounting did not mark the token/context limit as reached.
I can't tell whether the root cause is remote compaction, follow-up handling, token estimation, or only the UI error message. Either way, the logs contradict the outcome: local accounting says no limit was reached, yet the next request fails with the backend error context_length_exceeded.
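To make the numbers above easier to compare, here is a minimal sketch that computes, for each logged row, the gap between estimated_token_count and total_usage_tokens and the remaining headroom to auto_compact_scope_limit. The values are copied from the rows above. The helper name `summarize` is mine, and comparing against auto_compact_scope_limit is an assumption: I don't know which threshold the local flags actually use.

```python
# Values copied from the post-sampling rows in this report.
AUTO_COMPACT_SCOPE_LIMIT = 334_800  # auto_compact_scope_limit in the log

# (total_usage_tokens, estimated_token_count)
ROWS = [
    (237_867, 257_051),
    (242_488, 261_711),
    (257_185, 276_508),
    (270_057, 290_746),
    (274_825, 297_507),
    (282_204, 304_934),
]


def summarize(rows, limit):
    """Return per-row estimate gap and headroom against `limit`.

    Assumption: `limit` is the number the local accounting compares
    against; the report does not confirm this.
    """
    summary = []
    for total, estimated in rows:
        summary.append({
            "total": total,
            "estimated": estimated,
            "estimate_gap": estimated - total,   # estimate minus reported usage
            "headroom": limit - estimated,       # tokens left before `limit`
            "over_limit": estimated > limit,
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS, AUTO_COMPACT_SCOPE_LIMIT):
        print(row)
```

Under that assumption, every row stays below 334800, and the last estimate (304934) is still 29866 tokens short of it. That would be consistent with the flags being false, but it does not explain why the backend rejected the next request.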
What steps can reproduce the bug?
I can reproduce this with long-running Codex VS Code sessions using gpt-5.5 with reasoning_effort=xhigh, especially during repo-heavy tasks where Codex reads many files and runs shell commands.
My config:
model = "gpt-5.5"
model_reasoning_effort = "xhigh"
service_tier = "default"
[projects."/srv/repsnord/app"]
trust_level = "trusted"
[projects."/"]
trust_level = "trusted"
[projects."/srv"]
trust_level = "trusted"
Typical reproduction flow:
1. Open Codex in VS Code.
2. Use model gpt-5.5 with reasoning_effort=xhigh.
3. Start a repo-heavy task that reads rules/docs/source files and runs shell commands.
4. Let the turn run until Codex starts issuing automatic follow-up/continuation requests.
5. Codex eventually stops with: “Codex ran out of room in the model's context window.”
6. Inspect ~/.codex/logs_2.sqlite.
7. The logs show repeated model_needs_follow_up=true / needs_follow_up=true rows followed by response.failed context_length_exceeded.
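For steps 6 and 7, a schema-agnostic sketch of how the log database can be searched. I am deliberately not assuming any table or column names in logs_2.sqlite: the script lists whatever tables exist and scans every row for a substring. The function name `find_rows` is hypothetical, and the database is opened read-only so the inspection does not modify it.

```python
import sqlite3
from pathlib import Path


def find_rows(db_path, needle):
    """Return (table, row) pairs where any column's text contains `needle`.

    Makes no assumption about the schema: every user table is scanned.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only open
    conn = sqlite3.connect(uri, uri=True)
    hits = []
    try:
        tables = [
            name for (name,) in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            for row in conn.execute(f"SELECT * FROM {quoted}"):
                if any(needle in str(value) for value in row if value is not None):
                    hits.append((table, row))
    finally:
        conn.close()
    return hits


if __name__ == "__main__":
    db = Path.home() / ".codex" / "logs_2.sqlite"
    if db.exists():
        for table, row in find_rows(db, "context_length_exceeded"):
            print(table, row)
```

Searching for needs_follow_up=true the same way should surface the post-sampling rows that precede the failure. A full scan like this can be slow on a large log database.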
Example thread:
thread_id=019ed7f3-2509-76d3-8975-a5b51ab0411b
turn_id=019ed7fa-e2a2-7a62-a617-0e780a2b06ce
app.version=0.140.0-alpha.2
originator=codex_vscode
model=gpt-5.5
reasoning_effort=xhigh
I also saw the same failure pattern in another thread:
thread_id=019ed7dd-d86c-7270-a147-404cdf301f0f
model=gpt-5.5
app.version=0.140.0-alpha.2
event.kind=response.failed
error.code=context_length_exceeded
What is the expected behavior?
If Codex logs:
model_needs_follow_up=true
needs_follow_up=true
then I would expect one of these outcomes:
- Codex successfully performs the follow-up turn after compaction.
- Codex compacts further before sending the follow-up request.
- Codex stops earlier with a specific message that the follow-up/compaction request could not fit.
- Codex logs the local context-limit flags consistently before the backend rejects the request.
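The outcomes above could be expressed as a pre-flight check before the follow-up request is sent. This is only an illustration of the behavior I would expect, not Codex's actual code: `plan_follow_up`, `Action`, and all parameters are hypothetical names, and I don't know the real context window for this model (the log shows full_context_window_limit=None).

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND = "send"        # follow-up fits, send it
    COMPACT = "compact"  # compact further before sending
    STOP = "stop"        # stop early with a specific message


def plan_follow_up(
    estimated_tokens: int,
    context_window: Optional[int],
    compactions_left: int,
) -> Action:
    """Decide what to do with a pending follow-up request (hypothetical)."""
    if context_window is None:
        # Nothing to compare against, as with full_context_window_limit=None.
        return Action.SEND
    if estimated_tokens <= context_window:
        return Action.SEND
    if compactions_left > 0:
        return Action.COMPACT
    return Action.STOP


if __name__ == "__main__":
    # 300_000 is a made-up window for illustration only.
    print(plan_follow_up(304_934, 300_000, compactions_left=1))
```

With a check like this, a request that cannot fit would end in COMPACT or STOP locally instead of a backend context_length_exceeded.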
Right now the local post-sampling state says:
full_context_window_limit_reached=false
token_limit_reached=false
but the next request fails with:
error.code=context_length_exceeded
That makes it difficult to tell whether the failure is caused by model context, remote compaction, follow-up handling, token estimation, or just the UI message.
Additional information
I generated a local evidence bundle from:
~/.codex/logs_2.sqlite
~/.codex/state_5.sqlite
Safe summary:
Codex VS Code 0.140.0-alpha.2
model=gpt-5.5
reasoning_effort=xhigh
originator=codex_vscode
wire_api=responses
transport=responses_websocket
x-codex-beta-features includes remote_compaction_v2
The most relevant local log pattern is:
model_needs_follow_up=true
needs_follow_up=true
token_limit_reached=false
full_context_window_limit_reached=false
followed by:
event.kind=response.failed
error.code=context_length_exceeded
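A minimal sketch of how this pattern could be detected mechanically in exported log text. It assumes each log event has been flattened to one line of key=value pairs, which is my own export format and not necessarily how Codex stores them; `parse_fields` and `find_unflagged_overflows` are hypothetical helpers.

```python
def parse_fields(line):
    """Split 'k=v k2=v2' or 'k=v, k2=v2' text into a dict of strings.

    Values containing spaces (such as quoted messages) are not handled.
    """
    fields = {}
    for part in line.replace(",", " ").split():
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def find_unflagged_overflows(events):
    """Return indexes of context_length_exceeded failures whose most recent
    post-sampling row had both local limit flags set to false."""
    hits = []
    last_usage = None
    for index, event in enumerate(events):
        if "token_limit_reached" in event:
            last_usage = event  # remember the latest post-sampling row
        elif (
            event.get("event.kind") == "response.failed"
            and event.get("error.code") == "context_length_exceeded"
            and last_usage is not None
            and last_usage.get("token_limit_reached") == "false"
            and last_usage.get("full_context_window_limit_reached") == "false"
        ):
            hits.append(index)
    return hits


if __name__ == "__main__":
    lines = [
        "needs_follow_up=true token_limit_reached=false "
        "full_context_window_limit_reached=false",
        "event.kind=response.failed error.code=context_length_exceeded",
    ]
    print(find_unflagged_overflows([parse_fields(line) for line in lines]))
```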
I can provide more redacted excerpts if needed.