Codex VS Code follow-up turn fails with context_length_exceeded after needs_follow_up=true #28816

### What version of the IDE extension are you using?

0.140.0-alpha.2

### What subscription do you have?

ChatGPT Plus

### Which IDE are you using?

VS Code

### What platform is your computer?

Linux 6.12.74+deb13+1-amd64 x86_64 unknown

### What issue are you seeing?

Codex stops with this user-facing message:

```text
Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.
```

The corresponding websocket failure in the logs is:

```text
event.kind=response.failed
error.code=context_length_exceeded
error.message="Your input exceeds the context window of this model. Please adjust your input and try again."
```

The confusing part is the state immediately before the failure. The post-sampling token usage logs repeatedly show the turn entering a follow-up/continuation state, while the local context-limit flags are still false:

```text
thread_id=019ed7f3-2509-76d3-8975-a5b51ab0411b
turn_id=019ed7fa-e2a2-7a62-a617-0e780a2b06ce
model=gpt-5.5
reasoning_effort=xhigh
total_usage_tokens=282204
estimated_token_count=Some(304934)
auto_compact_scope_tokens=282204
auto_compact_scope_limit=334800
full_context_window_limit=None
full_context_window_limit_reached=false
token_limit_reached=false
model_needs_follow_up=true
has_pending_input=false
needs_follow_up=true
```

There are several similar rows leading up to the failure:

```text
total_usage_tokens=237867, estimated_token_count=Some(257051), needs_follow_up=true
total_usage_tokens=242488, estimated_token_count=Some(261711), needs_follow_up=true
total_usage_tokens=257185, estimated_token_count=Some(276508), needs_follow_up=true
total_usage_tokens=270057, estimated_token_count=Some(290746), needs_follow_up=true
total_usage_tokens=274825, estimated_token_count=Some(297507), needs_follow_up=true
total_usage_tokens=282204, estimated_token_count=Some(304934), needs_follow_up=true
```
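To make the trend in these rows easier to see, here is a small Python sketch that parses them and reports, per row, the gap between the estimate and the reported usage and the remaining headroom under `auto_compact_scope_limit=334800`. The regex and helper names are my own; only the numbers come from the logs.

```python
import re

# Rows copied verbatim from the log excerpt above.
ROWS = """\
total_usage_tokens=237867, estimated_token_count=Some(257051), needs_follow_up=true
total_usage_tokens=242488, estimated_token_count=Some(261711), needs_follow_up=true
total_usage_tokens=257185, estimated_token_count=Some(276508), needs_follow_up=true
total_usage_tokens=270057, estimated_token_count=Some(290746), needs_follow_up=true
total_usage_tokens=274825, estimated_token_count=Some(297507), needs_follow_up=true
total_usage_tokens=282204, estimated_token_count=Some(304934), needs_follow_up=true
"""

AUTO_COMPACT_SCOPE_LIMIT = 334800  # value logged in the post-sampling row

ROW_RE = re.compile(
    r"total_usage_tokens=(\d+), estimated_token_count=Some\((\d+)\)"
)


def parse_rows(text):
    """Return a list of (total_usage_tokens, estimated_token_count) pairs."""
    return [(int(total), int(est)) for total, est in ROW_RE.findall(text)]


def summarize(rows, limit):
    """For each row, return (estimate - usage, limit - estimate)."""
    return [(est - total, limit - est) for total, est in rows]


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    for (total, est), (gap, headroom) in zip(rows, summarize(rows, AUTO_COMPACT_SCOPE_LIMIT)):
        print(f"usage={total} estimate={est} gap={gap} headroom={headroom}")
```

In the last row the estimate is 22730 tokens above the reported usage and still 29866 tokens below `auto_compact_scope_limit`. That would be consistent with no local compaction being triggered before the failing request, though I have not confirmed how Codex actually uses that threshold.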

All of those rows also report:

```text
full_context_window_limit_reached=false
token_limit_reached=false
```

Then the same turn fails with:

```text
event.kind=response.failed
error.code=context_length_exceeded
error.message="Your input exceeds the context window of this model. Please adjust your input and try again."
```

So the failure pattern looks like this:

```text
post sampling token usage:
  model_needs_follow_up=true
  needs_follow_up=true
  token_limit_reached=false
  full_context_window_limit_reached=false

then:
  event.kind=response.failed
  error.code=context_length_exceeded
```
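For anyone scanning their own logs, the pattern above can be written as a small check. This is only a sketch: it assumes the log fields have already been extracted as `key=value` lines like the excerpts in this report, and `parse_fields` and `is_unflagged_overflow` are hypothetical helper names, not part of Codex.

```python
def parse_fields(block):
    """Parse 'key=value' lines into a dict; lines without '=' are skipped."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def is_unflagged_overflow(post_sampling, failure):
    """True when local flags report no limit hit, yet the backend rejects the request."""
    locally_ok = (
        post_sampling.get("needs_follow_up") == "true"
        and post_sampling.get("token_limit_reached") == "false"
        and post_sampling.get("full_context_window_limit_reached") == "false"
    )
    backend_rejected = (
        failure.get("event.kind") == "response.failed"
        and failure.get("error.code") == "context_length_exceeded"
    )
    return locally_ok and backend_rejected


if __name__ == "__main__":
    post = parse_fields(
        "model_needs_follow_up=true\n"
        "needs_follow_up=true\n"
        "token_limit_reached=false\n"
        "full_context_window_limit_reached=false\n"
    )
    fail = parse_fields(
        "event.kind=response.failed\n"
        "error.code=context_length_exceeded\n"
    )
    print(is_unflagged_overflow(post, fail))
```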

This suggests that the follow-up (or remote-compaction continuation) request is still too large when it is sent, even though local turn accounting never marked the token or context limit as reached. In the last row before the failure, `estimated_token_count` (304934) is below `auto_compact_scope_limit` (334800) and `full_context_window_limit` is `None`, so nothing local appears to have flagged the request as too large.

I’m not sure whether the root cause is remote compaction, follow-up handling, token estimation, or the UI error message. Either way, the logs make the current behavior hard to reason about: local accounting says the limit was not reached, yet the next request fails with backend `context_length_exceeded`.

### What steps can reproduce the bug?

I can reproduce this with long-running Codex VS Code sessions using `gpt-5.5` with `reasoning_effort=xhigh`, especially during repo-heavy tasks where Codex reads many files and runs shell commands.

My config:

```toml
model = "gpt-5.5"
model_reasoning_effort = "xhigh"
service_tier = "default"

[projects."/srv/repsnord/app"]
trust_level = "trusted"

[projects."/"]
trust_level = "trusted"

[projects."/srv"]
trust_level = "trusted"
```

Typical reproduction flow:

```text
1. Open Codex in VS Code.
2. Use model gpt-5.5 with reasoning_effort=xhigh.
3. Start a repo-heavy task that reads rules/docs/source files and runs shell commands.
4. Let the turn continue until Codex starts producing automatic follow-up/continuation behavior.
5. Codex eventually stops with: “Codex ran out of room in the model's context window.”
6. Inspect ~/.codex/logs_2.sqlite.
7. The logs show repeated model_needs_follow_up=true / needs_follow_up=true rows followed by response.failed context_length_exceeded.
```
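For step 6, I don't know the schema of `logs_2.sqlite`, so the sketch below makes no assumptions about table or column names. It opens the database read-only, walks every table listed in `sqlite_master`, and counts rows where any column contains a given string. The function name `find_text_matches` is my own.

```python
import sqlite3
from pathlib import Path


def find_text_matches(db_path, needle):
    """Return {table_name: row_count} for rows where any column contains needle."""
    # Open read-only so a running Codex session is not disturbed.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        matches = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            if not cols:
                continue
            where = " OR ".join(f'instr(CAST("{c}" AS TEXT), ?) > 0' for c in cols)
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{table}" WHERE {where}',
                [needle] * len(cols),
            ).fetchone()[0]
            if count:
                matches[table] = count
        return matches
    finally:
        conn.close()


if __name__ == "__main__":
    log_db = Path.home() / ".codex" / "logs_2.sqlite"
    if log_db.exists():
        print(find_text_matches(log_db, "context_length_exceeded"))
```

Once the matching table is known, the same approach with `needs_follow_up=true` as the search string should surface the post-sampling rows that precede the failure.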

Example thread:

```text
thread_id=019ed7f3-2509-76d3-8975-a5b51ab0411b
turn_id=019ed7fa-e2a2-7a62-a617-0e780a2b06ce
app.version=0.140.0-alpha.2
originator=codex_vscode
model=gpt-5.5
reasoning_effort=xhigh
```

I also saw the same failure pattern in another thread:

```text
thread_id=019ed7dd-d86c-7270-a147-404cdf301f0f
model=gpt-5.5
app.version=0.140.0-alpha.2
event.kind=response.failed
error.code=context_length_exceeded
```

### What is the expected behavior?

If Codex logs:

```text
model_needs_follow_up=true
needs_follow_up=true
```

then I would expect one of these outcomes:

1. Codex successfully performs the follow-up turn after compaction.
2. Codex compacts further before sending the follow-up request.
3. Codex stops earlier with a specific message that the follow-up/compaction request could not fit.
4. Codex's local context-limit flags (`token_limit_reached`, `full_context_window_limit_reached`) reflect the overflow before the backend rejects the request.
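
To illustrate outcomes 1 to 3, here is a rough sketch of the kind of pre-flight decision I have in mind. It is not how Codex is implemented: `plan_follow_up`, `FollowUpAction`, and the `context_window` parameter are hypothetical, and the logs above actually show `full_context_window_limit=None`, so the real limit is unknown to me.

```python
from enum import Enum


class FollowUpAction(Enum):
    SEND = "send"                            # outcome 1: request fits
    COMPACT_THEN_SEND = "compact_then_send"  # outcome 2: compact further first
    STOP_WITH_MESSAGE = "stop_with_message"  # outcome 3: fail early and specifically


def plan_follow_up(estimated_tokens, context_window, compacted_estimate=None):
    """Pick an action for the next follow-up request.

    estimated_tokens: local estimate of the follow-up request size
    context_window: the model's limit (hypothetical input, not taken from the logs)
    compacted_estimate: estimated size after another compaction pass, if known
    """
    if estimated_tokens <= context_window:
        return FollowUpAction.SEND
    if compacted_estimate is not None and compacted_estimate <= context_window:
        return FollowUpAction.COMPACT_THEN_SEND
    return FollowUpAction.STOP_WITH_MESSAGE


if __name__ == "__main__":
    # 304934 is the last logged estimate; 300000 is a made-up limit for illustration.
    print(plan_follow_up(304934, 300000, compacted_estimate=120000))
```

The point is only that the decision would be made locally, before the request is sent, so the user sees a specific reason instead of a backend rejection.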

Right now the local post-sampling state says:

```text
full_context_window_limit_reached=false
token_limit_reached=false
```

but the next request fails with:

```text
error.code=context_length_exceeded
```

This mismatch makes it difficult to tell whether the failure comes from the model's context limit, remote compaction, follow-up handling, token estimation, or only the UI message.

### Additional information

I generated a local evidence bundle from:

```text
~/.codex/logs_2.sqlite
~/.codex/state_5.sqlite
```

Safe summary:

```text
Codex VS Code 0.140.0-alpha.2
model=gpt-5.5
reasoning_effort=xhigh
originator=codex_vscode
wire_api=responses
transport=responses_websocket
x-codex-beta-features includes remote_compaction_v2
```

The most relevant local log pattern is:

```text
model_needs_follow_up=true
needs_follow_up=true
token_limit_reached=false
full_context_window_limit_reached=false
```

followed by:

```text
event.kind=response.failed
error.code=context_length_exceeded
```

I can provide more redacted excerpts if needed.
