Codex Desktop meta-bug: unbounded session/turn state causes freezes, context bloat, and lost active-turn control

### What version of the Codex App are you using (From “About Codex” dialog)?

26.527.60818

### What subscription do you have?

pro

### What platform is your computer?

Microsoft Windows NT 10.0.19045.0

### What issue are you seeing?

# Codex Desktop meta-bug: unbounded session/turn state causes freezes, context bloat, and lost active-turn control

## Suggested labels

`bug`, `app`, `session`, `context`, `tool-calls`, `performance`, `app-server`

## Summary

Several open Codex issues appear to be different symptoms of the same larger reliability problem:

Codex Desktop lets local session/turn state grow without bound, then puts that state on hot paths: UI rendering, resume, context assembly, IPC, and active-turn ownership. Once a thread is long-running, tool-heavy, image-heavy, repeatedly compacted, or has gone through renderer/restart/interrupt recovery, the app can freeze, become extremely slow, overfill context, lose trace/progress visibility, fail Stop/cancel, or continue backend work while the UI appears stuck in `Thinking`.

I do not think every linked issue is the same single code bug. The clearer framing is one parent/meta bug with several fix surfaces:

1. unbounded persisted rollout/session records;
2. full-history hydration into app-server/client/renderer state;
3. context replay/compaction retaining large tool or image payloads;
4. renderer/app-server turn ownership and trace-stream rehydration losing authority;
5. insufficient diagnostics, because very different waits all collapse into `Thinking`, `Working`, or `Reconnecting`.



## Evidence from related open reports

### 1. Large local rollout/history hydration makes Desktop slow, frozen, or unrecoverable

- #18693 reports a verified A/B profile test where a copied profile is smooth until only a few giant local histories are restored. Largest anonymized rollouts were about `455 MiB`, `507 MiB`, `561 MiB`, `584 MiB`, and `1867 MiB`; operations such as resume/read/unsubscribe could take tens of seconds, with worst cases around `82s`.
- #20269 reports a long-running Windows Desktop task producing a `718.6 MB` rollout JSONL with `3,305` lines. After force quit, every launch auto-resumed the oversized session and froze. Renderer CPU was around `140%` of one core and working-set memory grew by about `1.177 GB` in `10s`.
- #22991 reports multiple long-running session files around `500 MB` or larger. The strongest freeze occurred not merely when opening a large conversation, but when sending a new prompt in an already-open large thread.
- #11984 is the long-running umbrella report for Desktop UI lag during long sessions. Later comments tie the problem to full history hydration, renderer CPU/RSS, and large IPC/app-server payloads rather than only DOM rendering.
- #21299 reports Windows long-thread message submit lag: sending a new message in a long thread makes the app unresponsive for about `5s`, and pruning old local JSONL session files is reported as a workaround.

### 2. Hard size limits and inline payloads expose the same design failure

- #22004 reports a reproducible Electron main-process crash, `RangeError: Invalid string length`, when loading sessions whose rollout JSONL exceeds V8's max string length. Reported sizes include `506.8 MB`, `786.7 MB`, `963.5 MB`, `1050 MB`, and `1601 MB`. Related comments report an `825 MB` rollout producing a dropped `405 MB` IPC payload, and a macOS `1.37 GB` rollout where app-server grew to `6-8 GB` RSS.
- #24676 shows that the problem can occur well below the roughly `512 MB` string-length limit behind the #22004 crash. A `182.8 MB` image-heavy rollout had `41,361` lines, a max JSONL line of `6,850,815` chars, and many inline `image_url` / `payload.images` records. A newer repro was only `102.44 MB`, but almost the whole file was `data:image` payloads; a compacted record alone was `24.207 MB` with `16` image references.
- #22091 reports Desktop context growth from retained tool outputs: a fresh session reached `137,293 / 258,400` tokens after a small number of visible interactions, and an earlier diagnostic reached `234,757 / 258,400`, then re-inflated after compaction. Several retained tool outputs were around `37K-40K` characters each.
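
To check whether a local rollout matches these reports, the same quantities (line count, longest line, lines carrying inline `data:image` payloads) can be measured with a short script. This is a minimal sketch that assumes only that the rollout is a UTF-8 JSONL file; `scan_rollout` and the keys it returns are hypothetical names, not part of Codex.

```python
def scan_rollout(path):
    """Stream a JSONL file line by line and collect size statistics.

    Only one line is held in memory at a time, so the whole file is never
    loaded as a single string.
    """
    stats = {"lines": 0, "total_chars": 0, "max_line_chars": 0, "data_image_lines": 0}
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            text = line.rstrip("\n")
            stats["lines"] += 1
            stats["total_chars"] += len(text)
            stats["max_line_chars"] = max(stats["max_line_chars"], len(text))
            # Substring check only; the record is deliberately not parsed.
            if "data:image" in text:
                stats["data_image_lines"] += 1
    return stats
```

Reporting these four numbers alongside a freeze makes it easier to tell an image-heavy rollout from one that is simply long.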

### 3. Context compaction/replay can amplify the same state bloat

- #25009 reports Desktop becoming slow or stuck after `context_compacted` events and approval transcript injection. The report found repeated `approval assessment` / `TRANSCRIPT DELTA` blocks containing prior tool calls, tool outputs, retry reasons, and planned actions.
- #24095 tracks Windows memory spikes and freezes after repeated context compactions in a single long-running session, distinguishing this from one-time renderer regressions or image-only rollouts.
- #19842 reports Codex CLI running out of context instead of compacting/resuming a long tool-heavy thread. One tool output had an original token count of `815,104`; the final token-count event reached `258,400 / 258,400`.
- #19585 is nominally about usage depletion, but the useful overlap is the "compaction tax": failed/slow compaction and repeated reconstruction consume additional usage and make long-session failure more expensive.

### 4. Active turn ownership and renderer/session state can desynchronize

- #24287 reports the dangerous control-plane symptom: prompt accepted, UI stuck in `Thinking`, Stop fails or is misleading, progress traces disappear, and backend usage can continue decreasing while no activity is visible. It also reports multi-window rehydration making already-visible traces disappear and state disagreements between the goal bar, prompt box, and chat/trace area.
- #24434 reports a post-tool continuation stall: `pwd` and `rg --files` returned successfully almost instantly, then no assistant message, no new tool call, and no task completion occurred for about `6m59s` until manual interrupt.
- #24263 gives concrete renderer rehydration evidence. After a renderer reload, logs showed placeholder latest-turn rebinding, completed turns marked `markedStreaming=true`, `Received turn/started for unknown conversation`, and `5,466` `Item not found in turn state` errors in one Desktop log.
- #23644 reports composer submit timing out after stale conversation state accumulated over several days. Local app-state snapshots included `pending_request_count=20`, `thread_count_active=15`, `thread_count_streaming_owner=6`, `thread_count_streaming_without_active_runtime=13`, `item_count_total_loaded=22550`, and about `201 MB` of estimated delta bytes. Restart cleared the issue.
- #23035 reports an orphan `task_started` without `task_complete` poisoning reopen/resume. The rollout parsed cleanly, but balancing the orphan turn with a synthetic completion made the lifecycle valid again. This suggests interrupted/failed turns need durable terminal state or tolerant resume logic.
- #21360 reports multiple turn lifecycle stall modes: `task_started` without `task_complete`, tool outputs returned without assistant continuation, and long non-image sessions with multiple compactions and unfinished turns.

### 5. Transport/first-output stalls are adjacent and currently indistinguishable in the UI

Some reports may not share the same local-history root cause, but they matter because the UI collapses them into the same stuck state and recovery paths can interact with renderer/session state:

- #24260 reports a `gpt-5.5` xhigh turn that was accepted immediately but took `30m38s` to produce the first persisted reasoning item. Later comments include `responses_http` idle spans of hundreds of seconds and a packet-capture case where a reused connection was reset but recovery waited for the `300s` stream idle timeout.
- #24414 reports the VS Code extension staying on `Thinking` for minutes even for simple prompts.
- #24419 reports CLI `Working` hangs and reconnect retries; interrupting and resubmitting the same prompt can run normally.

These may require separate transport/request watchdog fixes, but the Desktop issue should still distinguish them from renderer detachment, post-tool continuation stalls, and full-history hydration.


## Suspected root cause family

The common design problem seems to be that "thread state" is doing too many jobs at once:

- durable audit log;
- UI transcript;
- context replay source;
- app-server resume payload;
- renderer live state;
- trace/progress stream state;
- tool output/image artifact store;
- recovery source after restart or renderer reload.

When those roles are all served by unbounded JSONL and large in-memory turn arrays, one oversized or inconsistent thread can poison many surfaces.

## Requested fixes / invariants

### Bound persisted session records

- Cap persisted `function_call_output`, custom tool output, `event_msg` payloads, and `InputText`/image fields before they enter rollout JSONL.
- Do not persist raw `data:image` / base64 image payloads in normal rollout records or `compacted.payload.replacement_history`; use file/blob references, hashes, or placeholders.
- Avoid duplicating the same image/tool payload across both `response_item` and `event_msg`.
- Add rollout size warnings and automatic safe repair/export paths.
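
One way to picture the bounding step is as a filter applied to each record just before it is written. The sketch below is illustrative only: the cap value, the `blob:sha256:` reference format, and the function names are assumptions, and the `blobs` dict stands in for whatever file or blob store would actually hold the payloads.

```python
import hashlib
import re

MAX_OUTPUT_CHARS = 8_000  # hypothetical cap, chosen only for illustration
DATA_IMAGE_RE = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")

def cap_text(text, limit=MAX_OUTPUT_CHARS):
    """Keep the head and tail of an oversized string and note what was dropped."""
    if len(text) <= limit:
        return text
    head = limit // 2
    tail = limit - head
    omitted = len(text) - limit
    # The marker makes the result slightly longer than `limit`.
    return f"{text[:head]}\n[... {omitted} chars omitted ...]\n{text[len(text) - tail:]}"

def externalize_images(text, blobs):
    """Replace inline data:image URIs with a hash reference and store each payload once."""
    def _swap(match):
        uri = match.group(0)
        digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
        blobs[digest] = uri  # identical payloads share one entry
        return f"blob:sha256:{digest}"
    return DATA_IMAGE_RE.sub(_swap, text)

def bound_value(value, blobs):
    """Recursively bound every string inside a JSON-like record."""
    if isinstance(value, str):
        return cap_text(externalize_images(value, blobs))
    if isinstance(value, list):
        return [bound_value(v, blobs) for v in value]
    if isinstance(value, dict):
        return {k: bound_value(v, blobs) for k, v in value.items()}
    return value
```

Because the reference is derived from a content hash, an image that appears in both `response_item` and `event_msg` would be stored once and referenced twice.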

### Make thread loading lazy and paged

- `thread/read`, `thread/resume`, `thread/turns/list`, stream-state snapshots, and sidebar state should have hard byte/count caps.
- Opening a thread should return metadata and a recent bounded tail, for example the most recent 100-200 messages or a small byte cap.
- Older turns, heavy tool outputs, and images should load only on scroll/expand.
- Transcript rendering should be virtualized, but virtualization alone is insufficient if full history is still hydrated into renderer/app-server state.
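
The "metadata plus a bounded recent tail" idea can be sketched as a reader that never touches more than a fixed number of bytes at the end of the rollout file. This is a minimal sketch under assumptions: `read_tail_records`, its default caps, and the returned shape are hypothetical, and real rollout records may need more careful handling than a plain `json.loads`.

```python
import json
import os

def read_tail_records(path, max_bytes=256 * 1024, max_records=200):
    """Return at most `max_records` parsed records from the last `max_bytes` of a JSONL file."""
    size = os.path.getsize(path)
    start = max(0, size - max_bytes)
    with open(path, "rb") as fh:
        fh.seek(start)
        chunk = fh.read(size - start)
    lines = chunk.split(b"\n")
    if start > 0:
        # The first piece is probably a partial line, so drop it.
        # (If the seek landed exactly on a line start, one whole record is skipped.)
        lines = lines[1:]
    records = []
    for raw in lines:
        if not raw.strip():
            continue
        try:
            records.append(json.loads(raw.decode("utf-8")))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # Tolerate a torn or malformed line instead of failing the whole thread.
            continue
    return {"total_bytes": size, "truncated": start > 0, "records": records[-max_records:]}
```

A single record larger than `max_bytes` yields an empty tail here, which is one reason the per-record caps requested above matter for paging as well.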

### Separate context assembly from durable transcript history

- Context compaction should summarize or reference old heavy content instead of replaying raw tool outputs/images.
- Approval transcript or retry transcript injection should be bounded and deduplicated.
- Context/token accounting should expose whether growth came from visible user text, hidden tool output, compaction replacement history, approval transcript injection, or old transcript replay.
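
The accounting request in the last bullet amounts to tagging every context item with its source before counting. A rough sketch, assuming items arrive as `(source, text)` pairs and using character counts as a stand-in for tokens; the source labels and `account_context` are hypothetical names derived from the bullet above.

```python
from collections import Counter

# Hypothetical labels mirroring the growth sources listed above.
SOURCES = (
    "visible_user_text",
    "hidden_tool_output",
    "compaction_replacement_history",
    "approval_transcript_injection",
    "old_transcript_replay",
)

def account_context(items):
    """Summarize context size per source, largest contributor first.

    `items` is an iterable of (source, text) pairs. Sizes are in characters,
    which only approximates token usage.
    """
    totals = Counter()
    for source, text in items:
        label = source if source in SOURCES else "other"
        totals[label] += len(text)
    grand_total = sum(totals.values())
    return {
        label: {"chars": chars, "share": chars / grand_total if grand_total else 0.0}
        for label, chars in totals.most_common()
    }
```

A breakdown like this would make reports such as #22091 directly checkable: a session with little visible text and a large `hidden_tool_output` share points at retained tool output rather than user activity.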

### Make active-turn ownership durable and recoverable

- Persist a turn record and its originating thread before streaming begins.
- Persist accepted user prompts before upstream/network submission.
- On renderer reload/restart, reattach by durable turn id and event cursor, not just in-memory renderer state.
- Stop/cancel should resolve to an explicit state: cancelled, still running remotely, already completed, failed to cancel, or unknown/detached.
- If backend active turns exist without renderer ownership, show a recovery banner and reattach option.
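
Reattaching "by durable turn id and event cursor" could look like the following sketch. It assumes each turn has an append-only event log with monotonically increasing sequence numbers. `TurnEventLog` and `RendererView` are hypothetical names, and the in-memory dict stands in for a durable store.

```python
class TurnEventLog:
    """Append-only per-turn event log (in-memory stand-in for durable storage)."""

    def __init__(self):
        self._events = {}  # turn_id -> list of (seq, event)

    def append(self, turn_id, event):
        entries = self._events.setdefault(turn_id, [])
        seq = len(entries) + 1  # sequence numbers start at 1 and never repeat
        entries.append((seq, event))
        return seq

    def read_after(self, turn_id, cursor=0):
        """Return all (seq, event) pairs with seq greater than `cursor`."""
        if turn_id not in self._events:
            raise KeyError(f"unknown turn: {turn_id}")
        return [(seq, ev) for seq, ev in self._events[turn_id] if seq > cursor]


class RendererView:
    """Renderer-side state that can be rebuilt from the log at any time."""

    def __init__(self, turn_id, cursor=0):
        self.turn_id = turn_id
        self.cursor = cursor
        self.items = []

    def reattach(self, log):
        # Replay only what this view has not seen yet, then advance the cursor.
        for seq, event in log.read_after(self.turn_id, self.cursor):
            self.items.append(event)
            self.cursor = seq
        return self.cursor
```

With this shape, a reloaded renderer needs only the turn id and its last cursor to catch up, and an unknown turn id surfaces as an explicit error instead of a silent `Thinking` state.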

### Make resume/reconciliation authoritative

- A terminal backend turn must not rehydrate as `markedStreaming=true`.
- If a renderer receives item deltas for an unknown conversation, it should buffer briefly and force a thread/turn re-read instead of dropping state and spamming `Item not found in turn state`.
- On resume, reconcile the full item id set for the active/recent turn before applying live deltas.
- Orphan `task_started` turns should be repaired as interrupted/failed during resume rather than crashing `LocalConversationPage`.
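
The orphan repair described in #23035 (balancing a `task_started` with a synthetic completion) can be sketched as a pass over the parsed events at resume time. The `task_started` / `task_complete` names come from the reports. The `turn_id`, `status`, and `synthetic` fields and the `repair_orphan_turns` helper are assumptions made for illustration.

```python
def repair_orphan_turns(events, live_turn_ids=()):
    """Append a synthetic terminal event for each started turn that never completed.

    Turns listed in `live_turn_ids` are still owned by a running backend and are
    left alone. The input list is not modified.
    """
    live = set(live_turn_ids)
    open_turns = {}  # dict keeps insertion order, so orphans stay in start order
    for ev in events:
        turn_id = ev.get("turn_id")
        if ev.get("type") == "task_started":
            open_turns[turn_id] = True
        elif ev.get("type") == "task_complete":
            open_turns.pop(turn_id, None)
    orphans = [turn_id for turn_id in open_turns if turn_id not in live]
    repaired = list(events)
    for turn_id in orphans:
        repaired.append({
            "type": "task_complete",
            "turn_id": turn_id,
            "status": "interrupted",  # assumed terminal status
            "synthetic": True,        # marks the event as added during repair
        })
    return repaired, orphans
```

Marking the event as synthetic keeps the audit trail honest: the turn becomes terminal for lifecycle purposes without claiming that it completed normally.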

### Improve diagnostics

- Add phase-specific timing/log events for: request accepted, upstream request sent, response headers received, first byte, first Responses event, first assistant/reasoning/tool item, tool result returned, post-tool continuation requested, context compaction started/completed, renderer attached/detached, and Stop/cancel result.
- Keep performance traces locally if upload fails (#24262).
- Preserve enough local diagnostics to tell whether a stuck turn is model/backend stall, transport retry, context compaction, app-server hydration, renderer detachment, or post-tool continuation loss.
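
To show how phase marks would replace a generic `Thinking` label, here is a small sketch that reports the last phase a turn reached and how long it has been waiting for the next one. The phase names paraphrase the first part of the list above. `PHASES`, `describe_wait`, and the `marks` shape are hypothetical.

```python
# Ordered request phases, paraphrased from the timing events requested above.
PHASES = [
    "request_accepted",
    "upstream_request_sent",
    "response_headers_received",
    "first_byte",
    "first_responses_event",
    "first_item",
]

def describe_wait(marks, now):
    """Report the last phase reached and the phase the turn is waiting for.

    `marks` maps phase name to a timestamp in seconds; `now` uses the same clock.
    """
    last = None
    for phase in PHASES:
        if phase not in marks:
            break
        last = phase
    if last is None:
        return {"last_phase": None, "waiting_for": PHASES[0], "elapsed_s": None}
    idx = PHASES.index(last)
    waiting_for = PHASES[idx + 1] if idx + 1 < len(PHASES) else None
    return {
        "last_phase": last,
        "waiting_for": waiting_for,
        "elapsed_s": round(now - marks[last], 3),
    }
```

A turn that has been waiting five minutes for `response_headers_received` is a transport problem, while one that reached `first_item` and then went quiet points at continuation or renderer state; the UI could surface that distinction directly.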

## Related issues

Large history / full hydration:

- #11984
- #18693
- #20269
- #21299
- #22991

Hard size limits / image or tool payloads:

- #22004
- #24676
- #22091

Context compaction / context replay / usage amplification:

- #19842
- #19585
- #24095
- #25009

Turn lifecycle / UI ownership / trace and Stop desync:

- #21360
- #23035
- #23644
- #24263
- #24287
- #24434

Adjacent transport / first-output stall reports that should be distinguished in diagnostics:

- #24260
- #24414
- #24419

Potentially adjacent Desktop lifecycle reports:

- #22655
- #25094
- #21076

## Why this should be tracked as a meta issue

Individual reports often look like separate bugs because the immediate symptom differs: slow thread switching, V8 string crash, image-heavy freeze, compaction memory spike, stuck `Thinking`, invisible active turn, failed Stop, composer timeout, missing tool traces, or CLI/extension first-output stalls.

But the recurring evidence points to one architectural boundary that needs a coordinated fix: Codex should treat durable history, model context, renderer transcript, live turn stream, and recovery state as separate bounded contracts. Until those contracts are bounded and authoritative, fixes to only one surface, such as rendering virtualization or a single timeout tweak, are likely to leave other variants open.


### What steps can reproduce the bug?

## User-visible failure modes

- Desktop becomes slow or unresponsive when opening a long thread, switching threads, sending a new prompt in an old thread, or after repeated context compactions.
- The app can become unrecoverable on launch if the most recent session auto-resumes an oversized rollout.
- A short follow-up in an old session can freeze the UI even though a fresh session still works.
- Tool outputs can return successfully, but the assistant continuation never resumes.
- A prompt can be accepted and backend work can continue, while the Desktop UI stays stuck in `Thinking` and no progress traces appear.
- Stop/cancel can become unavailable, misleading, or ineffective because the UI has lost the active turn reference.
- After restart or renderer reload, prompts/traces/tool calls can be missing, stale, or only partially recovered.
- The same underlying state bloat can also drive fast context growth and usage drain through retained tool output, compaction history, and replayed approval/diagnostic transcripts.

### What is the expected behavior?

## Expected behavior

- Opening or switching to a thread should load metadata plus a bounded recent tail, not the entire transcript/tool/image history.
- Sending a short prompt in an old thread should not synchronously parse, render, serialize, or replay hundreds of MB of session data.
- Context compaction should reduce active prompt pressure and should not retain raw large tool outputs, image data, or repeated transcript-injection blocks.
- Rollout JSONL should not store raw image bytes or unbounded tool output inline when references, caps, summaries, or external artifacts would work.
- If one thread is too large or malformed, Desktop should fail that thread safely while leaving the rest of the app usable.
- If a prompt is accepted, the originating thread should durably show the user prompt, trace/progress stream, and Stop/cancel state.
- If the renderer loses ownership of an active backend turn, it should show an explicit recovery/reattach state rather than generic `Thinking`.
- Terminal backend states should be authoritative: completed, failed, interrupted, or cancelled turns should not rehydrate as streaming.
- Tool-returned, waiting-for-first-output, reconnecting, context-compacting, renderer-detached, and post-tool-continuation-stalled states should be distinguishable in UI and logs.


### Additional information

_No response_

Codex Desktop meta-bug: unbounded session/turn state causes freezes, context bloat, and lost active-turn control #25779
