Codex Desktop meta-bug: unbounded session/turn state causes freezes, context bloat, and lost active-turn control #25779

@FromAriel

What version of the Codex App are you using (From “About Codex” dialog)?

26.527.60818

What subscription do you have?

pro

What platform is your computer?

Microsoft Windows NT 10.0.19045.0

What issue are you seeing?

Codex Desktop meta-bug: unbounded session/turn state causes freezes, context bloat, and lost active-turn control

Suggested labels

bug, app, session, context, tool-calls, performance, app-server

Summary

Several open Codex issues appear to be different symptoms of the same larger reliability problem:

Codex Desktop allows local session/turn state to grow without bound, then places that state on hot paths: UI, resume, context assembly, IPC, and active-turn ownership. Once a thread is long-running, tool-heavy, image-heavy, repeatedly compacted, or has gone through renderer/restart/interrupt recovery, the app can freeze, become extremely slow, overfill context, lose trace/progress visibility, fail to Stop/cancel, or continue backend work while the UI appears stuck in Thinking.

I do not think every linked issue is the same single code bug. The clearer framing is one parent/meta bug with several fix surfaces:

  1. unbounded persisted rollout/session records;
  2. full-history hydration into app-server/client/renderer state;
  3. context replay/compaction retaining large tool or image payloads;
  4. renderer/app-server turn ownership and trace-stream rehydration losing authority;
  5. insufficient diagnostics, because very different waits all collapse into Thinking, Working, or Reconnecting.

Evidence from related open reports

1. Large local rollout/history hydration makes Desktop slow, frozen, or unrecoverable

2. Hard size limits and inline payloads expose the same design failure

3. Context compaction/replay can amplify the same state bloat

4. Active turn ownership and renderer/session state can desynchronize

5. Transport/first-output stalls are adjacent and currently indistinguishable in the UI

Some reports may not share the same local-history root cause, but they matter because the UI collapses them into the same stuck state and recovery paths can interact with renderer/session state:

These may require separate transport/request watchdog fixes, but the Desktop issue should still distinguish them from renderer detachment, post-tool continuation stalls, and full-history hydration.

Suspected root cause family

The common design problem seems to be that "thread state" is doing too many jobs at once:

  • durable audit log;
  • UI transcript;
  • context replay source;
  • app-server resume payload;
  • renderer live state;
  • trace/progress stream state;
  • tool output/image artifact store;
  • recovery source after restart or renderer reload.

When those roles are all served by unbounded JSONL and large in-memory turn arrays, one oversized or inconsistent thread can poison many surfaces.

Requested fixes / invariants

Bound persisted session records

  • Cap persisted function_call_output, custom tool output, event_msg payloads, and InputText/image fields before they enter rollout JSONL.
  • Do not persist raw data:image / base64 image payloads in normal rollout records or compacted.payload.replacement_history; use file/blob references, hashes, or placeholders.
  • Avoid duplicating the same image/tool payload across both response_item and event_msg.
  • Add rollout size warnings and automatic safe repair/export paths.
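
To make the capping and image-reference invariants above concrete, here is a minimal sketch of a sanitizer that could run before a record is appended to rollout JSONL. The record shape, the 16 KiB cap, the placeholder format, and all function names are hypothetical illustrations, not Codex internals.

```python
import hashlib
import re

MAX_OUTPUT_BYTES = 16 * 1024  # hypothetical cap, not a real Codex setting

# Matches inline base64 data URIs such as data:image/png;base64,AAAA...
DATA_IMAGE_RE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def replace_inline_images(text):
    """Swap each inline base64 image for a short content-hash placeholder."""
    def _placeholder(match):
        payload = match.group(0)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
        return f"[image sha256:{digest} bytes:{len(payload)}]"
    return DATA_IMAGE_RE.sub(_placeholder, text)


def cap_text(text, limit=MAX_OUTPUT_BYTES):
    """Keep at most `limit` UTF-8 bytes of content, then append a marker."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    kept = raw[:limit].decode("utf-8", errors="ignore")
    return f"{kept}\n[truncated about {len(raw) - limit} bytes]"


def sanitize_record(record):
    """Return a copy of a rollout record with every string field bounded."""
    def _walk(value):
        if isinstance(value, str):
            return cap_text(replace_inline_images(value))
        if isinstance(value, list):
            return [_walk(v) for v in value]
        if isinstance(value, dict):
            return {k: _walk(v) for k, v in value.items()}
        return value
    return _walk(record)
```

A real implementation would presumably write the original bytes to a blob store keyed by the same hash so the placeholder can be resolved on expand; that part is omitted here.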

Make thread loading lazy and paged

  • thread/read, thread/resume, thread/turns/list, stream-state snapshots, and sidebar state should have hard byte/count caps.
  • Opening a thread should return metadata and a recent bounded tail, for example the most recent 100-200 messages or a small byte cap.
  • Older turns, heavy tool outputs, and images should load only on scroll/expand.
  • Transcript rendering should be virtualized, but virtualization alone is insufficient if full history is still hydrated into renderer/app-server state.
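
As a rough illustration of the bounded-tail idea, the sketch below reads only the last part of a JSONL rollout file instead of parsing the whole thing. The file layout (one JSON record per line), the default caps, and the function name are assumptions for illustration only.

```python
import json
import os


def read_recent_tail(path, max_bytes=256 * 1024, max_records=200):
    """Return up to max_records JSONL records from the last max_bytes of a file."""
    size = os.path.getsize(path)
    start = max(0, size - max_bytes)
    with open(path, "rb") as fh:
        fh.seek(start)
        chunk = fh.read()
    lines = chunk.split(b"\n")
    if start > 0:
        # The first line is probably cut mid-record, so drop it.
        lines = lines[1:]
    records = []
    for line in lines:
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except ValueError:
            # Skip a malformed line rather than failing the whole thread.
            continue
    return records[-max_records:]
```

The cost of opening a thread then depends on `max_bytes`, not on the total rollout size. Older records would be fetched by a separate paged call on scroll/expand.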

Separate context assembly from durable transcript history

  • Context compaction should summarize or reference old heavy content instead of replaying raw tool outputs/images.
  • Approval transcript or retry transcript injection should be bounded and deduplicated.
  • Context/token accounting should expose whether growth came from visible user text, hidden tool output, compaction replacement history, approval transcript injection, or old transcript replay.
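
The accounting request above could look something like the following sketch, which attributes estimated tokens to the source that contributed them. The source names mirror the categories listed above; the item shape and the four-characters-per-token estimate are assumptions, not how Codex actually counts tokens.

```python
SOURCES = (
    "user_text",
    "tool_output",
    "compaction_history",
    "approval_transcript",
    "transcript_replay",
)


def estimate_tokens(text):
    """Very rough estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def context_breakdown(items):
    """Sum estimated tokens per source for a list of {"source", "text"} items."""
    totals = {source: 0 for source in SOURCES}
    totals["unknown"] = 0
    for item in items:
        source = item["source"] if item["source"] in SOURCES else "unknown"
        totals[source] += estimate_tokens(item["text"])
    return totals
```

Exposing a breakdown like this per turn would let a user see at a glance that, for example, hidden tool output rather than visible text is driving context growth.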

Make active-turn ownership durable and recoverable

  • Persist a turn record and its originating thread before streaming begins.
  • Persist accepted user prompts before upstream/network submission.
  • On renderer reload/restart, reattach by durable turn id and event cursor, not just in-memory renderer state.
  • Stop/cancel should resolve to an explicit state: cancelled, still running remotely, already completed, failed to cancel, or unknown/detached.
  • If backend active turns exist without renderer ownership, show a recovery banner and reattach option.
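
One way to picture the explicit Stop/cancel states is a small resolver that maps what the backend reports to exactly one outcome. The backend status strings and the `cancel_acknowledged` flag are hypothetical; only the five outcome names come from the list above.

```python
from enum import Enum


class StopOutcome(Enum):
    CANCELLED = "cancelled"
    STILL_RUNNING_REMOTELY = "still running remotely"
    ALREADY_COMPLETED = "already completed"
    FAILED_TO_CANCEL = "failed to cancel"
    UNKNOWN_DETACHED = "unknown/detached"


def resolve_stop(backend_status, cancel_acknowledged):
    """Map a backend turn status plus a cancel acknowledgement to one outcome.

    backend_status is None when the backend could not be reached or does not
    know the turn id (assumed convention for this sketch).
    """
    if backend_status is None:
        return StopOutcome.UNKNOWN_DETACHED
    if backend_status == "completed":
        return StopOutcome.ALREADY_COMPLETED
    if backend_status in ("cancelled", "interrupted"):
        return StopOutcome.CANCELLED
    if backend_status == "running":
        # Cancel was accepted but work has not stopped yet, or it was refused.
        if cancel_acknowledged:
            return StopOutcome.STILL_RUNNING_REMOTELY
        return StopOutcome.FAILED_TO_CANCEL
    return StopOutcome.UNKNOWN_DETACHED
```

The point is that the UI never has to fall back to a generic spinner: every Stop click ends in one of these named states.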

Make resume/reconciliation authoritative

  • A terminal backend turn must not rehydrate as markedStreaming=true.
  • If a renderer receives item deltas for an unknown conversation, it should buffer briefly and force a thread/turn re-read instead of dropping state and spamming Item not found in turn state.
  • On resume, reconcile the full item id set for the active/recent turn before applying live deltas.
  • Orphan task_started turns should be repaired as interrupted/failed during resume rather than crashing LocalConversationPage.
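
A minimal sketch of these reconciliation rules follows. The turn dictionary shape, the status strings, and the convention that a missing backend record means an orphan turn are all assumptions made for illustration.

```python
TERMINAL_STATUSES = {"completed", "failed", "interrupted", "cancelled"}


def reconcile_turn(local_turn, backend_status):
    """Return a copy of local_turn whose status follows the backend.

    backend_status is None when the backend has no terminal or live record of
    the turn, which this sketch treats as an orphan task_started turn.
    """
    turn = dict(local_turn)
    turn["status"] = "interrupted" if backend_status is None else backend_status
    # A terminal turn must never come back as streaming.
    turn["streaming"] = turn["status"] not in TERMINAL_STATUSES
    return turn


class DeltaBuffer:
    """Hold deltas for unknown conversations until a re-read registers them."""

    def __init__(self, max_pending=100):
        self.known = set()
        self.pending = {}
        self.max_pending = max_pending

    def on_delta(self, conversation_id, delta):
        """Return "apply" for known conversations, otherwise buffer and ask for a re-read."""
        if conversation_id in self.known:
            return "apply"
        queue = self.pending.setdefault(conversation_id, [])
        if len(queue) < self.max_pending:
            queue.append(delta)
        return "reread"

    def on_reread_complete(self, conversation_id):
        """Mark the conversation known and hand back buffered deltas in order."""
        self.known.add(conversation_id)
        return self.pending.pop(conversation_id, [])
```

The buffer is bounded on purpose, so a conversation that never resolves cannot itself become another source of unbounded state.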

Improve diagnostics

  • Add phase-specific timing/log events for: request accepted, upstream request sent, response headers received, first byte, first Responses event, first assistant/reasoning/tool item, tool result returned, post-tool continuation requested, context compaction started/completed, renderer attached/detached, and Stop/cancel result.
  • Keep performance traces locally if upload fails (see #24262, "Performance trace upload timeout does not retain local trace artifact").
  • Preserve enough local diagnostics to tell whether a stuck turn is model/backend stall, transport retry, context compaction, app-server hydration, renderer detachment, or post-tool continuation loss.
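
The phase-specific timing could be as simple as the sketch below: each turn records named phase marks, and the last mark tells you where a stuck turn is actually waiting. The class name and phase strings are hypothetical; they only mirror the phases listed above.

```python
import time


class TurnPhaseLog:
    """Record named phases for one turn with monotonic timestamps."""

    def __init__(self, turn_id, clock=time.monotonic):
        self.turn_id = turn_id
        self._clock = clock
        self.events = []  # list of (phase, timestamp) in arrival order

    def mark(self, phase):
        """Record that the turn has just entered the given phase."""
        self.events.append((phase, self._clock()))

    def last_phase(self):
        """Phase the turn is currently in, or None if nothing was recorded."""
        return self.events[-1][0] if self.events else None

    def durations(self):
        """Seconds spent between each phase and the one that follows it."""
        out = {}
        for (phase, start), (_, end) in zip(self.events, self.events[1:]):
            out[phase] = out.get(phase, 0.0) + (end - start)
        return out
```

With something like this, a turn stuck in Thinking whose `last_phase()` is `"tool_result_returned"` points at a lost post-tool continuation, while `"upstream_request_sent"` points at a transport or backend stall.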

Related issues

Large history / full hydration:

Hard size limits / image or tool payloads:

Context compaction / context replay / usage amplification:

Turn lifecycle / UI ownership / trace and Stop desync:

Adjacent transport / first-output stall reports that should be distinguished in diagnostics:

Potentially adjacent Desktop lifecycle reports:

Why this should be tracked as a meta issue

Individual reports often look like separate bugs because the immediate symptom differs: slow thread switching, V8 string crash, image-heavy freeze, compaction memory spike, stuck Thinking, invisible active turn, failed Stop, composer timeout, missing tool traces, or CLI/extension first-output stalls.

But the recurring evidence points to one architectural boundary that needs a coordinated fix: Codex should treat durable history, model context, renderer transcript, live turn stream, and recovery state as separate bounded contracts. Until those contracts are bounded and authoritative, fixes to only one surface, such as rendering virtualization or a single timeout tweak, are likely to leave other variants open.

What steps can reproduce the bug?

User-visible failure modes

  • Desktop becomes slow or unresponsive when opening a long thread, switching threads, sending a new prompt in an old thread, or after repeated context compactions.
  • The app can become unrecoverable on launch if the most recent session auto-resumes an oversized rollout.
  • A short follow-up in an old session can freeze the UI even though a fresh session still works.
  • Tool outputs can return successfully, but the assistant continuation never resumes.
  • A prompt can be accepted and backend work can continue, while the Desktop UI stays stuck in Thinking and no progress traces appear.
  • Stop/cancel can become unavailable, misleading, or ineffective because the UI has lost the active turn reference.
  • After restart or renderer reload, prompts/traces/tool calls can be missing, stale, or only partially recovered.
  • The same root state bloat can also drive fast context growth and usage drain through retained tool output, compaction history, and replayed approval/diagnostic transcripts.

What is the expected behavior?

Expected behavior

  • Opening or switching to a thread should load metadata plus a bounded recent tail, not the entire transcript/tool/image history.
  • Sending a short prompt in an old thread should not synchronously parse, render, serialize, or replay hundreds of MB of session data.
  • Context compaction should reduce active prompt pressure and should not retain raw large tool outputs, image data, or repeated transcript-injection blocks.
  • Rollout JSONL should not store raw image bytes or unbounded tool output inline when references, caps, summaries, or external artifacts would work.
  • If one thread is too large or malformed, Desktop should fail that thread safely while leaving the rest of the app usable.
  • If a prompt is accepted, the originating thread should durably show the user prompt, trace/progress stream, and Stop/cancel state.
  • If the renderer loses ownership of an active backend turn, it should show an explicit recovery/reattach state rather than generic Thinking.
  • Terminal backend states should be authoritative: completed, failed, interrupted, or cancelled turns should not rehydrate as streaming.
  • Tool-returned, waiting-for-first-output, reconnecting, context-compacting, renderer-detached, and post-tool-continuation-stalled states should be distinguishable in UI and logs.

Additional information

No response

Labels: app, app-server, bug, context, performance, session, tool-calls