Codex Desktop 26.506 rapidly bloats context from retained tool outputs and can freeze old sessions #22091

@javenfang

What version of the Codex App are you using (From "About Codex" dialog)?

26.506.31421 (bundle version 2620)

What subscription do you have?

Pro

What platform is your computer?

Darwin 25.3.0 arm64 arm

macOS 26.3 (25D125), Apple Silicon

What issue are you seeing?

The Codex Desktop app appears to have a recent regression where old and tool-heavy sessions can quickly become unusable:

  1. Today I hit two UI freezes / unresponsive states in Codex Desktop and had to kill the app process and restart.
  2. After the first kill/restart, I sent only one message in each of two older sessions, and the app became unresponsive again.
  3. In a fresh new session, the UI has not frozen yet, but the context meter still grows very quickly. After only a small number of visible user interactions, the session already reached about half of the model context window.

This does not look like a simple user-configuration issue. The same general configuration had been used for a long time without obvious UI freezes. The current app bundle on this machine was updated/modified locally on 2026-05-09, and the issue became obvious recently.

Local token counters confirm the context meter behavior:

  • Current fresh session id: 019e150c-336e-7220-a03d-bc2e3187603c
  • Current fresh session reached last_input_tokens = 137293
  • Model context window: 258400
  • Context usage: about 53.1%
  • The thread only had a few visible interactions.
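The usage percentage above follows directly from the two counters. A minimal sketch of that arithmetic is shown here; the function name `context_usage_percent` is a hypothetical helper for illustration, not part of Codex.

```python
def context_usage_percent(last_input_tokens: int, context_window: int) -> float:
    """Return the share of the context window consumed, in percent."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return 100.0 * last_input_tokens / context_window


# Values taken from the fresh session reported above.
fresh = context_usage_percent(137293, 258400)
print(f"{fresh:.1f}%")  # prints "53.1%"
```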

An earlier same-topic diagnostic session showed a more severe version:

  • Session id: 019e14f1-6997-7f52-b854-97813060dee7
  • Peak observed last_input_tokens = 234757
  • Model context window: 258400
  • After compaction, it dropped to about 30045, then quickly re-inflated to 171023
  • Total accumulated input tokens reached 5610822 during a short diagnostic thread

In both cases, the direct local evidence points to large tool outputs being retained in the transcript and replayed into later turns. In the current session, the largest retained tool outputs were approximately:

  • 40153 characters
  • 39910 characters
  • 39893 characters
  • 39440 characters
  • 37293 characters

Those outputs were not pasted by the user; they were shell/tool results shown during diagnosis. Once retained, they appear to drive rapid context growth and may also contribute to the Electron renderer becoming unresponsive when older sessions are resumed or displayed.
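To give a sense of scale, the five outputs listed above can be summed and converted to a rough token estimate. This is only a sketch: the ratio of 4 characters per token is an assumed rule of thumb, not the tokenizer Codex actually uses, so the result is an order-of-magnitude figure.

```python
# Assumption: ~4 characters per token is a rough heuristic, not the real tokenizer.
CHARS_PER_TOKEN = 4

# Character counts of the five largest retained tool outputs in the current session.
largest_outputs = [40153, 39910, 39893, 39440, 37293]

total_chars = sum(largest_outputs)
approx_tokens = total_chars // CHARS_PER_TOKEN

print(total_chars)    # 196689
print(approx_tokens)  # 49172
```

Under that assumption, these five outputs alone would cost tens of thousands of tokens each time they are replayed into a later turn.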

What steps can reproduce the bug?

I do not have a minimal public repo yet, but the local reproduction pattern is:

  1. Use Codex Desktop 26.506.31421 on macOS / Apple Silicon.
  2. Open or create a session using gpt-5.5.
  3. Run several diagnostic shell commands that produce moderately large outputs, e.g. log searches, tail, rg, process listings, or JSONL session inspection. Some outputs around 37K-40K characters are enough to make the effect visible.
  4. Continue the conversation with a few short user messages.
  5. Observe that last_input_tokens and the UI context meter rise very quickly, even when the user-visible conversation is short.
  6. In older / already-large sessions, after killing and restarting Codex Desktop, send one message in the old session. The UI may freeze / become unresponsive again.

Observed token progression in a fresh session:

27541 / 258400
42797 / 258400
60516 / 258400
68688 / 258400
106885 / 258400
115264 / 258400
137293 / 258400
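The per-turn growth is easier to see as differences between consecutive readings. The sketch below computes them from the numbers above; the variable names are illustrative only.

```python
# last_input_tokens readings from the fresh session, in order of observation.
readings = [27541, 42797, 60516, 68688, 106885, 115264, 137293]

# Growth between each pair of consecutive readings.
deltas = [later - earlier for earlier, later in zip(readings, readings[1:])]

print(deltas)                      # [15256, 17719, 8172, 38197, 8379, 22029]
print(max(deltas))                 # 38197
print(sum(deltas) // len(deltas))  # 18292
```

The largest single jump (38197 tokens) is far more than a short user message would add by itself, which fits the retained-tool-output explanation.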

Observed token progression in the previous same-topic session:

199962 / 258400
234757 / 258400
compaction or reset to about 30045 / 258400
144346 / 258400
171023 / 258400

What is the expected behavior?

  • Tool output should be aggressively bounded, summarized, or excluded from later prompt replay when it is too large.
  • Context compaction should keep the active prompt budget under control after tool-heavy turns.
  • Opening or sending a short message in an older session should not freeze the Desktop app.
  • The UI context meter should not jump to half or near-full context after only a few short visible user interactions unless the app can clearly explain what hidden retained content is being counted.
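As an illustration of the first expectation, one possible way to bound a tool output before it is stored for replay is to keep its head and tail and drop the middle. This is a sketch under stated assumptions, not how Codex works internally: `bound_tool_output` and the 4000-character default are hypothetical.

```python
def bound_tool_output(text: str, max_chars: int = 4000) -> str:
    """Keep the head and tail of an oversized output and mark the omitted middle.

    Hypothetical helper: the name and the default limit are illustrative only.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text  # small outputs pass through unchanged

    head_len = max_chars // 2
    tail_len = max_chars - head_len
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} characters omitted ...]\n"
    return text[:head_len] + marker + text[-tail_len:]


# Example: an output the size of the largest one reported above.
bounded = bound_tool_output("x" * 40153)
print(len(bounded) < 40153)  # True
```

Keeping both ends preserves the command echo and the final lines (often the error or summary), while the marker makes the omission visible to the model and the user.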

Additional information

Related but not identical issues found before filing:

This report is specifically for Codex Desktop 26.506.31421 on macOS, with two observed UI freezes today and numeric evidence that tool output retention/replay can push a new thread to ~53% of the context window after only a few visible interactions.

No Crashpad dump was found locally, which is consistent with a UI hang / unresponsive renderer rather than a clean crash.

Labels: app, bug, context, performance, tool-calls
