Summary
A Codex Desktop turn using gpt-5.5 with xhigh reasoning remained user-visible as Thinking for more than 30 minutes before the first persisted assistant/reasoning item appeared. Once the first item appeared, the turn continued normally with assistant text and tool calls within seconds.
This looks different from ordinary slow generation: the local rollout has no assistant/reasoning/tool event during the gap, and normal persisted logs did not retain a useful stream/retry diagnostic for the missing interval.
Environment
- Product: Codex Desktop on Windows 11 with WSL2 Ubuntu workspace
- Desktop package observed in logs:
OpenAI.Codex_26.519.5221.0
- Desktop release string observed in logs:
26.519.41501
- WSL app-server binary:
codex-cli 0.130.0-alpha.5
- Model:
gpt-5.5
- Reasoning effort:
xhigh
- Workspace type: WSL project
Primary observed case
All timestamps below are UTC.
- User submitted turn:
2026-05-23T16:09:18.281Z
- First persisted assistant/reasoning item:
2026-05-23T16:39:56.682Z
- Pre-first-output gap: about
30m38s
First persisted output sequence after the stall:
2026-05-23T16:39:56.682Z response_item reasoning
2026-05-23T16:39:57.612Z event_msg agent_message
2026-05-23T16:39:57.612Z response_item message assistant
2026-05-23T16:40:08.580Z response_item function_call
2026-05-23T16:40:08.726Z response_item function_call_output
2026-05-23T16:40:08.727Z event_msg token_count
User-visible behavior: the thread sat on Thinking for the entire gap. When it finally resumed, it did not appear to replay a backlog; it just began producing the first reasoning/message/tool items at normal cadence.
Secondary same-day signal
In another thread on the same Desktop session, the UI visibly showed Reconnecting... 2/5 while still in Thinking on gpt-5.5/xhigh. That shorter case had about a 41.6s gap before first reasoning output, but the local Desktop app-server transport logs did not show a corresponding app-server reconnect/restart.
This may be related to existing reconnect/stream issues, but the important gap here is observability: the visible reconnect state and the long pre-first-output stall are not represented clearly enough in the durable rollout/log artifacts.
Historical local scan
I scanned local rollout JSONL files for the same host, using time from user submission to first assistant/reasoning/tool item. Private transcripts and paths were not included in this report.
High-level results:
rollout files scanned: 305
completed turns with usable timing: 7618
gpt-5.5 / xhigh: n=168, p95=1m11s, max=30m38s, >=120s=3, >=300s=1, >=600s=1, >=1800s=1
gpt-5.5 / high: n=496, max=3m13s, >=120s=2
gpt-5.5 / medium: n=347, max=1m55s, >=120s=0
gpt-5.5 / low: n=120, max=1m07s, >=120s=0
codex-auto-review / low: n=6452, max=0m48s, >=120s=0
This suggests the 30m+ outlier is strongly associated with gpt-5.5 + xhigh in this local sample, though it does not prove the issue is exclusive to xhigh.
Expected behavior
- A turn should either start streaming within the normal startup range, or surface a durable diagnostic/error state if the response stream is idle/retrying for many minutes before first output.
- If the UI shows reconnecting/retry state, the durable logs/rollout should retain enough information to distinguish model queueing, websocket retry, backend stall, app-server transport reconnect, and local UI state races.
Actual behavior
- UI remained on
Thinking for 30m38s before the first persisted output item.
- The turn eventually resumed normally, making it look like the request was alive but silent for the entire interval.
- A shorter same-day case showed visible
Reconnecting... 2/5, but there was no matching Desktop app-server reconnect/restart in local logs.
Related issues
Possibly related, but not exact duplicates:
Summary
A Codex Desktop turn using
gpt-5.5withxhighreasoning remained user-visible asThinkingfor more than 30 minutes before the first persisted assistant/reasoning item appeared. Once the first item appeared, the turn continued normally with assistant text and tool calls within seconds.This looks different from ordinary slow generation: the local rollout has no assistant/reasoning/tool event during the gap, and normal persisted logs did not retain a useful stream/retry diagnostic for the missing interval.
Environment
OpenAI.Codex_26.519.5221.026.519.41501codex-cli 0.130.0-alpha.5gpt-5.5xhighPrimary observed case
All timestamps below are UTC.
2026-05-23T16:09:18.281Z2026-05-23T16:39:56.682Z30m38sFirst persisted output sequence after the stall:
User-visible behavior: the thread sat on
Thinkingfor the entire gap. When it finally resumed, it did not appear to replay a backlog; it just began producing the first reasoning/message/tool items at normal cadence.Secondary same-day signal
In another thread on the same Desktop session, the UI visibly showed
Reconnecting... 2/5while still inThinkingongpt-5.5/xhigh. That shorter case had about a41.6sgap before first reasoning output, but the local Desktop app-server transport logs did not show a corresponding app-server reconnect/restart.This may be related to existing reconnect/stream issues, but the important gap here is observability: the visible reconnect state and the long pre-first-output stall are not represented clearly enough in the durable rollout/log artifacts.
Historical local scan
I scanned local rollout JSONL files for the same host, using time from user submission to first assistant/reasoning/tool item. Private transcripts and paths were not included in this report.
High-level results:
This suggests the 30m+ outlier is strongly associated with
gpt-5.5+xhighin this local sample, though it does not prove the issue is exclusive toxhigh.Expected behavior
Actual behavior
Thinkingfor30m38sbefore the first persisted output item.Reconnecting... 2/5, but there was no matching Desktop app-server reconnect/restart in local logs.Related issues
Possibly related, but not exact duplicates:
Reconnecting...visible while app-server transport appears connectedresponse.completed