VS Code Codex extension repeatedly broadcasts huge thread-stream-state-changed snapshots for completed long threads, causing extension-host/renderer CPU spikes #20781

@duncanmcl

What version of the IDE extension are you using?

26.429.30905

What subscription do you have?

Pro

Which IDE are you using?

VS Code

What platform is your computer?

Linux 6.8.0-110-generic x86_64 x86_64

What issue are you seeing?

The VS Code Codex extension can enter a high-CPU state when a long completed thread is opened or resumed. The extension repeatedly broadcasts thread-stream-state-changed messages with:

change.type = snapshot
params.change.conversationState = full conversation state

For one long local thread, each snapshot frame was about 116 MB. Repeated snapshots drove the VS Code NodeService / extension-host process to 250-400% CPU while idle or while switching/resuming threads.

This does not look like workspace indexing, project scanning, or app-server compute. The hot stack is Node stream/Buffer copying in the VS Code extension host while receiving/sending huge IPC frames.

Representative process sample before mitigation:

PID 3261642 code NodeService: 340-405% CPU
PID 3261826 codex app-server: 0-1% CPU

Representative Codex log line for the long thread:

maybe_resume_success
conversationId=019dd343-efa8-7093-8746-772db1e1ade7
latestTurnStatus=completed
markedStreaming=true
turnCount=1294+

IPC capture showed repeated huge frames:

method: thread-stream-state-changed
change.type: snapshot
frameBytes: about 116,220,000 to 116,550,000
large field: params.change.conversationState
conversationState bytes: about 116 MB

Normal healthy patch updates were tiny:

method: thread-stream-state-changed
change.type: patches
frameBytes: hundreds of bytes to a few KB

Largest observed frame shape, with conversation text omitted/redacted:

{
  "method": "thread-stream-state-changed",
  "changeType": "snapshot",
  "frameBytes": 116249683,
  "paramsBytes": 116249542,
  "changeBytes": 116249409,
  "fieldBytes": {
    "type": 10,
    "conversationState": 116249369
  }
}
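A size summary of this shape can be produced without logging any conversation text. The following is a minimal sketch, assuming each field's size is taken as the UTF-8 byte length of its JSON encoding (which is consistent with `type` measuring 10 bytes for `"snapshot"`). The helper names `byteLengthOfJson` and `summarizeFrame` are hypothetical, and the outer `frameBytes` value is left out because the framing overhead of the IPC transport is not known here.

```javascript
// Hypothetical diagnostic helpers; not part of the extension.
function byteLengthOfJson(value) {
  // JSON.stringify returns undefined for values such as undefined or functions.
  const json = JSON.stringify(value);
  return json === undefined ? 0 : new TextEncoder().encode(json).length;
}

function summarizeFrame(message) {
  const change = message.params?.change ?? {};
  const fieldBytes = {};
  // Measure each field of the change payload separately to find the large one.
  for (const [key, value] of Object.entries(change)) {
    fieldBytes[key] = byteLengthOfJson(value);
  }
  return {
    method: message.method,
    changeType: change.type ?? null,
    paramsBytes: byteLengthOfJson(message.params),
    changeBytes: byteLengthOfJson(change),
    fieldBytes,
  };
}

// Tiny stand-in message; a real snapshot would carry the full conversation.
const sample = {
  method: "thread-stream-state-changed",
  params: {
    conversationId: "example",
    change: { type: "snapshot", conversationState: { turns: ["a", "b"] } },
  },
};
console.log(summarizeFrame(sample));
```

Serializing a 116 MB state just to measure it is expensive in its own right, so a helper like this is only suitable for one-off diagnosis, not for a hot path.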

CPU profile hot stack:

onStreamRead
Readable.push
addChunk
Buffer.concat
_copyActual
createUnsafeBuffer

Native perf sample showed:

node::StreamBase::CallJSOnreadMethod
node::EmitToJSStreamListener::OnStreamRead
node::LibuvStreamWrap::OnUvRead
__memmove_avx_unaligned_erms

strace -f -c against the hot VS Code NodeService showed very little file I/O:

read calls:   222
write calls:  175
dominant time: futex/epoll/memory activity

That argues against filesystem scanning as the cause.

After a local mitigation and VS Code reload, the extension log showed the large snapshot was being skipped:

Skipping large conversation snapshot broadcast
conversationId=019dd343-efa8-7093-8746-772db1e1ade7
turnCount=1307

After that, idle CPU dropped back to near zero. CPU can still rise while the webview actively renders streamed answer text, but the idle full-snapshot storm stopped.

What steps can reproduce the bug?

The most reliable local trigger was:

  1. Have a long local Codex thread with about 1300 turns.
  2. Restart or reload VS Code.
  3. Open/resume the long thread in the Codex VS Code extension.
  4. Watch window1/exthost/openai.chatgpt/Codex.log.
  5. Observe a line like:
     maybe_resume_success ... latestTurnStatus=completed markedStreaming=true turnCount=...
  6. Observe repeated thread-stream-state-changed warnings and high CPU.
  7. Instrument IPC or profile CPU to see huge snapshot frames and Buffer.concat hot stacks.

The relevant installed webview bundle code appears to be this reconnect/status path:

this.ipcBridge.registerBroadcastHandler(`client-status-changed`, e => {
  if (e.params.status === `connected`) {
    for (let e of this.streamingConversations) this.broadcastConversationSnapshot(e);
    return
  }
  ...
})

The resume path marks the conversation as streaming and broadcasts a snapshot even when the latest turn is completed:

e.markConversationStreaming(t),
e.setConversationStreamRole(t,{role:`owner`}),
e.broadcastConversationSnapshot(t);
V.info(`maybe_resume_success`, {
  safe: {
    conversationId:t,
    turnCount:ee.length,
    latestTurnStatus:F?.status??null,
    markedStreaming:true
  }
})

The full snapshot path sends the whole conversation:

broadcastConversationSnapshot(e){
  if(this.getStreamRole(e)?.role!==`owner`)return;
  let t=this.conversations.get(e);
  t && this.dispatchMessageFromView(
    `thread-stream-state-changed`,
    {
      conversationId:e,
      hostId:this.hostId,
      change:{type:`snapshot`,conversationState:t},
      version:ae(`thread-stream-state-changed`)
    }
  )
}

The patch path is already efficient and small:

broadcastIpcStatePatches(e,t){
  t.length!==0 &&
  this.getStreamRole(e)?.role===`owner` &&
  this.dispatchMessageFromView(
    `thread-stream-state-changed`,
    {
      conversationId:e,
      hostId:this.hostId,
      change:{type:`patches`,patches:t},
      version:ae(`thread-stream-state-changed`)
    }
  )
}

Likely root cause:

completed long thread remains in streamingConversations
-> client-status-changed/resume broadcasts full snapshot
-> full conversationState is about 116 MB
-> extension host spends CPU copying/parsing/rendering huge IPC frames
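The chain above can be illustrated with a toy model. This is a sketch under stated assumptions, not the extension's code: `ToyManager` and its methods are invented for illustration and only mirror the behaviour described in this report (resume always marks the thread as streaming, and every `connected` event rebroadcasts a full snapshot for each streaming thread).

```javascript
// Toy model of the suspected chain; names are hypothetical.
class ToyManager {
  constructor() {
    this.conversations = new Map();
    this.streamingConversations = new Set();
    this.sent = []; // records every broadcast for inspection
  }

  // Mirrors the reported resume path: marks streaming regardless of turn status.
  resume(id, conversation) {
    this.conversations.set(id, conversation);
    this.streamingConversations.add(id);
    this.broadcastSnapshot(id);
  }

  // Mirrors the reported reconnect handler: one full snapshot per streaming thread.
  onClientConnected() {
    for (const id of this.streamingConversations) this.broadcastSnapshot(id);
  }

  broadcastSnapshot(id) {
    const state = this.conversations.get(id);
    if (state) this.sent.push({ id, type: "snapshot", turns: state.turns.length });
  }
}

const manager = new ToyManager();
manager.resume("long-thread", {
  latestTurnStatus: "completed",
  turns: new Array(1300).fill({}),
});
manager.onClientConnected();
manager.onClientConnected();
console.log(manager.sent.length); // 3: one on resume, one per reconnect
```

Because nothing ever removes the completed thread from the set, the number of full snapshots grows with the number of `connected` events rather than with actual changes to the conversation.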

What is the expected behavior?

The extension should not repeatedly broadcast the entire conversation state for long completed threads.

Expected behavior:

completed resumed thread should not remain in streamingConversations
reconnect/client-status events should not send full snapshots for huge threads
long threads should use bounded/delta/paginated hydration
idle CPU should return to near zero after resume/render is complete

Suggested upstream fix:

  1. Do not call markConversationStreaming for resumed threads whose latest turn is completed.
  2. Do not include completed long threads in streamingConversations.
  3. On client-status-changed: connected, do not rebroadcast full snapshots for completed/non-active threads.
  4. Replace full-snapshot broadcast for long threads with bounded/paginated hydration or patch-only sync.
  5. Add a defensive max snapshot size before IPC dispatch.
  6. Ensure thread-stream-state-changed warnings do not flood logs when no handler is present.
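Items 4 and 5 could be approached by splitting the state into bounded pages. The sketch below is one possible shape, assuming the conversation state exposes a `turns` array (as the local mitigation later in this report does). The `snapshot-page` change type, the `paginateConversation` function, and the page size are all invented for illustration and are not part of the existing protocol.

```javascript
// Hypothetical page size; a real value would be tuned against measured frame sizes.
const TURNS_PER_PAGE = 50;

// Yields one bounded message per page instead of a single full snapshot.
function* paginateConversation(conversationId, conversation, turnsPerPage = TURNS_PER_PAGE) {
  const turns = conversation.turns ?? [];
  // Always emit at least one page so an empty conversation still hydrates.
  const pageCount = Math.max(1, Math.ceil(turns.length / turnsPerPage));
  for (let page = 0; page < pageCount; page++) {
    yield {
      conversationId,
      change: {
        type: "snapshot-page", // invented change type, for illustration only
        page,
        pageCount,
        turns: turns.slice(page * turnsPerPage, (page + 1) * turnsPerPage),
      },
    };
  }
}

const longConversation = {
  turns: Array.from({ length: 1300 }, (_, index) => ({ index })),
};
const pages = [...paginateConversation("example", longConversation)];
console.log(pages.length); // 26
```

Paging by turn count bounds the number of turns per frame but not the bytes, since a single turn may embed a large payload. A byte-based limit would still be needed as a backstop.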

An upstream guard could be shaped like:

if conversation is completed and not actively generating:
  do not mark streaming
  do not broadcast snapshot on reconnect

if snapshot would exceed a bounded IPC size:
  use paginated state sync or request/response hydration instead of broadcast
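The guard above could be written as a small decision function. This is a sketch, not a proposed patch: `planSync` and `isActivelyGenerating` are hypothetical names, the assumption that each turn carries a `status` field is inferred from the `latestTurnStatus` log value, and the turn-count threshold of 100 is borrowed from the local mitigation as a cheap proxy for snapshot size.

```javascript
// Hypothetical limit, taken from the local mitigation's turn-count guard.
const MAX_SNAPSHOT_TURNS = 100;

function isActivelyGenerating(conversation) {
  const turns = conversation.turns ?? [];
  const latest = turns[turns.length - 1];
  // Assumption: "completed" is the only terminal status this report shows.
  return latest !== undefined && latest.status !== "completed";
}

// Decides whether to mark a resumed thread as streaming and how to sync it.
function planSync(conversation) {
  if (!isActivelyGenerating(conversation)) {
    // Completed thread: do not mark streaming, do not rebroadcast on reconnect.
    return { markStreaming: false, broadcast: "none" };
  }
  if (conversation.turns.length > MAX_SNAPSHOT_TURNS) {
    // Too large for a single frame: fall back to paginated or on-request sync.
    return { markStreaming: true, broadcast: "paginated" };
  }
  return { markStreaming: true, broadcast: "snapshot" };
}

const completedThread = { turns: [{ status: "completed" }] };
console.log(planSync(completedThread)); // { markStreaming: false, broadcast: 'none' }
```

Checking the turn count before any serialization keeps the guard itself cheap, which matters because measuring a huge state by stringifying it would reproduce part of the cost it is meant to avoid.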

Additional information

I applied a local mitigation to confirm the diagnosis. This is not the ideal upstream fix, but it stopped the local CPU storm.

Patched local file:

/home/dun/.vscode/extensions/openai.chatgpt-26.429.30905-linux-x64/webview/assets/app-server-manager-signals-D5l5fvZj.js

Local mitigation:

broadcastConversationSnapshot(e){
  if(this.getStreamRole(e)?.role!==`owner`)return;
  let t=this.conversations.get(e);
  if(!t)return;

  // Cheap guard first. Do not JSON.stringify a huge conversation just to decide
  // whether to skip it.
  let n=t?.turns?.length??0;
  if(n>100){
    V.warning(`Skipping large conversation snapshot broadcast`,{
      safe:{conversationId:e,turnCount:n},
      sensitive:{}
    });
    return
  }

  // Backstop for small/medium threads with unexpectedly huge embedded state.
  let r=0;
  try{r=JSON.stringify(t).length}catch{}
  if(r>5e6){
    V.warning(`Skipping oversized conversation snapshot broadcast`,{
      safe:{conversationId:e,snapshotBytes:r,turnCount:n},
      sensitive:{}
    });
    return
  }

  this.dispatchMessageFromView(`thread-stream-state-changed`,{
    conversationId:e,
    hostId:this.hostId,
    change:{type:`snapshot`,conversationState:t},
    version:ae(`thread-stream-state-changed`)
  })
}

Important detail: warning suppression alone is not enough. Some existing workarounds suppress:

[IpcClient] Received broadcast but no handler is configured method=thread-stream-state-changed

That reduces log spam, but it does not stop the underlying huge conversationState broadcast. In this case the dominant cost was the 116 MB IPC frame and Buffer/JSON processing, not just the warning log line.

Labels: app-server, bug, extension, performance
