
Add throttling for tool stream invocations #308607

Merged
pwang347 merged 5 commits into main from pawang/throttlePartialJson
Apr 9, 2026
Conversation


@pwang347 pwang347 commented Apr 8, 2026

Throttle updateToolInvocation during tool-call streaming

Problem

During chat streaming with tool calls, the extension host experienced 1.3–1.5s MajorGC pauses that froze token delivery, causing visible hangs in the chat UI.

Root cause

In PseudoStopStartResponseProcessor.applyDeltaToProgress, every incoming SSE token triggered:

  1. tryParsePartialToolInput(update.arguments) — a recursive-descent partial JSON parse (best-effort-json-parser) of the entire accumulated arguments string from scratch
  2. progress.updateToolInvocation() — an RPC call that serializes the parsed object via JSON.stringify, sends it over postMessage to the main thread, which then issues a round-trip RPC back to the ext host via $handleToolStream

For a tool call with N argument tokens, this produced:

  • O(N²) total parsing work (re-parsing from position 0 on every token)
  • ~2N RPC round-trips (ext host → renderer → ext host per token)
  • Heap growth from ~700MB to ~920MB, triggering V8 MajorGC
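The quadratic blow-up can be sketched with a simple cost model (illustrative, not from the PR: assume parsing an accumulated argument string of length L costs ~L units of work). Re-parsing on every token costs ~N²/2 in total, while parsing only every k-th token (plus a final parse) costs ~N²/2k:

```typescript
// Hypothetical cost model for re-parsing the accumulated arguments string
// from position 0. `parseEvery = 1` models the pre-fix per-token behavior;
// larger values model a throttle that skips intermediate parses.
function totalParseCost(tokens: number, parseEvery: number): number {
    let cost = 0;
    for (let t = 1; t <= tokens; t++) {
        // Parsing after token t re-scans all t accumulated tokens.
        if (t % parseEvery === 0 || t === tokens) {
            cost += t;
        }
    }
    return cost;
}

const perToken = totalParseCost(1000, 1);   // 500,500 units (~N²/2)
const throttled = totalParseCost(1000, 10); // 50,500 units (~N²/20)
```

For 1,000 tokens, per-token parsing does 500,500 units of work versus 50,500 when parsing every 10th token — a 10× reduction, analogous to what the time-based throttle below achieves.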

Before-fix traces

The issue was reproduced across three independent traces, all showing the same pattern: ext host MajorGC pauses during tool-call streaming causing message delivery gaps.

Trace 1 — single hang near end of response

  • Ext host (pid=10273): 1,109ms MajorGC followed by 1,732ms RunTask (5,444 operator() calls + 13 MinorGCs)
  • Heap: 97MB → 303MB during chat session
  • Resulted in a 1,385ms message gap at the renderer (last message at 1255.6s, next at 1257.0s)
  • User-visible: ~1.4s freeze near "What's included: Overview" before remaining text appeared

Trace 2 — multiple hangs throughout streaming

  • Ext host (pid=19772): 3 MajorGCs during ~40s of active streaming:
    • 1,242ms at chat+1s → 1,355ms message gap
    • 1,342ms at chat+18s → 4,091ms message gap (compounded with renderer GC)
    • 1,529ms at chat+36s → 1,979ms message gap
  • GC durations increased over time (1,242 → 1,342 → 1,529ms) as the heap grew

Trace 3 — with CPU profiling enabled

  • Ext host (pid=27683): 3 MajorGCs totaling 4,086ms (1,378ms + 1,374ms + 1,333ms)
  • Heap: 759MB → 920MB
  • CPU profiler confirmed the allocation sources during GC windows:
| Function | CPU samples | % of non-idle |
| --- | --- | --- |
| (garbage collector) | 27,738 | 18.1% |
| parseString / parseObject (partial JSON parser) | 1,177 | 0.8% |
| parse9 / tryParsePartialToolInput / applyDeltaToProgress | 987 | 0.6% |
| stringify (RPC serialization) | 572 | 0.4% |
| deserializeRequestJSONArgs (RPC deserialization) | 456 | 0.3% |
| postMessage (IPC) | 722 | 0.5% |

Summary of before-fix traces

| Metric | Trace 1 | Trace 2 | Trace 3 |
| --- | --- | --- | --- |
| Ext host MajorGC count | 1 | 3 | 3 |
| Ext host MajorGC total time | 1,109ms | 4,113ms | 4,086ms |
| Ext host max heap | 303MB | — | 920MB |
| Longest message gap | 1,385ms | 4,091ms | — |

Fix

Added a 100ms throttle to updateToolInvocation calls in pseudoStartStopConversationCallback.ts. When tool stream updates for the same tool call arrive more often than once per 100ms, only the latest update is buffered; any pending update is flushed when the response stream ends.

```ts
// Before: called on every token
progress.updateToolInvocation(update.id ?? '', {
    partialInput: tryParsePartialToolInput(update.arguments)
});

// After: throttled to at most once per 100ms per tool call
const now = Date.now();
const lastUpdate = this._lastToolStreamUpdate.get(toolId) ?? 0;
if (now - lastUpdate >= PseudoStopStartResponseProcessor._toolStreamThrottleMs) {
    this._lastToolStreamUpdate.set(toolId, now);
    this._pendingToolStreamUpdates.delete(toolId);
    progress.updateToolInvocation(toolId, { partialInput: tryParsePartialToolInput(update.arguments) });
} else {
    this._pendingToolStreamUpdates.set(toolId, { id: toolId, arguments: update.arguments });
}
```

The partial input is only used for streaming UI messages (e.g., "Creating file.ts (42 lines)") — it does not need per-token precision.
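The buffer-and-flush pattern described above can be distilled into a standalone sketch (the class name, fake-clock parameter, and API here are illustrative, not the actual PR code):

```typescript
// Minimal keyed throttler: leading-edge emit, latest-value buffering,
// and an explicit flush for when the stream ends.
class KeyedThrottler<T> {
    private last = new Map<string, number>();
    private pending = new Map<string, T>();

    constructor(
        private readonly intervalMs: number,
        private readonly emit: (key: string, value: T) => void,
        private readonly now: () => number = Date.now,
    ) {}

    push(key: string, value: T): void {
        const t = this.now();
        if (t - (this.last.get(key) ?? 0) >= this.intervalMs) {
            this.last.set(key, t);
            this.pending.delete(key);
            this.emit(key, value);        // emit immediately on the leading edge
        } else {
            this.pending.set(key, value); // keep only the latest buffered update
        }
    }

    flush(): void {                       // call when the response stream ends
        for (const [key, value] of this.pending) {
            this.emit(key, value);
        }
        this.pending.clear();
    }
}
```

push emits on the leading edge once the interval has elapsed and otherwise overwrites the buffered value, so a burst of token updates collapses to at most one updateToolInvocation RPC per interval plus one final flush.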

Results

Validated across two independent post-fix traces (traces 4 and 5).

| Metric | Trace 1 (before) | Trace 2 (before) | Trace 3 (before) | Trace 4 (after) | Trace 5 (after) |
| --- | --- | --- | --- | --- | --- |
| Ext host MajorGC count | 1 | 3 | 3 | 0 | 0 |
| Ext host MajorGC total time | 1,109ms | 4,113ms | 4,086ms | 0ms | 0ms |
| Ext host max heap | 303MB | — | 920MB | — | — |
| Longest message gap | 1,385ms | 4,091ms | — | 448ms* | — |
| parseString CPU samples | — | — | 1,177 | 16 | 0 |
| tryParsePartialToolInput CPU samples | — | — | 987 | gone | gone |
| stringify (RPC) CPU samples | — | — | 572 | — | — |

* Trace 4's remaining 448ms gap was caused by the main renderer's own GC (pid=19660), not the ext host.

Files changed

  • extensions/copilot/src/extension/prompt/node/pseudoStartStopConversationCallback.ts

Perfetto query:

inclusive_samples: The number of CPU-profiler samples where that function was anywhere on the call stack — either it was the function being directly executed, or it was a caller higher up in the chain. For example, tryParsePartialToolInput has 27 inclusive samples, meaning the CPU was sampled 27 times while executing either inside tryParsePartialToolInput itself or inside any function it called (parse9, parseObject, parseString, etc.).

pct_of_total: That inclusive count divided by the total number of CPU samples collected for that process (251,649 in trace 4, 406,012 in trace 3). It answers the question: "what fraction of the ext host's CPU time was spent in or under this function?" This normalizes for trace duration — a longer trace collects more samples proportionally, so the percentage is comparable across traces of different lengths.

So tryParsePartialToolInput at 0.011% in trace 4 means the ext host spent 0.011% of its total CPU time in the partial JSON parsing path, versus 0.552% in trace 3 before the fix.
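As a sanity check on the quoted figures, the percentage is just inclusive samples over total samples, rounded the same way as the SQL's round(..., 3) (the helper name below is illustrative):

```typescript
// Mirrors round(100.0 * samples / total, 3) from the Perfetto query.
const pct = (samples: number, total: number): number =>
    Math.round((100 * samples / total) * 1000) / 1000;

// tryParsePartialToolInput in trace 4: 27 of 251,649 samples.
const trace4 = pct(27, 251_649); // 0.011
```

pct(27, 251649) yields 0.011, matching the 0.011% quoted above.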

```sql
-- Trace 3 (pid 27683): inclusive sample counts for the hot functions
WITH RECURSIVE ancestors AS (
  SELECT cpss.id AS sample_id, cpss.callsite_id, spc.parent_id, spf.name
  FROM cpu_profile_stack_sample cpss
  JOIN stack_profile_callsite spc ON cpss.callsite_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 27683
  UNION ALL
  SELECT a.sample_id, spc.id, spc.parent_id, spf.name
  FROM ancestors a
  JOIN stack_profile_callsite spc ON a.parent_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
),
total AS (
  SELECT count(*) AS cnt
  FROM cpu_profile_stack_sample cpss
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 27683
)
SELECT a.name,
       count(DISTINCT a.sample_id) AS inclusive_samples,
       round(100.0 * count(DISTINCT a.sample_id) / total.cnt, 3) AS pct_of_total
FROM ancestors a, total
WHERE a.name IN ('tryParsePartialToolInput', 'applyDeltaToProgress', 'parseString',
                 'parse9', 'parseObject', '(garbage collector)')
GROUP BY a.name
ORDER BY inclusive_samples DESC
```


vs.

```sql
-- Trace 4 (pid 30297): same query against the after-fix trace
WITH RECURSIVE ancestors AS (
  SELECT cpss.id AS sample_id, cpss.callsite_id, spc.parent_id, spf.name
  FROM cpu_profile_stack_sample cpss
  JOIN stack_profile_callsite spc ON cpss.callsite_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 30297
  UNION ALL
  SELECT a.sample_id, spc.id, spc.parent_id, spf.name
  FROM ancestors a
  JOIN stack_profile_callsite spc ON a.parent_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
),
total AS (
  SELECT count(*) AS cnt
  FROM cpu_profile_stack_sample cpss
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 30297
)
SELECT a.name,
       count(DISTINCT a.sample_id) AS inclusive_samples,
       round(100.0 * count(DISTINCT a.sample_id) / total.cnt, 3) AS pct_of_total
FROM ancestors a, total
WHERE a.name IN ('tryParsePartialToolInput', 'applyDeltaToProgress', 'parseString',
                 'parse9', 'parseObject', '(garbage collector)')
GROUP BY a.name
ORDER BY inclusive_samples DESC
```


Copilot AI review requested due to automatic review settings April 8, 2026 20:48
@pwang347 pwang347 marked this pull request as ready for review April 8, 2026 20:50

github-actions bot commented Apr 8, 2026

Screenshot Changes

Base: 956e12ad Current: f7f81581

Changed (1)

editor/inlineChatAffordance/InlineChatOverlay/Light


Copilot AI left a comment


Pull request overview

This PR reduces extension-host GC pauses and UI hangs during chat streaming with tool calls by throttling updateToolInvocation updates, which currently trigger repeated partial-JSON parsing and cross-thread RPC overhead.

Changes:

  • Add per-tool-call throttling (100ms) for progress.updateToolInvocation during copilotToolCallStreamUpdates.
  • Buffer the latest tool stream update within the throttle window and flush buffered updates when the response stream completes.
  • Track last-update timestamps and pending updates per tool call ID.
Summary per file:

| File | Description |
| --- | --- |
| extensions/copilot/src/extension/prompt/node/pseudoStartStopConversationCallback.ts | Implements throttling + buffering/flush for tool invocation streaming updates to reduce parse/RPC frequency and GC pressure. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 2

@pwang347 pwang347 merged commit 27ff22c into main Apr 9, 2026
23 checks passed
@pwang347 pwang347 deleted the pawang/throttlePartialJson branch April 9, 2026 14:40
@vs-code-engineering vs-code-engineering bot added this to the 1.116.0 milestone Apr 9, 2026
