
Add throttling for tool stream invocations #308607

Merged
pwang347 merged 5 commits into main from pawang/throttlePartialJson
Apr 9, 2026
Conversation


@pwang347 pwang347 commented Apr 8, 2026

Throttle updateToolInvocation during tool-call streaming

Problem

During chat streaming with tool calls, the extension host experienced 1.3–1.5s MajorGC pauses that froze token delivery, causing visible hangs in the chat UI.

Root cause

In PseudoStopStartResponseProcessor.applyDeltaToProgress, every incoming SSE token triggered:

  1. tryParsePartialToolInput(update.arguments) — a recursive-descent partial JSON parse (best-effort-json-parser) of the entire accumulated arguments string from scratch
  2. progress.updateToolInvocation() — an RPC call that serializes the parsed object via JSON.stringify, sends it over postMessage to the main thread, which then issues a round-trip RPC back to the ext host via $handleToolStream

For a tool call with N argument tokens, this produced:

  • O(N²) total parsing work (re-parsing from position 0 on every token)
  • ~2N RPC round-trips (ext host → renderer → ext host per token)
  • Heap growth from ~700MB to ~920MB, triggering V8 MajorGC
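The quadratic blow-up can be sketched with a simple cost model (illustrative, not from the PR: assume parsing an accumulated argument string of length L costs ~L units of work). Re-parsing on every token costs ~N²/2 in total, while parsing only every k-th token (plus a final parse) costs ~N²/2k:

```typescript
// Hypothetical cost model for re-parsing the accumulated arguments string
// from position 0. `parseEvery = 1` models the pre-fix per-token behavior;
// larger values model a throttle that skips intermediate parses.
function totalParseCost(tokens: number, parseEvery: number): number {
    let cost = 0;
    for (let t = 1; t <= tokens; t++) {
        // Parsing after token t re-scans all t accumulated tokens.
        if (t % parseEvery === 0 || t === tokens) {
            cost += t;
        }
    }
    return cost;
}

const perToken = totalParseCost(1000, 1);   // 500,500 units (~N²/2)
const throttled = totalParseCost(1000, 10); // 50,500 units (~N²/20)
```

For 1,000 tokens, per-token parsing does 500,500 units of work versus 50,500 when parsing every 10th token — a 10× reduction, analogous to what the time-based throttle below achieves.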

Before-fix traces

The issue was reproduced across three independent traces, all showing the same pattern: ext host MajorGC pauses during tool-call streaming causing message delivery gaps.

Trace 1 — single hang near end of response

  • Ext host (pid=10273): 1,109ms MajorGC followed by 1,732ms RunTask (5,444 operator() calls + 13 MinorGCs)
  • Heap: 97MB → 303MB during chat session
  • Resulted in a 1,385ms message gap at the renderer (last message at 1255.6s, next at 1257.0s)
  • User-visible: ~1.4s freeze near "What's included: Overview" before remaining text appeared

Trace 2 — multiple hangs throughout streaming

  • Ext host (pid=19772): 3 MajorGCs during ~40s of active streaming:
    • 1,242ms at chat+1s → 1,355ms message gap
    • 1,342ms at chat+18s → 4,091ms message gap (compounded with renderer GC)
    • 1,529ms at chat+36s → 1,979ms message gap
  • GC durations increased over time (1,242 → 1,342 → 1,529ms) as the heap grew

Trace 3 — with CPU profiling enabled

  • Ext host (pid=27683): 3 MajorGCs totaling 4,086ms (1,378ms + 1,374ms + 1,333ms)
  • Heap: 759MB → 920MB
  • CPU profiler confirmed the allocation sources during GC windows:
| Function | CPU samples | % of non-idle |
| --- | --- | --- |
| (garbage collector) | 27,738 | 18.1% |
| parseString / parseObject (partial JSON parser) | 1,177 | 0.8% |
| parse9 / tryParsePartialToolInput / applyDeltaToProgress | 987 | 0.6% |
| stringify (RPC serialization) | 572 | 0.4% |
| deserializeRequestJSONArgs (RPC deserialization) | 456 | 0.3% |
| postMessage (IPC) | 722 | 0.5% |

Summary of before-fix traces

| Metric | Trace 1 | Trace 2 | Trace 3 |
| --- | --- | --- | --- |
| Ext host MajorGC count | 1 | 3 | 3 |
| Ext host MajorGC total time | 1,109ms | 4,113ms | 4,086ms |
| Ext host max heap | 303MB | — | 920MB |
| Longest message gap | 1,385ms | 4,091ms | — |

Fix

Added a 100ms throttle to updateToolInvocation calls in pseudoStartStopConversationCallback.ts. When tool stream updates for the same tool call arrive more often than once per 100ms, only the latest update is buffered; any pending update is flushed when the response stream ends.

```ts
// Before: called on every token
progress.updateToolInvocation(update.id ?? '', {
    partialInput: tryParsePartialToolInput(update.arguments)
});

// After: throttled to at most once per 100ms per tool call
const now = Date.now();
const lastUpdate = this._lastToolStreamUpdate.get(toolId) ?? 0;
if (now - lastUpdate >= PseudoStopStartResponseProcessor._toolStreamThrottleMs) {
    this._lastToolStreamUpdate.set(toolId, now);
    this._pendingToolStreamUpdates.delete(toolId);
    progress.updateToolInvocation(toolId, { partialInput: tryParsePartialToolInput(update.arguments) });
} else {
    this._pendingToolStreamUpdates.set(toolId, { id: toolId, arguments: update.arguments });
}
```

The partial input is only used for streaming UI messages (e.g., "Creating file.ts (42 lines)") — it does not need per-token precision.
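The buffer-and-flush pattern described above can be distilled into a standalone sketch (the class name, fake-clock parameter, and API here are illustrative, not the actual PR code):

```typescript
// Minimal keyed throttler: leading-edge emit, latest-value buffering,
// and an explicit flush for when the stream ends.
class KeyedThrottler<T> {
    private last = new Map<string, number>();
    private pending = new Map<string, T>();

    constructor(
        private readonly intervalMs: number,
        private readonly emit: (key: string, value: T) => void,
        private readonly now: () => number = Date.now,
    ) {}

    push(key: string, value: T): void {
        const t = this.now();
        if (t - (this.last.get(key) ?? 0) >= this.intervalMs) {
            this.last.set(key, t);
            this.pending.delete(key);
            this.emit(key, value);        // emit immediately on the leading edge
        } else {
            this.pending.set(key, value); // keep only the latest buffered update
        }
    }

    flush(): void {                       // call when the response stream ends
        for (const [key, value] of this.pending) {
            this.emit(key, value);
        }
        this.pending.clear();
    }
}
```

push emits on the leading edge once the interval has elapsed and otherwise overwrites the buffered value, so a burst of token updates collapses to at most one updateToolInvocation RPC per interval plus one final flush.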

Results

Validated across two independent post-fix traces (traces 4 and 5).

| Metric | Trace 1 (before) | Trace 2 (before) | Trace 3 (before) | Trace 4 (after) | Trace 5 (after) |
| --- | --- | --- | --- | --- | --- |
| Ext host MajorGC count | 1 | 3 | 3 | 0 | 0 |
| Ext host MajorGC total time | 1,109ms | 4,113ms | 4,086ms | 0ms | 0ms |
| Ext host max heap | 303MB | — | 920MB | — | — |
| Longest message gap | 1,385ms | 4,091ms | — | 448ms* | — |
| parseString CPU samples | — | — | 1,177 | 16 | 0 |
| tryParsePartialToolInput CPU samples | — | — | 987 | gone | gone |
| stringify (RPC) CPU samples | — | — | 572 | — | — |

* Trace 4's remaining 448ms gap was caused by the main renderer's own GC (pid=19660), not the ext host.

Files changed

  • extensions/copilot/src/extension/prompt/node/pseudoStartStopConversationCallback.ts

Perfetto query:

inclusive_samples: The number of CPU-profiler samples where that function was anywhere on the call stack — either it was the function being directly executed, or it was a caller higher up in the chain. For example, tryParsePartialToolInput has 27 inclusive samples, meaning the CPU was sampled 27 times while executing either inside tryParsePartialToolInput itself or inside any function it called (parse9, parseObject, parseString, etc.).

pct_of_total: That inclusive count divided by the total number of CPU samples collected for that process (251,649 in trace 4, 406,012 in trace 3). It answers the question: "what fraction of the ext host's CPU time was spent in or under this function?" This normalizes for trace duration — a longer trace collects more samples proportionally, so the percentage is comparable across traces of different lengths.

So tryParsePartialToolInput at 0.011% in trace 4 means the ext host spent 0.011% of its total CPU time in the partial JSON parsing path, versus 0.552% in trace 3 before the fix.
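As a sanity check on the quoted figures, the percentage is just inclusive samples over total samples, rounded the same way as the SQL's round(..., 3) (the helper name below is illustrative):

```typescript
// Mirrors round(100.0 * samples / total, 3) from the Perfetto query.
const pct = (samples: number, total: number): number =>
    Math.round((100 * samples / total) * 1000) / 1000;

// tryParsePartialToolInput in trace 4: 27 of 251,649 samples.
const trace4 = pct(27, 251_649); // 0.011
```

pct(27, 251649) yields 0.011, matching the 0.011% quoted above.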

```sql
-- Trace 3 (pid 27683): inclusive sample counts for the hot functions
WITH RECURSIVE ancestors AS (
  SELECT cpss.id AS sample_id, cpss.callsite_id, spc.parent_id, spf.name
  FROM cpu_profile_stack_sample cpss
  JOIN stack_profile_callsite spc ON cpss.callsite_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 27683
  UNION ALL
  SELECT a.sample_id, spc.id, spc.parent_id, spf.name
  FROM ancestors a
  JOIN stack_profile_callsite spc ON a.parent_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
),
total AS (
  SELECT count(*) AS cnt
  FROM cpu_profile_stack_sample cpss
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 27683
)
SELECT a.name,
       count(DISTINCT a.sample_id) AS inclusive_samples,
       round(100.0 * count(DISTINCT a.sample_id) / total.cnt, 3) AS pct_of_total
FROM ancestors a, total
WHERE a.name IN ('tryParsePartialToolInput', 'applyDeltaToProgress', 'parseString',
                 'parse9', 'parseObject', '(garbage collector)')
GROUP BY a.name
ORDER BY inclusive_samples DESC
```


vs.

```sql
-- Trace 4 (pid 30297): same query against the after-fix trace
WITH RECURSIVE ancestors AS (
  SELECT cpss.id AS sample_id, cpss.callsite_id, spc.parent_id, spf.name
  FROM cpu_profile_stack_sample cpss
  JOIN stack_profile_callsite spc ON cpss.callsite_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 30297
  UNION ALL
  SELECT a.sample_id, spc.id, spc.parent_id, spf.name
  FROM ancestors a
  JOIN stack_profile_callsite spc ON a.parent_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
),
total AS (
  SELECT count(*) AS cnt
  FROM cpu_profile_stack_sample cpss
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 30297
)
SELECT a.name,
       count(DISTINCT a.sample_id) AS inclusive_samples,
       round(100.0 * count(DISTINCT a.sample_id) / total.cnt, 3) AS pct_of_total
FROM ancestors a, total
WHERE a.name IN ('tryParsePartialToolInput', 'applyDeltaToProgress', 'parseString',
                 'parse9', 'parseObject', '(garbage collector)')
GROUP BY a.name
ORDER BY inclusive_samples DESC
```


Copilot AI review requested due to automatic review settings April 8, 2026 20:48
@pwang347 pwang347 marked this pull request as ready for review April 8, 2026 20:50

github-actions bot commented Apr 8, 2026

Screenshot Changes

Base: 956e12ad Current: f7f81581

Changed (1)

editor/inlineChatAffordance/InlineChatOverlay/Light


Copilot AI left a comment


Pull request overview

This PR reduces extension-host GC pauses and UI hangs during chat streaming with tool calls by throttling updateToolInvocation updates, which currently trigger repeated partial-JSON parsing and cross-thread RPC overhead.

Changes:

  • Add per-tool-call throttling (100ms) for progress.updateToolInvocation during copilotToolCallStreamUpdates.
  • Buffer the latest tool stream update within the throttle window and flush buffered updates when the response stream completes.
  • Track last-update timestamps and pending updates per tool call ID.
Summary per file:

| File | Description |
| --- | --- |
| extensions/copilot/src/extension/prompt/node/pseudoStartStopConversationCallback.ts | Implements throttling + buffering/flush for tool invocation streaming updates to reduce parse/RPC frequency and GC pressure. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 2

@pwang347 pwang347 merged commit 27ff22c into main Apr 9, 2026
23 checks passed
@pwang347 pwang347 deleted the pawang/throttlePartialJson branch April 9, 2026 14:40
@vs-code-engineering vs-code-engineering bot added this to the 1.116.0 milestone Apr 9, 2026
