Add throttling for tool stream invocations #308607
Merged
Pull request overview
This PR reduces extension-host GC pauses and UI hangs during chat streaming with tool calls by throttling `updateToolInvocation` updates, which currently trigger repeated partial-JSON parsing and cross-thread RPC overhead.
Changes:
- Add per-tool-call throttling (100ms) for `progress.updateToolInvocation` during `copilotToolCallStreamUpdates`.
- Buffer the latest tool stream update within the throttle window and flush buffered updates when the response stream completes.
- Track last-update timestamps and pending updates per tool call ID.
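The changes above can be sketched as a small per-tool-call throttler. This is an illustrative reconstruction, not the actual implementation in `pseudoStartStopConversationCallback.ts`; the class and method names here (`ToolStreamThrottler`, `emit`, the injectable clock) are hypothetical:

```typescript
type ToolUpdate = { toolCallId: string; arguments: string };

class ToolStreamThrottler {
	private readonly lastUpdateTime = new Map<string, number>();
	private readonly pendingUpdates = new Map<string, ToolUpdate>();

	constructor(
		private readonly emit: (update: ToolUpdate) => void,
		private readonly intervalMs = 100,
		private readonly now: () => number = Date.now,
	) {}

	/** Forward the update immediately if the window has elapsed; otherwise buffer it. */
	update(update: ToolUpdate): void {
		const last = this.lastUpdateTime.get(update.toolCallId);
		const t = this.now();
		if (last === undefined || t - last >= this.intervalMs) {
			this.lastUpdateTime.set(update.toolCallId, t);
			this.pendingUpdates.delete(update.toolCallId);
			this.emit(update);
		} else {
			// Keep only the latest update; earlier buffered ones are superseded.
			this.pendingUpdates.set(update.toolCallId, update);
		}
	}

	/** Flush any buffered updates when the response stream completes. */
	flush(): void {
		for (const update of this.pendingUpdates.values()) {
			this.emit(update);
		}
		this.pendingUpdates.clear();
	}
}
```

Because only the latest buffered update survives the window, intermediate partial-argument strings are never parsed or sent over RPC.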
| File | Description |
|---|---|
| extensions/copilot/src/extension/prompt/node/pseudoStartStopConversationCallback.ts | Implements throttling + buffering/flush for tool invocation streaming updates to reduce parse/RPC frequency and GC pressure. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 2
bryanchen-d approved these changes on Apr 9, 2026
lramos15 approved these changes on Apr 9, 2026
joshspicer pushed a commit that referenced this pull request on Apr 9, 2026
Throttle `updateToolInvocation` during tool-call streaming

Problem
During chat streaming with tool calls, the extension host experienced 1.3–1.5s MajorGC pauses that froze token delivery, causing visible hangs in the chat UI.
Root cause
In `PseudoStopStartResponseProcessor.applyDeltaToProgress`, every incoming SSE token triggered:

- `tryParsePartialToolInput(update.arguments)`: a recursive-descent partial JSON parse (best-effort-json-parser) of the entire accumulated arguments string, from scratch
- `progress.updateToolInvocation()`: an RPC call that serializes the parsed object via `JSON.stringify`, sends it over `postMessage` to the main thread, which then issues a round-trip RPC back to the ext host via `$handleToolStream`

For a tool call with N argument tokens, this produced O(N²) total parse work plus O(N) cross-thread RPC round-trips.
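The quadratic cost follows directly from re-parsing the accumulated string after every token. A tiny illustration (not the actual parser) that just counts characters parsed:

```typescript
// Counts how many characters end up being parsed in total when the
// accumulated arguments string is re-parsed from scratch after every token.
function totalCharsParsed(tokenLengths: number[]): number {
	let accumulated = 0;
	let total = 0;
	for (const len of tokenLengths) {
		accumulated += len;   // the arguments string grows by one token
		total += accumulated; // the whole accumulated string is parsed again
	}
	return total;
}
```

For 1,000 tokens of 5 characters each, only 5,000 characters of input arrive, but 2,502,500 characters are parsed in total, and every parse allocates a fresh object tree that immediately becomes garbage.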
Before-fix traces

The issue was reproduced across three independent traces, all showing the same pattern: ext host MajorGC pauses during tool-call streaming causing message delivery gaps.

- Trace 1 — single hang near end of response (`operator()` calls + 13 MinorGCs)
- Trace 2 — multiple hangs throughout streaming
- Trace 3 — with CPU profiling enabled; hot frames included `(garbage collector)`, `parseString` → `parseObject` (partial JSON parser), `parse9` → `tryParsePartialToolInput` → `applyDeltaToProgress`, `stringify` (RPC serialization), `deserializeRequestJSONArgs` (RPC deserialization), and `postMessage` (IPC)

Summary of before-fix traces
Fix
Added a 100ms throttle to `updateToolInvocation` calls in `pseudoStartStopConversationCallback.ts`. When tool stream updates arrive faster than every 100ms for the same tool call, the latest update is buffered. Pending updates are flushed when the response stream ends.

The partial input is only used for streaming UI messages (e.g., "Creating file.ts (42 lines)"); it does not need per-token precision.
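To see why per-token precision is unnecessary, consider how a message like that is derived from whatever partial input has been parsed so far. This is a hypothetical sketch; the field names (`filePath`, `code`) are illustrative, not the real tool schema:

```typescript
// Build a streaming UI message from partially parsed tool input.
function streamingMessage(partial: { filePath?: string; code?: string }): string {
	if (!partial.filePath) {
		// Nothing useful parsed yet.
		return "Creating file...";
	}
	const lineCount = partial.code ? partial.code.split("\n").length : 0;
	return `Creating ${partial.filePath} (${lineCount} lines)`;
}
```

The displayed line count only advances as whole lines arrive, so a value that lags by up to 100ms is indistinguishable to the user.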
Results
Validated across two independent post-fix traces (traces 4 and 5).

[Table: per-trace CPU samples for `parseString`, `tryParsePartialToolInput`, and `stringify` (RPC); values not recovered]

* Trace 4's remaining 448ms gap was caused by the main renderer's own GC (pid=19660), not the ext host.
Files changed

`extensions/copilot/src/extension/prompt/node/pseudoStartStopConversationCallback.ts`

Perfetto query:
`inclusive_samples`: the number of CPU profiler samples where that function was anywhere on the call stack, either as the function being directly executed or as a caller higher up in the chain. For example, `tryParsePartialToolInput` has 27 inclusive samples, meaning that 27 times the CPU was sampled while executing either inside `tryParsePartialToolInput` itself or inside any function it called (`parse9`, `parseObject`, `parseString`, etc.).

`pct_of_total`: that inclusive count divided by the total number of CPU samples collected for that process (251,649 in trace 4; 406,012 in trace 3). It answers the question: "What fraction of the ext host's CPU time was spent in or under this function?" This normalizes for trace duration: a longer trace collects proportionally more samples, so the percentage is comparable across traces of different lengths.

So `tryParsePartialToolInput` at 0.011% in trace 4 means the ext host spent 0.011% of its total CPU time in the partial JSON parsing path, versus 0.552% in trace 3 before the fix.
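As a sanity check on these numbers, the percentage is straightforward arithmetic, mirroring the `round(..., 3)` in the Perfetto query:

```typescript
// pct_of_total = 100 * inclusive_samples / total_samples, rounded to 3 decimals.
function pctOfTotal(inclusiveSamples: number, totalSamples: number): number {
	return Math.round((100 * inclusiveSamples * 1000) / totalSamples) / 1000;
}
```

Trace 4's 27 inclusive samples out of 251,649 total yield 0.011%, matching the figure above.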
Trace 3 (before fix, pid 27683):

```sql
WITH RECURSIVE ancestors AS (
  SELECT cpss.id AS sample_id, cpss.callsite_id, spc.parent_id, spf.name
  FROM cpu_profile_stack_sample cpss
  JOIN stack_profile_callsite spc ON cpss.callsite_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 27683
  UNION ALL
  SELECT a.sample_id, spc.id, spc.parent_id, spf.name
  FROM ancestors a
  JOIN stack_profile_callsite spc ON a.parent_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
),
total AS (
  SELECT count(*) AS cnt
  FROM cpu_profile_stack_sample cpss
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 27683
)
SELECT a.name,
       count(DISTINCT a.sample_id) AS inclusive_samples,
       round(100.0 * count(DISTINCT a.sample_id) / total.cnt, 3) AS pct_of_total
FROM ancestors a, total
WHERE a.name IN ('tryParsePartialToolInput', 'applyDeltaToProgress', 'parseString',
                 'parse9', 'parseObject', '(garbage collector)')
GROUP BY a.name
ORDER BY inclusive_samples DESC
```

vs. trace 4 (after fix, pid 30297):

```sql
WITH RECURSIVE ancestors AS (
  SELECT cpss.id AS sample_id, cpss.callsite_id, spc.parent_id, spf.name
  FROM cpu_profile_stack_sample cpss
  JOIN stack_profile_callsite spc ON cpss.callsite_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 30297
  UNION ALL
  SELECT a.sample_id, spc.id, spc.parent_id, spf.name
  FROM ancestors a
  JOIN stack_profile_callsite spc ON a.parent_id = spc.id
  JOIN stack_profile_frame spf ON spc.frame_id = spf.id
),
total AS (
  SELECT count(*) AS cnt
  FROM cpu_profile_stack_sample cpss
  JOIN thread t ON cpss.utid = t.utid
  JOIN process p ON t.upid = p.upid
  WHERE p.pid = 30297
)
SELECT a.name,
       count(DISTINCT a.sample_id) AS inclusive_samples,
       round(100.0 * count(DISTINCT a.sample_id) / total.cnt, 3) AS pct_of_total
FROM ancestors a, total
WHERE a.name IN ('tryParsePartialToolInput', 'applyDeltaToProgress', 'parseString',
                 'parse9', 'parseObject', '(garbage collector)')
GROUP BY a.name
ORDER BY inclusive_samples DESC
```