Make `timeout` optional in `run_in_terminal` and guide model to omit it for long-running commands #311965
Merged
meganrogge merged 1 commit into `main` on Apr 22, 2026
Pull request overview
Updates the `run_in_terminal` chat tool to avoid model-chosen short timeouts for long-running terminal commands by making `timeout` optional and treating an omitted `timeout` in sync mode as "no timeout".
Changes:
- Makes `timeout` optional in the `run_in_terminal` input schema and updates its schema description to recommend omitting it for long-running commands.
- Extends the tool `modelDescription` with explicit guidance to omit `timeout` for long-running scenarios.
- Defaults `args.timeout` to `0` when `mode='sync'` and `timeout` is omitted.
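A minimal sketch of what the schema change means in practice. `ToolInputSchema` and `missingRequired` are hypothetical, simplified stand-ins, not the actual VS Code types or validator: once `timeout` leaves `required`, a call that omits it passes the required-field check.

```typescript
// Hypothetical, simplified stand-in for the tool's input schema.
// Property names mirror the PR; the validator below is illustrative only.
interface ToolInputSchema {
	type: 'object';
	properties: Record<string, { type: string; description?: string }>;
	required: string[];
}

const runInTerminalSchema: ToolInputSchema = {
	type: 'object',
	properties: {
		command: { type: 'string' },
		explanation: { type: 'string' },
		goal: { type: 'string' },
		mode: { type: 'string' },
		// Optional after this change: omit it for long-running commands.
		timeout: { type: 'number', description: 'Optional hard cap in milliseconds.' },
	},
	// `timeout` is no longer listed here.
	required: ['command', 'explanation', 'goal', 'mode'],
};

// Returns the names of required properties that are absent from `input`.
function missingRequired(schema: ToolInputSchema, input: Record<string, unknown>): string[] {
	return schema.required.filter(name => input[name] === undefined);
}

const call = { command: 'make', explanation: 'Build the project', goal: 'Compile', mode: 'sync' };
console.log(missingRequired(runInTerminalSchema, call)); // []
```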
| File | Description |
|---|---|
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts | Adjusts tool prompt/schema to make timeout optional and changes runtime behavior to treat omitted sync timeouts as no-timeout. |
Copilot's findings
Comments suppressed due to low confidence (2)
src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts:360
- Making `timeout` non-required in the input schema also makes it optional for `mode='async'`. In `invoke`, if `args.timeout` is omitted and the wait strategy is `idle`, `timeoutRacePromise` is never created and the tool can wait indefinitely for the initial idle/output signal (it awaits `outputMonitor.onDidFinishCommand` without a race). Consider either keeping `timeout` required for `mode='async'` (a schema-level conditional) or assigning a reasonable default timeout when `mode='async'` and `timeout` is omitted, while keeping the new "omit for long-running sync commands" behavior.
```typescript
		type: 'number',
		description: 'Optional hard cap in milliseconds on how long the tool tracks the command before returning. Omit to let the command run to completion (recommended for package installs, builds, and long-running scripts). Use 0 to explicitly indicate no timeout.',
	},
},
required: ['command', 'explanation', 'goal', 'mode']
```
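The reviewer's second option (a default for async) could look roughly like the sketch below. `resolveTimeout` and `DEFAULT_ASYNC_TIMEOUT_MS` are hypothetical names, and the 60-second value is an arbitrary placeholder rather than anything the PR chose. An omitted sync timeout becomes `0` (no cap, as in the merged change), while an omitted async timeout gets a finite default so the initial idle/output wait is always raced against a timer.

```typescript
type Mode = 'sync' | 'async';

// Hypothetical placeholder; the PR does not define an async fallback value.
const DEFAULT_ASYNC_TIMEOUT_MS = 60_000;

/**
 * Resolves the effective timeout in milliseconds.
 * 0 means "no hard cap"; an explicit value from the model always wins.
 */
function resolveTimeout(mode: Mode, timeout: number | undefined): number {
	if (timeout !== undefined) {
		return timeout;
	}
	// sync: wait for the command to finish (the merged behavior).
	// async: keep a finite cap so the wait for the first idle/output signal cannot hang.
	return mode === 'sync' ? 0 : DEFAULT_ASYNC_TIMEOUT_MS;
}

console.log(resolveTimeout('sync', undefined));  // 0
console.log(resolveTimeout('async', undefined)); // 60000
console.log(resolveTimeout('async', 5_000));     // 5000
```

Keeping the explicit value first means a model that deliberately sets a cap is never overridden; only the omitted case depends on the mode.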
src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts:1185
- There are existing tests for `RunInTerminalTool`, but none appear to cover the new behavior where `mode='sync'` omits `timeout` and the tool defaults it to `0`. Adding a test for this regression path would help ensure the tool no longer errors and that the command isn't prematurely moved to the background due to an implicit or guessed timeout.
```typescript
if (executionOptions.mode === 'sync' && args.timeout === undefined) {
	// Timeout is optional for mode=sync: when omitted, the tool waits for
	// the command to complete with no hard cap. Models frequently pick
	// timeouts that are too short for package installs, builds, and
	// long-running scripts, which causes the command to be moved to the
	// background unnecessarily.
	args.timeout = 0;
}
```
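A minimal, self-contained sketch of the regression check suggested above. Rather than driving the real `RunInTerminalTool` (whose test harness is not shown here), it copies the defaulting logic into a hypothetical `applySyncTimeoutDefault` helper that returns a new object instead of mutating `args`, and checks it with Node's built-in `assert` module.

```typescript
import { strict as assert } from 'node:assert';

// Hypothetical, trimmed-down argument shape for illustration.
interface RunInTerminalArgs {
	command: string;
	timeout?: number;
}

// Hypothetical helper mirroring the snippet: an omitted sync timeout becomes 0 (no cap).
function applySyncTimeoutDefault(mode: 'sync' | 'async', args: RunInTerminalArgs): RunInTerminalArgs {
	if (mode === 'sync' && args.timeout === undefined) {
		return { ...args, timeout: 0 };
	}
	return args;
}

// Omitted timeout in sync mode defaults to 0, so no implicit cap can background the command.
assert.equal(applySyncTimeoutDefault('sync', { command: 'npm install' }).timeout, 0);
// An explicit timeout is preserved.
assert.equal(applySyncTimeoutDefault('sync', { command: 'ssh host', timeout: 10_000 }).timeout, 10_000);
// Async mode is left untouched by this particular default.
assert.equal(applySyncTimeoutDefault('async', { command: 'npm run watch' }).timeout, undefined);
console.log('ok');
```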
- Files reviewed: 1/1 changed files
- Comments generated: 1
roblourens approved these changes on Apr 22, 2026
Fixes https://github.com/microsoft/vscode-copilot-evaluation/issues/3499
Root Cause
Commit 992a684 (April 6, 2026) made `timeout` a required parameter in the `run_in_terminal` tool schema.
With `timeout` mandatory and the only guidance being "Use 0 for no timeout", models consistently picked short values (≤120s) rather than 0. Combined with `chat.tools.terminal.enforceTimeoutFromModel` defaulting to `true`, these short timeouts were always enforced, causing commands to be moved to the background mid-execution, which then cascaded into confusion in later steps.
Evidence from Eval Runs
Run `24049820070` (April 6, 2026, the same day as the commit; VS Code `1.117.0`): `qemu-alpine-ssh`, `extract-moves-from-video`, `query-optimize`
Example from `qemu-alpine-ssh`, step 2:
Run `24638368292` (April 19, 2026; VS Code commit `a947515f`, timeout still mandatory): `install-windows-3.11`, `mteb-leaderboard`, `mcmc-sampling-stan`, `query-optimize`
Example from `install-windows-3.11`, step 6:
Example from `mteb-leaderboard`, step 4:
Timeout distribution across 56 total timeout events:
46% of timeouts were at exactly 120s, the model's "safe-seeming" default, which is still insufficient for builds, installs, and network-bound commands.
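A simplified model of the failure mode described in the root cause. `wouldBeBackgrounded` is a hypothetical pure function, not VS Code code; it considers only the `chat.tools.terminal.enforceTimeoutFromModel` setting, the model-chosen timeout, and how long the command actually runs. It shows why a 120s cap on a five-minute build moves the command to the background, while `0` does not.

```typescript
/**
 * Hypothetical model of timeout enforcement for a sync command.
 * @param commandDurationMs how long the command actually needs to finish
 * @param timeoutMs model-supplied timeout; 0 means no timeout
 * @param enforceTimeoutFromModel mirrors the `chat.tools.terminal.enforceTimeoutFromModel` setting
 * @returns true if the tool would stop waiting and move the command to the background
 */
function wouldBeBackgrounded(commandDurationMs: number, timeoutMs: number, enforceTimeoutFromModel: boolean): boolean {
	// Simplifying assumption: with enforcement off, the model's value is ignored.
	if (!enforceTimeoutFromModel || timeoutMs === 0) {
		return false;
	}
	return commandDurationMs > timeoutMs;
}

const buildMs = 5 * 60_000; // a five-minute build
console.log(wouldBeBackgrounded(buildMs, 120_000, true)); // true: capped at 120s
console.log(wouldBeBackgrounded(buildMs, 0, true));       // false: no timeout
```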
How the Current Fix Addresses It
This PR makes four targeted changes:
1. Remove `timeout` from `required`, leaving `required: ['command', 'explanation', 'goal', 'mode']`.
2. Replace the sparse description ("Use 0 for no timeout") with explicit guidance to omit `timeout` for package installs, builds, and long-running scripts.
3. Add a "Timeout parameter" section to `modelDescription`.
4. When `timeout` is omitted for `mode=sync`, default it to 0 (no timeout).

This means that for the majority of `mode=sync` calls, where the model omits `timeout`, the tool waits for the command to finish with no hard cap, which is exactly right for `python3 boot_alpine_vm.py`, `apt-get install`, `make`, etc. The model can still pass an explicit timeout when it truly wants a cap (e.g., `ssh` connection attempts that should fail fast).
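To make the resulting semantics concrete, here is a minimal sketch built around a hypothetical `hasHitHardCap` check (not a function from the PR). A resolved timeout of `0` never caps tracking, so a long `make` runs to completion, while an explicit value such as a 10-second cap on an `ssh` attempt still stops tracking once reached.

```typescript
/**
 * Hypothetical check: has a command reached its hard cap?
 * A timeout of 0 means "no cap", which is what an omitted sync timeout resolves to.
 */
function hasHitHardCap(elapsedMs: number, timeoutMs: number): boolean {
	return timeoutMs > 0 && elapsedMs >= timeoutMs;
}

// Omitted timeout (resolved to 0): a 20-minute `make` is never cut off.
console.log(hasHitHardCap(20 * 60_000, 0)); // false
// Explicit 10s cap for an `ssh` connection attempt that should fail fast.
console.log(hasHitHardCap(12_000, 10_000)); // true
console.log(hasHitHardCap(3_000, 10_000));  // false
```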