Skip to content

Make timeout optional in run_in_terminal and guide model to omit it for long-running commands#311965

Merged
meganrogge merged 1 commit intomainfrom
merogge/timeout-update
Apr 22, 2026
Merged

Make timeout optional in run_in_terminal and guide model to omit it for long-running commands#311965
meganrogge merged 1 commit intomainfrom
merogge/timeout-update

Conversation

@meganrogge
Copy link
Copy Markdown
Collaborator

@meganrogge meganrogge commented Apr 22, 2026

Fixes https://github.com/microsoft/vscode-copilot-evaluation/issues/3499

Root Cause

Commit 992a684 (April 6, 2026) made timeout a required parameter in the run_in_terminal tool schema:

-required: ['command', 'explanation', 'goal', 'mode']
+required: ['command', 'explanation', 'goal', 'mode', 'timeout']

With timeout mandatory and the only guidance being "Use 0 for no timeout", models consistently picked short values (≤120s) rather than 0. Combined with chat.tools.terminal.enforceTimeoutFromModel defaulting to true, these short timeouts were always enforced — causing commands to be moved to the background mid-execution and cascading into confusion.


Evidence from Eval Runs

Run 24049820070 (April 6, 2026 — same day as the commit, VS Code 1.117.0)

  • 16 timeout events across 6 instances
  • Top offenders:
Instance Timeouts Values used
qemu-alpine-ssh 5 5s, 120s, 240s
extract-moves-from-video 4 30s, 60s
query-optimize 3 40s, 180s

Example from qemu-alpine-ssh, step 2:

command: python3 /tmp/boot_alpine_vm.py
timeout: 120000ms → timed out after 120s

Run 24638368292 (April 19, 2026 — VS Code commit a947515f, timeout still mandatory)

  • 40 timeout events across 14 instances2.5× worse
  • Top offenders:
Instance Timeouts Values used
install-windows-3.11 10 20s, 30s, 60s
mteb-leaderboard 11 30s, 70s, 100s
mcmc-sampling-stan 4 120s
query-optimize 3 20s, 40s, 120s

Example from install-windows-3.11, step 6:

command: set -e\nls -1 /usr/share/novnc | head
mode: sync, timeout: 20000ms → "Command timed out after 20000ms.
The command may still be running in terminal ID 543e922b-..."

Example from mteb-leaderboard, step 4:

command: python - <<'PY'\nimport urllib.request\nurls=['https://mteb-leaderboard.hf.space', ...]
mode: sync, timeout: 30000ms → "Command timed out after 30000ms"

Timeout distribution across 56 total timeout events:

Range Count %
≤10s 6 10%
11–30s 13 23%
31–60s 8 14%
61–120s 26 46%
>120s 3 5%

46% of timeouts were at exactly 120s — the model's "safe-seeming" default that's still insufficient for builds, installs, and network-bound commands.


How the Current Fix Addresses It

The uncommitted changes on the working tree make two targeted changes:

1. Remove timeout from required:

-required: ['command', 'explanation', 'goal', 'mode', 'timeout']
+required: ['command', 'explanation', 'goal', 'mode']

2. Replace the sparse description with explicit guidance:

-description: 'Timeout in milliseconds that determines how long to wait before returning. Use 0 for no timeout.'
+description: 'Optional hard cap in milliseconds on how long the tool tracks the command before returning. Omit to let the command run to completion (recommended for package installs, builds, and long-running scripts). Use 0 to explicitly indicate no timeout.'

3. Add a "Timeout parameter" section to modelDescription:

Timeout parameter: Only set 'timeout' when you want a hard cap on how long the tool tracks the command. Omit it to let the command run to completion. Package installs, builds, and long-running scripts should usually omit the timeout rather than guessing a value.

4. When timeout is omitted for mode=sync, default to 0 (no timeout):

if (executionOptions.mode === 'sync' && args.timeout === undefined) {
    // Models frequently pick timeouts that are too short for builds/installs
    args.timeout = 0;
}

This means for the majority of mode=sync calls where the model omits timeout, commands will wait indefinitely — exactly right for python3 boot_alpine_vm.py, apt-get install, make, etc. The model can still pass an explicit timeout for cases where it truly wants a cap (e.g., ssh connection attempts that should fail fast).

Copilot AI review requested due to automatic review settings April 22, 2026 18:42
@meganrogge meganrogge self-assigned this Apr 22, 2026
@meganrogge meganrogge added this to the 1.118.0 milestone Apr 22, 2026
@meganrogge meganrogge enabled auto-merge (squash) April 22, 2026 18:42
@github-actions
Copy link
Copy Markdown
Contributor

Screenshot Changes

Base: 31320b4c Current: 03d13e1b

Changed (3)

chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Dark
Before After
before after
agentSessionsViewer/CompletedUnread/Dark
Before After
before after
agentSessionsViewer/CompletedUnread/Light
Before After
before after

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the run_in_terminal chat tool to avoid model-chosen short timeouts for long-running terminal commands by making timeout optional and defaulting omitted sync invocations to no-timeout behavior.

Changes:

  • Makes timeout optional in the run_in_terminal input schema and updates its schema description to recommend omission for long-running commands.
  • Extends the tool modelDescription with explicit guidance to omit timeout for long-running scenarios.
  • Defaults args.timeout to 0 when mode='sync' and timeout is omitted.
Show a summary per file
File Description
src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts Adjusts tool prompt/schema to make timeout optional and changes runtime behavior to treat omitted sync timeouts as no-timeout.

Copilot's findings

Comments suppressed due to low confidence (2)

src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts:360

  • Making timeout non-required in the input schema also makes it optional for mode='async'. In invoke, if args.timeout is omitted and the wait strategy is idle, timeoutRacePromise is never created and the tool can wait indefinitely for the initial idle/output signal (it awaits outputMonitor.onDidFinishCommand without a race). Consider either keeping timeout required for mode='async' (schema-level conditional) or assigning a reasonable default timeout when mode='async' and timeout is omitted, while keeping the new "omit for long-running sync commands" behavior.
					type: 'number',
					description: 'Optional hard cap in milliseconds on how long the tool tracks the command before returning. Omit to let the command run to completion (recommended for package installs, builds, and long-running scripts). Use 0 to explicitly indicate no timeout.',
				},
			},
			required: ['command', 'explanation', 'goal', 'mode']

src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts:1185

  • There are existing tests for RunInTerminalTool, but none appear to cover the new behavior where mode='sync' omits timeout and the tool defaults it to 0. Adding a test for this regression path would help ensure the tool no longer errors and that the command isn't prematurely moved to the background due to an implicit/guessed timeout.
		if (executionOptions.mode === 'sync' && args.timeout === undefined) {
			// Timeout is optional for mode=sync: when omitted, the tool waits for
			// the command to complete with no hard cap. Models frequently pick
			// timeouts that are too short for package installs, builds, and
			// long-running scripts, which causes the command to be moved to the
			// background unnecessarily.
			args.timeout = 0;
  • Files reviewed: 1/1 changed files
  • Comments generated: 1

@meganrogge meganrogge merged commit 942c025 into main Apr 22, 2026
30 checks passed
@meganrogge meganrogge deleted the merogge/timeout-update branch April 22, 2026 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants