Reinforce timeout + do-not-poll guidance in terminal tool descriptions #320141

Merged
meganrogge merged 6 commits into main from megan/terminal-tool-desc-no-poll on Jun 5, 2026

@meganrogge
Collaborator

Addresses microsoft/vscode-internalbacklog#7870.

The agent has been observed polling get_terminal_output for ~27 steps across three npm commands when a too-short timeout (120s for a 3–5 minute install) caused sync→background promotion. The system prompt preamble already says "do NOT poll", and runInTerminal's modelDescription already recommends generous timeouts — but neither of those signals attaches to the spot where the model is actually making the decision (the timeout parameter description for runInTerminal, and the top-level description for get_terminal_output).

This PR reinforces both signals at the point of use:

runInTerminal timeout parameter description

Before:

Optional hard cap in milliseconds on how long the tool tracks the command before returning. Omit to let the command run to completion (recommended for package installs, builds, and long-running scripts). Use 0 to explicitly indicate no timeout.

After:

Optional hard cap in milliseconds on how long the tool tracks the command before returning. Recommended: 600000 (10 min) for package installs, 900000 (15 min) for large builds. Omit entirely for commands that should run to completion (the safest default for installs and builds). Use 0 to explicitly indicate no timeout. A too-short timeout forces background promotion and triggers polling — prefer generous timeouts.
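
The guidance in the updated description can be sketched as a small lookup plus a promotion check. This is an illustrative sketch only: `CommandKind`, `recommendedTimeoutMs`, and `wouldPromoteToBackground` are hypothetical names that do not come from the VS Code source, and the promotion rule is a simplified model of the behavior described above.

```typescript
// Hypothetical categories for illustration; not part of the real tool schema.
type CommandKind = "packageInstall" | "largeBuild" | "other";

// Numeric values are taken from the updated timeout description.
const RECOMMENDED_TIMEOUT_MS: Record<CommandKind, number | undefined> = {
  packageInstall: 600000, // 10 min
  largeBuild: 900000, // 15 min
  other: undefined, // omit the parameter and let the command run to completion
};

function recommendedTimeoutMs(kind: CommandKind): number | undefined {
  return RECOMMENDED_TIMEOUT_MS[kind];
}

// Assumed model: a sync command that outlives a positive timeout is promoted
// to the background. An omitted timeout or 0 means "no timeout".
function wouldPromoteToBackground(
  timeoutMs: number | undefined,
  actualDurationMs: number
): boolean {
  if (timeoutMs === undefined || timeoutMs === 0) {
    return false;
  }
  return actualDurationMs > timeoutMs;
}
```

Under this model, the observed case (a 120s timeout on an install that takes about four minutes) is promoted, while the recommended 600000 ms value is not.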

getTerminalOutput modelDescription

Before:

Get output from an active terminal execution (identified by the id returned from run_in_terminal).

After:

Get output from an active terminal execution (identified by the id returned from run_in_terminal). Only use this tool if you need to inspect output from a terminal that was started in async mode and you have a concrete reason to read it now. If a sync command timed out and moved to the background, you will be automatically notified on your next turn when it completes — do NOT poll with this tool. End your turn and wait for the notification.
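
The decision the "After" text asks the model to make can be written out as a tiny function. This is a sketch of the rule as quoted in this description, not of any code in the repository; `TerminalExecutionState` and `nextAction` are hypothetical names introduced for illustration.

```typescript
// Hypothetical view of what the agent knows about a terminal execution.
interface TerminalExecutionState {
  startedAsync: boolean; // the command was started in async mode
  promotedFromSync: boolean; // a sync command timed out and moved to the background
  hasConcreteReasonToRead: boolean; // there is a specific reason to read output now
}

type NextAction = "readOutput" | "endTurnAndWait";

function nextAction(state: TerminalExecutionState): NextAction {
  // A timed-out sync command triggers an automatic notification on completion,
  // so polling is never the right move here.
  if (state.promotedFromSync) {
    return "endTurnAndWait";
  }
  // Reading output is reserved for async terminals with a concrete reason.
  if (state.startedAsync && state.hasConcreteReasonToRead) {
    return "readOutput";
  }
  return "endTurnAndWait";
}
```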

Why not pull these from the system prompt preamble?

The preamble already contains both pieces of guidance, but models attend most strongly to tool parameter descriptions at call time, and the preamble has to compete with many other paragraphs. Restating the rule at the decision point is what gets it followed.

Risks / follow-ups

  • The two strings are longer now (more prompt tokens per tool definition). Estimated cost: ~80 extra tokens combined per request that includes these tools. That is small compared to the ~27 polling steps, each carrying ~1.5 KB of output, that this change is trying to prevent.
  • Issue #7870 (proposal P5c) also asks us to verify that the notification mechanism is reliable; that's a separate investigation and not in this PR.
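
As a rough back-of-the-envelope check on the token trade-off, the figures above can be compared directly. Two inputs are assumptions rather than measurements: 1 KB is taken as 1024 bytes, and about 4 bytes per token is a common rule of thumb that varies by model and content.

```typescript
const EXTRA_TOKENS_PER_REQUEST = 80; // estimate from this PR description
const POLLING_STEPS = 27; // observed polling steps
const BYTES_PER_STEP = 1.5 * 1024; // ~1.5 KB of output per step (assumes 1 KB = 1024 bytes)
const ASSUMED_BYTES_PER_TOKEN = 4; // rough rule of thumb, not a measured value

// Total output produced by one polling episode: 27 * 1536 = 41472 bytes.
const pollingBytes = POLLING_STEPS * BYTES_PER_STEP;

// Approximate token cost of that output: 41472 / 4 = 10368 tokens.
const pollingTokens = Math.round(pollingBytes / ASSUMED_BYTES_PER_TOKEN);

// Number of requests whose added description cost equals one avoided episode:
// floor(10368 / 80) = 129.
const breakEvenRequests = Math.floor(pollingTokens / EXTRA_TOKENS_PER_REQUEST);
```

Under these assumptions, avoiding a single polling episode pays for the longer descriptions across roughly 129 requests, before counting the per-step overhead of the tool calls themselves.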

Tests

No assertions reference these exact strings (verified via grep for "Optional hard cap in milliseconds" and "Get output from an active terminal"). No test changes required.
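
The grep check can also be expressed as a small in-memory search, for example inside a script that has already loaded test file contents. This is a hypothetical sketch: `findStringReferences` and the sample file paths are made up for illustration and are not part of the repository.

```typescript
// Returns one "path: needle" entry for every file whose content contains a needle.
function findStringReferences(
  files: Record<string, string>,
  needles: string[]
): string[] {
  const hits: string[] = [];
  Object.keys(files).forEach((path) => {
    needles.forEach((needle) => {
      if (files[path].indexOf(needle) !== -1) {
        hits.push(path + ": " + needle);
      }
    });
  });
  return hits;
}

// The two prefixes that were searched for in this PR.
const DESCRIPTION_PREFIXES = [
  "Optional hard cap in milliseconds",
  "Get output from an active terminal",
];

// Hypothetical file contents; an empty result means no test pins these strings.
const sampleTestFiles: Record<string, string> = {
  "example/a.test.ts": "assert.strictEqual(result.exitCode, 0);",
  "example/b.test.ts": "assert.ok(output.length > 0);",
};
const sampleHits = findStringReferences(sampleTestFiles, DESCRIPTION_PREFIXES);
```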

Copilot AI review requested due to automatic review settings June 5, 2026 16:58
@vs-code-engineering
Contributor

vs-code-engineering Bot commented Jun 5, 2026

📬 CODENOTIFY

The following users are being notified based on files changed in this PR:

@anthonykim1

Matched files:

  • src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/getTerminalOutputTool.ts
  • src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the terminal chat agent tool descriptions to more strongly discourage polling for terminal output and to steer agents toward using sufficiently generous timeout values so sync commands don’t get promoted to background unexpectedly.

Changes:

  • Strengthen run_in_terminal.timeout parameter guidance with concrete “generous timeout” examples and explicit warning about short timeouts causing background promotion/polling behavior.
  • Expand get_terminal_output’s modelDescription with “do not poll; wait for notification” guidance at the tool decision point.

| File | Description |
| --- | --- |
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts | Updates the timeout parameter description to steer agents toward longer timeouts and avoid timeout-driven background promotion/polling. |
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/getTerminalOutputTool.ts | Updates the tool description to discourage polling and instruct waiting for completion notifications. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Megan Rogge and others added 2 commits June 5, 2026 13:02
- getTerminalOutput: clarify valid use case (async output inspection)
  before the do-not-poll guidance
- timeout: restore 'Optional' label, add human-readable durations
  (600000 = 10 min, 900000 = 15 min), and explain background promotion
  in plain terms the model can reason about

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@meganrogge meganrogge self-assigned this Jun 5, 2026
@meganrogge meganrogge added this to the 1.125.0 milestone Jun 5, 2026
- getTerminalOutput: cover both async-mode and timed-out-sync use cases
  instead of restricting to async-only
- timeout: restructure to lead with 'if you set a timeout, be generous'
  so the numeric recommendations are clearly conditional, avoiding
  contradiction with the omit-entirely guidance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@meganrogge
Collaborator Author

/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7

@meganrogge meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Jun 5, 2026
@vs-code-engineering
Contributor

⏳ Queued vscode build for d6b25ffd4811be6c53c41c8680ee122b162fe253 (step 1/2).

@vs-code-engineering
Contributor

🔄 First vscode build failed; retried failed stages: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=445312

@vs-code-engineering vs-code-engineering Bot removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Jun 5, 2026
@meganrogge meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Jun 5, 2026
@vs-code-engineering
Contributor

⏳ Queued vscode build for f465bdf99855f7830d5cbde95684d5792ae623de (step 1/2).

@meganrogge meganrogge modified the milestones: 1.125.0, 1.124.0 Jun 5, 2026
@meganrogge meganrogge enabled auto-merge (squash) June 5, 2026 20:15
@meganrogge meganrogge merged commit c2d6b5a into main Jun 5, 2026
25 checks passed
@meganrogge meganrogge deleted the megan/terminal-tool-desc-no-poll branch June 5, 2026 20:23
@vs-code-engineering
Contributor

🚀 Queued eval-assessment publish build for 5b01c33963060dbf8a51a628e8db202435bc7e12 (step 2/2).

@vs-code-engineering
Contributor

✅ Eval-assessment build published.

@vs-code-engineering vs-code-engineering Bot removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Jun 5, 2026
@vs-code-engineering
Contributor

🔬 Queued eval-assessment benchmark for 96e8d26961.

Results will be posted back here when the run completes.

@vs-code-engineering
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results

@vs-code-engineering
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results

@vs-code-engineering
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results
