Reinforce timeout + do-not-poll guidance in terminal tool descriptions #320141
📬 CODENOTIFY
The following users are being notified based on files changed in this PR: @anthonykim1
Matched files:
Pull request overview
This PR updates the terminal chat agent tool descriptions to more strongly discourage polling for terminal output and to steer agents toward using sufficiently generous timeout values so sync commands don’t get promoted to background unexpectedly.
Changes:
- Strengthen `run_in_terminal.timeout` parameter guidance with concrete “generous timeout” examples and an explicit warning about short timeouts causing background promotion/polling behavior.
- Expand `get_terminal_output`’s `modelDescription` with “do not poll; wait for notification” guidance at the tool decision point.
Show a summary per file
| File | Description |
|---|---|
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/runInTerminalTool.ts | Updates the timeout parameter description to steer agents toward longer timeouts and avoid timeout-driven background promotion/polling. |
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/getTerminalOutputTool.ts | Updates the tool description to discourage polling and instruct waiting for completion notifications. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
- getTerminalOutput: clarify valid use case (async output inspection) before the do-not-poll guidance
- timeout: restore 'Optional' label, add human-readable durations (600000 = 10 min, 900000 = 15 min), and explain background promotion in plain terms the model can reason about

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- getTerminalOutput: cover both async-mode and timed-out-sync use cases instead of restricting to async-only
- timeout: restructure to lead with 'if you set a timeout, be generous' so the numeric recommendations are clearly conditional, avoiding contradiction with the omit-entirely guidance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
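The commit notes above mention pairing raw millisecond values with human-readable durations. As a minimal sketch of that idea, the hypothetical helper below (`describeTimeout` is an illustrative name, not something in this PR or in VS Code) renders a timeout in the same `600000 = 10 min` style:

```typescript
// Hypothetical helper, not part of the PR: renders a millisecond timeout
// in the "600000 = 10 min" style used in the commit note.
function describeTimeout(ms: number): string {
	if (!Number.isFinite(ms) || ms < 0) {
		throw new RangeError(`invalid timeout: ${ms}`);
	}
	const minutes = ms / 60000;
	// Use minutes only when the value is a whole number of minutes.
	if (Number.isInteger(minutes) && minutes >= 1) {
		return `${ms} = ${minutes} min`;
	}
	// Otherwise fall back to seconds.
	return `${ms} = ${ms / 1000} s`;
}

console.log(describeTimeout(600000)); // 600000 = 10 min
console.log(describeTimeout(900000)); // 900000 = 15 min
```

Keeping the raw value next to the readable one means a reader of the description does not have to do the millisecond arithmetic to judge whether a timeout is generous.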
/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7
⏳ Queued vscode build for
🔄 First vscode build failed; retried failed stages: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=445312
⏳ Queued vscode build for
🚀 Queued eval-assessment publish build for
✅ Eval-assessment build published.
🔬 Queued eval-assessment benchmark for
Results will be posted back here when the run completes.
📊 Eval-assessment benchmark complete.
🧪 Results
Addresses microsoft/vscode-internalbacklog#7870.
The agent has been observed polling `get_terminal_output` for ~27 steps across three `npm` commands when a too-short timeout (120s for a 3–5 minute install) caused sync→background promotion. The system prompt preamble already says "do NOT poll", and `runInTerminal`'s `modelDescription` already recommends generous timeouts, but neither of those signals attaches to the spot where the model is actually making the decision (the `timeout` parameter description for `runInTerminal`, and the top-level description for `get_terminal_output`).

This PR reinforces both signals at the point of use:
`runInTerminal` → `timeout` parameter description
Before:
After:
`getTerminalOutput` → `modelDescription`
Before:
After:
Why not pull these from the system prompt preamble?
The preamble already contains both pieces of guidance, but models attend most strongly to tool parameter descriptions at call time, and the preamble has to compete with many other paragraphs. Restating the rule at the decision point is what gets it followed.
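To make the motivating failure concrete, here is a rough model of it in TypeScript. Everything in it is an assumption for illustration: `wouldBePromoted`, `generousTimeout`, and the 2x headroom factor are hypothetical and are not the tool's implementation. It only captures the arithmetic from the description: a 120s timeout on an install that can take 5 minutes gets promoted to background, while a timeout with headroom does not.

```typescript
// Hypothetical model of the failure mode; names and the 2x headroom
// factor are illustrative assumptions, not VS Code's implementation.

// A sync command is promoted to background when it outlives its timeout.
function wouldBePromoted(timeoutMs: number, actualDurationMs: number): boolean {
	return actualDurationMs > timeoutMs;
}

// Pick a timeout with headroom over the expected worst-case duration.
function generousTimeout(expectedWorstCaseMs: number, headroom: number = 2): number {
	return Math.ceil(expectedWorstCaseMs * headroom);
}

const fiveMinutesMs = 5 * 60 * 1000; // upper end of the 3–5 minute install

console.log(wouldBePromoted(120000, fiveMinutesMs)); // true
console.log(generousTimeout(fiveMinutesMs)); // 600000
console.log(wouldBePromoted(generousTimeout(fiveMinutesMs), fiveMinutesMs)); // false
```

Under these assumptions the generous value lands on 600000 ms, which lines up with the 10-minute figure mentioned in the commit notes.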
Risks / follow-ups
Tests
No assertions reference these exact strings (verified via grep for `"Optional hard cap in milliseconds"` and `"Get output from an active terminal"`). No test changes required.