Experiment with terminal output deltas for repeated polls#315543
Add an experimental setting for get_terminal_output polling to return unchanged markers or appended deltas after the first full terminal snapshot. This keeps existing behavior when the flag is disabled and falls back to the current full output when the previous snapshot no longer matches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds an experimental, stateful “output delta” mode for get_terminal_output polling to reduce repeated prompt/tool-result size when terminal output is unchanged or only appended.
Changes:
- Introduces `chat.tools.terminal.outputDeltas` (experimental, auto-enabled experiment flag) configuration setting.
- Updates `GetTerminalOutputTool` to optionally return unchanged markers or appended-output suffixes on repeated polls (falling back to full output for non-prefix changes).
- Adds browser tests covering unchanged output, appended output, and non-prefix rewrite fallback behavior.
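The three outcomes listed above (unchanged marker, appended suffix, full-output fallback) can be sketched as a small pure function. This is a minimal illustration, not the code in this PR: `PollResult` and `diffTerminalOutput` are hypothetical names, and it assumes the previous snapshot is available as a plain string.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the PR's actual API.
type PollResult =
	| { kind: 'full'; text: string }
	| { kind: 'unchanged' }
	| { kind: 'delta'; text: string };

function diffTerminalOutput(previous: string | undefined, current: string): PollResult {
	// First poll for this terminal: there is nothing to diff against.
	if (previous === undefined) {
		return { kind: 'full', text: current };
	}
	// Nothing new since the last poll.
	if (current === previous) {
		return { kind: 'unchanged' };
	}
	// Output only grew: return just the appended suffix.
	if (current.startsWith(previous)) {
		return { kind: 'delta', text: current.slice(previous.length) };
	}
	// Non-prefix change (for example a cleared or rewritten buffer): fall back to full output.
	return { kind: 'full', text: current };
}
```

A caller would keep the last snapshot per terminal and pass it in as `previous` on the next poll.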
| File | Description |
|---|---|
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/test/browser/getTerminalOutputTool.test.ts | Adds tests validating unchanged-marker, delta-suffix, and fallback-to-full-output behaviors when the experiment is enabled. |
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/common/terminalChatAgentToolsConfiguration.ts | Registers the new chat.tools.terminal.outputDeltas experimental setting. |
| src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/getTerminalOutputTool.ts | Implements stateful output snapshotting and delta formatting behind the new configuration flag. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 0
|
Validated locally that this works as expected with the flag enabled. When the terminal hasn't updated since the last poll, every poll after the initial one returns a message like:
|
|
The map stores full output strings (potentially large) for up to 100 terminals, with no cleanup on terminal disposal. Stale entries only get evicted when the cap is hit via FIFO. Suggestions:
|
meganrogge
left a comment
Thanks for this! See my comment
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks @meganrogge - I think these are all great suggestions. I made some updates that I believe address your feedback and tested again locally. For #3 I went with the hash approach. |
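For readers following the memory concern raised above, here is one possible shape for a hash-based snapshot store. It is a sketch under stated assumptions, not the code that landed: `SnapshotCache`, `fnv1a`, and the digest layout are hypothetical, and the hash is an FNV-1a-style stand-in for whatever the PR actually uses. It stores only a length and a hash per terminal, evicts the oldest entry once the cap is exceeded, and exposes `delete` so a caller can clean up when a terminal is disposed.

```typescript
// Hypothetical sketch; class, method names, cap, and hash choice are all assumptions.
interface SnapshotDigest {
	length: number;
	hash: number;
}

// FNV-1a-style hash over UTF-16 code units (illustrative only).
function fnv1a(text: string): number {
	let hash = 0x811c9dc5;
	for (let i = 0; i < text.length; i++) {
		hash ^= text.charCodeAt(i);
		hash = Math.imul(hash, 0x01000193);
	}
	return hash >>> 0;
}

class SnapshotCache {
	private readonly digests = new Map<string, SnapshotDigest>();
	private readonly maxEntries: number;

	constructor(maxEntries: number = 100) {
		this.maxEntries = maxEntries;
	}

	// Returns the appended suffix ('' when unchanged), or undefined when the
	// caller should send the full output (first poll or non-prefix change).
	update(terminalId: string, output: string): string | undefined {
		const previous = this.digests.get(terminalId);
		// Map.set on an existing key keeps its original insertion position (FIFO).
		this.digests.set(terminalId, { length: output.length, hash: fnv1a(output) });
		if (this.digests.size > this.maxEntries) {
			const oldest = this.digests.keys().next().value;
			if (oldest !== undefined) {
				this.digests.delete(oldest);
			}
		}
		if (previous === undefined || output.length < previous.length) {
			return undefined;
		}
		// The old snapshot must match the leading part of the new output.
		if (fnv1a(output.slice(0, previous.length)) !== previous.hash) {
			return undefined;
		}
		return output.slice(previous.length);
	}

	// Call when a terminal is disposed so stale entries do not linger.
	delete(terminalId: string): void {
		this.digests.delete(terminalId);
	}

	get size(): number {
		return this.digests.size;
	}
}
```

The trade-off in this sketch is that a hash collision could misreport a rewrite as an append; a real implementation would need to decide whether that risk is acceptable.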
|
Probably not a big deal, but it's worth checking whether this works across compaction boundaries, since we lose the full tool output in that case. For example: the model reuses terminals or kicks off async tasks, hits compaction, and then resumes those tasks. |
I was chatting about this with @isidorn as well. I'm not sure how we solve this scenario, other than relying on the compaction call to preserve what is relevant from the previous tool result. |
I think this is a fair assumption to make. |
|
/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7 |
|
Seeing how this impacts evals |
|
⏳ Queued vscode build for
|
|
Pretty cool! I did not know we have this evals integration! Also consider doing evals for GPT-5.5 as well, since that is the latest OpenAI model. |
|
/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7,gpt-5.5 |
|
⏳ Queued vscode build for
|
|
🚀 Queued eval-assessment publish build for
|
|
🔬 Queued eval-assessment benchmark for
Results will be posted back here when the run completes. |
|
✅ Eval-assessment build published.
|
|
@jukasper I tried to access the https://github.com/github/evald/ links but I do not seem to have access. Can you add me please? |
|
Repeating as we saw a regression on Claude Opus 4.6, but it was not huge. |
|
⏳ Queued vscode build for
|
|
🚀 Queued eval-assessment publish build for
|
|
🔬 Queued eval-assessment benchmark for
Results will be posted back here when the run completes. |
|
✅ Eval-assessment build published.
|
meganrogge
left a comment
Thanks! Let's see how it goes 😄 . We'll want to kick off evals with and without this setting too.
|
📊 Eval-assessment benchmark complete.
🧪 Results |
|
📊 Eval-assessment benchmark complete.
🧪 Results |
|
📊 Eval-assessment benchmark complete.
🧪 Results |
|
📊 Eval-assessment benchmark complete.
🧪 Results |
This adds an experimental `chat.tools.terminal.outputDeltas` setting for repeated `get_terminal_output` polling.
When the experiment is enabled:
This change is stateful across explicit `get_terminal_output` calls and targets prompt/tool-result replay from terminal polling loops.