Skip to content

Experiment with terminal output deltas for repeated polls#315543

Merged
meganrogge merged 2 commits into
microsoft:mainfrom
kevin-m-kent:kevin-m-kent/terminal-output-poll-deltas
May 13, 2026
Merged

Experiment with terminal output deltas for repeated polls#315543
meganrogge merged 2 commits into
microsoft:mainfrom
kevin-m-kent:kevin-m-kent/terminal-output-poll-deltas

Conversation

@kevin-m-kent
Copy link
Copy Markdown
Contributor

@kevin-m-kent kevin-m-kent commented May 10, 2026

This adds an experimental chat.tools.terminal.outputDeltas setting for repeated get_terminal_output polling.

When the experiment is enabled:

  • The first poll for a terminal execution returns the full output, preserving current context.
  • Repeated polls with identical output return a short unchanged marker instead of replaying the full terminal buffer.
  • Polls with appended output return only the new suffix.
  • Non-prefix changes, such as screen rewrites or truncated scrollback, fall back to returning the current full output.

This change is stateful across explicit get_terminal_output calls and targets prompt/tool-result replay from terminal polling loops.

Add an experimental setting for get_terminal_output polling to return unchanged markers or appended deltas after the first full terminal snapshot. This keeps existing behavior when the flag is disabled and falls back to the current full output when the previous snapshot no longer matches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 10, 2026 15:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an experimental, stateful “output delta” mode for get_terminal_output polling to reduce repeated prompt/tool-result size when terminal output is unchanged or only appended.

Changes:

  • Introduces chat.tools.terminal.outputDeltas (experimental, auto-enabled experiment flag) configuration setting.
  • Updates GetTerminalOutputTool to optionally return unchanged markers or appended-output suffixes on repeated polls (falling back to full output for non-prefix changes).
  • Adds browser tests covering unchanged output, appended output, and non-prefix rewrite fallback behavior.
Show a summary per file
File Description
src/vs/workbench/contrib/terminalContrib/chatAgentTools/test/browser/getTerminalOutputTool.test.ts Adds tests validating unchanged-marker, delta-suffix, and fallback-to-full-output behaviors when the experiment is enabled.
src/vs/workbench/contrib/terminalContrib/chatAgentTools/common/terminalChatAgentToolsConfiguration.ts Registers the new chat.tools.terminal.outputDeltas experimental setting.
src/vs/workbench/contrib/terminalContrib/chatAgentTools/browser/tools/getTerminalOutputTool.ts Implements stateful output snapshotting and delta formatting behind the new configuration flag.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 0

@kevin-m-kent
Copy link
Copy Markdown
Contributor Author

kevin-m-kent commented May 11, 2026

Validated that this is working as expected locally with the flag. Everything after the initial terminal poll returns a message like when the terminal hasn't updated since the last poll:

Output of terminal f7a744dc-de40-46e7-a631-776c4d73ea22 unchanged since previous poll (161 characters already shown). No new output.

@meganrogge
Copy link
Copy Markdown
Collaborator

meganrogge commented May 11, 2026

The map stores full output strings (potentially large) for up to 100 terminals, with no cleanup on terminal disposal. Stale entries only get evicted when the cap is hit via FIFO.

Suggestions:

  1. Clear on dispose — override dispose() to call this._lastOutputByTerminalId.clear().
  2. Listen for terminal disposal — register a listener (e.g. onDidDisposeInstance) to delete the entry for that terminal ID, so we don't hold output strings for terminals that no longer exist.
  3. Consider storing a hash + length instead of the full string — if we only need the "unchanged" and "startsWith" checks, we could store a hash of the previous output and the length. We'd lose the ability to do the prefix check cheaply, but it would drastically reduce memory. Alternatively, keep the current approach but lower the cap or add a per-entry size limit.

Copy link
Copy Markdown
Collaborator

@meganrogge meganrogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! See my comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kevin-m-kent
Copy link
Copy Markdown
Contributor Author

kevin-m-kent commented May 11, 2026

The map stores full output strings (potentially large) for up to 100 terminals, with no cleanup on terminal disposal. Stale entries only get evicted when the cap is hit via FIFO.

Suggestions:

  1. Clear on dispose — override dispose() to call this._lastOutputByTerminalId.clear().
  2. Listen for terminal disposal — register a listener (e.g. onDidDisposeInstance) to delete the entry for that terminal ID, so we don't hold output strings for terminals that no longer exist.
  3. Consider storing a hash + length instead of the full string — if we only need the "unchanged" and "startsWith" checks, we could store a hash of the previous output and the length. We'd lose the ability to do the prefix check cheaply, but it would drastically reduce memory. Alternatively, keep the current approach but lower the cap or add a per-entry size limit.

Thanks @meganrogge - I think these are all great suggestions. I made some updates that I believe addresses your feedback and tested again locally. For #3 I went with the hash approach.

@kevin-m-kent kevin-m-kent requested a review from meganrogge May 11, 2026 20:05
@bhavyaus
Copy link
Copy Markdown
Collaborator

Probably not a big deal but worth checking if it works across compaction boundaries since we lose the full tool output in that case if the model reuses terminals/kicks off async tasks, hits compaction, and then resumes tasks

@kevin-m-kent
Copy link
Copy Markdown
Contributor Author

Probably not a big deal but worth checking if it works across compaction boundaries since we lose the full tool output in that case if the model reuses terminals/kicks off async tasks, hits compaction, and then resumes tasks

I was chatting about this with @isidorn as well. I'm not sure how we solve this scenario, other than relying on the compaction call to preserve what is relevant from the previous tool result.

@isidorn
Copy link
Copy Markdown
Collaborator

isidorn commented May 12, 2026

relying on the compaction call to preserve what is relevant from the previous tool result.

I think this is a fair assumption to make.

@meganrogge
Copy link
Copy Markdown
Collaborator

/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7

@meganrogge meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label May 12, 2026
@meganrogge
Copy link
Copy Markdown
Collaborator

Seeing how this impacts evals

@vs-code-engineering
Copy link
Copy Markdown
Contributor

⏳ Queued vscode build for f0bd87449265b04584b1d082863a7bd1ef6d01b6 (step 1/2).

@isidorn
Copy link
Copy Markdown
Collaborator

isidorn commented May 12, 2026

Pretty cool! I did not know we have this evals integration!

Also consider doing evals for GPT-5.5 as well, since that is the latest OpenAI model.

@meganrogge
Copy link
Copy Markdown
Collaborator

/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7,gpt-5.5

@vs-code-engineering vs-code-engineering Bot removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label May 12, 2026
@meganrogge meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label May 12, 2026
@vs-code-engineering
Copy link
Copy Markdown
Contributor

⏳ Queued vscode build for f0bd87449265b04584b1d082863a7bd1ef6d01b6 (step 1/2).

@vs-code-engineering
Copy link
Copy Markdown
Contributor

🚀 Queued eval-assessment publish build for 858d4286b1e2e7ae6586ed44dad2c11af46cab92 (step 2/2).

@vs-code-engineering
Copy link
Copy Markdown
Contributor

🔬 Queued eval-assessment benchmark for 95df7cf309.

Results will be posted back here when the run completes.

@vs-code-engineering
Copy link
Copy Markdown
Contributor

✅ Eval-assessment build published.

@vs-code-engineering vs-code-engineering Bot removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label May 12, 2026
@isidorn
Copy link
Copy Markdown
Collaborator

isidorn commented May 13, 2026

@jukasper I tried to access the https://github.com/github/evald/ links but I do not seem to have access. Can you add me please?

@meganrogge
Copy link
Copy Markdown
Collaborator

Repeating as we saw a regression on Claude Opus 4.6, but it was not huge.

@meganrogge meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label May 13, 2026
@vs-code-engineering
Copy link
Copy Markdown
Contributor

⏳ Queued vscode build for f0bd87449265b04584b1d082863a7bd1ef6d01b6 (step 1/2).

@microsoft microsoft deleted a comment from vs-code-engineering Bot May 13, 2026
@microsoft microsoft deleted a comment from vs-code-engineering Bot May 13, 2026
@microsoft microsoft deleted a comment from vs-code-engineering Bot May 13, 2026
@microsoft microsoft deleted a comment from vs-code-engineering Bot May 13, 2026
@vs-code-engineering
Copy link
Copy Markdown
Contributor

🚀 Queued eval-assessment publish build for 537b6f0cc8115f78a903d0e1b2ab2706423aa2cc (step 2/2).

@vs-code-engineering
Copy link
Copy Markdown
Contributor

🔬 Queued eval-assessment benchmark for 641f97f4c9.

Results will be posted back here when the run completes.

@vs-code-engineering
Copy link
Copy Markdown
Contributor

✅ Eval-assessment build published.

@vs-code-engineering vs-code-engineering Bot removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label May 13, 2026
Copy link
Copy Markdown
Collaborator

@meganrogge meganrogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let's see how it goes 😄 . We'll want to kick off evals with and without this setting too.

@vs-code-engineering
Copy link
Copy Markdown
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results

@meganrogge meganrogge merged commit 91f7718 into microsoft:main May 13, 2026
25 checks passed
@vs-code-engineering vs-code-engineering Bot added this to the 1.121.0 milestone May 13, 2026
@vs-code-engineering
Copy link
Copy Markdown
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results

@vs-code-engineering
Copy link
Copy Markdown
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results

@vs-code-engineering
Copy link
Copy Markdown
Contributor

📊 Eval-assessment benchmark complete.

🧪 Results

NikolaRHristov pushed a commit to CodeEditorLand/Editor that referenced this pull request May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants