trackIdleOnPrompt logging by meganrogge · Pull Request #313080 · microsoft/vscode

meganrogge · 2026-04-28T19:18:04Z

Problem

Eval run 25073061392 (Ubuntu 24.04, bash 5.2) had 20 tasks that hung indefinitely (1-hour eval timeout). The trackIdleOnPrompt state machine — which detects when a command finishes by monitoring shell integration sequences and data idle — is currently a complete black box in the logs. We can see commands start and finish, but nothing about internal state transitions, timer fires, or why the state machine failed to resolve.

Observed behavior

After certain commands (often containing set -e, but not exclusively), shell integration breaks mid-session. Subsequent commands return exitCode=undefined, result.length=0 and resolve only via fallback timers (~11s). Eventually a command hangs forever because no fallback fires.

Log evidence — winning-avg-corewars task: the first two set -e commands complete normally (exitCode=0), the third breaks SI, and the final command hangs indefinitely:

19:42:01.271 Finished with exitCode `0`      ← set -e command #1, SI working
19:43:17.339 Finished with exitCode `0`      ← set -e command #2, SI working  
19:43:46.945 Shell integration failed to add capabilities within 10 seconds
19:44:07.896 Finished with exitCode `undefined`, result.length `1100`  ← SI broken
...
19:45:28.819 Using `rich` execute strategy    ← final command
                                               ← NO "Finished" line. Hangs forever.

What we don't know

The April 6 eval (run 24049820070) had 38 set -e tasks with zero hangs on an older build. The April 28 eval (run 25073061392) had 26 set -e tasks with 20 hangs. The set -e correlation is strong but doesn't fully explain the regression — something else changed between these builds. Without visibility into trackIdleOnPrompt's internal state, we can't determine the exact failure mode.

Changes

1. `trackIdleOnPrompt` diagnostic logging (`executeStrategy.ts`)

Add INFO-level logging to the state machine that's currently invisible in logs:

State transitions: Initial -> Executing (sequence C), Executing -> PromptAfterExecuting (sequence A)
Timer fires: Initial fallback fired, no data events received, Executing fallback fired after 30s data-idle
Data event counts: dataEvents=N on each timer fire

This will make the next eval run fully diagnosable — we'll see exactly where and why commands hang.

2. Bash SI `set -e` protection (`shellIntegration-bash.sh`)

Wrap all 4 SI handler functions with set +e / restore + return 0:

__vsc_prompt_cmd — main PROMPT_COMMAND handler
__vsc_prompt_cmd_original — wrapper for user's original PROMPT_COMMAND
__vsc_preexec_only — DEBUG trap preexec (emits 633;C)
__vsc_preexec_all — DEBUG trap combined handler

On all bash versions, the DEBUG trap is NOT protected from set -e — if any statement returns non-zero, the handler aborts before emitting 633;C (command start). On macOS bash 3.2, PROMPT_COMMAND is also unprotected. This fix hardens both paths. While set -e may not be the sole root cause (see above), this is good defense-in-depth.

3. Tool description update (`runInTerminalTool.ts`)

Instruct the model to pair set -e with set +e at the end of bash commands, reducing the chance of set -e persisting across commands.

4. Trace log placement fix (`shellIntegrationAddon.ts`)

Move trace logs from _handleFinalTermSequence/_handleVSCodeSequence into _doHandle* methods per review feedback.

Copilot

Pull request overview

Improves terminal shell integration reliability for bash and makes xterm shell-integration sequence processing more diagnosable via trace logging, targeting intermittent mid-session SI breakages (issue #313074).

Changes:

Disable and restore errexit (set -e) inside bash DEBUG-trap preexec handlers to prevent bash from dropping the DEBUG trap.
Add trace logs for received VS Code (OSC 633) and FinalTerm (OSC 133) sequence types.

Show a summary per file

File	Description
src/vs/workbench/contrib/terminal/common/scripts/shellIntegration-bash.sh	Wrap DEBUG trap handler logic with save/restore of `errexit` to prevent trap removal under `set -e`.
src/vs/platform/terminal/common/xterm/shellIntegrationAddon.ts	Add trace logging of received shell integration sequence types for debugging SI dropouts.

Copilot's findings

Comments suppressed due to low confidence (1)

src/vs/platform/terminal/common/xterm/shellIntegrationAddon.ts:445

Same as above: _handleVSCodeSequence does a parse purely for logging, but _doHandleVSCodeSequence immediately repeats indexOf/substring to derive command. If trace logs are meant to be cheap, consider logging from the already-parsed command (or gating by log level) to avoid duplicated parsing.

		const argsIndex = data.indexOf(';');
		const sequenceType = argsIndex === -1 ? data : data.substring(0, argsIndex);
		this._logService.trace(`ShellIntegrationAddon#_handleVSCodeSequence: received sequence ${sequenceType}`);
		const didHandle = this._doHandleVSCodeSequence(data);

Files reviewed: 2/2 changed files
Comments generated: 2

When shell integration breaks mid-session and the model doesn't specify a timeout, the foreground race had no timeout at all — leading to indefinite hangs (observed as 1-hour eval timeouts). Root cause analysis of eval run 25073061392 (20 hanging tasks): - For sync mode, args.timeout is set to 0 when model omits it, so timeoutValue > 0 is false and no timeout race candidate is added - trackIdleOnPrompt's fallback timers should resolve, but onData events may not fire reliably (xterm write callbacks depend on requestAnimationFrame which can be throttled in headless environments) - When onData does fire with a 633;C sequence, state transitions to Executing — neutering both initialFallbackScheduler and promptFallbackScheduler — and executingFallbackScheduler can be indefinitely rescheduled by periodic data events Changes: 1. Add 5-minute safety-net timeout to foreground race when no model timeout is present, preventing indefinite hangs regardless of root cause 2. Add INFO-level logging to trackIdleOnPrompt state machine (state transitions, timer fires, data event counts) so the next eval run will reveal exactly why fallback timers fail to resolve Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Legitimate foreground commands (compilations, test suites) can easily exceed 5 minutes. The safety-net timeout would interrupt them, returning incomplete output and potentially causing the model to take wrong actions. The set -e bash protection and trackIdleOnPrompt logging are the real fixes. For eval environments, the model should specify an explicit timeout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vs-code-engineering · 2026-04-29T13:22:23Z

⏳ Queued vscode build for 04eda7d278ea10d520a66dd8cf289e80edf99bf5 (step 1/2).

Build: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=434847
When this succeeds, the eval-assessment publish build will be queued automatically.

meganrogge · 2026-04-29T13:25:28Z

/requires-eval-assessment terminalbench2 gpt-5.4,claude-opus-4.6,claude-opus-4.7

vs-code-engineering · 2026-04-29T14:11:12Z

🚀 Queued eval-assessment publish build for c1548822493ed9e25a5a45b2c2c892be6f191973 (step 2/2).

Pipeline run: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=434870
On success, publishes @vscode/vscode-copilot-evaluation-agent@0.0.0-dev.c154882249 on the dev tag.

vs-code-engineering · 2026-04-29T14:23:32Z

🔬 Queued eval-assessment benchmark for b969e48125.

Package: @vscode/vscode-copilot-evaluation-agent@0.0.0-dev.c154882249 (dev tag)
Benchmark: terminalbench2
Tracking issues:
- gpt-5.4: https://github.com/github/evald/issues/18775
- claude-opus-4.6: https://github.com/github/evald/issues/18776
- claude-opus-4.7: https://github.com/github/evald/issues/18777

Results will be posted back here when the run completes.

vs-code-engineering · 2026-04-29T14:33:19Z

✅ Eval-assessment build published.

Package: @vscode/vscode-copilot-evaluation-agent@0.0.0-dev.c154882249 (tag: dev)
Install: npm install @vscode/vscode-copilot-evaluation-agent@0.0.0-dev.c154882249
Pipeline run: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=434870

vs-code-engineering · 2026-04-29T20:08:19Z

📊 Eval-assessment benchmark complete.

Tracking issue: https://github.com/github/evald/issues/18777
Publish build: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=434870

Analysis Results

Resolution Rate

Benchmark	Total Cases	Passed	Failed	Resolved Rate
terminalbench2	89	54	35	60.67%

Token Usage

Metric	Value
Total Tokens	44,873,263
Input Tokens	44,007,532
Output Tokens	865,731
Cached Tokens	41,681,032

Step Counts

Metric	Value
Total Steps	1,442
Mean Steps/Instance	16.20

vs-code-engineering · 2026-04-29T20:38:27Z

📊 Eval-assessment benchmark complete.

Tracking issue: https://github.com/github/evald/issues/18775
Publish build: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=434870

Analysis Results

Resolution Rate

Benchmark	Total Cases	Passed	Failed	Resolved Rate
terminalbench2	89	47	42	52.81%

Token Usage

Metric	Value
Total Tokens	68,706,378
Input Tokens	67,780,506
Output Tokens	925,872
Cached Tokens	55,488,768

Step Counts

Metric	Value
Total Steps	1,513
Mean Steps/Instance	17.00

vs-code-engineering · 2026-04-29T20:42:45Z

📊 Eval-assessment benchmark complete.

Tracking issue: https://github.com/github/evald/issues/18776
Publish build: https://dev.azure.com/monacotools/Monaco/_build/results?buildId=434870

Analysis Results

Resolution Rate

Benchmark	Total Cases	Passed	Failed	Resolved Rate
terminalbench2	89	54	35	60.67%

Token Usage

Metric	Value
Total Tokens	66,437,478
Input Tokens	65,261,731
Output Tokens	1,175,747
Cached Tokens	54,700,590

Step Counts

Metric	Value
Total Steps	1,988
Mean Steps/Instance	22.34

Copilot AI review requested due to automatic review settings April 28, 2026 19:18

meganrogge self-assigned this Apr 28, 2026

meganrogge added this to the 1.119.0 milestone Apr 28, 2026

Copilot started reviewing on behalf of meganrogge April 28, 2026 19:18 View session

meganrogge requested review from anthonykim1 and vritant24 April 28, 2026 19:19

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Comment thread src/vs/workbench/contrib/terminal/common/scripts/shellIntegration-bash.sh Outdated

Comment thread src/vs/platform/terminal/common/xterm/shellIntegrationAddon.ts Outdated

meganrogge force-pushed the meganrogge/fix-si-flake branch from 736f57f to ce6f69a Compare April 28, 2026 19:31

meganrogge mentioned this pull request Apr 28, 2026

Shell integration intermittently breaks mid-session in agent terminal commands #313074

Closed

meganrogge changed the title ~~terminal: protect bash SI DEBUG trap from set -e and add trace logging~~ terminal: protect bash SI from set -e in PROMPT_COMMAND and add trace logging Apr 28, 2026

meganrogge force-pushed the meganrogge/fix-si-flake branch from ce6f69a to 70a15a1 Compare April 28, 2026 21:49

meganrogge changed the title ~~terminal: protect bash SI from set -e in PROMPT_COMMAND and add trace logging~~ add logging Apr 28, 2026

meganrogge force-pushed the meganrogge/fix-si-flake branch from 70a15a1 to 3cf1ee8 Compare April 28, 2026 21:50

terminal: add shell integration trace logging

4aebd90

meganrogge force-pushed the meganrogge/fix-si-flake branch from 3cf1ee8 to 4aebd90 Compare April 28, 2026 21:51

meganrogge enabled auto-merge (squash) April 28, 2026 21:52

meganrogge changed the title ~~add logging~~ Fix terminal SI flake: set -e protection, foreground safety timeout, trackIdleOnPrompt logging Apr 28, 2026

meganrogge and others added 2 commits April 28, 2026 19:36

Merge branch 'main' into meganrogge/fix-si-flake

04eda7d

meganrogge changed the title ~~Fix terminal SI flake: set -e protection, foreground safety timeout, trackIdleOnPrompt logging~~ Fix terminal SI flake: set -e protection + trackIdleOnPrompt diagnostic logging Apr 28, 2026

meganrogge changed the title ~~Fix terminal SI flake: set -e protection + trackIdleOnPrompt diagnostic logging~~ trackIdleOnPrompt logging Apr 28, 2026

meganrogge added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Apr 29, 2026

Merge branch 'main' into meganrogge/fix-si-flake

be83708

meganrogge removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Apr 29, 2026

vs-code-engineering Bot added the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Apr 29, 2026

vs-code-engineering Bot removed the ~requires-eval-assessment Evals will be run and will generate a report upon completion label Apr 29, 2026

alexr00 approved these changes Apr 29, 2026

View reviewed changes

meganrogge merged commit 8f9e979 into main Apr 29, 2026
26 checks passed

meganrogge deleted the meganrogge/fix-si-flake branch April 29, 2026 15:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trackIdleOnPrompt logging#313080

trackIdleOnPrompt logging#313080
meganrogge merged 5 commits into
mainfrom
meganrogge/fix-si-flake

meganrogge commented Apr 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

meganrogge commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

meganrogge commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Observed behavior

What we don't know

Changes

1. trackIdleOnPrompt diagnostic logging (executeStrategy.ts)

2. Bash SI set -e protection (shellIntegration-bash.sh)

3. Tool description update (runInTerminalTool.ts)

4. Trace log placement fix (shellIntegrationAddon.ts)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

meganrogge commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Uh oh!

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Analysis Results

Resolution Rate

Token Usage

Step Counts

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Analysis Results

Resolution Rate

Token Usage

Step Counts

Uh oh!

vs-code-engineering Bot commented Apr 29, 2026

Analysis Results

Resolution Rate

Token Usage

Step Counts

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

meganrogge commented Apr 28, 2026 •

edited

Loading

1. `trackIdleOnPrompt` diagnostic logging (`executeStrategy.ts`)

2. Bash SI `set -e` protection (`shellIntegration-bash.sh`)

3. Tool description update (`runInTerminalTool.ts`)

4. Trace log placement fix (`shellIntegrationAddon.ts`)