Skip to content

Agent host: clearer worktree git timeout errors and 60s budget#318242

Merged
roblourens merged 2 commits into
mainfrom
agents/vsckb-explore-look-at-the-logs-in-users-432a75fe
May 26, 2026
Merged

Agent host: clearer worktree git timeout errors and 60s budget#318242
roblourens merged 2 commits into
mainfrom
agents/vsckb-explore-look-at-the-logs-in-users-432a75fe

Conversation

@roblourens
Copy link
Copy Markdown
Member

@roblourens roblourens commented May 25, 2026

Problem

A user reported createSession failing on an SSH remote with this misleading error in the side-effects log:

[AgentSideEffects] sendMessage failed for session=copilotcli:/537534a9...:
code=undefined,
message=Preparing worktree (new branch 'agents/vsckb-implement-please-investigate-...')
Updating files:   0% (7/14834) ... Updating files:   1% (149/14834)

The remote (a Mac Air) happened to be under heavy disk-I/O contention from many concurrent npm install runs in other worktrees, so git worktree add only managed 149 of 14,834 files in 30s and got killed by the timeout in AgentHostGitService.addWorktree.

Two underlying bugs surfaced:

  1. Error reporting was misleading. In _runGit, the timeout path did reject(new Error(stderr || error.message)). When git was killed before producing any diagnostic stderr, the surfaced error was just the progress meter, with code=undefined and no hint of what really happened.
  2. 30s is tight for worktree ops on a large repo when the remote is under load. Normally sub-second, but no headroom for transient I/O storms.

Changes

  • All three worktree operations (addWorktree, addExistingWorktree, removeWorktree --force) now use a 60s timeout instead of 30s. All three can do real filesystem work proportional to the worktree's content, so they share the same bumping consistently rather than leaving removeWorktree at 30s.rationale

  • _runGit: replaced execFile's built-in timeout option with our own setTimeout that flips a didTimeOut flag before calling child.kill(). The flag lets us definitively distinguish a timeout from other kill paths (rather than "killed && signal, likely a timeout"). The timer is cleared via child.on('exit').

  • New exported helper formatGitError: classifies the failure (timeout / signal / exit code / other) and produces messages like:

    • git worktree timed out after 60000ms: Updating files: 0% (149/14834)
    • git worktree exited with code 128: fatal: invalid reference: missing-branch
    • spawn git ENOENT

    The underlying ExecFileException is preserved as Error.cause so downstream consumers can still inspect it.

  • summarizeStderrForError helper: git progress output is full of \r-separated repaints (e.g. Updating files: 0% (7/14834)\rUpdating files: 0% (149/14834)\r). Stripping naively still left a noisy multi-line blob in the error. The helper now keeps only the last non-empty line and caps at 200 chars, so the error message is a clean one-liner.

  • Full stderr is still logged. Since the thrown error message is summarized, _runGit now logs the complete unmodified stderr at warn level via ILogService on any git failure, so no diagnostic context is lost.

  • Unit tests for formatGitError (four classification branches) and summarizeStderrForError (empty / whitespace / multi-line progress / normal one-liner / truncation).

After this change, the same failure would be surfaced as a one-liner that immediately says "timed out after diagnosable from the side-effects log without log archaeology, with the full raw stderr available in the agent-host log if deeper inspection is needed.60000ms"

(Written by Copilot)

The 30s timeout in addWorktree/addExistingWorktree/removeWorktree was
fine under normal conditions but bumped to 60s to absorb transient
disk-I/O contention on the remote (e.g. many concurrent npm installs
across multiple agent sessions).

When git timed out, _runGit re-threw `new Error(stderr || error.message)`,
which lost the timeout indicator. For `git worktree add`, stderr only
contains git's progress meter (`Updating files: 0% (149/14834)`), so
the surfaced error looked like git progress instead of a timeout.

Now _runGit uses its own timer to flag the timeout case definitively
and a new formatGitError helper produces messages like:

  git worktree timed out after 60000ms: <stderr tail>
  git worktree exited with code 128: fatal: invalid reference: ...

The underlying ExecFileException is preserved as Error.cause.

(Written by Copilot)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 25, 2026 18:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves agent-host git worktree operations by increasing the timeout budget and making git timeout failures diagnosable (rather than surfacing git progress output as the error), with unit tests covering the new error formatting helper.

Changes:

  • Increased git worktree operation timeouts from 30s to 60s.
  • Reworked _runGit to use a custom timeout timer so timeouts can be reliably distinguished from other kill paths.
  • Added formatGitError helper (exported for tests) plus unit tests covering timeout/signal/exit-code/fallback formatting.
Show a summary per file
File Description
src/vs/platform/agentHost/node/agentHostGitService.ts Increases worktree timeouts; adds custom timeout handling; introduces formatGitError and uses Error.cause to preserve the original ExecFileException.
src/vs/platform/agentHost/test/node/agentHostGitService.test.ts Adds unit tests for formatGitError classification branches.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment thread src/vs/platform/agentHost/node/agentHostGitService.ts
Comment thread src/vs/platform/agentHost/node/agentHostGitService.ts
Comment thread src/vs/platform/agentHost/node/agentHostGitService.ts
- Use child.on('exit') to clear the timeout, so 'timer' is no longer
  referenced in the execFile callback before its declaration.
- Add summarizeStderrForError() to squash carriage-return-heavy progress
  meter output into a single short line for the thrown error message,
  with a 200-char cap.
- Log the full unmodified stderr at warn level via ILogService whenever
  git exits with an error, so the raw output is still available even
  though the thrown error message is summarized.
- Add unit tests for summarizeStderrForError and update the existing
  formatGitError timeout test to exercise multi-line input.

(Written by Copilot)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@roblourens roblourens marked this pull request as ready for review May 25, 2026 20:26
@roblourens roblourens enabled auto-merge (squash) May 25, 2026 20:26
@roblourens roblourens merged commit d9eb8ee into main May 26, 2026
25 checks passed
@roblourens roblourens deleted the agents/vsckb-explore-look-at-the-logs-in-users-432a75fe branch May 26, 2026 00:51
@vs-code-engineering vs-code-engineering Bot added this to the 1.122.0 milestone May 26, 2026
anthonykim1 added a commit that referenced this pull request May 26, 2026
Squashed cherry-pick of 10 commits from main that are included in the
Insiders build (183159e) people are verifying:

- agentHost: show fetched URL for web_fetch (#318240)
- Fix SSH remote agent host passphrase auth (#318244)
- agentHost: add setting to disable worktreeCreated task auto-dispatch (#318243)
- Agent host: clearer worktree git timeout errors and 60s budget (#318242)
- Normalize LF to CRLF in agent host terminal tool output (#318257)
- sessions: restore X-button removal of SSH remote agent host entries (#318262)
- chat: fix duplicate command registration for agent-host-copilotcli (#318273)
- launch: build copilot in compile; wait for CDP in launch.sh (#318272)
- Preserve unread state across remote host disconnect (#318267)
- Add more codenotify for terminal (#318285)
dileepyavan pushed a commit that referenced this pull request May 27, 2026
Squashed cherry-pick of 10 commits from main that are included in the
Insiders build (183159e) people are verifying:

- agentHost: show fetched URL for web_fetch (#318240)
- Fix SSH remote agent host passphrase auth (#318244)
- agentHost: add setting to disable worktreeCreated task auto-dispatch (#318243)
- Agent host: clearer worktree git timeout errors and 60s budget (#318242)
- Normalize LF to CRLF in agent host terminal tool output (#318257)
- sessions: restore X-button removal of SSH remote agent host entries (#318262)
- chat: fix duplicate command registration for agent-host-copilotcli (#318273)
- launch: build copilot in compile; wait for CDP in launch.sh (#318272)
- Preserve unread state across remote host disconnect (#318267)
- Add more codenotify for terminal (#318285)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants