agentHost: narrow Copilot resume fallback to true empty-session errors #319456

Merged: roblourens merged 5 commits into main from agents/vsckb-implement-please-investigate-why-this-sess-56e5215e on Jun 5, 2026

@roblourens roblourens commented Jun 1, 2026

Symptom

Opening an existing Agent Host session sometimes shows an empty chat view, even when the same session opens fine in copilot CLI. The session contents appear lost.

Root cause

CopilotAgent._doResumeSession had a too-broad catch block: any -32603 JSON-RPC error from client.resumeSession was treated as "this session has no messages", and the agent fell back to creating a new session with the same ID via client.createSession({ sessionId }). The user then saw an empty chat instead of the real error.

In the case that prompted this investigation, the real failure was upstream schema validation: a session.compaction_complete event in the session file had tokenDetails[].batchSize: 0, and the installed @github/copilot@1.0.55-3 schema declares exclusiveMinimum: 0 for that field, so the SDK rejected the entire file with -32603 Session file is corrupted (...batchSize: Number must be greater than 0). (Upstream relaxed this to minimum: 0 in @github/copilot@1.0.56, but the underlying issue that the writer can emit 0 while the reader rejects it is independent of our fallback.)
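
The difference between the two schema keywords is small but decisive for a value of 0. A minimal sketch, assuming a hand-rolled bounds check rather than the SDK's actual validator (the NumericBounds type and checkBounds function are hypothetical names used only for illustration):

```typescript
// Hypothetical, simplified stand-in for a JSON Schema numeric check.
// This is NOT the @github/copilot validator; it only illustrates the keywords.
interface NumericBounds {
    minimum?: number;          // inclusive lower bound: value >= minimum
    exclusiveMinimum?: number; // exclusive lower bound: value > exclusiveMinimum
}

function checkBounds(value: number, bounds: NumericBounds): boolean {
    if (bounds.minimum !== undefined && value < bounds.minimum) {
        return false;
    }
    if (bounds.exclusiveMinimum !== undefined && value <= bounds.exclusiveMinimum) {
        return false;
    }
    return true;
}

// batchSize: 0 under the two schema variants described above.
const rejectedByOlderSchema = checkBounds(0, { exclusiveMinimum: 0 }); // false
const acceptedByNewerSchema = checkBounds(0, { minimum: 0 });          // true
```

A writer that emits 0 is therefore only compatible with a reader that uses the inclusive bound.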

Why the fallback exists (don't remove it)

The fallback was introduced (54caaaf3943, "finally works v.v") for the case where the on-disk session legitimately has zero events, most commonly after the user invokes "Start Over", which calls truncateSession and removes every turn. The SDK then refuses to resume an empty session, and the only reasonable recovery is to create a fresh session with the same ID so the user can keep typing into the same chat. This PR preserves that behavior.
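
The recovery flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in copilotAgent.ts: the SessionClient interface, the string return values, and the resumeOrRecreate name are hypothetical, and the decision about which errors are recoverable is passed in as a predicate.

```typescript
// Hypothetical minimal client surface; the real SDK client has more methods
// and different option and return types.
interface SessionClient {
    resumeSession(sessionId: string): Promise<string>;
    createSession(options: { sessionId: string }): Promise<string>;
}

// Resume a session; if resuming fails in a way the caller considers
// recoverable, create a fresh session that reuses the same ID.
async function resumeOrRecreate(
    client: SessionClient,
    sessionId: string,
    isRecoverable: (err: unknown) => boolean,
): Promise<string> {
    try {
        return await client.resumeSession(sessionId);
    } catch (err) {
        if (!isRecoverable(err)) {
            throw err; // surface the real failure to the caller
        }
        return client.createSession({ sessionId });
    }
}

// Fake client simulating a session whose turns were all removed by "Start Over".
const truncatedClient: SessionClient = {
    resumeSession: async () => { throw new Error('returned no events'); },
    createSession: async ({ sessionId }) => `fresh:${sessionId}`,
};
```

With a predicate that accepts the error, resumeOrRecreate(truncatedClient, 'abc', () => true) resolves to the freshly created session; with a predicate that rejects it, the original error propagates.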

Fix

Invert the heuristic. Treat any -32603 from resumeSession as the empty-session case (preserving Start Over and any future SDK rewording of the empty-session message) unless the message clearly indicates corruption / schema validation / parse failure / malformed input; those propagate so the user sees the real error rather than silently getting an empty session.

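function shouldCreateEmptySessionAfterResumeError(err: unknown): boolean {
    // Only the generic -32603 JSON-RPC error is eligible for the fallback.
    if (getCopilotSdkErrorCode(err) !== -32603) {
        return false;
    }
    const message = getErrorMessage(err);
    // Corruption, validation, and parse failures must propagate; any other message is treated as an empty session.
    return !/\b(corrupt|corrupted|invalid|validation|schema|must be|parse|malformed|unexpected token)\b/i.test(message);
}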
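
To see how the predicate classifies concrete errors, here is a self-contained sketch. The two helper implementations are assumptions (the real getCopilotSdkErrorCode and getErrorMessage in copilotAgent.ts may inspect errors differently), and the sample error objects and their message texts are illustrative only.

```typescript
// Assumed error shape: an object with a numeric `code` and a string `message`.
// These helper bodies are hypothetical stand-ins for the real helpers.
function getCopilotSdkErrorCode(err: unknown): number | undefined {
    if (typeof err === 'object' && err !== null && 'code' in err) {
        const code = (err as { code: unknown }).code;
        return typeof code === 'number' ? code : undefined;
    }
    return undefined;
}

function getErrorMessage(err: unknown): string {
    if (err instanceof Error) {
        return err.message;
    }
    if (typeof err === 'object' && err !== null && 'message' in err) {
        return String((err as { message: unknown }).message);
    }
    return String(err);
}

function shouldCreateEmptySessionAfterResumeError(err: unknown): boolean {
    if (getCopilotSdkErrorCode(err) !== -32603) {
        return false;
    }
    const message = getErrorMessage(err);
    return !/\b(corrupt|corrupted|invalid|validation|schema|must be|parse|malformed|unexpected token)\b/i.test(message);
}

// Illustrative sample errors.
const startOver = { code: -32603, message: 'Session returned no events' };
const corrupted = { code: -32603, message: 'Session file is corrupted (batchSize: Number must be greater than 0)' };
const otherCode = { code: -32000, message: 'Session returned no events' };

const results = [startOver, corrupted, otherCode].map(shouldCreateEmptySessionAfterResumeError);
// results: [true, false, false]
```

Because the alternatives are wrapped in \b word boundaries, only whole words match: "corrupted" is listed separately from "corrupt", and an inflected form such as "parsing" would not be caught by "parse".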

Out of scope

The 1.0.56 bump fixes the specific batchSize: 0 schema issue but is a 70-file delta (CLI JS, schemas, native prebuilds, ripgrep, MCP behavior changes) that we don't want to ship the day before a release without smoke testing. If a runtime upgrade is desired, it should be its own PR.

Tests

Three regression tests in a new _resumeSession fallback suite in src/vs/platform/agentHost/test/node/copilotAgent.test.ts:

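Start Over / truncated session (realistic 'returned no events' message): fallback fires, new session is created with the same id.
Unknown -32603 message: fallback fires (defensive default).
Corrupted session (schema validation failure): fallback does NOT fire, the error propagates.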

The tests exercise the real _doResumeSession path via a small ResumePathCopilotAgent subclass (the existing TestableCopilotAgent bypasses it).

(Written by Copilot)

The catch block in CopilotAgent._doResumeSession converted *any* -32603
JSON-RPC error from client.resumeSession into a fresh session created with
the same id via client.createSession({ sessionId }). When the underlying
session file fails schema validation (e.g. a session.compaction_complete
event with batchSize: 0, rejected by @github/copilot's schema), this masked
the corruption: the user saw an empty chat instead of an error, and the
original session contents were not surfaced.

Narrow the fallback so we only recreate an empty session when the error
message clearly indicates 'no messages / empty session', and never when the
message looks like corruption / validation / parse failure. All other
-32603s now propagate so the UI and logs reflect the real failure.

Add regression tests that exercise the real _doResumeSession path via a
ResumePathCopilotAgent subclass: one confirming the empty-session fallback
still works, one confirming a corrupted-session error is no longer swallowed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 21:28

Copilot AI left a comment

Pull request overview

Narrows the Copilot resume-session fallback so an empty session is only recreated when the SDK error message clearly indicates "no messages"/"empty session", and never when it indicates corruption/validation/parse failures. This prevents users with a corrupted session file (e.g. batchSize: 0 schema validation rejection) from silently being shown an empty new session under the same ID instead of the real error.

Changes:

  • Add getCopilotSdkErrorCode, getErrorMessage, and shouldCreateEmptySessionAfterResumeError helpers and use them in _doResumeSession's catch path.
  • Replace the broad errCode !== -32603 guard with the narrower message-based check.
  • Add a _resumeSession fallback test suite exercising the real resume path via a new ResumePathCopilotAgent test subclass and a useRealResumePath option.
Summary per file:

  • src/vs/platform/agentHost/node/copilot/copilotAgent.ts: Introduces the SDK-error inspection helpers and narrows the resume fallback condition.
  • src/vs/platform/agentHost/test/node/copilotAgent.test.ts: Adds regression tests covering both the empty-session fallback and the corrupted-session propagation, with a ResumePathCopilotAgent to drive the real _doResumeSession.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

The previous narrowing was too aggressive: it required the SDK error
message to match a small whitelist ('no messages', 'empty session', ...)
to trigger the fallback. But the legitimate empty-session case this
fallback exists for is post-'Start Over' / post-truncateSession, where
the SDK currently surfaces a generic 'no events' message that doesn't
match any of those phrases. The previous version would have regressed
Start Over (user truncates all turns, reopens session, gets a hard
error instead of an empty chat).

Invert the heuristic: treat any -32603 from resumeSession as the
empty-session case (preserving Start Over and any future SDK rewording)
UNLESS the message clearly indicates corruption / schema validation /
parse failure / malformed input; those should propagate so the user
sees the real error rather than silently getting an empty session.

Refresh tests: add a Start Over / truncated-session test using a
realistic 'returned no events' message, plus an 'unknown -32603'
defensive test; keep the corruption test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
roblourens and others added 3 commits June 1, 2026 17:07
PR #312679 (folding markers pattern+flags syntax) removed a doc line
from FoldingMarkers in languageConfiguration.ts but didn't regenerate
src/vs/monaco.d.ts. Picking up the regeneration so the monaco-d.ts
check passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t-please-investigate-why-this-sess-56e5215e

# Conflicts:
#	src/vs/platform/agentHost/node/copilot/copilotAgent.ts
@roblourens roblourens marked this pull request as ready for review June 5, 2026 15:34
@roblourens roblourens enabled auto-merge (squash) June 5, 2026 15:34
@roblourens roblourens merged commit 401bbd9 into main Jun 5, 2026
25 checks passed
@roblourens roblourens deleted the agents/vsckb-implement-please-investigate-why-this-sess-56e5215e branch June 5, 2026 16:22
@vs-code-engineering vs-code-engineering Bot added this to the 1.124.0 milestone Jun 5, 2026