agentHost: add diagnostic tracing for session disconnect errors #312078

Merged: osortega merged 3 commits into main from osortega/session-disconnect-tracing on Apr 23, 2026

osortega commented on Apr 23, 2026

Problem

After a WebSocket reconnect (e.g. tab hidden → server kills socket → tab visible), sending a message to an existing agent session fails with:

Error: (sendFailed) Error: Request session.send failed with message: Session not found: <session-id>

The root cause is unclear — multiple theories exist but none are proven:

  1. Stale in-memory SDK session reused after clientId change (isOutdated doesn't compare clientId)
  2. SDK resumeSession failing with an error code other than -32603
  3. SDK session expiring independently of the agent host lifecycle

Changes

Diagnostic logging (4 files)

Adds logging at every decision point in the session lifecycle to determine the exact failure path:

  • copilotAgent.ts: sendMessage logs the cached entry status, the isOutdated result, and the full error code, message, and type on failure. setClientTools logs the clientId and whether a cached SDK session exists. _resumeSession logs the SDK resumeSession/createSession calls and their results.
  • agentSideEffects.ts: logs the error code and type on sendMessage failure (previously only the raw error was logged).
  • protocolServerHandler.ts: logs the subscription count on client disconnect.
  • remoteAgentHost.contribution.ts: logs the old and new clientId on reconnect.
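
As an illustration of the "error code and type" logging described for sendMessage failures, the following is a minimal sketch and not the code from this PR. The helper names describeError and formatSendFailure are hypothetical, and it assumes the SDK attaches a code property to the thrown Error:

```typescript
// Hypothetical helpers for illustration; not the PR's actual implementation.
interface ErrorSummary {
	code: number | string | undefined;
	type: string;
	message: string;
}

// Extracts the fields worth logging from an arbitrary thrown value.
function describeError(err: unknown): ErrorSummary {
	if (err instanceof Error) {
		// Assumption: RPC-style errors carry an optional `code` property.
		const code = (err as Error & { code?: number | string }).code;
		return { code, type: err.constructor.name, message: err.message };
	}
	// Non-Error values (strings, plain objects) are stringified as-is.
	return { code: undefined, type: typeof err, message: String(err) };
}

// Builds a log line in the same shape as the trace shown in this PR description.
function formatSendFailure(sessionId: string, err: unknown): string {
	const { code, type, message } = describeError(err);
	return `[Copilot:${sessionId}] entry.send() failed: code=${code}, type=${type}, message=${message}`;
}
```

Logging the code and type separately from the message makes it possible to tell a -32603 failure apart from any other failure without parsing the message text.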

What the logs will tell us

When the error reproduces, the sequence will reveal:

[ProtocolServer] Client disconnected: <old-id>, subscriptions=N
[RemoteAgentHost] Reconnecting: oldClientId=X, newClientId=Y
[Copilot:<session>] setClientTools: clientId=Y, hasCachedSdkSession=true/false
[Copilot:<session>] sendMessage: cachedEntry=true/false, hasActiveClient=true/false
[Copilot:<session>] entry.send() failed: code=<X>, message=...

This shows whether the failure follows theory 1 (a stale cached entry is reused after the clientId changes) or a different path.
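
To make the reading of that sequence concrete, here is a rough sketch of how a captured trace could be classified. The function classifyTrace and the Verdict type are hypothetical, and the sketch assumes the real log lines match the literal shapes shown above:

```typescript
// Hypothetical trace classifier; assumes log lines match the shapes in the PR description.
type Verdict = 'stale-cached-entry' | 'other-path' | 'inconclusive';

function classifyTrace(lines: readonly string[]): Verdict {
	let clientChanged = false;
	let cachedEntry: boolean | undefined;
	let sendFailed = false;

	for (const line of lines) {
		// e.g. "[RemoteAgentHost] Reconnecting: oldClientId=X, newClientId=Y"
		const reconnect = /oldClientId=([^,\s]+),\s*newClientId=([^,\s]+)/.exec(line);
		if (reconnect) {
			clientChanged = reconnect[1] !== reconnect[2];
			continue;
		}
		// e.g. "[Copilot:<session>] sendMessage: cachedEntry=true, hasActiveClient=true"
		const send = /sendMessage: cachedEntry=(true|false)/.exec(line);
		if (send) {
			cachedEntry = send[1] === 'true';
			continue;
		}
		if (line.includes('entry.send() failed')) {
			sendFailed = true;
		}
	}

	// Without both a send attempt and a failure, the trace proves nothing.
	if (!sendFailed || cachedEntry === undefined) {
		return 'inconclusive';
	}
	return clientChanged && cachedEntry ? 'stale-cached-entry' : 'other-path';
}
```

A 'stale-cached-entry' verdict would support theory 1; 'other-path' points toward the resumeSession or session-expiry theories, which the error code in the final log line helps separate.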

Add logging at key decision points to diagnose 'Session not found'
errors after WebSocket reconnect. The logs will reveal:

- Whether the stale in-memory SDK session is reused (cachedEntry=true)
- Whether isOutdated correctly detects clientId changes
- The exact error code from the SDK when send() fails
- Client disconnect/reconnect lifecycle with old/new clientIds

Also adds a retry path in sendMessage: if entry.send() fails with
code -32603, dispose the stale entry and retry via _resumeSession.
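
The retry path might look roughly like the sketch below. It is not the PR's implementation: SessionEntry, sendWithRetry, and the resumeSession callback are hypothetical stand-ins for the real agent host types. The only outside fact used is that -32603 is the "Internal error" code in JSON-RPC 2.0; that the SDK reports a missing session with this code is taken from the description above.

```typescript
// Hypothetical stand-ins for the real agent host types.
const INTERNAL_ERROR = -32603; // JSON-RPC 2.0 "Internal error"

interface SessionEntry {
	send(prompt: string): Promise<void>;
	dispose(): void;
}

async function sendWithRetry(
	sessions: Map<string, SessionEntry>,
	sessionId: string,
	prompt: string,
	resumeSession: (id: string) => Promise<SessionEntry>,
): Promise<void> {
	// Reuse the cached entry if there is one, otherwise resume the session.
	const entry = sessions.get(sessionId) ?? await resumeSession(sessionId);
	sessions.set(sessionId, entry);
	try {
		await entry.send(prompt);
	} catch (err) {
		const code = (err as { code?: unknown } | undefined)?.code;
		if (code !== INTERNAL_ERROR) {
			throw err; // unrelated failure: do not retry
		}
		// The cached entry is considered stale: drop it and retry once.
		entry.dispose();
		sessions.delete(sessionId);
		const fresh = await resumeSession(sessionId);
		sessions.set(sessionId, fresh);
		await fresh.send(prompt); // a second failure propagates to the caller
	}
}
```

Retrying exactly once keeps a persistent failure visible to the caller instead of looping on a session that cannot be resumed.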

Also adds clientId comparison to ActiveClient.isOutdated() so the SDK
session is proactively refreshed when a different client takes over.
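
A minimal sketch of the clientId comparison follows. The real ActiveClient class is not shown in this conversation, so the class name ActiveClientSketch, the ClientToolsState shape, and the tool-list comparison are all assumptions made for illustration:

```typescript
// Hypothetical shape; the real ActiveClient in copilotAgent.ts may differ.
interface ClientToolsState {
	clientId: string;
	toolNames: readonly string[];
}

class ActiveClientSketch {
	private readonly state: ClientToolsState;

	constructor(state: ClientToolsState) {
		this.state = state;
	}

	// Returns true when the SDK session should be refreshed before the next send.
	isOutdated(current: ClientToolsState): boolean {
		// The added check: a different client taking over always invalidates the session.
		if (current.clientId !== this.state.clientId) {
			return true;
		}
		// Assumed pre-existing check: the set of client tools changed.
		if (current.toolNames.length !== this.state.toolNames.length) {
			return true;
		}
		return current.toolNames.some((name, i) => name !== this.state.toolNames[i]);
	}
}
```

With the clientId check first, a reconnect that produces a new clientId marks the session outdated even when the tool list is unchanged, which is the case theory 1 describes.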
Copilot AI review requested due to automatic review settings April 23, 2026 04:44
Copilot AI left a comment


Pull request overview

This PR adds diagnostic tracing across the agent host session lifecycle to help pinpoint the cause of “Session not found” failures after reconnects, by emitting structured logs around reconnect/disconnect, session caching, resume/create fallbacks, and send failures.

Changes:

  • Add detailed lifecycle and failure-path logging in CopilotAgent around setClientTools, sendMessage, and _resumeSession.
  • Improve disconnect logging to include subscription counts and reconnect logging to include old/new client IDs.
  • Enhance AgentSideEffects send failure logs to include error code/type/message.
| File | Description |
| --- | --- |
| src/vs/sessions/contrib/remoteAgentHost/browser/remoteAgentHost.contribution.ts | Logs old/new clientId and whether name changed when reconnecting a contribution. |
| src/vs/platform/agentHost/node/protocolServerHandler.ts | Logs subscription count when a protocol client disconnects. |
| src/vs/platform/agentHost/node/copilot/copilotAgent.ts | Adds detailed logs around session caching, resumeSession/createSession behavior, and send failures. |
| src/vs/platform/agentHost/node/agentSideEffects.ts | Logs sendMessage failures with code/type/message context. |

Copilot's findings

Comments suppressed due to low confidence (1)

src/vs/platform/agentHost/node/copilot/copilotAgent.ts:972

  • SDK resumeSession failed is logged without including the thrown error object, which can hide stack traces and any additional fields the SDK attaches. Pass err as an additional logger argument (as is done for getSessionMetadata failed) so the diagnostic output remains actionable.
		const snapshot = activeClient ? await activeClient.snapshot() : undefined;
		const storedMetadata = await this._readSessionMetadata(sessionUri);
		const sessionMetadata = await client.getSessionMetadata(sessionId).catch(err => {
			this._logService.warn(`[Copilot:${sessionId}] getSessionMetadata failed`, err);
			return undefined;
  • Files reviewed: 4/4 changed files
  • Comments generated: 5

github-actions Bot commented Apr 23, 2026

Screenshot Changes

Base: 9932ad01 Current: 46f0b07b

Changed (2)

  • chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Light (before/after screenshots)
  • editor/inlineCompletions/other/JumpToHint/Dark (before/after screenshots)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@osortega osortega enabled auto-merge (squash) April 23, 2026 04:51
@osortega osortega merged commit a76a61c into main Apr 23, 2026
26 checks passed
@osortega osortega deleted the osortega/session-disconnect-tracing branch April 23, 2026 05:12
@vs-code-engineering vs-code-engineering Bot added this to the 1.118.0 milestone Apr 23, 2026