…h short-circuit, host-online resume, session cleanup
Transport layer:
- Add malformed-frame detection across all three transports (WebSocketClientTransport, TunnelConnectionTransport, TunnelRelayTransport): log a warning for the first 5 per connection, and force-close the transport after more than 10 to trigger the reconnect loop instead of silently dropping corrupt data. Shared constants live in the new transportConstants.ts.
Reconnect logic (tunnelAgentHost.contribution.ts):
- Auth-error short-circuit: authExpired/auth errors immediately pause reconnects instead of burning 10 retry slots; resume is driven by onDidChangeSessions.
- Host-online auto-resume: _silentStatusCheck detects when a host-offline-paused tunnel comes back online and auto-resumes without needing a wake/visibility event.
- Session-removal cleanup: react to GitHub session removal by tearing down matching tunnel state and attempting a best-effort disconnect.
- Richer _categorizeError: distinguish authExpired (401/403/token expired) from generic auth, and add ECONN/ENOTFOUND/ETIMEDOUT to the network category.
Telemetry:
- Add authExpired to the TunnelConnectErrorCategory and TunnelConnectFailureReason types.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
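The per-connection malformed-frame policy in the transport-layer bullet (warn for the first 5, force-close after more than 10) can be sketched as a small counter. This is an illustrative sketch only: `MalformedFrameTracker` and both constant names are hypothetical and are not the contents of `transportConstants.ts`; the `warn` and `forceClose` callbacks stand in for each transport's own logger and close logic.

```typescript
// Hypothetical constant names; the PR only states the thresholds (5 and 10).
const MALFORMED_FRAME_LOG_LIMIT = 5;
const MALFORMED_FRAME_CLOSE_THRESHOLD = 10;

/** Tracks malformed JSON frames for a single connection. */
class MalformedFrameTracker {
	private count = 0;
	private closed = false;
	private readonly warn: (message: string) => void;
	private readonly forceClose: () => void;

	constructor(warn: (message: string) => void, forceClose: () => void) {
		this.warn = warn;
		this.forceClose = forceClose;
	}

	/** Parses a frame; records the failure and returns undefined when the JSON is malformed. */
	parse(raw: string): unknown {
		try {
			return JSON.parse(raw);
		} catch (err) {
			this.count++;
			// Only the first few failures are logged, so a corrupt stream cannot flood the log.
			if (this.count <= MALFORMED_FRAME_LOG_LIMIT) {
				this.warn(`Malformed frame #${this.count}: ${String(err)}`);
			}
			// Past the threshold, close exactly once so the reconnect loop takes over.
			if (this.count > MALFORMED_FRAME_CLOSE_THRESHOLD && !this.closed) {
				this.closed = true;
				this.forceClose();
			}
			return undefined;
		}
	}
}
```

Creating one tracker per connection gives the "per connection" reset without any extra bookkeeping.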
Pull request overview
This PR hardens the tunnel-based remote agent host connection flow in the Agents window by surfacing malformed protocol frames, improving reconnect behavior around auth/host-offline states, and extending telemetry to distinguish auth-expired failures.
Changes:
- Add malformed JSON frame detection/logging and forced-close thresholds across transports (shared constants in `transportConstants.ts`).
- Short-circuit reconnect on auth/authExpired failures and resume on GitHub session changes; add host-online auto-resume for host-offline pauses.
- Extend tunnel connect telemetry categories/reasons to include `authExpired`.
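The pause/resume behavior summarized above can be sketched as per-tunnel bookkeeping, assuming each pause is recorded with its reason: auth failures pause instead of scheduling a retry, a fresh GitHub session resumes only auth-paused tunnels, and a successful status check resumes only host-offline ones. `ReconnectPauses`, its methods, and the `'hostOffline'`, `'network'` and `'other'` names are hypothetical; only `auth` and `authExpired` come from the PR.

```typescript
// 'auth' and 'authExpired' come from the PR; the other names are illustrative.
type ErrorCategory = 'auth' | 'authExpired' | 'network' | 'other';
type PauseReason = 'auth' | 'authExpired' | 'hostOffline';

class ReconnectPauses {
	private readonly paused = new Map<string, PauseReason>();

	/** Auth failures pause immediately instead of consuming retry slots. */
	onConnectFailure(tunnelId: string, category: ErrorCategory): 'paused' | 'retry' {
		if (category === 'auth' || category === 'authExpired') {
			// Record the actual category rather than always 'authExpired'.
			this.paused.set(tunnelId, category);
			return 'paused';
		}
		return 'retry';
	}

	pauseForHostOffline(tunnelId: string): void {
		this.paused.set(tunnelId, 'hostOffline');
	}

	isPaused(tunnelId: string): boolean {
		return this.paused.has(tunnelId);
	}

	/** A fresh GitHub session resumes only tunnels paused for auth reasons. */
	onSessionAdded(): string[] {
		const resumed: string[] = [];
		this.paused.forEach((reason, id) => {
			if (reason === 'auth' || reason === 'authExpired') {
				resumed.push(id);
			}
		});
		for (const id of resumed) {
			this.paused.delete(id);
		}
		return resumed;
	}

	/** A background status check saw the host online again. */
	onHostOnline(tunnelId: string): boolean {
		if (this.paused.get(tunnelId) !== 'hostOffline') {
			return false;
		}
		this.paused.delete(tunnelId);
		return true;
	}
}
```

Keeping the reason alongside the pause is what lets each resume trigger touch only the tunnels it can actually help.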
| File | Description |
|---|---|
| src/vs/sessions/contrib/remoteAgentHost/browser/webTunnelAgentHostService.ts | Adds malformed-frame counting/logging + forced-close for the web tunnel connection transport. |
| src/vs/sessions/contrib/remoteAgentHost/browser/tunnelAgentHost.contribution.ts | Improves reconnect pausing/resuming for auth failures, session removal, and host-offline recovery; adds rate-limiting constant. |
| src/vs/sessions/common/sessionsTelemetry.ts | Extends tunnel connect telemetry types with authExpired. |
| src/vs/platform/agentHost/electron-browser/tunnelRelayTransport.ts | Adds malformed-frame handling + forced disconnect for relay IPC transport. |
| src/vs/platform/agentHost/common/transportConstants.ts | Introduces shared malformed-frame thresholds for consistent behavior across transports. |
| src/vs/platform/agentHost/browser/webSocketClientTransport.ts | Adds malformed-frame handling + forced close for direct WebSocket transport. |
Copilot's findings
Comments suppressed due to low confidence (1)
src/vs/sessions/contrib/remoteAgentHost/browser/tunnelAgentHost.contribution.ts:623
- _resumeReconnects currently applies the same rate-limit to the 'sessionAdded' trigger. If a user signs in shortly after a wake/visibility resume, the session-added resume can be dropped, leaving auth-paused tunnels stuck until another wake/visibility event. Consider bypassing the rate-limit for 'sessionAdded' (or using a separate timestamp) so auth refresh reliably restarts reconnects immediately.
```typescript
private _resumeReconnects(trigger: 'wake' | 'visible' | 'sessionAdded'): void {
	if (!this._configurationService.getValue<boolean>(RemoteAgentHostsEnabledSettingId)) {
		return;
	}
	// Rate-limit rapid wake/visibility events (e.g. alt-tab bursts or
	// flaky Wi-Fi toggling online/offline) so we don't hammer the relay
	// with immediate retries. This is an event-smoothing gate, not an
	// error-backoff; that's handled by `_scheduleReconnect`.
	const now = Date.now();
	if (now - this._lastResumeAt < RESUME_RATE_LIMIT_MS) {
		return;
	}
	// … (remainder of the method is not shown in the review excerpt)
}
```
- Files reviewed: 6/6 changed files
- Comments generated: 3
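The low-confidence review comment above suggests letting the `'sessionAdded'` trigger bypass the resume rate limit. A minimal sketch of that idea follows, with the clock passed in so the behavior is easy to check. `ResumeGate` is hypothetical, and the 5-second value used for `RESUME_RATE_LIMIT_MS` is an assumption; the excerpt does not show the real value.

```typescript
type ResumeTrigger = 'wake' | 'visible' | 'sessionAdded';

// Assumed value for illustration; the real constant's value is not shown in the excerpt.
const RESUME_RATE_LIMIT_MS = 5_000;

class ResumeGate {
	private lastResumeAt = Number.NEGATIVE_INFINITY;

	/** Returns true if a resume should proceed for this trigger at time `now` (ms). */
	shouldResume(trigger: ResumeTrigger, now: number): boolean {
		// An auth refresh is a discrete user action, not an event burst, so it skips
		// the smoothing gate and does not consume the wake/visibility window.
		if (trigger === 'sessionAdded') {
			return true;
		}
		if (now - this.lastResumeAt < RESUME_RATE_LIMIT_MS) {
			return false;
		}
		this.lastResumeAt = now;
		return true;
	}
}
```

The reviewer's alternative, a separate timestamp for `'sessionAdded'`, would fit better if auth refreshes themselves ever needed smoothing.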
- _categorizeError: remove \btoken\b from the auth regex to avoid matching 'connection token' protocol errors. Use 'auth.*(fail|error|invalid)' instead, which catches real auth failures without over-matching.
- _silentStatusCheck: pass 'github' authProvider to cacheTunnel() so auto-discovered tunnels are properly matched by _handleSessionsChange for teardown on session removal.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
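The effect of the regex change can be illustrated with a simplified categorizer. The patterns below are assumptions modeled on the two commit messages (401/403 and expired tokens for `authExpired`, `auth.*(fail|error|invalid)` for `auth`, `ECONN`/`ENOTFOUND`/`ETIMEDOUT` for network), not the PR's exact expressions. The point is that a protocol error mentioning a "connection token" no longer lands in an auth bucket.

```typescript
type ErrorCategory = 'authExpired' | 'auth' | 'network' | 'other';

// Illustrative patterns; the PR's actual regexes may differ.
const AUTH_EXPIRED_RE = /\b(401|403)\b|token.*expired|expired.*token/i;
// Requires "auth" followed by a failure word, so a bare "token" no longer matches.
const AUTH_RE = /auth.*(fail|error|invalid)/i;
const NETWORK_RE = /ECONN|ENOTFOUND|ETIMEDOUT/;

function categorizeError(message: string): ErrorCategory {
	if (AUTH_EXPIRED_RE.test(message)) {
		return 'authExpired';
	}
	if (AUTH_RE.test(message)) {
		return 'auth';
	}
	if (NETWORK_RE.test(message)) {
		return 'network';
	}
	return 'other';
}
```

Checking `authExpired` before `auth` matters: an expired-token message would otherwise be filed under the generic category.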
- Pass actual errorCategory ('auth'|'authExpired') to _pauseReconnect
instead of always using 'authExpired'. Add 'auth' to
TunnelConnectFailureReason type.
- Update telemetry classification comments to list all current enum
members (authExpired for errorCategory, auth/authExpired for
failureReason).
- Log actual data type (ArrayBuffer/Blob) and byte length for non-string
WebSocket frames instead of coercing to empty string.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
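For the logging change in the last bullet, a helper along these lines would describe a non-string frame by its type and size instead of coercing it to an empty string. `describeFrame` is a hypothetical name, and `Blob` is detected structurally (via a numeric `size`) so the sketch also compiles without DOM typings; browser code could use `instanceof Blob` instead.

```typescript
/** Summarizes a WebSocket message payload for a log line without coercing binary data to ''. */
function describeFrame(data: unknown): string {
	if (typeof data === 'string') {
		return `string (${data.length} chars)`;
	}
	if (data instanceof ArrayBuffer) {
		return `ArrayBuffer (${data.byteLength} bytes)`;
	}
	// Structural stand-in for `data instanceof Blob`.
	if (typeof data === 'object' && data !== null && typeof (data as { size?: unknown }).size === 'number') {
		return `Blob (${(data as { size: number }).size} bytes)`;
	}
	// Anything else is reported by its JavaScript type name.
	return typeof data;
}
```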
benvillalobos approved these changes on Apr 21, 2026.
Summary
Improves tunnel agent host connection reliability with targeted fixes for specific failure modes identified during a deep comparison of the vscode.dev/agents connection stack with that of codespaces-web.
Companion PR: microsoft/vscode-dev#1410 (server-side structured close codes + SDK client pool)
Changes
Transport malformed-frame hardening
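All three transports (`WebSocketClientTransport`, `TunnelConnectionTransport`, `TunnelRelayTransport`) now detect and surface malformed JSON frames instead of silently dropping them:
- The first 5 malformed frames per connection are logged as warnings.
- After more than 10, the transport is force-closed to trigger the reconnect loop.
- The thresholds are shared constants in the new `transportConstants.ts`.

Auth-error short-circuit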
`_categorizeError` now distinguishes `authExpired` (401/403, expired tokens) from generic `auth` errors. Both immediately pause reconnects instead of burning 10 retry slots on attempts that are guaranteed to fail. Resume is driven by `onDidChangeSessions` when a fresh GitHub session appears.
Host-online auto-resume
`_silentStatusCheck` detects when the host of a tunnel paused for host-offline comes back online and automatically resumes reconnecting, without requiring a wake or visibility event. This covers the common "laptop came back, remote host came back first" scenario.
Session-removal cleanup
Reacts to GitHub auth session removal by tearing down matching tunnel state (reconnect timers, backoff, telemetry sessions) and best-effort disconnect. Previously, signing out left stale reconnect loops running.
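A minimal sketch of the session-removal teardown, assuming each cached tunnel records the auth provider it was discovered through (the reason the second commit passes `'github'` to `cacheTunnel()`: a tunnel cached without a provider is never matched). `TunnelRegistry` and its methods are hypothetical; the real contribution also tears down telemetry sessions, which this sketch omits, and reduces timers and backoff to a single reconnect record.

```typescript
interface CachedTunnel {
	readonly id: string;
	// e.g. 'github' for tunnels discovered through a GitHub session
	readonly authProvider: string | undefined;
}

interface ReconnectState {
	timer: ReturnType<typeof setTimeout> | undefined;
	attempt: number; // backoff counter
}

class TunnelRegistry {
	private readonly tunnels = new Map<string, CachedTunnel>();
	private readonly reconnects = new Map<string, ReconnectState>();
	private readonly disconnect: (id: string) => Promise<void>;

	constructor(disconnect: (id: string) => Promise<void>) {
		this.disconnect = disconnect;
	}

	cacheTunnel(tunnel: CachedTunnel): void {
		this.tunnels.set(tunnel.id, tunnel);
	}

	scheduleReconnect(id: string, delayMs: number, run: () => void): void {
		const state = this.reconnects.get(id) ?? { timer: undefined, attempt: 0 };
		if (state.timer !== undefined) {
			clearTimeout(state.timer);
		}
		state.attempt++;
		state.timer = setTimeout(run, delayMs);
		this.reconnects.set(id, state);
	}

	hasReconnectState(id: string): boolean {
		return this.reconnects.has(id);
	}

	/** Tears down every tunnel owned by the provider whose session was removed. */
	handleSessionRemoved(providerId: string): string[] {
		const tornDown: string[] = [];
		this.tunnels.forEach(tunnel => {
			if (tunnel.authProvider !== providerId) {
				return;
			}
			const state = this.reconnects.get(tunnel.id);
			if (state && state.timer !== undefined) {
				clearTimeout(state.timer); // stop the stale reconnect loop
			}
			this.reconnects.delete(tunnel.id);
			// Best effort: a failed disconnect must not block the rest of the teardown.
			this.disconnect(tunnel.id).catch(() => undefined);
			tornDown.push(tunnel.id);
		});
		return tornDown;
	}
}
```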
Telemetry
Added `authExpired` to the `TunnelConnectErrorCategory` and `TunnelConnectFailureReason` types.
Validation
- `npm run compile-check-ts-native` ✅
- `npm run valid-layers-check` ✅