Skip to content

fix: prevent double terminal-attach on reconnect#116

Merged
gbasin merged 14 commits intomasterfrom
fix/double-attach-on-reconnect
Mar 31, 2026
Merged

fix: prevent double terminal-attach on reconnect#116
gbasin merged 14 commits intomasterfrom
fix/double-attach-on-reconnect

Conversation

@gbasin
Copy link
Copy Markdown
Owner

@gbasin gbasin commented Mar 30, 2026

Summary

  • Eliminate double terminal-attach after WebSocket reconnect that was sending 124KB (2x62KB scrollback) into fresh TCP connections still in slow start
  • On 5G/Tailscale this caused 5-11 second blank screens; subsequent session switches were instant (~100ms)

Three-layer fix

  • Server-side dedup: skip duplicate scrollback for same session within 500ms
  • Client-side debounce: 50ms debounce on terminal-attach collapses rapid epoch changes
  • WebSocket connect guard: prevent double connections from opening in rapid succession
  • Atomic epoch update: moved connectionEpoch into Zustand alongside connectionStatus

Verified locally

  • TCP throttling proxy (8KB/s) reproduces the issue: double-attach takes 8.6s vs single-attach 2.7s (57% improvement)
  • 643 tests pass, including new integration and performance tests

Test plan

  • Verify on iOS/5G that reconnect paints faster
  • Verify session switching still works normally
  • Verify terminal input works after reconnect

gbasin added 14 commits March 30, 2026 16:36
connectionStatus lived in Zustand (synchronous) while connectionEpoch
lived in React useState (batched). On reconnect, Zustand's status update
triggered a render with the stale epoch, causing useTerminal to attach.
Then the useState epoch update triggered a second render and second attach.
The first server response was discarded by the second terminal.reset(),
adding 1-2s on WiFi and potentially much more on slow connections.

Move connectionEpoch into the Zustand store alongside connectionStatus
so both update atomically in a single set() call.
Logs when iOS repaints are scheduled/executed, what trigger caused them,
whether the terminal has content at repaint time, and when xterm.write()
completes. This will help diagnose blank-screen-after-reconnect on iOS
where the connection works but the canvas doesn't paint.
iOS can silently kill the WebGL context during background without firing
the onContextLoss event. When the app returns to foreground, xterm tries
to render through the dead context, producing a blank screen even though
the terminal buffer has data and input works.

Force-dispose and recreate the WebGL addon on every iOS visibility resume
to ensure a fresh rendering context. This explains why the issue was worse
on 5G (longer background time = more likely context reclamation).
1. Server-side dedup: if two terminal-attach messages arrive for the same
   session+target within 500ms, skip the second scrollback capture and
   send only terminal-ready. This prevents 62KB of redundant data.

2. Client-side debounce: wrap the terminal-attach send in a 50ms
   setTimeout so rapid connectionEpoch changes collapse into one attach.

3. WebSocket connect guard: skip connect() if a socket is already in
   CONNECTING state and was created <200ms ago, preventing rapid
   connect-destroy-connect cycles that produce two onopen events.

The double-attach was sending 124KB into a fresh TCP connection still in
slow start, causing 5-11 second delays on 5G/Tailscale. Subsequent
session switches over the warm connection were instant (~100ms).
Three tests verify the server-side double-attach prevention:

1. Rapid back-to-back attaches: seq mechanism cancels the first, only
   one scrollback is sent
2. Second attach within 500ms after first completes: dedup layer skips
   scrollback, sends only terminal-ready
3. Different sessions within 500ms: NOT deduped (correctness check)
TCP throttling proxy (8KB/s) simulates slow start. Results:

  Double-attach (bug):   8.9s median, 43KB data
  Single-attach (fix):   3.9s median, 19KB data
  Improvement:           55% faster, 57% less data

Also fix flaky assertion in dedup test — both seq mechanism and dedup
layer can handle the duplicate, so accept 1-2 terminal-ready messages
but assert exactly 1 scrollback capture.
Uses deferred timers to simulate the double-onopen scenario:
epoch changes 1→2→3 in rapid succession. Asserts that only ONE
terminal-attach is sent (not two), confirming the 50ms debounce
collapses the duplicates.
P1 from review: server lastAttachKey/lastAttachTs were set before the
scrollback capture completed, so a second attach could hit the dedup
fast-path while the first was still in flight. Moved dedup key recording
to after the successful switchTo().

P1 client-side: refs stay before debounce (reverted attempted move).
Input during the 50ms debounce is naturally suppressed because
terminal-input uses attachedSessionRef as sessionId, and the server
drops input when ws.data.currentSessionId doesn't match until the
terminal-attach handler runs.
Review P1: lastAttachKey/lastAttachTs survived proxy replacement, so a
same-session attach within 500ms of a proxy teardown+recreate could hit
the dedup fast-path on a brand-new proxy that was never switched to the
target. Clear dedup state in cleanupTerminals() and ensureCorrectProxyType()
proxy replacement path.
…select/unmount

Review P1s:
- Server: onExit callbacks for local and SSH proxies now clear
  lastAttachKey/lastAttachTs via shared clearAttachDedup() helper.
  Prevents stale dedup state from suppressing legitimate reattach
  after proxy death+recreation.
- Client: debounce timer cancelled on session deselection (sessionId→null)
  and in effect cleanup (unmount/re-run). Prevents stale terminal-attach
  from firing after the UI has moved on.
Review P1s:
- detachTerminalPersistent now calls clearAttachDedup so fast A→B→A
  session switches don't skip scrollback for A
- All proxy start/error paths (createPersistentTerminal,
  ensurePersistentTerminal, ensureCorrectProxyType reuse failures,
  createAndStartSshProxy, cleanupAllTerminals) now clear dedup state
  when ws.data.terminal is nulled
Three-layer fix for slow terminal rendering on iOS reconnect over
5G/Tailscale. The double-attach was sending 124KB into fresh TCP
connections still in slow start, causing 5-11 second blank screens.

- Server-side dedup: skip duplicate scrollback for same session <500ms
- Client-side debounce: 50ms debounce collapses rapid epoch changes
- WebSocket connect guard: prevent double connections opening
- Atomic Zustand epoch: connectionEpoch + connectionStatus update together
- WebGL context recreation on iOS visibility resume
Split terminal scrollback into 12KB chunks before sending. On
high-latency connections (5G/Tailscale), the first chunk fits in TCP's
initial congestion window (~14.6KB) and arrives after one RTT instead
of waiting for the full payload to drain through slow start.

Throttled proxy test (8KB/s + 100ms RTT) confirms:
  Double-attach (old):  7.9s, 36KB first output
  Single + chunked:     3.0s, 12KB first output
  Improvement:          62% faster, 67% less data to first paint
Review P1: ios_write_complete was firing at info level on every terminal
write flush, spamming /api/client-log on the exact slow mobile resume
path this PR optimizes. Changed to only log writes >50ms as ios_write_slow
at debug level.
@gbasin gbasin merged commit b3ee1a3 into master Mar 31, 2026
5 checks passed
@gbasin gbasin deleted the fix/double-attach-on-reconnect branch March 31, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant