
tunnel: preserve connection_id on resume URL for affinity routing#54

Merged
joelgwebber merged 1 commit into main from joel/tunnel-resume-preserve-connection-id on May 5, 2026

Conversation

@joelgwebber
Contributor

Summary

The resume reconnect was deleting connection_id from the WebSocket URL along with the spent token. The relay's affinity router hashes on that connection_id to route the WS to the pod that owns the chromium browser context. Without it, the router minted a fresh UUID and the reconnect landed on a random pod.

The new tunnel registered there with the (correct, preserved) connection_id, but the chromium-side forward proxy on the original pod still couldn't see it — so the next navigation got ERR_TUNNEL_CONNECTION_FAILED and chromium showed chrome-error://chromewebdata/. Symptom in the field: tunnel-status reports state=ready, the new tunnel really is registered, but it's on a pod the chromium browser doesn't know about — so any subsequent live-view-* call's navigation fails with a blank chrome-error page.

This is the companion to cowpaths/mn#103915 (preserve connection_id in the trace row across resume). Both are needed end-to-end:

  • Trace-row preservation alone → the relay still has a stable connection_id to issue, but the WS reconnect has no affinity hint and lands on the wrong pod.
  • URL preservation alone → the relay would have nothing to issue if the trace row had rotated.

Together they keep the WS pinned to the chromium-owning pod across reconnect.
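A minimal sketch of the URL-side fix, assuming the spent token and the connection_id both travel as query parameters (the parameter names and helper are illustrative, not the actual tunnel code):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def resume_url(ws_url: str) -> str:
    parts = urlsplit(ws_url)
    # Drop only the spent one-time token; keep connection_id so the
    # relay's affinity router can pin the reconnect to the original pod.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "token"]
    return urlunsplit(parts._replace(query=urlencode(params)))

url = "wss://relay.example/tunnel?token=spent&connection_id=server-cid"
# resume_url(url) -> "wss://relay.example/tunnel?connection_id=server-cid"
```

The buggy version filtered out both keys; the fix is to narrow the filter to the token alone.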

Tracked in SUBTEXT-338. Bumps to 0.1.14.

Test plan

  • Unit test: resume reconnect strips spent token but preserves connection_id (inverted from the prior assertion that connection_id was stripped — that assertion was the bug, not the spec). Asserts the post-ready value (server-cid) appears on the reconnect URL.
  • Full suite passes (127/127).
  • Manual: reconnect against staging after a forced WS drop, verify the next navigation succeeds rather than chrome-erroring.
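The inverted unit-test assertion from the first bullet can be sketched like this, using a hypothetical helper to read back the reconnect URL's parameters (names are illustrative; the real test lives in the tunnel package):

```python
from urllib.parse import parse_qsl, urlsplit

def reconnect_params(url: str) -> dict:
    return dict(parse_qsl(urlsplit(url).query))

# The prior test asserted connection_id was stripped; that assertion
# encoded the bug. The corrected expectation on the reconnect URL:
params = reconnect_params("wss://relay.example/tunnel?connection_id=server-cid")
assert "token" not in params                    # spent token is gone
assert params["connection_id"] == "server-cid"  # affinity hint preserved
```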

🤖 Generated with Claude Code

Collaborator

@sirrah23 sirrah23 left a comment


LGTM!

@joelgwebber joelgwebber merged commit 3396e09 into main May 5, 2026
1 check passed
