Implement cluster-side global router with WebSocket tunneling (MIR-825)#686

Merged
evanphx merged 7 commits into main from mir-825-implement-cluster-side-of-websocket-and-pop-connec
Mar 23, 2026
Conversation

Contributor

@evanphx evanphx commented Mar 18, 2026

Summary

  • Global router: Adds WebSocket-based tunnel between clusters and PoP edge servers for NAT traversal, allowing clusters behind NAT/firewalls to receive inbound traffic via Miren Cloud's PoP infrastructure
  • Netcheck: Adds public reachability detection via cloud endpoint, caching results and using them to build accurate API address lists in status reports. Hardcodes port 8443 and validates c.Address contains a parseable IP before including it
  • HTTP ingress server: New ingress handler that routes incoming HTTP requests from the global router to the appropriate app sandboxes

Test plan

  • Verify netcheck correctly determines public reachability and caches results
  • Verify apiAddresses() excludes c.Address when it lacks a valid IP host (e.g. :8443)
  • Verify global router WebSocket tunnel connects to PoP and handles reconnection
  • Verify HTTP ingress correctly routes requests through the tunnel
  • Run pkg/cloudauth and pkg/globalrouter unit tests

evanphx added 3 commits March 17, 2026 16:14
Add pkg/globalrouter package that connects clusters to the cloud
coordination service and POP servers for NAT traversal:

- WebSocket client with JWT auth and exponential backoff reconnection
- Two-connection POP protocol (control plane + data plane over QUIC)
- HTTP/3 request forwarding from POPs to local apps via httpingress
- WebSocket tunnel handler (/_pop/ws) with tunnel frame encoding that
  bridges POP WebSocket tunnels to local app WebSocket connections
- Message handlers for connection.request, time sync, and org info
- AcquireTunnel method on httpingress for direct sandbox URL resolution

Gated behind the globalrouter labs flag and cloud auth being enabled.
(MIR-825)

Introduce netcheck to determine public reachability via the cloud endpoint,
cache results, and use them to build accurate API address lists for status
reports. Hardcode port 8443 instead of parsing c.Address, and skip c.Address
from the address list when it lacks a valid IP host. Use net.JoinHostPort
for proper IPv6 formatting.
@evanphx evanphx requested a review from a team as a code owner March 18, 2026 03:13

coderabbitai bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15cce235-f452-403a-af42-d1d9784b3d54

📥 Commits

Reviewing files that changed from the base of the PR and between 09b60e0 and a046c96.

📒 Files selected for processing (3)
  • pkg/globalrouter/client.go
  • pkg/globalrouter/pop.go
  • servers/httpingress/httpingress.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • servers/httpingress/httpingress.go

📝 Walkthrough

Adds netcheck support: cloudauth.Netcheck client, tests, and Coordinator state/logic to query, cache, and expose public reachability and to derive API listen addresses from netcheck results. Introduces a globalrouter package with a WebSocket client, message router, GlobalRouter, POPManager, protocol framing, and tests; Coordinator optionally starts a background GlobalRouter. Extends httpingress with TunnelConn and Server.AcquireTunnel. Updates go.mod and CI manifests (hack/test-groups.json, hack/test-times.json). No exported public method signatures were removed; several new exported types and methods were added in globalrouter and httpingress.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (5)
pkg/globalrouter/pop_test.go (1)

52-53: Assert HandleConnectionRequest errors in test setup paths.

These calls are part of test setup; ignoring errors can let tests pass with broken preconditions.

💡 Suggested fix
-	pm.HandleConnectionRequest(t.Context(), req)
+	if err := pm.HandleConnectionRequest(t.Context(), req); err != nil {
+		t.Fatalf("HandleConnectionRequest: %v", err)
+	}
 	pm.RemovePOP("pop-1")
-		pm.HandleConnectionRequest(t.Context(), req)
+		if err := pm.HandleConnectionRequest(t.Context(), req); err != nil {
+			t.Fatalf("HandleConnectionRequest: %v", err)
+		}

Also applies to: 74-75

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/pop_test.go` around lines 52 - 53, The test setup currently
ignores errors returned by pm.HandleConnectionRequest(...) and
pm.RemovePOP(...); update the test to capture and assert these errors (e.g., err
:= pm.HandleConnectionRequest(...); require.NoError(t, err) or t.Fatalf on
error) so setup failures fail the test immediately; do the same for
pm.RemovePOP(...) and for the other occurrences at the later setup lines (the
calls on lines 74-75).
pkg/globalrouter/client.go (2)

79-101: Backoff should reset after a successful connection.

The backoff increases on each disconnection but never resets when a connection is successfully established and maintained. After a transient failure followed by recovery, subsequent disconnections will start with an unnecessarily high backoff.

Proposed fix to reset backoff on successful connection
 func (c *Client) Run(ctx context.Context) error {
 	backoff := initialBackoff
 
 	for {
 		err := c.runOnce(ctx)
 		if ctx.Err() != nil {
 			return ctx.Err()
 		}
 
+		// Reset backoff on successful connection (err != connect error)
+		// A connection that lasted indicates recovery
+		backoff = initialBackoff
+
 		c.log.Warn("websocket disconnected, reconnecting",
 			"error", err, "backoff", backoff)
 
 		select {
 		case <-ctx.Done():
 			return ctx.Err()
 		case <-time.After(backoff):
 		}
 
 		backoff = min(backoff*2, maxBackoff)
 	}
 }

Alternatively, reset backoff only if the connection was healthy for a minimum duration (e.g., 30 seconds).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/client.go` around lines 79 - 101, The Run loop currently
grows backoff on every disconnect but never resets it; modify Client.Run so that
after a successful sustained connection (i.e., runOnce returns nil and the ctx
hasn't been cancelled) you reset backoff = initialBackoff (or reset only if the
connection lasted at least a threshold, e.g., 30s) before continuing; locate the
backoff, initialBackoff, maxBackoff and runOnce references in Client.Run and
update the logic to track the last successful connect time (or check runOnce's
success) and reset backoff accordingly.

172-187: Consider reducing log verbosity for production.

Logging at Info level for every received and dispatched message (lines 179, 184) will be very verbose under load. Consider using Debug level for per-message logging.

Proposed change
-		c.log.Info("received message from cloud", "type", env.Type, "data_len", len(env.Data))
+		c.log.Debug("received message from cloud", "type", env.Type, "data_len", len(env.Data))
 
 		if err := c.router.Dispatch(ctx, env); err != nil {
 			c.log.Warn("dispatch error", "type", env.Type, "error", err)
 		} else {
-			c.log.Info("message dispatched successfully", "type", env.Type)
+			c.log.Debug("message dispatched successfully", "type", env.Type)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/client.go` around lines 172 - 187, The per-message Info logs
in Client.readLoop (the c.log.Info calls logging received messages and
successful dispatches for Envelope handling) are too verbose for production;
change those per-message logs to use c.log.Debug instead of c.log.Info (leave
the dispatch error warning via c.log.Warn as-is) so normal Info-level production
logging is not flooded—update the two c.log.Info calls in readLoop to
c.log.Debug and ensure the change applies to the Envelope receive and successful
dispatch log paths.
pkg/globalrouter/globalrouter.go (1)

59-65: Errors from SendMessage are silently ignored.

The OnConnect callback calls SendMessage twice but doesn't handle potential marshaling errors. While unlikely to fail for these simple structs, logging the error would aid debugging.

Proposed fix
 	client.OnConnect(func(ctx context.Context) {
 		gr.log.Info("sending initial requests")
-		client.SendMessage(TypeTimeRequest, TimeRequest{
+		if err := client.SendMessage(TypeTimeRequest, TimeRequest{
 			ClientTransmitTime: time.Now().UTC(),
-		})
-		client.SendMessage(TypeOrgInfoRequest, struct{}{})
+		}); err != nil {
+			gr.log.Warn("failed to send time request", "error", err)
+		}
+		if err := client.SendMessage(TypeOrgInfoRequest, struct{}{}); err != nil {
+			gr.log.Warn("failed to send org info request", "error", err)
+		}
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/globalrouter.go` around lines 59 - 65, The OnConnect
callback on client currently calls SendMessage(TypeTimeRequest,
TimeRequest{...}) and SendMessage(TypeOrgInfoRequest, struct{}{}) but ignores
returned errors; update the client.OnConnect handler to capture the error
returned by SendMessage for both calls and log any non-nil error via gr.log (or
return/handle it appropriately). Specifically modify the anonymous func passed
to client.OnConnect to check the error from client.SendMessage when sending
TimeRequest and when sending the org info request, and call gr.log.Error or
gr.log.Warn with contextual messages referencing TypeTimeRequest/TimeRequest and
TypeOrgInfoRequest so marshaling/transport errors are visible.
pkg/globalrouter/pop.go (1)

432-472: Ping error is silently ignored.

At line 466, the result of localConn.Ping(ctx) is discarded. While ping failures may not be critical (the connection will fail on the next read/write anyway), logging at debug level could help diagnose connectivity issues.

Proposed change
 		if msgType == tunnelFrameKeepalive {
-			localConn.Ping(ctx)
+			if err := localConn.Ping(ctx); err != nil {
+				// Connection may be closing; continue to let next read fail naturally
+				return 0, buf, fmt.Errorf("keepalive ping failed: %w", err)
+			}
 			continue
 		}

Alternatively, just continue silently if ping failures should not interrupt the tunnel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/pop.go` around lines 432 - 472, In readTunnelFrame, the
error returned by localConn.Ping(ctx) is currently discarded; capture the
returned error, and log it at debug/trace level (using the repository's existing
logger, e.g., processLogger or similar) with context like "keepalive ping
failed" but do not return the error — continue the loop so ping failures don't
interrupt processing; update the localConn.Ping(ctx) call site to check err and
log the message including the err value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/coordinate/coordinate.go`:
- Around line 1135-1145: The loop over result.Results can append duplicate
"host:port" strings to the addrs slice when multiple reachable protocols share
the same port; change the logic in that block (the loop referencing
result.Results, r.Reachable, result.SourceAddress, r.Port and the addrs slice
where net.JoinHostPort is used) to deduplicate before appending — e.g., maintain
a local map[string]struct{} seen and only append the joined host:port if it is
not already in seen, then mark it seen.
- Around line 1087-1090: When netcheck fails or the source is invalid, clear the
stale netcheck cache and ensure gating logic uses the validated result: inside
the netcheck handling blocks that set c.netcheckCheckedAt (references:
c.netcheckMu, c.netcheckCheckedAt, netcheckResult, hasNetcheck and
AdditionalIPs), set c.netcheckMu.Lock(), zero or nil out c.netcheckResult (and
any related validity flag) before updating the timestamp and unlocking; ensure
any downstream gating that calls hasNetcheck or filters AdditionalIPs only
treats the result as present if netcheckResult is valid (apply same change at
the other occurrences mentioned around the blocks at the other occurrences
referenced in the comment).

In `@pkg/globalrouter/pop.go`:
- Around line 186-191: The data plane TLS config (variable dataTLS with
NextProtos {popDataALPN} and ServerName host) is using InsecureSkipVerify and
missing a MinVersion; replace InsecureSkipVerify: true with proper certificate
verification (load/assign a RootCAs pool or use
tls.Config.VerifyPeerCertificate) and set MinVersion: tls.VersionTLS13 to
enforce TLS1.3; update the dataTLS construction so it mirrors the secure
control-plane setup (use the same CA loading/verification logic rather than
skipping verification) to eliminate InsecureSkipVerify while preserving
NextProtos and ServerName.
- Around line 135-140: Add a short explanatory comment above the controlTLS
tls.Config that documents the bootstrap security model (token-based ConnectToken
+ Trust-on-First-Use during initial bootstrap, analogous to the runner join
flow) so readers understand why InsecureSkipVerify is used there; then set
MinVersion: tls.VersionTLS13 in the controlTLS configuration (alongside
NextProtos/http3.NextProtoH3, ServerName, InsecureSkipVerify) to explicitly
enforce TLS 1.3 for defense-in-depth given QUIC/RFC9001.

In `@servers/httpingress/httpingress.go`:
- Around line 1169-1171: The lease acquisition uses context.Background(),
detaching it from the caller; change the timeout context to derive from the
request/caller context so cancellation propagates: replace
context.WithTimeout(context.Background(), leaseAcquisitionTimeout) with
context.WithTimeout(ctx, leaseAcquisitionTimeout) (keeping actContext and
actCancel) and ensure this actContext is passed into AcquireTunnel (and any
downstream calls) so that if the caller cancels the request the lease
acquisition will abort and not retain a lease.

---

Nitpick comments:
In `@pkg/globalrouter/client.go`:
- Around line 79-101: The Run loop currently grows backoff on every disconnect
but never resets it; modify Client.Run so that after a successful sustained
connection (i.e., runOnce returns nil and the ctx hasn't been cancelled) you
reset backoff = initialBackoff (or reset only if the connection lasted at least
a threshold, e.g., 30s) before continuing; locate the backoff, initialBackoff,
maxBackoff and runOnce references in Client.Run and update the logic to track
the last successful connect time (or check runOnce's success) and reset backoff
accordingly.
- Around line 172-187: The per-message Info logs in Client.readLoop (the
c.log.Info calls logging received messages and successful dispatches for
Envelope handling) are too verbose for production; change those per-message logs
to use c.log.Debug instead of c.log.Info (leave the dispatch error warning via
c.log.Warn as-is) so normal Info-level production logging is not flooded—update
the two c.log.Info calls in readLoop to c.log.Debug and ensure the change
applies to the Envelope receive and successful dispatch log paths.

In `@pkg/globalrouter/globalrouter.go`:
- Around line 59-65: The OnConnect callback on client currently calls
SendMessage(TypeTimeRequest, TimeRequest{...}) and
SendMessage(TypeOrgInfoRequest, struct{}{}) but ignores returned errors; update
the client.OnConnect handler to capture the error returned by SendMessage for
both calls and log any non-nil error via gr.log (or return/handle it
appropriately). Specifically modify the anonymous func passed to
client.OnConnect to check the error from client.SendMessage when sending
TimeRequest and when sending the org info request, and call gr.log.Error or
gr.log.Warn with contextual messages referencing TypeTimeRequest/TimeRequest and
TypeOrgInfoRequest so marshaling/transport errors are visible.

In `@pkg/globalrouter/pop_test.go`:
- Around line 52-53: The test setup currently ignores errors returned by
pm.HandleConnectionRequest(...) and pm.RemovePOP(...); update the test to
capture and assert these errors (e.g., err := pm.HandleConnectionRequest(...);
require.NoError(t, err) or t.Fatalf on error) so setup failures fail the test
immediately; do the same for pm.RemovePOP(...) and for the other occurrences at
the later setup lines (the calls on lines 74-75).

In `@pkg/globalrouter/pop.go`:
- Around line 432-472: In readTunnelFrame, the error returned by
localConn.Ping(ctx) is currently discarded; capture the returned error, and log
it at debug/trace level (using the repository's existing logger, e.g.,
processLogger or similar) with context like "keepalive ping failed" but do not
return the error — continue the loop so ping failures don't interrupt
processing; update the localConn.Ping(ctx) call site to check err and log the
message including the err value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 85b0bf6b-8efe-4b86-b4ff-5d05edfbe3f6

📥 Commits

Reviewing files that changed from the base of the PR and between 38cfa08 and 0eb21ce.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • components/coordinate/coordinate.go
  • go.mod
  • hack/test-groups.json
  • hack/test-times.json
  • pkg/cloudauth/netcheck.go
  • pkg/cloudauth/netcheck_test.go
  • pkg/globalrouter/client.go
  • pkg/globalrouter/client_test.go
  • pkg/globalrouter/envelope.go
  • pkg/globalrouter/globalrouter.go
  • pkg/globalrouter/pop.go
  • pkg/globalrouter/pop_test.go
  • servers/httpingress/httpingress.go

evanphx added 3 commits March 18, 2026 13:35
…tion

- Clear netcheckResult on failure/invalid source so stale data doesn't persist
- Gate AdditionalIPs filtering on netcheckResult != nil, not just checkedAt
- Deduplicate publicAddresses when multiple protocols share the same port
- Only skip TLS verification for POP connections to non-public IPs
The tunnel path should abort lease acquisition when the caller cancels,
unlike the HTTP request paths which intentionally outlive the client.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
pkg/globalrouter/pop.go (2)

315-316: Consider handling https:// → wss:// for robustness.

The URL scheme replacement only handles http:// → ws://. If sandbox URLs ever use HTTPS, this would produce an invalid URL.

Proposed improvement
-	sandboxWSURL := strings.Replace(tunnel.URL, "http://", "ws://", 1) + wsPath
+	sandboxWSURL := tunnel.URL + wsPath
+	if strings.HasPrefix(sandboxWSURL, "http://") {
+		sandboxWSURL = "ws://" + strings.TrimPrefix(sandboxWSURL, "http://")
+	} else if strings.HasPrefix(sandboxWSURL, "https://") {
+		sandboxWSURL = "wss://" + strings.TrimPrefix(sandboxWSURL, "https://")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/pop.go` around lines 315 - 316, Replace the naive string
replacement that builds sandboxWSURL from tunnel.URL with scheme-aware logic:
parse tunnel.URL using net/url, inspect u.Scheme and set it to "ws" if "http" or
"wss" if "https" (or return an error for unknown schemes), then reconstruct the
URL and append wsPath; update the code that references sandboxWSURL (the
variable created from tunnel.URL and wsPath) so it uses this parsed-and-rebuilt
URL rather than strings.Replace.

470-472: Consider logging Ping errors for observability.

The error returned by localConn.Ping(ctx) is silently discarded. While keepalives are best-effort, logging at debug level could help diagnose connectivity issues.

Proposed improvement
 		if msgType == tunnelFrameKeepalive {
-			localConn.Ping(ctx)
+			if err := localConn.Ping(ctx); err != nil {
+				// Debug-level: keepalive pings are best-effort
+				return 0, buf, fmt.Errorf("keepalive ping failed: %w", err)
+			}
 			continue
 		}

Alternatively, just log at debug level and continue rather than returning an error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/globalrouter/pop.go` around lines 470 - 472, The keepalive branch
discards the error from localConn.Ping(ctx); capture the returned error and log
it at debug level (do not change behavior—still continue on error) so
connectivity issues are observable; update the block handling msgType ==
tunnelFrameKeepalive to call err := localConn.Ping(ctx) and, if err != nil, emit
a debug log using the existing logger (e.g., r.logger, logger, or ctx-aware
logger used elsewhere in this file) including context that this came from the
keepalive ping before continuing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/globalrouter/pop.go`:
- Around line 315-316: Replace the naive string replacement that builds
sandboxWSURL from tunnel.URL with scheme-aware logic: parse tunnel.URL using
net/url, inspect u.Scheme and set it to "ws" if "http" or "wss" if "https" (or
return an error for unknown schemes), then reconstruct the URL and append
wsPath; update the code that references sandboxWSURL (the variable created from
tunnel.URL and wsPath) so it uses this parsed-and-rebuilt URL rather than
strings.Replace.
- Around line 470-472: The keepalive branch discards the error from
localConn.Ping(ctx); capture the returned error and log it at debug level (do
not change behavior—still continue on error) so connectivity issues are
observable; update the block handling msgType == tunnelFrameKeepalive to call
err := localConn.Ping(ctx) and, if err != nil, emit a debug log using the
existing logger (e.g., r.logger, logger, or ctx-aware logger used elsewhere in
this file) including context that this came from the keepalive ping before
continuing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c1c3f62-567a-4871-b0ea-cd377da50bc9

📥 Commits

Reviewing files that changed from the base of the PR and between 0eb21ce and 7c9c7c4.

📒 Files selected for processing (3)
  • components/coordinate/coordinate.go
  • go.mod
  • pkg/globalrouter/pop.go


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
servers/httpingress/httpingress.go (1)

1153-1166: Consider adding a method for callers to invalidate stale leases.

When AcquireTunnel returns a cached lease that points to a dead sandbox, the caller will get a connection error. However, calling Release() only decrements the use count—it doesn't remove the lease from cache. Subsequent AcquireTunnel calls may return the same stale lease until watchInvalidations or expireLeases cleans it up.

Compare with DoRequest (lines 1002-1007) and serveAuthenticatedRequest (lines 605-613), which call invalidateAppLeases on connection errors.

An optional improvement would be to add an Invalidate() method on TunnelConn that callers can use to signal a bad lease:

💡 Optional enhancement
// Invalidate marks the lease as stale and removes it from cache.
// Call this when a connection error indicates the sandbox is dead.
func (tc *TunnelConn) Invalidate() {
	if tc.lease != nil {
		tc.server.invalidateLease(context.Background(), tc.AppID, tc.lease)
		tc.lease = nil
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@servers/httpingress/httpingress.go` around lines 1153 - 1166, Add an
Invalidate method on TunnelConn so callers can remove a stale cached lease when
a connection error occurs: implement func (tc *TunnelConn) Invalidate() that, if
tc.lease != nil, calls tc.server.invalidateLease(context.Background(), tc.AppID,
tc.lease) and then sets tc.lease = nil; reference existing symbols
AcquireTunnel, TunnelConn, Release, invalidateAppLeases (used by DoRequest and
serveAuthenticatedRequest) to mirror their behavior and ensure invalidation
removes the lease from the cache rather than just decrementing use count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@servers/httpingress/httpingress.go`:
- Around line 1153-1166: Add an Invalidate method on TunnelConn so callers can
remove a stale cached lease when a connection error occurs: implement func (tc
*TunnelConn) Invalidate() that, if tc.lease != nil, calls
tc.server.invalidateLease(context.Background(), tc.AppID, tc.lease) and then
sets tc.lease = nil; reference existing symbols AcquireTunnel, TunnelConn,
Release, invalidateAppLeases (used by DoRequest and serveAuthenticatedRequest)
to mirror their behavior and ensure invalidation removes the lease from the
cache rather than just decrementing use count.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ca6b7c8d-9d3e-4dcf-896d-f8d53f40378c

📥 Commits

Reviewing files that changed from the base of the PR and between 7c9c7c4 and 09b60e0.

📒 Files selected for processing (1)
  • servers/httpingress/httpingress.go

Contributor

@phinze phinze left a comment


Claude and Paul here. Boy are we impressed. Well, Paul was impressed. Claude is a robot. But if Claude could be impressed, this would do it.

We went through the full architecture chunk by chunk, cross-referenced against cloud PR mirendev/cloud#110 and RFD-52 to make sure both sides of the wire protocol agree, and even went down a rabbit hole on QUIC NAT traversal standards to sanity check our approach (tl;dr: we're building exactly the right thing for serving web traffic through edge proxies).

Paul was kind of gushing during the review and Claude didn't stop him. The layering is really clean and the details are 🤌 — sharing the UDP socket across both QUIC connections so source IP matches for phase 2 auth, the http3.Transport.Dial override to reuse an already-dialed connection, keepalive-to-ping translation in the tunnel bridge. And sneaking simplified NTP into the initial WebSocket handshake! That's the kind of "oh of course that should be there" foresight that you don't appreciate until you're debugging clock drift across distributed systems at 2am. Mad props.

We filed a few follow-up tickets for things that came up during review: coordinator decomposition, e2e test suite for the full runtime+cloud+POP path, and a reminder to converge other runtime↔cloud communication onto the WebSocket channel once this graduates from labs (we think it's destined to become the connection between runtime and cloud).

A few small things inline:

- Reset backoff after connection was up for 30s+ before dropping
- Block full /.well-known/miren/ prefix for tunnel requests
- Use net.SplitHostPort in normalizeAddr for correct IPv6 handling
- Remove per-message and per-request logging from message and forwarding loops
@evanphx evanphx merged commit b4068a5 into main Mar 23, 2026
10 checks passed
@evanphx evanphx deleted the mir-825-implement-cluster-side-of-websocket-and-pop-connec branch March 23, 2026 18:31