fix(cli): serialize port-availability probes (#309)#310
Conversation
testPortOnAllHosts ran the four host probes (127.0.0.1 / 0.0.0.0 / ::1 / ::) in parallel via Promise.all. The first bind on 127.0.0.1 held its socket open until server.close() resolved on the next event-loop tick, and while it was open the wildcard 0.0.0.0 / :: probes raced it and got EADDRINUSE. Result: every port in the 3002–3101 scan range looked occupied and the preview server refused to start — deterministic on Linux (Crostini on ChromeOS in the reporting environment, gigadeniga's issue #309). Fix is to test the hosts sequentially so each probe's socket is fully closed before the next opens. The multi-host check itself stays — it's load-bearing for devbox / SSH-forwarding layouts where a port is free on loopback but held on the wildcard. Regression tests bind real sockets (no mocks) and would have failed against the parallel implementation on any Linux shard.
Strengthens the regression gate from issue #309. The original tests binding real sockets only caught the bug on Linux (Crostini and likely Ubuntu), not macOS where wildcard-vs-loopback parallel binds happen to succeed. That left CI as the only gate and only if the CI runner's kernel reproduces the race. testPortOnAllHosts now takes an optional injectable probe. The contract test passes a recording fake that holds each call for 20ms and tracks in-flight count. Buggy parallel code drives it to 4; the sequential fix keeps it at 1. Verified by reverting the fix in place and confirming the contract test fails with expected=1 / received=4 — then restored and all 4 tests pass. Production callers pass no probe and get the real socket check, unchanged behaviour.
jrusso1020
left a comment
There was a problem hiding this comment.
Ship it. Well-diagnosed, minimally-scoped fix with a thoughtful test strategy. Notes below are non-blocking.
What's right
- Root cause is correct and clearly explained. Serializing eliminates the race window without changing semantics.
- Right call keeping all four hosts rather than collapsing to
0.0.0.0+::— multi-host coverage is load-bearing for SSH-forwarded devbox cases. - Test strategy honestly labels the real-socket tests as OS-dependent and makes the injectable-probe contract test the actual regression gate. Verifying peak concurrency 4 vs 1 against the reverted fix is the rigor I want to see.
- PR description is exemplary — reproduces the failure, explains the alternative considered and rejected, quantifies the cost, and credits the reporter.
Substantive (non-blocking)
-
Injected
probeparameter on the exported API is test infra leaking into production surface. The second parameter exists solely for the contract test and widens the public signature. Prefer either:- extract an internal
_testPortSequential(probe, hosts, port)helper and have the publictestPortOnAllHostscall it bound toisPortAvailableOnHost; or - keep the current shape but mark the parameter
@internalin JSDoc so consumers don't start relying on it.
- extract an internal
-
allocFreePort()is probabilistic.BASE + random(1000)can collide with anything else on the system and with parallel shards. The idiomatic fix: bindport: 0, readserver.address().port, close, return that — turns "runway so parallel test shards don't collide" from a hope into a guarantee. -
The second real-socket test doesn't test what its name says. It binds a blocker on
0.0.0.0, but the probe order starts with127.0.0.1. On most kernels withoutSO_REUSEADDR, the first probe is the one that fails (wildcard conflicts with loopback). Theresult === falseassertion still passes, but the test is exercising "0.0.0.0 bound ⇒ 127.0.0.1 also unavailable," not "testPortOnAllHostscorrectly detects occupation on0.0.0.0." A blocker on a host later in the probe order (e.g.,::1) would more directly exercise the multi-host contract. -
No positive test for the devbox scenario the multi-host check exists for — "free on loopback, held on wildcard." That's the entire justification for keeping four hosts. Since the short-circuit test already uses the injectable probe, one more case there (
host === "127.0.0.1" ? true : host === "0.0.0.0" ? false : ...) would lock in the semantics cheaply.
Nits
PORT_PROBE_HOSTSis areadonlytuple — good. Consider@internalJSDoc since it's only meaningful to tests.- The JSDoc on
testPortOnAllHostsis the kind of why-comment that earns its keep.
Items 1 and 4 are the two I'd actually want addressed (here or as a fast follow-up) — one is public-API shape, the other leaves the stated justification untested. Everything else is a nit.
Closes #309. Full credit to @gigadeniga for the diagnosis — the root cause + proposed fix in that issue are exactly what landed here.
The bug
`npx hyperframes preview` failed deterministically on Crostini (ChromeOS Linux) with `Ports 3002–3101 are all in use`, even when nothing was actually listening on any of them.
Why
`testPortOnAllHosts` ran four probes in parallel:
```ts
const hosts = ["127.0.0.1", "0.0.0.0", "::1", "::"];
const results = await Promise.all(hosts.map((h) => isPortAvailableOnHost(port, h)));
```
Each probe binds a socket and then calls `server.close()`. Close is async — the socket stays open until its callback fires on the next event-loop tick. While it's open, the wildcard binds (`0.0.0.0`, `::`) that include the loopback address race the still-open loopback socket and return `EADDRINUSE` spuriously. On Crostini this happens 100% of the time; other Linux configs hit it intermittently; macOS is less predictable. Net effect: every port in the 100-port scan range appears busy and the preview refuses to start.
Reproduces on any Linux box with the standalone snippet from the issue:
```
127.0.0.1: OK
0.0.0.0: EADDRINUSE ← false positive
::1: OK
::: EADDRINUSE ← false positive
```
Fix
Serialize the probes. Each socket is fully closed before the next opens, eliminating the race window entirely.
```ts
for (const host of hosts) {
const available = await isPortAvailableOnHost(port, host);
if (!available) return false;
}
return true;
```
Kept the four-host check rather than collapsing to just `0.0.0.0` + `::` — the multi-host coverage is load-bearing for the devbox / SSH-forwarding case where a port is free on loopback but held on the wildcard. Sequentializing is the smaller, less-behaviourally-affecting fix.
Regression tests
`packages/cli/src/server/portUtils.test.ts` — three cases binding real sockets, no mocks:
Test plan
Notes