fix(cli): serialize port-availability probes (#309) by miguel-heygen · Pull Request #310 · heygen-com/hyperframes

miguel-heygen · 2026-04-17T22:00:18Z

Closes #309. Full credit to @gigadeniga for the diagnosis — the root cause + proposed fix in that issue are exactly what landed here.

The bug

`npx hyperframes preview` failed deterministically on Crostini (ChromeOS Linux) with `Ports 3002–3101 are all in use`, even when nothing was actually listening on any of them.

Why

`testPortOnAllHosts` ran four probes in parallel:

```ts
const hosts = ["127.0.0.1", "0.0.0.0", "::1", "::"];
const results = await Promise.all(hosts.map((h) => isPortAvailableOnHost(port, h)));
```

Each probe binds a socket and then calls `server.close()`. Close is asynchronous: the socket stays open until the close callback fires on the next event-loop tick. While it is open, the wildcard binds (`0.0.0.0`, `::`), which cover the loopback address, collide with the still-open loopback socket and return a spurious `EADDRINUSE`. On Crostini this happens 100% of the time; other Linux configs hit it intermittently; macOS is less predictable. Net effect: every port in the 100-port scan range appears busy and the preview refuses to start.

Reproduces on any Linux box with the standalone snippet from the issue:

```
127.0.0.1: OK
0.0.0.0: EADDRINUSE ← false positive
::1: OK
::: EADDRINUSE ← false positive
```

Fix

Serialize the probes. Each socket is fully closed before the next opens, eliminating the race window entirely.

```ts
for (const host of hosts) {
  const available = await isPortAvailableOnHost(port, host);
  if (!available) return false;
}
return true;
```

Kept the four-host check rather than collapsing to just `0.0.0.0` + `::`. The multi-host coverage is load-bearing for the devbox / SSH-forwarding case where a port is free on loopback but held on the wildcard. Serializing the probes is the smaller fix and changes the least behaviour.
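For readers without the source open, here is a minimal sketch of how a probe and the sequential check could fit together. It is not the code from `portUtils.ts`: the body of `isPortAvailableOnHost` is an assumption, and so is the choice to treat a missing address family (`EADDRNOTAVAIL` / `EAFNOSUPPORT`) as "not a conflict". The point it illustrates is that the probe resolves only from the `close()` callback, so the next host is never probed while the previous socket is still open.

```ts
import { createServer } from "node:net";

// Hosts probed for each candidate port, in the order quoted above.
const hosts = ["127.0.0.1", "0.0.0.0", "::1", "::"] as const;

// Assumed probe shape: resolves true if `port` can be bound on `host`.
function isPortAvailableOnHost(port: number, host: string): Promise<boolean> {
  return new Promise((resolve) => {
    const server = createServer();
    server.once("error", (err: NodeJS.ErrnoException) => {
      // Assumption: a host that does not exist on this machine (for example
      // IPv6 disabled) is not evidence that the port is taken.
      resolve(err.code === "EADDRNOTAVAIL" || err.code === "EAFNOSUPPORT");
    });
    server.once("listening", () => {
      // Resolve only once the socket is fully released.
      server.close(() => resolve(true));
    });
    server.listen(port, host);
  });
}

// Sequential check: one probe in flight at a time, short-circuit on failure.
async function testPortOnAllHosts(port: number): Promise<boolean> {
  for (const host of hosts) {
    if (!(await isPortAvailableOnHost(port, host))) return false;
  }
  return true;
}
```

Swapping the loop back to `Promise.all(hosts.map(...))` is what reintroduces the overlap between the loopback and wildcard binds described under "Why".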

Regression tests

`packages/cli/src/server/portUtils.test.ts` — three cases binding real sockets, no mocks:

- Returns true for a genuinely free port: directly reproduces the Crostini bug; would fail on Linux against the parallel implementation.
- Returns false when the port is occupied on `0.0.0.0`: confirms the multi-host check still catches the devbox scenario.
- Releases each probe socket before the next run: two back-to-back calls for the same free port both return true, pinning the sequential contract against future refactors that might try to reparallelize for perf.

Test plan

- `bunx vitest run packages/cli/src/server/portUtils.test.ts`: 3/3 pass
- Full CLI suite: 109/109 pass
- `tsc --noEmit` clean

Notes

- Independent of any version bump; ship whenever.
- Probing 4 hosts serially adds at most a few tens of milliseconds per port on the scan (binds are very fast on loopback). The cost is most visible when the first port in the range is free: previously 1 parallel round-trip, now 4 sequential. Even then it is imperceptible, because the `preview` bind is a one-time startup cost, not a hot path.

testPortOnAllHosts ran the four host probes (127.0.0.1 / 0.0.0.0 / ::1 / ::) in parallel via Promise.all. The first bind on 127.0.0.1 held its socket open until server.close() resolved on the next event-loop tick, and while it was open the wildcard 0.0.0.0 / :: probes raced it and got EADDRINUSE. Result: every port in the 3002–3101 scan range looked occupied and the preview server refused to start — deterministic on Linux (Crostini on ChromeOS in the reporting environment, gigadeniga's issue #309). Fix is to test the hosts sequentially so each probe's socket is fully closed before the next opens. The multi-host check itself stays — it's load-bearing for devbox / SSH-forwarding layouts where a port is free on loopback but held on the wildcard. Regression tests bind real sockets (no mocks) and would have failed against the parallel implementation on any Linux shard.

Strengthens the regression gate from issue #309. The original tests binding real sockets only caught the bug on Linux (Crostini and likely Ubuntu), not macOS where wildcard-vs-loopback parallel binds happen to succeed. That left CI as the only gate and only if the CI runner's kernel reproduces the race. testPortOnAllHosts now takes an optional injectable probe. The contract test passes a recording fake that holds each call for 20ms and tracks in-flight count. Buggy parallel code drives it to 4; the sequential fix keeps it at 1. Verified by reverting the fix in place and confirming the contract test fails with expected=1 / received=4 — then restored and all 4 tests pass. Production callers pass no probe and get the real socket check, unchanged behaviour.
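The contract test described in this commit message can be sketched roughly as follows. This is an illustration, not the code from the commit: `makeRecordingProbe`, `testPortOnAllHostsParallel`, and `measurePeaks` are hypothetical names, and the probe parameter is required here, whereas the commit describes it as optional with the real socket check as the default.

```ts
type Probe = (port: number, host: string) => Promise<boolean>;

const PORT_PROBE_HOSTS = ["127.0.0.1", "0.0.0.0", "::1", "::"] as const;

// Sequential check with an injectable probe (signature assumed).
async function testPortOnAllHosts(port: number, probe: Probe): Promise<boolean> {
  for (const host of PORT_PROBE_HOSTS) {
    if (!(await probe(port, host))) return false;
  }
  return true;
}

// The pre-fix parallel shape, kept here only for comparison.
async function testPortOnAllHostsParallel(port: number, probe: Probe): Promise<boolean> {
  const results = await Promise.all(PORT_PROBE_HOSTS.map((h) => probe(port, h)));
  return results.every(Boolean);
}

// Recording fake: holds each call for `holdMs` and tracks peak concurrency.
function makeRecordingProbe(holdMs = 20) {
  let inFlight = 0;
  let peak = 0;
  const probe: Probe = async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, holdMs));
    inFlight -= 1;
    return true;
  };
  return { probe, peak: () => peak };
}

// Runs both variants against fresh fakes and reports the peak in-flight count.
async function measurePeaks(): Promise<{ sequential: number; parallel: number }> {
  const seq = makeRecordingProbe();
  await testPortOnAllHosts(3002, seq.probe);
  const par = makeRecordingProbe();
  await testPortOnAllHostsParallel(3002, par.probe);
  return { sequential: seq.peak(), parallel: par.peak() };
}
```

Because the fake never touches a socket, the peak count depends only on how the probes are scheduled, which is why this kind of assertion behaves the same on macOS and Linux.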

jrusso1020

Ship it. Well-diagnosed, minimally-scoped fix with a thoughtful test strategy. Notes below are non-blocking.

What's right

- Root cause is correct and clearly explained. Serializing eliminates the race window without changing semantics.
- Right call keeping all four hosts rather than collapsing to `0.0.0.0` + `::`; multi-host coverage is load-bearing for SSH-forwarded devbox cases.
- Test strategy honestly labels the real-socket tests as OS-dependent and makes the injectable-probe contract test the actual regression gate. Verifying peak concurrency 4 vs 1 against the reverted fix is the rigor I want to see.
- PR description is exemplary: it reproduces the failure, explains the alternative considered and rejected, quantifies the cost, and credits the reporter.

Substantive (non-blocking)

1. Injected probe parameter on the exported API is test infra leaking into production surface. The second parameter exists solely for the contract test and widens the public signature. Prefer either:
   - extract an internal `_testPortSequential(probe, hosts, port)` helper and have the public `testPortOnAllHosts` call it bound to `isPortAvailableOnHost`; or
   - keep the current shape but mark the parameter `@internal` in JSDoc so consumers don't start relying on it.
2. `allocFreePort()` is probabilistic. `BASE + random(1000)` can collide with anything else on the system and with parallel shards. The idiomatic fix: bind `port: 0`, read `server.address().port`, close, return that. It turns "runway so parallel test shards don't collide" from a hope into a guarantee.
3. The second real-socket test doesn't test what its name says. It binds a blocker on `0.0.0.0`, but the probe order starts with `127.0.0.1`. On most kernels without `SO_REUSEADDR`, the first probe is the one that fails (wildcard conflicts with loopback). The `result === false` assertion still passes, but the test is exercising "`0.0.0.0` bound ⇒ `127.0.0.1` also unavailable," not "`testPortOnAllHosts` correctly detects occupation on `0.0.0.0`." A blocker on a host later in the probe order (e.g., `::1`) would more directly exercise the multi-host contract.
4. No positive test for the devbox scenario the multi-host check exists for: "free on loopback, held on wildcard." That's the entire justification for keeping four hosts. Since the short-circuit test already uses the injectable probe, one more case there (`host === "127.0.0.1" ? true : host === "0.0.0.0" ? false : ...`) would lock in the semantics cheaply.
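To make the first and fourth suggestions concrete, here is one possible shape, offered as a sketch rather than a proposal for the exact code. `_testPortSequential` uses the name and argument order from the suggestion above; `makeDevboxProbe` and `devboxCase` are hypothetical, and the fake stands in for a machine where the port is free on loopback but held on `0.0.0.0`.

```ts
type Probe = (port: number, host: string) => Promise<boolean>;

// Internal helper: the probe is injectable here, so the exported function
// would not need a second parameter.
async function _testPortSequential(
  probe: Probe,
  hosts: readonly string[],
  port: number,
): Promise<boolean> {
  for (const host of hosts) {
    if (!(await probe(port, host))) return false;
  }
  return true;
}

// Fake for the devbox layout: every host is free except the IPv4 wildcard.
// It also records which hosts were probed, in order.
function makeDevboxProbe(calls: string[]): Probe {
  return async (_port, host) => {
    calls.push(host);
    return host !== "0.0.0.0";
  };
}

// Runs the devbox case and returns the verdict plus the probe order observed.
async function devboxCase(): Promise<{ result: boolean; calls: string[] }> {
  const calls: string[] = [];
  const result = await _testPortSequential(
    makeDevboxProbe(calls),
    ["127.0.0.1", "0.0.0.0", "::1", "::"],
    3002,
  );
  return { result, calls };
}
```

A test built on this would assert that the result is `false` and that probing stopped after `0.0.0.0`, which covers the "free on loopback, held on wildcard" case without needing a real socket. The public `testPortOnAllHosts(port)` would then call the helper with `isPortAvailableOnHost` and the fixed host list.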

Nits

- `PORT_PROBE_HOSTS` is a readonly tuple, which is good. Consider `@internal` JSDoc since it's only meaningful to tests.
- The JSDoc on `testPortOnAllHosts` is the kind of why-comment that earns its keep.

Items 1 and 4 are the two I'd actually want addressed (here or as a fast follow-up) — one is public-API shape, the other leaves the stated justification untested. Everything else is a nit.
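For the `allocFreePort()` note in the substantive list, a minimal sketch of the bind-to-port-0 approach, assuming Node's `net` module. The signature and the default host are assumptions; the existing helper's shape is not shown in this thread.

```ts
import { createServer } from "node:net";
import type { AddressInfo } from "node:net";

// Asks the kernel for an unused port by binding port 0, then releases it
// and returns the number that was assigned.
function allocFreePort(host = "127.0.0.1"): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = createServer();
    server.once("error", reject);
    server.listen(0, host, () => {
      const { port } = server.address() as AddressInfo;
      // Resolve only after the socket is closed so the caller can bind it.
      server.close(() => resolve(port));
    });
  });
}
```

The port is chosen by the kernel rather than by a random offset, so it is known to be free at the moment of allocation. There is still a short window between `close()` and the caller's own bind in which another process could take it.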

miguel-heygen added 2 commits April 17, 2026 23:59

jrusso1020 approved these changes Apr 17, 2026

miguel-heygen merged commit e4cfcd3 into main Apr 17, 2026
12 of 13 checks passed

jrusso1020 mentioned this pull request Apr 18, 2026

chore: release v0.4.4 #315

Merged

miguel-heygen merged 2 commits into main from fix/port-scan-race