Skip to content

fix(cli): serialize port-availability probes (#309)#310

Merged
miguel-heygen merged 2 commits intomainfrom
fix/port-scan-race
Apr 17, 2026
Merged

fix(cli): serialize port-availability probes (#309)#310
miguel-heygen merged 2 commits intomainfrom
fix/port-scan-race

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

Closes #309. Full credit to @gigadeniga for the diagnosis — the root cause + proposed fix in that issue are exactly what landed here.

The bug

`npx hyperframes preview` failed deterministically on Crostini (ChromeOS Linux) with `Ports 3002–3101 are all in use`, even when nothing was actually listening on any of them.

Why

`testPortOnAllHosts` ran four probes in parallel:

```ts
const hosts = ["127.0.0.1", "0.0.0.0", "::1", "::"];
const results = await Promise.all(hosts.map((h) => isPortAvailableOnHost(port, h)));
```

Each probe binds a socket and then calls `server.close()`. Close is async — the socket stays open until its callback fires on the next event-loop tick. While it's open, the wildcard binds (`0.0.0.0`, `::`) that include the loopback address race the still-open loopback socket and return `EADDRINUSE` spuriously. On Crostini this happens 100% of the time; other Linux configs hit it intermittently; macOS is less predictable. Net effect: every port in the 100-port scan range appears busy and the preview refuses to start.

Reproduces on any Linux box with the standalone snippet from the issue:

```
127.0.0.1: OK
0.0.0.0: EADDRINUSE ← false positive
::1: OK
::: EADDRINUSE ← false positive
```

Fix

Serialize the probes. Each socket is fully closed before the next opens, eliminating the race window entirely.

```ts
for (const host of hosts) {
const available = await isPortAvailableOnHost(port, host);
if (!available) return false;
}
return true;
```

Kept the four-host check rather than collapsing to just `0.0.0.0` + `::` — the multi-host coverage is load-bearing for the devbox / SSH-forwarding case where a port is free on loopback but held on the wildcard. Sequentializing is the smaller, less-behaviourally-affecting fix.

Regression tests

`packages/cli/src/server/portUtils.test.ts` — three cases binding real sockets, no mocks:

  • Returns true for a genuinely free port — directly reproduces the Crostini bug; would fail on Linux against the parallel implementation.
  • Returns false when the port is occupied on `0.0.0.0` — confirms the multi-host check still catches the devbox scenario.
  • Releases each probe socket before the next run — two back-to-back calls for the same free port both return true, pinning the sequential contract against future refactors that might try to reparallelize for perf.

Test plan

  • `bunx vitest run packages/cli/src/server/portUtils.test.ts` — 3/3 pass
  • Full CLI suite — 109/109 pass
  • `tsc --noEmit` clean

Notes

  • Independent of any version bump; ship whenever.
  • Probing 4 hosts serially adds at most ~tens of milliseconds per port on the scan (binds are very fast on loopback). The worst-case cost shows up when the first port in the range is free — previously 1 parallel round-trip, now 4 sequential — and it's imperceptible (`preview` bind is a one-time startup cost, not a hot path).

testPortOnAllHosts ran the four host probes (127.0.0.1 / 0.0.0.0 / ::1 / ::)
in parallel via Promise.all. The first bind on 127.0.0.1 held its socket
open until server.close() resolved on the next event-loop tick, and while
it was open the wildcard 0.0.0.0 / :: probes raced it and got EADDRINUSE.
Result: every port in the 3002–3101 scan range looked occupied and the
preview server refused to start — deterministic on Linux (Crostini on
ChromeOS in the reporting environment, gigadeniga's issue #309).

Fix is to test the hosts sequentially so each probe's socket is fully
closed before the next opens. The multi-host check itself stays — it's
load-bearing for devbox / SSH-forwarding layouts where a port is free
on loopback but held on the wildcard.

Regression tests bind real sockets (no mocks) and would have failed
against the parallel implementation on any Linux shard.
Strengthens the regression gate from issue #309. The original tests
binding real sockets only caught the bug on Linux (Crostini and likely
Ubuntu), not macOS where wildcard-vs-loopback parallel binds happen
to succeed. That left CI as the only gate and only if the CI runner's
kernel reproduces the race.

testPortOnAllHosts now takes an optional injectable probe. The contract
test passes a recording fake that holds each call for 20ms and tracks
in-flight count. Buggy parallel code drives it to 4; the sequential fix
keeps it at 1. Verified by reverting the fix in place and confirming
the contract test fails with expected=1 / received=4 — then restored
and all 4 tests pass.

Production callers pass no probe and get the real socket check,
unchanged behaviour.
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it. Well-diagnosed, minimally-scoped fix with a thoughtful test strategy. Notes below are non-blocking.

What's right

  • Root cause is correct and clearly explained. Serializing eliminates the race window without changing semantics.
  • Right call keeping all four hosts rather than collapsing to 0.0.0.0 + :: — multi-host coverage is load-bearing for SSH-forwarded devbox cases.
  • Test strategy honestly labels the real-socket tests as OS-dependent and makes the injectable-probe contract test the actual regression gate. Verifying peak concurrency 4 vs 1 against the reverted fix is the rigor I want to see.
  • PR description is exemplary — reproduces the failure, explains the alternative considered and rejected, quantifies the cost, and credits the reporter.

Substantive (non-blocking)

  1. Injected probe parameter on the exported API is test infra leaking into production surface. The second parameter exists solely for the contract test and widens the public signature. Prefer either:

    • extract an internal _testPortSequential(probe, hosts, port) helper and have the public testPortOnAllHosts call it bound to isPortAvailableOnHost; or
    • keep the current shape but mark the parameter @internal in JSDoc so consumers don't start relying on it.
  2. allocFreePort() is probabilistic. BASE + random(1000) can collide with anything else on the system and with parallel shards. The idiomatic fix: bind port: 0, read server.address().port, close, return that — turns "runway so parallel test shards don't collide" from a hope into a guarantee.

  3. The second real-socket test doesn't test what its name says. It binds a blocker on 0.0.0.0, but the probe order starts with 127.0.0.1. On most kernels without SO_REUSEADDR, the first probe is the one that fails (wildcard conflicts with loopback). The result === false assertion still passes, but the test is exercising "0.0.0.0 bound ⇒ 127.0.0.1 also unavailable," not "testPortOnAllHosts correctly detects occupation on 0.0.0.0." A blocker on a host later in the probe order (e.g., ::1) would more directly exercise the multi-host contract.

  4. No positive test for the devbox scenario the multi-host check exists for — "free on loopback, held on wildcard." That's the entire justification for keeping four hosts. Since the short-circuit test already uses the injectable probe, one more case there (host === "127.0.0.1" ? true : host === "0.0.0.0" ? false : ...) would lock in the semantics cheaply.

Nits

  • PORT_PROBE_HOSTS is a readonly tuple — good. Consider @internal JSDoc since it's only meaningful to tests.
  • The JSDoc on testPortOnAllHosts is the kind of why-comment that earns its keep.

Items 1 and 4 are the two I'd actually want addressed (here or as a fast follow-up) — one is public-API shape, the other leaves the stated justification untested. Everything else is a nit.

@miguel-heygen miguel-heygen merged commit e4cfcd3 into main Apr 17, 2026
12 of 13 checks passed
@jrusso1020 jrusso1020 mentioned this pull request Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

npx hyperframes preview always fails on Crostini (ChromeOS Linux) + port scan race condition with wildcard addresses

2 participants