Severity / Size
- Severity: HIGH — an attacker who can inject a single frame onto the cluster-client TCP socket (MitM on cleartext tcp://, compromised peer, or a malicious cluster node sitting between the client's contact-point and the eventual target) can resolve in-flight asks with attacker-chosen payloads.
- Size: S (~1d).
- Threat model: anyone with frame-level write access to the wire — MitM on plaintext, malicious peer, on-path observer + injector. TLS for the cluster transport (which exists today via TlsTransportSettings) closes the network-injection path, but doesn't help against a compromised cluster peer that can issue frames legitimately.
Affected files
src/cluster/ClusterClient.ts:82-86 — nextAskId() generator.
src/cluster/ClusterClient.ts:153-180 — ask() registers the pending callback under that ID and waits for cluster-client-reply { askId, ... }.
src/cluster/ClusterClient.ts:270-300 — reply-handler: looks up pending.get(askId) and resolves the promise.
Background
ClusterClient is the outside-in handle landed in v0.8.0 (#86 / commit 5567dc5). It exchanges cluster-client-envelope / cluster-client-reply frames with a ClusterClientReceptionist on the cluster side. Each ask() call generates an askId used to route the matching reply back to the right pending-promise.
The current generator is:
    let _askCounter = 0;
    function nextAskId(): string {
      _askCounter = (_askCounter + 1) >>> 0;
      return `c${Date.now()}-${_askCounter}`;
    }
Two predictable inputs: millisecond wallclock + monotonic counter starting at 0. Both are observable from outside the process — a single captured frame gives the attacker a tight window for the next 100+ askIds.
Exploit walkthrough
Setup: legitimate ClusterClient connected to a cluster. Attacker is on the network path (or in a malicious cluster node that observes traffic).
Step 1 — observation: attacker captures one outbound frame from the client. The frame contains askId: c1715000000123-7 (let's say). Attacker now knows the current _askCounter = 7 and the current Date.now() epoch on the client process.
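Step 2 — prediction: the counter advances by one per ask, so the next ask carries counter 8. Issued in the same millisecond it would be c1715000000123-8; one millisecond later, c1715000000124-8. The attacker does not need an exact hit: enumerating a small window, for example 2 timestamps × 5 counter values = 10 candidates, covers the plausible next askIds.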
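The enumeration in Step 2 can be sketched as follows. This is a minimal illustration, assuming the legacy `c<epochMillis>-<counter>` format shown in the Background section; `predictAskIds` and its window parameters are hypothetical names, not part of the codebase.

```typescript
// Hypothetical helper: enumerate plausible next askIds from one observed ID.
// Assumes the legacy format `c<epochMillis>-<counter>`.
function predictAskIds(observed: string, msWindow: number, counterWindow: number): string[] {
  const match = /^c(\d+)-(\d+)$/.exec(observed);
  if (match === null) {
    throw new Error(`not a legacy askId: ${observed}`);
  }
  const baseMs = Number(match[1]);
  const baseCounter = Number(match[2]);
  const candidates: string[] = [];
  for (let dMs = 0; dMs < msWindow; dMs++) {
    for (let dCount = 1; dCount <= counterWindow; dCount++) {
      // Mirror the generator's 32-bit unsigned wrap-around.
      const counter = (baseCounter + dCount) >>> 0;
      candidates.push(`c${baseMs + dMs}-${counter}`);
    }
  }
  return candidates;
}

const guesses = predictAskIds("c1715000000123-7", 2, 5);
console.log(guesses.length); // 10
console.log(guesses[0]);     // c1715000000123-8
```

Widening either window only grows the candidate list linearly, which is why a timestamp-plus-counter ID offers no meaningful protection.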
Step 3 — pre-emptive injection: attacker sends a forged cluster-client-reply frame for each predicted askId, each with attacker-chosen body. The client's reply-handler matches askId to the pending promise and resolves with the forged body. The legitimate reply, when it eventually arrives, finds nothing in the pending map and is silently dropped.
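Why the forged frame wins can be shown with a simplified stand-in for the pending map. The names below (`pending`, `onReply`) are illustrative assumptions, not the actual ClusterClient internals; the only property that matters is that the first reply carrying a matching askId consumes the entry.

```typescript
// Simplified stand-in for the client's pending map; not the real ClusterClient code.
type Resolver = (body: string) => void;
const pending = new Map<string, Resolver>();

// First matching reply wins; later replies for the same askId find nothing.
function onReply(askId: string, body: string): boolean {
  const resolve = pending.get(askId);
  if (resolve === undefined) {
    return false; // no pending ask under this ID: the frame is dropped
  }
  pending.delete(askId);
  resolve(body);
  return true;
}

let received = "";
pending.set("c1715000000123-8", (body) => { received = body; });

// The attacker's frame arrives before the legitimate one.
const forgedAccepted = onReply("c1715000000123-8", "forged");
const legitAccepted = onReply("c1715000000123-8", "legitimate");
console.log(received, forgedAccepted, legitAccepted); // forged true false
```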
Step 4 — impact: any ask() consumer that trusts the reply is now operating on attacker-injected data. For an actor coordinating cluster-wide invariants (e.g. "is X allowed?", "what's the current configuration?"), this is total compromise of the ask channel.
The exploit requires network-frame injection. It does not require breaking TLS — a compromised cluster node has legitimate frame-write access by definition. And cleartext tcp:// deployments are fully exposed.
How the 8 already-landed security fixes inform this
- Hello-handshake hijack (9c3b005): first-conn-wins on the TCP transport. That fix closed the byPeer-overwrite path; the askId fix closes the orthogonal reply-injection path on the same wire. Both fixes harden the v0.8.0 ClusterClient feature.
- FrameDecoder size cap (d454079): used the pattern "validate at the entry-point, throw before allocating". Same pattern fits here: validate askId entropy at generation; bind reply to socket at handling.
- Idempotency body-fingerprint (4cac92a): bound a cached response to the request that produced it via SHA-256 of method+path+body. Same idea reused here: bind a reply to the connection it must come back on, plus an unguessable correlation ID.
Fix design
Two complementary defenses, both small.
Defense A — unguessable askId (primary).
Replace nextAskId() with crypto-random:
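    function nextAskId(): string {
      // Prefer crypto.randomUUID where available: a v4 UUID carries 122
      // random bits. Fall back to getRandomValues + base64url (128 random
      // bits) for older runtimes.
      if (typeof globalThis.crypto?.randomUUID === 'function') {
        return globalThis.crypto.randomUUID();
      }
      const bytes = new Uint8Array(16);
      globalThis.crypto.getRandomValues(bytes);
      // base64url without padding: 16 bytes encode to 22 chars.
      return base64urlEncode(bytes);
    }

    // Helper for the fallback path: standard base64, then the URL-safe
    // alphabet with padding stripped.
    function base64urlEncode(bytes: Uint8Array): string {
      const b64 = btoa(String.fromCharCode(...bytes));
      return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
    }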
crypto.randomUUID() is available on Node 19+, Bun 1.0+, Deno 2.0+ — every runtime we already support. The fallback covers the older-runtime path.
Defense B — bind reply to socket (defense-in-depth).
The pending map is currently keyed only by askId. Augment it to include the socket the ask was sent on:
    this.pending.set(askId, {
      socket: this.socket, // capture the socket reference at ask-time
      resolve, reject, timer,
    });
In the reply-handler, the frame's askId is still the lookup key, but the entry resolves only if the socket captured at ask-time is the socket the reply arrived on (entry.socket === currentSocket). If the socket has changed (reconnect), the pending ask is invalidated.
This protects against the scenario where the attacker can inject frames on a different socket while spoofing the askId — which becomes irrelevant in practice once Defense A makes askIds unguessable, but is cheap defense-in-depth.
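A minimal sketch of Defense B, under stated assumptions: `SocketLike`, `PendingAsk`, and `PendingAsks` are hypothetical shapes chosen for illustration, the timer field is omitted, and a plain `Error` stands in for whatever error type the client rejects with. It shows the two behaviours described above: a reply on the wrong socket is ignored, and asks bound to a lost socket are rejected.

```typescript
// Hypothetical shapes for illustration; the real ClusterClient types may differ.
interface SocketLike { readonly id: number }

interface PendingAsk {
  socket: SocketLike;               // socket the ask was sent on
  resolve: (body: unknown) => void;
  reject: (err: Error) => void;
}

class PendingAsks {
  private readonly entries = new Map<string, PendingAsk>();

  register(askId: string, entry: PendingAsk): void {
    this.entries.set(askId, entry);
  }

  // Resolve only when the reply arrives on the socket that carried the ask.
  handleReply(askId: string, body: unknown, arrivedOn: SocketLike): boolean {
    const entry = this.entries.get(askId);
    if (entry === undefined || entry.socket !== arrivedOn) {
      return false; // unknown askId or wrong socket: ignore the frame
    }
    this.entries.delete(askId);
    entry.resolve(body);
    return true;
  }

  // On socket loss, reject every ask that was sent on that socket.
  invalidate(lost: SocketLike, reason: Error): number {
    let rejected = 0;
    for (const [askId, entry] of Array.from(this.entries)) {
      if (entry.socket === lost) {
        this.entries.delete(askId);
        entry.reject(reason);
        rejected++;
      }
    }
    return rejected;
  }
}

const socketA: SocketLike = { id: 1 };
const socketB: SocketLike = { id: 2 };
const asks = new PendingAsks();
const outcomes: string[] = [];
asks.register("ask-1", {
  socket: socketA,
  resolve: (body) => { outcomes.push(`resolved:${String(body)}`); },
  reject: (err) => { outcomes.push(`rejected:${err.message}`); },
});

const wrongSocket = asks.handleReply("ask-1", "forged", socketB);        // false
const dropped = asks.invalidate(socketA, new Error("connection lost")); // 1
console.log(wrongSocket, dropped, outcomes);
```

Comparing socket references by identity keeps the check cheap and avoids depending on any address or port that an attacker could spoof.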
Receptionist side (symmetric).
The ClusterClientReceptionist echoes the askId from the envelope into the reply. It already trusts the envelope's askId without validation; that's fine on the cluster side since the cluster doesn't track in-flight asks — but for completeness, the receptionist should refuse to echo askId longer than a reasonable cap (say 256 chars) to prevent body-bloat attacks.
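The receptionist-side cap could look roughly like this. It is a sketch only: `MAX_ASK_ID_LENGTH` and `isEchoableAskId` are hypothetical names, the 256 limit is the figure suggested above, and the non-empty check is an added assumption. Note that `length` counts UTF-16 code units, not bytes.

```typescript
// Assumed cap, taken from the "say 256 chars" suggestion above.
const MAX_ASK_ID_LENGTH = 256;

// Hypothetical guard: returns true when an askId is safe to echo back.
function isEchoableAskId(askId: unknown): askId is string {
  return (
    typeof askId === "string" &&
    askId.length > 0 &&
    askId.length <= MAX_ASK_ID_LENGTH
  );
}

console.log(isEchoableAskId("c1715000000123-7")); // true
console.log(isEchoableAskId("x".repeat(257)));    // false
console.log(isEchoableAskId(42));                 // false
```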
API surface
No public API changes. ClusterClient.ask() keeps its signature. The fix is internal to the askId generator + the pending-map shape.
Backward compatibility
The askId format changes from c1715000000123-7 to either a UUID (a1b2c3...) or 22-char base64url. Cluster-side receptionist parses the askId opaquely — it's just an echo field — so no change there. No on-the-wire breaking change.
Test plan
Three new tests plus a regression check, again mirroring the established Security-Fix style:
- Exploit test (tests/multi-node/cluster-client-security.test.ts): capture an outgoing ask frame from a real ClusterClient; enumerate the next 20 plausible askIds based on Date.now() + counter; inject a forged cluster-client-reply for each. Pre-fix: the legit ask resolves with attacker payload; this test would have failed-as-expected on the old generator. Post-fix: attacker enumeration fails (no useful prediction); the legit reply lands.
- Defense test: generate 100K askIds in a tight loop; verify pairwise uniqueness; no Date.now()-prefix detectable; no monotonic counter pattern.
- Reconnect-test: ask issued on socket A; socket A drops; client reconnects on socket B; pending ask is invalidated (rejected with ConnectionLostError) rather than being matched by a reply on socket B. Validates Defense B.
- Regression: existing tests/multi-node/cluster-client.test.ts tests still pass.
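The core of the defense test might be sketched like this. It is not tied to any test framework, and the local `nextAskId` is a stand-in that assumes a runtime exposing `globalThis.crypto.randomUUID`; the real test would import the generator from ClusterClient.ts instead.

```typescript
// Stand-in generator for the sketch; the real test would import nextAskId.
function nextAskId(): string {
  return globalThis.crypto.randomUUID();
}

// Shape of the old, predictable IDs: c<epochMillis>-<counter>.
const LEGACY_FORMAT = /^c\d+-\d+$/;

const total = 100000;
const seen = new Set<string>();
let legacyLooking = 0;
for (let i = 0; i < total; i++) {
  const id = nextAskId();
  if (LEGACY_FORMAT.test(id)) {
    legacyLooking++;
  }
  seen.add(id);
}

// Pairwise uniqueness holds when the set kept every generated ID.
console.log(seen.size === total, legacyLooking);
```

A Set-based check covers pairwise uniqueness in linear time; the regex check only rules out the legacy format and is not a statistical randomness test.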
Acceptance criteria
- nextAskId() uses crypto.randomUUID() (or getRandomValues fallback).