[Security] ClusterClient askId predictability via Date.now()+counter #120

@pathosDev

Severity / Size

  • Severity: HIGH — an attacker who can inject a single frame onto the cluster-client TCP socket (MitM on cleartext tcp://, compromised peer, or a malicious cluster node sitting between the client's contact-point and the eventual target) can resolve in-flight asks with attacker-chosen payloads.
  • Size: S (~1d).
  • Threat model: anyone with frame-level write access to the wire — MitM on plaintext, malicious peer, on-path observer + injector. TLS for the cluster transport (which exists today via TlsTransportSettings) closes the network-injection path, but doesn't help against a compromised cluster peer that can issue frames legitimately.

Affected files

  • src/cluster/ClusterClient.ts:82-86: nextAskId() generator.
  • src/cluster/ClusterClient.ts:153-180: ask() registers the pending callback under that ID and waits for cluster-client-reply { askId, ... }.
  • src/cluster/ClusterClient.ts:270-300: reply-handler; looks up pending.get(askId) and resolves the promise.

Background

ClusterClient is the outside-in handle landed in v0.8.0 (#86 / commit 5567dc5). It exchanges cluster-client-envelope / cluster-client-reply frames with a ClusterClientReceptionist on the cluster side. Each ask() call generates an askId used to route the matching reply back to the right pending-promise.

The current generator is:

let _askCounter = 0;
function nextAskId(): string {
  _askCounter = (_askCounter + 1) >>> 0;
  return `c${Date.now()}-${_askCounter}`;
}

Two predictable inputs: millisecond wallclock + monotonic counter starting at 0. Both are observable from outside the process — a single captured frame gives the attacker a tight window for the next 100+ askIds.
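To make the predictability concrete, here is a minimal sketch of how a short candidate list falls out of one observed ID. The helper name predictAskIds, the sample ID, and the window sizes are illustrative assumptions, not code from the repository.

```typescript
// Hypothetical attacker-side helper: given one observed askId in the legacy
// `c<ms>-<counter>` format, list the IDs the client is likely to emit next.
function predictAskIds(observed: string, msWindow: number, counterWindow: number): string[] {
  const match = /^c(\d+)-(\d+)$/.exec(observed);
  if (!match) throw new Error(`not a legacy askId: ${observed}`);
  const ts = Number(match[1]);
  const counter = Number(match[2]);
  const out: string[] = [];
  for (let dt = 0; dt < msWindow; dt++) {
    for (let dc = 1; dc <= counterWindow; dc++) {
      // Same wrap-around as the generator: counter stays an unsigned 32-bit value.
      out.push(`c${ts + dt}-${(counter + dc) >>> 0}`);
    }
  }
  return out;
}

// Illustrative values only: 2 candidate timestamps x 5 candidate counters.
const candidates = predictAskIds('c1715000000123-7', 2, 5);
console.log(candidates.length, candidates[0]);
```

A wider time window grows the list only linearly, so spraying forged replies stays cheap for the attacker.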

Exploit walkthrough

Setup: legitimate ClusterClient connected to a cluster. Attacker is on the network path (or in a malicious cluster node that observes traffic).

Step 1 — observation: attacker captures one outbound frame from the client. The frame contains askId: c1715000000123-7 (let's say). Attacker now knows the current _askCounter = 7 and the current Date.now() epoch on the client process.

Step 2 — prediction: any ask issued in the same millisecond would be c1715000000123-8. An ask within ~1ms would be c1715000000123-8 or c1715000000124-8 (jitter). The attacker can enumerate the plausible next askIds; for example, 2 candidate timestamps × 5 candidate counter values = 10 candidates.

Step 3 — pre-emptive injection: attacker sends a forged cluster-client-reply frame for each predicted askId, each with attacker-chosen body. The client's reply-handler matches askId to the pending promise and resolves with the forged body. The legitimate reply, when it eventually arrives, finds nothing in the pending map and is silently dropped.

Step 4 — impact: any ask() consumer that trusts the reply is now operating on attacker-injected data. For an actor coordinating cluster-wide invariants (e.g. "is X allowed?", "what's the current configuration?"), this is total compromise of the ask channel.

The exploit requires network-frame injection. It does not require breaking TLS — a compromised cluster node has legitimate frame-write access by definition. And cleartext tcp:// deployments are fully exposed.
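The walkthrough above can be reproduced in miniature. The sketch below models only the pending map and reply lookup as described in this issue; the class name LegacyPendingAsks, the injectable clock, and the Reply shape are assumptions for illustration, not the actual ClusterClient code.

```typescript
type Reply = { askId: string; body: unknown };

// Simplified model of the pre-fix behavior: pending asks keyed by askId only.
class LegacyPendingAsks {
  private counter = 0;
  private readonly pending = new Map<string, (body: unknown) => void>();
  private readonly now: () => number;

  // The clock is injectable so the demo is deterministic.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  ask(): { askId: string; reply: Promise<unknown> } {
    this.counter = (this.counter + 1) >>> 0;
    const askId = `c${this.now()}-${this.counter}`;
    const reply = new Promise<unknown>((resolve) => {
      this.pending.set(askId, resolve);
    });
    return { askId, reply };
  }

  // Returns true if the frame matched a pending ask, false if it was dropped.
  onReply(frame: Reply): boolean {
    const resolve = this.pending.get(frame.askId);
    if (!resolve) return false; // unknown askId: silently dropped
    this.pending.delete(frame.askId);
    resolve(frame.body);
    return true;
  }
}

const client = new LegacyPendingAsks(() => 1715000000123);
const { askId, reply } = client.ask();
reply.then((body) => console.log('ask resolved with', body));

// The forged frame arrives first and wins; the legitimate reply finds nothing.
const forgedAccepted = client.onReply({ askId: 'c1715000000123-1', body: 'forged' });
const legitAccepted = client.onReply({ askId, body: 'legitimate' });
console.log({ forgedAccepted, legitAccepted });
```

Nothing in the lookup distinguishes the forged frame from the real one, which is why the fix has to make the ID itself unguessable.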

How the 8 already-landed security fixes inform this

  • Hello-handshake hijack (9c3b005): first-conn-wins on the TCP transport. That fix closed the byPeer-overwrite path; the askId fix closes the orthogonal reply-injection path on the same wire. Both fixes harden the v0.8.0 ClusterClient feature.
  • FrameDecoder size cap (d454079): used the pattern "validate at the entry-point, throw before allocating". Same pattern fits here: validate askId entropy at generation; bind reply to socket at handling.
  • Idempotency body-fingerprint (4cac92a): bound a cached response to the request that produced it via SHA-256 of method+path+body. Same idea reused here: bind a reply to the connection it must come back on, plus an unguessable correlation ID.

Fix design

Two complementary defenses, both small.

Defense A — unguessable askId (primary).

Replace nextAskId() with crypto-random:

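function nextAskId(): string {
  // Prefer crypto.randomUUID (a v4 UUID: 122 random bits) where available;
  // otherwise fall back to 16 bytes (128 bits) from getRandomValues,
  // encoded compactly as base64url.
  if (typeof globalThis.crypto?.randomUUID === 'function') {
    return globalThis.crypto.randomUUID();
  }
  const bytes = new Uint8Array(16);
  globalThis.crypto.getRandomValues(bytes);
  // base64url, no padding: 22 chars
  return base64urlEncode(bytes);
}

// base64urlEncode is not defined in the snippet above; a minimal version is
// assumed here: standard base64, URL-safe alphabet, padding stripped.
function base64urlEncode(bytes: Uint8Array): string {
  const b64 = btoa(String.fromCharCode(...bytes));
  return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}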

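globalThis.crypto.randomUUID() is available on Node 19+, Bun 1.0+, and Deno 2.0+, which covers every runtime we already support. The fallback only helps on runtimes that expose globalThis.crypto.getRandomValues without randomUUID; where globalThis.crypto is missing entirely, the fallback throws as well.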

Defense B — bind reply to socket (defense-in-depth).

The pending map is currently keyed only by askId. Augment it to include the socket the ask was sent on:

this.pending.set(askId, {
  socket: this.socket,  // capture the socket reference at ask-time
  resolve, reject, timer,
});

In the reply-handler, the frame's askId remains the lookup key, but the entry resolves only if its captured socket is the socket the reply arrived on (entry.socket === currentSocket). If the socket has changed (reconnect), the pending ask is invalidated.

This protects against the scenario where the attacker can inject frames on a different socket while spoofing the askId — which becomes irrelevant in practice once Defense A makes askIds unguessable, but is cheap defense-in-depth.
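A minimal sketch of Defense B, assuming a generic socket handle type. The names SocketBoundPending, register, onReply, and onSocketClosed are hypothetical, and a plain Error stands in for whatever error type the client rejects with on disconnect.

```typescript
interface PendingAsk<S> {
  socket: S; // the socket captured at ask-time
  resolve: (body: unknown) => void;
  reject: (err: Error) => void;
}

class SocketBoundPending<S> {
  private readonly pending = new Map<string, PendingAsk<S>>();

  register(askId: string, socket: S, resolve: (body: unknown) => void, reject: (err: Error) => void): void {
    this.pending.set(askId, { socket, resolve, reject });
  }

  // askId is the lookup key; the socket identity check is the resolving condition.
  onReply(askId: string, body: unknown, arrivedOn: S): boolean {
    const entry = this.pending.get(askId);
    if (!entry || entry.socket !== arrivedOn) return false;
    this.pending.delete(askId);
    entry.resolve(body);
    return true;
  }

  // On disconnect, reject every ask that was sent on the dropped socket.
  onSocketClosed(socket: S, err: Error): number {
    let rejected = 0;
    for (const [askId, entry] of this.pending) {
      if (entry.socket === socket) {
        this.pending.delete(askId);
        entry.reject(err);
        rejected++;
      }
    }
    return rejected;
  }
}

const socketA = { name: 'A' };
const socketB = { name: 'B' };
const events: string[] = [];
const asks = new SocketBoundPending<{ name: string }>();
asks.register('id-1', socketA, (b) => events.push(`resolved:${String(b)}`), (e) => events.push(`rejected:${e.message}`));

const crossSocket = asks.onReply('id-1', 'forged', socketB); // wrong socket: ignored
const rejectedCount = asks.onSocketClosed(socketA, new Error('connection lost'));
const lateReply = asks.onReply('id-1', 'late', socketA); // already invalidated
console.log({ crossSocket, rejectedCount, lateReply, events });
```

Comparing by reference keeps the check cheap and avoids depending on any socket-level identifier that a peer could influence.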

Receptionist side (symmetric).

The ClusterClientReceptionist echoes the askId from the envelope into the reply. It currently trusts the envelope's askId without validation; that's fine on the cluster side, since the cluster doesn't track in-flight asks. For completeness, though, the receptionist should refuse to echo an askId longer than a reasonable cap (say 256 chars) to prevent body-bloat attacks.
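The length cap could be a small guard applied before the receptionist builds the reply. This is a sketch only: isEchoableAskId and MAX_ASK_ID_LENGTH are hypothetical names, and rejecting non-string or empty values is an added assumption beyond the cap described above.

```typescript
// Cap suggested in this issue; the exact value is a judgment call.
const MAX_ASK_ID_LENGTH = 256;

// True only for a non-empty string within the cap; anything else should not be echoed.
function isEchoableAskId(askId: unknown): askId is string {
  return typeof askId === 'string' && askId.length > 0 && askId.length <= MAX_ASK_ID_LENGTH;
}

console.log(isEchoableAskId('c1715000000123-7')); // legacy-format ID, well under the cap
console.log(isEchoableAskId('x'.repeat(257))); // one char over the cap
```

Both the legacy format and the proposed UUID or base64url IDs are far shorter than 256 chars, so the cap would not affect legitimate clients.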

API surface

No public API changes. ClusterClient.ask() keeps its signature. The fix is internal to the askId generator + the pending-map shape.

Backward compatibility

The askId format changes from c1715000000123-7 to either a UUID (a1b2c3...) or 22-char base64url. Cluster-side receptionist parses the askId opaquely — it's just an echo field — so no change there. No on-the-wire breaking change.

Test plan

Four tests, again mirroring the established Security-Fix style:

  1. Exploit test (tests/multi-node/cluster-client-security.test.ts): capture an outgoing ask frame from a real ClusterClient; enumerate the next 20 plausible askIds based on Date.now() + counter; inject a forged cluster-client-reply for each. Pre-fix: the legit ask resolves with attacker payload; this test would have failed-as-expected on the old generator. Post-fix: attacker enumeration fails (no useful prediction); the legit reply lands.

  2. Defense test: generate 100K askIds in a tight loop; verify pairwise uniqueness; no Date.now()-prefix detectable; no monotonic counter pattern.

  3. Reconnect test: ask issued on socket A; socket A drops; client reconnects on socket B; pending ask is invalidated (rejected with ConnectionLostError) rather than being matched by a reply on socket B. Validates Defense B.

  4. Regression: existing tests/multi-node/cluster-client.test.ts tests still pass.
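The defense test (item 2) might look roughly like the sketch below. It is framework-agnostic; checkAskIds is a hypothetical helper, and globalThis.crypto.randomUUID() stands in for the fixed nextAskId(), which is not exported in this sketch. Matching against the legacy `c<ms>-<counter>` shape is only a cheap proxy for "no Date.now() prefix, no counter pattern", not an entropy measurement.

```typescript
// Shape of the old generator's output: "c<digits>-<digits>".
const LEGACY_FORMAT = /^c\d+-\d+$/;

function checkAskIds(generate: () => string, count: number): { unique: boolean; legacyShaped: number } {
  const seen = new Set<string>();
  let legacyShaped = 0;
  for (let i = 0; i < count; i++) {
    const id = generate();
    seen.add(id);
    if (LEGACY_FORMAT.test(id)) legacyShaped++;
  }
  // Any duplicate shrinks the set below the number of generated IDs.
  return { unique: seen.size === count, legacyShaped };
}

// Stand-in for the fixed generator (assumes a runtime with globalThis.crypto).
const fresh = checkAskIds(() => globalThis.crypto.randomUUID(), 100_000);

// Contrast: the legacy scheme is unique too, but every ID has the predictable shape.
let legacyCounter = 0;
const legacy = checkAskIds(() => `c${Date.now()}-${++legacyCounter}`, 1_000);

console.log({ fresh, legacy });
```

The contrast run shows why uniqueness alone is not a sufficient assertion: the old generator passes it while remaining fully predictable.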

Acceptance criteria

  • nextAskId() uses crypto.randomUUID() (or getRandomValues fallback).
  • Pending map captures the socket at ask-time; reply-handler verifies socket match.
  • Exploit test demonstrates the pre-fix vulnerability and the post-fix block.
  • Defense test: 100K-askId uniqueness + entropy distribution.
  • Reconnect test: cross-socket reply rejected.
  • No public API change; receptionist symmetric (defensive askId length cap optional).
  • Plan-doc + README "Known security caveats" entry updated on land.

Metadata

Assignees

No one assigned

    Labels

    bug, priority: high, security, severity: high
