Skip to content

Guard nextWrite against null socket after async close#1168

Open
buttilda wants to merge 1 commit into
porsager:masterfrom
buttilda:guard-nextwrite-against-null-socket
Open

Guard nextWrite against null socket after async close#1168
buttilda wants to merge 1 commit into
porsager:masterfrom
buttilda:guard-nextwrite-against-null-socket

Conversation

@buttilda
Copy link
Copy Markdown

nextWrite can be scheduled via setImmediate and fire after closed() has nulled socket, crashing the whole process with:

TypeError: Cannot read properties of null (reading 'write')

Fixes #1066, #1154

@buttilda
Copy link
Copy Markdown
Author

buttilda commented Jun 1, 2026

Hey @porsager, any feedback here? We have a pnpm patch for this to stop our service from crashing randomly, so we have worked around it, but it'd be nice to have to keep the patch live for long :)

littledivy added a commit to denoland/deno that referenced this pull request Jun 5, 2026
… close (#34852)

## Summary

Adds a spec regression test that pins down `setImmediate` /
`clearImmediate`
semantics around a TCP socket's `'close'` event. The test directly
mirrors the
write-batching pattern that surfaces in the crash reported in #34667
(and the
duplicate trail of #29262 sightings against `npm:postgres`):

```js
function write(x) {
  chunk = chunk ? Buffer.concat([chunk, x]) : Buffer.from(x)
  if (nextWriteTimer === null)
    nextWriteTimer = setImmediate(nextWrite)
}
function nextWrite() {
  socket.write(chunk, fn)               // ← crashes when socket === null
  nextWriteTimer !== null && clearImmediate(nextWriteTimer)
  chunk = nextWriteTimer = null
}
function closed() {
  clearImmediate(nextWriteTimer)
  socket = null
}
```

## Investigation

I reproduced the exact stack trace from #34667 with a minimal
pure-`node:net`
script (no `npm:postgres` needed) and confirmed it crashes **identically
on
both Deno and Node.js 24.15.0** — same stack, same `TypeError: Cannot
read
properties of null (reading 'write')` from inside `Immediate.callback`.
So
the symptom is not Deno-specific; it's a userland race in postgres-js's
connection-pool reuse path that surfaces on every libuv-compatible
runtime.

The root cause (already triaged upstream as porsager/postgres#1066 and
#1154):

1. The slow query's `write()` queues `setImmediate(nextWrite)`. That
immediate
   runs **once**, in the check phase of the same tick — long before any
server-side close arrives. `nextWriteTimer` is reset to `null` inside
the
   immediate.
2. The server kills the backend. EOF arrives, the socket destroys, the
close
   callback emits `'close'`. postgres-js's `closed()` calls
`clearImmediate(nextWriteTimer)` — but the timer is already `null`, so
it's
   a no-op — then sets `socket = null`.
3. The slow query's promise rejects. The `await slow` continuation runs
in the
   microtask drain that follows the close handler, then the user calls
`conn.release()` → which moves the now-dead connection into the pool's
   `open` queue → `sql.reserve()` shifts it back out → the next `INSERT`
   triggers `c.execute(q)` → `write(insertBytes)` → **a fresh**
   `setImmediate(nextWrite)` is queued.
4. That fresh immediate fires on the next tick. By then `socket ===
null`,
so `socket.write(chunk, fn)` throws — and because the throw is inside a
check-phase callback, no surrounding `try/catch` in user code can catch
   it. The process exits.

Step 3 is the actual bug. The upstream fix is porsager/postgres#1168,
which
adds a `socket && socket.write(chunk, fn)` guard inside `nextWrite`.
It's been
open since 2026-05-25 awaiting maintainer review, with at least one user
already
applying it as a `pnpm patch` to keep their service stable.

## What this PR adds

A spec test under `tests/specs/node/clear_immediate_socket_close_race/`
that
pins down four invariants Deno already satisfies, so we never silently
regress
against the actual contract reporters were expecting:

1. `clearImmediate(t)` called synchronously cancels `t`.
2. `clearImmediate(t)` called from a microtask before any check phase
runs
   cancels `t`.
3. When a `setImmediate` is queued in the same tick that the socket is
torn
   down, libuv's check phase runs before the close phase — so the queued
immediate fires first and the close handler's `clearImmediate` is a
no-op,
identical to Node.js. (The reporter assumed the opposite order, which is
   the heart of the misdiagnosis.)
4. With a `socket && …` guard inside `nextWrite` (matching upstream
#1168),
   the pool-reuse-after-close pattern stops crashing on Deno.

## What this PR does NOT fix

The actual crash for users still on `postgres@3.4.9` (or anything
pre-#1168).
That fix has to land in `porsager/postgres` and propagate via an npm
release —
Deno can't patch user libraries in-tree. Pointing the issue at upstream
so users
can apply the `pnpm patch` workaround in the meantime.

## Test plan

- [x] `tests/specs/node/clear_immediate_socket_close_race/` passes
locally.
- [x] Removing the `socket && …` guard from case 4 reproduces the exact
`Uncaught TypeError: Cannot read properties of null (reading 'write')`
      stack from the issue.
- [x] The same un-guarded script crashes identically on Node.js 24.15.0,
      confirming this is not a Deno-specific bug.
- [x] `deno fmt` + `deno lint` clean on the new file.

Refs #34667
Refs porsager/postgres#1168

Closes denoland/divybot#465

Co-authored-by: divybot <divybot@users.noreply.github.com>
Co-authored-by: Divy Srivastava <me@littledivy.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TypeError: null is not an object (evaluating 'socket.write')

1 participant