Skip to content

Async Postgres and Pipelining

Hitalo Souza edited this page Jun 18, 2026 · 1 revision

Async Postgres and Pipelining

vanilla talks to PostgreSQL through pg_async, a native wire‑protocol client — no libpq. It speaks the v3 frontend/backend protocol directly (SCRAM‑SHA‑256 auth, the extended query protocol: Parse/Bind/Describe/Execute/Sync), frames messages off the recv buffer without copying, and integrates with the Architecture#the-async-runtime-opt-in via park/resume so a worker is never blocked on the database.

Why native, not libpq

The HttpArena DB endpoints (async-db, fortunes) are reads. They are not WAL/fsync‑bound; they're bound by how efficiently the client can issue queries and decode rows. A native, non‑blocking client that can pipeline many queries on one connection and decode rows with zero copying is the right lever for that, in a way libpq's synchronous, copy‑heavy API is not. (crud create/update are writes and are ultimately DB‑ceiling‑bound; the client still matters for not blocking the worker.)

The layers

protocol.v   framing + message builders + row iteration (pure, I/O-free)
  ▲          builders append to a caller buffer; parsers return borrowed slices
client.v     connection state machine (startup, SCRAM, ReadyForQuery)
conn_async.v non-blocking submit/flush/drain; the per-conn pipeline
pool.v       per-worker pool; acquire_pipelined() picks the least-loaded conn

Row decoding reads column lengths at their offset without slicing the payload (slicing allocated an array descriptor per column read — it was ~31 % of the async‑db per‑request CPU profile). Values are returned as borrowed []u8 views valid only inside the resume callback.

Cross‑request pipelining

A single Postgres connection can carry up to max_inflight = 8 queries in flight at once. acquire_pipelined() picks the connection with the shortest pipeline rather than requiring an idle one, so the server multiplexes many clients' queries onto few connections.

On the reactor side, when a second distinct client parks on a connection that already has a watch, the WatchEntry promotes to a FIFO queue of ParkSlots. The invariant that makes everything work:

queue[k] ↔ the connection's inflight[k] — the k‑th parked client corresponds to the k‑th in‑flight query. One readable edge drains replies in submission order, and each reply's continuation gets its client's result.

drain_pipelined runs the head continuations until one cannot complete yet (.suspend); by FIFO, if the front query isn't ready no later one is either, so it stops. Replies are popped with queue.delete(0).

Buffer reuse in the queue

The queue buffer is reused, not re‑allocated each cycle: promote/orphan push into the retained buffer, the drain‑to‑empty path resets len=0 instead of assigning a fresh []ParkSlot{}. Under -gc none a fresh empty array per drain cycle was the dominant per‑request allocation on the pipelined path. See Memory Management under gc none.

Persistent, pool‑owned connections (FIX 3)

A pooled DB connection must not be closed when the client that parked on it disconnects mid‑query — closing it would force a reconnect and a fresh SCRAM/PBKDF2 handshake on the next borrow (expensive, and pointless). So pooled fds are armed with watch_persistent, and on a mid‑query client disconnect the runtime:

  1. Tombstones the parked slot (ParkSlot.dead = true) instead of closing the fd. The slot stays in the queue so the connection's in‑flight FIFO stays aligned; drain_pipelined still consumes the orphaned reply in order (against a throwaway buffer) and discards it.
  2. Identifies the dead client by client_fd, never by re‑looking‑up conns[], so a reused fd (ABA) can't be mistaken for the disconnected client.

For a connection parked alone (no queue) on a persistent fd, reactor_orphan_single converts the single watch into a one‑slot dead tombstone, reusing the same drain path. The net effect: pool connections stay warm and open across client churn.

Backpressure: shedding

When every pooled connection is at max_inflight, the pool is saturated. Rather than block the worker, park() sheds the request: acquire_pipelined() (or async_submit/async_flush) fails, and the call site returns a caller‑chosen fallback response.

This is a deliberate design — blocking would collapse throughput on the closed‑loop benchmark clients — but the status a shed returns is a real decision:

  • Read endpoints shed to a benign empty 200 ({"items":[],"count":0} etc.).
  • crud write/get shed to 503 Service Unavailable — the honest "server momentarily out of DB pipeline capacity" status. (Earlier these returned 400/404, which misreported a backpressure shed as a client error and showed up as spurious 4xx in the benchmark; see Gotchas and Lessons Learned#the-crud-4xx-that-wasnt-a-bug.)

Genuine errors are unchanged: a malformed JSON body is still 400, a missing item is still 404. Only the shed fallback is 503.

The deeper questions — should the pool expose an explicit shed signal so callers don't guess the status; is max_inflight=8 the right number (raising it reduces shedding but costs max_inflight × 16 KiB of reply buffer per connection, against the memory budget); should a transient saturation queue briefly instead of shedding — are tracked in vanilla#51.