Description
Occasionally, PostgreSQL connections become permanently stuck in state=active, wait_event_type=Client, wait_event=ClientRead after client deployments. These connections never recover on their own: statement_timeout does not apply to ClientRead waits, so the only remedy is manual termination via pg_terminate_backend.
Observed behaviour
After deploying an application that connects through pgdog, we often observe stuck connections in pg_stat_activity like:
state: active
wait_event_type: Client
wait_event: ClientRead
query: SELECT "customers".* FROM "customers" WHERE ...
query_start: 2026-02-27 15:39:26.121194+00
state_change: 2026-02-27 15:39:26.121197+00
Key observation: query_start and state_change are 3 microseconds apart. This means PostgreSQL received a Parse message (which set the query text and query_start) and then immediately entered ClientRead: it never received the corresponding Bind/Execute/Sync messages.
These connections remain stuck indefinitely (we've observed >1 hour). They hold locks and cause cascading blocking across the database.
Environment
- pgdog used as a connection pooler in front of PostgreSQL
- pooler_mode = "transaction"
- query_parser = "off"
- The issue often appears after application deployments (when TCP connections to pgdog are abruptly terminated)
- Does not reproduce when the same application connects via pgbouncer
Suspected root cause
Server::send() in src/backend/server.rs iterates over messages in a ClientRequest, calling send_one() for each. Each send_one() call writes to a BufStream with a default capacity of 4096 bytes (net_buffer config).
When PreparedStatements::handle() returns HandleResult::Prepend (re-preparing a statement on a different server connection), an additional Parse message is prepended to the outgoing sequence. This increases the total byte count of the message sequence.
If the total serialised size exceeds the BufStream write buffer capacity (4096 bytes), BufStream will auto-flush its internal buffer to the underlying TCP socket during a write_all call. This means part of the extended protocol sequence (e.g. the Parse message) can be sent to PostgreSQL before the remaining messages (Bind/Execute/Sync) have been written to the buffer.
If a subsequent send_one() call then fails (e.g. due to a network error), the ? operator exits the loop before flush() is called. PostgreSQL has already received the Parse but will never receive Bind/Execute/Sync, leaving it waiting in ClientRead forever.
Even without an explicit error, the partial flush creates a window where the protocol sequence is split across multiple TCP writes, which could interact with connection failures during deployment.
Suggested fix
Pre-serialise all messages (including any prepended prepared statement re-preparations) into a single byte buffer before writing anything to the stream. This ensures the complete extended protocol sequence is handed to the stream as a unit: either all messages reach PostgreSQL or none do.
Secondary concern: inline response loop doesn't monitor client socket
client_server_exchange() in src/frontend/client/query_engine/query.rs reads server responses in a tight loop without monitoring the client socket:
while self.backend.has_more_messages() && !self.backend.in_copy_mode() && !self.streaming {
    let message = self.read_server_message().await?;
    self.process_server_message(context, message).await?;
}
If the client disconnects during this loop, pgdog won't detect it until the loop completes. For long-running queries, this means the server connection stays occupied even though the client is gone. As a defence-in-depth measure, adding a select! with a client socket check (using the existing Stream::check() method) would let pgdog detect client disconnections promptly and cancel the server-side query.
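A pseudocode sketch of that pattern (the client_stream field, the behaviour of check(), and the cancellation step are assumptions for illustration, not pgdog's actual API):

```rust
// Pseudocode sketch, not a drop-in patch: assumes check() resolves with an
// error when the client socket closes, and that a cancel request can be
// issued for the in-flight server query.
while self.backend.has_more_messages() && !self.backend.in_copy_mode() && !self.streaming {
    tokio::select! {
        // Bias towards server progress so a healthy exchange behaves
        // exactly as it does today.
        biased;

        message = self.read_server_message() => {
            self.process_server_message(context, message?).await?;
        }

        // Fires if the client socket closes or errors mid-exchange.
        res = self.client_stream.check() => {
            // Issue a cancel for the running query, then propagate the
            // disconnect so the server connection is released.
            res?;
        }
    }
}
```

One caveat worth checking in review: each select! branch must be cancellation-safe, since the losing future is dropped on every loop iteration.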