Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .release-notes/rearm-statement-timeout-on-asio-failure.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
## Fix statement timeout dropped when timer event subscription fails

A statement timeout could silently disappear if the kernel returned an error when the driver's internal timer tried to register itself with the I/O event loop (for example, `ENOMEM` under sustained resource pressure). The operation would then run without a timeout even though one had been requested. The driver now detects this failure and rearms the timer with the original duration, so a transient registration failure no longer drops the timeout. Recovery is best-effort: if the rearm itself fails to register, the timeout is still lost for that operation.
8 changes: 4 additions & 4 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ SSL version is mandatory. Tests run with `--sequential`. Integration tests requi
## Dependencies

- `ponylang/ssl` 2.0.1 (MD5 password hashing, SCRAM-SHA-256 crypto primitives via `ssl/crypto`, SSL/TLS via `ssl/net`)
- `ponylang/lori` 0.13.1 (TCP networking, STARTTLS support)
- `ponylang/lori` 0.14.1 (TCP networking, STARTTLS support)

Managed via `corral`.

Expand Down Expand Up @@ -91,7 +91,7 @@ This design makes illegal state transitions call `_IllegalState()` (panic) by de

Only one operation is in-flight at a time. The queue serializes execution. `query_queue`, `query_state`, `backend_pid`, `backend_secret_key`, `codec_registry`, and `statement_timer: (lori.TimerToken | None)` are non-underscore-prefixed fields on `_SessionLoggedIn` because other types in this package need cross-type access (Pony private fields are type-private). `codec_registry: CodecRegistry` is received from the `Session` constructor and threaded through the state machine (`_SessionUnopened` → `_ConnectableState.on_connected` → `_SessionSSLNegotiating`/`_SessionConnected` → `_AuthenticableState.on_authentication_ok`/`on_authentication_sasl` → `_SessionSCRAMAuthenticating`/`_SessionLoggedIn`), then passed to all `_RowsBuilder` call sites. On shutdown, `_SessionLoggedIn.on_shutdown` cancels any active statement timer, then calls `query_state.drain_in_flight()` to let the in-flight state handle its own queue item (skipping notification if `on_error_response` already notified the receiver), then drains remaining queued items with `SessionClosed`. This prevents double-notification when `close()` arrives between ErrorResponse and ReadyForQuery delivery.

**Statement timeout:** `execute`, `prepare`, `copy_in`, `copy_out`, `stream`, and `pipeline` accept an optional `statement_timeout: (lori.TimerDuration | None)` parameter. When provided, `_QueryReady.try_run_query()` starts a one-shot timer (via `lori.set_timer()`) after sending wire messages. Each in-flight state's `on_ready_for_query()` cancels the timer (via `lori.cancel_timer()`) before transitioning to `_QueryReady`. For pipelines, the timer is cancelled only on the final `ReadyForQuery`. On expiry, `_SessionLoggedIn.on_timer()` sends a `CancelRequest` via `_CancelSender`, producing the same cancellation flow as `session.cancel()`.
**Statement timeout:** `execute`, `prepare`, `copy_in`, `copy_out`, `stream`, and `pipeline` accept an optional `statement_timeout: (lori.TimerDuration | None)` parameter. When provided, `_QueryReady.try_run_query()` starts a one-shot timer (via `lori.set_timer()`) after sending wire messages. Each in-flight state's `on_ready_for_query()` cancels the timer (via `lori.cancel_timer()`) before transitioning to `_QueryReady`. For pipelines, the timer is cancelled only on the final `ReadyForQuery`. On expiry, `_SessionLoggedIn.on_timer()` sends a `CancelRequest` via `_CancelSender`, producing the same cancellation flow as `session.cancel()`. If the timer's ASIO event subscription fails after `set_timer` succeeded (e.g. `ENOMEM` from `kevent`/`epoll_ctl`), lori fires `_on_timer_failure`, which `Session` routes to `_SessionState.on_timer_failure`. `_SessionLoggedIn.on_timer_failure` reads the in-flight queue item's `statement_timeout` and rearms the timer with the original duration so transient subscription failures do not silently drop the timeout. All other states no-op; `Session._on_idle_timer_failure` panics via `_IllegalState()` because postgres never arms lori's idle timer.

**Query cancellation:** `session.cancel()` requests cancellation of the currently executing query by opening a separate TCP connection via `_CancelSender` and sending a `CancelRequest`. The `cancel` method on `_SessionState` follows the same "never illegal" contract as `close` — it is a no-op in all states except `_SessionLoggedIn`, where it fires only when a query is in flight (not in `_QueryReady` or `_QueryNotReady`). Cancellation is best-effort; the server may or may not honor it. If cancelled, the query's `ResultReceiver` receives `pg_query_failed` with an `ErrorResponseMessage` (SQLSTATE 57014).

Expand Down Expand Up @@ -198,9 +198,9 @@ Codec-based decoding via `CodecRegistry.decode(oid, format_code, data)`. Extende

Tests live in the main `postgres/` package (private test classes), organized across multiple files by concern (`_test_*.pony`). The `Main` test actor in `_test.pony` is the single test registry that lists all tests. Read the individual test files for per-test details.

**Conventions**: `_test.pony` contains shared helpers (`_ConnectionTestConfiguration` for env vars, `_ConnectTestNotify`/`_AuthenticateTestNotify` reused by other files). `_test_response_parser.pony` contains `_Incoming*TestMessage` builder classes that construct raw protocol bytes for mock servers across all test files. `_test_mock_message_reader.pony` contains `_MockMessageReader` for extracting complete PostgreSQL frontend messages from TCP data in mock servers. `_test_codecs.pony` contains unit tests for binary/text codecs, `CodecRegistry`, `_ParamEncoder`, and binary-format bind wire format. `_test_array.pony` contains unit tests for `_ArrayOidMap`, binary/text array decode, `PgArray` equality/string, `_ArrayEncoder` roundtrips, `_ParamEncoder` with `PgArray`, `_FrontendMessage.bind()` with `PgArray`, `_FieldDataEq` extraction, `_NumericBinaryCodec` encode roundtrip, `CodecRegistry` array methods, and integration tests for array SELECT/roundtrip. `_test_composite.pony` contains unit tests for `PgComposite` construction/access/equality/string, `CodecRegistry.with_composite_type` registration validation, binary/text composite decode, `_CompositeEncoder` roundtrips, `_ParamEncoder` with `PgComposite`, `_FrontendMessage.bind()` with `PgComposite`, composite-in-array and array-in-composite roundtrips, `_FieldDataEq` with composites, and integration tests for composite SELECT (binary/text) and INSERT/SELECT roundtrip. `_test_statement_timeout.pony` contains unit tests for statement timeout timer lifecycle (fires on slow query, cancelled on normal completion) and integration test for pg_sleep with timeout. `_test_connection_timeout.pony` contains unit test for connection timeout (connects to non-routable address, verifies `ConnectionFailedTimeout` is reported). `_test_protocol_violation.pony` contains unit tests for `ProtocolViolation` routing: parse failures during SCRAM and logged-in (with and without an in-flight query), wrong-state server messages pre-auth, mid-SCRAM, and logged-in (with and without an in-flight query), and per-receiver coverage of the in-flight states that have their own receiver interfaces — `copy_in`, `copy_out`, `prepare`, `stream`, and `pipeline` (each test triggers the violation immediately after the client-side wire message, so the in-flight state is entered but the server hasn't responded with a CopyInResponse/CopyOutResponse/ParseComplete/etc. yet; the pipeline case also exercises the current-query-gets-`ProtocolViolation`/remaining-queries-get-`SessionClosed` contract). The pre-auth parse-failure case (`_TestHandlingJunkMessages`) and the SSL-negotiation junk-byte case (`_TestSSLNegotiationJunkResponse`) live inline in `_test_session.pony` and `_test_ssl.pony` respectively. `_test_connection_closed.pony` contains unit tests for peer-initiated TCP close at every state boundary: `_SessionSSLNegotiating` (in both `SSLRequired` and `SSLPreferred` modes), `_SessionConnected` pre-auth, `_SessionSCRAMAuthenticating`, `_SessionLoggedIn` idle, `_SessionLoggedIn` with an in-flight simple query, `_SessionLoggedIn` with an in-flight pipeline, `_SessionLoggedIn` after `AuthenticationOk` but before the first `ReadyForQuery` (state `_QueryNotReady`), plus per-receiver coverage for the in-flight states — `execute` (extended query), `prepare`, `copy_in`, `copy_out`, `stream`, and `close_statement`. Two tests use exactly-once counters: the in-flight simple-query test catches regressions that double-deliver the query failure, and the `ErrorResponse`-followed-by-close test verifies that `_error` set by `on_error_response` prevents `on_closed` from double-delivering.
**Conventions**: `_test.pony` contains shared helpers (`_ConnectionTestConfiguration` for env vars, `_ConnectTestNotify`/`_AuthenticateTestNotify` reused by other files). `_test_response_parser.pony` contains `_Incoming*TestMessage` builder classes that construct raw protocol bytes for mock servers across all test files. `_test_mock_message_reader.pony` contains `_MockMessageReader` for extracting complete PostgreSQL frontend messages from TCP data in mock servers. `_test_codecs.pony` contains unit tests for binary/text codecs, `CodecRegistry`, `_ParamEncoder`, and binary-format bind wire format. `_test_array.pony` contains unit tests for `_ArrayOidMap`, binary/text array decode, `PgArray` equality/string, `_ArrayEncoder` roundtrips, `_ParamEncoder` with `PgArray`, `_FrontendMessage.bind()` with `PgArray`, `_FieldDataEq` extraction, `_NumericBinaryCodec` encode roundtrip, `CodecRegistry` array methods, and integration tests for array SELECT/roundtrip. `_test_composite.pony` contains unit tests for `PgComposite` construction/access/equality/string, `CodecRegistry.with_composite_type` registration validation, binary/text composite decode, `_CompositeEncoder` roundtrips, `_ParamEncoder` with `PgComposite`, `_FrontendMessage.bind()` with `PgComposite`, composite-in-array and array-in-composite roundtrips, `_FieldDataEq` with composites, and integration tests for composite SELECT (binary/text) and INSERT/SELECT roundtrip. `_test_statement_timeout.pony` contains unit tests for statement timeout timer lifecycle (fires on slow query, cancelled on normal completion, rearmed on simulated ASIO subscription failure via `Session._test_trigger_on_timer_failure`) and integration test for pg_sleep with timeout. `_test_connection_timeout.pony` contains unit test for connection timeout (connects to non-routable address, verifies `ConnectionFailedTimeout` is reported). `_test_protocol_violation.pony` contains unit tests for `ProtocolViolation` routing: parse failures during SCRAM and logged-in (with and without an in-flight query), wrong-state server messages pre-auth, mid-SCRAM, and logged-in (with and without an in-flight query), and per-receiver coverage of the in-flight states that have their own receiver interfaces — `copy_in`, `copy_out`, `prepare`, `stream`, and `pipeline` (each test triggers the violation immediately after the client-side wire message, so the in-flight state is entered but the server hasn't responded with a CopyInResponse/CopyOutResponse/ParseComplete/etc. yet; the pipeline case also exercises the current-query-gets-`ProtocolViolation`/remaining-queries-get-`SessionClosed` contract). The pre-auth parse-failure case (`_TestHandlingJunkMessages`) and the SSL-negotiation junk-byte case (`_TestSSLNegotiationJunkResponse`) live inline in `_test_session.pony` and `_test_ssl.pony` respectively. `_test_connection_closed.pony` contains unit tests for peer-initiated TCP close at every state boundary: `_SessionSSLNegotiating` (in both `SSLRequired` and `SSLPreferred` modes), `_SessionConnected` pre-auth, `_SessionSCRAMAuthenticating`, `_SessionLoggedIn` idle, `_SessionLoggedIn` with an in-flight simple query, `_SessionLoggedIn` with an in-flight pipeline, `_SessionLoggedIn` after `AuthenticationOk` but before the first `ReadyForQuery` (state `_QueryNotReady`), plus per-receiver coverage for the in-flight states — `execute` (extended query), `prepare`, `copy_in`, `copy_out`, `stream`, and `close_statement`. Two tests use exactly-once counters: the in-flight simple-query test catches regressions that double-deliver the query failure, and the `ErrorResponse`-followed-by-close test verifies that `_error` set by `on_error_response` prevents `on_closed` from double-delivering.

**Ports**: Mock server tests use ports in the 7669–7759 range and 9667–9668. **Port 7680 is reserved by Windows** (Update Delivery Optimization) and will fail to bind on WSL2 — do not use it. Composite integration tests don't use mock servers (they use the real PostgreSQL containers).
**Ports**: Mock server tests use ports in the 7669–7760 range and 9667–9668. **Port 7680 is reserved by Windows** (Update Delivery Optimization) and will fail to bind on WSL2 — do not use it. Composite integration tests don't use mock servers (they use the real PostgreSQL containers).

## Supported PostgreSQL Features

Expand Down
2 changes: 1 addition & 1 deletion corral.json
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
"deps": [
{
"locator": "github.com/ponylang/lori.git",
"version": "0.14.0"
"version": "0.14.1"
},
{
"locator": "github.com/ponylang/ssl.git",
Expand Down
1 change: 1 addition & 0 deletions postgres/_test.pony
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ actor \nodoc\ Main is TestList
test(_TestCancelSSLPgSleep)
test(_TestConnectionTimeoutFires)
test(_TestStatementTimeoutFires)
test(_TestStatementTimeoutRearmOnTimerFailure)
test(_TestStatementTimeoutCancelledOnCompletion)
test(_TestStatementTimeoutPgSleep)
test(_TestResponseParserAuthenticationSASLMessage)
Expand Down
101 changes: 101 additions & 0 deletions postgres/_test_statement_timeout.pony
Original file line number Diff line number Diff line change
Expand Up @@ -194,6 +194,107 @@ actor \nodoc\ _TimeoutTestServer
// the statement timeout timer will fire.
end

class \nodoc\ iso _TestStatementTimeoutRearmOnTimerFailure is UnitTest
"""
Verifies that when the statement timer's ASIO event subscription fails,
the driver rearms the timer with the in-flight operation's original
duration instead of silently dropping the timeout. The client simulates
the failure via `_test_trigger_on_timer_failure` immediately after
dispatching a query; the rearmed timer fires and a CancelRequest arrives
on the second connection.
"""
fun name(): String =>
"StatementTimeout/RearmOnTimerFailure"

fun apply(h: TestHelper) =>
let host = "127.0.0.1"
let port = "7760"

let listener = _TimeoutRearmTestListener(
lori.TCPListenAuth(h.env.root),
host,
port,
h)

h.dispose_when_done(listener)
h.long_test(5_000_000_000)

actor \nodoc\ _TimeoutRearmTestClient is (SessionStatusNotify & ResultReceiver)
let _h: TestHelper

new create(h: TestHelper) =>
_h = h

be pg_session_connection_failed(s: Session,
reason: ConnectionFailureReason)
=>
_h.fail("Unable to establish connection.")
_h.complete(false)

be pg_session_authenticated(session: Session) =>
// Execute a query with a 200ms timeout, then immediately simulate an
// ASIO subscription failure on the statement timer. The session
// processes `execute` first (arms the original timer), then the
// simulation (cancels the original token and rearms with 200ms). The
// rearmed timer fires and sends a CancelRequest on a second connection.
match lori.MakeTimerDuration(200)
| let d: lori.TimerDuration =>
session.execute(SimpleQuery("SELECT pg_sleep(100)"), this
where statement_timeout = d)
session._test_trigger_on_timer_failure()
| let _: ValidationFailure =>
_h.fail("Failed to create TimerDuration.")
_h.complete(false)
end

be pg_query_result(session: Session, result: Result) =>
None

be pg_query_failed(session: Session, query: Query,
failure: (ErrorResponseMessage | ClientQueryError))
=>
None

actor \nodoc\ _TimeoutRearmTestListener is lori.TCPListenerActor
var _tcp_listener: lori.TCPListener = lori.TCPListener.none()
let _server_auth: lori.TCPServerAuth
let _h: TestHelper
let _host: String
let _port: String
var _connection_count: USize = 0

new create(listen_auth: lori.TCPListenAuth,
host: String,
port: String,
h: TestHelper)
=>
_host = host
_port = port
_h = h
_server_auth = lori.TCPServerAuth(listen_auth)
_tcp_listener = lori.TCPListener(listen_auth, host, port, this)

fun ref _listener(): lori.TCPListener =>
_tcp_listener

fun ref _on_accept(fd: U32): _TimeoutTestServer =>
_connection_count = _connection_count + 1
let server = _TimeoutTestServer(_server_auth, fd, _h,
_connection_count > 1)
_h.dispose_when_done(server)
server

fun ref _on_listening() =>
let session = Session(
ServerConnectInfo(lori.TCPConnectAuth(_h.env.root), _host, _port),
DatabaseConnectInfo("postgres", "postgres", "postgres"),
_TimeoutRearmTestClient(_h))
_h.dispose_when_done(session)

fun ref _on_listen_failure() =>
_h.fail("Unable to listen")
_h.complete(false)

class \nodoc\ iso _TestStatementTimeoutCancelledOnCompletion is UnitTest
"""
Verifies that when a query completes before its statement_timeout, the
Expand Down
Loading
Loading