Skip to content

Commit

Permalink
Add a dedicated cancel timeout
Browse files Browse the repository at this point in the history
Cancellations and regular queries are quite different. So it makes sense
to have different (default) timeouts for cancellation requests than for
queries. While I already think the default query_wait_timeout of 120
seconds we have is excessively long, it doesn't make sense at all to
have such a long timeout by default for cancellation requests that are
not sent. So this introduced a new `cancel_wait_timeout` which is set to
10 seconds by default. If the cancellation request cannot be forwarded
within that time it's probably better to drop it.

One reason why it's important to relatively quickly close cancellation
requests when they are not being handled is because libpq only allows
the client to send a cancel in a blocking way. So there's no possibility
to bail out after a certain amount of seconds on the client side, if
PgBouncer keeps the connection to the client open. Both me and Marco
noticed this when testing my PgBouncer peering PR (#666), when an entry
in the `[peers]` list pointed to an unreachable host. That meant that
pgbouncer would keep the cancel connection open, and even though the
query had finished the `psql` would not display the result because it
was stuck waiting for a result to the cancel request.

To avoid race conditions of cancellations that already forwarded, but
not yet answered cancellation requests are not timed out. See #717 for
details on such race conditions.
  • Loading branch information
JelteF committed Apr 21, 2023
1 parent 8bdfc6f commit 2622325
Show file tree
Hide file tree
Showing 4 changed files with 36 additions and 16 deletions.
13 changes: 12 additions & 1 deletion doc/config.md
Expand Up @@ -775,7 +775,18 @@ This setting is used to prevent unresponsive servers from grabbing up
connections. It also helps when the server is down or rejects
connections for any reason.

Default: 120
Default: 120.0

### cancel_timeout

Maximum time cancellation requests are allowed to take. If the cancel request
has not been processed by a server within this time the client is disconnected.
0 disables. If this is disabled, cancel requests will be queued indefinitely. [seconds]

This setting is used to prevent a client from locking up when a cancel cannot be
forwarded due to the server being down or when the server stops responding.

Default: 10.0

### client_idle_timeout

Expand Down
1 change: 1 addition & 0 deletions include/bouncer.h
Expand Up @@ -619,6 +619,7 @@ extern usec_t cf_server_connect_timeout;
extern usec_t cf_server_login_retry;
extern usec_t cf_query_timeout;
extern usec_t cf_query_wait_timeout;
extern usec_t cf_cancel_timeout;
extern usec_t cf_client_idle_timeout;
extern usec_t cf_client_login_timeout;
extern usec_t cf_idle_transaction_timeout;
Expand Down
36 changes: 21 additions & 15 deletions src/janitor.c
Expand Up @@ -404,16 +404,17 @@ static void pool_client_maint(PgPool *pool)
disconnect_client(client, true, "query_wait_timeout");
}
}
}

/* apply cancel_timeout for cancel connections */
if (cf_cancel_timeout > 0) {
statlist_for_each_safe(item, &pool->waiting_cancel_req_list, tmp) {
client = container_of(item, PgSocket, head);
Assert(client->state == CL_WAITING_CANCEL);
age = now - client->request_time;

if (cf_query_timeout > 0 && age > cf_query_timeout) {
disconnect_client(client, false, "cancel request query_timeout");
} else if (cf_query_wait_timeout > 0 && age > cf_query_wait_timeout) {
disconnect_client(client, false, "cancel request query_wait_timeout");
}
if (age > cf_cancel_timeout)
disconnect_client(client, false, "cancel_timeout");
}
}

Expand All @@ -438,18 +439,14 @@ static void peer_pool_client_maint(PgPool *pool)
PgSocket *client;
usec_t age;

/* force timeouts for waiting queries */
if (cf_query_timeout > 0 || cf_query_wait_timeout > 0) {
if (cf_cancel_timeout > 0) {
statlist_for_each_safe(item, &pool->waiting_cancel_req_list, tmp) {
client = container_of(item, PgSocket, head);
Assert(client->state == CL_WAITING_CANCEL);
age = now - client->request_time;

if (cf_query_timeout > 0 && age > cf_query_timeout) {
disconnect_client(client, false, "cancel request query_timeout");
} else if (cf_query_wait_timeout > 0 && age > cf_query_wait_timeout) {
disconnect_client(client, false, "cancel request query_wait_timeout");
}
if (age > cf_cancel_timeout)
disconnect_client(client, false, "cancel_timeout");
}
}
}
Expand Down Expand Up @@ -604,17 +601,26 @@ static void peer_pool_server_maint(PgPool *pool)
usec_t now = get_cached_time();
PgSocket *server;

/* find connections that got connect, but could not log in */
if (cf_server_connect_timeout > 0) {
/*
* find connections that got connected, but could not log in. For normal
* pools we only compare against server_connect_timeout for these servers,
* but since peer pools are only for sending cancellations we also compare
* against cancel_timeout here.
*/
if (cf_server_connect_timeout > 0 || cf_cancel_timeout > 0) {
statlist_for_each_safe(item, &pool->new_server_list, tmp) {
usec_t age;

server = container_of(item, PgSocket, head);
Assert(server->state == SV_LOGIN);

age = now - server->connect_time;
if (age > cf_server_connect_timeout)
if (cf_server_connect_timeout > 0 && age > cf_server_connect_timeout) {
disconnect_server(server, true, "connect timeout");
}
else if (cf_cancel_timeout > 0 && age > cf_cancel_timeout) {
disconnect_server(server, true, "cancel_timeout");
}
}
}
}
Expand Down
2 changes: 2 additions & 0 deletions src/main.c
Expand Up @@ -150,6 +150,7 @@ usec_t cf_server_connect_timeout;
usec_t cf_server_login_retry;
usec_t cf_query_timeout;
usec_t cf_query_wait_timeout;
usec_t cf_cancel_timeout;
usec_t cf_client_idle_timeout;
usec_t cf_client_login_timeout;
usec_t cf_idle_transaction_timeout;
Expand Down Expand Up @@ -275,6 +276,7 @@ CF_ABS("pkt_buf", CF_INT, cf_sbuf_len, CF_NO_RELOAD, "4096"),
CF_ABS("pool_mode", CF_LOOKUP(pool_mode_map), cf_pool_mode, 0, "session"),
CF_ABS("query_timeout", CF_TIME_USEC, cf_query_timeout, 0, "0"),
CF_ABS("query_wait_timeout", CF_TIME_USEC, cf_query_wait_timeout, 0, "120"),
CF_ABS("cancel_timeout", CF_TIME_USEC, cf_cancel_timeout, 0, "10"),
CF_ABS("reserve_pool_size", CF_INT, cf_res_pool_size, 0, "0"),
CF_ABS("reserve_pool_timeout", CF_TIME_USEC, cf_res_pool_timeout, 0, "5"),
CF_ABS("resolv_conf", CF_STR, cf_resolv_conf, CF_NO_RELOAD, ""),
Expand Down

0 comments on commit 2622325

Please sign in to comment.