New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Server-side Prepared Statements Cache #757
Conversation
Unfortunately [PgBouncer](https://www.pgbouncer.org/) does not support prepared statements in transaction pooling mode, as stated in the FAQ: > To make prepared statements work in this mode would need PgBouncer to > keep track of them internally, which it does not do. So the only way > to keep using PgBouncer in this mode is to disable prepared statements > in the client. This patch introduces [prepared statement tracking](#how-it-works), allowing to run PgBouncer in transaction pooling mode with server side prepared statements. It **only** supports the extended query protocol parse, bind, execute and close [message formats](https://www.postgresql.org/docs/12/protocol-message-formats.html) E.g. the flow used by the PostgreSQL JDBC driver to create server prepared statements. How it works ------------ PgBouncer has client connections (application↔️ PgBouncer) and server connections (PgBouncer↔️ PostgreSQL). Prepared statements received with a 'parse' command are stored on the receiving client connection together with a hash of the SQL. Parse commands sent to PostgreSQL are modified to have a unique prepared statement name for the server connection. The server connection stores the SQL hash and the prepared statement name unique to that server connection. Whenever a client connection receives a command involving a prepared statement, it does a lookup to get the prepared statement metadata (SQL sent by the parse command and the SQL hash). The server connection maps this SQL hash to the prepared statement name unique to that server connection and modifies the prepared statement name before sending the command to PostgreSQL. If the SQL hash cannot be found for the server connection, this means we are on a different server connection than the one we originally issued the parse command on. In this case, a parse command is generated and sent first. Configuration ------------- Create a configuration file, using `./etc/pgbouncer.ini` as a starting point. By default, prepared statement support is not enabled. To enable it just add the following to your pgbouncer-ps configuration: ``` disable_prepared_statement_support = 0 ``` The number of prepared statements kept active on a single backend connection is defined by the following configuration: ``` prepared_statement_cache_queries = 100 ``` Note: keep in mind that this will increase the memory footprint of each client connection on your PostgreSQL server. Tracking prepared statements does not only come with a memory cost, but also with increased CPU usage. Multiple PgBouncer instances can be run to use more than one core for processing, see `so_reuseport` socket option. Connect the application ----------------------- Configure your client application as though you were connecting directly to a PostgreSQL database. Example - JDBC driver ``` jdbc:postgresql://pgbouncer-ps:6432/postgres?prepareThreshold=10&preparedStatementCacheQueries=512&preparedStatementCacheSizeMiB=10 ``` Note: Cached prepared statements by the JDBC driver will increase the memory footprint of each JDBC connection in your application and each front-end connection in PgBouncer.
This is excellent work, thank you! I tested patched pgbouncer 1.17.0 connecting to AWS Aurora 14.3 and tokyo-postgres driver.
Bind packet decoded with Wireshark:
Happy to provide any additional information, if needed! |
@thecodingrobot ... did you ever find a solution to this issue on PgBouncer 1.17? |
any updates here? |
Can this PR be considered a viable solution for #695? |
yes |
Hey, is there an expectation for the PR to be merged? |
That's great news, guys! |
Running pgbouncer with Executing the testcase with standard pg (running as a local container) seems to work fine:
|
Hey, can I help somehow to get this patch finalized and merged? |
PktBuf *create_parse_packet(char *statement, PgParsePacket *pkt) | ||
{ | ||
PktBuf *buf; | ||
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name + strlen(statement))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name + strlen(statement))); | |
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name) + strlen(statement)); |
client->packet_cb_state.pkt_offset = 0; | ||
} | ||
|
||
res = handle_partial_client_work(client, client->packet_cb_state.pkt_header, client->packet_cb_state.pkt_offset, data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client->packet_cb_state.pkt_header
is not initialized here and contains garbage.
It causes portal ... does not exist
error, for example if you try to run pgbench -c 100 -M prepared -S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have trier to fix this problem: hand bind message in the same way as prepare.
You can find my version here: git@github.com:knizhnik/pgbouncer.git
s->pkt = parse_packet; | ||
s->query_hash[0] = rand_seed; | ||
s->query_hash[1] = 0; | ||
spookyhash(parse_packet->query, strlen(parse_packet->query), &s->query_hash[0], &s->query_hash[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why don't you take in account hash collisions here?
Yes, probability of collision of spooky hash is assumed to be quite small, but still it is non-zero.
And if it happens, then pgbouncer locates wrong query and application receives wrong result.
|
||
if (cached_query_count >= (unsigned int)cf_prepared_statement_cache_queries) | ||
{ | ||
HASH_SORT(server->server_prepared_statements, bind_count_sort); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such replacement algorithm will cause throwing away newly added statements just because their bond_count is small.
It will be better to use Clock or LRU replacement algorithm.
Also it will help to eliminate sort overhead on each new item which may become bottleneck for large number of prepared statements.
@dashorst If I wanted to turn on prepared statements purely for security reasons, would it be possible to set Seems like memory footprint would become negligible, but would there still be an increase in CPU usage? |
@arianf You should be able to use anonymous prepared statements just fine with PgBouncer already. Those are fine for security reasons. See this blog for a detailed explanation: https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#prepared-statements |
@dashorst Thank you for all this work, I finally looked at it and really like the overall design. I left some comments on #845, which was a follow up PR on this one by @knizhnik, that contains a few small fixes/improvements. If you still feel like collaborating on this feature further, I'd very much appreciate if you respond on that PR. If both you and @knizhnik want to do actual coding work on it, we should make sure to coordinate so that both no-one is doing double work. I considered copy pasting the comments I made to that #845 to this PR too, but that seemed like it could only lead to split discussions, so I refrained from doing that. Again thanks for your great work on this initial implementation! |
@sboschman @dashorst
I think both those changes make sense. I checked the patch repo that you shared: https://github.com/topicusonderwijs/pgbouncer-ps-patch But the license there is different than the PgBouncer one (or at least its ammended). Would it be possible to update this PR to contain your latest changes that are made on the patch repo? That way there's definitely no licensing weirdness. And I also think it will make it easier to bring #845 up to date with these new changes. |
this mean that until this fix has been done. we still need to config as below when connects to PgBouncer ?
someone can advise full spring boot config to temporary disable prepared statements ? UPDATED: changes at below will work: add prepareThreshold=0 to URL string could be like:
|
To avoid confusion, I'm going to close this in favor of #845 |
Unfortunately PgBouncer does not support prepared statements in transaction pooling mode, as stated in the FAQ:
This patch introduces prepared statement tracking, allowing to run PgBouncer in transaction pooling mode with server side prepared statements. It only supports the extended query protocol parse, bind, execute and close message formats E.g. the flow used by the PostgreSQL JDBC driver to create server prepared statements.
How it works
PgBouncer has client connections (application↔️ PgBouncer) and server connections (PgBouncer ↔️ PostgreSQL). Prepared statements received with a 'parse' command are stored on the receiving client connection together with a hash of the SQL. Parse commands sent to PostgreSQL are modified to have a unique prepared statement name for the server connection. The server connection stores the SQL hash and the prepared statement name unique to that server connection.
Whenever a client connection receives a command involving a prepared statement, it does a lookup to get the prepared statement metadata (SQL sent by the parse command and the SQL hash). The server connection maps this SQL hash to the prepared statement name unique to that server connection and modifies the prepared statement name before sending the command to PostgreSQL. If the SQL hash cannot be found for the server connection, this means we are on a different server connection than the one we originally issued the parse command on. In this case, a parse command is generated and sent first.
Configuration
Create a configuration file, using
./etc/pgbouncer.ini
as a starting point.By default, prepared statement support is not enabled. To enable it just add the following to your pgbouncer-ps configuration:
The number of prepared statements kept active on a single backend connection is defined by the following configuration:
Note: keep in mind that this will increase the memory footprint of each client connection on your PostgreSQL server.
Tracking prepared statements does not only come with a memory cost, but also with increased CPU usage. Multiple PgBouncer instances can be run to use more than one core for processing, see
so_reuseport
socket option.Connect the application
Configure your client application as though you were connecting directly to a PostgreSQL database.
Example - JDBC driver
Note: Cached prepared statements by the JDBC driver will increase the memory footprint of each JDBC connection in your application and each front-end connection in PgBouncer.