Add Server-side Prepared Statements Cache #757

Closed
wants to merge 1 commit into from

Conversation

dashorst
Contributor

@dashorst dashorst commented Aug 18, 2022

Unfortunately PgBouncer does not support prepared statements in transaction pooling mode, as stated in the FAQ:

To make prepared statements work in this mode would need PgBouncer to keep track of them internally, which it does not do. So the only way to keep using PgBouncer in this mode is to disable prepared statements in the client.

This patch introduces prepared statement tracking, allowing PgBouncer to run in transaction pooling mode with server-side prepared statements. It only supports the extended query protocol parse, bind, execute and close message formats, e.g. the flow used by the PostgreSQL JDBC driver to create server-side prepared statements.

How it works

PgBouncer has client connections (application ↔️ PgBouncer) and server connections (PgBouncer ↔️ PostgreSQL). Prepared statements received with a 'parse' command are stored on the receiving client connection together with a hash of the SQL. Parse commands sent to PostgreSQL are modified to have a unique prepared statement name for the server connection. The server connection stores the SQL hash and the prepared statement name unique to that server connection.

Whenever a client connection receives a command involving a prepared statement, it does a lookup to get the prepared statement metadata (SQL sent by the parse command and the SQL hash). The server connection maps this SQL hash to the prepared statement name unique to that server connection and modifies the prepared statement name before sending the command to PostgreSQL. If the SQL hash cannot be found for the server connection, this means we are on a different server connection than the one we originally issued the parse command on. In this case, a parse command is generated and sent first.

Configuration

Create a configuration file, using ./etc/pgbouncer.ini as a starting point.

By default, prepared statement support is not enabled. To enable it, add the following to your pgbouncer-ps configuration:

```
disable_prepared_statement_support = 0
```

The number of prepared statements kept active on a single backend connection is defined by the following setting:

```
prepared_statement_cache_queries = 100
```

Note: keep in mind that this cache increases the memory footprint of each backend connection on your PostgreSQL server.

Tracking prepared statements comes not only with a memory cost but also with increased CPU usage. Multiple PgBouncer instances can be run to use more than one core for processing; see the `so_reuseport` socket option.
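Putting these settings together, a minimal configuration sketch might look like the following. This is illustrative only: `pool_mode` and `so_reuseport` are standard PgBouncer options, while the two prepared-statement settings come from this patch.

```
[pgbouncer]
listen_port = 6432
pool_mode = transaction

; options added by this patch
disable_prepared_statement_support = 0
prepared_statement_cache_queries = 100

; allow several pgbouncer processes to share one port (one core each)
so_reuseport = 1
```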

Connect the application

Configure your client application as though you were connecting directly to a PostgreSQL database.

Example - JDBC driver:

```
jdbc:postgresql://pgbouncer-ps:6432/postgres?prepareThreshold=10&preparedStatementCacheQueries=512&preparedStatementCacheSizeMiB=10
```

Note: prepared statements cached by the JDBC driver will increase the memory footprint of each JDBC connection in your application and of each front-end connection in PgBouncer.

@thecodingrobot

thecodingrobot commented Oct 15, 2022

This is excellent work, thank you!

I tested a patched pgbouncer 1.17.0 connecting to AWS Aurora 14.3 with the tokio-postgres driver.
My prepared statement failed with `ERROR: portal "" does not exist`
(sample client code: https://github.com/thecodingrobot/pgbouncer-rust-test/blob/master/src/main.rs)

$ RUST_LOG=debug cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/tokyo-test`
[2022-10-15T15:12:08Z DEBUG tokio_postgres::prepare] preparing query s0: SELECT $1::TEXT
[2022-10-15T15:12:08Z DEBUG tokio_postgres::query] executing statement s0 with parameters: ["hello world"]
Error: Error { kind: Db, cause: Some(DbError { severity: "ERROR", parsed_severity: Some(Error), code: SqlState(E34000), message: "portal \"\" does not exist", detail: None, hint: None, position: None, where_: None, schema: None, table: None, column: None, datatype: None, constraint: None, file: Some("postgres.c"), line: Some(2141), routine: Some("exec_execute_message") }) }
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 P: got connection: 127.0.0.1:51074 -> 127.0.0.1:5432
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 logged in
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 pause_client
[1] DEBUG S-0x55cb8ca5b9b0: mydb/myuser@192.168.172.190:5432 P: checking: select 1
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 activate_client
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 handle_parse_command: mapping statement 's0' to 'B_2' (query hash '7631158207450644240658849849158387180')
[1] DEBUG pktbuf_dynamic(5): 0x55cb8ca74f10
[1] DEBUG pktbuf_free(0x55cb8ca74f10)
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 handle_describe_command: mapped statement 's0' (query hash '7631158207450644240658849849158387180') to 'B_2'
[1] DEBUG pktbuf_dynamic(9): 0x55cb8ca74f10
[1] DEBUG make_room(0x55cb8ca74f10, 4): realloc newlen=18
[1] DEBUG pktbuf_free(0x55cb8ca74f10)
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 query time: 1780 us
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 transaction time: 1780 us
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 query time: 473 us
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 transaction time: 473 us
[1] LOG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 closing because: client unexpected eof (age=1s)

Bind packet decoded with Wireshark:

PostgreSQL
    Type: Bind
    Length: 33
    Portal: 
    Statement: s0
    Parameter formats: 1
        Format: Binary (1)
    Parameter values: 1
        Column length: 11
        Data: 68656c6c6f20776f726c64
    Result formats: 1
        Format: Binary (1)

Happy to provide any additional information, if needed!

@carukc

carukc commented Nov 8, 2022

@thecodingrobot ... did you ever find a solution to this issue on PgBouncer 1.17?

@rzavadski-ms

any updates here?

@ziegenberg

Can this PR be considered a viable solution for #695?

@dashorst
Contributor Author

Can this PR be considered a viable solution for #695?

yes

@adamCo1

adamCo1 commented Feb 27, 2023

Hey, is there an expectation for when the PR will be merged?

@cyberbudy

That's great news, guys!
Waiting this feature so much
Tested this with diesel on Rust and had the same error as @thecodingrobot

@sboschman
Contributor

Running pgbouncer with -vv gives some additional 'NOISE' logging, which might help to determine at which point stuff breaks.

Executing the testcase with standard pg (running as a local container) seems to work fine:

$ RUST_LOG=debug cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/tokyo-test`
[2023-03-30T15:53:58Z DEBUG tokio_postgres::prepare] preparing query s0: SELECT $1::TEXT
[2023-03-30T15:53:58Z DEBUG tokio_postgres::query] executing statement s0 with parameters: ["hello world"]

@kotik

kotik commented Apr 5, 2023

Hey, can I help somehow to get this patch finalized and merged?

PktBuf *create_parse_packet(char *statement, PgParsePacket *pkt)
{
	PktBuf *buf;
	buf = pktbuf_dynamic(pkt->len - strlen(pkt->name + strlen(statement)));
Contributor

Suggested change
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name + strlen(statement)));
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name) + strlen(statement));

client->packet_cb_state.pkt_offset = 0;
}

res = handle_partial_client_work(client, client->packet_cb_state.pkt_header, client->packet_cb_state.pkt_offset, data);
Contributor

client->packet_cb_state.pkt_header is not initialized here and contains garbage.
It causes a portal ... does not exist error, for example if you try to run pgbench -c 100 -M prepared -S

Contributor

I have tried to fix this problem: handle the bind message in the same way as prepare.
You can find my version here: git@github.com:knizhnik/pgbouncer.git

s->pkt = parse_packet;
s->query_hash[0] = rand_seed;
s->query_hash[1] = 0;
spookyhash(parse_packet->query, strlen(parse_packet->query), &s->query_hash[0], &s->query_hash[1]);
Contributor

I wonder why you don't take hash collisions into account here?
Yes, the probability of a SpookyHash collision is assumed to be quite small, but it is still non-zero.
And if it happens, pgbouncer locates the wrong query and the application receives a wrong result.


if (cached_query_count >= (unsigned int)cf_prepared_statement_cache_queries)
{
HASH_SORT(server->server_prepared_statements, bind_count_sort);
Contributor

Such a replacement algorithm will throw away newly added statements just because their bind_count is small.
It would be better to use a Clock or LRU replacement algorithm.
That would also eliminate the sort overhead on each new item, which may become a bottleneck for a large number of prepared statements.

@arianf

arianf commented May 8, 2023

@dashorst If I wanted to turn on prepared statements purely for security reasons, would it be possible to set prepared_statement_cache_queries to 0 or 1?

Seems like the memory footprint would become negligible, but would there still be an increase in CPU usage?

@JelteF
Member

JelteF commented May 25, 2023

@arianf You should be able to use anonymous prepared statements just fine with PgBouncer already. Those are fine for security reasons. See this blog for a detailed explanation: https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#prepared-statements

@JelteF
Member

JelteF commented May 25, 2023

@dashorst Thank you for all this work, I finally looked at it and really like the overall design. I left some comments on #845, which is a follow-up PR on this one by @knizhnik that contains a few small fixes/improvements. If you still feel like collaborating on this feature further, I'd very much appreciate it if you responded on that PR. If both you and @knizhnik want to do actual coding work on it, we should make sure to coordinate so that no-one is doing double work. I considered copy-pasting the comments I made on #845 to this PR too, but that seemed like it could only lead to split discussions, so I refrained from doing that.

Again thanks for your great work on this initial implementation!

@JelteF
Member

JelteF commented Jun 1, 2023

@sboschman @dashorst
On the PR #845 @sboschman wrote this:

Fix:
Instead of direct malloc/free calls, the Slab memory store from libusual is used. After a recent OS update (glibc maybe?) we sometimes see that pgbouncer keeps claiming memory from the OS (until OOM). It seems like freed memory is never reused for a future malloc call, and malloc keeps requesting more and more memory from the OS.

Feat:
The admin console has two new commands to show some parse/bind statistics for client and server sockets, which at least give a starting point to reason about how effective the ps caching is for a workload.

I think both those changes make sense. I checked the patch repo that you shared: https://github.com/topicusonderwijs/pgbouncer-ps-patch But the license there is different from the PgBouncer one (or at least it's amended). Would it be possible to update this PR to contain your latest changes from the patch repo? That way there's definitely no licensing weirdness. And I also think it will make it easier to bring #845 up to date with these new changes.

@letrthang

letrthang commented Aug 27, 2023

Does this mean that, until this fix is done, we still need to configure as below when connecting to PgBouncer?

```
jpa:
    open-in-view: false
    properties:
      hibernate:
        cache:
          use_query_cache: false
        default_schema: navigation_schema_service
```

Can someone advise a full Spring Boot config to temporarily disable prepared statements?

UPDATED: the changes below will work:

Add `prepareThreshold=0` to the URL string, like:

```
url: jdbc:postgresql://url....?sslmode=require&prepareThreshold=0
```

```
jpa:
    open-in-view: false
    properties:
      hibernate:
        cache:
          use_query_cache: false
        default_schema: schema_name
```

@JelteF
Member

JelteF commented Aug 29, 2023

To avoid confusion, I'm going to close this in favor of #845

@JelteF JelteF closed this Aug 29, 2023