Add Server-side Prepared Statements Cache #757

Closed
wants to merge 1 commit into from

Conversation

dashorst
Contributor

@dashorst dashorst commented Aug 18, 2022

Unfortunately PgBouncer does not support prepared statements in transaction pooling mode, as stated in the FAQ:

To make prepared statements work in this mode would need PgBouncer to keep track of them internally, which it does not do. So the only way to keep using PgBouncer in this mode is to disable prepared statements in the client.

This patch introduces prepared statement tracking, allowing PgBouncer to run in transaction pooling mode with server-side prepared statements. It only supports the extended query protocol parse, bind, execute and close message formats, e.g. the flow used by the PostgreSQL JDBC driver to create server-side prepared statements.

How it works

PgBouncer has client connections (application ↔️ PgBouncer) and server connections (PgBouncer ↔️ PostgreSQL). Prepared statements received with a 'parse' command are stored on the receiving client connection together with a hash of the SQL. Parse commands sent to PostgreSQL are modified to have a unique prepared statement name for the server connection. The server connection stores the SQL hash and the prepared statement name unique to that server connection.

Whenever a client connection receives a command involving a prepared statement, it does a lookup to get the prepared statement metadata (SQL sent by the parse command and the SQL hash). The server connection maps this SQL hash to the prepared statement name unique to that server connection and modifies the prepared statement name before sending the command to PostgreSQL. If the SQL hash cannot be found for the server connection, this means we are on a different server connection than the one we originally issued the parse command on. In this case, a parse command is generated and sent first.

Configuration

Create a configuration file, using ./etc/pgbouncer.ini as a starting point.

By default, prepared statement support is not enabled. To enable it, add the following to your pgbouncer-ps configuration:

```
disable_prepared_statement_support = 0
```

The number of prepared statements kept active on a single backend connection is defined by the following setting:

```
prepared_statement_cache_queries = 100
```

Note: keep in mind that this cache increases the memory footprint of each backend connection on your PostgreSQL server.

Tracking prepared statements comes not only with a memory cost but also with increased CPU usage. Multiple PgBouncer instances can be run to use more than one core for processing; see the `so_reuseport` socket option.
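Putting these settings together, a minimal configuration sketch might look like the following. This is illustrative only: `pool_mode` and `so_reuseport` are standard PgBouncer options, while the two prepared-statement settings come from this patch.

```
[pgbouncer]
listen_port = 6432
pool_mode = transaction

; options added by this patch
disable_prepared_statement_support = 0
prepared_statement_cache_queries = 100

; allow several pgbouncer processes to share one port (one core each)
so_reuseport = 1
```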

Connect the application

Configure your client application as though you were connecting directly to a PostgreSQL database.

Example - JDBC driver:

```
jdbc:postgresql://pgbouncer-ps:6432/postgres?prepareThreshold=10&preparedStatementCacheQueries=512&preparedStatementCacheSizeMiB=10
```

Note: prepared statements cached by the JDBC driver will increase the memory footprint of each JDBC connection in your application and of each front-end connection in PgBouncer.

@thecodingrobot

thecodingrobot commented Oct 15, 2022

This is excellent work, thank you!

I tested a patched pgbouncer 1.17.0 connecting to AWS Aurora 14.3 with the tokio-postgres driver.
My prepared statement failed with `ERROR: portal "" does not exist`
(sample client code: https://github.com/thecodingrobot/pgbouncer-rust-test/blob/master/src/main.rs)

$ RUST_LOG=debug cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/tokyo-test`
[2022-10-15T15:12:08Z DEBUG tokio_postgres::prepare] preparing query s0: SELECT $1::TEXT
[2022-10-15T15:12:08Z DEBUG tokio_postgres::query] executing statement s0 with parameters: ["hello world"]
Error: Error { kind: Db, cause: Some(DbError { severity: "ERROR", parsed_severity: Some(Error), code: SqlState(E34000), message: "portal \"\" does not exist", detail: None, hint: None, position: None, where_: None, schema: None, table: None, column: None, datatype: None, constraint: None, file: Some("postgres.c"), line: Some(2141), routine: Some("exec_execute_message") }) }
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 P: got connection: 127.0.0.1:51074 -> 127.0.0.1:5432
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 logged in
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 pause_client
[1] DEBUG S-0x55cb8ca5b9b0: mydb/myuser@192.168.172.190:5432 P: checking: select 1
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 activate_client
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 handle_parse_command: mapping statement 's0' to 'B_2' (query hash '7631158207450644240658849849158387180')
[1] DEBUG pktbuf_dynamic(5): 0x55cb8ca74f10
[1] DEBUG pktbuf_free(0x55cb8ca74f10)
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 handle_describe_command: mapped statement 's0' (query hash '7631158207450644240658849849158387180') to 'B_2'
[1] DEBUG pktbuf_dynamic(9): 0x55cb8ca74f10
[1] DEBUG make_room(0x55cb8ca74f10, 4): realloc newlen=18
[1] DEBUG pktbuf_free(0x55cb8ca74f10)
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 query time: 1780 us
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 transaction time: 1780 us
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 query time: 473 us
[1] DEBUG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 transaction time: 473 us
[1] LOG C-0x55cb8ca4d390: mydb/myuser@127.0.0.1:51074 closing because: client unexpected eof (age=1s)

Bind packet decoded with Wireshark:

PostgreSQL
    Type: Bind
    Length: 33
    Portal: 
    Statement: s0
    Parameter formats: 1
        Format: Binary (1)
    Parameter values: 1
        Column length: 11
        Data: 68656c6c6f20776f726c64
    Result formats: 1
        Format: Binary (1)

Happy to provide any additional information, if needed!

@carukc

carukc commented Nov 8, 2022

@thecodingrobot ... did you ever find a solution to this issue on PgBouncer 1.17?

@rzavadski-ms

any updates here?

@ziegenberg

Can this PR be considered a viable solution for #695?

@dashorst
Contributor Author

Can this PR be considered a viable solution for #695?

yes

@adamCo1

adamCo1 commented Feb 27, 2023

Hey, is there an expectation for when the PR will be merged?

@cyberbudy

That's great news, guys!
Waiting this feature so much
Tested this with diesel on Rust and had the same error as @thecodingrobot

@sboschman
Contributor

Running pgbouncer with -vv gives some additional 'NOISE' logging, which might help to determine at which point stuff breaks.

Executing the testcase with standard pg (running as a local container) seems to work fine:

$ RUST_LOG=debug cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/tokyo-test`
[2023-03-30T15:53:58Z DEBUG tokio_postgres::prepare] preparing query s0: SELECT $1::TEXT
[2023-03-30T15:53:58Z DEBUG tokio_postgres::query] executing statement s0 with parameters: ["hello world"]

@kotik

kotik commented Apr 5, 2023

Hey, can I help somehow to get this patch finalized and merged?

PktBuf *create_parse_packet(char *statement, PgParsePacket *pkt)
{
	PktBuf *buf;
	buf = pktbuf_dynamic(pkt->len - strlen(pkt->name + strlen(statement)));
Contributor

Suggested change
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name + strlen(statement)));
buf = pktbuf_dynamic(pkt->len - strlen(pkt->name) + strlen(statement));

client->packet_cb_state.pkt_offset = 0;
}

res = handle_partial_client_work(client, client->packet_cb_state.pkt_header, client->packet_cb_state.pkt_offset, data);
Contributor

client->packet_cb_state.pkt_header is not initialized here and contains garbage.
It causes a portal ... does not exist error, for example if you try to run pgbench -c 100 -M prepared -S

Contributor

I have tried to fix this problem: handle the bind message in the same way as prepare.
You can find my version here: git@github.com:knizhnik/pgbouncer.git

s->pkt = parse_packet;
s->query_hash[0] = rand_seed;
s->query_hash[1] = 0;
spookyhash(parse_packet->query, strlen(parse_packet->query), &s->query_hash[0], &s->query_hash[1]);
Contributor

I wonder why you don't take hash collisions into account here?
Yes, the probability of a SpookyHash collision is assumed to be quite small, but it is still non-zero.
And if it happens, pgbouncer locates the wrong query and the application receives a wrong result.


if (cached_query_count >= (unsigned int)cf_prepared_statement_cache_queries)
{
HASH_SORT(server->server_prepared_statements, bind_count_sort);
Contributor

Such a replacement algorithm will throw away newly added statements just because their bind_count is small.
It would be better to use a Clock or LRU replacement algorithm.
That would also eliminate the sort overhead on each new item, which may become a bottleneck for a large number of prepared statements.

@arianf

arianf commented May 8, 2023

@dashorst If I wanted to turn on prepared statements purely for security reasons, would it be possible to set prepared_statement_cache_queries to 0 or 1?

Seems like the memory footprint would become negligible, but would there still be an increase in CPU usage?

@JelteF
Member

JelteF commented May 25, 2023

@arianf You should be able to use anonymous prepared statements just fine with PgBouncer already. Those are fine for security reasons. See this blog for a detailed explanation: https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#prepared-statements

@JelteF
Member

JelteF commented May 25, 2023

@dashorst Thank you for all this work, I finally looked at it and really like the overall design. I left some comments on #845, which is a follow-up PR on this one by @knizhnik that contains a few small fixes/improvements. If you still feel like collaborating on this feature further, I'd very much appreciate it if you responded on that PR. If both you and @knizhnik want to do actual coding work on it, we should make sure to coordinate so that no-one is doing double work. I considered copy-pasting the comments I made on #845 to this PR too, but that seemed like it could only lead to split discussions, so I refrained from doing that.

Again thanks for your great work on this initial implementation!

@JelteF
Member

JelteF commented Jun 1, 2023

@sboschman @dashorst
On the PR #845 @sboschman wrote this:

Fix:
Instead of direct malloc/free calls, the Slab memory store from libusual is used. After a recent OS update (glibc maybe?) we sometimes see that pgbouncer keeps claiming memory from the OS (until OOM). It seems like freed memory is never reused for a future malloc call, and malloc keeps requesting more and more memory from the OS.

Feat:
The admin console has two new commands to show some parse/bind statistics for client and server sockets, which at least give a starting point to reason about how effective the ps caching is for a workload.

I think both those changes make sense. I checked the patch repo that you shared: https://github.com/topicusonderwijs/pgbouncer-ps-patch But the license there is different from the PgBouncer one (or at least it's amended). Would it be possible to update this PR to contain your latest changes from the patch repo? That way there's definitely no licensing weirdness. And I also think it will make it easier to bring #845 up to date with these new changes.

@letrthang

letrthang commented Aug 27, 2023

Does this mean that, until this fix is done, we still need to configure as below when connecting to PgBouncer?

```
jpa:
    open-in-view: false
    properties:
      hibernate:
        cache:
          use_query_cache: false
        default_schema: navigation_schema_service
```

Can someone advise a full Spring Boot config to temporarily disable prepared statements?

UPDATED: the changes below will work:

Add `prepareThreshold=0` to the URL string, like:

```
url: jdbc:postgresql://url....?sslmode=require&prepareThreshold=0
```

```
jpa:
    open-in-view: false
    properties:
      hibernate:
        cache:
          use_query_cache: false
        default_schema: schema_name
```

@JelteF
Member

JelteF commented Aug 29, 2023

To avoid confusion, I'm going to close this in favor of #845

@JelteF JelteF closed this Aug 29, 2023