New server_reset_query_always default introduces non-deterministic behavior. #110
Context: PgBouncer 1.7 changes the default of server_reset_query_always to 0, causing the server_reset_query to not take effect for transaction pooling.
According to the documentation, "When transaction pooling is used, the server_reset_query should be empty, as clients should not use any session features. If client does use session features, then they will be broken as transaction pooling will not guarantee that next query will be run on same connection."
I don't follow this argument. This may be true of statement pooling, but if you change a setting such as search_path or statement_timeout within a transaction, the changes are guaranteed to apply to the rest of the queries in that transaction. This is useful functionality, not something I would call broken.
As long as DISCARD ALL guarantees that all connections revert to a pristine state before returning to the pool, all behavior remains deterministic, in that the values of all settings within a transaction are controlled only by statements run within that transaction, plus the regular database configuration (per-user, per database, per-cluster, etc.).
Only if server_reset_query is left empty does the system enter what I would call a broken state: the values of settings are nondeterministic, governed by an unpredictable combination of all database activity since the creation of the connection pool.
For these reasons, I submit that server_reset_query should be set to DISCARD ALL for transaction-based pooling by default.
At the very least, if the new behavior is deemed to fix more than it breaks, could the 1.7 changelog be updated to highlight more clearly what is changing? "Now reset query is used only in pools that are in session mode" requires the reader to look up what reset query does and doesn't come across as "scary" enough to set off the proper alarm bells at the removal of the DISCARD ALL safety net users may have been relying on.
For instance, the Bucardo check_postgres monitoring plugin sets a non-default statement_timeout value. Once the Nagios check has run, other apps connecting to the database risk having a statement_timeout that was not anticipated by the author of those apps.
One could make the case that check_postgres shouldn't connect via pgbouncer, but until now it was safe, and especially since it's a third-party plugin, users might not be aware that it's changing a setting, even if they're confident their own code doesn't change any settings.
The text was updated successfully, but these errors were encountered:
Only goal of transaction pooling is performance and DISCARD ALL breaks that by killing internal caching in Postgres.
Also transaction-pooling requires cooperating apps. If you have apps that reallty expect 100% Postgres behaviour, you should not run them over transaction pooling mode.
Apps can use SET LOCAL if they want custom settings in their transaction.
The plan with the setting was explained in 1.6.1 changelog. In 1.7 it is perhaps too brief indeed. I can put clearer warning into 1.7.1 news.
We've just been bitten by this change and it look a lot of effort to work out what was going on due to the effects in one session being far removed from the cause in another.
I'd much prefer pgbouncer to have defaults that favour correctness and safety.
More significantly I think this change is a security risk since it could be used by an untrusted client to affect the action of another more privileged client, for example by changing SET PATH so a stored procedure in a different schema gets called. It's not a question of how pgbouncer should be configured, but how pgbouncer has been configured in actual installations. Previously server_reset_query made using transaction pooling 'reasonably' safe and thus appealing for us, and presumably many other installations, who are now possibly at risk.
Regarding the docs @sbranchaw quoted, shouldn't "When transaction pooling is used, the server_reset_query should be empty, as clients should not use any session features. If client does use session features, then they will be broken as transaction pooling will not guarantee that next query will be run on same connection." be "... that the next transaction will be run on same connection"?
In fact that whole paragraph seems unclear and somewhat self-justifying of a particular perspective due to the 'should's and the vague 'broken'. Our use of transaction pooling is primarily for reducing the number of concurrent connections, which is only partly related to 'performance'. I suggest this alternative wording:
PgBouncer defaults already favour correctness and safety: session-pooling with DISCARD ALL. If you change configuration away from that, you are supposed to know what you are doing.
It has been documented since day 1 that tx-pooling breaks many Postgres session-based features and you are supposed to be using it only situations where this is OK. If some people have gotten impression that it is fine to point any random app or untrusted client into tx-pooled bouncer, such misconception needs to be fixed fast.
My mistake was to let reset_query run in tx-pooled mode at all. It fit the code flow and I remember thinking "parhaps it is useful in some situations". It's really sad that it has let people to not care about what subset of SQL features apps actually use.
This is not really correct, the main use case we have for transaction pooling is to keep connection counts to pg down. Postgres with 3000 or so connections quickly gets to a state where various connections experience random and painful delays.