New server_reset_query_always default introduces non-deterministic behavior. #110

Closed
sbranchaw opened this Issue Feb 9, 2016 · 5 comments

Comments

Projects
None yet
4 participants
@sbranchaw

Context: PgBouncer 1.7 changes the default of server_reset_query_always to 0, causing the server_reset_query to not take effect for transaction pooling.

According to the documentation, "When transaction pooling is used, the server_reset_query should be empty, as clients should not use any session features. If client does use session features, then they will be broken as transaction pooling will not guarantee that next query will be run on same connection."

I don't follow this argument. This may be true of statement pooling, but if you change a setting such as search_path or statement_timeout within a transaction, the changes are guaranteed to apply to the rest of the queries in that transaction. This is useful functionality, not something I would call broken.

As long as DISCARD ALL guarantees that all connections revert to a pristine state before returning to the pool, all behavior remains deterministic, in that the values of all settings within a transaction are controlled only by statements run within that transaction, plus the regular database configuration (per-user, per database, per-cluster, etc.).

Only if server_reset_query is left empty does the system enter what I would call a broken state: the values of settings are nondeterministic, governed by an unpredictable combination of all database activity since the creation of the connection pool.

For these reasons, I submit that server_reset_query should be set to DISCARD ALL for transaction-based pooling by default.

At the very least, if the new behavior is deemed to fix more than it breaks, could the 1.7 changelog be updated to highlight more clearly what is changing? "Now reset query is used only in pools that are in session mode" requires the reader to look up what reset query does and doesn't come across as "scary" enough to set off the proper alarm bells at the removal of the DISCARD ALL safety net users may have been relying on.

For instance, the Bucardo check_postgres monitoring plugin sets a non-default statement_timeout value. Once the Nagios check has run, other apps connecting to the database risk having a statement_timeout that was not anticipated by the author of those apps.

One could make the case that check_postgres shouldn't connect via pgbouncer, but until now it was safe, and especially since it's a third-party plugin, users might not be aware that it's changing a setting, even if they're confident their own code doesn't change any settings.

@markokr

This comment has been minimized.

Show comment
Hide comment
@markokr

markokr Feb 10, 2016

Contributor

Only goal of transaction pooling is performance and DISCARD ALL breaks that by killing internal caching in Postgres.

Also transaction-pooling requires cooperating apps. If you have apps that reallty expect 100% Postgres behaviour, you should not run them over transaction pooling mode.

Apps can use SET LOCAL if they want custom settings in their transaction.

The plan with the setting was explained in 1.6.1 changelog. In 1.7 it is perhaps too brief indeed. I can put clearer warning into 1.7.1 news.

Contributor

markokr commented Feb 10, 2016

Only goal of transaction pooling is performance and DISCARD ALL breaks that by killing internal caching in Postgres.

Also transaction-pooling requires cooperating apps. If you have apps that reallty expect 100% Postgres behaviour, you should not run them over transaction pooling mode.

Apps can use SET LOCAL if they want custom settings in their transaction.

The plan with the setting was explained in 1.6.1 changelog. In 1.7 it is perhaps too brief indeed. I can put clearer warning into 1.7.1 news.

@timbunce

This comment has been minimized.

Show comment
Hide comment
@timbunce

timbunce Feb 10, 2016

We've just been bitten by this change and it look a lot of effort to work out what was going on due to the effects in one session being far removed from the cause in another.

I'd much prefer pgbouncer to have defaults that favour correctness and safety.

More significantly I think this change is a security risk since it could be used by an untrusted client to affect the action of another more privileged client, for example by changing SET PATH so a stored procedure in a different schema gets called. It's not a question of how pgbouncer should be configured, but how pgbouncer has been configured in actual installations. Previously server_reset_query made using transaction pooling 'reasonably' safe and thus appealing for us, and presumably many other installations, who are now possibly at risk.

Regarding the docs @sbranchaw quoted, shouldn't "When transaction pooling is used, the server_reset_query should be empty, as clients should not use any session features. If client does use session features, then they will be broken as transaction pooling will not guarantee that next query will be run on same connection." be "... that the next transaction will be run on same connection"?

In fact that whole paragraph seems unclear and somewhat self-justifying of a particular perspective due to the 'should's and the vague 'broken'. Our use of transaction pooling is primarily for reducing the number of concurrent connections, which is only partly related to 'performance'. I suggest this alternative wording:

When transaction pooling is used, if a client makes a change to the session (such as SET PATH) then the effect of that change will 'leak' into any other client transactions that happen to reuse that connection. Clearly this is unsafe and likely to cause hard to diagnose problems and security risks. For that reason transaction pooling is only recommended where all clients are trusted to not alter the session. Alternatively, if transaction pooling is wanted but clients aren't trusted then server_reset_query_always can be enabled to improve safety at the cost of performance.

We've just been bitten by this change and it look a lot of effort to work out what was going on due to the effects in one session being far removed from the cause in another.

I'd much prefer pgbouncer to have defaults that favour correctness and safety.

More significantly I think this change is a security risk since it could be used by an untrusted client to affect the action of another more privileged client, for example by changing SET PATH so a stored procedure in a different schema gets called. It's not a question of how pgbouncer should be configured, but how pgbouncer has been configured in actual installations. Previously server_reset_query made using transaction pooling 'reasonably' safe and thus appealing for us, and presumably many other installations, who are now possibly at risk.

Regarding the docs @sbranchaw quoted, shouldn't "When transaction pooling is used, the server_reset_query should be empty, as clients should not use any session features. If client does use session features, then they will be broken as transaction pooling will not guarantee that next query will be run on same connection." be "... that the next transaction will be run on same connection"?

In fact that whole paragraph seems unclear and somewhat self-justifying of a particular perspective due to the 'should's and the vague 'broken'. Our use of transaction pooling is primarily for reducing the number of concurrent connections, which is only partly related to 'performance'. I suggest this alternative wording:

When transaction pooling is used, if a client makes a change to the session (such as SET PATH) then the effect of that change will 'leak' into any other client transactions that happen to reuse that connection. Clearly this is unsafe and likely to cause hard to diagnose problems and security risks. For that reason transaction pooling is only recommended where all clients are trusted to not alter the session. Alternatively, if transaction pooling is wanted but clients aren't trusted then server_reset_query_always can be enabled to improve safety at the cost of performance.

@markokr

This comment has been minimized.

Show comment
Hide comment
@markokr

markokr Feb 10, 2016

Contributor

PgBouncer defaults already favour correctness and safety: session-pooling with DISCARD ALL. If you change configuration away from that, you are supposed to know what you are doing.

It has been documented since day 1 that tx-pooling breaks many Postgres session-based features and you are supposed to be using it only situations where this is OK. If some people have gotten impression that it is fine to point any random app or untrusted client into tx-pooled bouncer, such misconception needs to be fixed fast.

My mistake was to let reset_query run in tx-pooled mode at all. It fit the code flow and I remember thinking "parhaps it is useful in some situations". It's really sad that it has let people to not care about what subset of SQL features apps actually use.

Contributor

markokr commented Feb 10, 2016

PgBouncer defaults already favour correctness and safety: session-pooling with DISCARD ALL. If you change configuration away from that, you are supposed to know what you are doing.

It has been documented since day 1 that tx-pooling breaks many Postgres session-based features and you are supposed to be using it only situations where this is OK. If some people have gotten impression that it is fine to point any random app or untrusted client into tx-pooled bouncer, such misconception needs to be fixed fast.

My mistake was to let reset_query run in tx-pooled mode at all. It fit the code flow and I remember thinking "parhaps it is useful in some situations". It's really sad that it has let people to not care about what subset of SQL features apps actually use.

markokr added a commit to markokr/pgbouncer-dev that referenced this issue Feb 10, 2016

doc: Improve server_reset_query description
Based on feedback in #110

Also remove obsolete suggestion for pre-8.3 postgres.
@markokr

This comment has been minimized.

Show comment
Hide comment
@markokr

markokr Feb 11, 2016

Contributor

Closing.

When admin configures tx-pooling, it means it has well-behaved app and config defaults should reflect that.

But I'm open to suggestions on how to improve documentation to reflect that.

Contributor

markokr commented Feb 11, 2016

Closing.

When admin configures tx-pooling, it means it has well-behaved app and config defaults should reflect that.

But I'm open to suggestions on how to improve documentation to reflect that.

@markokr markokr closed this Feb 11, 2016

@SamSaffron

This comment has been minimized.

Show comment
Hide comment
@SamSaffron

SamSaffron Mar 9, 2018

Only goal of transaction pooling is performance and DISCARD ALL breaks that by killing internal caching in Postgres.

This is not really correct, the main use case we have for transaction pooling is to keep connection counts to pg down. Postgres with 3000 or so connections quickly gets to a state where various connections experience random and painful delays.

Only goal of transaction pooling is performance and DISCARD ALL breaks that by killing internal caching in Postgres.

This is not really correct, the main use case we have for transaction pooling is to keep connection counts to pg down. Postgres with 3000 or so connections quickly gets to a state where various connections experience random and painful delays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment