
Introduction

Most enterprise systems working with Postgres use pgbouncer or similar tools. But pgbouncer has the following drawbacks:

  1. It is an extra entity which complicates system installation and administration.
  2. Pgbouncer itself can become a bottleneck and a point of failure. For example, with SSL enabled, the single-threaded model of pgbouncer becomes a limiting factor when many clients try to re-establish connections simultaneously. This is why some companies build hierarchies of pgbouncers.
  3. Using a pool_mode other than "session" makes it impossible to use prepared statements and session variables. The lack of prepared statements alone can slow down simple queries by up to a factor of two.

This is why built-in connection pooling for Postgres can be useful.

Architecture

  1. There are one or more session pools, each with a specified maximal number of workers (backends). In case of multiple session pools, each of them is associated with its own port, leaving the standard Postgres port (5432 by default) for normal connections with dedicated backends.
  2. When the number of connections exceeds the number of backends in the session pool, then instead of spawning a new backend we choose one of the existing backends and redirect the connection to it. There is a more or less portable way in Unix to pass socket descriptors between processes using Unix sockets (see for example https://stackoverflow.com/questions/28003921/sending-file-descriptor-by-linux-socket/, and the sketch after this list). So a session is bound to a backend. Backends are chosen using a round-robin policy, which should guarantee a more or less uniform distribution of sessions between backends if the number of sessions is much larger than the number of backends. But certainly skews in client application access patterns can violate this assumption.
  3. Rescheduling is done at the transaction level, so it is enough to have one procarray entry per backend to correctly handle locks. Transaction-level pooling also eliminates the problem of false deadlocks (caused by a lack of free executors in the pool) and minimizes the changes in the Postgres core needed to maintain correct session context: there is no need to suspend/resume transaction state, static variables, and so on.
  4. In the main Postgres query loop in PostgresMain we detect the moment when the backend is not inside a transaction, perform a select over the sockets of all active sessions, and choose one of them.
  5. When a client disconnects, we close the session but do not terminate the backend.
  6. To support prepared statements, we append the session identifier to the name of the statement, so prepared statements of different sessions will not clash. Since a session is bound to a backend, it is possible to use prepared statements.
  7. Each session has its own copy of non-default GUCs, so it is possible to use session variables for pooled connections.
  8. Temporary tables are supported by creating a private temporary tablespace for each session.
  9. All pooled sessions share the backend's local caches and the state of its static variables, so it is assumed that their content is valid for all sessions executed by this backend.
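
The socket hand-off in item 2 relies on the standard Unix technique of sending a file descriptor over a Unix-domain socket with SCM_RIGHTS ancillary data. Below is a minimal, self-contained sketch of that technique; it is not the actual patch code, and the function names are illustrative only.

```c
/*
 * Sketch: pass an accepted client socket from one process (e.g. the
 * postmaster) to another (e.g. a pooled worker backend) over a
 * connected Unix-domain socket, using SCM_RIGHTS ancillary data.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send file descriptor 'fd' over the connected Unix socket 'chan'. */
int
send_fd(int chan, int fd)
{
    struct msghdr msg = {0};
    char dummy = 'x';                 /* at least one byte of real payload */
    struct iovec io = { .iov_base = &dummy, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct cmsghdr *cmsg;

    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
}

/* Receive a file descriptor on the worker side; returns -1 on failure. */
int
recv_fd(int chan)
{
    struct msghdr msg = {0};
    char dummy;
    struct iovec io = { .iov_base = &dummy, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &io;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(chan, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```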

As a result, the built-in session pool supports:

  • Prepared statements
  • Session variables
  • Temporary tables
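
From the client's point of view nothing changes. Here is a client-side sketch using libpq showing prepared statements and session variables working over a pooled connection; the port (5433), the database, and the pgbench_accounts table are assumptions made for the example, not part of the patch.

```c
/* Build with: cc client.c -lpq
 * Assumes a pgbench-initialized database reachable via a pooled port. */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("host=localhost port=5433 dbname=postgres");
    PGresult   *res;
    const char *paramValues[1] = { "1" };

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Session variable: each pooled session keeps its own copy of non-default GUCs. */
    PQclear(PQexec(conn, "SET work_mem = '64MB'"));

    /* Prepared statement: the server internally suffixes its name with the
     * session identifier, so statements of different sessions do not clash. */
    PQclear(PQprepare(conn, "get_account",
                      "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                      1, NULL));

    res = PQexecPrepared(conn, "get_account", 1, paramValues, NULL, NULL, 0);
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
        printf("abalance = %s\n", PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```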

Restrictions:

  • A long-living transaction will block any other transactions scheduled for this worker. Waiting for user actions inside a transaction is completely incompatible with this approach. It is possible to use the "idle_in_transaction_session_timeout" variable to detect and abort such transactions.
  • A session worker can manage connections only to one database under the same user; otherwise it would be necessary to clean up caches at each reschedule. Moreover, reading the startup packet and authentication is done by the backend, not by the postmaster, so at the moment of scheduling a connection to one of the workers, the postmaster doesn't know to which database and under which user it will be connected. The current solution is to create several session pools and accept connections to each pool at a separate port. All connections to a particular port should access the same database under the same user, otherwise the connection will be rejected.

Configuration

The built-in session pool is controlled by the following GUC variables:

  • max_sessions Maximal number of client sessions which can be handled by one backend if session pooling is switched on. In theory it can be arbitrarily large. Right now on Unix it is used to specify the epoll event set size. But managing too large a number of connections with one backend may hurt performance, because only one transaction can be executed by a backend at any moment in time. Default value is 1000.
  • session_pool_size Maximal number of workers (backends) in one connection pool. A zero (default) value of this parameter disables connection pooling. Workers are started on demand and never terminated (even if the last connection to the worker is dropped). Pooling is done only after all session_pool_size workers have been launched.
  • session_pool_ports Number of extra ports on which PostgreSQL will listen to accept client sessions. Each such port has a separate session pool. It is intended that each port corresponds to some particular database/user combination, so that all backends in this session pool will handle connections accessing this database. If session_pool_ports is non-zero, then the postmaster will always spawn dedicated (non-pooling) backends at the main Postgres port. If session_pool_ports is zero and session_pool_size is not zero, then sessions (pooled connections) will also be accepted at the main port. Session pool ports are allocated sequentially: if the Postgres main port is 5432 and session_pool_ports is 2, then ports 5433 and 5434 will be used for connection pooling.

By default session pooling is switched off. To enable it, assign a non-zero value to session_pool_size. If you want to pool connections to different databases/users, then use session_pool_ports. When it is zero, a session pool is created for the main PostgreSQL port. If it is non-zero, then the standard port will be used only for dedicated backends.
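
A minimal postgresql.conf sketch (the values are only illustrative; it assumes the main port is 5432, so the two extra pooled ports become 5433 and 5434):

```
# postgresql.conf (illustrative values)
session_pool_size  = 10      # 0 (default) disables built-in pooling
max_sessions       = 1000    # max client sessions handled by one pooled backend
session_pool_ports = 2       # pools listen on ports 5433 and 5434;
                             # port 5432 keeps serving dedicated backends

# Optional: abort transactions that sit idle and block a pooled worker
# (see Restrictions above)
idle_in_transaction_session_timeout = '1min'
```

Clients that should be pooled then connect to port 5433 or 5434 instead of 5432.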

Performance

Internal connection pooling makes it possible to use prepared statements, which by itself can increase the speed of OLTP queries by up to a factor of two. It also allows handling a large number of connections with a limited number of active backends. A large number of backends causes significant degradation of Postgres performance because of the increasing size of the procarray, snapshots, etc., and high contention for resources between active transactions.

Below are results for read-only pgbench (-S):


| Connections | Vanilla Postgres, kTPS | Session pool size 256, kTPS |
|-------------|------------------------|-----------------------------|
| 1k          | 1300                   | 1505                        |
| 10k         | 633                    | 1519                        |
| 100k        | -                      | 1425                        |

As you can see, connection pooling prevents performance degradation as the number of connections increases.