Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix rejected connection on pause #420

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

bashtanov
Copy link

Sometimes, when executing PAUSE command, a connection may be refused. It turned out to happen when PAUSE finds a server connection in a login state. In this situation the server connection, according to PAUSE semantics, is closed, but it also marks the pool as last_login_failed and last_connect_failed. This may sometimes prevent reconnecting to a server, resulting in a client connection rejected.

To reproduce: make frequent random connections, disconnections and simple query executions with pgboucner concurrently (I can provide a python script if needed). In another console do psql -p 5433 -d pgbouncer -c 'pause'; psql -p 5433 -d pgbouncer -c 'resume';. From time to time the python script gets connection rejected.

The patch is to fix it. For now it's mostly a proof of concept, and a request for comments to the idea. Perhaps it's worth joining disconnect_server and disconnect_server_noblame into a single function with an extra argument, as well as tidying up the code a bit.

@bashtanov bashtanov force-pushed the fix-rejected-connection-on-pause branch from 3bdba8f to 5acd380 Compare September 26, 2019 08:39
@bashtanov
Copy link
Author

bashtanov commented Sep 26, 2019

It also looks like fixing 5-30-seconds stalls after RECONNECT, which appeared for the same reasons: a pool was incorrectly marked as last_connect_failed after server connections forced closed in login state.

@bashtanov
Copy link
Author

This is how I simulated the client load:

#!/usr/bin/env python3
import logging
import psycopg2
import random
import select

CONNECTION_STRING = 'host=/tmp/ port=5433 application_name=test'
MAX_CONNECTIONS = 100
PROB_CONNECT = 0.05
PROB_DISCONNECT = 0.02

_logger = logging.getLogger(__name__)

connections = []

def main():
    logging.basicConfig(format='%(asctime)s %(name)s %(message)s', level=logging.INFO)
    while True:
        tick()

def tick():
    r = random.random()
    if r < PROB_CONNECT:
        connect()
    elif r < PROB_CONNECT + PROB_DISCONNECT:
        disconnect()
    else:
        query()

def connect():
    if len(connections) < MAX_CONNECTIONS:
        conn = psycopg2.connect(CONNECTION_STRING, async=1)
        #conn.set_session(autocommit=True)
        connections.append(conn)
        wait(conn)
        logging.info('connected')
    else:
        logging.warn('no connection slots left')


def disconnect():
    if connections:
        connections.pop(random.randint(0, len(connections) - 1)).close()
        logging.info('disconnected')
    else:
        logging.warn('nothing to disconnect')

def query():
    free_connections = [conn for conn in connections if not conn.isexecuting()]
    if free_connections:        
        conn = random.choice(free_connections)
        cur = conn.cursor()
        duration = random.triangular(0.1, 1.0, 0.1)
        logging.info('firing query {}'.format(duration))
        cur.execute('select pg_sleep(%s)', (duration,))
    else:
        logging.warn('no connections')
        
def wait(conn):
    while True:
        state = conn.poll()
        if state == psycopg2.extensions.POLL_OK:
            break
        elif state == psycopg2.extensions.POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == psycopg2.extensions.POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise psycopg2.OperationalError("poll() returned %s" % state)

if __name__ == '__main__':
    main()

@bashtanov
Copy link
Author

I've tidies up disconnect_server function to accept a boolean parameter instead of having two separate functions. I would appreciate if someone could have a look at the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant