Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psycopg2.OperationalError: server closed the connection unexpectedly #829

Closed
RekGRpth opened this Issue Dec 21, 2018 · 5 comments

Comments

Projects
None yet
3 participants
@RekGRpth
Copy link

commented Dec 21, 2018

this code

sql = 'select pg_backend_pid()'

class DAL():
    def __init__(self, uri):
        from psycopg2 import connect
        #from pg8000 import connect
        self.cursor = connect(user='postgres', host='postgres').cursor()

    def executesql(self, sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()

def thread():
    db = DAL('')
    i = 10
    import time
    from pprint import pprint
    while i > 0:
        pprint(dict(function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
        time.sleep(1)
        i -= 1

def process():
    from pprint import pprint
    pprint(dict(function='process:begin'))
    import time
    time.sleep(2)
    pprint(dict(function='process:end'))

def main():
    from pprint import pprint
    pprint(dict(function='main:begin'))
    import threading
    threading.Thread(target=thread, name='mythread').start()
    import time
    time.sleep(2)
    import multiprocessing
    multiprocessing.Process(target=process, name='myprocess').start()
    pprint(dict(function='main:end'))

if __name__ == '__main__': main()

raises following exception

Exception in thread mythread:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "test3.py", line 19, in thread
    pprint(dict(function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
  File "test3.py", line 10, in executesql
    self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

but with pg8000 everething is ok

@RekGRpth

This comment has been minimized.

Copy link
Author

commented Dec 21, 2018

also, with psycopg2 on python2 everething is ok

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Dec 21, 2018

try keeping the connection alive. instead of:

    self.cursor = connect(user='postgres', host='postgres').cursor()

try:

    self.connection = connect(user='postgres', host='postgres')
    self.cursor = self.connection.cursor()
@RekGRpth

This comment has been minimized.

Copy link
Author

commented Dec 22, 2018

result is same

@dimandzhi

This comment has been minimized.

Copy link

commented Jan 24, 2019

I'm having the same issue. For now as a workaround had to create wrapper around connection object with used methods to hook them. In cursor() method had to do a SELECT 1 "ping" check and reconnect if needed to return a valid working cursor.

@dvarrazzo

This comment has been minimized.

Copy link
Member

commented Mar 17, 2019

I've played a bit with this issue, which is reproducible. Using the following script:

import time
import threading
import multiprocessing
from pprint import pprint
from psycopg2 import connect

sql = 'select pg_backend_pid()'


class DAL():
    def __init__(self, uri):
        self.conn = connect("")
        self.cursor = self.conn.cursor()

    def executesql(self, sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()


def thread():
    db = DAL('')
    i = 3
    while i > 0:
        pprint(dict(
            function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
        time.sleep(1)
        i -= 1


def process():
    pprint(dict(function='process:begin'))
    time.sleep(2)
    pprint(dict(function='process:end'))


def main():
    pprint(dict(function='main:begin'))
    t = threading.Thread(target=thread, name='mythread')
    t.start()
    time.sleep(1)
    multiprocessing.Process(target=process, name='myprocess').start()
    t.join()
    pprint(dict(function='main:end'))


if __name__ == '__main__':
    main()

The script has the following output, on Py 2.7 and 3.6:

$ PYTHONPATH=build/lib.2.7/ python ./test-829.py 
{'function': 'main:begin'}
{'executesql': '16892', 'function': 'thread:run'}
{'function': 'process:begin'}
{'executesql': '16892', 'function': 'thread:run'}
{'executesql': '16892', 'function': 'thread:run'}
{'function': 'process:end'}
{'function': 'main:end'}
$ PYTHONPATH=build/lib.3.6/ python3 ./test-829.py 
{'function': 'main:begin'}
{'executesql': '16956', 'function': 'thread:run'}
{'executesql': '16956', 'function': 'thread:run'}
{'function': 'process:begin'}
Exception in thread mythread:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "./test-829.py", line 25, in thread
    function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
  File "./test-829.py", line 16, in executesql
    self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


{'function': 'main:end'}
{'function': 'process:end'}

Saving the output of strace in two different files:

PYTHONPATH=build/lib.2.7/ strace -f python -u ./test-829.py > out-829-py2 2>&1
PYTHONPATH=build/lib.3.6/ strace -f python3 -u ./test-829.py > out-829-py3 2>&1

after making the pid numbers uniform in the files (P0, P1, P2), extracting the relevant part (omitting what's before the connection to the db):

python 2 strace

$ egrep '(= 3$|sendto|close|clone)' out-829-py2
clone(child_stack=0x7fd47e85afb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd47e85b9d0, tls=0x7fd47e85b700, child_tidptr=0x7fd47e85b9d0) = P1
[...]
[pid  P1] socket(AF_UNIX, SOCK_STREAM, 0) = 3
[pid  P1] sendto(3, "\0\0\0?\0\3\0\0user\0piro\0database\0piro\0"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
[pid  P1] sendto(3, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P0] close(6)                    = 0
[pid  P0] close(4)                    = 0
[pid  P0] clone( <unfinished ...>
[pid  P0] <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd485bfda10) = P2
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P1] sendto(3, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
[pid  P1] close(3 <unfinished ...>
[pid  P1] <... close resumed> )       = 0
close(5)                                = 0

python 3 strace

$ egrep '(= 3$|sendto|close|clone)' out-829-py3
clone(child_stack=0x7f4662f92fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f4662f939d0, tls=0x7f4662f93700, child_tidptr=0x7f4662f939d0) = P1
[...]
[pid  P1] socket(AF_UNIX, SOCK_STREAM, 0) = 3
[pid  P1] sendto(3, "\0\0\0?\0\3\0\0user\0piro\0database\0piro\0"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
[pid  P1] sendto(3, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P0] close(4)                    = 0
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
[pid  P1] <... sendto resumed> )      = 29
[pid  P0] close(4)                    = 0
[pid  P0] close(4)                    = 0
[pid  P0] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f466a434a10) = P2
[pid  P0] close(5)                    = 0
[pid  P2] sendto(3, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
[pid  P2] close(3)                    = 0
[pid  P2] close(4)                    = 0
[pid  P2] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "/usr/lib/python3.6/threading.py", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "./test-829.py", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
psycopg2.OperationalError: server closed the connection unexpectedly
[pid  P1] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0

I see here that on Python 3, the process P2, politely, closes the connection belonging to P1 (the X command in the sendto() call) 😳

Similar behaviour can be seen in a run with debug enabled: on Python 3 the trace would be something like:

[P1] pq_execute: executing SYNC query: pgconn = 0xP1
[P1]     select pg_backend_pid()
[P1] pq_execute: entering synchronous DBAPI compatibility mode
[P1] pq_fetch: pgstatus = PGRES_TUPLES_OK
[P1] pq_fetch: got tuples
[P1] _pq_fetch_tuples: looking for cast 23:
[P1] curs_get_cast:        per-connection dict: (nil)
[P1] curs_get_cast:        global dict: 0x..
[P1] _pq_fetch_tuples: using cast at 0x.. for type 23
[P1] psyco_curs_execute: res = 1, pgres = 0xP2
[P2] conn_close: PQfinish called
[P1] _psyco_curs_prefetch: result = 0
[P2] connection_dealloc: deleted connection object at 0x7f9e15f28180, refcnt = 0
[P1] _psyco_curs_buildrow: row 0, element 0, len 5
[P2] cursor_dealloc: deleted cursor object at 0x7f9e161069f8, refcnt = 0
[P1] _psyco_curs_buildrow: val->refcnt = 1
{'executesql': 'P3', 'function': 'thread:run'}
{'function': 'process:begin'}
[P1] psyco_curs_execute: starting execution of new query
[P1] pq_execute: pg connection at 0xP1 OK
[P1] pq_begin_locked: pgconn = 0xP1, autocommit = 0, status = 2
[P1] pq_begin_locked: transaction in progress
[P1] pq_execute: executing SYNC query: pgconn = 0xP1
[P1]     select pg_backend_pid()
[P1] psyco_curs_execute: res = -1, pgres = (nil)

So, on Py3 it seems the objects undergo GC, which for the connection sends a close command to an open FD.

Maybe the thing can be fixed by making sure deleting the object doesn't call PQfinish if the pid is not the one the connection was created into.

dvarrazzo added a commit that referenced this issue Mar 17, 2019

Added test to reproduce ticket #829
Unrelated processes close the FD of the connection. This happens in
Python 3.6 but not 2.7. Let's see if travis shows where else it fails...

@dvarrazzo dvarrazzo closed this in f8f5a77 Mar 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.