Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent calls to psycopg2.connect() crash on Windows with both binary and source builds of psycopg2 #836

Closed
s3rvac opened this Issue Jan 24, 2019 · 21 comments

Comments

Projects
None yet
4 participants
@s3rvac
Copy link

s3rvac commented Jan 24, 2019

In one of our Python scripts that is being run on Windows, we have switched from MySQL (pymysql) to PostgreSQL (psycopg2). Since then, the script started crashing. Upon investigation, it turned out that it crashes when concurrently calling psycopg2.connect() from multiple threads (see the minimal example below).

As per the official documentation and #543, we have tried the source build of psycopg2, but it has not fixed the issue.

Steps to reproduce

  • Install Python 3.7 from https://www.python.org/downloads/release/python-372/
  • Install PostgreSQL 10 from https://www.postgresql.org/download/windows/
  • Build psycopg2 from source and install it into a virtual environment:
    python -m venv virtualenv
    virtualenv\Scripts\activate.bat
    pip install --no-binary psycopg2 psycopg2
    
  • Run the following script via python test.py:
    import multiprocessing.pool
    import psycopg2
    
    def check(_):
        connection = psycopg2.connect(host='...', user='...', password='...', dbname='...')
        connection.close()
    
    if __name__ == '__main__':
        with multiprocessing.pool.ThreadPool(50) as pool:
            pool.map(check, range(1000))

Expected behavior

The script runs successfully.

Actual behavior

The script crashes inside ntdll.dll -> libeay32.dll -> ssleay32.dll -> libpq.dll -> _psycopg.cp37-win32.pyd.

The stacktrace above indicates that the crash happens inside OpenSSL. Do you have any hints what might be causing this issue or how to proceed with the debugging?

Configuration

  • Windows 7 SP1 (32b)
  • Microsoft Visual Studio 2017 (32b)
  • PostgreSQL 10 (32b) -- the latest 32b version
  • psycopg2 2.7.7 -- the latest version
  • Python 3.7.2 (32b) -- the latest version

Notes

  • On Linux, everything works fine.
  • The Windows version does not matter (my colleague has the same issue on 64b Windows 10).
  • The order of imports does not matter (it crashes even when I switch them).
  • The used database server does not seem to matter (it crashes even when I connect to a different server).
  • Serializing the calls to psycopg2.connect() with threading.Lock fixes the crashes, i.e. the crash happens only when several threads call psycopg2.connect() at the same time. However, based on the official documentation, psycopg2 has thread safety 2 (in DB API 2.0 parlance), so concurrent calls to psycopg2.connect() should not be a problem.
@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jan 25, 2019

What OpenSSL version are you using? Psycopg uses 1.0.2 but maybe it would be the case to move to 1.1 as suggested in #543?

@s3rvac

This comment has been minimized.

Copy link
Author

s3rvac commented Jan 28, 2019

Dependency Walker shows that _psycopg.cp37-win32.pyd depends on libeay32.dll and ssleay32.dll version 1.0.2.14.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 12, 2019

We will release packages bundled with openssl 1.1 in 2.8.2.

Can you try if the packages installed by:

pip install -i https://test.pypi.org/simple/ psycopg2-binary==2.8.2.dev0

work as expected?

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 12, 2019

@jerickso could you be able to give a quick spin to these packages?

pip install -i https://test.pypi.org/simple/ psycopg2==2.8.2.dev0

just to check if basic ssl support works. I tried testing it on appveyor but it didn't like that

@jerickso

This comment has been minimized.

Copy link
Member

jerickso commented Apr 12, 2019

Sure thing. I've moved to a new computer, so it will take me a tish more time to do the test while I install everything.

@jerickso

This comment has been minimized.

Copy link
Member

jerickso commented Apr 12, 2019

So I tested out the development package. When testing against a PostgreSQL 11.2 Windows server, it fails to connect up in SSL mode. So I fixed the AppVeyor config to setup the local PostgreSQL server in SSL mode, and it connects up and passes those tests. But when I download the packages that pass the AppVeyor tests, it still does not connect up to the Windows PostgreSQL server.

I am going to have to do some more testing against different versions, as we are not there yet, but I'm hoping it is user (my) error.

BTW, I've issued a pull request to you for the SSL changes.

@jerickso

This comment has been minimized.

Copy link
Member

jerickso commented Apr 13, 2019

Not much farther along with a solution, but wanted to document where I'm at.

The error I see when using psycopg2 with the OpenSSL 1.1.1 build outside of Appveyor is:
psycopg2.OperationalError: could not create SSL context: No such process

This error is passed through from libpq. Ironically, there is not a lot out there on this error with the exception of this stackoverflow question. With this issue, someone had an issue with an Anaconda version of psycopg2, and they solved it by uninstalling it and installing the default pip version.

I see this error with both 32 and 64bit Python, and with Python 2.7 and Python 3.7. I also see the error when connecting up to a Unix PostgreSQL server (version 9.6) and a Windows PostgreSQL server (version 11.2). If I 'downgrade' Psycopg2 to the 2.8.1 version (or even 2.7.6.1), it works.

Another item to note: Modifying the Appveyor build to require SSL connections with the PostgreSQL server does NOT have the error show up.

This leaves me with a few questions:

  • Why does the running the tests in Appveyor succeed when it fails on other machines? This really bothers me as how do we know if we have a good build?
  • Why are there no build errors? Our the patches masking the issue?
  • What does 'could not create SSL context: No such process' mean?
@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 13, 2019

"could not create SSL context" is a message from the libpq. the "No such process" comes from SSL_CTX_new().

This maybe gives more googling possibility. First hit is npm/npm#17261 which seems promising.

@jerickso

This comment has been minimized.

Copy link
Member

jerickso commented Apr 13, 2019

Great google-fu there @dvarrazzo! Using the information in that npm post, I think it explain what is seen.

In my test environments, if I create the environment variable OPENSSL_CONF and point it to a file that exists (could be an empty file), then your build works as expected. In the Appveyor environment, the OPENSSL_CONF environment variable is set, which explains why it works there. The good news is we can unset the variable in Appveyor and have it fail.

The challenge now is how to work around it requiring the config file. I'll see what I can do with that.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 13, 2019

Cute that if it doesn't find a file it tells you it doesn't find a process <3

I thought as well that as a test strategy it could be good to unset that env var and see it fail.

How to not make it fail... Should we bundle an openssl.cnf (empty or otherwise) and set the env var there before connection unless we find it previously set?

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 13, 2019

I don't know if this from libpq ssl support docs offers further info.

libpq reads the system-wide OpenSSL configuration file. By default, this file is named openssl.cnf and is located in the directory reported by openssl version -d. This default can be overridden by setting environment variable OPENSSL_CONF to the name of the desired configuration file.

The default is whatever set on config via --openssldir.

@jerickso

This comment has been minimized.

Copy link
Member

jerickso commented Apr 14, 2019

Think I have a working solution.

There is a #define for the PostgreSQL build, HAVE_OPENSSL_INIT_SSL, that when not define, uses some deprecated OpenSSL calls in libpq during the initialization phase. One of these three calls (OPENSSL_config/SSL_library_init/SSL_load_error_strings) must do something differently then with OpenSSL 1.0 and want to have a configuration file to load.

The solution is defining HAVE_OPENSSL_INIT_SSL, which takes the newer code path which the bug/feature does not show up in. I was initially surprised that it wasn't defined before, but then I realized this define was in the PostgreSQL build, which must not be able to determine what version of OpenSSL it is building against (at least in Windows).

I'll issue a pull request once it runs through and tests all the Windows builds.

@dvarrazzo dvarrazzo closed this in 26b61e8 Apr 14, 2019

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 14, 2019

@s3rvac Psycopg 2.8.2 is built with OpenSSL 1.1 packages, which should be more thread safe. Please let us know if it works for you.

@s3rvac

This comment has been minimized.

Copy link
Author

s3rvac commented Apr 16, 2019

@dvarrazzo Good news. I have tried both the source build and the binary build of Psycopg 2.8.2 and they both worked without any issues. Thank you for your help! 👍

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 16, 2019

@s3rvac that's cool, thanks for letting us know.

Any chance you could write a script to reproduce the issue, or at least describe the setup of your program?

@s3rvac

This comment has been minimized.

Copy link
Author

s3rvac commented Apr 16, 2019

Everything is described in the issue description. Only instead of

pip install --no-binary psycopg2 psycopg2

(which installs 2.8.2 at the moment) you need to install version 2.7, e.g.:

pip install --no-binary psycopg2 psycopg2==2.7.7
@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 16, 2019

Oh right, that's actually helpful. I'll see what I can do out of it.

@matthew-brett

This comment has been minimized.

Copy link

matthew-brett commented Apr 18, 2019

@jerickso - should the HAVE_OPENSSL_INIT_SSL define be present in the Mac and Linux builds? I guess so.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Apr 18, 2019

@matthew-brett I think configure on Linux does the right thing. after running build_libpq.sh in a manylinux docker image I see:

[root@... build]# grep HAVE_OPENSSL_INIT_SSL ./postgres-REL_11_2/src/include/pg_config.h
#define HAVE_OPENSSL_INIT_SSL 1
@matthew-brett

This comment has been minimized.

Copy link

matthew-brett commented Apr 18, 2019

Ah yes - a build on macOS just now has the same setting.

@jerickso

This comment has been minimized.

Copy link
Member

jerickso commented Apr 18, 2019

I was thinking that on Linux (and probably Mac) the configure script should detect OpenSSL 1.1.x and set the correct defines, so I'm glad to hear that they are.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.