Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oracledb2 hangs when getting connection from pool #275

Closed
ThiefMaster opened this issue Dec 21, 2023 · 17 comments
Closed

oracledb2 hangs when getting connection from pool #275

ThiefMaster opened this issue Dec 21, 2023 · 17 comments
Labels
bug Something isn't working patch available

Comments

@ThiefMaster
Copy link

ThiefMaster commented Dec 21, 2023

The same code worked just fine on oracledb<2, but after updating to oracledb==2.0.0, it just hangs during pool.acquire()

import oracledb
dsn = oracledb.makedsn('XXX', 10121, service_name='XXX')
print(dsn)
pool = oracledb.create_pool(dsn=dsn, user='XXX', password='XXX', min=1, max=5, increment=1, ping_interval=2)
print('created pool')
with pool.acquire() as conn:
    print('connected!')

When I set the pool's getmode to nowait, it fails with "DPY-4005: timed out waiting for the connection pool to return a connection" instead of freezing.

A normal connection using oracledb.connect(dns=dns, user=..., password=...) works fine.

Version information:

platform.platform: Linux-5.15.88-gentoo-x86_64-AMD_Ryzen_5_3600X_6-Core_Processor-with-glibc2.37
sys.maxsize > 2**32: True
platform.python_version: 3.12.1
oracledb.__version__: 2.0.0
@ThiefMaster ThiefMaster added the bug Something isn't working label Dec 21, 2023
@cjbj
Copy link
Member

cjbj commented Dec 21, 2023

Can you triple check the old version is still OK?
What's the DB version?

@ThiefMaster
Copy link
Author

SELECT * FROM v$version:

('Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production', 'Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production\nVersion 19.17.0.0.0', 'Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production', 0)

Any yes, after a pip install oracledb<2 it works fine, while after updating it breaks again.

@anthony-tuininga
Copy link
Member

That seems strange. The code you provided works just fine for me. The test suite also has very similar code and the entire test suite passes, too. That suggests something very specific to your configuration. If you break while the hang is occurring, can you provide the traceback? That may prove enlightening.

@ThiefMaster
Copy link
Author

$ python /tmp/oracletest.py
^CTraceback (most recent call last):
  File "/tmp/oracletest.py", line 4, in <module>
    with pool.acquire() as conn:
         ^^^^^^^^^^^^^^
  File ".../env-312/lib/python3.12/site-packages/oracledb/pool.py", line 148, in acquire
    return self._connection_method(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../env-312/lib/python3.12/site-packages/oracledb/connection.py", line 1134, in connect
    return conn_class(dsn=dsn, pool=pool, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../env-312/lib/python3.12/site-packages/oracledb/connection.py", line 520, in __init__
    impl = pool_impl.acquire(params_impl)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/oracledb/impl/thin/pool.pyx", line 594, in oracledb.thin_impl.ThinPoolImpl.acquire
  File "src/oracledb/impl/thin/pool.pyx", line 609, in oracledb.thin_impl.ThinPoolImpl.acquire
  File "/home/adrian/.pyenv/versions/3.12.1/lib/python3.12/threading.py", line 390, in wait_for
    self.wait(waittime)
  File "/home/adrian/.pyenv/versions/3.12.1/lib/python3.12/threading.py", line 355, in wait
    waiter.acquire()
KeyboardInterrupt

@anthony-tuininga
Copy link
Member

That is, unfortunately, not enlightening. :-( Can you add a time.sleep() call immediately after the pool creation? A second should be enough unless your database is very remote? Let me know if that "resolves" the issue for you. I am still unable to replicate.

@ThiefMaster
Copy link
Author

same problem :/

@anthony-tuininga
Copy link
Member

Ok. Thanks for checking. Is this a regular local database? Or is it remote? Any special configuration?

I note that the output doesn't match your script above (which printed some progress)? Does it work ok if you use a standalone connection? Does it work if you increase the minimum number of connections to 2 and then wait long enough for both of them to be created by the background thread? Can you set the environment variable PYO_DEBUG_PACKETS to some value and ensure that the connections are in fact being created? Anything else you can think of that might pinpoint the source of the issue for you? If you are able and willing to build from source I can suggest a few changes to see where things are going awry. You are the only one to note this so far and pooling is quite heavily used -- and our testing team tried on all platforms (macOS, Windows, Linux) and all database versions before we released this version -- which all suggests something very specific to your configuration. I would very much like to know what it is that is causing the problem!

@ThiefMaster
Copy link
Author

import oracledb
import time
dsn = oracledb.makedsn('pdb-*.cern.ch', 10121, service_name='PDB_******.cern.ch')
pool = oracledb.create_pool(dsn=dsn, user='***', password='***', min=1, max=5, increment=1, ping_interval=2, getmode=oracledb.POOL_GETMODE_FORCEGET)
print('created')
time.sleep(1)
print('acquiring')
with pool.acquire() as conn:
    print('connected!')

this is the script I've used the last time (just censored the full hostname and credentials).

The issue happens both on my alma9 production system and on my gentoo dev system

If you are able and willing to build from source I can suggest a few changes to see where things are going awry

Sure! (if you provide me with wheels I can also test those, might be even faster)

@anthony-tuininga
Copy link
Member

Thanks. The output you provided earlier doesn't match, however, as it doesn't show any output. Can you verify that you are able to create a standalone connection to this database? Can you set the environment variable PYO_DEBUG_PACKETS to verify that the pool is creating the connection successfully? I'll look into some changes to help debug this situation for you.

@ThiefMaster
Copy link
Author

Yes, when using this it works fine:

conn = oracledb.connect(dsn=dsn, user='***', password='***')
with conn:
    ...

--

PYO_DEBUG_PACKETS=1 shows me that there's lots of network activity while it keeps trying to connect. Not sure what exactly to look for in that data though and I assume it contains some sensitive data from authentication so I'd rather not post it publicly here. However, if you have matrix (or an email address) I could share it there.

@anthony-tuininga
Copy link
Member

Thanks for confirming that. My e-mail address is posted in the setup configuration for the project: https://github.com/oracle/python-oracledb/blob/main/setup.cfg. Does increasing the minimum to 2 and increasing the number of seconds to wait to 2 help any?

@ThiefMaster
Copy link
Author

min=2 doesn't help, neither does sleeping longer (or doing both). I emailed you the log.

anthony-tuininga added a commit that referenced this issue Jan 5, 2024
with either a connection pool or using asyncio (#275).
@anthony-tuininga
Copy link
Member

Thanks for your help with getting to the bottom of this issue. I was able to replicate and have verified the fix. I hope you are able to confirm that yourself!

@ThiefMaster

This comment was marked as outdated.

@ThiefMaster
Copy link
Author

The crash and sync problems are gone, async still randomly fails with "invalid redirect data" though

anthony-tuininga added a commit that referenced this issue Jan 5, 2024
does not arrive before the first part has been parsed (#275).
@anthony-tuininga
Copy link
Member

I was able to replicate the "invalid redirect data" issue as well. In my configuration it occurs rather rarely but often enough for me to discover the source of the issue: the redirect packet comes in two pieces and the first part can be completely processed before the second part arrives, which the code didn't handle correctly. It now does. Let me know if you are able to verify that, too. Thanks again for your help!

@anthony-tuininga
Copy link
Member

The patch has been included in version 2.0.1 which was just released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working patch available
Projects
None yet
Development

No branches or pull requests

3 participants