Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identified a possible hang in async cursor.executemany() #278

Closed
zzzeek opened this issue Dec 29, 2023 · 17 comments
Closed

identified a possible hang in async cursor.executemany() #278

zzzeek opened this issue Dec 29, 2023 · 17 comments
Labels
bug Something isn't working patch available

Comments

@zzzeek
Copy link

zzzeek commented Dec 29, 2023

hi gang -

our sqlalchemy test suite for the new async oracledb driver we are developing hangs semi-reliably on one particular series of tests that are fielding an intentional IntegrityError from a cursor.executemany() call.

the condition is extremely difficult to reproduce, and I've come up with a script that seems to reproduce it a bit more portably when run on a linux machine, but it still takes many iterations for the condition to hit. The script below runs the same set of operations over and over again - running it against an Oracle 18c express database here, it will hang usually within 200 iterations. but i have no idea if it will do this in other circumstances. if i move the create table step out of the loop, it doesnt hang at all.

import asyncio
import oracledb
import uuid


async def main():
    conn = await oracledb.connect_async(
        dsn="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oracle18c)(PORT=1521))(CONNECT_DATA=(SID=xe)))",
        password="tiger",
        user="scott",
    )
    cursor = conn.cursor()

    try:
        await cursor.execute("DROP TABLE data")
    except:
        pass

    cursor.close()

    i = 0
    while True:
        await run_async_thing(conn)
        i += 1
        print(f"finished run: {i}")


async def run_async_thing(conn):
    cursor = conn.cursor()

    await cursor.execute(
        """
CREATE TABLE data (
    id VARCHAR2(30 CHAR) NOT NULL,
    data VARCHAR2(50 CHAR),
    has_server_default VARCHAR2(30 CHAR) DEFAULT 'some_server_default',
    "uid" CHAR(32),
    PRIMARY KEY (id)
)

"""
    )

    ret0, ret1, ret2 = (
        cursor.var(oracledb.STRING, arraysize=10),
        cursor.var(oracledb.STRING, arraysize=10),
        cursor.var(oracledb.STRING, arraysize=10),
    )

    parameters = [
        {
            "data": f"d{i}",
            '"uid"': uuid.uuid4().hex,
            "ret_0": ret0,
            "ret_1": ret1,
            "ret_2": ret2,
        }
        for i in range(10)
    ]

    try:
        await cursor.executemany(
            'INSERT INTO data (data, "uid") VALUES (:data, :"uid") '
            "RETURNING data.id, data.data, data.has_server_default "
            "INTO :ret_0, :ret_1, :ret_2",
            parameters,
        )
    except oracledb.IntegrityError as err:
        print("got error as expected!")

    print("next go!")

    await cursor.execute("DROP TABLE data")
    cursor.close()


asyncio.run(main())

so if I run it here it looks like:

got error as expected!
next go!
finished run: 1
got error as expected!
next go!
finished run: 2
got error as expected!
next go!
finished run: 3
got error as expected!
next go!
finished run: 4
got error as expected!
next go!

... keeps running ...


finished run: 29
got error as expected!
next go!
finished run: 30
got error as expected!
next go!
finished run: 31

then it's hung. press ctrl-C and stack shows as follows:

^CTraceback (most recent call last):
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 684, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/classic/dev/sqlalchemy/test4.py", line 23, in main
    await run_async_thing(conn)
  File "/home/classic/dev/sqlalchemy/test4.py", line 62, in run_async_thing
    await cursor.executemany(
  File "/home/classic/.venv3/lib64/python3.12/site-packages/oracledb/cursor.py", line 1067, in executemany
    await self._impl.executemany(
  File "src/oracledb/impl/thin/cursor.pyx", line 297, in executemany
  File "src/oracledb/impl/thin/protocol.pyx", line 817, in _process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 818, in oracledb.thin_impl.BaseAsyncProtocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 769, in _process_message
  File "src/oracledb/impl/thin/protocol.pyx", line 872, in _reset
  File "src/oracledb/impl/thin/packet.pyx", line 571, in wait_for_packets_async
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test4.py", line 77, in <module>
    asyncio.run(main())
  File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 123, in run
    raise KeyboardInterrupt()
KeyboardInterrupt

that's what I got so far. Will see if @CaselIT can reproduce on his environment.

@zzzeek zzzeek added the bug Something isn't working label Dec 29, 2023
@CaselIT
Copy link

CaselIT commented Dec 29, 2023

Running from windows with oracle on docker (podman on fedora over wsl2 actually) I can't reproduce (stopped with loop over 1700) with 23 (image gvenzl/oracle-free:23-slim) and 21 (image gvenzl/oracle-xe:21-slim)

With 18 (image gvenzl/oracle-xe:18-slim) it tripped at 534. Stack trace below

Maybe it has something to do with version 18 of oracle?

@zzzeek maybe the best solution is to update the ci to 21 or 23? I'll leave the script running a bit more on 21 to see if it happens

Traceback (most recent call last):
  File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\base_events.py", line 684, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\cfede\Dev\GitHub\sqlalchemy\test4.py", line 25, in main
    await run_async_thing(conn)
  File "C:\Users\cfede\Dev\GitHub\sqlalchemy\test4.py", line 64, in run_async_thing
    await cursor.executemany(
  File "C:\Users\cfede\venvs\sa\Lib\site-packages\oracledb\cursor.py", line 1067, in executemany
    await self._impl.executemany(
  File "src\\oracledb\\impl/thin/cursor.pyx", line 297, in executemany
  File "src\\oracledb\\impl/thin/protocol.pyx", line 817, in _process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 818, in oracledb.thin_impl.BaseAsyncProtocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 769, in _process_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 872, in _reset
  File "src\\oracledb\\impl/thin/packet.pyx", line 571, in wait_for_packets_async
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\cfede\Dev\GitHub\sqlalchemy\test4.py", line 79, in <module>
    asyncio.run(main())
  File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\runners.py", line 123, in run
    raise KeyboardInterrupt()
KeyboardInterrupt

@CaselIT
Copy link

CaselIT commented Dec 29, 2023

Just to add that the same suite has run for years using oracledb and cx-oracle with the sync version and it never hanged

@zzzeek
Copy link
Author

zzzeek commented Dec 29, 2023

well we can update to 21/23 sure, for now I just have the test marked to skip on oracledb async. great news it repros!

@CaselIT
Copy link

CaselIT commented Dec 29, 2023

I'll leave the script running a bit more on 21 to see if it happens

over 25k and it was still running. By this limited test it seems this has something to do with 18 only.

interestingly when I went to stop the 18 container (gvenzl/oracle-xe:18-slim) after the script hanged the db did not close cleanly, and got this warning

WARN[0010] StopSignal SIGTERM failed to stop container oracle in 10 seconds, resorting to SIGKILL

@anthony-tuininga
Copy link
Member

I noticed with #277 that Python 3.11 and higher do not experience the hang but Python 3.10 and lower do. I also noticed that the uvloop alternative implementation works fine with Python 3.10 and lower. Which Python version are you using? Are you able to try with the uvloop implementation? It may be something specific to the asyncio implementation.

@zzzeek
Copy link
Author

zzzeek commented Jan 2, 2024

I get the hang here with python 3.12 and we see it on 3.11 as well

@anthony-tuininga
Copy link
Member

Ok. Thanks for confirming that. That suggests the issues are unrelated. It was worth checking though!

anthony-tuininga added a commit that referenced this issue Jan 3, 2024
…when

using asyncio and the database raises an error and output variables are
present (#278).
@anthony-tuininga
Copy link
Member

I was able to replicate with Oracle Database 18c and discovered that it was a problem with the break/reset mechanism timing. I have pushed a patch that corrects this for me. If you are able to build from source you can verify that it works for you, too.

@CaselIT
Copy link

CaselIT commented Jan 3, 2024

I've tried with 18c on docker installing from source locally and I could no longer reproduce, stopped when the counter was over 26k. (this is with python on windows to oracle on podman on wsl)

Seems to be fine here.

@zzzeek
Copy link
Author

zzzeek commented Jan 3, 2024

yup seems fine now, I confirmed I can still reproduce w/ the old version and the main version fixes the issue.

@anthony-tuininga
Copy link
Member

Thanks for the confirmation!

@anthony-tuininga
Copy link
Member

The patch has been included in version 2.0.1 which was just released.

@anthony-tuininga
Copy link
Member

@zzzeek, @CaselIT, I have just pushed this change which pretty much reverts the patch that was originally made to resolve this issue. I was able to confirm that the issue remains in the original code prior to the patch but no longer does in the code I just pushed. Just to be safe, if you could run the code as well with this commit and verify that it is not an issue for you, that would be appreciated. I believe other changes have eliminated the original issue, but I'd like to be sure!

@zzzeek
Copy link
Author

zzzeek commented May 8, 2024

is there any quick way to build just the thin client without it looking for OCI for the thick client? or can you put up a wheel file somewhere

@anthony-tuininga
Copy link
Member

The build of the driver doesn't require OCI or the thick client in any way. You can simply build from source. That said, I believe I was able to confirm that the original issue reported here has been corrected a different way -- so this is just paranoia, I think!

@zzzeek
Copy link
Author

zzzeek commented May 8, 2024

OK I can confirm the latest main branch does not have the issue, after I re-confirmed that 2.0.0 still reproduces on this machine, so we're good

@anthony-tuininga
Copy link
Member

Thanks, @zzzeek!

@oracle oracle deleted a comment from zzzeek May 8, 2024
@oracle oracle deleted a comment from zzzeek May 8, 2024
@oracle oracle deleted a comment from zzzeek May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working patch available
Projects
None yet
Development

No branches or pull requests

3 participants