Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for asyncio #6

Closed
anthony-tuininga opened this issue Jun 1, 2022 · 59 comments
Closed

Support for asyncio #6

anthony-tuininga opened this issue Jun 1, 2022 · 59 comments
Labels
enhancement New feature or request patch available

Comments

@anthony-tuininga
Copy link
Member

This is a continuation of the original request made on cx_Oracle: oracle/python-cx_Oracle#178.

The current status is that with the addition of the thin driver, adding suport for asyncio will be considerably simpler (and will only work in thin mode). If anyone has suggestions or recommendations on API, please share them!

@kleysonr
Copy link

kleysonr commented Jun 1, 2022

+1

@danizen
Copy link

danizen commented Jun 1, 2022

It isn't async per se but the key benefits of escaping from the GIL and better managing RDBMS connections that we need. As both @sharkguto and @P403n1x87 mentioned in oracle/python-cx_Oracle#178, it can be done by wrapping queries in coroutines.

That means that @cjbj and @anthony-tuininga don't need to write a new driver (note that async drivers don't follow PEP 249). I think a thin package that wraps oracledb in an opinionated way (e.g. using connection pools all the time, and get a connection just to do the request) could be used.

I guess there is a problem there for cursors and fetching additional results, but I'll leave it to @anthony-tuininga to figure this out.

@oracle oracle deleted a comment from danizen Jun 1, 2022
@anthony-tuininga
Copy link
Member Author

@danizen, I did make an attempt at simply wrapping cx_Oracle in asyncio coroutines -- but the performance of the result was poor (and that's being generous). As such I don't see any benefit to wrapping the thick driver at all -- other than the fact that you can use asyncio, I guess! My first attempt to generate an asyncio (thin) driver showed excellent performance in comparison. So that is the route that is being considered at this point! I'd like to see this implemented soon -- this year if at all possible! Unless I hear otherwise I'll probably follow the pattern used by asyncpg.

@danizen
Copy link

danizen commented Jun 1, 2022

Thanks, @anthony-tuininga, I am looking forward to trying it out.

@jiaulislam
Copy link

I guess it's still not supported officially to get async oracle. 💔

@anthony-tuininga
Copy link
Member Author

Not yet, no. Its definitely on the list, though! And I'd still like to see it done this year, yet, if at all possible.

@anthony-tuininga
Copy link
Member Author

Just to give a bit of an update: I have started looking into this and ran into a bit of a roadblock interfacing asyncio with the Oracle database protocol -- but a solution has been found, thankfully! Thanks to that roadblock, getting it done this year is not going to happen any longer...but I am actively working on it (among other things), so hopefully I'll have something for you to look at in January.

One question for those of you following along: as mentioned earlier a simple wrap of the synchronous routines in a future (that executes in a thread pool) works but is about twice as slow as the synchronous version. Would it be helpful to include that as a fallback if the solution I mentioned earlier doesn't work for all database versions? Or would it be preferable to simply state that support isn't available in that case? Comments welcome!

@srtucker
Copy link

srtucker commented Dec 7, 2022

Thanks for the update. For your second option would it be a hard stop or would the async API still function, it just would affectively be synchronous?

@anthony-tuininga
Copy link
Member Author

The options, I think, are

  • raise an exception and state that asyncio doesn't work (well) with this database version
  • fallback to the "works but is considerably slower" approach (putting synchronous calls into a thread pool for execution)

@P403n1x87
Copy link

* fallback to the "works but is considerably slower" approach (putting synchronous calls into a thread pool for execution)

Slow queries could perhaps benefit from the process, rather than the thread, pool. But opting in, in this case, should require an active user request IMO. Sending a fast query to a process pool might actually be slower than using the thread pool, although I don't have numbers to back this up at the moment.

@old-syniex
Copy link

The options, I think, are

  • raise an exception and state that asyncio doesn't work (well) with this database version
  • fallback to the "works but is considerably slower" approach (putting synchronous calls into a thread pool for execution)

I would suggest to wrap the problematic versions in threadpool for consistency.

The community might will be able to suggest solutions

@danizen
Copy link

danizen commented Dec 19, 2022

My personal preference would be to raise an exception, but it is not either or. You could raise a warning and then fallback to wrap in a threadpool. Users who want the threadpool behavior could ignore the warning, and users who want the wrapped behavior could catch the warning and raise some sort of system error.

@ptekelly
Copy link

any news on this?

@jiaulislam
Copy link

They are working on it. But I guess it will work only for thin driver not the thick mode. I think asyncio is also required for thick mode.

@anthony-tuininga
Copy link
Member Author

Yes, we are working on it but also getting distracted by other projects! I can (and have as a proof of concept) implemented asyncio with thick mode -- but that was about 2-3 times slower than without asyncio, which sort of defeats the purpose, I think! Do you still want asyncio even if it is slower than regular synchronous mode? My experimentations with thin are much more promising.

@ptekelly
Copy link

for me thick doesn't matter - just using thin (at the moment at least)

@jiaulislam
Copy link

Yes, we are working on it but also getting distracted by other projects! I can (and have as a proof of concept) implemented asyncio with thick mode -- but that was about 2-3 times slower than without asyncio, which sort of defeats the purpose, I think! Do you still want asyncio even if it is slower than regular synchronous mode? My experimentations with thin are much more promising.

If I have a database on a remote can I use Thin mode ? I tried earlier but it was giving an exception saying require thick mode. I didn't dig up that issue much as I was in hurry with that project but I guess I have to look it up again if thin mode works in remote database.

@cjbj
Copy link
Member

cjbj commented Jan 25, 2023

@jiaulislam yes you can connect in Thin mode to remote databases. This is off topic for this thread, so if you have questions about it, please start a new discussion.

@vbadita
Copy link

vbadita commented Feb 28, 2023

What would be the database versions which don't support asyncio ?

@cjbj
Copy link
Member

cjbj commented Feb 28, 2023

No further discussions have occurred. @anthony-tuininga has been busy on the Thin node-oracledb driver.
What DB versions are you using?

@ptekelly
Copy link

ptekelly commented Mar 1, 2023

No further discussions have occurred. @anthony-tuininga has been busy on the Thin node-oracledb driver. What DB versions are you using?

I'd be happy for thin node async - is there forum post about that - or better still a time frame?

@cjbj
Copy link
Member

cjbj commented Mar 1, 2023

No time frame. This issue is the place to follow to get news.

@ptekelly
Copy link

ptekelly commented Mar 1, 2023

ok thanks

@danizen
Copy link

danizen commented Mar 1, 2023

Now that I am using asyncio more heavily, I think the main benefit from this would only be in managing the number of connections, and that can be addressed somewhat with DRCP configuration. One trick I am doing is to create a ThreadPoolExecutor and then using the asyncio.run_in_executor formulation to run synchronous code.

This means I can tune the ThreadPoolExecutor to have the same number of threads i expect in a connection pool, and it is quite functional.

If you couple that with DRCP, it gets yet more functional.

@vbadita
Copy link

vbadita commented Mar 9, 2023

Now that I am using asyncio more heavily, I think the main benefit from this would only be in managing the number of connections, and that can be addressed somewhat with DRCP configuration. One trick I am doing is to create a ThreadPoolExecutor and then using the asyncio.run_in_executor formulation to run synchronous code.

This means I can tune the ThreadPoolExecutor to have the same number of threads i expect in a connection pool, and it is quite functional.

If you couple that with DRCP, it gets yet more functional.

Thank you. I tried ThreadPoolExecutor with connection pooling and it seems it's working fine.

@ptekelly
Copy link

Hi - any update to this?

@anthony-tuininga
Copy link
Member Author

@nickswiss, others have indeed come up with a "solution" that works for them (the aforementioned cx_Oracle_async) while waiting for me to implement asyncio support. Support is indeed planned and some progress has been made, but other higher priority items have interrupted that progress. There are a few small items remaining on that list but I hope to have those completed shortly (which are planned to go into 1.4) and then I can concentrate on asyncio support (which is planned for 2.0). Of course these plans are subject to change but we will inform you if that is the case.

Part of my efforts in the past few months have been on support for a thin mode driver for Node.js. That effort has shown that it is possible to implement a truly async model without the enhancemnts available in Oracle Database 23c -- so with that knowledge I hope to have asyncio support for all database versions, with the caveat that the Oracle Database 23c enhancement should improve performance. The performance without the Oracle Database 23c enhancement should still easily outperform the simple wrapper approach that cx_Oracle_async is using.

The current plan is for asyncio support to only be usable in thin mode.

@WilliamStam
Copy link

thank you for all your hard work! its super supper appreciated.

im also a +1 for asyncio natively to get the proper awaits in it. in the meen time. anyone have some code they could share to get it going? :P cant exactly block the gil while waiting for an oracle response. is it as easy as:

def make_oracle_call(sql, params):
    with oracle as e: # psudo code
        results = e.fetchall()
        
    return results


executor = ThreadPoolExecutor(max_workers=1)
a = executor.submit(functools.partial(make_oracle_call,sql,params))

@Julian-Brendel
Copy link

Hi All,

Just wanted to check in on this thread.

Do you have any update on the roadmap / plans for the async / 2.0 release?

@cjbj
Copy link
Member

cjbj commented Oct 30, 2023

@Julian-Brendel some work is being done and management is aware we are treating this as a priority request, however other things do keep coming up so I'm not going to comment on timelines.

@anthony-tuininga
Copy link
Member Author

See announcement #258 for details. Comments welcome here!

@old-syniex
Copy link

@anthony-tuininga I am suggestingto release version 2.0.0a1 for ease of testing purposes.

I propose aligning our oracledb design with the following asyncpg methods:
fetch
fetchrow
fetchval
execute
executemany

Adopting this approach would likely enhance the overall user-friendliness.

@anthony-tuininga
Copy link
Member Author

@old-syniex, I presume you are referring to having these methods on the connection object -- thereby eliminating the need to create a cursor object and perform an execute/fetch on that? I agree that these would be more convenient in cases where the additional properties available on cursors are not required. Would you like to see this as an additional set of APIs? Or as a complete replacement?

As for creating a release, that is certainly a possibility which we will consider.

@old-syniex
Copy link

old-syniex commented Nov 22, 2023

@anthony-tuininga
Yeah, having those methods on the connection object.
I would like it to be additional set of API, I can't find a reason why to remove the current API.

@anthony-tuininga
Copy link
Member Author

anthony-tuininga commented Nov 22, 2023

Sure. I can see the advantage of doing that. Something like this would make sense to me:

async def execute(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None
) -> Any:
    """
    Executes a statement and returns an AsyncCursor instance (if executing a query) or None if
    executing a non-query. Other options include returning the number of rows updated
    (for non-queries) or an ExecuteResult object which contains the information that would be on
    an AsyncCursor instance.
    """

async def executemany(
    self,
    statement: str,
    parameters: Union[list, int],
    batcherrors: bool = False,
    arraydmlrowcounts: bool = False
) -> None:
    """
    Similar to AsyncCursor.executemany() but doesn't require creating an AsyncCursor instance first.
    No return value but could also have an ExecuteResult object returned with information that would
    normally be an AsyncCursor instance.
    """

async def fetchone(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None,
    rowfactory: Callable = None
) -> Any:
    """
    Executes a statement and returns the first row of the result set returned
    (or None, if no rows are fetched).
    """

async def fetchmany(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None,
    num_rows: int = oracledb.defaults.arraysize,
    rowfactory: Callable = None
) -> list:
    """
    Executes a statement and returns the first <num_rows> rows of the result set.
    """

async def fetchall(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None,
    arraysize: int = oracledb.defaults.arraysize,
    rowfactory: Callable = None
) -> list:
    """
    Executes a statement and returns all of the rows of the result set as a list.
    """

That makes it clear that they are the same as the cursor equivalents but without requiring a cursor. Thoughts?

@syniex
Copy link

syniex commented Nov 24, 2023

Yes, this looks great.

@anthony-tuininga
Copy link
Member Author

FYI, I just pushed changes to merge with the changes introduced in main as well as ensure that Python 3.7 and higher work correctly. I had inadvertently used a method that was only available in Python 3.11!

@anthony-tuininga
Copy link
Member Author

FYI, I just pushed more changes to merge with the changes introduced in main and also added shortcut functions on the connection object as follows:

    async def callfunc(
        self,
        name: str,
        return_type: Any,
        parameters: Union[list, tuple] = None,
        keyword_parameters: dict = None,
    ) -> Any:
        """
        Call a function with the given name.
            
        This is a shortcut for creating a cursor, calling the stored function
        with the cursor and then closing the cursor.
        """

    async def callproc(
        self,
        name: str,
        parameters: Union[list, tuple] = None,
        keyword_parameters: dict = None,
    ) -> list:
        """
        Call a procedure with the given name.

        This is a shortcut for creating a cursor, calling the stored procedure
        with the cursor and then closing the cursor.
        """
    async def execute(
        self, statement: str, parameters: Union[list, tuple, dict] = None
    ) -> None:
        """
        Execute a statement against the database.

        This is a shortcut for creating a cursor, executing a statement with
        the cursor and then closing the cursor.
        """
    async def executemany(
        self, statement: Union[str, None], parameters: Union[list, int]
    ) -> None:
        """
        Prepare a statement for execution against a database and then execute
        it against all parameter mappings or sequences found in the sequence
        parameters.

        This is a shortcut for creating a cursor, calling executemany() on the
        cursor and then closing the cursor.
        """
    async def fetchall(
        self,
        statement: str,
        parameters: Union[list, tuple, dict] = None,
        arraysize: int = None,
        rowfactory: Callable = None,
    ) -> list:
        """
        Executes a query and returns all of the rows. After the rows are
        fetched, the cursor is closed.
        """

    async def fetchmany(
        self,
        statement: str,
        parameters: Union[list, tuple, dict] = None,
        num_rows: int = None,
        rowfactory: Callable = None,
    ) -> list:
        """
        Executes a query and returns up to the specified number of rows. After
        the rows are fetched, the cursor is closed.
        """
    async def fetchone(
        self,
        statement: str,
        parameters: Union[list, tuple, dict] = None,
        rowfactory: Callable = None,
    ) -> Any:
        """
        Executes a query and returns the first row of the result set if one
        exists (or None if no rows exist). After the row is fetched the cursor
        is closed.
        """

@WilliamStam
Copy link

on behalf of pretty much everyone. thank you soooo much for this.

@cjbj
Copy link
Member

cjbj commented Nov 28, 2023

@WilliamStam thank you. Please make sure you hammer on it and give us feedback!

@cjbj
Copy link
Member

cjbj commented Dec 5, 2023

How is everyone going with python-oracledb asyncio testing? Are there any issues that we should know about, or suggestions that you want to make?

@syniex
Copy link

syniex commented Dec 5, 2023

@cjbj is there any chance to create a prerelease?

@cjbj
Copy link
Member

cjbj commented Dec 5, 2023

There's a chance :) Let me sync with Anthony - unless he does it before my day starts tomorrow.

@syniex
Copy link

syniex commented Dec 5, 2023

There's a chance :) Let me sync with Anthony - unless he does it before my day starts tomorrow.

I hope you will async with Anthony ;)

@anthony-tuininga
Copy link
Member Author

@syniex, we discussed this and agreed that we will release version 2.0 with asyncio support as it currently stands (probably some time next week). The feedback that has been received so far has all been positive. We will include a note that the asyncio support is under review and subject to change based on feedback after use in the real world! :-) That allows us to get out the other enhancements and bug fixes at the same time. So stay tuned!

@anthony-tuininga
Copy link
Member Author

anthony-tuininga commented Dec 12, 2023

Asyncio support is now in main in preparation for the release of version 2.0.

@anthony-tuininga
Copy link
Member Author

And version 2.0.0 has now been released! Thanks for your patience and let us know if you find anything that needs to be changed with the asyncio support.

@cjbj
Copy link
Member

cjbj commented Dec 19, 2023

Yay.

FWIW I did a quick blog post here.

@WilliamStam
Copy link

so finally getting round to performance stuff in my projects. interestingly the oracledb async is marginly faster than the sync every single time i ran the stupid thrown together benchmark script (a few ms but still.. its consistently "faster"

(i was testing the execute part here not the connection / whatever else parts. and tried to do apples vs apples. im aware of connection.execute() but i felt that would be "unfair" to the sync part)

https://gist.github.com/WilliamStam/b9bed409e3a754bf05accb95d04bb54e

also.. ps.. &^@#$*&^ sqlalchemy :(

@anthony-tuininga
Copy link
Member Author

Thanks for sharing! Is the "swearing" at SQLAlchemy because of performance or something else?

@WilliamStam
Copy link

WilliamStam commented Jan 12, 2024

lol yeah. having issues with SA and oracle but nothing for this thread

sqlalchemy/sqlalchemy#10874

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request patch available
Projects
None yet
Development

No branches or pull requests