-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent queries on single connection #81
Comments
The two solutions I can think of would be to either:
|
What about being able to do something like this: async def get_from_db():
async with database.connection(new_connection=True) as conn:
return await conn.fetch_all("SELECT pg_sleep(1)") |
So you'll only be seeing that issue when you run within a transaction, right, since that's the case where the two operations are running against the same connection? Are you able to submit a failing test case for this (even if it only runs against asyncpg that'd be fine). |
And yes, I think we'll need to have a asyncio lock that's acquired for the connection. It hadn't really occured to me that if you use asyncio flow branching, then you could end up having concurrent operations on the same connection. |
OK. PR submitted. I think the lock will solve the issue, as there are legitimate reasons to branch inside of a transaction. But in some cases, I would like to execute a query unrelated to the transaction and run that concurrently in a separate connection. Do you think that the |
I'd suggest we treat the "new connection" as a separate issue to this. We could make a case that we don't support flow branching against a transaction, since eg. the equivalent wouldn't be supported in threading cases either, but I think it's probably worth us covering. Happy to take a look at any pull requests towards this, or else will aim to get onto it at some point. |
@tomchristie I'm running into this issue outside of a transaction. In particular, it seems that the error happens precisely if I ever await a The following code is my attempt at a minimal reproducible demonstration of the behavior. The import asyncio
from databases import Database
if __name__ == "__main__":
database_uri = "<removed>"
async def test1():
database = Database(database_uri)
await database.connect()
await asyncio.gather(
database.execute(query="SELECT 1"),
database.execute(query="SELECT 1"),
)
await asyncio.gather(
database.execute(query="SELECT 1"),
database.execute(query="SELECT 1"),
)
await database.execute(query="SELECT 1")
print("Test 1 succeeded")
print("----------------")
async def test2():
database = Database(database_uri)
await database.connect()
await database.execute(query="SELECT 1")
await asyncio.gather(
database.execute(query="SELECT 1"),
database.execute(query="SELECT 1"),
)
print("Test 2 succeeded")
asyncio.get_event_loop().run_until_complete(test1())
asyncio.get_event_loop().run_until_complete(test2()) The result of the execution is:
Any thoughts on how to get around this would be appreciated! |
Ah! I was experiencing the same issue but was not able to replicate it in isolation. I think the |
Indeed, overriding the Database methods to be called inside class MyDatabase(Database):
async def execute(self, *args, **kwargs):
return (await asyncio.gather(super().execute(*args, **kwargs)))[0]
async def fetch_all(self, *args, **kwargs):
return (await asyncio.gather(super().fetch_all(*args, **kwargs)))[0] (Replacing I would appreciate any insight into why this works (I only discovered it while trying to produce test cases). |
@ryananguiano It looks like an asyncio lock is getting used already (https://github.com/encode/databases/blob/master/databases/core.py#L164), is this not what you are describing? Also, I haven't actually checked whether the queries are being executed concurrently or if there is a lock preventing concurrent execution. My goal would be to be able to make unrelated (read) queries concurrently so I don't have to wait for multiple DB round trips; if an async lock is still forcing queries to be executed sequentially then that would defeat the primary purpose of my workaround anyway. Is that a reasonable thing to be aiming to achieve with this package/asyncpg? |
One of the issues I ran into was using But I do think that a separate solution is probably necessary to solve the issue that you brought up. Your solution could work. I will look at this closely so I can try to understand why that fixes it because there could possibly be a simpler solution. |
I'm having the same issue when using Is there any work around to overcome this or any way to use some other driver such as |
Given that we'd need to add locking and force sequential queries, I'd suggest not attempting to execute queries in parallel on the same connection, or using explicit locking in user code. I'd assume other drivers will have exactly the same behavior. Note that we I don't think we'd expect thread-concurrency drivers to deal with "fork-and-use-same-connection-for-concurrent-queries" to operate correctly, although I'd be interested to hear differently. (Eg. https://docs.python.org/3/library/sqlite3.html#sqlite3.connect "When using multiple threads with the same connection writing operations should be serialized by the user to avoid data corruption.")
@dmontagu - Nope, that's just a lock around the connection acquiry logic, rather than against queries on the connection. Things we could consider here:
|
Similar (but for the thread-concurrency case): "The MySQL protocol can not handle multiple threads using the same connection at once." - https://mysqlclient.readthedocs.io/user_guide.html?highlight=thread#mysqldb |
psycopg2 does handle it, by locking appropriately. http://initd.org/psycopg/docs/usage.html#thread-and-process-safety |
I believe this would handle the locking correctly: #108 |
Probably going to leave this for a couple of days to think over first. If you actually want concurrent access then you need to perform any flow branching outside the context of a connection or transaction. Merging #108 would resolve the behavior, by enforcing sequential access on the same connection. I guess that we probably want that supported, but happy to take any feedback before we merge it in. Also possible that we'll want some explicit connection managment API for cases where that's what you really want (a bit similar to #67). |
I have found workaround for it that works for me:
|
When using asyncpg, I am getting
InterfaceError: cannot perform operation: another operation is in progress
when running multiple db calls concurrently.I am able to replicate and isolate the issue with
force_rollback=True
, although in my project where I encountered this bug, it is happening regardless.Here is the isolated issue:
Result
When running into this issue with
force_rollback=False
, since there isn't supposed to be a global connection and transaction, I was able to get around the issue by changing the following function:to:
I tried using
async with database.connection() as conn:
but it would return the same connection that is being currently being used. I guess I was just trying to understand the logic of sharing the pool connection, instead of letting the pool allocate connections (when available), since asyncpg Connection only allows one statement at a time.The text was updated successfully, but these errors were encountered: