Transactionally deadlock still there on 3.2.0-M1? #1614
Comments
There is no way to configure the default AsyncExecutor. You'll have to create a separate one.
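For later readers, constructing a separate executor looks roughly like this (a sketch against the Slick 3.2 API; `numConnections`, `dataSource`, and the executor name are placeholders):

```scala
import slick.jdbc.PostgresProfile.api._
import slick.util.AsyncExecutor

val numConnections = 20 // placeholder: match your connection pool size

// Keep maxConnections no larger than maxThreads; a mismatch enables
// the scheduling deadlock discussed in this thread.
val executor = AsyncExecutor(
  name = "myDbExecutor",
  minThreads = numConnections,
  maxThreads = numConnections,
  queueSize = 1000,
  maxConnections = numConnections
)

val db = Database.forDataSource(dataSource, Some(numConnections), executor = executor)
```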
This really needs a reproduction. According to your description it is https://github.com/slick/slick/blob/master/slick-testkit/src/test/scala/slick/test/jdbc/hikaricp/SlickDeadlockTest.scala#L48-L58, which was successfully fixed by #1274.
I've seen this in a local test run:
The test succeeded and it didn't cause any negative effects. This could just be an illegal state during a shutdown or something similar. But it could also point to a bug in #1274. If the in-use count is lower than it should be at some point during normal operation, this would defeat the deadlock prevention.
To make my previous comment sound less alarmist: The fact that the test succeeded and that I saw this message in the console at all (which only happens for log messages between tests but not the actual test output) points to an illegal shutdown state.
I've just upgraded to 3.2.0-M2 and am seeing deadlocks. I have a Hikari pool with 3 connections, a queue size of 1000 and 6 threads. The form in which the deadlocks appear is: …

So in essence there is a large number of DBIOActions created at once, but not so many as to overflow the db queue.
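The shape of the workload described above can be sketched as follows (the `counters` table and `db` handle are hypothetical; the rest assumes the Slick 3.2 API):

```scala
import scala.concurrent.{ExecutionContext, Future}
import slick.jdbc.PostgresProfile.api._

implicit val ec: ExecutionContext = ExecutionContext.global

// Many transactional actions fired at once: more than the executor's
// thread count, but fewer than its queue size.
def runAll(db: Database): Future[Seq[Int]] =
  Future.sequence((1 to 100).map { i =>
    db.run((for {
      _ <- sqlu"update counters set n = n + 1 where id = $i"
      n <- sql"select n from counters where id = $i".as[Int].head
    } yield n).transactionally)
  })
```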
@hvesalai Are the pools configured by Slick or manually? The new scheduler needs to know the correct connection pool size. (I'm not sure we ever considered the artificial case where the number of connections is lower than the number of threads, so it's possible that it wouldn't work properly in this case.) The gist of the code above, as far as DBIO scheduling is concerned, seems to be simply: synchronous action -> wait for Future -> other synchronous action.
It is a manually configured AsyncExecutor for which I hadn't configured maxConnections. Now I have changed it to set maxConnections, but I have also made other large changes in the code to lessen the number of simultaneous DBIOActions. I agree that everything should have been fused, and so the deadlock really baffles me. Please also see my comment in #1461 (comment)
So does it work now with the correctly configured size?
I have not seen any deadlocks since I configured it with maxConnections, but I also changed the code so that it does not fire simultaneous DBIOActions.
From the reports so far there is no indication that the current scheduling algorithm fails if the pool is configured correctly. I think it would be good to reimplement scheduling in 3.3 as described by @hvesalai in #1461 (comment) and myself in the subsequent comment. This should be more robust and does not require knowledge of the pool size.
We have seen this issue when submitting DBIOs composed of large numbers of sequenced DBIOs that affect the same set of rows in the database. We see the three worker threads attempting to read results (the same stacktrace as @curreli), but another transaction (that was initiated first) is a Postgres lock point for them (i.e. the results that the three worker threads are attempting to read will not be available until the fourth transaction clears). In the thread dump we see no thread that is attempting to read the (available) results from the transaction that is the lock point. Postgres reports the lock point transaction as "idle in transaction", presumably because whatever should be reading the result has left.

Running the same DBIO in isolation works, and increasing numThreads means that we can increase the number of concurrent DBIOs that can be run, but we can always hit a point where the lock point transaction comes into existence and causes the other transactions to be deadlocked. We are running 3.2.0.
@dspasojevic Thanks, this is really useful. Can you reproduce it when …? I hadn't really thought about this case before. This could be a deadlock in the database instead of one in Slick's scheduler. It's different from #1274 but nevertheless caused by Slick's scheduler. If my assumption is correct, it should work when ….

Since we do our own scheduling, you can run into a situation where all DBIO threads are blocked on reading from the database, but the row is locked, and the transaction that holds the lock is currently suspended and can't run because no DBIO thread is available. The database can't detect this as a deadlock because it assumes that all connections run independently.

I can't think of any solution that preserves our more resource-efficient scheduling with suspended connections in this case. Pinned sessions should be OK as long as they are in auto-commit mode, but when a session is in transaction mode we can't give up the DBIO thread and schedule a continuation while waiting for an asynchronous result. Instead we need to block the DBIO thread to prevent another DBIO action from running, which could block until the transaction is committed.
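The thread-starvation scenario described above doesn't need Slick or a database to demonstrate. A minimal sketch with a plain fixed thread pool (all names hypothetical) shows how a queued task that would release the blocked workers can never run:

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object PoolStarvationDemo {
  // Returns true if the workload completed, false if the pool starved.
  // `threads` plays the role of Slick's DBIO threads; each "blocker"
  // stands in for a transaction pinned to a thread while it waits for
  // a result that only the queued task can produce.
  def run(threads: Int, blockers: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(threads)
    val latch = new CountDownLatch(1)
    (1 to blockers).foreach { _ =>
      pool.submit(new Runnable { def run(): Unit = latch.await() })
    }
    // The task that would unblock everyone; if all threads are already
    // occupied, it sits in the queue forever.
    pool.submit(new Runnable { def run(): Unit = latch.countDown() })
    val completed = latch.await(500, TimeUnit.MILLISECONDS)
    latch.countDown() // release any still-blocked workers
    pool.shutdownNow()
    completed
  }
}
```

With 3 threads and 3 blockers the pool starves, mirroring the thread dumps reported in this thread; with even one spare thread the releasing task runs and everything completes.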
@szeiger I can't reproduce after setting ….

What effect does this have on long-running streaming DBIOs? I thought that the number of additional connections over ….

Worth noting that this issue is exacerbated by Hikari's position that it is the job of application code to detect and shut down "idle in transaction" operations. Since neither Slick nor Hikari does this, we can only detect and recover from this at the database level itself.
Streaming doesn't usually run in a transaction so it would still free up the thread. Or is it possible that rows get locked merely by the presence of an open result set? If that's the case then we'd have to always block the thread (at least by default).

Currently all interruptions of database I/O (which are always asynchronous) give up the thread and schedule a continuation. This includes back-pressure handling during streaming as well as non-DBIO computations (for example, whenever you call a DBIO method like …). Setting ….

So the situation where this deadlock can happen is that you have multiple DBIO actions running at the same time which contend over the same lock and have asynchronous computations while holding the lock.
Au contraire. In PostgreSQL, streaming (the use of a cursor) requires a transaction.
Ah yes, I remember needing a transaction to prevent caching of the whole result set. So I suppose without a transaction the result set gets fully materialized and there should never be a need for a lock. |
Exactly so, and hence if the result set is huuuuge, you want to have a transaction with the lowest applicable isolation level to prevent running out of memory.
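For reference, the usual Postgres streaming recipe combines a transaction with a non-zero fetch size so the driver keeps a server-side cursor open instead of materializing the whole result (a sketch; `db` and `query` are hypothetical, while `withStatementParameters` and `transactionally` are the Slick 3.2 API):

```scala
import slick.jdbc.PostgresProfile.api._

// `query` is some lifted query, e.g. TableQuery[Users].filter(...).
// Without the transaction, the Postgres JDBC driver buffers the
// entire result set in memory.
val publisher = db.stream(
  query.result
    .withStatementParameters(fetchSize = 1000)
    .transactionally
)
```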
Just a data point: we are experiencing the exact same issue as @dspasojevic described on Mar 8, and the explanation above makes perfect sense. The underlying problem is that a thread running a statement that is waiting for a lock is blocked, due to JDBC's architecture.
@arobertn, I might be wrong, but @dspasojevic describes having a larger connection pool than thread pool, meaning yes, connections can run out of threads, and we consider that a configuration bug in the application, which we'll hopefully make harder for people to implement at some point in the future. Meaning if you increase the thread pool size for Slick's thread pool beyond the number of connections, you should be good. Am I missing something?

@dspasojevic also mentioned a relationship to an "idle transaction" being involved, but without more info on the details, I don't know the impact of that.
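If the database is configured through Typesafe Config and `Database.forConfig`, the sizing advice above amounts to keeping the pool no larger than the thread pool (a sketch; the config path `mydb` is a placeholder):

```
mydb {
  connectionPool = "HikariCP"
  numThreads = 20
  # keep maxConnections <= numThreads so a suspended transaction
  # can always get a thread back
  maxConnections = 20
  queueSize = 1000
}
```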
min/maxThreads are defined in slick.util.AsyncExecutor#L50 |
@arobertn There are no configuration settings for max/min threads, only …
@hvesalai When we set …
@jontra the fix we are using only affects AndThen and Seq. If you are not using either, then that could be one reason.
See discussion in slick#1614 about deadlocks caused by maxConnections > numThreads.
Unfortunately, we are still running into this issue after upgrading to Slick 3.2.2. 😞 Not running our queries as …
@stubbi can you open a new issue and describe all the relevant settings (maxThreads, minThreads, maxConnections)? Have you checked the logs to confirm there are no warnings about settings that could cause deadlocks?
Also note that this issue was about deadlocks in the Slick thread pool. If your deadlock happens somewhere else (e.g. in the DB), that is a completely separate issue. Slick cannot prevent deadlocks that it has no control over, such as, for example, those caused by inappropriate use of DB-level locking (implicit or explicit).
Although this is supposedly fixed in 3.3, I'm stuck on 3.2 until …. I was hopeful that if I configured the ….

Here's how I'm configuring my AsyncExecutor:

```scala
val asyncExecutor = AsyncExecutor(
  name = "CorePostgresDriver.AsyncExecutor",
  minThreads = poolSize,
  maxThreads = poolSize,
  maxConnections = poolSize,
  queueSize = 1000 // the default used by AsyncExecutor.default()
)
api.Database.forDataSource(dataSource, Some(poolSize), executor = asyncExecutor)
```
By coincidence some of my coworkers encountered transactional issues with Slick today. Instead of seeing the "count cannot be decreased" errors like I did, they saw timeouts waiting on connections from the pool. It seemed to behave OK under light load but under regular use it quickly hangs. In the DB itself we only see connections in state "idle in transaction", no obvious locks, as though ….

As far as they can tell, they are not aware of any DB locking. For now they are removing …
I've been facing the #1274 issue on Slick 3.1.1, so I upgraded to 3.2.0-M1, which supposedly contains a fix. However, after the upgrade, I'm still having deadlocks when using transactionally.

I can reproduce this issue by running more than 20 DBIOActions in parallel, with transactionally. If I remove transactionally then everything is fine. After investigating, I figured out that 20 is the default number of threads in the AsyncExecutor, so that kind of makes sense.

I'm using HikariCP v2.5.1 along with PostgreSQL Driver v9.4.1208.jre7. PostgreSQL is configured to accept max 100 connections. The HikariCP data source is configured with a max pool size of 100.

A thread dump shows all AsyncExecutor threads as RUNNABLE:

Before we figure this one out, is there a way I can increase the default number of threads in the AsyncExecutor?