ManagedArrayBlockingQueue is paused indefinitely with a non-empty itemQueue #1807

Closed
savulchik opened this issue Oct 30, 2017 · 9 comments

@savulchik
Contributor

I'm investigating a rare problem that manifests itself as:

  1. Slick 3.2.1 stopped processing DBIOActions from the ManagedArrayBlockingQueue, which became paused while its itemQueue was not empty. The async executor threads are waiting on the notEmpty condition:
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000f045c408> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at slick.util.ManagedArrayBlockingQueue.$anonfun$take$1(ManagedArrayBlockingQueue.scala:115)
	at slick.util.ManagedArrayBlockingQueue$$Lambda$591/467364322.apply(Unknown Source)
	at slick.util.ManagedArrayBlockingQueue.lockedInterruptibly(ManagedArrayBlockingQueue.scala:222)
	at slick.util.ManagedArrayBlockingQueue.take(ManagedArrayBlockingQueue.scala:114)
	at slick.util.ManagedArrayBlockingQueue.take(ManagedArrayBlockingQueue.scala:13)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
  2. HikariCP 2.5.1 closed all connections due to the idle connection timeout.

In a heap dump of the application I found the following state of the ManagedArrayBlockingQueue instance:

Type   |Name                                                   |Value
--------------------------------------------------------------------------------------------------------------------------------------------------
boolean|bitmap$0                                               |true
ref    |logger                                                 |slick.util.SlickLogger @ 0xf13c8d68
boolean|paused                                                 |true
int    |inUseCount                                             |12
ref    |slick$util$ManagedArrayBlockingQueue$$highPrioItemQueue|slick.util.InternalArrayQueue @ 0xf045c900
ref    |slick$util$ManagedArrayBlockingQueue$$itemQueue        |slick.util.InternalArrayQueue @ 0xf045c8e0
ref    |notFull                                                |java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject @ 0xf045c8c8
ref    |notEmpty                                               |java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject @ 0xf045c408
ref    |slick$util$ManagedArrayBlockingQueue$$lock             |java.util.concurrent.locks.ReentrantLock @ 0xf045c3d8
int    |capacity                                               |20000
int    |maximumInUse                                           |12
--------------------------------------------------------------------------------------------------------------------------------------------------

and the state of its itemQueue:

Type|Name                                    |Value
----------------------------------------------------------------------------------
int |count                                   |687
int |putIndex                                |8610
int |slick$util$InternalArrayQueue$$takeIndex|7923
ref |items                                   |java.lang.Object[40000] @ 0xf045e7b0
----------------------------------------------------------------------------------

I'm wondering whether I have stumbled upon a concurrency issue in Slick (like #1730). Could I replace the ManagedArrayBlockingQueue with a LinkedBlockingQueue to work around it?

Thanks!

@savulchik
Contributor Author

Update: I suddenly got the following exception:

Exception in thread "jdbc-pool-3" java.lang.IllegalArgumentException: requirement failed: count cannot be increased
        at scala.Predef$.require(Predef.scala:277)
        at slick.util.ManagedArrayBlockingQueue.$anonfun$increaseInUseCount$1(ManagedArrayBlockingQueue.scala:40)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at slick.util.ManagedArrayBlockingQueue.slick$util$ManagedArrayBlockingQueue$$locked(ManagedArrayBlockingQueue.scala:217)
        at slick.util.ManagedArrayBlockingQueue.increaseInUseCount(ManagedArrayBlockingQueue.scala:39)
        at slick.util.AsyncExecutor$$anon$2$$anon$1.beforeExecute(AsyncExecutor.scala:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

@hvesalai
Member

I'm seeing the same thing and debugging it. @kwark, could you perhaps recall why InternalArrayQueue was created? I cannot see any justification for it, and it does not seem to behave as its scaladoc suggests ("additional logic for temporarily rejecting elements").

@hvesalai
Member

The problem is here:

at slick.util.AsyncExecutor$$anon$2$$anon$1.beforeExecute(AsyncExecutor.scala:74)

For all items taken from the itemQueue, increaseInUseCount has already been called in ManagedArrayBlockingQueue.extract. Hence the failing item must come from the highPrioItemQueue. However, that queue should only contain items that already hold a connection.

Then how is it possible that this is not the case? One source could be this code in AsyncExecutor:

    override def execute(command: Runnable): Unit = {
      if (command.isInstanceOf[PrioritizedRunnable]) {
        executor.execute(command)
      } else {
        // plain Runnables are wrapped and implicitly given the WithConnection priority
        executor.execute(new PrioritizedRunnable {
          override val priority: Priority = WithConnection
          override def run(): Unit = command.run()
        })
      }
    }
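
For reference, here is a rough sketch of the check that fails, reconstructed from the stack trace and the heap dump fields above; the actual body of increaseInUseCount in ManagedArrayBlockingQueue.scala may differ:

    // Hypothetical reconstruction, not the actual Slick source: beforeExecute calls
    // increaseInUseCount for runnables that need a connection, and the requirement
    // fails once inUseCount has reached maximumInUse (both are 12 in the heap dump
    // above, matching the connection pool size).
    def increaseInUseCount(): Unit = locked {
      require(inUseCount < maximumInUse, "count cannot be increased")
      inUseCount += 1
      if (inUseCount == maximumInUse) paused = true // assumed: the queue pauses itself when all connections are in use
    }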

@hvesalai
Member

Another hypothesis is that the priority of a PrioritizedRunnable changes between when it is queued and when it is executed. This could be possible since priority is a def, not a val, in the instances defined in BasicBackend.
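
A hypothetical illustration of that concern (not code from BasicBackend; Continuation is assumed here as a second Priority value):

    // If priority is a def that reads mutable state, the queue may see one value
    // when the runnable is enqueued and beforeExecute may see another.
    val runnable = new PrioritizedRunnable {
      @volatile private var connectionReleased = false
      override def priority: Priority = if (connectionReleased) Continuation else WithConnection
      override def run(): Unit = {
        // ... do the work, then release the connection ...
        connectionReleased = true
      }
    }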

@hvesalai
Member

I was wrong. This happens because not all tasks go through the queue. When the ThreadPoolExecutor has fewer threads than its core pool size, it creates a new Worker thread to run a given Runnable directly, and that bypasses the queue.

So if there are more threads than there are connections in the pool, then the requirement will fail.
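
For context, a simplified paraphrase of java.util.concurrent.ThreadPoolExecutor.execute (JDK 8), written as a Scala sketch; the real method is more involved, but the queue-bypassing path is the relevant part:

    // Simplified sketch of the JDK's ThreadPoolExecutor.execute logic:
    def execute(command: Runnable): Unit = {
      if (workerCount < corePoolSize)
        addWorker(command)              // start a new thread that runs `command` directly -- the queue is bypassed
      else if (workQueue.offer(command))
        ()                              // normal path: the task goes through ManagedArrayBlockingQueue
      else
        addWorker(command)              // queue rejected the task: try to grow towards maximumPoolSize
    }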

@hvesalai
Member

Or, actually, it's worse. It can fail whenever some executions are holding connections (inUseCount has been incremented) while running on application threads (i.e. not doing blocking I/O, but still holding the connection because of a transaction, etc.). Then, if a new Runnable is sent to the asyncExecutor, it can create a new thread to run it, bypassing the queue, and shit hits the fan.

Having now read the code, I think it is extremely fragile and should be completely replaced. It is a big ball of mud that should never have seen a commit.

@savulchik
Contributor Author

savulchik commented Jan 17, 2018

@hvesalai Previously I thought that every submitted Runnable goes through the queue, but after reading the docs and the actual code of ThreadPoolExecutor I now realize you are right, thanks!

In my project, which has a controllable workload, I eventually switched from ManagedArrayBlockingQueue to LinkedBlockingQueue via AsyncExecutor's queueSize = -1 setting (unbounded) and haven't noticed any problems with it so far. In the worst case the queue can accumulate too many tasks and result in an OOM exception.
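
For anyone looking for the same workaround, a minimal sketch (assuming the AsyncExecutor.apply(name, numThreads, queueSize) overload from Slick 3.2.x; the executor name and thread count are placeholders):

    import slick.util.AsyncExecutor

    // queueSize = -1 makes AsyncExecutor use an unbounded LinkedBlockingQueue,
    // so ManagedArrayBlockingQueue is not used at all.
    val executor = AsyncExecutor("mydb", numThreads = 20, queueSize = -1)
    // pass `executor` to Database.forURL / Database.forDataSource via their `executor` parameter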

@hvesalai
Member

The only way to prevent this issue when using ManagedArrayBlockingQueue with a limited number of connections is to have minThreads == maxThreads. This means that new threads are never created during use, so the problematic case is not triggered.
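
A sketch of that configuration (assuming the AsyncExecutor.apply overload that takes minThreads, maxThreads, queueSize and maxConnections is available in your Slick version; the numbers are placeholders):

    import slick.util.AsyncExecutor

    // With minThreads == maxThreads the ThreadPoolExecutor never spawns extra
    // worker threads on demand, so every task goes through the managed queue.
    val executor = AsyncExecutor(
      name = "mydb",
      minThreads = 20,
      maxThreads = 20,       // equal to minThreads, as recommended above
      queueSize = 1000,
      maxConnections = 20    // match the connection pool size
    )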

@hvesalai hvesalai added this to the Next fix release milestone Feb 28, 2018
@hvesalai
Member

hvesalai commented Mar 1, 2018

I have created a series of PRs to warn about (or prevent) bad configurations. Please have a look.

#1853
#1854
#1855

hvesalai added a commit that referenced this issue Mar 2, 2018
 Fix #1614, Fix #1807: require and warn about bad configurations
@hvesalai hvesalai modified the milestones: Future fix release, 3.2.2 Mar 2, 2018