ManagedArrayBlockingQueue is paused indefinitely with a non-empty itemQueue #1807

Closed
savulchik opened this issue Oct 30, 2017 · 9 comments

@savulchik
Contributor

I'm investigating a rare problem that manifests itself as:

  1. Slick 3.2.1 stopped processing DBIOActions from the ManagedArrayBlockingQueue, which became paused while its itemQueue was not empty. The async executor threads are waiting on the notEmpty condition:
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000f045c408> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at slick.util.ManagedArrayBlockingQueue.$anonfun$take$1(ManagedArrayBlockingQueue.scala:115)
	at slick.util.ManagedArrayBlockingQueue$$Lambda$591/467364322.apply(Unknown Source)
	at slick.util.ManagedArrayBlockingQueue.lockedInterruptibly(ManagedArrayBlockingQueue.scala:222)
	at slick.util.ManagedArrayBlockingQueue.take(ManagedArrayBlockingQueue.scala:114)
	at slick.util.ManagedArrayBlockingQueue.take(ManagedArrayBlockingQueue.scala:13)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
  2. HikariCP 2.5.1 closed all connections due to the idle connection timeout.

In a heap dump of the application I found the following state of the ManagedArrayBlockingQueue instance:

Type   |Name                                                   |Value
--------------------------------------------------------------------------------------------------------------------------------------------------
boolean|bitmap$0                                               |true
ref    |logger                                                 |slick.util.SlickLogger @ 0xf13c8d68
boolean|paused                                                 |true
int    |inUseCount                                             |12
ref    |slick$util$ManagedArrayBlockingQueue$$highPrioItemQueue|slick.util.InternalArrayQueue @ 0xf045c900
ref    |slick$util$ManagedArrayBlockingQueue$$itemQueue        |slick.util.InternalArrayQueue @ 0xf045c8e0
ref    |notFull                                                |java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject @ 0xf045c8c8
ref    |notEmpty                                               |java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject @ 0xf045c408
ref    |slick$util$ManagedArrayBlockingQueue$$lock             |java.util.concurrent.locks.ReentrantLock @ 0xf045c3d8
int    |capacity                                               |20000
int    |maximumInUse                                           |12
--------------------------------------------------------------------------------------------------------------------------------------------------

and the state of its itemQueue:

Type|Name                                    |Value
----------------------------------------------------------------------------------
int |count                                   |687
int |putIndex                                |8610
int |slick$util$InternalArrayQueue$$takeIndex|7923
ref |items                                   |java.lang.Object[40000] @ 0xf045e7b0
----------------------------------------------------------------------------------

I'm wondering whether I have stumbled upon a concurrency issue in Slick (like #1730). Could I replace the ManagedArrayBlockingQueue with a LinkedBlockingQueue to work around it?

Thanks!

@savulchik
Contributor Author

Update: I suddenly got the following exception:

Exception in thread "jdbc-pool-3" java.lang.IllegalArgumentException: requirement failed: count cannot be increased
        at scala.Predef$.require(Predef.scala:277)
        at slick.util.ManagedArrayBlockingQueue.$anonfun$increaseInUseCount$1(ManagedArrayBlockingQueue.scala:40)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at slick.util.ManagedArrayBlockingQueue.slick$util$ManagedArrayBlockingQueue$$locked(ManagedArrayBlockingQueue.scala:217)
        at slick.util.ManagedArrayBlockingQueue.increaseInUseCount(ManagedArrayBlockingQueue.scala:39)
        at slick.util.AsyncExecutor$$anon$2$$anon$1.beforeExecute(AsyncExecutor.scala:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

@hvesalai
Member

I'm seeing the same thing and debugging it. @kwark, could you perhaps recall why InternalArrayQueue was created? I cannot see any justification for it, and it does not seem to behave as its scaladoc suggests ("additional logic for temporarily rejecting elements").

@hvesalai
Member

The problem is here:

at slick.util.AsyncExecutor$$anon$2$$anon$1.beforeExecute(AsyncExecutor.scala:74)

For all items taken from the itemQueue, increaseInUseCount has already been called in ManagedArrayBlockingQueue.extract. Hence the failing item must come from the highPrioItemQueue. However, that queue should only contain items that already hold a connection.

Then how is it possible that this is not the case? One source could be this code in AsyncExecutor:

    override def execute(command: Runnable): Unit = {
      if (command.isInstanceOf[PrioritizedRunnable]) {
        executor.execute(command)
      } else {
        // plain Runnables are wrapped and implicitly given the WithConnection priority
        executor.execute(new PrioritizedRunnable {
          override val priority: Priority = WithConnection
          override def run(): Unit = command.run()
        })
      }
    }
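
For reference, here is a rough sketch of the check that fails, reconstructed from the stack trace and the heap dump fields above; the actual body of increaseInUseCount in ManagedArrayBlockingQueue.scala may differ:

    // Hypothetical reconstruction, not the actual Slick source: beforeExecute calls
    // increaseInUseCount for runnables that need a connection, and the requirement
    // fails once inUseCount has reached maximumInUse (both are 12 in the heap dump
    // above, matching the connection pool size).
    def increaseInUseCount(): Unit = locked {
      require(inUseCount < maximumInUse, "count cannot be increased")
      inUseCount += 1
      if (inUseCount == maximumInUse) paused = true // assumed: the queue pauses itself when all connections are in use
    }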

@hvesalai
Member

Another hypothesis is that the priority of a PrioritizedRunnable changes between when it is queued and when it is executed. This could be possible since priority is a def, not a val, in the instances defined in BasicBackend.
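
A hypothetical illustration of that concern (not code from BasicBackend; Continuation is assumed here as a second Priority value):

    // If priority is a def that reads mutable state, the queue may see one value
    // when the runnable is enqueued and beforeExecute may see another.
    val runnable = new PrioritizedRunnable {
      @volatile private var connectionReleased = false
      override def priority: Priority = if (connectionReleased) Continuation else WithConnection
      override def run(): Unit = {
        // ... do the work, then release the connection ...
        connectionReleased = true
      }
    }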

@hvesalai
Member

I was wrong. This happens because not all tasks go through the queue. When the ThreadPoolExecutor has fewer threads than its core pool size, it creates a new Worker thread to run a given Runnable directly, and that bypasses the queue.

So if there are more threads than there are connections in the pool, then the requirement will fail.
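
For context, a simplified paraphrase of java.util.concurrent.ThreadPoolExecutor.execute (JDK 8), written as a Scala sketch; the real method is more involved, but the queue-bypassing path is the relevant part:

    // Simplified sketch of the JDK's ThreadPoolExecutor.execute logic:
    def execute(command: Runnable): Unit = {
      if (workerCount < corePoolSize)
        addWorker(command)              // start a new thread that runs `command` directly -- the queue is bypassed
      else if (workQueue.offer(command))
        ()                              // normal path: the task goes through ManagedArrayBlockingQueue
      else
        addWorker(command)              // queue rejected the task: try to grow towards maximumPoolSize
    }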

@hvesalai
Member

Or, actually, it's worse. It can fail whenever some executions are holding connections (inUseCount has been incremented) while running on application threads (i.e. not doing blocking I/O, but still holding the connection because of a transaction, etc.). Then, if a new Runnable is sent to the asyncExecutor, it can create a new thread to run it, bypassing the queue, and shit hits the fan.

Having now read the code, I think it is extremely fragile and should be completely replaced. It is a big ball of mud that should never have seen a commit.

@savulchik
Contributor Author

savulchik commented Jan 17, 2018

@hvesalai Previously I thought that every submitted Runnable goes through the queue, but after reading the docs and the actual code of ThreadPoolExecutor I now realize you are right, thanks!

In my project, which has a controllable workload, I eventually switched from ManagedArrayBlockingQueue to LinkedBlockingQueue via AsyncExecutor's queueSize = -1 setting (unbounded) and haven't noticed any problems with it so far. In the worst case the queue can accumulate too many tasks and result in an OOM exception.
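
For anyone looking for the same workaround, a minimal sketch (assuming the AsyncExecutor.apply(name, numThreads, queueSize) overload from Slick 3.2.x; the executor name and thread count are placeholders):

    import slick.util.AsyncExecutor

    // queueSize = -1 makes AsyncExecutor use an unbounded LinkedBlockingQueue,
    // so ManagedArrayBlockingQueue is not used at all.
    val executor = AsyncExecutor("mydb", numThreads = 20, queueSize = -1)
    // pass `executor` to Database.forURL / Database.forDataSource via their `executor` parameter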

@hvesalai
Member

The only way to prevent this issue when using ManagedArrayBlockingQueue with a limited number of connections is to have minThreads == maxThreads. This means that new threads are never created during use, so the problematic case is not triggered.
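
A sketch of that configuration (assuming the AsyncExecutor.apply overload that takes minThreads, maxThreads, queueSize and maxConnections is available in your Slick version; the numbers are placeholders):

    import slick.util.AsyncExecutor

    // With minThreads == maxThreads the ThreadPoolExecutor never spawns extra
    // worker threads on demand, so every task goes through the managed queue.
    val executor = AsyncExecutor(
      name = "mydb",
      minThreads = 20,
      maxThreads = 20,       // equal to minThreads, as recommended above
      queueSize = 1000,
      maxConnections = 20    // match the connection pool size
    )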

@hvesalai hvesalai added this to the Next fix release milestone Feb 28, 2018
@hvesalai
Member

hvesalai commented Mar 1, 2018

I have created a series of PRs to warn about (or prevent) bad configurations. Please have a look.

#1853
#1854
#1855

hvesalai added a commit that referenced this issue Mar 2, 2018
 Fix #1614, Fix #1807: require and warn about bad configurations
@hvesalai hvesalai modified the milestones: Future fix release, 3.2.2 Mar 2, 2018