Clients wait indefinitely when pool exhausted #638

Open
mohitanchlia opened this Issue Oct 28, 2013 · 2 comments


Recently we saw an issue where all our threads were stuck in waitForConnection and never came out of this condition.

    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
    at me.prettyprint.cassandra.connection.ConcurrentHClientPool.waitForConnection(ConcurrentHClientPool.java:140)
    at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:108)

It appears that there might be a bug in Hector that is triggered under a very specific condition:

  1. During high load, when the pool is fully utilized.

  2. During this time, if there is a network hiccup or a communication issue with the node, Hector fails to add the HClient back to the pool and throws a runtime exception. Meanwhile, borrowClient increments the active-client count prematurely, even though availableClientQueue.poll() may return null:

    HClient cassandraClient = availableClientQueue.poll();
    int currentActiveClients = activeClientsCount.incrementAndGet();

  3. The inflated count then sends the code down this path:

    if (currentActiveClients <= cassandraHost.getMaxActive()) {
      cassandraClient = createClient();
    } else {
      // We can't grow so let's wait for a connection to become available.
      cassandraClient = waitForConnection();
    }
    

And eventually everything blocks, because there are no elements left in availableClientQueue.

I think the fix is to increment only after cassandraClient != null.

Let me know if this looks OK.
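To make the failure mode concrete, here is a toy, self-contained reproduction of the over-counting. The names (availableClientQueue, activeClientsCount, maxActive) mirror the snippets above, but this is not Hector code, just a sketch of the race described:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy reproduction of the over-counting: poll() on an empty queue
// returns null, but the active-clients counter is bumped anyway.
// Names mirror the Hector snippets above; this is not Hector code.
public class OverCountDemo {
    static final int MAX_ACTIVE = 2;
    static final ArrayBlockingQueue<Object> availableClientQueue =
            new ArrayBlockingQueue<>(MAX_ACTIVE);
    static final AtomicInteger activeClientsCount = new AtomicInteger();

    // Mirrors the buggy borrow path: the increment happens before the
    // null check, so failed borrows still inflate the count.
    static boolean buggyBorrow() {
        Object client = availableClientQueue.poll();          // may be null
        int currentActiveClients = activeClientsCount.incrementAndGet();
        // If client == null and the (inflated) count already exceeds
        // maxActive, the caller falls into waitForConnection() and blocks.
        return client == null && currentActiveClients > MAX_ACTIVE;
    }

    public static void main(String[] args) {
        // Simulate failed borrows (e.g. a network hiccup drained the
        // pool): three calls on an empty queue push the counter past
        // maxActive, so the third caller would wait forever.
        boolean wouldBlock = false;
        for (int i = 0; i < 3; i++) {
            wouldBlock = buggyBorrow();
        }
        System.out.println("count=" + activeClientsCount.get()
                + " wouldBlock=" + wouldBlock);
        // prints "count=3 wouldBlock=true"
    }
}
```

Once the counter drifts above maxActive with no clients actually in circulation, every subsequent borrow takes the waitForConnection() branch, matching the stack trace at the top of the issue.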

It appears that

    int currentActiveClients = activeClientsCount.incrementAndGet();

should really be

    int currentActiveClients = availableClientQueue.size();

and then this condition makes sense:

  if ( cassandraClient == null ) {

    if (currentActiveClients <= cassandraHost.getMaxActive()) {
      cassandraClient = createClient();
    } else {
      // We can't grow so let's wait for a connection to become available.
      cassandraClient = waitForConnection();
    }

  }
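A runnable sketch of that proposal, deriving the grow/wait decision from the queue itself instead of a separately maintained counter (names again mirror the snippets above; createClient() is stubbed out):

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the comment's proposal: decide whether to grow based on
// the queue's own size rather than an independent counter that can
// drift. Not Hector code; createClient() is stubbed with new Object().
public class QueueSizeBorrow {
    static final int MAX_ACTIVE = 2;
    static final ArrayBlockingQueue<Object> availableClientQueue =
            new ArrayBlockingQueue<>(MAX_ACTIVE);

    static Object borrowClient() {
        Object cassandraClient = availableClientQueue.poll();
        // Note: size() acquires ArrayBlockingQueue's internal
        // ReentrantLock, which the follow-up comment reports as a
        // contention point under heavy load.
        int currentActiveClients = availableClientQueue.size();
        if (cassandraClient == null) {
            if (currentActiveClients <= MAX_ACTIVE) {
                cassandraClient = new Object();   // stands in for createClient()
            }
            // else: would wait for a connection to become available
        }
        return cassandraClient;
    }

    public static void main(String[] args) {
        // Empty pool: poll() returns null, size() is 0 <= maxActive,
        // so the pool grows instead of blocking.
        System.out.println("grew=" + (borrowClient() != null));
        // prints "grew=true"
    }
}
```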

Currently we seem to be over-incrementing this count, with the risk of ending up with a bad pool and not noticing until errors start happening.

We have tried the change suggested by @mohitanchlia; however, we encountered another lock-contention problem. availableClientQueue, of type ArrayBlockingQueue, internally accesses a critical section protected by a ReentrantLock. Under heavy database traffic, this lock can impact overall query performance from a client. We ended up using the socket timeout mechanism to let a thread exit when it has waited for too long.
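A bounded wait along those lines can be sketched with the timed `poll` overload of ArrayBlockingQueue, so a borrower gives up instead of parking forever. The timeout value and exception type here are illustrative, not Hector's actual configuration:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: bound the wait for a pooled client instead of blocking
// indefinitely. Queue, timeout, and exception are illustrative only.
public class BoundedWaitDemo {
    static final ArrayBlockingQueue<Object> availableClientQueue =
            new ArrayBlockingQueue<>(4);

    static Object waitForConnection(long timeoutMs) throws InterruptedException {
        // Timed poll returns null on timeout instead of parking forever.
        Object client = availableClientQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (client == null) {
            throw new IllegalStateException(
                    "Timed out after " + timeoutMs + " ms waiting for a pooled client");
        }
        return client;
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            waitForConnection(50);            // empty pool: times out
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        availableClientQueue.offer(new Object());
        System.out.println("got client: " + (waitForConnection(50) != null));
        // prints "got client: true"
    }
}
```

Surfacing a timeout as an exception lets callers fail fast and retry against another node, rather than accumulating parked threads the way the original stack trace shows.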
