Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition when disconnecting #123

Closed
infeo opened this issue Nov 11, 2020 · 6 comments
Closed

Possible race condition when disconnecting #123

infeo opened this issue Nov 11, 2020 · 6 comments

Comments

@infeo
Copy link

infeo commented Nov 11, 2020

Summary

There is a possible race condition when closing the connection to the DBus via the DBusConnection class leading to an java.util.concurrent.RejectedExecutionException.

Description

In swiesend/secret-service#21 a problem is encountered where after a successfully established DBus connection with the DBusConnection class and the (later happend) disconnect, still at least one message is tried to handled. Because the appearance of the problem is intermittent, it may be a race condition. The exception stacktrace:

Exception in thread "DBusConnection" java.util.concurrent.RejectedExecutionException: Task org.freedesktop.dbus.connections.AbstractConnection$3@5a0fc9cb rejected from java.util.concurrent.ThreadPoolExecutor@1f0d4cee[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2057)
	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:827)
	at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
	at org.freedesktop.dbus.connections.AbstractConnection.executeInWorkerThreadPool(AbstractConnection.java:946)
	at org.freedesktop.dbus.connections.AbstractConnection.handleMessage(AbstractConnection.java:920)
	at org.freedesktop.dbus.connections.AbstractConnection.handleMessage(AbstractConnection.java:709)
	at org.freedesktop.dbus.connections.IncomingMessageThread.run(IncomingMessageThread.java:43)

It seems that the workerThreadPoolExecutor was already terminated when the message should be handled. Looking at docs of the ThreadPoolExecutor, there are three cases where it is shutdown:

  1. explicit call of shutdown()
  2. explicit call of shutdownNow()
  3. it has no remaining threads and is no longer referenced

The first two cases happen only in either in the changeThreadCount() or disconnectInternal() method of the AbstractConnection class. The first one basically transfers all jobs to a new executor in a thread safe manner, hence it should impose no problem.

The second, disconnectInternal(), is therefore the one I looked at. The DBusConnection is an object which recieves, send and handle messages. Currently, first the handling is closed and afterwards the recieving part.

workerThreadPoolLock.writeLock().lock();
try {
// try to wait for all pending tasks.
workerThreadPool.shutdown();
workerThreadPool.awaitTermination(10, TimeUnit.SECONDS); // 10 seconds should be enough, otherwise fail
} catch (InterruptedException _ex) {
logger.error("Interrupted while waiting for worker threads to be terminated.", _ex);
} finally {
workerThreadPoolLock.writeLock().unlock();
}
// shutdown sender executor service, send all remaining messages in main thread
for (Runnable runnable : senderService.shutdownNow()) {
runnable.run();
}
// stop the main thread
run = false;
connected = false;
readerThread.setTerminate(true);

The message source is the IncomingMessageThread. This object is terminated via a variable:

    private boolean                  terminate;
   
    /*
     *  other stuff
     */

    public void setTerminate(boolean _terminate) {
        terminate = _terminate;
        interrupt();
    }

    @Override
    public void run() {

        Message msg = null;
        while (!terminate) {
            /*
             * do stuff
             */
        }
    }

While I'm not a fan of closing the message source after closing the handling, more critical is that neither the connected variable nor terminate are volatile. This could lead to a cache coherence problem (see https://www.baeldung.com/java-volatile#volatile).

Suggested Fix

Make the variables connected and terminate volatile.

Additionally, i suggest to first close the message source (setting terminate to false) before shutting down the executor.

As a third point, maybe the setTerminate(boolean _terminate) method can be refactored to terminate() to only terminate the connection and not setting it to any value.

@hypfvieh
Copy link
Owner

I implemented some of the suggestions. Please try if you can reproduce the issue using the latest changes.

swiesend added a commit to swiesend/secret-service that referenced this issue Nov 12, 2020
@swiesend
Copy link

swiesend commented Nov 12, 2020

Hey @hypfvieh!

First of all a huge thank you for providing this lib and maintaining it with such quick replies!

I had a chance to test your changes, but unfortunately that didn't play out.

I documented my test here:
cryptomator/integrations-linux#2

@hypfvieh
Copy link
Owner

Are you sure you use the latest git revision?

The line numbers reported in the exception do not match.
e.g.AbstractConnection.java:946 is an empty line,
AbstractConnection.java:920 is the closing bracket of a catch block ...
AbstractConnection.java:709 is a piece of javadoc
IncomingMessageThread.java:43 this is the only correct line, but the the only change in that class was to add a volatile to the boolean member, so line numbers are likely the same as before

@swiesend
Copy link

swiesend commented Nov 13, 2020

Your were absolutely right @hypfvieh. I started a wrong cryptomator build with old dependencies.

I was quite confused yesterday evening as I couldn't observe the RejectedExecutionException in my secret-service tests anymore, but still with the wrong cryptomator build. But in my secret-serivice tests the RejectedExecutionException occurs less frequently, as when integrated with cryptomator.

So your changes actually have fixed the RejectedExecutionException issue.
Maybe it would be worth to add a regression test for that @hypfvieh?

But I still have problems on tearing down, as documented here.

@swiesend
Copy link

I still can trigger the RejectedExecutionException and the blocked shutdown:

cryptomator/integrations-linux#2 (comment)

@hypfvieh
Copy link
Owner

Seems to be fixed, see cryptomator/integrations-linux#2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants