ISPN-7172 Total order caches can hang during join #4663

danberindei · 2016-11-14T10:27:32Z

https://issues.jboss.org/browse/ISPN-7172

Fix WithinThreadExecutor handling in LimitedExecutor
Remove LimitedExecutor permit before putting the LocalCacheStatus
in the runningCaches map.
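The description does not show the LimitedExecutor internals. As a rough illustration of what "WithinThreadExecutor handling" can involve, here is a minimal sketch that assumes a limited executor queues tasks and runs at most `maxRunning` of them at a time on a delegate `Executor`. The names `LimitedExecutorSketch`, `runTasks` and `maxRunning` are hypothetical, and `Runnable::run` only stands in for WithinThreadExecutor (a delegate that runs tasks on the calling thread). This is not Infinispan's actual implementation or the actual fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;

public class LimitedExecutorSketch {
    /**
     * Hypothetical limited executor: runs at most maxRunning queued tasks at once on the delegate.
     * Each runner drains the queue in a loop, so a same-thread delegate (standing in for
     * WithinThreadExecutor) never runs one task nested inside another.
     */
    static final class LimitedExecutor implements Executor {
        private final Executor delegate;
        private final Queue<Runnable> queue = new ArrayDeque<>();
        private int availablePermits;

        LimitedExecutor(Executor delegate, int maxRunning) {
            this.delegate = delegate;
            this.availablePermits = maxRunning;
        }

        @Override
        public void execute(Runnable task) {
            synchronized (this) {
                queue.add(task);
                if (availablePermits == 0) {
                    return; // an active runner will pick the task up later
                }
                availablePermits--; // take a permit for a new runner
            }
            delegate.execute(this::runTasks);
        }

        private void runTasks() {
            while (true) {
                Runnable task;
                synchronized (this) {
                    task = queue.poll();
                    if (task == null) {
                        availablePermits++; // queue is empty: give the permit back
                        return;
                    }
                }
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // keep draining even if one task fails
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        LimitedExecutor executor = new LimitedExecutor(Runnable::run, 1);
        executor.execute(() -> {
            log.add("A start");
            executor.execute(() -> log.add("B")); // queued, not run nested inside A
            log.add("A end");
        });
        System.out.println(log); // prints [A start, A end, B]
    }
}
```

Because the runner loops over the queue instead of resubmitting itself, a task submitted from inside another task runs after it, and a long chain of self-submitting tasks does not grow the stack even when the delegate runs everything on the calling thread.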

pruivo · 2016-11-16T14:24:15Z

core/src/main/java/org/infinispan/topology/LocalTopologyManagerImpl.java

      CompletableFuture<Void> joinFuture = new CompletableFuture<>();
      cacheStatus.getTopologyUpdatesExecutor().executeAsync(() -> joinFuture);
+


Is this change really needed?

I'm not sure if it's strictly necessary for correctness at this point or if it just makes tests more predictable, but I think it's safer to implement it as promised in the comments above.

I don't see how initializing the runningCaches before or after sending the joinFuture to the executor could affect anything.
My point is: I'm not seeing any difference in the logic executed, so what side effect am I missing?

Submitting joinFuture will prevent the executor from running any other task until joinFuture is completed. If the cache exists in runningCaches and its LimitedExecutor has a free spot, it will process topology updates, and those have a chance of doing "stuff" before we have properly joined.
I think the initial stuff I was worried about was just blocking for a new view, which could overwhelm the OOB thread pool given enough caches. In this case, the topology update exposed the bug in LimitedExecutor itself.
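The permit-holding behaviour described above can be sketched with a single-permit executor whose `executeAsync` keeps the permit until the `CompletionStage` returned by the task completes. This is a hypothetical model, not Infinispan's LimitedExecutor API: `JoinOrderingSketch`, `SerialAsyncExecutor` and `drain` are invented names, and `Runnable::run` is used as the delegate only to make the ordering deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

public class JoinOrderingSketch {
    /**
     * Hypothetical single-permit executor: executeAsync() keeps the permit until the
     * stage returned by the task completes, so no other task runs in between.
     */
    static final class SerialAsyncExecutor {
        private final Executor delegate;
        private final Queue<Supplier<CompletionStage<Void>>> queue = new ArrayDeque<>();
        private boolean permitTaken;

        SerialAsyncExecutor(Executor delegate) { this.delegate = delegate; }

        /** Plain task: the permit is released as soon as run() returns. */
        void execute(Runnable task) {
            executeAsync(() -> {
                task.run();
                return CompletableFuture.completedFuture(null);
            });
        }

        /** Async task: the permit is held until the returned stage completes. */
        void executeAsync(Supplier<CompletionStage<Void>> task) {
            synchronized (this) {
                queue.add(task);
                if (permitTaken) {
                    return; // queued behind the task currently holding the permit
                }
                permitTaken = true;
            }
            delegate.execute(this::drain);
        }

        private void drain() {
            while (true) {
                Supplier<CompletionStage<Void>> task;
                synchronized (this) {
                    task = queue.poll();
                    if (task == null) {
                        permitTaken = false; // nothing left: release the permit
                        return;
                    }
                }
                CompletableFuture<Void> done;
                try {
                    done = task.get().toCompletableFuture();
                } catch (RuntimeException e) {
                    done = CompletableFuture.completedFuture(null);
                }
                if (!done.isDone()) {
                    // Still holding the permit: resume draining only after the stage completes.
                    done.whenComplete((ignored, failure) -> delegate.execute(this::drain));
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SerialAsyncExecutor topologyUpdates = new SerialAsyncExecutor(Runnable::run);

        CompletableFuture<Void> joinFuture = new CompletableFuture<>();
        topologyUpdates.executeAsync(() -> joinFuture);            // takes the only permit
        topologyUpdates.execute(() -> log.add("topology update")); // queued behind the join

        log.add("join completing");
        joinFuture.complete(null); // releases the permit; the queued update runs now
        System.out.println(log);   // prints [join completing, topology update]
    }
}
```

Submitting `() -> joinFuture` first means the queued topology update cannot run until `joinFuture.complete(null)` is called; in this sketch, swapping the two submissions would let the update run immediately, before the join has finished.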

Makes sense.

* Fix WithinThreadExecutor handling in LimitedExecutor
* Remove LimitedExecutor permit before putting the LocalCacheStatus in the runningCaches map.

pruivo · 2016-11-18T12:22:45Z

Integrated! Thanks @danberindei!

danberindei added the Ready for Review label Nov 14, 2016

danberindei added this to the 9.0.0.Beta1 milestone Nov 14, 2016

danberindei mentioned this pull request Nov 14, 2016

ISPN-7184 Server startup can fail after the upgrade to JGroups 4 #4656

Closed

pruivo reviewed Nov 16, 2016


danberindei force-pushed the ISPN-7172_LimitedExecutor_WithinThreadExecutor branch from a134462 to bd4d79c November 16, 2016 17:21

ISPN-7172 Total order caches can hang during join

fc7108e


danberindei force-pushed the ISPN-7172_LimitedExecutor_WithinThreadExecutor branch from bd4d79c to fc7108e November 18, 2016 10:55

pruivo merged commit 82a1690 into infinispan:master Nov 18, 2016

danberindei deleted the ISPN-7172_LimitedExecutor_WithinThreadExecutor branch November 21, 2016 16:18
