Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-7172 Total order caches can hang during join #4663

Conversation

danberindei
Copy link
Member

https://issues.jboss.org/browse/ISPN-7172

  • Fix WithinThreadExecutor handling in LimitedExecutor
  • Remove LimitedExecutor permit before putting the LocalCacheStatus
    in the runningCaches map.

CompletableFuture<Void> joinFuture = new CompletableFuture<>();
cacheStatus.getTopologyUpdatesExecutor().executeAsync(() -> joinFuture);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change really needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's strictly necessary for correctness at this point or it just makes tests more predictable, but I think it's safer to implement it as promised in the comments above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how initializing the runningCaches before or after the sending the joinFuture to the executor could affect anything...
My point is: I'm not seeing any difference in the logic executed. So, what side effect am I missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting joinFuture will prevent the executor from running any other task until joinFuture is completed. If the cache exists in runningCaches and it's LimitedExecutor has a free spot, it will process topology updates, and those have a chance of doing "stuff" before we have properly joined.
I think the initial stuff I was worried about was just blocking for a new view, which could overwhelm the OOB thread pool given enough caches. In this bug, the topology update was exposing the bug in LimitedExecutor itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

@danberindei danberindei force-pushed the ISPN-7172_LimitedExecutor_WithinThreadExecutor branch from a134462 to bd4d79c Compare November 16, 2016 17:21
* Fix WithinThreadExecutor handling in LimitedExecutor
* Remove LimitedExecutor permit before putting the LocalCacheStatus
  in the runningCaches map.
@danberindei danberindei force-pushed the ISPN-7172_LimitedExecutor_WithinThreadExecutor branch from bd4d79c to fc7108e Compare November 18, 2016 10:55
@pruivo pruivo merged commit 82a1690 into infinispan:master Nov 18, 2016
@pruivo
Copy link
Member

pruivo commented Nov 18, 2016

integrated! thanks @danberindei !

@danberindei danberindei deleted the ISPN-7172_LimitedExecutor_WithinThreadExecutor branch November 21, 2016 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants