ISPN-11385 Convert Remote Command Executor to Non blocking/blocking t… #7997

wburns · 2020-03-04T21:00:03Z

…hread executor

Invoke commands that block on blocking executor
Invoke other commands by caller
Use non blocking executor instead of remote in other places
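
The dispatch rule in the three lines above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SketchCommand`, `mayBlock()` and `CommandDispatchSketch` are hypothetical names, and the real command and executor types in Infinispan differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class CommandDispatchSketch {
   /** Hypothetical stand-in for a remote command; not an Infinispan type. */
   interface SketchCommand<T> {
      boolean mayBlock();
      T invoke();
   }

   private final Executor blockingExecutor;

   CommandDispatchSketch(Executor blockingExecutor) {
      this.blockingExecutor = blockingExecutor;
   }

   /** Commands that may block go to the blocking executor; all others run on the caller thread. */
   <T> CompletionStage<T> dispatch(SketchCommand<T> command) {
      if (command.mayBlock()) {
         return CompletableFuture.supplyAsync(command::invoke, blockingExecutor);
      }
      CompletableFuture<T> result = new CompletableFuture<>();
      try {
         result.complete(command.invoke());
      } catch (Throwable t) {
         result.completeExceptionally(t);
      }
      return result;
   }

   static <T> SketchCommand<T> command(boolean blocks, Supplier<T> body) {
      return new SketchCommand<T>() {
         @Override public boolean mayBlock() { return blocks; }
         @Override public T invoke() { return body.get(); }
      };
   }

   /** Returns the thread names used for one blocking and one non-blocking command. */
   static String[] demo() {
      ExecutorService blocking = Executors.newSingleThreadExecutor(r -> new Thread(r, "blocking-1"));
      try {
         CommandDispatchSketch dispatcher = new CommandDispatchSketch(blocking);
         Supplier<String> threadName = () -> Thread.currentThread().getName();
         String onPool = dispatcher.dispatch(command(true, threadName)).toCompletableFuture().join();
         String onCaller = dispatcher.dispatch(command(false, threadName)).toCompletableFuture().join();
         return new String[] {onPool, onCaller};
      } finally {
         blocking.shutdown();
      }
   }
}
```

In `demo()`, the blocking command reports the pool thread and the non-blocking command reports whichever thread called `dispatch`.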

https://issues.redhat.com/browse/ISPN-11385
https://issues.redhat.com/browse/ISPN-11473
https://issues.redhat.com/browse/ISPN-11489

wburns · 2020-03-04T21:00:44Z

Preview to confirm other CI modules pass. With this, all the regular thread pools are combined into the non-blocking and blocking ones!

wburns · 2020-03-04T22:17:54Z

please run performance tests please

core/src/main/java/org/infinispan/commands/functional/AbstractWriteManyCommand.java

wburns · 2020-03-05T17:12:08Z

run performance tests please

wburns · 2020-03-05T19:55:05Z

Only 1 related test failure in the previous run, which should now be fixed (AsyncInvocationTest thread leak fixed).

Rerunning with that fixed, plus the deprecation of the canBlock method.

ghost · 2020-03-05T20:00:46Z

Performance tests run successfully. Link to the results here.

Additional info:
Commit: 0574783
Build number: #400
Comment body: run performance tests please
skip ci

wburns · 2020-03-11T13:26:20Z

CI is all good https://ci.infinispan.org/job/Infinispan/job/PR-7997/6/

core/src/main/java/org/infinispan/globalstate/impl/GlobalConfigurationManagerImpl.java

core/src/main/java/org/infinispan/manager/DefaultCacheManager.java

core/src/main/java/org/infinispan/manager/impl/ReplicableManagerFunctionCommand.java

core/src/main/java/org/infinispan/remoting/inboundhandler/GlobalInboundInvocationHandler.java

core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java

core/src/main/java/org/infinispan/util/CoreBlockHoundIntegration.java

danberindei · 2020-03-18T06:35:57Z

core/src/test/java/org/infinispan/util/CoreTestBlockHoundIntegration.java

@@ -67,6 +73,10 @@ private static void allowTestsToBlock(BlockHound.Builder builder) {
      CommonsBlockHoundIntegration.allowPublicMethodsToBlock(builder, NotifierLatch.class);

      CommonsBlockHoundIntegration.allowPublicMethodsToBlock(builder, TestBlocking.class);
+
+      CommonsBlockHoundIntegration.allowMethodsToBlock(builder, Class.forName(ReplListener.class.getName() + "$ReplListenerInterceptor"), false);


Assuming this is just for the Thread.sleep() call, I think it would be better to add an executor parameter to TestingUtil.delayed() and to inject the non-blocking executor in ReplListenerInterceptor.

Actually, I am not sure why I didn't just add TestingUtil#sleepThread to the exception list. Let me try that instead.

Ah, it was because of logCommand, which acquires a lock. I think I will leave it as is for now.

But I can also add TestingUtil#sleepThread as okay to block.

+1 to add TestingUtil#sleepThread

Seems to work fine without the ReplListenerInterceptor exception now.
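
For reference, the alternative suggested at the start of this thread (passing an executor to a delay helper instead of calling Thread.sleep()) might look roughly like this. It is a sketch under assumptions: `DelayedSketch.delayed` is a hypothetical helper, not the actual signature of TestingUtil.delayed(), and it relies on `CompletableFuture.delayedExecutor`, which needs Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

final class DelayedSketch {
   /**
    * Hypothetical delay helper: the returned stage completes after the delay,
    * on the given executor, without parking the calling thread.
    */
   static CompletionStage<Void> delayed(long delay, TimeUnit unit, Executor executor) {
      Executor delayedExecutor = CompletableFuture.delayedExecutor(delay, unit, executor);
      return CompletableFuture.runAsync(() -> { }, delayedExecutor);
   }

   /** Waits for a delayed stage and reports how long that took, in milliseconds. */
   static long elapsedMillis(long delayMillis) {
      long start = System.nanoTime();
      // Runnable::run is used as a same-thread executor here, purely for the demo.
      delayed(delayMillis, TimeUnit.MILLISECONDS, Runnable::run).toCompletableFuture().join();
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
   }
}
```

A caller on a non-blocking thread would chain further work onto the returned stage rather than join it; the `join()` in `elapsedMillis` exists only to measure the delay.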

wburns · 2020-03-18T17:24:08Z

Actually now that the persistence checks are back in it found more issues. So I am working on this still.

wburns · 2020-03-18T20:35:07Z

It looks like it should be fixed now, will let CI sort it out :)

danberindei · 2020-03-18T22:58:04Z

core/src/main/java/org/infinispan/manager/DefaultCacheManager.java

@@ -1139,11 +1138,14 @@ public ClusterExecutor executor() {
      if (transport != null) {
         long time = configurationManager.getGlobalConfiguration().transport().distributedSyncTimeout();
         return ClusterExecutors.allSubmissionExecutor(null, this, transport, time, TimeUnit.MILLISECONDS,
-               globalComponentRegistry.getComponent(ExecutorService.class, KnownComponentNames.REMOTE_COMMAND_EXECUTOR),
+               // This can run arbitrary code, including user - such commands can block


No longer necessary?

No, I put it here more because it can block :)

Should it be using the non-blocking executor then?

Unfortunately until the other JIRA is fixed, we don't have a great solution. And cluster executor isn't that widely used afaik. But we should hopefully get it fixed before people use it like this.

Unfortunately I think the few users of cluster executor may be doing exactly blocking cache operations, because there's no way to return a value asynchronously.

I'm starting to think that the proper solution is

> change PersistenceManagerImpl to detect if it is a blocking thread and run it inline and if non blocking thread to run the command in a blocking thread.

In fact, I would go even further, and change continueOnCPUExecutor to also continue on the caller thread if the caller thread was blocking. Otherwise, for cluster executor tasks doing cache.put(k1, v1), where the put requires 1 store operation to read the previous value and 1 store operation to store the value, the store read would happen on the task's initial blocking thread, but the store write would need another blocking thread. If the size of the blocking thread pool is N and you have N simultaneous tasks like this, there's no free thread to process the store writes.
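
The starvation scenario described in the paragraph above can be reproduced with plain JDK executors. The following is a minimal sketch, not Infinispan code: a fixed pool of size N stands in for the blocking thread pool, and each of N tasks waits for a second task ("the store write") that it submitted to the same pool. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BlockingPoolStarvationSketch {
   /** Returns how many of the poolSize outer tasks timed out waiting for their nested task. */
   static int starvedTasks(int poolSize, long timeoutMillis) {
      ExecutorService blockingPool = Executors.newFixedThreadPool(poolSize);
      CountDownLatch allStarted = new CountDownLatch(poolSize);
      CountDownLatch allDecided = new CountDownLatch(poolSize);
      try {
         List<Future<Boolean>> outerTasks = new ArrayList<>();
         for (int i = 0; i < poolSize; i++) {
            outerTasks.add(blockingPool.submit(() -> {
               // Wait until every pool thread is occupied by an outer task.
               allStarted.countDown();
               allStarted.await();
               // The "store write" needs another thread from the same pool.
               Future<?> storeWrite = blockingPool.submit(() -> { });
               boolean starved;
               try {
                  storeWrite.get(timeoutMillis, TimeUnit.MILLISECONDS);
                  starved = false;
               } catch (TimeoutException e) {
                  starved = true;
               }
               // Hold the thread until all tasks have decided, so the result is not timing dependent.
               allDecided.countDown();
               allDecided.await();
               return starved;
            }));
         }
         int starved = 0;
         for (Future<Boolean> task : outerTasks) {
            if (task.get()) {
               starved++;
            }
         }
         return starved;
      } catch (InterruptedException | ExecutionException e) {
         throw new IllegalStateException(e);
      } finally {
         blockingPool.shutdownNow();
      }
   }
}
```

With a pool of 2 threads and 2 simultaneous tasks, neither nested task can start, so both outer tasks time out. In the real case there would be no timeout and the tasks would wait indefinitely.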

Just an FYI but the only way currently we have to detect if it is a blocking thread is to check the thread name, which is quite brittle.

Also your put case, I don't see how the read and write would need concurrent blocking threads. The read would be done before then the write would be done afterwards, synchronously.

> Just an FYI but the only way currently we have to detect if it is a blocking thread is to check the thread name, which is quite brittle.

Can't we do !(Thread.currentThread() instanceof ISPNNonBlockingThread)?

> Also your put case, I don't see how the read and write would need concurrent blocking threads. The read would be done before then the write would be done afterwards, synchronously.

If the cluster executor task does a blocking cache.put(k, v), it needs a (blocking) thread for the entire duration of the cache operation. The read would run on the same thread, but then continueOnCPUExecutor() would submit a task to the non-blocking executor, and the next PersistenceManagerImpl call would submit a task to the blocking executor.

> Can't we do !(Thread.currentThread() instanceof ISPNNonBlockingThread)?

No, unfortunately. This would include user threads, jgroups etc.

> If the cluster executor task does a blocking cache.put(k, v), it needs a (blocking) thread for the entire duration of the cache operation. The read would run on the same thread, but then continueOnCPUExecutor() would submit a task to the non-blocking executor, and the next PersistenceManagerImpl call would submit a task to the blocking executor.

Oh, okay you were not referring to the read then write. I agree if a blocking operation is invoked on a blocking thread then yes it would use more than 1.
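
One way the "run inline if already on a blocking thread" idea from this thread could be sketched is with a positive marker on the blocking pool's threads, which sidesteps the objection that a negative check would also match user and JGroups threads. This is an assumption for illustration only: `BlockingThreadMarker`, `MarkedThread` and `runBlocking` are hypothetical and do not exist in Infinispan as far as this thread shows.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class BlockingThreadSketch {
   /** Hypothetical marker carried only by threads of the blocking pool. */
   interface BlockingThreadMarker { }

   static final class MarkedThread extends Thread implements BlockingThreadMarker {
      MarkedThread(Runnable task, String name) {
         super(task, name);
      }
   }

   static boolean onBlockingThread() {
      return Thread.currentThread() instanceof BlockingThreadMarker;
   }

   /** Runs inline when the caller may already block; otherwise hands off to the blocking executor. */
   static <T> CompletionStage<T> runBlocking(Supplier<T> operation, Executor blockingExecutor) {
      if (!onBlockingThread()) {
         return CompletableFuture.supplyAsync(operation, blockingExecutor);
      }
      CompletableFuture<T> result = new CompletableFuture<>();
      try {
         result.complete(operation.get());
      } catch (Throwable t) {
         result.completeExceptionally(t);
      }
      return result;
   }

   /** Returns "threadName/inline" for a call from outside the pool and a nested call inside it. */
   static String demo() {
      ExecutorService pool = Executors.newSingleThreadExecutor(r -> new MarkedThread(r, "blocking-1"));
      try {
         Supplier<String> threadName = () -> Thread.currentThread().getName();
         String fromCaller = runBlocking(threadName, pool).toCompletableFuture().join();
         // Nested call on the pool's only thread: it must already be complete when it returns.
         boolean inline = CompletableFuture.supplyAsync(
               () -> runBlocking(threadName, pool).toCompletableFuture().isDone(), pool).join();
         return fromCaller + "/" + inline;
      } finally {
         pool.shutdown();
      }
   }
}
```

Because the nested call runs inline, it needs no second pool thread, which is the property the discussion above is after.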

core/src/main/java/org/infinispan/manager/impl/ReplicableManagerFunctionCommand.java

core/src/test/java/org/infinispan/stream/DistributedSequentialNonRehashStreamTest.java

danberindei · 2020-03-18T23:03:42Z

core/src/test/java/org/infinispan/util/CoreTestBlockHoundIntegration.java

+1 to add TestingUtil#sleepThread

core/src/main/java/org/infinispan/remoting/inboundhandler/GlobalInboundInvocationHandler.java

danberindei · 2020-03-18T23:16:30Z

There are some startup failures in CI

wburns · 2020-03-18T23:39:41Z

Yeah I figured there may be and my local run was just a fluke.

wburns · 2020-03-19T00:08:28Z

So the failure is because we commit the transaction in a blocking thread as the API is blocking. However the store write can touch the cache store, which doesn't want you to run it on the blocking thread. I am not sure how to fix this other than to change PersistenceManagerImpl to detect if it is a blocking thread and run it inline and if non blocking thread to run the command in a blocking thread. I think this is how it should be long term, but sadly this check is finding lots of bugs as it is now :)

wburns · 2020-03-19T15:53:40Z

There are some test failures from the tx changes where I missed something, guessing I am not joining somewhere :)

wburns · 2020-03-20T12:42:26Z

Updated to fix the 2 blocking test failures.

core/src/main/java/org/infinispan/cache/impl/InvocationHelper.java

danberindei · 2020-03-20T10:21:10Z

core/src/main/java/org/infinispan/manager/DefaultCacheManager.java

Unfortunately I think the few users of cluster executor may be doing exactly blocking cache operations, because there's no way to return a value asynchronously.

I'm starting to think that the proper solution is

> change PersistenceManagerImpl to detect if it is a blocking thread and run it inline and if non blocking thread to run the command in a blocking thread.

In fact, I would go even further, and change continueOnCPUExecutor to also continue on the caller thread if the caller thread was blocking. Otherwise, for cluster executor tasks doing cache.put(k1, v1), where the put requires 1 store operation to read the previous value and 1 store operation to store the value, the store read would happen on the task's initial blocking thread, but the store write would need another blocking thread. If the size of the blocking thread pool is N and you have N simultaneous tasks like this, there's no free thread to process the store writes.

core/src/main/java/org/infinispan/manager/impl/ReplicableManagerFunctionCommand.java

core/src/main/java/org/infinispan/transaction/impl/TransactionCoordinator.java

danberindei · 2020-03-20T14:33:00Z

core/src/main/java/org/infinispan/transaction/impl/TransactionCoordinator.java

+      }
+   }
+
+   private <T> CompletionStage<T> handleRollbackFailure(Throwable t, LocalTransaction localTransaction) {


AFAICT handleRollbackFailure and handleCommitFailure don't need to return a CompletionStage

They don't, but the users of it require it to be a CompletionStage, so less code overall. :)

If you return void and throw the exception directly, the callers can use handle instead of handleAndCompose
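
The difference between the two approaches can be illustrated with JDK types alone. This is a sketch, not the TransactionCoordinator code: `handleAndCompose` is an Infinispan helper, and the second method below only approximates it with `handle` followed by `thenCompose`, which is an assumption about its behavior. The class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;

final class HandleVsComposeSketch {
   /** The failure handler throws directly, so plain handle is enough. */
   static CompletionStage<String> withHandle(CompletionStage<String> commit) {
      return commit.handle((result, t) -> {
         if (t != null) {
            throw new IllegalStateException("commit failed", t);
         }
         return result;
      });
   }

   /** The failure handler returns a stage, so the nested stage has to be flattened afterwards. */
   static CompletionStage<String> withCompose(CompletionStage<String> commit) {
      return commit.handle((result, t) -> {
         CompletableFuture<String> next = new CompletableFuture<>();
         if (t != null) {
            next.completeExceptionally(new IllegalStateException("commit failed", t));
         } else {
            next.complete(result);
         }
         return next;
      }).thenCompose(stage -> stage);
   }

   static String failureMessage(CompletionStage<String> stage) {
      try {
         stage.toCompletableFuture().join();
         return null;
      } catch (CompletionException e) {
         return e.getCause().getMessage();
      }
   }

   /** Runs both variants against an already failed commit stage. */
   static String demo() {
      CompletableFuture<String> failedCommit = new CompletableFuture<>();
      failedCommit.completeExceptionally(new RuntimeException("boom"));
      return failureMessage(withHandle(failedCommit)) + " | " + failureMessage(withCompose(failedCommit));
   }
}
```

Both variants surface the same failure to the caller; the `handle` version simply has one less stage to allocate and flatten.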

core/src/main/java/org/infinispan/transaction/xa/XaTransactionTable.java

core/src/main/java/org/infinispan/transaction/xa/recovery/RecoveryManagerImpl.java

danberindei · 2020-03-20T15:58:10Z

core/src/test/java/org/infinispan/commands/GetAllCommandNodeCrashTest.java

@@ -41,6 +43,7 @@ public void test() throws Exception {
      cache(2).put(key, "value");

      ControlledRpcManager rpcManager = ControlledRpcManager.replaceRpcManager(cache(2));
+      rpcManager.excludeCommands(StateResponseCommand.class, StateTransferStartCommand.class);


It's so weird that it wasn't a problem before, I have to debug the test to see how it's passing on master :)

Yeah, I agree. But I wasn't quite sure what was going on.

danberindei · 2020-03-20T18:40:04Z

core/src/main/java/org/infinispan/transaction/impl/TransactionCoordinator.java

+
+            //rollback transaction before throwing the exception as there's no guarantee the TM calls XAResource.rollback
+            //after prepare failed.
+            return CompletionStages.handleAndCompose(rollback(localTransaction), (ignore2, rollbackThrowable) -> {


No need for handleAndCompose

Not sure why. We still need to handle the case when it was not an error, to wrap it with an XAException. And we still want to catch the rollback exception to suppress or rethrow it.

Yes, but none of those need to return a CompletionStage, so you can use handle instead of handleAndCompose.

I was trying to keep all the exceptions as bare XAException. If I do the other I would have to use CompletionException wrapping XAException and all callers would have to pay attention to that including TransactionXAAdapter, but I can do that.

Actually I don't feel comfortable changing this right now. I can revisit later if we need.

> I was trying to keep all the exceptions as bare XAException. If I do the other I would have to use CompletionException wrapping XAException and all callers would have to pay attention to that including TransactionXAAdapter, but I can do that.

Not sure what you mean. When you do CompletableFuture.join() it will wrap the exception in a CompletionException anyway, so you have to be prepared to extract the exception with CompletableFutures.extractException.

But I'm ok with revisiting this later.
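
The wrapping behavior discussed here can be seen with a small JDK-only sketch. It assumes a blocking caller that wants a bare XAException back; `XaUnwrapSketch` and `joinAndRethrow` are hypothetical names, and the sketch uses `getCause()` directly rather than Infinispan's CompletableFutures.extractException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import javax.transaction.xa.XAException;

final class XaUnwrapSketch {
   /** Blocks on the stage and rethrows the failure as a bare XAException. */
   static void joinAndRethrow(CompletableFuture<Void> stage) throws XAException {
      try {
         stage.join();
      } catch (CompletionException e) {
         // join() always reports failures wrapped in a CompletionException.
         Throwable cause = e.getCause();
         if (cause instanceof XAException) {
            throw (XAException) cause;
         }
         // Any other failure is reported as a resource manager error (a choice made for this sketch).
         XAException wrapped = new XAException(XAException.XAER_RMERR);
         wrapped.initCause(cause);
         throw wrapped;
      }
   }

   /** Returns the error code seen by the caller for a stage that failed with XA_RBROLLBACK. */
   static int demo() {
      CompletableFuture<Void> failed = new CompletableFuture<>();
      failed.completeExceptionally(new XAException(XAException.XA_RBROLLBACK));
      try {
         joinAndRethrow(failed);
         return 0;
      } catch (XAException e) {
         return e.errorCode;
      }
   }
}
```

The stage was completed with a bare XAException, yet `join()` still throws a CompletionException, so the unwrapping step is needed either way.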

core/src/main/java/org/infinispan/transaction/impl/TransactionCoordinator.java

core/src/main/java/org/infinispan/transaction/xa/TransactionXaAdapter.java

wburns · 2020-03-20T19:09:34Z

Fixed latest comments.

core/src/main/java/org/infinispan/transaction/xa/recovery/RecoveryManagerImpl.java

danberindei · 2020-03-20T21:29:07Z

Merged, thanks Will!

wburns · 2020-03-20T22:38:46Z

+1

wburns added the Preview label Mar 4, 2020

ryanemerson reviewed Mar 5, 2020

core/src/main/java/org/infinispan/commands/functional/AbstractWriteManyCommand.java (outdated)

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch from 75e1aa2 to 0574783 Compare March 5, 2020 17:00

wburns removed the Preview label Mar 5, 2020

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch from 9c073f6 to 57a1afb Compare March 10, 2020 17:00

wburns mentioned this pull request Mar 17, 2020

ISPN-11443 PersistenceManagerImpl thread checks need to be updated #8022

Merged

danberindei reviewed Mar 18, 2020

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch 2 times, most recently from b3bd615 to ad99dea Compare March 18, 2020 16:53

wburns added the Changes Required label Mar 18, 2020

wburns removed the Changes Required label Mar 18, 2020

danberindei reviewed Mar 18, 2020

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch from 4730be1 to bf6f39a Compare March 19, 2020 15:50

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch 4 times, most recently from 9797610 to 5ae68c5 Compare March 20, 2020 04:22

wburns mentioned this pull request Mar 20, 2020

ISPN-11473 InvocationHelper should commit or rollback the transaction on a blocking thread #8062

Closed

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch 2 times, most recently from 21db4df to fdefde6 Compare March 20, 2020 12:42

danberindei reviewed Mar 20, 2020

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch 3 times, most recently from d028787 to 7010d95 Compare March 20, 2020 17:01

danberindei reviewed Mar 20, 2020

core/src/main/java/org/infinispan/transaction/impl/TransactionCoordinator.java (outdated)

danberindei reviewed Mar 20, 2020

core/src/main/java/org/infinispan/transaction/xa/TransactionXaAdapter.java (outdated)

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch from 7010d95 to 3173ad9 Compare March 20, 2020 19:09

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch 2 times, most recently from 514d861 to 6030cc1 Compare March 20, 2020 19:12

danberindei reviewed Mar 20, 2020

core/src/main/java/org/infinispan/transaction/xa/recovery/RecoveryManagerImpl.java (outdated)

wburns added 3 commits March 20, 2020 16:54

ISPN-11489 TransactionCoordinator updated for non blocking

553b36d

ISPN-11385 Convert Remote Command Executor to Non blocking/blocking t…

b352ff4

…hread executor * Invoke commands that block on blocking executor * Invoke other commands by caller * Use non blocking executor instead of remote in other places

ISPN-11473 InvocationHelper should commit or rollback the transaction…

0caa5b7

… on a blocking thread

wburns force-pushed the ISPN-11385_convert_remote_command_executor branch from 6030cc1 to 0caa5b7 Compare March 20, 2020 20:55

danberindei merged commit 2c74173 into infinispan:master Mar 20, 2020
