ISPN-13309 Rolling Upgrades with Hot Rod protocol version < 2.8 and encoding fails #9540

Merged

Conversation

gustavocoding

@tristantarrant
Member

Lots of thread leak reports introduced by this

@gustavocoding
Author

The reports come from another test, not the one added here, and that test fails at random times. It's not obvious what the issue is, as both tests correctly allocate and clean up the resources they use. It looks like it sometimes gets stuck in the Netty event loop while stopping...

@gustavocoding
Author

This is what I can see in my environment, with many threads hitting rejected executions:

"non-blocking-thread-HotRodUpgradeDynamicStoreTest-NodeL-p642-t2" #3278 daemon prio=5 os_prio=0 cpu=174861.97ms elapsed=297.26s tid=0x00007f99303b7800 nid=0x127ee1 runnable  [0x00007f9897abf000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Throwable.fillInStackTrace(java.base@11.0.10/Native Method)
        at java.lang.Throwable.fillInStackTrace(java.base@11.0.10/Throwable.java:787)
        - locked <0x00000000f8266988> (a java.util.concurrent.RejectedExecutionException)
        at java.lang.Throwable.<init>(java.base@11.0.10/Throwable.java:270)
        at java.lang.Exception.<init>(java.base@11.0.10/Exception.java:66)
        at java.lang.RuntimeException.<init>(java.base@11.0.10/RuntimeException.java:62)
        at java.util.concurrent.RejectedExecutionException.<init>(java.base@11.0.10/RejectedExecutionException.java:64)
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926)
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353)
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346)
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828)
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
        at org.infinispan.server.core.transport.NonRecursiveEventLoopGroup.execute(NonRecursiveEventLoopGroup.java:46)
        at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl.doExecute(BlockingTaskAwareExecutorServiceImpl.java:169)
        at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl.tryBlockedTasks(BlockingTaskAwareExecutorServiceImpl.java:152)
        at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl.checkForReadyTasks(BlockingTaskAwareExecutorServiceImpl.java:102)
        at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl.execute(BlockingTaskAwareExecutorServiceImpl.java:119)
        at java.util.concurrent.CompletableFuture$UniCompletion.claim(java.base@11.0.10/CompletableFuture.java:568)
        at java.util.concurrent.CompletableFuture.uniHandle(java.base@11.0.10/CompletableFuture.java:920)
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(java.base@11.0.10/CompletableFuture.java:907)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.10/CompletableFuture.java:506)
        at java.util.concurrent.CompletableFuture.complete(java.base@11.0.10/CompletableFuture.java:2073)
        at org.infinispan.util.concurrent.ActionSequencer$SequenceEntry.accept(ActionSequencer.java:211)
        at org.infinispan.util.concurrent.ActionSequencer$SequenceEntry.accept(ActionSequencer.java:179)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.10/CompletableFuture.java:859)
        at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(java.base@11.0.10/CompletableFuture.java:883)
        at java.util.concurrent.CompletableFuture.whenComplete(java.base@11.0.10/CompletableFuture.java:2251)
        at java.util.concurrent.CompletableFuture.whenComplete(java.base@11.0.10/CompletableFuture.java:143)
        at org.infinispan.util.concurrent.ActionSequencer$SequenceEntry.run(ActionSequencer.java:227)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(java.base@11.0.10/Thread.java:834)

   Locked ownable synchronizers:
        - None

and one thread stuck while stopping:

"testng-HotRodUpgradePojoTest" #25 prio=5 os_prio=0 cpu=2975.10ms elapsed=360.76s tid=0x00007f99b8f10800 nid=0x126dae in Object.wait()  [0x00007f9978be8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.$$BlockHound$$_wait(java.base@11.0.10/Native Method)
        at java.lang.Object.wait(java.base@11.0.10/Object.java)
        at java.lang.Object.wait(java.base@11.0.10/Object.java:328)
        at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:253)
        - waiting to re-lock in wait() <0x00000000c28e6b68> (a io.netty.util.concurrent.DefaultPromise)
        at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:35)
        at org.infinispan.server.core.transport.NonRecursiveEventLoopGroup.shutdownGracefullyAndWait(NonRecursiveEventLoopGroup.java:82)
        at org.infinispan.server.core.transport.ServerCorePackageImpl$1.stop(ServerCorePackageImpl.java:51)
        at org.infinispan.server.core.transport.ServerCorePackageImpl$1.stop(ServerCorePackageImpl.java:49)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStop(BasicComponentRegistryImpl.java:678)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStopWrapper(BasicComponentRegistryImpl.java:674)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.stopWrapper(BasicComponentRegistryImpl.java:662)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.stop(BasicComponentRegistryImpl.java:529)
        at org.infinispan.factories.AbstractComponentRegistry.internalStop(AbstractComponentRegistry.java:374)
        at org.infinispan.factories.AbstractComponentRegistry.stop(AbstractComponentRegistry.java:308)
        at org.infinispan.manager.DefaultCacheManager.internalStop(DefaultCacheManager.java:825)
        at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:798)
        at org.infinispan.test.SecurityActions.lambda$stopManager$0(SecurityActions.java:35)
        at org.infinispan.test.SecurityActions$$Lambda$1297/0x00000001009b5840.run(Unknown Source)
        at org.infinispan.security.Security.doPrivileged(Security.java:56)
        at org.infinispan.test.SecurityActions.doPrivileged(SecurityActions.java:29)
        at org.infinispan.test.SecurityActions.stopManager(SecurityActions.java:34)
        at org.infinispan.test.TestingUtil.killCacheManagers(TestingUtil.java:866)
        at org.infinispan.test.TestingUtil.killCacheManagers(TestingUtil.java:856)
        at org.infinispan.persistence.remote.upgrade.TestCluster.lambda$destroy$0(TestCluster.java:89)
        at org.infinispan.persistence.remote.upgrade.TestCluster$$Lambda$1296/0x00000001009b5440.accept(Unknown Source)
        at java.util.ArrayList.forEach(java.base@11.0.10/ArrayList.java:1541)
        at org.infinispan.persistence.remote.upgrade.TestCluster.destroy(TestCluster.java:89)
        at org.infinispan.persistence.remote.upgrade.HotRodUpgradePojoTest.tearDown(HotRodUpgradePojoTest.java:178)
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.10/Native Method)
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.10/NativeMethodAccessorImpl.java:62)
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.10/DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(java.base@11.0.10/Method.java:566)
        at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
        at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:59)
        at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:458)
        at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:222)
        at org.testng.internal.Invoker.invokeMethod(Invoker.java:646)
        at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:719)
        at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:989)
        at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)

This points to org.infinispan.server.core.transport.NonRecursiveEventLoopGroup possibly getting deadlocked.
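
The rejections in the first trace follow a general executor rule that can be shown in isolation. What follows is only a minimal sketch using plain java.util.concurrent, not Netty or Infinispan code. The class name RejectionSketch and the method rejectedAfterShutdown are hypothetical names chosen for illustration. It shows only that an executor which has started shutting down rejects newly submitted tasks with RejectedExecutionException. It does not reproduce the hang in shutdownGracefullyAndWait.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Infinispan or Netty.
public class RejectionSketch {

    // Returns true if a task submitted after shutdown() is rejected.
    static boolean rejectedAfterShutdown() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Begin an orderly shutdown: no new tasks are accepted from here on.
        executor.shutdown();
        try {
            executor.execute(() -> { });
            return false; // the task was accepted, which is not expected
        } catch (RejectedExecutionException e) {
            return true; // the default policy rejects by throwing
        } finally {
            // Let the worker terminate so no thread is left behind.
            executor.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected after shutdown: " + rejectedAfterShutdown());
    }
}
```

If code keeps resubmitting work to an executor in this state, each attempt pays for constructing a new exception, which would be consistent with the high CPU time and the fillInStackTrace frames in the first trace.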

/cc @danberindei and @wburns

@tristantarrant
Member

Which is #9547

@tristantarrant
Member

@gustavonalle can you rebase this, since #9547 is now in?

@gustavocoding
Author

rebased

@tristantarrant merged commit b0e9f46 into infinispan:main on Oct 11, 2021
@tristantarrant
Member

Merged, thanks

2 participants