Offload non-cooperative ProcessorSupplier.init and ProcessorMetaSupplier.init [HZ-1204] #21595

Merged
merged 101 commits into from Aug 16, 2022


TomaszGaweda
Contributor

@TomaszGaweda TomaszGaweda commented Jun 10, 2022

Allow ProcessorMetaSupplier.init and ProcessorSupplier.init to be offloaded to a different thread so that they do not starve the partition thread. Users can mark a PS/PMS as (non-)cooperative by overriding the PMS#initIsCooperative and PS#initIsCooperative methods.
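To illustrate the dispatch described above, here is a minimal, Hazelcast-independent sketch. The `Initializable` interface and the `initMaybeOffloaded` helper are hypothetical stand-ins, not Hazelcast's `ProcessorSupplier`/`ProcessorMetaSupplier` API. The sketch assumes that a cooperative `init()` may run inline on the calling thread, while a non-cooperative one is handed to a separate offload executor and its completion is reported through a `CompletableFuture`. The `false` default for `initIsCooperative()` is a choice made for this sketch only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InitOffloadSketch {

    /** Hypothetical stand-in for a PS/PMS; NOT the Hazelcast interface. */
    public interface Initializable {
        void init() throws Exception;

        /** true = init() is short and non-blocking, so it may run on the calling thread. */
        default boolean initIsCooperative() {
            return false; // sketch assumption: treat init() as potentially blocking
        }
    }

    /** Runs a cooperative init() inline; offloads a non-cooperative one to offloadExecutor. */
    public static CompletableFuture<Void> initMaybeOffloaded(Initializable s, ExecutorService offloadExecutor) {
        if (s.initIsCooperative()) {
            CompletableFuture<Void> result = new CompletableFuture<>();
            try {
                s.init();
                result.complete(null);
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
            return result;
        }
        return CompletableFuture.runAsync(() -> {
            try {
                s.init();
            } catch (Exception e) {
                throw new CompletionException(e); // surfaces as the future's failure
            }
        }, offloadExecutor);
    }

    public static void main(String[] args) {
        ExecutorService offload = Executors.newSingleThreadExecutor(r -> new Thread(r, "offload"));
        try {
            String[] ranOn = new String[2];
            Initializable cooperative = new Initializable() {
                @Override
                public void init() {
                    ranOn[0] = Thread.currentThread().getName();
                }

                @Override
                public boolean initIsCooperative() {
                    return true;
                }
            };
            // Uses the default initIsCooperative() == false, so it is offloaded.
            Initializable blocking = () -> {
                Thread.sleep(100); // stands in for a blocking call, e.g. reaching an external system
                ranOn[1] = Thread.currentThread().getName();
            };
            CompletableFuture.allOf(
                    initMaybeOffloaded(cooperative, offload),
                    initMaybeOffloaded(blocking, offload)).join();
            System.out.println("cooperative init ran on: " + ranOn[0]);
            System.out.println("non-cooperative init ran on: " + ranOn[1]);
        } finally {
            offload.shutdown();
        }
    }
}
```

Running `main` should report `main` for the cooperative init and `offload` for the non-cooperative one. Returning a future in both cases keeps the caller's code path identical, whichever thread ends up running `init()`.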

Fixes #21499

Checklist:

  • Labels (Team:, Type:, Source:, Module:) and Milestone set
  • Label Add to Release Notes or Not Release Notes content set
  • Request reviewers if possible
  • Send backports/forwardports if fix needs to be applied to past/future releases
  • New public APIs have @Nonnull/@Nullable annotations
  • New public APIs have @since tags in Javadoc

The change is extensive. Won't be backported.

@TomaszGaweda TomaszGaweda changed the title Basic Processor and ProcessorSupplier init offloading Offload ProcessorSupplier.init and ProcessorMetaSupplier.init if it's marked as non-cooperative Jun 14, 2022
@hazelcast hazelcast deleted 5 comments from hz-devops-test Jun 15, 2022
@AyberkSorgun AyberkSorgun changed the title Offload ProcessorSupplier.init and ProcessorMetaSupplier.init if it's marked as non-cooperative Offload ProcessorSupplier.init and ProcessorMetaSupplier.init if it's marked as non-cooperative [HZ-1204] Jun 17, 2022
@TomaszGaweda
Contributor Author

Note to reviewers: most of the lines marked as changed have just been moved inside a lambda (and because of that, their indentation changed). Instead of blocking in various places, I'm composing a chain of actions using CompletableFuture. Each CF may be executed on the offload executor.
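The idea of composing a chain of actions instead of blocking can be sketched in plain Java. This is not the code from the PR: `startInit`, the latch standing in for a slow external resource, and the two step names are hypothetical. Each stage runs on an assumed offload executor, and the caller gets a `CompletableFuture` back immediately instead of waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InitChainSketch {

    /**
     * Builds the init chain and returns at once. Each stage runs on the offload
     * executor, so the calling thread never waits for a slow step.
     */
    public static CompletableFuture<List<String>> startInit(ExecutorService offload,
                                                            CountDownLatch slowResource,
                                                            List<String> log) {
        return CompletableFuture
                .runAsync(() -> {
                    awaitOrFail(slowResource); // hypothetical blocking call inside the first init step
                    log.add("meta-supplier init");
                }, offload)
                // starts only after the previous stage completed normally
                .thenRunAsync(() -> log.add("supplier init"), offload)
                // snapshot of the log as the chain's result
                .thenApply(ignored -> new ArrayList<>(log));
    }

    private static void awaitOrFail(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new CompletionException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService offload = Executors.newFixedThreadPool(2);
        try {
            CountDownLatch slowResource = new CountDownLatch(1);
            List<String> log = new CopyOnWriteArrayList<>();
            CompletableFuture<List<String>> done = startInit(offload, slowResource, log);
            // Reaching this line while the first stage still waits shows the caller was not blocked.
            log.add("caller continued");
            slowResource.countDown();
            System.out.println(done.join());
        } finally {
            offload.shutdown();
        }
    }
}
```

Because `startInit` returns before the first stage finishes, the caller can release `slowResource` itself; a version that blocked on `get()` inside `startInit` would deadlock here. The printed list is `[caller continued, meta-supplier init, supplier init]`.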

@hz-devops-test

The job Hazelcast-pr-compiler of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
---------SUMMARY----------
--------------------------
[ERROR] Failed to execute goal on project hazelcast-sql: Could not resolve dependencies for project com.hazelcast:hazelcast-sql:jar:5.2-SNAPSHOT: Could not transfer artifact net.java.dev.jna:jna:jar:5.2.0 from/to nexus-proxy (http://jenkins.hazelcast.com:8081/content/groups/public/): GET request of: net/java/dev/jna/jna/5.2.0/jna-5.2.0.jar from nexus-proxy failed: Connection reset -> [Help 1]
--------------------------

@TomaszGaweda
Contributor Author

run-lab-run

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   BasicClusterStateTest.pendingInvocations_shouldBeNotified_whenMemberLeft_whenClusterState_PASSIVE 
Expected: an instance of com.hazelcast.core.MemberLeftException
     but:  is a java.util.concurrent.ExecutionException
Stacktrace was: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Cluster is in PASSIVE state! Operation: com.hazelcast.partition.IndeterminateOperationStateExceptionTest$SilentOperation{serviceName='null', identityHash=810831927, partitionId=-1, replicaIndex=0, callId=4, invocationTime=1660219538836 (2022-08-11 12:05:38.836), waitTimeout=-1, callTimeout=60000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0}
	at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.returnOrThrowWithGetConventions(InvocationFuture.java:121)
	at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:100)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:609)
	at com.hazelcast.internal.cluster.impl.BasicClusterStateTest.pendingInvocations_shouldBeNotified_whenMemberLeft_whenClusterState_doesNotAllowJoin(BasicClusterStateTest.java:428)
	at com.hazelcast.internal.cluster.impl.BasicClusterStateTest.pendingInvocations_shouldBeNotified_whenMemberLeft_whenClusterState_PASSIVE(BasicClusterStateTest.java:406)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at com.hazelcast.test.FailOnTimeoutStatement$CallableStatement.call(FailOnTimeoutStatement.java:115)
	at com.hazelcast.test.FailOnTimeoutStatement$CallableStatement.call(FailOnTimeoutStatement.java:107)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Cluster is in PASSIVE state! Operation: com.hazelcast.partition.IndeterminateOperationStateExceptionTest$SilentOperation{serviceName='null', identityHash=810831927, partitionId=-1, replicaIndex=0, callId=4, invocationTime=1660219538836 (2022-08-11 12:05:38.836), waitTimeout=-1, callTimeout=60000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0}
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.checkNodeState(OperationRunnerImpl.java:332)
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.metWithPreconditions(OperationRunnerImpl.java:221)
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:249)
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:479)
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:197)
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:137)
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
	at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
	at ------ submitted from ------.()
	at com.hazelcast.internal.util.ExceptionUtil.cloneExceptionWithFixedAsyncStackTrace(ExceptionUtil.java:337)
	at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.returnOrThrowWithGetConventions(InvocationFuture.java:112)
	... 16 more

[INFO]
[ERROR] Tests run: 50691, Failures: 1, Errors: 0, Skipped: 238
[INFO]

[ERROR] There are test failures.

@TomaszGaweda
Contributor Author

run-lab-run

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   DynamicConfigYamlGeneratorTest.testIfSensitiveDataIsMasked_whenMaskingEnabled:102 expected:<[Hazelcast]> but was:<[****]>
[INFO] 
[ERROR] Tests run: 50691, Failures: 1, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@TomaszGaweda
Contributor Author

run-lab-run

@TomaszGaweda
Contributor Author

Can I not hit another flaky for once? :|

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
---------SUMMARY----------
--------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:license-maven-plugin:2.0.0:add-third-party (add-third-party) on project hazelcast-jet-kinesis: could not init goal AddThirdPartyMojo for reason : null: ConcurrentModificationException -> [Help 1]
--------------------------

@frant-hartm
Collaborator

run-lab-run

@TomaszGaweda
Contributor Author

run-nightly-tests

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   DynamicConfigYamlGeneratorTest.testIfSensitiveDataIsMasked_whenMaskingEnabled:102 expected:<[Hazelcast]> but was:<[****]>
[INFO] 
[ERROR] Tests run: 50722, Failures: 1, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@TomaszGaweda
Contributor Author

run-lab-run

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   DynamicConfigYamlGeneratorTest.testIfSensitiveDataIsMasked_whenMaskingEnabled:102 expected:<[Hazelcast]> but was:<[****]>
[INFO] 
[ERROR] Tests run: 50722, Failures: 1, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   ClientMapPartitionLostListenerTest.test_mapPartitionLostListener_invoked_fromOtherNode:133->assertProxyExistsEventually:163->HazelcastTestSupport.assertTrueEventually:1338->HazelcastTestSupport.assertTrueEventually:1236 There is no proxy with name 54bbab02-505b-41e6-8645-f04821f279f5 created (yet)
[INFO] 
[ERROR] Tests run: 50722, Failures: 1, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@frant-hartm
Collaborator

run-lab-run

@AyberkSorgun AyberkSorgun removed the request for review from viliam-durina August 15, 2022 10:18
@AyberkSorgun AyberkSorgun dismissed viliam-durina’s stale review August 15, 2022 10:19

Removing Viliam per his request

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log file
--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   QueueDestroyTest.checkStatsMapEntryRemovedWhenQueueDestroyed:78 expected:<0> but was:<1>
[ERROR]   JobLifecycleMetricsTest.after:71 Close called without init being called on all the nodes. Init count: 0 node count: 2
[INFO] 
[ERROR] Tests run: 50722, Failures: 2, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@frant-hartm
Collaborator

The following failure seems to be related to this PR:

JobLifecycleMetricsTest.after:71 Close called without init being called on all the nodes. Init count: 0 node count: 2

@frant-hartm
Collaborator

The JobLifecycleMetricsTest is run with HazelcastParallelClassRunner, but it uses TestProcessors, which should only be used with the serial runner. The parallel runner could explain the failure, but I haven't managed to reproduce it.

I have reviewed all the tests that use TestProcessors and sent a separate PR, #21976, because the issue exists on master as well.

@frant-hartm
Collaborator

run-lab-run

@frant-hartm
Collaborator

run-ee-compile

@frant-hartm
Collaborator

run-lts-compilers

Development

Successfully merging this pull request may close these issues.

Incorrect handling of blocking calls in ProcessorSupplier.init() [HZ-1204]
6 participants