[HZ-1013 HZ-1059 HZ-1051] Fix hanging cluster safe query from common pool #21145

vbekiaris · 2022-04-04T11:46:15Z

When all FJP#commonPool threads are busy querying isClusterSafe
(this seems to be possible when querying via the PartitionService MBean)
and partition assignments are not in sync (e.g. during the initial
partition arrangement), an important callback that must run after
PartitionBackupReplicaAntiEntropyOperation completes never gets a
thread to execute on. As a result, neither the partition replica sync
nor the cluster-safe query can make any progress.
The fix is to use the Hazelcast internal async executor (instead of
the common pool) for the callback that processes the replica
anti-entropy operation result.
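
The starvation described above can be reproduced in miniature without any Hazelcast code. The following is only a sketch under stated assumptions, not the actual change in this PR: a small fixed thread pool stands in for the common pool, a latch stands in for "partitions are in sync", and the class and method names (`CallbackExecutorSketch`, `waitersMakeProgress`) are hypothetical. The waiting tasks block on the latch here, which simplifies how the real cluster-safe query behaves.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from the Hazelcast code base.
public class CallbackExecutorSketch {

    /**
     * Saturates a stand-in "shared" pool with tasks that wait for a callback,
     * then schedules that callback either on the same shared pool or on a
     * dedicated executor. Returns true if the waiting tasks finished in time.
     */
    static boolean waitersMakeProgress(boolean useDedicatedExecutor, long timeoutMillis)
            throws InterruptedException {
        int sharedThreads = 2;
        ExecutorService sharedPool = Executors.newFixedThreadPool(sharedThreads);
        ExecutorService dedicated = Executors.newSingleThreadExecutor();
        CountDownLatch inSync = new CountDownLatch(1); // released only by the callback
        CountDownLatch waitersStarted = new CountDownLatch(sharedThreads);
        CountDownLatch waitersDone = new CountDownLatch(sharedThreads);
        try {
            // Occupy every shared thread with a task that waits for "in sync".
            for (int i = 0; i < sharedThreads; i++) {
                sharedPool.execute(() -> {
                    waitersStarted.countDown();
                    try {
                        inSync.await();
                        waitersDone.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            waitersStarted.await();

            // Stand-in for the operation result whose callback unblocks the waiters.
            CompletableFuture<String> operationResult = new CompletableFuture<>();
            operationResult.whenCompleteAsync(
                    (result, error) -> inSync.countDown(),
                    useDedicatedExecutor ? dedicated : sharedPool);
            operationResult.complete("done");

            return waitersDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupts any waiters that are still blocked.
            sharedPool.shutdownNow();
            dedicated.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("callback on shared pool:        " + waitersMakeProgress(false, 500));
        System.out.println("callback on dedicated executor: " + waitersMakeProgress(true, 500));
    }
}
```

With the callback queued on the saturated shared pool, the waiters time out. With the callback on its own executor, they complete. This mirrors the reasoning behind moving the anti-entropy result callback off the common pool.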

Fixes #19672
Fixes #18286
Fixes #19665

Checklist:

Send backports/forwardports if fix needs to be applied to past/future releases

edit: see also #19672 (comment) on how this issue might occur

ahmetmircik

good finding 👏

hz-devops-test · 2022-04-12T14:31:41Z

The job Hazelcast-pr-EE-compiler of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
---------SUMMARY----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler_2/hazelcast-enterprise/hazelcast-enterprise/src/main/java/com/hazelcast/internal/nio/ssl/MemberTLSChannelInitializer.java:[32,37] error: incompatible types: InboundHandler[] cannot be converted to InboundHandler
--------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hazelcast-enterprise: Compilation failure
--------------------------
---------ERRORS-----------
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler_2/hazelcast-enterprise/hazelcast-enterprise/src/main/java/com/hazelcast/internal/nio/ssl/MemberTLSChannelInitializer.java:[32,37] error: incompatible types: InboundHandler[] cannot be converted to InboundHandler
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler_2/hazelcast-enterprise/hazelcast-enterprise/src/main/java/com/hazelcast/internal/nio/ssl/MemberTLSChannelInitializer.java:[32,37] error: incompatible types: InboundHandler[] cannot be converted to InboundHandler
--------------------------

vbekiaris added the Type: Defect, Team: Core, Source: Internal, Module: Partitioning, and Add to Release Notes labels Apr 4, 2022

vbekiaris added this to the 5.2 milestone Apr 4, 2022

vbekiaris requested review from ahmetmircik and ramizdundar April 4, 2022 11:46

vbekiaris changed the title ~~Fix hanging cluster safe query from common pool~~ [HZ-1013] Fix hanging cluster safe query from common pool Apr 4, 2022

ahmetmircik approved these changes Apr 4, 2022

ramizdundar approved these changes Apr 6, 2022

vbekiaris merged commit 434d731 into hazelcast:master Apr 8, 2022

This was referenced Apr 12, 2022

Fix hanging cluster safe query from common pool #21205

Merged

Fix hanging cluster safe query from common pool #21206

Merged

Fix hanging cluster safe query from common pool #21207

Merged

Fix hanging cluster safe query from common pool #21208

Merged

AyberkSorgun changed the title ~~[HZ-1013] Fix hanging cluster safe query from common pool~~ [HZ-1013 HZ-1059] Fix hanging cluster safe query from common pool May 9, 2022

AyberkSorgun changed the title ~~[HZ-1013 HZ-1059] Fix hanging cluster safe query from common pool~~ [HZ-1013 HZ-1059 HZ-1051] Fix hanging cluster safe query from common pool May 9, 2022
