
ReplicatedMapProxy - refactored check and retry initialization logic ... #14331

Merged
merged 3 commits into hazelcast:master May 12, 2020

Conversation

netudima
Contributor

…to do it in parallel for different partitions. It is much faster compared to the original way - try to sync a partition and sleep.
Fixes #14330

@mmedenjak mmedenjak added this to the 3.12 milestone Feb 8, 2019
@mmedenjak mmedenjak modified the milestones: 3.12, 3.13 Mar 25, 2019
@mmedenjak mmedenjak modified the milestones: 3.13, 4.0 Apr 17, 2019
@mmedenjak mmedenjak modified the milestones: 4.0, 4.1 Dec 24, 2019
@hazelcast hazelcast deleted 11 comments from devOpsHazelcast Feb 28, 2020
@hazelcast hazelcast deleted 3 comments from devOpsHazelcast Mar 27, 2020
@devOpsHazelcast
Collaborator

devOpsHazelcast commented Apr 5, 2020

CLA assistant check
All committers have signed the CLA.

@hazelcast hazelcast deleted a comment from devOpsHazelcast Apr 6, 2020
@mmedenjak mmedenjak self-requested a review April 6, 2020 10:28
@mmedenjak
Contributor

mmedenjak commented Apr 14, 2020

Hi @netudima! After 1.5 years, we finally managed to take a look at the PR. To be honest, we haven't been good stewards of community PRs, but things are changing around here.

I believe your PR makes perfect sense but there are a couple of small improvements I'd add here. First off, I think you can use BitSet instead of LinkedHashSet for the partition ID storage.

Secondly, since we're now adding support for many partitions (20k, 50k), I think we need to "throttle" loading a bit, so maybe add a local variable that keeps track of how many concurrent loading requests are in flight. For now, let's limit it to something like 100 - we can always change that number later.

And lastly, nitpicking - can you replace the while(true) with something like while (!nonLoadedStores.isEmpty()) to simplify the code?

If you're unable to continue with the PR since a lot of time has passed, we understand. I'll make sure we adopt the PR and continue with it and also mention you in the commit as the original author.
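
A rough sketch of the loop shape these suggestions point towards is below. It is only an illustration of the idea, not the actual proxy code: nonLoadedStores comes from the comment above, while requestDataFromOwner and maxConcurrentRequests are made-up names standing in for the real sync call and the throttle.

import java.util.BitSet;

class FillupSketch {
    // Hypothetical stand-in for "ask the partition owner to sync this partition";
    // returns true once the partition's store is loaded.
    boolean requestDataFromOwner(int partitionId) {
        return true;
    }

    void awaitInitialFillup(int partitionCount) throws InterruptedException {
        final int maxConcurrentRequests = 100;      // throttle suggested above
        BitSet nonLoadedStores = new BitSet(partitionCount);
        nonLoadedStores.set(0, partitionCount);     // every partition starts as "not loaded"

        while (!nonLoadedStores.isEmpty()) {
            int requested = 0;
            for (int id = nonLoadedStores.nextSetBit(0);
                 id >= 0 && requested < maxConcurrentRequests;
                 id = nonLoadedStores.nextSetBit(id + 1)) {
                if (requestDataFromOwner(id)) {
                    nonLoadedStores.clear(id);      // loaded, no further retries needed
                }
                requested++;
            }
            if (!nonLoadedStores.isEmpty()) {
                Thread.sleep(100);                  // back off before the next retry round
            }
        }
    }
}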

@mmedenjak mmedenjak removed their request for review April 14, 2020 08:29
@netudima
Contributor Author

Hi @mmedenjak, thank you for the feedback :-). I will take a look and try to address your comments within 1-2 weeks.

@mmedenjak
Contributor

mmedenjak commented Apr 17, 2020

Looks like the issue is checkstyle:
10:51:01 [ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-builder_2/hazelcast/src/main/java/com/hazelcast/replicatedmap/impl/ReplicatedMapProxy.java:79:1: Class Fan-Out Complexity is 42 (max allowed is 40). [ClassFanOutComplexity]

If the issue persists after rebasing onto the latest master, I think you can ignore it, like here.

…to do it in parallel for different partitions. It is much faster compared to the original way - try to sync a partition and sleep.
@netudima
Contributor Author

@mmedenjak, I have updated the PR based on the comments. Regarding while(true) - I have not found a way to simplify the logic without adding an unnecessary sleep in the case where the 1st iteration succeeds for all partitions...

@mmedenjak mmedenjak self-requested a review April 27, 2020 13:35
@hazelcast hazelcast deleted a comment May 4, 2020
Contributor

@mmedenjak mmedenjak left a comment


Looks good, added one minor observation and one suggested change. Thank you for rebasing the fix!

int partitionCount = nodeEngine.getPartitionService().getPartitionCount();
BitSet nonLoadedStores = new BitSet(partitionCount);
int[] retryCount = new int[partitionCount];
for (int i = 0; i < partitionCount; i++) {
Contributor


Minor: you can use nonLoadedStores.set(0, partitionCount); instead of looping.

Contributor Author

@netudima netudima May 4, 2020


I think an even better option is possible :-) - I can rename the set to loadedStores and set a bit when a partition's loading is completed. In that case the init logic is not required.

Contributor


I tried doing something similar, but beware that nextClearBit will never return -1 and might instead return a number that is outside the partition ID range. So if all partitions are loaded, loadedPartitions.nextClearBit(0) will return partitionCount rather than -1.
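
This nextClearBit behaviour is easy to verify in isolation. A small plain-JDK demo follows (separate from the PR's code; 271 is used only because it is Hazelcast's default partition count):

import java.util.BitSet;

public class NextClearBitDemo {
    public static void main(String[] args) {
        int partitionCount = 271;
        BitSet loadedPartitions = new BitSet(partitionCount);
        loadedPartitions.set(0, partitionCount);    // mark every partition as loaded

        // A BitSet is logically unbounded, so nextClearBit never returns -1;
        // with all bits below partitionCount set, the first clear index it finds
        // lies outside the partition ID range.
        System.out.println(loadedPartitions.nextClearBit(0));   // prints 271
    }
}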

@mmedenjak mmedenjak requested a review from tkountis May 4, 2020 12:33
@tkountis
Contributor

tkountis commented May 5, 2020

@netudima thanks for your PR, much appreciated.
Do you mind including a test case that demonstrates the fix, similar to the one you have in the linked issue?

@netudima
Contributor Author

netudima commented May 6, 2020

It is a bit tricky, because the code from the issue shows a difference in loading time, not in functional behaviour. Maybe it can be tested using a test with a timeout (but that can be unreliable when the test runs on different hardware). Do you have some examples of unit tests with timeouts in the Hazelcast codebase?
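
(For reference, a timing-based check of the kind being discussed would look roughly like the following in plain JUnit 4. This is a generic, hypothetical example rather than anything from the Hazelcast test suite, and it illustrates why such tests are fragile: the threshold depends on the hardware running the test.)

import org.junit.Test;

import static org.junit.Assert.assertTrue;

public class FillupTimingTest {

    // Fails the test if the body does not finish within 30 seconds.
    @Test(timeout = 30000)
    public void initializationFinishesQuickly() {
        long start = System.currentTimeMillis();
        // ... start members and trigger the replicated map fill-up here ...
        long elapsedMillis = System.currentTimeMillis() - start;
        assertTrue("fill-up took too long: " + elapsedMillis + " ms", elapsedMillis < 30000);
    }
}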

@tkountis
Contributor

@netudima I agree with you; I couldn't think of a solid test that wouldn't cause more pain to maintain.

@mmedenjak mmedenjak merged commit 0045caf into hazelcast:master May 12, 2020
@mmedenjak
Contributor

And after several years, we managed to get you to make a contribution. Thank you and here's a medal 🥇 cc @Holmistr

If you want, you can pick another issue. I can say right now everyone's a bit busy but we try to get community PRs merged faster these days.

@netudima
Contributor Author

🥳, thank you very much for your help

webashutosh pushed a commit to webashutosh/hazelcast that referenced this pull request Jun 29, 2020
…... (hazelcast#14331)

ReplicatedMapProxy - refactored check and retry initialization logic to do it in parallel for different partitions. It is much faster compared to the original way - try to sync a partition and sleep.
Development

Successfully merging this pull request may close these issues.

[Bug] ReplicatedMap initialization may stuck for a long time in case of async-fillup=false