Fixed missing EventService registrations after cluster members startup #16020
b572d56 to a66316a
hazelcast/src/main/java/com/hazelcast/internal/cluster/impl/operations/OnJoinOp.java
run-lab-run |
Keep in mind that this needs to be backported (if possible) and that the current fix is not compatible with rolling upgrade (RU) or patch-level guarantees.
@@ -148,12 +148,10 @@ private void sendPostJoinOperations() {
final OperationService operationService = nodeEngine.getOperationService();
final Collection<Member> members = clusterService.getMembers();
Minor: `members` is no longer needed.
You know, I can smell bugs even with this solution. For instance, a member joins a stable cluster and prepares to send the

I guess I could conjure up some other scenarios, given enough time. But honestly, I don't think we need to solve this completely: this sounds like the atomic broadcast problem, and I don't think it's solvable with our AP-style membership protocol without venturing into CP-land. You can try finding a solution for the patch release, but if there is none, we can just say it's an inherent design issue that is unsolvable under minor- and patch-level guarantees, that it has been solved in 4.0, and that, if it's an issue, users can insert an artificial delay between joining members (as they have already been instructed).
Regarding 3.12: yes, this fix is not going to work with RU. It may even make things worse if the joining member is upgraded but the master is not. In that case, neither the master nor the joining member is going to broadcast the registrations. For this scenario we can keep the old logic in combination with the new one. Yes, we will broadcast more events and there will be duplicates (AFAIU they are already handled properly), but we will at least have more guarantees when the master is stable. WDYT, guys?
Yes, I wanted to suggest sending the operation on multiple occasions (e.g. a blunt version might send the operation again on every member added event) but I was unsure if the operations were idempotent.
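To make the idempotency question concrete, here is a minimal sketch of what "safe to resend" could look like. It is a toy model, not Hazelcast code: `RegistrationTable`, `register`, and the `reg-1`/`orders` values are hypothetical names chosen for illustration. The assumption is that each registration carries a unique id, so replaying it on every member-added event leaves the stored state unchanged.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of idempotent listener registration; not Hazelcast code. */
final class IdempotentRegistrationSketch {

    /** Holds registrations keyed by a unique id, so replays are harmless. */
    static final class RegistrationTable {
        private final Map<String, String> byId = new ConcurrentHashMap<>();

        /** Returns true only the first time a given id is registered. */
        boolean register(String id, String topic) {
            return byId.putIfAbsent(id, topic) == null;
        }

        int size() {
            return byId.size();
        }
    }

    public static void main(String[] args) {
        RegistrationTable table = new RegistrationTable();
        // Simulate the same registration being re-sent on three member-added events.
        for (int event = 0; event < 3; event++) {
            boolean added = table.register("reg-1", "orders");
            System.out.println("event " + event + ": newly added = " + added);
        }
        System.out.println("registrations stored: " + table.size());
    }
}
```

If the real registrations behave like this, resending costs extra traffic but cannot create duplicate listeners.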
Fixed a race condition between a new cluster member joining and the post-join operations executed as part of a concurrent member join. The joining member now sends its post-join operations directly to the master, which in turn broadcasts them to all other members of the cluster. This way the master guarantees that all post-join operations are executed on all members of the cluster.
Fixes: hazelcast#15950
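The relay flow described in this commit message can be sketched as a small toy model. This is not the code from the PR: `Node`, `Master`, `relayPostJoin`, and the operation strings are hypothetical names, and the master is modeled as a separate relay object for brevity (in a real cluster it is itself a member). The point it illustrates is that the master's member list, not the joining member's possibly stale view, decides who receives each post-join operation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the "send to master, master broadcasts" flow; not Hazelcast code. */
final class PostJoinRelaySketch {

    /** A cluster member that records the post-join operations it has executed. */
    static final class Node {
        final String name;
        final List<String> executed = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        void execute(String operation) {
            executed.add(operation);
        }
    }

    /** The master knows the member list and relays post-join operations to it. */
    static final class Master {
        private final Map<String, Node> members = new LinkedHashMap<>();

        void addMember(Node node) {
            members.put(node.name, node);
        }

        /** Called by a joining member; forwards the operation to every other member. */
        void relayPostJoin(Node sender, String operation) {
            for (Node member : members.values()) {
                if (member != sender) {
                    member.execute(operation);
                }
            }
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        master.addMember(a);
        // b and c join concurrently: both are in the master's member list
        // before either of them sends its post-join operation.
        master.addMember(b);
        master.addMember(c);
        master.relayPostJoin(b, "register-listener-b");
        master.relayPostJoin(c, "register-listener-c");
        System.out.println("a executed " + a.executed);
        System.out.println("b executed " + b.executed);
        System.out.println("c executed " + c.executed);
    }
}
```

In this model b and c each see the other's operation even though neither knew about the other when it joined, which is the gap the direct member-to-member broadcast left open.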
409ae3c to 2414092
Guys, thanks for the review, I am merging the PR.