Implemented split-brain healing for ISet and IList #11677

Merged: 1 commit merged into hazelcast:master on Jan 23, 2018

Conversation

@Donnerbart (Contributor) commented Oct 27, 2017

Depends on #11788
Depends on #12061

@Donnerbart Donnerbart self-assigned this Oct 27, 2017
@Donnerbart Donnerbart changed the title [WIP] Prototype of split-brain healing for ISet [WIP] Prototype of split-brain healing for ISet and IList Nov 7, 2017
@Donnerbart Donnerbart changed the title [WIP] Prototype of split-brain healing for ISet and IList Implemented split-brain healing for ISet and IList Dec 12, 2017
@Donnerbart Donnerbart added this to the 3.10 milestone Dec 13, 2017
String name = container.getName();
int batchSize = container.getConfig().getMergePolicyConfig().getBatchSize();
SplitBrainMergePolicy mergePolicy = getMergePolicy(container);
Data partitionAwareName = nodeEngine.getSerializationService().toData(name, StringPartitioningStrategy.INSTANCE);
Contributor:

Wouldn't it be better to group them by partition during prepareMergeRunnable()? That way you avoid computing the partition (and paying the serialization cost) twice, there and here.

Contributor Author (@Donnerbart):

Good one, fixed this in the other PR for IQueue as well.
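
Below is a minimal sketch of the grouping idea, assuming the 3.x NodeEngine/SerializationService APIs; the class and method names are illustrative, and CollectionContainer stands for the container type from the reviewed snippet (its import is omitted):

import com.hazelcast.nio.serialization.Data;
import com.hazelcast.partition.strategy.StringPartitioningStrategy;
import com.hazelcast.spi.NodeEngine;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only, not the PR's actual code: serialize the partition-aware name a
// single time per container and reuse the resulting partition ID for grouping.
final class MergeRunnablePreparationSketch {

    Map<Integer, List<CollectionContainer>> groupByPartition(NodeEngine nodeEngine,
                                                             Iterable<CollectionContainer> containers) {
        Map<Integer, List<CollectionContainer>> byPartition = new HashMap<>();
        for (CollectionContainer container : containers) {
            // one serialization of the name with the partitioning strategy per container
            Data partitionAwareName = nodeEngine.getSerializationService()
                    .toData(container.getName(), StringPartitioningStrategy.INSTANCE);
            int partitionId = nodeEngine.getPartitionService().getPartitionId(partitionAwareName);
            byPartition.computeIfAbsent(partitionId, id -> new ArrayList<>()).add(container);
        }
        return byPartition;
    }
}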

@mmedenjak (Contributor) left a comment:

PR looks very nice, except for two comments. Maybe we can merge and address them in a separate PR. WDYT?

private String quorumName;
private MergePolicyConfig mergePolicyConfig = new MergePolicyConfig();
Contributor:

Is the new mergePolicy config property covered by the client protocol for dynamically adding config? Is this in a different PR?
(see: AddListConfigMessageTask, AddSetConfigMessageTask, ClientDynamicClusterConfig#addSetConfig, ClientDynamicClusterConfig#addListConfig)

Contributor Author (@Donnerbart):

I have no idea how this works (I've never done anything with the Client Protocol, and that part of the dynamic config is also new to me). But we have a similar open issue for other configs (#12126), and Thomas mentioned in his PR that we need to merge this first before the Client Protocol can be modified. I would say we merge all server-side implementations first and then add the merge policies to the Client Protocol in a single PR. WDYT?

Contributor:

Basically, we need to allow adding new configuration from the client while the cluster is running. To allow specifying the merge policy, the client protocol needs to be expanded, e.g. https://github.com/mmedenjak/hazelcast-client-protocol/blob/master/hazelcast/src/main/java/com/hazelcast/client/impl/protocol/template/DynamicConfigTemplate.java#L118-L135

Contributor:

Merging this PR and addressing it later in a separate PR is fine by me, BTW.
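
For reference, a hedged sketch of the server-side dynamic config path that the client protocol change would eventually mirror. It assumes the CollectionConfig merge-policy setter introduced by this PR and the existing fluent setters on MergePolicyConfig and ListConfig; the class name and list name are illustrative:

import com.hazelcast.config.ListConfig;
import com.hazelcast.config.MergePolicyConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class DynamicListConfigExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // merge policy and batch size used for split-brain healing
        MergePolicyConfig mergePolicyConfig = new MergePolicyConfig()
                .setPolicy("PutIfAbsentMergePolicy")
                .setBatchSize(100);

        // dynamically add a list config to a running cluster (dynamic config, 3.9+)
        ListConfig listConfig = new ListConfig("my-list")
                .setBackupCount(1)
                .setMergePolicyConfig(mergePolicyConfig);
        hz.getConfig().addListConfig(listConfig);
    }
}

A client doing the same through the Client Protocol would need the merge-policy fields added to the dynamic config messages, which is the expansion discussed above.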

mergePolicy.setSerializationService(nodeEngine.getSerializationService());

// try to find an existing item with the same value
CollectionItem existingItem = null;
Contributor:

I know it's not a clear-cut decision, but I'm wondering whether we should find the existing entry by value or by ID. E.g. how should we behave if we are merging these two lists: [1,1,1,2,3,1] and [2,1,1,3,2,1]? Right now all of the 1s in the list from the smaller cluster will be merged with the same entry in the bigger cluster. Maybe switching to merging entries by ID would produce better results. The same comment was made on the Ringbuffer PR, AFAIK.

Contributor Author (@Donnerbart):

The problem with the ID is that it's completely internal and unrelated to the value. So if we add the same values to a Set in both sub-clusters, but the operations are handled in a different order, the sets will contain the same items, but with different internal IDs. I don't think merging by those is a good idea. We should merge by what the user knows about the data structure (the contents, not the implementation details).

@mmedenjak (Contributor) commented Jan 23, 2018:

Yes, there are actually good points for both approaches. The downside to merging entries with the same ID is, as you say, that the items may just be ordered differently.
The downsides to merging entries that are equal are:

  • in the degenerate case where the collection from the smaller cluster contains the same entry multiple times, every occurrence will be merged with the same entry in the bigger cluster. For instance, if the list in the bigger cluster is [1,2,3,4,5] and the list from the smaller cluster is [1,1,1,1], all of the entries from the smaller cluster will be merged with the same entry in the bigger cluster
  • there is also the unusual case where the merge policy returns a different value which is then equal to some value further down in the merging collection. For instance, let's say there's a silly policy that returns value+1 and you're merging [1,2,3,4,5] (bigger cluster) with [1,2,3,4,5] (smaller cluster, merging list). First the two 1s are merged and produce [2,2,3,4,5] (bigger cluster) and [2,3,4,5] (smaller cluster, merging collection). Now the second entry is again equal to the already merged first entry in the bigger cluster, so we get [3,2,3,4,5] (bigger cluster) and [3,4,5] (smaller cluster, merging collection). Finally, after the merging collection is exhausted, we end up with [6,2,3,4,5].

Both approaches have their downsides and it boils down to what one would expect.
I'm thinking that we could merge this and open an issue to consider which approach is better and if needed fix it later. Is that ok?

Contributor Author (@Donnerbart):

In your first example nothing would actually change, since the values are not different after merging. I always check (newValue != null && !newValue.equals(oldValue)), so if the merge policy returns the same value, we just leave everything as it is. Of course it takes some time to process everything, but nothing bad happens.

But yes, I would merge this approach (since it works in its own way) and see if there are any flaws later. This applies to any keyless data structure, so we eventually have to solve it for multiple data structures anyway.
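
A minimal, self-contained sketch of the value-based merging described above, with a plain List and a BinaryOperator standing in for the real container and merge policy (both are simplifications, not the PR's types):

import java.util.List;
import java.util.function.BinaryOperator;

// Hedged illustration of merging by value (not by internal item ID).
final class ValueBasedMergeSketch {

    static void mergeItem(List<Object> existingCollection, Object mergingValue,
                          BinaryOperator<Object> mergePolicy) {
        // try to find an existing item with the same value
        int existingIndex = existingCollection.indexOf(mergingValue);
        Object existingValue = existingIndex >= 0 ? existingCollection.get(existingIndex) : null;

        Object newValue = mergePolicy.apply(mergingValue, existingValue);

        // only touch the collection if the policy produced a different, non-null value
        if (newValue != null && !newValue.equals(existingValue)) {
            if (existingIndex >= 0) {
                existingCollection.set(existingIndex, newValue);
            } else {
                existingCollection.add(newValue);
            }
        }
    }
}

Because indexOf always returns the first equal entry, the degenerate duplicate case from the earlier example is visible in this sketch as well.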

@@ -233,10 +233,12 @@

<list name="default">
<backup-count>1</backup-count>
<merge-policy batch-size="100">PutIfAbsentMergePolicy</merge-policy>
Contributor:

Can you add it to hazelcast-fullconfig.xml as well?

Contributor Author (@Donnerbart):

Fixed

@Donnerbart Donnerbart merged commit 55f6755 into hazelcast:master Jan 23, 2018
@Donnerbart Donnerbart deleted the implementSplitBrainSet branch January 23, 2018 21:48
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Apr 13, 2020
Labels: Source: Internal (PR or issue was opened by an employee), Team: Core, Type: Enhancement