Implemented split-brain healing for ISet and IList #11677

Merged: 1 commit merged into hazelcast:master on Jan 23, 2018

Conversation

@Donnerbart (Contributor) commented Oct 27, 2017

Depends on #11788
Depends on #12061

@Donnerbart Donnerbart self-assigned this Oct 27, 2017
@Donnerbart Donnerbart changed the title [WIP] Prototype of split-brain healing for ISet [WIP] Prototype of split-brain healing for ISet and IList Nov 7, 2017
@Donnerbart Donnerbart changed the title [WIP] Prototype of split-brain healing for ISet and IList Implemented split-brain healing for ISet and IList Dec 12, 2017
@Donnerbart Donnerbart added this to the 3.10 milestone Dec 13, 2017
String name = container.getName();
int batchSize = container.getConfig().getMergePolicyConfig().getBatchSize();
SplitBrainMergePolicy mergePolicy = getMergePolicy(container);
Data partitionAwareName = nodeEngine.getSerializationService().toData(name, StringPartitioningStrategy.INSTANCE);
Contributor:

Wouldn't it be better to group them by partition during prepareMergeRunnable()? That way you avoid computing the partition (and paying the serialization cost) twice, there and here.

Contributor Author (@Donnerbart):

Good one, fixed this in the other PR for IQueue as well.
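
Below is a minimal sketch of the grouping idea, assuming the 3.x NodeEngine/SerializationService APIs; the class and method names are illustrative, and CollectionContainer stands for the container type from the reviewed snippet (its import is omitted):

import com.hazelcast.nio.serialization.Data;
import com.hazelcast.partition.strategy.StringPartitioningStrategy;
import com.hazelcast.spi.NodeEngine;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only, not the PR's actual code: serialize the partition-aware name a
// single time per container and reuse the resulting partition ID for grouping.
final class MergeRunnablePreparationSketch {

    Map<Integer, List<CollectionContainer>> groupByPartition(NodeEngine nodeEngine,
                                                             Iterable<CollectionContainer> containers) {
        Map<Integer, List<CollectionContainer>> byPartition = new HashMap<>();
        for (CollectionContainer container : containers) {
            // one serialization of the name with the partitioning strategy per container
            Data partitionAwareName = nodeEngine.getSerializationService()
                    .toData(container.getName(), StringPartitioningStrategy.INSTANCE);
            int partitionId = nodeEngine.getPartitionService().getPartitionId(partitionAwareName);
            byPartition.computeIfAbsent(partitionId, id -> new ArrayList<>()).add(container);
        }
        return byPartition;
    }
}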

@mmedenjak (Contributor) left a comment:

PR looks very nice, except for two comments. Maybe we can merge and address them in a separate PR. WDYT?

private String quorumName;
private MergePolicyConfig mergePolicyConfig = new MergePolicyConfig();
Contributor:

Is the new mergePolicy config property covered by the client protocol for dynamically adding config? Is this in a different PR?
(see: AddListConfigMessageTask, AddSetConfigMessageTask, ClientDynamicClusterConfig#addSetConfig, ClientDynamicClusterConfig#addListConfig)

Contributor Author (@Donnerbart):

I have no idea how this works (I've never done anything with the Client Protocol, and that part of the dynamic config is also new to me). But we have a similar open issue for other configs (#12126), and Thomas mentioned in his PR that we need to merge this first before the Client Protocol can be modified. I would say we merge all server-side implementations first and then add the merge policies to the Client Protocol in a single PR. WDYT?

Contributor:

Basically, we need to allow adding new configuration from the client while the cluster is running. To allow specifying the merge policy, the client protocol needs to be expanded, e.g. https://github.com/mmedenjak/hazelcast-client-protocol/blob/master/hazelcast/src/main/java/com/hazelcast/client/impl/protocol/template/DynamicConfigTemplate.java#L118-L135

Contributor:

Merging this PR and addressing it later in a separate PR is fine by me, BTW.
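
For reference, a hedged sketch of the server-side dynamic config path that the client protocol change would eventually mirror. It assumes the CollectionConfig merge-policy setter introduced by this PR and the existing fluent setters on MergePolicyConfig and ListConfig; the class name and list name are illustrative:

import com.hazelcast.config.ListConfig;
import com.hazelcast.config.MergePolicyConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class DynamicListConfigExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // merge policy and batch size used for split-brain healing
        MergePolicyConfig mergePolicyConfig = new MergePolicyConfig()
                .setPolicy("PutIfAbsentMergePolicy")
                .setBatchSize(100);

        // dynamically add a list config to a running cluster (dynamic config, 3.9+)
        ListConfig listConfig = new ListConfig("my-list")
                .setBackupCount(1)
                .setMergePolicyConfig(mergePolicyConfig);
        hz.getConfig().addListConfig(listConfig);
    }
}

A client doing the same through the Client Protocol would need the merge-policy fields added to the dynamic config messages, which is the expansion discussed above.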

mergePolicy.setSerializationService(nodeEngine.getSerializationService());

// try to find an existing item with the same value
CollectionItem existingItem = null;
Contributor:

I know it's not a clear-cut decision, but I'm wondering whether we should find the existing entry by value or by ID. E.g. how should we behave if we are merging these two lists: [1,1,1,2,3,1] and [2,1,1,3,2,1]? Right now all of the 1s in the list from the smaller cluster will be merged with the same entry in the bigger cluster. Maybe switching to merging entries by ID would produce better results. The same comment was made on the Ringbuffer PR, AFAIK.

Contributor Author (@Donnerbart):

The problem with the ID is that it's completely internal and unrelated to the value. So if we add the same values to a Set in both sub-clusters, but the operations are handled in a different order, the sets will contain the same items, but with different internal IDs. I don't think merging by those is a good idea. We should merge by what the user knows about the data structure (the contents, not the implementation details).

@mmedenjak (Contributor) commented Jan 23, 2018:

Yes, there are actually good points for both approaches. The downside to merging entries with the same ID is, as you say, that the items may just be ordered differently.
The downsides to merging entries that are equal are:

  • in the degenerate case where the collection from the smaller cluster contains the same entry multiple times, every occurrence will be merged with the same entry in the bigger cluster. For instance, if the list in the bigger cluster is [1,2,3,4,5] and the list from the smaller cluster is [1,1,1,1], all of the entries from the smaller cluster will be merged with the same entry in the bigger cluster
  • there is also the unusual case where the merge policy returns a different value which is then equal to some value further down in the merging collection. For instance, let's say there's a silly policy that returns value+1 and you're merging [1,2,3,4,5] (bigger cluster) with [1,2,3,4,5] (smaller cluster, merging list). First the two 1s are merged and produce [2,2,3,4,5] (bigger cluster) and [2,3,4,5] (smaller cluster, merging collection). Now the second entry is again equal to the already merged first entry in the bigger cluster, so we get [3,2,3,4,5] (bigger cluster) and [3,4,5] (smaller cluster, merging collection). Finally, after the merging collection is exhausted, we end up with [6,2,3,4,5].

Both approaches have their downsides and it boils down to what one would expect.
I'm thinking that we could merge this and open an issue to consider which approach is better and if needed fix it later. Is that ok?

Contributor Author (@Donnerbart):

In your first example nothing would actually change, since the values are not different after merging. I always check (newValue != null && !newValue.equals(oldValue)), so if the merge policy returns the same value, we just leave everything as it is. Of course it takes some time to process everything, but nothing bad happens.

But yes, I would merge this approach (since it works in its own way) and see if there are any flaws later. This applies to any keyless data structure, so we eventually have to solve it for multiple data structures anyway.
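
A minimal, self-contained sketch of the value-based merging described above, with a plain List and a BinaryOperator standing in for the real container and merge policy (both are simplifications, not the PR's types):

import java.util.List;
import java.util.function.BinaryOperator;

// Hedged illustration of merging by value (not by internal item ID).
final class ValueBasedMergeSketch {

    static void mergeItem(List<Object> existingCollection, Object mergingValue,
                          BinaryOperator<Object> mergePolicy) {
        // try to find an existing item with the same value
        int existingIndex = existingCollection.indexOf(mergingValue);
        Object existingValue = existingIndex >= 0 ? existingCollection.get(existingIndex) : null;

        Object newValue = mergePolicy.apply(mergingValue, existingValue);

        // only touch the collection if the policy produced a different, non-null value
        if (newValue != null && !newValue.equals(existingValue)) {
            if (existingIndex >= 0) {
                existingCollection.set(existingIndex, newValue);
            } else {
                existingCollection.add(newValue);
            }
        }
    }
}

Because indexOf always returns the first equal entry, the degenerate duplicate case from the earlier example is visible in this sketch as well.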

@@ -233,10 +233,12 @@

<list name="default">
<backup-count>1</backup-count>
<merge-policy batch-size="100">PutIfAbsentMergePolicy</merge-policy>
Contributor:

Can you add it to hazelcast-fullconfig.xml as well?

Contributor Author (@Donnerbart):

Fixed

@Donnerbart Donnerbart merged commit 55f6755 into hazelcast:master Jan 23, 2018
@Donnerbart Donnerbart deleted the implementSplitBrainSet branch January 23, 2018 21:48
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Apr 13, 2020
Labels: Source: Internal (PR or issue was opened by an employee), Team: Core, Type: Enhancement