
ReplicatedMap updates in 3.6 takes very long time #7617

Closed
sabag opened this issue Feb 28, 2016 · 11 comments · Fixed by #7625

sabag commented Feb 28, 2016

In version 3.6 there is, IMO, a massive regression in ReplicatedMap updates:
sometimes an update loses some of the entries.
Here is a test I prepared that demonstrates the problem:

hazelcast.xml :

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config https://hazelcast.com/schema/config/hazelcast-config-3.6.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>mygroup</name>
        <password>admin</password>
    </group>
    <management-center enabled="false">http://localhost:8080/mancenter</management-center>
    <instance-name>BOTH_LAYERS</instance-name>
    <network>
        <port auto-increment="true">5711</port>
        <join>
            <multicast enabled="true">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
                <multicast-timeout-seconds>4</multicast-timeout-seconds>
            </multicast>
            <tcp-ip enabled="false"/>
            <aws enabled="false"/>
        </join>
    </network>
    <replicatedmap name="configurationSetCache">
        <statistics-enabled>true</statistics-enabled>
    </replicatedmap>
</hazelcast>

ReplicatedMapIssue.java :

package test;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ReplicatedMapIssue implements Serializable{

    public static final int NEW_ENTRIES_COUNT = 7;

    private Map<Integer,DummyObject> replicatedMap;

    private BufferedReader stdin;


    public void test() {

        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        replicatedMap = hz.getReplicatedMap("configurationSetCache");


        try {
            boolean exit = false;
            stdin = new BufferedReader(new InputStreamReader(System.in));
            do {
                System.out.print(
                        "1.replicated map test\n" +
                        "0.exit\n");
                System.out.print("choose: ");
                String line = stdin.readLine();

                switch(Integer.parseInt(line)) {
                    case 1: {
                        runReplicatedMapTest();
                        break;
                    }
                    case 0: {
                        exit = true;
                        break;
                    }
                }
            } while (!exit) ;
            hz.shutdown();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    private void runReplicatedMapTest() {
        Map<Integer,DummyObject> map = new HashMap<>();
        for(int i=0;i < NEW_ENTRIES_COUNT;i++){
            map.put((int) (Math.random() * 100), new DummyObject());
        }
        replicatedMap.clear();
        replicatedMap.putAll(map);
//        for(Map.Entry<Integer,DummyObject> e:map.entrySet()) replicatedMap.put(e.getKey(),e.getValue());

        long start = System.currentTimeMillis();
        do {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ignored) {}
        }while(replicatedMap.size() != NEW_ENTRIES_COUNT);
        long finish = System.currentTimeMillis();
        System.out.format("replicated map (size %d), took %d ms and contains the following values: %s \n", replicatedMap.size(), (finish - start), replicatedMap.values());
    }


    public static void main(String[] args) {
        ReplicatedMapIssue issue = new ReplicatedMapIssue();
        issue.test();
        System.out.println("hazelcast stopped");
        System.exit(0);
    }
}

DummyObject.java :

package test;

import java.io.Serializable;
import java.util.UUID;

public class DummyObject implements Serializable {
    private String a;

    public DummyObject() {
        a = UUID.randomUUID().toString();
    }

    @Override
    public String toString() {
        return a.substring(0,8);
    }
}


Using Java 8 and Hazelcast 3.6.

Steps to reproduce:

  1. Run the main method.
  2. On the menu, choose 1 several times in a row.
    --> After a while, either the update takes more than a minute or the program simply freezes (the map contains only 6 entries and the seventh entry never gets updated).

In 3.5.5 the scenario is much better, but around 20% of the time the seventh entry is also lost.

@sabag sabag changed the title ReplicatedMap updates sometimes takes more than minute ReplicatedMap updates in 3.6 are sometimes lost Feb 28, 2016
@vertex-github

I think your test is flawed: you iterate NEW_ENTRIES_COUNT times, but you will likely not add that many entries to the Map, since the randomly generated keys can collide. Based on tests I just ran, I frequently got a key collision even with only 7 iterations. Check the return value of map.put(): if it is non-null, the key already existed and you have a collision.

I don't mean to imply that there isn't an issue, but the test as it stands doesn't prove a defect. Are you also aware of the consistency model of ReplicatedMap? See http://docs.hazelcast.org/docs/3.6/manual/html-single/index.html#replicated-map
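
For reference, a minimal sketch of the collision check described above, as it could be slotted into the test's runReplicatedMapTest loop (an editor's illustration, not code from the thread; map, NEW_ENTRIES_COUNT and DummyObject are the names from the test posted above). Map.put returns the previous value mapped to the key, so a non-null result signals a duplicate random key:

    Map<Integer, DummyObject> map = new HashMap<>();
    for (int i = 0; i < NEW_ENTRIES_COUNT; i++) {
        // put() returns the previous mapping for this key, or null if the key is new
        DummyObject previous = map.put((int) (Math.random() * 100), new DummyObject());
        if (previous != null) {
            System.out.println("key collision at iteration " + i
                    + "; fewer than " + NEW_ENTRIES_COUNT + " entries will be added");
        }
    }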

@sabag
Author

sabag commented Feb 29, 2016

Hi,
You are correct about the uniqueness. I fixed the test to use random keys up to Integer.MAX_VALUE, and I also added a check for unique keys:

    private void runReplicatedMapTest() {
        Map<Integer,DummyObject> map = new HashMap<>();
        for(int i=0;i < NEW_ENTRIES_COUNT;i++){
            map.put((int) (Math.random() * Integer.MAX_VALUE), new DummyObject());
        }

        // check uniqueness of keys
        if( map.keySet().size() != NEW_ENTRIES_COUNT ){
            System.out.println("keys are not unique, skipping..");
            return;
        }

        replicatedMap.clear();
        replicatedMap.putAll(map);
 ...
 ...

But still, when running a 3-node cluster (locally), it takes about 18 seconds on average for the cluster to finish the update. Are these the expected timings for updating a ReplicatedMap?
For comparison, in 3.5.5 it never took more than 1 second.
Please look at the output from running it several times:

replicated map (size 7), took 2009 ms and contains the following values: [08ea6176, baf9225b, 41a25576, a8ebee83, 470529db, 20969b6d, 63980451] 
replicated map (size 7), took 1001 ms and contains the following values: [b72aeaa8, 9f20d358, 5b650d38, 6b1188bd, 5a00514d, caa03cae, 77e2ede1] 
replicated map (size 7), took 5020 ms and contains the following values: [3965f6e1, f04474a2, a9a5c549, 32392473, 1de8b6a8, 80ef32af, ce732c27] 
replicated map (size 7), took 7027 ms and contains the following values: [e2950a79, 99f9bf39, 528a3584, 20092363, 0a800126, 4c3a6b20, f30b7ad5] 
replicated map (size 7), took 16061 ms and contains the following values: [97f3079b, def5915b, 7ba3a9db, 2d73ac31, 6943b57f, 05280418, fe8d0bdd] 
replicated map (size 7), took 17076 ms and contains the following values: [552dd2cb, 43da15c7, 9e82a40c, 7ee63201, c26b0bb2, 788dd421, 8207a026] 
replicated map (size 7), took 19069 ms and contains the following values: [3a4fe678, 7fc1f9c1, 99c56b00, 42e9c2d3, 5245494e, e1ec4b23, 86f5edbe] 
replicated map (size 7), took 22083 ms and contains the following values: [e709eeee, 0b89d542, 6f49bf6f, 778b9d36, 450f2696, 3fca5b67, 3fbb8abe] 
replicated map (size 7), took 27108 ms and contains the following values: [8463d3bc, 8ba8cefb, 7cb38a42, 7ab6fbca, ff41d832, b2278841, bf9bd5be] 
replicated map (size 7), took 28076 ms and contains the following values: [1299319b, dcbb605b, 506349cd, 4fc64f4f, fc6e7552, c8420c8a, a1c9a9ae] 
replicated map (size 7), took 28095 ms and contains the following values: [22622ceb, ea7bcba5, 2ef6efef, cffbbbf0, aa80cdf1, 5965a2c9, 93c35006] 
replicated map (size 7), took 28100 ms and contains the following values: [3a4fe678, 7fc1f9c1, 99c56b00, 42e9c2d3, 5245494e, e1ec4b23, 86f5edbe] 
replicated map (size 7), took 30128 ms and contains the following values: [08ea6176, baf9225b, 41a25576, a8ebee83, 470529db, 20969b6d, 63980451] 
replicated map (size 7), took 22116 ms and contains the following values: [efa33f9e, 83f6e4ba, 5d186404, afdb8e51, b81b34ca, 50905773, 74a4764b] 

@sabag sabag changed the title ReplicatedMap updates in 3.6 are sometimes lost ReplicatedMap updates in 3.6 takes very long time Feb 29, 2016
@eminn
Contributor

eminn commented Feb 29, 2016

It looks like an issue with the putAll implementation of ReplicatedMap. I managed to reproduce it with the help of your test. The fix is on the way.

As a workaround until the fix is available, you can use put instead of putAll.

Thank you for reporting the issue.
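
To make the workaround concrete, here is an editor's sketch of how the reporter's runReplicatedMapTest could apply it (essentially the loop already commented out in the original test), replacing the bulk putAll with per-entry put calls:

    replicatedMap.clear();
    // workaround for the 3.6 putAll issue: replicate entries one at a time
    for (Map.Entry<Integer, DummyObject> e : map.entrySet()) {
        replicatedMap.put(e.getKey(), e.getValue());
    }
    // replicatedMap.putAll(map);  // affected by the bug until the fix ships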

@sabag
Author

sabag commented Feb 29, 2016

No, thank you!!
:-)

We rolled back to 3.5.5, but in your opinion, if we replace putAll with put, are we safe with 3.6, or do you think it's better to stay on 3.5.5?

@eminn
Contributor

eminn commented Feb 29, 2016

I think it is better to update to 3.6, because we've made a lot of improvements to ReplicatedMap.

@eminn
Contributor

eminn commented Feb 29, 2016

The fix will be available in 3.6.2 and 3.7.

@sabag
Author

sabag commented Feb 29, 2016

Thank you so much

@vertex-github

@eminn I just took a look at the PR (https://github.com/hazelcast/hazelcast/pull/7626/files). What was causing the delay? That equality check seems innocuous enough; are there DNS lookups taking place?

@eminn
Contributor

eminn commented Feb 29, 2016

@vertex-github That check was causing write operations not to be replicated to the callers. There is an anti-entropy system that checks whether the data owners are in sync with the replicas and, if not, syncs them every 30 seconds. So when updates are not replicated to the callers, it can take up to 30 seconds for all the members to get back in sync.
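
To illustrate the timing (an editor's simplified sketch, not Hazelcast's actual implementation; replicaIsInSyncWithOwner and syncReplicaFromOwner are hypothetical helpers): with a periodic repair task like the one below, a write that misses its immediate replication is only repaired at the next scheduled pass, so convergence can take anywhere up to the full 30-second interval, which matches the delays in the output above.

    // uses java.util.concurrent.{ScheduledExecutorService, Executors, TimeUnit}
    ScheduledExecutorService antiEntropy = Executors.newSingleThreadScheduledExecutor();
    antiEntropy.scheduleAtFixedRate(() -> {
        // hypothetical check: compare this replica's state against the data owner
        if (!replicaIsInSyncWithOwner()) {
            // hypothetical repair: pull the missing entries from the owner
            syncReplicaFromOwner();
        }
    }, 30, 30, TimeUnit.SECONDS);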

@vertex-github

Thanks. Is that 30s parameter configurable?

@eminn
Contributor

eminn commented Feb 29, 2016

No, that's internal right now, but we can make it configurable in a future version.
