
ReplicatedMap updates in 3.6 takes very long time #7617

Closed
sabag opened this issue Feb 28, 2016 · 11 comments · Fixed by #7625

sabag commented Feb 28, 2016

In version 3.6 there is, IMO, a massive regression in ReplicatedMap updates:
sometimes an update loses some of the entries.
Here is a test I prepared that demonstrates the problem:

hazelcast.xml :

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config https://hazelcast.com/schema/config/hazelcast-config-3.6.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>mygroup</name>
        <password>admin</password>
    </group>
    <management-center enabled="false">http://localhost:8080/mancenter</management-center>
    <instance-name>BOTH_LAYERS</instance-name>
    <network>
        <port auto-increment="true">5711</port>
        <join>
            <multicast enabled="true">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
                <multicast-timeout-seconds>4</multicast-timeout-seconds>
            </multicast>
            <tcp-ip enabled="false"/>
            <aws enabled="false"/>
        </join>
    </network>
    <replicatedmap name="configurationSetCache">
        <statistics-enabled>true</statistics-enabled>
    </replicatedmap>
</hazelcast>

ReplicatedMapIssue.java :

package test;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ReplicatedMapIssue implements Serializable{

    public static final int NEW_ENTRIES_COUNT = 7;

    private Map<Integer,DummyObject> replicatedMap;

    private BufferedReader stdin;


    public void test() {

        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        replicatedMap = hz.getReplicatedMap("configurationSetCache");


        try {
            boolean exit = false;
            stdin = new BufferedReader(new InputStreamReader(System.in));
            do {
                System.out.print(
                        "1.replicated map test\n" +
                        "0.exit\n");
                System.out.print("choose: ");
                String line = stdin.readLine();

                switch(Integer.parseInt(line)) {
                    case 1: {
                        runReplicatedMapTest();
                        break;
                    }
                    case 0: {
                        exit = true;
                        break;
                    }
                }
            } while (!exit) ;
            hz.shutdown();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    private void runReplicatedMapTest() {
        Map<Integer,DummyObject> map = new HashMap<>();
        for(int i=0;i < NEW_ENTRIES_COUNT;i++){
            map.put((int) (Math.random() * 100), new DummyObject());
        }
        replicatedMap.clear();
        replicatedMap.putAll(map);
//        for(Map.Entry<Integer,DummyObject> e:map.entrySet()) replicatedMap.put(e.getKey(),e.getValue());

        long start = System.currentTimeMillis();
        do {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ignored) {}
        }while(replicatedMap.size() != NEW_ENTRIES_COUNT);
        long finish = System.currentTimeMillis();
        System.out.format("replicated map (size %d), took %d ms and contains the following values: %s \n", replicatedMap.size(), (finish - start), replicatedMap.values());
    }


    public static void main(String[] args) {
        ReplicatedMapIssue issue = new ReplicatedMapIssue();
        issue.test();
        System.out.println("hazelcast stopped");
        System.exit(0);
    }
}

DummyObject.java :

package test;

import java.io.Serializable;
import java.util.UUID;

public class DummyObject implements Serializable {
    private String a;

    public DummyObject() {
        a = UUID.randomUUID().toString();
    }

    @Override
    public String toString() {
        return a.substring(0,8);
    }
}


Using Java 8 and Hazelcast 3.6.

Steps to reproduce:

  1. Run the main method.
  2. On the menu, choose 1 several times in a row.
    --> After a while, either the update takes more than a minute or the program simply freezes (the map contains only 6 entries and the seventh entry never gets updated).

In 3.5.5 the scenario is much better, but around 20% of the time the seventh entry is also lost.

@sabag sabag changed the title ReplicatedMap updates sometimes takes more than minute ReplicatedMap updates in 3.6 are sometimes lost Feb 28, 2016
@vertex-github

I think your test is flawed: you iterate NEW_ENTRIES_COUNT times, but you will likely not add that many entries to the Map, since the randomly generated keys can collide. Based on tests I just ran, I frequently got a key collision even with only 7 iterations. Check the return value of map.put(): if it is non-null, the key already existed and you have a collision.

I don't mean to imply that there isn't an issue, but the test as it stands doesn't prove a defect. Are you also aware of the consistency model of ReplicatedMap? See http://docs.hazelcast.org/docs/3.6/manual/html-single/index.html#replicated-map
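
For reference, a minimal sketch of the collision check described above, as it could be slotted into the test's runReplicatedMapTest loop (an editor's illustration, not code from the thread; map, NEW_ENTRIES_COUNT and DummyObject are the names from the test posted above). Map.put returns the previous value mapped to the key, so a non-null result signals a duplicate random key:

    Map<Integer, DummyObject> map = new HashMap<>();
    for (int i = 0; i < NEW_ENTRIES_COUNT; i++) {
        // put() returns the previous mapping for this key, or null if the key is new
        DummyObject previous = map.put((int) (Math.random() * 100), new DummyObject());
        if (previous != null) {
            System.out.println("key collision at iteration " + i
                    + "; fewer than " + NEW_ENTRIES_COUNT + " entries will be added");
        }
    }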

@sabag
Author

sabag commented Feb 29, 2016

Hi,
You are correct about the uniqueness. I fixed the test to use random keys up to Integer.MAX_VALUE, and I also added a check for unique keys:

    private void runReplicatedMapTest() {
        Map<Integer,DummyObject> map = new HashMap<>();
        for(int i=0;i < NEW_ENTRIES_COUNT;i++){
            map.put((int) (Math.random() * Integer.MAX_VALUE), new DummyObject());
        }

        // check uniqueness of keys
        if( map.keySet().size() != NEW_ENTRIES_COUNT ){
            System.out.println("keys are not unique, skipping..");
            return;
        }

        replicatedMap.clear();
        replicatedMap.putAll(map);
 ...
 ...

But still, when running a 3-node cluster (locally), it takes about 18 seconds on average for the cluster to finish the update. Are these the expected timings for updating a ReplicatedMap?
For comparison, in 3.5.5 it never took more than 1 second.
Please look at the output from running it several times:

replicated map (size 7), took 2009 ms and contains the following values: [08ea6176, baf9225b, 41a25576, a8ebee83, 470529db, 20969b6d, 63980451] 
replicated map (size 7), took 1001 ms and contains the following values: [b72aeaa8, 9f20d358, 5b650d38, 6b1188bd, 5a00514d, caa03cae, 77e2ede1] 
replicated map (size 7), took 5020 ms and contains the following values: [3965f6e1, f04474a2, a9a5c549, 32392473, 1de8b6a8, 80ef32af, ce732c27] 
replicated map (size 7), took 7027 ms and contains the following values: [e2950a79, 99f9bf39, 528a3584, 20092363, 0a800126, 4c3a6b20, f30b7ad5] 
replicated map (size 7), took 16061 ms and contains the following values: [97f3079b, def5915b, 7ba3a9db, 2d73ac31, 6943b57f, 05280418, fe8d0bdd] 
replicated map (size 7), took 17076 ms and contains the following values: [552dd2cb, 43da15c7, 9e82a40c, 7ee63201, c26b0bb2, 788dd421, 8207a026] 
replicated map (size 7), took 19069 ms and contains the following values: [3a4fe678, 7fc1f9c1, 99c56b00, 42e9c2d3, 5245494e, e1ec4b23, 86f5edbe] 
replicated map (size 7), took 22083 ms and contains the following values: [e709eeee, 0b89d542, 6f49bf6f, 778b9d36, 450f2696, 3fca5b67, 3fbb8abe] 
replicated map (size 7), took 27108 ms and contains the following values: [8463d3bc, 8ba8cefb, 7cb38a42, 7ab6fbca, ff41d832, b2278841, bf9bd5be] 
replicated map (size 7), took 28076 ms and contains the following values: [1299319b, dcbb605b, 506349cd, 4fc64f4f, fc6e7552, c8420c8a, a1c9a9ae] 
replicated map (size 7), took 28095 ms and contains the following values: [22622ceb, ea7bcba5, 2ef6efef, cffbbbf0, aa80cdf1, 5965a2c9, 93c35006] 
replicated map (size 7), took 28100 ms and contains the following values: [3a4fe678, 7fc1f9c1, 99c56b00, 42e9c2d3, 5245494e, e1ec4b23, 86f5edbe] 
replicated map (size 7), took 30128 ms and contains the following values: [08ea6176, baf9225b, 41a25576, a8ebee83, 470529db, 20969b6d, 63980451] 
replicated map (size 7), took 22116 ms and contains the following values: [efa33f9e, 83f6e4ba, 5d186404, afdb8e51, b81b34ca, 50905773, 74a4764b] 

@sabag sabag changed the title ReplicatedMap updates in 3.6 are sometimes lost ReplicatedMap updates in 3.6 takes very long time Feb 29, 2016
@eminn
Contributor

eminn commented Feb 29, 2016

It looks like an issue with the putAll implementation of ReplicatedMap. I managed to reproduce it with the help of your test. The fix is on the way.

As a workaround until the fix is available, you can use put instead of putAll.

Thank you for reporting the issue.
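
To make the workaround concrete, here is an editor's sketch of how the reporter's runReplicatedMapTest could apply it (essentially the loop already commented out in the original test), replacing the bulk putAll with per-entry put calls:

    replicatedMap.clear();
    // workaround for the 3.6 putAll issue: replicate entries one at a time
    for (Map.Entry<Integer, DummyObject> e : map.entrySet()) {
        replicatedMap.put(e.getKey(), e.getValue());
    }
    // replicatedMap.putAll(map);  // affected by the bug until the fix ships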

@sabag
Author

sabag commented Feb 29, 2016

No, thank you!!
:-)

We rolled back to 3.5.5, but in your opinion, if we replace putAll with put, are we safe with 3.6, or do you think it's better to stay on 3.5.5?

@eminn
Contributor

eminn commented Feb 29, 2016

I think it is better to update to 3.6, because we've made a lot of improvements to ReplicatedMap.

@eminn
Contributor

eminn commented Feb 29, 2016

The fix will be available in 3.6.2 and 3.7.

@sabag
Author

sabag commented Feb 29, 2016

Thank you so much

@vertex-github

@eminn I just took a look at the PR (https://github.com/hazelcast/hazelcast/pull/7626/files). What was causing the delay? That equality check seems innocuous enough; are there DNS lookups taking place?

@eminn
Contributor

eminn commented Feb 29, 2016

@vertex-github That check was causing write operations not to be replicated to the callers. There is an anti-entropy system that checks whether the data owners are in sync with the replicas and, if not, syncs them every 30 seconds. So when updates are not replicated to the callers, it can take up to 30 seconds for all the members to get back in sync.
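
To illustrate the timing (an editor's simplified sketch, not Hazelcast's actual implementation; replicaIsInSyncWithOwner and syncReplicaFromOwner are hypothetical helpers): with a periodic repair task like the one below, a write that misses its immediate replication is only repaired at the next scheduled pass, so convergence can take anywhere up to the full 30-second interval, which matches the delays in the output above.

    // uses java.util.concurrent.{ScheduledExecutorService, Executors, TimeUnit}
    ScheduledExecutorService antiEntropy = Executors.newSingleThreadScheduledExecutor();
    antiEntropy.scheduleAtFixedRate(() -> {
        // hypothetical check: compare this replica's state against the data owner
        if (!replicaIsInSyncWithOwner()) {
            // hypothetical repair: pull the missing entries from the owner
            syncReplicaFromOwner();
        }
    }, 30, 30, TimeUnit.SECONDS);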

@vertex-github

Thanks. Is that 30s parameter configurable?

@eminn
Contributor

eminn commented Feb 29, 2016

No, that's internal right now, but we can make it configurable in a future version.
