
Lost data if a node is killed during data repartition #5444

Closed
elmor opened this issue Jun 3, 2015 · 37 comments

@elmor commented Jun 3, 2015

Hi,

I am evaluating Hazelcast 3.4.2 against Coherence and I ran into strange behavior with Hazelcast.

I have a cluster of two nodes (I will name them node1 and node2) and two clients.

I have a map named “toto” (I attached the Hazelcast configuration file server.xml).

  1. I inserted 100000 entries into the map => all is OK.

  2. I kill node2. All is OK; I find the 100000 entries on node1.

  3. I restart node2. While node2 is synchronizing with node1, I kill node2 again.

  4. Node1 only has 51919 elements; I lost 48081 entries. (The problem seems to be due to the partitioning.)

How can I avoid losing elements if a node is killed during synchronization?

I can see the following errors in the log file of my server:

Members [1] {

        Member [192.88.65.124]:5701 this

}

<Jun 2, 2015 3:03:43 PM CEST> <com.hazelcast.partition.InternalPartitionService> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.migration> <> <> <> <1433250223501> <[192.88.65.124]:5701 [dev] [3.4.2] Partition balance is ok, no need to re-partition cluster data... >

<Jun 2, 2015 3:05:14 PM CEST> <dcaclx01.dns21.socgen> <weblogic.GCMonitor> <> <> <> <1433250314135> <79% of the total memory in the server is free.>

<Jun 2, 2015 3:05:41 PM CEST> <com.hazelcast.nio.tcp.SocketAcceptor> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.IO.thread-Acceptor> <> <> <> <1433250341255> <[192.88.65.124]:5701 [dev] [3.4.2] Accepting socket connection from /192.88.65.123:2247>

<Jun 2, 2015 3:05:41 PM CEST> <com.hazelcast.nio.tcp.TcpIpConnectionManager> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250341267> <[192.88.65.124]:5701 [dev] [3.4.2] Established socket connection between /192.88.65.124:5701 and dcaclx02.dns21-3.socgen/192.88.65.123:2247>

<Jun 2, 2015 3:05:47 PM CEST> <com.hazelcast.cluster.ClusterService> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.generic-operation.thread-1> <> <> <> <1433250347267> <[192.88.65.124]:5701 [dev] [3.4.2]

Members [2] {

        Member [192.88.65.124]:5701 this

        Member [192.88.65.123]:5701

}

<Jun 2, 2015 3:05:47 PM CEST> <com.hazelcast.partition.InternalPartitionService> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.migration> <> <> <> <1433250347582> <[192.88.65.124]:5701 [dev] [3.4.2] Re-partitioning cluster data... Migration queue size: 135>

<Jun 2, 2015 3:05:48 PM CEST> <com.hazelcast.nio.tcp.TcpIpConnection> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.IO.thread-in-2> <> <> <> <1433250348783> <[192.88.65.124]:5701 [dev] [3.4.2] Connection [Address[192.88.65.123]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]>

<Jun 2, 2015 3:05:48 PM CEST> <com.hazelcast.nio.tcp.ReadHandler> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.IO.thread-in-2> <> <> <> <1433250348786> <[192.88.65.124]:5701 [dev] [3.4.2] hz._hzInstance_1_dev.IO.thread-in-2 Closing socket to endpoint Address[192.88.65.123]:5701, Cause:java.io.EOFException: Remote socket closed!>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.partition.InternalPartitionService> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.migration> <> <> <> <1433250349207> <[192.88.65.124]:5701 [dev] [3.4.2] All migration tasks have been completed, queues are empty.>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-5> <> <> <> <1433250349490> <[192.88.65.124]:5701 [dev] [3.4.2] Connecting to /192.88.65.123:5701, timeout: 0, bind-any: true>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-5> <> <> <> <1433250349511> <[192.88.65.124]:5701 [dev] [3.4.2] Could not connect to: /192.88.65.123:5701. Reason: SocketException[Connection refused to address /192.88.65.123:5701]>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250349716> <[192.88.65.124]:5701 [dev] [3.4.2] Connecting to /192.88.65.123:5701, timeout: 0, bind-any: true>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250349717> <[192.88.65.124]:5701 [dev] [3.4.2] Could not connect to: /192.88.65.123:5701. Reason: SocketException[Connection refused to address /192.88.65.123:5701]>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250349717> <[192.88.65.124]:5701 [dev] [3.4.2] Connecting to /192.88.65.123:5701, timeout: 0, bind-any: true>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250349719> <[192.88.65.124]:5701 [dev] [3.4.2] Could not connect to: /192.88.65.123:5701. Reason: SocketException[Connection refused to address /192.88.65.123:5701]>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250349719> <[192.88.65.124]:5701 [dev] [3.4.2] Connecting to /192.88.65.123:5701, timeout: 0, bind-any: true>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-4> <> <> <> <1433250349725> <[192.88.65.124]:5701 [dev] [3.4.2] Could not connect to: /192.88.65.123:5701. Reason: SocketException[Connection refused to address /192.88.65.123:5701]>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-1> <> <> <> <1433250349727> <[192.88.65.124]:5701 [dev] [3.4.2] Connecting to /192.88.65.123:5701, timeout: 0, bind-any: true>

<Jun 2, 2015 3:05:49 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-1> <> <> <> <1433250349728> <[192.88.65.124]:5701 [dev] [3.4.2] Could not connect to: /192.88.65.123:5701. Reason: SocketException[Connection refused to address /192.88.65.123:5701]>

<Jun 2, 2015 3:05:50 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-1> <> <> <> <1433250350491> <[192.88.65.124]:5701 [dev] [3.4.2] Connecting to /192.88.65.123:5701, timeout: 0, bind-any: true>

<Jun 2, 2015 3:05:50 PM CEST> <com.hazelcast.nio.tcp.SocketConnector> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-1> <> <> <> <1433250350493> <[192.88.65.124]:5701 [dev] [3.4.2] Could not connect to: /192.88.65.123:5701. Reason: SocketException[Connection refused to address /192.88.65.123:5701]>

<Jun 2, 2015 3:05:50 PM CEST> <com.hazelcast.nio.tcp.TcpIpConnectionMonitor> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-1> <> <> <> <1433250350495> <[192.88.65.124]:5701 [dev] [3.4.2] Removing connection to endpoint Address[192.88.65.123]:5701 Cause => java.net.SocketException {Connection refused to address /192.88.65.123:5701}, Error-Count: 5>

<Jun 2, 2015 3:05:50 PM CEST> <com.hazelcast.cluster.ClusterService> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-3> <> <> <> <1433250350495> <[192.88.65.124]:5701 [dev] [3.4.2] Removing Member [192.88.65.123]:5701>

<Jun 2, 2015 3:05:50 PM CEST> <com.hazelcast.cluster.ClusterService> <dcaclx01.dns21.socgen> <hz._hzInstance_1_dev.cached.thread-3> <> <> <> <1433250350503> <[192.88.65.124]:5701 [dev] [3.4.2]

Thanks in advance for your help,

PS: the hazelcast.xml file is:

<hazelcast>
    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
    <management-center>http://dcaclx01.dns21.socgen:8080/mancenter</management-center>

<properties>
    <property name="hazelcast.jmx">true</property>
    <property name="hazelcast.jmx.detailed">true</property>     
</properties>
<network>
    <port auto-increment="true" port-count="100">5701</port>
    <outbound-ports>
        <!-- Allowed port range when connecting to other nodes. 0 or * means use 
            system provided port. -->
        <ports>0</ports>
    </outbound-ports>
    <join>
        <multicast enabled="false">
            <multicast-group>224.2.2.3</multicast-group>
            <multicast-port>54327</multicast-port>
        </multicast>
        <tcp-ip enabled="true">
            <interface>192.88.65.123</interface>
            <interface>192.88.65.124</interface>
        </tcp-ip>
        <aws enabled="false">
            <access-key>my-access-key</access-key>
            <secret-key>my-secret-key</secret-key>
            <!--optional, default is us-east-1 -->
            <region>us-west-1</region>
            <!--optional, default is ec2.amazonaws.com. If set, region shouldn't 
                be set as it will override this property -->
            <host-header>ec2.amazonaws.com</host-header>
            <!-- optional, only instances belonging to this group will be discovered, 
                default will try all running instances -->
            <security-group-name>hazelcast-sg</security-group-name>
            <tag-key>type</tag-key>
            <tag-value>hz-nodes</tag-value>
        </aws>
    </join>
    <interfaces enabled="false">
        <interface>10.10.1.*</interface>
    </interfaces>
    <ssl enabled="false" />
    <socket-interceptor enabled="false" />
    <symmetric-encryption enabled="false">
        <!-- encryption algorithm such as DES/ECB/PKCS5Padding, PBEWithMD5AndDES, 
            AES/CBC/PKCS5Padding, Blowfish, DESede -->
        <algorithm>PBEWithMD5AndDES</algorithm>
        <!-- salt value to use when generating the secret key -->
        <salt>thesalt</salt>
        <!-- pass phrase to use when generating the secret key -->
        <password>thepass</password>
        <!-- iteration count to use when generating the secret key -->
        <iteration-count>19</iteration-count>
    </symmetric-encryption>
</network>


<partition-group enabled="false" />
<executor-service name="default">
    <pool-size>16</pool-size>
    <!--Queue capacity. 0 means Integer.MAX_VALUE. -->
    <queue-capacity>0</queue-capacity>
</executor-service>
<queue name="default">
    <!-- Maximum size of the queue. When a JVM's local queue size reaches the 
        maximum, all put/offer operations will get blocked until the queue size of 
        the JVM goes down below the maximum. Any integer between 0 and Integer.MAX_VALUE. 
        0 means Integer.MAX_VALUE. Default is 0. -->
    <max-size>0</max-size>
    <!-- Number of backups. If 1 is set as the backup-count for example, then 
        all entries of the map will be copied to another JVM for fail-safety. 0 means 
        no backup. -->
    <backup-count>1</backup-count>

    <!-- Number of async backups. 0 means no backup. -->
    <async-backup-count>0</async-backup-count>

    <empty-queue-ttl>-1</empty-queue-ttl>
</queue>

<map name="toto">
    <!-- Data type that will be used for storing recordMap. Possible values: 
        BINARY (default): keys and values will be stored as binary data OBJECT : 
        values will be stored in their object forms NATIVE : values will be stored 
        in non-heap region of JVM -->
    <near-cache>
        <!-- Maximum size of the near cache. When max size is reached, cache is 
            evicted based on the policy defined. Any integer between 0 and Integer.MAX_VALUE. 
            0 means Integer.MAX_VALUE. Default is 0. -->
        <max-size>5000</max-size>

        <!-- Maximum number of seconds for each entry to stay in the near cache. 
            Entries that are older than <time-to-live-seconds> will get automatically 
            evicted from the near cache. Any integer between 0 and Integer.MAX_VALUE. 
            0 means infinite. Default is 0. -->
        <time-to-live-seconds>0</time-to-live-seconds>

        <!-- Maximum number of seconds each entry can stay in the near cache as 
            untouched (not-read). Entries that are not read (touched) more than <max-idle-seconds> 
            value will get removed from the near cache. Any integer between 0 and Integer.MAX_VALUE. 
            0 means Integer.MAX_VALUE. Default is 0. -->
        <max-idle-seconds>60</max-idle-seconds>

        <!-- Valid values are: NONE (no extra eviction, <time-to-live-seconds> 
            may still apply), LRU (Least Recently Used), LFU (Least Frequently Used). 
            NONE is the default. Regardless of the eviction policy used, <time-to-live-seconds> 
            will still apply. -->
        <eviction-policy>LRU</eviction-policy>

        <!-- Should the cached entries get evicted if the entries are changed 
            (updated or removed). true of false. Default is true. -->
        <invalidate-on-change>true</invalidate-on-change>

        <!-- You may want also local entries to be cached. This is useful when 
            in memory format for near cache is different than the map's one. By default 
            it is disabled. -->
        <cache-local-entries>false</cache-local-entries>
    </near-cache>


    <in-memory-format>BINARY</in-memory-format>


    <!-- Number of backups. If 1 is set as the backup-count for example, then 
        all entries of the map will be copied to another JVM for fail-safety. 0 means 
        no backup. -->
    <backup-count>1</backup-count>
    <!-- Number of async backups. 0 means no backup. -->
    <async-backup-count>0</async-backup-count>
    <!-- Maximum number of seconds for each entry to stay in the map. Entries 
        that are older than <time-to-live-seconds> and not updated for <time-to-live-seconds> 
        will get automatically evicted from the map. Any integer between 0 and Integer.MAX_VALUE. 
        0 means infinite. Default is 0. -->
    <time-to-live-seconds>0</time-to-live-seconds>
    <!-- Maximum number of seconds for each entry to stay idle in the map. 
        Entries that are idle(not touched) for more than <max-idle-seconds> will 
        get automatically evicted from the map. Entry is touched if get, put or containsKey 
        is called. Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. 
        Default is 0. -->
    <max-idle-seconds>0</max-idle-seconds>
    <!-- Valid values are: NONE (no eviction), LRU (Least Recently Used), LFU 
        (Least Frequently Used). NONE is the default. -->
    <eviction-policy>LRU</eviction-policy>
    <!-- Maximum size of the map. When max size is reached, map is evicted 
        based on the policy defined. Any integer between 0 and Integer.MAX_VALUE. 
        0 means Integer.MAX_VALUE. Default is 0. -->
    <max-size policy="PER_NODE">200000</max-size>
    <!-- When max. size is reached, specified percentage of the map will be 
        evicted. Any integer between 0 and 100. If 25 is set for example, 25% of 
        the entries will get evicted. -->
    <eviction-percentage>25</eviction-percentage>
    <!-- Minimum time in milliseconds which should pass before checking if 
        a partition of this map is evictable or not. Default value is 100 millis. -->
    <min-eviction-check-millis>100</min-eviction-check-millis>
    <!-- While recovering from split-brain (network partitioning), map entries 
        in the small cluster will merge into the bigger cluster based on the policy 
        set here. When an entry merge into the cluster, there might an existing entry 
        with the same key already. Values of these entries might be different for 
        that same key. Which value should be set for the key? Conflict is resolved 
        by the policy set here. Default policy is PutIfAbsentMapMergePolicy There 
        are built-in merge policies such as com.hazelcast.map.merge.PassThroughMergePolicy; 
        entry will be overwritten if merging entry exists for the key. com.hazelcast.map.merge.PutIfAbsentMapMergePolicy 
        ; entry will be added if the merging entry doesn't exist in the cluster. 
        com.hazelcast.map.merge.HigherHitsMapMergePolicy ; entry with the higher 
        hits wins. com.hazelcast.map.merge.LatestUpdateMapMergePolicy ; entry with 
        the latest update wins. -->
    <merge-policy>com.hazelcast.map.merge.PutIfAbsentMapMergePolicy
    </merge-policy>

</map>



<map name="default">
    <!-- Data type that will be used for storing recordMap. Possible values: 
        BINARY (default): keys and values will be stored as binary data OBJECT : 
        values will be stored in their object forms NATIVE : values will be stored 
        in non-heap region of JVM -->
    <in-memory-format>BINARY</in-memory-format>

    <!-- Number of backups. If 1 is set as the backup-count for example, then 
        all entries of the map will be copied to another JVM for fail-safety. 0 means 
        no backup. -->
    <backup-count>1</backup-count>
    <!-- Number of async backups. 0 means no backup. -->
    <async-backup-count>0</async-backup-count>
    <!-- Maximum number of seconds for each entry to stay in the map. Entries 
        that are older than <time-to-live-seconds> and not updated for <time-to-live-seconds> 
        will get automatically evicted from the map. Any integer between 0 and Integer.MAX_VALUE. 
        0 means infinite. Default is 0. -->
    <time-to-live-seconds>0</time-to-live-seconds>
    <!-- Maximum number of seconds for each entry to stay idle in the map. 
        Entries that are idle(not touched) for more than <max-idle-seconds> will 
        get automatically evicted from the map. Entry is touched if get, put or containsKey 
        is called. Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. 
        Default is 0. -->
    <max-idle-seconds>0</max-idle-seconds>
    <!-- Valid values are: NONE (no eviction), LRU (Least Recently Used), LFU 
        (Least Frequently Used). NONE is the default. -->
    <eviction-policy>NONE</eviction-policy>
    <!-- Maximum size of the map. When max size is reached, map is evicted 
        based on the policy defined. Any integer between 0 and Integer.MAX_VALUE. 
        0 means Integer.MAX_VALUE. Default is 0. -->
    <max-size policy="PER_NODE">0</max-size>
    <!-- When max. size is reached, specified percentage of the map will be 
        evicted. Any integer between 0 and 100. If 25 is set for example, 25% of 
        the entries will get evicted. -->
    <eviction-percentage>25</eviction-percentage>
    <!-- Minimum time in milliseconds which should pass before checking if 
        a partition of this map is evictable or not. Default value is 100 millis. -->
    <min-eviction-check-millis>100</min-eviction-check-millis>
    <!-- While recovering from split-brain (network partitioning), map entries 
        in the small cluster will merge into the bigger cluster based on the policy 
        set here. When an entry merge into the cluster, there might an existing entry 
        with the same key already. Values of these entries might be different for 
        that same key. Which value should be set for the key? Conflict is resolved 
        by the policy set here. Default policy is PutIfAbsentMapMergePolicy There 
        are built-in merge policies such as com.hazelcast.map.merge.PassThroughMergePolicy; 
        entry will be overwritten if merging entry exists for the key. com.hazelcast.map.merge.PutIfAbsentMapMergePolicy 
        ; entry will be added if the merging entry doesn't exist in the cluster. 
        com.hazelcast.map.merge.HigherHitsMapMergePolicy ; entry with the higher 
        hits wins. com.hazelcast.map.merge.LatestUpdateMapMergePolicy ; entry with 
        the latest update wins. -->
    <merge-policy>com.hazelcast.map.merge.PutIfAbsentMapMergePolicy
    </merge-policy>

</map>

<multimap name="default">
    <backup-count>1</backup-count>
    <value-collection-type>SET</value-collection-type>
</multimap>

<list name="default">
    <backup-count>1</backup-count>
</list>

<set name="default">
    <backup-count>1</backup-count>
</set>

<jobtracker name="default">
    <max-thread-size>0</max-thread-size>
    <!-- Queue size 0 means number of partitions * 2 -->
    <queue-size>0</queue-size>
    <retry-count>0</retry-count>
    <chunk-size>1000</chunk-size>
    <communicate-stats>true</communicate-stats>
    <topology-changed-strategy>CANCEL_RUNNING_OPERATION
    </topology-changed-strategy>
</jobtracker>

<semaphore name="default">
    <initial-permits>0</initial-permits>
    <backup-count>1</backup-count>
    <async-backup-count>0</async-backup-count>
</semaphore>

<serialization>
    <portable-version>0</portable-version>
</serialization>

<services enable-defaults="true" />
</hazelcast>

@metanet metanet added the PENDING label Jun 3, 2015

@metanet (Contributor) commented Jun 3, 2015

Hi @elmor,

Could you please check this comment: #5388 (comment)

@elmor (Author) commented Jun 3, 2015

Hi @metanet,

Thanks for your answer.
Unfortunately I do not agree with you. Having a node go down again while the sync is in progress and losing data is not desirable at all and should be avoided.

If the first node has 100% of the data, we cannot accept losing data because a node goes down during the repartition process. We cannot know when a JVM, a VM, a router, etc. will crash.

Isn't it possible to copy the data into the backup before sending a partition to a node?
The partition backup would be overwritten by the partition sent by the new node. If the new node never sends the data, no data would be lost.

@metanet (Contributor) commented Jun 4, 2015

Hi @elmor,

I understand your point, but what we provide is safety of your data if it already has 1 backup. Your second node is crashing while the cluster is not in a safe state. You won't lose your data on every node failure during repartitioning; it is the moment when ownership of a partition has passed to the second node but its 1st backup is not guaranteed yet. After the second node receives the data of a partition, we remove that partition's data from the first node. It is done this way because it becomes a more problematic case if you have many nodes and your initial node is no longer a backup of the partition.
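
For illustration, here is a minimal sketch (assuming the Hazelcast 3.x PartitionService API) of waiting until the cluster is in a safe state, i.e. every partition has its backups in place, before taking another member down, and then shutting the member down gracefully instead of killing it:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.PartitionService;

    public class SafeShutdownCheck {
        public static void main(String[] args) throws InterruptedException {
            HazelcastInstance instance = Hazelcast.newHazelcastInstance();
            PartitionService partitionService = instance.getPartitionService();

            // Block until all partitions have their backups replicated;
            // only then is it safe to take another member down.
            while (!partitionService.isClusterSafe()) {
                Thread.sleep(1000);
            }

            // Graceful shutdown lets the member hand its partitions over
            // instead of dropping them, unlike a hard kill (kill -9).
            instance.shutdown();
        }
    }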

@elmor (Author) commented Jun 4, 2015

OK, so I need at least 3 nodes and 2 backups to be sure not to lose data?
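
For reference, a hedged sketch of what raising the synchronous backup count of the “toto” map to 2 would look like with the Java config API (the XML equivalent is setting <backup-count>2</backup-count> in the map config):

    import com.hazelcast.config.Config;
    import com.hazelcast.config.MapConfig;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class TwoBackupConfig {
        public static void main(String[] args) {
            Config config = new Config();
            // Keep two synchronous copies of every "toto" entry on other members;
            // at least three members are needed for both backups to actually exist.
            config.addMapConfig(new MapConfig("toto").setBackupCount(2));

            HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
        }
    }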

@fchameau commented Jun 4, 2015

Hi,

  • Could you please confirm the above statement from 'elmor' before we try it for real (will 3 nodes and 2 backups do the trick)?
  • If yes, we believe it would be valuable to state this explicitly in the documentation (if it is not there yet; we did not find such a detail).

Note: we are in the middle of a POC evaluating several distributed cache solutions. Some others do work well on this very same failover use case (the data is removed from node 1 only when node 2 is up and has acknowledged the reception of the full half of the data, with a backup of this data still on node 1, and not partition per partition).

Many thanks in advance for your feedback.

@Verdoso commented Jun 10, 2015

Even if the 3-node rule works, in order to have 3 nodes you have to start with 2, so there you go.

Pretty disappointed by how Hazelcast deals with this, at all levels.

@metanet (Contributor) commented Jun 11, 2015

Hi,

Independent of the number of nodes, a new node joining the cluster takes over some of the partitions from the other nodes (= migration). For a specific partition, when its data is moved from an existing node to the new node, the existing node removes that data to prevent data-inconsistency problems. When the new node receives the data of the migrating partition, it starts sending backups for that partition to the other nodes. The cluster is not in a safe state until this process is completed, and there may be data loss in the specific case I explained in my previous comment: #5388 (comment).
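
For illustration, a minimal sketch (assuming the Hazelcast 3.x MigrationListener API) that logs partition migrations, which makes the window described above visible in your own logs:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.MigrationEvent;
    import com.hazelcast.core.MigrationListener;

    public class MigrationLogger {
        public static void main(String[] args) {
            HazelcastInstance instance = Hazelcast.newHazelcastInstance();

            // Print every partition migration; a partition is vulnerable between
            // migrationStarted and the completion of its new backups.
            instance.getPartitionService().addMigrationListener(new MigrationListener() {
                @Override
                public void migrationStarted(MigrationEvent event) {
                    System.out.println("Migration started: " + event);
                }

                @Override
                public void migrationCompleted(MigrationEvent event) {
                    System.out.println("Migration completed: " + event);
                }

                @Override
                public void migrationFailed(MigrationEvent event) {
                    System.out.println("Migration failed: " + event);
                }
            });
        }
    }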

@Verdoso commented Jun 12, 2015

I understand what the issue is; what I can't comprehend is why this is not considered a bug to be fixed.

It means that no matter what you do, no matter how many nodes or backups you configure, you cannot trust your data to be safe, as one failure when a new node joins the cluster, whatever the cluster size, could compromise the data you already have.

Maybe it's me trying to use Hazelcast in a way it was not meant to be used, but this tiny detail means I have to go back to my customer and tell him we have to restart the selection process for a data-grid solution. It makes me sad.

@pveentjer (Member) commented Jun 13, 2015

IMHO this is a bug and needs to be fixed ASAP. IMHO we should move this bug fix to 3.5.1.

@pveentjer pveentjer added this to the 3.6 milestone Jun 13, 2015

@pveentjer pveentjer modified the milestones: 3.5.1, 3.6 Jun 13, 2015

@wenerme commented Jun 14, 2015

+1

@bwzhang2011 commented Jun 15, 2015

@pveentjer, I hope this can be resolved ASAP. Looking forward to it.

@dfex55 commented Jun 18, 2015

+1
Maybe the already existing backups could be used by the new owner, so repartitioning could be finished/acknowledged in a shorter time.
After that, the locations of the backups could be reorganized by the new owning node.

@lukasblu (Contributor) commented Jun 18, 2015

+1

Adding an additional node should improve reliability and performance - not introduce a potential risk of losing data...

If adding a node takes longer, because of additional synchronization, then I can live with that, but losing data is certainly not an option.

A scenario where a new node crashes while joining a cluster is actually quite realistic. The new node might have too little memory, or some firewall setup might be wrong and the node needs to be killed, etc.

@burakcelebi burakcelebi modified the milestones: 3.6, 3.5.1 Jun 26, 2015

@mnaspreda commented Jun 29, 2015

Hi Hazelcast team,

We have faced the same issue on 2.6.x. Will this fix also be applied to the 2.6 branch?

Thanks in advance!

@Fabryprog (Contributor) commented Jul 20, 2015

+1

@bwzhang2011 commented Oct 30, 2015

@gurbuzali, any update on this issue?

@DDani commented Nov 9, 2015

+1!

@nick7ikin commented Nov 10, 2015

This is a critical issue that produces lots of inconsistent states.

@jerrinot (Contributor) commented Nov 10, 2015

Hello all,

Thank you for your feedback; it's much appreciated (for real!) and it helps us prioritize our work. We are now in a feature freeze for the 3.6 release. We are going to address this issue in Hazelcast 3.7.

@jerrinot jerrinot modified the milestones: 3.7, Backlog Nov 10, 2015

@dolomite52 commented Nov 10, 2015

-1. This is not a feature, this is a horrible bug that is so problematic it should be fixed immediately. Why would you let this fester until March 2016?

@ChristerF commented Nov 12, 2015

Agree on the -1. I do hope that no data loss is the #1 feature for any Hazelcast release, and I do hope that is the case. @gregrluck, wouldn't you agree with that?

@enesakar (Member) commented Nov 12, 2015

March is the worst case. We will try to backport the fix to the 3.6.1 or 3.6.2 patch releases if we can find a solution that does not have major side effects or compatibility problems.
We have some solution ideas that are expected to work in theory. Partitioning is probably the most critical component of the system; we cannot quickly change the partitioning system without extensive testing.

@bwzhang2011 commented Dec 6, 2015

@pveentjer, any update on this issue? Is there a better solution for data loss, not only for the case where the node under repartitioning is killed?

@gregrluck (Contributor) commented Dec 7, 2015

We are just finishing 3.6; then we will start on 3.7, including a fix for this issue.

@bwzhang2011 commented Dec 8, 2015

@gregrluck, maybe it would be better to add a chapter to the documentation about the data-loss risk mentioned above, since we sometimes use kill -9 for node shutdown, which carries a risk of data loss. On another note, I hope the solution can be backported to most of the versions released since 3.5.

sancar added a commit that referenced this issue Dec 30, 2015

Ignoring MigrationAwareTest because it is a known issue
This test needs to be reopened when issue #5444 is addressed and
fixed.
Related test failure and logs can be found here #5393

sancar added a commit that referenced this issue Dec 30, 2015

Ignoring MigrationAwareTest because it is a known issue
This test needs to be reopened when issue #5444 is addressed and
fixed.
Related test failure and logs can be found here #5393
@gawlikt commented Jan 13, 2016

Is there any suggested workaround for this issue until a fix is provided?
I'm using Hazelcast on two machines, each machine running three apps, and each app adds one member to the Hazelcast cluster, so I have six members in my cluster.

Occasionally I need to restart two apps (one per machine), and as a result I always lose some data stored in IMaps.

As a workaround I was thinking about increasing the number of backups for the two maps.
I have these beans defined:

    @Bean(destroyMethod = "shutdown")
    public HazelcastInstance hazelcastInstance() {
        Config config = new Config("hazelcastInstance");
        config.getGroupConfig().setName(System.getProperty("hazelcast.group"))
                .setPassword(System.getProperty("hazelcast.password"));
        config.addMapConfig(new com.hazelcast.config.MapConfig(CacheElement.MapName1).setBackupCount(5));
        config.addMapConfig(new com.hazelcast.config.MapConfig(CacheElement.MapName2).setBackupCount(5));

        hazelcastInstance = HazelcastInstanceFactory.newHazelcastInstance(config);

        return hazelcastInstance;
    }

I was testing this in Eclipse on my laptop with 4 apps, so 4 members on one machine.
When I kill 2 instances I lose my data again.

So can anyone suggest a proper workaround for this issue?

@jerrinot (Contributor) commented Jan 13, 2016

@gawlikt: The only reliable workaround is to restart members one-by-one.
We are working on a proper fix.
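
For illustration, a hedged sketch of such a one-by-one rolling restart, assuming the Hazelcast 3.x PartitionService API; restartMember is a hypothetical placeholder for however you bring a member back:

    import java.util.List;

    import com.hazelcast.core.HazelcastInstance;

    public class RollingRestart {

        // Restart the given members one at a time, waiting for the cluster to be
        // safe (all backups replicated) before touching the next member.
        public static void rollingRestart(List<HazelcastInstance> members) throws InterruptedException {
            for (HazelcastInstance member : members) {
                // Wait until every partition has its backups in place.
                while (!member.getPartitionService().isClusterSafe()) {
                    Thread.sleep(1000);
                }
                restartMember(member);
            }
        }

        // Hypothetical placeholder: a real implementation would gracefully shut
        // down the member and start a replacement process or JVM.
        private static void restartMember(HazelcastInstance member) {
            member.shutdown();
            // ... start a new member here (deployment-specific) ...
        }
    }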

@channingwalton commented Feb 12, 2016

Hi @enesakar, you mentioned above that March is the worst case for a fix; is that still possible?

@channingwalton commented Mar 1, 2016

Hi, I hate to keep asking, but we are doing some planning around our use of Hazelcast that critically depends on this issue being fixed.
Do you have any idea at all when this is likely to be resolved?

@gregrluck (Contributor) commented Mar 1, 2016

We realise how critical this issue is.

This issue is being worked on by a team of core engineers at Hazelcast. They are about a month in. We anticipate having a fix in trunk within the next month. We plan to do a tagged master release so that you and others on this thread can test that we have actually fixed the issue. This way we can get it out to you at the earliest possible time for testing.

@channingwalton commented Mar 1, 2016

Thanks for the update @gregrluck, it's much appreciated.

@nick7ikin commented Mar 24, 2016

As a temporary workaround, consider switching to client-server mode. One node will then contain all the data in a consistent state and must always stay up. The other nodes connect as clients and work, over the network, with the data stored on the server (much as with MySQL). In a local-network environment this approach has quite good performance.
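
For illustration, a minimal sketch (assuming the Hazelcast 3.x Java client API; the address below is a placeholder) of a client connecting to such a single data-holding server member:

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.client.config.ClientConfig;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class ClientModeExample {
        public static void main(String[] args) {
            ClientConfig clientConfig = new ClientConfig();
            // Placeholder address: point the client at the single server member
            // that holds all the data.
            clientConfig.getNetworkConfig().addAddress("192.88.65.124:5701");

            HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

            // All reads and writes go over the network to the server member.
            IMap<String, String> toto = client.getMap("toto");
            toto.put("key", "value");
        }
    }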

@channingwalton commented Mar 24, 2016

Thanks @nick7ikin

@burakcelebi (Member) commented Apr 7, 2016

Our engineering team has finished their “Solution Design” for the issue.

Your input is always appreciated. You can review and comment on our design here:

Avoid Data Loss on Migration - Solution Design
PR: #7911

Thanks in advance for any feedback you can provide!

@mdogan mdogan closed this in #7911 Apr 14, 2016

@bwzhang2011 commented May 24, 2016

@elmor, @gawlikt, would you mind sparing some time to take a look at it?
