
Member lists out of sync #274

Closed
marshalium opened this issue Sep 17, 2012 · 6 comments

@marshalium (Contributor) commented Sep 17, 2012

Currently running Hazelcast 2.3.1 on a cluster of 124 nodes.

Hazelcast got into a state where the member lists on some of the nodes do not match. In the example I've included, node 134 thinks that node 86 is the master, but node 86 thinks that node 17 is the master. Node 134 is in both node 86's and node 17's member lists, but node 17 (the real master) is not in node 134's member list.

The member lists do not appear to heal themselves, and a node can apparently treat another node as its master even if that node does not consider itself the master.

Node 134 says it has 42 members, with node 86 as the master; the real master (node 17) is not in the list:

Members [42] [
    "10.10.10.86:4286", # Master
    ...
    "10.10.10.134:4286", # This
    ...
]

Node 86 says it has a total of 124 members:

Members [124] [
    "10.10.10.17:4286", # Master
    ...
    "10.10.10.134:4286", # Node 134
    ...
    "10.10.10.86:4286", # This
    ...
]

Node 17 has the same list as 86:

Members [124] [
    "10.10.10.17:4286", # Master and This
    ...
    "10.10.10.134:4286", # Node 134
    ...
    "10.10.10.86:4286", # Node 86
    ...
]

Node 134 (the one with the broken list) keeps logging messages like the following, which appear to come from the true master:

[09.17.12 16:18:15.860 WARN    hz._hzInstance_1_gre com.hazelcast.impl.PartitionManager     ] [10.10.10.134]:4286 [hz-2000] Received a ClusterRuntimeState, but its sender doesn't seem master! => Sender: Address[10.10.10.17]:4286, Master: Address[10.10.10.86]:4286! (Ignore if master node has changed recently.)

Looking at the logs of node 134, it got into this state by timing out its heartbeat connection to the then-current master and selecting the next node in its list. This happened repeatedly until it settled on node 86.

[09.17.12 10:37:38.777 WARN    hz._hzInstance_1_gre com.hazelcast.cluster.ClusterManager    ] [10.10.10.134]:4286 [hz-2000] Master node has timed out its heartbeat and will be removed
[09.17.12 10:37:38.777 INFO    hz._hzInstance_1_gre com.hazelcast.cluster.ClusterManager    ] [10.10.10.134]:4286 [hz-2000] Removing Address Address[10.10.10.11]:4286
[09.17.12 10:37:38.778 INFO    hz._hzInstance_1_gre com.hazelcast.impl.Node                 ] [10.10.10.134]:4286 [hz-2000] ** setting master address to Address[10.10.10.13]:4286
...
[09.17.12 10:41:21.447 WARN    hz._hzInstance_1_gre com.hazelcast.cluster.ClusterManager    ] [10.10.10.134]:4286 [hz-2000] Master node has timed out its heartbeat and will be removed
[09.17.12 10:41:21.447 INFO    hz._hzInstance_1_gre com.hazelcast.cluster.ClusterManager    ] [10.10.10.134]:4286 [hz-2000] Removing Address Address[10.10.10.85]:4286
[09.17.12 10:41:21.447 INFO    hz._hzInstance_1_gre com.hazelcast.impl.Node                 ] [10.10.10.134]:4286 [hz-2000] ** setting master address to Address[10.10.10.86]:4286
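
To make that sequence concrete, here is a minimal illustrative sketch of the failover behaviour the log shows: the node times out the master's heartbeat, drops it from its own list, and promotes the next address in that list without any agreement from the rest of the cluster. Plain strings stand in for Hazelcast Address objects, and every name in the sketch is hypothetical rather than an actual ClusterManager field or method.

// Illustrative sketch only; the class, field and method names are hypothetical,
// not Hazelcast ClusterManager internals.
import java.util.List;
import java.util.Map;

class MasterFailoverSketch {

    String masterAddress;                  // this node's current idea of the master
    List<String> memberAddresses;          // this node's local member list
    Map<String, Long> lastHeartbeatMillis; // last heartbeat seen per member
    long maxNoHeartbeatMillis;             // max.no.heartbeat.seconds * 1000

    void checkMasterHeartbeat(long now) {
        Long last = lastHeartbeatMillis.get(masterAddress);
        if (last != null && now - last > maxNoHeartbeatMillis) {
            // "Master node has timed out its heartbeat and will be removed"
            memberAddresses.remove(masterAddress);
            // Promote the next address in the *local* list. This is a purely
            // local decision, so neither the promoted node nor the rest of the
            // cluster has to agree -- which is how node 134 ended up following
            // node 86 while node 86 still followed node 17.
            masterAddress = memberAddresses.get(0);
        }
    }
}
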
@marshalium (Contributor, Author) commented Sep 17, 2012

I don't see anywhere in the code where a new member list is provided after the initial join. Is there any kind of process that should have corrected this?

@marshalium (Contributor, Author) commented Sep 20, 2012

Here are some new test cases. The previous ones I posted did not set the max.no.heartbeat interval to a small enough value, so they weren't completely valid.

package com.hazelcast.cluster;

import static junit.framework.Assert.assertEquals;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

import org.junit.After;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.runner.RunWith;

import com.hazelcast.config.Config;
import com.hazelcast.config.NetworkConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MultiTask;
import com.hazelcast.impl.GroupProperties;
import com.hazelcast.impl.MemberImpl;
import com.hazelcast.impl.Node;
import com.hazelcast.impl.Processable;
import com.hazelcast.impl.TestUtil;

@RunWith(com.hazelcast.util.RandomBlockJUnit4ClassRunner.class)
public class MemberListTest {

    @BeforeClass
    public static void init() throws Exception {
        Hazelcast.shutdownAll();
    }

    @After
    public void cleanup() throws Exception {
        Hazelcast.shutdownAll();
    }

    /*
     * Sets up a situation where node3 removes the master and sets node2 as the
     * master but none of the other nodes do. This means that node3 thinks node2
     * is master but node2 thinks node1 is master.
     */
    @Test
    public void testOutOfSyncMemberListIssue274() throws Exception {
        Config c1 = buildConfig(false);
        Config c2 = buildConfig(false);
        Config c3 = buildConfig(false);

        c1.getNetworkConfig().setPort(35701);
        c2.getNetworkConfig().setPort(35702);
        c3.getNetworkConfig().setPort(35703);

        List<String> allMembers = Arrays.asList("127.0.0.1:35701, 127.0.0.1:35702, 127.0.0.1:35703");
        c1.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(allMembers);
        c2.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(allMembers);
        c3.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(allMembers);

        final HazelcastInstance h1 = Hazelcast.newHazelcastInstance(c1);
        final HazelcastInstance h2 = Hazelcast.newHazelcastInstance(c2);
        final HazelcastInstance h3 = Hazelcast.newHazelcastInstance(c3);

        // All three nodes join into one cluster
        assertEquals(3, h1.getCluster().getMembers().size());
        assertEquals(3, h2.getCluster().getMembers().size());
        assertEquals(3, h3.getCluster().getMembers().size());

        // This simulates node 1 doing at least one read from the other nodes in the list at regular intervals
        // This prevents the heart beat code from timing out
        final AtomicBoolean doingWork = new AtomicBoolean(true);
        Thread workThread = new Thread(new Runnable() {

            @Override
            public void run() {
                while (doingWork.get()) {
                    MultiTask<String> task = new MultiTask<String>(new PingCallable(), h1.getCluster().getMembers());
                    h1.getExecutorService().execute(task);

                    try {
                        task.get();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }

                    try {
                        Thread.sleep(2000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        });
        workThread.start();

        final Node n3 = TestUtil.getNode(h3);
        n3.clusterManager.enqueueAndWait(new Processable() {
            public void process() {

                // Simulates node3's heartbeat code choosing to remove node1 
                n3.clusterManager.doRemoveAddress(((MemberImpl) h1.getCluster().getLocalMember()).getAddress());
                assertEquals(2, n3.clusterManager.getMembers().size());
            }
        }, 5);

        // Give the cluster some time to figure things out. The merge and heartbeat code should have kicked in by this point
        Thread.sleep(60 * 1000);

        doingWork.set(false);
        workThread.join();

        assertEquals(3, h1.getCluster().getMembers().size());
        assertEquals(3, h2.getCluster().getMembers().size());
        assertEquals(3, h3.getCluster().getMembers().size());
    }

    private static class PingCallable implements Callable<String>, Serializable {

        @Override
        public String call() throws Exception {
            return "ping response";
        }
    }

    /*
     * Sets up a situation where the member list is out of order on node2. Both
     * node2 and node1 think they are masters and both think each other are in
     * their clusters.
     */
    @Test
    public void testOutOfSyncMemberListTwoMastersIssue274() throws Exception {
        Config c1 = buildConfig(false);
        Config c2 = buildConfig(false);
        Config c3 = buildConfig(false);

        c1.getNetworkConfig().setPort(45701);
        c2.getNetworkConfig().setPort(45702);
        c3.getNetworkConfig().setPort(45703);

        List<String> allMembers = Arrays.asList("127.0.0.1:45701, 127.0.0.1:45702, 127.0.0.1:45703");
        c1.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(allMembers);
        c2.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(allMembers);
        c3.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(allMembers);

        final HazelcastInstance h1 = Hazelcast.newHazelcastInstance(c1);
        final HazelcastInstance h2 = Hazelcast.newHazelcastInstance(c2);
        final HazelcastInstance h3 = Hazelcast.newHazelcastInstance(c3);

        final MemberImpl m1 = (MemberImpl) h1.getCluster().getLocalMember();
        final MemberImpl m2 = (MemberImpl) h2.getCluster().getLocalMember();
        final MemberImpl m3 = (MemberImpl) h3.getCluster().getLocalMember();

        // All three nodes join into one cluster
        assertEquals(3, h1.getCluster().getMembers().size());
        assertEquals(3, h2.getCluster().getMembers().size());
        assertEquals(3, h3.getCluster().getMembers().size());

        final Node n2 = TestUtil.getNode(h2);

        n2.clusterManager.enqueueAndWait(new Processable() {
            public void process() {

                // Simulates node2 getting an out of order member list. That causes node2 to think it's the master.
                List<MemberInfo> members = new ArrayList<MemberInfo>();
                members.add(new MemberInfo(m2.getAddress(), m2.getNodeType(), m2.getUuid()));
                members.add(new MemberInfo(m3.getAddress(), m3.getNodeType(), m3.getUuid()));
                members.add(new MemberInfo(m1.getAddress(), m1.getNodeType(), m1.getUuid()));
                n2.clusterManager.updateMembers(members);
                n2.setMasterAddress(m2.getAddress());
            }
        }, 5);

        // Give the cluster some time to figure things out. The merge and heartbeat code should have kicked in by this point
        Thread.sleep(60 * 1000);

        assertEquals(m1, h1.getCluster().getMembers().iterator().next());
        assertEquals(m1, h2.getCluster().getMembers().iterator().next());
        assertEquals(m1, h3.getCluster().getMembers().iterator().next());

        assertEquals(3, h1.getCluster().getMembers().size());
        assertEquals(3, h2.getCluster().getMembers().size());
        assertEquals(3, h3.getCluster().getMembers().size());
    }

    private static Config buildConfig(boolean multicastEnabled) {
        Config c = new Config();
        c.getGroupConfig().setName("group").setPassword("pass");
        c.setProperty(GroupProperties.PROP_MERGE_FIRST_RUN_DELAY_SECONDS, "10");
        c.setProperty(GroupProperties.PROP_MERGE_NEXT_RUN_DELAY_SECONDS, "5");
        c.setProperty(GroupProperties.PROP_MAX_NO_HEARTBEAT_SECONDS, "10");
        final NetworkConfig networkConfig = c.getNetworkConfig();
        networkConfig.getJoin().getMulticastConfig().setEnabled(multicastEnabled);
        networkConfig.getJoin().getTcpIpConfig().setEnabled(!multicastEnabled);
        networkConfig.setPortAutoIncrement(false);
        return c;
    }
}
@mdogan (Member) commented Sep 20, 2012

Yes, the member list is provided to nodes only when a new member joins or leaves.

Although Hazelcast can handle a network split, it's hard to recover from a partial network split.

~ Sent from mobile
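
For reference, the split-brain merge mentioned here is governed by the merge timers that buildConfig() in the tests above already shortens; the snippet below only restates those settings on their own (using the 2.x GroupProperties constants imported by the test) and is not additional API.

import com.hazelcast.config.Config;
import com.hazelcast.impl.GroupProperties;

public class MergeTimerSettings {

    // Same values used by buildConfig() in the tests above: shorter merge and
    // heartbeat timers so both the member removal and the eventual merge are
    // observable within a one-minute test window.
    static Config buildTunedConfig() {
        Config c = new Config();
        c.setProperty(GroupProperties.PROP_MERGE_FIRST_RUN_DELAY_SECONDS, "10"); // first merge check after startup
        c.setProperty(GroupProperties.PROP_MERGE_NEXT_RUN_DELAY_SECONDS, "5");   // interval between later merge checks
        c.setProperty(GroupProperties.PROP_MAX_NO_HEARTBEAT_SECONDS, "10");      // heartbeat timeout before removal
        return c;
    }
}
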

@marshalium (Contributor, Author) commented Sep 20, 2012

How does this sound as a solution? It should solve the two test cases I've provided.

At regular intervals, the master node asks each of its members to confirm that it is actually their master. Any node that does not agree is removed from the master's member list. This ensures that a node cannot be in more than one cluster at a time, because it will confirm only one master. Eventually, once every node is in exactly one cluster, the split-brain handler can resolve the multiple-cluster situation.
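
As a rough sketch only, under the assumption that addresses can be treated as plain strings and with askWhoIsYourMaster() as a hypothetical stand-in for the actual remote call, the periodic check on the master could look something like this:

// Hypothetical sketch of the proposed master-confirmation pass; none of these
// names are existing Hazelcast APIs.
import java.util.Iterator;
import java.util.Set;

class MasterConfirmationSketch {

    String localAddress;  // this (master) node's address
    Set<String> members;  // the master's view of the cluster

    void confirmMembers() {
        Iterator<String> it = members.iterator();
        while (it.hasNext()) {
            String member = it.next();
            if (member.equals(localAddress)) {
                continue; // no need to confirm ourselves
            }
            // Ask the member which node it currently follows as master.
            String theirMaster = askWhoIsYourMaster(member);
            if (!localAddress.equals(theirMaster)) {
                // The member follows some other master, so it effectively
                // belongs to another cluster; drop it here and let the
                // split-brain handler merge the clusters once they are disjoint.
                it.remove();
            }
        }
    }

    String askWhoIsYourMaster(String member) {
        // Placeholder for the remote confirmation described above.
        return localAddress;
    }
}
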

@marshalium (Contributor, Author) commented Sep 24, 2012

I've submitted a pull request with a fix. It would be nice to get it into the next 2.3.x release.

It contains three unit tests that failed before and now pass with the patch.

#284

@mdogan (Member) commented Sep 24, 2012

Thanks for the patch, Marshall. We will look into this.

@mmdogan

~ Sent from mobile

mdogan closed this Sep 28, 2012

mdogan added a commit that referenced this issue Oct 17, 2012
