Node appears to be in multiple clusters #247

Closed
marshalium opened this issue Aug 16, 2012 · 3 comments

@marshalium

commented Aug 16, 2012

In a production environment I ran into a situation where a Hazelcast node seemed to be in multiple clusters at once.

The cluster was split-brained so there were 3 different masters. When the new node started up it connected to 3 different masters and appeared to successfully join each one. There were no errors or warnings in the logs. The node repeatedly logged at various times (over a period of several minutes) that it was in each of the different clusters. Each of the 3 masters logged that the node was in their member list.

I'll attach a test case that consistently reproduces the situation. The test case causes a node to start up successfully and join three masters at the same time. All three masters then think that the new node is one of their members.

@marshalium


commented Aug 16, 2012

import static org.junit.Assert.assertEquals;

import java.util.Arrays;

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.runner.RunWith;

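import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;
import com.hazelcast.impl.GroupProperties;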

@RunWith(com.hazelcast.util.RandomBlockJUnit4ClassRunner.class)
public class JoinMultipleMasters {

    @BeforeClass
    @AfterClass
    public static void init() throws Exception {
        Hazelcast.shutdownAll();
    }

    /*
     * This test illustrates that Hazelcast can get into a state where a node
     * appears to be in more than one cluster.
     */
    @Test
    public void testMultiJoins() throws Exception {
        Config c1 = buildConfig();
        Config c2 = buildConfig();
        Config c3 = buildConfig();
        Config c4 = buildConfig();

        c1.getNetworkConfig().setPort(15701);
        c2.getNetworkConfig().setPort(15702);
        c3.getNetworkConfig().setPort(15703);
        c4.getNetworkConfig().setPort(15704);

        c1.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(Arrays.asList("127.0.0.1:15701"));
        c2.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(Arrays.asList("127.0.0.1:15702"));
        c3.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(Arrays.asList("127.0.0.1:15703"));
        c4.getNetworkConfig().getJoin().getTcpIpConfig().setMembers(Arrays.asList("127.0.0.1:15701, 127.0.0.1:15702, 127.0.0.1:15703, 127.0.0.1:15704"));

        HazelcastInstance h1 = Hazelcast.newHazelcastInstance(c1);
        HazelcastInstance h2 = Hazelcast.newHazelcastInstance(c2);
        HazelcastInstance h3 = Hazelcast.newHazelcastInstance(c3);

        // First three nodes are up. All should be in separate clusters.
        assertEquals(1, h1.getCluster().getMembers().size());
        assertEquals(1, h2.getCluster().getMembers().size());
        assertEquals(1, h3.getCluster().getMembers().size());

        HazelcastInstance h4 = Hazelcast.newHazelcastInstance(c4);

        // Fourth node is up. Should join one of the other three clusters.
        int numNodesWithTwoMembers = 0;
        if (h1.getCluster().getMembers().size() == 2) {
            numNodesWithTwoMembers++;
        }
        if (h2.getCluster().getMembers().size() == 2) {
            numNodesWithTwoMembers++;
        }
        if (h3.getCluster().getMembers().size() == 2) {
            numNodesWithTwoMembers++;
        }
        if (h4.getCluster().getMembers().size() == 2) {
            numNodesWithTwoMembers++;
        }

        Member h4Member = h4.getCluster().getLocalMember();

        int numNodesThatKnowAboutH4 = 0;
        if (h1.getCluster().getMembers().contains(h4Member)) {
            numNodesThatKnowAboutH4++;
        }
        if (h2.getCluster().getMembers().contains(h4Member)) {
            numNodesThatKnowAboutH4++;
        }
        if (h3.getCluster().getMembers().contains(h4Member)) {
            numNodesThatKnowAboutH4++;
        }
        if (h4.getCluster().getMembers().contains(h4Member)) {
            numNodesThatKnowAboutH4++;
        }

        /*
         * At this point h4 should have joined a single node out of the other
         * three. There should be two clusters of one and one cluster of two. h4
         * should only be in one cluster.
         * 
         * This is not what is happening. Instead, h4 thinks it joined in a
         * cluster of two with one of the other three nodes. And each of the
         * other three nodes (h1, h2, and h3) thinks that h4 is joined with
         * them.
         */
        assertEquals(2, h4.getCluster().getMembers().size());
        assertEquals(2, numNodesWithTwoMembers);
        assertEquals(2, numNodesThatKnowAboutH4);
    }

    private static Config buildConfig() {
        Config c = new Config();
        c.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        c.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true);
        c.getNetworkConfig().setPortAutoIncrement(false);
        c.setProperty(GroupProperties.PROP_WAIT_SECONDS_BEFORE_JOIN, "0");
        return c;
    }
}
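
The invariant the test checks can also be stated without Hazelcast: every node that lists a member should report the same member set as that member does. The following is a minimal sketch of that check over plain string sets. The class `MembershipViewCheck`, its methods, and the node names are hypothetical and for illustration only; they are not part of the Hazelcast API or of the fix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hazelcast: models each node's member list
// as a set of names and checks whether a member's placement is consistent.
public class MembershipViewCheck {

    // Returns the names of all nodes whose member list contains `member`.
    static Set<String> viewsContaining(Map<String, Set<String>> views, String member) {
        Set<String> owners = new TreeSet<String>();
        for (Map.Entry<String, Set<String>> e : views.entrySet()) {
            if (e.getValue().contains(member)) {
                owners.add(e.getKey());
            }
        }
        return owners;
    }

    // A member is consistently placed if every node that lists it reports
    // exactly the same member set as the member itself reports.
    static boolean isConsistent(Map<String, Set<String>> views, String member) {
        Set<String> own = views.get(member);
        if (own == null) {
            return false;
        }
        for (String owner : viewsContaining(views, member)) {
            if (!views.get(owner).equals(own)) {
                return false;
            }
        }
        return true;
    }

    static Set<String> set(String... names) {
        return new TreeSet<String>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // The state described in this issue: h4 believes it is in a cluster
        // with h1, but h2 and h3 also list h4 as one of their members.
        Map<String, Set<String>> views = new HashMap<String, Set<String>>();
        views.put("h1", set("h1", "h4"));
        views.put("h2", set("h2", "h4"));
        views.put("h3", set("h3", "h4"));
        views.put("h4", set("h1", "h4"));

        System.out.println(viewsContaining(views, "h4")); // prints [h1, h2, h3, h4]
        System.out.println(isConsistent(views, "h4"));    // prints false
    }
}
```

In the expected state, where only h1 and h4 list h4, `isConsistent` returns true. In a real test the sets would be built from each instance's `getCluster().getMembers()`.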

@ghost ghost assigned mdogan Aug 17, 2012

@mdogan mdogan closed this in d089c7b Aug 17, 2012

@mdogan


commented Aug 17, 2012

Thanks for the findings and the test case.

@marshalium


commented Aug 19, 2012

No problem. Thanks for getting a fix in so quickly.
