Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discovery] MulticastDiscoveryStrategy works incorrect with client discovery #10089

Closed
uladzimirFilipchanka opened this issue Mar 16, 2017 · 11 comments

Comments

@uladzimirFilipchanka
Copy link

@uladzimirFilipchanka uladzimirFilipchanka commented Mar 16, 2017

We have client-server interaction based on MulticastDiscoveryStrategy.

Let's say we have appC (client) and appS (server).
Several instances of appC go to appS instances via hazelcast client using MulticastDiscoveryStrategy (only one way). When we need to restart one node of appS we usually have discovery troubles. Hazelcast can't find other nodes of appS which are live and available. And new node stays alone, split brain occur. After some restarts (from 0 to 10 and more), node successfully joins cluster.

Here is out part of hazelcast.xml for appS (the server)

    <properties>
        <property name="hazelcast.discovery.enabled">true</property>
    </properties>
    <group>
        <name>appS</name>
        <password>appSPass</password>
    </group>

    <network>
        <port auto-increment="true" port-count="100">5701</port>
        <outbound-ports>
            <ports>0</ports>
        </outbound-ports>
        <join>
            <discovery-strategies>
                <discovery-strategy class="com.hazelcast.spi.discovery.multicast.MulticastDiscoveryStrategy" enabled="true">
                    <properties>
                        <property name="group">224.2.2.8</property>
                        <property name="port">54324</property>
                    </properties>
                </discovery-strategy>
            </discovery-strategies>
        </join>
    </network>

And part of hazelcast.xml for appC (uses to connect to server)

    <properties>
        <property name="hazelcast.discovery.enabled">true</property>
        <property name="hazelcast.client.invocation.timeout.seconds">1</property>
    </properties>

    <group>
        <name>appS</name>
        <password>appSPass</password>
    </group>

    <network>
        <smart-routing>true</smart-routing>
        <redo-operation>true</redo-operation>

        <connection-attempt-limit>0</connection-attempt-limit>
        <connection-attempt-period>500</connection-attempt-period>
        <connection-timeout>1000</connection-timeout>

        <discovery-strategies>
            <discovery-strategy class="com.hazelcast.spi.discovery.multicast.MulticastDiscoveryStrategy" enabled="true">
                <properties>
                    <property name="group">224.2.2.8</property>
                    <property name="port">54324</property>
                </properties>
            </discovery-strategy>
        </discovery-strategies>
    </network>

After some debugging, we noticed that MulticastDiscoveryReceiver on appS can catches data from appC during initialization. And it leads to splitbrain issue and discovery failure.
Next investigation took us to MulticastDiscoveryStrategy.start method. There we have a new thread creation for multicast porposes. And the thread should be created only if this node isn't a client.
And looks like here is the problem. Based on the code, isClient is always false.

So client application multicasting information about the server it connects to. And new node of the server can catch this data for discovery.

If this is true, fix is simple. MulticastDiscoveryStrategy.initializeMulticastSocket need to contain:

 // code 
            if (discoveryNode == null) {
                isClient = true;
            }
@jerrinot jerrinot added this to the 3.9 milestone Mar 16, 2017
@jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Mar 16, 2017

@uladzimirFilipchanka: awesome finding and analysis, many thanks!

@jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Mar 16, 2017

@uladzimirFilipchanka: fancy sending a PR?

@mdogan
Copy link
Contributor

@mdogan mdogan commented Jun 20, 2017

There's already a discovery mode setting DiscoveryServiceSettings.discoveryMode;

public enum DiscoveryMode {
    /**
     * The current runtime environment is a Hazelcast member node
     */
    Member,

    /**
     * The current runtime environment is a Hazelcast client
     */
    Client
}

Both member and client side impl are passing correct mode to the service. But unfortunately, DiscoveryServiceSettings.getDiscoveryMode() is not used at all.

@mdogan
Copy link
Contributor

@mdogan mdogan commented Jun 20, 2017

And as far as I see, MulticastDiscoveryStrategy is not optimal. It has more issues.

@mmedenjak mmedenjak changed the title MulticastDiscoveryStrategy works incorrect with client discovery [discovery] MulticastDiscoveryStrategy works incorrect with client discovery Jul 13, 2017
@jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Jul 21, 2017

@mdogan: can you elaborate it please?

@mdogan
Copy link
Contributor

@mdogan mdogan commented Jul 24, 2017

I don't remember all, these two are the ones I see now;

  • First problem is, as I already stated above, DiscoveryMode setting is not used. Instead a hacky workaround is used to distinguish a client and a member.
  • Second is, MulticastDiscoverySender continuously publishes discovery packet every 2 (hardcoded) seconds, even after members join, until member shutdowns. So, every member is flooding the network every 2 seconds with multicast messages.
@sancar
Copy link
Member

@sancar sancar commented Aug 1, 2017

Multicast discovery is also not working in a simple scenario.
I have come across the issue when trying to understand what is going on in #10606

Our MemberToMemberDiscoveryTest is passing just because it is using discovery mechanism available in mock network. When I switch the test to real network, nodes are not able to find each other in start. They are finding each other only when SplitBrainHandler kicks in, which is too late for the test.

It seems that DiscoveryStrategy with multicast also needs some work to make it usable.

@emrahkocaman
Copy link
Contributor

@emrahkocaman emrahkocaman commented Aug 24, 2017

@mdogan Yes there already is DiscoveryMode setting but it's not passed to DiscoveryStrategy instances, it's only used within DiscoveryServiceProvider. So we need an SPI change to pass it. Instead I'm proposing to keep SPI as is and use DiscoveryNode as @uladzimirFilipchanka suggested. WDY?

This is just to close this specific problem, IMO other mentioned problems should have their own issues/PRDs.

@mdogan
Copy link
Contributor

@mdogan mdogan commented Aug 24, 2017

That workaround solves the problem in this issue. It's up to you (cc: @jerrinot ), I can't decide that.

Discovery SPI and its default implementation (DefaultDiscoveryServiceProvider, DefaultDiscoveryService etc) are broken, because DiscoveryMode is never used at all. It's set by members and clients but it's never read. I consider this a bug.

@jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Aug 31, 2017

for 3.9 I'd just apply the workaround.

Later we can discuss how to pass the DiscoveryNode with the smallest possible impact on compatibility.

@mesutcelik
Copy link
Contributor

@mesutcelik mesutcelik commented Sep 13, 2017

fixed by #10987
see for further improvements #11357

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

7 participants
You can’t perform that action at this time.