Multicast socket exception if no network present. #9081

Closed
neilstevenson opened this issue Oct 11, 2016 · 8 comments

@neilstevenson

commented Oct 11, 2016

If networking in the hazelcast.xml is set as,

    <network>
        <port>7000</port>
        <interfaces>
            <interface>127.0.0.1</interface>
        </interfaces>
    </network>

and there is only 127.0.0.1 available, on Hazelcast 3.7.2 the following exception appears

SEVERE: [127.0.0.1]:7000 [dev-configured-from-xml] [3.7.1] Can't assign requested address
java.net.SocketException: Can't assign requested address
         at java.net.PlainDatagramSocketImpl.join(Native Method)
         at java.net.AbstractPlainDatagramSocketImpl.join(AbstractPlainDatagramSocketImpl.java:178)
         at java.net.MulticastSocket.joinGroup(MulticastSocket.java:323)
         at com.hazelcast.internal.cluster.impl.MulticastService.createMulticastService(MulticastService.java:110)
         at com.hazelcast.instance.Node.<init>(Node.java:191)
         at com.hazelcast.instance.HazelcastInstanceImpl.createNode(HazelcastInstanceImpl.java:155)
         at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:126)
         at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:218)
         at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:176)
         at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:126)
         at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:87)
         at XMLMain.main(XMLMain.java:7)

although the server does actually start.

Multicast without network isn't a real scenario, but it's the kind of thing you might use when doing development while travelling, and the exception is misleading.

It may be IPv6 related, as using System.setProperty("java.net.preferIPv4Stack", "true"); stops it occurring.

@jerrinot jerrinot added this to the 3.8 milestone Oct 11, 2016

@jerrinot

Contributor

commented Oct 11, 2016

@neilstevenson: what's your OS?

@neilstevenson

Author

commented Oct 11, 2016

I'm on a MacBook, macOS Sierra, Java 1.8.0_101. This scenario comes up on training courses, which use 3.7.1 (my text says 3.7.2, which is a typo).

@jerrinot

Contributor

commented Oct 11, 2016

We use IPv4 for the default multicast group. This blows up when the only non-loopback interface is IPv6.

If the multicast group happens to be IPv4, then we should not even attempt to join the group over an IPv6 address.
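
A minimal sketch of the check suggested above, using only the JDK. The class and method names are hypothetical and are not Hazelcast code; 224.2.2.3 is the group address mentioned later in this thread, and the candidate addresses are made up for illustration.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hazelcast: names are illustrative only.
final class MulticastFamilyCheck {

    // Parses an IP literal; literals are parsed locally, so no DNS lookup happens.
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + literal, e);
        }
    }

    // True when the group is an IPv4 multicast address such as 224.2.2.3.
    static boolean isIpv4Group(InetAddress group) {
        return group instanceof Inet4Address && group.isMulticastAddress();
    }

    // Keeps only the candidate addresses whose family matches the group's family.
    static List<InetAddress> sameFamilyAs(InetAddress group, List<InetAddress> candidates) {
        boolean groupIsIpv4 = group instanceof Inet4Address;
        List<InetAddress> matching = new ArrayList<>();
        for (InetAddress candidate : candidates) {
            if ((candidate instanceof Inet4Address) == groupIsIpv4) {
                matching.add(candidate);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        InetAddress group = parse("224.2.2.3");
        // Example candidates: one IPv6 link-local address and the IPv4 loopback.
        List<InetAddress> candidates = Arrays.asList(parse("fe80::1"), parse("127.0.0.1"));
        System.out.println("IPv4 group: " + isIpv4Group(group));
        System.out.println("Usable for join: " + sameFamilyAs(group, candidates));
    }
}
```

With a filter like this, an interface that only carries IPv6 addresses would yield an empty list for an IPv4 group, which could be used to skip the join instead of failing in the native call.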

jerrinot added a commit to jerrinot/hazelcast that referenced this issue Oct 11, 2016

@jerrinot

Contributor

commented Oct 11, 2016

Apparently it's caused by multicast being disabled(?) on the local loopback on Mac OS X. I don't know why the preferIPv4Stack property works, though.

I did a quick check (jerrinot@002bae7): when the loopback device is used, it throws an exception directly on this check.

@tkountis

Contributor

commented Oct 12, 2016

@neilstevenson I managed to reproduce it as well. Apparently this does not happen with an earlier OS X version; I had to upgrade to macOS Sierra. At first look it seems to be related to the default-interface selection mechanism. I will try to identify the exact root cause.

Would you be able to share the output of the following commands in your environment: ifconfig -a and netstat -nr? Please make sure that you hide any sensitive information in them before sharing.

@tkountis

Contributor

commented Oct 14, 2016

After some more investigation, I strongly believe that this is not related to macOS specifically but rather to the layout of the network interfaces. Apparently, when you sign up for iCloud services a new interface gets created, usually named utun. This is not unique to iCloud, however: in my case I was also able to reproduce it with a VPN configured on the machine, which likewise adds a tun device.

When joining a multicast group, some OSes need to know on which interface the join is performed. If you don't specify an interface, the join falls back on the default interface. In Java, the NetworkInterface.defaultInterface() is selected in the native lib according to the following rules:

 * We choose the first interface that is up and is (in order of preference):
 * 1. neither loopback nor point to point
 * 2. point to point
 * 3. loopback
 * 4. none.

Therefore, when your main internet connection is offline (WiFi in my case), the next interface in this order is the utun device. In my case the utun device is configured with IPv6 addresses, so the join uses this configuration by default.
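
To see which interface that order would favour on a given machine, the quoted rules can be approximated in plain Java. This is only a sketch: the rank helper is a hypothetical name, it mirrors the quoted comment rather than the actual native selection code, and interfaces that are down are simply treated as not eligible.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical helper that ranks interfaces in the order quoted above.
final class DefaultInterfaceOrder {

    // Lower rank is preferred. Interfaces that are down are not eligible.
    static int rank(boolean up, boolean loopback, boolean pointToPoint) {
        if (!up) {
            return Integer.MAX_VALUE;
        }
        if (!loopback && !pointToPoint) {
            return 1; // regular interface, e.g. WiFi or Ethernet
        }
        if (pointToPoint) {
            return 2; // tunnel devices such as utun/tun
        }
        return 3;     // loopback
    }

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            System.out.println("No network interfaces found");
            return;
        }
        for (NetworkInterface nif : Collections.list(all)) {
            int rank = rank(nif.isUp(), nif.isLoopback(), nif.isPointToPoint());
            System.out.printf("%-10s rank=%s multicast=%b addresses=%s%n",
                    nif.getName(),
                    rank == Integer.MAX_VALUE ? "n/a" : String.valueOf(rank),
                    nif.supportsMulticast(),
                    Collections.list(nif.getInetAddresses()));
        }
    }
}
```

On a machine with WiFi switched off and a VPN or iCloud tunnel present, one would expect the tunnel device to show the lowest rank, ahead of the loopback interface.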

In an attempt to make this more deterministic, I tried to specify the NetworkInterface explicitly instead of letting the non-deterministic default selection take place.

multicastSocket.setNetworkInterface(NetworkInterface.getByInetAddress(InetAddress.getByName("127.0.0.1")))

forcing the default interface to be the loopback one. This also looks like a good thing to expose in the configuration: if the config specifies a multicast interface, use that; otherwise fall back to the default.
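
A rough sketch of that "configured interface, else default" idea, using only JDK classes. The configuredAddress parameter and the class name are hypothetical; no such Hazelcast setting is implied here. The group 224.2.2.3 and port 54327 are the ones used elsewhere in this thread.

```java
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical sketch: pick an explicitly configured multicast interface if there is one.
final class MulticastInterfaceChooser {

    // Returns the interface owning configuredAddress, or empty when nothing is
    // configured (or no interface owns that address), meaning "use the default".
    static Optional<NetworkInterface> choose(String configuredAddress) {
        if (configuredAddress == null || configuredAddress.isEmpty()) {
            return Optional.empty();
        }
        try {
            InetAddress address = InetAddress.getByName(configuredAddress);
            return Optional.ofNullable(NetworkInterface.getByInetAddress(address));
        } catch (UnknownHostException | SocketException e) {
            throw new IllegalArgumentException(
                    "Cannot resolve multicast interface for " + configuredAddress, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // The optional first argument stands in for a (hypothetical) config value.
        String configured = args.length > 0 ? args[0] : null;
        try (MulticastSocket socket = new MulticastSocket(54327)) {
            Optional<NetworkInterface> nif = choose(configured);
            if (nif.isPresent()) {
                socket.setNetworkInterface(nif.get());
            }
            // Without an explicit interface the JDK falls back to its default selection.
            socket.joinGroup(InetAddress.getByName("224.2.2.3"));
            System.out.println("Joined on " + nif.map(NetworkInterface::getName).orElse("default interface"));
        }
    }
}
```

Running it with 127.0.0.1 as the argument corresponds to the setNetworkInterface call shown earlier; running it without arguments leaves the choice to the default selection.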

Although the join succeeds when doing the above, every subsequent send() operation fails with java.io.IOException: Network is unreachable

java.io.IOException: Network is unreachable
         at java.net.PlainDatagramSocketImpl.send(Native Method)
         at java.net.DatagramSocket.send(DatagramSocket.java:693)
         at com.hazelcast.internal.cluster.impl.MulticastService.send(MulticastService.java:235)
         at com.hazelcast.internal.cluster.impl.MulticastJoiner.findMasterWithMulticast(MulticastJoiner.java:174)
         at com.hazelcast.internal.cluster.impl.MulticastJoiner.doJoin(MulticastJoiner.java:62)
         at com.hazelcast.internal.cluster.impl.AbstractJoiner.join(AbstractJoiner.java:135)
         at com.hazelcast.instance.Node.join(Node.java:650)
         at com.hazelcast.instance.Node.start(Node.java:362)
         at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:134)
         at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:218)
         at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:176)
         at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:126)
         at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:87)

As @neilstevenson correctly pointed out, with java.net.preferIPv4Stack enabled the error goes away.
Looking into the selected NetworkInterface, the InetAddress list contains both IPv6 and IPv4 addresses. PlainDatagramSocketImpl.c, the native implementation for the respective Java class, seems to use the first address when creating the socket for imr_address; see http://man7.org/linux/man-pages/man7/ip.7.html
The first address, at index 0, is IPv6. When using java.net.preferIPv4Stack, the only address available in that list is IPv4, which works as expected.
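
The address order can be inspected from Java as well. The sketch below is illustrative only: the helper name is hypothetical, and it assumes the order returned by getInetAddresses() reflects the order the native code sees, which is not verified here.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: reports where the first IPv4 address sits in an interface's address list.
final class AddressOrder {

    // Parses an IP literal without any DNS lookup.
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + literal, e);
        }
    }

    // Index of the first IPv4 address in the list, or -1 when there is none.
    static int indexOfFirstIpv4(List<InetAddress> addresses) {
        for (int i = 0; i < addresses.size(); i++) {
            if (addresses.get(i) instanceof Inet4Address) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        NetworkInterface loopback = NetworkInterface.getByInetAddress(parse("127.0.0.1"));
        if (loopback == null) {
            System.out.println("No interface owns 127.0.0.1");
            return;
        }
        List<InetAddress> addresses = Collections.list(loopback.getInetAddresses());
        System.out.println("Addresses in order: " + addresses);
        System.out.println("First IPv4 address at index: " + indexOfFirstIpv4(addresses));
    }
}
```

Running it once normally and once with -Djava.net.preferIPv4Stack=true should show whether the IPv6 entries disappear from the list, as described above.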

I am looking for a way to hand-wire this, potentially using some configuration settings, but personally I don't see value in doing so, since this will mostly affect development environments and can be bypassed by disabling cluster join. @jerrinot WDYT?
IMHO I would also prefer to have the multicast interface configured rather than relying on the default selection. This of course doesn't solve the problem, but I presume it makes it easier to work around multi-interface setups in production environments.

@tkountis

Contributor

commented Oct 17, 2016

@jerrinot noted that we already set the interface in lines https://github.com/hazelcast/hazelcast/blob/v3.7.1/hazelcast/src/main/java/com/hazelcast/internal/cluster/impl/MulticastService.java#L94-L98 which were not executed in my use case.
Apparently this is why it works on most setups: the interface used on the multicast socket is the one of the bind address used by Hazelcast. There is an extra branch: when the bind address is the loopback one, we set the interface only if configured to do so:
<multicast loopbackModeEnabled="true"/>

Also, as seen in these lines of code, there is a reference to a bug that describes almost the same scenario for multicasting on loopback, where a custom route needs to be added statically.

The difference between setInterface and setNetworkInterface is that the former uses the InetAddress argument to find the interface, while the latter uses the NetworkInterface provided as the argument.
Hence, modifying the config as above fixes the reported exception but, as before, gets us back to the previously described state (the Network is unreachable exception). Its preference for the IPv6 flow over the IPv4 one is quite odd, because it contradicts my instructions: I specified an IPv4 InetAddress. To avoid this flow, you need to enable java.net.preferIPv4Stack.

The ENETUNREACH error in this case is justified: checking the route table on my machine, I can see that the default gateway for IPv6 is utun0, which is next in the default selection algorithm but disabled, so the destination can't be reached.

However, with IPv4, even though it manages to join and doesn't complain on send() operations either, the route table shows that there is NO default gateway for IPv4. To confirm this further, I tried nc -vzu 224.2.2.3 54327 while HZ was up, and got ENETUNREACH on that side. So I am only guessing here, but even though HZ comes up with no errors, the multicast socket is not working. I currently have no more time to investigate this further.
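
For anyone who wants to repeat that probe without nc, a small JDK-only sketch follows. The class name is hypothetical; the group and port are the ones from the nc command above. A send() that returns without error only means the OS accepted the datagram, not that any member received it.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Optional;

// Hypothetical probe: sends one UDP datagram to a multicast group and reports any error.
final class MulticastProbe {

    // Returns empty when send() completed, otherwise a description of the failure.
    static Optional<String> trySend(String group, int port) {
        byte[] payload = {0};
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress address = InetAddress.getByName(group);
            socket.send(new DatagramPacket(payload, payload.length, address, port));
            return Optional.empty();
        } catch (IOException e) {
            // Covers SocketException and errors such as "Network is unreachable".
            return Optional.of(e.toString());
        }
    }

    public static void main(String[] args) {
        Optional<String> error = trySend("224.2.2.3", 54327);
        System.out.println(error.map(e -> "send failed: " + e).orElse("send accepted by the OS"));
    }
}
```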

Note: The multicast socket based on the above observations should always be IPv6 unless preferIPv4Stack is enabled.

TL;DR In conclusion, I will try to patch this with an exception, to at least give the user some useful feedback. This works as expected, but it's quite confusing to the end user without any context.

For those curious minds out there, here is the native code making the decision of using the IPv6 approach:

IPv6_available = IPv6_supported() & (!preferIPv4Stack);

static void setMulticastInterface(JNIEnv *env, jobject this, int fd,
                                  jint opt, jobject value)
{
    if (opt == java_net_SocketOptions_IP_MULTICAST_IF) {
        /*
         * value is an InetAddress.
         */

...

#else  /* __linux__ not defined */
        if (ipv6_available()) {
            mcast_set_if_by_if_v6(env, this, fd, value);
        } else {
            mcast_set_if_by_if_v4(env, this, fd, value);
        }
...
}

#ifdef AF_INET6
static void mcast_set_if_by_if_v6(JNIEnv *env, jobject this, int fd, jobject value) {
    static jfieldID ni_indexID;
    int index;

    if (ni_indexID == NULL) {
        jclass c = (*env)->FindClass(env, "java/net/NetworkInterface");
        CHECK_NULL(c);
        ni_indexID = (*env)->GetFieldID(env, c, "index", "I");
        CHECK_NULL(ni_indexID);
    }
    index = (*env)->GetIntField(env, value, ni_indexID);

    if (JVM_SetSockOpt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
                       (const char*)&index, sizeof(index)) < 0) {
        if (errno == EINVAL && index > 0) {
            JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException",
                "IPV6_MULTICAST_IF failed (interface has IPv4 "
                "address only?)");
        } else {
            NET_ThrowByNameWithLastError(env, JNU_JAVANETPKG "SocketException",
                           "Error setting socket option");
        }
        return;
    }

}
#endif /* AF_INET6 */
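
Restated in Java, the decision in the native snippet above looks roughly like this. It is a paraphrase for illustration, not JDK code, and the names are hypothetical. The point is that the address family of the argument is not an input to the decision, which would explain why an IPv4 InetAddress still ends up on the IPv6 path.

```java
// Hypothetical restatement of the native branch selection shown above.
final class MulticastPathDecision {

    enum Path { IPV6, IPV4 }

    // Mirrors: IPv6_available = IPv6_supported() & (!preferIPv4Stack)
    static Path pathFor(boolean ipv6Supported, boolean preferIpv4Stack) {
        boolean ipv6Available = ipv6Supported && !preferIpv4Stack;
        return ipv6Available ? Path.IPV6 : Path.IPV4;
    }

    public static void main(String[] args) {
        boolean preferIpv4 = Boolean.getBoolean("java.net.preferIPv4Stack");
        // Assumption for this sketch: the host OS supports IPv6.
        System.out.println("Multicast interface would be set via the "
                + pathFor(true, preferIpv4) + " path");
    }
}
```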

tkountis added a commit to tkountis/hazelcast that referenced this issue Oct 27, 2016

Attempt to warn the end user when the bind address is on the loopback interface, but loopback mode is disabled, for potential auto-discovery issues as described in hazelcast#9081

emrahkocaman added a commit to emrahkocaman/hazelcast that referenced this issue Nov 25, 2016

Attempt to warn the end user when the bind address is on the loopback interface, but loopback mode is disabled, for potential auto-discovery issues as described in hazelcast#9081

@jerrinot

Contributor

commented Dec 6, 2016

warning added by #9190

@jerrinot jerrinot closed this Dec 6, 2016
