Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK/JRE 7 update 72 breaks multicast in 1.4.0.Beta1 #8225

Closed
costin opened this Issue Oct 24, 2014 · 12 comments

Comments

Projects
None yet
6 participants
@costin
Copy link
Member

costin commented Oct 24, 2014

I've just updated to JDK 7 update 72 and 1.4.0.Beta1 doesn't work any more (win 8.1).

# java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) Client VM (build 24.72-b04, mixed mode, sharing)
[2014-10-25 00:37:58,821][INFO ][node                     ] [Thog] version[1.4.0.Beta1], pid[6224], build[1f25669/2014-10-01T14:58:15Z]
[2014-10-25 00:37:58,822][INFO ][node                     ] [Thog] initializing ...
[2014-10-25 00:37:58,827][INFO ][plugins                  ] [Thog] loaded [], sites []
[2014-10-25 00:38:01,206][INFO ][node                     ] [Thog] initialized
[2014-10-25 00:38:01,206][INFO ][node                     ] [Thog] starting ...
[2014-10-25 00:38:01,308][INFO ][transport                ] [Thog] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.50:9300]}
[2014-10-25 00:38:01,351][INFO ][discovery.zen.ping.multicast] [Thog] multicast failed to start [SocketException[An invalid argument was supplied]], disabling
[2014-10-25 00:38:01,353][INFO ][discovery                ] [Thog] elasticsearch/Sk3GjLbeTduVCehGYnfdRg
[2014-10-25 00:38:01,359][WARN ][discovery.zen.ping.multicast] [Thog] failed to send multicast ping request: NullPointerException[null]
[2014-10-25 00:38:02,860][WARN ][discovery.zen.ping.multicast] [Thog] failed to send multicast ping request: NullPointerException[null]
[2014-10-25 00:38:04,372][INFO ][cluster.service          ] [Thog] new_master [Thog][Sk3GjLbeTduVCehGYnfdRg][cerberus][inet[/192.168.1.50:9300]], reason: zen-disco-join (elected_as_master)
[2014-10-25 00:38:04,415][INFO ][http                     ] [Thog] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.50:9200]}
[2014-10-25 00:38:04,416][INFO ][node                     ] [Thog] started
[2014-10-25 00:38:04,926][INFO ][gateway                  ] [Thog] recovered [2] indices into cluster_state

1.4.0.Beta1 works fine with 7u67 and 8u25. It looks like only 7u72 is problematic (potentially 7u71).
Tested 1.3.4 on 7u72 and it seems to be working fine.
Verified the java version and the same problem occurs - clearly something has changed in the JDK.

# java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) Client VM (build 24.72-b04, mixed mode, sharing)

# elasticsearch
[2014-10-25 00:50:20,068][INFO ][node                     ] [Wendell Vaughn] version[1.3.4], pid[7220], build[a70f3cc/2014-09-30T09:07:17Z]
[2014-10-25 00:50:20,070][INFO ][node                     ] [Wendell Vaughn] initializing ...
[2014-10-25 00:50:20,074][INFO ][plugins                  ] [Wendell Vaughn] loaded [], sites []
[2014-10-25 00:50:22,367][INFO ][node                     ] [Wendell Vaughn] initialized
[2014-10-25 00:50:22,367][INFO ][node                     ] [Wendell Vaughn] starting ...
[2014-10-25 00:50:22,472][INFO ][transport                ] [Wendell Vaughn] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.50:9300]}
[2014-10-25 00:50:22,570][INFO ][discovery.zen.ping.multicast] [Wendell Vaughn] multicast failed to start [SocketException[An invalid argument was supplied]], disabling
[2014-10-25 00:50:22,571][INFO ][discovery                ] [Wendell Vaughn] elasticsearch/iROZFsnQQb-2OUFP6bA_bA
[2014-10-25 00:50:22,600][WARN ][discovery.zen.ping.multicast] [Wendell Vaughn] failed to send multicast ping request: NullPointerException[null]
[2014-10-25 00:50:24,102][WARN ][discovery.zen.ping.multicast] [Wendell Vaughn] failed to send multicast ping request: NullPointerException[null]
[2014-10-25 00:50:25,605][INFO ][cluster.service          ] [Wendell Vaughn] new_master [Wendell Vaughn][iROZFsnQQb-2OUFP6bA_bA][cerberus][inet[/192.168.1.50:9300]], reason: zen-disco-join (elected_as_master)
[2014-10-25 00:50:25,638][INFO ][gateway                  ] [Wendell Vaughn] recovered [0] indices into cluster_state
[2014-10-25 00:50:25,649][INFO ][http                     ] [Wendell Vaughn] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.50:9200]}
[2014-10-25 00:50:25,650][INFO ][node                     ] [Wendell Vaughn] started
@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 24, 2014

Looking at the JDK changelog it might be that update 71 (a critical security update) introduced the breaking behaviour, namely through 8021372:
8021372 core-libs java.net NetworkInterface.getNetworkInterfaces() returns duplicate hardware address

@nik9000

This comment has been minimized.

Copy link
Contributor

nik9000 commented Oct 24, 2014

Yay! We're in the process of upgrading to 7u72 right now. Happy times.

If it needs an assignee I can look at it on Monday. I need an excuse to go poking around in this code. OTOH if someone has time to work on it please don't let me stop you. Your Monday probably starts before mine anyway.

@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 24, 2014

@nik9000 Note that I've currently tested it on Windows - there's a slim chance, the bug doesn't occur on *nix machines...

@nik9000

This comment has been minimized.

Copy link
Contributor

nik9000 commented Oct 24, 2014

Ah. Well, I'll at least be able to test that on Monday. I don't think my wife would appreciate me commandeering her machine to reproduce on Windows. So if it doesn't reproduce on unix I'd probably be a bad person to work on it.

@Mpdreamz

This comment has been minimized.

Copy link
Member

Mpdreamz commented Oct 27, 2014

I'm on Windows 8.1 but I can't reproduce:

c:\>"%JAVA_HOME%\bin\java.exe" -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) Client VM (build 24.72-b04, mixed mode, sharing)
c:\Data\elasticsearch-1.4.0.Beta1>bin\elasticsearch
[2014-10-27 11:25:44,236][WARN ][bootstrap                ] jvm uses the client vm, make sure to run `java` with the server vm for best performance by adding `-server` to the command line
[2014-10-27 11:25:44,299][INFO ][node                     ] [Pete Wisdom] version[1.4.0.Beta1], pid[4656], build[1f25669/2014-10-01T14:58:15Z]
[2014-10-27 11:25:44,299][INFO ][node                     ] [Pete Wisdom] initializing ...
[2014-10-27 11:25:44,314][INFO ][plugins                  ] [Pete Wisdom] loaded [], sites []
[2014-10-27 11:25:46,317][INFO ][node                     ] [Pete Wisdom] initialized
[2014-10-27 11:25:46,317][INFO ][node                     ] [Pete Wisdom] starting ...
[2014-10-27 11:25:46,458][INFO ][transport                ] [Pete Wisdom] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.194.169:9300]}
[2014-10-27 11:25:46,520][INFO ][discovery                ] [Pete Wisdom] elasticsearch/oVt8wl4-S9usPM0xWva_TA
[2014-10-27 11:25:49,567][INFO ][cluster.service          ] [Pete Wisdom] new_master [Pete Wisdom][oVt8wl4-S9usPM0xWva_TA][WIN-QIFLG7L0DOO][inet[/192.168.194.169:9300]], reason: zen-disco-join (elected_as_master)
[2014-10-27 11:25:49,598][INFO ][gateway                  ] [Pete Wisdom] recovered [0] indices into cluster_state
[2014-10-27 11:25:49,614][INFO ][http                     ] [Pete Wisdom] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.194.169:9200]}
[2014-10-27 11:25:49,614][INFO ][node                     ] [Pete Wisdom] started

Am I doing something wrong here @costin ?

@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 27, 2014

Looks like the issue can be reproduced only on Win 8.1 SP1 with both u71 and u72. Elasticsearch ran fine on Win 7 and Win 8.1 (no SP1). In my opinion, this is not a blocker any more however there is work on the way to provide a fix for it.

@gmarz

This comment has been minimized.

Copy link
Member

gmarz commented Oct 27, 2014

@costin I'm also not able to reproduce this running Win 8.1.

What is considered SP1? I can't seem to find any updates being referred to as SP1. The latest update I have installed is KB2995388.

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

[2014-10-27 10:42:15,525][INFO ][node                     ] [Persuasion] version[1.4.0.Beta1], pid[4416], build[1f25669/2014-10-01T14:58:15Z]
[2014-10-27 10:42:15,525][INFO ][node                     ] [Persuasion] initializing ...
[2014-10-27 10:42:15,541][INFO ][plugins                  ] [Persuasion] loaded [], sites []
[2014-10-27 10:42:18,119][INFO ][node                     ] [Persuasion] initialized
[2014-10-27 10:42:18,119][INFO ][node                     ] [Persuasion] starting ...
[2014-10-27 10:42:18,259][INFO ][transport                ] [Persuasion] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.211.55.3:9300]}
[2014-10-27 10:42:18,291][INFO ][discovery                ] [Persuasion] elasticsearch/Mwp7dFlMQW6kVg-xsdU87w 
[2014-10-27 10:42:21,338][INFO ][cluster.service          ] [Persuasion] new_master [Persuasion][Mwp7dFlMQW6kVg-xsdU87w][GREGMARZOUK4138][inet[/10.211.55.3:9300]], reason: zen-disco-join (elected_as_master)
[2014-10-27 10:42:21,370][INFO ][http                     ] [Persuasion] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.211.55.3:9200]}
[2014-10-27 10:42:21,370][INFO ][node                     ] [Persuasion] started

costin added a commit to costin/elasticsearch that referenced this issue Oct 27, 2014

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates elastic#8225
@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 27, 2014

Based on various feedback it seems that I am the only one running into this issue which suggests there's some configuration issue with my system. The win 7, win 8.1 (with and without Update 1/ SP1) that we have tested, do not seem to be affected.

@clintongormley Hence why I removed the two labels above (critical and blocker).

@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 27, 2014

@nik9000 see the comment above - I don't think you're going to be affected and anyway, there's a fix already in the works (see the PR).

@mrsolo

This comment has been minimized.

Copy link
Contributor

mrsolo commented Oct 27, 2014

Jenkins seeing similar, if the the same failure on window 2012 r2 server

1> [2014-10-27 17:59:39,689][DEBUG][discovery.zen.ping.multicast] [node_t0] failed to send multicast ping request
1> java.lang.NullPointerException
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.sendPingRequest(MulticastZenPing.java:256)
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.ping(MulticastZenPing.java:188)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:146)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:124)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:932)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:341)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.access$5600(ZenDiscovery.java:83)
1> at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1299)
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
1> at java.lang.Thread.run(Thread.java:745)
1> [2014-10-27 17:59:39,705][DEBUG][discovery.zen.ping.multicast] [node_t1] multicast failed to start [SocketException[Unrecognized Windows Sockets error: 0: Cannot bind]], disabling
1> java.net.SocketException: Unrecognized Windows Sockets error: 0: Cannot bind
1> at java.net.TwoStacksPlainDatagramSocketImpl.bind0(Native Method)
1> at java.net.TwoStacksPlainDatagramSocketImpl.bind0(TwoStacksPlainDatagramSocketImpl.java:107)
1> at java.net.AbstractPlainDatagramSocketImpl.bind(AbstractPlainDatagramSocketImpl.java:96)
1> at java.net.TwoStacksPlainDatagramSocketImpl.bind(TwoStacksPlainDatagramSocketImpl.java:97)
1> at java.net.DatagramSocket.bind(DatagramSocket.java:396)
1> at java.net.MulticastSocket.(MulticastSocket.java:172)
1> at java.net.MulticastSocket.(MulticastSocket.java:137)
1> at org.elasticsearch.common.network.MulticastChannel$Plain.buildMulticastSocket(MulticastChannel.java:283)
1> at org.elasticsearch.common.network.MulticastChannel$Plain.(MulticastChannel.java:276)
1> at org.elasticsearch.common.network.MulticastChannel.getChannel(MulticastChannel.java:50)
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.doStart(MulticastZenPing.java:127)
1> at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.doStart(ZenPingService.java:103)
1> at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.doStart(ZenDiscovery.java:215)
1> at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
1> at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
1> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
1> at java.lang.reflect.Method.invoke(Method.java:606)
1> at org.elasticsearch.common.inject.internal.ConstructionContext$DelegatingInvocationHandler.invoke(ConstructionContext.java:110)
1> at com.sun.proxy.$Proxy35.start(Unknown Source)
1> at org.elasticsearch.discovery.DiscoveryService.doStart(DiscoveryService.java:84)
1> at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
1> at org.elasticsearch.node.internal.InternalNode.start(InternalNode.java:244)
1> at org.elasticsearch.test.InternalTestCluster$5.run(InternalTestCluster.java:1341)
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
1> at java.lang.Thread.run(Thread.java:745)
1> [2014-10-27 17:59:39,705][INFO ][discovery ] [node_t1] TEST-ElasticSearchAdTest-CHILD_VM=[0]-CLUSTER_SEED=[-2825708668188973463]-HASH=[12BD0B6F224490]/D3vQTSdQQlWVzREaEwf0AA
1> [2014-10-27 17:59:39,705][TRACE][discovery.zen.ping.unicast] [node_t0] [1] connecting to [node_t0][1HPXOGEJT46CaUCbPEkDNg][ElasticSearchAdTest][local[54]]{mode=local}
1> [2014-10-27 17:59:39,705][TRACE][discovery.zen.ping.unicast] [node_t0] [1] connected to [node_t0][1HPXOGEJT46CaUCbPEkDNg][ElasticSearchAdTest][local[54]]{mode=local}
1> [2014-10-27 17:59:39,705][TRACE][discovery.zen.ping.unicast] [node_t0] [1] sending to [node_t0][1HPXOGEJT46CaUCbPEkDNg][ElasticSearchAdTest][local[54]]{mode=local}
1> [2014-10-27 17:59:39,705][DEBUG][cluster.service ] [node_t1] processing [initial_join]: execute
1> [2014-10-27 17:59:39,720][DEBUG][cluster.service ] [node_t1] processing [initial_join]: no change in cluster_state
1> [2014-10-27 17:59:39,720][TRACE][discovery.zen ] [node_t1] starting to ping
1> [2014-10-27 17:59:39,720][DEBUG][discovery.zen.ping.multicast] [node_t1] failed to send multicast ping request
1> java.lang.NullPointerException
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.sendPingRequest(MulticastZenPing.java:256)
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.ping(MulticastZenPing.java:188)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:146)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:124)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:932)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:341)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.access$5600(ZenDiscovery.java:83)
1> at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1299)
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 27, 2014

Notice the NullPointerException have been fixed in the PR. However the core problem (multicast that suddenly fails) is still present. On my machine the error occurs when calling java.net.MulticastSocket#setInterface while here it seems to occur when calling MulticastSocket(SocketAddress)

costin added a commit that referenced this issue Oct 28, 2014

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates #8225

costin added a commit that referenced this issue Oct 28, 2014

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates #8225

(cherry picked from commit f819cff)

costin added a commit that referenced this issue Oct 28, 2014

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates #8225

costin added a commit that referenced this issue Oct 28, 2014

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates #8225

(cherry picked from commit f819cff)
@costin

This comment has been minimized.

Copy link
Member Author

costin commented Oct 28, 2014

Pushed the fix in 1.3, 1.4, 1.x and master. In case of a socket error (for whatever reason) a nicer message is displayed to the user and the zen code is skipped (otherwise it causes its own exceptions which only add to the confusion).
This plus the ES_USE_IPV4 flag seem to counter the error reproducible so far only on my windows machines...

@costin costin closed this Oct 28, 2014

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates elastic#8225

(cherry picked from commit f819cff)

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Improve handling of multicast binding exceptions
In case the multicast binding fails (due to socket errors), abort zen
pinging code and throw a better error message
Relates elastic#8225
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.