Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to unicast discovery, with default host list of 127.0.0.1, [::1] #12999

Closed
wants to merge 29 commits into from

Conversation

@rmuir
Copy link
Contributor

commented Aug 19, 2015

Multicast discovery has serious issues such as #12993
It is never going to work correctly and securely with a "default to localhost" scenario. It also has numerous issues with operating system and JDK support.

Instead we should encourage unicast discovery by default, since thats what we encourage for production, and configure it to look at "localhost" by default which will give the same experience for startup/development.

This PR changes the default unicast host list (if not specified) to be "127.0.0.1, [::1]", enables unicast by default, disables multicast by default, fixes the parsing of unicast hosts to be strict, fixes it to allow IPV6 addresses unambiguously, and fixes it to work with hostnames that resolve to multiple addresses.

Things that should be deferred to followups (trust me, this is a huge change on its own, and i know a bunch of tests are angry, which i need help with, but the change works):

  • Move multicast zen discovery out to a plugin, possibly deprecate it.
  • Review all uses + ban InetSocketAddress(java.lang.String, int), which is lenient and bad to use. I did some cleanups here already.

If anyone wants to help, let me know I will give commit access to my branch here.

@@ -64,7 +64,7 @@

public static final String ACTION_NAME = "internal:discovery/zen/unicast";

public static final int LIMIT_PORTS_COUNT = 1;
public static final int LIMIT_PORTS_COUNT = 10;

This comment has been minimized.

Copy link
@rmuir

rmuir Aug 19, 2015

Author Contributor

This is really a nocommit. We should only do 10 i think when its the localhost default. Just trying to start simple :)

This comment has been minimized.

Copy link
@rmuir

rmuir Aug 19, 2015

Author Contributor

Maybe when its localhost we dont need any limiting, not sure, we should look into it. At least any limiting should be "per-address", since a host here could resolve to multiple addresses (e.g. ipv4 and v6, among other possibilities).

This comment has been minimized.

Copy link
@dadoonet

dadoonet Aug 19, 2015

Member

Cool! If we merge it I think we can close #8833 ! It will also help in discovery plugins.

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2015

ok we are down to 1 awaits fix and all other tests pass - there is some cleanup work needed here for sure but this is a great start!!

rmuir and others added 10 commits Aug 20, 2015
…ur logs and stats apis
We used static methods reading sys properties to define the node mode
per cluster. this had lots of problems when tests couldn't cope with
mixed or only local mode. Now we are passing it down to the cluster from the test
which allows to @SuppressNetworkMode / @SupressLocalMode on the test to force
consistent node configurations.
We only run multicast tests now when we explicitly state it. A working
multicast env is required which is not always the case.
@s1monw

This comment has been minimized.

Copy link
Contributor

commented Aug 20, 2015

I fixed all the remaining tests. Everything now runs on unicast and passes with network and local. We still have some forbiddenAPI:

[ERROR] Forbidden method invocation: java.net.InetSocketAddress#getHostName() [Use getHostString() instead, which avoids a DNS lookup]
[ERROR]   in org.elasticsearch.common.transport.InetSocketTransportAddress (InetSocketTransportAddress.java:96)
[ERROR] Forbidden method invocation: java.net.InetSocketAddress#getHostName() [Use getHostString() instead, which avoids a DNS lookup]
[ERROR]   in org.elasticsearch.common.transport.InetSocketTransportAddress (InetSocketTransportAddress.java:132)
[ERROR] Scanned 5737 (and 1524 related) class file(s) for forbidden API invocations (in 1.90s), 2 error(s).

but those are the last remaining issues. We can SuppressForbidden them if we wanna push the decision out to a different issue?

@@ -360,16 +360,16 @@ public int hashCode() {
public String toString() {
StringBuilder sb = new StringBuilder();
if (nodeName.length() > 0) {
sb.append('[').append(nodeName).append(']');
sb.append('{').append(nodeName).append('}');

This comment has been minimized.

Copy link
@bleskes

bleskes Aug 20, 2015

Member

why is this change? did we decide to change our usage of [] to {}?

This comment has been minimized.

Copy link
@rmuir

rmuir Aug 20, 2015

Author Contributor

because it is part of a bracketed ipv6 address specification.

This comment has been minimized.

Copy link
@s1monw

s1monw Aug 20, 2015

Contributor

+1 for this change though RFCs are a good thing to follow here

This comment has been minimized.

Copy link
@rmuir

rmuir Aug 20, 2015

Author Contributor

Yes, when an IPv6 address is combined with a port number, you really need them. Its needed to make things unambiguous, its needed to work inside a user's browser, its needed to conform with numerous RFCs on the display of this stuff. If you need more justification, see ones like https://tools.ietf.org/html/rfc5952

On the other hand, the brackets in ES logging provide NOTHING. Part of me wanted to remove them across all logging across the board.

* {@code 127.0.0.1} and {@code ::1} respectively. No methods do reverse
* lookups.
*/
public final class NetworkAddress {

This comment has been minimized.

Copy link
@bleskes

bleskes Aug 20, 2015

Member

this is great. It was indeed confusing...

@@ -452,7 +461,7 @@ public boolean onPortNumber(int portNumber) {
profileBoundAddresses.putIfAbsent(name, new BoundTransportAddress(new InetSocketTransportAddress(boundAddress), new InetSocketTransportAddress(publishAddress)));
}

logger.info("Bound profile [{}] to address [{}]", name, boundSocket.get());
logger.info("Bound profile [{}] to address {{}}", name, NetworkAddress.format(boundSocket.get()));

This comment has been minimized.

Copy link
@bleskes

bleskes Aug 20, 2015

Member

same question about the {} wrappers.

@@ -64,7 +64,9 @@

public static final String ACTION_NAME = "internal:discovery/zen/unicast";

public static final int LIMIT_PORTS_COUNT = 1;
public static final int LIMIT_FOREIGN_PORTS_COUNT = 1;
public static final int LIMIT_LOCAL_PORTS_COUNT = 25;

This comment has been minimized.

Copy link
@bleskes

bleskes Aug 20, 2015

Member

Not a big deal, but 25 ports feels a bit high - we will try to connect to all these ports all the time, per local address. Maybe lower to 5? note that people would still be able to start more than 5 nodes (but if they kill the first 5, the cluster will not form again).

This comment has been minimized.

Copy link
@dadoonet

dadoonet Aug 20, 2015

Member

+1. The most I have seen in dev/test was 4 nodes.
The most I have seen in PROD is 3 (2 data nodes + 1 master node on 128gb machines)

This comment has been minimized.

Copy link
@s1monw

s1monw Aug 20, 2015

Contributor

@dadoonet this is only for default config without any configured unicast hosts. I don't think this is used in production. I can change it to 5 for now I don't have strong feelings

This comment has been minimized.

Copy link
@rmuir

rmuir Aug 20, 2015

Author Contributor

I have strong feelings. I dont want this lowered unless its per address.

Wake up guys, computers have more than one address these days.

This comment has been minimized.

Copy link
@rmuir

rmuir Aug 20, 2015

Author Contributor

I'm changing this to per-address. Otherwise someone is just going to follow up with some other PR and break dual-stack machines.

@@ -64,7 +64,9 @@

public static final String ACTION_NAME = "internal:discovery/zen/unicast";

public static final int LIMIT_PORTS_COUNT = 1;
public static final int LIMIT_FOREIGN_PORTS_COUNT = 1;

This comment has been minimized.

Copy link
@dadoonet

dadoonet Aug 20, 2015

Member

Note that in AWS context, we might have very big nodes 128gb RAM with 1 master node 2 data nodes on the same machine.
It could happen that the master is started after so it gets 9302 port allocated.
I wonder if we should not increase this to 3... May be not relevant thought here.

This comment has been minimized.

Copy link
@s1monw

s1monw Aug 20, 2015

Contributor

this is like it used to be please open a followup issue for this

rmuir and others added 3 commits Aug 20, 2015
@s1monw

This comment has been minimized.

Copy link
Contributor

commented Aug 20, 2015

I am +1 on pushing this - it looks good and has alll the fixes needed. We can do all otehr things in folllowups

@rmuir

This comment has been minimized.

Copy link
Contributor Author

commented Aug 20, 2015

Summarizing the current tradeoffs (versus multicast) so we are all aware:

  • out of box bin/elasticsearch will discover nodes on the local machine by default just like multicast did.
  • adding "localhost" as a default unicast discovery host has some advantages over multicast: for example it works with ipv6-only configurations.
  • however, unlike multicast, as soon as you start messing with configuration (changing port numbers, etc), you are going to need to deal with discovery. in most cases, multicast would continue to work all the way into production, regardless.
@jaymode

This comment has been minimized.

Copy link
Member

commented Aug 20, 2015

+1, this looks good to me

@rmuir rmuir closed this in e2ab625 Aug 20, 2015
rmuir added a commit that referenced this pull request Aug 20, 2015
Fix unicast discovery to work when a host has multiple addresses.
Ban dangerous methods in java.net with forbidden APIs.
Fix ipv6 bugs and formatting of network addresses everywhere.

Closes #12999
Closes #12993

Squashed commit of the following:

commit 6c1aa00
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 14:25:43 2015 -0400

    Fix these to be correct with addresses just in case

commit 6482156
Merge: d00561b 41d8fbe
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 13:23:09 2015 -0400

    Merge branch 'master' into unicast_all_the_way_down

commit d00561b
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 16:38:50 2015 +0200

    limit local ports to 5 in UnicastZenPing

commit e2e15c5
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 10:32:47 2015 -0400

    fix port limiting

commit 10153cb
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 10:18:37 2015 -0400

    don't serialize scopeids: that's broken

commit 2aa63d4
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 16:06:51 2015 +0200

    restore @Network

commit c840f1d
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 16:02:30 2015 +0200

     Use NetworkAddress.formatAddress where applicable in plugins

commit 374ce87
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 15:34:06 2015 +0200

    Use NetworkAddress.formatAddress where applicable

commit e7a606d
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 10:17:57 2015 +0200

    Add @multicast annotation to disable multicast tests by default.

    We only run multicast tests now when we explicitly state it. A working
    multicast env is required which is not always the case.

commit 2d7d2d0
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 09:51:28 2015 +0200

    Remove extra check for local mode in InternalTestCluster

commit dda59ac
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 09:37:03 2015 +0200

    Handle node mode across entire test cluster

    We used static methods reading sys properties to define the node mode
    per cluster. this had lots of problems when tests couldn't cope with
    mixed or only local mode. Now we are passing it down to the cluster from the test
    which allows to @SuppressNetworkMode / @SupressLocalMode on the test to force
    consistent node configurations.

commit 058197b
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 03:19:14 2015 -0400

    really ban InetSocketAddress's trappy method and break build and go to sleep, sorry

commit ac87791
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 03:16:52 2015 -0400

    Ban methods that might surprisingly cause DNS lookups

commit e64fe3d
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 02:59:05 2015 -0400

    Add unit test

commit f15434f
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 02:39:02 2015 -0400

    fix ipv6 formatting bugs

commit 05c2c74
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 02:12:05 2015 -0400

    format addresses correctly so I can actually read what comes out of our logs and stats apis

commit 4f9389d
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 21:26:52 2015 -0400

    ban dangerous methods in java.net

commit 6aacd4d
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 20:59:24 2015 -0400

    ban lenient method

commit f466a84
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 00:29:00 2015 +0200

    fix tests to not mix local transport and zen unicast disco

commit 0de007a
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 00:10:07 2015 +0200

    fix tests to not mix local transport and zen unicast disco

commit 539f6ca
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 00:02:01 2015 +0200

    fix tests to not mix local transport and zen unicast disco

commit 004c288
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 17:51:45 2015 -0400

    Fix multinode

commit 54113af
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 17:36:45 2015 -0400

    fix integration tests

commit 0156a77
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 23:32:18 2015 +0200

    enable multicast in MulticastZenPingIT.java

commit 1791caa
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 17:23:16 2015 -0400

    Fix constant

commit 22820b5
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 22:59:09 2015 +0200

    give it some extra ids for local transport crazyness

commit b2138fa
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 22:51:42 2015 +0200

    pass on local addresses from configured transport rather than hard code IP addresses

commit 1bf5de1
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 22:04:31 2015 +0200

    fix PluggableTransportModuleIT.java to use local disco and detach port limit for node local disco

commit b6706ed
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 14:16:03 2015 -0400

    Default to unicast discovery, with default host list of 127.0.0.1, [::1]
rjernst added a commit to rjernst/elasticsearch that referenced this pull request Aug 20, 2015
Multicast has known issues (see elastic#12999 and elastic#12993). This change moves
multicast into a plugin, and deprecates it in the docs.  It also allows
for plugging in multiple zen ping implementations.

closes elastic#13019
This was referenced Aug 21, 2015
@brwe brwe removed the v2.0.0 label Aug 21, 2015
@s1monw s1monw deleted the rmuir:unicast_all_the_way_down branch Aug 21, 2015
@clintongormley clintongormley removed the WIP label Aug 24, 2015
karmi added a commit to elastic/elasticsearch-ruby that referenced this pull request Sep 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.