
elasticsearch fails to start tribe node #14573

Closed · kt97679 opened this issue Nov 6, 2015 · 39 comments · 10 participants

@kt97679 commented Nov 6, 2015
Hi folks,

I'm trying to start a tribe node using the following config:

transport.tcp.port: 9301
http.port: 9201
network.host: 0.0.0.0
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/

tribe:
    kibana:
        cluster.name: logstash-kibana
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    els:
        cluster.name: logstash-data
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.128.69.48", "10.128.75.237"]

This config resides in the file /etc/tribe-elasticsearch/elasticsearch.yml. I'm starting it with the following command:

sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -Ddefault.path.conf=/etc/tribe-elasticsearch/

Elasticsearch fails with the following output:

[2015-11-05 17:07:42,433][INFO ][node                     ] [Bucky] version[2.0.0], pid[25943], build[de54438/2015-10-22T08:09:48Z]
[2015-11-05 17:07:42,434][INFO ][node                     ] [Bucky] initializing ...
[2015-11-05 17:07:42,596][INFO ][plugins                  ] [Bucky] loaded [], sites []
Exception in thread "main" java.security.AccessControlException: access denied ("java.io.FilePermission" "/usr/share/elasticsearch/config/elasticsearch.yml" "read")
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
        at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
        at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:290)
        at java.nio.file.Files.exists(Files.java:2385)
        at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:87)
        at org.elasticsearch.node.Node.<init>(Node.java:128)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.tribe.TribeService.<init>(TribeService.java:136)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at <<<guice>>>
        at org.elasticsearch.node.Node.<init>(Node.java:198)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

I'm not sure why it tries to access /usr/share/elasticsearch/config/elasticsearch.yml; there is no such file in the Elasticsearch deb package. I created this file, but the command above still fails with the same output. Please advise how this can be resolved.

I'm running Elasticsearch 2.0.0 installed from the Debian package downloaded from the official site, on Ubuntu 14.

Thanks,
Kirill.

@erwan-koffi referenced this issue Nov 7, 2015: tribe #14599 (Closed)

@clintongormley (Member) commented Nov 9, 2015 (this comment has been minimized)

@kt97679 (Author) commented Nov 9, 2015

Hi @clintongormley, thanks for the quick response. As you can see from the description I provided, I was already using the -Ddefault.path.conf option. I tried the same command again with --path.conf. There was no exception from the config access issue, but I also had to specify --path.data and --path.logs, because for some reason those settings in my config were ignored. My config also specifies nonstandard ports, and those settings are not used either. Any advice on what could be wrong?

Thanks,
Kirill.

@kt97679 (Author) commented Nov 9, 2015

It looks like the config is ignored completely. If I specify all the options on the command line, I still get the same exception as above:

# sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch --path.conf=/etc/tribe-elasticsearch/ --path.logs=/var/log/elasticsearch --path.data=/var/lib/elasticsearch/ --transport.tcp.port=9301 --http.port=9201 --network.host=0.0.0.0 --tribe.els.cluster.name=logstash-data --tribe.els.discovery.zen.ping.multicast.enabled=false --tribe.els.discovery.zen.ping.unicast.hosts=["10.128.69.48","10.128.75.237"]
log4j:WARN No appenders could be found for logger (bootstrap).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.security.AccessControlException: access denied ("java.io.FilePermission" "/usr/share/elasticsearch/config/elasticsearch.yml" "read")
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
        at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
        at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:290)
        at java.nio.file.Files.exists(Files.java:2385)
        at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:87)
        at org.elasticsearch.node.Node.<init>(Node.java:128)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.tribe.TribeService.<init>(TribeService.java:136)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at <<<guice>>>
        at org.elasticsearch.node.Node.<init>(Node.java:198)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
@clintongormley (Member) commented Nov 17, 2015

Thanks for persisting. I've managed to replicate this and it is indeed a bug.

When the tribe node attempts to instantiate a node for the tribe service, it checks for access to the config directory, but that setting is no longer available to it and so it defaults to checking for path.home.

This can be replicated with a simple config file, saved as foo/elasticsearch.yml:

node.name: foo

tribe:
    foo:
        cluster.name: bar

Start elasticsearch as:

./elasticsearch-2.0.0/bin/elasticsearch --path.conf foo/

And it fails with:

[2015-11-17 13:54:47,763][INFO ][node                     ] [foo] version[2.0.0], pid[5940], build[de54438/2015-10-22T08:09:48Z]
[2015-11-17 13:54:47,763][INFO ][node                     ] [foo] initializing ...
[2015-11-17 13:54:47,836][INFO ][plugins                  ] [foo] loaded [], sites []
Exception in thread "main" java.security.AccessControlException: access denied ("java.io.FilePermission" "/Users/clinton/workspace/servers/elasticsearch-2.0.0/config/elasticsearch.yml" "read")
  at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
  at java.security.AccessController.checkPermission(AccessController.java:884)
  at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
  at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
  at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
  at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:290)
  at java.nio.file.Files.exists(Files.java:2385)
  at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:87)
  at org.elasticsearch.node.Node.<init>(Node.java:128)
  at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
  at org.elasticsearch.tribe.TribeService.<init>(TribeService.java:136)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at <<<guice>>>
  at org.elasticsearch.node.Node.<init>(Node.java:198)
  at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
  at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
  at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
  at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
@clintongormley (Member) commented Nov 17, 2015

@javanna could you take a look at this please?

@javanna (Member) commented Nov 17, 2015

I had a look at this. Only selected settings are forwarded from the tribe node to the inner tribe clients: path.home is one of them, but path.conf is not.

That said, if I remember correctly, the tribe clients shouldn't read from the configuration file (or system properties) at all, but only inherit a few selected settings from the parent node (as happens in TribeService), something we enforced with #9721. I think something got lost with #13383, where loadConfigSettings was removed; that flag was our way to prevent loading anything from the config file. With it set to false, I believe we wouldn't even check for the existence of the file, so we wouldn't need any permission for it.

At this point it seems we would have to forward path.conf to the tribe clients just because we are going to check for its existence at some point, even though we have nothing to load from it (otherwise we check for path.home, which we have no permission for). I'd need @rjernst to verify whether what I explained makes sense; it may be that I overlooked something.
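The forwarding behavior described above can be sketched as a toy example (the class name, the FORWARDED whitelist, and the plain string map are hypothetical illustrations, not Elasticsearch's actual Settings API): only whitelisted keys reach the inner client, so a dropped path.conf means the inner node falls back to checking under path.home.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TribeSettingsSketch {
    // Hypothetical whitelist, modeled on the discussion: path.home is
    // forwarded to the inner tribe clients in 2.0.0, path.conf is not.
    static final Set<String> FORWARDED = Set.of("path.home", "cluster.name");

    // Keep only the parent-node settings that the inner client may inherit.
    static Map<String, String> inheritedSettings(Map<String, String> parent) {
        return parent.entrySet().stream()
                .filter(e -> FORWARDED.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, String> parent = Map.of(
                "path.home", "/usr/share/elasticsearch",
                "path.conf", "/etc/tribe-elasticsearch",
                "transport.tcp.port", "9301");
        // path.conf is dropped here, so the inner node would fall back to
        // ${path.home}/config -- the directory the security manager denies.
        System.out.println(inheritedSettings(parent));
    }
}
```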

@rjernst (Member) commented Nov 17, 2015

If I understand the tribe node correctly, it is no different from any other client node (well, it creates multiple client nodes internally). So to me, it should pass along any settings it needs to configure the node (including path.conf). However, I'm not sure what this has to do with the transport client? The transport client by definition now does not use the config file settings (and the stack trace shown above indicates the exception came from building a node, not a transport client).

@javanna (Member) commented Nov 18, 2015

> However, I'm not sure what this has to do with the transport client?

@rjernst it doesn't relate directly to the transport client, but the inner tribe nodes have a similar requirement when it comes to loading from the config file. They should not read from the config file but only inherit some selected settings from their "parent" node (the actual tribe node), which is why we previously set loadConfigSettings to false; that option has now been removed. If my analysis is correct, the security manager barfs because we check whether the config file exists while creating the inner client nodes in TribeService, but we shouldn't need to read that file at that point anyway. I could forward the path.conf setting to the client nodes too, but that feels like the wrong fix, given that we should neither read from that file nor check that it exists. I'm not sure what the right fix is, though.

@javanna (Member) commented Nov 19, 2015

I looked deeper, and I can confirm this is not just a problem of passing the right path.conf to the inner nodes. The inner client nodes must not read from the main configuration file, which was fixed in #9721; the option to not load config settings for a node was, however, removed in #13383. I had expected TribeUnitTests to fail after that change, but unfortunately it doesn't. If you set, for instance, transport.tcp.port in the configuration file, the tribe node will bind that port, but the inner nodes will try to bind the same one and fail. The inner nodes should only inherit selected settings from their parent node and never read from the config file or system properties.

@ESamir commented Nov 19, 2015

+1. Removing path.conf did not resolve the issue.

The config used:

bootstrap:
  mlockall: true
cluster:
  name: tribe.elk.h2.com
discovery:
  zen:
    minimum_master_nodes: 2
    ping:
      unicast:
        hosts:
             - h2-clt01
             - h2-clt02
             - h2-clt03
network:
  host: h2-clt01
node:
  data: false
  master: true
  name: h2-ct01-h2-ct01
path:
  data: /data/h2-ct01
tribe:
  h2:
    cluster:
      name: elk.h2.com
    discovery:
      zen:
        ping:
          unicast:
            hosts:
                 - h2-cm01
                 - h2-cm02
                 - h2-cm03
  h3:
    cluster:
      name: elk.h3.com
    discovery:
      zen:
        ping:
          unicast:
            hosts:
                 - h3-cm01
                 - h3-cm02
                 - h3-cm03
@clintongormley (Member) commented Nov 19, 2015

There is a workaround for this bug. Assuming your tribe config directory is /etc/tribe/:

cd /etc
cp -a /etc/tribe /etc/tribe-client
echo "" > /etc/tribe-client/elasticsearch.yml
chown -R elasticsearch /etc/tribe-client

Then edit /etc/tribe/elasticsearch.yml and specify a path.conf for each tribe cluster, eg:

# arbitrary config
transport.tcp.port: 9301
http.port: 9201
network.host: 0.0.0.0
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/

tribe:
    kibana:
        path.conf: /etc/tribe-client  ### ADD THIS LINE
        cluster.name: logstash-kibana
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    els:
        path.conf: /etc/tribe-client  ### ADD THIS LINE
        cluster.name: logstash-data
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.128.69.48", "10.128.75.237"]

Then start elasticsearch as:

./bin/elasticsearch --path.conf /etc/tribe

The tribe node will use /etc/tribe/ as its config directory. It then starts a node client for each cluster; each client will use /etc/tribe-client as its config directory, but since /etc/tribe-client/elasticsearch.yml is empty, no settings will be loaded.

@javanna (Member) commented Nov 19, 2015

The workaround above works; the only caveat is that, depending on where the additional empty configuration file is located, we might not have permission to read it. It should also work to simply add an empty configuration file under the tribe node's own config directory and point right at it, specifying not just its parent directory but the complete path including the filename:

tribe.t1.path.conf: /path/to/config/tribe.yml
tribe.t2.path.conf: /path/to/config/tribe.yml
@ppf2 (Member) commented Nov 19, 2015

Ran into this last night when attempting to set up a tribe node on 2.0. This will also affect users who attempt to set a custom transport.tcp.port for the tribe node: doing so causes a misleading BindException[Address already in use] even though the specified port is not actually in use.

cluster.name: elasticsearch_2_0_0_tribe_cluster
network.host: 127.0.0.1
transport.tcp.port: 11111
node.name: tribe_cluster_node1
tribe:
  t1:
    cluster.name: elasticsearch_2_0_0_cluster1
  t2:
    cluster.name: elasticsearch_2_0_0_cluster2

Settings for the 2 clusters:

cluster.name: elasticsearch_2_0_0_cluster2
network.host: 127.0.0.1
transport.tcp.port: 9301
http.port: 9201
node.name: cluster2_node1

and

cluster.name: elasticsearch_2_0_0_cluster1
network.host: 127.0.0.1
transport.tcp.port: 9300
http.port: 9200
node.name: cluster1_node1

The problem is that the tribe node will not start up as long as I have the transport.tcp.port: 11111 in place. If I don't set a custom transport port for the tribe node, it starts up fine and can connect with the 2 clusters.

The following is the error that shows up when I attempt to set transport.tcp.port for the tribe node. Note that prior to starting the tribe node, I used lsof to confirm that no process on the machine was using port 11111 (and it doesn't matter which port I choose; as long as transport.tcp.port is set for the tribe node, it throws the same bind exception).

[2015-11-19 01:09:12,816][DEBUG][discovery.zen.elect      ] [tribe_cluster_node1/t1] using minimum_master_nodes [-1]
[2015-11-19 01:09:12,816][DEBUG][discovery.zen.ping.unicast] [tribe_cluster_node1/t1] using initial hosts [127.0.0.1, [::1]], with concurrent_connects [10]
[2015-11-19 01:09:12,817][DEBUG][discovery.zen            ] [tribe_cluster_node1/t1] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2015-11-19 01:09:12,817][DEBUG][discovery.zen.fd         ] [tribe_cluster_node1/t1] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-11-19 01:09:12,817][DEBUG][discovery.zen.fd         ] [tribe_cluster_node1/t1] [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2015-11-19 01:09:12,820][DEBUG][script                   ] [tribe_cluster_node1/t1] using script cache with max_size [100], expire [null]
[2015-11-19 01:09:12,853][DEBUG][cluster.routing.allocation.decider] [tribe_cluster_node1/t1] using [cluster.routing.allocation.allow_rebalance] with [indices_all_active]
[2015-11-19 01:09:12,853][DEBUG][cluster.routing.allocation.decider] [tribe_cluster_node1/t1] using [cluster_concurrent_rebalance] with [2]
[2015-11-19 01:09:12,854][DEBUG][cluster.routing.allocation.decider] [tribe_cluster_node1/t1] using node_concurrent_recoveries [2], node_initial_primaries_recoveries [4]
[2015-11-19 01:09:12,855][DEBUG][gateway                  ] [tribe_cluster_node1/t1] using initial_shards [quorum]
[2015-11-19 01:09:12,885][DEBUG][indices.recovery         ] [tribe_cluster_node1/t1] using max_bytes_per_sec[40mb], concurrent_streams [3], file_chunk_size [512kb], translog_size [512kb], translog_ops [1000], and compress [true]
[2015-11-19 01:09:12,886][DEBUG][indices.store            ] [tribe_cluster_node1/t1] using indices.store.throttle.type [NONE], with index.store.throttle.max_bytes_per_sec [10gb]
[2015-11-19 01:09:12,886][DEBUG][indices.memory           ] [tribe_cluster_node1/t1] using indexing buffer size [99mb], with indices.memory.min_shard_index_buffer_size [4mb], indices.memory.max_shard_index_buffer_size [512mb], indices.memory.shard_inactive_time [5m], indices.memory.interval [30s]
[2015-11-19 01:09:12,887][DEBUG][indices.cache.query      ] [tribe_cluster_node1/t1] using [node] query cache with size [10%], actual_size [99mb], max filter count [1000]
[2015-11-19 01:09:12,887][DEBUG][indices.fielddata.cache  ] [tribe_cluster_node1/t1] using size [-1] [-1b], expire [null]
[2015-11-19 01:09:12,897][INFO ][node                     ] [tribe_cluster_node1/t1] initialized
[2015-11-19 01:09:12,906][INFO ][node                     ] [tribe_cluster_node1] initialized
[2015-11-19 01:09:12,907][INFO ][node                     ] [tribe_cluster_node1] starting ...
[2015-11-19 01:09:12,924][DEBUG][netty.channel.socket.nio.SelectorUtil] Using select timeout of 500
[2015-11-19 01:09:12,924][DEBUG][netty.channel.socket.nio.SelectorUtil] Epoll-bug workaround enabled = false
[2015-11-19 01:09:12,947][DEBUG][transport.netty          ] [tribe_cluster_node1] using profile[default], worker_count[8], port[11111], bind_host[null], publish_host[null], compress[false], connect_timeout[30s], connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
[2015-11-19 01:09:12,957][DEBUG][transport.netty          ] [tribe_cluster_node1] binding server bootstrap to: 127.0.0.1
[2015-11-19 01:09:12,985][DEBUG][transport.netty          ] [tribe_cluster_node1] Bound profile [default] to address {127.0.0.1:11111}
[2015-11-19 01:09:12,986][INFO ][transport                ] [tribe_cluster_node1] publish_address {127.0.0.1:11111}, bound_addresses {127.0.0.1:11111}
[2015-11-19 01:09:12,993][DEBUG][discovery.local          ] [tribe_cluster_node1] Connected to cluster [Cluster [elasticsearch_2_0_0_tribe_cluster]]
[2015-11-19 01:09:12,996][INFO ][discovery                ] [tribe_cluster_node1] elasticsearch_2_0_0_tribe_cluster/baK4hDMwRiaKGS5D8ivYng
[2015-11-19 01:09:12,996][WARN ][discovery                ] [tribe_cluster_node1] waited for 0s and no initial state was set by the discovery
[2015-11-19 01:09:12,996][DEBUG][gateway                  ] [tribe_cluster_node1] can't wait on start for (possibly) reading state from gateway, will do it asynchronously
[2015-11-19 01:09:13,010][DEBUG][http.netty               ] [tribe_cluster_node1] Bound http to address {127.0.0.1:22222}
[2015-11-19 01:09:13,011][INFO ][http                     ] [tribe_cluster_node1] publish_address {127.0.0.1:22222}, bound_addresses {127.0.0.1:22222}
[2015-11-19 01:09:13,011][INFO ][node                     ] [tribe_cluster_node1/t2] starting ...
[2015-11-19 01:09:13,016][DEBUG][transport.netty          ] [tribe_cluster_node1/t2] using profile[default], worker_count[8], port[11111], bind_host[null], publish_host[null], compress[false], connect_timeout[30s], connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
[2015-11-19 01:09:13,022][DEBUG][transport.netty          ] [tribe_cluster_node1/t2] binding server bootstrap to: 127.0.0.1
[2015-11-19 01:09:13,039][INFO ][node                     ] [tribe_cluster_node1/t2] stopping ...
[2015-11-19 01:09:13,041][INFO ][node                     ] [tribe_cluster_node1/t2] stopped
[2015-11-19 01:09:13,042][INFO ][node                     ] [tribe_cluster_node1/t2] closing ...
[2015-11-19 01:09:13,048][INFO ][node                     ] [tribe_cluster_node1/t2] closed
[2015-11-19 01:09:13,048][INFO ][node                     ] [tribe_cluster_node1/t1] closing ...
[2015-11-19 01:09:13,052][INFO ][node                     ] [tribe_cluster_node1/t1] closed

Exception in thread "main" BindTransportException[Failed to bind to [11111]]; nested: ChannelException[Failed to bind to: /127.0.0.1:11111]; nested: BindException[Address already in use];
Likely root cause: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Refer to the log for complete error details.

[2015-11-19 01:09:13,058][INFO ][node                     ] [tribe_cluster_node1] stopping ...
[2015-11-19 01:09:13,064][INFO ][node                     ] [tribe_cluster_node1] stopped
[2015-11-19 01:09:13,064][INFO ][node                     ] [tribe_cluster_node1] closing ...
[2015-11-19 01:09:13,066][INFO ][node                     ] [tribe_cluster_node1] closed

Note that I cannot reproduce this on 1.7.2. On 1.7.2, I can set up a custom transport.tcp.port for the tribe node and it will start up fine.

@javanna (Member) commented Nov 19, 2015

@ppf2 this happens because the tribe node process starts three nodes: the first one binds the configured port, and the second tries to bind the same one because it reads the same configuration file. The workaround Clint provided above should work until we fix this properly.
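Editor's note: since settings under a `tribe.<name>.` prefix are passed to the corresponding inner client (as thn-dev's later comment illustrates), explicitly giving each inner client its own distinct transport port may also sidestep the bind clash. This is an untested sketch with arbitrary example port numbers, not a confirmed fix from the maintainers:

```
# tribe node's own transport port
transport.tcp.port: 11111

tribe:
  t1:
    cluster.name: elasticsearch_2_0_0_cluster1
    transport.tcp.port: 11112   # distinct port for the t1 inner client
  t2:
    cluster.name: elasticsearch_2_0_0_cluster2
    transport.tcp.port: 11113   # distinct port for the t2 inner client
```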

@rjernst (Member) commented Nov 19, 2015

@javanna I am going to explore having the tribe node use its own subclass of Node, which can customize this single behavior (how to get the node's settings). I don't think we should add back this general-purpose flag, as we need to keep the number of ways Nodes can be configured to a minimum.

@javanna (Member) commented Nov 20, 2015

@rjernst thanks that sounds good to me.

@ppf2 (Member) commented Nov 20, 2015

Confirmed that the workaround works to prevent the BindTransportException error, thx!

@ppf2 (Member) commented Nov 20, 2015

@rjernst Do we have a sense of whether the fix will make it to the upcoming 2.1 release? Or will it likely be after 2.1 (i.e. use the workaround until a later 2.x release)?

@rjernst (Member) commented Nov 20, 2015

@ppf2 Definitely after 2.1. I would not want to destabilize 2.1 with a refactoring like this.

@ppf2 (Member) commented Nov 20, 2015

@rjernst sounds good, thx!

@javanna assigned rjernst and unassigned javanna Nov 20, 2015

@clintongormley (Member) commented Dec 2, 2015

This requires some fairly extensive changes, so we will target this for 2.2. In the meantime, we should document the workaround in the 2.1 docs.

@rjernst (Member) commented Dec 8, 2015

I opened a PR to fix this here: #15300.

Note that I was able to do the fix simply enough that I think it will be ok to backport to 2.1.x

@clintongormley (Member) commented Dec 9, 2015

thanks @rjernst

@ppf2 (Member) commented Dec 10, 2015

Thanks @rjernst !

@lb425 commented Dec 12, 2015

I'm late to the party, but this might be useful for anyone coming across this. I found that the dummy config file isn't needed to work around the issue. Instead of creating a new directory (/etc/tribe-client in the example), path.conf can reference the current configuration directory.

Using the above example where the config directory was /etc/tribe:

# arbitrary config

transport.tcp.port: 9301
http.port: 9201
network.host: 0.0.0.0
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/

tribe:
    kibana:
        path.conf: /etc/tribe  # same directory as the tribe node itself
        cluster.name: logstash-kibana
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    els:
        path.conf: /etc/tribe  # same directory as the tribe node itself
        cluster.name: logstash-data
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.128.69.48", "10.128.75.237"]

@kt97679 (Author) commented Dec 22, 2015

Is this fixed in 2.1.1?

@thn-dev commented Dec 30, 2015

With v2.1.1, I still have to specify path.conf, and I used the valid path as mentioned above by lb425. In my case, I also had to specify path.plugins for a similar reason; otherwise, I kept getting the AccessControlException error.

I did not have to specify either path.conf or path.plugins when I was using v1.7.3.

@thn-dev commented Dec 31, 2015

With ES v2.1.1, I have to do the following to get the tribe node talking to two different clusters, cluster A and cluster B:

# tribe node's configuration (elasticsearch.yml)
network.host: 0.0.0.0
transport.tcp.port: 9300
http.port: 9200
http.enabled: true

tribe.t1.cluster.name:
tribe.t1.discovery.zen.ping.unicast.hosts: <cluster A's master node>
tribe.t1.discovery.zen.ping.multicast.enabled: false
tribe.t1.path.conf: <valid path/to/conf>
tribe.t1.path.plugins: <valid path/to/plugin>
tribe.t1.network.bind_host: 0.0.0.0
tribe.t1.network.publish_host: <tribe node's IP>
tribe.t1.transport.tcp.port:

Repeat the same block for cluster B, replacing "t1" with "t2" and filling in the proper info for cluster B; keep the tribe.t2.network.* settings the same, but use a different tribe.t2.transport.tcp.port value from t1 if specified.
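Editor's note: spelled out, the hypothetical t2 block would mirror the t1 settings above, with cluster B's details as placeholders:

```
tribe.t2.cluster.name: <cluster B's name>
tribe.t2.discovery.zen.ping.unicast.hosts: <cluster B's master node>
tribe.t2.discovery.zen.ping.multicast.enabled: false
tribe.t2.path.conf: <valid path/to/conf>
tribe.t2.path.plugins: <valid path/to/plugin>
tribe.t2.network.bind_host: 0.0.0.0
tribe.t2.network.publish_host: <tribe node's IP>
tribe.t2.transport.tcp.port: <different port from t1, if specified>
```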

@rjernst (Member) commented Dec 31, 2015

@thn-dev Setting network and path settings for tribe nodes (the t1, t2 here) should not be necessary. Can you share your full elasticsearch.yml for both the tribe node, as well as cluster A and cluster B?

@thn-dev commented Jan 1, 2016
commented Jan 1, 2016

@rjernst I did not have to do the network and path settings when I was using v1.7.3, so it was a surprise when v2.1.1 kept giving me the AccessControlException error message. Initially it pointed to the "plugins" location; after I set that, it complained about the "config" location. If I did not do the network settings for t1 and t2, it was not able to connect to cluster A and/or B, which is weird too. Again, I did not have to do any of this in v1.7.3.

All ES instances are installed using .rpm file, not .zip file.

My settings for the tribe node are as above, with these additional parameters:

  • cluster.name
  • discovery.zen.ping.multicast.enabled: false

Clusters A and B each have 1 master node and 3 data nodes, with the following settings (I don't have all the information with me at the moment):

  • cluster.name:
  • network.host: 0.0.0.0
  • transport.tcp.port: 9300
  • http.port: 9200 (master)
  • http.enabled: true (master)
  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: <master node's IP>
  • path.conf: /data/es/config
  • path.plugins: /data/es/plugins
  • path.data: /data/es
@rjernst (Member) commented Jan 5, 2016

@thn-dev I tried a very minimal configuration with both 2.1.1, and the 2.2 branch. The tribe settings necessary were only cluster.name and discovery.zen.ping.unicast.hosts. If you can reproduce, please create a new issue.
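Editor's note: rjernst's minimal configuration is not shown in the thread; based on his description, it presumably looked something like the following, where the cluster names and hostnames are placeholders:

```
tribe:
  t1:
    cluster.name: clusterA
    discovery.zen.ping.unicast.hosts: ["hostA"]
  t2:
    cluster.name: clusterB
    discovery.zen.ping.unicast.hosts: ["hostB"]
```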

@thn-dev commented Jan 6, 2016

@rjernst Thank you for looking into it. As I mentioned before, I did not have to do that in v1.7.3. One thing I do know: when I upgraded ES from v1.7.3 to v2.1.1, I did "rpm -Uvh elasticsearch-.rpm" instead of removing v1.7.3 completely. Everything I have described so far is running on CentOS 6.5 or 6.7. I'm in the middle of a stress test right now; once I have the opportunity to redo the cluster, I will report back on whether installing ES 2.1.1 from scratch makes a difference.

Once again, thank you.

@TinLe commented Feb 29, 2016

Still broken in ES 2.2.0.

Simple setup. One tribe node, and a cluster of 7 nodes, all running ES 2.2.0.

Config for the tribe node:

cluster.name: psec-tribe-elasticsearch-ela4
node.name: ela4-app7246
node.master: false
node.data: false
node.max_local_storage_nodes: 1

path.data: /export/content/data/
path.plugins: /export/content/lid/apps/psec-tribe-elasticsearch/i001/plugins

################################## Tribe ################################
tribe:
   prod-ltx1_psec-elasticsearch:
     cluster.name: psec-elasticsearch_prod-ltx1
     path.conf: /export/content/lid/apps/psec-tribe-elasticsearch/i001/elasticsearch/config/psec-tribe-elasticsearch/
     path.home: /export/content/lid/apps/psec-tribe-elasticsearch/i001/
     path.plugins: /export/content/lid/apps/psec-tribe-elasticsearch/i001/plugins/
     discovery.zen.ping.unicast.hosts:
      - ltx1-app9624
      - ltx1-app9495

Cluster is up and running, reachable.

Log from the tribe node when ES is started:

[2016-02-29 23:04:22,694][INFO ][node                     ] [ela4-app7246.prod] version[2.2.0], pid[17205], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-02-29 23:04:22,694][INFO ][node                     ] [ela4-app7246.prod] initializing ...
[2016-02-29 23:04:22,925][INFO ][plugins                  ] [ela4-app7246.prod] modules [], plugins [license, kopf], sites [kopf]
[2016-02-29 23:04:24,160][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] version[2.2.0], pid[17205], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-02-29 23:04:24,160][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] initializing ...
[2016-02-29 23:04:24,236][INFO ][plugins                  ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] modules [], plugins [license, kopf], sites [kopf]
[2016-02-29 23:04:24,344][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] initialized
[2016-02-29 23:04:24,349][INFO ][node                     ] [ela4-app7246.prod] initialized
[2016-02-29 23:04:24,349][INFO ][node                     ] [ela4-app7246.prod] starting ...
[2016-02-29 23:04:24,589][INFO ][transport                ] [ela4-app7246.prod] publish_address {ela4-app7246.prod/172.25.22.199:9300}, bound_addresses {[::]:9300}
[2016-02-29 23:04:24,594][INFO ][discovery                ] [ela4-app7246.prod] psec-tribe-elasticsearch-ela4/yNB3ZuZOSceWreWyi-825Q
[2016-02-29 23:04:24,594][WARN ][discovery                ] [ela4-app7246.prod] waited for 0s and no initial state was set by the discovery
[2016-02-29 23:04:24,618][INFO ][http                     ] [ela4-app7246.prod] publish_address {ela4-app7246.prod/172.25.22.199:9200}, bound_addresses {[::]:9200}
[2016-02-29 23:04:24,619][INFO ][node                     ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] starting ...
[2016-02-29 23:04:24,661][INFO ][transport                ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] publish_address {127.0.0.1:9301}, bound_addresses {[::1]:9301}, {127.0.0.1:9301}
[2016-02-29 23:04:24,665][INFO ][discovery                ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] lotr-elasticsearch_prod-ltx1/ZCGhRNvCQLWF8beCT3MsUw
[2016-02-29 23:04:27,881][INFO ][discovery.zen            ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] failed to send join request to master [{ltx1-app9495}{1GAZZhN8T5qprr2aalxhRw}{10.149.74.222}{10.149.74.222:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[ltx1-app9495][10.149.74.222:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[ela4-app7246.prod/prod-ltx1_lotr-elasticsearch][127.0.0.1:9301] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /127.0.0.1:9301]; ]
[2016-02-29 23:04:30,974][INFO ][discovery.zen            ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] failed to send join request to master [{ltx1-app9495}{1GAZZhN8T5qprr2aalxhRw}{10.149.74.222}{10.149.74.222:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[ltx1-app9495][10.149.74.222:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[ela4-app7246.prod/prod-ltx1_lotr-elasticsearch][127.0.0.1:9301] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /127.0.0.1:9301]; ]
[2016-02-29 23:04:34,062][INFO ][discovery.zen            ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] failed to send join request to master [{ltx1-app9495}{1GAZZhN8T5qprr2aalxhRw}{10.149.74.222}{10.149.74.222:9300}{max_local_storage_nodes=1, master=true}], reason [RemoteTransportException[[ltx1-app9495][10.149.74.222:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[ela4-app7246.prod/prod-ltx1_lotr-elasticsearch][127.0.0.1:9301] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /127.0.0.1:9301]; ]

@bleskes (Member) commented Mar 1, 2016

The problem lies in the publish address of the tribe's client node: it's localhost, which prevents other nodes from connecting back to it, which is why it fails to join the cluster:

[2016-02-29 23:04:24,661][INFO ][transport                ] [ela4-app7246.prod/prod-ltx1_lotr-elasticsearch] publish_address {127.0.0.1:9301}, bound_addresses {[::1]:9301}, {127.0.0.1:9301}

The tribe node itself does bind to a non-local address (ela4-app7246.prod/172.25.22.199), but I don't see that in the configuration you supplied (we default to localhost in 2.x). Can you share your complete yml file?

@bleskes (Member) commented Mar 1, 2016

@TinLe another option is to supply these settings from the command line; are you perhaps doing that?

@TinLe commented Mar 1, 2016

@bleskes

The two missing lines from my config are:

network.publish_host: ela4-app7246.prod
network.host: 0.0.0.0

In another email exchange with @sherry-ger, I got the correct settings for the tribe. I need to add the following for tribe to work.

tribe.prod-ltx1_psec-elasticsearch.network.publish_host: ela4-app7246.prod
tribe.prod-ltx1_psec-elasticsearch.network.host: 0.0.0.0

The network.publish_host setting pointing to itself is something I would not have guessed....

In any case, I got tribe working in ES 2.2.0 now.
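
Putting the pieces from this comment together, the working additions to the tribe node's elasticsearch.yml are the following (using this setup's host name; substitute your own tribe name and publish host):

```yaml
# Top-level settings for the tribe node itself
network.publish_host: ela4-app7246.prod
network.host: 0.0.0.0

# Per-tribe client settings, mirroring the top-level ones
tribe.prod-ltx1_psec-elasticsearch.network.publish_host: ela4-app7246.prod
tribe.prod-ltx1_psec-elasticsearch.network.host: 0.0.0.0
```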

@rjernst (Member) commented Mar 1, 2016

@TinLe I opened a PR (#16893) to fix the issue of not passing through settings such as network.publish_host, as well as an issue to validate per-tribe client settings to avoid confusion (#16894; you don't need any of the path.* settings there, and they are ignored).

@thn-dev commented Mar 1, 2016

Glad to see a fix for this. Thanks @rjernst @TinLe

@TinLe commented Mar 2, 2016

FYI, the tribe node not passing on settings from elasticsearch.yml is also breaking plugins.
