
Tribe client connects directly to client node over transport #16756

Closed
ppf2 opened this issue Feb 21, 2016 · 10 comments
Labels
>docs General docs changes

Comments

ppf2 (Member) commented Feb 21, 2016

This is observed in a setup where the tribe node does not have firewall access over the transport port to a client node of the downstream cluster:

[2016-02-14 20:52:16,243][WARN ][cluster.service          ] [tribe_node_name/t1] failed to connect to node [{client_node}{kH2yVx_WQ22qHmthaX_NHA}{10.8.17.130}{host_name/IP:9300}{data=false, master=false}]
ConnectTransportException[[client_node][host_name/IP:9300] connect_timeout[30s]]; nested: ConnectTimeoutException[connection timed out: host_name/IP:9300];
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:951)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:884)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:857)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:243)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:474)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: host_name/IP:9300
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

The message is benign, since there is no reason for the tribe node to connect directly to a downstream cluster's client node. The behavior is likely due to how client nodes work in general: they connect to all nodes in the cluster, and since the tribe node is just a specialized client node, it behaves the same way. Perhaps we can add an exclusion so the tribe node does not attempt to connect to the client nodes in the downstream clusters.
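For context, a tribe node is configured with one internal client per downstream cluster, and each of those clients discovers and then tries to open transport connections to every node in its cluster. A minimal configuration sketch (cluster names and hosts are hypothetical, not taken from this issue):

```yaml
# elasticsearch.yml on the tribe node (sketch; names and hosts are placeholders).
# Each tribe.* entry spawns an internal client that joins that cluster and,
# like any node, attempts transport connections to every node it discovers,
# including client-only nodes -- hence the warning above when port 9300 to a
# client node is firewalled off.
tribe:
  t1:
    cluster.name: cluster_one
    discovery.zen.ping.unicast.hosts: ["node1.cluster-one.example:9300"]
  t2:
    cluster.name: cluster_two
    discovery.zen.ping.unicast.hosts: ["node1.cluster-two.example:9300"]
```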

javanna (Member) commented Mar 1, 2016

This seems related to some other issues around client nodes connecting to other client nodes: #16815, #3617, #16105. Looking at the linked issues, though, it seems that a client node does not even try to connect to other client nodes, while it should. The problem here is the opposite: an attempted connection that may not be desirable.

I am not sure about the proposal. Why shouldn't the tribe node connect to the client nodes that are part of the cluster? I think every node should be able to connect to any other node in the cluster. I understand that the tribe node is itself a client and doesn't need to connect to other client nodes for operations that involve data, but there are APIs, like the monitoring ones, that do need access to client nodes too. My reasoning follows this other comment.

javanna (Member) commented Mar 2, 2016

I think with #16898 we made it clear that each node connects to every other node in the cluster; client nodes should not be treated differently, and I don't think we should make exceptions for tribe nodes either. Are you ok with this @ppf2?

ppf2 (Member, Author) commented Mar 2, 2016

Thanks @javanna, sounds good. It would be nice, though, to also document this behavior in the tribe node documentation; that would help admins who have to figure out which ports to open between the tribe node and the other nodes.

ppf2 (Member, Author) commented Apr 1, 2016

Reopening this ticket for a follow-up discussion. One side effect of the current behavior is that the tribe node's log file fills up with exceptions like the one noted at the beginning of this issue. For instance, within a 16-hour period (less than a day) with just one client node in a downstream cluster, the tribe node logged 176 MB of entries, pretty much filling the log file with 21K instances of these exception stacks.

While we do not intend to change the design (the tribe node will try to connect to all nodes in the cluster), it would be helpful to move these exceptions, logged when a tribe node attempts to connect to a client node, down to the TRACE level. Thoughts?
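As a stopgap, assuming these stacks all come through the `cluster.service` logger shown in the original report, the threshold for that one logger could be raised in logging.yml. This is a blunt sketch rather than a real fix, since it also hides genuine connection failures:

```yaml
# logging.yml fragment (sketch): raise the cluster.service logger from
# WARN to ERROR so the repeated connect_timeout stacks are suppressed.
# Note this also silences legitimate connection warnings from that logger.
logger:
  cluster.service: ERROR
```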

javanna (Member) commented Apr 1, 2016

@ppf2 do you mean the log line that's part of the description of this issue or some other log line?

ppf2 (Member, Author) commented Apr 1, 2016

Here you go :) We are seeing a ton of these, indicating that the tribe node is trying to connect to a client node.

[2016-03-31 03:20:04,750][DEBUG][action.admin.cluster.node.info] [tribe_node] failed to execute on node [A-cIPrviSUiCmoiIzj_GAw]
SendRequestTransportException[[client_node][tribe_node/IP:9300][cluster:monitor/nodes/info[n]]]; nested: NodeNotConnectedException[[client_node][tribe_node/IP:9300] Node not connected];
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
    at org.elasticsearch.shield.transport.ShieldServerTransportService.sendRequest(ShieldServerTransportService.java:75)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:147)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$100(TransportNodesAction.java:94)
    at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:68)
    at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:44)
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:101)
    at org.elasticsearch.shield.action.ShieldActionFilter.apply(ShieldActionFilter.java:113)
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:99)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:77)
    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
    at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.execute(AbstractClient.java:845)
    at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.nodesInfo(AbstractClient.java:925)
    at org.elasticsearch.rest.action.admin.cluster.node.info.RestNodesInfoAction.handleRequest(RestNodesInfoAction.java:102)
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:207)
    at org.elasticsearch.rest.RestController$RestHandlerFilter.process(RestController.java:281)
    at org.elasticsearch.rest.RestController$ControllerFilterChain.continueProcessing(RestController.java:262)
    at org.elasticsearch.shield.rest.ShieldRestFilter.process(ShieldRestFilter.java:77)
    at org.elasticsearch.rest.RestController$ControllerFilterChain.continueProcessing(RestController.java:265)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:176)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:385)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
    at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.ipfilter.IpFilteringHandlerImpl.handleUpstream(IpFilteringHandlerImpl.java:154)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: NodeNotConnectedException[[client_node][tribe_node/IP:9300] Node not connected]
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1132)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:819)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:312)
    ... 75 more

javanna (Member) commented Apr 1, 2016

Thanks @ppf2! I am not sure we can change the log level only when the log line comes from a tribe node; that seems like working around the problem. I think every node should instead get access to all the other nodes, including the client ones.

This specific log line comes from calling nodes info from the tribe node. The tribe node gathers the info from all the nodes, as simple as that. Another way to work around it would be to not use the tribe node for monitoring calls, or to filter some of the nodes out of this call (e.g. using node attributes).
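For example, the nodes info API accepts node filters in the request path, so a monitoring caller could restrict the call to master-eligible and data nodes and leave client-only nodes out. A sketch, since the exact filter behavior depends on the version:

```
# Sketch: select master-eligible and data nodes only (union of the two
# role filters), so client-only nodes are skipped by the request.
GET /_nodes/master:true,data:true/info
```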

javanna (Member) commented Apr 5, 2016

What I previously suggested are workarounds, assuming that the firewall configuration stays the same. But given that we removed support for the node.client setting in master, and we are moving away from using client nodes with the Java API, I wonder why those "client" nodes need to be treated differently. I think their ports should be accessible, because that's what the cluster requires.

bleskes (Contributor) commented Apr 5, 2016

What @javanna said. The tribe node should be able to connect to any node in the clusters it connects to. Agreed that it's confusing given the current way we treat client nodes as clients, but that's what we're changing...

@javanna javanna removed the discuss label Apr 6, 2016
amazinganshul commented

With this issue, are we resolving whether tribe nodes in the federation cluster should connect to each other or not? We have a federation cluster with two tribe nodes, but no API output shows that the tribe nodes have connected to each other. Is this expected behaviour? Also, is there any documentation on scaling tribe nodes?
