
Tribe client connects directly to client node over transport #16756

Closed
ppf2 opened this issue Feb 21, 2016 · 10 comments
Labels
>docs General docs changes

Comments

ppf2 (Member) commented Feb 21, 2016

This is observed in a setup where the tribe node does not have firewall access over the transport port to a client node of the downstream cluster:

[2016-02-14 20:52:16,243][WARN ][cluster.service          ] [tribe_node_name/t1] failed to connect to node [{client_node}{kH2yVx_WQ22qHmthaX_NHA}{10.8.17.130}{host_name/IP:9300}{data=false, master=false}]
ConnectTransportException[[client_node][host_name/IP:9300] connect_timeout[30s]]; nested: ConnectTimeoutException[connection timed out: host_name/IP:9300];
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:951)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:884)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:857)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:243)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:474)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: host_name/IP:9300
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

The message is benign, since there is no reason for the tribe node to connect directly to a downstream cluster's client node. The behavior is likely due to how client nodes work in general: they connect to all nodes in the cluster, and since the tribe node is just a specialized client node, it behaves the same way. Perhaps we can add an exclusion so the tribe node does not attempt to connect to the client nodes in the downstream clusters.
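For context, a tribe node is configured with one internal client per downstream cluster, and each of those clients discovers and then tries to open transport connections to every node in its cluster. A minimal configuration sketch (cluster names and hosts are hypothetical, not taken from this issue):

```yaml
# elasticsearch.yml on the tribe node (sketch; names and hosts are placeholders).
# Each tribe.* entry spawns an internal client that joins that cluster and,
# like any node, attempts transport connections to every node it discovers,
# including client-only nodes -- hence the warning above when port 9300 to a
# client node is firewalled off.
tribe:
  t1:
    cluster.name: cluster_one
    discovery.zen.ping.unicast.hosts: ["node1.cluster-one.example:9300"]
  t2:
    cluster.name: cluster_two
    discovery.zen.ping.unicast.hosts: ["node1.cluster-two.example:9300"]
```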

javanna (Member) commented Mar 1, 2016

This seems related to some other issues around client nodes connecting to other client nodes: #16815, #3617, #16105. Looking at the linked issues, though, it seems that a client node does not even try to connect to other client nodes, while it should. The problem here is the opposite: an attempted connection that may not be desirable.

I am not sure about the proposal. Why shouldn't the tribe node connect to the client nodes that are part of the cluster? I think every node should be able to connect to any other node in the cluster. I understand that the tribe node is itself a client and doesn't need to connect to other client nodes for operations that involve data, but there are APIs, like the monitoring ones, that do need access to client nodes too. My reasoning follows this other comment.

javanna (Member) commented Mar 2, 2016

I think with #16898 we made it clear that each node connects to every other node in the cluster; client nodes should not be treated differently, and I don't think we should make exceptions for tribe nodes either. Are you ok with this @ppf2?

ppf2 (Member, Author) commented Mar 2, 2016

Thanks @javanna, sounds good. It would be nice, though, to also document this behavior in the tribe node documentation; that would help admins who have to figure out which ports to open between the tribe node and the other nodes.

ppf2 (Member, Author) commented Apr 1, 2016

Reopening this ticket for a follow-up discussion. One side effect of the current behavior is that the tribe node's log file fills up with exceptions like the one noted at the beginning of this issue. For instance, within a 16-hour period (less than a day) with just one client node in a downstream cluster, the tribe node logged 176 MB of entries, pretty much filling the log file with 21K instances of these exception stacks.

While we do not intend to change the design (the tribe node will try to connect to all nodes in the cluster), it would be helpful to move these exceptions, logged when a tribe node attempts to connect to a client node, down to the TRACE level. Thoughts?
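As a stopgap, assuming these stacks all come through the `cluster.service` logger shown in the original report, the threshold for that one logger could be raised in logging.yml. This is a blunt sketch rather than a real fix, since it also hides genuine connection failures:

```yaml
# logging.yml fragment (sketch): raise the cluster.service logger from
# WARN to ERROR so the repeated connect_timeout stacks are suppressed.
# Note this also silences legitimate connection warnings from that logger.
logger:
  cluster.service: ERROR
```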

javanna (Member) commented Apr 1, 2016

@ppf2 do you mean the log line that's part of the description of this issue or some other log line?

ppf2 (Member, Author) commented Apr 1, 2016

Here you go :) We are seeing a ton of these, indicating that the tribe node is trying to connect to a client node.

[2016-03-31 03:20:04,750][DEBUG][action.admin.cluster.node.info] [tribe_node] failed to execute on node [A-cIPrviSUiCmoiIzj_GAw]
SendRequestTransportException[[client_node][tribe_node/IP:9300][cluster:monitor/nodes/info[n]]]; nested: NodeNotConnectedException[[client_node][tribe_node/IP:9300] Node not connected];
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
    at org.elasticsearch.shield.transport.ShieldServerTransportService.sendRequest(ShieldServerTransportService.java:75)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:147)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$100(TransportNodesAction.java:94)
    at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:68)
    at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:44)
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:101)
    at org.elasticsearch.shield.action.ShieldActionFilter.apply(ShieldActionFilter.java:113)
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:99)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:77)
    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
    at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.execute(AbstractClient.java:845)
    at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.nodesInfo(AbstractClient.java:925)
    at org.elasticsearch.rest.action.admin.cluster.node.info.RestNodesInfoAction.handleRequest(RestNodesInfoAction.java:102)
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:207)
    at org.elasticsearch.rest.RestController$RestHandlerFilter.process(RestController.java:281)
    at org.elasticsearch.rest.RestController$ControllerFilterChain.continueProcessing(RestController.java:262)
    at org.elasticsearch.shield.rest.ShieldRestFilter.process(ShieldRestFilter.java:77)
    at org.elasticsearch.rest.RestController$ControllerFilterChain.continueProcessing(RestController.java:265)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:176)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:385)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
    at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.ipfilter.IpFilteringHandlerImpl.handleUpstream(IpFilteringHandlerImpl.java:154)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: NodeNotConnectedException[[client_node][tribe_node/IP:9300] Node not connected]
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1132)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:819)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:312)
    ... 75 more

javanna (Member) commented Apr 1, 2016

Thanks @ppf2! I am not sure we can change the log level only when the log line comes from a tribe node; that seems like working around the problem. I think every node should instead get access to all the other nodes, including the client ones.

This specific log line comes from calling nodes info from the tribe node. The tribe node gathers the info from all the nodes, as simple as that. Another way to work around it would be to not use the tribe node for monitoring calls, or to filter some of the nodes out of this call (e.g. using node attributes).
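For example, the nodes info API accepts node filters in the request path, so a monitoring caller could restrict the call to master-eligible and data nodes and leave client-only nodes out. A sketch, since the exact filter behavior depends on the version:

```
# Sketch: select master-eligible and data nodes only (union of the two
# role filters), so client-only nodes are skipped by the request.
GET /_nodes/master:true,data:true/info
```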

javanna (Member) commented Apr 5, 2016

What I previously suggested are workarounds, assuming that the firewall configuration stays the same. But given that we removed support for the node.client setting in master, and we are moving away from using client nodes with the Java API, I wonder why those "client" nodes need to be treated differently. I think their ports should be accessible, because that's what the cluster requires.

bleskes (Contributor) commented Apr 5, 2016

What @javanna said. The tribe node should be able to connect to any node in the clusters it connects to. Agreed that it's confusing given the current way we treat client nodes as clients, but that's what we're changing...

@javanna javanna removed the discuss label Apr 6, 2016
amazinganshul commented

With this issue, are we resolving whether tribe nodes in the federation cluster should connect to each other or not? We have a federation cluster with two tribe nodes, but no API output shows that the tribe nodes have connected to each other. Is this expected behaviour? Also, is there any documentation on scaling tribe nodes?
