
node fails to join cluster after upgrade 6.5.4 -> 6.7.0 #40784

Closed
ItamarBenjamin opened this issue Apr 3, 2019 · 5 comments

@ItamarBenjamin
Contributor

Note: this does not seem related to #40565.

Elasticsearch version (bin/elasticsearch --version): cluster is 6.5.4; a single node was upgraded to 6.7.0. All nodes are Docker builds.

Plugins installed: none

JVM version (java -version): the JVM bundled with the Docker image

OS version (uname -a if on a Unix-like system): 4.15.0-33-generic #36~16.04.1-Ubuntu

Description of the problem including expected versus actual behavior:

Upgraded a single node on 5 different clusters; all clusters suffered the same issue. The upgraded node was not able to rejoin the cluster, which remained yellow. To fix it I had to upgrade the other nodes to 6.7.0 as well, causing downtime.
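For anyone checking whether they are hitting the same symptom, a minimal sketch of how to confirm which node failed to rejoin and why the cluster stays yellow (assumes the HTTP port 9200 is reachable on localhost without authentication; host and port are placeholders, not from the report):

curl -s 'localhost:9200/_cat/nodes?v&h=name,version,master,ip'   # which nodes are currently in the cluster, and their versions
curl -s 'localhost:9200/_cluster/health?pretty'                  # number_of_nodes and status; yellow means replicas are unassigned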

Steps to reproduce:
Run a 3-node 6.5.4 cluster on Docker, then upgrade a single node to 6.7.0.
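A minimal sketch of that reproduction with plain docker commands. The container names, network name, and discovery settings below are illustrative, not the reporter's exact setup, and the sketch omits data volumes, heap settings, and vm.max_map_count tuning that a real deployment needs:

# 3-node 6.5.4 cluster on a user-defined Docker network
docker network create esnet
for i in 1 2 3; do
  docker run -d --name "es$i" --network esnet \
    -e "cluster.name=test" \
    -e "node.name=es$i" \
    -e "discovery.zen.ping.unicast.hosts=es1,es2,es3" \
    -e "discovery.zen.minimum_master_nodes=2" \
    docker.elastic.co/elasticsearch/elasticsearch:6.5.4
done

# "upgrade" one node by replacing its container with the 6.7.0 image
docker stop es3 && docker rm es3
docker run -d --name es3 --network esnet \
  -e "cluster.name=test" \
  -e "node.name=es3" \
  -e "discovery.zen.ping.unicast.hosts=es1,es2,es3" \
  -e "discovery.zen.minimum_master_nodes=2" \
  docker.elastic.co/elasticsearch/elasticsearch:6.7.0

The replaced 6.7.0 node should then fail to stay joined to the 6.5.4 master, matching the logs below.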

Provide logs (if relevant):
[2019-04-03T06:30:56,549][WARN ][o.e.d.z.PublishClusterStateAction] [fmyinf7012] publishing cluster state with version [29318] failed for the following nodes: [[{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true}]]
[2019-04-03T06:30:56,563][INFO ][o.e.c.s.MasterService ] [fmyinf7012] zen-disco-node-failed({fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true}), reason(transport disconnected)[{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true} transport disconnected, {fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true} transport disconnected], reason: removed {{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true},}
[2019-04-03T06:30:56,569][INFO ][o.e.c.s.ClusterApplierService] [fmyinf7012] removed {{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {fmyinf7012}{K0Od0eOBTp6c8PRggjBpuw}{_1KFcV3zRryhiuGz7cCQwQ}{10.96.87.26}{10.96.87.26:9300}{xpack.installed=true} committed version [29319] source [zen-disco-node-failed({fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true}), reason(transport disconnected)[{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true} transport disconnected, {fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true} transport disconnected]]])
[2019-04-03T06:31:00,580][INFO ][o.e.c.s.MasterService ] [fmyinf7012] zen-disco-node-join[{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true}], reason: added {{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true},}
[2019-04-03T06:31:00,635][INFO ][o.e.c.s.ClusterApplierService] [fmyinf7012] added {{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {fmyinf7012}{K0Od0eOBTp6c8PRggjBpuw}{_1KFcV3zRryhiuGz7cCQwQ}{10.96.87.26}{10.96.87.26:9300}{xpack.installed=true} committed version [29320] source [zen-disco-node-join[{fmyinf7011}{ejFPzwZHRxqk4CNBtF66iA}{x1NPhmVuQcucbPX3VeEgPg}{10.96.87.25}{10.96.87.25:9300}{xpack.installed=true}]]])
[2019-04-03T06:31:06,665][WARN ][o.e.t.n.Netty4Transport ] [fmyinf7012] exception caught on transport layer [NettyTcpChannel{localAddress=/10.96.87.26:58004, remoteAddress=10.96.87.25/10.96.87.25:9300}], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [55772288], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1@58fac40a], error [false]; resetting
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1197) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:65) ~[transport-netty4-client-6.5.4.jar:6.5.4]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) [netty-codec-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426) [netty-codec-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-transport-4.1.30.Final.jar:4.1.30.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897) [netty-common-4.1.30.Final.jar:4.1.30.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]

@dongs0104

This problem appears to have been caused by #39378.

This issue is the same as #40511.

@DaveCTurner
Contributor

DaveCTurner commented Apr 3, 2019

This sounds like the issue fixed by #40483. You can work around this by first upgrading the cluster completely to 6.6.2 and thence to 6.7.0. Alternatively you can wait for 6.7.1 to be released.

This does not sound related to #40511. Edit: sorry, I looked more closely and see that this is docker-related. Yes, this is possibly fixed by #40511.
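For a Docker deployment, the suggested workaround amounts to two complete rolling upgrades: replace each node's image in turn and wait for the cluster to go green before touching the next node, first moving everything to 6.6.2 and then repeating with 6.7.0. A rough sketch only; the container names, network, volume layout, and health check below are illustrative, and the discovery settings from the original docker run commands are assumed to be reused unchanged:

# Step 1: roll every node from 6.5.4 to 6.6.2, one at a time.
# Assumes each node's data is on a named volume so the replacement container keeps it.
for node in es1 es2 es3; do
  docker stop "$node" && docker rm "$node"
  docker run -d --name "$node" --network esnet \
    -v "${node}-data:/usr/share/elasticsearch/data" \
    docker.elastic.co/elasticsearch/elasticsearch:6.6.2
  # wait for the cluster to report green before touching the next node
  until curl -s 'localhost:9200/_cluster/health' | grep -q '"status":"green"'; do sleep 5; done
done
# Step 2: repeat the same loop with docker.elastic.co/elasticsearch/elasticsearch:6.7.0

Note that, per the follow-up comment below, this two-step path did not resolve the problem for the reporter.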

@dongs0104

@DaveCTurner when will 6.7.1 be released?

@ItamarBenjamin
Contributor Author

@DaveCTurner upgrading to 6.6.2 first didn't work. Any other workarounds you know of? Thanks!

@dongs0104

dongs0104 commented Apr 4, 2019

@ItamarBenjamin try the approach described in #40511 (comment).
