
Getting "Failed to deserialize response from handler" after upgrading from 5.6.4 to 6.2.1 #28713

Closed
bra-fsn opened this issue Feb 17, 2018 · 7 comments
Labels
>bug :Data Management/Stats Statistics tracking and retrieval APIs

Comments

@bra-fsn

bra-fsn commented Feb 17, 2018

Elasticsearch version (bin/elasticsearch --version):
Version: 6.2.1, Build: 7299dc3/2018-02-07T19:34:26.990113Z, JVM: 1.8.0_131

Plugins installed: analysis-icu

JVM version (java -version):
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version (uname -a if on a Unix-like system):
FreeBSD 11.1

Description of the problem including expected versus actual behavior:
After upgrading from 5.6.4 to 6.2.1 I get these transport errors between nodes (in this case between the master (10.6.145.102) and a data node (10.6.145.235)).

Steps to reproduce:
We never saw this on previous Elasticsearch versions; it started to appear after the 6.2.1 upgrade. We have several nodes (three distinct masters, 40 mixed data/client nodes, and some client-only nodes), and the Java versions differ between them (masters: 1.8.0_51, data nodes: 1.8.0_144, client nodes: 1.8.0_131).
We have indices created with various 5.x versions, holding a lot of different data.
Please tell me what exact information I should provide.

Provide logs (if relevant):
[2018-02-17T09:45:45,474][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [fmesm02] failed to execute on node [VuFL_GfOQau3I-ZYHbiW5g]
org.elasticsearch.transport.RemoteTransportException: [Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]]
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]
    at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:1441) [elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1400) [elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:64) [transport-netty4-6.2.1.jar:6.2.1]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.io.IOException: Invalid string; unexpected character: 249 hex: f9
    at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:375) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.node.ResponseCollectorService$ComputedNodeStats.<init>(ResponseCollectorService.java:145) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.common.io.stream.StreamInput.readMap(StreamInput.java:460) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.node.AdaptiveSelectionStats.<init>(AdaptiveSelectionStats.java:56) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:733) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.action.admin.cluster.node.stats.NodeStats.readFrom(NodeStats.java:239) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.TransportResponseHandler.read(TransportResponseHandler.java:47) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(TransportService.java:1085) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:1437) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1400) [elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:64) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) ~[?:?]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
[2018-02-17T09:45:45,487][WARN ][o.e.t.n.Netty4Transport ] [fmesm02] exception caught on transport layer [NettyTcpChannel{localAddress=/10.6.145.102:13476, remoteAddress=10.6.145.235/10.6.145.235:9300}], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [3185238], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1@3c93bec2], error [false]; resetting
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1407) ~[elasticsearch-6.2.1.jar:6.2.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:64) ~[transport-netty4-6.2.1.jar:6.2.1]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-02-17T09:45:45,548][INFO ][o.e.c.r.a.AllocationService] [fmesm02] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{fmfe30}{VuFL_GfOQau3I-ZYHbiW5g}{VGhsT_tCRpCngX3cTFSZ0w}{10.6.145.235}{10.6.145.235:9300} transport disconnected]).

@bra-fsn
Author

bra-fsn commented Feb 17, 2018

The unexpected character varies wildly:
Caused by: java.io.IOException: Invalid string; unexpected character: 146 hex: 92
Caused by: java.io.IOException: Invalid string; unexpected character: 159 hex: 9f
Caused by: java.io.IOException: Invalid string; unexpected character: 242 hex: f2
Caused by: java.io.IOException: Invalid string; unexpected character: 137 hex: 89
Caused by: java.io.IOException: Invalid string; unexpected character: 186 hex: ba
Caused by: java.io.IOException: Invalid string; unexpected character: 251 hex: fb
Caused by: java.io.IOException: Invalid string; unexpected character: 249 hex: f9
Caused by: java.io.IOException: Invalid string; unexpected character: 189 hex: bd
Caused by: java.io.IOException: Invalid string; unexpected character: 153 hex: 99
Caused by: java.io.IOException: Invalid string; unexpected character: 254 hex: fe
Caused by: java.io.IOException: Invalid string; unexpected character: 139 hex: 8b

@ywelsch
Contributor

ywelsch commented Feb 17, 2018

Are all nodes on 6.2.1? Is this issue only occurring between certain nodes? Is the response different if you fire off a GET /_nodes/stats request on the master-eligible, the data, or the client nodes? Are all nodes but one reporting a failure in that case, or only a subset of the nodes?
Before we get into tcpdump territory, can you maybe reproduce this in a simple two-node cluster (same hardware/OS)?
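(A quick way to compare entry points — hostnames here are placeholders for your actual nodes — is to send the same request to each node type and check the failed count in the _nodes header of the response:

for h in a-master-node a-data-node a-client-node; do
  echo -n "$h: "; curl -s "http://$h:9200/_nodes/stats" | head -c 120; echo
done

The leading "_nodes":{"total":N,"successful":M,"failed":K} object shows whether failures depend on which node handles the request.)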

@bra-fsn
Author

bra-fsn commented Feb 17, 2018

Yes, all nodes are 6.2.1:
# curl -s 'http://localhost:9200/_cat/nodes?h=node.role,version' | sort | uniq -c
  40 di 6.2.1
   9 i 6.2.1
   3 mi 6.2.1
They only differ in their roles and JDK versions.

I can only see these log entries on the three master(-eligible) nodes. BTW, we query only those for statistics.
There are many nodes involved (29 so far), each of them a mixed data/client node:
egrep 'exception caught on transport layer' all_masters.log | egrep -o 'remoteAddress.*}' | sort | uniq -c
   2 remoteAddress=10.6.145.193/10.6.145.193:9300}
   1 remoteAddress=10.6.145.195/10.6.145.195:9300}
   2 remoteAddress=10.6.145.197/10.6.145.197:9300}
   1 remoteAddress=10.6.145.199/10.6.145.199:9300}
   4 remoteAddress=10.6.145.200/10.6.145.200:9300}
   1 remoteAddress=10.6.145.201/10.6.145.201:9300}
   1 remoteAddress=10.6.145.202/10.6.145.202:9300}
   1 remoteAddress=10.6.145.203/10.6.145.203:9300}
   1 remoteAddress=10.6.145.204/10.6.145.204:9300}
   1 remoteAddress=10.6.145.206/10.6.145.206:9300}
   1 remoteAddress=10.6.145.210/10.6.145.210:9300}
   1 remoteAddress=10.6.145.211/10.6.145.211:9300}
   1 remoteAddress=10.6.145.223/10.6.145.223:9300}
   2 remoteAddress=10.6.145.225/10.6.145.225:9300}
   1 remoteAddress=10.6.145.226/10.6.145.226:9300}
   3 remoteAddress=10.6.145.227/10.6.145.227:9300}
   3 remoteAddress=10.6.145.228/10.6.145.228:9300}
   1 remoteAddress=10.6.145.232/10.6.145.232:9300}
   2 remoteAddress=10.6.145.233/10.6.145.233:9300}
   1 remoteAddress=10.6.145.234/10.6.145.234:9300}
   3 remoteAddress=10.6.145.235/10.6.145.235:9300}
   3 remoteAddress=10.6.145.236/10.6.145.236:9300}
   1 remoteAddress=10.6.145.237/10.6.145.237:9300}
   1 remoteAddress=10.6.145.239/10.6.145.239:9300}
   3 remoteAddress=10.6.145.240/10.6.145.240:9300}
   2 remoteAddress=10.6.145.241/10.6.145.241:9300}
   2 remoteAddress=10.6.145.242/10.6.145.242:9300}
   2 remoteAddress=10.6.145.243/10.6.145.243:9300}
   2 remoteAddress=10.6.145.244/10.6.145.244:9300}
As you can see, it happens more often with some of them, but I can't single out one as the culprit.
It also happens on master-eligible but non-master nodes; the above contains their logs too.

Today the log dates are:
[2018-02-17T00:15:46,519][WARN
[2018-02-17T00:21:08,014][WARN
[2018-02-17T02:16:04,699][WARN
[2018-02-17T02:22:18,086][WARN
[2018-02-17T02:25:28,047][WARN
[2018-02-17T02:31:26,304][WARN
[2018-02-17T02:35:43,636][WARN
[2018-02-17T02:50:40,803][WARN
[2018-02-17T02:55:27,373][WARN
[2018-02-17T03:30:54,778][WARN
[2018-02-17T03:55:15,757][WARN
[2018-02-17T05:40:33,378][WARN
[2018-02-17T05:40:48,091][WARN
[2018-02-17T06:10:37,117][WARN
[2018-02-17T06:10:49,796][WARN
[2018-02-17T06:56:14,371][WARN
[2018-02-17T07:11:15,700][WARN
[2018-02-17T07:26:03,577][WARN
[2018-02-17T07:30:39,050][WARN
[2018-02-17T07:36:26,556][WARN
[2018-02-17T08:06:04,656][WARN
[2018-02-17T08:45:43,094][WARN
[2018-02-17T08:55:47,759][WARN
[2018-02-17T09:00:32,387][WARN
[2018-02-17T09:45:45,487][WARN
[2018-02-17T10:15:53,099][WARN
[2018-02-17T10:20:38,481][WARN
[2018-02-17T10:30:29,252][WARN
[2018-02-17T11:00:27,803][WARN
[2018-02-17T11:10:28,640][WARN
[2018-02-17T11:50:43,091][WARN
[2018-02-17T12:00:48,460][WARN
[2018-02-17T12:35:25,549][WARN
[2018-02-17T12:35:32,719][WARN
[2018-02-17T12:50:30,927][WARN
[2018-02-17T13:00:51,557][WARN
[2018-02-17T13:35:25,834][WARN
[2018-02-17T14:15:54,408][WARN
[2018-02-17T14:25:36,535][WARN
[2018-02-17T14:25:46,726][WARN
[2018-02-17T15:10:28,747][WARN
[2018-02-17T15:15:29,193][WARN
[2018-02-17T15:50:16,142][WARN
[2018-02-17T16:00:48,446][WARN
[2018-02-17T16:00:57,826][WARN
[2018-02-17T16:10:23,455][WARN
[2018-02-17T16:20:25,079][WARN
[2018-02-17T16:35:34,404][WARN
[2018-02-17T16:40:28,779][WARN
[2018-02-17T17:05:34,549][WARN
Most of them fall close to 0 and 5 minutes past the hour, which correlates with our statistics collection cron job.
It doesn't happen on every _nodes/stats query. I've fired some queries by hand, but nothing happened. I will try running the stats collector more often, along with a tcpdump.
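Roughly like this (the interface name and capture path are placeholders for our FreeBSD boxes):

# capture inter-node transport traffic while hammering the stats endpoint
tcpdump -i em0 -s 0 -w /tmp/transport.pcap 'tcp port 9300' &
while true; do curl -s 'http://localhost:9200/_nodes/stats' > /dev/null; sleep 1; done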

@bra-fsn
Author

bra-fsn commented Feb 17, 2018

I managed to capture an error; I've sent the dump and log to you by email.

@ywelsch
Contributor

ywelsch commented Feb 18, 2018

Thanks for the packet capture. I've had a look and I think I know what's going on.

The AdaptiveSelectionStats object (introduced in 6.1) serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though. I've opened #28718 as a fix.
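To illustrate the race with a minimal standalone sketch (this is not the actual Elasticsearch StreamOutput/StreamInput code; the class and method names below are made up for the example):

import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MapSerializationRace {
    // Writer side: the size is captured first, then the entries are walked.
    // If another thread inserts or removes an entry in between, the count on
    // the wire no longer matches the entries that follow, and the reader
    // starts interpreting unrelated bytes as string data -- hence the
    // "Invalid string; unexpected character" failures above.
    static void writeMap(DataOutputStream out, Map<String, Long> map) throws IOException {
        out.writeInt(map.size());                          // snapshot of the size
        for (Map.Entry<String, Long> e : map.entrySet()) { // may see more/fewer entries
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
    }

    // The shape of the fix: serialize a point-in-time copy, so the size and
    // the entries are always consistent (see #28718 for the real change).
    static void writeMapSafely(DataOutputStream out, ConcurrentHashMap<String, Long> map) throws IOException {
        Map<String, Long> copy = new HashMap<>(map); // consistent snapshot
        writeMap(out, copy);
    }
}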

@ywelsch ywelsch added >bug :Data Management/Stats Statistics tracking and retrieval APIs labels Feb 18, 2018
@bra-fsn
Author

bra-fsn commented Feb 19, 2018

The cluster has been fine since yesterday with your latest patch:
ef03a0e
Thanks for the quick fix!

@bra-fsn bra-fsn closed this as completed Feb 19, 2018
ywelsch added a commit that referenced this issue Feb 19, 2018
The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though.

Closes #28713
ywelsch added two more commits that referenced this issue Feb 19, 2018, with the same commit message.
@ywelsch
Contributor

ywelsch commented Feb 19, 2018

Thanks for the detailed report and the packet dump. For anyone else running into this bug:
This can manifest when using the node stats call (GET /_nodes/stats), which queries the adaptive replica selection stats by default, even if the feature is not used. The only workaround is to explicitly specify all the desired metrics (os, jvm, thread_pool, ...) on the GET /_nodes/stats request, which implicitly excludes the collection of the adaptive replica selection stats.
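For example (assuming a default local setup; the metric list shown is illustrative — request whichever metrics you actually need, as long as adaptive_selection is not pulled in by the default full set):

curl -s 'http://localhost:9200/_nodes/stats/os,jvm,thread_pool,fs,indices,transport,http,process'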
