Netty4SizeHeaderFrameDecoder error #31057

Merged
merged 4 commits into from
Jun 5, 2018

martijnvg
Member

Always take into account the return value of TcpTransport.readMessageLength(...) in Netty4SizeHeaderFrameDecoder.

Previously, a return value of -1 from TcpTransport.readMessageLength(...) was ignored, because HEADER_SIZE was always added to whatever the method returned.

During CCR benchmarking we ran into the following error:

[2018-06-01T08:15:02,693][WARN ][o.e.t.n.Netty4Transport  ] [ccr-es-cluster-a-mvg-2] exception caught on transport layer [NettyTcpChannel{localAddress=/10.128.0.2:60298, remoteAddress=10.132.0.3/10.132.0.3:39300}], closing connection
io.netty.handler.codec.DecoderException: java.lang.IndexOutOfBoundsException: readerIndex(2026014) + length(6) exceeds writerIndex(2026019): PooledHeapByteBuf(ridx: 2026014, widx: 2026019, cap: 2097152)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.lang.IndexOutOfBoundsException: readerIndex(2026014) + length(6) exceeds writerIndex(2026019): PooledHeapByteBuf(ridx: 2026014, widx: 2026019, cap: 2097152)
	at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1401) ~[?:?]
	at io.netty.buffer.AbstractByteBuf.checkReadableBytes(AbstractByteBuf.java:1388) ~[?:?]
	at io.netty.buffer.AbstractByteBuf.skipBytes(AbstractByteBuf.java:944) ~[?:?]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:44) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 19 more

I'm not 100% sure this error is caused by the ignored -1 return value, but after running the CCR benchmark many times with this change, I have not run into it again.
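
The numbers in the stack trace are at least consistent with that explanation: the buffer had 2026019 - 2026014 = 5 readable bytes, and -1 + HEADER_SIZE is also 5 if HEADER_SIZE is 6 (the length passed to skipBytes in the trace). The sketch below replays that arithmetic in plain Java. Note that `readMessageLength` here is a hypothetical stand-in, not the real TcpTransport method, and the "fits into the readable bytes" guard is an assumption about the old code rather than a quote from it.

```java
public class HeaderLengthBugSketch {
    // Inferred from the stack trace: skipBytes was called with length 6.
    static final int HEADER_SIZE = 6;

    // Hypothetical stand-in for TcpTransport.readMessageLength(...):
    // returns -1 when fewer than HEADER_SIZE bytes are readable,
    // otherwise the length declared in the header.
    static int readMessageLength(int readableBytes, int declaredLength) {
        return readableBytes < HEADER_SIZE ? -1 : declaredLength;
    }

    // Old behaviour: HEADER_SIZE is added unconditionally, so -1 turns into 5
    // and can no longer be recognised as "header incomplete".
    static int oldLength(int readableBytes, int declaredLength) {
        return readMessageLength(readableBytes, declaredLength) + HEADER_SIZE;
    }

    public static void main(String[] args) {
        int readable = 2026019 - 2026014; // writerIndex - readerIndex from the trace
        int length = oldLength(readable, 0);
        System.out.println("readable=" + readable + " oldLength=" + length);

        // Assumed guard (not quoted from the source): the frame "fits".
        boolean wouldSkipHeader = length <= readable;
        System.out.println("would call skipBytes(" + HEADER_SIZE + "): " + wouldSkipHeader);
    }
}
```

Under these assumptions the guard passes with only 5 readable bytes, and skipping a 6-byte header would then fail in the way the IndexOutOfBoundsException describes.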

@martijnvg martijnvg added >bug review :Distributed/Network Http and internode communication implementations v7.0.0 labels Jun 4, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

  final ByteBuf message = in.skipBytes(HEADER_SIZE);
  // 6 bytes would mean it is a ping. And we should ignore.
- if (messageLength != 6) {
+ if (messageWithHeaderLength != 6) {
Member Author

@martijnvg martijnvg Jun 4, 2018

I think it is clearer if we just let TcpTransport.readMessageLength() return -1 in case of a ping (right now it returns 0, but TcpTransport.PING_DATA_SIZE itself is -1)

@martijnvg martijnvg force-pushed the Netty4SizeHeaderFrameDecoder branch from f20a2a2 to e14a1cb Compare June 4, 2018 11:34
Member

@jasontedor jasontedor left a comment

I left a comment.

@@ -37,17 +37,17 @@
  protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
      try {
          BytesReference networkBytes = Netty4Utils.toBytesReference(in);
-         int messageLength = TcpTransport.readMessageLength(networkBytes) + HEADER_SIZE;
+         int messageLength = TcpTransport.readMessageLength(networkBytes);
+         int messageWithHeaderLength = messageLength + HEADER_SIZE;
Member

I do not even think we should be making calculations with messageLength if it is equal to -1 so we should guard all of this in if (messageLength != -1).

Member Author

added in: 6678992

int messageLength = TcpTransport.readMessageLength(networkBytes);
// If the message length is -1, we have not read a complete header.
if (messageLength != -1) {
int messageWithHeaderLength = messageLength + HEADER_SIZE;
Member

Probably messageLengthWithHeader.
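
Putting the -1 guard and the suggested variable name together, the decode flow discussed in this thread can be sketched roughly as follows. This is a standalone illustration on `java.nio.ByteBuffer`, not the Netty-based code in the PR: `readMessageLength` is a hypothetical stand-in for the TcpTransport method, and the header layout (two marker bytes followed by a 4-byte length) is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class GuardedDecodeSketch {
    // Assumed layout: 2 marker bytes + 4-byte big-endian length.
    static final int HEADER_SIZE = 6;

    // Hypothetical stand-in for TcpTransport.readMessageLength(...).
    // Returns -1 if the header is incomplete; does not move the position.
    static int readMessageLength(ByteBuffer in) {
        if (in.remaining() < HEADER_SIZE) {
            return -1;
        }
        int length = in.getInt(in.position() + 2); // absolute read
        if (length < 0) {
            throw new IllegalStateException("invalid message length: " + length);
        }
        return length;
    }

    // Adds the payload of one complete frame to out, if there is one.
    // Leaves the buffer untouched when the header or body is incomplete.
    static void decode(ByteBuffer in, List<byte[]> out) {
        int messageLength = readMessageLength(in);
        // -1 means the header is incomplete: do no arithmetic with it.
        if (messageLength != -1) {
            int messageLengthWithHeader = messageLength + HEADER_SIZE;
            if (messageLengthWithHeader <= in.remaining()) {
                in.position(in.position() + HEADER_SIZE); // skip the header
                byte[] payload = new byte[messageLength];
                in.get(payload);
                // A frame of exactly HEADER_SIZE bytes is treated as a ping and dropped.
                if (messageLengthWithHeader != HEADER_SIZE) {
                    out.add(payload);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<byte[]> out = new ArrayList<>();
        decode(ByteBuffer.wrap(new byte[] {'E', 'S', 0, 0, 0}), out);
        System.out.println("after partial header: " + out.size());
        decode(ByteBuffer.wrap(new byte[] {'E', 'S', 0, 0, 0, 3, 1, 2, 3}), out);
        System.out.println("after full frame: " + out.size());
    }
}
```

With the guard in place, a buffer holding only 5 bytes is left alone until more data arrives, instead of reaching the header skip.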

Member

@jasontedor jasontedor left a comment

I left one more comment but this LGTM now. Good find.

Contributor

@Tim-Brooks Tim-Brooks left a comment

LGTM

@martijnvg
Member Author

retest this please

@martijnvg martijnvg merged commit 0fad7cc into elastic:master Jun 5, 2018
Labels
>bug :Distributed/Network Http and internode communication implementations v7.0.0-beta1

5 participants