
Die with dignity on the network layer #21720

Merged: 2 commits into elastic:master, Nov 22, 2016

Conversation

jasontedor (Member)

When a fatal error is thrown on the network layer, such an error never
makes its way to the uncaught exception handler. This prevents the node
from being torn down if an out of memory error or other fatal error is
thrown while handling HTTP or transport traffic. This commit adds logic
to ensure that such errors bubble their way up to the uncaught exception
handler, even though Netty tries really hard to swallow everything.

Relates #19272

s1monw (Contributor) left a comment:

LGTM

jasontedor (Member, Author) commented Nov 21, 2016:

To test this, start an instance of Elasticsearch that does not have this patch applied with a 256m heap and http.max_content_length set to 512m.

$ dd if=/dev/zero of=zero bs=1024k count=512
$ curl -XPOST localhost:9200/i/t/1 --data-binary @zero

Elasticsearch will not die. Now apply this patch and test again. Elasticsearch will die with:

[2016-11-21T16:27:18,509][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [org.elasticsearch.http.netty4.Netty4HttpRequestHandler#exceptionCaught], exiting
java.lang.OutOfMemoryError: Java heap space
	at io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:661) ~[?:?]
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:246) ~[?:?]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:224) ~[?:?]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:141) ~[?:?]
	at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:247) ~[?:?]
	at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:160) ~[?:?]
	at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:151) ~[?:?]
	at io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1653) ~[?:?]
	at io.netty.buffer.CompositeByteBuf.consolidateIfNeeded(CompositeByteBuf.java:405) ~[?:?]
	at io.netty.buffer.CompositeByteBuf.addComponent(CompositeByteBuf.java:196) ~[?:?]
	at io.netty.handler.codec.MessageAggregator.appendPartialContent(MessageAggregator.java:317) ~[?:?]
	at io.netty.handler.codec.MessageAggregator.decode(MessageAggregator.java:282) ~[?:?]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) ~[?:?]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) ~[?:?]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) ~[?:?]

jaymode (Member) left a comment:
Left one comment. Other than that, LGTM.

* frame so that at least we know where it came from.
*/
final StackTraceElement previous = Thread.currentThread().getStackTrace()[2];
new Thread(() -> { throw (Error)cause; }, previous.getClassName() + "#" + previous.getMethodName()).start();
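
The quoted snippet is the heart of the patch: the fatal error is rethrown on a brand-new thread, which Netty does not own and therefore cannot swallow, so the JVM's default uncaught exception handler finally runs. A minimal, self-contained sketch of the technique follows; the class and method names are illustrative, not the actual Elasticsearch code:

```java
// Sketch of the "rethrow on a fresh thread" technique (illustrative names,
// not the actual Elasticsearch source).
public final class DieWithDignity {

    /**
     * If the cause is a fatal Error, rethrow it on a new thread so that the
     * default uncaught exception handler sees it; Netty can only swallow
     * throwables raised on threads it manages.
     */
    public static void maybeDie(final Throwable cause) {
        if (cause instanceof Error) {
            // Record the caller's frame in the thread name so we at least
            // know where the fatal error was observed.
            final StackTraceElement previous = Thread.currentThread().getStackTrace()[2];
            new Thread(
                    () -> { throw (Error) cause; },
                    previous.getClassName() + "#" + previous.getMethodName())
                    .start();
        }
        // Non-fatal throwables are left to the normal exception pipeline.
    }
}
```

In the real PR this logic sits behind the exceptionCaught hooks on the HTTP and transport layers; the sketch only demonstrates the thread hand-off.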
Review comment (Member):

Maybe we can log the stack trace in a try-finally block, in hopes of capturing the full stack trace? In the finally block we can throw the Error.

jasontedor (Member, Author) replied Nov 22, 2016:

I pushed bf47916.

jasontedor (Member, Author):

retest this please

When preparing to rethrow a fatal error, this commit adds an attempt to
log the current stack trace so we know which handler saw the fatal
error.
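
The log-then-rethrow pattern described in this commit message can be sketched like this; the names are again illustrative, and the real code uses Elasticsearch's logger rather than System.err:

```java
// Sketch of the follow-up commit: best-effort logging of the full stack trace
// from the current (Netty) thread, with the rethrow guaranteed by the finally
// block (illustrative names, not the actual Elasticsearch source).
public final class LogAndRethrow {

    public static void dieWithDignity(final Error cause) {
        try {
            // Best effort: record which handler saw the fatal error, with its
            // complete stack trace, before the node is torn down.
            System.err.println("fatal error on the network layer");
            cause.printStackTrace(System.err);
        } finally {
            // Even if logging itself fails, still hand the error to a fresh
            // thread so the default uncaught exception handler fires.
            final StackTraceElement previous = Thread.currentThread().getStackTrace()[2];
            new Thread(
                    () -> { throw cause; },
                    previous.getClassName() + "#" + previous.getMethodName())
                    .start();
        }
    }
}
```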
@jasontedor jasontedor merged commit 446037c into elastic:master Nov 22, 2016
jasontedor added two commits that referenced this pull request on Nov 22, 2016 (Relates #21720).
@jasontedor jasontedor deleted the you-are-killing-me-netty branch November 22, 2016 03:57
jasontedor (Member, Author):

Thanks @s1monw and @jaymode.
