
Memory leak in latest netty version. #6221

Closed
doom369 opened this issue Jan 16, 2017 · 123 comments
@doom369
Contributor

doom369 commented Jan 16, 2017

After a recent update to 4.1.7-Final (from 4.1.4-Final) my servers started dying with OOM within a few hours. Before that they had been running for weeks with no issues.

Error:

io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 64 byte(s) of direct memory (used: 468189141, max: 468189184)
        at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:614) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:568) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledUnsafeNoCleanerDirectByteBuf.java:30) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:68) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledUnsafeNoCleanerDirectByteBuf.java:25) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.UnsafeByteBufUtil.newUnsafeDirectByteBuf(UnsafeByteBufUtil.java:625) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:65) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:170) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:131) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:73) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.channel.RecvByteBufAllocator$DelegatingHandle.allocate(RecvByteBufAllocator.java:124) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:956) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:359) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor.safeExecute(SingleThreadEventExecutor.java:451) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:418) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:306) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:877) ~[server-0.22.0-SNAPSHOT.jar:?]
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) ~[server-0.22.0-SNAPSHOT.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

Or:

08:28:00.752 WARN  - Failed to mark a promise as failure because it has succeeded already: DefaultChannelPromise@7cd2032d(success)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 18713 byte(s) of direct memory (used: 468184872, max: 468189184)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:631) ~[server-0.21.7-2.jar:?]        
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:585) ~[server-0.21.7-2.jar:?]        
	at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledUnsafeNoCleanerDirectByteBuf.java:30) ~[server-0.21.7-2.jar:?]        
	at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:68) ~[server-0.21.7-2.jar:?]
        at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledUnsafeNoCleanerDirectByteBuf.java:25) ~[server-0.21.7-2.jar:?]
        at io.netty.buffer.UnsafeByteBufUtil.newUnsafeDirectByteBuf(UnsafeByteBufUtil.java:624) ~[server-0.21.7-2.jar:?]
        at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:65) ~[server-0.21.7-2.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[server-0.21.7-2.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:170) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:1533) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:1544) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:575) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:550) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:531) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:1324) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.closeOutboundAndChannel(SslHandler.java:1307) ~[server-0.21.7-2.jar:?]
        at io.netty.handler.ssl.SslHandler.close(SslHandler.java:498) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:625) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:609) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.close(CombinedChannelDuplexHandler.java:504) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.CombinedChannelDuplexHandler.close(CombinedChannelDuplexHandler.java:315) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:625) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:609) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:625) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:609) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466) ~[server-0.21.7-2.jar:?]
        at cc.blynk.server.core.protocol.handlers.DefaultExceptionHandler.handleUnexpectedException(DefaultExceptionHandler.java:59) ~[server-0.21.7-2.jar:?]
        at cc.blynk.server.core.protocol.handlers.DefaultExceptionHandler.handleGeneralException(DefaultExceptionHandler.java:43) ~[server-0.21.7-2.jar:?]
        at cc.blynk.core.http.handlers.StaticFileHandler.exceptionCaught(StaticFileHandler.java:277) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-2.jar:?]
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedC:

I restarted and made heap dumps before the abnormal memory consumption started and after the first error messages from above:

[screenshot: memory]

This screenshot shows the difference between the heap right after server start (17% of the instance's RAM) and at the first OOM in the logs (31% of the instance's RAM). The instance has 2 GB of RAM. So it looks like all the direct memory was consumed (468 MB), while the heap itself takes less than the direct buffers. Load on the server is pretty low: 900 req/sec with ~600 active connections. CPU consumption is only ~15%.
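For what it's worth, a plain-JDK way to watch direct-buffer usage alongside heap dumps is the `BufferPoolMXBean`. One caveat: Netty's default "noCleaner" direct buffers (visible in the `PlatformDependent.allocateDirectNoCleaner` frames above) are allocated via Unsafe and bypass this JDK accounting, so the numbers are mainly meaningful once `-Dio.netty.maxDirectMemory=0` makes Netty fall back to `ByteBuffer.allocateDirect`. This is a hypothetical helper, not part of the original server:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical probe: prints the JDK's view of direct and mapped buffer pools.
// Netty's Unsafe-allocated "noCleaner" buffers do NOT show up here; only
// buffers created via ByteBuffer.allocateDirect are tracked.
public class DirectMemoryProbe {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```

Polling this periodically (or via JMX remotely) shows whether the JDK-tracked direct pool keeps growing between the restart and the first OOM.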

I tried to analyze the heap dump, but I don't know netty well enough to draw any conclusions.

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
        <netty.version>4.1.7.Final</netty.version>
        <netty.tcnative.version>1.1.33.Fork25</netty.tcnative.version>

        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-transport-native-epoll</artifactId>
            <version>${netty.version}</version>
            <classifier>${epoll.os}</classifier>
        </dependency>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-tcnative</artifactId>
            <version>${netty.tcnative.version}</version>
            <classifier>${epoll.os}</classifier>
        </dependency>

Right now I'm playing with

-Dio.netty.leakDetectionLevel=advanced 
-Dio.netty.noPreferDirect=true 
-Dio.netty.allocator.type=unpooled 
-Dio.netty.maxDirectMemory=0

to find working settings. I'll update the ticket with additional info, if any.
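Since these `-D` options are plain JVM system properties, a quick sanity check is to dump what a given process was actually launched with (a hypothetical snippet, not from the issue; the property names are the Netty ones under test above):

```java
// Hypothetical helper: prints the Netty tuning properties discussed in this
// thread so you can verify which flags a JVM instance actually picked up.
public class NettyFlagsDump {
    public static void main(String[] args) {
        String[] keys = {
            "io.netty.leakDetectionLevel",
            "io.netty.noPreferDirect",
            "io.netty.allocator.type",
            "io.netty.maxDirectMemory",
        };
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key, "<unset>"));
        }
    }
}
```

This is handy when comparing several servers started with different flag combinations, as below.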

Unfortunately I wasn't able to reproduce this issue in the QA env. Please let me know if you need more info.

@johnou
Contributor

johnou commented Jan 16, 2017

@doom369 interesting find, keep us updated.

@normanmaurer
Member

@doom369 just a guess... can you try setting -Dio.netty.maxDirectMemory=0?

@doom369
Contributor Author

doom369 commented Jan 16, 2017

@normanmaurer

can you try setting -Dio.netty.maxDirectMemory=0?

I started 3 different servers with different params:

1 (epoll + no openssl) - java -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled -jar server-0.21.7-3.jar
2 (epoll + no openssl) - java -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled -Dio.netty.maxDirectMemory=0 -jar server-0.21.7-3.jar
3 (epoll + openssl) - java -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled -Dio.netty.maxDirectMemory=0 -Dio.netty.leakDetectionLevel=advanced -jar server-0.21.7-3.jar

For 2 hours all servers have been up and running without any suspicious memory consumption. However, 2 hours is not yet enough to reproduce the previous behavior.

@doom369
Contributor Author

doom369 commented Jan 17, 2017

Just now server 3 died (11 hours in). The funny thing is that it has 4 times less load than server 1 and 2 times less load than server 2. Netty leak detection doesn't show anything.

Error:

23:04:54.942 WARN  - Failed to mark a promise as failure because it has succeeded already: DefaultChannelPromise@47c23f57(success)
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:693) ~[?:1.8.0_111]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_111]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_111]
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:111) ~[server-0.21.7-3.jar:?]
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:68) ~[server-0.21.7-3.jar:?]
        at io.netty.buffer.UnsafeByteBufUtil.newUnsafeDirectByteBuf(UnsafeByteBufUtil.java:626) ~[server-0.21.7-3.jar:?]
        at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:65) ~[server-0.21.7-3.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[server-0.21.7-3.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:170) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:1533) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:1544) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:575) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:550) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:531) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:1324) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.closeOutboundAndChannel(SslHandler.java:1307) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.close(SslHandler.java:498) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:625) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:609) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:625) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:609) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466) ~[server-0.21.7-3.jar:?]
        at cc.blynk.server.core.protocol.handlers.DefaultExceptionHandler.handleUnexpectedException(DefaultExceptionHandler.java:59) ~[server-0.21.7-3.jar:?]
        at cc.blynk.server.core.protocol.handlers.DefaultExceptionHandler.handleGeneralException(DefaultExceptionHandler.java:43) ~[server-0.21.7-3.jar:?]
        at cc.blynk.server.handlers.BaseSimpleChannelInboundHandler.exceptionCaught(BaseSimpleChannelInboundHandler.java:91) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:78) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:265) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:257) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.ssl.SslHandler.exceptionCaught(SslHandler.java:793) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:851) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) ~[server-0.21.7-3.jar:?]
        at io.netty.handler.traffic.AbstractTrafficShapingHandler.channelRead(AbstractTrafficShapingHandler.java:510) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) ~[server-0.21.7-3.jar:?]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:1018) ~[server-0.21.7-3.jar:?]

I did a quick restart of this server with only:

-Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled

and it died again within 5 minutes.

So I turned off OpenSSL and ran again with -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled. All is fine again, at least for now.

@normanmaurer
Member

@doom369 so what exact config does not produce the problem, and what does? I am a bit confused... Also, could you please try to upgrade in more incremental steps so we can find out at which version this starts to happen?

@normanmaurer
Member

@doom369 and a heap dump would be nice

@doom369
Contributor Author

doom369 commented Jan 17, 2017

so what exact config does not produce the problem, and what does?

-Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled + no openSSL - ok
-Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled -Dio.netty.maxDirectMemory=0 + no openSSL - ok

-Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled + openSSL - failed
-Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled -Dio.netty.maxDirectMemory=0 -Dio.netty.leakDetectionLevel=advanced + openSSL - failed

and a heap dump would be nice

I'll try in case of the next failure; I hope it won't happen again soon :).

@normanmaurer
Member

Interesting... so whenever you use OpenSSL it blows up? And whenever you use JDK SSL it never does?

@Scottmitch
Member

This is what it looks like to me too ... #6222 (comment)

@doom369
Contributor Author

doom369 commented Jan 17, 2017

so whenever you use OpenSSL it blows up?

Correct (latest netty and Fork25 of tcnative).

And whenever you use JDK SSL it never does?

With -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled all is fine so far. However, there is also the issue with the pooled allocator that I had before with 4.1.4-Final; the related ticket is #5723. I can't say whether it is related to tcnative (Fork15) or to the pooled allocator. Maybe these are different issues.

I can say for sure the situation became worse with the update. And the worst thing: I tried to reproduce it in the test env. with different scenarios, with no luck.

@Scottmitch
Member

Scottmitch commented Jan 17, 2017

So you are also using Netty's SOCKS code? Is it possible to try a test scenario without SSL to rule that out (don't do this in production or with real user data if that doesn't make sense for your scenario)?

Providing a reproducer would also help.

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

@Scottmitch no. #5723 seems very similar to the issue I had.

A few more details. I migrated from 4.0.37 to 4.1.4-Final 6 months ago (epoll + openSSL). All was fine (servers were running for weeks). However, ~1 month ago one of the servers went down with OOM. That's where I found #5723, and it was very similar to what I saw in the logs (I also attached a heap dump screenshot from the problem instance there). It looks like some new scenarios on my servers pulled some triggers. Servers started to die more and more often. I ran many tests with high-load scenarios in a test env., but with no luck.
So I updated to the latest netty to avoid this OOM, with no luck either. That's where I found all these new errors.

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

Hm... I just remembered that I did all my tests without openSSL while trying to reproduce the prod issue. Let me check again.

@normanmaurer
Copy link
Member

@doom369 please report back...

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

@normanmaurer @Scottmitch So here are my findings so far:

I made a simple test that creates 400 users (opens 400 SSL keep-alive connections) and 400 hardware devices (opens 400 plain TCP/IP keep-alive connections), 800 in total. All hardware connections send 1 message in a loop. The pipeline on the server delivers those messages to the corresponding user. So this is like a 1-to-1 chat for 400 users.

I ran the same test on a few configurations:

epoll + openssl - failed with OOM (1-2 minutes).
epoll + no openssl - failed with OOM.
epoll + openssl + -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled - works for ~5 minutes while slowly consuming memory. After it grabs 40% of available memory, all connections are dropped. However, no OOM here.
epoll + -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled - ok.

The test creates a very high request rate (although bandwidth is low, ~10 Mbps), so in all tests some of the 800 connections are dropped. On average only 400-500 connections survived. With a low request rate I'm not able to reproduce the OOM. On prod the load is 10 times lower, so the issue is for sure not in the request rate.

Test code :

    @Test
    public void testConnectAppAndHardware() throws Exception {
        int clientNumber = 400;
        ExecutorService executorService = Executors.newFixedThreadPool(4);

        ClientPair[] clients = new ClientPair[clientNumber];
        List<Future<ClientPair>> futures = new ArrayList<>();

        long start = System.currentTimeMillis();
        for (int i = 0; i < clientNumber; i++) {
            String usernameAndPass = "dima" + i + "@mail.ua 1";

            Future<ClientPair> future = executorService.submit(
                    () -> initClientsWithSharedNio("xx.xx.xx.xx", 8443, 8442, usernameAndPass, null, properties)
            );
            futures.add(future);
        }

        int counter = 0;
        for (Future<ClientPair> clientPairFuture : futures) {
            clients[counter] = clientPairFuture.get();
            //removing mocks, replace with real class
            clients[counter].appClient.replace(new SimpleClientHandler());
            clients[counter].hardwareClient.replace(new SimpleClientHandler());
            counter++;
        }

        System.out.println(clientNumber + " client pairs created in " + (System.currentTimeMillis() - start));
        assertEquals(clientNumber, counter);

        int i = 0;
        while (true) {
            if (++i % 2 == 0) {
                continue;
            }
            for (ClientPair clientPair : clients) {
                //clientPair.appClient.send("hardware aw 10 10");
                clientPair.hardwareClient.send("hardware vw " + ThreadLocalRandom.current().nextInt(1, 128) + " " + ThreadLocalRandom.current().nextFloat());
            }
            sleep(20);
        }
    }

I also tried to use

-Dio.netty.noUnsafe=true and -Dio.netty.tryUnsafe=false

but in both cases I get an NPE:

root@load-test:~# Exception in thread "main" java.lang.IllegalStateException: failed to create a child event loop
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:88)
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
	at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:58)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:100)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:87)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:64)
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:48)
	at cc.blynk.server.TransportTypeHolder.<init>(TransportTypeHolder.java:43)
	at cc.blynk.server.TransportTypeHolder.<init>(TransportTypeHolder.java:32)
	at cc.blynk.server.Holder.<init>(Holder.java:99)
	at cc.blynk.server.launcher.ServerLauncher.start(ServerLauncher.java:71)
	at cc.blynk.server.launcher.ServerLauncher.main(ServerLauncher.java:65)
Caused by: java.lang.NullPointerException
	at io.netty.util.internal.PlatformDependent0.allocateMemory(PlatformDependent0.java:606)
	at io.netty.util.internal.PlatformDependent.allocateMemory(PlatformDependent.java:264)
	at io.netty.channel.epoll.IovArray.<init>(IovArray.java:64)
	at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:54)
	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:130)
	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:35)
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
	... 12 more

Advanced leak detection mode shows nothing in all cases.

@normanmaurer
Copy link
Member

@doom369 super strange... can I have a heap dump now :)? I will also fix the NPE as this is not expected. In fact the epoll transport requires unsafe and should fail with a different exception if unsafe is not present.

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

can I have a heap dump now :) ?

Sure :). These are heap dumps for test case 1 (epoll + openssl):

https://www.dropbox.com/s/efytgymi202u81k/start.bin?dl=0 (70mb)
https://www.dropbox.com/s/05okdmgl1qr8mls/finish.bin?dl=0 (330mb)

One more thing. All tests were done with :

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Ubuntu 14.04 64x
Digital Ocean instance for $20 - 2GB RAM, 2 cores.

@normanmaurer
Copy link
Member

@doom369 I guess there is no way to provide a reproducer?

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

@normanmaurer

  1. Download - https://github.com/blynkkk/blynk-server/releases/download/v0.21.7/server-0.21.7-3.jar
  2. Download - https://www.dropbox.com/s/f5hno4k0omcqj3g/server.properties?dl=0
  3. Put them in one folder (server.properties enables epoll and openssl);
  4. Build tcnative (git checkout netty-tcnative-parent-1.1.33.Fork25);
  5. Run with java -jar server-0.21.7-3.jar &;
  6. Checkout https://github.com/blynkkk/blynk-server.git
  7. Find SimplePerformanceTest, put IP of your server and run it.

@normanmaurer
Copy link
Member

@doom369 awesome let me try :)

@normanmaurer normanmaurer self-assigned this Jan 17, 2017
@Scottmitch
Copy link
Member

I will also fix the NPE as this is not expected.

@normanmaurer - IIRC the kqueue PR does this already

@normanmaurer
Copy link
Member

@Scottmitch cool, anyway let us fix this as a separate PR (working on it)

@normanmaurer
Copy link
Member

@doom369 what else do I need to install? Seems like at least redis... anything else?

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

@normanmaurer no need for redis. Sorry, my bad. Please replace the string real with local in the Holder class, line 126.

@normanmaurer
Copy link
Member

@doom369 ok... I see a lot of these WARN lines:

20:33:29.444 INFO  - Enabling one-way auth with no certs checks.
20:33:29.479 INFO  - Creating app client. Host 127.0.0.1, sslPort : 8443
20:33:29.479 WARN  - server.ssl.cert file was not found at /test-certs/mutual/server.crt location
20:33:29.479 WARN  - client.ssl.cert file was not found at  location
20:33:29.479 WARN  - client.ssl.key file was not found at  location
20:33:29.479 INFO  - Enabling one-way auth with no certs checks.
400 client pairs created in 10397

Is this something that is expected?

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

Yeah, that's fine.

@normanmaurer
Copy link
Member

@doom369 ok, it's running now... so the OOME should happen on the server, I guess?

@doom369
Copy link
Contributor Author

doom369 commented Jan 17, 2017

Correct.

@rkapsi
Copy link
Member

rkapsi commented Jan 18, 2017

Interesting, any chance this is related to #6249?

@normanmaurer
Copy link
Member

With the help of @doom369 I was able to track down the change responsible for this "regression". #6252 should fix it. The commit message of #6252 should give you a better idea of what happened, so I will not repeat it here, but to make it short: it's not a memory leak, just a change in how much memory is used when using our custom SSLEngine impl.

@rkapsi @doom369 can you please check the PR and let me know...

Thanks again to @doom369 for all the help tracking this down. Without you this would not have been possible, or would have taken way longer.

@doom369
Copy link
Contributor Author

doom369 commented Jan 19, 2017

@normanmaurer checked again with your PR from scratch. Everything seems fine; not reproducible anymore. I also ran all the last tests with io.netty.leakDetection.level=disabled. Statically linked BoringSSL seems fine with it too. Thanks!

@doom369 doom369 closed this as completed Jan 19, 2017
@normanmaurer
Copy link
Member

@doom369 thanks a lot !

@normanmaurer normanmaurer added this to the 4.0.44.Final milestone Jan 19, 2017
@normanmaurer
Copy link
Member

Let me re-open until the PR is merged.

@normanmaurer normanmaurer reopened this Jan 19, 2017
@Scottmitch
Copy link
Member

great work @normanmaurer !

+1 on the big thanks to @doom369 for the debug support!

@normanmaurer
Copy link
Member

Fixed by #6252

@doom369
Copy link
Contributor Author

doom369 commented Jan 22, 2017

@normanmaurer I was able to reproduce the issue after a 24-hour run in production. Please reopen the ticket.

OpenSSL enabled, no additional options.
No traffic spikes. Actually traffic was very low - 0.5 Mbps in and the same out.
600 open connections, 800 req/sec (messages are very small, ~20 bytes mostly).
Memory just suddenly started growing for no reason until OOM appeared in the logs.
No LEAK reports from netty.

        <netty.version>4.1.8.Final-SNAPSHOT</netty.version>
        <netty.tcnative.version>1.1.33.Fork26</netty.tcnative.version>

This is a screenshot of the heap after the OOM started:

[screenshot: leak]

I had no issue on this instance while openSSL was disabled and -Dio.netty.noPreferDirect=true -Dio.netty.allocator.type=unpooled were set.

@normanmaurer normanmaurer reopened this Jan 22, 2017
@normanmaurer
Copy link
Member

Can I get a dump again?

@doom369
Copy link
Contributor Author

doom369 commented Jan 22, 2017

@normanmaurer sure. sent link via email.

@normanmaurer
Copy link
Member

normanmaurer commented Jan 22, 2017 via email

@doom369
Copy link
Contributor Author

doom369 commented Jan 22, 2017

@normanmaurer the latest one: 9077269

Yesterday I pulled all changes and did a build and deploy.

@normanmaurer
Copy link
Member

@doom369 I have not had time yet to investigate in detail, but it seems the dumps are not up anymore and I missed downloading them in time as I was busy. Could you please re-upload and make sure they don't "time out"?

@doom369
Copy link
Contributor Author

doom369 commented Jan 23, 2017

@normanmaurer done, see your email.

@normanmaurer
Copy link
Member

@doom369 I will ping you tomorrow morning and we'll see if we can find out what's wrong.

@doom369
Copy link
Contributor Author

doom369 commented Jan 23, 2017

OK. Thanks for looking into it.

@normanmaurer
Copy link
Member

normanmaurer commented Jan 25, 2017

After chatting with @doom369 and analysing some heap dumps, we found out the last problem was caused by writing too fast.

@doom369 please open a new issue if you think there is still an issue.
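
For readers hitting the same "writing too fast" pattern: Netty signals outbound-buffer pressure via Channel.isWritable() and ChannelOption.WRITE_BUFFER_WATER_MARK. The gist of the water-mark mechanism can be sketched standalone like this (illustrative only, not Netty's actual code; the 32 KiB / 64 KiB thresholds mirror Netty 4.1's defaults):

```java
// Illustrative sketch only (not Netty's code): the high/low water-mark
// backpressure idea behind ChannelOption.WRITE_BUFFER_WATER_MARK.
final class WaterMarkedOutbound {
    private final long low;
    private final long high;
    private long pendingBytes;
    private boolean writable = true;

    WaterMarkedOutbound(long low, long high) {
        this.low = low;
        this.high = high;
    }

    // Callers should check this (like Channel.isWritable()) and stop
    // producing writes while it returns false.
    boolean isWritable() {
        return writable;
    }

    // Queue bytes for writing; crossing the high water mark flips writable off.
    void enqueue(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > high) {
            writable = false;
        }
    }

    // Bytes actually flushed to the socket; draining below the low water
    // mark flips writable back on (channelWritabilityChanged in Netty).
    void flushed(long bytes) {
        pendingBytes -= bytes;
        if (pendingBytes < low) {
            writable = true;
        }
    }
}
```

In real Netty code the equivalent discipline is: only keep calling write while channel.isWritable() is true and resume from a channelWritabilityChanged() callback; otherwise unflushed writes accumulate in the ChannelOutboundBuffer, which shows up as exactly this kind of direct-memory growth.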

@prayagupa
Copy link

Increasing the JVM heap from 96M (on a 700M machine) to 96*3MB (2100M) is working for me, at least temporarily.

Jan 25, 2019 9:53:48 PM com.twitter.finagle.netty4.channel.ChannelExceptionHandler exceptionCaught
WARNING: Unhandled exception in connection with /10.1.18.1:55388, shutting down connection
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 1048576 byte(s) of direct memory (used: 94371847, max: 95158272)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:214)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
	at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324)
	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
	at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
	at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
	at java.lang.Thread.run(Thread.java:748)
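
As an aside, this OutOfDirectMemoryError is a bookkeeping failure rather than a JVM-level OOM: Netty maintains its own counter of reserved direct memory and throws once a reservation would exceed the configured cap. A standalone sketch of that check (illustrative, not Netty's code; the numbers in the usage example come from the trace above):

```java
// Illustrative sketch (not Netty's code) of the bookkeeping behind
// io.netty.util.internal.OutOfDirectMemoryError: Netty tracks reserved
// direct memory itself and fails any reservation that would exceed the cap.
final class DirectMemoryCounter {
    private final long max;
    private long used;

    DirectMemoryCounter(long max) {
        this.max = max;
    }

    // Returns false when used + bytes would exceed max -- the point where
    // Netty throws "failed to allocate N byte(s) of direct memory (used: U, max: M)".
    boolean tryReserve(long bytes) {
        if (used + bytes > max) {
            return false;
        }
        used += bytes;
        return true;
    }

    long used() {
        return used;
    }
}
```

Plugging in the numbers from the trace: used 94371847 plus the requested 1048576 is 95420423, which exceeds the max of 95158272, so the 1 MiB allocation fails.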
