Native leak as of switch from TCN 2.0.0 to 2.0.1 #6789
Comments
Would it still be possible to get a heapdump? Any way to reproduce easily?
… On 30.05.2017 at 00:35, Roger ***@***.*** wrote:
I've been trying to switch from Netty 4.1.9.Final w/ netty-tcnative 2.0.0 to 4.1.10.Final (actually 4.1.11 at this point) and netty-tcnative 2.0.1 but have run into something that appears to be a native memory leak.
I have git bisect'd it down to this 4bcfa07 commit. I don't think it's this change per se but that's when the TCN 2.0.0 to 2.0.1 switch happened.
Now, I don't have a 2.0.1-SNAPSHOT dated for March 29, 2017 (the date of the commit). So I've been using the respective release versions of TCN, and the leak could be in any of the commits between 2.0.0 and 2.0.1. I have tried the OPENSSL provider as well as the OPENSSL_REFCNT provider and also tested it in conjunction with BoringSSL a little bit. They all show the same behavior: 2.0.0 is fine, 2.0.1 goes south and the application eventually runs out of memory until Linux's OOM killer terminates it.
JVM heap dumps (jmap, YourKit) have been squeaky clean and Netty's own leak detection hasn't flagged anything (running in advanced mode). Random observations: the memory disappears at a constant low pace. The JVM's performance is not impacted by it in the sense that the JVM isn't gasping for memory as usually happens. It runs fine until Netty's own OutOfDirectMemoryError exceptions start happening and Linux's OOM killer steps in. Just in case: I'm running without OCSP stapling.
Expected behavior
Actual behavior
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2600468774, max: 2614624256)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:610) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:564) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:181) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:172) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:133) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:122) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-all-4.1.10.Final.jar:4.1.10.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.10.Final.jar:4.1.10.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]
Steps to reproduce
Minimal yet complete reproducer code (or URL to code)
Netty version
JVM version (e.g. java -version)
OS version (e.g. uname -a)
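The symptom described above - JVM heap dumps staying flat while the machine keeps losing memory - can be confirmed from outside the JVM by sampling the process RSS over time. A minimal sketch, Linux-only (it reads /proc); `rss_watch` is a hypothetical helper, not part of Netty:

```shell
# Sample the resident set size (RSS, in kB) of a process over time.
# A steadily climbing RSS while jmap heap dumps stay flat points at a
# native (off-heap) leak. Arguments: pid, sample count, interval seconds.
rss_watch() {
    pid="$1"; samples="${2:-60}"; interval="${3:-10}"
    i=0
    while [ "$i" -lt "$samples" ] && kill -0 "$pid" 2>/dev/null; do
        # VmRSS in /proc/<pid>/status is the resident set size in kB
        rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
        printf '%s rss_kb=%s\n' "$(date +%s)" "$rss_kb"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Feeding the output into a plotting tool gives the kind of free-memory-over-time graph referenced in this thread, independent of any JVM metric.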
@rkapsi another thing that strikes me as odd is that netty-tcnative should make no difference in how native memory allocations are restricted. Did you try setting the leak detection to paranoid?
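For reference, a sketch of enabling paranoid leak detection, assuming Netty 4.1.x (the system property name may differ across versions; `your-app.jar` is a placeholder):

```shell
# Assumption: Netty 4.1.x. Paranoid mode tracks every buffer allocation,
# so expect a noticeable performance hit; use it only while reproducing.
java -Dio.netty.leakDetection.level=paranoid -jar your-app.jar
```

The level can also be set programmatically via `ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID)` before any buffers are allocated. Note that this only catches leaked ByteBufs; a leak inside the native SSL code itself would still be invisible to it.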
Regarding the heapdump: unfortunately no, because of the PKI material it contains. I can share screenshots and numbers if you're interested in specific classes/objects, but from what I've seen there is nothing really sticking out. I've also used YourKit's diff functionality to compare heap dumps of good and bad versions and they're pretty much the same in terms of JVM heap. I should also emphasize that the "file size" of the good and bad dumps is pretty much the same, which is a sign of a non-JVM heap leak.

Regarding the alloc: I think it's just a side effect. The graph above shows free memory as seen/reported by Linux itself (it's not a JVM metric). Once we're out of system memory, strange things start to happen. I'll see if I can run it with paranoid, but this git bisect alone took 2 workdays - in other words, plenty of time to catch something with advanced mode (and TBH I've been hunting this bug for a week).

Regarding 4bcfa07: yes. Prior commits are fine; commits after 4bcfa07 (inclusive) are not. The distinguishing factor IMHO is that it's the point when the TCN 2.0.0 to 2.0.1 switch happened. Because I don't have a TCN 2.0.1-SNAPSHOT dating from March 29, 2017, I've simply been using the current release version. I'm going to attempt to downgrade TCN (2.0.1 -> 2.0.0) on commit 4bcfa07 and maybe upgrade TCN (2.0.0 -> 2.0.1) on the prior commit. After that I'll start bisecting TCN.
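The "bisecting TCN" step mentioned above can be automated once there is a reproducer command that exits 0 on a good commit and non-zero on a bad one. A sketch; `bisect_leak` is a hypothetical helper, not part of the netty-tcnative build:

```shell
# Drive git bisect between a known-good and known-bad ref, running a
# caller-supplied reproducer command at each step. The reproducer must
# exit 0 for "good" and 1-127 (but not 125) for "bad".
bisect_leak() {
    good="$1"; bad="$2"; shift 2   # remaining args: reproducer command
    git bisect start "$bad" "$good" >/dev/null &&
    git bisect run "$@"            # prints "<sha> is the first bad commit"
    status=$?
    git bisect reset >/dev/null 2>&1   # always return to the original HEAD
    return $status
}
```

For this leak, the reproducer would build the commit, run the workload for a fixed time, and compare the RSS growth against a threshold - which is why a manual bisect took two workdays.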
@rkapsi - I took a glance at the diff between 2.0.0 and 2.0.1 and I think I found a leak. Can you verify that netty/netty-tcnative#276 fixes your issue?
Fixed by netty/netty-tcnative#276
Fix will be included in netty-tcnative 2.0.2
Will netty-tcnative 2.0.2 work with netty 4.1.11?
@floragunncom - Yes. IIRC we didn't remove/modify existing method signatures.