
Native leak as of switch from TCN 2.0.0 to 2.0.1 #6789

Closed
rkapsi opened this issue May 29, 2017 · 9 comments

rkapsi commented May 29, 2017

I've been trying to switch from Netty 4.1.9.Final w/ netty-tcnative 2.0.0 to 4.1.10.Final (actually 4.1.11 at this point) and netty-tcnative 2.0.1 but have run into something that appears to be a native memory leak.

I have git bisect'd it down to commit 4bcfa07. I don't think it's this change per se, but that's when the TCN 2.0.0 to 2.0.1 switch happened.

Now, I don't have a 2.0.1-SNAPSHOT dated March 29, 2017 (the date of the commit), so I've been using the respective release versions of TCN, and the leak could be in any of the commits between 2.0.0 and 2.0.1. I have tried the OPENSSL provider as well as the OPENSSL_REFCNT provider, and also tested it in conjunction with BoringSSL a little bit. They all show the same behavior: 2.0.0 is fine, 2.0.1 goes south, the application eventually runs out of memory, and Linux's OOM killer terminates it.
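
For reference, switching between those two providers is just a matter of the SslProvider passed to SslContextBuilder; a minimal sketch with placeholder key material, not the actual setup from this report:

    import io.netty.handler.ssl.SslContext;
    import io.netty.handler.ssl.SslContextBuilder;
    import io.netty.handler.ssl.SslProvider;

    import java.io.File;

    public class ProviderSelection {
        public static void main(String[] args) throws Exception {
            // Placeholder key material; substitute real certificate/key files.
            File cert = new File("server.crt");
            File key = new File("server.key");

            // OPENSSL: tcnative-backed engine; native resources are released via the GC/finalizer.
            SslContext plain = SslContextBuilder.forServer(cert, key)
                    .sslProvider(SslProvider.OPENSSL)
                    .build();

            // OPENSSL_REFCNT: same native engine, but reference-counted and released explicitly.
            SslContext refCnt = SslContextBuilder.forServer(cert, key)
                    .sslProvider(SslProvider.OPENSSL_REFCNT)
                    .build();

            System.out.println(plain.getClass() + " / " + refCnt.getClass());
        }
    }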

JVM heap dumps (jmap, YourKit) have been squeaky clean and Netty's own leak detection hasn't flagged anything (running in advanced mode). Random observations: the memory disappears at a constant, slow pace. The JVM's performance is not impacted by it, in the sense that the JVM isn't gasping for memory as usually happens; it runs fine until Netty's own OutOfDirectMemoryErrors start happening and Linux's OOM killer steps in. Just in case: I'm running without OCSP stapling.

Expected behavior

Actual behavior

[Attached image "netty-411-tcn-201": graph of free system memory as reported by Linux, declining steadily while running Netty 4.1.11 with tcnative 2.0.1]

Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2600468774, max: 2614624256)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:610) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:564) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:181) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:172) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:133) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:122) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496) ~[netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-all-4.1.10.Final.jar:4.1.10.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.10.Final.jar:4.1.10.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]
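
The used/max numbers in that error come from Netty's own direct-memory accounting rather than the JVM heap. A small sketch to print the limit the allocator is working against, assuming the default pooled allocator and the standard -XX:MaxDirectMemorySize / -Dio.netty.maxDirectMemory settings (not code from this issue):

    import io.netty.buffer.PooledByteBufAllocator;
    import io.netty.util.internal.PlatformDependent;

    public class DirectMemoryLimit {
        public static void main(String[] args) {
            // The cap checked in PlatformDependent.incrementMemoryCounter(...);
            // it follows -XX:MaxDirectMemorySize unless -Dio.netty.maxDirectMemory overrides it.
            System.out.println("max direct memory: " + PlatformDependent.maxDirectMemory() + " bytes");

            // Whether the default allocator pools direct buffers at all.
            System.out.println("direct buffers pooled: " + PooledByteBufAllocator.DEFAULT.isDirectBufferPooled());
        }
    }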

Steps to reproduce

Minimal yet complete reproducer code (or URL to code)

Netty version

JVM version (e.g. java -version)

OS version (e.g. uname -a)


normanmaurer commented May 30, 2017 via email

@normanmaurer

@rkapsi another thing that strikes me as odd is that netty-tcnative should make no difference in how native memory allocations are restricted. Did you try setting the leak detection to paranoid?
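
For completeness, paranoid detection can be enabled either via a system property or programmatically; a minimal sketch using the standard Netty API, not code from this issue:

    import io.netty.util.ResourceLeakDetector;

    public class LeakDetection {
        public static void main(String[] args) {
            // Equivalent to starting the JVM with -Dio.netty.leakDetection.level=paranoid.
            // PARANOID tracks every allocated reference-counted object instead of a small
            // sample, so it is only practical for debugging runs.
            ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

            System.out.println("leak detection level: " + ResourceLeakDetector.getLevel());
        }
    }

Note that this only tracks Netty's own reference-counted objects, so a leak purely inside the native library would not be reported at any level.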

@normanmaurer

@rkapsi also, just to clarify: if you move to one commit before 4bcfa07 everything works, and once you use 4bcfa07 it fails?


rkapsi commented May 30, 2017

@normanmaurer

regarding the heap dump: unfortunately no, because of the PKI material it contains. I can share screenshots and numbers if you're interested in specific classes/objects, but from what I've seen there is nothing really sticking out. I've also used YourKit's diff functionality to compare heap dumps of good and bad versions, and they're pretty much the same in terms of JVM heap. I should also emphasize that the file size of the good and bad dumps is pretty much the same, which is a sign of a leak outside the JVM heap.

regarding alloc: I think it's just a side effect. The graph above shows free memory as seen/reported by Linux itself (it's not a JVM metric). Once we're out of system memory, strange things start to happen. I'll see if I can run it with paranoid, but this git bisect alone took 2 workdays - in other words, plenty of time to catch something with advanced mode (and TBH I've been hunting this bug for a week).
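
A hypothetical way to capture that same Linux-level view alongside the application, so memory draining outside the JVM heap shows up in the logs (a sketch, not part of the original report):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    public class MemInfoLogger {
        public static void main(String[] args) throws Exception {
            // Poll Linux's own view of memory (the same source as the graph above).
            while (true) {
                List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo"));
                for (String line : lines) {
                    if (line.startsWith("MemFree:") || line.startsWith("MemAvailable:")) {
                        System.out.println(System.currentTimeMillis() + " " + line);
                    }
                }
                TimeUnit.SECONDS.sleep(30);
            }
        }
    }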

regarding 4bcfa07: yes. Prior commits are fine; commits from 4bcfa07 onward (inclusive) are not. The distinguishing factor IMHO is that it's the point where the TCN 2.0.0 to 2.0.1 switch happened. Because I don't have a TCN 2.0.1-SNAPSHOT dating from March 29, 2017, I've simply been using the current release version.

I'm going to attempt to downgrade TCN (2.0.1 -> 2.0.0) on commit 4bcfa07 and maybe upgrade TCN (2.0.0 -> 2.0.1) on the prior commit.

After that I'll start bisecting TCN.
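
When bisecting TCN it may also help to confirm at runtime which native library actually got loaded; a minimal sketch using Netty's public OpenSsl helpers (standard API, nothing specific to this issue):

    import io.netty.handler.ssl.OpenSsl;

    public class TcnativeInfo {
        public static void main(String[] args) {
            // True only if a netty-tcnative (or BoringSSL fork) native library loaded successfully.
            System.out.println("OpenSSL available: " + OpenSsl.isAvailable());

            if (OpenSsl.isAvailable()) {
                // Version string of the underlying OpenSSL/BoringSSL library.
                System.out.println("native version: " + OpenSsl.versionString());
            } else {
                // Explains why the native provider could not be loaded.
                OpenSsl.unavailabilityCause().printStackTrace();
            }
        }
    }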

@Scottmitch

@rkapsi - I took a glance at the diff between 2.0.0 and 2.0.1 and I think I found a leak. Can you verify that netty/netty-tcnative#276 fixes your issue?

@Scottmitch

Fixed by netty/netty-tcnative#276

Scottmitch self-assigned this May 30, 2017
@Scottmitch

Fix will be included in netty-tcnative 2.0.2

@floragunn

Will netty-tcnative 2.0.2 work with Netty 4.1.11?

@Scottmitch

@floragunncom - Yes. IIRC we didn't remove/modify existing method signatures.
