netty out of direct memory #425
This is the first I've heard of the problem, and I'm not certain that upgrading to 0.9.2 would necessarily help. To check the math, it looks like you're exhausting about 16 GB of direct memory after "several months." At 30,000,000 notifications per batch, an estimated 3 batches per week, four weeks per month, and (let's guess) 3 months until memory exhaustion, it sounds like you're sending somewhere around a billion notifications before you run out of memory. In other words, we're losing something like 16 bytes per notification. I'm not entirely sure what could be going wrong here, but will see if I can reproduce the problem locally. To be clear, are you using alpn-boot or netty-tcnative?
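For concreteness, the estimate above works out roughly as follows; a quick sketch in which every input is one of the guesses from this comment, not a measured value:

```java
public class LossRateEstimate {
    public static void main(String[] args) {
        // All inputs below are the guesses from the comment above, not measurements.
        long perBatch = 30_000_000L;   // notifications per batch
        long batchesPerWeek = 3L;
        long weeksPerMonth = 4L;
        long monthsToExhaustion = 3L;
        long total = perBatch * batchesPerWeek * weeksPerMonth * monthsToExhaustion; // ~1.08 billion
        long directMemoryBytes = 16L * 1024 * 1024 * 1024; // ~16 GB exhausted
        System.out.println(total + " notifications, ~"
                + (directMemoryBytes / total) + " bytes lost per notification");
    }
}
```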
My Pushy version is 0.9.1, JDK 1.8, Netty 4.1.8.Final. Can you suggest how I can solve this?
@kadar2012 It sounds like you're having a different problem. At your convenience, please open a separate ticket. We'll need some more information to be able to help you, too; when you create that new ticket, can you share some of the code you're using to create your client? Thanks!
@jchambers Hi there, I'm using alpn-boot 8.1.6.v20151105. I agree with your analysis and hope you can find something helpful in your local experiments. By the way, I have upgraded to version 0.9.2. If there is anything new about this, I will reply here.
@jchambers I have opened a new ticket. Thanks in advance.
I encountered this same issue just one day after I upgraded to v0.9.2. Moreover, I'm starting to suspect that this issue is caused by Netty itself, because the inner exception is thrown from Netty code. My Netty version is 4.1.6.Final.
Could you please tell me how I can locate the exact place of the memory leak? I found evidence of a memory leak in the logs of our production environment, as follows:
But even after adding -Dio.netty.leakDetection.level=advanced, I still cannot get any more information about this issue. I wonder if there is any switch in Pushy's code that I can turn on to locate the exact place of the memory leak. Thanks for your attention~
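For reference, the same knob can also be set programmatically through Netty's public API; a minimal sketch, not Pushy-specific:

```java
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionSetup {
    public static void main(String[] args) {
        // Equivalent to passing -Dio.netty.leakDetection.level=paranoid at launch.
        // ADVANCED samples a fraction of buffers but records their access points
        // (which is what produces the "LEAK: ..." log records); PARANOID tracks
        // every buffer and is much slower, so use it only while debugging.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        System.out.println("Leak detection level: " + ResourceLeakDetector.getLevel());
    }
}
```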
I'm not entirely sure, honestly. Maybe the Netty folks could provide some advice? The latest update to Netty has some improvements around leak detection, and I plan on updating Pushy's Netty version shortly. Pardon the delay here, by the way; I've just returned from a few weeks of vacation out of the country and am still trying to catch up on everything.
I can only find two places where we're allocating our own direct memory:
In both cases, I'm confident we're releasing the buffers we're allocating. Everything else is either a heap buffer or managed by something deeper inside Netty. I'm worried that this might be an upstream problem. Still, we'll upgrade to the latest version (#440 should be merged shortly), and hopefully that will either (a) solve the problem or (b) give us the tools to figure out where the problem is.
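To illustrate the allocate/release discipline being described, here is a generic Netty sketch (not Pushy's actual code): every direct buffer taken from the allocator must be balanced by exactly one release.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import java.nio.charset.StandardCharsets;

public class DirectBufferExample {
    public static void main(String[] args) {
        // Allocate a direct (off-heap) buffer from Netty's default allocator.
        ByteBuf buffer = ByteBufAllocator.DEFAULT.directBuffer(256);
        try {
            buffer.writeBytes("payload".getBytes(StandardCharsets.UTF_8));
            System.out.println("Wrote " + buffer.readableBytes() + " bytes off-heap");
        } finally {
            // A missed release() leaks this direct memory permanently; enough of
            // these and the JVM eventually throws OutOfDirectMemoryError.
            buffer.release();
        }
    }
}
```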
@ioilala While I don't necessarily expect it to solve the problem, we did just release Pushy v0.9.3, which uses the latest version of Netty. Would you mind giving that a try and letting us know if the problem persists? Thanks!
I couldn't reproduce this locally, but I may be testing the wrong things. @ioilala, to eliminate some possible differences between our environments:
My other hypothesis is that something is happening when notifications get rejected (in which case, we'd be losing a larger amount of memory less frequently, and it just so happens to amortize out to about 16 bytes/notification). Thanks!
I strongly suspect this issue may have been resolved by moving from finalizer-based direct memory recovery to reference-counted direct memory recovery (by way of using reference-counted SSL providers) in v0.10. If you wouldn't mind upgrading and letting us know if the problem persists, we'd certainly appreciate it. If we don't hear from you in a few weeks, we'll close this issue, but will happily re-open it if you get back to us later. Thanks!
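For context, here is roughly what "reference-counted SSL provider" means at the Netty level. This is an illustrative sketch that assumes a netty-tcnative artifact is on the classpath; Pushy wires this up internally, so this is not Pushy's code:

```java
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;
import io.netty.util.ReferenceCountUtil;

public class RefCountedSslSketch {
    public static void main(String[] args) throws Exception {
        // OPENSSL_REFCNT yields a reference-counted OpenSSL context whose native
        // memory is freed deterministically when release() is called, rather than
        // whenever the garbage collector gets around to running a finalizer.
        SslContext sslContext = SslContextBuilder.forClient()
                .sslProvider(SslProvider.OPENSSL_REFCNT)
                .build();
        try {
            System.out.println("Built " + sslContext.getClass().getSimpleName());
        } finally {
            // No-op for non-reference-counted contexts; frees native memory here.
            ReferenceCountUtil.release(sslContext);
        }
    }
}
```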
Closing due to lack of information. @ioilala if you believe this problem persists, please leave a comment here and we'll happily reopen this issue. Thank you!
I have the same issue with v0.11.0 in my production environment and am trying to find the reason. Is there any way to fix it now?
@yanggz01 Try 0.11.3; 0.11.0 does not have reference-counted SSL providers (which we think fix this problem), but 0.11.1 and newer do (and if you're going to 0.11.1, you may as well go all the way to 0.11.3).
@jchambers OK, I'll try 0.11.3 and watch how it performs. Thanks.
@jchambers I'm sorry to tell you that 0.11.3 also hit this issue today in my production environment.
Bummer. Thanks for letting me know. Is your situation comparable to the one reported by @ioilala (i.e. this happens after something like a billion notifications)?
…also, can you tell me as much as possible about your environment? Which JVM are you using? OS? Are you using netty-tcnative? Thank you!
@jchambers The count of notifications is not beyond a million, and my OS runs on virtual cloud hosting. I use Maven; my netty-tcnative is netty-tcnative-boringssl-static-2.0.6.Final.jar. Java version and OS:
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 251658527, max: 268435456)
Hm. That sounds like a much steeper loss rate (~256 bytes/notification assuming ~1,000,000 notifications) than what @ioilala saw. Will continue to investigate. Thanks for the additional details.
@yanggz01 A new version of Netty just came out with a very interesting fix: netty/netty#7338 I haven't been able to reproduce the problem locally (though I admit I haven't had a ton of time to devote to the problem in the past couple weeks). The experiments I DID run showed that there's definitely no leak under steady-state, error-free conditions. The users most affected by this issue seem to be in China, and we've observed very high rates of packet loss from China (particularly Beijing) to the production APNs servers. That could mean that the thing that's going wrong is some bad cleanup when something fails unexpectedly, and that's exactly what's fixed in the latest version of Netty. If you'd like to give #551 a try, I'd love to hear if it helps your problem. Thank you!
@jchambers Thank you very much for your reply. I guess maybe the issue is due to the direct memory limit being too small, so I have increased it for now.
Well, my suspicion is that the problem is still there (since we haven't actually shipped any changes yet), but just takes much longer to run out of memory. After we ship 0.12, I think it should be safe to go back to 128M. If you get a chance to upgrade and make the change, please let us know. 0.12 should be out shortly. Thanks!
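For anyone applying the same interim workaround, a small sketch follows; the -XX:MaxDirectMemorySize value shown in the comment is an arbitrary example, and the effective ceiling can be read back through Netty's PlatformDependent helper:

```java
import io.netty.util.internal.PlatformDependent;

public class DirectMemoryLimitCheck {
    public static void main(String[] args) {
        // The ceiling is set at JVM launch, e.g.:
        //   java -XX:MaxDirectMemorySize=512m -jar app.jar
        // (512m is an arbitrary example value.) Raising it only delays exhaustion
        // if a leak exists; it does not fix the leak itself.
        System.out.println("Effective direct memory limit: "
                + PlatformDependent.maxDirectMemory() + " bytes");
    }
}
```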
The IOException stack for 0.11.3:
I think the error may have occurred in
So, I think Pushy may need to:
Excuse me for my poor English.
@qiankunli Thanks for the additional details! Just to make sure we're understanding each other, it sounds like you're saying that you're still seeing OutOfDirectMemoryError with v0.11.3. If so, I'll try to write some tests to simulate lots of connection failures and see if I can reproduce the problem that way. Thank you again for the additional details; this is very helpful!
Hi @jchambers, I'm also affected as a user and am wondering if there are any updates regarding this topic?
Updates will be posted as they become available. Contributions, even if just of research, are always welcome.
Friends, is anybody still seeing this issue in v0.13 or newer? I spent a long time trying to reproduce the problem with every flavor of bad network conditions I could think of, but haven't had much luck. Thanks!
No errors like this anymore since we updated to the newest version of Pushy.
@klopfdreh Glad to hear it! To be clear though, you did affirmatively have this problem with older versions of Pushy?
To be honest, we had OOM exceptions in one of the containers running Pushy, but I don't have any stack traces available anymore to check whether it was exactly the issue mentioned here. We updated to 0.12.1, and even under high load everything is running fine now. I would consider this bug closed.
Okay; I'm going to close this for now, but if anybody is still seeing the issue in 0.13 or newer, please post a comment here and we'll reopen it. Thanks, everybody!
Hi @jchambers, my Pushy version is 0.13.3, JDK 1.8.0_131, Netty 4.1.15.
By the way, I'm not entirely sure whether I am using alpn-boot or netty-tcnative.
And how long does it take to run out of memory?
Pushy should log what it's using for an SSL provider when you construct your client. If you didn't deliberately add netty-tcnative to your dependencies, you're most likely not using it.
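One way to answer the "am I using netty-tcnative?" question at runtime is Netty's public OpenSsl helper; a minimal sketch:

```java
import io.netty.handler.ssl.OpenSsl;

public class SslProviderCheck {
    public static void main(String[] args) {
        // True only when a netty-tcnative artifact is on the classpath and its
        // native library loaded; otherwise Netty falls back to the JDK provider
        // (which, on Java 8, needs alpn-boot on the boot classpath for ALPN).
        if (OpenSsl.isAvailable()) {
            System.out.println("netty-tcnative in use, version " + OpenSsl.versionString());
        } else {
            System.out.println("netty-tcnative unavailable: " + OpenSsl.unavailabilityCause());
        }
    }
}
```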
My Pushy version: 0.8.1, JDK 1.8.0_66, Netty 4.1.5.Final.
I get the exception below after using Pushy to send iOS push notifications for several months in my production environment. We send about 30,000,000 notifications each time, several times per week.
Do I have to upgrade to the newest version of Pushy (0.9.2) to solve this problem?