3.4.0 (official hhvm-3.4.0~trusty package) eats all memory+swap #4268
In my case HHVM eats RAM+swap (about 5 GB in total) in about 5 to 7 minutes.
Yesterday we had the same on 64 GB RAM and 8 GB swap in about 6 hours :/
@tat, this is an increase from 3.3 to 3.4? That's interesting. Can you get a heap profile for us? The process is unfortunately somewhat involved.
cc @paulbiss. @mklooss, what you're experiencing is unfortunately somewhat expected, and is a long-term issue we've been slowly looking into. It's not indicative of server instability; we just haven't optimized very much for a super-long-running server, since FB pushes twice a day. (Though 6 hours is still quite short.)
The admin server speaks FastCGI now, not HTTP -- you'll also need to configure your webserver to give you access to it.
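For anyone else trying to reach the admin interface, here is a minimal sketch of fronting it with nginx, assuming the admin server has been pointed at its own FastCGI port (hhvm.admin_server.port = 9001 here); the setting name, ports, and paths are assumptions for illustration, not taken from this thread:

```nginx
# Local-only vhost that forwards everything to HHVM's admin server (FastCGI).
server {
    listen 127.0.0.1:8081;

    location / {
        fastcgi_pass 127.0.0.1:9001;      # assumed hhvm.admin_server.port
        include fastcgi_params;
        fastcgi_param SCRIPT_NAME $uri;   # admin endpoints are addressed by path, e.g. /jemalloc-stats
    }
}
```

With something like that in place, the admin endpoints can be hit with plain curl against 127.0.0.1:8081.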
Thanks for the feedback. I've got the admin interface working, but I'm getting an error from the activate command: do you know what the issue is? Where is the file supposed to be written to? /tmp? Here's the jemalloc-stats output I captured from the admin interface: http://pastebin.com/vBvfPiP5 Btw @jwatzman, 3.3 is working fine for me; RAM usage is stable at about 350 MB and it has been running for days without restarts.
jemalloc stats: https://gist.github.com/mklooss/8091e48c4551f40d05c8
I'm getting the same problem running a large WordPress site on HHVM. Memory usage starts at ~450 MB and climbs to ~1.2 GB before restarting (not 100% sure yet whether it's OOM-killed or crashing) every ~2 hours. This is HHVM 3.4.0 on Ubuntu/Trusty.
I just cherry-picked a memory leak fix into the 3.4 branch -- can someone who's experiencing this build that branch and report back? If it fixes it, we can roll a 3.4.1 release. The issue is that if you are passing invalid arguments to some builtin functions, such that the builtin raises a warning, we leak a small amount of memory each time -- and it looks new in 3.4. If your PHP app generates a lot of warnings from builtins, then this could easily be your bug :)
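To illustrate the kind of code path being described, here is a tiny sketch; these particular calls are just common examples of builtins that raise warnings, not functions named anywhere in this thread:

```php
<?php
// Each of these makes a builtin raise a warning. Per the comment above,
// in 3.4 (before the cherry-picked fix) every such warning leaked a small
// amount of memory, so a hot code path full of them adds up quickly.
$fh = fopen('/path/that/does/not/exist', 'r'); // "failed to open stream" warning
$parts = explode(',');                         // missing-argument warning from a builtin
```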
I don't, sorry -- @fredemmott, @paulbiss, can either of you advise better?
I went ahead and built a deb for trusty with this patch: http://dl.hhvm.com/ubuntu/hhvm_3.4.1-devtest~trusty_amd64.deb You can manually install that so you don't have to build HHVM yourself; let me know if it works better.
configure: [...]
@jwatzman which change did you cherry-pick?
edf53c1 is the relevant cherry-pick. It does look like only a doc update, but AIUI we have a script that parses that file (in particular, lines of the form of the one changed) to generate a bunch of data about opcode semantics, so the change is relevant. It confused me as well until it was explained to me this morning :-P
Thank you! We are building HHVM ourselves because we need a specific version of a lib. I will pick the change and try it out to see how it goes.
Thanks for the patch and build. I'm trying it out now, but unfortunately it looks like it's still leaking memory. There are no warnings/errors in my HHVM log, so it doesn't look like this was the cause of the leak for me.
Maybe you are using [...]
@staabm: that's been leaky for a while; we're looking for a leak that was recently introduced.
Spent most of the morning looking at this. I wasn't able to reproduce it with the "representative WordPress" install from https://github.com/hhvm/oss-performance, unfortunately. However, I was able to reproduce the heap profiling failure, and can help you get us a heap profile. It's a little messy.
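Since the exact steps never made it into this extract, here is a rough sketch of the heap-profiling workflow. The admin endpoint names, the MALLOC_CONF settings, and the ports are from memory and assume a jemalloc built with --enable-prof, so treat this as an assumption-laden outline rather than the verified procedure:

```sh
# Start HHVM with jemalloc heap profiling enabled (needs jemalloc built with --enable-prof).
MALLOC_CONF=prof:true,prof_prefix:/tmp/hhvm-heap hhvm -m server -c /etc/hhvm/server.ini &

# Through whatever proxy exposes the admin server (127.0.0.1:8081 in the sketch above):
curl http://127.0.0.1:8081/jemalloc-prof-activate   # start collecting samples
# ... let the server run and leak for a while ...
curl http://127.0.0.1:8081/jemalloc-prof-dump       # writes a /tmp/hhvm-heap.*.heap file
curl http://127.0.0.1:8081/jemalloc-stats           # the stats output pasted earlier in this thread

# Symbolize the dump against the hhvm binary with jeprof/pprof.
pprof --text /usr/bin/hhvm /tmp/hhvm-heap.*.heap | head -50
```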
@jwatzman Put this in the wiki or somewhere! 👍
Yeah, good idea, will do if this ends up producing useful results :) |
Have any of you who are experiencing this been able to get any more info? Just confirming that the [...]
I'll install the devtest build on the live server now. Let's hope.
Hm, apt-get keeps nagging me, telling me a newer version is available... how can I avoid that?
You can directly download the deb and then [...]
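For reference, the manual route looks something like the following, using the devtest URL posted above; the dpkg and service invocations are the standard Debian/Ubuntu ones, nothing HHVM-specific:

```sh
wget http://dl.hhvm.com/ubuntu/hhvm_3.4.1-devtest~trusty_amd64.deb
sudo dpkg -i hhvm_3.4.1-devtest~trusty_amd64.deb   # replaces the currently installed hhvm package
sudo service hhvm restart
```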
That's what I did. It replaced the installed HHVM, but now apt-get reports that updates are available, which triggers reporting systems, which in turn triggers mails...
Can you just silence that for a little while? The package is deliberately built out-of-band, since it's unclear if it will help. (Though it's signed with the same GPG key as the official ones, so you can tell it does come from us.) I'm not sure what reporting system you are using, so I can't tell you how to shut it up; you may try just commenting out the HHVM repo from [...]
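A couple of ways to quiet the upgrade nag until 3.4.1 proper ships, both standard apt mechanics sketched here rather than quoted from the thread (the hhvm.list filename is an assumption):

```sh
# Option 1: hold the package so apt stops offering the repo version.
sudo apt-mark hold hhvm

# Option 2: temporarily comment out the HHVM repo, then refresh the package lists.
sudo sed -i 's/^deb /# deb /' /etc/apt/sources.list.d/hhvm.list
sudo apt-get update

# Undo later with: sudo apt-mark unhold hhvm   (and uncomment the repo line)
```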
Debug files sent to jwatzman. Let me know if you find anything, thanks!
Thanks, that log file was very helpful. It looks like the JIT is just trying to compile an incredibly large chunk of code in a function with an abnormally large number of locals, and we're using a lot of memory as a result. There are a few things you can do that should help.

The problem is that you have a large amount of code in a pseudomain (code that isn't in any function, just at the top level of a file), and the way we compile those is pretty suboptimal. The quickest fix will be disabling compilation of those with the hhvm.jit_pseudomain option.

A better fix would be putting all of that code inside a function, rather than leaving it at the top level. I can't tell which file it was, but it looks like there are at least 395 local variables in it, and some of the functions it calls are [...]

It's of course possible that there's a real leak somewhere, but so far all signs are pointing to the massive compilation unit being the problem.
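As a concrete, hypothetical illustration of the "put it in a function" suggestion, assuming a jumbo file that currently runs everything at the top level:

```php
<?php
// Before: thousands of lines and hundreds of locals sit directly in the
// pseudomain, which the JIT compiles as one huge unit.
//
// After: the same code wrapped in a function, so the JIT sees an ordinary
// function body instead of a giant pseudomain, and the locals become
// function-scoped instead of living in the global scope.
function handle_request() {
    // ... the ~4k lines that used to sit at the top level ...
}
handle_request();
```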
@tat and this bit of code started consuming more memory in 3.4, which is why this probably just now hit you. I hear the memory usage will be somewhat improved in 3.5 or 3.6, but no promises -- and that clearly doesn't help you now :) @swtaarrs is there any tweaking we could do of defaults, or anything like that, which would make this failure at least more debuggable, or hopefully go away, for external folks? This seems like something that folks will hit from time to time, and ideally they shouldn't, or at the very least it shouldn't be this hard to debug.
@swtaarrs: I just deployed that option 'hhvm.jit_pseudomain = 0' to our server and restarted hhvm. |
I checked our WordPress installation and couldn't find any of the aforementioned methods in the code, so this is not part of the WordPress core.
@jwatzman @swtaarrs we definitely have a jumbo file with about 4k lines, most of them at the top level. I tried hhvm.jit_pseudomain=0 and memory consumption peaked at 21% (~600 MB), but of course that isn't useful for us, since the jumbo file is the one that gets 95% of requests and it isn't compiled with that setting. I'm now trying hhvm.jit_max_region_instrs=500; it hasn't been OOM'd yet, but it's consuming 78% of memory at the moment (2.8 GB). The memory growth seems slower but is still there IMHO (we also don't have much traffic at these hours). So it seems to be JIT-related, but does it sound possible to you that it was using ~500 MB on 3.3 and can't run within 3.5 GB on 3.4?
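For anyone else experimenting along the same lines, the two knobs mentioned so far go in server.ini like this; the values are simply the ones tried above, not recommendations:

```ini
; Stop JIT-compiling pseudomains (top-level file code) entirely
hhvm.jit_pseudomain = 0

; Or keep them JITed but cap how large a single compiled region may grow
hhvm.jit_max_region_instrs = 500
```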
@liayn it's extremely likely that your issue is unrelated to what we're tracking down with @tat, so it's unsurprising that the options mentioned didn't help -- HHVM has some very long-standing slow memory leaks. A particularly bad regression went into 3.4, which was fixed in the 3.4.1-devtest I posted above; I'll be rolling 3.4.1 with that fix and some other fixes that I'm waiting on being finished up, probably in the next couple of weeks. But most of the leaks have been there well before 3.4, for years, and what you've said unfortunately sounds in line with them. That's just to say it's expected, not good -- I think some folks are going to try to go after those leaks some time after the holidays.
@bertmaher @swtaarrs @alexmalyshev do any of you have any idea how much the RAM usage of [...]
Or maybe something that isn't a "leak" per se -- the memory will eventually get cleaned up, but will stick around somewhat longer than before, causing an earlier OOM when combined with the [...]
@jwatzman, @tat: something that's still confusing to me is that FrameState should not be long-lived; we should allocate them and either (a) OOM immediately, which is what we think is happening here, or (b) finish translating and free the FrameStates. So something still feels weird here, unless this server never stops translating until it OOMs...
Yeah, something weird is going on. https://gist.github.com/jwatzman/5c25aa6732e849df13e2 manages to reproduce the issue -- run the output of that on 3.4, refresh 12 times until we JIT, then watch RAM spike -- up to 1G on my machine. An idle heap dump looks very similar to what @tat sent -- with things that should never be running simultaneously, and shouldn't be running at all when idle. Looks either like a leak or our heap profiling is lying to us :) The issue actually looks much worse on master, though it could be a separate issue. (We peg the CPU and keep consuming RAM until we OOM with my above script -- we're at least stable at 1G on 3.4.) @swtaarrs and @bertmaher are continuing to look into this.
We found it! It appears to be a bug in boost. This was triggered by the new usage of [...]. I'll write a change tomorrow to work around this for old versions of boost, and get that merged into 3.4.1.
Great!!! If you can upload a new .deb I'll test it tonight. On a side note, it would be nice if the apt Packages file kept old versions listed, so we could stay on a version that we know works properly and upgrade manually after testing new versions (really needed when autoscaling is used). Thank you!
Hey, [...]
Some notes: [...]
Unfortunately [...]
This is likely to be a separate issue, since I'm pretty sure 14.10 has a fixed boost library. HHVM is known to leak small amounts of memory over a long period of time (i.e., needing to restart the HHVM process every couple of days isn't unexpected). If the leak is worse than that, please open a separate issue -- this one is specifically about a regression from 3.3 to 3.4.
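Until those long-standing slow leaks are hunted down, a blunt but common workaround is scheduling a periodic restart; a sketch, where the timing and the init-script name are assumptions:

```sh
# /etc/cron.d/hhvm-restart -- restart HHVM every couple of days at 04:00
0 4 */2 * * root service hhvm restart
```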
@dmytroleonenko the heap dump you sent me was pretty clearly not from 14.10 -- are you sure you're not on 14.04? It looks a lot like you are, in which case you are likely hitting the same or a similar leak. Try the new nightly tonight (2014.12.13 or newer).
Will tag and roll [...]
Summary: This is very likely to be the memory leak reported to be new in HHVM 3.4. See code comment and linked GitHub issue for full explanation. Fixes #4268
Reviewed By: @paulbiss
Differential Revision: D1736543
The release version of 3.4.1 for trusty (14.04) is up, which was the hardest-hit platform and what I think everyone on this thread was using. Building the debug version, and the other OSes, over the next day or two. Thanks for the info from everyone about this! @dmytroleonenko if you're still hitting problems after trying the 2014.12.13 nightly (which won't exist for another 12 hours or so), please file a new issue.
Great, thank you guys! I'll try it out and report back if I find any issues.
Looks good for us as well. Initial memory usage is already lower, and so far memory seems stable. Thanks a lot to everyone!
It looks much better. I'll keep watching to see if the issue is still there and file a new bug if it is.
I had the same problem with HHVM + WordPress; the server was crashing every time we posted something new. It turns out that the simple solution that works is just to disable the "Try to automatically compress the sitemap if the requesting client supports it" setting in the XML sitemaps plugin.
@HumanWorks the problem we were tracking here turned out to be a memory leak in boost that was being triggered in the JIT; it's been fixed since 3.4.1 and isn't present in 3.3. If you're seeing a different leak I would suggest opening a new issue.
@paulbiss Any idea on when 3.6.0 will make its way out of the door?
@craigcarnell I think the plan is to start rolling packages today or tomorrow (our packager isn't the fastest box...); I've got one more cherry-pick I need to push. It's been a busy week for everyone and we haven't had a chance to update the packaging system to push a second LTS.
I upgraded my AWS instances (c3.large) to 3.4.0 (official packages from http://dl.hhvm.com/ubuntu) and all of them get killed by the oom-killer after eating all RAM and swap in about 5 minutes (at about 300 requests per minute).
Is there anything I can check to track down the issue?
My server.ini:
pid = /var/run/hhvm/pid
hhvm.server.port = 9000
hhvm.server.type = fastcgi
hhvm.server.default_document = index.php
hhvm.log.use_log_file = true
hhvm.log.file = /var/log/hhvm/error.log
hhvm.repo.central.path = /var/run/hhvm/hhvm.hhbc
hhvm.resource_limit.max_socket = 10000
hhvm.log.header = true
Thanks,
stefano