3.4.0 (official hhvm-3.4.0~trusty package) eats all memory+swap #4268

Closed
tat opened this Issue Nov 18, 2014 · 75 comments

tat commented Nov 18, 2014

I upgraded my AWS instances (c3.large) to 3.4.0 (official packages from http://dl.hhvm.com/ubuntu) and all of them get killed by the oom-killer after eating all RAM and swap in about 5 minutes (at about 300 requests per minute).

Is there anything I can check to track down the issue?

My server.ini:
pid = /var/run/hhvm/pid
hhvm.server.port = 9000
hhvm.server.type = fastcgi
hhvm.server.default_document = index.php
hhvm.log.use_log_file = true
hhvm.log.file = /var/log/hhvm/error.log
hhvm.repo.central.path = /var/run/hhvm/hhvm.hhbc
hhvm.resource_limit.max_socket = 10000
hhvm.log.header = true

Thanks,
stefano

tat changed the title from 3.4.0 (official hhvm-3.4.0~trusty package) memory to 3.4.0 (official hhvm-3.4.0~trusty package) eats all memory+swap Nov 18, 2014

mklooss commented Nov 18, 2014

Can confirm the same scenario here, but also on HHVM 3.3.
We have to restart the hhvm process every 6 hours to keep the server online.
We are using a dedicated server.
[screenshot: auswahl_010]

tat commented Nov 18, 2014

In my case hhvm eats RAM+swap (about 5 GB in total) in about 5-7 minutes.

mklooss commented Nov 18, 2014

Yesterday we had the same with 64 GB RAM and 8 GB swap in about 6 hours :/

Member

jwatzman commented Nov 18, 2014

@tat, this is an increase from 3.3 to 3.4? That's interesting. Can you get a heap profile for us? The process is unfortunately somewhat involved.

cc @paulbiss

@mklooss, what you're experiencing is unfortunately somewhat expected, and is a long-term issue we've been slowly looking into. It's not indicative of server instability, we just haven't optimized for a super-long-running server very much, since FB pushes twice a day. (Though 6 hours is still quite short.)

Contributor

fredemmott commented Nov 18, 2014

The admin server speaks FastCGI now, not HTTP - you'll also need to configure your webserver to give you access to it.

tat commented Nov 18, 2014

Thanks for the feedback. I've got the admin interface working, but I'm getting an error from the activate command:
Error 2 in mallctl("prof.active", ...)

Do you know what the issue is? Where is the file supposed to be written to? /tmp?

Here's the jemalloc-stats output I captured from the admin interface: http://pastebin.com/vBvfPiP5

By the way @jwatzman, 3.3 is working fine for me; RAM usage is stable at about 350 MB and it has been running for days without restarts.

mklooss commented Nov 19, 2014

jemalloc stats: https://gist.github.com/mklooss/8091e48c4551f40d05c8
Currently the HHVM process is eating ~10 GB RAM; the process has been running for ~2 hours.

frankh commented Nov 20, 2014

I'm getting the same problem running a large WordPress site on HHVM. Memory usage starts at ~450 MB and climbs to ~1.2 GB before restarting (not 100% sure yet whether it's OOM-killed or crashing), roughly every 2 hours.

This is HHVM 3.4.0 on Ubuntu Trusty.

Member

jwatzman commented Nov 20, 2014

I just cherry-picked a memory leak fix into the 3.4 branch -- can someone who's experiencing this build that branch and report back? If it fixes it, we can roll a 3.4.1 release. The issue is that if you are passing invalid arguments to some builtin functions, such that the builtin raises a warning, we leak a small amount of memory each time -- and it looks new in 3.4. If your PHP app generates a lot of warnings from builtins, then this could easily be your bug :)

> Thanks for the feedback, I've got the admin interface working but I'm getting an error from the activate command:
> Error 2 in mallctl("prof.active", ...)
>
> do you know what's the issue? where is the file supposed to be written to? /tmp ?

I don't, sorry -- @fredemmott, @paulbiss, can either of you advise better?
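To make the failure mode described above concrete (builtins raising warnings on invalid arguments), here is a hypothetical sketch, not taken from any reporter's app, of the kind of code path the fix targets:

  <?php
  // Hypothetical illustration of the leak trigger described above. array_keys()
  // expects an array, so passing it a string raises a warning and returns null;
  // under the 3.4 bug, every such warning leaked a small amount of memory,
  // which adds up fast on a hot code path.
  function lookup(array $config, $key) {
      $keys = array_keys($config[$key]);
      return $keys === null ? array() : $keys;
  }

  // Simulate a stream of requests that keep hitting the warning path.
  for ($i = 0; $i < 1000; $i++) {
      lookup(array('ids' => 'not-an-array'), 'ids');
  }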

Member

jwatzman commented Nov 21, 2014

> can someone who's experiencing this build that branch and report back? If it fixes it, we can roll a 3.4.1 release.

I went ahead and built a deb for trusty with this patch: http://dl.hhvm.com/ubuntu/hhvm_3.4.1-devtest~trusty_amd64.deb You can manually install that so you don't have to build HHVM yourself; let me know if it works better.

Contributor

denji commented Nov 21, 2014

configure:

./configure -DENABLE_SSP=ON -DDEBUG_MEMORY_LEAK=ON -DDEBUG_APC_LEAK=ON

  -DDEBUG_APC_LEAK=ON|OFF : Allow easier debugging of apc leaks : Default: OFF
  -DDEBUG_MEMORY_LEAK=ON|OFF : Allow easier debugging of memory leaks : Default: OFF
  -DENABLE_SSP=ON|OFF : Enabled GCC/LLVM stack-smashing protection : Default: OFF

levixie commented Nov 21, 2014

@jwatzman which change did you cherry-pick?
I only see a doc update.
Thanks

Member

jwatzman commented Nov 21, 2014

edf53c1 is the relevant cherry-pick. It does look like only a doc update, but AIUI we have a script that parses that file (in particular, lines of the form of the one changed) to generate a bunch of data about opcode semantics, and the change is thus relevant. It confused me as well until it was explained to me this morning :-P

levixie commented Nov 21, 2014

Thank you! We are building hhvm ourselves because we need a specific version of a library. I will pick the change and try it out to see how it goes.

frankh commented Nov 21, 2014

Thanks for the patch and build, I'm trying it out now but unfortunately it looks like it's still leaking memory.

There are no warnings/errors in my hhvm log, so it doesn't look like this was the cause of the leak for me.

Contributor

staabm commented Nov 21, 2014

Maybe you are using create_function? It seems this one is leaky too - see #4250.
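For reference, a hypothetical sketch of the pattern #4250 is about (not code from anyone's site here): each create_function() call compiles a new one-off function that is never freed, whereas an ordinary closure avoids the per-call code generation.

  <?php
  // Hypothetical example of the leaky pattern referenced above (#4250).
  $rows = array(array('price' => 10), array('price' => 20));

  // Leaky style: every execution of this call defines a brand-new function.
  $total = array_sum(array_map(
      create_function('$r', 'return $r["price"];'),
      $rows
  ));

  // Closure equivalent: no per-call code generation.
  $total = array_sum(array_map(
      function ($r) { return $r['price']; },
      $rows
  ));
  echo $total, "\n";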

Contributor

paulbiss commented Nov 21, 2014

@staabm: that's been leaky for a while; we're looking for a leak that was recently introduced.

Member

jwatzman commented Nov 21, 2014

Spent most of the morning looking at this. I wasn't able to reproduce it with the "representative WordPress" install from https://github.com/hhvm/oss-performance, unfortunately. However, I was able to reproduce the heap profiling failure, and can help you get us a heap profile. It's a little messy.

  • The reason that turning profiling on and off is failing is that the default Ubuntu jemalloc lib doesn't have profiling enabled. I built one for you: download http://dl.hhvm.com/ubuntu/libjemalloc.so.1 and replace /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 with that file. (You probably want to move the system one out of the way instead of overwriting it so you can easily restore it later.) (If anyone reading this isn't on Ubuntu Trusty, the important thing is to build jemalloc with ./configure --enable-prof.)
  • That's still not enough to work; you need to make sure HHVM runs with the following environment variable: MALLOC_CONF="prof:true,prof_active:false"
  • Then, set up the admin server as detailed above. My local install just passes -v AdminServer.Port=8093 to the HHVM command line, but you can put that in the config too. You also need nginx in front of that; I did it with something like this below the normal server stanza:
  server {
    listen 8091 default_server;
    access_log            /dev/shm/hhvm-nginxCnchqi/admin-access.log main;
    client_body_temp_path /dev/shm/hhvm-nginxCnchqi/admin-client_temp;
    proxy_temp_path       /dev/shm/hhvm-nginxCnchqi/admin-proxy_temp;
    fastcgi_temp_path     /dev/shm/hhvm-nginxCnchqi/admin-fastcgi_temp;
    uwsgi_temp_path       /dev/shm/hhvm-nginxCnchqi/admin-uwsgi_temp;
    scgi_temp_path        /dev/shm/hhvm-nginxCnchqi/admin-scgi_temp;

    location / {
      fastcgi_pass 127.0.0.1:8093;
      include fastcgi_params;
    }
  }
  • Start up and warm up your server.
  • curl localhost:8091/jemalloc-prof-activate to turn on profiling.
  • curl localhost:8091/jemalloc-prof-dump?file=/tmp/dump1 to get an initial dump.
  • Let your server run for a little while. You say it takes about 6 hours to fall over? Let it run for 3-4 or so.
  • curl localhost:8091/jemalloc-prof-dump?file=/tmp/dump2 to get a second dump.
  • Let it run until it almost falls over.
  • curl localhost:8091/jemalloc-prof-dump?file=/tmp/dump3 to get a final dump.
  • Send me those three files out of /tmp, along with which version of HHVM you installed (i.e., the Trusty 3.4.0 package, which I think you're using) and the output of hhvm --version. Feel free to post it here, or in case you think there's anything sensitive, my email address is my GitHub username at fb.com.
Contributor

SiebelsTim commented Nov 21, 2014

@jwatzman Put this in the wiki or somewhere! 👍

Member

jwatzman commented Nov 22, 2014

Yeah, good idea, will do if this ends up producing useful results :)

Member

jwatzman commented Dec 2, 2014

Have any of you that are experiencing this been able to get any more info? Just confirming that the 3.4.1-devtest deb linked above does or does not help would be useful -- and if it doesn't help, a heap dump as above would be even more useful. This is going to eventually hit human timeout which would be unfortunate, since it seems to be a real issue -- but since we can't repro it, we need more info to track it down :(

liayn commented Dec 2, 2014

I'll install the devtest build on the live server now. Let's hope.

liayn commented Dec 2, 2014

Hm, apt-get keeps nagging me that a newer version is available... how can I avoid that?

Member

jwatzman commented Dec 2, 2014

You can directly download the deb and then sudo dpkg --install path/to/deb.

liayn commented Dec 2, 2014

That's what I did. It replaced the installed hhvm, but now apt-get reports that updates are available, which triggers our reporting systems, which in turn triggers mails...

Member

jwatzman commented Dec 2, 2014

Can you just silence that for a little while? The package is deliberately built out-of-band, since it's unclear if it will help. (Though it's signed with the same GPG key as the official ones, so you can tell it does come from us.) I'm not sure what reporting system you are using, so I can't tell you how to shut it up; you may try just commenting out the HHVM repo in /etc/apt/sources.list or /etc/apt/sources.list.d/, wherever it is.

tat commented Dec 10, 2014

Debug files sent to jwatzman. Let me know if you find anything, thanks!

Contributor

swtaarrs commented Dec 10, 2014

Thanks, that log file was very helpful. It looks like the JIT is just trying to compile an incredibly large chunk of code in a function with an abnormally large number of locals, and we're using a lot of memory as a result. There are a few things you can do that should help. The problem is that you have a large amount of code in a pseudomain (code that isn't in any function, just at the top level of a file) and the way we compile those is pretty suboptimal. The quickest fix is disabling compilation of those with the hhvm.jit_pseudomain = 0 ini option. That will negatively impact performance but should reduce the crazy memory usage.

A better fix would be putting all of that code inside a function, rather than leaving it at the top level. I can't tell which file it was, but it looks like there are at least 395 local variables in it, and some of the functions it calls are array_map, trc, convert_height_to_text, convert_size_to_text, and convert_weight_to_text. If that's not easily possible because you make heavy use of global variables, or if neither of these helps, your best bet is probably the hhvm.jit_max_region_instrs option I mentioned in a previous comment.

It's of course possible that there's a real leak somewhere, but so far all signs point to the massive compilation unit being the problem.
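As a hypothetical sketch of that refactor (the reporter's actual file isn't available; the helper below only stands in for application functions like convert_height_to_text):

  <?php
  // Stand-in for an application helper; only where the calling code lives
  // matters for this example.
  function convert_height_to_text($cm) {
      return $cm . ' cm';
  }

  // Before: the work below would sit at the top level of the file (the
  // pseudomain), so the JIT compiles it as one huge unit together with every
  // other top-level line and local in the file.
  //
  //   $height_text = convert_height_to_text(180);
  //   $summary = 'Height: ' . $height_text;
  //   // ... thousands more top-level lines ...

  // After: the same work wrapped in a function. The pseudomain shrinks to a
  // single call, and the locals belong to an ordinary function body.
  function render_summary($height_cm) {
      $height_text = convert_height_to_text($height_cm);
      $summary = 'Height: ' . $height_text;
      // ... rest of the former top-level code ...
      return $summary;
  }

  echo render_summary(180), "\n";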

Member

jwatzman commented Dec 10, 2014

@tat and this bit of code started consuming more memory in 3.4, which is why this probably just now hit you. I hear the memory usage will be somewhat improved in 3.5 or 3.6, but no promises -- and that clearly doesn't help you now :)

@swtaarrs is there any tweaking of defaults we could do, or anything like that, which would make this failure at least more debuggable, or hopefully go away, for external folks? This seems like something folks will hit from time to time, and ideally they shouldn't, or at the very least it shouldn't be this hard to debug.

liayn commented Dec 10, 2014

@swtaarrs: I just deployed the hhvm.jit_pseudomain = 0 option to our server and restarted hhvm.
Unfortunately, memory usage keeps building. Not as fast, but we're not in our rush hour at the moment. :-(

liayn commented Dec 10, 2014

I checked our WordPress installation and couldn't find any of the aforementioned functions in the code, so this is not part of WordPress core.

tat commented Dec 10, 2014

@jwatzman @swtaarrs we definitely have a jumbo file with about 4k lines, and most of them are top level.

I tried it with hhvm.jit_pseudomain=0 and memory consumption peaked at 21% (~600 MB), but of course that is not useful for us, as the jumbo file is the one that gets 95% of requests and it isn't compiled with that setting.

I'm now trying hhvm.jit_max_region_instrs=500; it hasn't been OOM'd yet, but it's consuming 78% of memory at the moment (2.8 GB). The memory growth seems slower but still there, IMHO (we also don't have much traffic at this hour).

So it seems to be JIT-related, but does it sound plausible to you that it was using ~500 MB on 3.3 and can't run within 3.5 GB on 3.4?

Member

jwatzman commented Dec 11, 2014

@liayn it's extremely likely that your issue is unrelated to what we're tracking down with @tat, so it's unsurprising that the options mentioned didn't help -- HHVM has some very longstanding slow memory leaks; a particularly bad regression went into 3.4 and was fixed in the 3.4.1-devtest I posted above. I'll be rolling 3.4.1 with that fix and some other fixes that I'm waiting on being finished up, probably in the next couple of weeks. But most have been there well before 3.4, for years, and what you've said unfortunately sounds in line with them. That's just to say it's expected, not good -- I think some folks are going to try to go after those leaks some time after the holidays.

> so it seems to be jit related but does it sound possible to you that it was using ~500mb on 3.3 and can't run with 3.5 gigs on 3.4?

@bertmaher @swtaarrs @alexmalyshev do any of you have any idea how much the RAM usage of FrameState increased from 3.3 to 3.4? I know someone said that it did, but 5-10x seems like a lot.

Member

jwatzman commented Dec 11, 2014

Or maybe something that isn't a "leak" per se -- the memory will eventually get cleaned up, but sticks around somewhat longer than before, causing an earlier OOM when combined with the FrameState size increase?

Contributor

bertmaher commented Dec 11, 2014

@jwatzman, @tat: something that's still confusing to me is that FrameState should not be long-lived; we should allocate them and either (a) OOM immediately, which is what we think is happening here, or (b) finish translating and free the FrameStates. So something still feels weird here, unless this server never stops translating until it OOMs...

Member

jwatzman commented Dec 11, 2014

Yeah, something weird is going on. https://gist.github.com/jwatzman/5c25aa6732e849df13e2 manages to reproduce the issue -- run the output of that on 3.4, refresh 12 times until we JIT, then watch RAM spike -- up to 1G on my machine. An idle heap dump looks very similar to what @tat sent -- with things that should never be running simultaneously, and shouldn't be running at all when idle. Looks either like a leak or our heap profiling is lying to us :)

The issue actually looks much worse on master, though it could be a separate issue. (We peg the CPU and keep consuming RAM until we OOM with my above script -- we're at least stable at 1G on 3.4.)

@swtaarrs and @bertmaher are continuing to look into this.
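The gist itself isn't reproduced here; as a hypothetical stand-in, a generator in the same spirit would emit a file whose pseudomain declares hundreds of locals:

  <?php
  // Hypothetical generator (not the linked gist): writes a PHP file whose
  // pseudomain declares 400 top-level locals, the shape of code that produced
  // the oversized translation unit discussed above.
  $lines = array('<?php');
  for ($i = 0; $i < 400; $i++) {
      $lines[] = sprintf('$local_%d = str_repeat("x", %d);', $i, $i + 1);
  }
  $lines[] = 'echo count(get_defined_vars()), " locals\n";';
  file_put_contents('/tmp/pseudomain_heavy.php', implode("\n", $lines) . "\n");

  // Serve /tmp/pseudomain_heavy.php through HHVM and request it repeatedly
  // (about a dozen times, per the comment above) while watching process RSS.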

Member

jwatzman commented Dec 12, 2014

We found it!

It appears to be a bug in boost flat_map in versions before 1.55 (trusty has 1.54); it's one of two upstream tickets, but not clear which.

This was triggered by the new usage of flat_map in 4a8ee81, which is a rev that is new in 3.4.

I'll write a change tomorrow to work around this for old versions of boost, and get that merged into 3.4.1.

tat commented Dec 12, 2014

Great! If you can upload a new .deb I'll test it tonight.

On a side note, it would be nice if the apt Packages file kept old versions listed, so we could stay on a version that we know works properly and upgrade manually after testing new versions (really needed when autoscaling is used).

Thank you!

dmytroleonenko commented Dec 12, 2014

Hey,
I'm also experiencing memory leaks on Ubuntu 14.10 with hhvm 3.5 (and the nightly as well) when running vBulletin 3 forum software.
I have sent dumps to jwatzman. Please advise if I should open a separate issue.

Member

jwatzman commented Dec 12, 2014

Some notes:

  • A change is up for review to fix this -- https://reviews.facebook.net/D30183 if anyone wants to follow along. I'll merge it into the 3.4 branch and roll 3.4.1 packages once it's in.
  • The bug is actually in flat_set, I misspoke above. The upstream issue is likely to be https://svn.boost.org/trac/boost/ticket/9166 but it's not totally clear.
  • It was fixed in boost 1.55, which Ubuntu 14.10 ships with, so unfortunately if you're on 14.10 and still leaking, this fix won't help you. We are separately tracking some other memory issues with 3.5-dev; we'll see if they're related.
  • If you installed my libjemalloc above to get a heap dump, you should revert to the system version when you're done with the tests. There's a very slight incompatibility between it and the version of folly in 3.4, which can trigger a very rare issue that leads to memory corruption. It's so rare that it's probably fine, but to be totally safe I'd revert it.

> On a side note it would be nice if the apt Package file would keep old versions listed

Unfortunately reprepro, which we use to manage the repo, doesn't do this. The debs are all still up on dl.hhvm.com (specifically, ubuntu lives here) if you want to manually install something. But yeah, I'd definitely recommend installing new versions onto a single development machine and testing before upgrading your production server(s). (Not just for HHVM, but for any upgrade of anything :))

> I'm also experiencing memory leaks on Ubuntu 14.10 with hhvm 3.5 (and nightly as well) when running vBulletin 3 forum software.

This is likely to be a separate issue, since I'm pretty sure 14.10 has a fixed boost library. HHVM is known to leak small amounts of memory over a long period of time (i.e., needing to restart the HHVM process every couple of days isn't unexpected). If the leak is worse than that, please open a separate issue -- this one is specifically about a regression from 3.3 to 3.4.

Member

jwatzman commented Dec 12, 2014

@dmytroleonenko the heap dump you sent me was pretty clearly not from 14.10 -- are you sure you're not on 14.04? It looks a lot like you are, in which case you are likely hitting the same or a similar leak. Try the new nightly tonight (2014.12.13 or newer).

jwatzman closed this in 12f44e0 Dec 12, 2014

Member

jwatzman commented Dec 12, 2014

Will tag and roll 3.4.1 shortly. I'll make sure to build 14.04 first, should be available in a few hours. Nightly builds won't have the fix until the 2014.12.13 builds, as noted above.

jwatzman added a commit that referenced this issue Dec 12, 2014

Work around leak in boost flat_set
Summary: This is very likely to be the memory leak reported to be new in
HHVM 3.4. See code comment and linked GitHub issue for full explanation.

Fixes #4268

{sync, type="child", parent="internal", parentrevid="1736543", parentrevfbid="1584241511789266", parentdiffid="5935230"}

Reviewed By: @paulbiss

Differential Revision: D1736543

Member

jwatzman commented Dec 13, 2014

The release version of 3.4.1 for trusty (14.04) is up; trusty was the hardest hit and is what I think everyone on this thread was using. The debug version, and packages for the other OSes, will be built over the next day or two.

Thanks for the info from everyone about this! @dmytroleonenko if you're still hitting problems after trying the 2014.12.13 nightly (which won't exist for another 12 hours or so) please file a new issue.

tat commented Dec 13, 2014

Great, thank you guys! I'll try it out and report back if I find any issues.

pjv commented Dec 13, 2014

3.4.1 on ubuntu trusty is looking good for me. 9 hours of stable memory use by HHVM.

[screenshot: 2014-12-13 at 2:49 am]

liayn commented Dec 13, 2014

Looks good for us as well. Initial memory usage is already lower, and so far memory seems stable. Thanks a lot to everyone!

dmytroleonenko commented Dec 13, 2014

It looks much better. I'll keep an eye on whether the issue is still there and file a new bug if it is.

HumanWorks commented Mar 5, 2015

I had the same problem with HHVM + WordPress: the server was crashing every time we posted something new. It turns out that the simple solution that works is just to disable the "Try to automatically compress the sitemap if the requesting client supports it" setting in the XML Sitemaps plugin.
Hope this helps (I know it's not a real solution for HHVM, but at least it will work for many WordPress installations out there).

Contributor

paulbiss commented Mar 5, 2015

@HumanWorks the problem we were tracking here turned out to be a memory leak in boost that was being triggered in the JIT; it's been fixed since 3.4.1 and isn't present in 3.3. If you're seeing a different leak, I would suggest opening a new issue.

Contributor

craigcarnell commented Mar 5, 2015

@paulbiss Any idea when 3.6.0 will make its way out of the door?

Contributor

paulbiss commented Mar 5, 2015

@craigcarnell I think the plan is to start rolling packages today or tomorrow (our packager isn't the fastest box...); I've got one more cherry-pick I need to push. It's been a busy week for everyone and we haven't had a chance to update the packaging system to push a second LTS.
