Nodes are crashing, out of memory #40
@hawkinswnaf gave a good suggestion: use a serial console to monitor dmesg output and catch OOM messages right before a crash.
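For a quick check when you do have shell access (serial or SSH), the kernel ring buffer can be grepped for OOM-killer activity. A minimal sketch; the exact message wording varies by kernel version:

```shell
#!/bin/sh
# Scan the kernel log for out-of-memory events; print a note if none are found.
dmesg 2>/dev/null | grep -iE 'out of memory|oom-killer' || echo "no OOM messages found"
```

Watching the serial console is still the more reliable option, since the ring buffer is lost if the node actually reboots.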
Additional testing has confirmed this issue is likely due to high memory usage in the serval daemon.
Likewise, following a brief chat with Dan here, I suggested adding the "zram-swap" package presently in OpenWRT trunk to the commotion packages feed, and then enabling swap in the kernel config. This would let you enable compressed swap memory on nodes, and ideally make the memory limit somewhat softer (i.e. help nodes avoid OOM errors and crashing processes).

So, specifically, run make kernel_menuconfig and make these selections: This kernel config change can also be done via a patch, and such a patch is buried somewhere in the openwrt-devel listserv archives (i.e. from when the zram package was originally announced). Then copy the zram_swap package from trunk into the commotion feed and enable it. For my nodes with 32MB of RAM, I specify 6MB of swap in /etc/config/system:
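A sketch of that /etc/config/system stanza. The option name `zram_size_mb` is my assumption, based on what the zram-swap init script of that era read via uci; check the init script of the package you actually copied:

```
config system
	option zram_size_mb '6'
```

After committing the change, restarting the zram init script (or rebooting) should bring the compressed swap device up.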
You can periodically check swap usage to ensure nothing is using excessive RAM:
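Both of these work with stock BusyBox on the node, so no extra packages are needed for the check itself:

```shell
#!/bin/sh
# Show overall swap totals from the kernel's view...
grep -i swap /proc/meminfo
# ...and the per-device breakdown (the zram device should appear here once enabled)
cat /proc/swaps
```

If SwapFree stays pinned near zero, something is leaking faster than the compressed swap can absorb.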
Now THIS is cool. I can't wait to try it! On 10/04/2013 04:25 AM, Ben West wrote:
I am running zram-swap as described above on WasabiNet nodes, both 5.8GHz mesh backhaul and 2.4GHz mesh APs, and I can confirm that certain processes do not like to be swapped out, freezing or behaving erratically as a result. hostapd, wpa_supplicant, olsrd, crond/busybox, and whatever your captive portal agent is (nodogsplash or coovachilli) all certainly shouldn't be swapped out. Possibly commotiond too, although I've not had the opportunity to test that. So, the zram_swap method described above lacks a bit in robustness. I think the mlock mechanism can be used to prevent specific processes from being swapped out, although I'm uncertain whether OpenWRT has this tool integrated.
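Even without an mlock tool on the node, you can at least inspect whether a given process has any of its memory locked: the kernel exposes this as the VmLck field in /proc/PID/status. A minimal sketch, using the current shell's own PID as the example (substitute the PID of hostapd etc.):

```shell
#!/bin/sh
# VmLck shows how much of the process's memory is pinned in RAM.
# "0 kB" means the process is fully eligible to be swapped out.
grep VmLck "/proc/$$/status"
```

A process would normally call mlockall(2) on itself to pin its memory; doing it from outside for an arbitrary PID would need patching or a helper, which is why I'd hedge on how practical this is on stock OpenWRT.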
@andygunn Can we get the Detroit folks to review this issue for 1.1 using @elationfoundation's crashlog scripts?
To follow up here, I've seen the hostapd and/or wpa_supplicant processes on my Wasabi Nanostation M2 nodes occasionally crash under heavy load, and it does look to be well coupled with memory exhaustion from some misbehaving process. This is happening even with 3MB of zram swap, and furthermore even with vm.swappiness=0 specified in /etc/sysctl.conf, although there are noticeably fewer crashes with swappiness turned all the way down.

When the hostapd or wpa_supplicant process crashes, you will see whichever wireless VIF that process manages (e.g. the mesh adhoc VIF, the private AP) become unresponsive, even though the SSID remains visible. Note that the process names "hostapd" and "wpa_supplicant" appear in the process table regardless of whether you're using the wpad or wpad-mini packages. You can verify such crashes by the presence of files like these in /tmp:
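As a complementary check that doesn't depend on finding crash files, a minimal watchdog sketch that tests whether hostapd is still in the process table and reloads the wireless stack if not (this is my own sketch, not part of the Commotion images; `wifi` is OpenWrt's wireless reload helper, and you'd adapt the same pattern for wpa_supplicant):

```shell
#!/bin/sh
# If hostapd has died, reload the wireless stack; run this from cron every few minutes.
if pgrep hostapd >/dev/null 2>&1; then
    echo "hostapd running"
else
    echo "hostapd missing - reloading wifi"
    wifi 2>/dev/null || true
fi
```

Note this only papers over the symptom; the underlying memory exhaustion still needs to be tracked down.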
@areynold Sure - I will need to pass along instructions to folks in Detroit. Is there a wiki page or existing set of documentation to test with? Where are the crashlog scripts? @elationfoundation can you send them to me?
I have notes on testing for restarting nodes in the "I think my node is restarting. How can I tell?" section of https://wiki.commotionwireless.net/doku.php?id=development_resources:router:troubleshooting_routers This will create a file on the node that logs whenever it restarts.
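The idea behind that wiki technique can be sketched as follows (a hedged sketch, not the wiki's exact script: the log path and message format here are my own choices; on the node you'd call this from /etc/rc.local so it runs at every boot):

```shell
#!/bin/sh
# Append one record per boot; a growing file means the node is restarting.
LOG=/tmp/bootcount.log
echo "booted at $(date) (uptime $(cut -d' ' -f1 /proc/uptime)s)" >> "$LOG"
cat "$LOG"
```

Keep in mind /tmp is a ramdisk on OpenWrt, so for records that must survive a reboot the file would need to live on flash instead.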
What we're seeing at this point is that there are lots of different memory conditions that can cause a node to crash. While there are still memory leaks we're dealing with, the original leak that this issue was pointing to is way out of date, so I'm closing this out in favor of other, more specific issues.
Picostations flashed with DR2 seem to crash periodically due to out-of-memory issues (nodes have ~700K free memory right before a crash, as reported by top, and no other conditions for the crash have been found). I have not been able to reliably reproduce the crash conditions, however, and we do not currently have a method for accurately measuring per-process memory usage. That said, both Serval and Nodogsplash report high memory usage.
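Short of a proper profiler, per-process resident memory can be approximated from /proc, which works with BusyBox awk on the node. A minimal sketch that lists the largest VmRSS values (kernel threads have no VmRSS and are skipped automatically):

```shell
#!/bin/sh
# List resident memory (VmRSS, in kB) per process, largest first.
for s in /proc/[0-9]*/status; do
    awk '/^Name:/ {n=$2} /^VmRSS:/ {print $2, n}' "$s"
done 2>/dev/null | sort -rn | head -n 10
```

Running this periodically (e.g. from cron, appending to a log) would show whether servald or nodogsplash is the process whose RSS climbs toward the crash.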