Kernel panic - Out of memory #109
Comments
whangarei: |
opua: |
You should post the logs in https://gist.github.com/ instead of in a comment |
Thanks for the logs and the report! OK, I found a node like this here in Lübeck, too: a 1043ND v1 (freifunk-24a43c74493a.nodes.ffhl), running gluon-ffhl-0.4. It has the mesh-vpn activated but is only meshing via WLAN. This node restarted five days ago and shows an OOM in /sys/kernel/debug/crashlog. I'm now running a script to log free/ps hourly:
started via "start-stop-daemon -S -x /root/mem-logger.sh". Let's see how that turns out in the next couple of days. |
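The script itself is not shown in this thread. As a rough idea of what such an hourly free/ps logger could look like, here is a minimal sketch. The log path, the interval variable and the "run" argument are assumptions made for illustration, not the script that was actually used.

```shell
#!/bin/sh
# Hypothetical mem-logger.sh sketch; NOT the script used in this thread.
LOG="${LOG:-/root/mem-log.txt}"   # assumed log location
INTERVAL="${INTERVAL:-3600}"      # seconds between snapshots (hourly)

# Append one timestamped snapshot of free and ps to the log file.
log_snapshot() {
    {
        echo "=== $(date) ==="
        free || echo "(free not available)"
        ps || echo "(ps not available)"
        echo
    } >> "$LOG" 2>&1
}

# Loop only when called with "run", so the function can also be sourced.
if [ "${1:-}" = "run" ]; then
    while true; do
        log_snapshot
        sleep "$INTERVAL"
    done
fi
```

With this sketch the start command would need the extra argument, for example `start-stop-daemon -S -b -x /root/mem-logger.sh -- run`. Keep in mind that a log under /root lands on flash, so it should stay small.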
Hmm, now I got a crashlog in which no user-space process is listed with a "page allocation failure" beforehand; it just starts with "kthreadd invoked oom-killer" :((( |
Which process hits the allocation failure first says nothing at all; it has nothing to do with the reason why there is no memory left... Looking at the process list, no process seems to consume much memory (look at the RSS column), so the leak must be in the kernel... :/ |
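To put a number on the "look at the RSS column" argument, the following sketch sums the RSS of all processes and prints it next to a few /proc/meminfo fields. It assumes a Linux /proc and awk on the node; the function names are made up for this example. The RSS sum counts shared pages more than once, so treat it as an upper bound.

```shell
#!/bin/sh
# Sum the resident set size (VmRSS, in kB) of all processes.
# Kernel threads have no VmRSS line and are skipped automatically.
sum_rss_kb() {
    cat /proc/[0-9]*/status 2>/dev/null |
        awk '/^VmRSS:/ { total += $2 } END { print total + 0 }'
}

# Print one field (in kB) from /proc/meminfo, e.g. MemTotal or Slab.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# Print user-space usage next to the kernel's own counters.
report() {
    echo "MemTotal:   $(meminfo_kb MemTotal) kB"
    echo "MemFree:    $(meminfo_kb MemFree) kB"
    echo "Slab:       $(meminfo_kb Slab) kB"
    echo "Sum of RSS: $(sum_rss_kb) kB"
}
```

Calling `report` at intervals gives a trend: if MemFree keeps shrinking while the RSS sum stays small, that would be consistent with memory being held inside the kernel.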
The problem also occurred with af61c46 / 0.4+0-exp20140520 on TL-WR841ND v8 and with v2014.1a on Nanostation loco M2 :( |
Also v2014.2 on Nanostation loco M2 is affected... |
Also present in 2014.2-56-g3a8af0c / 0.4.2+0-exp20140729 on TL-WR841ND v8. |
Okay, please check if the switch to Barrier Breaker (everything starting with v2014.3-21-ge53f310) has any effect on this issue. |
v2014.3-23-gddd7c16 on a TL-WR842N/ND v1: I'm still getting page allocation failures. No reboot yet, but the router has only 1.5 hours of uptime so far. As before, the page allocation failures start a long time before the reboot. |
Page allocation failures are still appearing, but no reboot yet.
Please let me know if I can help with some logs (etc.). If you need access to the node in question, I can add your pubkey. |
Sorry, but "opua" rebooted after 38h with v2014.3-23-gddd7c16 :( |
My node now has an uptime of 231 hours, a duration I never achieved pre-BB. So maybe BB does not solve the problem (as Joachim reported a reboot) but makes the situation noticeably better? |
Short update: since the switch to BB, only "opua" has had reboots caused by out of memory, and only 2 of them, so it looks MUCH better :) "whangarei" has not rebooted because of this issue... |
0.5.1 / v2014.3.1 is also badly affected by this bug: I got 4 reboots in 10 hours, the shortest uptime being 7 min, followed by 11 min... |
Yes, I'm seeing similar issues with v2014.3.1, two of my nodes reboot several times a day. Interestingly my nodes seem to be the only Freifunk Lübeck nodes affected (out of about 10 which are already using 2014.3.1). |
It seems to depend on the location... I have 5-10 clients, most with poor signal levels, so there are many retransmits and sometimes higher traffic. Normally the signal to the other node is poor as well, but not this time. It's probably very far from a lab environment... BB seems to be much more stable, so why not make a "bugfix" release based on BB, or release 2014.4 and move all the open milestones to 2014.5? |
Two suggestions to save memory on the node, for stable and experimental. First, in /lib/gluon/cron/alfred put both commands on one line, separated by ";", so that one is executed after the other instead of both being forked by crond at the same time; this needs less memory. Second, the tmpfs for /tmp is much too large for a system without swap! Any user or process can cause an out-of-memory condition just by writing 4-6 MByte of incompressible data there. No-space-left-on-device is much easier to debug... The size is set in /lib/preinit/10_essential_fs, but it can't be changed there after the firmware is built, because /overlay is mounted later. "mount tmpfs /tmp -t tmpfs -o remount,size=1024k,nosuid,nodev,noatime" works at any time. Of course, neither of these solves this problem :/ |
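The remount from the comment above can be wrapped in a small helper so the size cap is easy to change and to check. This is only a sketch: the function names are made up here, the mount options are the ones quoted above, and the remount itself needs root.

```shell
#!/bin/sh
# Build the mount options for a size-capped /tmp (size given in KiB).
tmpfs_opts() {
    echo "remount,size=${1}k,nosuid,nodev,noatime"
}

# Remount /tmp with the given cap. Needs root; not called automatically.
remount_tmp() {
    mount tmpfs /tmp -t tmpfs -o "$(tmpfs_opts "$1")"
}

# Show the current size and usage of /tmp for verification.
show_tmp() {
    df -k /tmp
}
```

For example, `remount_tmp 1024 && show_tmp` applies the 1024k limit from the comment and prints the result. Running it at boot, for instance from /etc/rc.local, is an assumption on my side and not something tested in this thread.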
This should reduce memory consumption a little. freifunk-gluon/gluon#109 (comment)
Closing: since the switch to Barrier Breaker there should be no random reboots any more (all reboots should have specific reasons, for which separate tickets should be opened). |
* Hoodselector: is now able to handle polygon hoods. !60
* Hoodselector: no longer uses `scan dump`, which saves airtime; this functionality is not available in LEDE anymore.
* Hoodselector: L2TP tunneldigger support. freifunk-gluon#47
* Hoodselector: the upgrade script has been refactored. Dead code is removed. A design failure is fixed: the script uses the hood BSSID instead of the possibly redundant hood name for hood identification. freifunk-gluon#107
* Hoodselector: new state: a router from hood A that is connected to another router from hood B is now able to switch back to its own hood if a router from hood A is visible. freifunk-gluon#108
* Hoodselector: in state "radio less", it is now ensured that mesh on LAN / WAN is enabled before entering the mode. freifunk-gluon#109
* Hoodselector: a bug in the function get_mesh_if() is fixed. This function now returns a list of all mesh interfaces except the VPN interface. freifunk-gluon#116
* Hoodselector: old VPN configurations are now deleted. If a hood without VPN peers was configured, the peers from the old hood were still present. freifunk-gluon#117
* Hoodselector: many functions of the hoodselector are now placed in a Lua library. freifunk-gluon#118
* fix freifunk-gluon#109
* statistics can now be toggled on/off via wizard
* move private to own page
I have observed random reboots on both of my TL-WR841ND v8 since running Gluon v2014.1a (Hamburg v0.4a); they occurred 8 times in 3 weeks on each of them. It seems to have nothing to do with issue #93.
Now I have found a crashlog file at /sys/kernel/debug/crashlog, which makes it really easy to report and identify this problem.
On whangarei there were 5 clients, 2 WLAN and 0 VPN at this time.
On opua there were 3 clients, 1 WLAN and 0 VPN at the time of its reboot, based on the connection statistics (Verbindungsstatistik) from ohrensessel.net
Unfortunately I have no programming skills, so I don't have any idea what's going wrong...
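For anyone else collecting these crashlogs: a small helper like the one below could copy the file to a timestamped name after each reboot, so it can be posted later (for example as a gist). This is just a sketch. The destination directory and the function name are assumptions; only the crashlog path comes from this report.

```shell
#!/bin/sh
CRASHLOG="${CRASHLOG:-/sys/kernel/debug/crashlog}"  # path from this report
DEST_DIR="${DEST_DIR:-/root}"                       # assumed destination

# Copy the crashlog (if one exists) to a timestamped file and print its name.
save_crashlog() {
    [ -r "$CRASHLOG" ] || return 0    # no crashlog: nothing to do
    out="$DEST_DIR/crashlog-$(date +%Y%m%d-%H%M%S).txt"
    cp "$CRASHLOG" "$out" && echo "$out"
}
```

Calling `save_crashlog` once after boot would be enough. The node's clock may not be set yet that early, so the timestamp in the file name is only a rough hint.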