Olsrd: Frequently segfaults on startup #573
Comments
Can you check the log you provided? There are 2 lines "Jun 14 07:59:02 16-4-uferwerk-17a olsrd[2385]: Writing '1' (was 0) to /proc/sys/net/ipv4/conf/all/send_redirects". What was happening before "Jun 14 07:59:02"? |
These would be the first entry originating with olsrd.
|
I am seeing this on several of my routers, usually after a fresh reboot. The routers where I am seeing this behaviour are uplinks, providing internet access themselves. We have a roaming network here based on batman, as described at https://wiki.freifunk-potsdam.de/Roaming |
I never saw this with Kathleen, only with Hedy. |
In your initial post I can see that olsrd was restarted. Can you explain why this happened? Was this caused by you? Perhaps there is some relation to #405? |
I don't know. If olsrd got restarted, this is probably due to me going to Services/OLSRv4 and just hitting "save & apply", because this sometimes magically fixes things.
|
I don't know if it helps but here is a page fault I find in the log prior to the latest crash, which again happened when I was trying to visit the status page: ` Jun 23 16:25:20 16-5-uferwerk-17b olsrd[3338]: New main address: 10.22.16.5 ` |
This morning the same again: I opened Freifunk/Neighbours, found it to contain no entries, and clicked "Administration" to log on. This caused the router to crash, so that it doesn't even respond to pings. |
What device are you using? Is there a statistics page that we can look at? For example, I am wondering if the router is running out of memory or if the conntrack connection limit is full. Have you tried disabling local statistics (rrd)? Did you customize the router config in any way? When pings don't work, are you connected wirelessly or with a cable? Have you tried ipv6 link local pings? There is also the possibility that the flash chip is corrupt or some other annoying hardware issue. But first I would like to rule out the memory and conntrack ideas before heading down this path. |
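For anyone wanting to rule out the memory and conntrack ideas directly on a node, here is a minimal sketch in Python. It assumes a Linux system where `/proc/meminfo`, `/proc/sys/net/netfilter/nf_conntrack_count` and `/proc/sys/net/netfilter/nf_conntrack_max` are present, and that a Python interpreter is available, which is often not the case on small routers. The helper names are made up for illustration and are not part of the firmware.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of field -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Each line looks like "MemFree:   1200 kB"; keep the numeric part only.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def conntrack_usage(count_text, max_text):
    """Return the fraction of the conntrack table in use (0.0 to 1.0)."""
    count = int(count_text.strip())
    limit = int(max_text.strip())
    return count / limit if limit else 0.0


def read_file(path):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    # Paths below are the usual Linux locations; they are an assumption here.
    mem = parse_meminfo(read_file("/proc/meminfo"))
    print("MemFree kB:", mem.get("MemFree"))
    usage = conntrack_usage(
        read_file("/proc/sys/net/netfilter/nf_conntrack_count"),
        read_file("/proc/sys/net/netfilter/nf_conntrack_max"),
    )
    print("conntrack table in use: {:.0%}".format(usage))
```

Running this shortly before the node becomes unresponsive (for example from cron, appending to a file) would show whether free memory or the conntrack table is close to its limit.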
On 24.06.2018 at 13:54, pmelange wrote:
What device are you using?
This is about a GL Inet AR-150, but I am seeing the same occasionally on
a TP Link CPE 210
Is there a statistics page that we can look at? For example, I am
wondering if the router is running out of memory or if the conntrack
connection limit is full.
For this node, look at
https://monitor.freifunk-potsdam.de/grafana/d/000000008/stat-node-overview?var-hostname=16-4-uferwerk-17a
For the entire site, see
https://monitor.freifunk-potsdam.de/grafana/d/000000034/loc-uferwerk-werder
Have you tried disabling local statistics (rrd)?
Is disabled because the Potsdam community uses Grafana instead.
Did you customize the router config in any way?
Configured according to
https://wiki.freifunk-potsdam.de/Kathleen (without VPN)
https://wiki.freifunk-potsdam.de/Roaming
https://wiki.freifunk-potsdam.de/StatusUpdates
That is, followed the standard configuration for the Freifunk Potsdam
community, configured B.A.T.M.A.N. based roaming, again as done in
Potsdam, and set up Grafana data collection for the Freifunk Potsdam
monitor.
When pings don't work, are you connected wirelessly or with a cable?
Cable, from the local lan, that provides the uplink.
Have you tried ipv6 link local pings?
There is also the possibility that the flash chip is corrupt or some
other annoying hardware issue. But first I would like to rule out the
memory and conntrack ideas before heading down this path.
Does the grafana page suffice for that?
|
Cable, from the local lan, that provides the uplink.
"Local lan" sounds to me like you are on the WAN side of the freifunk
router. Is that right?
Have you tried ipv6 link local pings?
Does the grafana page suffice for that?
On the Grafana page, there are no statistics about the number of
neighbors and their ETX. Also conntrack is not shown (unless it's the
same as "Network Connections").
But it doesn't look like a memory issue.
Would you be able to connect to the serial console of the AR150 and log
all the messages which go by? That might provide a bit more detail.
The big question is "What happened between 3:45 and 8:30?"
Just as the router goes offline, there is a spike in the load avg and
CPU utilization. Why?
At 8:30 it seems that the router was restarted.
A log from the serial console would help a lot here.
|
I have now temporarily replaced the node by a TP-Link TL-WDR3600 v1. This box doesn't crash but I see the same weird phenomenon, that
So these are in all likelihood not issues related to any particular hardware. I have also witnessed this on TP-Link CPE 210 devices, all of which are acting as uplinks and servers in our roaming network. I never saw any of this with Kathleen. |
On 24.06.2018 at 16:29, pmelange wrote:
> Cable, from the local lan, that provides the uplink.
"Local lan" sounds to me like you are on the WAN side of the freifunk
router. Is that right?
Yes
Have you tried ipv6 link local pings?
No, can try next time
> Does the grafana page suffice for that?
On the Grafana page, there are no statistics about the number of
neighbors and their ETX. Also conntrack is not shown (unless it's the
same as "Network Connections").
Have to ask about that.
But it doesn't look like a memory issue.
Again, I see the empty olsr table on multiple devices.
Would you be able to connect to the serial console of the AR150 and log
all the messages which go by? That might provide a bit more detail.
The big question is "What happened between 3:45 and 8:30?"
3:45 the device was rebooted via cron.
Just as the router goes offline, there is a spike in the load avg and
CPU utilization. Why?
At 8:30 it seems that the router was restarted.
Yes, that was me, when it no longer responded to anything.
A log from the serial console would help a lot here.
Can you advise how to access the serial
console? Is there a way without cracking open the box and thus voiding
the warranty?
|
So again, on the TP-Link WDR 4300 I also see olsrd crashing. Here, from the log: Jun 24 17:51:52 16-24-uferwerk-16 olsrd[2545]: Tunnel tnl_0a16100b added, to 10.22.16.11 After that I ran /etc/init.d/olsrd restart three times. The first time, the segfault reoccurred; the second time it looked ok, but the status page remained empty; only after the third time was the status page filled with entries. |
So now I am seeing the issue on one TP-Link CPE 210 which is connected to the same uplink router, and what I am seeing in the log is again: Jun 25 08:25:57 16-5-uferwerk-17b kernel: [17729.309320] do_page_fault(): sending SIGSEGV to olsrd for invalid write access to 004689c0. Running /etc/init.d/olsrd restart makes it run again. |
just now after rebooting two other CPEs which are also uplinks, but connected to a different router in another building, I found the same: The olsrd status pages were empty, only after running /etc/init.d/olsrd restart, they began to be filled. |
#513 should demystify your segfault issue, but it does not explain why olsrd stops randomly. That is probably fixed in the recent olsrd v0.9.6.x. Have you tried master or the SAm0815_experimental branch? |
On 25.06.2018 at 19:37, Sven Roederer wrote:
Jun 24 17:51:52 16-24-uferwerk-16 olsrd[2545]: olsr.org -
0.9.0.3-git_788312c-hash_c7f667fe7cf42baa389872842561b6c3 stopped
Jun 24 17:51:52 16-24-uferwerk-16 olsrd[2545]: OLSR: sendto IPv4
Bad file descriptor
Jun 24 17:51:52 16-24-uferwerk-16 olsrd[2545]: OLSR: sendto IPv4
Bad file descriptor
Jun 24 17:51:52 16-24-uferwerk-16 kernel: [ 60.682114]'
Jun 24 17:51:52 16-24-uferwerk-16 kernel: [ 60.682114]
do_page_fault(): sending SIGSEGV to olsrd for invalid write access
to 00468858
Jun 24 17:51:52 16-24-uferwerk-16 kernel: [ 60.690563] epc =
00417249 in olsrd[400000+38000]
Jun 24 17:51:52 16-24-uferwerk-16 kernel: [ 60.695410] ra =
0041dc4f in olsrd[400000+38000]
Jun 24 17:51:52 16-24-uferwerk-16 kernel: [ 60.700196]
#513 should
demystify your segfault-issue.
but this is not explaining, why the olsrd stops randomly. probably
it's fixed in the recent olsrd v0.9.6.x. Have you tried master or
SAm0815_experimental-branch?
I have used the stable releases, hedy 1.0.0 and 1.0.1
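
To compare the crash location across devices, the kernel lines quoted above can be reduced to a few numbers. The following is a rough Python sketch; the regular expressions are written only against the log format seen in this thread, and the function name `parse_fault` is hypothetical. It extracts the faulting address plus the `epc` and `ra` values, and their offsets from the start of the mapping shown in brackets (`olsrd[400000+38000]`).

```python
import re

# Matches "sending SIGSEGV to olsrd for invalid write access to 00468858",
# tolerating line wraps as they appear in the quoted mail.
FAULT_RE = re.compile(
    r"sending SIGSEGV to (\S+) for invalid (\w+) access\s+to\s+([0-9a-fA-F]+)"
)
# Matches "epc = 00417249 in olsrd[400000+38000]" and the same for "ra".
REG_RE = re.compile(
    r"\b(epc|ra)\s*=\s*([0-9a-fA-F]+)\s+in\s+(\S+?)\[([0-9a-fA-F]+)\+([0-9a-fA-F]+)\]"
)


def parse_fault(log_text):
    """Extract fault address and epc/ra offsets from kernel page-fault lines."""
    result = {}
    fault = FAULT_RE.search(log_text)
    if fault:
        result["process"] = fault.group(1)
        result["access"] = fault.group(2)
        result["address"] = int(fault.group(3), 16)
    for name, addr, binary, base, _size in REG_RE.findall(log_text):
        result[name] = {
            "binary": binary,
            "address": int(addr, 16),
            # Offset of the register value from the start of the mapping.
            "offset": int(addr, 16) - int(base, 16),
        }
    return result


if __name__ == "__main__":
    sample = (
        "do_page_fault(): sending SIGSEGV to olsrd for invalid write access\n"
        "to 00468858\n"
        "epc =\n00417249 in olsrd[400000+38000]\n"
        "ra =\n0041dc4f in olsrd[400000+38000]\n"
    )
    info = parse_fault(sample)
    print(hex(info["address"]), hex(info["epc"]["offset"]), hex(info["ra"]["offset"]))
```

If several routers report the same `epc` offset, that points to the same code path crashing each time. Whether a symbolizer needs the absolute address or the offset depends on how the olsrd binary was linked, so both are kept.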
|
This seems to be solved upstream (try out a development version of the firmware). If this is still an issue, please reopen. |
4cff2b1 bmx6: update version to fix freifunk-berlin#573 memory leak 71d7806 luci-app-bmx6: fixes error line in logread freifunk-berlin#578 920a045 Merge pull request freifunk-berlin#574 from pedro-nonfree/patch-2 272cd4c Merge pull request freifunk-berlin#579 from pedro-nonfree/patch-3
Since I upgraded my routers to Hedy, olsrd seems to be quite unstable. Often, under OLSR/Neighbours I find no entry at all, which is sometimes remedied by going to Services/OLSRv4 and just clicking "save and apply".
Today, this did not help. I get the following console output: