dnsmasq networking error breaks interfaces #95

Closed
seamustuohy opened this issue Jan 17, 2014 · 6 comments
@seamustuohy
Collaborator

Because the header file displays the IPv4 address of a mesh network interface, the luci interfaces break whenever dnsmasq breaks. I will open a separate issue for the interfaces, but the dnsmasq issue is far more important and sinister.

Once this bug appears, it persists across multiple runs of "/etc/init.d/network restart".
Running "/etc/init.d/dnsmasq restart" or ".../dnsmasq reload" does not help either.

This problem persisted through multiple reboots.

I believe that this bug is related to this logfile section.

On a broken network restart

o-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth
Jan 16 16:22:53 commotion daemon.info dnsmasq-dhcp[31147]: DHCP, IP range 10.138.149.2 -- 10.138.149.151, lease time 12h
Jan 16 16:22:53 commotion daemon.info dnsmasq[31147]: using local addresses only for domain mesh.local
Jan 16 16:22:53 commotion daemon.warn dnsmasq[31147]: no servers found in /tmp/resolv.conf.auto, will retry
Jan 16 16:22:53 commotion daemon.info dnsmasq[31147]: read /etc/hosts - 2 addresses
Jan 16 16:22:53 commotion daemon.err dnsmasq[31147]: failed to load names from /var/run/hosts_olsr: No such file or directory
Jan 16 16:22:53 commotion daemon.info dnsmasq-dhcp[31147]: read /etc/ethers - 0 addresses
Jan 16 16:22:55 commotion daemon.info olsrd[30964]: olsr.org - 0.6.5.4-git_4c19cba-hash_3667acb4ad7e32204039db1f6b9bc660 - successfully started

On a working network restart

o-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth
Jan 16 16:21:39 commotion daemon.info dnsmasq-dhcp[29616]: DHCP, IP range 10.138.149.2 -- 10.138.149.151, lease time 12h
Jan 16 16:21:39 commotion daemon.info dnsmasq[29616]: using local addresses only for domain mesh.local
Jan 16 16:21:39 commotion daemon.info dnsmasq[29616]: reading /tmp/resolv.conf.auto
Jan 16 16:21:39 commotion daemon.info dnsmasq[29616]: using nameserver 208.67.222.222#53
Jan 16 16:21:39 commotion daemon.info dnsmasq[29616]: using local addresses only for domain mesh.local
Jan 16 16:21:39 commotion daemon.info dnsmasq[29616]: read /etc/hosts - 2 addresses
Jan 16 16:21:39 commotion daemon.err dnsmasq[29616]: failed to load names from /var/run/hosts_olsr: No such file or directory
Jan 16 16:21:39 commotion daemon.info dnsmasq-dhcp[29616]: read /etc/ethers - 0 addresses
Jan 16 16:21:42 commotion daemon.info olsrd[29476]: olsr.org - 0.6.5.4-git_4c19cba-hash_3667acb4ad7e32204039db1f6b9bc660 - successfully started
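
The visible difference between the two excerpts above is whether dnsmasq reports an upstream nameserver or reports that it found none. As a rough illustration only, here is a minimal Python sketch that tells the two cases apart. The function name `classify_restart` and the returned labels are hypothetical, and the sketch matches only the two message fragments quoted in this issue.

```python
def classify_restart(log_text):
    """Label a syslog excerpt as 'working', 'broken', or 'unknown'.

    Hypothetical helper: it only looks for the two dnsmasq messages
    quoted in this issue and ignores every other line.
    """
    found_nameserver = False
    missing_servers = False
    for line in log_text.splitlines():
        if "dnsmasq[" not in line:
            continue  # skips dnsmasq-dhcp, olsrd and other daemons
        if "using nameserver" in line:
            found_nameserver = True
        elif "no servers found in" in line:
            missing_servers = True
    if found_nameserver:
        return "working"
    if missing_servers:
        return "broken"
    return "unknown"


broken_line = ("Jan 16 16:22:53 commotion daemon.warn dnsmasq[31147]: "
               "no servers found in /tmp/resolv.conf.auto, will retry")
working_line = ("Jan 16 16:21:39 commotion daemon.info dnsmasq[29616]: "
                "using nameserver 208.67.222.222#53")

print(classify_restart(broken_line))   # broken
print(classify_restart(working_line))  # working
```

A log that contains both messages is labelled "working", on the assumption that dnsmasq eventually found a server after retrying.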

@westbywest
Collaborator

One of the freifunk packages inserted this cronjob upon install to periodically restart dnsmasq with SIGHUP due to an apparent problem with dnsmasq crashing. (Not sure if the balky version of dnsmasq that prompted this cronjob in the first place has since been replaced with a newer/fixed version.)

5 * * * * killall -HUP dnsmasq

Does restarting dnsmasq this way help with the problem you're
describing?
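
For reference, the schedule field `5 * * * *` in the crontab entry above means "minute 5 of every hour", so dnsmasq receives at most one SIGHUP per hour from this job. A minimal Python sketch of that schedule follows. The function name `next_hup` is hypothetical, and the sketch models only this one fixed schedule, not cron syntax in general.

```python
from datetime import datetime, timedelta


def next_hup(now):
    """Return the next time strictly after `now` at which the
    '5 * * * *' entry fires, i.e. minute 5 of the hour."""
    candidate = now.replace(minute=5, second=0, microsecond=0)
    if candidate <= now:
        # Minute 5 of this hour has already passed; use the next hour.
        candidate += timedelta(hours=1)
    return candidate


# Timestamp of the broken restart in the log excerpt quoted in this issue.
print(next_hup(datetime(2014, 1, 16, 16, 22, 53)))  # 2014-01-16 17:05:00
```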

Ben West
me@benwest.name

@jheretic
Member

I'm not sure what this has to do with dnsmasq. Dnsmasq shouldn't have any bearing on the IPv4 addresses of the node itself. Does it have something to do with hostname resolution?

@ghost assigned jheretic Jan 21, 2014
@jheretic
Member

Is either of these a node that received its address via DHCP?

@seamustuohy
Collaborator Author

Here is a debug dump you can look at. https://gist.github.com/elationfoundation/8541737

@jheretic
Member

@westbywest, I suspect that cronjob is not meant to restart dnsmasq when it crashes, but rather to get it to reload its configuration file. This might be needed when, say, /var/run/hosts_olsr is updated.

On further testing, I think the problem wasn't that dnsmasq kept networking from coming up, but the other way around. The debug logs indicate the mesh interface didn't come up, so /tmp/resolv.conf.auto wasn't populated, which may be why dnsmasq returned an empty table to luci. Since the logs don't indicate why the mesh backhaul didn't come up, and it may have been due to an issue that has already been fixed (and because the luci issue was fixed), I'm going to close this issue and aim to find the original problem, if it still exists, during rigorous stability testing for v1.1.
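
To check the "wasn't populated" condition on a node, one could look for `nameserver` entries in the resolv file that the logs mention. A minimal Python sketch, assuming the file follows the usual resolv.conf layout; the function name `read_nameservers` and the sample text are made up for illustration.

```python
def read_nameservers(resolv_text):
    """Return the addresses from 'nameserver' lines in resolv.conf-style text.

    Blank lines and comment lines (starting with '#' or ';') are skipped.
    """
    servers = []
    for line in resolv_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# Made-up sample contents, not taken from the debug dump.
populated = "# example comment\nnameserver 208.67.222.222\n"
empty = ""

print(read_nameservers(populated))  # ['208.67.222.222']
print(read_nameservers(empty))      # []
```

On a node, the text would come from reading /tmp/resolv.conf.auto. An empty result would correspond to the "no servers found" warning in the broken log.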

@andygunn modified the milestone: 1.1 Feb 20, 2014
@westbywest
Collaborator

Understood. If you are seeing wireless interfaces crash or become unresponsive, I would recommend keeping an eye on the OpenWRT issue linked below. I'm still seeing this myself: both the adhoc VIF and the private VIF lock up very sporadically under heavy load in AA r39154.
https://dev.openwrt.org/ticket/13681
