Avahi fails to action wide area DNS requests after a long period of operation #159

Closed
andrewbonney opened this issue Dec 14, 2017 · 8 comments

Comments

@andrewbonney

I'm trying to debug an issue we're seeing where unicast DNS browsing works correctly as soon as the avahi-daemon service is started, but stops functioning after a long period of running. I'd appreciate any pointers to likely aspects of the source code which may be to blame as it's proving difficult to debug given the long period between avahi-daemon starting and the issue first presenting itself.

This issue can be easily replicated across any of our Ubuntu 16.04 servers running avahi 0.6.32 as follows:

  • Restart avahi-daemon
  • Run the following browse command: $ avahi-browse -a -d -v
  • Results are presented correctly
  • Wait for an unknown period (may be a few hours, may be longer)
  • Run the browse command again
  • No records are presented, with just the messages 'Cache exhausted' and 'All for now' shown.
  • Restart avahi-daemon
  • Run the browse command again
  • Results are presented correctly again
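Because the failure takes an unknown time to appear, the manual steps above could be automated to record when browsing first comes back empty. The following is a minimal sketch, not part of the original report. It assumes that discovered records show up in the avahi-browse output as lines starting with `+`, that adding `-t` makes the command exit after the initial results, and that `DOMAIN` (a placeholder) is the wide-area domain being browsed.

```python
import subprocess
import time

# Hypothetical placeholder: replace with the unicast domain you browse.
DOMAIN = "example.com"
# Assumption: the reported browse command plus -t so that it terminates.
BROWSE_CMD = ["avahi-browse", "-a", "-t", "-v", "-d", DOMAIN]


def count_records(output):
    """Count lines that look like newly discovered records ('+' prefix)."""
    return sum(1 for line in output.splitlines() if line.startswith("+"))


def run_browse(cmd=BROWSE_CMD, timeout_s=60):
    """Run the browse command once and return its standard output."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.stdout


def checks_until_empty(runner=run_browse, interval_s=600, sleep=time.sleep,
                       max_checks=None):
    """Repeat the browse until it returns no records.

    Returns the 1-based index of the first empty check, or None if
    max_checks is reached without seeing an empty result.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if count_records(runner()) == 0:
            return checks
        sleep(interval_s)
    return None


if __name__ == "__main__":
    start = time.time()
    n = checks_until_empty()
    print(f"No records on check {n}, {time.time() - start:.0f}s after start")
```

Running this straight after restarting avahi-daemon would give a rough upper bound on how long the daemon keeps working, to within one polling interval.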

I had thought this might be a cache expiry issue, but I've tested this beyond the TTL of the DNS records and avahi does behave correctly, issuing a new DNS request and processing the response.

When the issue does present itself, I can see via tcpdump that no DNS queries are issued when performing an avahi-browse. I'm yet to identify where in the code path this might be going wrong, and unfortunately there don't appear to be any errors logged associated with it.

@andrewbonney
Author

After a bit more debugging, this issue appears a little different from the original diagnosis. I believe it actually occurs on initial system boot rather than after a long period of operation.

Running 'avahi-browse -a -d -v' immediately after boot produces no records. If a reload or restart is performed on the avahi-daemon service the same command will then produce a valid set of records.

I assume there is some sort of startup order issue going on, but it's still not exactly clear what the root cause of this issue is. Presumably this is unique to the configuration of our systems.

Multicast DNS queries work perfectly throughout, so this is still very much limited to unicast queries. Any suggestions for things to try would be appreciated.

@db260179

db260179 commented Jul 31, 2018

Hi Andrew,

This might help

https://github.com/machinekoder/querierd

Due to the nature of multicast traffic, if nothing is being sent then the packets will be dropped, which would explain why restarting avahi helps: it initiates a response to the multicast group.

Adding to this, if your network does not have an IGMP querier running then multicast group traffic will get dropped.

Apart from the above solution, or buying a switch with a querier, this link might help:
http://www.coexsi.fr/publications/igmp-querier/igmp-querier.pl

It only needs Perl to run. I have used it on pfSense and OpenWrt/LEDE as a cron job that sends out IGMP queries.

Hope this helps?
David

@andrewbonney
Author

This is about wide area (unicast) DNS, not multicast.

@midicase

Sorry, but the explanation at the querierd link is no good (even if it is unrelated to the current issue). Home routers may or may not filter wireless multicast, and this is more a matter of avoiding flooding the radio than of any compliance. I have never seen filtering on the wired side of consumer-grade switches. In fact, when dealing with broadcast/multicast traffic, consumer switches are reduced to hubs. Mysteriously disappearing multicast data is something new to me.

Have you tried an older version of Ubuntu? How about another distro?
Does the problem manifest itself if you use a direct connection between two machines?

@db260179

db260179 commented Aug 1, 2018

Sorry Andrew, I just saw your last comment, but my statement is still correct regarding discovery issues with multicast and home setups.

I haven't experienced issues with unicast DNS from avahi, although hotplugging of NICs with avahi can be an issue, where it loses all of its cache and only a restart fixes it.

Is port 5353 verified as working? (I guess yes.)

Have you tried the newer release 0.7?

@midicase by default most home-grade routers block multicast traffic; adding static multicast routes or igmpproxy usually gets multicast working. Commercial routers have IGMP queries.

@lathiat
Contributor

lathiat commented Aug 2, 2018

The above notes about multicast are correct for multicast but obviously not unicast.

I believe the root cause of this issue is related to resolv.conf loading and reloading. Currently we don't monitor resolv.conf for changes after startup, so nameservers that only appear in the file after avahi has started are never picked up. That makes this a duplicate of issue #118.
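One way to check this theory on an affected machine is to look at which nameservers resolv.conf lists around the time avahi starts. The helper below is only an illustrative sketch and is not how avahi itself parses the file; it simply extracts the `nameserver` entries so an empty result at boot can be spotted.

```python
def parse_nameservers(text):
    """Return the addresses from 'nameserver' lines in resolv.conf text."""
    servers = []
    for line in text.splitlines():
        # Drop comments, which resolv.conf marks with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    with open("/etc/resolv.conf") as f:
        print(parse_nameservers(f.read()))
```

If this prints an empty list at the point avahi-daemon starts, but a populated one later, that would be consistent with the startup ordering explanation.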

For now you may be able to look at a change like the one from PR #69, which suggested starting avahi after network-online.target on systemd systems. If you are not using systemd, I would generally suggest figuring out how to make avahi start after your resolv.conf is populated, or restarting it after that file changes.
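As a stopgap for the "restart it after that file changes" option, a small watcher could poll resolv.conf and restart the daemon when its contents change. This is a rough sketch under stated assumptions rather than a recommended fix: it assumes the service is named `avahi-daemon` and managed with `systemctl`, that the script runs with sufficient privileges, and that polling every few seconds is acceptable. The function names are made up for illustration.

```python
import hashlib
import subprocess
import time

RESOLV_CONF = "/etc/resolv.conf"
# Assumption: systemd-managed service called avahi-daemon.
RESTART_CMD = ["systemctl", "restart", "avahi-daemon"]


def file_digest(path):
    """Return a SHA-256 hex digest of the file, or None if it is unreadable."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except OSError:
        return None


def restart_avahi():
    """Restart the daemon so that it re-reads resolv.conf."""
    subprocess.run(RESTART_CMD, check=True)


def watch(path=RESOLV_CONF, on_change=restart_avahi, interval_s=5.0,
          sleep=time.sleep, max_polls=None):
    """Poll the file and call on_change whenever its contents change.

    Returns the number of times on_change was called. A file that has
    disappeared is remembered but does not trigger a restart.
    """
    last = file_digest(path)
    changes = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval_s)
        polls += 1
        current = file_digest(path)
        if current != last:
            last = current
            if current is not None:
                on_change()
                changes += 1
    return changes


if __name__ == "__main__":
    watch()
```

On systemd systems, ordering the service with `After=network-online.target` and `Wants=network-online.target` in a drop-in is likely the simpler route, though it only helps if resolv.conf is actually populated by the time that target is reached.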

@lathiat
Contributor

lathiat commented Aug 2, 2018

Closing this issue purely because it's a duplicate of #118 - we will obviously need to address this.

@andrewbonney
Author

Thanks, that's a useful pointer.
