Spurious name conflicts #117

Open
callegar opened this Issue Apr 26, 2017 · 27 comments

@callegar

callegar commented Apr 26, 2017

Hi, hope this is the right place for reporting issues with avahi-daemon as the readme on my system points to the bugtracker on freedesktop.org that does not list avahi as a bug report target.

I am experiencing spurious name conflicts on various systems, all of which have a common trait in having two interfaces, one on the local lan, having a static IP address and the other getting a dhcp address from somewhere (typically an ADSL router).

What happens is the following. Suppose that the host is called "foo". Initially, it is correctly advertised as foo.local. After some time the name conflict occurs and the host starts being advertised as foo-2.local, foo-3.local, etc., even if it is certainly the sole host named foo on the network. In practice there is a spurious name conflict with the host itself, probably due to some race in avahi. The unfortunate result is that other systems can no longer find "foo" on the network, since they look for foo.local.

I see the issue on a couple of debian jessie systems (avahi version 0.6.31); on a raspbian jessie system (same); and on an openwrt chaos calmer system (avahi version 0.6.31 again).

I see a lot of reports for this same issue (or possibly something similar) on many distro bugtrackers, applications bugtrackers and question sites:

I wonder if there is something misconfigured on my systems (in which case a hint on how to diagnose it would be appreciated) or if this is an issue (possibly a race) with the avahi daemon.

Even if this cannot be fixed rapidly, I'd like to suggest an interim point release of avahi with an option that lets administrators disable the name conflict analysis when they are absolutely sure that it won't be needed on their network.

@lathiat

Owner

lathiat commented Apr 27, 2017

I agree that I have seen this from time to time, unfortunately I am not currently sure what causes it. I think in some cases it might be related to the reflector, but if that is not in use I am not sure.

How often is this happening? I wonder if we can set up a long-term pcap capture to try and figure out what happens.

@callegar

Author

callegar commented Apr 27, 2017

Rather frequently; I see it almost every other day.

It seems to be associated with a lease expiry on the interface that gets its address from dhcp, and probably has to do with the fact that there is both an IPv4 and an IPv6 address configured for the interface...

Apr 27 07:30:57 xyz dhcpcd[365]: eth0: soliciting a DHCPv6 lease
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: fe80::6a7f:74ff:fe15:6a2e router available
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: ADV fd57:81fe:80da::218/128 from fe80::6a7f:74ff:fe15:6a2e
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: REPLY6 received from fe80::6a7f:74ff:fe15:6a2e
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: adding address fd57:81fe:80da::218/128
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: renew in 21600 seconds, rebind in 34560 seconds
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: using IPv4LL address 169.254.139.60
Apr 27 07:30:57 xyz dhcpcd[365]: eth0: adding route to 169.254.0.0/16
Apr 27 07:30:57 xyz avahi-daemon[24276]: Joining mDNS multicast group on interface eth0.IPv4 with address 169.254.139.60.
Apr 27 07:30:57 xyz avahi-daemon[24276]: New relevant interface eth0.IPv4 for mDNS.
Apr 27 07:30:57 xyz avahi-daemon[24276]: Registering new address record for 169.254.139.60 on eth0.IPv4.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::600c:b99e:4f17:ce61.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Joining mDNS multicast group on interface eth0.IPv6 with address fd57:81fe:80da:0:c99b:6cc1:2a7c:c139.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Registering new address record for fd57:81fe:80da:0:c99b:6cc1:2a7c:c139 on eth0.*.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for fe80::600c:b99e:4f17:ce61 on eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for fe80::a92:e068:3cb:7ae2 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 169.254.208.59 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 192.168.32.1 on wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for wlan0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing address record for 169.254.139.60 on eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for eth0.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Withdrawing workstation service for lo.
Apr 27 07:30:58 xyz avahi-daemon[24276]: Host name conflict, retrying with xyz-2
@callegar

Author

callegar commented Apr 27, 2017

I don't think that the reflector should be on anywhere, as it should be disabled by default, shouldn't it?

@callegar

Author

callegar commented May 1, 2017

Preventing avahi-daemon from using the interface where the address is received from a dhcp server makes the issue disappear, but obviously it is not a solution.

@midicase

midicase commented May 1, 2017

Avahi can't handle inter-connected multi-homed systems. We have to use the option to disable one of the interfaces to keep the daemon from seeing multiple name registration requests (one from each network).

Best I can tell there isn't a better solution since this is really an issue with the design of the protocol.

@callegar

Author

callegar commented May 1, 2017

Still I wonder...

  1. why do I not just see a -1, but also a -2 and every now and then a -3 too?
  2. wouldn't it be possible to have the two interfaces both managed by avahi-daemon with a reproducible name assignment? Like getting from the very start hostname.local for the IP on one of the two NICs and hostname-2.local for the IP on the other NIC, rather than having things one way at boot and then getting the -x suffix when the dhcp lease is renewed?
    I am asking because the issue is not the -2, but not knowing in advance how a host will be reachable.
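
On the first question, the logs in this thread show each detected conflict moving the name to the next numeric suffix (xyz to xyz-2, nibelungenhort to nibelungenhort-2), so repeated spurious conflicts would walk the name through -2, -3, and so on. A minimal Python sketch of that renaming pattern, modelled only on the log output; `next_host_name` is a hypothetical helper for illustration, not avahi's actual code:

```python
import re


def next_host_name(name: str) -> str:
    """Return the next candidate host name after a conflict (illustrative only)."""
    match = re.fullmatch(r"(.+)-(\d+)", name)
    if match:
        # Already suffixed: bump the number, e.g. foo-2 becomes foo-3.
        base, number = match.group(1), int(match.group(2))
        return f"{base}-{number + 1}"
    # First conflict: the logs show the suffix starting at 2, not 1.
    return f"{name}-2"


name = "foo"
for _ in range(3):
    name = next_host_name(name)
    print(name)
```

Under this model every spurious conflict costs one step, which is why the reachable name is hard to predict in advance.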
@lathiat

Owner

lathiat commented May 18, 2017

I totally had this happen on one of my own systems, with a log very similar to yours. I downed and upped a bunch of interfaces rapidly. There must definitely be a bug there; I'll have to try and figure out if I can make it reproducible.

Some kind of race to do with the new interfaces appearing while probing perhaps.. there is a related issue for services that get stuck registering. So maybe the logic for interfaces coming and going needs to be reviewed.

@lathiat

Owner

lathiat commented May 20, 2017

OK I think I figured it out. What's happening is an address is withdrawn before it finishes probing, but we receive a copy of our own probe immediately after and thus assume a conflict (our own multicast probes are mirrored back to us by the kernel). A bit of a race condition.

This happens a lot with IPv6, where we withdraw the fe80 link-local address once we receive a global address, and it can happen very rapidly on boot. Of note, you are using IPv6 on your site, as I am on mine, where I am seeing this. On IPv4, address withdrawals while probing are quite uncommon.

So we'll need to identify those in some way, either with a ghost list or otherwise determining that the probe looped back. I'll look at that.
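
The "ghost list" idea mentioned above can be sketched roughly as follows. This is a toy Python model under stated assumptions, not avahi's implementation: `GhostList`, `is_conflict`, the (name, address) record shape, and the 2-second retention time are all hypothetical. It remembers records that were withdrawn very recently, so that a looped-back copy of our own probe is not mistaken for a conflict.

```python
import time
from typing import Dict, Optional, Set, Tuple

Record = Tuple[str, str]  # (host name, address); hypothetical record shape


class GhostList:
    """Remembers recently withdrawn records for a short time (hypothetical)."""

    def __init__(self, ttl_seconds: float = 2.0) -> None:
        self.ttl = ttl_seconds  # assumed retention time, not taken from avahi
        self._withdrawn: Dict[Record, float] = {}

    def withdraw(self, record: Record, now: Optional[float] = None) -> None:
        """Note that we have just withdrawn this record ourselves."""
        self._withdrawn[record] = time.monotonic() if now is None else now

    def is_own_echo(self, record: Record, now: Optional[float] = None) -> bool:
        """True if the record was withdrawn by us within the retention time."""
        now = time.monotonic() if now is None else now
        stamp = self._withdrawn.get(record)
        if stamp is None:
            return False
        if now - stamp > self.ttl:
            del self._withdrawn[record]  # ghost expired, forget it
            return False
        return True


def is_conflict(probe: Record, active: Set[Record], ghosts: GhostList,
                now: Optional[float] = None) -> bool:
    """Decide whether a received probe for our host name is a real conflict."""
    if probe in active:
        return False  # our own live record, mirrored back by the kernel
    if ghosts.is_own_echo(probe, now):
        return False  # our own probe for a record we just withdrew
    return True


ghosts = GhostList()
record = ("hyper.local", "fe80::9cf5:4ff:fef6:ec81")
ghosts.withdraw(record, now=100.0)
print(is_conflict(record, set(), ghosts, now=100.1))
print(is_conflict(("hyper.local", "fe80::1"), set(), ghosts, now=100.1))
```

In this model the first check reports no conflict (the probe matches a record withdrawn 0.1 seconds earlier) and the second reports a conflict (the address was never ours).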

@lathiat lathiat added the bug label May 20, 2017

@lathiat lathiat added this to the v0.6.33 milestone May 20, 2017

@lathiat

Owner

lathiat commented Jun 21, 2017

Confirmed that the issue is as I suspected. We withdraw our address record, but only then receive a copy of our own probe and decide it is a conflict:

Jun 20 15:40:58 hyper avahi-daemon[6567]: Joining mDNS multicast group on interface vsw3.IPv6 with address fe80::9cf5:4ff:fef6:ec81.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Registering new address record for fe80::9cf5:4ff:fef6:ec81 on vsw3.*.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Leaving mDNS multicast group on interface vsw3.IPv6 with address fe80::9cf5:4ff:fef6:ec81.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Withdrawing address record for fe80::9cf5:4ff:fef6:ec81 on vsw3.
Jun 20 15:40:58 hyper avahi-daemon[6567]: Received conflicting probe [hyper.local#011IN#011AAAA fe80::9cf5:4ff:fef6:ec81 ; ttl=120]. Local host lost. Withdrawing.

This happens because we revoke the link-local address from being advertised once we receive a global address.

Hope to have a fix for this shortly

@lathiat lathiat modified the milestones: v0.7, v0.8 Jul 13, 2017

@f0nt4

f0nt4 commented Aug 4, 2017

Would a workaround be to disable ipv6 in the config when you're not using it?

@dmosberger

dmosberger commented Oct 20, 2017

Any updates on this?

@Strayer

Strayer commented Jun 22, 2018

Hey @lathiat, sorry for bugging you.

I think this just happened to me as well. I have a usual IPv4/6 dual stack network at home and run avahi-daemon in a Docker container with network_mode host.

This is the log:

daemon_1  | 2018-06-21T11:54:39.577225354Z Found user 'avahi' (UID 102) and group 'avahi' (GID 102).
daemon_1  | 2018-06-21T11:54:39.577682651Z Successfully dropped root privileges.
daemon_1  | 2018-06-21T11:54:39.578293096Z avahi-daemon 0.6.32 starting up.
daemon_1  | 2018-06-21T11:54:39.579337101Z WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
daemon_1  | 2018-06-21T11:54:39.579720148Z Successfully called chroot().
daemon_1  | 2018-06-21T11:54:39.580163570Z Successfully dropped remaining capabilities.
daemon_1  | 2018-06-21T11:54:39.580505155Z Loading service file /services/smbd.service.
daemon_1  | 2018-06-21T11:54:39.583626444Z Joining mDNS multicast group on interface enp5s0.IPv6 with address 2003:e5:d70e:bc00:265e:beff:fe06:ed43.
daemon_1  | 2018-06-21T11:54:39.583822517Z New relevant interface enp5s0.IPv6 for mDNS.
daemon_1  | 2018-06-21T11:54:39.583868742Z Joining mDNS multicast group on interface enp5s0.IPv4 with address 192.168.178.58.
daemon_1  | 2018-06-21T11:54:39.583883829Z New relevant interface enp5s0.IPv4 for mDNS.
daemon_1  | 2018-06-21T11:54:39.584342326Z Network interface enumeration completed.
daemon_1  | 2018-06-21T11:54:39.585046921Z Registering new address record for 2003:e5:d70e:bc00:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-21T11:54:39.585065258Z Registering new address record for 192.168.178.58 on enp5s0.IPv4.
daemon_1  | 2018-06-21T11:54:40.492960211Z Server startup complete. Host name is nibelungenhort.local. Local service cookie is 3353354288.
daemon_1  | 2018-06-21T11:54:41.400508244Z Service "nibelungenhort" (/services/smbd.service) successfully established.
daemon_1  | 2018-06-22T02:48:49.975073115Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:49.984858578Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:49.984962940Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:50.983388121Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:50.983547607Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:50.983631244Z Registering new address record for fd00::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:51.269257637Z Withdrawing address record for fd00::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:51.269418273Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:51.269500173Z Registering new address record for fd00::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:51.269576160Z Registering new address record for fe80::265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.686728887Z Registering new address record for 2003:e5:d70b:3400:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.690760136Z Withdrawing address record for fd00::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.690858948Z Withdrawing address record for fe80::265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.690937072Z Withdrawing address record for 2003:e5:d70e:bc00:265e:beff:fe06:ed43 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.691012722Z Withdrawing address record for 192.168.178.58 on enp5s0.
daemon_1  | 2018-06-22T02:48:52.696120164Z Host name conflict, retrying with nibelungenhort-2
daemon_1  | 2018-06-22T02:48:52.697784540Z Registering new address record for 2003:e5:d70b:3400:265e:beff:fe06:ed43 on enp5s0.*.
daemon_1  | 2018-06-22T02:48:52.697876765Z Registering new address record for 192.168.178.58 on enp5s0.IPv4.
daemon_1  | 2018-06-22T02:48:54.587279952Z Server startup complete. Host name is nibelungenhort-2.local. Local service cookie is 3353354288.
daemon_1  | 2018-06-22T02:48:55.489180837Z Service "nibelungenhort-2" (/services/smbd.service) successfully established.
@midicase

midicase commented Jun 22, 2018

"two interfaces"
"dual stack"

Make sure you are not hitting a shortcoming in the protocol involving the daemon seeing other daemons through multiple paths. System announces itself through one interface and gets rejected after announcing itself on subsequent interfaces due to being a duplicate.

I always use allow-interfaces/deny-interfaces to force avahi to use only a single interface (in my industry this is typically the management interface). After that I have not had this issue.
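
For illustration, pinning the daemon to one interface amounts to setting `allow-interfaces` in the `[server]` section of avahi-daemon.conf. A minimal Python sketch that rewrites a config text accordingly; the helper name `restrict_to_interface` and the sample input are assumptions, and note that `configparser` drops comment lines when writing, so this is not a drop-in editor for a real config file:

```python
import configparser
import io


def restrict_to_interface(conf_text: str, interface: str) -> str:
    """Return conf_text with [server] allow-interfaces set to `interface`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "allow-interfaces", interface)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


sample = "[server]\nuse-ipv4=yes\nuse-ipv6=yes\n"
print(restrict_to_interface(sample, "eth0"))
```

The daemon would need to be restarted afterwards for the change to take effect.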

@Strayer

Strayer commented Jun 22, 2018

There are two interfaces on my system, although only one is actually up and connected to the network. As far as I can see avahi-daemon only works on the connected interface (enp5s0), but I'll try manually allowing it.

@ondrej1024

ondrej1024 commented Jun 22, 2018

Is anyone working on a fix for this? @lathiat said

"Hope to have a fix for this shortly"

exactly a year ago. Any progress? Thanks

@lathiat

Owner

lathiat commented Jun 29, 2018

allow-interfaces will work around the issue as it's a bug in handling interfaces rapidly adding and removing addresses (particularly noticeable if you have globally routable IPv6 addresses, as we add then remove the link local address)

Still planning a fix

@Strayer

Strayer commented Jul 13, 2018

So, this is still happening to me. I've set allow-interfaces=enp5s0 in avahi-daemon.conf as suggested here, but that didn't help. Still the same log messages as posted above.

@knro

knro commented Aug 17, 2018

I tried the allow-interfaces method as well but it's not working. Is there another work-around for this? How about a fix?

@lathiat Any ETA? Many distros are reporting the same bug.

@Strayer

Strayer commented Aug 18, 2018

The only workaround for me is a daily restart of avahi-daemon. I'll soon replace this by an automatic restart if the daemon logs the error message, but for now this works for me. Not ideal, but eh… it's just for my homelab and nothing critical.
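
The automatic restart described above could look roughly like this. It is a sketch under assumptions: the marker string is taken from the logs posted in this thread, while `watch`, `restart_avahi`, and the use of a systemd unit named avahi-daemon are hypothetical (a Docker setup would need a different restart command).

```python
import subprocess
from typing import Callable, Iterable

# Message seen in the logs above when the daemon renames itself.
CONFLICT_MARKER = "Host name conflict, retrying with"


def restart_avahi() -> None:
    """Restart the daemon; assumes a systemd unit named avahi-daemon."""
    subprocess.run(["systemctl", "restart", "avahi-daemon"], check=True)


def watch(lines: Iterable[str],
          restart: Callable[[], None] = restart_avahi) -> int:
    """Call `restart` for every conflict line seen; return the restart count."""
    restarts = 0
    for line in lines:
        if CONFLICT_MARKER in line:
            restart()
            restarts += 1
    return restarts
```

One way to use it would be to feed it a live log stream, for example by passing `sys.stdin` to `watch` and piping the daemon's log output into the script. The restart callback is injectable so the logic can be tried out without touching the running daemon.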

@gramels

gramels commented Nov 27, 2018

It seems that a working workaround is

cache-entries-max=0

@alexforencich

alexforencich commented Dec 19, 2018

Maybe there just needs to be a configuration option to completely disable conflict checking. Or at least to prevent modification of the host name if a conflict is discovered. I set the host names I want manually. I expect avahi to faithfully announce the hostname that I have configured, not to change it sporadically. The whole point of avahi is so I can connect based on the host name alone. If avahi is changing the host name for any reason whatsoever, this defeats the whole purpose of using avahi.

@alexforencich

alexforencich commented Dec 19, 2018

@gramels that workaround might prevent avahi from discovering a phantom conflict and changing the host name, but it also prevents avahi lookups from working. I cannot perform any lookups via avahi with cache-entries-max set to zero.

@knro

knro commented Dec 20, 2018

This is indeed awful. Are there any plans to fork this project due to the lack of maintenance?

@lathiat

Owner

lathiat commented Dec 20, 2018

Unfortunately the host-name conflict detection is part of the Multicast DNS spec, so any such option would both violate it and just not work well. If another host on the network is actually advertising your hostname, trying to connect to it with that hostname will unreliably connect to your machine anyway.

Obviously in this case the bug is such that there is not a real conflict. I did identify the cause for this, I will try and get a fix patched shortly.

@alexforencich

alexforencich commented Dec 20, 2018

There should still be an option to disable hostname mangling on specific devices. Obviously this would not default to on, but it would prevent devices that are supposed to be accessible with a specific hostname from getting permanently 'bumped'. Perhaps the way to do this would be to continuously retry until the correct name can be announced. Another option would be to use the 'bumped' name, but periodically retry (say, every 60 seconds) the correct name so that if a device does get 'bumped', it will eventually return to its correct name. Again, this can be off by default, but available for devices that are supposed to be accessible with a specific name.
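
The "use the bumped name but periodically retry the correct one" idea can be sketched as a small state machine. This is a hypothetical Python model of the proposal only; nothing in this thread indicates avahi offers such an option, and `NameKeeper`, the injected `probe` callback, and the 60-second default are illustrative assumptions.

```python
from typing import Callable


class NameKeeper:
    """Tracks the desired host name and retries it after being bumped."""

    def __init__(self, desired: str, probe: Callable[[str], bool],
                 retry_interval: float = 60.0) -> None:
        self.desired = desired
        self.current = desired
        self.probe = probe  # returns True if the name is free on the network
        self.retry_interval = retry_interval
        self._last_attempt = 0.0

    def on_conflict(self, fallback: str, now: float) -> None:
        """Switch to the bumped name and start the retry timer."""
        self.current = fallback
        self._last_attempt = now

    def tick(self, now: float) -> str:
        """Call periodically; re-probes the desired name once per interval."""
        bumped = self.current != self.desired
        if bumped and now - self._last_attempt >= self.retry_interval:
            self._last_attempt = now
            if self.probe(self.desired):
                self.current = self.desired
        return self.current


keeper = NameKeeper("foo", probe=lambda name: True)
keeper.on_conflict("foo-2", now=0.0)
print(keeper.tick(now=30.0))
print(keeper.tick(now=60.0))
```

With a probe that always reports the name as free, the host stays on the bumped name until the interval elapses and then returns to its configured name.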

@satter

satter commented Dec 21, 2018

Unplugging the ethernet cable from my Linux PC triggers this issue. It is an Ubuntu 18.04 system with a dual IPv4/IPv6 network stack.

@knro

knro commented Jan 2, 2019

Any update on this? This bug is making my life miserable at the moment.
