
Is 'unicast SOA heuristic' a problem for end users? #75

Closed
giox069 opened this issue Oct 31, 2020 · 12 comments

@giox069

giox069 commented Oct 31, 2020

Take a new Linux desktop with nss-mdns 0.14 (Ubuntu 20.04, Ubuntu 20.10, ...). Put this desktop PC at home or in a small office where the ISP's default DNS server serves a SOA record for the ".local" domain.

For the average end user, printing from that box to a local network printer will be impossible, because printername.local will never resolve via mDNS due to the SOA heuristic.

This is what the DNS server of an Italian ISP called WINDTRE (formerly known as WIND) returns when asked for the SOA record of the "local" zone:

user@pc:~$ dig @151.5.216.15 local SOA
[...]
;; ANSWER SECTION:
local.			30	IN	SOA	blacklistw3.zone. hostmaster.feedrpz.block.zone. 1 10800 3600 604800 86400
local.rpz.GS.local.	3600	IN	A	127.0.0.2
[...]

According to the SOA heuristic section of README.md, this will prevent libnss-mdns from resolving hosts via mDNS.
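Below is a minimal sketch of the probe, in Python with only the standard library. It sends the same query as dig @151.5.216.15 local SOA and then applies an assumed rule: a NOERROR reply with at least one answer record counts as "a unicast .local zone exists". That rule paraphrases the README's description and is not copied from nss-mdns's C code. The names build_query, soa_heuristic_would_fire and probe_local_soa are hypothetical.

```python
import socket
import struct

QTYPE_A = 1
QTYPE_SOA = 6
QCLASS_IN = 1


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Encode a recursive DNS query in RFC 1035 wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def soa_heuristic_would_fire(response: bytes) -> bool:
    """Assumed rule: a response (QR=1) with RCODE NOERROR and ANCOUNT > 0."""
    if len(response) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return is_response and rcode == 0 and ancount > 0


def probe_local_soa(server: str, timeout: float = 2.0, query_id: int = 0x1234) -> bool:
    """Ask a unicast DNS server for 'local. SOA', like dig does above.

    Raises socket.timeout (TimeoutError) if the server does not answer in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query("local", QTYPE_SOA, query_id), (server, 53))
        response, _addr = sock.recvfrom(4096)
    # Ignore a reply that does not carry our query ID
    if response[:2] != struct.pack("!H", query_id):
        return False
    return soa_heuristic_would_fire(response)
```

The encoded queries are 23 bytes for local. SOA, 33 bytes for myprintername-style myprinter.local. A and 38 bytes for myotherprinter.local. A. These match the sizes tcpdump prints in parentheses in the macOS dumps later in this thread. Given the SOA answer dig shows above, probe_local_soa("151.5.216.15") should return True for that ISP's server.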

And just to complete all the steps:

user@pc:~$ dig @151.5.216.15 myprintername.local A
[...]
;; ANSWER SECTION:
myprintername.local.	60	IN	A	40.68.249.35

myprintername.local will then resolve to the ISP's advertising web server instead of returning NXDOMAIN, so CUPS will try to send your print jobs to your ISP.

I know that:

  • There is a workaround in README.md: create the appropriate /etc/mdns.allow and use mdns4, not the _minimal variant, in /etc/nsswitch.conf
  • The ISP should not do that
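Readers who want to check their own machine can use the rough Python diagnostic sketched below. It tests the two conditions named in the workaround: the hosts line uses a non-_minimal mdns module, and /etc/mdns.allow exists. The function names and the set of module names are assumptions. The sketch does not validate the contents of mdns.allow or anything else the README may require.

```python
import os
import re

# Assumed list of nss-mdns module names: full and _minimal variants.
MDNS_MODULES = {"mdns", "mdns4", "mdns6",
                "mdns_minimal", "mdns4_minimal", "mdns6_minimal"}


def hosts_sources(nsswitch_text: str) -> list[str]:
    """Return the source names on the 'hosts:' line, with [ACTION] blocks removed."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        match = re.match(r"hosts\s*:(.*)", line)
        if match:
            return re.sub(r"\[[^\]]*\]", " ", match.group(1)).split()
    return []


def workaround_problems(nsswitch_text: str, mdns_allow_exists: bool) -> list[str]:
    """List the reasons why the README workaround does not appear to be in place."""
    mdns = [s for s in hosts_sources(nsswitch_text) if s in MDNS_MODULES]
    problems = []
    if not mdns:
        problems.append("no mdns module on the hosts line")
    elif all(s.endswith("_minimal") for s in mdns):
        problems.append("only _minimal modules are used")
    if not mdns_allow_exists:
        problems.append("/etc/mdns.allow does not exist")
    return problems


def check_this_machine() -> list[str]:
    """Run the check against the real files on this system."""
    with open("/etc/nsswitch.conf", encoding="utf-8") as f:
        return workaround_problems(f.read(), os.path.exists("/etc/mdns.allow"))
```

An empty list from check_this_machine() only means that both conditions hold. It does not prove that mDNS resolution works.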

but an ordinary user will never be able to trace the problem back to the SOA heuristic, come here, understand it and apply the fix explained in README.md.

Also please note that the current macOS does not seem to be affected by this problem. I can't see any SOA queries when running ping myprintername.local, and the printer name is resolved locally to its private IP address.

@agoode
Collaborator

agoode commented Nov 2, 2020

This is very interesting. I wonder if Apple software has other heuristics now, or if they have simply stopped SOA probing.

What version of macOS are you testing?

@giox069
Author

giox069 commented Nov 2, 2020

I got something wrong about macOS: I think I did not see the initial SOA request because of internal caching.

macOS (Catalina 10.15.7) sends the SOA query, and then the A query, to the DNS server. If one of them fails, it falls back to mDNS.

This is a dump of ping myprinter.local on the latest macOS, taken after disconnecting and reconnecting Wi-Fi (to flush the DNS/mDNS caches):

macOS, case where the SOA query to DNS gets an answer but the A query fails:

14:52:54.307372 IP 192.168.56.23.59965 > 192.168.56.254.53: 54781+ SOA? local. (23)
14:52:54.309151 IP 192.168.56.254.53 > 192.168.56.23.59965: 54781* 1/0/0 SOA (99)
14:52:54.310810 IP 192.168.56.23.57289 > 192.168.56.254.53: 61275+ A? myprinter.local. (33)
14:52:54.312455 IP 192.168.56.254.53 > 192.168.56.23.57289: 61275 NXDomain* 0/1/0 (109)
14:52:54.352724 IP 192.168.56.23.5353 > 224.0.0.251.5353: 0 [1au] A (QU)? myprinter.local. (56)
14:52:55.357062 IP 192.168.56.23.5353 > 224.0.0.251.5353: 0 [1au] A (QM)? myprinter.local. (56)

macOS, case where the SOA query to DNS gets an answer and the A query also gets a DNS answer:

15:08:06.882295 IP 192.168.56.23.58514 > 192.168.56.254.53: 30804+ SOA? local. (23)
15:08:06.883997 IP 192.168.56.254.53 > 192.168.56.23.58514: 30804* 1/0/0 SOA (99)
15:08:06.885624 IP 192.168.56.23.50825 > 192.168.56.254.53: 17341+ A? myotherprinter.local. (38)
15:08:06.887194 IP 192.168.56.254.53 > 192.168.56.23.50825: 17341* 1/0/0 A 192.168.56.4 (54)
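The Python model below summarises the query order in the two dumps. It also includes the fallback for a failed SOA query, which comes from the description above and is not visible in the dumps. The model does not explain why printing still works on macOS despite the bad A answer. The function and parameter names are hypothetical, and this is not Apple's code.

```python
from typing import Callable, Optional

# A resolver maps a name to an IPv4 address, or None for NXDOMAIN / no answer.
Resolver = Callable[[str], Optional[str]]


def macos_like_lookup(name: str, unicast_has_local_soa: bool,
                      unicast_a: Resolver, mdns_a: Resolver) -> tuple[Optional[str], list[str]]:
    """Return the chosen address plus a trace of the queries that were sent."""
    trace = ["unicast SOA? local."]
    if unicast_has_local_soa:
        trace.append(f"unicast A? {name}")
        addr = unicast_a(name)
        if addr is not None:
            # Second dump: the unicast answer is used and no mDNS query follows.
            return addr, trace
    # First dump (NXDOMAIN), or no SOA answer at all: fall back to mDNS.
    trace.append(f"mDNS A? {name}")
    return mdns_a(name), trace
```

Passing plain dicts' .get methods as unicast_a and mdns_a is enough to replay both dumps.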

But with macOS I'm still able to print without problems, even though I can see the same resolution request with a bad A answer (192.168.56.4 is not a printer). I have no idea how the magic works in these Apple black boxes... I'm still investigating.
In the meantime, Kubuntu 20.04 is still trying to send print jobs to 192.168.56.4 (the emulated ISP advertising host).

@agoode
Collaborator

agoode commented Nov 7, 2020

If you are able to figure out the Apple heuristic, there is a chance we can implement it.

@gmk57

gmk57 commented Nov 28, 2020

Oh yes, I'm an "end user" who spent several days investigating why mDNS stopped working after upgrading from Linux Mint 19 (≈Ubuntu 18.04) to Mint 20 (≈Ubuntu 20.04). I'm relatively new to Linux and mDNS, and had no idea which specific component was responsible for this issue. The only somewhat relevant resource I found was from 2009, and setting AVAHI_DAEMON_DETECT_LOCAL=0, as suggested there, did not help.

Applying the solution from the README fixed the issue for me, thanks for the info. But I still doubt that enabling the unicast SOA heuristic by default was a good idea. Below I'll try to explain why.

  1. It seems to violate RFC 6762, section 22.1.3: "Name resolution APIs and libraries SHOULD recognize these names as special and SHOULD NOT send queries for these names to their configured (unicast) caching DNS server(s). This is to avoid unnecessary load on the root name servers and other name servers". Even if the heuristic itself only checks for the SOA, NSS as a whole then queries DNS for the specific names.
  2. Introducing a similar heuristic in Avahi back in 2009 affected a lot of people, because many ISPs around the globe have a SOA record for the ".local" domain, sometimes claiming this is done "to protect root name servers".
  3. Companies using the ".local" domain in DNS do violate RFC 6762 (e.g. section 22.1.6). These are probably mostly legacy setups from the pre-mDNS era, so their admins should already be used to configuring name resolution from previous versions (e.g. removing [NOTFOUND=return] after mdns in /etc/nsswitch.conf). And they should have the skills to do it.
  4. On the other hand, this heuristic hurts home users, who did not violate anything (even if their ISP did), who experience it as an unexpected regression from the previous version, and who may not be able to investigate and fix it.
  5. From a user's perspective, nss-mdns silently ignores requests and gives no clue about what's going on (Avahi in 2009 at least printed "avahi-daemon disabled because there is a unicast .local domain", which could be searched for on the web).
  6. Reliably and automatically working mDNS is especially crucial for home networking (e.g. file sharing) after NetBIOS/SMB1 retirement in recent Samba versions (also shipped since Ubuntu 20.04).

Looking back after solving the issue, it's funny to see "zeroconf" not working without some configuration. ;)

@giox069
Author

giox069 commented Nov 29, 2020

> it's funny to see "zeroconf" not working without some configuration. ;)

Great! 👍
And yes, the regression is a kind of punishment for the end user.

I hope to find some time to investigate macOS further, but right now I have no real idea how to get more information out of it: it's a kind of black box.

And about point 3, "Companies using ".local" domain in DNS do violate RFC 6762": yes, I'm a sysadmin using Active Directory. I admit that 12-13 years ago I set up a small company's domain controller with an "xxx.local" DNS domain. I discovered the problem 6 years ago and moved the AD domain and all clients to another domain. Luckily, only 15 clients/servers were involved.
But I can imagine that bigger setups are very difficult, though not impossible, to migrate.

@satirebird

I also spent several hours investigating printing problems. I figured out that mDNS address resolution stopped working after a system upgrade. Finally I found this issue. For some unknown reason my router/ISP answers with a SOA for the .local domain. For me it was easy to fix the issue: I added the mdns.allow file and switched from mdns_minimal to mdns to disable the heuristic.

@Mek101
Copy link

Mek101 commented Oct 13, 2021

I can say the SOA heuristic also gave me headaches in #79.

@egalanos

egalanos commented Jul 1, 2022

I'm another victim of the SOA heuristic, hit while trying to set up a new printer.

While diagnosing this, I've seen a lot of forum posts from people struggling with their printer setup: Avahi sees the printer, but CUPS is then unable to resolve its name. Most seem to end up hard-coding the IP without realising what the problem was (though in some cases mDNS may simply not be configured at all).

I would imagine it is much more likely that a .local domain is intended to be resolved with mDNS than with the local unicast DNS, so this heuristic is wrong.

@agoode Why not just delete the heuristic altogether? With /etc/nsswitch.conf, the lookup will go on to the next source (unless a NOTFOUND action is specified). Leave it to the distributions/end users to decide whether they want to check .local in DNS, rather than having nss-mdns try to be too clever.
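The chain behaviour described here can be sketched in Python as follows. The sketch uses the default actions documented in nsswitch.conf(5): SUCCESS returns, while NOTFOUND, UNAVAIL and TRYAGAIN continue. It simplifies the rest by ignoring negated statuses and the merge action. The example sources, names and addresses are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    SUCCESS = "SUCCESS"
    NOTFOUND = "NOTFOUND"
    UNAVAIL = "UNAVAIL"
    TRYAGAIN = "TRYAGAIN"


# Default actions from nsswitch.conf(5): only SUCCESS ends the lookup.
DEFAULT_ACTIONS = {
    Status.SUCCESS: "return",
    Status.NOTFOUND: "continue",
    Status.UNAVAIL: "continue",
    Status.TRYAGAIN: "continue",
}

Result = tuple[Status, Optional[str]]
Source = Callable[[str], Result]


def nss_lookup(name: str, chain: list[tuple[Source, dict[Status, str]]]) -> Result:
    """Walk the sources in order, applying per-source [STATUS=action] overrides."""
    result: Result = (Status.UNAVAIL, None)  # reported if the chain is empty
    for source, overrides in chain:
        result = source(name)
        action = overrides.get(result[0], DEFAULT_ACTIONS[result[0]])
        if action == "return":
            break
    return result


# Hypothetical sources: mDNS knows a printer, unicast DNS knows a legacy host.
def example_mdns(name: str) -> Result:
    if name == "printer.local":
        return Status.SUCCESS, "192.168.1.20"
    return Status.NOTFOUND, None


def example_dns(name: str) -> Result:
    if name == "server.corp.local":
        return Status.SUCCESS, "10.0.0.5"
    return Status.NOTFOUND, None


# "hosts: mdns4_minimal [NOTFOUND=return] dns" versus "hosts: mdns4_minimal dns"
STRICT = [(example_mdns, {Status.NOTFOUND: "return"}), (example_dns, {})]
RELAXED = [(example_mdns, {}), (example_dns, {})]
```

With STRICT, the legacy name server.corp.local never reaches dns. With RELAXED, the lookup falls through and DNS answers it. That fall-through is the behaviour this comment relies on once the heuristic is gone.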

@agoode
Collaborator

agoode commented Jul 1, 2022

I implemented the heuristic based on the current guidance from Apple at the time (the now-deleted https://support.apple.com/en-us/HT201275). This was to fix crbug.com/626377.

It may be the case that the heuristic causes more harm than good these days, and/or that nss-mdns is the only thing still implementing it.

I'd like to know if Windows and Mac (and iOS) devices still implement the unicast SOA heuristic. If they don't, maybe we can drop it. If they use some other mechanism instead, maybe we'd need to implement that. I could try poking around but it would save time if others could do the investigation and post the results here.

Basically, we need to know what happens when your DNS server returns records for .local domains that conflict with mDNS names on a variety of platforms, and if there is some administrative way of changing this behavior.

@pemensik
Member

pemensik commented Dec 6, 2022

I think this check has an unfortunate implementation and should be replaced with something else. I understand it was implemented to avoid regressions on existing networks.

I think a better replacement would be a quick mode. The current configuration is nearly unusable when the requested name is not found via mDNS: it takes 5 seconds to time out with mdns4_minimal, and 10 seconds with mdns_minimal, which also provides IPv6 addresses. That is too much. I think the primary reason for disabling the .local domain when a SOA exists is to keep unicast DNS responses usable. That requires two things:

  • [NOTFOUND=return] should not abort the further search in DNS
  • The mDNS search should be significantly shorter.

On solid networks, positive responses take up to 20 ms; on worse networks, up to 100 ms. I think we should instruct Avahi to send faster queries and give up sooner. It should then not authoritatively report "not found", but report UNAVAIL instead. That would make the NSS hosts lookup continue to dns, where it can also search the unicast namespace. If we waited only up to 500 ms, there would still be enough time for retries, without taking so long that lookups become unusable for those who still have records in a unicast .local zone.

And of course something should be done to speed up the search by querying IPv6 and IPv4 names concurrently.
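The proposal can be sketched in Python with asyncio. The A and AAAA queries run concurrently, and the lookup stops waiting after 500 ms, the figure suggested above. A late or empty result is reported as UNAVAIL rather than NOTFOUND, so NSS continues to dns even with [NOTFOUND=return]. query_v4 and query_v6 are hypothetical stand-ins for real mDNS queries; this is not an existing Avahi or nss-mdns API.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Optional


class Status(Enum):
    SUCCESS = "SUCCESS"
    UNAVAIL = "UNAVAIL"  # never NOTFOUND: mDNS silence is not an authoritative "no"


# Hypothetical per-family mDNS query: resolves to an address, or None.
Query = Callable[[str], Awaitable[Optional[str]]]


async def quick_mdns_lookup(name: str, query_v4: Query, query_v6: Query,
                            deadline: float = 0.5) -> tuple[Status, list[str]]:
    """Run the IPv4 and IPv6 queries concurrently; stop waiting after `deadline` seconds."""
    tasks = [asyncio.ensure_future(query_v4(name)),
             asyncio.ensure_future(query_v6(name))]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:
        task.cancel()
    if pending:
        # Let the cancelled queries finish unwinding before returning.
        await asyncio.gather(*pending, return_exceptions=True)
    addrs = [t.result() for t in tasks
             if t in done and t.exception() is None and t.result() is not None]
    if addrs:
        return Status.SUCCESS, addrs
    # Timeout or no answer: report UNAVAIL so NSS moves on to the next source.
    return Status.UNAVAIL, []
```

Waiting on the two tasks with asyncio.wait, rather than wrapping a single gather in asyncio.wait_for, keeps an IPv4 answer that arrived in time even when the IPv6 query is still outstanding.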

@agoode
Collaborator

agoode commented Jan 16, 2023

I'm going to make a new release at some point to include #84. It would be good to see if this issue can then be closed.

@giox069
Author

giox069 commented Aug 10, 2024

Years later, unfortunately, I'm no longer able to test that setup with the same ISP. So I'm closing this issue, in the hope that it has been fixed by #84.

giox069 closed this as completed on Aug 10, 2024