Regression after upgrading resolved from 252.5 to 253: slow reverse DNS lookups #26594

Open
gentoo-root opened this issue Feb 26, 2023 · 11 comments
Labels
bug 🐛 Programming errors, that need preferential fixing regression ⚠️ A bug in something that used to work correctly and broke through some recent commit resolve

@gentoo-root
Contributor

systemd version the issue has been seen with

253

Used distribution

Arch Linux

Linux kernel version used

6.1.12-arch1-1

CPU architectures issue was seen on

x86_64

Component

systemd-resolved

Expected behaviour you didn't see

Reverse DNS lookup should be fast if the upstream DNS server returns NXDOMAIN:

$ time nslookup 128.1.1.1
** server can't find 1.1.1.128.in-addr.arpa: NXDOMAIN


________________________________________________________
Executed in  150.51 millis    fish           external
   usr time    7.39 millis  553.00 micros    6.83 millis
   sys time    6.77 millis    0.00 micros    6.77 millis

exit code: 1

This is a sample output from systemd-252.5.

Unexpected behaviour you saw

After upgrading systemd to 253, the above test takes 15 seconds before failing:

$ time nslookup 128.1.1.1
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; no servers could be reached



________________________________________________________
Executed in   15.03 secs      fish           external
   usr time   19.56 millis   11.18 millis    8.38 millis
   sys time    4.33 millis    0.15 millis    4.17 millis

exit code: 1

Also, instead of NXDOMAIN, nslookup reports a communications error when talking to resolved itself (127.0.0.53), which suggests a bug in resolved.

Steps to reproduce the problem

  1. Upgrade systemd from 252.5 to 253, systemctl restart systemd-resolved.service.
  2. Run time nslookup 128.1.1.1 and measure the time it takes to complete.
  3. With systemd 253 it completes in 15 seconds and returns a "communications error to 127.0.0.53#53: timed out", and with systemd 252.5 it completes in less than a second and returns NXDOMAIN, as it should.

Instead of 128.1.1.1, you can use any other valid and routable IPv4 or IPv6 address that doesn't have a PTR record. 128.1.1.1 is an arbitrary address used as an example.
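
For repeating the measurement with other addresses, here is a minimal sketch. The helper names reverse_name and time_ptr_lookup are hypothetical, only IPv4 is handled, and the query is sent to the resolved stub listener at 127.0.0.53 as in the outputs above. The timing has one-second resolution, which is enough to tell a sub-second NXDOMAIN from a 15-second timeout.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of systemd or bind-tools.

# Build the PTR name for a dotted-quad IPv4 address,
# e.g. 128.1.1.1 becomes 1.1.1.128.in-addr.arpa.
reverse_name() {
    old_ifs=$IFS
    IFS=.
    # Unquoted on purpose: split the address on dots into $1..$4.
    set -- $1
    IFS=$old_ifs
    printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
}

# Send a PTR query to the resolved stub listener and report the
# nslookup exit status and the elapsed wall-clock time in whole seconds.
time_ptr_lookup() {
    name=$(reverse_name "$1")
    start=$(date +%s)
    nslookup -type=PTR "$name" 127.0.0.53 >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$name: exit $status after $((end - start))s"
}

# Usage:
#   time_ptr_lookup 128.1.1.1
```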

Additional program output to the terminal or log subsystem illustrating the issue

$ grep '^hosts:' /etc/nsswitch.conf
hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns
@gentoo-root gentoo-root added the bug 🐛 Programming errors, that need preferential fixing label Feb 26, 2023
@rpigott
Contributor

rpigott commented Feb 27, 2023

Possible dupe of #14735? Can you capture systemd-resolved log using SYSTEMD_LOG_LEVEL=debug?
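
One possible way to capture such a log is a service drop-in that sets the environment variable, sketched below. The function name resolved_debug_dropin and the file name debug.conf are hypothetical; the drop-in directory follows the usual systemd convention for unit overrides.

```shell
#!/bin/sh
# Print a unit drop-in that raises the service log level to debug.
# resolved_debug_dropin is a hypothetical helper name.
resolved_debug_dropin() {
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n'
}

# Usage (as root; debug.conf is an arbitrary file name):
#   mkdir -p /etc/systemd/system/systemd-resolved.service.d
#   resolved_debug_dropin > /etc/systemd/system/systemd-resolved.service.d/debug.conf
#   systemctl daemon-reload
#   systemctl restart systemd-resolved.service
#   journalctl -u systemd-resolved.service -f
```

Remove the drop-in and restart the service again once the log has been collected, since debug output is verbose.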

@gentoo-root
Contributor Author

Possible dupe of #14735?

Note that this is a new regression after upgrading from 252.5 to 253, and that ticket is 3 years old.

Can you capture systemd-resolved log using SYSTEMD_LOG_LEVEL=debug?

I attached the logs for 252.5 (good) and 253 (bad). Note that I disabled IPv6 system-wide (to avoid leaking my addresses to the log), and tested nslookup 128.1.1.1, as mentioned in the bug description, but the same bug also happens for IPv6 addresses.

@yuwata
Member

yuwata commented Mar 4, 2023

Please try to disable MulticastDNS if you do not use it.

@gentoo-root
Contributor Author

Disabling MulticastDNS worked around the issue, but I see it's not a new option in 253, and older versions didn't suffer from this issue, even with MulticastDNS enabled.

@YHNdnzj
Member

YHNdnzj commented Mar 4, 2023 via email

@yuwata
Member

yuwata commented Mar 4, 2023

Disabling MulticastDNS worked around the issue, but I see it's not a new option in 253, and older versions didn't suffer from this issue, even with MulticastDNS enabled.

Yeah, (unfortunately?) the default value for MulticastDNS= was changed in v253. See e315401, the relevant discussion, and the relevant news entry.

I am not familiar with the difference between mDNS and LLMNR, but LLMNR was previously enabled by default. Why did LLMNR not slow down DNS lookups?

@yuwata yuwata added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Mar 4, 2023
@gentoo-root
Contributor Author

Yeah, (unfortunately?) the default value for MulticastDNS= was changed in v253.

No, not really, at least not on Arch Linux. I just installed systemd 252.5 again, and I see that resolved.conf has #MulticastDNS=yes commented out, which hints that it's the default. I uncommented it to be 100% sure and tested again, and I confirm that systemd 252.5 doesn't suffer from this bug even with MulticastDNS enabled.

@yuwata
Member

yuwata commented Mar 4, 2023

The commit e315401 changes the per-link default for mDNS, not the global default.

@gentoo-root
Contributor Author

I confirm that resolvectl mdns wlp0s20f3 on also makes reverse DNS requests slow on systemd 252.5, and resolvectl mdns wlp0s20f3 off makes them fast on systemd 253.1, so it seems that I indeed started observing the issue after the per-link default was changed.

@yuwata yuwata added this to the v254 milestone Mar 21, 2023
@rulatir

rulatir commented Apr 27, 2023

Disabling MulticastDNS worked around the issue

Can I have complete instructions on how to apply this workaround on my system? I know nothing about the systemd resolver beyond vaguely sensing that it is very complex, and I am bitten by this issue: not only do I see the "communications error... timed out" message in nslookup output, but renewing certificates with certbot also fails due to a DNS timeout, and I strongly suspect this is the same issue.

@gentoo-root
Contributor Author

Either set MulticastDNS=no in /etc/systemd/resolved.conf to apply it globally, or use the command from the previous message to disable it on a given interface: resolvectl mdns $iface off (doesn't persist).
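
As a sketch of the global variant, the same setting can also go into a drop-in file instead of editing /etc/systemd/resolved.conf directly. This assumes the standard resolved.conf.d drop-in directory; the function name write_mdns_dropin and the file name 10-disable-mdns.conf are hypothetical.

```shell
#!/bin/sh
# Write a resolved.conf drop-in that disables mDNS globally.
# write_mdns_dropin and 10-disable-mdns.conf are hypothetical names.
# The optional first argument overrides the target directory.
write_mdns_dropin() {
    dir=${1:-/etc/systemd/resolved.conf.d}
    mkdir -p "$dir" || return 1
    printf '[Resolve]\nMulticastDNS=no\n' > "$dir/10-disable-mdns.conf"
}

# Usage (as root):
#   write_mdns_dropin
#   systemctl restart systemd-resolved.service
#   resolvectl mdns    # show the resulting global and per-link mDNS state
```

The setting only takes effect after the service is restarted.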
