Expected behaviour you didn't see
Consistent successful lookups of DNSSEC-signed FQDNs
Unexpected behaviour you saw
Sporadic lookup failures of DNSSEC-signed FQDNs due to «missing-key»
Steps to reproduce the problem
Start out with a /etc/systemd/resolved.conf that enables DNSSEC and specifies upstream DNS servers that are known to support DNSSEC (note that I can easily reproduce the issue with other DNS servers too, so this does not appear to be an upstream DNS server issue):
Restart systemd-resolved and make two queries for an FQDN known to have valid DNSSEC signatures. In this case, I'm using ring.nlnog.net, which has perfect test results both on Verisign Labs's DNSSEC Analyzer and on DNSViz.
(The issue occurs with other unrelated FQDNs as well, e.g., www.jernia.no. However, not all FQDNs suffer from the issue - I've been unable to reproduce the issue for www.ripe.net, for example.)
Observe how the first query tends to fail with «resolve call failed: DNSSEC validation failed: missing-key», while the subsequent query tends to work fine:
$ date; sudo systemctl restart systemd-resolved.service; sleep 3; resolvectl query ring.nlnog.net; sleep 3; resolvectl query ring.nlnog.net
su. 12. mai 10:21:49 +0200 2019
ring.nlnog.net: resolve call failed: DNSSEC validation failed: missing-key
ring.nlnog.net: 95.211.149.24 -- link: wlp2s0
-- Information acquired via protocol DNS in 279.3ms.
-- Data is authenticated: yes
I am attaching a PCAP file containing the traffic between the upstream DNS server and systemd-resolved captured during the above console output. Frames 1-16 are from the initial failing query issued at 10:21:52, while frames 17-24 are from the successful subsequent query at 10:21:56. I am also attaching debug-level journal output from systemd-resolved taken in the same time span.
Note that the first lookup does not always fail, so a few attempts might be necessary to reproduce.
Non-debug log messages from systemd-resolved vary slightly when a query fails. Sometimes it logs four lines, like so:
mai 12 10:41:42 sloth.fud.no systemd-resolved[15103]: DNSSEC validation failed for question nlnog.net IN DNSKEY: missing-key
mai 12 10:41:42 sloth.fud.no systemd-resolved[15103]: DNSSEC validation failed for question ring.nlnog.net IN DS: missing-key
mai 12 10:41:42 sloth.fud.no systemd-resolved[15103]: DNSSEC validation failed for question ring.nlnog.net IN DNSKEY: missing-key
mai 12 10:41:42 sloth.fud.no systemd-resolved[15103]: DNSSEC validation failed for question ring.nlnog.net IN A: missing-key
Other times, it will only log two lines, like so:
mai 12 10:43:15 sloth.fud.no systemd-resolved[15591]: DNSSEC validation failed for question ring.nlnog.net IN DNSKEY: missing-key
mai 12 10:43:15 sloth.fud.no systemd-resolved[15591]: DNSSEC validation failed for question ring.nlnog.net IN A: missing-key
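To compare which questions fail across several attempts, the log lines above can be tallied per question. This is only a rough sketch: the function name summarize_missing_key is made up for illustration, and it assumes the journal lines have exactly the format shown above.

```sh
# Reads journal lines on stdin and prints, for each failed question,
# how many "missing-key" validation failures were logged.
# (summarize_missing_key is a hypothetical helper name.)
summarize_missing_key() {
    sed -n 's/.*DNSSEC validation failed for question \(.*\): missing-key$/\1/p' \
        | sort | uniq -c
}

# Example (the time window is only an illustration):
#   journalctl -u systemd-resolved.service --since "10:40" --until "10:45" | summarize_missing_key
```

Lines that do not match the pattern are ignored, so the full unit journal can be piped in unfiltered.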
This can also be demonstrated by issuing 100 queries in rapid succession after flushing the caches. Typically I get 0, 1 or 2 failures initially; the remaining queries succeed.
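The exact commands used for the 100-query test are not shown in the report. One possible way to run it is sketched below; the helper name count_failures is hypothetical, and the sketch relies on resolvectl returning a non-zero exit status when a lookup fails.

```sh
# Issue COUNT lookups for NAME back to back and report how many failed.
# Usage: count_failures NAME [COUNT]   (count_failures is a hypothetical helper)
count_failures() {
    name=$1
    total=${2:-100}
    failures=0
    i=0
    while [ "$i" -lt "$total" ]; do
        # A failed lookup (for example "DNSSEC validation failed: missing-key")
        # makes resolvectl exit with a non-zero status.
        if ! resolvectl query "$name" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$total queries failed for $name"
}

# Example: flush the caches first, then run the test.
#   sudo resolvectl flush-caches && count_failures ring.nlnog.net 100
```

Only the exit status is inspected, so the count includes any kind of lookup failure, not just missing-key.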
I'm seeing this too, on only one machine, using systemd 241-7~deb10u4 (Debian Buster).
In my case the name being resolved is repo.powerdns.com. The systemd-resolved journal shows missing-key entries for "IN DNSKEY powerdns.com", "IN A repo.powerdns.com", and "IN AAAA repo.powerdns.com". The system has PowerDNS Recursor 4.3.2-1pdns.buster running on 127.0.0.1 and systemd-resolved is configured to use it. Sending the same queries directly to pdns-recursor using 'dig' produces the correct results.
I can't get any queries for this name to resolve without producing the 'missing-key' result. As a workaround I've stopped using the systemd-resolved stub resolver on this system.
However, if I first disable caching by setting Cache=no in /etc/systemd/resolved.conf and repeat the 100-query test, the results are much more erratic:
It would seem that caching successfully masks the failures except for the first ones.
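The report sets Cache=no directly in /etc/systemd/resolved.conf. An alternative that leaves the main file untouched is a drop-in file, sketched below. The function name write_no_cache_dropin and the file name no-cache.conf are arbitrary choices for this sketch, not something taken from the report.

```sh
# Write a drop-in that disables systemd-resolved's cache.
# The target directory defaults to the standard drop-in location;
# "no-cache.conf" is an arbitrary file name chosen for this sketch.
write_no_cache_dropin() {
    conf_dir=${1:-/etc/systemd/resolved.conf.d}
    mkdir -p "$conf_dir" || return 1
    printf '[Resolve]\nCache=no\n' > "$conf_dir/no-cache.conf"
}

# Typical use (as root), followed by a restart so the setting takes effect:
#   write_no_cache_dropin && systemctl restart systemd-resolved.service
```

Removing the drop-in and restarting the service restores the default caching behaviour.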