resolve call failed: DNSSEC validation failed: failed-auxiliary #9867

Open
filbranden opened this issue Aug 14, 2018 · 64 comments

Comments

@filbranden
Member

@filbranden filbranden commented Aug 14, 2018

systemd version the issue has been seen with
Latest git (v239-525-g9e5f34a639f6)

Used distribution
Fedora Rawhide

Expected behaviour you didn't see
systemd-resolved resolving all domains.

Unexpected behaviour you saw
systemd-resolved having trouble with specific domains (in my case, savannah.gnu.org)

Steps to reproduce the problem

My ens3.network has:

[Network]
DHCP=yes
LLMNR=no
DNS=8.8.8.8 8.8.4.4
DNSOverTLS=opportunistic
IPForward=yes

My resolved.conf has:

[Resolve]
#DNS=
#FallbackDNS=8.8.8.8 8.8.4.4 2001:4860:4860::8888 2001:4860:4860::8844
#Domains=
#LLMNR=yes
#MulticastDNS=yes
#DNSSEC=allow-downgrade
DNSOverTLS=opportunistic
#Cache=yes
#DNSStubListener=udp

Output of resolvectl:

Link 2 (ens3)
      Current Scopes: DNS
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: opportunistic
      DNSSEC setting: allow-downgrade
    DNSSEC supported: yes
  Current DNS Server: 8.8.8.8
         DNS Servers: 8.8.8.8
                      8.8.4.4
                      192.168.122.1

Resolving some domains works fine:

$ resolvectl query www.facebook.com
www.facebook.com: 157.240.22.39
                  (star-z-mini.c10r.facebook.com)

-- Information acquired via protocol DNS in 756.3ms.
-- Data is authenticated: no

And:

$ resolvectl query www.google.com
www.google.com: 216.58.195.68

-- Information acquired via protocol DNS in 94.6ms.
-- Data is authenticated: no

But this one fails, hangs for a long time and finally times out:

$ resolvectl query savannah.gnu.org
savannah.gnu.org: resolve call failed: DNSSEC validation failed: failed-auxiliary
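For anyone who wants to check a list of names in bulk, the outputs above are regular enough to classify mechanically. This is a minimal sketch, assuming the text of `resolvectl query NAME` has already been captured as a string; `QueryResult` and `parse_resolvectl_query` are hypothetical helper names, not part of systemd.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryResult:
    name: str
    ok: bool
    error: Optional[str]            # text after "resolve call failed: ", if any
    authenticated: Optional[bool]   # from the "Data is authenticated" line, if present


def parse_resolvectl_query(output: str) -> QueryResult:
    """Classify the text printed by `resolvectl query NAME` (hypothetical helper)."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    first = lines[0]
    # The queried name is everything before the first colon on the first line.
    name = first.split(":", 1)[0].strip()
    failed = re.search(r"resolve call failed: (.+)$", first)
    auth = re.search(r"^-- Data is authenticated: (yes|no)\s*$", output, re.MULTILINE)
    return QueryResult(
        name=name,
        ok=failed is None,
        error=failed.group(1).strip() if failed else None,
        authenticated=(auth.group(1) == "yes") if auth else None,
    )
```

Feeding it the savannah.gnu.org output gives `ok=False` with the failed-auxiliary text in `error`; the www.google.com output gives `ok=True` with `authenticated=False`.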

systemd-resolved logs are here: https://gist.github.com/filbranden/ad972ab59aaf23e5c8f0013f867db1eb

Issue #9283 looks similar (same error message), not sure if it's a dupe somehow... cc @yuwata and @irtimmer.

Thanks!
Filipe

@filbranden
Member Author

@filbranden filbranden commented Aug 14, 2018

I see two problems here that I don't really understand:

  1. The DNSKEY lookup took a long time: it started at 23:10:27 and only got the keys at 23:11:14. I'm not sure why, since querying the resolver directly with host -t dnskey gnu.org 8.8.8.8 seems to return a response right away. Once the entries are cached, querying again produces the "DNSSEC validation failed: failed-auxiliary" error right away.

  2. The log has some "Found verdict ...: insecure" lines after it gets the DNSKEY. Assuming that's indeed correct and DNSSEC is not configured correctly for this domain, shouldn't "DNSSEC setting: allow-downgrade" mean it ought to fall back to non-DNSSEC lookups?

UPDATE: I tried the lookup again. This time the cached entries had apparently just expired, so the new lookup was stuck for a while again. I can reproduce the delay fairly consistently. This time the DNSKEY lookup took from 23:38:30 to 23:39:23 and the verdicts were all insecure, but the lookup eventually succeeded and returned an IN A address for the domain.

@yuwata
Member

@yuwata yuwata commented Aug 14, 2018

Note that -ENODATA is returned by dns_resource_record_is_synthetic().

@yuwata yuwata added the resolve label Sep 3, 2018
@filbranden
Member Author

@filbranden filbranden commented Sep 12, 2018

Ping? I can still reproduce this one, as of current master (v239-751-g49cdae63d168). Any ideas what might be going wrong here?

I can try to check back when this actually worked (if it actually has worked before) and try to bisect to see where the problem was introduced... Will try to do that now.

@filbranden
Member Author

@filbranden filbranden commented Dec 18, 2018

I wonder if this is fixed by #11194 or not... Will try to see if I can still reproduce it on HEAD and, if so, check #11194 on top of it to see whether that fixes this too...

@filbranden
Member Author

@filbranden filbranden commented Dec 19, 2018

Ping? I can still reproduce this issue consistently.

Just tried it with a build that includes @poettering's #11194 and it still doesn't work...

Let me know if you'd like me to collect fresh logs for this one, I'd be happy to.

Cheers,
Filipe

@kyrias

@kyrias kyrias commented Dec 20, 2018

I think that the core issue is that while the gnu.org. zone is signed, they haven't set up a DS record to be published by org., and because of this org. returns a NSEC3 "proving" that it's not signed. This seems to trip up resolved when it discovers that gnu.org. actually is signed.

@filbranden
Member Author

@filbranden filbranden commented Dec 20, 2018

I'm wondering if other DNSSEC clients/implementations also have trouble with gnu.org. domains...

In other words: Is this a resolved bug? Or is this an actual problem with the way gnu.org. set up their DNSSEC and rejecting their domains is the proper approach here?

Pragmatically, what can we do about it? Currently I have disabled resolving through resolved on my machines, since not being able to checkout git repos from savannah.gnu.org is a problem I'd rather not live with right now...

@Ferdi265

@Ferdi265 Ferdi265 commented Dec 23, 2018

Currently running into this on my ArchLinux machine. I'd really rather not have to either disable DNSSEC or use a nonlocal resolver.

Also adding that one of the other affected domains is youtu.be.

@Hi-Angel

@Hi-Angel Hi-Angel commented Dec 25, 2018

I'm wondering if other DNSSEC clients/implementations also have trouble with gnu.org. domains...

No, I asked a gf if she can ping savannah.gnu.org from her Win7, and everything is fine. Meanwhile, I have the problem on both of my Archlinux machines.

@Ferdi265

@Ferdi265 Ferdi265 commented Dec 25, 2018

Can also confirm that dig +dnssec savannah.gnu.org @$LOCAL_DNS_SERVER_HERE correctly resolves. (On the same machine that resolved fails)

Edit: with "local dns server" I mean the one provided by my router, not systemd-resolved.

@fabolhak

@fabolhak fabolhak commented Feb 17, 2019

I also had a lot of problems with resolving various names like repository.spotify.com:
https://gist.github.com/fabolhak/4dba7e4c167f9cf0e36c0195ebe8931d

My configuration was:

[Resolve]
DNS=46.182.19.48
FallbackDNS=1.1.1.1
DNSSEC=allow-downgrade
DNSOverTLS=opportunistic
Cache=yes

dig +dnssec @46.182.19.48 repository.spotify.com however worked fine for me.

I removed the DNSOverTLS=opportunistic. Now it seems to work for me.

@Hi-Angel

@Hi-Angel Hi-Angel commented Apr 13, 2019

I also had a lot of problems with resolving various names like repository.spotify.com:
https://gist.github.com/fabolhak/4dba7e4c167f9cf0e36c0195ebe8931d

My configuration was:

[Resolve]
DNS=46.182.19.48
FallbackDNS=1.1.1.1
DNSSEC=allow-downgrade
DNSOverTLS=opportunistic
Cache=yes

dig +dnssec @46.182.19.48 repository.spotify.com however worked fine for me.

I removed the DNSOverTLS=opportunistic. Now it seems to work for me.

Hmm, workaround doesn't work for me. I have:

$ grep -E "^[^#]" /etc/systemd/resolved.conf
[Resolve]
DNSSEC=allow-downgrade
Cache=yes

The only difference is I don't set DNS explicitly, and per the suggestion I don't have the DNSOverTLS line.

Btw, I think these values are actually defaults.

@fabolhak

@fabolhak fabolhak commented Apr 18, 2019

There were some updates lately. Unfortunately, I didn't have time to check whether the problem is fixed. My current (working) configuration is:

[Resolve]
DNS=46.182.19.48
FallbackDNS=1.1.1.1
#Domains=
#LLMNR=yes
#MulticastDNS=yes

#[Service]
#Environment=SYSTEMD_LOG_LEVEL=debug
DNSSEC=allow-downgrade
#DNSOverTLS=opportunistic
Cache=yes
DNSStubListener=udp
ReadEtcHosts=yes

@Ferdi265

@Ferdi265 Ferdi265 commented Apr 19, 2019

@fabolhak on the machine I originally encountered this problem, resolving savannah.gnu.org still doesn't work when DNSSEC is enabled

@ShapeShifter499
Contributor

@ShapeShifter499 ShapeShifter499 commented Apr 26, 2019

I have DNSSEC enabled and I was running into archlinux.org: resolve call failed: DNSSEC validation failed: failed-auxiliary

Setting FallbackDNS= to nothing and restarting both systemd-networkd and systemd-resolved seems to have helped.

My system gets both an IPv4 (dynamic) address and an IPv6 (Comcast, doesn't seem dynamic) address.

systemd 242 (242.0-3-arch)
What 'resolvectl' has for my system below.

[root@kumo resolved.conf.d]# resolvectl 
Global
       LLMNR setting: no
MulticastDNS setting: yes
  DNSOverTLS setting: opportunistic
      DNSSEC setting: yes
    DNSSEC supported: yes
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 2 (enp1s0)
      Current Scopes: DNS
DefaultRoute setting: yes
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: opportunistic
      DNSSEC setting: yes
    DNSSEC supported: yes
  Current DNS Server: fd72:b86:dead::1
         DNS Servers: 192.168.1.1
                      fd72:b86:dead::1
[root@kumo resolved.conf.d]#

@filbranden
Member Author

@filbranden filbranden commented May 1, 2019

I looked into this a bit further, it turns out DNSViz shows there's indeed a problem with savannah.gnu.org, see screenshot below or check for yourself:

[screenshot: dnsviz-savannah]

It lists delegation status as Bogus for gnu.org to savannah.gnu.org (not sure what Bogus means in this context.)

Under Errors it lists:

  • NSEC proving non-existence of savannah.gnu.org/DS: The NS bit was not set in the bitmap of the NSEC RR corresponding to the delegated name (savannah.gnu.org).
  • NSEC proving non-existence of savannah.gnu.org/DS: The NS bit was not set in the bitmap of the NSEC RR corresponding to the delegated name (savannah.gnu.org).
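The DNSViz error above is about the type bitmap of the NSEC record at the delegated name. As far as I understand RFC 4035 section 5.2 and RFC 6840 section 4.4, an NSEC only proves an insecure delegation when the NS bit is set while the DS and SOA bits are clear. A minimal sketch of that check, with hypothetical helper names and the sidnlabs test record quoted later in this thread as sample input:

```python
from typing import Iterable, List


def nsec_types_from_text(rr: str) -> List[str]:
    """Extract the type mnemonics from an NSEC RR in presentation format.

    Assumes the usual layout: owner TTL class NSEC next-owner TYPE [TYPE ...].
    """
    fields = rr.split()
    i = fields.index("NSEC")   # first "NSEC" token is the RR type itself
    return fields[i + 2:]      # skip the RR type and the next-owner name


def proves_insecure_delegation(nsec_types: Iterable[str]) -> bool:
    """True if the bitmap looks like a delegation without a DS record."""
    types = {t.upper() for t in nsec_types}
    # NS set: there really is a delegation here.
    # DS clear: the parent holds no DS, so the child is not securely delegated.
    # SOA clear: the record comes from the parent side, not the child apex.
    return "NS" in types and "DS" not in types and "SOA" not in types


sample = "nods.bad-dnssec.wb.sidnlabs.nl. 3000 IN NSEC ok.bad-dnssec.wb.sidnlabs.nl. NS RRSIG NSEC"
print(proves_insecure_delegation(nsec_types_from_text(sample)))
```

A bitmap without NS, which is what DNSViz reports for savannah.gnu.org, fails this check, so a validator cannot use that NSEC as proof that the name is an unsigned delegation. NSEC3 opt-out ranges are a separate case that this sketch does not cover.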

So it turns out this domain seems to be misconfigured, indeed. Not really sure how to report this to the owners of the domain, so they get it fixed... Also not really sure whether resolved should do something different in this case (such as resolve it without any DNSSEC?), it seems other resolvers are able to resolve this domain, not really sure where the difference is and whether that's really the correct behavior...

BTW, I can't reproduce this with archlinux.org (maybe there was a problem there, but has already been fixed?)

Will keep looking...

@filbranden
Member Author

@filbranden filbranden commented May 2, 2019

Reported to hostmaster of gnu.org by e-mail, got a reply saying:

We know about this and we are working on it, but thanks for the reminder!

So hopefully that domain will be fixed shortly!

Cheers,
Filipe

@toreanderson
Contributor

@toreanderson toreanderson commented May 13, 2019

It would appear that most resolvers will happily resolve FQDNs such as savannah.gnu.org, but simply don't consider them authenticated (i.e., by omitting the ad flag in the response):

$ dig @1.1.1.1 savannah.gnu.org. +dnssec  | grep ';; flags'
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
$ dig @8.8.8.8 savannah.gnu.org. +dnssec  | grep ';; flags'
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

I think systemd-resolved ought to behave in the same way.
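To compare resolvers this way without eyeballing the output, the header flags can be pulled out of captured dig text. A small sketch, assuming the output has been saved to a string; the function names are made up for illustration.

```python
import re
from typing import Set


def header_flags(dig_output: str) -> Set[str]:
    """Return the header flags from the ';; flags:' line of dig output."""
    # Anchored to ';; flags:' so the EDNS '; EDNS: ... flags: do;' line is ignored.
    m = re.search(r"^;; flags:([^;]*);", dig_output, re.MULTILINE)
    if m is None:
        raise ValueError("no ';; flags:' line found")
    return set(m.group(1).split())


def is_authenticated(dig_output: str) -> bool:
    """True if the resolver set the ad (authenticated data) flag."""
    return "ad" in header_flags(dig_output)


line = ";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
print(is_authenticated(line))
```

For the two responses above, the flags are qr, rd and ra only, so the answer is delivered but not marked as authenticated.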

A practical consequence of the current behaviour is that it becomes impossible for a domain owner to gracefully deploy (or remove) DNSSEC signatures without interruption, as he has no control over when the parent (gTLD/ccTLD) zone is reloaded and the DS record becomes visible. It is impossible to transition in an «atomic»/instantaneous manner from having no DS in the parent zone and no signatures in the zone itself to a state where the zone is signed and DS records exist in the parent zone (or vice versa), even before taking TTL and caching into account.

The normal procedure when deploying DNSSEC is to first sign the zone, wait for all slaves to pick it up and for TTLs to expire, and only then push the DS records to the parent zone via the registrar. This procedure could take hours or even days, during which systemd-resolved with DNSSEC enabled will fail to resolve any hostname in the zone (as I understand it).

RFC 4033 section 5 contains some guidance on how to behave in this situation:

   Insecure: The validating resolver has a trust anchor, a chain of
      trust, and, at some delegation point, signed proof of the
      non-existence of a DS record.  This indicates that subsequent
      branches in the tree are provably insecure.  A validating resolver
      may have a local policy to mark parts of the domain space as
      insecure.

   Bogus: The validating resolver has a trust anchor and a secure
      delegation indicating that subsidiary data is signed, but the
      response fails to validate for some reason: missing signatures,
      expired signatures, signatures with unsupported algorithms, data
      missing that the relevant NSEC RR says should be present, and so
      forth.

   Indeterminate: There is no trust anchor that would indicate that a
      specific portion of the tree is secure.  This is the default
      operation mode.

As I understand it, savannah.gnu.org matches the Insecure definition, as there's no DS record for gnu.org in the org gTLD zone. It is not Bogus.

The RFC goes on to say:

   This specification only defines how security-aware name servers can
   signal non-validating stub resolvers that data was found to be bogus
   (using RCODE=2, "Server Failure"; see [RFC4035]).

And also:

   This specification does not define a format for communicating why
   responses were found to be bogus or marked as insecure.  The current
   signaling mechanism does not distinguish between indeterminate and
   insecure states.

However, even though savannah.gnu.org is not Bogus, systemd-resolved responds with SERVFAIL when queried for it:

$ dig @127.0.0.53 savannah.gnu.org. SOA | grep status:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41084

This seems improper to me. The correct thing to do, as I understand it, would be to answer the query with the ad bit absent, i.e., the same as what it would do with Insecure answers.
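The behaviour argued for here can be written down as a small mapping from the RFC 4033 validation states quoted above to what a stub client would see. This is only a sketch of that reading of the RFC, not systemd-resolved's actual code; the type and function names are invented, and "NOERROR" stands in for whatever rcode the upstream answer carried.

```python
from enum import Enum
from typing import NamedTuple


class Validation(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"
    INDETERMINATE = "indeterminate"


class StubResponse(NamedTuple):
    rcode: str            # "NOERROR" is a placeholder for the upstream rcode
    ad: bool              # authenticated-data flag in the reply header
    include_answer: bool  # whether the answer records are passed on


def respond(state: Validation) -> StubResponse:
    """Map a validation state to the response a stub client would see."""
    if state is Validation.BOGUS:
        # Only bogus data is signalled as a failure (RCODE=2, SERVFAIL).
        return StubResponse("SERVFAIL", False, False)
    if state is Validation.SECURE:
        return StubResponse("NOERROR", True, True)
    # Insecure and indeterminate are not distinguished on the wire:
    # the answer is delivered, just without the ad flag.
    return StubResponse("NOERROR", False, True)


print(respond(Validation.INSECURE))
```

Under this mapping, a name below an unsigned delegation such as gnu.org would resolve normally and simply be reported as not authenticated.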

@azurit

@azurit azurit commented Jun 24, 2019

Having the same problem. After activating DNSSEC in systemd-resolved, Facebook stopped showing most of the images:

$ resolvectl query scontent.fada1-4.fna.fbcdn.net
scontent.fada1-4.fna.fbcdn.net: resolve call failed: DNSSEC validation failed: failed-auxiliary

systemd version 239 (Kubuntu 18.10).

@nkukard

@nkukard nkukard commented Jul 18, 2019

I've got the same issue on systemd 242.32-3 (Arch).

$ resolvectl query en.wikipedia.org
en.wikipedia.org: resolve call failed: DNSSEC validation failed: failed-auxiliary

$ resolvectl query wiki.archlinux.org
wiki.archlinux.org: resolve call failed: DNSSEC validation failed: failed-auxiliary

It doesn't look like there are any errors with the DNSSEC though...
http://dnsviz.net/d/en.wikipedia.org/dnssec/
http://dnsviz.net/d/wiki.archlinux.org/dnssec/

Querying my upstream provider DNSSEC-enabled resolvers work fine.

@Hi-Angel

@Hi-Angel Hi-Angel commented Aug 4, 2019

Someone reported on a somewhat similar issue that setting DNSSEC=no in resolved.conf helped them.

I just tried it, and it helps with this issue too.

@nkukard

@nkukard nkukard commented Aug 4, 2019

Someone reported on a somewhat similar issue that setting DNSSEC=no in resolved.conf helped them.

I just tried it, and it helps with this issue too.

Disabling DNSSEC does work for me, although I'd really prefer not to do that :/

@hexchain
Contributor

@hexchain hexchain commented Sep 3, 2019

I've got the same issue with systemd 242.84 on Arch. However, without restarting resolved, even if I manually execute resolvectl dnssec wlp58s0 no, some queries still fail:

Sep 03 07:10:02 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN A: failed-auxiliary
Sep 03 07:10:02 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN AAAA: failed-auxiliary
Sep 03 07:10:02 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN A: failed-auxiliary
Sep 03 07:10:02 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN AAAA: failed-auxiliary
Sep 03 07:10:03 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN A: failed-auxiliary
Sep 03 07:10:03 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN AAAA: failed-auxiliary
Sep 03 07:10:03 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN A: failed-auxiliary
Sep 03 07:10:03 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN AAAA: failed-auxiliary
Sep 03 07:10:08 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN AAAA: failed-auxiliary
Sep 03 07:10:08 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN A: failed-auxiliary
Sep 03 07:10:09 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN A: failed-auxiliary
Sep 03 07:10:09 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN AAAA: failed-auxiliary
Sep 03 07:10:11 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN AAAA: failed-auxiliary
Sep 03 07:10:11 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN A: failed-auxiliary
Sep 03 07:10:11 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN A: failed-auxiliary
Sep 03 07:10:11 host systemd-resolved[453]: DNSSEC validation failed for question dnssec-analyzer.verisignlabs.com IN AAAA: failed-auxiliary
Sep 03 07:10:12 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN A: failed-auxiliary
Sep 03 07:10:12 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN AAAA: failed-auxiliary
Sep 03 07:10:13 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN A: failed-auxiliary
Sep 03 07:10:13 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN AAAA: failed-auxiliary

Then I executed resolvectl reset-server-features, and after a while it correctly figured out that the upstream server does not support DNSSEC:

Sep 03 07:15:02 host systemd-resolved[453]: Resetting learnt feature levels on all servers.
Sep 03 07:15:03 host systemd-resolved[453]: DNSSEC validation failed for question cdn.dnsv1.com IN SOA: failed-auxiliary
Sep 03 07:15:03 host systemd-resolved[453]: DNSSEC validation failed for question com.cdn.dnsv1.com IN DS: failed-auxiliary
Sep 03 07:15:03 host systemd-resolved[453]: DNSSEC validation failed for question com.cdn.dnsv1.com IN SOA: failed-auxiliary
...
Sep 03 07:15:07 host systemd-resolved[453]: Server 192.168.1.1 does not support DNSSEC, downgrading to non-DNSSEC mode.
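Log excerpts like the ones above get repetitive quickly, so it can help to tally them per question before attaching them to a report. A minimal sketch, assuming the journal text has been captured into a string (for example from journalctl output); `tally_failures` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the message format seen in the log excerpts in this thread.
_FAIL = re.compile(
    r"DNSSEC validation failed for question (?P<name>\S+) IN (?P<type>\S+): (?P<reason>\S+)"
)


def tally_failures(journal_text: str) -> Counter:
    """Count validation failures per (name, record type, reason)."""
    counts = Counter()
    for line in journal_text.splitlines():
        m = _FAIL.search(line)
        if m:
            counts[(m["name"], m["type"], m["reason"])] += 1
    return counts


sample = (
    "Sep 03 07:10:08 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN AAAA: failed-auxiliary\n"
    "Sep 03 07:10:08 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN A: failed-auxiliary\n"
    "Sep 03 07:10:09 host systemd-resolved[453]: DNSSEC validation failed for question dnssectest.sidnlabs.nl IN A: failed-auxiliary\n"
)
for key, n in tally_failures(sample).most_common():
    print(n, *key)
```

Lines that do not match the pattern, such as the "does not support DNSSEC, downgrading" message, are ignored.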

@Xohwie1i

@Xohwie1i Xohwie1i commented Sep 27, 2019

Getting the same exact issue with systemd 243.51-1 on Arch for savannah.gnu.org. Disabling DNSSEC works for me but, as others have already pointed out, I'd prefer not to do that. Let me know if there are any logs I can provide to help with this.

edit: not sure if this will be helpful or not, but this bug seems to be quite inconsistent. By that I mean it depends on what network I'm on. The bug appears on my home (Comcast) wifi, but not when I'm connected to the coffeeshop wifi.

@Silur

@Silur Silur commented Aug 29, 2020

why does everyone think disabling a bugged security feature is a solution? do you also disable your antivirus because it makes your PC slow?

@Nothing4You

@Nothing4You Nothing4You commented Aug 29, 2020

probably because the bugged security feature is preventing productivity. that's off-topic here though.

@kkaosninja

@kkaosninja kkaosninja commented Aug 29, 2020

why does everyone think disabling a bugged security feature is a solution? do you also disable your antivirus because it makes your PC slow?

@Silur I first noticed the error when I was attempting to install Rust. I could not even download the shell script required (rustup) to do so. The domain it's hosted on is sh.rustup.rs. I have reproduced the error below for you to see.

~                                                                                                                                                                 
❯ nslookup sh.rustup.rs
Server:		127.0.0.53
Address:	127.0.0.53#53

** server can't find sh.rustup.rs: SERVFAIL


~                                                                                                                                                                 
❯ resolvectl query sh.rustup.rs
sh.rustup.rs: resolve call failed: DNSSEC validation failed: failed-auxiliary

~                                                                                                                                                                 
❯ resolvectl query rustup.rs
rustup.rs: 13.33.144.77                        -- link: wlo1
           13.33.144.104                       -- link: wlo1
           13.33.144.127                       -- link: wlo1
           13.33.144.67                        -- link: wlo1

-- Information acquired via protocol DNS in 871us.
-- Data is authenticated: no 

Also, I wanted to install the Signal messenger. But even its repo cannot be resolved:

❯ nslookup updates.signal.org
Server:		127.0.0.53
Address:	127.0.0.53#53

** server can't find updates.signal.org: SERVFAIL


~                                                                                                                                                                 
❯ resolvectl query updates.signal.org
updates.signal.org: resolve call failed: DNSSEC validation failed: failed-auxiliary

My systemd-resolved status and version if you want to reproduce. ( servers configured are Quad9 and Cleanbrowsing anti-malware DNS resolvers )

Global
       LLMNR setting: no
MulticastDNS setting: no
  DNSOverTLS setting: opportunistic
      DNSSEC setting: allow-downgrade
    DNSSEC supported: yes
  Current DNS Server: 9.9.9.11
         DNS Servers: 9.9.9.11
                      149.112.112.11
Fallback DNS Servers: 185.228.168.9
                      185.228.169.9
          DNS Domain: ~.
          DNSSEC NTA: 10.in-addr.arpa
...
...

❯ systemd --version
systemd 245 (245.4-4ubuntu3.2pop1~1596049172~20.04~c9d8f2b)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

This is why I turned off DNSSEC. As @Nothing4You mentioned, it was hampering my productivity.

The only commonality I found between these two domains was that they both have CNAME records pointing to a cloudfront subdomain. (Below responses received after turning off DNSSEC, restarting systemd-resolved and flushing the cache).

❯ resolvectl query sh.rustup.rs
sh.rustup.rs: 13.33.144.77                     -- link: wlo1
              13.33.144.120                    -- link: wlo1
              13.33.144.44                     -- link: wlo1
              13.33.144.50                     -- link: wlo1
              (dks7yomi95k2d.cloudfront.net)

-- Information acquired via protocol DNS in 927us.
-- Data is authenticated: no

~                                                                                                                                                                 07:48:01 PM
❯ resolvectl query updates.signal.org
updates.signal.org: 13.35.254.78               -- link: wlo1
                    13.35.254.63               -- link: wlo1
                    13.35.254.121              -- link: wlo1
                    13.35.254.124              -- link: wlo1
                    (d1ugeilqty6idm.cloudfront.net)

-- Information acquired via protocol DNS in 1.0ms.
-- Data is authenticated: no

Funnily enough, the presence of CNAME records itself does not seem to be the cause of the problem, as it does not seem to affect the VS Code repo (checked below after turning DNSSEC back to allow-downgrade) or the GitHub raw files domain, hosted on Azure and Fastly respectively.

❯ resolvectl query sh.rustup.rs
sh.rustup.rs: resolve call failed: DNSSEC validation failed: failed-auxiliary

~                                                                                                                                                                 08:05:32 PM
❯ resolvectl query packages.microsoft.com
packages.microsoft.com: 20.188.102.6           -- link: wlo1
                        (csd-apt-sea-d-2.southeastasia.cloudapp.azure.com)

-- Information acquired via protocol DNS in 320.5ms.
-- Data is authenticated: no

~                                                                                                                                                                 08:05:35 PM
❯ nslookup packages.microsoft.com
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
packages.microsoft.com	canonical name = apt-geofence-parent.trafficmanager.net.
apt-geofence-parent.trafficmanager.net	canonical name = csd-apt-sea-d-2.southeastasia.cloudapp.azure.com.
Name:	csd-apt-sea-d-2.southeastasia.cloudapp.azure.com
Address: 20.188.102.6

~
❯ resolvectl query raw.githubusercontent.com
raw.githubusercontent.com: 151.101.8.133       -- link: wlo1
                           (github.map.fastly.net)

-- Information acquired via protocol DNS in 146.9ms.
-- Data is authenticated: no
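The CNAME comparison above was done by hand; with more names it may be easier to group the results by where the CNAME points. A rough sketch, assuming the `resolvectl query` outputs were captured with DNSSEC turned off (so that the canonical name is printed at all). The function names are hypothetical, and taking the last two labels as the "provider" is a crude simplification that ignores the public suffix list.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def canonical_name(output: str) -> Optional[str]:
    """Return the name resolvectl prints in parentheses, if any."""
    m = re.search(r"^\s+\(([^()\s]+)\)\s*$", output, re.MULTILINE)
    return m.group(1) if m else None


def group_by_cname_suffix(results: Dict[str, str]) -> Dict[str, List[str]]:
    """Group queried names by the last two labels of their canonical name."""
    groups = defaultdict(list)
    for name, output in results.items():
        target = canonical_name(output)
        suffix = ".".join(target.split(".")[-2:]) if target else "(no CNAME)"
        groups[suffix].append(name)
    return dict(groups)
```

For the outputs in this comment, sh.rustup.rs and updates.signal.org would land in the cloudfront.net group, while packages.microsoft.com and raw.githubusercontent.com land in azure.com and fastly.net.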

@marka63

@marka63 marka63 commented Oct 26, 2020

@ian-kelling The problem with savannah.gnu.org was that there weren't delegating NS records in the gnu.org zone. With DNSSEC, delegation errors like this are exposed because the DS response comes from the parent zone. With plain DNS, when the parent and child zones are both served by the same set of servers, clients never see referral responses for the child zone, so the lack of NS records is not obvious.

NSEC proving non-existence of savannah.gnu.org/DS: The NS bit was not set in the bitmap of the NSEC RR corresponding to the delegated name (savannah.gnu.org).

@marka63

@marka63 marka63 commented Oct 26, 2020

Looking at this ticket there is no evidence of a bug as the reports do not capture the DNSSEC records (DNSKEY, DS, RRSIG) at the time of the issue.

There is evidence of DNSSEC failures due to operator error, e.g. savannah.gnu.org was not properly delegated (no NS records in gnu.org for savannah.gnu.org), clocks being wrong or zone not being re-signed in time (signature-expired being reported).
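The clock and re-signing failures mentioned above both come down to whether the RRSIG validity window covers the current time. A minimal sketch of that check, assuming the timestamps are in the 14-digit YYYYMMDDHHmmSS UTC form that dig prints; the helper name and sample values are illustrative, and the serial-number arithmetic that RFC 4034 specifies for the wire format is deliberately left out.

```python
from datetime import datetime, timezone
from typing import Optional

_FMT = "%Y%m%d%H%M%S"


def _parse(ts: str) -> datetime:
    """Parse an RRSIG presentation-format timestamp as UTC."""
    return datetime.strptime(ts, _FMT).replace(tzinfo=timezone.utc)


def rrsig_covers(inception: str, expiration: str,
                 now: Optional[datetime] = None) -> bool:
    """True if inception <= now <= expiration.

    Note that in presentation format the expiration field comes first
    and the inception field second.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return _parse(inception) <= now <= _parse(expiration)


# Sample values: a signature valid from 2019-03-14 until 2030-01-01.
when = datetime(2020, 10, 27, tzinfo=timezone.utc)
print(rrsig_covers("20190314080032", "20300101000000", now=when))
```

A local clock that is far off shifts `now` outside the window in the same way an expired signature does, which is why the two causes look alike from the resolver's side.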

DNSSEC errors aren't hard to diagnose if you actually take a little bit of time to learn how DNSSEC works:

  • The RRSIGs need to have the timestamps that cover 'now'
  • For each DS record in the parent zone there needs to be a matching DNSKEY record in the child zone. 99.999% of the time you can do this by looking at the key id/tag and algorithm fields of the DS records and have tools like dig report the key tag of the DNSKEY record
  • Is there a NSEC/NSEC3 at/for the delegation point with NS bit set (this captures missing delegating NS RRsets) or is it in a OPTOUT range (NSEC3 only)?
[beetle:~/git/bind9] marka% dig ds isc.org
;; BADCOOKIE, retrying.

; <<>> DiG 9.15.4 <<>> ds isc.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26316
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: e8e390a69330bfc6010000005f975e0cd4a52b2db50b69d9 (good)
;; QUESTION SECTION:
;isc.org.			IN	DS

;; ANSWER SECTION:
isc.org.		12869	IN	DS	7250 13 2 A30B3F78B6DDE9A4A9A2AD0C805518B4F49EC62E7D3F4531D33DE697 CDA01CB2

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 27 10:38:52 AEDT 2020
;; MSG SIZE  rcvd: 112

[beetle:~/git/bind9] marka% 
[beetle:~/git/bind9] marka% dig +rrcomments dnskey isc.org
;; BADCOOKIE, retrying.

; <<>> DiG 9.15.4 <<>> +rrcomments dnskey isc.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5342
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: daafec85f030b7bb010000005f975c5b3b5f84aa51f0b55c (good)
;; QUESTION SECTION:
;isc.org.			IN	DNSKEY

;; ANSWER SECTION:
isc.org.		5930	IN	DNSKEY	256 3 13 1CS+VQcRn4lGTK+b3wDjVO0hFDx4DV7s3Q1Fwxuq9ahd255FRny4f4vd ZOMMMxpbRH5Zhwoh/706IV0v9JwjlA==  ; ZSK; alg = ECDSAP256SHA256 ; key id = 27566
isc.org.		5930	IN	DNSKEY	257 3 13 zEoOfseNFDM+E8spu7RR2Ar/GzFqAehe4yapWLiv6McIUF6xmI5GcIQ3 +uLAizS2cNWHt6EArVj8ogjtrRXwfw==  ; KSK; alg = ECDSAP256SHA256 ; key id = 7250

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 27 10:31:39 AEDT 2020
;; MSG SIZE  rcvd: 224

[beetle:~/git/bind9] marka% 
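The DS/DNSKEY comparison from the checklist above can also be done without relying on dig's "key id" comment, by computing the key tag from the DNSKEY fields. This sketch follows the checksum described in RFC 4034 Appendix B (valid for algorithms other than 1, RSA/MD5); the function names are made up, and the sample values are copied from the isc.org output above.

```python
import base64
import struct


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key_b64: str) -> bytes:
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, public key."""
    # dig splits long base64 keys with spaces, so strip all whitespace first.
    key = base64.b64decode("".join(public_key_b64.split()))
    return struct.pack("!HBB", flags, protocol, algorithm) + key


def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even offsets are the high octet of a 16-bit word, odd offsets the low one.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_matches_dnskey(ds_tag: int, ds_alg: int, flags: int, protocol: int,
                      algorithm: int, public_key_b64: str) -> bool:
    """Compare the key tag and algorithm fields of a DS with a DNSKEY."""
    return ds_alg == algorithm and ds_tag == key_tag(
        dnskey_rdata(flags, protocol, algorithm, public_key_b64))


if __name__ == "__main__":
    # KSK from the dig output above; the DS record lists key tag 7250, algorithm 13.
    ksk = ("zEoOfseNFDM+E8spu7RR2Ar/GzFqAehe4yapWLiv6McIUF6xmI5GcIQ3 "
           "+uLAizS2cNWHt6EArVj8ogjtrRXwfw==")
    print("computed key tag:", key_tag(dnskey_rdata(257, 3, 13, ksk)))
    print("matches DS:", ds_matches_dnskey(7250, 13, 257, 3, 13, ksk))
```

Matching tag and algorithm is the quick check; a full comparison would also hash the owner name plus RDATA and compare it with the DS digest, which this sketch leaves out.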

@toreanderson
Contributor

@toreanderson toreanderson commented Oct 27, 2020

Looking at this ticket there is no evidence of a bug as the reports do not capture the DNSSEC records (DNSKEY, DS, RRSIG) at the time of the issue.

Actually, the only thing that should matter is that there were (and still are) no DS records for gnu.org in the parent zone, as I noted in a previous comment.

That is, dig @a0.org.afilias-nst.info. DS +norec +short gnu.org. yields no output.

This means anything below *.gnu.org is by the RFC definition considered Insecure, regardless of any incorrectly configured DNSSEC records within the gnu.org zone. However, systemd-resolved instead responds with SERVFAIL as if it was Bogus.

@marka63

@marka63 marka63 commented Oct 27, 2020

Using test zones (https://workbench.sidnlabs.nl/bad-dnssec.html) may help this one move along as one won't be testing against moving targets (a.k.a. production zones).

nods.bad-dnssec.wb.sidnlabs.nl and ok.nods.bad-dnssec.wb.sidnlabs.nl are signed zones that should validate as insecure, and do with BIND 9 (below). They should also approximate savannah.gnu.org's original state (although these aren't missing the delegating NS RRset). I don't have access to systemd to perform the same lookups with it.

% dig nods.bad-dnssec.wb.sidnlabs.nl +dnssec ds
;; BADCOOKIE, retrying.

; <<>> DiG 9.15.4 <<>> nods.bad-dnssec.wb.sidnlabs.nl +dnssec ds
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38323
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: e50f3755cba96aaf010000005f97954ef0b6558754c32b3f (good)
;; QUESTION SECTION:
;nods.bad-dnssec.wb.sidnlabs.nl.	IN	DS

;; AUTHORITY SECTION:
nods.bad-dnssec.wb.sidnlabs.nl.	3000 IN	RRSIG	NSEC 8 5 3600 20300101000000 20190314080032 56725 bad-dnssec.wb.sidnlabs.nl. 8+U1ILt3EFDGV6IjTCcJK7F8R9C1dShB3oTEBD8FKfLSKuRp88EDxS+S Ts0LcIYG+vtSi+1yBwR3Pdr66mHjINCAGUKqiiNJ+oB6HzJJx046XYcY UuZ81U4oBopKbM4A+sgC/D3sZJb8rkBCNN5J2jConGATlowa1PZIuHfM Dx0=
nods.bad-dnssec.wb.sidnlabs.nl.	3000 IN	NSEC	ok.bad-dnssec.wb.sidnlabs.nl. NS RRSIG NSEC
bad-dnssec.wb.sidnlabs.nl. 3000	IN	SOA	bind9.sidnlabs.nl. hostmaster.sidnlabs.nl. 1552550431 3600 600 1814400 3600
bad-dnssec.wb.sidnlabs.nl. 3000	IN	RRSIG	SOA 8 4 3600 20300101000000 20190314080032 56725 bad-dnssec.wb.sidnlabs.nl. wUgXV/JHKlz8GIholZycjvITAxLxdd+pXrhtktqkygOQUPHGgCcBdwS3 eu46co0AdSpStF1IcU4x32vrrESRO/SFssPPFEk+8JXLzbn0sr/mpX68 +DrCIJMMM7YEMdgY5jxclBcrU39Y+jfTLHlLeZYc4DnBtbzppBjBijtu AhQ=

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 27 14:34:38 AEDT 2020
;; MSG SIZE  rcvd: 571

% dig nods.bad-dnssec.wb.sidnlabs.nl +dnssec 
;; BADCOOKIE, retrying.

; <<>> DiG 9.15.4 <<>> nods.bad-dnssec.wb.sidnlabs.nl +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2800
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: fdf3ef8441d11c1e010000005f9795522c4c7edaed50c970 (good)
;; QUESTION SECTION:
;nods.bad-dnssec.wb.sidnlabs.nl.	IN	A

;; ANSWER SECTION:
nods.bad-dnssec.wb.sidnlabs.nl.	3016 IN	A	94.198.159.39
nods.bad-dnssec.wb.sidnlabs.nl.	3016 IN	RRSIG	A 8 5 3600 20300101000000 20190314080032 37516 nods.bad-dnssec.wb.sidnlabs.nl. Ju+7AZPiF/y7b4TlQnVkSaPpakSE0Zwx43d6ycUg3F7GZaOOjJaBgPr1 uLmzSEQfUGt1/vk2A8rJfLlZixvE/tn0hzs5EeK//hYIyc6sE3GI8Y3r UgruDJL/+jbkA7S4QMaP5fnqO9AQ1Xdt6m2WK4va1CdpSBXfgQNM2h7T 92A=

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 27 14:34:42 AEDT 2020
;; MSG SIZE  rcvd: 293

% dig ok.nods.bad-dnssec.wb.sidnlabs.nl +dnssec
;; BADCOOKIE, retrying.

; <<>> DiG 9.15.4 <<>> ok.nods.bad-dnssec.wb.sidnlabs.nl +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33628
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: 84b169de142ae078010000005f9797b4ddad1c97c175ab24 (good)
;; QUESTION SECTION:
;ok.nods.bad-dnssec.wb.sidnlabs.nl. IN	A

;; ANSWER SECTION:
ok.nods.bad-dnssec.wb.sidnlabs.nl. 2736	IN A	94.198.159.39
ok.nods.bad-dnssec.wb.sidnlabs.nl. 2736	IN RRSIG A 8 6 3600 20300101000000 20190314080032 35635 ok.nods.bad-dnssec.wb.sidnlabs.nl. imtz3VtbBcuwe9homxESvno1+KgRrvR9wHP/k57onYPZe0XKwspAmvwE Ebz9SdOho2OLWhTyXgVY+INDSg8HarcTGcsHHX4jVCcp2t6h82glZ+hp CkXxfGgBEa5vmbLdPOe5ARrrRsxylwQJZDRqjDV6yMRa+mf3LO+LlhfX I8Q=

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 27 14:44:52 AEDT 2020
;; MSG SIZE  rcvd: 299
 %

@marka63

@marka63 marka63 commented Oct 28, 2020

Discussing this on another channel full of DNS developers from multiple vendors. It appears that having two levels without DS records fails. Getting people to turn on DNSSEC is hard enough without implementation errors like this continuing to exist for years after they are reported.

% resolvectl query nods.nods.bad-dnssec.wb.sidnlabs.nl.           
nods.nods.bad-dnssec.wb.sidnlabs.nl.: resolve call failed: DNSSEC validation failed: failed-auxiliary

@marka63

@marka63 marka63 commented Oct 28, 2020

And to show that it does work through other validators:

% dig nods.nods.bad-dnssec.wb.sidnlabs.nl +dnssec
;; BADCOOKIE, retrying.

; <<>> DiG 9.15.4 <<>> nods.nods.bad-dnssec.wb.sidnlabs.nl +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58274
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: 8d1ed38356e986e8010000005f99fd32332edba071597e71 (good)
;; QUESTION SECTION:
;nods.nods.bad-dnssec.wb.sidnlabs.nl. IN	A

;; ANSWER SECTION:
nods.nods.bad-dnssec.wb.sidnlabs.nl. 3600 IN A	94.198.159.39
nods.nods.bad-dnssec.wb.sidnlabs.nl. 3600 IN RRSIG A 8 6 3600 20300101000000 20190314080032 38782 nods.nods.bad-dnssec.wb.sidnlabs.nl. nQPOiEqIDwQgUVc3cKOPYe5+YU1TZO73CDT4b4+0Ylxn5oIaCqyAUToj X8OwzAmn+CnJJqtZOIPkFnm0L7tY+E+kJrWWMh3p1jXf7mPmCRvP71OG OEqHWfXNz3JJVbiAHTZH7TbWTpwn4x0NzncM6FVOlTjxrDlVmOw2jkKk a24=

;; Query time: 3681 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Oct 29 10:22:26 AEDT 2020
;; MSG SIZE  rcvd: 303

%

@marka63

@marka63 marka63 commented Oct 28, 2020

Just for reference systemd has a REALLY bad reputation with other DNS vendors.

yes, I agree; but with other resolvers, we go 'oh right that should be fixed, even if the zone is also broken'; with resolved, we go 'of course, ANOTHER thing that is broken'

@v-fox

@v-fox v-fox commented Oct 29, 2020

@marka63 Yep, now you get it.
That said, I had a similar experience with Unbound acting as a caching aggregator for the real resolvers (multiple dnscrypt-proxy1 instances plus resolved with LLMNR and MulticastDNS), and Unbound is a fancy professional DNS server. It loves to spew SERVFAIL at random, and not just for DNSSEC. In the end I switched to plain dnscrypt-proxy2 (and even disabled "hosts" caching in nscd, which for some reason loves to randomly force IPv6-only connections that immediately fail) once someone went to the trouble of making a distro package. Luckily, dnscrypt-proxy2 automatically prioritizes the servers with the lowest latency, so tuning for your geography or running multiple instances is not necessary. It also actually protects against DNS poisoning by ISPs and government "public info-security agencies", which I have tested in practice.

@marka63

@marka63 marka63 commented Oct 29, 2020

SERVFAILs are often warranted. There are lots of broken implementations of DNS out there, mostly because recursive servers have been too permissive with results so there has been no feedback when implementations get things wrong. Without knowing the query and state of the servers at the time it is impossible to determine if SERVFAIL is valid or not.

That said this is the wrong place to discuss if UNBOUND was correct or not to return SERVFAIL.

@v-fox

@v-fox v-fox commented Oct 29, 2020

> SERVFAILs are often warranted. There are lots of broken implementations of DNS out there, mostly because recursive servers have been too permissive with results so there has been no feedback when implementations get things wrong. Without knowing the query and state of the servers at the time it is impossible to determine if SERVFAIL is valid or not.

Not when request to the same server for the same address either goes through immediately, fails immediately or with ridiculous timeout as if failover servers don't matter and then failure is cached, so it doesn't even try getting the address but then on forced cache wipe it does… or not. Which is also what resolved likes to do.

With such crapshoot I'm starting to think that maybe DNS protocol itself is garbage pile of bad ideas with DNSSEC being cherry on top of it, thus any effort to rely on it is a fool's errand.

> That said this is the wrong place to discuss if UNBOUND was correct or not to return SERVFAIL.

Is there even a caching local server that doesn't do that ? Unbound is widely hailed as example of such server and it's one thing you can find in almost any Linux distro. What I was saying is that resolved is, technically, on par with it. What else it can hope to achieve ?

@marka63

@marka63 marka63 commented Oct 29, 2020

> > SERVFAILs are often warranted. There are lots of broken implementations of DNS out there, mostly because recursive servers have been too permissive with results so there has been no feedback when implementations get things wrong. Without knowing the query and state of the servers at the time it is impossible to determine if SERVFAIL is valid or not.

> Not when request to the same server for the same address either goes through immediately, fails immediately or with ridiculous timeout as if failover servers don't matter and then failure is cached, so it doesn't even try getting the address but then on forced cache wipe it does… or not. Which is also what resolved likes to do.

Well, when people deploy authoritative servers that return A records for A queries and NXDOMAIN for AAAA queries, you get behaviours like that. No protocol-compliant server will do that, as the server knows there are A records at the name and that it should return NODATA responses to the AAAA lookups. Unfortunately there are GLB vendors whose products fail in exactly this way, and when a recursive server has had NXDOMAIN responses for all the nameservers, the recursive server has NOWHERE TO SEND THE LOOKUP. It's been told that all the nameservers DO NOT EXIST. Flushing the cache clears this learnt state, so the lookups work for a while until the glue records are replaced.

And before you say one shouldn't be looking up AAAA records, just about every machine on the internet is dual stacked.
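
The NODATA versus NXDOMAIN behaviour described above can be sketched with a toy authoritative lookup. The zone layout, the `respond` helper and the `ns1.example.` name are all hypothetical; real servers also have to handle wildcards, CNAMEs and empty non-terminals, which this sketch ignores.

```python
def respond(zone, qname, qtype):
    """Return (rcode, answers) for a query against a toy in-memory zone.

    zone maps owner names to {rrtype: [rdata, ...]}. Wildcards, CNAMEs and
    empty non-terminals are deliberately ignored in this sketch.
    """
    rrsets = zone.get(qname)
    if rrsets is None:
        # No records of any type at this name: the name does not exist.
        return "NXDOMAIN", []
    # The name exists. A missing type yields NOERROR with an empty answer
    # section (NODATA), never NXDOMAIN.
    return "NOERROR", list(rrsets.get(qtype, []))


zone = {"ns1.example.": {"A": ["192.0.2.1"]}}
print(respond(zone, "ns1.example.", "A"))
print(respond(zone, "ns1.example.", "AAAA"))
print(respond(zone, "missing.example.", "AAAA"))
```

A server that returned NXDOMAIN for the second query would be telling the resolver that `ns1.example.` has no records at all, which is what poisons the cached state for the A lookup as well.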

> With such crapshoot I'm starting to think that maybe DNS protocol itself is garbage pile of bad ideas with DNSSEC being cherry on top of it, thus any effort to rely on it is a fool's errand.

It's poor implementations, not the protocol. Garbage In - Garbage Out.

> > That said this is the wrong place to discuss if UNBOUND was correct or not to return SERVFAIL.

> Is there even a caching local server that doesn't do that ? Unbound is widely hailed as example of such server and it's one thing you can find in almost any Linux distro. What I was saying is that resolved is, technically, on par with it. What else it can hope to achieve ?

@toreanderson
Contributor

@toreanderson toreanderson commented Oct 29, 2020

> Just for reference systemd has a REALLY bad reputation with other DNS vendors.

> yes, I agree; but with other resolvers, we go 'oh right that should be fixed, even if the zone is also broken'; with resolved, we go 'of course, ANOTHER thing that is broken'

Yes, systemd-resolved's DNSSEC implementation is fundamentally broken and unusable. If enabled, it will eventually degrade into not responding to any queries at all, because it will treat transient lookup failures as a signal that the upstream server does not support DNSSEC at all, and refuse to forward further queries to it. Details here: #6490 (comment)

@drzraf

@drzraf drzraf commented Nov 24, 2020

DNSSEC validation failed for question dmwww.geo.dmcdn.net IN A: failed-auxiliary (dailymotion)

@kevincox

@kevincox kevincox commented Mar 25, 2021

I saw this for updates.cdn-apple.com which appears to be due to a CNAME to updates.g.aaplimg.com. resolvectl flush-caches did not help but systemctl restart systemd-resolved did.

logs (sorry, logging level was not high)
Mar 25 07:29:05 kevinidea systemd[1]: Started Network Name Resolution.
Mar 25 10:41:37 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:41:37 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:41:37 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:41:38 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:41:38 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:41:38 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:41:40 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:41:40 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:41:40 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:41:44 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:41:44 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:41:44 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:45:49 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:45:49 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:45:49 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:45:50 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:45:50 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:45:50 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:45:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:45:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:45:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:45:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:45:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:45:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:14 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:14 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:57:14 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:15 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:15 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:57:15 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:17 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:17 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:17 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:57:21 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:21 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:21 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:57:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:57:57 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:57 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:57 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:57:59 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:57:59 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:57:59 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:58:03 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:58:03 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 10:58:03 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 10:58:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 10:58:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:02:28 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:02:28 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:02:54 kevinidea systemd-resolved[730]: Flushed all caches.
Mar 25 11:02:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:02:56 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:16:46 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:16:46 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:16:46 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:35:05 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:35:05 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:37:28 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:37:28 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:37:36 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:37:36 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:37:36 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:46:04 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:46:04 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:46:04 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:52:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:52:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:52:52 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:52:58 kevinidea systemd-resolved[730]: Flushed all caches.
Mar 25 11:52:59 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:52:59 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:52:59 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:54:51 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:54:51 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:54:51 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:54:53 kevinidea systemd-resolved[730]: Flushed all caches.
Mar 25 11:54:55 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:54:55 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:54:55 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:57:27 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:57:27 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:57:27 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:57:47 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 11:57:47 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:57:47 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:59:36 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary
Mar 25 11:59:36 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary
Mar 25 11:59:36 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN AAAA: failed-auxiliary
Mar 25 12:00:41 kevinidea systemd[1]: Stopping Network Name Resolution...
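
A journal excerpt like the one above is easier to read once it is tallied per question. The following is a small sketch that does so; the regular expression is an assumption based only on the message format visible in these logs, and the helper name `tally_failures` is made up for illustration.

```python
import re
from collections import Counter

# Matches the message format seen in the journal excerpt above.
FAILURE = re.compile(
    r"DNSSEC validation failed for question (\S+) IN (\S+): (\S+)"
)


def tally_failures(lines):
    """Count failures per (name, record type, reason)."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


sample = [
    "Mar 25 10:41:37 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN SOA: failed-auxiliary",
    "Mar 25 10:41:37 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary",
    "Mar 25 10:41:38 kevinidea systemd-resolved[730]: DNSSEC validation failed for question updates.g.aaplimg.com IN A: failed-auxiliary",
    "Mar 25 11:02:54 kevinidea systemd-resolved[730]: Flushed all caches.",
]

for (name, rrtype, reason), count in tally_failures(sample).most_common():
    print(f"{count:4d}  {name} IN {rrtype}: {reason}")
```

For live data, the lines could come from `journalctl -u systemd-resolved` piped into a script that passes `sys.stdin` to `tally_failures`.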

@bluetech
Contributor

@bluetech bluetech commented Mar 26, 2021

This also started happening to us a few days ago.

  • Mostly works fine, DNS resolution in applications only fails sporadically
  • AWS VPC
  • Multiple servers
  • Using the VPC DNS endpoint
  • Debian 10, systemd version 241
  • resolve is configured in nsswitch.conf, resolved is configured with DNSSEC=allow-downgrade (the default).

The logs look like this (reverse order):

Mar 26 14:49:02 systemd-resolved[3272]: DNSSEC validation failed for question foo.bar.eu-central-1.rds.amazonaws.com IN A: failed-auxiliary
Mar 26 14:49:02 systemd-resolved[3272]: DNSSEC validation failed for question foo.bar.eu-central-1.rds.amazonaws.com IN SOA: failed-auxiliary
Mar 26 14:49:02 systemd-resolved[3272]: DNSSEC validation failed for question foo.bar.eu-central-1.rds.amazonaws.com IN DS: failed-auxiliary
Mar 26 14:49:02 systemd-resolved[3272]: DNSSEC validation failed for question eu-central-1.rds.amazonaws.com IN DS: failed-auxiliary
Mar 26 14:49:02 systemd-resolved[3272]: Server 10.40.0.2 does not support DNSSEC, downgrading to non-DNSSEC mode.
Mar 26 14:49:02 systemd-resolved[3272]: DNSSEC validation failed for question amazonaws.com IN DS: failed-auxiliary
Mar 26 14:49:02 systemd-resolved[3272]: Using degraded feature set (UDP+EDNS0) for DNS server 10.40.0.2.

Setting DNSSEC=no and restarting systemd-resolved fixes the issue.

I don't know if the problem is in systemd or AWS, and perhaps this is already fixed in newer systemd versions, but hopefully this helps someone in a similar situation.

@kevincox

@kevincox kevincox commented Mar 26, 2021

@bluetech Did you try just restarting? I am curious if this is purely a local transient issue where resolved gets into a bad state or if it may be due to certain DNS responses in upstream caches or temporarily misbehaving servers.

It also seems to be a clear trend that this is only occurring for servers that don't support DNSSEC. I haven't seen a case reported where a DNSSEC domain has failed.

@gettysburg

@gettysburg gettysburg commented Mar 28, 2021

Is this seriously still open?

Don't have the time to read all the replies since my last visit to the thread, but god almighty.. it can't be that hard, right?

Sorry for the off-topic ramblings - for what it's worth, I haven't encountered these errors anymore after re-installing my system, but I don't think I actually enabled / forced DNSSEC this time.. derp.

@kpfleming
Contributor

@kpfleming kpfleming commented Apr 7, 2021

Yes, this is still happening as of systemd 247. I've got a machine with a fully-validating DNS recursive resolver (PowerDNS Recursive configured to process DNSSEC), and a machine with systemd 247 that has systemd-resolved pointed at that recursor. With DNSSEC=yes in resolved.conf, a few dozen 'no-signature' responses in the systemd-resolved journal (presumably from domains that have broken DNSSEC configurations) result in the server being treated as 'incompatible-server' and all DNS resolution fails from that point until systemd-resolved is restarted.

I can reproduce this at will and provide any journals/logs needed, but this is clearly not a new problem.

@kpfleming
Contributor

@kpfleming kpfleming commented Apr 7, 2021

And it appears this may have been resolved in systemd 248.

@jansuX2

@jansuX2 jansuX2 commented May 12, 2021

> And it appears this may have been resolved in systemd 248.

Doesn't seem so, I just tested it with systemd 248.2-2 and it halted after a while, resolvectl status was not responding and it consumed around 12% CPU for a minute, as always.

Tested with
DNSSEC=yes
DNSOverTLS=yes

@kevincox

@kevincox kevincox commented May 23, 2021

For me this has started happening reliably for updates.cdn-apple.com. I managed to get a trace at debug log level. The trace is using DNSOverTLS and I can reproduce the issue with both 1.1.1.1 and dns.google.

https://gist.github.com/kevincox/c547e0d7caea5513d9b3c9e1ba825681

Unfortunately I am not able to try out 248 quite yet, but other comments suggest that the issue is still present so presumably these logs are still relevant.

@kevincox

@kevincox kevincox commented Sep 26, 2021

This is still happening on 249. I have a number of domains that reliably fail to resolve using systemd-resolved.

% resolvectl --version
systemd 249 (249.4)
+PAM +AUDIT -SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL -ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB -ZSTD -XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified

Log for mybell.bell.ca. This appears to be choking on an invalid SOA record for the domain behind the CNAME (production-mybell.bell.ca). While that is technically "invalid", it isn't clear why it should matter. It does work with DNSSEC disabled.

Log for updates.cdn-apple.com. This looks similar but it is choking on a failed DS response for updates.g.aaplimg.com.

I'm not an expert on DNS, but no other resolver that I have found fails these domains, despite performing DNSSEC validation. Is this a bug in systemd-resolved, or is it rightfully checking something that the others should?

@dbrgn

@dbrgn dbrgn commented Nov 25, 2021

Other subdomains by Apple seem to be affected as well:

$ resolvectl query developer.apple.com
developer.apple.com: resolve call failed: DNSSEC validation failed: failed-auxiliary

Their setup seems a bit weird: https://dnsviz.net/d/developer.apple.com/dnssec/ However, the apple.com zone doesn't seem to be DNSSEC enabled, so I can't quite see why resolved thinks that the DNSSEC validation failed.

@marka63

@marka63 marka63 commented Nov 25, 2021

The servers for g.aaplimg.com are not DNSSEC aware. Validators need to cope with this by checking for insecure delegations higher up the DNS hierarchy. Queries for DS records are expected to fail until all the world has DNSSEC-aware servers. This is a consequence of the DS being in the parent zone and RFC 1034 based DNS servers not knowing this.
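
One way to picture "checking for insecure delegations higher up" is a top-down walk that stops at the first delegation proven to have no DS, so that DS queries further down are never relied upon. The sketch below is a simplified model under stated assumptions: `ds_lookup` is a hypothetical callback, the four result strings are invented for illustration, and the lookup table is illustrative data, not measurements of the real zones.

```python
def ancestors_top_down(name):
    """'a.b.c' -> ['c', 'b.c', 'a.b.c'] (the root itself is omitted)."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def chain_status(name, ds_lookup):
    """Walk from the TLD down and stop at the first insecure delegation.

    ds_lookup(zone) is a hypothetical callback returning one of:
      "signed"   - a validated DS RRset exists
      "insecure" - the parent proves the delegation has no DS
      "no-cut"   - the name is not a zone cut
      "failed"   - the DS query could not be answered or validated
    """
    for zone in ancestors_top_down(name):
        result = ds_lookup(zone)
        if result == "insecure":
            # Everything below an insecure delegation is insecure, so DS
            # queries further down are never issued and cannot fail us.
            return "insecure"
        if result == "failed":
            # A failure above any insecure cut really is a problem.
            return "indeterminate"
    return "signed"


# Illustrative data only, not measured from the real zones.
table = {"com": "signed", "aaplimg.com": "insecure", "g.aaplimg.com": "failed"}
asked = []


def lookup(zone):
    asked.append(zone)
    return table.get(zone, "no-cut")


print(chain_status("updates.g.aaplimg.com", lookup))
print(asked)
```

In this model the failing DS query for the lower zone is never consulted, because the walk has already ended at the insecure delegation above it.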

@kevincox

@kevincox kevincox commented Feb 3, 2022

This also affects cloud.yugabyte.com

@kevincox

@kevincox kevincox commented Apr 2, 2022

I'm also seeing this for *.files.wordpress.com. It's safe to say that DNSSEC with systemd-resolved cannot be recommended for most users, given the many domains for which it is broken.

Example:

% resolvectl query thejenkinscomic.files.wordpress.com
thejenkinscomic.files.wordpress.com: resolve call failed: Connection timed out
Apr 02 19:30:14 kevinryzen systemd-resolved[1329]: [🡕] DNSSEC validation failed for question files.wordpress.com IN SOA: failed-auxiliary
Apr 02 19:30:14 kevinryzen systemd-resolved[1329]: [🡕] DNSSEC validation failed for question thejenkinscomic.files.wordpress.com IN AAAA: failed-auxiliary
Apr 02 19:30:14 kevinryzen systemd-resolved[1329]: [🡕] DNSSEC validation failed for question thejenkinscomic.files.wordpress.com IN A: failed-auxiliary

@marianrh

@marianrh marianrh commented May 8, 2022

I'm encountering the problem on Ubuntu 22.04 (systemd 249) for some domains:

$ resolvectl query www.youtube.com
www.youtube.com: resolve call failed: DNSSEC validation failed: failed-auxiliary

Interestingly, it only occurs when I'm using both DNSSEC and DNS over TLS. My resolved config is:

[Resolve]
Cache=yes
DNSSEC=allow-downgrade
DNSOverTLS=opportunistic

When I remove DNSOverTLS, the DNSSEC error goes away and the query succeeds, which I don't understand at all.
