resolve call failed: DNSSEC validation failed: failed-auxiliary #9867
Comments
|
I see two problems here that I don't really understand:
UPDATE: I tried the lookup again. This time, it seems the entries just expired, so a new lookup was stuck again for a while. I seem to reproduce the delay fairly consistently too. But this time the lookup eventually succeeded. This time lookup of DNSKEY took from 23:38:30 to 23:39:23. The verdicts were all insecure. But it ended up replying with an IN A address for the domain. |
|
Note that |
|
Ping? I can still reproduce this one, as of current master (v239-751-g49cdae63d168). Any ideas what might be going wrong here? I can try to check back when this actually worked (if it actually has worked before) and try to bisect to see where the problem was introduced... Will try to do that now. |
|
Ping? I can still reproduce this issue consistently. Just tried it with a build that includes @poettering's #11194 and it still doesn't work... Let me know if you'd like me to collect fresh logs for this one, I'd be happy to. Cheers, |
|
I think that the core issue is that while the |
|
I'm wondering if other DNSSEC clients/implementations also have trouble with In other words: Is this a resolved bug? Or is this an actual problem with the way Pragmatically, what can we do about it? Currently I have disabled resolving through resolved on my machines, since not being able to checkout git repos from savannah.gnu.org is a problem I'd rather not live with right now... |
|
Currently running into this on my ArchLinux machine. I'd really not want to need to either disable DNSSEC or use a nonlocal resolver. Also adding that one of the other affected domains is |
No, I asked a gf if she can ping |
|
Can also confirm that Edit: with "local dns server" I mean the one provided by my router, not systemd-resolved. |
|
I also had a lot of problems with resolving various names like My configuration was:
I removed the |
Hmm, workaround doesn't work for me. I have:
$ grep -E "^[^#]" /etc/systemd/resolved.conf
[Resolve]
DNSSEC=allow-downgrade
Cache=yes
The only difference is I don't set DNS explicitly, and per the suggestion I don't have the Btw, I think these values are actually defaults. |
|
There were some updates lately. Unfortunately, I didn't have time to check whether the problem is fixed. My current (working) configuration is: |
|
@fabolhak on the machine I originally encountered this problem, resolving |
|
I have DNSSEC enabled and I was running into Setting My system gets both an IPv4 (dynamic) address and an IPv6 (Comcast, doesn't seem dynamic) address systemd 242 (242.0-3-arch) |
|
I looked into this a bit further, it turns out DNSViz shows there's indeed a problem with It lists delegation status as Bogus for Under Errors it lists:
So it turns out this domain seems to be misconfigured, indeed. Not really sure how to report this to the owners of the domain, so they get it fixed... Also not really sure whether resolved should do something different in this case (such as resolve it without any DNSSEC?), it seems other resolvers are able to resolve this domain, not really sure where the difference is and whether that's really the correct behavior... BTW, I can't reproduce this with Will keep looking... |
|
Reported to hostmaster of gnu.org by e-mail, got a reply saying:
So hopefully that domain will be fixed shortly! Cheers, |
|
It would appear that most resolvers will happily resolve FQDNs such as
$ dig @1.1.1.1 savannah.gnu.org. +dnssec | grep ';; flags'
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
$ dig @8.8.8.8 savannah.gnu.org. +dnssec | grep ';; flags'
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
I think A practical consequence of the current behaviour is that it becomes impossible for a domain owner to gracefully deploy (or remove) DNSSEC signatures without interruption, as he has no control over when the parent (gTLD/ccTLD) zone is reloaded and the DS record becomes visible. It is impossible to transition in an «atomic»/instantaneous manner from a state with no DS in the parent zone and no signatures in the zone itself to a state where the zone is signed and DS records exist in the parent zone (or vice versa), even before taking TTL and caching into account. The normal procedure when deploying DNSSEC is to first sign the zone, wait for all slaves to pick it up and for TTLs to expire, and only then push the DS records to the parent zone via the registrar. This procedure could take hours or even days, during which RFC 4033 section 5 contains some guidance on how to behave in this situation: As I understand it, The RFC goes on to say: And also: However, even though
$ dig @127.0.0.53 savannah.gnu.org. SOA | grep status:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41084
This seems improper to me. The correct thing to do, as I understand it, would be to answer the query without the |
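To make the distinction argued above concrete, here is a minimal Python sketch of how a validating resolver might map the state of a single delegation to a response. It is only an illustration of the RFC 4033 section 5 categories as I read them, not systemd-resolved's code; the names `Status`, `classify` and `respond` and their parameters are made up for this example.

```python
from enum import Enum


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"


def classify(ds_in_parent: bool, ds_absence_proven: bool, signatures_valid: bool) -> Status:
    """Classify one delegation (hypothetical, simplified model)."""
    if not ds_in_parent:
        # A signed proof that no DS exists marks an unsigned (insecure)
        # delegation. Signatures inside the child zone are then irrelevant.
        return Status.INSECURE if ds_absence_proven else Status.BOGUS
    # A DS exists, so the child zone must validate against it.
    return Status.SECURE if signatures_valid else Status.BOGUS


def respond(status: Status, checking_disabled: bool = False):
    """Return (rcode, ad_bit) for a validated answer."""
    if status is Status.BOGUS and not checking_disabled:
        return ("SERVFAIL", False)
    # Insecure data is still answered, just without the AD bit.
    return ("NOERROR", status is Status.SECURE)


# Zone already signed, DS not yet published in the parent:
print(respond(classify(ds_in_parent=False, ds_absence_proven=True, signatures_valid=True)))
```

In this model the "signed zone, no DS yet" window during a DNSSEC rollout comes out as insecure rather than bogus, so the query is answered instead of failing.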
|
Having the same problem. After activating DNSSEC in systemd-resolved, Facebook stopped showing most of the images: systemd version 239 (Kubuntu 18.10). |
|
I've got the same issue on systemd 242.32-3 (Arch).
It doesn't look like there are any errors with the DNSSEC though... Querying my upstream provider DNSSEC-enabled resolvers work fine. |
|
Someone reported on a somewhat similar issue that setting I just tried it, and it helps with this issue too. |
Disabling DNSSEC does work for me, although I'd really prefer not to do that :/ |
|
I've got the same issue with systemd 242.84 on Arch. However, without restarting resolved, even if I manually execute Then I executed |
|
Getting the same exact issue with systemd 243.51-1 on Arch for savannah.gnu.org. Disabling DNSSEC works for me but, as others have already pointed out, I'd prefer not to do that. Let me know if there are any logs I can provide to help with this. edit: not sure if this will be helpful or not, but this bug seems to be quite inconsistent. By that I mean, it depends on what network I'm on. The bug appears on my home (comcast) wifi, but does not when I'm connected to the coffeeshop wifi. |
|
why does everyone think disabling a bugged security feature is a solution? do you also disable your antivirus because it makes your PC slow? |
|
probably because the bugged security feature is preventing productivity. that's off-topic here though. |
@Silur I first noticed the error when I was attempting to install Rust. I could not even download the shell script required (rustup) to do so. The domain it's hosted on is Also, I wanted to install the Signal messenger. But even its repo cannot be resolved My systemd-resolved status and version if you want to reproduce. (servers configured are Quad9 and Cleanbrowsing anti-malware DNS resolvers) This is why I turned off DNSSEC. As @Nothing4You mentioned, it was hampering my productivity. The only commonality I found between these two domains was that they both have CNAME records pointing to a cloudfront subdomain. (Below responses received after turning off DNSSEC, restarting systemd-resolved and flushing the cache). Funnily enough, the presence of CNAME records themselves does not seem to be the cause of the problem. As it does not seem to affect the VS Code repo (checked below after turning DNSSEC back to |
|
@ian-kelling The problem with savannah.gnu.org was that there weren't delegating NS records in the gnu.org zone. With DNSSEC, delegation errors like this are exposed as the DS response comes from the parent zone. With plain DNS and the parent and child zones both being served by the same set of servers, clients never see referral responses for the child zone so the lack of NS records is not obvious. NSEC proving non-existence of savannah.gnu.org/DS: The NS bit was not set in the bitmap of the NSEC RR corresponding to the delegated name (savannah.gnu.org). |
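The NSEC check quoted above can be sketched in a few lines of Python. This is a simplified illustration of the insecure-delegation test as I understand it from RFC 4035 section 5.2 and RFC 6840 section 4.4 (NS bit set, DS and SOA bits clear), ignoring NSEC3 opt-out. The function name and the example bitmaps are hypothetical and are not taken from the actual gnu.org zone.

```python
def proves_insecure_delegation(type_bitmap) -> bool:
    """Return True if an NSEC type bitmap at the delegated name proves an
    insecure delegation: there is a zone cut (NS) but no DS, and the record
    comes from the parent side of the cut (no SOA)."""
    types = {t.upper() for t in type_bitmap}
    return "NS" in types and "DS" not in types and "SOA" not in types


# Hypothetical bitmap without the NS bit, the kind of situation DNSViz flagged:
print(proves_insecure_delegation({"A", "AAAA", "RRSIG", "NSEC"}))  # False

# Hypothetical bitmap for a properly delegated, unsigned child zone:
print(proves_insecure_delegation({"NS", "RRSIG", "NSEC"}))  # True
```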
|
Looking at this ticket there is no evidence of a bug as the reports do not capture the DNSSEC records (DNSKEY, DS, RRSIG) at the time of the issue. There is evidence of DNSSEC failures due to operator error, e.g. savannah.gnu.org was not properly delegated (no NS records in gnu.org for savannah.gnu.org), clocks being wrong or zone not being re-signed in time (signature-expired being reported). DNSSEC errors aren't hard to diagnose if you actually take a little bit of time to learn how DNSSEC works:
|
Actually, the only thing that should matter is that there were (and still are) no That is, This means anything below |
|
Using test zones (https://workbench.sidnlabs.nl/bad-dnssec.html) may help this one move along as one won't be testing against moving targets (a.k.a. production zones). nods.bad-dnssec.wb.sidnlabs.nl and ok.nods.bad-dnssec.wb.sidnlabs.nl are signed zones that should validate as insecure and do with BIND 9 (below). They should also approximate savannah.gnu.org's original state (these aren't missing the delegating NS RRset). I don't have access to systemd to perform the same lookups with it. |
|
Discussing this on another channel full of DNS developers from multiple vendors. It appears that two levels of no DS records fails. Getting people to turn on DNSSEC is hard enough without implementation errors like this continuing to exist for years after they are reported. |
|
And to show that it does work through other validators |
|
Just for reference systemd has a REALLY bad reputation with other DNS vendors. |
|
@marka63 Yep, now you get it. |
|
SERVFAILs are often warranted. There are lots of broken implementations of DNS out there, mostly because recursive servers have been too permissive with results so there has been no feedback when implementations get things wrong. Without knowing the query and state of the servers at the time it is impossible to determine if SERVFAIL is valid or not. That said this is the wrong place to discuss if UNBOUND was correct or not to return SERVFAIL. |
Not when request to the same server for the same address either goes through immediately, fails immediately or with ridiculous timeout as if failover servers don't matter and then failure is cached, so it doesn't even try getting the address but then on forced cache wipe it does… or not. Which is also what resolved likes to do. With such crapshoot I'm starting to think that maybe DNS protocol itself is garbage pile of bad ideas with DNSSEC being cherry on top of it, thus any effort to rely on it is a fool's errand.
Is there even a caching local server that doesn't do that? Unbound is widely hailed as an example of such a server and it's one thing you can find in almost any Linux distro. What I was saying is that resolved is, technically, on par with it. What else can it hope to achieve? |
Well, when people deploy authoritative servers that return A records for A queries and NXDOMAIN for AAAA queries you get behaviours like that. No protocol-compliant server will do that, as the server knows there are A records at the name and that it should return NODATA responses to the AAAA lookups. Unfortunately there are GLB vendors whose products fail in exactly this way and when a recursive server has had NXDOMAIN responses for all the nameservers, the recursive server has NOWHERE TO SEND THE LOOKUP. It's been told that all the nameservers DO NOT EXIST. Flushing the cache clears this learnt state so the lookups work for a while until the glue records are replaced. And before you say one shouldn't be looking up AAAA records, just about every machine on the internet is dual stacked.
It's poor implementations, not the protocol. Garbage In - Garbage Out.
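As a rough illustration of the NXDOMAIN versus NODATA distinction described above, here is a toy Python sketch of how a compliant authoritative server would pick the response code. The zone contents, the `answer` function and the tuple return shape are all assumptions made for this example, and it ignores wildcards, CNAMEs and empty non-terminals.

```python
# Hypothetical zone data: the name has an A record but no AAAA record.
ZONE = {"www.example.com.": {"A": ["192.0.2.10"]}}


def answer(zone, qname, qtype):
    """Return (rcode, records) for a query against a toy zone."""
    rrsets = zone.get(qname)
    if rrsets is None:
        # The name itself does not exist: NXDOMAIN is correct.
        return ("NXDOMAIN", [])
    # The name exists. A missing type is NODATA: NOERROR with an empty answer.
    return ("NOERROR", list(rrsets.get(qtype, [])))


print(answer(ZONE, "www.example.com.", "A"))
print(answer(ZONE, "www.example.com.", "AAAA"))
print(answer(ZONE, "missing.example.com.", "AAAA"))
```

The broken servers described above effectively take the NXDOMAIN branch for the AAAA query, which tells caches that the whole name is gone.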
|
Yes, |
|
|
|
I saw this for logs (sorry, logging level was not high) |
|
This also started happening to us a few days ago.
The logs look like this (reverse order): Setting I don't know if the problem is in systemd or AWS, and perhaps this is already fixed in newer systemd versions, but hopefully this helps someone in a similar situation. |
|
@bluetech Did you try just restarting? I am curious if this is purely a local transient issue where resolved gets into a bad state or if it may be due to certain DNS responses in upstream caches or temporarily misbehaving servers. It also seems to be a clear trend that this is only occurring for servers that don't support DNSSEC. I haven't seen a case reported where a DNSSEC domain has failed. |
|
Is this seriously still open? Don't have the time to read all the replies since my last visit to the thread, but god almighty.. it can't be that hard, right? Sorry for the off-topic ramblings - for what it's worth, I haven't encountered these errors anymore after re-installing my system, but I don't think I actually enabled / forced DNSSEC this time.. derp. |
|
Yes, this is still happening as of systemd 247. I've got a machine with a fully-validating DNS recursive resolver (PowerDNS Recursive configured to process DNSSEC), and a machine with systemd 247 that has systemd-resolved pointed at that recursor. With I can reproduce this at will and provide any journals/logs needed, but this is clearly not a new problem. |
|
And it appears this may have been resolved in systemd 248. |
Doesn't seem so, I just tested it with systemd 248.2-2 and it halted after a while, resolvectl status was not responding and it consumed around 12% CPU for a minute, as always. Tested with |
|
For me this has started happening reliably for https://gist.github.com/kevincox/c547e0d7caea5513d9b3c9e1ba825681 Unfortunately I am not able to try out 248 quite yet, but other comments suggest that the issue is still present so presumably these logs are still relevant. |
|
This is still happening on 249. I have a number of domains that reliably fail to resolve using systemd-resolved. Log for mybell.bell.ca. This appears to be choking on it having an invalid SOA record for the domain behind the CNAME ( Log for updates.cdn-apple.com. This looks similar but it is choking on a failed DS response for I'm not an expert on DNS, but no other resolvers that I have found fail these domains despite performing DNSSEC validation. Is this a bug in systemd-resolved or is it rightfully checking something that the others should? |
|
Other subdomains by Apple seem to be affected as well: Their setup seems a bit weird: https://dnsviz.net/d/developer.apple.com/dnssec/ However, the apple.com zone doesn't seem to be DNSSEC enabled, so I can't quite see why resolved thinks that the DNSSEC validation failed. |
|
The servers for g.applimg.com are not DNSSEC aware. Validators need to cope with this by checking for insecure delegations higher up the DNS hierarchy. Queries for DS records are expected to fail until all the world has DNSSEC aware servers. This is a consequence of the DS being in the parent zone and RFC 1034 based DNS servers not knowing this. |
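One way to picture "checking for insecure delegations higher up" is a top-down walk that stops at the first ancestor whose DS is provably absent, so DS queries never have to reach the servers that are not DNSSEC aware. The Python sketch below is only a model of that idea and not how any particular validator is written. `ancestors`, `chain_status`, the `ds_lookup` callback and the example DS states are all hypothetical.

```python
def ancestors(name):
    """List the zones from the TLD down to the name itself."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def chain_status(name, ds_lookup):
    """Walk down from the TLD. ds_lookup(zone) is assumed to return
    'present', 'absent' (authenticated denial) or 'fail'."""
    for zone in ancestors(name):
        result = ds_lookup(zone)
        if result == "absent":
            # Proven insecure delegation: everything below is insecure,
            # and no further DS queries are needed.
            return "insecure"
        if result == "fail":
            return "bogus"
    return "secure"


# Made-up DS states: the lowest zone's servers cannot answer DS queries.
ds_states = {"com": "present", "example.com": "absent", "g.example.com": "fail"}
asked = []


def lookup(zone):
    asked.append(zone)
    return ds_states[zone]


print(chain_status("g.example.com", lookup), asked)
```

In this made-up case the failing DS lookup for the lowest zone is never issued, because the insecure delegation one level up already settles the answer.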
|
This also affects cloud.yugabyte.com |
|
I'm also seeing this for Example: |
|
I'm encountering the problem on Ubuntu 22.04 (systemd 249) for some domains: Interestingly, it only occurs when I'm using both DNSSEC and DNS over TLS. My resolved config is: When I remove DNSOverTLS, the DNSSEC error goes away and the query succeeds, which I don't understand at all. |

systemd version the issue has been seen with
Latest git (v239-525-g9e5f34a639f6)
Used distribution
Fedora Rawhide
Expected behaviour you didn't see
systemd-resolved resolving all domains.
Unexpected behaviour you saw
systemd-resolved having trouble with specific domains (in my case, savannah.gnu.org)
Steps to reproduce the problem
My ens3.network has:
My resolved.conf has:
Output of resolvectl:
Resolving some domains works fine:
And:
But this one fails, hangs for a long time and finally times out:
systemd-resolved logs are here: https://gist.github.com/filbranden/ad972ab59aaf23e5c8f0013f867db1eb
Issue #9283 looks similar (same error message), not sure if it's a dupe somehow... cc @yuwata and @irtimmer.
Thanks!
Filipe