systemd-resolved stops resolving after some time with "DNSSEC validation failed [...]: incompatible-server" #6490
Comments
|
it appears that resolved comes to the conclusion that your DNS server doesn't properly do DNSSEC. Please run systemd-resolved with the "SYSTEMD_LOG_LEVEL=debug" env var (by doing systemctl edit systemd-resolved, and then adding the two lines |
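The two lines themselves did not survive in the quoted text above. For completeness, the override drop-in presumably looked roughly like this (an assumption, not a quote from the original comment):

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

followed by systemctl restart systemd-resolved so the new environment takes effect.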
|
I'll add the debug log level and will report my findings. Thanks |
|
With debug logging enabled, the issue appears much less frequently, but I had two issues this morning. The systemd-resolved unit contains the override and the environment variable is active, but the journal does not contain a trace of the DNS resolution. Anything I might be doing wrong? |
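A sanity check worth doing in this situation (a suggestion, assuming the drop-in approach above was used): confirm that the environment actually reached the running service, and remember that the variable only takes effect after a restart of the unit, not after a mere daemon-reload.

$ systemctl show systemd-resolved -p Environment
$ sudo systemctl restart systemd-resolved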
|
any chance you can provide a longer log excerpt around the lookup? |
|
Hello Lennart, sure, here is a full session from a few days ago (with the debug logging environment variable present as discussed above, although the journal log does not show any additional information). At 14:46:01 I restarted systemd-resolved and the previously failing domains started to resolve without issues again. |
|
Hmm, is this still reproducible with current systemd? If so, the logs shown already state that a degraded DNS feature level is used for the server. resolved initially uses the most powerful feature level and then downgrades bit by bit if it notices that its requests don't work. Now, given that you are already one level down, something must have happened earlier that made resolved think the server is bad. Hence, if you can reproduce this with v235, any chance you can trigger it again and then look for messages earlier than the failure you are seeing that show when and why resolved decided to downgrade? |
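One way to hunt for those earlier downgrade messages is to search the journal for the feature-level lines resolved logs when it degrades a server; a sketch (the exact wording of the message varies between versions):

$ journalctl -u systemd-resolved | grep -i "feature set"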
|
Yes, the problem is reproducible with current systemd:
I tried to resolve |
|
@poettering I'm very busy this week but I will try v235 next week or over the holidays and report back |
|
BIND seems to have a default query timeout of 10 seconds, so systemd-resolved does not work well with it and often degrades the feature set. Unbound seems to cope better with BIND. I will have a look at Unbound's infra cache module and see whether we can adapt systemd-resolved's timeout and feature-set algorithm to be more compatible with BIND and similar DNS resolver software. I switched from Unbound to systemd-resolved a few days ago, and DNS resolution often stops working when I enable DNSSEC. For a user who doesn't know all the details it will look like the Internet is broken. |
|
Just wanted to add that I observed the same with v238. The issues are exactly the same, and a restart solves them (the upstream servers queried are BIND 9 instances). systemd-resolved slowly degrades (but never attempts to upgrade again after some time?) with what look like 5 s timeouts. I'm not sure how BIND 9 itself behaves when resolving takes time (according to the documentation this is controlled by resolver-query-timeout, with a default and minimum of 10 s, settable up to 30 s), but that seems to confuse systemd's resolver about losing / not receiving expected replies. Whether BIND 9 internally extends the delay up to 30 s(?) for failed queries, I'm not sure (I read somewhere that the ancient BIND 8 behaved this way). Some way to control systemd-resolved's behaviour here (timeouts, or the policy for how and how hard to retry) would be nice to have, for example global and per-link options such as MaxRetries=. Otherwise any timeout, for any reason, is essentially a slow but one-way ticket, with resolved degrading itself to unnecessarily limited functionality or, in the case of DNSSEC=yes, to a non-working state requiring a restart. EDIT: some clarifications |
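For reference, the BIND option being discussed, as a minimal named.conf sketch (an illustration only; older BIND 9 releases interpret the value in seconds while newer ones use milliseconds, so check the ARM for your version):

options {
    // upper bound on how long named keeps working on a recursive client query
    resolver-query-timeout 10;
};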
|
I confirm this issue on Bionic, systemd 237-3ubuntu10.3. |
|
I'm also affected by this issue. Unfortunately, fixing it seems to require some major changes in the way systemd-resolved determines the features that an upstream resolver supports. You are invited to help with #9384. |
|
I have not been able to reproduce this since specifying FallbackDNS. Does anyone have a reliable way to reproduce it (with a rough idea of how long it takes)? |
|
Yes:

$ cat /etc/systemd/resolved.conf
[Resolve]
LLMNR=no
MulticastDNS=no
DNSSEC=yes

$ resolvectl
Global
LLMNR setting: no
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: yes
DNSSEC supported: yes
Current DNS Server: 8.8.8.8
Fallback DNS Servers: 8.8.8.8
                      8.8.4.4
                      2001:4860:4860::8888
                      2001:4860:4860::8844
[...]

(there is no per-link setting)

$ resolvectl flush-caches
$ resolvectl reset-server-features
$ resolvectl query blog.haschek.at
blog.haschek.at: resolve call failed: DNSSEC validation failed: incompatible-server
$ resolvectl query google.com
google.com: resolve call failed: DNSSEC validation failed: incompatible-server
$ resolvectl
Global
LLMNR setting: no
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: yes
DNSSEC supported: no
Current DNS Server: 8.8.8.8
Fallback DNS Servers: 8.8.8.8
                      8.8.4.4
                      2001:4860:4860::8888
                      2001:4860:4860::8844
[...]

resolved needs to be reset to make it usable again:

$ resolvectl reset-server-features
$ resolvectl query google.com
google.com: 216.58.204.110 -- link: enp0s31f6
-- Information acquired via protocol DNS in 55.1ms.
-- Data is authenticated: no |
|
Please, has anyone found a solution for this issue or a workaround? Seems like yes, |
|
I do not have a solution for this issue, but for me it happens approximately every third day on my office workstation (Debian stretch with systemd 239-12~bpo9+1). Since this is an unacceptable situation for a work machine, I had to disable DNSSEC or switch over to a different resolver. |
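For anyone else forced into the same trade-off, disabling (or relaxing) DNSSEC amounts to a change along these lines in /etc/systemd/resolved.conf, followed by a restart of the service (allow-downgrade validates opportunistically instead of failing hard):

[Resolve]
DNSSEC=allow-downgrade
# or: DNSSEC=no

$ sudo systemctl restart systemd-resolved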
|
@RolandRosenfeld This is a known issue. As I said, it would require some changes to systemd-resolved. |
|
Quick and dirty "workaround". |
|
Also observing this on Ubuntu Bionic. Would love to help get to the bottom of this - it's currently frustrating my DNSSEC deployment (with SSHFP records I finally hope to be using...). @eddebc wow... |
|
Ubuntu 18.04 and up are made much worse by the partial fix for #8608, just FYI (the previous merge proposal is what is in Ubuntu's systemd). Once the better fix is merged we plan to backport it to 18.04 LTS. |
|
@BryanQuigley Thanks for the pointer! |
Is this a private repo? I cannot access it.
I can access this perfectly fine, anonymously, not signed in. |
|
I am also experiencing this issue quite often, even though I am using an upstream recursive resolver known to be DNSSEC capable (CloudFlare's 1.1.1.1).

What seems to be happening is that some query is answered with

I am attaching debug-level logs from

Note that at the time I was simply using my computer normally and not trying to do anything specific to provoke the issue. The issue appears to have triggered around these log lines:

The failing query it complains about appears to be this one from the PCAP (note how it took five seconds to complete, so the

This was clearly a transient issue; after having issued

$ resolvectl query services.mozilla.com -t SOA
services.mozilla.com IN SOA ns-679.awsdns-20.net awsdns-hostmaster.amazon.com 1 7200 900 1209600 86400 -- link: tun0
-- Information acquired via protocol DNS in 402.3ms.
-- Data is authenticated: no

My /etc/systemd/resolved.conf:

$ egrep -v '^($|#)' /etc/systemd/resolved.conf
[Resolve]
DNS=1.1.1.1
DNSSEC=yes
Cache=no

I'm running |
|
In case it is of any help to others, here is the workaround I use to alleviate this issue. It watches the journal for systemd-resolved messages reporting the "incompatible-server" DNSSEC failure and then resets the server features. It is an ugly hack, but it works. To use it, drop it into /etc/systemd/system/ and enable it:

# /etc/systemd/system/systemd-resolved-autofix-dnssec.service
[Service]
ExecStart=sh -c 'journalctl -n0 -fu systemd-resolved | grep -m1 "DNSSEC validation failed.*incompatible-server" && resolvectl reset-server-features'
Restart=always

[Install]
WantedBy=systemd-resolved.service |
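Assuming the unit is saved under the file name shown in the comment header, enabling it would be the usual:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now systemd-resolved-autofix-dnssec.service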
|
@poettering I just realised this issue still has the needs-reporter-feedback label you added back in July 2017. I believe that others and I have since provided the feedback requested, so the label can probably be removed now. If not, please advise what additional information is still needed and I will try to provide it as soon as I can. |
|
This issue is still seen in Fedora 31 (with systemd v243-4.gitef67743.fc31) |
|
Also currently facing this issue.
|
|
I just encountered this issue. Disabling DNSSEC, restarting, re-enabling it and restarting again fixed it, but only very temporarily. This is incredibly annoying. Using |
|
@Avamander This has been a known issue for years. Unfortunately, it is mostly ignored by the developers. It would be helpful if someone with contacts at Red Hat, or otherwise to the systemd developers, could highlight the issue. However, as I wrote, it is not an easy issue and requires a rewrite of the DNS server feature detection in systemd-resolved. Please don't add to this issue if you have the same problem; use the emoticons to indicate that you are also affected. |
|
Emoticons only work if someone actually pays attention, and it is clear that isn't the case here. At the very least, if there are no immediate plans to fix this, the documentation should be updated to warn about a critical bug that will bite anyone who enables the feature. |
|
I still have the issue on Ubuntu 20.04 |
|
I got this issue when running Fedora 33 under VirtualBox. |
|
That's because Fedora started using systemd-resolved in Fedora 33. Every Fedora 33 user is going to experience this. |
|
@kpfleming That will not be the case, as F33 disables DNSSEC by default. |
|
It doesn't change the fact that people expect this functionality to work properly. It really doesn't. |
You cannot "disable DNSSEC" on the operating system; DNS libraries and DNS applications determine this by themselves. The real problem appears when these programs and libraries use the information from /etc/resolv.conf to find a real DNS server, and all they get is a nameserver entry pointing to systemd-resolved via "nameserver 127.0.0.53". Additionally, there seem to be some negative caching issues for me on Fedora 33 that prevent me from using systemd-resolved when using my iPhone as a hotspot. It appears some DNS queries are lost while my laptop is connecting to the iPhone, resulting in the hotspot connection not working at all due to DNS failures. |
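For context, the stub file that systemd-resolved hands to those libraries typically looks roughly like this (the exact options line varies by version):

# /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad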
|
@letoams In that case you are talking about a different issue than this one. This issue is about systemd-resolved's tendency, when configured with DNSSEC=yes, to end up in the "incompatible-server" state discussed above. I encourage you to submit a new issue about the behaviour you're describing (assuming nobody already has). |
More like an absolute certainty. It's been totally broken for nearly three years now. |
|
poettering@ca8fe05 will in all likelihood fix this issue |
This adjusts our feature level handling: when DNSSEC strict mode is on, let's never lower the feature level below the lowest DNSSEC mode. Also, when asking whether DNSSEC is supported, always say yes in strict mode. This means that error reporting about transactions that fail because of missing DNSSEC RRs will not report "incompatible-server" but instead "missing-signature" or suchlike.

The main difference here is that DNSSEC failures become local to a transaction, instead of propagating into the feature level we reuse for future transactions. This is beneficial with routers that implement "mostly a DNS proxy", i.e. that propagate most DNS requests 1:1 to their upstream servers, but synthesize local answers for a select few domains. For example, AVM Fritz!Boxes operate that way: they proxy most traffic 1:1 upstream in a DNSSEC-compatible fashion, but synthesize the "fritz.box" domain locally, so that it can be used to configure the router. This local domain cannot be DNSSEC verified; it comes without signatures. Previously this meant that once that domain was resolved the feature level would be downgraded, and we'd thus fail all future DNSSEC attempts. With this change, the immediate lookup for "fritz.box" will fail validation, but all other unrelated future lookups proceed without prejudice.

(While we are at it, also make a couple of other downgrade paths a bit tighter.)

Fixes: systemd#10570 systemd#14435 systemd#6490
|
Still an issue with 247.2 |
|
Fixed by #18624 |
Submission type
systemd version the issue has been seen with
Used distribution
In case of bug report: Expected behaviour you didn't see
In case of bug report: Unexpected behaviour you saw
In case of bug report: Steps to reproduce the problem
upstream resolver is a Knot-DNS resolver
What is the best way to debug this?