systemd-resolved: resolve call failed: DNSSEC validation failed: failed-auxiliary #4003
The interesting lines are probably these ones:
The question is why we get EINVAL there...
Still reproducible on Ubuntu Zesty after the systemd upgrade to 232-7.
@mikken hmm, so the lookup you are doing there works fine here with git master, at least. Any chance you could retry the lookup locally with current git master? If the issue is still reproducible, any chance you could add the following lines to the top of resolved-dns-dnssec.c, resolved-dns-trust-anchor.c and resolved-dns-transaction.c:

    #undef EINVAL
    #define EINVAL __LINE__

and recompile systemd? You don't even have to install the git version of systemd for that; all you'd need to do is build it and run resolved directly from the source tree, after temporarily masking the installed resolved (…). The above changes are a nice hack that tells us precisely which line generated the EINVAL... It's a debugging hack. Then run this, trigger the issue and send me the logs...
The test domain will now be […]. The error now looks different:
Logs from the patched git version:
Let's increase a number of timeouts, as they are apparently too short for some real-world lookups. See: systemd#4003 (comment)
In particular, we change the following timeouts:
1) The first UDP retry is increased from 500ms → 750ms. This is a good idea, since some servers take relatively long to respond even to trivial lookups, and giving up on our first attempt also means the next attempt goes to a different server, which has the side effect that we'll run two downgrade iterations in parallel, on both servers. Hence, let's give servers a bit more time in the first iteration.
2) Permit 24 retries per transaction instead of just 16. If we end up downgrading all the way down to UDP for a lookup, we already need 5 iterations for that. If we want to permit a couple of lost packets for each (let's say 4), then we already need 20 iterations.
3) Increase the overall query timeout on the service side to 60s (from 45s), simply because very long and slow DNSSEC + CNAME chains (such as us.ynuf.alipay.com) hit this boundary too easily. The client-side timeout for the bus method call is increased to 90s, in order to leave room for the D-Bus reply to go through.
@mikken uh, seems that domain is actively evil. They respond to requests for SOA with SERVFAIL and delay each such reply by at least 1s. We use SOA lookups when looking for zone cuts, and since we cannot distinguish SERVFAIL meaning "I don't speak the protocol version you are asking for" from "I am a weird domain which always responds SERVFAIL to SOA requests", we end up downgrading our protocol iteratively in the hope that it's just a protocol issue. That takes a lot of time, which eventually makes us hit the timeout... In #5347 I have now pushed a couple of fixes which make things a bit faster, while at the same time making the timeouts longer. The lookup is still super slow due to all the lookups involved (as they combine this with tons of CNAMEs), but at least it works reliably here now. It's a pretty nasty situation...
OK, I guess this is fixed then, since other domains seem to resolve.
I get this error with every domain on the Internet the first time I run systemd-resolve, but the second time it works just fine. It's very strange... and worrying. EDIT: I'm using Fedora 26, systemd v233.
Submission type
systemd version the issue has been seen with
Used distribution
Unexpected behaviour you saw
Resolution fails for some domain names; this is one example:
I saw similar reports in already-closed bugs, but those were supposedly fixed in v231, and this happens in v231.
I can reproduce it with both DNSSEC=yes and DNSSEC=allow-downgrade.
My upstream Unbound server with DNSSEC checks enabled sees no problem with these names.
Some logs: