systemd-resolved fails on SERVFAIL instead of trying another upstream server #7147

Closed
mistralol opened this issue Oct 20, 2017 · 4 comments
Labels
resolve RFE 🎁 Request for Enhancement, i.e. a feature request

@mistralol
Submission type

  • Bug report

systemd version the issue has been seen with

systemd --version
systemd 232
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Used distribution

Ubuntu

In case of bug report: Expected behaviour you didn't see

A working DNS lookup. systemd-resolved did not move on to the next DNS server on failure; it simply cached the failure indefinitely from an upstream DNS server that returned SERVFAIL.

Running systemd-resolve --flush-caches also failed to correct the issue.

Only after restarting systemd-resolved was it able to correctly resolve domain names again.
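
For reference, a minimal sketch of the recovery steps described above (assuming the commands shipped with systemd 232 on Ubuntu; exact output may differ between versions):

$ systemd-resolve --flush-caches            # clearing the cache alone did not help here
$ sudo systemctl restart systemd-resolved   # only a restart recovered name resolution
$ host www.google.co.uk                     # re-test the lookup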

In case of bug report: Unexpected behaviour you saw

SERVFAIL is returned from systemd-resolved. In this case 3 of the 4 name servers work and return the correct answer; the 4th returns SERVFAIL. SERVFAIL is a soft error, so the resolver should move on to the next DNS server on its list, but systemd-resolved fails to do this. NXDOMAIN, by contrast, is a hard failure and should be passed on to the client performing the DNS lookup.

The DNS config looks like this:

Link 3 (enp0s31f6)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 10.66.0.198
                      10.51.50.3
                      10.51.4.61
                      10.51.50.2

Manual lookups against each server give the following:

$ host www.google.co.uk 10.66.0.198
Using domain server:
Name: 10.66.0.198
Address: 10.66.0.198#53
Aliases:

www.google.co.uk has address 216.58.209.131
www.google.co.uk has IPv6 address 2a00:1450:400f:804::2003

$ host www.google.co.uk 10.51.50.3
Using domain server:
Name: 10.51.50.3
Address: 10.51.50.3#53
Aliases:

Host www.google.co.uk not found: 2(SERVFAIL)

$ host www.google.co.uk 10.51.4.61
Using domain server:
Name: 10.51.4.61
Address: 10.51.4.61#53
Aliases:

www.google.co.uk has address 172.217.20.99
www.google.co.uk has IPv6 address 2a00:1450:4007:80c::2003

$ host www.google.co.uk 10.51.50.2
Using domain server:
Name: 10.51.50.2
Address: 10.51.50.2#53
Aliases:

www.google.co.uk has address 216.58.192.35
www.google.co.uk has IPv6 address 2607:f8b0:4008:805::2003

In case of bug report: Steps to reproduce the problem

With one of multiple configured DNS servers still starting up (and therefore returning SERVFAIL):

$ host www.google.co.uk

Host www.google.co.uk not found: 2(SERVFAIL)

Restarting systemd-resolved instantly resolves the error if it selects a different upstream DNS server on restart.
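
Purely as an illustration of the expected fallback behaviour (this is not how systemd-resolved is implemented internally), the per-server lookups above can be chained in a shell loop that moves on to the next configured server whenever one fails, e.g. with SERVFAIL; the addresses are the ones from the link configuration shown earlier:

$ for srv in 10.66.0.198 10.51.50.3 10.51.4.61 10.51.50.2; do
>   if host www.google.co.uk "$srv" >/dev/null 2>&1; then
>     echo "answer obtained from $srv"; break
>   fi
>   echo "$srv failed (e.g. SERVFAIL), trying the next server"
> done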

@poettering
Member

See 201d995 for an explanation of why we cache SERVFAIL.

@poettering added the resolve and RFE 🎁 Request for Enhancement, i.e. a feature request labels Oct 23, 2017
@mistralol
Author

And this 12-year-old discussion around glibc, which ended up in the same situation and resulted in the agreement to try the next server: https://bugzilla.redhat.com/show_bug.cgi?id=160914

From my viewpoint, even though the behaviour can be explained, it does not make for a good end-user experience to be unable to reach "google" because 1 of 4 upstream Windows DNS servers temporarily flaked out, with no recovery short of restarting systemd-resolved. I also tried flushing the cache with "systemd-resolve --flush-caches", but again it did not try the next server.

In my understanding, any upstream server may behave like this at any time, and the condition may be temporary. Examples would include: out-of-memory limits, query rate limits, maximum connection limits, database access problems.

It should not cache this failure indefinitely, or even for a long period of time. RFC 2308 (https://tools.ietf.org/html/rfc2308) restricts this to a period of 5 minutes in section 7.1. I am quite sure that elsewhere in the RFCs, when a resolver sees a SERVFAIL response, it is responsible for sending queries to the other servers.

In my particular case the upstream server actually sent a SERVFAIL for every query, for every domain. Do you really consider this the expected behaviour of a robust system?

@the-maldridge

@mistralol How did you fully flush the cache to recover from this?

poettering added a commit to poettering/systemd that referenced this issue Dec 8, 2017
Currently, we accept SERVFAIL after downgrading fully, cache it and move
on. Let's extend this a bit: after downgrading fully, if the SERVFAIL
logic continues to be an issue, then use a different DNS server if there
are any.

Fixes: systemd#7147
@poettering
Member

Fix waiting in #7591. It would be greatly appreciated if you could test this against your server setup, if it is still showing this behaviour.
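
A possible way to test this is sketched below; it assumes the patched systemd-resolved is installed and that one of the configured upstreams still returns SERVFAIL. The expectation under the patch is that resolved switches to another configured server instead of caching the failure:

$ sudo systemctl restart systemd-resolved
$ systemd-resolve www.google.co.uk      # query through resolved itself
$ systemd-resolve --status              # show the per-link DNS server configuration
$ journalctl -u systemd-resolved -n 50  # look for messages about switching servers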

poettering added a commit to poettering/systemd that referenced this issue Dec 12, 2017, with the same commit message as above.