SERVFAIL not handled well #22

Closed
raylu opened this issue Jan 30, 2013 · 5 comments

Comments

@raylu

raylu commented Jan 30, 2013

python -c 'import dns.resolver; dns.resolver.query("_domainkey.collabfinder.com", "TXT")'

This hangs because nameservers aren't removed from the list for SERVFAIL: https://github.com/rthalley/dnspython/blob/master/dns/resolver.py#L839
The comment is not very helpful in explaining why.

@rthalley
Owner

On 30 Jan 2013, at 03:57, raylu notifications@github.com wrote:

python -c 'import dns.resolver; dns.resolver.query("_domainkey.collabfinder.com", "TXT")'

This hangs because nameservers aren't removed from the list for SERVFAIL: https://github.com/rthalley/dnspython/blob/master/dns/resolver.py#L839
The comment is not very helpful in explaining why.

It doesn't hang, but it will take up to the resolver's lifetime to give up (30 seconds). You can change the timeouts, e.g.

dns.resolver.get_default_resolver().timeout = 1.0 # time to wait for any given server
dns.resolver.get_default_resolver().lifetime = 5.0 # total time to spend on this resolution

Unfortunately the DNS protocol doesn't differentiate between "the server temporarily cannot return an answer, try again" and "the server is broken and can't return your answer" in result codes, using SERVFAIL for both situations. Since you don't know if any given SERVFAIL is a temporary failure or a more enduring one, if you remove the server from the set you risk not getting an answer at all. Perhaps the resolver should have a policy setting saying whether SERVFAIL should be treated as an enduring failure and cause the server to be removed from the set.
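To make the trade-off above concrete, here is a minimal sketch that simulates a stub resolver's retry loop with a SERVFAIL policy switch. It is not dnspython's code: `resolve`, `ask`, the `retry_servfail` flag, the backoff values, and the per-query cost are all assumptions made for illustration, and time is only simulated, so nothing actually waits.

```python
# Hypothetical simulation of the trade-off; not taken from dnspython.
NOERROR = "NOERROR"
SERVFAIL = "SERVFAIL"


def resolve(nameservers, ask, lifetime=30.0, retry_servfail=True):
    """Simulate a stub resolver's retry loop.

    ask(server) must return (rcode, seconds_spent). No real time passes:
    the loop only adds up the simulated seconds.
    Returns (outcome, total_seconds).
    """
    candidates = list(nameservers)
    spent = 0.0
    backoff = 0.10  # assumed pause between passes over the server list
    while candidates:
        for server in list(candidates):
            if spent >= lifetime:
                return ("timeout", spent)
            rcode, seconds = ask(server)
            spent += seconds
            if rcode == NOERROR:
                return ("answer", spent)
            if rcode == SERVFAIL and not retry_servfail:
                # Treat SERVFAIL as enduring: drop the server for good.
                candidates.remove(server)
        if candidates:
            spent += backoff
            backoff = min(backoff * 2, 2.0)
    return ("no nameservers", spent)


def always_servfail(server):
    """A server that is broken for this query and never recovers."""
    return (SERVFAIL, 0.05)


def make_flaky():
    """Build an ask() whose servers fail once with SERVFAIL, then answer."""
    seen = set()

    def ask(server):
        if server in seen:
            return (NOERROR, 0.05)
        seen.add(server)
        return (SERVFAIL, 0.05)

    return ask


servers = ["192.0.2.1", "192.0.2.2"]  # documentation-only addresses
for retry in (True, False):
    print("broken servers, retry =", retry,
          resolve(servers, always_servfail, retry_servfail=retry))
    print("flaky servers,  retry =", retry,
          resolve(servers, make_flaky(), retry_servfail=retry))
```

In this sketch, retrying on SERVFAIL burns the whole lifetime when the servers are truly broken, but recovers an answer when the failure was transient. Removing the server on SERVFAIL fails fast in both cases, including the one where a second try would have worked.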

/Bob

@raylu
Author

raylu commented Jan 30, 2013

In my understanding, SERVFAIL is

Server failure - The name server was unable to process this query due to a problem with the name server.

I don't see anything in RFC 1035 about temporary failure. Am I (as often happens when dealing with RFCs) reading the wrong document?

@rthalley
Owner

On 30 Jan 2013, at 20:45, raylu notifications@github.com wrote:

In my understanding, SERVFAIL is

Server failure - The name server was unable to process this query due to a problem with the name server.

I don't see anything in RFC 1035 about temporary failure. Am I (as often happens when dealing with RFCs) reading the wrong document?

Unfortunately, that's about all there is about SERVFAIL in the standards. All RFC 1035's "Server failure" error promises is that "this query" couldn't be processed due to "a problem with the name server". It doesn't say anything one way or another about why the query failed, whether the failure is transient or due to a more enduring problem, or whether a subsequent attempt at the same query is likely to fail again.

Section 5.3.3, item 4.d of RFC 1034 says

     d. if the response shows a servers failure or other
        bizarre contents, delete the server from the SLIST and
        go back to step 3.

which would seem to support removing a server from the list upon receiving SERVFAIL, at least if you're a full resolver (dnspython is a stub resolver). But even here it's still a trade-off because, as I said, SERVFAIL is a catch-all. It might be that the server had a very temporary resource issue, and if you just asked again you'd get the answer you wanted.

I don't know of any further general clarification of SERVFAIL in subsequent DNS RFCs.

I will ponder how to put some kind of SERVFAIL policy knob into the resolver, but in the meantime your best bet is to lower your timeouts as I said in my prior message.

/Bob

@raylu
Author

raylu commented Jan 30, 2013

Thanks for looking into this. We have lowered our timeouts and that has taken care of our issue for now.

host and dig both seem to respond instantly with an error.

It also seems unreasonable to expect a retry after a SERVFAIL to work in the real world. If there's an issue with the server, that's their problem (other DNS resolvers will fail immediately anyway).

@rthalley
Owner

By default, we now do not retry a nameserver that returns SERVFAIL.
