Do not retry the same name server on a negative response #1589
Codecov Report
@@ Coverage Diff @@
## main #1589 +/- ##
==========================================
- Coverage 79.49% 79.49% -0.00%
==========================================
Files 180 180
Lines 17971 17973 +2
==========================================
+ Hits 14286 14287 +1
- Misses 3685 3686 +1
These changes look good to me. Looking at the changes here, it's difficult to understand how the trust factor still plays into it. Are there sufficient comments that discuss the interaction of name server pool retries and `RetryDnsHandle` retries? Now that you have state in your head, it would be good to make sure it's sufficiently documented.
(It might also be good to rebase this on top of current main to deal with the tokio issue in CI.)
That's a good point. The
That's a great point. I don't think their interaction is very well documented, partly because they are in separate crates (the name server pool is part of the
Force-pushed from 88253e3 to 336632b.
I've rebased on top of current main and added some more documentation on
Force-pushed from 336632b to 4471bdf.
@peterthejohnston the documentation changes here look great, thanks!
Currently in Trust-DNS, there are two mechanisms that allow failed queries to be retried by the resolver:

1. `RetryDnsHandle` reattempts queries against the name server pool if they fail.
2. The name server pool retries basically any unsuccessful response against fallback name servers, unless it gets a trusted `NoRecordsFound` error, which occurs when `NameServerConfig::trust_nx_responses` is true for that server and the resolver received an empty `NXDomain` response.

The `RetryDnsHandle` uses the `RetryableError` trait to determine if an error should be retried. However, the implementation of `RetryableError::should_retry` for `ResolveError` uses the same criteria as the name server pool, which I think is not the desired behavior. This leads to the same query being retried against the same name server when it shouldn't be.

For example (this was real behavior observed when testing the resolver on a device running Fuchsia):
1. The resolver gets a `NODATA` response from the first name server, and the name server pool retries the query on the other ones, getting the same response.
2. `RetryDnsHandle` then retries the entire query over the whole name server pool again, because it got an error for which `RetryableError::should_retry` is `true`. This happens `ResolverOpts::attempts` times.

In effect, we send this query 3 (# of name servers) * 3 (# of total attempts) = 9 times, 3 times to each name server. The name server pool is used correctly here to retry on a negative response; however, the `RetryDnsHandle` should probably only retry on IO errors (e.g. we failed to connect to a given server) or other errors for which it's reasonable to ask the same name server again. If we successfully get a negative response from a server, e.g. a `NODATA` response, it doesn't make sense to expect an OK response on retry, so we should not retry the query against that same name server.

The desired end state is one where, if the resolver encounters no IO errors, at most one query is made to each name server in the pool.
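To make the query arithmetic concrete, here is a minimal Rust sketch that models the two retry layers described above. Everything in it is hypothetical: `QueryError`, `pool_tries_next_server`, `retry_handle_old`, `retry_handle_new` and `count_queries` are illustrative names, not Trust-DNS APIs. It assumes every name server returns the same response and that each `RetryDnsHandle` attempt is one full pass over the pool.

```rust
/// Hypothetical, simplified stand-in for the resolver's error type
/// (not the real Trust-DNS `ResolveError`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum QueryError {
    /// Transport failure, e.g. the resolver could not connect to the server.
    Io,
    /// The server answered, but with no records (e.g. NODATA or NXDomain).
    /// `trusted` models `NameServerConfig::trust_nx_responses` being set.
    NoRecordsFound { trusted: bool },
}

/// Pool rule from the description: fall back to the next name server on
/// anything except a trusted `NoRecordsFound`.
fn pool_tries_next_server(err: QueryError) -> bool {
    !matches!(err, QueryError::NoRecordsFound { trusted: true })
}

/// Retry-handle rule before this change: the same criteria as the pool.
fn retry_handle_old(err: QueryError) -> bool {
    pool_tries_next_server(err)
}

/// Retry-handle rule proposed here: only rerun the pool on IO errors.
fn retry_handle_new(err: QueryError) -> bool {
    matches!(err, QueryError::Io)
}

/// Counts the queries sent when every one of `servers` name servers returns
/// `response`, with up to `attempts` passes over the pool.
fn count_queries(
    servers: usize,
    attempts: usize,
    response: QueryError,
    should_retry: fn(QueryError) -> bool,
) -> usize {
    let mut sent = 0;
    for _ in 0..attempts {
        // One pass over the name server pool.
        for _ in 0..servers {
            sent += 1;
            if !pool_tries_next_server(response) {
                break;
            }
        }
        // The retry handle decides whether to run another full pass.
        if !should_retry(response) {
            break;
        }
    }
    sent
}
```

With 3 name servers and 3 attempts, `count_queries(3, 3, QueryError::NoRecordsFound { trusted: false }, retry_handle_old)` returns 9, while the same call with `retry_handle_new` returns 3 (one query per server). An IO failure still uses all 9 queries under the new rule, since asking the same server again is reasonable there.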