-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose TTLs on lookups #444
Conversation
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
|
resolver/src/lookup.rs
Outdated
@@ -62,7 +81,14 @@ impl Lookup { | |||
rdatas.extend_from_slice(&*self.rdatas); | |||
rdatas.extend_from_slice(&*other.rdatas); | |||
|
|||
Self::new(Arc::new(rdatas)) | |||
// If both lookups have TTLs, choose the lower one. | |||
match (self.ttl(), other.ttl()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we need to compare the expiration times, not the TTLs, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on what you want to track here. The raw TTL, or the time in which the the record is no longer valid. The RFCs definitely recommend the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great. Thanks for the PR! I’ll merge this in once the tests pass.
Edit: I missed @briansmith comments. let's work that out.
resolver/src/lookup.rs
Outdated
).wait() | ||
.unwrap() | ||
.ttl(), | ||
Some(Duration::from_secs(86400)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have a test for CNAME, in particular where the CNAME expires earlier than the A records.
It would also be good to have a test for NXDOMAIN that shows how this works for NXDOMAIN. In particular, "The remaining of the current meanings, of being the TTL to be used for negative responses, is the new defined meaning of the SOA minimum field." from https://tools.ietf.org/html/rfc2308; see https://tools.ietf.org/html/rfc2308#section-5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some tests for TTLs as they relate to CNAME: https://github.com/bluejekyll/trust-dns/blob/master/resolver/src/lookup_state.rs#L946
We might have a gap on the NXDOMAIN/NoData responses, in regards to TTL.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Codecov Report
@@ Coverage Diff @@
## master #444 +/- ##
=========================================
+ Coverage 86.54% 86.64% +0.1%
=========================================
Files 133 133
Lines 13391 13399 +8
=========================================
+ Hits 11589 11610 +21
+ Misses 1802 1789 -13
Continue to review full report at Codecov.
|
resolver/src/dns_lru.rs
Outdated
@@ -63,7 +63,7 @@ impl DnsLru { | |||
let ttl_until = now + ttl; | |||
|
|||
// insert into the LRU | |||
let lookup = Lookup::new(Arc::new(rdatas)); | |||
let lookup = Lookup::new_with_ttl(Arc::new(rdatas), ttl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per @briansmith recommendation, it seems ttl_until is what is wanted here, no? I missed that before
I agree, that seems much more correct. In 75cbbcd and ffc9cae, I've modified this PR so that the new field represents the expiration time for the lookup rather than the TTL. |
resolver/src/dns_lru.rs
Outdated
@@ -63,7 +63,7 @@ impl DnsLru { | |||
let ttl_until = now + ttl; | |||
|
|||
// insert into the LRU | |||
let lookup = Lookup::new(Arc::new(rdatas)); | |||
let lookup = Lookup::new_with_ttl(Arc::new(rdatas), ttl_until); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest renaming things to be consistent. Your new API functions is called valid_until
. I suggest renaming the ttl_util
variable here the same thing. I also suggest renaming Lookup::new_with_ttl
to Lookup::new_valid_until
. Then it becomes easier to understand the code holistically.
resolver/src/lookup.rs
Outdated
} | ||
|
||
/// Return a new instance with the given rdatas and TTL. | ||
pub fn new_with_deadline(rdatas: Arc<Vec<RData>>, ttl: Instant,) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ttl
should be renamed valid_until
.
resolver/src/lookup.rs
Outdated
(Some(my_ttl), Some(other_ttl)) => | ||
Self::new_with_deadline(Arc::new(rdatas), min(my_ttl, other_ttl)), | ||
(Some(ttl), None) | (None, Some(ttl)) => | ||
Self::new_with_deadline(Arc::new(rdatas), ttl), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't clear that this logic is correct when mixing Some(ttl)
and None
. What's the rationale here? Is it even possible that this could happen for lookup_ip
or is this something that could only happen for other kinds of queries?
OTOH maybe RFC 2308 means we should always have a TTL; if not from the record, then from the overall response? I'm not sure since I don't know all the DNS details.
Regardless, we should write down the reasoning, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, the only time Lookup::append
is called on a Lookup
that doesn't have a TTL is here:
https://github.com/bluejekyll/trust-dns/blob/97afd93b007e0ba4140be4f09ba7245f821f1fa9/resolver/src/hosts.rs#L63-L76
In this case, it seems that the empty lookups are only created as placeholders and should be immediately overwritten with the new lookup, so in this case, the logic seems correct. Everywhere else where Lookup::append
is called, the lookups are actual responses to a DNS query and should always have a TTL.
I'd be happy to add a comment explaining this in the method, if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hosts is an interesting case. It's driven by a file, as opposed to an actual lookup. I wonder if we could remove the optional, and just pick the max_ttl in this case. At the moment, we don't support live reloading of the hosts file. So if this is indeed the only case, then I say we get rid of it by giving it a better default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the other main case whereLookup
s are constructed without a TTL are the loopbacks. Do you think it's okay for localhost to appear to have MAX_TTL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just double checked the code, yes, I believe this is safe. My only concern was if this could have some effect on the LRU cache, but I think both of these code paths return before cache is either used. So we don't have to worry about what that would mean.
So yes, I think we can safely simplify this to default to MAX_TTL and thus get rid of the Option on TTL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluejekyll okay, sounds good! I'll make that change shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluejekyll: changed in 526b8d7
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
resolver/src/lookup.rs
Outdated
pub fn new(rdatas: Arc<Vec<RData>>) -> Self { | ||
Lookup { rdatas } | ||
/// Return new instance with given rdatas and the maximum TTL. | ||
pub fn new(rdatas: Arc<Vec<RData>>,) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should switch every caller to use new_with_deadline
and have them pass in valid_until
. That way we don't accidentally default when we shouldn't. Is that practical or is there something that makes that awkward?
(NIT: Are we really going to do this trailing comma in arguments thing in rustfmt?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should switch every caller to use
new_with_deadline
and have them pass invalid_until
. That way we don't accidentally default when we shouldn't. Is that practical or is there something that makes that awkward?
I don't think there's anything that makes that particularly difficult. I originally chose to leave the new
constructor as-is and make it default because I wanted to keep the diff small, but I'm happy to change this so the caller is responsible for defaulting if you think that's better.
resolver/src/lookup.rs
Outdated
/// Return new instance with given rdatas and the maximum TTL. | ||
pub fn new(rdatas: Arc<Vec<RData>>,) -> Self { | ||
let valid_until = | ||
Instant::now() + Duration::from_secs(MAX_TTL as u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an Instant::MAX
or something like it that can be used instead?
Instant::now()
is relatively expensive (potentially a syscall). MAX_TTL
is for parsing/validating input on the wire.
If there's no Instant::MAX
or equivalent then maybe we should use something like enum Validity { ValidUntil(Instant), Forever }
, isomorphic to Option
, with comparison semantics such that ValidUntil(x) < Forever
for all x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no
Instant::MAX
or equivalent then maybe we should use something likeenum Validity { ValidUntil(Instant), Forever }
, isomorphic toOption
, with comparison semantics such thatValidUntil(x) < Forever
for all x.
I like this approach (and you're correct that there's no Instant::MAX
).
However, I would like to get @bluejekyll's opinion on whether we should do something like this rather than using MAX_TTL
(which he suggested in https://github.com/bluejekyll/trust-dns/pull/444#discussion_r185392627) before I make more changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I’m fine with MAX_TTL. I understand the concern @briansmith has, but I don’t feel like we need to over complicate this. As long as we guarantee that the correct TTL is used on insert.
Forcing the caller to pick a good value, rather than a decent default, seems potentially error prone as well. We could change ‘new’ to ‘with_max_ttl’ to make it more explicit.
@bluejekyll It looks to me like the test failures on the most recent build of this branch are unrelated to my changes, do you agree that seems to be the case? |
I just kicked them off again. There are a couple of tests that have some races between server startup and the test running. I need to track those down and fix them. Not your issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks for the great work and the contribution! |
@bluejekyll thank you (and @briansmith) for all your guidance and feedback! :) |
Closes #711. Depends on #967. This PR changes the proxy's `destination` module to honor the TTLs associated with DNS lookups, now that bluejekyll/trust-dns#444 has been merged and we can access this information from the Trust-DNS Resolver API. The `destination::background::DestinationSet` type has been modified so that, when a successful result is received for a DNS query, the DNS server will be polled again after the deadline associated with that query, rather than after a fixed deadline. The fixed deadline is still used to determine when to poll again for negative DNS responses or for errors. Furthermore, Conduit now accepts an optional CONDUIT_PROXY_DNS_MIN_TTL environment variable that will configure a minimum TTL for DNS results. If the deadline of a DNS response gives it a TTL shorter than the configured minimum, Conduit will not poll DNS again until after that minimum TTL is elapsed. By default, there is no minimum value set, as this feature is intended primarily for when Conduit is run locally for development purposes. Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Closes #711. Depends on #967. This PR changes the proxy's `destination` module to honor the TTLs associated with DNS lookups, now that bluejekyll/trust-dns#444 has been merged and we can access this information from the Trust-DNS Resolver API. The `destination::background::DestinationSet` type has been modified so that, when a successful result is received for a DNS query, the DNS server will be polled again after the deadline associated with that query, rather than after a fixed deadline. The fixed deadline is still used to determine when to poll again for negative DNS responses or for errors. Furthermore, Conduit now accepts an optional CONDUIT_PROXY_DNS_MIN_TTL environment variable that will configure a minimum TTL for DNS results. If the deadline of a DNS response gives it a TTL shorter than the configured minimum, Conduit will not poll DNS again until after that minimum TTL is elapsed. By default, there is no minimum value set, as this feature is intended primarily for when Conduit is run locally for development purposes. Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This PR changes the
trust-dns-resolver
API to expose the TTLs of DNSlookups, which is needed downstream in Conduit (linkerd/linkerd2#711).
Reviewers should note the following:
Lookup
s were constructed in places where the TTL is notknown or there is no TTL (e.g. for
localhost
inlookup_state.rs
).Therefore, I made the TTL field on
Lookup
optional, and added a newLookup::new_with_ttl
constructor, to avoid breaking existing code.append
ing twoLookups
, if both have known TTLs, the returnedLookup
has the lower of the two TTLs. Thus, the appendedLookup
should time out when the shorter-lived of either of its constituents
does. This seemed like the correct behaviour to me, so that
append
doesn't become a way for short-lived
Lookup
s to escape their TTLs,but I wasn't sure about this --- please let me know if this intuition
is incorrect!
TTL
function inlookup.rs
.Signed-off-by: Eliza Weisman eliza@buoyant.io