Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow DNS interface to use agent cache #5300

Merged
merged 3 commits into from
Feb 25, 2019

Conversation

ShimmerGlass
Copy link
Contributor

Adds to new configuration parameters "dns_config.use_cache" and
"dns_config.cache_max_age" controlling how DNS requests use the agent
cache when quering servers.

@pierresouchay
Copy link
Contributor

@banks I told you about this one :-)

Copy link
Contributor

@pierresouchay pierresouchay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aestek documentation mainly, but not clear to end user how max_stale VS max_age VS enable cache do work together

agent/config/runtime.go Show resolved Hide resolved
website/source/docs/agent/options.html.md Show resolved Hide resolved
@ShimmerGlass
Copy link
Contributor Author

I will work the documentation so it's clearer what setting corresponds to what. Maybe the Agent caching section could also be moved to its own page since it is not an HTTP API only thing anymore.

@mkeeler
Copy link
Member

mkeeler commented Feb 4, 2019

@Aestek One thing about the current Agent Caching documentation is that most of it is specific to HTTP including all the various headers that can be passed with the requests. It would however be worthwhile to document the agent cache usage on this page: https://www.consul.io/docs/agent/dns.html

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. There are a few questions inline.

One other prerequisite to getting this merged would be for all of the RPCs in use to be cacheable. Presumably the reason you didn't already include it was because the Catalog.NodeServices RPC was not already an implemented cache-type. It should be fairly straight forward implementation that is mostly copy pasted from the existing Catalog.ServiceNodes cache type.

Lastly after my other comment on the PR about all the configurations I found out that the cache will only perform stale queries so max_stale is superceded by using the cache and our docs should reflect that.

agent/config/builder.go Show resolved Hide resolved
website/source/docs/agent/options.html.md Show resolved Hide resolved
agent/dns.go Outdated
reply, ok := raw.(*structs.PreparedQueryExecuteResponse)
if !ok {
// This should never happen, but we want to protect against panics
d.logger.Printf("dns: internal error: response type not correct")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should add a "[ERR]" to the front of the message.

agent/dns.go Outdated
@@ -1201,6 +1256,12 @@ RPC:
ttl, _ = d.GetTTLForService(out.Service)
}

// if the result came from cache, substract its age from the ttl
ttl -= age
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aestek & @pierresouchay

I am not sure that the TTL calculations here are what is best/least-surprising.

One thing that initially caught me off guard was that this could potentially set a TTL lower than the remaining cache lifetime of the data.

For example, if you have a service ttl of 30s, a cache max age of 60s and your cached data is 20s old. This will set the TTL to 10s. At first glance you could ask why should a DNS client ask for the same data before the cache would have expired. If the cache hasn't expired and the request is made again it will just use the cached value the same as before. Therefore the TTL could have been something like min(<remaining cache lifetime>, <service ttl - age>) which would have resulted in a 40s TTL.

There are two reasons why that might not be best.

  1. Automatic cache refreshing
  2. The data might be fetched on behalf of the HTTP API with a different max age or requiring revalidation.

For prepared queries, there is no automatic cache refreshing so that doesn't matter right here.

So the question is whether it is better to have the service and node ttls be a max TTL for those resources or whether we should allow the cache lifetime to increase it. There are advantages and disadvantages to both. What are your thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting max(<remaining cache lifetime>, <service ttl - age>) ? In your example min(<remaining cache lifetime>, <service ttl - age>) would still have resulted in a 20s TTL.
While I agree that its useless to have a TTL lower than the next cache refresh duration it seems weird to me to set a TTL higher than the one configured by the user.
When agent DNS cache the TTL settings can be lowered a lot since the requests would not hit so hard on the servers anymore. In our case we might juste use them for rate limiting with values between 1s and 5s.
Maybe it's best not to change the TTL depending on the cache age at all ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Sorry, max(<remaining cache lifetime>, <service ttl - age>

However thinking about it some more it might be best the way you originally did it. It may result in more requests but they should be very easy to respond to from the cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aestek @mkeeler As an DNS admin, I would like to enjoy the feature without having to handle the complexity of implementation. I have the feeling it should be simpler than that.

My feeling is:

  • TTL => stay unchanged
  • max-age = max_stale duration ideally (so, I don't care what your guys are doing of kind optimizations)

Thus, enabling cache would simply be:

  • dns_config.enable_cache=true

@ShimmerGlass ShimmerGlass force-pushed the dns-cache branch 2 times, most recently from bdcac9b to 393b268 Compare February 6, 2019 15:08
@ShimmerGlass
Copy link
Contributor Author

  • cache entry age no longer affects TTLs
  • I've implemented Catalog.NodeServices cached and used in DNS

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more changes if you would. In particular I liked how you split out the cache get or RPC for NodeServices into its own function. It would be great if you could do the same for the other similar code.

agent/agent.go Outdated Show resolved Hide resolved
agent/dns.go Outdated Show resolved Hide resolved
agent/dns.go Show resolved Hide resolved
agent/dns.go Outdated Show resolved Hide resolved
Thibault Gilles added 3 commits February 20, 2019 16:34
Adds to new configuration parameters "dns_config.use_cache" and
"dns_config.cache_max_age" controlling how DNS requests use the agent
cache when quering servers.
@ShimmerGlass
Copy link
Contributor Author

<3

@pierresouchay
Copy link
Contributor

Yeah! We are gonna test this at scale next week in our DCs
CC @Annih

@Annih
Copy link

Annih commented Feb 22, 2019

Sounds great :)

@mkeeler mkeeler merged commit f1cdfbe into hashicorp:master Feb 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants