Provide insight into resource utilization implied by templates #205

ryanbreen · 2015-02-24T15:58:56Z

A poorly written template can [D]DoS the Consul servers, representing a real risk to the health of the cluster. consul-template should provide some feedback when the template(s) will create a high number of views. In that case, you might even refuse to run unless something like a -force flag is passed.

Additionally, it would be cool if -dry provided accounting for the number of views. That would make it a pseudo-query planner and a more useful tool when attempting to minimize the impact of templates.

(Note that I'm using "views" as a proxy for "things that do potentially expensive stuff." If there are leading indicators of template complexity other than view count, they would be cool to include in this accounting.)


sethvargo · 2015-02-24T17:16:04Z

@ryanbreen thank you for opening an issue. While this is a great feature request, I am afraid it is not possible due to the dynamic nature of how Consul Template compiles dependencies. This would have been possible in the original versions of Consul Template, but since v0.6.0, Consul Template implements n-pass evaluation. This allows the user to do something like:

{{range ls "active_services"}}
{{range service .Key}}
{{.Address}}:{{.Port}}
{{end}}
{{end}}

This template is rather simple: it queries a list of key-value pairs in the KV store called active_services and, for each of those entries, it attempts to watch that service. This hypothetical template would allow operators to add/remove services from this template simply by changing a value in the KV store. So why is this relevant?

If active_services contains 1 key, this template requires 2 views (ls + service x 1). But if the KV store has 50 keys, this template requires 51 views (ls + service x 50). Even more complicated, those are dynamic values, so the template could require 5 services to start (which might be deemed "healthy" or "ok"), but then over time grow to 500. This is a classic n+1 problem, and I believe it is something that users of Consul Template (and related tooling) need to be cognizant of when using the tool. I hope this clarifies why we cannot know the complexity of an operation ahead of time: the dependencies are highly dynamic and ever-changing.
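To make the arithmetic above concrete, here is a minimal Go sketch of the view count for that one example template. The `viewCount` helper is hypothetical and is not part of Consul Template; it only restates "one ls view plus one service view per key".

```go
package main

import "fmt"

// viewCount estimates the views the example template needs:
// one view for the `ls` query plus one `service` view per key returned.
// Hypothetical helper for illustration; not a Consul Template API.
func viewCount(keys int) int {
	return 1 + keys
}

func main() {
	// The same template, evaluated against a growing active_services prefix.
	for _, keys := range []int{1, 5, 50, 500} {
		fmt.Printf("%d keys -> %d views\n", keys, viewCount(keys))
	}
}
```

The input to `viewCount` only exists at runtime, after the ls query has returned, which is why the count cannot be derived from the template text alone.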

I would not be opposed to adding some warnings to the README, etc., but I do not think there is an appropriate or viable UX where Consul Template needs to decide for the user. @armon and I have discussed this pretty extensively and have come to the conclusion that this is a problem that should be solved in Consul more so than in Consul Template. Consul 0.5 is a step in the right direction and much more performant than older versions, but we have some ideas about how to make the Consul Template use case even better.

If you would like specific insights about what is going on, set the log level to info and you can see each query happening. If you want super detailed information, set the log level to debug and you can see the output of each request, benchmarks, etc.

While it is true that a poorly written template could DDoS your Consul cluster, I think this is a case where documentation/education is a better answer than complex checks and engineering within the tool itself.

sethvargo · 2015-02-24T19:41:32Z

@ryanbreen I just wanted to let you know that I just merged in #206 which might help you out. On each pass, Consul Template will now tell you how many views (dependencies) are being watched. If that number is greater than 128 (which was chosen based off of the "common" Consul setup), the log message is converted to a warning. This log message will appear in both once and dry mode, so you can detect potential DDoSes by running CT with -dry -once and inspecting that line.
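As a rough illustration of the behavior described above (not the actual code merged in #206), the check amounts to comparing the current view count against 128 and raising the log level when it is exceeded. The constant and function names here are hypothetical.

```go
package main

import "fmt"

// watchWarnThreshold is the value quoted in the comment above.
// The name is hypothetical; it is not taken from the Consul Template source.
const watchWarnThreshold = 128

// levelFor returns the log level for the per-pass view-count message:
// a warning only when the count is greater than the threshold.
func levelFor(views int) string {
	if views > watchWarnThreshold {
		return "WARN"
	}
	return "INFO"
}

func main() {
	for _, views := range []int{10, 128, 129, 250} {
		fmt.Printf("[%s] watching %d dependencies\n", levelFor(views), views)
	}
}
```

The message format printed here is invented for the sketch; check the real output of a -dry -once run for the exact wording.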

The reason we do not think it is appropriate to define a flag or require user input to allow CT to continue is that it requires the user to have too much knowledge about their own internal Consul setup. Similarly, this is why we only warn as opposed to requiring validation to continue. For a large Consul installation, it may be perfectly viable to watch 250 keys in a single template 😄.

However, what this does not fix is this: CT is not aware of other CT instances. If you have a template that uses 10 dependencies, you won't get the warning. But if you are rendering that template on 50 different machines, you actually have 500 total dependencies that you're hitting Consul with. Unfortunately there is no way for CT to detect this, since it is not aware of the other instances of CT that are running on other machines. In this way, it is important to be cognizant of the number of instances running the template as well.
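A small Go sketch of that blind spot, using the numbers from the paragraph above: each instance sees only its own count, so the per-instance check stays quiet while the cluster-wide total is well past the threshold. `totalDependencies` and `warnThreshold` are hypothetical names, and the calculation is something an operator would do by hand since CT cannot.

```go
package main

import "fmt"

// warnThreshold mirrors the 128-view warning level mentioned earlier in the thread.
const warnThreshold = 128

// totalDependencies is the load Consul actually sees when the same template
// runs on several machines. Hypothetical helper; CT has no such function.
func totalDependencies(perInstance, instances int) int {
	return perInstance * instances
}

func main() {
	perInstance, instances := 10, 50
	total := totalDependencies(perInstance, instances)

	// Each CT instance only compares its own count against the threshold.
	fmt.Printf("per-instance warning: %v\n", perInstance > warnThreshold)
	fmt.Printf("cluster-wide total: %d (over threshold: %v)\n", total, total > warnThreshold)
}
```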

Finally, if you are running Consul 0.2+, you can specify the -max-stale parameter in CT. This will allow CT to query non-leader nodes in the cluster, distributing the load more evenly. You can read more about it in the README.

ryanbreen · 2015-02-24T19:43:19Z

Very cool, thanks!

sethvargo closed this as completed Feb 24, 2015

sethvargo mentioned this issue Feb 24, 2015

Warn the user if they have over 128 watches running #206

Merged
