Config plugin for dynamic config update #5409

Closed
wingsof opened this issue Feb 12, 2019 · 4 comments
Labels
discussion Topics for discussion

Comments

@wingsof
Contributor

wingsof commented Feb 12, 2019

Feature Request

Hello,

Configuring Telegraf is easy when I manage a few hosts manually or when the collection configuration never needs to change. But when Telegraf is part of a monitoring product or solution, it has to be deployed datacenter-wide, with its configuration managed centrally by an automation tool. That is where the need for a configuration API or tool arises.

Proposal:

A plugin with the following features might be necessary to fulfill this requirement:

  • Several APIs for reading and updating part or all of the configuration
  • Support for authorization (key-based, or others such as token-based via an authorization API)
  • Should work with both a single configuration file and a configuration split across separate files
  • Telegraf should not restart with the whole set of configurations; it should apply only what changed
  • Each reconfiguration policy needs community discussion where necessary (for example, how Telegraf should behave when a new configuration stops monitoring of 'process A' while Telegraf is in the middle of gathering metrics)

Current behavior:

No feature currently.

Desired behavior:

Users and applications can configure Telegraf remotely whenever they want.

Use case: [Why is this important (helps with prioritizing requests)]

One of the main trends in infrastructure management is IaC (Infrastructure as Code) and CaC (Configuration as Code). Telegraf already holds an important position in infrastructure and application monitoring and management, but it has some limitations. Adding this feature would make it possible to integrate Telegraf with CaC tools and broaden its applications, or at the very least provide central configuration management for users doing datacenter-scale monitoring.

@danielnelson
Contributor

We have an issue open for this, #272, so I'm going to close this, but I'd like to talk over your ideas a bit first. Some of the work towards this has already been prototyped in github.com/danielnelson/tgconfig, so I already have some ideas on how I would like this to work. This repo contains a new plugin system, similar to the other plugin types, but for loading and updating the configuration.

Several APIs for reading and updating part or all of the configuration

This is not something I'm promoting, and I'm not sure if I would merge a plugin that does this, but it would be possible. Instead of connecting to each Telegraf and changing the config, I think Telegraf should be able to get its configuration file from a store on the network, and also watch the store for changes. Then when you update the config store, every Telegraf instance you are running will grab the config and update automatically.

Still, like I said, this should be possible. Right now the "loader" plugin has two methods: one to signal config changes, and one to load the config itself.
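To make the two-method shape concrete, here is a minimal sketch in Python. It is not the tgconfig API: the names `Loader`, `wait_for_change`, `load`, and `InMemoryLoader` are all hypothetical, and the in-memory class only stands in for a real network store with a single waiting agent.

```python
import threading
from abc import ABC, abstractmethod


class Loader(ABC):
    """Hypothetical loader plugin: one method signals changes, one loads the config."""

    @abstractmethod
    def wait_for_change(self, timeout: float) -> bool:
        """Block until the config changes or the timeout expires; True if it changed."""

    @abstractmethod
    def load(self) -> str:
        """Return the current configuration text."""


class InMemoryLoader(Loader):
    """Stand-in for a network config store, for illustration only (assumes one waiter)."""

    def __init__(self, initial: str) -> None:
        self._config = initial
        self._version = 0   # bumped on every update
        self._seen = 0      # last version reported to the waiter
        self._cond = threading.Condition()

    def update(self, new_config: str) -> None:
        # Called by whoever edits the store; wakes up the waiting agent.
        with self._cond:
            self._config = new_config
            self._version += 1
            self._cond.notify_all()

    def wait_for_change(self, timeout: float) -> bool:
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version != self._seen, timeout
            )
            self._seen = self._version
            return changed

    def load(self) -> str:
        with self._cond:
            return self._config


store = InMemoryLoader("[[inputs.cpu]]\n")
store.update("[[inputs.cpu]]\n[[inputs.mem]]\n")
if store.wait_for_change(timeout=1.0):
    print(store.load())
```

With this split, an agent never has to be contacted directly: it blocks in `wait_for_change` and calls `load` only when the store reports something new.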

Support for authorization (key-based, or others such as token-based via an authorization API)
Should work with both a single configuration file and a configuration split across separate files

Yes

Telegraf should not restart with the whole set of configurations; it should apply only what changed

Yes, though not implemented yet in the prototype.

Each reconfiguration policy needs community discussion where necessary (for example, how Telegraf should behave when a new configuration stops monitoring of 'process A' while Telegraf is in the middle of gathering metrics)

I assumed that when a plugin is removed from the config we would replace the input between intervals. This is a bit of a simplification, but it seems fairly straightforward to me; let me know if I'm misunderstanding.
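A rough sketch of that "replace between intervals" idea, in Python. This is an illustration under assumptions, not how Telegraf's agent is written: the `Agent` class, `reconfigure`, and `run_interval` are hypothetical names, and an input is reduced to a function returning a single number.

```python
from typing import Callable, Dict, Optional

# Hypothetical input plugin: a callable that returns one metric value.
Input = Callable[[], float]


class Agent:
    def __init__(self, inputs: Dict[str, Input]) -> None:
        self._active: Dict[str, Input] = dict(inputs)
        self._pending: Optional[Dict[str, Input]] = None

    def reconfigure(self, inputs: Dict[str, Input]) -> None:
        # Only record the request; an interval that is already
        # gathering keeps the plugin set it started with.
        self._pending = dict(inputs)

    def run_interval(self) -> Dict[str, float]:
        # Swap at the interval boundary, before any input is gathered.
        if self._pending is not None:
            self._active = self._pending
            self._pending = None
        return {name: gather() for name, gather in self._active.items()}


agent = Agent({"cpu": lambda: 0.5, "process_a": lambda: 1.0})
print(agent.run_interval())              # gathers cpu and process_a
agent.reconfigure({"cpu": lambda: 0.5})  # new config drops process_a
print(agent.run_interval())              # gathers cpu only
```

Under this policy, a gather of 'process A' that is already in progress always finishes, and the removal takes effect at the next interval.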

One of the main trends in infrastructure management is IaC (Infrastructure as Code) and CaC (Configuration as Code).

I view these as a separate concept that Telegraf supports quite well, maybe to a fault. There is certainly nothing currently preventing you from defining the config as static code or a template that is expanded whenever Telegraf is deployed. In my opinion, dynamic config updates really just give you integration with config tools you may already be using, so Telegraf fits into your environment more seamlessly. If not used carefully, dynamic configuration can move you away from infrastructure as code.
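As an illustration of the "template that is expanded whenever Telegraf is deployed" approach, here is a small Python sketch. The `render` helper and the parameter names are made up for this example; the `[agent]` interval and `[[outputs.influxdb]]` `urls` settings are ordinary Telegraf config keys.

```python
from string import Template

# Config kept as static code; only deployment-specific values are placeholders.
TEMPLATE = Template('''\
[agent]
  interval = "$interval"

[[inputs.cpu]]

[[outputs.influxdb]]
  urls = ["$influx_url"]
''')


def render(interval: str, influx_url: str) -> str:
    """Expand the template; raises KeyError if a placeholder is left unfilled."""
    return TEMPLATE.substitute(interval=interval, influx_url=influx_url)


print(render("10s", "http://localhost:8086"))
```

A deployment tool would write the rendered text to the config file on each host, so the template in version control stays the single source of truth.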

Anyway, that's just my 2 cents. I would also love to know what tools, if any, you are currently using to manage your configuration, and whether this model would work well with them.

@danielnelson danielnelson added the discussion Topics for discussion label Feb 12, 2019
@wingsof
Contributor Author

wingsof commented Mar 4, 2019

Hello,

Sorry for the late reply. Here are my thoughts on your comments.

We have an issue open for this, #272, so I'm going to close this, but I'd like to talk over your ideas a bit first. Some of the work towards this has already been prototyped in github.com/danielnelson/tgconfig, so I already have some ideas on how I would like this to work. This repo contains a new plugin system, similar to the other plugin types, but for loading and updating the configuration.

Hey, it's great to hear you already have a prototype! I hope it can become part of the master branch soon.

Several APIs for reading and updating part or all of the configuration

This is not something I'm promoting, and I'm not sure if I would merge a plugin that does this, but it would be possible. Instead of connecting to each Telegraf and changing the config, I think Telegraf should be able to get its configuration file from a store on the network, and also watch the store for changes. Then when you update the config store, every Telegraf instance you are running will grab the config and update automatically.

Still, like I said, this should be possible. Right now the "loader" plugin has two methods: one to signal config changes, and one to load the config itself.

It would be great to have a method to signal Telegraf with the store's location and access information. In our company, several subnets are strictly separated from each other, and each target node can only access stores inside the subnet where it is located. Sometimes the IP address or directory of the config file on the server (or even the access method) can change, and I think Telegraf should be able to reach the new location once it is given the updated information.

But in that case, the plugin itself might need to scan the entire config file to determine which part of the configuration changed. It might not be much work, but I am not convinced it's a good approach.

A simple informational method that reports when the config changed and what changed would also be very helpful for operations and debugging.
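For what it's worth, here is a rough Python sketch of how "what changed" could be computed without comparing every setting by hand: fingerprint each plugin's settings and compare fingerprints. Everything here is an assumption for illustration (the `fingerprint` and `diff_config` names, and the idea that the config has already been parsed into a dict keyed by plugin name); it is not taken from the prototype.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(settings: dict) -> str:
    """Stable hash of one plugin's settings (key order does not matter)."""
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_config(old: dict, new: dict) -> dict:
    """Report which plugins were added, removed, or changed, and when we checked."""
    old_fp = {name: fingerprint(s) for name, s in old.items()}
    new_fp = {name: fingerprint(s) for name, s in new.items()}
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "changed": sorted(
            name for name in old_fp.keys() & new_fp.keys()
            if old_fp[name] != new_fp[name]
        ),
    }


old = {"inputs.cpu": {"percpu": True}, "inputs.procstat": {"exe": "process_a"}}
new = {"inputs.cpu": {"percpu": False}, "inputs.mem": {}}
print(diff_config(old, new))
```

If the fingerprints of the running config are cached, only the new config has to be hashed on each update, and the same report could serve the informational method for debugging.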

Support for authorization (key-based, or others such as token-based via an authorization API)
Should work with both a single configuration file and a configuration split across separate files

Yes

Telegraf should not restart with the whole set of configurations; it should apply only what changed

Yes, though not implemented yet in the prototype.

Each reconfiguration policy needs community discussion where necessary (for example, how Telegraf should behave when a new configuration stops monitoring of 'process A' while Telegraf is in the middle of gathering metrics)

I assumed that when a plugin is removed from the config we would replace the input between intervals. This is a bit of a simplification, but it seems fairly straightforward to me; let me know if I'm misunderstanding.

One of the main trends in infrastructure management is IaC (Infrastructure as Code) and CaC (Configuration as Code).

I view these as a separate concept that Telegraf supports quite well, maybe to a fault. There is certainly nothing currently preventing you from defining the config as static code or a template that is expanded whenever Telegraf is deployed. In my opinion, dynamic config updates really just give you integration with config tools you may already be using, so Telegraf fits into your environment more seamlessly. If not used carefully, dynamic configuration can move you away from infrastructure as code.

Anyway, that's just my 2 cents. I would also love to know what tools, if any, you are currently using to manage your configuration, and whether this model would work well with them.

We're using Ansible to manage the Telegraf config on target nodes. But as you might imagine, most sysadmins don't want to open their systems to passwordless SSH connections for security reasons, and they have repeatedly asked us to provide another way of managing configuration. That's why I posted this feature request, and I guess many companies have similar requirements.

@danielnelson
Contributor

sysadmins don't want to open their systems to passwordless SSH connections for security reasons

Yeah, don't do this. Definitely use an authorized ssh key or some other secure method of authentication.

@danielnelson
Contributor

But in that case, the plugin itself might need to scan the entire config file to determine which part of the configuration changed. It might not be much work, but I am not convinced it's a good approach.

I'll keep this in mind; we have some users with thousands of plugins, and it would be important to be able to reload quickly. I am going to close this issue; keep an eye on #272 for updates.


2 participants