Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCUSS: configuration validation #1244

Closed
calebhailey opened this issue Apr 26, 2016 · 15 comments

Comments

Projects
None yet
7 participants
@calebhailey
Copy link
Member

commented Apr 26, 2016

Hello all. We would like to collect feedback and input on how Sensu deals with invalid configuration. There are currently two types of "invalid" configuration:

  1. Invalid JSON (i.e. bad JSON syntax; e.g. missing commas, etc)
  2. Invalid configuration definition (e.g. missing a required definition attribute; e.g. missing client name attribute)

Unfortunately Sensu also currently behaves in two different ways when encountering invalid configuration:

  1. Weak validation (e.g. Sensu will just ignore configuration files with invalid JSON, and continue to load/run)
  2. Strong validation (e.g. Sensu will stop if it encounters an invalid configuration definition such as a client definition that is missing a required configuration attribute)

We would like to settle on a single validation behavior for all invalid configuration scenarios. Should Sensu ignore invalid configuration (weak validation) and continue loading/running -OR- should Sensu reject invalid configuration and stop (strong validation)?

Please feel free to comment with your thoughts and/or vote using emoji reactions.

  • 👍 = "it's all good!" Sensu should ignore invalid configuration and keep running (weak validation)
  • 👎 = "no way!" Sensu should reject invalid configuration, and stop (strong validation)

We're targeting the 0.25 release for configuration validation improvements, so we'll leave this issue open for a few weeks before we make a final decision.

Thanks in advance for your time & consideration. #monitoringlove

@calebhailey calebhailey changed the title DISCUSS: invalid configuration DISCUSS: configuration validation Apr 26, 2016

@calebhailey calebhailey added this to the 0.25 milestone Apr 26, 2016

@portertech

This comment has been minimized.

Copy link
Member

commented Apr 26, 2016

We should probably make this configurable. Default to strong validation, but provide a CLI argument like --ignore_invalid or --ignore_invalid_config to change the behaviour to weak.

@portertech

This comment has been minimized.

Copy link
Member

commented Apr 26, 2016

I personally would use strong validation, but at least it's configurable for those who feel differently. Making the validation behaviour consistent between configuration file loading errors and invalid definitions is the big win 👍

@DerekSchenk

This comment has been minimized.

Copy link

commented Apr 27, 2016

👎 I would prefer strong validation. Troubleshooting is difficult when the configuration is accepted but not used as intended.

Allowing a flag to ignore invalid configs is an option, but when would having an incorrect configuration be a desired operating condition?

What about having a check_config command that would validate changes, this would allow you to restart with confidence knowing that your configs are ok.

@calebhailey

This comment has been minimized.

Copy link
Member Author

commented Apr 27, 2016

@DerekSchenk as of Sensu 0.23 there is a new --print_config CLI option that has Sensu load configuration and quit, printing the result to STDOUT. This should make it possible to "test" configuration before restarting Sensu. See here for documentation: https://sensuapp.org/docs/latest/configuration#sensu-command-line-interfaces-and-arguments

@cjchand

This comment has been minimized.

Copy link

commented Apr 28, 2016

I think the current behavior is optimal because:

  • It makes no sense to try and continue when there's misconfig to the level that breaks core functionality (e.g.: missing/invalid config for Redis or Transport for sensu-server). I'd rather have the process die so I can alert on it (monitor the monitors, yo) than have it stay alive and "silently" fail (where "silent" means I don't wanna troll all the logs for errors).
    • This is not to say that misconfigs such as wrong host/port fall into this category, as you can't reliably tell at run-time if it's a misconfig or if Redis/RMQ are down. In these cases, it needs to continue to run just in case it is the scenario where Redis/RMQ are in in the crapper.
    • Said differently: I'm talking strictly about config that's malformed or incorrect such that we can't even attempt to connect to the Store or Transport pieces.
  • Having soft failures for check configs is awesomeness. I'd rather lose one check due to a misplaced/missing comma than have sensu-client flat out not run. This is a major plus over Nagios where one missing curly brace kills all of your monitoring.

In our scenario, we are a DevOps shop. Service Owners are responsible for, erm I mean, empowered to create their own monitoring. We're in the process replacing Nagios + check_mk with Sensu. We've had fairly good traction with getting people to write check_mk local checks, which helps enable a self-service, roll-your-own, monitoring model. They can iterate on their checks as they wish and leave the Nagios server-side alone. If they jack up a local check, then it only affects that particular check.

Fast forward to when we get the next-gen architecture in place, it means we have lots of folks who are new to Sensu. That means we have many, many opportunities for simple config mistakes as people come up to speed on Sensu. I fear torches and pitchforks if I move them to a model where a missing comma takes all of their monitoring offline.

In summary: I'd leave it the way it is.

Main improvement that comes to mind is sweeping the config before a service restart to call out any bad config before actually stopping any Sensu process.

$0.02 deposited :)

@portertech

This comment has been minimized.

Copy link
Member

commented Apr 28, 2016

@cjchand Thank you for the input!

I think using --ignore-invalid for weak validation may be best in any case where Sensu is considered to be a self-serve service, as anyone can still push a valid JSON file that contains invalid definitions.

Making the behaviour consistent between file and definition validation will be the big improvement, making the validation action configurable is the icing on the cake.

@portertech

This comment has been minimized.

Copy link
Member

commented Apr 29, 2016

After having a go at Sensu settings validation that supports --ignore-invalid for weak validation, I have found it to cause several problems. One example, SENSU_LOADED_TEMPFILE (https://github.com/sensu/sensu-settings/blob/master/lib/sensu/settings/loader.rb#L140) which is consumed by other processes like handlers to load settings. The loaded tempfile file list with --ignore-invalid may contain invalid files and/or invalid definitions created from one or more files. Invalid files can be excluded from the tempfile list, however, invalid definitions would still get through, potentially causing havoc.

@portertech

This comment has been minimized.

Copy link
Member

commented Apr 29, 2016

We should go with strong validation ONLY, however, provide a preflight settings check feature which can be used to validate updating configuration prior to restarting the Sensu process.

@portertech

This comment has been minimized.

Copy link
Member

commented Apr 29, 2016

Idea: add an exit status to --print_config for validation, use it in init restart.

@portertech

This comment has been minimized.

Copy link
Member

commented May 3, 2016

I have created a PR that adds --validate_config #1254

@kobbe

This comment has been minimized.

Copy link

commented May 6, 2016

I am quite happy as long as the problems are printed to stdout so I can read about the problems easy once they occur.

@sstarcher

This comment has been minimized.

Copy link

commented May 9, 2016

If I can only pick strong or weak I would pick strong. I have personally had someone misconfigure something that proper validation would've caught that disabled a check for a few weeks before it was noticed.

@Gillingham

This comment has been minimized.

Copy link

commented Jun 1, 2016

#1254 should probably also have related issues for the service script things to add a configtest or similar functionality to the various systemd/service/init scripts in sensu/sensu-build to enable easier use of this new flag

@calebhailey calebhailey modified the milestones: 0.25, 0.26 Jun 6, 2016

@calebhailey

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2016

The 0.25 release is going to hit today with fewer features than originally anticipated & primarily only some internal improvements. Moving this issue to 0.26 to keep the conversation going.

@calebhailey calebhailey modified the milestones: 0.26, 0.27 Aug 16, 2016

@calebhailey calebhailey removed this from the 0.27 milestone Aug 16, 2016

@calebhailey

This comment has been minimized.

Copy link
Member Author

commented Aug 16, 2016

This was implemented in 0.25 via --validate_config, so I'm going to close this.

However, not all of the packages are taking advantage of --validate_config in their service scripts (only sysvinit uses it), which we're tracking as a separate issue: https://github.com/sensu/sensu-build/issues/172

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.