Make output of puppetdb metrics cleaner
Changes:
* Change "puppet_server" to "puppetserver" when referring to the
product and naming files. This is consistent with how puppetserver
is referred to in product and documentation.
* Refactor puppetserver metrics script to store all metrics for a
timestamp in a single JSON file, even when multiple servers are
queried
* Refactor puppetdb metrics script to store all metrics for a
timestamp in a single JSON file, even when multiple servers are
queried
* Standardize file timestamp format to use ISO 8601
* Standardize filename to always be ${timestamp}.json (see the sketch after this list)
* Move data out of puppetdb template and into manifest. Adheres to
best practices for templates, which encourage keeping data and code
out of templates as much as possible.
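For illustration, a minimal Ruby sketch of the naming scheme, with made-up hostnames (this is not the module's actual code):

```ruby
require 'json'

# ISO 8601 basic format, matching names like 20170304T025501Z.json
timestamp = Time.now.utc.strftime('%Y%m%dT%H%M%SZ')

# All queried servers' metrics for one run land in a single document.
all_metrics = { 'timestamp' => timestamp, 'servers' => {} }
['server-a.example.com', 'server-b.example.com'].each do |host|
  all_metrics['servers'][host] = {} # this host's metrics would go here
end

File.write("#{timestamp}.json", JSON.pretty_generate(all_metrics))
```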
These changes are targeted at simplifying data upload into Graphite /
Grafana.
Add error catching between hosts
So that if gathering metrics from one host fails, it won't prevent others from being collected and written to file, and so that the total time it took to run the query for each server is available.
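A minimal sketch of the idea, with a stubbed-out query helper standing in for the real HTTP call (names are hypothetical):

```ruby
def query_metrics(host)
  # Stand-in for the real metrics query; simulate one host failing.
  raise 'connection refused' if host == 'server-b.example.com'
  { 'uptime' => 12_345 }
end

results = {}
['server-a.example.com', 'server-b.example.com'].each do |host|
  started = Time.now
  begin
    results[host] = { 'metrics' => query_metrics(host) }
  rescue StandardError => e
    # One host failing no longer aborts collection for the others.
    results[host] = { 'error' => e.message }
  ensure
    # The total query time for this server is recorded either way.
    results[host]['query_seconds'] = Time.now - started
  end
end
```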
Embed metrics data in template
While it's not awesome to have a conditional in a template, there doesn't seem to be an obviously more elegant way to do this right now.
Implement outputdir/host/timestamp.json structure
This commit updates the scripts to create one directory per server from which metrics are being obtained, and to place each server's files in its dedicated directory. The idea is to make it easier to manually browse records on the filesystem using cd and ls, even for very large sets of data.
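A sketch of the resulting layout, using the paths that appear later in this log (illustrative, not the module's code):

```ruby
require 'fileutils'
require 'json'

output_dir = '/opt/puppetlabs/pe_metric_curl_cron_jobs/puppetdb'
timestamp  = Time.now.utc.strftime('%Y%m%dT%H%M%SZ')

['127.0.0.1'].each do |host|
  host_dir = File.join(output_dir, host) # one directory per server
  FileUtils.mkdir_p(host_dir)
  File.write(File.join(host_dir, "#{timestamp}.json"), JSON.generate({}))
  # => e.g. /opt/puppetlabs/pe_metric_curl_cron_jobs/puppetdb/127.0.0.1/20170304T025501Z.json
end
```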
Fix metrics_ensure=absent behavior
Make it so that if a metric is ensured absent, the outputdir is also ensured absent
Add deprecated support for old puppet_server API
So that customers already using this module don't have to change their own code to use the new version of the module.
Adhere to Unix philosophy "Rule of Silence"
The nightly cleanup script runs `find [...] -delete -print` to delete old
reports. The -print flag causes the find command to write output like the
following on stdout, which is then emailed to users:
/opt/puppetlabs/pe_metric_curl_cron_jobs/puppetserver/127.0.0.1/20170304T025501Z.json
/opt/puppetlabs/pe_metric_curl_cron_jobs/puppetdb/127.0.0.1/20170304T025501Z.json
This is a normal recurring activity and does not need to be logged.
Sending emails for it violates the "Rule of Silence" in the Unix
philosophy. When a program has nothing surprising to say, it should say
nothing.
This commit aligns the retention cleanup cron job with the Unix
philosophy's rule of silence.
Merge pull request #6 from reidmv/clean-output
Cleanup output for easy import into visualization tools
Add exception handling when failing to retrieve a puppetdb metric
Prior to this commit, if a single metric failed to be retrieved from the API, the whole metrics script would fail. This commit adds exception handling that causes the error received to be logged in place of the metric hash. Fixes #9
Add exception handling to puppetserver metrics script
Prior to this commit, if the puppetserver status endpoint wasn't available or didn't respond then no metrics file would be created. After this commit, a metrics file is created with the error message in place of the metrics hash. Fixes #9
Create method for retrieving the status endpoint
This allows the error to be raised in the correct place, when gathering the metrics, while still recording the API start and stop times.
Add error and error count keys
In puppetserver metrics the only possible failure is retrieving the status endpoint, so there is a single error and error count key. In PuppetDB metrics any individual metric can fail to be gathered, so there is an error and error count key per metric.
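A sketch of the shape described above for the PuppetDB case; the mbean names and the fetch helper are made up:

```ruby
def fetch_metric(mbean)
  # Stand-in for the real per-metric API call; simulate one failure.
  raise "HTTP 404 for #{mbean}" if mbean.include?('Broker')
  { 'value' => 42 }
end

pdb_metrics = {}
{ 'memory' => 'java.lang:type=Memory',
  'queue'  => 'org.apache.activemq:type=Broker' }.each do |name, mbean|
  begin
    pdb_metrics[name] = fetch_metric(mbean)
  rescue StandardError => e
    # The error and error count are recorded per metric,
    # in place of the metric hash.
    pdb_metrics[name] = { 'error' => e.message, 'error_count' => 1 }
  end
end
```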
Merge pull request #12 from npwalker/fix_exception_handling
Improve Exception Handling
Use new metric for command queue depth in PE 2017.1
Prior to this commit, we used the ActiveMQ metrics to determine the command queue depth. After this commit, when the PE version is 2017.1 or higher and thus includes the new stockpile queue implementation, we use the new metric for determining the queue depth. Otherwise we continue to use the AMQ metrics.
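A sketch of the version switch (the metric sources named here are shorthand, not exact mbean paths):

```ruby
def queue_depth_source(pe_version)
  if Gem::Version.new(pe_version) >= Gem::Version.new('2017.1.0')
    :stockpile  # new queue implementation shipped in PE 2017.1
  else
    :activemq   # older releases expose queue depth via ActiveMQ
  end
end

queue_depth_source('2016.4.3') # => :activemq
queue_depth_source('2017.1.0') # => :stockpile
```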
Add command processing time and command processed metrics
These metrics allow for the calculation of sec/command, commands/sec, and commands processed as shown on the PDB performance dashboard.
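A worked example of the derived rates, using made-up sample values:

```ruby
# Two samples of the cumulative counters, taken 300 seconds apart.
earlier = { processed: 10_000, processing_seconds: 500.0, taken_at: 0 }
later   = { processed: 10_600, processing_seconds: 560.0, taken_at: 300 }

d_processed = later[:processed] - earlier[:processed]  # 600 commands

commands_per_sec = d_processed.to_f /
                   (later[:taken_at] - earlier[:taken_at])       # 2.0
sec_per_command  = (later[:processing_seconds] -
                    earlier[:processing_seconds]) / d_processed  # 0.1
```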
Remove retry-persistence metrics
We added these while grasping for some new metrics that might be helpful. This metric never provided any debugging help, and it has been removed in PE 2017.1.0, so we might as well remove it altogether instead of special-casing by version.
Merge pull request #11 from npwalker/fix_command_queue_metric_in_2017_1
Fix command queue metric in 2017.1
Updated README.md to reflect version 3.0.0
Prior to this commit the README.md showed information about v2.x, which is no longer accurate with v3. This update provides new output and updated commands for the current state of the module.
Cleanup old scripts after renaming with .sh extension
Prior to this commit, if you upgraded from a version of the module earlier than 2 to a version later than 2, you'd see leftover puppetdb_metrics.sh and puppet_server_metrics.sh scripts in your scripts directory. After this commit, we ensure that those scripts are removed.
Merge pull request #10 from jarretlavallee/issue8_README
Updated README.md to reflect version 3.0.0 changes
Pull the PuppetDB status endpoint for consistent queue_depth metric
Prior to this commit, we used ActiveMQ metrics to find the queue depth metric and then in 2017.1 we used a specific metric from PuppetDB. However, the status endpoint has a queue_depth metric in it that correctly chooses the backend depending on version. After this commit, we will only track the status endpoint and not the specific queue depth metric so that we can look at the same metric across versions.
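A sketch of reading queue_depth from the status endpoint (the URL and response shape follow PuppetDB's status service, but treat the details here as assumptions):

```ruby
require 'net/http'
require 'json'

uri  = URI('http://127.0.0.1:8080/status/v1/services/puppetdb-status')
body = JSON.parse(Net::HTTP.get(uri))

# The endpoint reports queue_depth itself, picking the right backend
# (ActiveMQ or stockpile) for the PuppetDB version in use.
queue_depth = body.dig('status', 'queue_depth')
```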
Combine puppetdb and puppetserver metrics script into one
Prior to this commit, the puppetserver and puppetdb metrics gathering scripts were separate. However, after adding the status endpoint to puppetdb metrics tracking, it became clear that these two scripts should be the same. After this commit, there is one template for making the metrics gathering script.
Allow passing a different metric script template to pe_metric
Prior to this commit, we consolidated the puppetdb and puppetserver metric script templates into one, reducing complexity. However, after doing that we wanted to add a separate script for tracking ActiveMQ metrics that doesn't follow the same template. After this commit, we allow new metrics like ActiveMQ to pass in a different script template to use. This keeps the consolidation of the puppetdb and puppetserver scripts while still allowing a one-off metrics template for new metrics. This commit also renames the metrics template to tk_metrics to indicate it is made for interacting with Trapperkeeper metrics.
Merge pull request #13 from npwalker/track_puppetdb_status_endpoint
Pull the PuppetDB status endpoint for consistent queue_depth metric
Add ActiveMQ metrics gathering capability
This commit adds the ability to monitor ActiveMQ, assuming it's been configured with Jolokia. In practice that means Support-patched systems (using Charlie's pe_activemq_jolokia module), 2016.4.x >= 2016.4.4, or 2017.1.x and later.
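A sketch of what reading a broker mbean through Jolokia's HTTP read API could look like (host, port, and mbean name are illustrative):

```ruby
require 'net/http'
require 'json'

mbean = 'org.apache.activemq:type=Broker,brokerName=localhost'
uri   = URI("http://127.0.0.1:8161/api/jolokia/read/#{mbean}")
body  = JSON.parse(Net::HTTP.get(uri))

broker_stats = body['value'] # Jolokia wraps the mbean attributes in "value"
```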
Make cron minute, retention rate configurable
So that the frequency at which metrics are collected can be increased. This is especially useful when monitoring systems like ActiveMQ, whose metrics of interest change rapidly.
Merge pull request #14 from reidmv/activemq
Add ActiveMQ metrics gathering capability
Add ability to configure retention days and collection frequency
Merge pull request #16 from abottchen/do-not-delete-the-scripts
Make tidy cron not delete scripts and only delete its own metrics
Refactor compression and deletion into one script
Prior to this commit, deletion of old files and compression of files were separate cron jobs. After this commit, we have a single script to do both and stick with one cron job for tidying things.
Include hour, minute, and second in compressed tarball name
Prior to this commit, running the script to compress metrics files twice in one day would overwrite the existing tarball. If you want to run the script manually for debugging or any other reason, losing data would be an unwelcome side effect. After this commit, the tarball name includes the hour, minute, and second to avoid overwriting the tarball if the script is run twice in one day.
Change paths in tidy compression
Prior to this commit the tar step in tidy would embed the absolute paths of the JSON files. Since paths can change, it is worth using relative paths when compressing the files. This commit changes the tidy script to use relative paths.
Signed-off-by: Nick Walker <nick.walker@puppetlabs.com>
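A sketch of the two fixes together, the timestamped tarball name and relative paths via tar's -C flag (paths and file layout are illustrative):

```ruby
output_dir = '/opt/puppetlabs/pe_metric_curl_cron_jobs/puppetdb'

# Including hour/minute/second keeps a second run in the same day
# from clobbering the first run's tarball.
stamp = Time.now.utc.strftime('%Y%m%dT%H%M%SZ')

# -C switches into output_dir first, so the archive stores relative paths
# (127.0.0.1/...) instead of absolute ones (/opt/puppetlabs/...).
system('tar', '--create', '--gzip',
       '--file', File.join(output_dir, "puppetdb-#{stamp}.tar.gz"),
       '-C', output_dir, '127.0.0.1')
```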
Only run the tidy cron jobs once daily
Prior to this commit the tidy cron job would run during every minute of
the 2am hour, overwriting the tarball created by the first run once a
minute. This commit runs the cron job only once, at 2:00am.
Merge pull request #15 from abottchen/add-tar-compression
Add compression of metrics files every day. This produces about a 90% reduction in the space needed to store metrics.
Move 'what do you get' to the top of the README (#19)
This update moves the value of the module to the top of the README and places how to set up the module second, to give people context for why they should use the module before how to use it. This update also addresses changes made in version 3 of the module.