…eport.rb. This is in response to issue #4.
…gger_type instead. This was another place where I was assuming that the Nagios service was named 'Nagios' (and Pingdom named 'Pingdom'). Fixed to switch on the trigger type instead.
This recognizes automatic (e.g., API) incident resolutions that come from services named something other than "Nagios"; assuming that name was a bad assumption. Fixes #4.
This mostly takes out the organization-specific options in pull request commits, and then makes things a little more consistent with other scripts.
…script to send an email notification to the on-duty person
There's a little too much going on with date ranges now. Maybe just choose a start date? Or a start and a period?
Thanks, guys! Also, added a few more comments about things to fix later.
Instead of reporting on the currently-in-progress rotation (the old default), this changes the default to cover the last-completed rotation. Seems like a better default since you usually want an apples-to-apples comparison of one whole week versus another. `rotation-report.rb -a 0` will give you the old behavior.
These options, --start-time and --end-time, take ISO 8601 date/times (e.g., '2011-03-02T14:00:00-05:00'), and can be used to set any arbitrary reporting period you want. The "previous" period will be the same length, one week earlier. (Note all the labels say "vs. last week" for the percentage change values, but that should be roughly accurate for most uses.) I added this because I noticed that using the current rotation period only works if no irregular exceptions are set. If you have a weekly rotation and someone sets a two-day exception, the report will only cover the two-day period versus the same two days a week ago. Not as useful. So, this works around that problem for now.
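A minimal sketch of how the two periods described above could be derived from the ISO 8601 arguments. The `report_periods` helper and the `WEEK_IN_SECONDS` constant are hypothetical names used for illustration, not taken from the script itself; only `Time.iso8601` from Ruby's standard library is assumed.

```ruby
require 'time'

WEEK_IN_SECONDS = 7 * 24 * 60 * 60

# Hypothetical helper: parses the --start-time/--end-time strings and returns
# the current reporting period plus a "previous" period of the same length
# that begins exactly one week earlier.
def report_periods(start_str, end_str)
  start_time = Time.iso8601(start_str)
  end_time   = Time.iso8601(end_str)
  raise ArgumentError, 'end time must be after start time' unless end_time > start_time

  current  = start_time..end_time
  previous = (start_time - WEEK_IN_SECONDS)..(end_time - WEEK_IN_SECONDS)
  [current, previous]
end

current, previous = report_periods('2011-03-02T14:00:00-05:00',
                                   '2011-03-04T14:00:00-05:00')
puts "current:  #{current.begin.iso8601} to #{current.end.iso8601}"
puts "previous: #{previous.begin.iso8601} to #{previous.end.iso8601}"
```

Because the previous period is shifted by a fixed seven days rather than derived from the rotation schedule, a two-day exception no longer shrinks the comparison window.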
Using the `-a`|`--rotations-ago COUNT` option, you can create rotation reports for rotations that have already elapsed. The `-c`|`--campfire-message` option will paste the generated report into the configured Campfire room. This still only supports weekly rotations for now.
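For illustration, the two flags named above could be declared with Ruby's standard `OptionParser` roughly as follows. The `parse_options` method, the option keys, and the help strings are assumptions made for this sketch; the default of 0 rotations ago stands for the in-progress rotation.

```ruby
require 'optparse'

# Hypothetical sketch of the command-line handling; only the flag names
# (-a/--rotations-ago COUNT and -c/--campfire-message) come from the commit.
def parse_options(argv)
  options = { rotations_ago: 0, campfire_message: false }

  parser = OptionParser.new do |opts|
    opts.on('-a', '--rotations-ago COUNT', Integer,
            'Report on the rotation COUNT rotations before the current one') do |count|
      options[:rotations_ago] = count
    end
    opts.on('-c', '--campfire-message',
            'Paste the generated report into the configured Campfire room') do
      options[:campfire_message] = true
    end
  end

  parser.parse(argv) # non-destructive; leaves argv untouched
  options
end

p parse_options(%w[-a 2 -c])
```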
Assuming that the alerts in a given report (current and previous period) only span two consecutive months, this will now capture all the needed alerts. Also works around a bug in Chronic; should send the fix to the maintainer. At this point the rotation report is pretty useful for reporting on the current rotation. Need to add options for output control and for choosing which rotation period to report on.
Previously the incident call only pulled in 100 incidents. Hopefully (!) that would be enough, but just in case, this will make repeated calls (up to 10, for 1000 incidents total) until it finds an incident that is prior to the report period (current rotation and the previous rotation). The loop sleeps one second between calls for API politeness, so this slows things down if you have a lot of alerts in a report period.
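The paging loop described above might look something like this sketch. It is not the script's actual code: `collect_incidents`, the `fetch_page` callable, and the `:created_at` key are hypothetical stand-ins for the real API call and response format, and incidents are assumed to arrive newest first.

```ruby
PAGE_SIZE = 100
MAX_CALLS = 10

# Hypothetical sketch. fetch_page is any callable taking (offset, limit) and
# returning an array of hashes with a :created_at Time, newest first.
# Stops after MAX_CALLS pages, on a short page, or as soon as a page contains
# an incident older than the earliest time the report needs.
def collect_incidents(earliest_needed, fetch_page, pause: 1)
  incidents = []

  MAX_CALLS.times do |call_index|
    page = fetch_page.call(call_index * PAGE_SIZE, PAGE_SIZE)
    incidents.concat(page)

    break if page.size < PAGE_SIZE
    break if page.any? { |incident| incident[:created_at] < earliest_needed }

    # Be polite to the API between calls (skipped after the final call).
    sleep(pause) unless call_index == MAX_CALLS - 1
  end

  incidents.select { |incident| incident[:created_at] >= earliest_needed }
end

# Usage with fabricated in-memory data standing in for the API.
now = Time.utc(2011, 3, 2, 14)
fake_incidents = Array.new(250) { |i| { id: i, created_at: now - i * 3600 } }
fake_fetch = ->(offset, limit) { fake_incidents[offset, limit] || [] }

recent = collect_incidents(now - 120 * 3600, fake_fetch, pause: 0)
puts recent.size
```

With the default one-second pause, a report period that needs all ten pages adds roughly nine seconds of waiting on top of the API calls themselves.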
Also, some code cleanup.
I'm very sensitive to people getting interrupted in their work, and would like to know if those interruptions are on their way up or down. I also want to track how often people are being woken up late at night to deal with alerts. The report now shows the overall volume of SMS/Phone alerts and the volume of late-night (10 p.m. to 8 a.m.) alerts, each compared to the last rotation. Also started in on some code cleanups thanks to the Hack Arts crew.
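As a rough sketch of the two calculations involved, assuming alert times are already in the on-call person's local time zone. The `late_night?` and `percent_change` helpers are hypothetical names, not the script's own.

```ruby
# Hypothetical helper: true for alerts between 10 p.m. and 8 a.m.
def late_night?(time)
  time.hour >= 22 || time.hour < 8
end

# Hypothetical helper: percentage change versus the previous rotation,
# or nil when there is no previous volume to compare against.
def percent_change(current, previous)
  return nil if previous.zero?

  ((current - previous) * 100.0 / previous).round(1)
end

alert_times = [
  Time.new(2011, 3, 2, 23, 15), # late night
  Time.new(2011, 3, 3, 7, 59),  # late night
  Time.new(2011, 3, 3, 8, 0),   # daytime
  Time.new(2011, 3, 3, 14, 30)  # daytime
]

late_night_count = alert_times.count { |time| late_night?(time) }
puts "late-night alerts: #{late_night_count}"
puts "change vs. last rotation: #{percent_change(late_night_count, 4)}%"
```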
Could also take the approach of only asking for resolved incidents in the API call, but it seems better to report the full alert load and instead add a count of how many are resolved.