SOC-3529 add support for pd merging (#10)
* Add comment explaining how the different pieces fit together, and start renaming things to use correct terms...

* Rename `zabbix_get_event_id` to `zabbix_get_Event_id_from_pd_object` and
(try to) make it smart enough to pull an eventid from either an alert or an
incident.

* Minimum viable code (it ACK'd two zabbix events that were part of a single merged PD incident).

* Attempt to fall back to the old mechanism if the /alerts endpoint didn't work...

* It actually works now. Merges the events properly in Zabbix.
- Switched JSON encoding to use a JSON object instead of the imported `to_json`, so "allow_blessed" can be set for all JSON handling.
- Added a bunch of debugging output, including every function printing its arguments on every call if DEBUG>=5.
- Added logic and a couple of functions for merging Zabbix events when PD incidents merge.

* Style fixups with `perltidy`

* Another pass with (newer version of) `perltidy`.

* Add a dependency that was blocking LWP from installing in the perl 5.10 pipeline

* Try to get a lower version of a dependency module when compile testing for perl 5.10

* Added configurable option for how to handle incoming merge events

* Update documentation and example config to reflect feature changes.
Also a bit of just general docs/example improvements while I was in there.

* Minor `perltidy` fixes.

* Remove the attempt to validate the pdmergeaction option; it wasn't working
and doesn't add enough value to be worth digging into further (anything
other than `merge` or `resolve` gets treated same as `ignore`).

And a tiny bit more debugging.

* Minor formatting fixup via `perltidy`
eric-eisenhart committed Jul 28, 2023
1 parent 90405db commit 01c3955
Showing 4 changed files with 265 additions and 97 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/perlcompile.yml
@@ -51,6 +51,9 @@ jobs:
# Install a specific version of Test::Deep when testing with perl 5.10.0
- run: perl -e 'use 5.12.0;' || cpanm --install Test::Deep@1.130

# Requirement that gets an incompatible version on perl5.10 compilation tests
- run: perl -e 'use 5.011;' || cpanm --install IO::Socket::IP@0.41

# Install the dependencies declared by the module ...
- run: cpanm --installdeps --skip-satisfied .

39 changes: 24 additions & 15 deletions README.md
@@ -23,19 +23,37 @@ from PagerDuty.
Zabbix event to match (with note of who changed it).
- If triggeredupdate=1, PagerDuty incident "triggered" (created) will add
the PD URL as a comment on the Zabbix event.
- If resolvedupdate=1, PagerDuty incident "resolved" will try to close the
Zabbix event. (can't close everything).
- PD "resolved" status can mean this is a "child" in an incident merge,
user marked 'resolved', or resolved because Zabbix closed event via API.
- For merges, if pdmergeaction=merge the "parent" incident will be
marked as a "cause", and the child as a "symptom" with parent as
its "cause". In Zabbix you'll see the parent in events list and
can expand to see children.
- If merge and pdmergeaction=ignore, nothing at all is done.
- If merge and pdmergeaction=resolve, will do same as not a merge.
- If not a merge, and resolvedupdate=1, will try to close the
Zabbix event. (can't close everything, but tries).
- Otherwise, will do nothing.
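
The handling of a PagerDuty "resolved" event described in the list above can be sketched as a small decision function. This is an illustrative Python sketch, not the project's Perl code: the function name `resolved_action` and the return labels (`"merge"`, `"close"`, `"nothing"`) are hypothetical, and it assumes the caller has already worked out whether the event is a merge.

```python
def resolved_action(is_merge: bool, pdmergeaction: str, resolvedupdate: bool) -> str:
    """Pick what to do with a PagerDuty "resolved" event.

    Returns one of three hypothetical labels:
      "merge"   - link the Zabbix events as cause/symptom
      "close"   - try to close the Zabbix event
      "nothing" - leave Zabbix untouched
    """
    if is_merge:
        if pdmergeaction == "merge":
            return "merge"
        if pdmergeaction != "resolve":
            # "ignore" and any unrecognized value: do nothing at all.
            return "nothing"
        # pdmergeaction == "resolve": handle exactly like a non-merge resolve.
    if resolvedupdate:
        return "close"
    return "nothing"


if __name__ == "__main__":
    for args in [(True, "merge", True), (True, "ignore", True),
                 (True, "resolve", True), (False, "merge", False)]:
        print(args, "->", resolved_action(*args))
```

Treating unknown `pdmergeaction` values as `ignore` mirrors the commit message, which notes that anything other than `merge` or `resolve` is handled the same as `ignore`.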

If there's interest, could update Zabbix for "delegated", "escalated",
"reassigned", "reopened", "responder.added", "responder.replied" and
"status_update_published". I'm not sure how to even cause some of those,
and the others didn't seem important to have show up in Zabbix.

The Priority/severity mappings are currently hard-coded, but if there's
interest in making those configurable it shouldn't be hard to do.

### Requirements
- Zabbix 6.4+ (might work with older, I'm testing on 6.4)
- perl 5.10+
- valid SSL certs
- some perl modules
- valid SSL certs on zabbix web (or needs to use http not https)
- Perl modules (in cpanfile and all are commonly available in most distros):
- CGI
- JSON
- LWP::UserAgent
- LWP::Protocol::https
- CGI::Carp
- AppConfig

This CGI is stateless, so can easily be clustered for HA. Probably can run
on same servers as zabbix-web, but we run it elsewhere because our Zabbix
@@ -78,19 +96,14 @@ servers are purely internal/VPN-only.
```
7. Configure Zabbix to send alerts to PagerDuty with the Zabbix WebHook included with recent Zabbix versions.

**Important**: Use the generic-sounding "Events API v2" and _not_ the Zabbix-branded one.
Or create an Orchestration Rule (under Automation) that routes to your Zabbix service and
use an integration key from there.
(As of May 2023, if you use the Zabbix-branded integration, key information vanishes somewhere
in PagerDuty and pagerduty2zabbix can't work out the zabbix event id)

If you've updated Zabbix, this may need to be updated to a version of
the script that sets pagerduty "dedup_key" to zabbix "eventid".

I recommend setting the `token` in the `Media type` to `{ALERT.SENDTO}`
and putting your PagerDuty API token into "Send to" of the user's media
configuration. (so you have the easy option of additional PD integrations for different teams, etc)
6. Copy pagerduty2zabbix.conf.example to ./pagerduty2zabbix.conf or /etc/pagerduty2zabbix.conf
Make sure not accessible to public, since needs a secret (zabbix API key).
7. Edit pagerduty2zabbix.conf:
- Get an API token from PagerDuty that can update the relevant PagerDuty events and set `pdtoken` to that.
(profile pic > User Settings > Create API User Token)
@@ -120,11 +133,7 @@ servers are purely internal/VPN-only.
## FAQ/Common Problems/Likely Problems:

- "Unable to determine zabbix event id" in error log:
This means that it couldn't find a `dedup_key` in the PagerDuty event.
If you use a "Zabbix" integration key from PagerDuty, the dedup_key
silently vanishes. Use the generic-looking "Events API V2" instead,
or create an orchestration (in automation) that routes to your Zabbix
service(s).
This means that it couldn't find a `dedup_key` (or `alert_key`) in the PagerDuty event.
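
As a rough illustration of that lookup, the sketch below checks a PagerDuty object for either key and returns the Zabbix event id. It is a Python sketch under stated assumptions, not the script's actual Perl logic: the function name `find_event_id` is hypothetical, the keys are assumed to sit at the top level of the object, and the order in which they are tried is a guess.

```python
from typing import Optional


def find_event_id(pd_object: dict) -> Optional[str]:
    """Return the Zabbix event id carried by a PagerDuty alert or incident.

    Assumption: the id is stored under "alert_key" or "dedup_key" at the
    top level of the object; real payloads may nest these differently.
    """
    for key in ("alert_key", "dedup_key"):
        value = pd_object.get(key)
        if value:
            return str(value)
    # Neither key present: this is the case behind the
    # "Unable to determine zabbix event id" log message.
    return None


if __name__ == "__main__":
    print(find_event_id({"alert_key": "12345"}))
    print(find_event_id({"title": "no key here"}))
```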

## References

15 changes: 12 additions & 3 deletions pagerduty2zabbix.conf.example
@@ -22,6 +22,10 @@ zabbixurl=https://zabbix.example.com/zabbix
zabbixurl=https://zabbix1.example.com/zabbix
zabbixurl=https://zabbix2.example.com/zabbix

# Output "success" header early, regardless of how well we do.
# Useful to prevent PagerDuty from disabling the WebHook if you have intermittent problems.
superearlysuccess=1

# How many times to try each zabbix URL before trying next or giving up:
zabbixretries=1

@@ -31,6 +35,11 @@ triggeredupdate=1
# Allow PD to (try to) close zabbix events when a PD incident is marked resolved?
resolvedupdate=1

# Output "success" header early, regardless of how well we do.
# May be useful to prevent PagerDuty from disabling the WebHook if you have intermittent problems.
superearlysuccess=0


# How to handle PD merge events.
# Valid choices:
# - merge (merge zabbix events to match PD)
# - ignore (do nothing)
# - resolve (try to close the child events)
pdmergeaction=merge
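
To show how the options in this example file fit together, here is a minimal Python sketch of reading the `key=value` format. The project itself lists AppConfig among its Perl modules, so this is not its parser: `parse_conf` is a hypothetical helper, and it assumes that repeated `zabbixurl` lines accumulate into a list (which the "try each zabbix URL" comment suggests) while other keys keep their last value.

```python
def parse_conf(text: str) -> dict:
    """Parse key=value lines, skipping blanks and '#' comments.

    Assumption: repeated zabbixurl entries are collected in order;
    every other key keeps the last value seen, as a string.
    """
    conf = {"zabbixurl": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        if key == "zabbixurl":
            conf["zabbixurl"].append(value)
        else:
            conf[key] = value
    return conf


EXAMPLE = """\
zabbixurl=https://zabbix1.example.com/zabbix
zabbixurl=https://zabbix2.example.com/zabbix
# How many times to try each zabbix URL before trying next or giving up:
zabbixretries=1
resolvedupdate=1
superearlysuccess=0
pdmergeaction=merge
"""

conf = parse_conf(EXAMPLE)

if __name__ == "__main__":
    print(conf)
```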