Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VictorOps Notifier #417

Merged
merged 1 commit into from Jul 27, 2016
Merged

Add VictorOps Notifier #417

merged 1 commit into from Jul 27, 2016

Conversation

@rhazdon
Copy link

rhazdon commented Jul 3, 2016

Hi,

I upgraded the alertmanager for sending alerts to VictorOps.
In my company we are using this feature already so please excuse I didn't open a discussion beforehand.

What do you think about it?

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 6, 2016

@brian-brazil @fabxc I guess you are more likely than me to have an opinion on this.

VERSION Outdated
@@ -1 +1 @@
0.2.1
0.2.2

This comment has been minimized.

Copy link
@beorn7

beorn7 Jul 6, 2016

Member

Version change will happen during release, not upon individual commits.

This comment has been minimized.

Copy link
@rhazdon

rhazdon Jul 6, 2016

Author

Of course. :)
I will fix that asap.

EDIT: Removed.

@rhazdon rhazdon force-pushed the rhazdon:master branch from 713b8b8 to bdb57db Jul 6, 2016
NotifierConfig: NotifierConfig{
VSendResolved: true,
},
MessageType: `{{ if eq .Status "firing" }}CRITICAL{{ else }}RECOVERY{{ end }}`,

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 6, 2016

Member

Does Victorops treat messages differently depending on this? We may wish to hardcode resolved notifications to send a recovery message.

This comment has been minimized.

Copy link
@rhazdon

rhazdon Jul 7, 2016

Author

Yes. Depends on the value, VictorOps will create or close an incident.
It expects one of the following keywords in this field: INFO, WARNING, ACKNOWLEDGEMENT, CRITICAL, RECOVERY. Available here: API Docs.

Is there maybe a better way to resolve the exact status/type of an alert?

This comment has been minimized.

Copy link
@mirthy

mirthy Jul 7, 2016

I wonder whether or not this MessageType switching should occur instead impl.go. Kind of like how the pagerduty and opsgenies ones do it.

(PagerDuty): https://github.com/rhazdon/alertmanager/blob/001e0716bda50a9721b7c8b9cc32751969f6b8e6/notify/impl.go#L405

(Opsgenie):
https://github.com/rhazdon/alertmanager/blob/001e0716bda50a9721b7c8b9cc32751969f6b8e6/notify/impl.go#L673

But I can see an argument against it if we want to be able to use INFO, WARNING, etc.

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 7, 2016

Member

That's the interesting questino. Allowing the user to do it this way will lead to users breaking recovery notifications. So what we might want to do is leave this configurable but only apply to firing alerts, and have some code to substitute in emergency if an invalid value is used.

APIURL string `yaml:"api_url"`
RoutingKey string `yaml:"routing_key"`
MessageType string `yaml:"message_type"`
EntityID string `yaml:"entity_id"`

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 6, 2016

Member

This should be hardcoded to avoid users changing it and inadvertently breaking things.

This comment has been minimized.

Copy link
@mirthy

mirthy Jul 7, 2016

From my understand in the API docs, Entity Id is very much like the OpsGenie "Source," there we use the __alertmanagerURL but in this case we're using Group labels, which may be more readable and understandable too.

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 7, 2016

Member

My read of the docs is that this is like Pagerduty's IncidentKey, and should be handled accordingly.

This comment has been minimized.

Copy link
@mirthy

mirthy Jul 8, 2016

ahh yes, I totally agree with that reading now.

This comment has been minimized.

Copy link
@fabxc

fabxc Jul 26, 2016

Member

Yes, this should not be configurable and still needs to be removed.
In the implementation it should use GroupKey(ctx).

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 26, 2016

Member

Is this visible to the end user in VictorOps? If so we probably want something prettier as is currently done.

This comment has been minimized.

Copy link
@fabxc

fabxc Jul 26, 2016

Member

No we don't. In PagerDuty the key is visible to the user as well and we don't care. By default VictorOps creates a random ID for you, so even that it's not that different.

We can think about having something more descriptive in the future but not in the context of this PR. The current group key is also a fairly large digest of things, which would exceed a few thousands characters quite easily.

@@ -31,6 +31,11 @@
{{ define "pagerduty.default.instances" }}{{ template "__text_alert_list" . }}{{ end }}


{{ define "victorops.default.entity_id" }}{{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }} / {{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}{{ end }}{{ end }}

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 6, 2016

Member

It's incorrect to include the common labels, this should only be the group labels.

This comment has been minimized.

Copy link
@rhazdon

rhazdon Jul 7, 2016

Author

Removed. :)

@@ -31,6 +31,11 @@
{{ define "pagerduty.default.instances" }}{{ template "__text_alert_list" . }}{{ end }}


{{ define "victorops.default.entity_id" }}{{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }} / {{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}{{ end }}{{ end }}
{{ define "victorops.default.message" }}{{ template "__subject" . }} | {{ template "__alertmanagerURL" . }}{{ end }}
{{ define "victorops.default.from" }}{{ template "__alertmanager" . }}{{ end }}

This comment has been minimized.

Copy link
@brian-brazil

brian-brazil Jul 6, 2016

Member

We've used the URL for other similar fields

This comment has been minimized.

Copy link
@rhazdon

rhazdon Jul 7, 2016

Author

This field returneds the name of the tool from which the incident came from.
VictorOps displays this information then.
b

To avoid misunderstanding I can rename this field into the json key VictorOps expects (monitoring_tool), described here: VictorOps API.

@mirthy

This comment has been minimized.

Copy link

mirthy commented Jul 12, 2016

Any more progress with this? What do we need to do here?

@rhazdon

This comment has been minimized.

Copy link
Author

rhazdon commented Jul 13, 2016

@mirthy Yes, in 2-3 days. I became a daddy two days ago. :)

@rhazdon rhazdon force-pushed the rhazdon:master branch from aa743c1 to b459466 Jul 19, 2016
@rhazdon

This comment has been minimized.

Copy link
Author

rhazdon commented Jul 19, 2016

I rebased my fork and solved the merge conflicts.

@mirthy

This comment has been minimized.

Copy link

mirthy commented Jul 21, 2016

Hi guys, looks like @rhazdon made some more changes, any chance someone could look at this again? Sorry to be a pain. Is there anything I can do to help?

@rhazdon

This comment has been minimized.

Copy link
Author

rhazdon commented Jul 26, 2016

Hi Guys,
is there anything I should change or improve? :)

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 26, 2016

I was on vacation. Am looking now.

}

fmt.Printf("unexpected VictorOps response from %s (POSTed %s), %s: %s",
apiURL, msg, resp.Status, body)

This comment has been minimized.

Copy link
@fabxc

fabxc Jul 26, 2016

Member

No line break here.

Also this should be using log.Debugf rather than fmt.Printf.

This comment has been minimized.

Copy link
@rhazdon

rhazdon Jul 26, 2016

Author

Hi @fabxc, thank you very much. You are right, I fixed that. :)

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 26, 2016

@rhazdon please also check the comment on the entity ID above.

@rhazdon

This comment has been minimized.

Copy link
Author

rhazdon commented Jul 26, 2016

I did. Testing it currently on prodution.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 26, 2016

Thanks. Can you squash that into one commit and then we should be good for merge.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 26, 2016

Nope, compile error ahead.

Djordje Atlialp
Add default VictorOpsAPIURL

Add VictorOps default config

Add VictorOpsConfig struct in notifiers

Add new template tags for victorops

Add notifications logic for victorops

Compiled template tags with make assets

Remove common labels from entity_id template

Set messageType default value to CRITICAL

Recovery messageType is not configurable anymore. Firing state only allows specific keys

Make assets

Using log.Debugf

EntityID should not be configureable

Remove entity_id from template

Use GroupKey(ctx) as entity_id

Improve debug logging

Fix type of entity_id
@rhazdon rhazdon force-pushed the rhazdon:master branch from 5d02fcc to 8e0f405 Jul 26, 2016
@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 27, 2016

👍

@fabxc fabxc merged commit 42696d9 into prometheus:master Jul 27, 2016
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
ci/circleci Your tests passed on CircleCI!
Details
@fuzzyami

This comment has been minimized.

Copy link

fuzzyami commented Oct 19, 2016

@rhazdon
Thanks for your work on VO integration - its something we'd want to use too.

I'm trying it now - and my alerts have payloads ("StateMessage" in VO terms) that could be somewhat improved. The current format contains (among other things) a space-delimited list of values:

"TooManyConfs1 AMS 3.3.3.3 GER-03 1.1.1.1"

This list originates in the key-value pairs that are part of the alert-data in the AlertManager. It would be awesome if that list contained both the keys and the values from the alert:

alertname:TooManyConfs1 datacetner:AMS external_ip:3.3.3.3 region:GER-03 internal_ip:1.1.1.1

Those key/value pairs provide important context to the alert. Without the key, the readability of the alert is greatly reduced. Perhaps this can be easily fixed?

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Oct 19, 2016

That's a default for all receivers where the message is considered short, as usually you'll know what the values are from experience and there's things like SMS size limits to worry about. You can override it with notification templates.

@fuzzyami

This comment has been minimized.

Copy link

fuzzyami commented Oct 19, 2016

Thanks, Brian.

For future readers of this thread, I'm pasting below our current VictorOps receiver config. This is a little crude, but it provides all the info we need (alert name is first, then summary and finally full 'payload'). In the future we might customize the summary text and drop the labels.

receivers:
- name: victorOps-receiver
  victorops_configs:
    - api_key: <our_secrect_api_key>
        routing_key: <our_routing_key>
        message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.