building alerting system for grafana #2209

Dieterbe · 2015-06-22T21:22:15Z

Hi everyone,
I recently joined raintank and I will be working with @torkelo, @mattttt , and you, on alerting support for Grafana.

From the results of the Grafana User Survey it is obvious that alerting is the most commonly missed feature for Grafana.
I have worked on/with a few alerting systems in the past (nagios, bosun, graph-explorer, etsy's kale stack, ...) and I'm excited about the opportunity in front of us:
we can take the best of said systems, but combine them with Grafana's focus on a polished user experience, resulting in a powerful alerting system, well-integrated and smooth to work with.

First of all, terminology sync:

alerting: executing logic (threshold checks or more advanced) to know the state of an entity. (ok, warning, critical)
notifications: emails, text messages, posts to chat, etc to make people aware of a state change
monitoring: this term covers everything about monitoring (data collection, visualizations, alerting) so I won't be using it here.

I want to spec out requirements, possible implementation ideas and their pros/cons. With your feedback, we can adjust, refine and choose a specific direction.

General thoughts:

integration with existing tools vs built-in: there are some powerful alerting systems out there (bosun, kale) that deserve integration.
Many alerting systems are more basic (define expression/threshold, get notification when breached), for those it seems integration is not worth the pain (though I won't stop you)
The integrations are a long term effort. I think the low hanging fruit ("meet 80% of the needs with 20% of the effort") can be met with a system
that is more closely tied to Grafana, i.e. compiled into the grafana binary.
That said, a lot of people confuse separation of concerns with "must be different services".
If the code is sane, it'll be decoupled packages but there's nothing necessarily wrong with compiling them together. i.e. you could run:
- 1 grafana binary that does everything (grafana as you know it + all alerting features) for simplicity
- multiple grafana binaries in different modes (visualization instances and alerting instances) even highly available/redundant setups if you want to, using an external worker queue

That said, we don't want to reinvent the wheel: we want alerting code and functionality to integrate well with Grafana, but if high-quality code is compatible, we should use it. In fact, I have a prototype that leverages some existing bosun code. (see "Current state")

polling vs stream processing: they have different performance characteristics,
but they should be able to take the same or similar alerting rule definitions (thresholds, boolean logic, ..), they mostly are about how the actual rules are executed and don't
change much about how rules are defined. Since polling is much simpler and should be able to scale fairly far this should IMHO be our initial focus.

Current state

The raintank/grafana version currently has an alerting package
with a simple scheduler, a worker bus (in-process as well as rabbitmq-based), an alert executor and email notifications.
It uses the bosun expression libraries which gives us the ability to evaluate arbitrarily complex expressions (use several metrics, use boolean logic, math, etc).
This package is currently raintank-specific but we will merge a generic version of this into upstream grafana. This will provide an alert execution platform but notably still missing is

an interface to create and manage alerting rules
state management (acknowledgements etc)

these are harder problems, which I hope to tackle with your input.

Requirements, Future implementations

First off, I think bosun is a pretty fantastic system for alerting (not so much for visualization)
You can make your alerting rules as advanced as you want, and it enables you to fine-tune over time, backtest on historical data, so you can get them just right.
And it has a good state machine.
In theory we could just compile bosun straight into grafana, and leverage bosun via its REST api instead of Golang api, but then we have less finegrained control and
for now I feel more comfortable trying out piece by piece (piece meaning golang package) and make the integration decision on a case by case basis. Though the integration
may look different down the road based on experience and as we figure out what we want our alerting to look like.

Either way, we don't just want great alerting. We want great alerting combined with great visualizations, notifications with context, and a smooth workflow where you can manage
your alerts in the same place you manage your visualizations. So it needs to be nicely integrated into Grafana. To that end, there's a few things to consider:

some visualized metrics (metrics plotted on graphs) are not alerted on
some visualized metrics are alerted on:
- A: with simple threshold checks: easy to visualize alerting logic
- B: with more advanced logic (e.g. look at standard deviation of the series being plotted, compare current median against historical median, etc): can't easily be visualized next to the input series
some metrics used in alerting logic are not to be visualized

Basically, there's a bunch of stuff you may want visualized (V), and a bunch of stuff you want alerts (A), and V and A have some overlap.
I need to think about this a bit more and wonder what y'all think.
There will definitely need to be 1 central place where you can get an overview of all the things you're alerting on, irrespective of where those rules are defined.

There are a few more complications, which I'll explain through an example sketch of what alerting could look like:

let's say we have a timeseries for requests (A) and one for erroneous requests (B) and this is what we want to plot.
we then use fields C,D,E to put stuff that we don't want to plot (only alert on).
C contains the formula for ratio of error requests against the total.

we may for example want to alert (see E) if the median of this ratio in the last 5min is more than 1.5 times what the ratio was in the same 5-minute period last week, and also
if the errors seen in the last 5min is worse than the errors seen since 2 months ago until 5min ago.

notes:

some queries use different timeranges than what is rendered
in addition to processing by tsdb (such as Graphite's sum(), divide() etc which return series) we need to be able to reduce series to single numbers. fairly easy to implement (and in fact currently the bosun library does this for us)
we need boolean logic (bosun also gives us this)
in this example the expression only uses variables defined within the same panel, but it might make sense to include expressions of other panels/graphs.
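
To make the example concrete, here is a minimal Python sketch of the "reduce series to single numbers, then apply boolean logic" idea. It is not bosun's expression language. The series are plain lists, the names `ratio` and `should_alert` and the sample data are made up for illustration, and reading "and also" as a boolean AND is an assumption.

```python
from statistics import median

def ratio(errors, requests):
    """Pointwise error ratio (field C in the sketch); skips points with no requests."""
    return [e / r for e, r in zip(errors, requests) if r > 0]

def should_alert(now, last_week, baseline, factor=1.5):
    """Reduce each series to one number, then combine with boolean logic.

    now:       error ratio over the last 5 minutes
    last_week: error ratio over the same 5-minute period last week
    baseline:  error ratio from 2 months ago until 5 minutes ago
    """
    worse_than_last_week = median(now) > factor * median(last_week)
    worse_than_baseline = median(now) > median(baseline)
    # Assumption: both conditions must hold for the alert to trigger.
    return worse_than_last_week and worse_than_baseline

# Hypothetical data, one point per minute.
now = ratio(errors=[9, 12, 10, 11, 13], requests=[100] * 5)
last_week = ratio(errors=[5, 4, 6, 5, 5], requests=[100] * 5)
baseline = ratio(errors=[5, 6, 4, 7, 5, 6], requests=[100] * 6)

print(should_alert(now, last_week, baseline))
```

Note that each of the three inputs covers a different time range, none of which has to match what the graph renders.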

other ponderings:

do we integrate with current grafana graph threshold settings (which are currently for viz only, not for processing) ? if the expression is a threshold check, we could automatically
display a threshold line
using the letters is a bit clunky, could we refer to the aliases instead? like #requests and #errors?
what if the expressions are stats.$site.requests and stats.$site.errors, and we want to have separate alert instances for every site (but only set up the rule once)? what if we only want it for a select few of the sites? what if we want different parameters based on which site? bosun actually supports all these features, and we could expose them though we should probably build a UI around them.

I think for an initial implementation every graph could have two fields, like so:

warn: - expression
      - notification settings (email, http hook, ..)
crit: - expression
      - notification settings

where the expression is something like what I put in E in the sketch.
for logic/data that we don't want to visualize, we just toggle off the visibility icon.
grafana would replace the variables in the formulas, execute the expression (with the current bosun based executor). results (state changes) could be fed into something like elasticsearch and displayed via the annotations system.
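
A rough Python sketch of the warn/crit idea above, assuming each expression has already been reduced to a function of a single number. The rule layout, the thresholds and the names `evaluate` and `state_change` are hypothetical and only illustrate how state changes (rather than raw states) could be what gets stored and annotated.

```python
OK, WARNING, CRITICAL = "ok", "warning", "critical"

def evaluate(rule, value):
    """Return the state for one reduced value; crit is checked before warn."""
    if rule["crit"]["expression"](value):
        return CRITICAL
    if rule["warn"]["expression"](value):
        return WARNING
    return OK

def state_change(previous, current):
    """Return a (previous, current) event on a transition, else None."""
    return (previous, current) if previous != current else None

# Hypothetical rule for one graph: two expressions plus notification settings.
rule = {
    "warn": {"expression": lambda v: v > 0.05, "notify": ["email"]},
    "crit": {"expression": lambda v: v > 0.10, "notify": ["email", "http hook"]},
}

previous = OK
events = []
for value in [0.02, 0.07, 0.12, 0.12, 0.01]:
    current = evaluate(rule, value)
    event = state_change(previous, current)
    if event:
        events.append(event)  # candidates for storage and display as annotations
    previous = current

print(events)
```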

Thoughts?
Do you have concerns or needs that I didn't address?

linkslice · 2015-06-22T22:59:14Z

I'd love to help out with this! My suggestion would be to stick with the nagios-style guidelines. That way the tools could easily be used with other monitoring tools. e.g. Nagios, Zenoss, Icinga, etc..

torkelo · 2015-06-23T15:51:41Z

The biggest thing about this feature is getting the basic architecture right.

Some questions I would like to explore:

1) What components are required and how are they run (in proc in grafana, out of proc)?
2) How should things be coordinated?
3) Should we ignore "in stream" alerting (only focus on pull based)?

Going more in depth into 1)
I am worried about making grafana-server into a monolith. Would like to find a way to separate grafana-server into services that are more isolated from each other (and can be run either inproc or as separate processes). This was kind of the plan with the bus abstraction. Another option would be to have the alerting component only speak to grafana via the HTTP api, might limit integration, not sure.

dahendel · 2015-06-23T16:23:23Z

I agree with torkelo. In my experience with other projects with everything "built-in" it can get quite cumbersome to troubleshoot. I like the idea of the service running externally, but a nice config page in grafana that talks to the service through the HTTP api to handle managing all the alerts. Also, for large scale deployments this would probably end up being a requirement as performance would eventually degrade (I would at least have this as a configuration option).

do we integrate with current grafana graph threshold settings (which are currently for viz only, not for processing) ? if the expression is a threshold check, we could automatically display a threshold line

I think that could be a good place to start. Alert if it's set, don't if it's not.

Back to number 1. I think that if the bosun service could run separately but still have the ability to completely configure everything through grafana that would be, in my opinion, ideal.

Keep up the awesome work.

Jhors2 · 2015-06-26T21:34:00Z

The only shortcoming I have seen with bosun is the data sources it can use. If you could leverage the language for expressing bosun alerting but also integrate with existing data sources that are configured via the regular grafana UI it would certainly be ideal.

Being able to represent alerting thresholds, when you are close to them, as well as automatically push annotations for when they have triggered in my mind make an ideal single pane UI.

Looking forward to the work that will be done here!

damm · 2015-06-26T23:57:07Z

1. It should use the thresholds defined in the Dashboard to alert on.
   Let's keep it simple; if the Dashboard shows the color for warning it should be alerting.
2. This would likely be something outside of the grafana-server process itself.
   ... Something that would use the rest api to scrape the dashboards and their settings, render them, and alert using an external command.
3. Alerting level; just a box to drop in the editor that this Dashboard should be monitored, and it should be checked every minute. If there's no data for a period, should it still alert? (checkbox)

Lastly; as we depend on Grafana more I admit i'm willing to say 2. could be something i'd be willing to pay for.

dennisjac · 2015-07-03T15:40:43Z

I'm curious why people think this should be included into Grafana at all?
Grafana neither receives nor stores that actual data but "only" visualizes it. Any alerting system should instead be based on the data in the metric store.
If this is really integrated into Grafana I hope this can be disabled because over here we already use Icinga for alerting so any kind of alerting in Grafana would only clutter the GUI more even though it wouldn't be used at all.

damm · 2015-07-03T19:05:40Z

Absolutely correct @dennisjac; Grafana only renders things.

But as we've moved things server side it's no longer just client rendering; the possibilities of a worker process that could check your metrics and alert; is less difficult.

Data is in a database; provided it's sprinkled with the data that tells it to check the metric ...

Some people may agree or disagree that we should not cross the streams and make Grafana do more than visualize it (roughly) but I'm not them.

dennisjac · 2015-07-03T19:35:36Z

I'm not really opposed to the feature for people who want it to be integrated but I hope it will be made optional for people who already have monitoring/alerting systems available.

The new Telegraf project (metric collector from the influxdb guys) is also looking at monitoring/alerting features, which I dislike for the same reason. I elaborated on this here:
https://influxdb.com/blog/2015/06/19/Announcing-Telegraf-a-metrics-collector-for-InfluxDB.html#comment-2114821565

damm · 2015-07-03T19:50:09Z

I think torkelo has done a really good job at giving us features in Grafana2 that we don't have to enable.

As far as influxdb they're going to have to make some money somehow; either off of support of influxdb and professional services or products for it.

The latter sounds much more viable

elvarb · 2015-07-06T21:13:01Z

Another angle on this. There seems to be upcoming support for elasticsearch as a metric storage for grafana. Bosun can right now query elasticsearch for log data.

Would it make sense when designing the alerting system to allow for alerts from log data as well? Maybe not a feature for the first version, but something that can be implemented later.

Also I agree with the idea of splitting the processes. Have Grafana the interface to view and create alerts, have something else handle the alerting. Having the alerting part api based would also allow other tools to interface with it.

deebs031 · 2015-07-08T15:22:46Z

+1 to Alerting. Outside DevOps usage, applications built for end users need to provide user defined alerts. Nice to have it in the visualization tool...

j1n6 · 2015-07-13T18:52:06Z

+1 this will close the loop - the purpose of getting metrics.

AdrianParente · 2015-07-14T00:03:31Z

+1 Alerting from Grafana + a Horizontally Scaling Backend from InfluxDB will make them the standard to beat for Metrics Alerting Configurations

lesaux · 2015-07-15T16:40:37Z

+1 I'd love horizontal scaling of the alerting on multiple grafana nodes.

rsetzer · 2015-07-15T18:05:37Z

It would be great if one could associate a "debounce" like behavior with an alert. For example, I want to fire an alert only if the defined threshold exceeds X for N minutes.

I have seen this with some of the alerting tools, unfortunately we are currently using Seyren which doesn't appear to provide such an option. We are using Grafana for our dashboard development and are looking forward to pulling the alerting into Grafana as well. Keep up the good work.
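
One way to read the "debounce" request is sketched below in Python. This is only an illustration under the assumption that samples arrive in time order as (timestamp, value) pairs; the function name `debounced_states` and the sample data are made up.

```python
def debounced_states(samples, threshold, hold_seconds):
    """Yield (timestamp, firing) for each sample.

    samples: iterable of (timestamp_seconds, value), in time order.
    The alert fires only once the value has stayed above `threshold`
    for at least `hold_seconds` without interruption.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of this breach
            yield ts, ts - breach_start >= hold_seconds
        else:
            breach_start = None  # any dip below the threshold resets the timer
            yield ts, False

# Hypothetical one-minute samples: a short spike, then a sustained breach.
samples = [(0, 50), (60, 95), (120, 96), (180, 40), (240, 97), (300, 98), (360, 99)]
for ts, firing in debounced_states(samples, threshold=90, hold_seconds=120):
    print(ts, firing)
```

The short spike at 60-120s never fires; only the breach that lasts the full two minutes does.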

j1n6 · 2015-07-15T18:43:31Z

We have two use cases:

infrastructure team creates alert through provision tools as usual into common monitoring stack (common cluster check or system checks in nagios friendly system )
software developers create app metrics via Grafana

We would love to have a unified alerting system that handles alerts, flap detection, escalation and contacts. That helps us record and correlate events/operations in the same source of truth. A lot of systems have solved the alerting problem. I hope Grafana can do better at this in the long term; in the short term, not reinventing existing systems would be helpful in terms of deliverables.

One suggestion is Grafana can provide API for extracting monitoring definition (alerting state), third party can contribute configuration export plugins. This would be very ideal in our use case exporting nagios configuration.

More importantly, I would love to see some integrated anomaly detection solution too!

falkenbt · 2015-07-15T22:08:24Z

I agree with @activars. I don't really see why a dashboard solution should handle alerting which is a more or less solved problem by lots of other tools, mostly quite mature. Do one thing and do it well.

IMHO it would make more sense to focus on the integration part.

Example: Define dynamic warn/crit thresholds in grafana (e.g. like in @Dieterbe example above) and provide an API (REST?) that returns the state (normal, warn, crit) of exactly this graph. A nagios, icinga, bosun etc. could request all the "monitoring" enabled graphs (another API feature), iterate through the individual states and do the necessary alerting.
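
A small Python sketch of the consuming side of such an API, written as a Nagios-style check. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention; the response format and field names (`title`, `state`) are purely hypothetical, since no such Grafana endpoint is defined in this thread.

```python
import json

# Standard Nagios plugin exit codes.
NAGIOS_EXIT = {"normal": 0, "warn": 1, "crit": 2}
UNKNOWN = 3
LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def check_result(graph):
    """Map one graph's reported state to (exit_code, status_line)."""
    code = NAGIOS_EXIT.get(graph.get("state"), UNKNOWN)
    return code, f"{LABELS[code]} - {graph.get('title', '?')}"

# Hypothetical API response listing the "monitoring"-enabled graphs.
response = json.loads("""
[
  {"title": "error ratio", "state": "crit"},
  {"title": "request rate", "state": "normal"},
  {"title": "queue depth", "state": "stale"}
]
""")

results = [check_result(g) for g in response]
for code, line in results:
    print(code, line)
```

Anything the check does not recognise maps to UNKNOWN rather than OK, so a changed or broken API does not silently look healthy.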

In our case service catalogs and defined actions are the hard part - which service is how business critical, where to send emails to, flapping etc. Also you would not have to worry about user / group management in grafana which most companies already have in a central place (AD, LDAP, Crowd etc.) and integrated with the alerting system.

Also we have to consider that unlike a dashboard solution the quality requirements for an alerting tool can be considered much higher in terms of reliability, resilience, stability etc. which creates (testing) effort that shouldn't be underestimated.

Also what about non-timeseries related checks, like calling a webservice, pinging a machine, running custom scripts...would you want that in grafana as well? I guess the bosun adoption would provide all this but I'm not really familiar with it.

On the other hand I can imagine how a simple alerting system would make a lot of users happy that don't have a good alternative in place, but this could maybe be resolved with some example integration patterns for other alerting tools.

bigkraig · 2015-07-16T00:18:48Z

As much as I want Grafana to solve all of my problems, I think falkenbt hit the nail on the head with this one.

An API to expose the mentioned data, some plumbing in bosun, and some integration patterns with common alerting platforms makes a lot of sense.

jemilsson · 2015-07-17T18:41:05Z

Congratulations on your new job at raintank @Dieterbe! I have been reading your blog for a while and you have some really sound ideas on monitoring, particularly regarding metrics and its place in alerting. I am confident that you will find a good way implementing alerting in grafana.

As you would probably agree, the people behind Bosun are pretty much doing alerting the right way. The thing lacking in Bosun is really the visualizations. I would like to see Bosun behind the Grafana UI. Combining Grafana's dashboards and Bosun's alerting behind the same interface would make for an awesome and complete monitoring solution.

Also i think it would be a shame to fragment the open source monitoring community further, your ideas on monitoring seem to be really compatible with the ideas of the people behind Bosun. If you would unite i am sure the result would be great.

Where i work we are using Elastic for logs/events and have just begun using InfluxDB for metrics. We have been exploring different solutions for monitoring and are currently leaning towards Bosun. We are already using Grafana for dashboards, but would like to access all our monitoring information through the same interface, it would be great if Grafana could become that interface.

Keep up the great job, and good luck!

sudharsh · 2015-07-21T14:08:18Z

On a related tangent, we got the alerting part working by integrating grafana with riemann. Was a nice exercise getting to know the internals of grafana :).

This was easier with riemann as the config is just clojure code. I imagine this integration is going to be harder in Bosun.

Here are a couple of screenshots of it in action

The changes to the grafana part included adding an "/alerts" and a "/subscriptions" endpoint and have it talk to another little api that sits on top for riemann to do the crud.

The nice thing is the fact that the changes in the alert definitions are reflected immediately without having to send a SIGHUP to riemann. So enabling/disabling, time period tweaks for state changes is just a matter of changing it in the UI and have that change propagate itself to riemann.

Still haven't benchmarked this integration but I don't think it's going to be that bad. Will blog about it after I cleanup the code and once it goes live.

The whole reason we did this was because people can just go ahead and set these alerts and notifications from a very familiar UI and not bother us to write riemann configs :).

dinoshauer · 2015-07-22T08:06:55Z

@sudharsh your implementation sounds really interesting. Are you planning on releasing this to the wild?

Dieterbe · 2015-07-22T10:20:34Z

lots of good ideas, thanks everyone.
Inspired by some of the comments and @pabloa's https://github.com/pabloa/grafana-alerts project we decided to focus first and foremost on the UI and UX for configuring and managing alerting rules as part of the same workflow of editing dashboards and panels. Grafana would save those rules somewhere and provide easy access to them so that other scripts or tools can fetch the alerting rules.
Perhaps via a file, an API call, a section in the dashboard config, or an entry in the database.
(I like the idea of having it as part of the dashboard definition itself, so that open source projects can come with grafana dashboard json files for them which would have alerting rules included though not necessarily active by default. on the other hand having them in a database seems more robust)
Either way, we want to provide easy access so you can generate configuration for whatever other system you want to use that actually executes the alerting rules and processes the events. (from hereon I'll refer to this as a "handler").
Such a handler could be nagios, sensu, bosun, a tool that you write, or the litmus alert scheduler-executor: a handler that you could compile into grafana, providing a nice and simple integration backed by bosun. But we really want to make sure you can use whatever system you want.

This works as long as your handler supports querying the datastore you use. We would start off with simple static thresholds, but later we also want to make it easy to choose reduction functions, boolean expressions between multiple conditions, etc.
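
To illustrate the "rules stored with the dashboard, fetched by a handler" idea, here is a minimal Python sketch. Everything about the format is an assumption: the `alerting` section, its `enabled`, `warn` and `crit` fields and the expression strings are hypothetical, and the `rows`/`panels` nesting is only assumed to resemble a dashboard json file.

```python
import json

def extract_rules(dashboard):
    """Collect active alerting rules from a dashboard definition.

    Assumes each panel may carry a hypothetical "alerting" section,
    which is inactive unless "enabled" is explicitly set.
    """
    rules = []
    for row in dashboard.get("rows", []):
        for panel in row.get("panels", []):
            alerting = panel.get("alerting")
            if alerting and alerting.get("enabled", False):
                rules.append({
                    "dashboard": dashboard.get("title"),
                    "panel": panel.get("title"),
                    "warn": alerting.get("warn"),
                    "crit": alerting.get("crit"),
                })
    return rules

# Hypothetical dashboard json with one active rule, one inactive rule, one plain panel.
dashboard = json.loads("""
{
  "title": "web",
  "rows": [
    {"panels": [
      {"title": "error ratio",
       "alerting": {"enabled": true, "warn": "ratio > 0.05", "crit": "ratio > 0.10"}},
      {"title": "request rate",
       "alerting": {"enabled": false, "warn": "rate < 10", "crit": "rate < 1"}},
      {"title": "cpu"}
    ]}
  ]
}
""")

for rule in extract_rules(dashboard):
    print(json.dumps(rule, sort_keys=True))
```

A handler-specific exporter (nagios config, bosun rules, ...) would then only need to translate this flat list.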

@sudharsh that is a very nice approach. I like how your solution can talk directly to a remote API, bypassing the intermediate step described above (of course this does imply it only works for 1 given backend which we try to avoid), and that it can automatically reload the configuration. (you're right, bosun currently does not support it, it might in the future. FWIW the litmus handler does handle this fine and it uses bosun's expression evaluation mechanism). I never really got into riemann much. Mostly I've been concerned about adding such a different language to the stack that not many people understand or can debug when things go wrong. But I'm very curious to learn more about your system and about Riemann's CLJ code. (I'd love it if my suspicions are incorrect)

@dennisjac yes it would be optional.
@elvarb there is a ticket for ES as a datasource. in fact the goal is that if grafana supports rendering data from a given datasource it should also support composing alerting rules for it. As for query execution/querying this of course depends on what handler you decide to use. (for the litmus handler we'll start out with the most popular ones like graphite and influxdb)
@rsetzer : agreed, it's a good thing to be able to specify how long a threshold should be exceeded before we trigger
@falkenbt : I believe many things can be phrased as a timeseries querying problem (for example the pings example). But you're right, some things aren't really timeseries related and those are out of scope for what we're trying to build here. And I think that's OK: we want to provide the best way to configure and manage alerting on timeseries and aim for integration with other systems that are perhaps more optimized for the "misc scripts" case (such as nagios, icinga, sensu, ...). As for concerns such as reliability of delivery, escalations etc, you could hook in a service such as pagerduty.
@activars & @falkenbt does this seem to match your expectation or what do you think could be improved specifically?
@jemilsson thank you! and that's exactly how i see it: bosun is great at alerting but not good at visualization. Grafana is great at visualization and UX but has no alerting. I'm trying to drive a collaboration which will grow over time I think

Does anyone have any thoughts on what kind of context to ship in notifications like emails?
At the very least, the notification should contain a viz of the data you're alerting on, but it should imho be possible to include other related graphs. Here we could use grafana's png rendering backend when generating the notification content. I'm also thinking about leveraging grafana's snapshot feature. like when an alert triggers, take a snapshot of a certain dashboard for context.
and maybe that snapshot (html page) could be included in the email, or that might be a bit too much data/complexity. also the javascript features would be unavailable in mail clients anyway (so you wouldn't be able to zoom on graphs in an email). Perhaps we could link from the email to a hosted dashboard snapshot.

felixbarny · 2015-07-22T10:25:36Z

I like the general approach of docker - batteries included, but removable. So a basic alerting implementation that can be swapped out would be a good approach imho.

JulienChampseix · 2015-07-22T12:48:49Z

influxdb will be supported for alerting ? or only graphite ?

nickman · 2015-07-28T19:21:08Z

One thing I would like to see is the idea of hierarchical alert trees. There's simply too many facets being monitored and stand alone alert states have an unmanageable cardinality. With a hierarchy tree, I can define all these low level alerts which roll up to medium level alerts which roll up to high level ......

As such, each rolled up alert automatically assumes the highest severity of all the children below it. In that way, I can get an impression of [and manage] system health accurately with a much lower surface area of analysis.

This is an example I have borrowed from an old document I wrote a while ago. Yes, please chuckle away at the use of the word "Struts". It's OLD ok ? This presents a very simple hierarchy for one server.

At some point, the server experiences sustained 75% CPU utilization, so this trips these alerts into a warning state: CPU-# --> CPU --> Host/OS --> System
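
The roll-up rule can be sketched in a few lines of Python. The tree below is hypothetical (only the CPU-# --> CPU --> Host/OS --> System chain comes from the example; the other nodes are made up), and `rolled_up` is an illustrative name.

```python
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}

def rolled_up(node):
    """Return the highest severity found in this node or any node below it."""
    states = [node.get("state", "ok")]
    states += [rolled_up(child) for child in node.get("children", [])]
    return max(states, key=SEVERITY.__getitem__)

# Hypothetical hierarchy for one server.
system = {
    "name": "System",
    "children": [
        {"name": "Host/OS", "children": [
            {"name": "CPU", "children": [
                {"name": "CPU-0", "state": "warning"},  # sustained 75% utilization
                {"name": "CPU-1", "state": "ok"},
            ]},
            {"name": "Memory", "state": "ok"},
        ]},
        {"name": "JVM", "state": "ok"},
    ],
}

print(rolled_up(system))
```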

If one really applied themselves, one could keep an eye on an entire data center with one indicator. (yeah, not really, but this serves as a thought exercise)

ahrajabi · 2015-08-05T05:21:44Z

Why not use graphite-beacon? I think you can merge graphite-beacon, which is very light, with grafana.

Dieterbe · 2015-08-05T08:02:26Z

@felixbarny I like that terminology. we'll likely adopt that wording.
@JulienChampseix yes the standard handler would/will support influxdb
@nickman that's interesting. it actually falls in line with the end-goal we have in mind, of being able to create very highlevel alerts that can include / depend on more fine grained alerting rules and information. bosun already does this, and long term we want to make this functionality available through a more user friendly interface, but we have to start more simple than this.
@AmirhosseinRajabi looks like a cool project and I think making graphite-beacon into a handler for the alerts configured through the grafana UI would make a lot of sense.

JulienChampseix · 2015-08-05T12:08:16Z

@Dieterbe is it possible to have an update of the current status ? for alerting system
in order to know which system is compatible (graphite/influxdb) ?
which subscription available ? which alert type available ?
thanks for your update.

naveen-tirupattur · 2016-10-05T16:44:11Z

@Dieterbe Any ETA for alerting support for OpenTSDB?

thisisjaid · 2016-10-05T17:09:05Z

@sofixa Thanks, should have looked at the roadmap myself, case of not RTFMing. Appreciated nonetheless.

Dieterbe · 2016-10-06T08:36:32Z

@Dieterbe Any ETA for alerting support for OpenTSDB?

i don't work on alerting anymore. maybe @torkelo or @bergquist can answer.

LoaderMick · 2016-10-27T02:00:24Z

@torkelo @bergquist

Any ETA for alerting support for OpenTSDB

utkarshcmu · 2016-10-27T05:10:21Z

@LoaderMick @naveen-tirupattur OpenTSDB alerting is added to Grafana, should be a part of the next release. Also, the alerting for OpenTSDB is working in the nightly builds.

nnsaln · 2016-11-01T13:18:46Z

Any ETA for alerting support for influxDB and prometheus too?

zihaoyu · 2016-11-01T13:23:25Z

@nnsaln alerting for both data sources is already in master branch.

bischofs · 2016-11-02T17:51:02Z

I can't seem to get the alerting working with OpenTSDB with (Grafana v4.0.0-pre1 (commit: 578507a)). I tested the email system (working) but the alerts just don't fire even when I have a very low threshold. Is there any way to run the queries manually and see the data that it is pulling?

nnsaln · 2016-11-03T07:06:08Z

Grafana v4.0.0-pre1 (commit: 9b28bf2)
error tsdb.HandleRequest() error Influxdb returned statuscode invalid status code: 400 Bad Request

superbool · 2016-11-07T02:26:32Z

@torkelo
Can the 'webhook alert notification' post the alert metric ,json or form type?

calind · 2016-11-12T17:28:06Z

Hi guys, will Grafana support alerting for queries using template variables or is there a target release for this?

RichiH · 2016-11-12T18:07:58Z

All, please try 4.0 beta; if something is missing, open new issues.

nnsaln · 2016-11-14T08:07:12Z

I've tried 4.0 beta, but I still got this error
error tsdb.HandleRequest() error Influxdb returned statuscode invalid status code: 400 Bad Request

nnsaln · 2016-11-14T08:18:25Z

I cannot save the alert notifications "send to" field: after I save, the "send to" row becomes blank again

deric · 2016-12-05T10:06:37Z

@nnsaln You're supposed to fill notification target there, not email address. Open the grafana side menu and hover over the Alerting menu option, then hit the Notifications menu options. There you can setup a notification target that you can use from your alert rules.

deepujain · 2016-12-05T21:11:49Z

Is there any plan to support template variables along with alerting ? I do understand each graph generated by a (or set) template variable corresponds to a different graph and hence generating alert against a static value is not correct.

torkelo · 2016-12-06T04:19:53Z

No, there is currently no support to do this. Maybe in far future but

deepujain · 2016-12-06T17:05:22Z

99% of dashboards use template variables. They were designed with template variables to avoid "dashboard explosion" problem.

torkelo · 2016-12-06T22:14:04Z

Yes, but a generic exploration dashboard is not the same as a dashboard designed for alert rules.

So far there has not been a proposal for how to support template variables in an intuitive / understandable way. What should an alert query with a variable do? Interpolate with the currently saved variable value, or with all values? Should it treat every value as a separate rule and keep state for each? Supporting template variables opens up a can of worms of complexity and potentially confusing behavior. It might be added some day if someone comes up with a simple and understandable way.

flyersa · 2016-12-06T22:32:00Z

In the meantime, nothing stops you from creating separate alert dashboards. Alerting is new and a huge addition to Grafana. It will evolve over time, but in the short time since it was implemented it has added huge value to Grafana, and thanks to all contributors for that!


deepujain · 2016-12-06T22:32:09Z

+1 Torkel. It does make alerting fairly complicated.


xhook · 2016-12-09T10:53:18Z

@bergquist regarding this comment:

"alerting within grafana does not support HA yet. Our plan is to add support to partition alerts between servers in the future"

Is there a ticket to track the progress? Any branch to contribute?

And big thanks for the nice job!

deepujain · 2016-12-11T17:47:32Z

Kern, <3 grafana. I was just trying to share thoughts around alerting with template dashboards.


jaimegago · 2016-12-20T00:06:50Z

@torkelo @Dieterbe It's awesome to finally have alerting built into Grafana ! What is the recommended way (if any) to create alerts programmatically?

torkelo · 2016-12-20T07:37:22Z

@jaimegago To create alerts programmatically, use the dashboard API; alerts are saved along with a panel and dashboard.
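
A rough sketch of what that could look like, assuming a Grafana 4.x style dashboard JSON (row-based layout) and the `POST /api/dashboards/db` save endpoint. The helper names, the metric target, and the exact fields inside `alert` are illustrative assumptions; export a dashboard that already has an alert and compare before relying on them.

```python
import json
import urllib.request


def build_alert_dashboard(title, metric_target, threshold):
    """Return a dashboard-save payload with one graph panel carrying an alert.

    The alert fields mirror what an exported alerting dashboard is assumed
    to contain; they are not taken from this thread.
    """
    alert = {
        "name": f"{title} alert",
        "frequency": "60s",
        "handler": 1,
        "noDataState": "no_data",
        "executionErrorState": "alerting",
        "notifications": [],
        "conditions": [
            {
                "type": "query",
                # Query "A" of the panel, evaluated over the last 5 minutes.
                "query": {"params": ["A", "5m", "now"]},
                "reducer": {"type": "avg", "params": []},
                "evaluator": {"type": "gt", "params": [threshold]},
                "operator": {"type": "and"},
            }
        ],
    }
    panel = {
        "id": 1,
        "type": "graph",
        "title": title,
        "targets": [{"refId": "A", "target": metric_target}],
        "alert": alert,  # the alert lives inside the panel definition
    }
    # Row-based layout is an assumption; other versions may use a
    # top-level "panels" list instead.
    dashboard = {"id": None, "title": title, "rows": [{"panels": [panel]}]}
    return {"dashboard": dashboard, "overwrite": False}


def build_save_request(base_url, api_key, payload):
    """Build (but do not send) the HTTP request that saves the dashboard."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually save it, pass the result of `build_save_request(...)` to `urllib.request.urlopen`. Keeping payload construction separate from sending makes it easy to inspect the JSON first.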

jaimegago · 2016-12-21T20:56:57Z

@torkelo How about notification targets (e.g. creating a new notification email via the API)?

edit: Answering myself here, I found the api/alert-notifications endpoint. I think it just needs to be documented.

torkelo · 2016-12-22T07:00:37Z

Of course there is an http api for that, just go to alerting notifications page, add a notification and check the http api call grafana makes
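
For illustration, a minimal sketch of a request against the `api/alert-notifications` endpoint mentioned above. The body fields (`name`, `type`, `isDefault`, `settings.addresses`) and the helper name are assumptions; as suggested, check the call the notifications page makes in your Grafana version and adjust accordingly.

```python
import json
import urllib.request


def build_notification_request(base_url, api_key, name, addresses):
    """Build (but do not send) a request that creates an email notification.

    The body shape is an assumption modelled on what the UI is expected to
    send; verify it against the real call in the browser's network tab.
    """
    body = {
        "name": name,
        "type": "email",
        "isDefault": False,
        # Multiple recipients joined into one string (assumed format).
        "settings": {"addresses": ";".join(addresses)},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/alert-notifications",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(build_notification_request(...))`.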

CCWeiZ · 2017-02-27T12:16:16Z

@torkelo, is there any API that can be used to create an alert (not an alert notification) programmatically?

bergquist · 2017-02-27T13:01:14Z

@CCWeiZ Alerts are part of the dashboard JSON, so you can only create a dashboard that contains alerts, not standalone alerts.

You can read more about the dashboard api on http://docs.grafana.org/http_api/dashboard/
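
Since an alert only exists inside a panel, adding one to an existing dashboard boils down to editing the dashboard JSON and saving it again. A small sketch of the editing step, with hypothetical helper names and an `alert` dict whose contents you would copy from an exported dashboard:

```python
import copy


def iter_panels(dashboard):
    """Yield panels from a row-based layout and from a top-level panel list."""
    for row in dashboard.get("rows", []):
        yield from row.get("panels", [])
    yield from dashboard.get("panels", [])


def attach_alert(dashboard, panel_id, alert):
    """Return a copy of the dashboard with `alert` set on the matching panel."""
    updated = copy.deepcopy(dashboard)  # leave the caller's dict untouched
    for panel in iter_panels(updated):
        if panel.get("id") == panel_id:
            panel["alert"] = alert
            return updated
    raise KeyError(f"no panel with id {panel_id}")
```

The returned dict is what you would then send back through the dashboard API.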

robertchen · 2017-12-15T16:17:15Z

Is this possible: I want to set up an alert that fires if a value has not increased compared to 3 days ago. (Say, for requests: if the current value minus the value from 3 days ago is < 100, then we say there are not many requests.) How do I do this?

Dieterbe mentioned this issue Jun 22, 2015

Add support for alerting - maybe join forces with tattle? #102

Closed

torkelo added the area/alerting Grafana Alerting label Jun 23, 2015
