From 2b5007dbb09c4429c7887f61247b925e17505815 Mon Sep 17 00:00:00 2001
From: Gavin Sandie
Date: Sun, 28 Apr 2013 11:20:14 +0100
Subject: [PATCH] Add a howto on alerting when % thresholds breached

Adds a howto on using `fixed-time-window` to alert when a certain percentage
of events have occurred with a desired state in that time period.
---
 howto.html | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/howto.html b/howto.html
index 87ec76b03..8194b3756 100644
--- a/howto.html
+++ b/howto.html
@@ -476,6 +476,39 @@

Find the host using the most CPU

(with {:service "Max CPU" :host nil} prn)))
{% endhighlight %}
+
+Alerting when a certain percentage of events happen

+
+Sometimes you'll have a service that will fail. You might expect one or two
+failures, but if you get over a certain percentage of failures you want to be
+notified. In this case you can use fixed-time-window.
+
+{% highlight clj %}
+(streams
+  (where (and metric (service "app1") (tagged "sign_in"))
+    ; We want to get alerted about failed sign-ins. However, we expect that there will be failures
+    ; due to incorrect passwords etc., so we only want to get alerted if more than 50% of the sign-ins
+    ; in a 60 second period are failures. The app tags failed sign-ins with a warning state.
+    ;
+    ; fixed-time-window sends a vector of events out every 60 seconds
+    (fixed-time-window 60
+      ; smap passes those events into a function
+      (smap (fn [events]
+        ; we include a function here to filter the vector for all of the "warning" events and count them
+        ; we then count the total number of events (at least 1, so an empty window cannot divide by zero)
+        ; finally we work out the fraction of failures, a value between 0 and 1
+        (let [percent (/ (count (filter #(= (:state %) "warning") events))
+                         (max 1 (count events)))]
+          ; take an action on the value of percent, in this example we imagine that the
+          ; page-ops and mail-devs functions can take strings
+          (cond
+            (> percent 0.7) (page-ops (format "sign_in is CRITICAL: %.1f percent" (* 100.0 percent)))
+            (> percent 0.5) (mail-devs (format "sign_in is BAD: %.1f percent" (* 100.0 percent)))
+            :else (prn (format "sign_in is %.1f percent" (* 100.0 percent)))
+          )))))
+  ))
+{% endhighlight %}
+

Working with dashboard

Application specific host grouping