Add a howto on alerting when % thresholds breached #209

Merged
merged 1 commit into from Apr 30, 2013
33 changes: 33 additions & 0 deletions howto.html
@@ -476,6 +476,39 @@ <h3>Find the host using the most CPU</h3>
(with {:service "Max CPU" :host nil} prn)))
{% endhighlight %}


<h3>Alerting when a certain percentage of events happen</h3>

<p>Sometimes a service is expected to fail occasionally. One or two failures
are fine, but you want to be notified once failures exceed a certain percentage
of all events. In this case you can use <code>fixed-time-window</code> to batch
events and compute that percentage for each window.</p>

{% highlight clj %}
(streams
  ; We want to be alerted about failed sign-ins. Some failures are expected
  ; (incorrect passwords etc.), so we only alert when more than 50% of the
  ; sign-ins in a 60 second period fail. The app tags failed sign-ins with a
  ; "warning" state.
  (where (and metric (service "app1") (tagged "sign_in"))
    ; fixed-time-window sends a vector of events out every 60 seconds
    (fixed-time-window 60
      ; smap passes those events into a function
      (smap (fn [events]
              ; skip empty windows so we never divide by zero
              (when (seq events)
                ; count the "warning" events, count all events, then divide
                ; to get the fraction of failures (between 0 and 1)
                (let [fraction (/ (count (filter #(= (:state %) "warning") events))
                                  (count events))
                      percent  (* 100.0 fraction)]
                  ; act on the value of percent; in this example we imagine
                  ; that the page-ops and mail-devs functions can take strings
                  (cond
                    (> percent 70) (page-ops (format "sign_in is CRITICAL: %.1f percent" percent))
                    (> percent 50) (mail-devs (format "sign_in is BAD: %.1f percent" percent))
                    :else (prn (format "sign_in is %.1f percent" percent))))))))))
{% endhighlight %}
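
<p>The threshold logic can be tried at a REPL without a running Riemann
instance. The following is a minimal sketch in plain Clojure; the names
<code>failure-fraction</code> and <code>severity</code> are hypothetical
helpers invented for illustration and are not part of Riemann. It assumes
events are maps with a <code>:state</code> key, as in the example above.</p>

{% highlight clj %}
;; Hypothetical helpers for illustration only; not part of Riemann.
(defn failure-fraction
  "Fraction (0.0 to 1.0) of events whose :state is \"warning\".
  Returns nil for an empty window so callers never divide by zero."
  [events]
  (when (seq events)
    (double (/ (count (filter #(= (:state %) "warning") events))
               (count events)))))

(defn severity
  "Maps a failure fraction to :critical, :bad, or :ok using the
  0.7 and 0.5 thresholds from the example above."
  [fraction]
  (cond
    (nil? fraction)  :ok
    (> fraction 0.7) :critical
    (> fraction 0.5) :bad
    :else            :ok))

;; A sample window: three failures out of four events.
(def window [{:state "warning"} {:state "warning"}
             {:state "warning"} {:state "ok"}])

(failure-fraction window)            ; => 0.75
(severity (failure-fraction window)) ; => :critical
(severity (failure-fraction []))     ; => :ok
{% endhighlight %}

<p>Keeping the calculation in a pure function makes the thresholds easy to
check before wiring them into <code>smap</code>.</p>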

<h2>Working with dashboard</h2>

<h3>Application specific host grouping</h3>