deckard : http monitoring system

deckard is an HTTP check monitoring system built on top of CouchDB.

license: apache 2

Features:

* Email and SMS based alerts (through email)
* Designated on-call SMS email address
* Basic CouchDB replication latency alerts
* Basic content check alerts
* Content check alerts with EC2 elastic IP failover
* All checks are defined in CouchDB (CRUD checks with ReST)
* Alert priorities (log, email, SMS and notifo)
* Simple setup via cron
* Basic scheduling to silence alerts
* Adjustable delay before firing check content requests
* Basic Chef "tag" lookup support
* Alert stats database for trending and analysis

Usage:

$ deckard --all ./deckard.yml

Use --all to run every check, or --failover, --content or --replication if you only want to run a subset of checks.

Setup:

* Set up and configure all appropriate databases and alert documents.
* Create a crontab entry

$ crontab -e
*/5 * * * * deckard --all /path/deckard.yml > /dev/null 2>&1


Example documents:

On Call document format:

{
   "_id": "on_call_person",
   "sms_email": "8675309@jenny.net",
   "notifo_usernames" : ["jenny"]
}

For sms_email, enter your phone number at your provider's SMS-to-email gateway host. If you provide both an SMS email and notifo username(s), the SMS address is used only as a backup if something goes wrong with notifo, which saves money on your text message bill. You need the notifo application on your phone to use the notifo support. Note that notifo_usernames is an array of usernames, so multiple people can get notifications.
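
The notifo-first, SMS-as-backup behaviour described above could be sketched roughly as follows. This is an illustration only, not deckard's actual code: notify_on_call, send_notifo and send_sms are hypothetical names, and the two senders are supplied by the caller.

```ruby
# Hypothetical sketch: send_notifo and send_sms are caller-supplied callables,
# not part of deckard's API.
def notify_on_call(on_call, message, send_notifo:, send_sms:)
  usernames = Array(on_call["notifo_usernames"])
  begin
    raise ArgumentError, "no notifo usernames configured" if usernames.empty?
    # Every listed username gets the notification.
    usernames.each { |user| send_notifo.call(user, message) }
    :notifo
  rescue StandardError
    # Notifo failed or is not configured: fall back to the SMS email address.
    send_sms.call(on_call.fetch("sms_email"), message)
    :sms
  end
end
```

The return value (:notifo or :sms) only reports which channel ended up being used.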


Failover check document format:

{
   "_id": "lb01",
   "url": "http://somecheck.com/check.html",
   "secondary_instance_id": "i-1234",
   "priority": 2,
   "region": "us-east-1",
   "elastic_ip": "127.0.0.1",
   "content": "sometext",
   "failover": true,
   "primary_instance_id": "i-4321"
}

This document must contain all the details needed to switch the elastic IP if the content is not found at the url.
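
As a rough sketch of how such a document might drive a failover decision: the function below returns nil when nothing needs to happen, or a description of the elastic IP move otherwise. The name failover_plan is hypothetical, and it is an assumption (not stated in this README) that the IP moves from the primary to the secondary instance and that an empty response counts as "content not found". No EC2 call is made here.

```ruby
# Hypothetical sketch: decide whether a failover check should trigger
# an elastic IP switch. It only describes the move, it does not call EC2.
def failover_plan(check, response_body)
  return nil unless check["failover"] == true
  # Content found at the url: the primary is healthy, nothing to do.
  return nil if response_body.to_s.include?(check.fetch("content"))

  # Assumption: the elastic IP moves from the primary to the secondary instance.
  {
    elastic_ip: check.fetch("elastic_ip"),
    region: check.fetch("region"),
    from_instance: check.fetch("primary_instance_id"),
    to_instance: check.fetch("secondary_instance_id")
  }
end
```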


Replication check format:

{
   "_id": "node01_node02",
   "name": "test",
   "master_url": "http://node01/db",
   "slave_url": "http://node02/db",
   "offset": 0,
   "priority": 1,
   "schedule": [
       2,
       3
   ]
}

This compares the doc counts of the two databases. If the difference falls outside the thresholds specified in the config, an alert is triggered.
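
The comparison could look something like the sketch below. CouchDB returns a doc_count field in the JSON body of a GET on a database URL, which is what doc_count reads. Everything else is an assumption: max_behind and max_ahead stand in for the config thresholds (their real names are not given here), and offset is treated as an expected constant difference between the two counts.

```ruby
require "json"

# Extract the document count from the JSON body CouchDB returns for GET /db.
def doc_count(db_info_json)
  JSON.parse(db_info_json).fetch("doc_count")
end

# Hypothetical sketch: max_behind and max_ahead are placeholder threshold names,
# and offset is assumed to be an expected difference between the two counts.
def replication_out_of_sync?(master_info, slave_info, offset: 0, max_behind: 10, max_ahead: 10)
  # Positive lag: the slave has fewer documents than expected.
  lag = doc_count(master_info) - doc_count(slave_info) - offset
  lag > max_behind || lag < -max_ahead
end
```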


HTTP content check format:

{
   "_id": "deckard.com:5984/",
   "url": "http://deckard.com:5984/",
   "content": "couchdb",
   "priority": 2
}
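
Since checks are plain CouchDB documents, one can be created with a PUT to the database. The sketch below only builds the request; the helper name build_check_request and the database name used in the usage note are hypothetical. Note that an _id such as the one above contains ':' and '/', so it has to be percent-encoded in the request path.

```ruby
require "json"
require "net/http"
require "cgi"

# Build (but do not send) a PUT request that stores a check document.
# base_url and database are placeholders for your own CouchDB setup.
def build_check_request(base_url, database, check)
  # Percent-encode the _id so ':' and '/' do not break the path.
  uri = URI("#{base_url}/#{database}/#{CGI.escape(check.fetch('_id'))}")
  request = Net::HTTP::Put.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(check)
  [uri, request]
end
```

To send it, something like Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) } should work against a database of your choosing (for example a hypothetical "deckard_checks"). Updating an existing document additionally requires its current _rev.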


In all of these documents, priority and schedule are optional fields. Priority is 0, 1 or 2: 0 is log only, 1 is log and email, and 2 is log, email and SMS. The schedule is an array of integers giving the hours during which the alert should be silent; see the replication check document above for an example.
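
A minimal sketch of how these two fields might be interpreted is shown below. The names are hypothetical, and three details are assumptions rather than documented behaviour: the default priority when the field is omitted, whether silenced hours also suppress logging, and whether hours are matched against local time.

```ruby
# Channels per priority, as described above: 0 logs, 1 adds email, 2 adds SMS.
CHANNELS_BY_PRIORITY = {
  0 => [:log],
  1 => [:log, :email],
  2 => [:log, :email, :sms]
}.freeze

# Assumption: a check without a priority field is log only.
DEFAULT_PRIORITY = 0

# True when the current hour is listed in the check's schedule array.
def silenced?(check, now = Time.now)
  Array(check["schedule"]).include?(now.hour)
end

# Assumption: a silenced check alerts on no channel at all.
def channels_for(check, now = Time.now)
  return [] if silenced?(check, now)
  CHANNELS_BY_PRIORITY.fetch(check.fetch("priority", DEFAULT_PRIORITY))
end
```

With the replication check above (priority 1, schedule [2, 3]), this sketch stays silent between 02:00 and 03:59 and otherwise alerts through log and email.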


For Chef "tag" support, you need to install a view in your Chef database and configure its URL in your deckard config file.

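function(doc) {
    // Only index Chef node documents.
    if (doc.chef_type != 'node') return;

    // Key: the EC2 public hostname when available, otherwise the node's FQDN.
    // Value: the node's first tag.
    emit(doc.automatic.ec2 ? doc.automatic.ec2.public_hostname : doc.automatic.fqdn, doc.normal.tags[0]);
}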


For alert stats, just create a stats database and put its name in the config file. When alerts happen, you should begin to see documents being added. The raw documents are not all that helpful by themselves; analysis is the key. Here is a basic "counts" design document to get you started. It will give you stats about the different kinds of alerts you are having.

{
   "_id": "_design/counts",
   "language": "javascript",
   "views": {
       "by_type": {
           "map": "function(doc) {\n  emit(doc.type, 1);\n}",
           "reduce": "_count"
       },
       "by_url": {
           "map": "function(doc) {\n  emit(doc.url, 1);\n}",
           "reduce": "_count"
       },
       "by_error": {
           "map": "function(doc) {\n  emit(doc.error, 1);\n}",
           "reduce": "_count"
       }
   }
}
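
To read these counts back, query a view with group=true so that CouchDB returns one row per key. The sketch below builds the view URL and turns a view response into a Hash. The helper names are hypothetical, and the stats database URL is whatever you configured.

```ruby
require "json"
require "uri"

# URL of one of the "counts" views, grouped by key.
def counts_view_uri(stats_db_url, view)
  URI("#{stats_db_url.chomp('/')}/_design/counts/_view/#{view}?group=true")
end

# Turn a CouchDB view response ({"rows": [{"key": ..., "value": ...}]})
# into a Hash of key => count.
def counts_from_view(view_json)
  JSON.parse(view_json).fetch("rows").map { |row| [row["key"], row["value"]] }.to_h
end
```

For example, counts_view_uri("http://localhost:5984/stats", "by_url") points at the by_url view of a hypothetical local stats database.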