SmartDataCenter internal API and agents for monitoring and alarming


sdc-amon

This repository is part of the Joyent SmartDataCenter project (SDC). For contribution guidelines, issues, and general documentation, visit the main SDC project page.

Amon is a monitoring and alarming system for SmartDataCenter (SDC). It has three components: a central master, a tree of relays, and agents. Probes (things to check and alarm on) and ProbeGroups (optional groupings of probes) are configured on the master (i.e. on the "Amon Master API", or "Amon API" for short). Probe data is passed from the master, via the relays, to the appropriate amon-agent, where the probe is run. When a probe fails or trips, it raises an event, which passes up through the relays to the master. The master handles events by creating or updating alarms and, if appropriate, sending notifications to the configured contacts (suppression and de-duplication rules mean a notification is not always sent). The Amon Master API provides the API needed by cloudapi, and ultimately by the User and Operations Portals, to manage Amon probes, probe groups, and alarms.
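The "create or update an alarm, and only sometimes notify" behaviour described above can be pictured with a small sketch. This is not Amon's implementation: the function name `handleEvent`, the alarm and event fields, and the rule "notify only on the first event for a user and probe" are all hypothetical, chosen only to illustrate de-duplication.

```javascript
// Illustrative sketch only. Every name here (handleEvent, the alarm and
// event fields, the keying scheme) is hypothetical, not Amon's actual code.

// `alarms` holds open alarms keyed by "<user>:<probe>", so repeat events
// from the same probe update one alarm instead of opening a new one.
function handleEvent(alarms, event) {
  var key = event.user + ':' + event.probe;
  var existing = alarms[key];
  if (existing) {
    // De-duplication: fold the event into the open alarm, send nothing new.
    existing.numEvents += 1;
    existing.timeLastEvent = event.time;
    return { alarm: existing, notify: false };
  }
  var alarm = {
    user: event.user,
    probe: event.probe,
    numEvents: 1,
    timeOpened: event.time,
    timeLastEvent: event.time
  };
  alarms[key] = alarm;
  // First event for this probe: this is where contacts would be notified.
  return { alarm: alarm, notify: true };
}

var alarms = {};
var first = handleEvent(alarms, { user: 'alice', probe: 'log-scan-1', time: 1000 });
var second = handleEvent(alarms, { user: 'alice', probe: 'log-scan-1', time: 2000 });
console.log(first.notify, second.notify, second.alarm.numEvents);
// prints: true false 2
```

Keying alarms by user and probe is the simplest rule that shows why a second event need not produce a second notification; the real suppression rules may differ.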

Design Overview

There is an "Amon Master" HTTP server that runs in the "amon" core zone as the "amon-master" SMF service. This is the endpoint for the "Amon Master API". The Amon Master stores long-lived Amon system data (probes, probe groups, contacts) in Moray and shorter-lived data (alarms) in redis. Redis runs in a separate "amonredis" core zone.

There is an "Amon Relay" running on each compute node global zone to ferry (1) probe configuration down to Amon Agents where probes are run; and (2) events up from agents to the master for handling. This is installed with the agents shar (which includes all SDC agents) as "amon-relay" on each compute node.

There is an "Amon Agent" running at each location where the supported probes need to run. Currently that is each compute node global zone in the DC, plus each core SDC (and Manta) zone.

Code Layout

master/         Amon master (node.js package)
relay/          Amon relay (node.js package)
agent/          Amon agent (node.js package)
plugins/        "amon-plugins" node.js package that holds probe types
                (e.g. "log-scan.js" implements the "log-scan" probe type).
common/         "amon-common" node.js module to share code between the
                above packages.
bin/            Some convenience scripts to run local builds of node, etc.
docs/           API docs
test/           Test suite.
tools/          General tools for development of amon.

Development

Typically Amon development is done by:

  • making edits to a clone of sdc-amon.git on a Mac (likely Linux too, but that's untested) or a SmartOS development zone,

      git clone git@github.com:joyent/sdc-amon.git
      cd sdc-amon
      git submodule update --init   # not necessary first time
      vi
    
  • building:

      make all
      make check
    
  • syncing changes to a running SDC (typically a COAL running locally in VMware) via one or more of:

      ./tools/rsync-master-to-coal
      ./tools/rsync-relay-to-coal
      ./tools/rsync-agent-to-coal
    
  • then testing changes in that SDC (e.g. COAL). See "Testing" below for running the test suite.

If you are developing on an OS other than SmartOS, you can't update the binary parts of Amon. Currently that typically only bites when trying to update npm deps of the version of node used by the Amon components.

Testing

Currently the test suite is primarily meant to be run against a full install of all Amon components in a full SDC setup (e.g. in COAL). The bulk of the test suite (everything under "test/...") is installed with the Amon Relay (i.e. in the headnode global zone).

You can run the test suite from there as follows:

cd /opt/smartdc/agents/lib/node_modules/amon-relay
./test/runtests

This will run all the main tests against the running Amon system and also log in to the Amon Master zone(s) and run the master's local test suite.

To sync local changes to a running COAL and run the test suite there, try:

make test-coal

COAL Notes: Getting email notifications

Many ISPs block outbound SMTP traffic (port 25). When that is the case, Amon Master's default mail config results in no outbound email notifications. One way around that is to use a Gmail account like this:

$ ssh coal                  # log in to your COAL headnode gz
$ sdc-login amon            # log in to the "amon" core zone
$ vi /opt/smartdc/amon/cfg/amon-master.json
# Edit the "notificationsPlugins.email.config" key to look something like:
    "config": {
      "smtp": {
        "host": "smtp.gmail.com",
        "port": 587,
        "ssl": false,
        "use_authentication": true,
        "user": "YOUR-GMAIL-NAME@gmail.com",
        "pass": "YOUR-GMAIL-PASSWORD"
      },
      "from": "\"Monitoring (no reply)\" <no-reply@joyent.com>"
    }
$ svcadm restart amon-master

Personally, I'm using a separate Gmail account for this so I don't have to put my personal Gmail password in that config file.
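
Because a hand-edited JSON file is easy to break, it can help to sanity-check the config before restarting amon-master. The following is a minimal sketch, not part of Amon. It assumes the "notificationsPlugins.email.config" key path above maps to nested JSON objects; the function name `checkSmtpConfig`, the script name `check-smtp.js`, and the particular checks are hypothetical.

```javascript
// Sketch: sanity-check the SMTP settings edited above. The function name and
// the specific checks are assumptions made for illustration; only the key
// path and the field names come from the example config.
var fs = require('fs');

function checkSmtpConfig(cfg) {
  var problems = [];
  var plugins = cfg.notificationsPlugins || {};
  var email = (plugins.email || {}).config;
  if (!email || !email.smtp) {
    return ['missing notificationsPlugins.email.config.smtp'];
  }
  var smtp = email.smtp;
  if (typeof smtp.host !== 'string' || smtp.host.length === 0) {
    problems.push('smtp.host must be a non-empty string');
  }
  if (typeof smtp.port !== 'number' || smtp.port % 1 !== 0) {
    problems.push('smtp.port must be an integer');
  } else if (smtp.port === 25) {
    problems.push('smtp.port is 25, which many ISPs block');
  }
  if (smtp.use_authentication && (!smtp.user || !smtp.pass)) {
    problems.push('use_authentication is set but user or pass is missing');
  }
  return problems;
}

// Hypothetical usage: node check-smtp.js /opt/smartdc/amon/cfg/amon-master.json
if (process.argv[2]) {
  // JSON.parse throws on syntax errors such as a trailing comma.
  var cfg = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'));
  var problems = checkSmtpConfig(cfg);
  console.log(problems.length ? problems.join('\n') : 'SMTP config looks OK');
}
```

An empty result only means the fields are present and well-typed; it does not confirm that the mail server accepts the credentials.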