Nagios plugin for checking values in gmond



Author: Lars Kellogg-Stedman

This is a Nagios plugin that checks values collected by Ganglia. It can poll either gmond or use the interactive query interface provided by gmetad.
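
As a rough illustration of what polling gmond involves (a sketch, not the plugin's actual code): gmond dumps an XML document to any client that connects to its TCP port (8649 by default), and the metrics for a host can be read from it. The function names, the sample XML, and the host name node01 below are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample of gmond's XML; real output carries
# many more attributes and metrics.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="HPC Monitoring">
<HOST NAME="node01" IP="10.0.0.1">
<METRIC NAME="cpu_wio" VAL="1.8" TYPE="float" UNITS="%"/>
<METRIC NAME="os_release" VAL="2.6.32" TYPE="string" UNITS=""/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def fetch_xml(address, port=8649):
    """Read the full XML dump from gmond (assumes the default TCP port)."""
    chunks = []
    with socket.create_connection((address, port), timeout=10) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def host_metrics(xml_text, hostname):
    """Return {metric name: value} for one host, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        if host.get("NAME") == hostname:
            metrics = {}
            for metric in host.iter("METRIC"):
                value = metric.get("VAL")
                # Every value arrives as text; convert the non-string types.
                if metric.get("TYPE") != "string":
                    value = float(value)
                metrics[metric.get("NAME")] = value
            return metrics
    return None


metrics = host_metrics(SAMPLE_XML, "node01")
print(metrics["cpu_wio"])
```

Against a live daemon, host_metrics(fetch_xml("gmond.example.com"), "node01") would do the same thing, with both names replaced by your own.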


Copyright (C) 2010 Lars Kellogg-Stedman

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <>.


-h, --help show this help message and exit
-w WARN, --warn=WARN
 Warn threshold.
-c CRITICAL, --critical=CRITICAL
 Critical threshold.
-v, --verbose Make output more verbose.
 Address of gmond/gmetad host.
-H HOST, --host=HOST
 Host for which we want metrics.
-l, --list List available metrics on the target host.
-m METRIC, --metric=METRIC
 Metric to compare against threshold values.
 Expression to compare against threshold values.
-q, --query Use gmetad query interface instead of gmond.
 Cluster name for gmetad query.
 Additional metrics to return as performance data.
 Exit status on connection failure, missing host or missing metric (default WARN).
-p PORT, --port=PORT
 Port on which to communicate w/ gmond/gmetad


Using gmond

Generate a WARN status if cpu_wio is >= 30% or CRITICAL if cpu_wio is >= 50%:

check_gmond -H -m cpu_wio -w 30 -c 50

Generate a WARN status if cpu_idle is <= 70 or CRITICAL if cpu_idle is <= 50:

check_gmond -H -m cpu_idle -w :70 -c :50

Generate a CRITICAL status if os_release is not "":

check_gmond -H -m os_release -c "!"

Using gmetad

Gmetad provides an interactive query interface that allows for efficiently fetching a subtree of the XML data. For environments with large numbers of hosts this offers a substantial performance advantage.

Use the --query flag to activate gmetad support. In addition to the parameters you provide when using gmond, you will also need to provide the appropriate Ganglia cluster name with --cluster (-C). For example:

check_gmond -q -C 'HPC Monitoring' \
  -H -m cpu_wio -w 30 -c 50

If you fail to provide a cluster name or if you mistype the cluster name, gmetad will behave essentially just like gmond -- that is, it will dump the entire XML tree.
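
To make the subtree idea concrete, here is a minimal sketch of a gmetad query, assuming gmetad's default interactive port (8652) and its path-style request of the form /cluster/host. The function names and the host name node01 are hypothetical, and this is not the plugin's own implementation.

```python
import socket


def build_query(cluster, host=None):
    """Build the request line sent to gmetad's interactive port."""
    path = "/" + cluster
    if host:
        path += "/" + host
    return path + "\n"


def query_gmetad(address, cluster, host=None, port=8652):
    """Send the query and return the XML subtree gmetad replies with.

    Assumes the default interactive port; adjust it if your gmetad.conf
    sets a different one.
    """
    chunks = []
    with socket.create_connection((address, port), timeout=10) as sock:
        sock.sendall(build_query(cluster, host).encode("utf-8"))
        while True:
            data = sock.recv(4096)
            if not data:  # gmetad closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks)


print(repr(build_query("HPC Monitoring", "node01")))
```

Because only one host's subtree comes back, the client parses a small document instead of the XML for the whole grid.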

Fetching additional metrics

You can include additional metrics as performance data in the check result using the '-x' flag. This is useful if you are using Nagios/Icinga to process performance data (e.g., using pnp4nagios).

For example:

# check_ganglia -q -C 'My cluster' -H \
  -m cpu_wio -x cpu_idle -x cpu_aidle -x cpu_nice \
  -x cpu_user -x cpu_system
cpu_wio OKAY: 1.8 | cpu_wio=1.8; cpu_idle=92.7; cpu_aidle=90.0;
  cpu_nice=0.0; cpu_user=1.0; cpu_system=4.5;

(Note that the output has been wrapped here for display purposes; it actually appears on a single line.)
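
The shape of that result line can be reproduced with a few lines of Python. This is only a sketch of the format shown above (status text, then '|', then name=value; pairs); the function name format_result is hypothetical and the plugin may build its output differently.

```python
def format_result(status, metric, value, extras):
    """Build a one-line check result with trailing performance data.

    extras is a list of (name, value) pairs for the additional metrics.
    """
    perfdata = [(metric, value)] + list(extras)
    perf = " ".join("%s=%s;" % (name, val) for name, val in perfdata)
    return "%s %s: %s | %s" % (metric, status, value, perf)


line = format_result(
    "OKAY", "cpu_wio", 1.8, [("cpu_idle", 92.7), ("cpu_aidle", 90.0)]
)
print(line)
```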

Using expressions

In some cases, the value provided by Ganglia is not, by itself, sufficient to meet your monitoring needs. You can ask check_ganglia to evaluate an arbitrary Python expression to compute the value of a metric with the --expression option. A dictionary named host is available to this expression; its keys are the names of the metrics available from Ganglia.

For example, if we want to adjust the value of load_five by dividing it by the number of cores in the system, we could call check_ganglia like this:

check_ganglia -q -C 'My cluster' -H \
  -m load_five --expression 'host["load_five"]/host["cpu_num"]'
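
Conceptually, the expression is evaluated with the host dictionary in scope. The following is a minimal sketch under that assumption, with made-up metric values; the evaluate helper is hypothetical and the plugin's real evaluation code may differ.

```python
def evaluate(expression, host):
    """Evaluate a Python expression with `host` as its only visible name.

    eval() executes arbitrary code, so only pass expressions you trust.
    """
    return eval(expression, {"__builtins__": {}}, {"host": host})


# Made-up values standing in for what Ganglia would report.
host = {"load_five": 4.0, "cpu_num": 8}
value = evaluate('host["load_five"]/host["cpu_num"]', host)
print(value)  # 0.5
```

The computed value, rather than the raw load_five, is then what gets compared against the -w and -c thresholds.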

Specifying threshold values

(This is extracted from check_gmond.checkval; see the embedded documentation for the most current version.)

The arguments to the -w and -c options use the following syntax:

For numeric values

  • 5 -- match if v >= 5
  • 3:5 -- match if 3 <= v <= 5
  • :5 -- match if v <= 5
  • 1,2,3 -- match if v in (1,2,3)

For string values

  • foo -- match if v == foo
  • foo,bar -- match if v in (foo, bar)


You can negate a threshold expression by preceding it with '!'. For example:

  • !5 -- match if v < 5
  • !3:5 -- match if v < 3 or v > 5
  • !1,2,3 -- match if v not in (1,2,3)
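
The rules above can be sketched in Python as follows. This is an illustrative reimplementation written from the list, not the code of check_gmond.checkval. The name matches_threshold is hypothetical, and treating an open-ended range such as "5:" as "no upper bound" is an assumption the text does not cover.

```python
def matches_threshold(v, spec):
    """Return True if value v matches the threshold specification spec."""
    negate = spec.startswith("!")
    if negate:
        spec = spec[1:]

    if isinstance(v, str):
        # String values: exact match against one or more alternatives.
        matched = v in spec.split(",")
    elif "," in spec:
        # Numeric list: match any of the listed values.
        matched = v in [float(item) for item in spec.split(",")]
    elif ":" in spec:
        # Numeric range; an empty side means "unbounded" (assumption for "5:").
        low, high = spec.split(":", 1)
        matched = (low == "" or float(low) <= v) and (
            high == "" or v <= float(high)
        )
    else:
        # Bare number: match when the value reaches the threshold.
        matched = v >= float(spec)

    # A leading '!' inverts the result.
    return matched != negate


print(matches_threshold(35, "30"))    # cpu_wio of 35 against -w 30
print(matches_threshold(4, "!3:5"))   # 4 lies inside 3:5, so the negation fails
```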