Usage Documentation

All RHV Checks are executed via RHV REST-API, so the following requirements must be met:

At least RHV 4.0 or oVirt 4.0 is required; older versions are no longer supported.
HTTPS connection from Monitoring server to RHV Manager (default: TCP/443)
User for REST-API login
- use Admin-Role "ReadOnlyAdmin" for your monitoring user

The following checks can be performed by this plugin:

General Options

-vv: verbose Mode
-vvv: debug Mode
-p <port>: Port of REST-API (default: 443)
--ca-file: Path to RHEV-CA
-A <api>: REST-API path (default: /api)
-a <User>@<Domain>:<Password>: Authentication credentials
-f <Auth file>: Authentication file
-t <timeout>: Seconds before connection times out (default: 15)
-w <warn>: warning value
-c <crit>: critical value
-V: display version of plugin and exit
-h: Print detailed help screen

Authentication can be done either with -a

$ check_rhv -a <User>@<Domain>:<Password>

or with -f (the advantage of this variant is that your password isn't visible in the Icinga/Nagios configuration):

$ check_rhv -f <auth file>

Syntax of auth file must be:

username=<User>@<Domain>
password=<Password>
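
To sanity-check an auth file before handing it to the plugin, a small parser can help. The following is only a sketch of how such a file could be read. The function name parse_auth_file is hypothetical, and the plugin's own parsing may differ, for example in how it treats whitespace.

```python
def parse_auth_file(text):
    """Parse 'key=value' lines into a dict (hypothetical helper, not plugin code)."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so passwords may contain '='.
        key, value = line.split("=", 1)
        creds[key.strip()] = value.strip()
    missing = {"username", "password"} - creds.keys()
    if missing:
        raise ValueError("auth file is missing: " + ", ".join(sorted(missing)))
    return creds


sample = "username=admin@internal\npassword=s3cret=value\n"
print(parse_auth_file(sample)["username"])
```

Because the file contains a clear-text password, it is sensible to make it readable only by the user that runs the monitoring checks.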

Datacenter Checks

Datacenter checks are executed generally in the following way:

$ check_rhv -H <RHV-Manager> -a <User>@<Domain>:<Password> -D <datacenter> [-l <check>] [-s <subcheck>]

You can monitor more than one datacenter with option -D.

For example, suppose you have 3 datacenters with the following names:

testing
production01
production02

Depending on your -D argument you can monitor a specific, multiple or all datacenters:

-D * : monitor all datacenters (testing, production01 and production02)
-D production* : monitor all production* datacenters (production01 and production02)
-D production01: monitor only production01 datacenter
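
The selection rules above can be reproduced with shell-style wildcard matching. This is a sketch under the assumption that the plugin's patterns behave like ordinary globs. The helper name select is hypothetical and the plugin's actual matching code may differ.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


datacenters = ["testing", "production01", "production02"]
print(select(datacenters, "*"))
print(select(datacenters, "production*"))
print(select(datacenters, "production01"))
```

When you type such a command in a shell, quote the pattern (for example -D 'production*'), otherwise the shell may expand the asterisk against local file names before the plugin sees it.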

Datacenter Status

Get the status of your datacenter(s) with the following option:

-l status

Hint: If you don't specify a check with -l, the status of the datacenter will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -D default -l status
RHV OK: Datacenters ok - 1/1 Datacenters with state UP|Datacenters=1;1;1;0;
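
The part of the output after the pipe character is performance data in the usual Nagios plugin layout label=value[unit];warn;crit;min;max. If you want to post-process it yourself, a parser along the following lines may be a starting point. It is a sketch with hypothetical function names and it only handles whitespace-separated entries in that standard layout.

```python
import re


def _num(raw):
    """Convert a perfdata field to float, or None if it is empty or not numeric."""
    try:
        return float(raw)
    except ValueError:
        return None


def parse_plugin_output(line):
    """Split plugin output into the status text and a dict of perfdata entries."""
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        # The first field is the value, optionally followed by a unit such as '%'.
        match = re.match(r"(-?\d+(?:\.\d+)?)(.*)", fields[0])
        if match is None:
            continue
        entry = {"value": float(match.group(1)), "unit": match.group(2)}
        for name, raw in zip(("warn", "crit", "min", "max"), fields[1:]):
            entry[name] = _num(raw)
        metrics[label] = entry
    return text.strip(), metrics


line = "RHV OK: Datacenters ok - 1/1 Datacenters with state UP|Datacenters=1;1;1;0;"
text, metrics = parse_plugin_output(line)
print(text)
print(metrics["Datacenters"])
```

Entries that chain several label=value pairs with semicolons inside one token are not split further by this sketch.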

Datacenter Version

Get the version of your datacenter and exit:

-l version

Example:

$ check_rhv -H rhevm -a admin@internal:password -D default -l version
RHV OK: Version ok - default: 3.0

Datacenter Storagedomain status

Get the status of all attached storagedomains with:

-l storage -s status

Hint: If you don't specify a subcheck with -s, the status of the storagedomains will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -D default -l storage -s status
RHV OK: Datacenters ok - 3/3 Storagedomains with state Active|storagedomains=3;3;3;0;

Datacenter Storagedomain usage

Get the used disk space of all attached storagedomains:

-l storage -s usage [-w <warn> ] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60% (warning) and 80% (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -D default -l storage -s usage -w 60 -c 80
RHV WARNING: storage warning - 70.90% used (Default-iscsi) |storage_Default-iscsi=70.90%;60;80;0;
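
How a usage percentage maps to a check state can be sketched as follows. The function name usage_state is hypothetical, and whether the plugin treats a value exactly on a threshold as exceeding it is an assumption here. The sketch only mirrors the sample outputs in this document and applies to usage-style checks where higher values are worse.

```python
def usage_state(value, warn=60.0, crit=80.0):
    """Map a usage percentage to a Nagios state (assumes inclusive thresholds)."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


print(usage_state(70.90))
```

With the default thresholds, 70.90% lies between 60 and 80, which matches the WARNING state in the example above.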

Datacenter Storagedomain overall usage

Get the overall used disk space of all attached storagedomains:

-l storage -s overall-usage [-w <warn> ] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60% (warning) and 80% (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -D default -l storage -s overall-usage -w 60 -c 80
RHV WARNING: storage warning - 70.90% used (default) |ovstorage=70.90%;60;80;0;

Cluster Checks

Cluster checks are executed generally in the following way:

$ check_rhv -H <RHV-Manager> -a <User>@<Domain>:<Password> -C <cluster> [-l <check>] [-s <subcheck>]

You can monitor more than one cluster with option -C.

For example, suppose you have 3 clusters with the following names:

testing
production01
production02

Depending on your -C argument you can monitor a specific, multiple or all clusters:

-C * : monitor all clusters (testing, production01 and production02)
-C production* : monitor all production* clusters (production01 and production02)
-C production01: monitor only production01 cluster

Cluster Host Status

Get the status of all cluster hosts with:

-l hosts [-w <warn> ] [-c <crit>]

You can optionally specify warning and critical values for the number of hosts that have to be UP. If you don't, all hosts have to be UP.

Hint: If you don't specify a check with -l, the status of the hosts will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -C default -l hosts -w 1 -c 1
RHV OK: Cluster ok - 1/1 Hosts with state UP|hosts=1;1;1;0;

Cluster VM Status

Get the status of all cluster vms with:

-l vms [-w <warn> ] [-c <crit>]

You can optionally specify warning and critical values for the number of VMs that have to be UP. If you don't, all VMs have to be UP.

Example:

$ check_rhv -H rhevm -a admin@internal:password -C default -l vms -w 20 -c 25
RHV CRITICAL: Cluster critical - 6/32 Vms with state UP|vms=6;20;25;0;

Cluster Network Status

Get the status of all cluster networks with:

-l networks [-w <warn> ] [-c <crit>]

You can optionally specify warning and critical values for the number of networks that have to be Operational. If you don't, all networks have to be Operational.

Example:

$ check_rhv -H rhevm -a admin@internal:password -C default -l networks -w 7 -c 7
RHV OK: Clusters ok - 7/7 Networks with state Operational|networks=7;7;7;0;

Cluster Gluster Volume Status

Get the status of all gluster volumes of a cluster with:

-l glustervolumes

Specify a gluster volume:

-l glustervolumes -n <volume>

Example:

$ check_rhv -H rhevm -a admin@internal:password -C default -l glustervolumes
RHV OK: Glustervolumes ok - 2/2 Glustervolumes with state Active|glustervolumes=2;2;2;0;

$ check_rhv -H rhevm -a admin@internal:password -C default -l glustervolumes -n engine
RHV OK: Glustervolumes ok - 1/1 Glustervolumes with state Active|glustervolumes=1;1;1;0;

Host Checks

Host checks are executed generally in the following way:

$ check_rhv -H <RHV-Manager> -a <User>@<Domain>:<Password> -R <Host> [-l <check>] [-s <subcheck>]

You can monitor more than one host with option -R.

For example you have 3 hosts with the following names:

rhev-test01
rhev-prod01
rhev-prod02

Depending on your -R argument you can monitor a specific, multiple or all hosts:

-R * : monitor all hosts (rhev-test01, rhev-prod01 and rhev-prod02)
-R rhev-prod* : monitor all rhev-prod* hosts (rhev-prod01 and rhev-prod02)
-R rhev-prod01: monitor only rhev-prod01 host

Host Status

Get the status of your host(s) with the following option:

-l status

Hint: If you don't specify a check with -l, the status of the hosts will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l status
RHV OK: Hosts ok - 1/1 Hosts with state UP|Hosts=1;1;1;0;

Host VM Status

Get the status of the VMs running on the host(s) with the following option:

-l vms

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l vms
RHV CRITICAL: Vms critical - 4/41 Vms with state UP

If you specify -v (verbose) you see more detailed information:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l vms -v
RHV CRITICAL: Vms critical - 4/41 Vms with state UP [Details: 4 up, 1 suspended, 36 down]

Host Load

Get the load usage (5min average) of your host with the following option:

-l load [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, the default warning value equals the number of CPU cores and the default critical value equals twice the number of CPU cores. For example, with two 4-core CPUs the default warning value is 8 and the default critical value is 16.
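
The default load thresholds follow directly from the core count. The arithmetic is shown below as a small sketch; the function name default_load_thresholds is hypothetical and not part of the plugin.

```python
def default_load_thresholds(sockets, cores_per_socket):
    """Return (warning, critical): total cores and twice the total cores."""
    cores = sockets * cores_per_socket
    return cores, 2 * cores


print(default_load_thresholds(2, 4))  # (8, 16)
```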

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l load -w 2 -c 4
RHV OK: cpu.load.avg.5m ok - 0.010  (rhevh) |cpu.load.avg.5m=0.010;2;4;0;

Host CPU utilization

Get the cpu utilization of your host with:

-l cpu -s usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Hint: If you don't specify a subcheck with -s, the cpu usage of this host will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l cpu -s usage -w 60 -c 80
RHV OK: cpu ok - 11% used (rhevh) |cpu=11%;60;80;0; cpu.current.user=7;cpu.current.system=4;cpu.current.idle=89;

Host KSM usage

Get the percentage of CPU usage for Kernel SamePage Merging with:

-l ksm [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l ksm -w 60 -c 80
RHV OK: ksm.cpu.current ok - 3% used (rhevh) |ksm.cpu.current=3%;60;80;0;

Host Memory usage

Monitor the memory usage in percentage with:

-l memory -s mem [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Hint: If you don't specify a subcheck with -s, the memory usage of this host will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l memory -s mem -w 60 -c 80
RHV CRITICAL: memory critical - 80.11% used (rhevh) |memory=80.11%;60;80;0; memory.cached=0;memory.used=9485712097.28;memory.buffers=0;

Host Swap usage

Get the swap space usage:

-l memory -s swap [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l memory -s swap
RHV OK: swap ok - 11.71% used (rhevh) |swap=11.71%;60;80;0;

Host Network status

Get the status of all network interfaces:

-l network -s status

Specify a network interface:

-l network -s status -n <nic>

Hint: If you don't specify a subcheck with -s, the nic status of this host will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l network -s status
RHV OK: Hosts ok - 2/2 Nics with state Active|nics=2;2;2;0;

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l network -s status -n eth0
RHV OK: Hosts ok - 1/1 Nics with state Active|nics=1;1;1;0;

Host Network traffic

Get the network traffic of all nics in Mbit/s:

-l network -s traffic [-w <warn>] [-c <crit>]

Specify a network interface:

-l network -s traffic -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 500 (warning) and 700 (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l network -s traffic -w 2048 -c 3062
RHV OK: traffic ok - eth0: 0 Mbit/s eth1: 0 Mbit/s |traffic_eth0=0MB;62.5;87.5;0; traffic_eth1=0MB;62.5;87.5;0;

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l network -s traffic -w 2048 -c 3062 -n eth0
RHV OK: traffic ok - eth0: 0 Mbit/s |traffic_eth0=0MB;62.5;87.5;0;
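
The thresholds in the sample performance data (62.5 and 87.5) are exactly the default thresholds of 500 and 700 Mbit/s divided by 8. This suggests, but does not confirm, that the performance data is reported in megabytes per second. The conversion itself is sketched below; the function name is hypothetical.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


print(mbit_to_mbyte(500), mbit_to_mbyte(700))  # 62.5 87.5
```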

Host Network errors

Get the network errors of all nics:

-l network -s errors [-w <warn>] [-c <crit>]

Specify a network interface:

-l network -s errors -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values.

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l network -s errors -w 5 -c 10 -n eth0
RHV OK: errors ok - eth0: 0 Errors (rhevh) |errors_eth0=0c;5;10;0;

Host Updates

Check if updates are available for a host:

-l updates

Example:

$ check_rhv -H rhevm -a admin@internal:password -R rhevh -l updates
RHV OK: Updates available ok - rhevh: false

Storagedomain Checks

Storagedomain checks are executed generally in the following way:

$ check_rhv -H <RHV-Manager> -a <User>@<Domain>:<Password> -S <storagedomain> [-l <check>]

You can monitor more than one storagedomain with option -S.

For example you have 3 storagedomains with the following names:

isos
exports
iscsi01

Depending on your -S argument you can monitor a specific, multiple or all storagedomains:

-S * : monitor all storagedomains (isos, exports, iscsi01)
-S is* : monitor all is* storagedomains (isos and iscsi01)
-S iscsi01: monitor only iscsi01 storagedomain

Storagedomain Usage

Monitor used disk space of storagedomains:

-l usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60% (warning) and 80% (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -S isos -l usage -w 60 -c 80
RHV OK: storage ok - 39.78% used (isos) |storage_isos=39.78%;60;80;0;

Virtual Machine Checks

VM checks are executed generally in the following way:

$ check_rhv -H <RHV-Manager> -a <User>@<Domain>:<Password> -M <VM> [-l <check>] [-s <subcheck>]

You can monitor more than one VM with option -M.

For example you have 3 vms with the following names:

rhev-test01
rhev-prod01
rhev-prod02

Depending on your -M argument you can monitor a specific, multiple or all VMs:

-M * : monitor all VMs (rhev-test01, rhev-prod01 and rhev-prod02)
-M rhev-prod* : monitor all rhev-prod* VMs (rhev-prod01 and rhev-prod02)
-M rhev-prod01: monitor only rhev-prod01 VM

VM Status

Get the status of your virtual machine(s) with the following option:

-l status

Hint: If you don't specify a check with -l, the status of this VM will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -M vm -l status
RHV OK: Vms ok - 1/1 Vms with state UP|Vms=1;1;1;0;

VM CPU utilization

Monitor CPU utilization with:

-l cpu [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -M vm -l cpu -w 60 -c 80
RHV OK: cpu ok - 17% used (vm) |cpu=17%;60;80;0; cpu.current.guest=17;cpu.current.hypervisor=0;

VM Memory utilization

Monitor Memory utilization with:

-l memory [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 60 (warning) and 80 (critical) are used.

Example:

$ check_rhv -H rhevm -a admin@internal:password -M vm -l memory -w 60 -c 80
RHV CRITICAL: memory critical - 82.00% used (vm) |memory=82.00%;60;80;0;

VM Network traffic

Get network traffic in MBit/s with:

-l network -s traffic [-w <warn>] [-c <crit>]

Specify a nic with:

-l network -s traffic -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values. If you don't, defaults of 500 (warning) and 700 (critical) are used.

Hint: If you don't specify a subcheck with -s, the traffic of all nics will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -M vm -l network -s traffic -w 900 -c 1000
RHEV OK: traffic ok - nic1: 0 Mbit/s (vm) |traffic_nic1=0MB;62.5;87.5;0;

VM Network errors

Get the network errors of all nics:

-l network -s errors [-w <warn>] [-c <crit>]

Specify a network interface:

-l network -s errors -n <nic> [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values.

Example:

$ check_rhv -H rhevm -a admin@internal:password -M vm -l network -s errors -w 5 -c 10
RHV OK: errors ok - nic1: 0 Errors (vm) |errors_nic1=0c;5;10;0;

Virtual Machine Pool Checks

VM Pool checks are executed generally in the following way:

$ check_rhv -H <RHV-Manager> -a <User>@<Domain>:<Password> -P <VM-Pool> [-l <check>]

You can monitor more than one VM Pool with option -P.

For example you have 3 VM Pools with the following names:

rhev-test01
rhev-prod01
rhev-prod02

Depending on your -P argument you can monitor a specific, multiple or all VM Pools:

-P * : monitor all VM Pools (rhev-test01, rhev-prod01 and rhev-prod02)
-P rhev-prod* : monitor all rhev-prod* VM Pools (rhev-prod01 and rhev-prod02)
-P rhev-prod01: monitor only rhev-prod01 VM Pool

VM Pool Usage

Get the number of running virtual machines in the VM Pool(s):

-l usage [-w <warn>] [-c <crit>]

You can optionally specify warning and critical values for the number of free VMs.

Hint: If you don't specify a check with -l, the usage of this VM Pool will be checked.

Example:

$ check_rhv -H rhevm -a admin@internal:password -P pool -l usage -w 1 -c 2
RHV WARNING: VM Pool warning - 3/4 vms free|vmpool=1;1;2;0;

Icinga/Nagios Definitions

Now that you know how to use this plugin using the command line, here's a short description on how to integrate it into Icinga/Nagios.

For details, see the Icinga and Nagios documentation.

Command

First of all, you have to define the check_rhv command:

define command{
  command_name check_rhv
  command_line $USER1$/check_rhv -H $_RHEVM$ -a $ARG1$ $ARG2$
}

In this example we use a custom object variable $_RHEVM$, which represents the IP address or hostname of the RHEV Manager. See http://docs.icinga.org/latest/en/customobjectvars.html for details on custom object variables.

Host

To monitor a RHV Hypervisor you have to create a host:

define host{
  use       linux-server
  host_name rhevh
  alias     RHV Hypervisor
  address   192.168.1.2
  _rhevm    192.168.1.1
}

The hypervisor is monitored through a REST-API call to the RHV-Manager, so your Icinga/Nagios server doesn't need access to the IP of your hypervisor, but it must be able to connect to the RHV-Manager. The IP address or hostname of the RHV-Manager is specified via the custom object variable $_RHEVM$.

Note: You have to set this variable for all hosts, monitored with this plugin!

An example for a VM would be:

define host{
  use       linux-server
  host_name my-vm
  alias     My virtual machine
  address   192.168.1.3
  _rhevm    192.168.1.1
}

Again, we use $_RHEVM$ variable to speak with RHV-Manager.

Service

After defining hosts, you have to create services and assign these services to your hosts:

define service{
  use                 generic-service
  host_name           rhevh
  service_description RHEV CPU Check
  check_command       check_rhv!admin@internal:password!-R $HOSTNAME$ -l cpu
}

In this example, a CPU check for RHV-Hypervisor rhevh is defined.

Note that we use variable $HOSTNAME$ as name of RHV Host. So make sure that host_name for your Icinga/Nagios host definition matches the name of this host in your RHV environment. Otherwise you can hardcode the name in Icinga/Nagios configuration!

Memory Check for your VM with warning and critical value:

define service{
  use                 generic-service
  host_name           my-vm
  service_description RHEV Memory Check
  check_command       check_rhv!admin@internal:password!-M $HOSTNAME$ -l memory -w 70 -c 90
}

As with the RHV Hypervisor, make sure that host_name matches the VM name in RHV!
