
23_Performance_Monitoring

Marc A. Smith edited this page Mar 3, 2017 · 2 revisions

Introduction

Since ESOS trunk/r512, the install image includes a purpose-built daemon that collects performance metrics and stores them in a database. Many existing tools already do this, but most of them require complex configuration directives and additional software to work correctly.

At this stage the daemon collects and stores block device metrics only. It can use PostgreSQL or MySQL as its back-end, and it compacts older samples so that the number of records does not grow excessively.


Configuring the Performance Statistics Agent

The agent uses a single configuration file, '/etc/perf-agent.con', which contains all of the relevant options. The database connection string (the DBURI option) uses standard URL notation. Examples of PostgreSQL and MySQL connection strings:

# PostgreSQL
DBURI = postgres://username:password@host/database

# MySQL
DBURI = mysql://username:password@host/database
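Because DBURI is an ordinary URL, its parts can be pulled apart with a standard URL parser. The following is a minimal sketch using Python's urllib.parse, assuming the scheme is either 'postgres' or 'mysql' as in the examples above; parse_dburi is a hypothetical helper for illustration, not the agent's own parsing code.

```python
from urllib.parse import urlsplit


def parse_dburi(dburi: str) -> dict:
    """Split a DBURI value into its connection components (hypothetical helper)."""
    parts = urlsplit(dburi)
    # Assumption: only the two back-ends documented on this page are accepted.
    if parts.scheme not in ("postgres", "mysql"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "backend": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not specify a port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(parse_dburi("postgres://username:password@host/database"))
```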

You need to provide the agent with an empty database; it will create all of the tables itself.

The System option identifies your host when multiple ESOS agents log to the same database. You can ignore the HostAddress option; it is not used.

PollingInterval sets the sample resolution; the default is 5 seconds. Changing it is strongly discouraged, and it may stop being a configurable parameter in a future release.
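To get a feel for the data volume at the default resolution, the arithmetic below counts rows per device per day at the raw 5-second interval and at the resolutions the Auto-Compacter reduces to. It assumes one database row per device per polling interval, which this page does not state explicitly.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(resolution_s: int) -> int:
    """Rows one device produces per day, assuming one row per interval."""
    return SECONDS_PER_DAY // resolution_s


if __name__ == "__main__":
    for label, seconds in [("raw 5 s", 5), ("15 min", 15 * 60),
                           ("hourly", 60 * 60), ("daily", SECONDS_PER_DAY)]:
        print(f"{label:>8}: {samples_per_day(seconds)} rows/day per device")
```

Under that assumption, each monitored device adds 17280 raw rows per day, versus 96 rows per day once reduced to 15-minute samples.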

BlockDevices is a whitespace-separated list of the devices to monitor.

Example:

# Monitor /dev/sda /dev/sdb /dev/sdc
BlockDevices = sda sdb sdc
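The options shown so far follow a flat 'Key = value' layout with '#' comments. The sketch below reads that layout and splits BlockDevices on whitespace. It is a hypothetical helper that assumes this layout (no sections, one option per line) and fills in the documented 5-second PollingInterval default; it is not the agent's actual configuration loader.

```python
def parse_agent_config(text: str) -> dict:
    """Parse 'Key = value' lines, skipping blank lines and '#' comments.

    Hypothetical helper: assumes the flat layout shown on this page.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        options[key.strip()] = value.strip()
    # BlockDevices is whitespace-separated; split() tolerates tabs and repeated spaces.
    options["BlockDevices"] = options.get("BlockDevices", "").split()
    # Documented default sample resolution is 5 seconds.
    options["PollingInterval"] = int(options.get("PollingInterval", 5))
    return options


if __name__ == "__main__":
    sample = "# Monitor /dev/sda /dev/sdb /dev/sdc\nBlockDevices = sda sdb sdc\n"
    print(parse_agent_config(sample)["BlockDevices"])
```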

Starting the Agent

The agent can be started by the init script:

/etc/rc.d/rc.perfagent start

Or in debug mode (which will print messages to stdout):

python /usr/local/perf-agent/perfagentmain.py

To terminate the agent in debug mode press CTRL+C.

To start the agent automatically upon boot, change the '/etc/rc.conf' file as follows:

#rc.perfagent_enable=NO
rc.perfagent_enable=YES

Agent details

The following block device metrics are stored in the database; the first column gives the column name used in the database:

readscompleted = BigInteger # number of read requests completed
readsmerged = BigInteger # number of reads merged by the I/O scheduler
sectorsread = BigInteger # number of sectors read during the sample period
writescompleted = BigInteger # number of write requests completed
sectorswritten = BigInteger # number of sectors written during the sample period
kbwritten = BigInteger # KB written during the sample period
kbread = BigInteger # KB read during the sample period
averagereadtime = Integer # average ms spent doing reads
averagewritetime = Integer # average ms spent doing writes
iotime = Integer # combined I/O execution time in ms
interval = Integer # sample interval in s
writespeed = Integer # write rate in KB/s
readspeed = Integer # read rate in KB/s
devicerate = Integer # combined read+write rate in KB/s
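Most of these columns look like per-interval differences of the Linux kernel's cumulative disk counters. The sketch below derives values shaped like the columns above from two /proc/diskstats snapshots. It assumes, without confirmation from this page, that the agent works from /proc/diskstats; the field order follows the kernel's iostats documentation, and diskstats always counts sectors as 512 bytes. The helper names and the integer rounding are illustrative choices, not the agent's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DiskCounters:
    reads_completed: int
    reads_merged: int
    sectors_read: int
    ms_reading: int
    writes_completed: int
    sectors_written: int
    ms_writing: int
    ms_doing_io: int


def parse_diskstats_line(line: str) -> tuple[str, DiskCounters]:
    """Parse one /proc/diskstats line: major, minor, name, then counters."""
    f = line.split()
    name, c = f[2], [int(x) for x in f[3:14]]
    return name, DiskCounters(
        reads_completed=c[0], reads_merged=c[1], sectors_read=c[2], ms_reading=c[3],
        writes_completed=c[4], sectors_written=c[6], ms_writing=c[7], ms_doing_io=c[9],
    )


def sample_metrics(prev: DiskCounters, cur: DiskCounters, interval_s: int) -> dict:
    """Turn two cumulative snapshots into per-interval values (illustrative)."""
    d_rc = cur.reads_completed - prev.reads_completed
    d_wc = cur.writes_completed - prev.writes_completed
    d_sr = cur.sectors_read - prev.sectors_read
    d_sw = cur.sectors_written - prev.sectors_written
    kbread, kbwritten = d_sr // 2, d_sw // 2  # 512-byte sectors: 2 sectors = 1 KB
    return {
        "readscompleted": d_rc,
        "readsmerged": cur.reads_merged - prev.reads_merged,
        "sectorsread": d_sr,
        "writescompleted": d_wc,
        "sectorswritten": d_sw,
        "kbread": kbread,
        "kbwritten": kbwritten,
        # Average latency per completed request; 0 when nothing completed.
        "averagereadtime": (cur.ms_reading - prev.ms_reading) // d_rc if d_rc else 0,
        "averagewritetime": (cur.ms_writing - prev.ms_writing) // d_wc if d_wc else 0,
        "iotime": cur.ms_doing_io - prev.ms_doing_io,
        "interval": interval_s,
        "readspeed": kbread // interval_s,
        "writespeed": kbwritten // interval_s,
        "devicerate": (kbread + kbwritten) // interval_s,
    }
```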

Auto-Compacter

When the agent is enabled to start automatically on boot, a sample reducer ('croncompact.py') also runs once every 24 hours. The reducer aggregates samples according to this schema:

  • Samples from the previous day (00:00 to 23:59) are reduced to 15-minute samples (averaged or summed, depending on the field) and kept at that resolution for the next 7 days
  • Samples from 7 days ago (00:00 to 23:59) are reduced to hourly samples
  • Samples from 31 days ago (00:00 to 23:59) are reduced to one daily sample
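The reduction steps above amount to grouping samples into fixed-width time buckets aligned to midnight and aggregating each field. The following is a minimal sketch of that idea, not the logic of 'croncompact.py' itself; in particular, which fields are summed and which are averaged is an assumption made here for illustration (cumulative quantities summed, rates and latencies averaged).

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed split of columns: totals are summed, rates/latencies are averaged.
SUM_FIELDS = {"readscompleted", "writescompleted", "kbread", "kbwritten", "interval"}
AVG_FIELDS = {"readspeed", "writespeed", "averagereadtime", "averagewritetime"}


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor a timestamp to the start of its bucket, buckets aligned to midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // width) * width


def compact(samples, width: timedelta):
    """Reduce (timestamp, {field: value}) samples to one row per bucket."""
    groups = defaultdict(list)
    for ts, row in samples:
        groups[bucket_start(ts, width)].append(row)
    out = []
    for start in sorted(groups):
        rows = groups[start]
        merged = {f: sum(r[f] for r in rows) for f in SUM_FIELDS if f in rows[0]}
        merged.update({f: sum(r[f] for r in rows) / len(rows)
                       for f in AVG_FIELDS if f in rows[0]})
        out.append((start, merged))
    return out
```

For example, calling compact(day_samples, timedelta(minutes=15)) would collapse one device's 5-second samples for a day into 96 rows, and passing the result back in with timedelta(hours=1) or timedelta(days=1) mirrors the later steps of the schema. Summing 'interval' keeps each reduced row's interval equal to the bucket width.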

If you don't want the samples to be reduced, or you purge them automatically by other means, simply comment out the line in '/etc/crontab' that references 'croncompact.py'.


Nagios

NRPE is included with ESOS. The Nagios Remote Plugin Executor (NRPE) is an add-on designed to let you execute Nagios plugins on remote Linux machines (like ESOS). You'll first need to enable the 'rc.nrpe' service so it starts at boot time; edit '/etc/rc.conf' and change it like this:

#rc.nrpe_enable=NO
rc.nrpe_enable=YES

You'll then need to configure NRPE; see this document for assistance, or consult the many other examples on the web. Use this command to start NRPE:

/etc/rc.d/rc.nrpe start

Munin

Munin is a networked resource monitoring tool. ESOS includes the munin-c package, a C rewrite of the Munin node components. To enable munin-c, edit '/etc/rc.conf' and change the 'rc.munin' line to this:

#rc.munin_enable=NO
rc.munin_enable=YES

Refer to the munin-c project home page for configuration information. Once munin-c is configured, you can start the service like this:

/etc/rc.d/rc.munin start