draft_protocol aggregation

Web Interface Enhancements for Clusters

Content from the Munin Trac Wiki, written by snide between 2011-07-09 and 2011-07-11.

In the upcoming 2.0, '''data collection scales up to about 200 nodes''' on decent hardware (multi-core server, rrdcached, ...).

The '''web interface''', on the other hand, '''doesn't scale that well'''. The granularity of the published information is much too fine to give the whole picture. Some '''aggregation''' has to be done so that only the '''most important data''' is shown on the summary pages. The user can then drill down to as much detail as needed. It is much like '''zooming''', not on a temporal scale but on a '''group/node/field scale'''.

''Data collection passing the 1k node barrier will need another RFC, and is therefore outside the scope of this one''.

Design

Overview

Group similar nodes in a graphical summary, to present them as a single cluster.
Each plugin has to be reduced to only one field, called master_field, defined as a cdef from the others. No data is stored for it, so its definition can be changed at a later time without losing its history.
Each master_field has an aggregation property, with a value of AVERAGE, MAX, MIN or SUM.
The first implementation is done outside the main Munin.
This enables a very simple testing phase, with almost no synchronisation required with the main Munin.
It is implemented as a complex multi-level multigraph plugin that parses the munin datafile to generate its data.
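As an illustration of the aggregation property, here is a minimal Python sketch of how the master_field values of several nodes could be reduced to one cluster value. The `AGGREGATORS` table and the `aggregate` function are hypothetical names used only for this example; they are not part of Munin.

```python
from statistics import mean

# Hypothetical mapping from the aggregation property to a reducer.
AGGREGATORS = {
    "AVERAGE": mean,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
}

def aggregate(values, kind):
    """Reduce the master_field values of several nodes to one value."""
    if not values:
        return None  # nothing to aggregate for this cluster
    return AGGREGATORS[kind](values)
```

For example, `aggregate([1.0, 2.0, 3.0], "MAX")` keeps only the busiest node, while `"SUM"` gives the total across the cluster.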

Reduction from a whole host to only one value

It will be done using the "load" plugin, since it :

is cross-platform
has only one field
should already represent the state of the server quite well.

It also has some bad points :

It isn't shown as a percentage, but as an absolute value. The percentage of the critical threshold could be used instead.
The MAX % over the plugins that define critical could also be used.
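The two alternatives above can be sketched as follows. This is only one possible reading of "% from critical", assuming each field is available as a (value, critical) pair; the function names are hypothetical.

```python
def percent_of_critical(value, critical):
    """Express a value as a percentage of its critical threshold."""
    if critical is None or critical <= 0:
        return None  # the field defines no usable critical threshold
    return 100.0 * value / critical

def host_score(fields):
    """MAX % over all fields that define critical.

    fields: iterable of (value, critical) pairs, critical may be None.
    """
    percents = [
        p
        for p in (percent_of_critical(v, c) for v, c in fields)
        if p is not None
    ]
    return max(percents) if percents else None
```

A load of 2.0 with a critical threshold of 8.0 would then be reported as 25 %, which is comparable across hosts of different sizes.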

Aggregation

Aggregation is done via a multigraph plugin.
Plugins are aggregated according to their plugin names, not their graph_title.
Sibling plugins, like the "ip_" ones, can occur several times per host, so no automatic aggregation is possible for them. First, a manual pre-aggregation of all of them into one plugin, named "ip", should take place. Then a "master_field" can be specified on this common one, and aggregation will take place as usual.
When there is no master field defined, no aggregation takes place. This is by design : it enables a sparse view.
Some well-known plugins don't have to define an explicit master_field : a default will be provided. (Useful for CPU, RAM, etc.)
The default aggregation type is STACK : it offers a fair mix of the information presented.
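The grouping rules above can be sketched in Python as follows. The input format and the `DEFAULT_MASTER_FIELDS` table are assumptions made for this example; the only default listed is the single field of the "load" plugin, since the defaults for the other well-known plugins are not specified here.

```python
from collections import defaultdict

# Hypothetical defaults for well-known plugins (plugin name -> master_field).
DEFAULT_MASTER_FIELDS = {"load": "load"}

def group_master_fields(plugins):
    """Group master fields by plugin name.

    plugins: iterable of (host, plugin_name, master_field_or_None).
    Returns {plugin_name: [(host, master_field), ...]}.
    """
    groups = defaultdict(list)
    for host, name, master_field in plugins:
        master_field = master_field or DEFAULT_MASTER_FIELDS.get(name)
        if master_field is None:
            continue  # no master field: not aggregated (sparse view)
        groups[name].append((host, master_field))
    return dict(groups)
```

In this sketch a sibling plugin such as "ip_eth0" is simply skipped until a manually pre-aggregated "ip" plugin with a master_field exists.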

UI enhancements

These are nice to have, but not possible with a plugin-only approach :

The last aggregation graph in the series points to the real graph (new graph_ property ?)
Zooming can be accessed also for intermediate graphs. (Extra link in the HTML ?)
Next to each host, on the overview pages, there is a little graph (http://en.wikipedia.org/wiki/Sparkline) that shows the trend over the last 30 minutes (the last 6 values).
Each host has a color whose hue is defined by its plugins in 2 steps (0 -(green_to_yellow)-> %warning -(yellow_to_red)-> %critical); the same goes for the clusters. The tooltip lists all the offending fields, in offending order (group by $group, $host, $plugin & order by % desc).
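The two-step hue mapping could be sketched like this. The sketch assumes HSV hues in degrees (120 for green, 60 for yellow, 0 for red) and thresholds with 0 < warning < critical; `hue_for` is a hypothetical name.

```python
def hue_for(value, warning, critical):
    """Map a value to a hue: green at 0, yellow at warning, red at critical."""
    if value <= 0:
        return 120.0  # pure green
    if value < warning:
        # Step 1: green (120) to yellow (60) between 0 and warning.
        return 120.0 - 60.0 * value / warning
    if value < critical:
        # Step 2: yellow (60) to red (0) between warning and critical.
        return 60.0 - 60.0 * (value - warning) / (critical - warning)
    return 0.0  # at or above critical: pure red
```

Using two linear steps keeps yellow pinned exactly at the warning threshold, regardless of how far apart the warning and critical thresholds are.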

One Multigraph Plugin

The whole aggregation will be done in a special plugin directly on the master, which presents another hierarchy and borrows data from the real plugins.

It parses the munin datafile to retrieve the information it needs. To reduce each plugin to its master_field, it uses optional extra configuration fields that the plugin emits. A list of well-known plugins can be hand-crafted in it, so that it can be deployed easily, without touching all the plugins of an installation.
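A minimal Python sketch of the parsing step is given here. It assumes that datafile lines have the shape `group;host:plugin.field.attribute value`, preceded by a `version` header line; that layout is an assumption to check against a real datafile, and `parse_datafile_line` is a hypothetical helper.

```python
def parse_datafile_line(line):
    """Split one datafile line into (groups, host, key, value).

    Returns None for lines without a host part, such as the header.
    """
    key, _, value = line.rstrip("\n").partition(" ")
    if ":" not in key:
        return None  # e.g. the "version ..." header line
    host_path, _, rest = key.partition(":")
    *groups, host = host_path.split(";")
    # "rest" is kept whole: splitting it into plugin, field and attribute
    # is ambiguous once multigraph plugin names contain dots.
    return groups, host, rest, value
```

The nested groups are kept as a list so that the cluster hierarchy (group, subgroup, host) can be rebuilt from them.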

The cluster plugin loads the datastore file. It aggregates fields up to the whole install, down to each host (useful for debugging).

multigraph cluster
graph_title Cluster view for field "cpu"
graph_order cpu
update no
cpu.cdef cluster.group1.subgroup2.host1.cpu, cluster.group1.subgroup2.host2.cpu, cluster.group2.host3.cpu, +, +

multigraph cluster.group1
graph_title Cluster view for field "cpu"
graph_order cpu
update no
cpu.cdef cluster.group1.subgroup2.host1.cpu, cluster.group1.subgroup2.host2.cpu, +


multigraph cluster.group1.subgroup2.host1
graph_title Cluster view for host1
graph_order field1=
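The cluster-level blocks of the example above follow a regular pattern, so they could be generated. The Python sketch below reproduces that output for a SUM aggregation; `sum_cdef` and `cluster_config` are hypothetical helpers, and the comma-plus-space separator simply mirrors the example.

```python
def sum_cdef(paths):
    """Build an RPN expression summing the given field paths."""
    paths = list(paths)
    if not paths:
        raise ValueError("at least one field path is required")
    # n operands need n - 1 "+" operators in RPN.
    return ", ".join(paths + ["+"] * (len(paths) - 1))

def cluster_config(name, field, paths):
    """Emit the config of one aggregated multigraph level."""
    return "\n".join([
        f"multigraph {name}",
        f'graph_title Cluster view for field "{field}"',
        f"graph_order {field}",
        "update no",
        f"{field}.cdef {sum_cdef(paths)}",
    ])
```

Calling `cluster_config("cluster.group1", "cpu", [...])` with the two host paths of group1 yields the second block of the example.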
