Ganglia Quick Start


Introduction

The Ganglia monitoring suite consists of three main parts: gmond, gmetad, and the web interface, usually called ganglia-web.

  • gmond is a daemon that runs on every node you want to monitor. It gathers monitoring statistics and sends and receives them over the same multicast or unicast channel.
    • If it's a sender (mute=no) it collects basic metrics such as system load (load_one) and CPU utilization. It can also send user-defined metrics through the addition of C/Python modules.
    • If it's a receiver (deaf=no) it aggregates all metrics sent to it from other hosts and keeps an in-memory cache of all metrics.
  • gmetad is a daemon that polls gmonds periodically and stores their metrics in a storage engine such as RRD. It can poll multiple clusters and aggregate the metrics. The web frontend also uses it when generating the UI.
  • ganglia-web – this component is self-explanatory. It should sit on the same machine as gmetad, since it needs access to the RRD files.

Clusters are logical groupings of machines and metrics, e.g. database servers, web servers, production, test, QA, etc. These groupings are completely arbitrary. You have to run a separate gmond instance for each cluster.

In general you will need one receiving gmond per cluster and one gmetad per site.

Installation

The easiest way to install Ganglia is from binary packages. On Ubuntu/Debian you can install them using apt-get, e.g.

apt-get install ganglia-monitor ganglia-monitor-python gmetad
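
On RHEL/CentOS, equivalent packages are available from the EPEL repository; the package names below are typical but can vary slightly between releases:

# assumes the EPEL repository is already enabled
yum install ganglia-gmond ganglia-gmond-python ganglia-gmetad ganglia-web

On either platform, running gmond -t prints a complete default gmond.conf to stdout, which is a convenient starting point for the configurations shown below.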

Firewall Rules

By default gmond communicates on UDP port 8649 (specified in udp_send_channel and udp_recv_channel) and gmetad downloads metrics data over TCP port 8649 (or a different port, depending on what is specified in tcp_accept_channel). If you have any firewall rules that block traffic on those ports, your metrics will not show up.
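
For example, with iptables you might open the default port like this (a minimal sketch assuming the default port 8649 and a trusted cluster subnet of 10.0.0.0/24 – adjust both to your environment):

# allow gmond UDP traffic (udp_send_channel / udp_recv_channel) from the cluster subnet
iptables -A INPUT -p udp -s 10.0.0.0/24 --dport 8649 -j ACCEPT
# allow gmetad to poll the tcp_accept_channel
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 8649 -j ACCEPT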

Single cluster configuration

If you have only a handful of machines we recommend using a single cluster, as that is the easiest thing to set up and configure. The only other decision you need to make is whether you want to use multicast or unicast transport. Per the FAQ:

Multicast mode is the default setting and is the simplest to set up; it also provides redundancy. Environments that are sensitive to "jitter" may consider setting up Ganglia in unicast mode, which significantly reduces the chatter but is a bit more complex to configure. Environments such as Amazon's AWS EC2 do not support multicast, so there unicast is the only setup option available.

Multicast

If you are using multicast transport you shouldn't need to configure anything, as that is the default that Ganglia packages ship with. The only thing you may need to do is point your gmetad at one or a few of the hosts that are running gmond. There is no need to list every single host, since a gmond in receive mode holds the list of all hosts and metrics in the cluster:

# /etc/gmetad.conf on monhost
data_source "MyCluster" monhost
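
If you want redundancy you can list more than one gmond host on the same data_source line; gmetad tries them in order and fails over to the next host if one stops responding (monhost2 here is a hypothetical second receiver):

# /etc/gmetad.conf on monhost
data_source "MyCluster" monhost monhost2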

Unicast

To configure unicast you should designate one (or more) machines as receivers. For example, I will pick host mon1 as my receiver. mon1's gmond.conf should look like this (I'm only showing the portion above the first module { block):

 globals {
   daemonize = yes
   setuid = yes
   user = nobody
   debug_level = 0
   max_udp_msg_len = 1472
   mute = no
   deaf = no
   allow_extra_data = yes
   host_dmax = 86400 /* Remove host from UI after it hasn't reported for a day */
   cleanup_threshold = 300 /*secs */
   gexec = no
   send_metadata_interval = 30 /*secs */
 }

 cluster {
   name = "Production"
   owner = "unspecified"
   latlong = "unspecified"
   url = "unspecified"
 }

 host {
   location = "unspecified"
 }

 udp_send_channel {
   host = mon1
   port = 8649
   ttl = 1
 }

 udp_recv_channel { 
   port = 8649
 }

 tcp_accept_channel {
   port = 8649
 }

On all the other machines you only need to configure the following:

globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = yes
  allow_extra_data = yes
  host_dmax = 86400 /* Remove host from UI after it hasn't reported for a day */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

cluster {
  name = "Production"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

host {
  location = "unspecified"
}

udp_send_channel {
  host = mon1
  port = 8649
  ttl = 1
}

Please notice that send_metadata_interval is set to 30 (seconds). Metrics in Ganglia are sent separately from their metadata. Metadata contains information such as the metric group, type, etc. If you restart the receiving gmond, the metadata is lost; gmond will then not know what to do with the incoming metric data and will discard it, which may result in blank graphs. In multicast mode gmonds can talk to each other and will ask for metadata if it is missing. This is not possible in unicast mode, so you need to instruct gmond to send its metadata periodically.
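
Once gmond is running on the senders and on mon1, you can sanity-check the receiver by dumping its in-memory XML state over the tcp_accept_channel; every host reporting to mon1 should show up as a HOST element:

nc mon1 8649 | grep '<HOST '

(telnet mon1 8649 works just as well if nc is not available.)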

Now in your gmetad.conf put

# /etc/gmetad.conf on mon1
data_source "Production" mon1

Restart everything and you should be set :-).
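
On Debian/Ubuntu with the packages installed above, that typically means the following (service names may differ on other distributions):

# on every monitored node
service ganglia-monitor restart
# on mon1 only
service gmetad restart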

Multiple cluster configuration

[Diagram: ganglia_multiple_clusters1.png – three clusters reporting to a single gmetad/ganglia-web node]

As you can see from the diagram above, let's say we have three clusters on the same broadcast domain (the same network). Instead of having three separate Ganglia web interfaces and gmetad collector daemons, we can have one on the node0.c1 node, which can then collect stats from three different multicast (in our case) channels.

So which components are needed on which server:

  • ganglia-gmond is needed on every single node
  • ganglia-gmetad and ganglia-web are needed on node0.c1 only (let's say we want to dedicate node0.c1 as the Ganglia web interface and stats collector)

And here are the relevant snippets of the configuration files:

/etc/gmond.conf is identical on the ClusterOne nodes (node0, node1, node2, node3) – I will show only the most important part:

# /etc/gmond.conf - on ClusterOne
cluster {
  name = "ClusterOne"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
 
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8661
  ttl = 1
}
 
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8661
  bind = 239.2.11.71
}
 
tcp_accept_channel {
  port = 8661
}

/etc/gmond.conf is identical on the ClusterTwo nodes (node0, node1, node2, node3):

# /etc/gmond.conf - on ClusterTwo
cluster {
  name = "ClusterTwo"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
 
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8662
  ttl = 1
}
 
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8662
  bind = 239.2.11.71
}
 
tcp_accept_channel {
  port = 8662
}

/etc/gmond.conf is identical on the ClusterThree nodes (node0, node1, node2, node3):

# /etc/gmond.conf - on ClusterThree
cluster {
  name = "ClusterThree"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
 
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8663
  ttl = 1
}
 
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8663
  bind = 239.2.11.71
}
 
tcp_accept_channel {
  port = 8663
}

/etc/gmetad.conf – exists only on node0.c1 (again, only the most important part is shown below):

# /etc/gmetad.conf on node0.c1
data_source "ClusterOne" node0.c1:8661 node1.c1:8661
data_source "ClusterTwo" node0.c2:8662 node1.c2:8662
data_source "ClusterThree" node3.c2:8663 node1.c3:8663

Notice that we did not list all the nodes as data sources for each cluster (imagine if you had a thousand nodes per cluster :-) ), and it is not necessary. Think of this as three different pools, each with its own virtual boundaries. The gmetad daemon polls the configured data sources for data, so if one node dies the other will still be able to provide stats to gmetad, because the gmond nodes exchange stats within their configured UDP channels.

Now all you have to do is configure your web server on node0.c1, start gmetad (the default location for the RRDs is /var/lib/ganglia/rrds), and start the gmond services on all the clusters. You should then have a working monitoring system for your three clusters on a single node.
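
A quick way to confirm that data is flowing is to look at the RRD root after gmetad has polled each cluster at least once; you should see one directory per cluster, each containing a subdirectory per host:

ls /var/lib/ganglia/rrds/
# ClusterOne  ClusterTwo  ClusterThree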

The multiple cluster configuration has been adapted from a post written by Vaidas Jablonskis: http://jablonskis.org/2011/monitoring-multiple-clusters-using-ganglia/