This project is no longer under active development

StackHPC Monasca-Agent plugins

A collection of Monasca-Agent plugins to gather metrics. This repo functions as an incubator, with the ultimate aim of merging any effective plugins into Monasca-Agent.

Includes:

Slurm (proof-of-concept)
nVidia GPUs
Prometheus (proof-of-concept)

Prometheus plugin

This is an experimental plugin which extends the existing Monasca-Agent Prometheus plugin with metric and label filtering, counter-to-rate conversion and derived metrics. The following configuration options are supported:

metric_endpoint

The Prometheus endpoint to scrape.

Example:

metric_endpoint: "http://ceph-host:9283/metrics"

remove_hostname

Strip the hostname from each metric. This is useful when scraping an endpoint which exposes metrics that are not specific to a host, for example RabbitMQ queue lengths or Ceph cluster health.

Example:

remove_hostname: true

default_dimensions

A dict of dimensions to include with all metrics scraped from the specified endpoint.

Example:

default_dimensions:
  cluster_tag: production
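
As a rough illustration of how remove_hostname and default_dimensions might combine, the sketch below builds the dimension set for one scraped sample. The helper name build_dimensions, the placeholder hostname, the use of a hostname dimension key, and the precedence (scraped labels override defaults) are all assumptions made for illustration, not the plugin's actual code.

```python
def build_dimensions(labels, default_dimensions=None,
                     remove_hostname=False, hostname="agent-host"):
    """Return the dimensions to attach to one scraped sample.

    Hypothetical helper: the precedence rules and the 'hostname' key
    are assumptions, not taken from the plugin source.
    """
    dimensions = dict(default_dimensions or {})
    dimensions.update(labels)  # assumed: scraped labels win over defaults
    if remove_hostname:
        # Cluster-wide metrics should not be tied to the scraping host.
        dimensions.pop("hostname", None)
    else:
        dimensions.setdefault("hostname", hostname)
    return dimensions


per_host = build_dimensions({"osd": "1"}, {"cluster_tag": "production"})
cluster_wide = build_dimensions({"osd": "1"}, {"cluster_tag": "production"},
                                remove_hostname=True)
```

Here per_host carries a hostname dimension while cluster_wide does not, so the same cluster-level metric scraped by several agents would map to a single series.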

counters_to_rates

Automatically convert counters to rates. Counters are buffered locally, and the derivative with respect to time is computed when the buffer is flushed to the Monasca API. When enabled, this setting uses the Prometheus metric type to identify counters and generates a new rate metric for each one. The counter metrics are still posted to the API unless they are excluded by the whitelist. Each rate metric takes the name of its counter with _rate appended. Note that the Prometheus convention is to append _total to all counters, so a counter named ceph_osd_op_w is exposed as ceph_osd_op_w_total and becomes ceph_osd_op_w_total_rate when converted to a rate.

Example:

counters_to_rates: True

Defaults to True.
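
The buffering and differentiation described above can be sketched as follows. This is a minimal illustration, assuming the rate is taken between the first and last buffered samples and that a decreasing counter indicates a reset. The function name counter_to_rate and both of those choices are assumptions, not the plugin's confirmed behaviour.

```python
def counter_to_rate(samples):
    """Compute the mean rate of change over buffered (timestamp, value) pairs.

    Returns None when fewer than two samples are buffered, when the time
    span is not positive, or when the counter went backwards (assumed here
    to mean the exporter restarted and the counter was reset).
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)


# Three scrapes of a counter, 30 seconds apart.
buffered = [(0.0, 100.0), (30.0, 160.0), (60.0, 280.0)]
rate = counter_to_rate(buffered)
```

With these samples the counter grows by 180 over 60 seconds, giving a rate of 3.0 per second.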

whitelist

A whitelist of regexes used to determine which metrics are posted to the Monasca API. Many Prometheus endpoints generate vast quantities of data, so this can be a useful way to cut back on the number of metrics posted to the Monasca API to improve performance.

Example:

whitelist:
  - ceph_cluster_total_used_bytes
  - ceph_cluster_total_bytes
  - ceph_osd_op.*
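
A minimal sketch of how such a whitelist could be applied to scraped metric names is shown below. It assumes each regex is anchored at the start of the name with re.match; whether the plugin uses match, search or a full match is not stated here, so treat that choice, and the helper names, as assumptions.

```python
import re


def compile_whitelist(patterns):
    """Compile the configured regexes once, rather than on every scrape."""
    return [re.compile(pattern) for pattern in patterns]


def is_whitelisted(metric_name, compiled):
    """Assumption: a pattern must match from the start of the metric name."""
    return any(regex.match(metric_name) for regex in compiled)


whitelist = compile_whitelist([
    "ceph_cluster_total_used_bytes",
    "ceph_cluster_total_bytes",
    "ceph_osd_op.*",
])
scraped = ["ceph_cluster_total_bytes", "ceph_osd_op_w_total", "ceph_mon_quorum_status"]
kept = [name for name in scraped if is_whitelisted(name, whitelist)]
```

In this sketch kept contains the first two names only, so the third metric would never be posted to the Monasca API.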

label_whitelist

A whitelist of labels can be provided to reduce the number of unique time series created in Monasca. This is useful for exporters such as cAdvisor which attach many highly variable labels to each metric, some of which may not even be valid dimensions in Monasca.

Example:

label_whitelist:
  - name
  - state
  - hostname
  - interface
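
The effect of a label whitelist can be illustrated with the short sketch below. The function name filter_labels, the sample label values, and the behaviour when no whitelist is configured (keep every label) are assumptions for illustration.

```python
def filter_labels(labels, label_whitelist=None):
    """Keep only whitelisted labels so fewer unique series are created.

    Assumption: with no whitelist configured, every label is kept.
    """
    if label_whitelist is None:
        return dict(labels)
    allowed = set(label_whitelist)
    return {key: value for key, value in labels.items() if key in allowed}


# Illustrative labels of the kind a container exporter might attach.
scraped_labels = {
    "name": "nova_compute",
    "id": "/docker/0123abcd",
    "image": "example/nova-compute:latest",
    "interface": "eth0",
}
dimensions = filter_labels(scraped_labels,
                           ["name", "state", "hostname", "interface"])
```

Only name and interface survive here; the highly variable id and image labels are dropped before the metric becomes a Monasca series.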

derived_metrics

A dict of metrics to derive from existing metrics. Supported operations are divide, sum and counter.

divide

The divide operation divides one metric series by another. It enforces that the dimensions of the metrics match, to reduce the chance of an unphysical result. For example, in a Ceph cluster with two OSDs, the following metrics may exist:

[{'name': 'ceph_osd_total_bytes', 'dimensions': {'osd': 1}, 'value': '1234'},
 {'name': 'ceph_osd_total_bytes', 'dimensions': {'osd': 2}, 'value': '4567'}]

[{'name': 'ceph_osd_total_used_bytes', 'dimensions': {'osd': 1}, 'value': '891'},
 {'name': 'ceph_osd_total_used_bytes', 'dimensions': {'osd': 2}, 'value': '111'}]

To calculate the fraction of space used on each OSD you must divide ceph_osd_total_used_bytes by ceph_osd_total_bytes for osd: 1 and again for osd: 2. The plugin does this by hashing the dimensions of each metric and using the hash to find the equivalent metric in the other series. If the two metric series do not have common sets of dimensions, the operation will currently fail.

Example:

derived_metrics:
  ceph_cluster_usage:
    x: ceph_cluster_total_used_bytes
    y: ceph_cluster_total_bytes
    op: divide
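
The dimension-matching step of divide can be sketched as below. This is an illustration under stated assumptions rather than the plugin's implementation: metrics are represented as dicts with dimensions and value keys, the hash is a frozenset of the dimension items, a missing match raises ValueError, and a zero denominator is skipped. The names dimension_key and divide_series are hypothetical.

```python
def dimension_key(dimensions):
    """Hashable key for a dimension set, independent of key order."""
    return frozenset(dimensions.items())


def divide_series(numerators, denominators):
    """Divide each numerator by the denominator with identical dimensions."""
    by_dimensions = {dimension_key(m["dimensions"]): m for m in denominators}
    results = []
    for numerator in numerators:
        key = dimension_key(numerator["dimensions"])
        if key not in by_dimensions:
            raise ValueError("no metric with dimensions %r"
                             % (numerator["dimensions"],))
        divisor = float(by_dimensions[key]["value"])
        if divisor == 0:
            continue  # assumption: skip rather than emit an undefined value
        results.append({"dimensions": dict(numerator["dimensions"]),
                        "value": float(numerator["value"]) / divisor})
    return results


total = [{"dimensions": {"osd": 1}, "value": "1234"},
         {"dimensions": {"osd": 2}, "value": "4567"}]
used = [{"dimensions": {"osd": 1}, "value": "891"},
        {"dimensions": {"osd": 2}, "value": "111"}]
usage = divide_series(used, total)
```

Each entry in usage keeps the osd dimension of the pair it was computed from, so the per-OSD usage fractions stay distinguishable.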

sum

The sum operation sums all metrics in a series over a specified dimension. For example, by specifying the osd dimension the total space used on all OSDs could be computed from the following metrics:

[{'name': 'ceph_osd_total_used_bytes', 'dimensions': {'osd': 1}, 'value': '891'},
 {'name': 'ceph_osd_total_used_bytes', 'dimensions': {'osd': 2}, 'value': '111'}]

If additional dimensions are present, these must remain the same for all metrics in the calculation. For example, it is not currently possible to create a sum on this hypothetical metric series:

[{'name': 'ceph_osd_total_used_bytes', 'dimensions': {'osd': 1, 'cluster': 'A'}, 'value': '891'},
 {'name': 'ceph_osd_total_used_bytes', 'dimensions': {'osd': 1, 'cluster': 'B'}, 'value': '111'}]

Example:

derived_metrics:
  ceph_osd_in_sum:
    series: ceph_osd_in
    key: ceph_daemon
    op: sum
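
A minimal sketch of the sum operation and its restriction on additional dimensions follows. The function name sum_series, the dict representation of a metric, and the choice to raise ValueError when the other dimensions differ are assumptions for illustration.

```python
def sum_series(metrics, key):
    """Sum metric values over the dimension `key`.

    Every other dimension must be identical across the series; otherwise
    the sum is refused, mirroring the limitation described above.
    """
    total = 0.0
    remaining = None
    for metric in metrics:
        dimensions = dict(metric["dimensions"])
        if key not in dimensions:
            raise ValueError("metric has no %r dimension" % key)
        del dimensions[key]
        if remaining is None:
            remaining = dimensions
        elif dimensions != remaining:
            raise ValueError("additional dimensions differ across the series")
        total += float(metric["value"])
    return {"dimensions": remaining or {}, "value": total}


used = [{"dimensions": {"osd": 1}, "value": "891"},
        {"dimensions": {"osd": 2}, "value": "111"}]
summed = sum_series(used, "osd")
```

The result drops the summed dimension and keeps whatever dimensions were shared, which is why differing values for another dimension such as cluster cannot be reconciled in a single output metric.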

counter

In many cases you will want to use counters_to_rates to automatically create rates from counters, which is why that setting is enabled by default. However, Prometheus metrics are sometimes not marked as counters correctly, or you may wish to calculate the rate of change of a gauge, or even of an existing rate.

To minimise user configuration, any metric ending with _total which is not marked as a counter is converted automatically to a rate when counters_to_rates is True. This is because, by Prometheus convention, any metric ending with _total should be a counter. In this case _rate is appended to the metric name to create the name of the new series, and the original series remains.
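
The selection rule described above can be written as a small predicate. This is a sketch only: the function names needs_rate and rate_series_name are hypothetical, the type strings follow the Prometheus exposition format, and the sample metric names are illustrative.

```python
def needs_rate(name, prom_type, counters_to_rates=True):
    """Decide whether a scraped metric gets a companion rate series.

    A metric qualifies if Prometheus marks it as a counter, or if its
    name ends with _total even though it is not marked as a counter.
    """
    if not counters_to_rates:
        return False
    return prom_type == "counter" or name.endswith("_total")


def rate_series_name(name):
    """The rate series is named by appending _rate to the metric name."""
    return name + "_rate"


# (name, Prometheus type) pairs; names are illustrative.
scraped = [
    ("ceph_osd_op_w_total", "counter"),
    ("ceph_pool_objects_total", "untyped"),
    ("ceph_health_status", "gauge"),
]
new_series = [rate_series_name(name)
              for name, prom_type in scraped if needs_rate(name, prom_type)]
```

Only the gauge is left without a rate series here; converting it would need an explicit counter entry under derived_metrics.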

It may still be useful to convert a series to a rate when the metric does not end in _total or is not marked as a counter. For example, the rate of change of remaining capacity would be a useful derivative of a gauge on a Ceph cluster. In this case you can use the counter operation to generate a rate from an arbitrary metric. The new metric takes the name specified by the configuration key. With the configuration below, a series of metrics called ceph_pool_wr_bytes_total_rate would be created from the metric series ceph_pool_wr_bytes.

Example:

derived_metrics:
  ceph_pool_wr_bytes_total:
    series: ceph_pool_wr_bytes
    op: counter

Note that this requires counters_to_rates to be enabled, which is the default. If the configuration key reuses the name of the existing series, the existing series is converted to a rate in situ, overwriting the existing counter.

Full example configuration

init_config:
  timeout: 10
instances:
  - metric_endpoint: 'http://ceph-node:9283/metrics'
    remove_hostname: true
    default_dimensions:
      cluster_tag: production
    counters_to_rates: True
    whitelist:
      - ceph_cluster_total_used_bytes
      - ceph_cluster_total_bytes
      - ceph_osd_op.*
    derived_metrics:
      ceph_cluster_usage:
        x: ceph_cluster_total_used_bytes
        y: ceph_cluster_total_bytes
        op: divide
      ceph_osd_in_sum:
        series: ceph_osd_in
        key: ceph_daemon
        op: sum
      ceph_pool_wr_bytes_total:
        series: ceph_pool_wr_bytes
        op: counter
      ceph_pool_rd_bytes_total:
        series: ceph_pool_rd_bytes
        op: counter

Note that more than one endpoint can be monitored by adding additional entries to the instances list.

