# ExaMon Data Catalog

This notebook is a first attempt at an inventory of the data stored in the ExaMon database. More importantly, it shows how to use the ExaMon API to build and extend that inventory.

The procedures below focus on data related to the Marconi 100 system. Their main purpose is to show a practical way to retrieve data descriptions (metadata) from the database, so that any dataset extracted from ExaMon can be created and documented efficiently.

**Status**: The notebook is a work in progress so questions and comments are welcome.



In [10]:
# Init steps

%matplotlib inline

# Install the dependencies with the %pip magic, which targets the environment
# of the running kernel (!pip3 may install into a different interpreter)
%pip install wheel pandas

import getpass
import numpy as np
import pandas as pd
from examon.examon import Client, ExamonQL

# Connect
USER = input('username:')
PWD = getpass.getpass('password:')
ex = Client('examon.cineca.it', port='3000', user=USER, password=PWD, verbose=False, proxy=True)
sq = ExamonQL(ex)





# ExaMon Plugins
Plugins are components of ExaMon that collect or process data. Each plugin is specialized for a particular type of data or application. Consequently, in the case of collector-type plugins, they define a unique set of tags for the metrics exported to ExaMon. 

The `plugin` tag attached to every ExaMon data point is simply a namespace grouping the metrics that share the same attributes (set of tags). From a relational database (RDBMS) perspective, the metrics can be seen as *tables* that, within the same plugin, share the same schema.

In conclusion, it should always be possible to collect the data of a given plugin in a normalized table whose columns are the tag keys of the metrics it collects.
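As a minimal sketch of this idea, the snippet below builds such a table with pandas from a few hand-written tag dictionaries. The records, node names, and tag values are hypothetical examples chosen for illustration, not data returned by ExaMon.

```python
import pandas as pd

# Hypothetical tag sets of metrics published by one plugin: every record
# shares the same tag keys, i.e. the same "schema"
records = [
    {"name": "Gpu0_gpu_temp", "node": "node001", "group": "gpu", "plugin": "ganglia_pub"},
    {"name": "Gpu1_gpu_temp", "node": "node001", "group": "gpu", "plugin": "ganglia_pub"},
    {"name": "load_one", "node": "node002", "group": "load", "plugin": "ganglia_pub"},
]

# One row per series, one column per tag key: a normalized table
table = pd.DataFrame.from_records(records)
print(table.columns.tolist())

# Tag keys then behave like ordinary columns, e.g. for filtering
gpu = table[table["group"] == "gpu"]
print(gpu)
```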

A good way to start exploring the data in the ExaMon db is to look at the different sets of metrics collected by the various plugins. The `plugin` tag identifies these groups of metrics.

In [None]:
df = sq.DESCRIBE(tag_key='plugin') \
    .execute()

display(df)

**NOTE:** In this report, we focus only on plugins that collect data from the Marconi 100 cluster of Cineca. The inner details of the operation of each plugin are omitted in favor of the data description.

## Ganglia
The [Ganglia](http://ganglia.sourceforge.net/) plugin connects to the Ganglia server (gmond), collects and translates the data payload (XML) to the ExaMon data model.


In [None]:
# Query of all metrics (a.k.a. tables) collected by the ganglia plugin

df = sq.DESCRIBE(tag_key='plugin', tag_value='ganglia_pub') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric
#
# RDBMS analogy:
# tag key = table column name
# tag values = table column values

df = sq.DESCRIBE(metric='Gpu0_gpu_temp') \
    .execute()

display(df)

Some tags that are specific to this plugin are:


*   `gcluster`: cluster where the Ganglia server is collecting the data
*   `group`: label used by Ganglia to define sets of similar metrics
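To make the origin of these tags more concrete, here is a rough sketch of how a gmond XML payload could be flattened into records tagged with `gcluster`, `node` and `group`. It is not the plugin's actual code: the trimmed XML sample, the helper name `gmond_to_records`, and the exact mapping of XML attributes to tags are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down gmond XML payload (real payloads carry more attributes)
GMOND_XML = """
<GANGLIA_XML VERSION="3.7.2" SOURCE="gmond">
  <CLUSTER NAME="m100" LOCALTIME="1600000000">
    <HOST NAME="node001" REPORTED="1600000000">
      <METRIC NAME="Gpu0_gpu_temp" VAL="45" TYPE="float" UNITS="C">
        <EXTRA_DATA>
          <EXTRA_ELEMENT NAME="GROUP" VAL="gpu"/>
        </EXTRA_DATA>
      </METRIC>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""

def gmond_to_records(xml_text):
    """Flatten a gmond XML payload into (name, value, timestamp, tags) records.

    Hypothetical helper: the tag mapping below is an assumption.
    """
    root = ET.fromstring(xml_text.strip())
    records = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                # Ganglia stores the metric group as an EXTRA_ELEMENT named GROUP
                group = None
                for extra in metric.iter("EXTRA_ELEMENT"):
                    if extra.get("NAME") == "GROUP":
                        group = extra.get("VAL")
                records.append({
                    "name": metric.get("NAME"),
                    # Kept as the raw string: conversion depends on the TYPE attribute
                    "value": metric.get("VAL"),
                    "timestamp": int(host.get("REPORTED")),
                    "tags": {
                        "gcluster": cluster.get("NAME"),
                        "node": host.get("NAME"),
                        "group": group,
                    },
                })
    return records

recs = gmond_to_records(GMOND_XML)
print(recs[0]["tags"])
```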





In [None]:
# All the possible values of a given tag key
#
# For example, we may want to know all the possible "group"s of metrics
# configured in the Ganglia instance

df = sq.DESCRIBE(tag_key='group') \
    .execute()

display(df)

In [None]:
# ...and consequently the Ganglia metrics that belong to a specific group:

df = sq.DESCRIBE(tag_key='group', tag_value='gpu') \
    .execute()

display(df)

## IPMI
The [IPMI](https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface) plugin collects all the sensor data provided by the out-of-band (OOB) management interface (BMC) of the cluster nodes.


In [None]:
# Query of all metrics collected by the IPMI plugin for all the clusters

df = sq.DESCRIBE(tag_key='plugin', tag_value='ipmi_pub') \
    .execute()

display(df)

In [None]:
# For example, if we only want to know metrics that relate to the ipmi collector of 
# Marconi 100:

df = sq.DESCRIBE(tag_key='plugin', tag_value='ipmi_pub') \
    .DESCRIBE(tag_key='cluster', tag_value='marconi100') \
    .JOIN(how='inner') \
    .execute()

display(df)
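The `JOIN(how='inner')` step above keeps only the metrics that appear in both `DESCRIBE` results. Assuming it behaves like a pandas inner merge on the metric name, the effect can be reproduced locally as below; the two small frames and the metric names other than `total_power` are hypothetical stand-ins for the real query results.

```python
import pandas as pd

# Hypothetical stand-ins for the two DESCRIBE results: metrics tagged
# plugin=ipmi_pub and metrics tagged cluster=marconi100
ipmi_metrics = pd.DataFrame({"name": ["total_power", "pcie", "fan0_0"]})
m100_metrics = pd.DataFrame({"name": ["total_power", "pcie", "Gpu0_gpu_temp"]})

# An inner join keeps only the names present in both results,
# i.e. the intersection of the two sets
both = ipmi_metrics.merge(m100_metrics, on="name", how="inner")
print(both["name"].tolist())
```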

In [None]:
# If you are not sure, to obtain the list of eligible "tag_value"s to use as 
# "cluster" filter: 

df = sq.DESCRIBE(tag_key='cluster') \
    .execute()

print('Eligible "cluster" tag values:')
display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='total_power') \
    .execute()

display(df)

**NOTE:** `rack` and `slot`, if defined, are convenience tags that are extracted from the hostname of the nodes, which is already collected in the `node` field. 
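Since `rack` and `slot` are derived from the hostname, they can be recomputed from the `node` tag when they are missing. The sketch below assumes a hypothetical `r<rack>n<slot>` naming convention (e.g. `r207n13`); the regular expression and the helper `rack_slot` are illustrative and should be adapted to the real hostnames.

```python
import re

# Hypothetical hostname convention: "r<rack>n<slot>", e.g. "r207n13"
HOSTNAME_RE = re.compile(r"^r(?P<rack>\d+)n(?P<slot>\d+)$")

def rack_slot(node):
    """Return (rack, slot) parsed from a node hostname, or (None, None) if it does not match."""
    m = HOSTNAME_RE.match(node)
    if m is None:
        return None, None
    # Kept as strings to preserve any leading zeros
    return m.group("rack"), m.group("slot")

print(rack_slot("r207n13"))   # ('207', '13')
print(rack_slot("login01"))   # (None, None)
```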

## Nagios

This plugin interfaces with a [Nagios](https://www.nagios.org/) extension developed by CINECA called ["Hnagios"](https://prace-ri.eu/wp-content/uploads/Design_Development_and_Improvement_of_Nagios_System_Monitoring_for_Large_Clusters.pdf); it collects the data payload and translates it to the ExaMon data model.

In [None]:
# Query of all metrics collected by the Nagios plugin for all the clusters

df = sq.DESCRIBE(tag_key='plugin', tag_value='nagios_pub') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='plugin_output') \
    .execute()

display(df)

Some tags that are specific to this plugin are:


*   `description`: the name of the entity (HW/SW) currently monitored by Nagios. It is defined as a hierarchy using `::` as a separator.
*   `host_group`: label defining groups of nodes that share the same function.
*   `nagiosdrained`: flag indicating that the node on which the alarm occurred was drained (placed offline) manually by an operator.
*   `state`: number indicating the state of the host or service when the event handler was run:
    *   Host events: 0=UP, 1=DOWN, 2=UNREACHABLE
    *   Service events: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
*   `state_type`: 0=SOFT, 1=HARD
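The numeric codes listed above can be turned into readable labels with simple lookup tables. This is a minimal sketch: `decode_state` is a hypothetical helper, and it assumes the values may arrive as strings (hence the cast to `int`).

```python
# Code tables taken from the tag descriptions above
HOST_STATES = {0: "UP", 1: "DOWN", 2: "UNREACHABLE"}
SERVICE_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
STATE_TYPES = {0: "SOFT", 1: "HARD"}

def decode_state(state, state_type, is_host=False):
    """Translate numeric Nagios state/state_type values into labels (hypothetical helper)."""
    table = HOST_STATES if is_host else SERVICE_STATES
    # Values may come back as strings, so cast before the lookup; "?" marks unknown codes
    return table.get(int(state), "?"), STATE_TYPES.get(int(state_type), "?")

print(decode_state("2", "1"))               # ('CRITICAL', 'HARD')
print(decode_state(1, 0, is_host=True))     # ('DOWN', 'SOFT')
```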








In [None]:
# For example, we may be interested in all the entities monitored by nagios on 
# marconi100 and only for the 'compute' nodes

ret = ex.query_metricstags('state', 'description', filt={'cluster':['marconi100'], 'host_group':['compute']})

display(pd.DataFrame(ret))
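Because `description` is a `::`-separated hierarchy, it can be split into one column per level to group or filter the monitored entities. The sketch assumes the values end up in a DataFrame column named `description`; the sample strings are hypothetical, not taken from the query above.

```python
import pandas as pd

# Hypothetical `description` values; the real ones come from the query above
desc = pd.DataFrame({"description": ["ALIVE::ping", "GPU::gpu0::temp", "filesys::local"]})

# Split the `::`-separated hierarchy into one column per level;
# shorter hierarchies are padded with missing values
levels = desc["description"].str.split("::", expand=True)
levels.columns = [f"level{i}" for i in range(levels.shape[1])]
print(levels)
```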

In [None]:
# For example, we may be interested in which Marconi 100 nodes are included in a 
# certain "host_group"

# 1) list M100 "host_group"s
ret = ex.query_metricstags('state', 'host_group', filt={'cluster':['marconi100']})

display(pd.DataFrame(ret))

In [None]:
# 2) The "management" nodes of Marconi 100

ret = ex.query_metricstags('state', 'node', filt={'cluster':['marconi100'],'host_group':['management']})

display(pd.DataFrame(ret))

## Weather
This plugin collects all the weather data related to the location of the Cineca facility (Casalecchio di Reno) from the [OpenWeatherMap](https://openweathermap.org) online service.

In [None]:
# Query of all metrics collected by the Weather plugin 

df = sq.DESCRIBE(tag_key='plugin', tag_value='weather_pub') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='clouds') \
    .execute()

display(df)

The `type` tag specifies the kind of value stored in the metric:
*   `current`: measured value
*   `hourly_forecast` / `daily_forecast`: forecasts at different time granularities

## Slurm
The Slurm plugin (time series data) collects some aggregated data from the [Slurm Workload Manager](https://www.schedmd.com/) server of the Cineca clusters.

**NOTE:** it is a work in progress and may have some inconsistencies.


In [None]:
# Query of all metrics collected by the Slurm plugin for all the clusters

df = sq.DESCRIBE(tag_key='plugin', tag_value='slurm_pub') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='s21.totals.total_nodes_alloc') \
    .execute()

display(df)

The `partition` tag defines a logical subdivision (namely, the Slurm queue) of the cluster nodes.





## Logics
Logics is a data collection system already installed at Cineca. It is specialized for collecting power consumption data from equipment in the different rooms, typically using multimeters that communicate via [Modbus](https://modbus.org/) protocol. 

The ExaMon plugin dedicated to collecting this data interfaces with the Logics database (RDBMS) through its REST API.

**NOTE:** Since the translation process is fully automated, inconsistencies present in the original db may carry over into the ExaMon database: e.g., metric names in Italian, units of measure used as metric names, etc.


In [None]:
# Query of all metrics collected by the Logics plugin 

df = sq.DESCRIBE(tag_key='plugin', tag_value='logics_pub') \
    .execute()

display(df)

The most interesting metrics are `Potenza` and `Potenza_attiva` (Italian for "power" and "active power"), which report the power consumption measured by the various multimeters.

Then there are other metrics derived from Logics, such as `pue` and `Pue`, that report the calculated PUE (Power Usage Effectiveness) of the three computer rooms (F, N and M) of Cineca.

Finally, other derived metrics are `Tot*`, `pit`, and `pt` which represent the total power consumption of different categories of equipment (CDZs, pumps, chillers, racks, ...) typically used in Logics for the PUE calculation.
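For reference, PUE (Power Usage Effectiveness) is the ratio between the total power drawn by a facility and the power consumed by its IT equipment, so a value of 1.0 would mean no overhead for cooling and power distribution. Which Logics metrics supply the two terms (for example, whether `pit` and `pt` play that role) is not stated here, so the function below is only a generic sketch with invented numbers; `pue` is a hypothetical helper name.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power divided by IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw

# Invented example: 1300 kW drawn by the room, 1000 kW of which by IT equipment
print(pue(1300.0, 1000.0))  # 1.3
```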

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='Potenza') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='Potenza_attiva') \
    .execute()

display(df)

The most representative tags of Logics metrics are:

*  `panel`: the name of the electrical panel within the computer rooms
*  `device`: the name of the device, connected to the panel, whose electrical parameters are measured by the multimeter.

In [None]:
# Example: list of panels for metric "Potenza"

ret = ex.query_metricstags('Potenza', 'panel')

display(pd.DataFrame(ret))

In [None]:
# Example: list of panels for metric "Potenza_attiva"

ret = ex.query_metricstags('Potenza_attiva', 'panel')

display(pd.DataFrame(ret))

In [None]:
# Example: list of "device"s connected to the "panel"s named "f-a" and "f-c" 

ret = ex.query_metricstags('Potenza', 'device', filt={'panel':['f-a','f-c']})

display(pd.DataFrame(ret))

In [None]:
# For example, it may be of interest to understand what metrics are collected from 
# a given device

df = sq.DESCRIBE(tag_key='plugin', tag_value='logics_pub') \
    .DESCRIBE(tag_key='device', tag_value='chiller1-1') \
    .JOIN(how='inner') \
    .execute()

display(df)

In [None]:
# For example, we can plot some of these metrics for the given device ('chiller1-1')

%matplotlib inline

metrics = ['Potenza','Fattore_di_potenza','Tensione','Corrente','Stato']

df_list = (sq.SELECT('*') 
      .FROM(metric) 
      .WHERE(device='chiller1-1')
      .TSTART(7, 'days')
      .AGGRBY('avg', sampling_value=1, sampling_unit='minutes', align_sampling=True)
      .execute().df_table for metric in metrics)

ex.df_table = pd.concat(df_list)
ex.to_series(flat_index=True, interp='time', dropna=True, columns=['name'])
ex.df_ts.plot(figsize=[30,12], subplots=True);
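The plotted Logics metrics keep their Italian names in ExaMon. A small lookup table can provide English labels for reports and plots; `LOGICS_LABELS` and `to_english` are hypothetical names, and the metric names stored in ExaMon are left unchanged.

```python
# English labels for the Italian Logics metric names used in this notebook
LOGICS_LABELS = {
    "Potenza": "Power",
    "Potenza_attiva": "Active power",
    "Fattore_di_potenza": "Power factor",
    "Tensione": "Voltage",
    "Corrente": "Current",
    "Stato": "State",
}

def to_english(name):
    """Return an English label for a Logics metric, falling back to the original name."""
    return LOGICS_LABELS.get(name, name)

metrics = ['Potenza', 'Fattore_di_potenza', 'Tensione', 'Corrente', 'Stato']
print([to_english(m) for m in metrics])
```

If the columns of `ex.df_ts` turn out to be named exactly after the metrics (not verified here), `ex.df_ts.rename(columns=to_english)` would relabel the subplots.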

## Schneider
The Schneider plugin is a dedicated data collector designed to acquire data from an industrial PLC by accessing its HMI module (from [Schneider Electric](https://www.se.com/ww/en/product-category/2100-human-machine-interfaces-hmi/)).

The PLC controls the valves and pumps of the liquid cooling circuit (RDHx) of Marconi 100. It consists of two (redundant) twin systems controllable by two identical HMI panels, Q101 and Q102.

The ExaMon plugin extracts and stores all the metrics available on both panels.

In [None]:
# Query of all metrics collected by the Schneider plugin 

df = sq.DESCRIBE(tag_key='plugin', tag_value='schneider_pub') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='PLC_PLC_Q101.Portata_1_hmi') \
    .execute()

display(df)

The tag `panel` is used to distinguish the metrics of the two panels (Q101 and Q102).
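Since the two panels belong to redundant twin systems, one possible use of the `panel` tag is to put the readings of Q101 and Q102 side by side and compare them. The sketch below assumes long-format samples carrying a `panel` column; the timestamps and values are invented.

```python
import pandas as pd

# Hypothetical long-format samples of one quantity read from both HMI panels
samples = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:00",
                                 "2021-01-01 00:01", "2021-01-01 00:01"]),
    "panel": ["Q101", "Q102", "Q101", "Q102"],
    "value": [10.0, 10.5, 11.0, 10.0],
})

# One column per panel, then the difference between the twin systems
wide = samples.pivot(index="timestamp", columns="panel", values="value")
wide["delta"] = wide["Q101"] - wide["Q102"]
print(wide)
```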

## Vertiv
The [Vertiv](https://www.vertiv.com/en-emea/) plugin mainly collects data from the air-conditioning units (CDZ) located in room F (Marconi 100) of Cineca. 

The plugin uses the RESTful API available on the individual devices to extract the most interesting metrics.

In [None]:
# Query of all metrics collected by the Vertiv plugin

df = sq.DESCRIBE(tag_key='plugin', tag_value='vertiv_pub') \
    .execute()

display(df)

In [None]:
# Query of all attributes (keys and values) of a specific metric

df = sq.DESCRIBE(metric='Supply_Air_Temperature') \
    .execute()

display(df)

The tag `device` is used to distinguish the metrics of the different CDZs (cdz1-6).






In [None]:
# For example, we can plot some of these metrics for all the CDZs.

%matplotlib inline

metrics = ['Supply_Air_Temperature','Return_Air_Temperature']

df_list = (sq.SELECT('device') 
      .FROM(metric) 
      .TSTART(30, 'days')
      .AGGRBY('avg', sampling_value=10, sampling_unit='minutes', align_sampling=True)
      .execute().df_table for metric in metrics)

ex.df_table = pd.concat(df_list)
ex.to_series(flat_index=True, interp='time', dropna=True, columns=['device','name'])
ax = ex.df_ts.plot(figsize=[30,12], subplots=True, layout=(6,2));
for x in ax.reshape(-1):
    x.set_ylim(15, 32)
    x.set_ylabel('°C')
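A natural follow-up to the supply/return plot is the temperature rise of the air across each CDZ (return minus supply). The sketch below uses an invented wide table whose `<device>.<metric>` column naming is an assumption, not necessarily the format produced by `to_series()`.

```python
import pandas as pd

# Hypothetical wide table in °C: one supply and one return column per CDZ.
# The "<device>.<metric>" column naming is an assumption.
temps = pd.DataFrame({
    "cdz1.Supply_Air_Temperature": [18.0, 18.5],
    "cdz1.Return_Air_Temperature": [28.0, 29.0],
    "cdz2.Supply_Air_Temperature": [19.0, 19.0],
    "cdz2.Return_Air_Temperature": [27.5, 28.0],
})

# Device names recovered from the column prefixes
cdzs = sorted({col.split(".")[0] for col in temps.columns})

# Temperature rise of the air across each unit: return minus supply
delta_t = pd.DataFrame({
    cdz: temps[f"{cdz}.Return_Air_Temperature"] - temps[f"{cdz}.Supply_Air_Temperature"]
    for cdz in cdzs
})
print(delta_t)
```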



---

