# Network Metrics Summary
Statistical analysis of metrics collected from network devices, including ConnectX network interface controllers (NICs) and InfiniBand (IB) switches.

NOTES:
* The metrics are system-wide.
* Byte-rate metric values are in bytes per millisecond (B/ms).
* Packet-rate metric values are in packets per millisecond (packets/ms).
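
Link speeds are usually quoted in Gbit/s, so it can help to convert these per-millisecond rates before comparing them with a link's nominal bandwidth. A minimal sketch, assuming decimal units (1 Gbit = 10^9 bits); the helper names are hypothetical and not part of this recipe. Because they only use arithmetic operators, they also work element-wise on a pandas Series.

```python
def bytes_per_ms_to_gbit_per_s(value):
    """Convert a byte rate in B/ms to Gbit/s (decimal giga)."""
    # 1 B/ms = 1000 B/s = 8000 bit/s = 8e-6 Gbit/s
    return value * 1000 * 8 / 1e9


def packets_per_ms_to_packets_per_s(value):
    """Convert a packet rate in packets/ms to packets/s."""
    return value * 1000


print(bytes_per_ms_to_gbit_per_s(5e7))  # → 400.0
```

For example, a sustained byte rate of 5e7 B/ms corresponds to 400 Gbit/s.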

## Loading

In [None]:
from IPython.display import display, HTML, Markdown

import nsys_display

display(HTML("<style>.container { width:95% !important; }</style>"))

import plotly.offline as pyo

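def is_lab_notebook():
    """Heuristic: True if the parent process command line mentions 'jupyter-lab-script'."""
    import re
    import psutil
    parent = psutil.Process().parent()
    if parent is None:  # no parent process available; assume not JupyterLab
        return False
    return any(re.search('jupyter-lab-script', x) for x in parent.cmdline())

if is_lab_notebook():
    pyo.init_notebook_mode()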


# If plotly is not installed, uncomment and run the next line, then restart the kernel:
# !pip3 install plotly

import pandas as pd
pd.options.plotting.backend = "plotly"
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 1000)

import nsys_pres

In [None]:
files_df = pd.read_parquet("files.parquet")
all_stats_df = pd.read_parquet("all_stats.parquet")

## Statistics for all ranks
Overall statistics for the network metrics across all ranks.
* Q1 (approx) is the 25th percentile of the dataset, approximated by the minimum of the per-rank Q1 values.
* Median (approx) is the 50th percentile of the dataset, approximated by the median of the per-rank medians.
* Q3 (approx) is the 75th percentile of the dataset, approximated by the maximum of the per-rank Q3 values.
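
The approximation can be illustrated with a small pandas sketch. The `rank`/`value` samples below are made up, and this is not the recipe's own implementation; it only shows how the three approximate statistics follow from per-rank quartiles, next to the exact quartiles of the pooled data for comparison.

```python
import pandas as pd

# Made-up samples of a single metric for two ranks (not real profiling data).
samples = pd.DataFrame({
    "rank":  [0, 0, 0, 0, 1, 1, 1, 1],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Per-rank quartiles: one row per rank, one column per quantile.
per_rank = samples.groupby("rank")["value"].quantile([0.25, 0.5, 0.75]).unstack()
per_rank.columns = ["Q1", "Median", "Q3"]

# Approximate all-rank quartiles built only from the per-rank summaries.
approx = {
    "Q1 (approx)": per_rank["Q1"].min(),
    "Median (approx)": per_rank["Median"].median(),
    "Q3 (approx)": per_rank["Q3"].max(),
}

# Exact quartiles of the pooled data, for comparison.
exact = samples["value"].quantile([0.25, 0.5, 0.75])

print(per_rank)
print(approx)
print(exact)
```

For this toy data the approximate Q1 and Q3 (1.75 and 7.25) lie outside the exact pooled values (2.75 and 6.25), so the approximate range can be wider than the true middle 50% of the data.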

In [None]:
display(all_stats_df)
fig1 = nsys_pres.display_boxplots_df(None, all_stats_df, xaxis_title="Metric Names", yaxis_title="Metric Value")
fig2 = nsys_pres.display_graph(None, all_stats_df.index, all_stats_df[['Q1 (approx)', 'Median (approx)', 'Q3 (approx)']], title="50% of Distribution", xaxis_title="Metric Names", yaxis_title="Metric Value")

## Statistics for all ranks and a particular network device
Select a network device name to see:
* A table of per-rank statistics:
  * Q1 is the 25th percentile of the rank's data.
  * Median is the 50th percentile of the rank's data.
  * Q3 is the 75th percentile of the rank's data.
* A boxplot of the per-rank distribution of metric values, to support the investigation of outliers.
* A graph of Q1, Median, and Q3 showing how closely the middle 50% of the data is grouped, without outliers.
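
Boxplots conventionally mark as outliers the points lying more than 1.5 × IQR below Q1 or above Q3 (Tukey's fences); whether the plots above use exactly this rule is an assumption, not something stated here. A minimal sketch of that rule applied per rank to a single device, assuming a long-format DataFrame with hypothetical `device`, `rank`, and `value` columns, made-up sample data, and hypothetical device names:

```python
import pandas as pd

# Made-up samples; "device", "rank" and "value" are hypothetical column names.
samples = pd.DataFrame({
    "device": ["mlx5_0"] * 10 + ["mlx5_1"],
    "rank":   [0] * 5 + [1] * 5 + [0],
    "value":  [10.0, 11.0, 12.0, 13.0, 100.0,
               20.0, 21.0, 22.0, 23.0, 24.0,
               500.0],
})


def flag_outliers(df: pd.DataFrame, device: str, k: float = 1.5) -> pd.DataFrame:
    """Keep one device's rows and flag values outside Tukey's fences, per rank."""
    sel = df[df["device"] == device].copy()
    by_rank = sel.groupby("rank")["value"]
    # Broadcast each rank's Q1 and Q3 back onto that rank's rows.
    q1 = by_rank.transform(lambda s: s.quantile(0.25))
    q3 = by_rank.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1  # spread of the middle 50% of the rank's values
    sel["outlier"] = (sel["value"] < q1 - k * iqr) | (sel["value"] > q3 + k * iqr)
    return sel


result = flag_outliers(samples, "mlx5_0")
print(result[result["outlier"]])
```

Here only the value 100.0 on rank 0 is flagged: that rank's Q1 and Q3 are 11 and 13, so its upper fence is 13 + 1.5 × 2 = 16.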

In [None]:
rank_stats_df = pd.read_parquet("rank_stats.parquet")
per_rank_device_gdf = pd.read_parquet('rank_stats_by_device.parquet')

nsys_display.display_stats_per_operation_device(rank_stats_df, per_rank_device_gdf, xaxis_title="Rank")

## Files
Ranks are assigned on the assumption that the file names include the rank and sort in rank order. If the rank numbers in the file names are not sufficiently zero-padded, the real rank may differ from the assigned ID. This table lets you identify the file name for each rank without the charts above having to show potentially very long labels in the legend or x-axis.
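
The zero-padding caveat comes from plain string sorting, which compares names character by character. A minimal sketch with hypothetical file names, contrasting lexicographic order, zero-padded names, and a "natural" sort key that compares digit runs as integers (the `natural_key` helper is illustrative and not something this recipe uses):

```python
import re

files = ["report_rank10.nsys-rep", "report_rank2.nsys-rep", "report_rank1.nsys-rep"]

# Lexicographic sort: "10" sorts before "2" because "1" < "2".
print(sorted(files))
# → ['report_rank1.nsys-rep', 'report_rank10.nsys-rep', 'report_rank2.nsys-rep']

# Zero-padded names sort correctly even with a plain string sort.
padded = [f"report_rank{r:02d}.nsys-rep" for r in (10, 2, 1)]
print(sorted(padded))
# → ['report_rank01.nsys-rep', 'report_rank02.nsys-rep', 'report_rank10.nsys-rep']


def natural_key(name):
    """Split out digit runs so they compare as integers instead of strings."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


print(sorted(files, key=natural_key))
# → ['report_rank1.nsys-rep', 'report_rank2.nsys-rep', 'report_rank10.nsys-rep']
```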

In [None]:
display(files_df)