# Performance Variability Boxplots

Performance variability boxplots provide insight into the runtime distribution and its variability across callsites. Each boxplot represents the range of the distribution, and outliers (drawn as dots) are the values that lie more than 1.5*IQR beyond the quartiles. Several statistical measures, such as mean, variance, kurtosis, and skewness, are also provided.
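To make the 1.5*IQR rule concrete, here is a minimal standard-library sketch of the statistics behind a single boxplot. The helper `boxplot_stats` and the sample runtimes are hypothetical, and the quartile interpolation and moment definitions are one common choice; Hatchet's own implementation may differ in these details.

```python
import statistics


def boxplot_stats(values, iqr_scale=1.5):
    """Summarize one callsite's runtimes (hypothetical helper, not Hatchet's code)."""
    data = sorted(values)

    # Quartiles with linear interpolation between neighbouring data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Values outside [q1 - scale*IQR, q3 + scale*IQR] are treated as outliers.
    lower_fence = q1 - iqr_scale * iqr
    upper_fence = q3 + iqr_scale * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    # Population central moments, used for variance, skewness and excess kurtosis.
    mean = statistics.fmean(data)
    m2 = sum((v - mean) ** 2 for v in data) / len(data)
    m3 = sum((v - mean) ** 3 for v in data) / len(data)
    m4 = sum((v - mean) ** 4 for v in data) / len(data)

    return {
        "q": [data[0], q1, median, q3, data[-1]],
        "outliers": outliers,
        "mean": mean,
        "var": m2,
        # Guard against a constant distribution, where the ratios are undefined.
        "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurt": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


# Made-up runtimes: five similar ranks and one straggler.
sample = [10.0, 11.0, 12.0, 13.0, 14.0, 40.0]
stats = boxplot_stats(sample)
print(stats["q"], stats["outliers"])
```

With this sample, the straggler at 40.0 lies above the upper fence and is the only value reported as an outlier. A larger `iqr_scale` widens the fences and flags fewer points.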

In [11]:
import os, sys
from IPython.display import HTML, display

# Hatchet imports
import hatchet as ht
from hatchet.external.scripts import BoxPlot

First, we will construct a **hatchet.GraphFrame** using a sample dataset in our repository, **caliper-lulesh-json**. 

In [3]:
data_dir = os.path.realpath("../../../hatchet/tests/data")
data_path = os.path.join(data_dir, "caliper-lulesh-json/lulesh-annotation-profile.json")
gf = ht.GraphFrame.from_caliper_json(data_path)

Next, using the **hatchet.GraphFrame**, we can calculate the data required for the performance variability boxplots through an exposed hatchet API, **BoxPlot**.

The interface accepts the following attributes:
1. `tgt_gf` - Target hatchet.GraphFrame 
2. `bkg_gf` - Background hatchet.GraphFrame (optional)
3. `callsites` - List of callsite names for which we want to compute/visualize the boxplots.
4. `metrics` - Runtime metrics for which we need to calculate the boxplots.
5. `iqr_scale` - Interquartile range scale (by default = 1.5)

In [12]:
callsites = gf.dataframe.name.unique().tolist()
bp = BoxPlot(cat_column='rank', tgt_gf=gf, bkg_gf=None, callsites=callsites, metrics=["time"])

The **BoxPlot** API calculates the results and stores them as GraphFrames in dictionaries (i.e., `tgt` and `bkg`) keyed by metric.

In [14]:
bp.tgt

{'time': <hatchet.graphframe.GraphFrame at 0x7f917880d220>}

Using the **roundtrip** interface, we can then visualize the computed boxplot information. Below, we load the roundtrip interface, which allows users to visualize plots directly in Jupyter notebook cells.

In [10]:
# This is the relative path from the notebook to Roundtrip files in hatchet/external/roundtrip/
roundtrip_path = '../../../hatchet/external/roundtrip/'
hatchet_path = "."

# Add the path so that the notebook can find the Roundtrip extension
module_path = os.path.abspath(os.path.join(roundtrip_path)) 
if module_path not in sys.path:
    sys.path.append(module_path)
    sys.path.append(hatchet_path)

    
# Uncomment this line to widen the cells to handle large trees 
#display(HTML("<style>.container { width:100% !important; }</style>"))

# Load the Roundtrip extension. This only needs to be loaded once.
%load_ext roundtrip

The roundtrip extension is already loaded. To reload it, use:
  %reload_ext roundtrip


Since **roundtrip** expects the data in JSON format, the **BoxPlot** API exposes a method, `to_json()`, which dumps the boxplot's GraphFrames (i.e., `tgt` and `bkg`) to JSON.

In [18]:
bp_json = bp.to_json()

In [19]:
print(bp_json['main'])

{'tgt': {'time': {'q': [105528.0, 113072.25, 116494.0, 124430.75, 137098.0], 'ocat': [], 'ometric': [], 'min': 105528.0, 'max': 137098.0, 'mean': 119373.5, 'var': 104497970.25, 'imb': 0.14847935262013764, 'kurt': -0.9421848873183336, 'skew': 0.5436725364039101}}}
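The printed entry can be sanity-checked by hand. The sketch below copies the values from the output above and assumes, as an inference from those values rather than from Hatchet's documentation, that `q` holds `[min, q1, median, q3, max]` and that `imb` is `(max - mean) / mean`.

```python
# Values copied from the printed entry above; the field meanings are inferred.
entry = {
    "q": [105528.0, 113072.25, 116494.0, 124430.75, 137098.0],
    "ocat": [],
    "ometric": [],
    "min": 105528.0,
    "max": 137098.0,
    "mean": 119373.5,
    "imb": 0.14847935262013764,
}

iqr_scale = 1.5  # default scale listed among the BoxPlot attributes
q_min, q1, median, q3, q_max = entry["q"]

# Interquartile range and the fences beyond which a value counts as an outlier.
iqr = q3 - q1
lower_fence = q1 - iqr_scale * iqr
upper_fence = q3 + iqr_scale * iqr
has_outliers = q_min < lower_fence or q_max > upper_fence

# Assumed imbalance definition: how far the slowest value sits above the mean.
imbalance = (entry["max"] - entry["mean"]) / entry["mean"]

print(f"IQR={iqr}, fences=({lower_fence}, {upper_fence}), outliers={has_outliers}")
print(f"imbalance={imbalance:.6f} (reported: {entry['imb']:.6f})")
```

Both the minimum and the maximum fall inside the fences, which agrees with the empty `ocat` and `ometric` lists, and the recomputed imbalance agrees with the reported `imb` for this entry.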


Now, we can trigger the visualization using the **roundtrip** magic command, `%loadVisualization`. `%loadVisualization` expects the `roundtrip_path` (the path in which roundtrip resides), `"boxplot"` (the identifier of the visualization type), and the variable containing the data for the boxplots (here, `bp_json`).

Interactions on the boxplot visualization:
1. Users can select the metric of interest to visualize the corresponding runtime information.
2. Users can sort the callsites by their statistical attributes (i.e., mean, min, max, variance, imbalance, kurtosis and skewness).
3. Users can select the sorting order (i.e., ascending or descending).
4. Users can select the number of callsites that would be visualized.

In [20]:
%loadVisualization roundtrip_path "boxplot" bp_json

<IPython.core.display.Javascript object>

Once the exploration of the variability is done, users can retrieve the data shown in their visualization using the `%fetchData` magic command. As with `%loadVisualization`, we have to specify `"boxplot"` to identify the visualization type. The results are stored in the named variable (here, `result_csv`) as a `.csv`-style string in which fields are separated by commas and rows by semicolons.

In [24]:
%fetchData "boxplot" result_csv

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [25]:
print(result_csv)

name,min,max,mean,var,imb,kurt,skew;CalcFBHourglassForceForElems,1088315,1276744,1197360.375,3561043884.734375,0.066298857601664,-0.8618185329919692,-0.336770351062538;CalcKinematicsForElems,493338,907675,740734,20585329027.5,0.22537240088884808,-1.323030118573988,-0.3042530153918946;IntegrateStressForElems,448597,987804,725254.375,29868514054.234375,0.3620103980758475,-1.2658383358291696,-0.1038366357478744;CalcHourglassControlForElems,494580,599077,574309,982583388.75,0.04312660954294639,2.322254192176139,-1.930747431397297;CalcMonotonicQGradientsForElems,326522,448753,393558.125,1927822359.609375,0.140245802319543,-1.5265491924225043,-0.08914394549811265


The `.csv` formatted output can be converted to a dataframe as shown below.

In [28]:
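import pandas as pd

# Rows are separated by ';' and fields by ','; the first row holds the column names.
columns = result_csv.split(';')[0].split(',')
data = [x.split(',') for x in result_csv.split(';')[1:]]
# Index the table by callsite name. Note that all values are still strings here.
df = pd.DataFrame(data, columns=columns).set_index('name')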

In [29]:
df

name,min,max,mean,var,imb,kurt,skew
CalcFBHourglassForceForElems,1088315,1276744,1197360.375,3561043884.734375,0.066298857601664,-0.8618185329919692,-0.336770351062538
CalcKinematicsForElems,493338,907675,740734.0,20585329027.5,0.225372400888848,-1.323030118573988,-0.3042530153918946
IntegrateStressForElems,448597,987804,725254.375,29868514054.23437,0.3620103980758475,-1.2658383358291696,-0.1038366357478744
CalcHourglassControlForElems,494580,599077,574309.0,982583388.75,0.0431266095429463,2.322254192176139,-1.930747431397297
CalcMonotonicQGradientsForElems,326522,448753,393558.125,1927822359.609375,0.140245802319543,-1.5265491924225043,-0.0891439454981126
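Splitting the string yields text values, so they need converting to numbers before callsites can be ranked outside the visualization. The sketch below is a standard-library illustration: `parse_result_csv` is a hypothetical helper, the sample string reuses two rows from the output above, and it assumes callsite names contain neither commas nor semicolons.

```python
def parse_result_csv(text):
    """Parse the ';'-separated rows into {callsite: {column: float}} (hypothetical helper)."""
    header, *rows = [row for row in text.strip().split(";") if row]
    columns = header.split(",")
    table = {}
    for row in rows:
        name, *values = row.split(",")
        # Skip the leading 'name' column and convert every statistic to float.
        table[name] = dict(zip(columns[1:], map(float, values)))
    return table


# Two rows taken from the fetched output shown earlier.
sample = (
    "name,min,max,mean,var,imb,kurt,skew;"
    "CalcKinematicsForElems,493338,907675,740734,20585329027.5,"
    "0.22537240088884808,-1.323030118573988,-0.3042530153918946;"
    "IntegrateStressForElems,448597,987804,725254.375,29868514054.234375,"
    "0.3620103980758475,-1.2658383358291696,-0.1038366357478744"
)

table = parse_result_csv(sample)

# Rank callsites from most to least imbalanced.
ranked = sorted(table, key=lambda name: table[name]["imb"], reverse=True)
print(ranked)
```

With the DataFrame built above, `df.astype(float)` performs the same conversion, after which `sort_values` can be used for ranking.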
