# NVTX Iteration Analysis
Investigate the progress and consistency of an iteration-based application. The target range, chosen by name, is assumed to delineate iterations. This report graphs when the application reaches the range (start), how long elapses between ranges (delta), and how long the range itself took (duration), in case it was behaving as a barrier, such as a collective-communication all-reduce. You will see how long each rank takes in comparison to the others.

The ranks that take longer to reach the end of the iteration (delta or range start) typically prevent all ranks from moving forward to the next iteration, so their performance should be investigated relative to the median or min. Identify those ranks and iterations here. Then investigate the cause by opening 2-3 ranks (the outlier plus the median and/or min) in Nsight Systems as a multi-report view.

All times are in nanoseconds.

## Load Data

In [None]:
#General setup properties
import IPython.display
from IPython.display import display, HTML, Markdown
display(HTML("<style>.container { width:95% !important; }</style>"))

import pickle
import importlib
import os
import glob
import math
import re
import time
from collections import deque
import pandas as pd
import numpy as np
import sqlite3
#!pip3 install plotly

pd.options.plotting.backend = "plotly"
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 250)

import nsys_pres

In [None]:
#load the data
stats_df = pd.read_parquet("stats.parquet")
files_df = pd.read_parquet("files.parquet")

# Load each pace table, labeling the index "Iterations" and the columns "Ranks".
pace_columns = ['start', 'end', 'duration', 'duration_accum',
                'delta', 'delta_accum', 'delta_stats']
pace_map_by_column = {
    column: pd.read_parquet(f"pace_{column}.parquet")
              .rename_axis("Iterations", axis='index')
              .rename_axis("Ranks", axis='columns')
    for column in pace_columns
}


## Statistics for Target Operation
Overall statistics for the target range across all ranks.
* Q1(approx) is the 25th percentile of the dataset, approximated by the min of the per-rank Q1 values.
* Median(approx) is the 50th percentile of the dataset, approximated by the median of the per-rank medians.
* Q3(approx) is the 75th percentile of the dataset, approximated by the max of the per-rank Q3 values.
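
The approximations above can be sketched in pandas as follows. This is only an illustration of the arithmetic described in the list: the per-rank table, its column names (`Q1`, `Med`, `Q3`), and the values are hypothetical and are not taken from `stats.parquet`.

```python
import pandas as pd

# Hypothetical per-rank statistics for the target range (nanoseconds).
per_rank = pd.DataFrame(
    {
        "Q1": [100, 110, 105],
        "Med": [120, 130, 125],
        "Q3": [140, 160, 150],
    },
    index=pd.Index([0, 1, 2], name="Rank"),
)

# Combine the per-rank values as described in the list above.
approx = {
    "Q1 (approx)": per_rank["Q1"].min(),        # min of the per-rank Q1 values
    "Med (approx)": per_rank["Med"].median(),   # median of the per-rank medians
    "Q3 (approx)": per_rank["Q3"].max(),        # max of the per-rank Q3 values
}
print(approx)
```

Taking the min of Q1 and the max of Q3 widens the interquartile range relative to the exact quartiles of the pooled data, so these values are best read as conservative bounds.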

In [None]:
display(stats_df.T)

## Start of Target Operation
Wall-clock time at which each rank reached the target operation.
* The progress graph represents each rank as a line; the x-axis represents linear progress through the iterations.
* The consistency graph transposes the data, showing a line for each iteration. Reading upward from the x-axis, the height of a line is the time at which that rank reached the iteration.
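
The relationship between the two views can be sketched with a small table. This assumes the same layout the notebook loads (rows are iterations, columns are ranks); the values are made up, and how `nsys_pres.display_pace_graph` builds its figures internally is not shown here.

```python
import pandas as pd

# Hypothetical start times (nanoseconds): rows are iterations, columns are ranks.
start = pd.DataFrame(
    {0: [0, 100, 200], 1: [5, 110, 260]},
    index=pd.Index([0, 1, 2], name="Iterations"),
)
start.columns.name = "Ranks"

# Progress view: one line per rank (column), x-axis is the iteration.
progress = start

# Consistency view: transpose so there is one line per iteration, x-axis is the rank.
consistency = start.T

print(consistency)
```

In the consistency view, a flat line means all ranks reached that iteration at about the same time; here rank 1 is visibly late for iteration 2.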

In [None]:
figs=list()
nsys_pres.display_pace_graph(figs, pace_map_by_column, 'start')

## Delta Between Targets
The time measured between consecutive occurrences of the target range.
* The boxplot shows the distribution of rank timings per iteration.
* The progress graph represents each rank as a line; the x-axis represents linear progress through the iterations.
* The consistency graph transposes the data, showing a line for each iteration. Reading upward from the x-axis, the height of a line is how long that iteration took on each rank. Search here for inconsistencies between ranks to investigate. If the lines are flat horizontally, each rank completed its work at the same time.

In [None]:

delta_stats_df = pace_map_by_column['delta_stats']

figs = nsys_pres.display_boxplots_df(None, delta_stats_df.loc[1:],
                          title="Delta boxplot per iteration (across ranks)",
                          xaxis_title="Iteration")

In [None]:
figs=list()
nsys_pres.display_pace_graph(figs, pace_map_by_column, 'delta')

## Variance in Delta between Targets
Similar to "Delta Between Targets", but the per-iteration median is subtracted from each rank's value, which typically makes outliers more obvious if they were subtle above.
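
A minimal sketch of that subtraction, based only on the description above. The table is hypothetical, and this is an assumption about what `nsys_pres.display_pace_graph_delta_minus_median` computes, not its actual implementation.

```python
import pandas as pd

# Hypothetical deltas (nanoseconds): rows are iterations, columns are ranks.
delta = pd.DataFrame(
    {0: [10, 12, 11], 1: [10, 12, 30], 2: [11, 13, 11]},
    index=pd.Index([1, 2, 3], name="Iterations"),
)
delta.columns.name = "Ranks"

# Median across ranks for each iteration.
per_iter_median = delta.median(axis="columns")

# Subtract each iteration's median from every rank's value in that iteration.
delta_minus_median = delta.sub(per_iter_median, axis="index")

# The largest positive deviation identifies the (iteration, rank) to inspect first.
worst_iteration, worst_rank = delta_minus_median.stack().idxmax()
print(delta_minus_median)
print(worst_iteration, worst_rank)
```

Values near zero mean the rank kept pace with the typical rank for that iteration; a large positive value marks a rank that held the others back.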

In [None]:
figs=list()
nsys_pres.display_pace_graph_delta_minus_median(figs, pace_map_by_column)

## Delta Accumulated
This is the accumulated time between the target ranges. It is more relevant if the target range is a barrier, such as an all-reduce, which must wait for all participants. Assuming that the time in the target range is consistent, this may look similar to the graph of starts. If the target range is a barrier, it provides a more accurate picture of how fast the rank is processing its data overall. This can help identify issues such as inconsistent workload distribution, consistently giving a lighter load to a particular rank, a hardware difference (distance from storage, NUMA setup, GPU or NIC bindings, throttling, etc.), or other OS and software services interfering.
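
As an illustration, the accumulation can be thought of as a running sum over iterations. The sketch below assumes that interpretation and uses made-up values; the notebook itself loads a precomputed `pace_delta_accum.parquet` rather than computing it this way.

```python
import pandas as pd

# Hypothetical deltas (nanoseconds): rank 1 is consistently 2 ns slower per iteration.
delta = pd.DataFrame(
    {0: [10, 10, 10], 1: [12, 12, 12]},
    index=pd.Index([1, 2, 3], name="Iterations"),
)
delta.columns.name = "Ranks"

# Running sum down the iterations for each rank.
delta_accum = delta.cumsum(axis="index")

# How far rank 1 has fallen behind rank 0 after each iteration.
gap = delta_accum[1] - delta_accum[0]
print(gap.tolist())
```

A small per-iteration difference that is hard to see in the delta graph grows steadily in the accumulated view, which is why a consistently slower rank stands out here.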

In [None]:
figs=list()
nsys_pres.display_pace_graph(figs, pace_map_by_column, 'delta_accum')

## Duration Accumulated
Similar to the accumulated delta, this shows how much time has been spent in the target range so far as the application progresses through its iterations. If the range is a barrier-like operation, this is how long the rank has been blocked.

In [None]:
figs=list()
nsys_pres.display_pace_graph(figs, pace_map_by_column, 'duration_accum')

## Duration
The duration is not accumulated here, so outliers are more likely to stand out.

In [None]:
figs=list()
nsys_pres.display_pace_graph(figs, pace_map_by_column, 'duration')

## Files
Ranks are assigned on the assumption that the file names include the rank and sort well. If the rank numbers are not sufficiently padded with zeros, the real rank may differ from the assigned ID. This table lets you identify the file name for each ID, so the charts above do not need potentially very long labels in the legend or x-axis.
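
The padding issue can be seen with a plain lexical sort. The file names below are hypothetical, and the natural-sort key is shown only for comparison; it is not a description of how the IDs in this report were assigned.

```python
import re

# Hypothetical report file names without zero padding.
files = ["rank10.nsys-rep", "rank2.nsys-rep", "rank1.nsys-rep"]

# A lexical sort places rank10 before rank2, so an assigned ID of 1 would map to rank 10.
lexical = sorted(files)

def natural_key(name):
    """Split a name into text and integer parts so numbers compare numerically."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]

# A natural sort keeps the numeric order of the ranks.
natural = sorted(files, key=natural_key)

print(lexical)
print(natural)
```

Zero-padding the rank in the file name (for example `rank01`, `rank02`, `rank10`) makes the lexical order match the numeric order, so the assigned IDs match the real ranks.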

In [None]:
display(files_df)