# Statistics Diff

This notebook compares results from two runs of the same recipe.

In [None]:
import pandas as pd
import plotly.offline as pyo

from IPython.display import display, HTML, Markdown

import nsys_display

display(HTML("<style>.container { width:95% !important; }</style>"))
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pyo.init_notebook_mode()

## Overall statistics diff

The table shows the difference values for each operation. Note that:
* All time values are in nanoseconds.
* Q1 is the 25th percentile of the data set. It is approximated by the minimum of the per-rank Q1 values.
* Median is the 50th percentile of the data set. It is approximated by the median of the per-rank medians.
* Q3 is the 75th percentile of the data set. It is approximated by the maximum of the per-rank Q3 values.
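
The approximations listed above can be illustrated with a minimal sketch. The data frame and its column names (`Rank`, `Q1`, `Med`, `Q3`) are hypothetical and are not taken from the recipe output; the values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical per-rank statistics for a single operation (values in ns).
# Column names are assumptions made for this sketch.
per_rank = pd.DataFrame({
    "Rank": [0, 1, 2],
    "Q1": [10.0, 12.0, 11.0],
    "Med": [20.0, 25.0, 22.0],
    "Q3": [30.0, 35.0, 33.0],
})

# Combine the per-rank quartiles into overall approximations:
# min of Q1, median of medians, max of Q3.
overall = {
    "Q1": float(per_rank["Q1"].min()),
    "Med": float(per_rank["Med"].median()),
    "Q3": float(per_rank["Q3"].max()),
}
print(overall)
```

Because the overall Q1 and Q3 are taken from the extreme ranks, the resulting interquartile range is at least as wide as that of any single rank.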

In [None]:
ranks_df = pd.read_parquet('all_stats.parquet')
display(ranks_df)

The line graph shows the difference values for each operation.

To toggle the line traces on and off on the graph, click on their corresponding legend entries.

In [None]:
nsys_display.display_stats_scatter(ranks_df, xaxis_title='Name', yaxis_title='Diff Value (ns)')

## Per-rank statistics diff

The table and graphs show the difference values of the operation selected from the drop-down menu, for each rank.

To toggle the line traces on and off on the graph, click on their corresponding legend entries. When working with large data sets, consider using the DataFrame .head() method to limit the number of displayed elements, or zoom in on the graph for better visibility.
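
As a rough sketch of limiting the displayed elements, the example below applies `head()` to a small stand-in data frame. The frame and its columns (`Name`, `Rank`, `Diff`) are hypothetical; in the notebook, the same call could be applied to the loaded data frame before it is passed to the display function.

```python
import pandas as pd

# Hypothetical stand-in for a large per-rank data frame.
# Column names and values are assumptions made for this sketch.
demo_df = pd.DataFrame({
    "Name": ["op_a"] * 100,
    "Rank": list(range(100)),
    "Diff": [float(i) for i in range(100)],
})

# Keep only the first 10 rows to make the table and graph easier to read.
limited_df = demo_df.head(10)
print(len(limited_df))
```

Note that `head()` simply keeps the first rows in their current order, so sorting or filtering the frame first may be needed to keep the rows of interest.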

In [None]:
per_rank_df = pd.read_parquet('rank_stats.parquet')
nsys_display.display_stats_per_operation(per_rank_df, box=False, xaxis_title='Rank', yaxis_title='Diff Value (ns)')

## Files

The table associates each rank number with its original file name. Ranks are assigned on the assumption that the file names include the rank with sufficient zero padding for proper sorting. Otherwise, the actual rank may differ from the assigned ID.
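
The effect of zero padding can be sketched with plain string sorting. The file names below are hypothetical, and the sketch assumes that IDs follow the lexicographic order of the file names, which may not match exactly how the recipe assigns them.

```python
# Hypothetical report file names, without and with zero padding.
unpadded = ["report_rank2.nsys-rep", "report_rank10.nsys-rep", "report_rank1.nsys-rep"]
padded = ["report_rank02.nsys-rep", "report_rank10.nsys-rep", "report_rank01.nsys-rep"]

# Assumption: the assigned ID is the position in the lexicographically sorted list.
unpadded_ids = {name: idx for idx, name in enumerate(sorted(unpadded))}
padded_ids = {name: idx for idx, name in enumerate(sorted(padded))}

# Without padding, "rank10" sorts before "rank2", so rank 2 ends up last.
print(unpadded_ids)
# With padding, the sorted order matches the numeric rank order.
print(padded_ids)
```

In the unpadded case, the file for rank 10 sorts ahead of the file for rank 2, which is the kind of mismatch between actual rank and assigned ID described above.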

In [None]:
files_df = pd.read_parquet("files.parquet")
display(files_df)