# CUDA GPU Memory Operation Summary (by Time)

This notebook provides a summary of GPU memory operations and their execution times.

In [None]:
import pandas as pd
import plotly.offline as pyo

from IPython.display import display, HTML, Markdown

import nsys_display

display(HTML("<style>.container { width:95% !important; }</style>"))
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pyo.init_notebook_mode()

## Overall statistics

The table shows the statistics across all ranks. Note that:
* All time values are in nanoseconds.
* Q1 is the 25th percentile of the data set, approximated by the minimum of the per-rank Q1 values.
* Median is the 50th percentile of the data set, approximated by the median of the per-rank medians.
* Q3 is the 75th percentile of the data set, approximated by the maximum of the per-rank Q3 values.
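
The approximations listed above can be illustrated with a small sketch. The per-rank values below are invented for illustration, and the column names 'Rank', 'Q1', 'Med', and 'Q3' are assumptions about the table layout rather than a confirmed schema of `all_stats.parquet`.

```python
import pandas as pd

# Hypothetical per-rank statistics for a single operation (values in ns).
per_rank = pd.DataFrame({
    "Rank": [0, 1, 2],
    "Q1": [100.0, 120.0, 90.0],
    "Med": [150.0, 170.0, 160.0],
    "Q3": [210.0, 250.0, 230.0],
})

# Cross-rank approximations as described in the notes above:
# min of the per-rank Q1, median of the per-rank medians, max of the per-rank Q3.
approx = {
    "Q1": per_rank["Q1"].min(),
    "Med": per_rank["Med"].median(),
    "Q3": per_rank["Q3"].max(),
}
print(approx)
```

Because these values are derived from per-rank summaries rather than from the raw samples, they bound the interquartile range conservatively and may differ from the exact percentiles of the combined data set.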

In [None]:
ranks_df = pd.read_parquet('all_stats.parquet')
display(ranks_df)

The diagram shows statistical box plots and line graphs for each operation across all ranks.

To toggle a line trace on or off, click its legend entry. When working with large data sets, consider calling .head() on the DataFrame to limit the number of displayed elements, or zoom in on the diagram for better visibility.
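
As a rough sketch of limiting the displayed elements, the snippet below keeps only the two operations with the largest median before plotting. The stand-in DataFrame, its 'Name' index, and the 'Med' column are assumptions made for illustration; the real `ranks_df` is loaded from `all_stats.parquet` and may be laid out differently.

```python
import pandas as pd

# Stand-in for ranks_df with hypothetical operation names and medians (ns).
ranks_df = pd.DataFrame(
    {"Med": [150.0, 900.0, 40.0, 610.0]},
    index=pd.Index(["op_a", "op_b", "op_c", "op_d"], name="Name"),
)

# Sort by median, largest first, and keep only the first two rows.
top_df = ranks_df.sort_values("Med", ascending=False).head(2)
print(top_df)

# In the notebook, top_df could then be passed in place of ranks_df, e.g.:
# nsys_display.display_box(top_df, xaxis_title='Name', yaxis_title='Value (ns)')
```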

In [None]:
nsys_display.display_box(ranks_df, xaxis_title='Name', yaxis_title='Value (ns)')
nsys_display.display_stats_scatter(ranks_df, xaxis_title='Name', yaxis_title='Value (ns)')

## Per-rank statistics

The table and graphs show statistics of the operation selected from the drop-down menu, for each rank. Note that:
* All time values are in nanoseconds.
* 'Q1', 'Med', and 'Q3' are the 25th, 50th, and 75th percentiles of the data set, respectively.
* The 'Time' column is calculated from the sum of the 'Total Time' column. It represents each operation's percentage of the total execution time of the operations listed, not a percentage of the application's wall-clock or CPU execution time.
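
The 'Time' calculation described above can be sketched as follows. The operation names and totals are invented, and the exact column labels are assumptions based on the notes above.

```python
import pandas as pd

# Hypothetical totals for three memory operations on one rank (ns).
stats = pd.DataFrame({
    "Name": ["op_a", "op_b", "op_c"],
    "Total Time": [600.0, 300.0, 100.0],
})

# Each operation's share of the summed 'Total Time', as a percentage.
# The denominator covers only the listed operations, not application wall time.
stats["Time"] = 100.0 * stats["Total Time"] / stats["Total Time"].sum()
print(stats)
```

The percentages always sum to 100 across the listed operations, regardless of how long the application itself ran.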

To toggle a line trace on or off, click its legend entry. When working with large data sets, consider calling .head() on the DataFrame to limit the number of displayed elements, or zoom in on the diagram for better visibility.

In [None]:
per_rank_df = pd.read_parquet('rank_stats.parquet')
per_rank_gpu_df = pd.read_parquet('rank_stats_by_device.parquet')

nsys_display.display_stats_per_operation_device(per_rank_df, per_rank_gpu_df, xaxis_title='Rank', yaxis_title='Value (ns)')

## Files

The table associates each rank number with the original filename. Ranks are assigned assuming that the file names include the rank with sufficient zero padding for proper sorting. Otherwise, the actual rank may differ from the assigned ID.
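
The effect of zero padding can be seen in a short sketch. It assumes that rank IDs follow a plain lexicographic sort of the file names, which is an assumption about the assignment rather than a confirmed detail, and the file names are hypothetical.

```python
# Hypothetical report names for ranks 0, 2, and 10.
unpadded = ["report_0", "report_2", "report_10"]
padded = ["report_00", "report_02", "report_10"]

# Assign IDs by lexicographic order of the file names.
unpadded_ids = {name: i for i, name in enumerate(sorted(unpadded))}
padded_ids = {name: i for i, name in enumerate(sorted(padded))}

# Without padding, "report_10" sorts before "report_2",
# so the assigned order no longer matches the numeric rank order.
print(unpadded_ids)
# With padding, the lexicographic order matches the numeric order.
print(padded_ids)
```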

In [None]:
files_df = pd.read_parquet("files.parquet")
display(files_df)