# CUDA Synchronous Memset

This notebook identifies synchronous memset operations on pinned host memory or Unified Memory regions, which block the host until all previously issued CUDA calls complete.

Suggestions:
1. Avoid excessive use of synchronization.
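2. Use CUDA event calls, such as cudaStreamWaitEvent() and cudaEventSynchronize(), to limit host synchronization. cudaStreamWaitEvent() orders work on the device without blocking the host, while cudaEventSynchronize() blocks the host only until the given event completes.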

In [None]:
import pandas as pd
import plotly.offline as pyo

from IPython.display import display, HTML, Markdown

import nsys_display

display(HTML("<style>.container { width:95% !important; }</style>"))
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pyo.init_notebook_mode()

The table shows analysis results for each individual rank selected from the drop-down menu.

All time values are in nanoseconds.
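
Because time values are reported in nanoseconds, converting them to milliseconds can make the table easier to scan. The sketch below uses a small made-up DataFrame: the column name `Duration` and the sample values are assumptions for illustration and are not taken from `analysis.parquet`.

```python
import pandas as pd

# Hypothetical sample rows; the real column names in analysis.parquet may differ.
sample_df = pd.DataFrame({
    "Duration": [1_500_000, 250_000],  # nanoseconds (made-up values)
})

# 1 millisecond = 1,000,000 nanoseconds.
sample_df["Duration (ms)"] = sample_df["Duration"] / 1e6

durations_ms = sample_df["Duration (ms)"].tolist()
print(durations_ms)
```

The same division can be applied to any nanosecond column of the loaded table once its actual column names are known.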

In [None]:
df = pd.read_parquet('analysis.parquet')
nsys_display.display_table_per_rank(df)

## Files

The table associates each rank number with the original filename. Ranks are assigned assuming that the file names include the rank with sufficient zero padding for proper sorting. Otherwise, the actual rank may differ from the assigned ID.
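
The effect of zero padding on rank assignment can be illustrated with a minimal sketch. It assumes that ranks are assigned by sorting the file names lexically and numbering them from 0. The helper `assign_ranks` and the file names are hypothetical and are not part of `nsys_display` or the recipe itself.

```python
def assign_ranks(filenames):
    """Map each file name to a rank ID based on lexical sort order (assumed behavior)."""
    return {name: rank for rank, name in enumerate(sorted(filenames))}

# Hypothetical file names with zero-padded rank numbers.
padded = ["report_rank02.nsys-rep", "report_rank10.nsys-rep", "report_rank01.nsys-rep"]
# The same ranks without zero padding.
unpadded = ["report_rank2.nsys-rep", "report_rank10.nsys-rep", "report_rank1.nsys-rep"]

padded_ranks = assign_ranks(padded)
unpadded_ranks = assign_ranks(unpadded)

# With padding, the sort order follows the numeric rank order.
print(padded_ranks)
# Without padding, "rank10" sorts before "rank2", so the assigned IDs
# no longer follow the numeric rank order.
print(unpadded_ranks)
```

In the unpadded case the file for rank 10 receives a smaller ID than the file for rank 2, which is the mismatch the table above helps to detect.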

In [None]:
files_df = pd.read_parquet("files.parquet")
display(files_df)