<img src="Images/nvidia_header.png" style="margin-left: -30px; width: 300px; float: left;">

## Exercise: Profile Your Code with Nsight Systems

In this exercise, you will learn how to profile your code with Nsight Systems.
Nsight Systems is a system-wide performance analysis tool designed to visualize CPU and GPU activity.

<div style="background-color: #e7f3fe; border-left: 6px solid #2196F3; padding: 10px; margin: 10px 0;">
  <strong>NOTE:</strong> Some systems cannot display the Nsight Streamer because of firewalls or bandwidth limitations, since it relies on the WebRTC protocol. If this applies to you, you can still review the solution and images provided to see the profiling results. To free up bandwidth, try closing other applications and browser tabs on your computer or on your network.
</div>

To run Nsight Systems, you can use the command-line interface provided by `nsys`:

In [None]:
!nvcc --extended-lambda -o /tmp/a.out Solutions/compute-io-overlap.cu # build executable
!nsys profile --force-overwrite true -o ../nsight-reports/compute-io-overlap /tmp/a.out # run and profile executable

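The commands above build the executable and store the profiling report under the name `compute-io-overlap` in the `nsight-reports` directory. `nsys` appends the report extension itself (`.nsys-rep` in recent versions).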
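
The rows you will look for in the timeline come from trace categories that `nsys profile` records. As a hedged sketch, the same run can be written with those categories spelled out through the `--trace` option: `cuda` covers CUDA API calls and GPU activity, and `osrt` covers OS runtime library calls. The variable names, the `profile_with_trace` helper, and the dry-run fallback are illustrative assumptions, not part of the course material.

```shell
# Illustrative defaults; they mirror the paths used in the profiling cell.
BINARY="${BINARY:-/tmp/a.out}"
OUT="${OUT:-../nsight-reports/compute-io-overlap}"
TRACE="cuda,osrt"

profile_with_trace() {
  # $1: trace categories, $2: report path (no extension), $3: executable
  if command -v nsys >/dev/null 2>&1 && [ -x "$3" ]; then
    # Run and profile the executable with explicit trace categories.
    nsys profile --trace="$1" --force-overwrite true -o "$2" "$3"
  else
    # nsys or the executable is missing: only show the command.
    echo "dry run: nsys profile --trace=$1 --force-overwrite true -o $2 $3"
  fi
}

profile_with_trace "$TRACE" "$OUT" "$BINARY"
```

From a notebook, the block can be pasted into a `%%bash` cell. When `nsys` or the executable is missing, it only prints the command it would have run.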

To view the report in the graphical user interface (GUI), launch the Nsight Streamer.
To do that, navigate to the "NVIDIA Nsight" tab and select "Display Nsight UI via Nsight Streamer". Click "Ok" in the dialog box. The Nsight Streamer will open in a new tab within JupyterLab.

<img src="Images/connect.png" alt="Connect to Nsight Tool UI">

You'll see a blank screen for a short time while the streamer spins up. When the login dialog appears, enter `nvidia` as both the username and the password.

After the Nsight Streamer connects, the GUI will open. Open the report file generated earlier: choose **File->Open** from the menu, then navigate to **nsight-reports** and select the `compute-io-overlap` report.

<img src="Images/open.png" alt="Open the report">

Navigate the opened report to see the timeline of your application:

<img src="Images/navigate.png" alt="Navigate the report">

The following short video also demonstrates this process.

In [1]:
%%html
<video controls src="https://d36m44n9vdbmda.cloudfront.net/assets/s-ac-04-v2/videos/020203_nsight_setup.mp4" 
width=800>Your browser does not support the video element.</video>

For the exercise, your task is to navigate the report and identify:
- when GPU compute is launched
- when the CPU writes data to disk
- when the CPU waits for the GPU
- when data is transferred between the CPU and the GPU
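
If the Streamer is unavailable, or to cross-check what you find in the timeline, the same report can also be summarized as text with `nsys stats`. The sketch below assumes the report was written to `../nsight-reports/compute-io-overlap.nsys-rep` (the extension used by recent `nsys` versions). The `summarize_report` helper and the `REPORT` variable are hypothetical names introduced for illustration.

```shell
# Assumed report location; adjust it if you changed the -o argument.
REPORT="${REPORT:-../nsight-reports/compute-io-overlap.nsys-rep}"

summarize_report() {
  # $1: path to an Nsight Systems report file
  if [ ! -f "$1" ]; then
    echo "report not found: $1" >&2
    return 1
  fi
  if ! command -v nsys >/dev/null 2>&1; then
    echo "nsys is not on PATH" >&2
    return 2
  fi
  # With no extra options, nsys stats prints its default summary tables.
  nsys stats "$1"
}

summarize_report "$REPORT" || echo "no summary produced (exit code $?)"
```

The tables printed depend on the `nsys` version. Typically they include summaries of CUDA API calls, GPU kernels, memory operations, and OS runtime calls, which map onto the four items listed above.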

If you’re unsure how to proceed, expand the Hints section below for guidance. Use the hints only after giving the problem a genuine attempt.

<details>
  <summary>Hints</summary>
  
  - Try unfolding the "CUDA HW" section to see more detail on what is happening on the GPU
  - Memory transfers between the CPU and the GPU appear under the "CUDA HW / Memory" section
  - IO-related activity on the CPU shows up as `writev` and `fclose` calls
</details>

Open the Solution section below only after you’ve made a serious attempt at solving the problem. Once you’ve completed your solution, compare it with the
reference provided here to evaluate your approach and identify any potential improvements.

<details>
  <summary>Solution</summary>

  The computation is launched from the CPU side:
  ![Compute](Images/compute.png "Compute")

  Data transfers between the CPU and the GPU can be found in the "CUDA HW / Memory" section:

  ![Copy](Images/copy.png "Copy")

  CPU writes to disk can be found in the "OS runtime libraries" section:
  ![Write](Images/write.png "Write")

</details>

---
Congratulations! Proceed to the [next exercise](02.02.04-Exercise-NVTX.ipynb).
