# Optimizing a Deep Neural Network (DNN) training program
---

This lab has two parts. Part 1 covers profiling with NVIDIA Nsight Systems, and Part 2 covers the PyTorch Profiler with the TensorBoard plugin. Both parts walk through the steps for optimizing a deep neural network training program, using a PyTorch MNIST program as the example.

Overall, the lab teaches how to use NVIDIA Nsight Systems and the PyTorch Profiler with TensorBoard to optimize a simple Deep Neural Network (DNN) training program that recognizes handwritten digits. The techniques and strategies discussed in this lab carry over to optimizing other applications that run on NVIDIA GPUs.

In this lab, you will learn how to do the following:
- Run the sample application
- Use NVIDIA Nsight Systems to profile the application
- Use PyTorch Profiler to profile the application and visualize on TensorBoard 
- Interpret the timeline provided by NVIDIA Nsight Systems and understand the application's use of the system resources
- Use the TensorBoard execution summary, step time breakdown, and performance recommendations to understand the application's use of the system resources
- Identify performance problems in the application and apply optimization strategies
- Confirm the performance improvement gained from the optimizations

Below is the agenda for optimizing a simple Deep Neural Network (DNN) training program.

## Table of Contents
1. Part 1 (Profiling With NVIDIA Nsight Systems) [**required for evaluation**]
    1. [Start the NVIDIA Nsight Systems lab](jupyter_notebook/01_introduction.ipynb)
    1. [PyTorch MNIST and Optimization Workflow](jupyter_notebook/02_pytorch_mnist.ipynb)
    1. [Data Transfers between Host and GPU](jupyter_notebook/03_data_transfer.ipynb)
    1. [Tensor Core](jupyter_notebook/04_tensor_core_util.ipynb)
    1. [Summary](jupyter_notebook/05_summary.ipynb)
1. Part 2 (PyTorch Profiler with TensorBoard) [**optional**]
    1. [Start PyTorch Profiler with TensorBoard plugin](jupyter_notebook/tb01_introduction.ipynb)
    1. [PyTorch MNIST Optimization from TensorBoard Visualization](jupyter_notebook/tb02_pytorch_mnist.ipynb)
    1. [Memory Operation](jupyter_notebook/tb03_data_transfer.ipynb)
    1. [Tensor Core](jupyter_notebook/tb04_tensor_core_util.ipynb)
    1. [Summary](jupyter_notebook/tb05_summary.ipynb)

Run the cell below to display information about the CUDA driver and the GPUs available on the server, using the `nvidia-smi` command. To execute the cell, give it focus by clicking on it, then press Ctrl-Enter or the play button in the toolbar above. If all goes well, the output appears below the grey cell.


In [None]:
!nvidia-smi
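If you would rather inspect the same information from Python, the sketch below shows one possible approach. It assumes `nvidia-smi` is on the `PATH` and calls it with its `--query-gpu` and `--format=csv,noheader` options. The helper names `parse_gpu_csv` and `list_gpus` and the choice of queried fields are illustrative and are not part of the lab material.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; this selection is an illustrative assumption.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the expected layout
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def list_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


for gpu in list_gpus():
    print(gpu)
```

On a machine without an NVIDIA driver, `list_gpus()` returns an empty list and nothing is printed, so the cell is safe to run anywhere.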

### Tutorial Duration
The lab material will be presented in a 2-hour session.

### Content Level
Beginner, Intermediate

### Target Audience and Prerequisites
This lab is intended for prospective mentors who want to mentor at AI-based hackathons. Participants are expected to have a background in Python programming.

## Links and Resources


[NVIDIA® Nsight™ Systems](https://docs.nvidia.com/nsight-systems/)


**NOTE**: To view the profiler output, download the latest version of NVIDIA Nsight Systems from [here](https://developer.nvidia.com/nsight-systems).

You can also find resources on the [Open Hackathons technical resources page](https://www.openhackathons.org/s/technical-resources).


--- 

## Licensing 

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).
