<div align="center"><h1>Accelerating Applications with CUDA C/C++ (Pre-lab)</h1></div>
<div align="right">Test Jupyter and CUDA</div>

![CUDA](./images/CUDA_Logo.jpg)

---
## Accelerated Systems

*Accelerated systems*, also referred to as *heterogeneous systems*, are those composed of both CPUs and GPUs. Accelerated systems run CPU programs that, in turn, launch functions that benefit from the massive parallelism provided by GPUs. This lab environment is an accelerated system that includes an NVIDIA GPU. Information about this GPU can be queried with the `nvidia-smi` (*System Management Interface*) command-line tool. Run the `nvidia-smi` command now by pressing `CTRL` + `ENTER` in the code execution cell below. You will find these cells throughout this lab any time you need to execute code. The output of the command is printed just below the cell after it runs. After running the cell immediately below, find and note the name of the GPU in the output.



#### NOTE:
* On Linux, the `nvidia-smi` command is typically located in `/usr/bin`.
* On Windows 10, it is stored by default in the following location:
  `C:\Windows\System32\DriverStore\FileRepository\nvdm*\nvidia-smi.exe`
    

In [None]:
!nvidia-smi
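
The full `nvidia-smi` table can be hard to scan when all you need is the GPU name. A minimal sketch, assuming a POSIX shell (a terminal, or a notebook cell that starts with `%%bash`): the `-L` option prints one line per GPU. The function name `list_gpus` is a hypothetical helper for this example, not part of any NVIDIA tool.

```shell
# List each GPU on one line (index, product name, UUID) instead of the full table.
# list_gpus is a hypothetical helper name used only for this sketch.
list_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L
    else
        # Guard so the snippet does not fail on a machine without the NVIDIA driver.
        echo "nvidia-smi not found on the PATH"
    fi
}

list_gpus
```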

---
### Exercise: Test a Hello GPU Kernel

In the `nvcc` command at the end of this section, change `sm_61` to the value that matches your GPU. The lists below map GPU models to `-arch` values.


Maxwell cards (CUDA 6 until CUDA 11)

    SM50 or SM_50, compute_50 –
    Tesla/Quadro M series.
    Deprecated from CUDA 11, will be dropped in future versions.
    SM52 or SM_52, compute_52 –
    Quadro M6000 , GeForce 900, GTX-970, GTX-980, GTX Titan X.
    SM53 or SM_53, compute_53 –
    Tegra (Jetson) TX1 / Tegra X1, Drive CX, Drive PX, Jetson Nano.

Pascal (CUDA 8 and later)

    SM60 or SM_60, compute_60 –
    Quadro GP100, Tesla P100, DGX-1 (Generic Pascal)
    SM61 or SM_61, compute_61–
    GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
    SM62 or SM_62, compute_62 – 
    Integrated GPU on the NVIDIA Drive PX2, Tegra (Jetson) TX2 
    
Volta (CUDA 9 and later)

    SM70 or SM_70, compute_70 –
    DGX-1 with Volta, Tesla V100, Titan V, Quadro GV100
    SM72 or SM_72, compute_72 –
    Jetson AGX Xavier, Drive AGX Pegasus, Xavier NX    

Turing (CUDA 10 and later)

    SM75 or SM_75, compute_75 –
    GTX/RTX Turing – GTX 1660 Ti, RTX 2060, RTX 2070, RTX 2080, Titan RTX, Quadro RTX 4000, Quadro RTX 5000, Quadro RTX 6000, Quadro RTX 8000, Quadro T1000/T2000, Tesla T4 
    
Ampere (CUDA 11 and later)

    SM80 or SM_80, compute_80 –
    NVIDIA A100 (GA100), NVIDIA DGX-A100
    
    SM86 or SM_86, compute_86 –
    Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, NVIDIA A10, NVIDIA A16, A2 Tensor Core GPU
    
    SM87 or SM_87, compute_87 –
    Jetson AGX Orin and Drive AGX Orin only
    
Ada Lovelace  (CUDA 11.8 and later)

    SM89 or SM_89, compute_89 –
    NVIDIA GeForce RTX 4090, RTX 4080, RTX 6000 Ada Generation, NVIDIA L40

Hopper (CUDA 12 and later)

    SM90 or SM_90, compute_90 –
    NVIDIA H100 (GH100)
    
    SM90a or SM_90a, compute_90a – (for PTX ISA version 8.0) – adds architecture-specific features such as wgmma and setmaxnreg. NVIDIA CUTLASS requires this target for its Hopper-specific kernels.
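
Instead of looking the GPU up in the lists above, the `-arch` value can be derived from the compute capability: drop the dot and prefix `sm_`, so 6.1 becomes `sm_61`. A minimal sketch, assuming a POSIX shell and a driver recent enough for `nvidia-smi` to support the `compute_cap` query field (older drivers do not report it). The function name `cap_to_arch` is a hypothetical helper for this example.

```shell
# Turn a compute capability such as "6.1" into the matching nvcc value "sm_61".
# cap_to_arch is a hypothetical helper name, not part of any NVIDIA tool.
cap_to_arch() {
    # Delete the dot (and any stray spaces), then prefix with "sm_".
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '. ')"
}

cap_to_arch 6.1    # prints sm_61

# If nvidia-smi is installed, ask it for the compute capability of the first GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    case "$cap" in
        [0-9]*.[0-9]*) cap_to_arch "$cap" ;;
        *) echo "compute_cap not reported by this driver; use the lists above" ;;
    esac
fi
```

The printed value is what goes after `-arch=` in the `nvcc` command.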

In [None]:
!nvcc -arch=sm_61 -o hello-gpu 01-hello-gpu-solution.cu -run

## Running the NVIDIA profiler

The traditional profiler is `nvprof`. It is being phased out and is no longer supplied on some newer CUDA platforms; on those platforms, use `nsys` (Nsight Systems) instead.

If `nvprof` is not available, replace the command in the cell below with:

    !nsys profile -o foo --stats=true ./hello-gpu

Here `-o foo` names the report file and `--stats=true` prints summary statistics after the run.

In [None]:
!nvprof ./hello-gpu
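
Which profiler is present depends on the installed CUDA version, so the choice between `nvprof` and `nsys` can be made by checking what is on the `PATH`. A minimal sketch, assuming a POSIX shell (a terminal, or a notebook cell that starts with `%%bash`) and that `./hello-gpu` was built by the earlier `nvcc` cell. The function name `pick_profiler` is a hypothetical helper for this example.

```shell
# Report which profiler is on the PATH, preferring nsys over the older nvprof.
# pick_profiler is a hypothetical helper name used only for this sketch.
pick_profiler() {
    if command -v nsys >/dev/null 2>&1; then
        echo nsys
    elif command -v nvprof >/dev/null 2>&1; then
        echo nvprof
    else
        echo none
    fi
}

# Profile ./hello-gpu with whichever tool was found.
if [ ! -x ./hello-gpu ]; then
    echo "./hello-gpu not found; build it with nvcc first."
else
    case "$(pick_profiler)" in
        nsys)   nsys profile -o foo --stats=true ./hello-gpu ;;
        nvprof) nvprof ./hello-gpu ;;
        *)      echo "Neither nsys nor nvprof was found on the PATH." ;;
    esac
fi
```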

---
## Setting Up the NVIDIA Visual Profiler or Nsight Compute

1) Open a terminal session and type `nvvp` to start the NVIDIA Visual Profiler. It is a graphical application (built on Eclipse), so it needs to run on your graphical desktop rather than in a browser. (There is a way to run it in a browser, but we won't cover it here.)

When prompted, choose a workspace.

2) Start Nsight Compute.