# CUDA on Colab

This notebook, based on an example from Nvidia, shows how to check the GPU status of your Colab runtime, clone a GitHub repository containing your C++ code, compile it with either g++ for CPU or nvcc for GPU, and run it.

Not yet covered: profiling.

Author: Evelyn Mitchell
Source Repository: https://github.com/evelynmitchell/cuda-on-colab
Date: 2023-12-04

The `nvidia-smi` CLI reports the status of the GPU attached to your runtime. Sample outputs for different GPU types follow.

In [None]:
!nvidia-smi

A100 GPU
```

```

V100 GPU
```
Mon Dec  4 18:42:36 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P0    23W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

T4 GPU
```
Mon Dec  4 18:40:38 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
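The same information can be read programmatically, which is handy when a later cell needs to branch on the GPU type. The following is a minimal sketch, assuming the `--query-gpu` and `--format=csv,noheader` options of `nvidia-smi`; the helper names `parse_gpu_rows` and `query_gpus` and the choice of fields are illustrative, not part of the original notebook.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; this selection is an assumption for illustration.
QUERY_FIELDS = ["name", "memory.total", "driver_version"]


def parse_gpu_rows(csv_text):
    """Turn `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []


print(query_gpus())
```

On a CPU-only runtime this prints an empty list rather than raising an error.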

# C++ for CUDA
Install the C++ build toolchain, which should already be available on Colab.

In [None]:
!apt install -y build-essential

Nvidia's C++ compiler for GPU code is called nvcc. It is already installed on Colab, as is build-essential, which provides g++.

In [None]:
!nvcc --version
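Before building, it can help to confirm that both compilers are on the `PATH`. This is a small sketch using only the Python standard library; the helper name `find_tools` is illustrative.

```python
import shutil


def find_tools(names=("g++", "nvcc")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found'}")
```

If `nvcc` is reported as not found, the runtime is probably not a GPU runtime.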

## Get the code
Clone the repository containing the C++ files to compile.

In [None]:
!git clone https://github.com/evelynmitchell/cuda-on-colab

## Build the code for CPU

In [None]:
!g++ /content/cuda-on-colab/src/simple.cpp -o simple


In [None]:
!chmod +x ./simple
!./simple

## Compile to a CUDA kernel

Adding the `__global__` specifier to a function indicates that it will be compiled as a CUDA kernel and run on the GPU.

This code fails to compile because of an error in how the kernel is called. The error and its fix follow this section.

In [None]:
!nvcc /content/cuda-on-colab/src/simple_cuda.cu -o simple_cuda

In [None]:
!chmod +x ./simple_cuda
!./simple_cuda
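The role of `__global__` can be seen in isolation in a much smaller program. The sketch below is a Python cell that writes a hypothetical source file, `hello_kernel.cu` (not part of the repository), and compiles it if `nvcc` is on the `PATH`. Unlike `simple_cuda.cu`, the embedded source already includes a launch configuration, so it should compile. The file name, the `hello` kernel, and the `write_and_build` helper are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical CUDA source, not taken from the repository.
KERNEL_SOURCE = r'''
#include <cstdio>

// __global__ marks a function that runs on the GPU and is launched from host code.
__global__ void hello() {
    printf("hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();        // launch 1 block of 4 threads
    cudaDeviceSynchronize();  // wait for the kernel to finish before exiting
    return 0;
}
'''


def write_and_build(source_path="hello_kernel.cu", binary_path="hello_kernel"):
    """Write the sketch to disk and compile it when nvcc is on PATH.

    Returns the compiler's CompletedProcess, or None when nvcc is missing.
    """
    Path(source_path).write_text(KERNEL_SOURCE)
    if shutil.which("nvcc") is None:
        return None
    return subprocess.run(
        ["nvcc", source_path, "-o", binary_path],
        capture_output=True,
        text=True,
    )


result = write_and_build()
if result is None:
    print("nvcc not found; source written only")
else:
    print(f"nvcc exit code: {result.returncode}")
```

After a successful build on a GPU runtime, the binary can be run with `!./hello_kernel` in the same way as the other examples.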

## Configure kernel launch

The error from the previous compilation, "`__global__` function call must be configured", is corrected by adding the kernel launch parameters `<<<gridsize,blocksize>>>` to the kernel call.
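How the two launch parameters relate to the amount of data can be checked with plain arithmetic. The sketch below simulates, in Python, a one-dimensional launch using the common CUDA index formula `blockIdx.x * blockDim.x + threadIdx.x` with a bounds guard. It assumes one thread per element; the kernel in the repository may be organised differently, and the function names are illustrative.

```python
def blocks_needed(n, block_size):
    """Smallest number of blocks whose threads cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def covered_indices(n, block_size):
    """Simulate index = blockIdx.x * blockDim.x + threadIdx.x with an `index < n` guard."""
    grid_size = blocks_needed(n, block_size)
    indices = []
    for block_idx in range(grid_size):
        for thread_idx in range(block_size):
            index = block_idx * block_size + thread_idx
            if index < n:  # threads past the end of the data do nothing
                indices.append(index)
    return indices


n, block_size = 1000, 256
print(blocks_needed(n, block_size))                        # 4
print(covered_indices(n, block_size) == list(range(n)))    # True
```

With 1000 elements and 256 threads per block, four blocks supply 1024 threads, and the guard leaves the last 24 idle.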

In [None]:
!nvcc /content/cuda-on-colab/src/simple_cuda_kernel_launch.cu -o simple_cuda_kernel_launch

In [None]:
!chmod +x ./simple_cuda_kernel_launch
!./simple_cuda_kernel_launch

In [None]:
!nvprof ./simple_cuda_kernel_launch

In [None]:
!nvcc /content/cuda-on-colab/src/simple_cuda_memory_alloc.cu -o simple_cuda_memory_alloc


In [None]:
!./simple_cuda_memory_alloc
!nvprof ./simple_cuda_memory_alloc