<a href="https://colab.research.google.com/github/jianywu/automotive_project/blob/main/cuda/src/CUDA_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to run CUDA in a Colab notebook

This notebook is based on https://github.com/gittimos/cuda-colab.

With Colab's GPU runtimes you can develop and test high-performance CUDA applications directly in the browser, without a GPU on your own machine.

This is a companion to my [blog post](https://moritzstrube.substack.com/publish/post/142751435), which contains a more detailed explanation and many valuable links.

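Run `nvidia-smi` to check that the notebook is connected to a GPU. If the command is not found or lists no devices, switch to a GPU runtime via Runtime > Change runtime type.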

In [1]:
!nvidia-smi

Mon Jun 17 10:21:13 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

### Installing nvcc4jupyter

First, install the nvcc4jupyter plugin in your Colab notebook by running:

In [2]:
!pip install nvcc4jupyter

Collecting nvcc4jupyter
  Downloading nvcc4jupyter-1.2.1-py3-none-any.whl (10 kB)
Installing collected packages: nvcc4jupyter
Successfully installed nvcc4jupyter-1.2.1


### Loading the Extension
After installation, load the extension by running the following cell:

In [3]:
%load_ext nvcc4jupyter

Detected platform "Colab". Running its setup...
Source files will be saved in "/tmp/tmp7phfyogt".
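With the extension loaded, you can also cross-check the `nvidia-smi` output from inside a CUDA program. The sketch below is not part of the original notebook; paste it into a cell whose first line is `%%cuda` (the magic used in the cells below) and run it. It relies only on the CUDA runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`; on the Tesla T4 reported above it should list one device with compute capability 7.5.

```cuda
#include <cstdio>

int main() {
    int count = 0;
    // Ask the CUDA runtime how many GPUs this process can see.
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // totalGlobalMem is in bytes; convert to MiB to compare with nvidia-smi.
        printf("Device %d: %s, compute capability %d.%d, %zu MiB global memory, %d SMs\n",
               dev, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024 * 1024), prop.multiProcessorCount);
    }
    return 0;
}
```

The memory figure reported here can be somewhat lower than the total shown by `nvidia-smi`, because part of the memory is reserved by the driver.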


### Writing CUDA Code

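Now you're ready to write CUDA code. Put the `%%cuda` cell magic on the first line of a cell to mark the rest of the cell as CUDA C++ source; when the cell runs, nvcc4jupyter compiles it with `nvcc`, executes the program, and shows its output below the cell: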

In [4]:
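%%cuda
#include <stdio.h>

// Kernel: every thread prints the index of its block and its index within that block.
__global__ void hello(){
    printf("Hello from block: %u, thread: %u\n", blockIdx.x, threadIdx.x);
}

int main(){
    // Launch a grid of 2 blocks with 2 threads per block (4 threads in total).
    hello<<<2, 2>>>();
    // Kernel launches are asynchronous: wait for the GPU to finish,
    // which also flushes the device-side printf buffer to stdout.
    cudaDeviceSynchronize();
    return 0;
}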

Hello from block: 1, thread: 0
Hello from block: 1, thread: 1
Hello from block: 0, thread: 0
Hello from block: 0, thread: 1

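The lines above may come out in a different order from run to run (here block 1 printed before block 0), because CUDA does not guarantee the order in which blocks execute. Most kernels combine the two indices into one grid-wide index, `blockIdx.x * blockDim.x + threadIdx.x`. A kernel launch also returns no status of its own, so errors are usually checked with `cudaGetLastError()` right after the launch and via the return value of `cudaDeviceSynchronize()`. The sketch below is a variation on the cell above, not part of the original notebook; the kernel name `hello_global` and the `CHECK` macro are hypothetical helpers introduced for illustration. Run it in a `%%cuda` cell.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

__global__ void hello_global() {
    // Combine block and thread indices into one index that is unique
    // across the whole grid: 0 .. gridDim.x * blockDim.x - 1.
    unsigned int global = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %u, thread %u -> global index %u\n",
           blockIdx.x, threadIdx.x, global);
}

int main() {
    hello_global<<<2, 2>>>();
    CHECK(cudaGetLastError());       // errors in the launch itself (e.g. bad configuration)
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran; also flushes printf
    return 0;
}
```

If you change the launch to `<<<1, 1025>>>`, which exceeds the 1024-threads-per-block limit, the kernel never runs and `cudaGetLastError()` reports an invalid configuration argument instead of the program silently printing nothing.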


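### Compiling with cuRAND

CUDA libraries work with the same magic. The cell below uses the cuRAND device API from `curand_kernel.h`: `setup_kernel` gives each of 256 threads its own generator state, and `generate_random_numbers` draws one uniformly distributed float in (0, 1] per thread, which the host then copies back and prints. As the output shows, no extra compiler or linker flags were needed for the device API here.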

In [5]:
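%%cuda
#include <stdio.h>
#include <stdlib.h>
#include <curand_kernel.h>

// Initialize one cuRAND state per thread. All threads share the seed (1234)
// but use their global index as the subsequence number, so their streams
// do not overlap.
__global__ void setup_kernel(curandState *state)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    curand_init(1234, idx, 0, &state[idx]);
}

// Draw one uniformly distributed float in (0, 1] per thread.
__global__ void generate_random_numbers(curandState *state, float *results)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    // Work on a register copy of the state, then store it back so that a later
    // launch would continue the sequence instead of repeating it.
    curandState localState = state[idx];
    results[idx] = curand_uniform(&localState);
    state[idx] = localState;
}

int main()
{
    const int n = 256;
    float *devResults;
    curandState *devStates;

    // Device buffers: n results and n generator states.
    cudaMalloc((void **)&devResults, n * sizeof(float));
    cudaMalloc((void **)&devStates, n * sizeof(curandState));

    // One block of n threads: one state and one random number per thread.
    setup_kernel<<<1, n>>>(devStates);
    generate_random_numbers<<<1, n>>>(devStates, devResults);

    // cudaMemcpy waits for the preceding kernels on the default stream to finish.
    float *hostResults = (float *)malloc(n * sizeof(float));
    cudaMemcpy(hostResults, devResults, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; i++) {
        printf("%1.4f ", hostResults[i]);
    }

    printf("\n");

    // Release host and device memory.
    free(hostResults);
    cudaFree(devResults);
    cudaFree(devStates);

    return 0;
}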

0.1455 0.8202 0.5504 0.2948 0.9147 0.8690 0.3219 0.7829 0.0113 0.2855 0.7816 0.2338 0.6791 0.2824 0.6299 0.1212 0.4333 0.3831 0.5136 0.2987 0.4166 0.0345 0.0494 0.0467 0.6166 0.6480 0.8685 0.4012 0.0631 0.4972 0.6809 0.9350 0.0704 0.0458 0.1324 0.3785 0.6457 0.9930 0.9952 0.7677 0.3217 0.8210 0.2765 0.2691 0.4579 0.1969 0.9555 0.8739 0.7996 0.3810 0.6662 0.3153 0.9428 0.5006 0.3369 0.1490 0.8637 0.6191 0.6820 0.4573 0.9261 0.5650 0.7117 0.8252 0.8755 0.2216 0.2958 0.4046 0.3896 0.7335 0.7301 0.8154 0.0913 0.0866 0.6974 0.1811 0.5834 0.9255 0.9029 0.0413 0.9522 0.5507 0.7237 0.3976 0.7519 0.4398 0.4638 0.6094 0.7358 0.3272 0.6961 0.4893 0.9698 0.0456 0.2025 0.9491 0.1516 0.0424 0.6149 0.5638 0.6939 0.3146 0.3789 0.4445 0.2434 0.7767 0.3410 0.4800 0.9155 0.7785 0.3328 0.9594 0.7204 0.0598 0.0413 0.0768 0.7260 0.3966 0.6304 0.9518 0.6376 0.0148 0.9479 0.3318 0.0571 0.0555 0.9037 0.5324 0.1275 0.0500 0.7154 0.1011 0.3220 0.3423 0.8223 0.8637 0.0300 0.5660 0.9525 0.3251 0.0420 0.6712 0.5937
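A common next step is to draw many numbers per thread inside a single kernel. The sketch below is an illustration, not part of the original notebook: it estimates pi by Monte Carlo, with each thread initializing its own `curandState` in the same way as `setup_kernel` above, drawing `samples_per_thread` points in the unit square with `curand_uniform`, and counting how many land inside the quarter circle; the per-thread counts are summed with `atomicAdd`. The kernel name `estimate_pi`, the seed, and the grid and sample sizes are arbitrary choices, and error checking is omitted for brevity, as in the cell above. Run it in a `%%cuda` cell.

```cuda
#include <cstdio>
#include <curand_kernel.h>

// Each thread initializes its own generator (shared seed, distinct subsequence),
// draws samples_per_thread points in (0,1] x (0,1], and counts the hits inside
// the quarter circle x^2 + y^2 <= 1.
__global__ void estimate_pi(unsigned long long seed, int samples_per_thread,
                            unsigned long long *hits)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    curandState state;
    curand_init(seed, idx, 0, &state);

    unsigned long long local_hits = 0;
    for (int i = 0; i < samples_per_thread; i++) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) local_hits++;
    }
    // One atomic update per thread keeps contention on the counter low.
    atomicAdd(hits, local_hits);
}

int main()
{
    const int blocks = 64, threads = 256, samples_per_thread = 1000;
    unsigned long long *dev_hits;
    unsigned long long host_hits = 0;

    cudaMalloc((void **)&dev_hits, sizeof(unsigned long long));
    cudaMemset(dev_hits, 0, sizeof(unsigned long long));

    estimate_pi<<<blocks, threads>>>(1234ULL, samples_per_thread, dev_hits);
    // Blocks until the kernel has finished, then copies the total hit count.
    cudaMemcpy(&host_hits, dev_hits, sizeof(unsigned long long), cudaMemcpyDeviceToHost);

    // The quarter circle covers pi/4 of the unit square.
    double total = (double)blocks * threads * samples_per_thread;
    printf("pi ~= %f\n", 4.0 * host_hits / total);

    cudaFree(dev_hits);
    return 0;
}
```

Keeping the generator state in a local variable for the whole loop, rather than reading and writing global memory on every draw, follows the same pattern as `localState` in `generate_random_numbers`.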