# HOW TO INSTALL CUDA on Google Colaboratory's Jupyter Notebook

Google Colaboratory comes with **NVIDIA CUDA Toolkit 11.x installed by default**.
We can verify this by running:

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
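
If a later cell needs the toolkit version as a value rather than as printed text, the release number can be extracted from this output. The following is a minimal Python sketch, assuming the output keeps the `release X.Y` wording shown above; `parse_nvcc_release` is a hypothetical helper name, not part of any library.

```python
import re
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Return the 'X.Y' release string from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


# Sample lines copied from the cell output above.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.1, V11.1.105
"""
print(parse_nvcc_release(sample))
```

In a notebook cell, `out = !nvcc --version` captures the output lines, which can be joined with `"\n".join(out)` and passed to the helper.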




>In order to make CUDA work with this setup, we need **NVIDIA CUDA Toolkit 9.2 instead**.
  
We first remove the current version from this Colab instance.
We do that with Linux commands, since Colab runs on a Linux-based system. In a notebook cell, we can run shell commands by prefixing them with "!".



In [None]:
!apt-get --purge remove cuda nvidia* libnvidia-*
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove cuda-*
!apt autoremove
!apt-get update
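
The second command in this cell is the least obvious one: it lists the installed packages, keeps the lines containing `cuda-`, takes the second column (the package name), and purges each name in turn. To preview which packages would be selected before purging anything, the filtering step can be sketched in Python. This is only an illustration: `cuda_packages` is a hypothetical helper, and the sample listing is made up rather than captured from a real instance.

```python
def cuda_packages(dpkg_listing: str) -> list:
    """Mimic `grep cuda- | awk '{print $2}'` on the text printed by `dpkg -l`."""
    names = []
    for line in dpkg_listing.splitlines():
        if "cuda-" not in line:
            continue  # grep: keep only lines that mention "cuda-"
        fields = line.split()
        if len(fields) >= 2:
            names.append(fields[1])  # awk: second whitespace-separated column
    return names


# Hypothetical listing, for illustration only.
sample = """\
ii  cuda-compiler-11-1  11.1.1-1   amd64  CUDA compiler
ii  cuda-cudart-11-1    11.1.74-1  amd64  CUDA Runtime native Libraries
ii  zlib1g:amd64        1:1.2.11   amd64  compression library
"""
print(cuda_packages(sample))
```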



> Check that it was uninstalled correctly:



In [2]:
!nvcc --version

/bin/bash: nvcc: command not found


"*command not found*" means the CUDA Toolkit is no longer installed.

> Time to download CUDA Toolkit 9.2

Before we can do this, we have to update a few things on our Colab system.

Here we face the first problem **(skip this if you are not interested in the details)**: NVIDIA's repositories use signing keys to verify their packages, but the key that comes preinstalled has been deprecated.

(source: https://stackoverflow.com/questions/72104648/how-can-i-fix-this-dpkg-error-while-installing-cuda-on-google-colab)

 We therefore need to delete this key and install the new one before pulling from their site. Further details are explained at the link below.

 (source: https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/)




> We first need to remove the outdated signing key. Then we find out which OS version and architecture we are running on Colab, using the following commands:



In [5]:
!sudo apt-key del 7fa2af80

OK




> Output: OK, deleted outdated key.



In [None]:
!cat /proc/cpuinfo

In [None]:
!cat /etc/os-release



> We know, in our case, that we are running Ubuntu 18.04 on x86_64 (x64).

Continue by replacing `$OS_VERSION` and `$ARCH` in the next command with the matching values (here `ubuntu1804` and `x86_64`):




`!wget https://developer.download.nvidia.com/compute/cuda/repos/$OS_VERSION/$ARCH/cuda-keyring_1.0-1_all.deb`
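
The substitution can also be done programmatically. Below is a rough Python sketch, assuming the repository directory name is the `ID` and `VERSION_ID` fields of `/etc/os-release` joined with the dot removed (which gives `ubuntu1804` for Ubuntu 18.04). Both function names are hypothetical, and the sample text is an abbreviated stand-in for the real file.

```python
def repo_os_tag(os_release_text: str) -> str:
    """Build the repository directory name, e.g. 'ubuntu1804', from os-release text."""
    fields = {}
    for line in os_release_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip().strip('"')
    return fields["ID"] + fields["VERSION_ID"].replace(".", "")


def keyring_url(os_tag: str, arch: str) -> str:
    """Fill the $OS_VERSION and $ARCH placeholders of the keyring URL."""
    return (
        "https://developer.download.nvidia.com/compute/cuda/repos/"
        f"{os_tag}/{arch}/cuda-keyring_1.0-1_all.deb"
    )


# Abbreviated os-release content for Ubuntu 18.04 (illustrative sample).
sample = 'NAME="Ubuntu"\nVERSION_ID="18.04"\nID=ubuntu\n'
print(keyring_url(repo_os_tag(sample), "x86_64"))
```

On the instance itself, the text would come from `open("/etc/os-release").read()`, and the architecture string could be taken from `platform.machine()` in the standard library.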






In [None]:
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb

In [None]:
!sudo dpkg -i cuda-keyring_1.0-1_all.deb



> The signing key is now updated. We can finally proceed to install NVIDIA CUDA Toolkit 9.2:



In [None]:
!wget https://developer.nvidia.com/compute/cuda/9.2/Prod/local_installers/cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64 -O cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
!apt-get update
!apt-get install cuda-9.2



> We have successfully installed NVIDIA CUDA Toolkit 9.2

After checking the CUDA version, we can proceed to install a package that lets us write code blocks with C syntax and CUDA functions in a Jupyter notebook, like the one Google Colab uses.



In [9]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88
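
To see from Python which `nvcc` executable the shell resolves after the reinstall, the standard library's `shutil.which` can be used. This is a small sketch; `find_tool` is a hypothetical wrapper name, and the path it prints depends on the instance.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# Prints a path when nvcc is installed, or None when it is missing.
print(find_tool("nvcc"))
```

A result of `None` corresponds to the `command not found` message seen earlier.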


In [None]:
!pip3 install git+https://github.com/andreinechaev/nvcc4jupyter.git

In [11]:
%load_ext nvcc_plugin

created output directory at /content/src
Out bin /content/result.out




> We are now ready to start parallel programming with CUDA on this instance of Colab.
Just start the cell with "%%cu".




> Use this example to test the installation.

The output should show the specs of the GPU we have been assigned.



In [12]:
%%cu
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cuda_runtime.h>

__host__ void properties_Device(int deviceID);

int main(int argc, char** argv)
{
    int deviceCount;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0)
    {
        printf("!!NO CUDA DEVICE DETECTED!!\n");
        printf("<hit [ENTER] to exit>");
        getchar();
        return 1;
    }
    else
    {
        printf("CUDA devices found: <%d> \n", deviceCount);
        for (int id = 0; id < deviceCount; id++) 
        {
            properties_Device(id);
        }   
    }
 

    time_t date;
    time(&date);
    printf("************************************************\n");
    printf("Program run date: %s\n", ctime(&date));
    printf("<Hit [ENTER] to exit>");
    getchar();
    return 0;

}
__host__ void properties_Device(int deviceID)
{
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, deviceID);
  
    int cudaCores = 0;
    int SM = deviceProp.multiProcessorCount;
    int major = deviceProp.major;
    int minor = deviceProp.minor;
    const char *archName;
    // Map the major compute capability to the architecture name
    // and the number of CUDA cores per multiprocessor (SM).
    switch (major)
    {
    case 1:
          archName = "TESLA";
          cudaCores = 8;
          break;
    case 2:
          archName = "FERMI";
          if (minor == 0)
              cudaCores = 32;
          else
              cudaCores = 48;
          break;
    case 3:
          archName = "KEPLER";
          cudaCores = 192;
          break;
    case 5:
          archName = "MAXWELL";
          cudaCores = 128;
          break;
    case 6:
          archName = "PASCAL";
          // 6.0 has 64 cores per SM; 6.1 and 6.2 have 128
          if (minor == 0)
              cudaCores = 64;
          else
              cudaCores = 128;
          break;
    case 7:
          // 7.0 and 7.2 are Volta, 7.5 is Turing; all have 64 cores per SM
          cudaCores = 64;
          if (minor == 5)
                archName = "TURING";
          else
                archName = "VOLTA";
          break;
    case 8:
          // 8.0 has 64 FP32 cores per SM; 8.6 and later have 128
          if (minor == 9)
                archName = "ADA LOVELACE";
          else
                archName = "AMPERE";
          if (minor == 0)
              cudaCores = 64;
          else
              cudaCores = 128;
          break;
    default:
          archName = "UNKNOWN";
    }
    int rtV;
    cudaRuntimeGetVersion(&rtV);

    printf("***************************************************\n");
    printf("DEVICE %d: %s\n", deviceID, deviceProp.name);
    printf("***************************************************\n");
    printf("> CUDA Toolkit \t: %d.%d\n", rtV / 1000, (rtV % 1000) / 10);
    printf("> CUDA Architecture \t: %s\n", archName);
    printf("> Compute Capability \t: %d.%d\n", major, minor);
    printf("> No. of MultiProcessors \t: %d\n", SM);
    printf("> No. CUDA cores (%dx%d) \t: %d\n", cudaCores, SM, cudaCores*SM);
    // totalGlobalMem is a size_t, so it needs %zu rather than %u
    printf("> Global Memory (total) \t: %zu MiB\n",
      deviceProp.totalGlobalMem / (1024 * 1024));
    printf("***************************************************\n");
}



CUDA devices found: <1> 
***************************************************
DEVICE 0: Tesla T4
***************************************************
> CUDA Toolkit 	: 9.2
> CUDA Architecture 	: TURING
> Compute Capability 	: 7.5
> No. of MultiProcessors 	: 40
> No. CUDA cores (64x40) 	: 2560
> Global Memory (total) 	: 15109 MiB
***************************************************
************************************************
Program run date: Tue Sep 20 17:14:12 2022

<Hit [ENTER] to exit>
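
The `CUDA Toolkit` line in this output comes from `cudaRuntimeGetVersion`, which reports the version as a single integer encoded as 1000 × major + 10 × minor. The program's `rtV / 1000` and `(rtV % 1000) / 10` expressions undo that encoding. The same arithmetic as a short Python sketch, where `decode_cuda_version` is a name made up for this illustration:

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


# 9020 is the encoded form of version 9.2.
print(decode_cuda_version(9020))  # (9, 2)
```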
