# Creating a Specific Version of Python and PyTorch Environment on Colab

This document guides free-tier users through creating an execution environment with specific versions of Python, PyTorch, and TensorFlow on Google Colab.

**For those using this notebook as a template to create your own version, please carefully read the [README.md](https://github.com/liuyuweitarek/downgrade-upgrade-colab-python/blob/main/README.md) to ensure that your Python version, PyTorch version (or TensorFlow version), and cuDNN version are correct.**

As a proof of concept, I use Miniconda version 3.8 to create a Python 3.7 environment that supports both PyTorch 1.7.1 and TensorFlow 2.1.0.

## Connect to Google Drive


In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Change to the working directory

In [None]:
import os
os.chdir("/content/drive/MyDrive/custom_env_colab")

In [None]:
PYTHON_VERSION = "3.7"
%env PYTHONPATH = # /env/python

## Build virtual conda env

In [None]:
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!sudo chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
!conda update --yes conda

In [None]:
import sys
sys.path.append(f'/usr/local/lib/python{PYTHON_VERSION}/site-packages')

In [None]:
!conda create -n myenv python={PYTHON_VERSION} --yes

## For PyTorch

In [None]:
%%shell
eval "$(conda shell.bash hook)"
conda activate myenv
python -m pip install -r requirements.txt --default-timeout=3000
python -m pip install torch==1.7.1+cu101 --extra-index-url https://download.pytorch.org/whl --no-cache-dir

In [None]:
%%shell
eval "$(conda shell.bash hook)"
conda activate myenv
python test_pytorch_gpu.py
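
The contents of `test_pytorch_gpu.py` are not shown in this notebook. A check of this kind could look like the following minimal sketch. The function name `pytorch_gpu_report` is hypothetical, and the sketch relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def pytorch_gpu_report():
    """Return a one-line summary of what PyTorch can see (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # The active environment has no torch package at all.
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: no CUDA device visible"
    return f"PyTorch {torch.__version__}: using {torch.cuda.get_device_name(0)}"


print(pytorch_gpu_report())
```

Run it with the `myenv` interpreter, as in the cell above, so that it reports on the conda environment rather than on Colab's default Python.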

PyTorch Done!

## For TensorFlow

In [None]:
%%shell
eval "$(conda shell.bash hook)"
conda activate myenv
python test_tensorflow_gpu.py

### Fix Missing Files

It seems that TensorFlow can't access the GPU because some files are missing. Let's fix it.

These files are missing:

`libcudart.so.10.1`, `libcublas.so.10`, `libcufft.so.10`, `libcusolver.so.10` and `libcusparse.so.10`. These are the libraries that really matter for GPU usage.

> `libnvinfer.so.6` and `libnvinfer_plugin.so.6` are for TensorRT; if you don't use that package, you can ignore them.

Let's unpack `cudnn-10.1-linux-x64-v7.6.5.32.tgz` and then copy Google Colab's CUDA libraries under the missing file names.



#### Unpack the cuDNN package

In [None]:
!tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
!cp -P cuda/include/cudnn.h /usr/lib64-nvidia
!cp -P cuda/lib64/libcudnn* /usr/lib64-nvidia
!chmod a+r /usr/lib64-nvidia/libcudnn*

#### Copy the Google Colab files to the missing file names

For details, see [**How to Find Out Which Missing File to Replace with Which CUDA File**](#scrollTo=BsLqEu_g7hYF&line=1&uniqifier=1).

Google Colab updates its environment over time, so the code below won't always work as written. **YOU SHOULD KNOW HOW TO FIX IT YOURSELF**.

In [None]:
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140 /usr/lib64-nvidia/libcudart.so.10.1
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so.12.2.5.6 /usr/lib64-nvidia/libcublas.so.10
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufft.so.11.0.8.103 /usr/lib64-nvidia/libcufft.so.10
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusolver.so.11.5.2.141 /usr/lib64-nvidia/libcusolver.so.10
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusparse.so.12.1.2.141 /usr/lib64-nvidia/libcusparse.so.10

## Test TensorFlow Again

In [None]:
%%shell
eval "$(conda shell.bash hook)"
conda activate myenv
python test_tensorflow_gpu.py

## **IMPORTANT: How to Find Out Which Missing File to Replace with Which CUDA File**

### STEP 1: Find the Missing Files in the Error Log

```text
2024-08-25 18:14:43.631701: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:43.631808: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:43.631841: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-08-25 18:14:44.443595: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2024-08-25 18:14:44.461980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-08-25 18:14:44.462181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2024-08-25 18:14:44.462245: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2024-08-25 18:14:44.462357: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:44.462461: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:44.463143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2024-08-25 18:14:44.463312: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:44.463431: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:44.468145: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
```

These files are missing:

`libcublas.so.10`, `libcufft.so.10`, `libcusolver.so.10` and `libcusparse.so.10`. These are the libraries that really matter for GPU usage. Let's find them!

> `libnvinfer.so.6` and `libnvinfer_plugin.so.6` are for TensorRT; if you don't use that package, you can ignore them.
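
The missing names can also be pulled out of the log programmatically instead of by eye. The following is a minimal sketch that assumes the `Could not load dynamic library '...'` wording shown in the log above; `find_missing_libraries` is a hypothetical helper name, and the sample text reuses three lines from that log.

```python
import re

# Matches the dso_loader warning wording shown in the log above.
_MISSING_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def find_missing_libraries(log_text):
    """Return the library names TensorFlow failed to load, in first-seen order."""
    seen = []
    for name in _MISSING_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample_log = """\
2024-08-25 18:14:44.462357: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:44.462461: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-08-25 18:14:44.463143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
"""

print(find_missing_libraries(sample_log))
```

For this sample the function returns `['libcublas.so.10', 'libcufft.so.10']`. The TensorRT libraries would be listed too if their warning lines were included, so filter them out if you don't need them.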


### STEP 2: Check Google Colab's Current Environment

In [None]:
!ls -ll /usr/local

As the output shows, Colab is currently using `CUDA-12.2`:
```text
total 34808
drwxr-xr-x  1 root root     4096 Aug 31 09:34 bin
drwxr-xr-x  2 root root     4096 Aug 31 09:34 cmake
drwxr-xr-x  3 root root     4096 Aug 29 13:41 colab
drwxr-xr-x  2 root root     4096 Aug 31 09:34 compiler_compat
-rwxr-xr-x  1 root root 35457696 Aug 31 09:34 _conda
drwxr-xr-x  2 root root     4096 Aug 31 09:34 condabin
drwxr-xr-x  2 root root     4096 Aug 31 09:34 conda-meta
lrwxrwxrwx  1 root root       22 Nov 10  2023 cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root       25 Nov 10  2023 cuda-12 -> /etc/alternatives/cuda-12
drwxr-xr-x  1 root root     4096 Nov 10  2023 cuda-12.2
drwxr-xr-x  3 root root     4096 Aug 31 09:35 envs
drwxr-xr-x  1 root root     4096 Aug 31 09:34 etc
drwxr-xr-x  2 root root     4096 Oct  4  2023 games
...
```
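
The same check can be scripted rather than read off the listing. This is a minimal sketch that assumes CUDA toolkits are installed as `cuda-X.Y` directories, like the `cuda-12.2` entry above; `installed_cuda_versions` is a hypothetical helper name.

```python
import os
import re


def installed_cuda_versions(root="/usr/local"):
    """List the versions of the cuda-X.Y directories found under root."""
    versions = []
    for name in sorted(os.listdir(root)):
        # Only real cuda-X.Y directories count; the 'cuda' and 'cuda-12'
        # symlinks in the listing above do not match this pattern.
        match = re.fullmatch(r"cuda-(\d+\.\d+)", name)
        if match and os.path.isdir(os.path.join(root, name)):
            versions.append(match.group(1))
    return versions


if os.path.isdir("/usr/local"):
    print(installed_cuda_versions())
```

The returned version string can then be used to build the `/usr/local/cuda-<version>/targets/x86_64-linux/lib/` path searched in the next step.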

### STEP 3: Check Whether the Missing Files Are in the CUDA Folder


In [None]:
# Search folder by folder, or refer to the path below.
# Missing files: libcublas.so.10, libcufft.so.10, libcusolver.so.10, libcusparse.so.10
!ls -ll /usr/local/cuda-12.2/targets/x86_64-linux/lib/ | grep -E 'libcublas.so|libcufft.so|libcusolver.so|libcusparse.so'

# OUTPUT
# lrwxrwxrwx 1 root root        15 Aug 16  2023 libcublas.so -> libcublas.so.12
# lrwxrwxrwx 1 root root        21 Aug 16  2023 libcublas.so.12 -> libcublas.so.12.2.5.6
# -rw-r--r-- 1 root root 106675248 Aug 16  2023 libcublas.so.12.2.5.6
# lrwxrwxrwx 1 root root        14 Aug 16  2023 libcufft.so -> libcufft.so.11
# lrwxrwxrwx 1 root root        22 Aug 16  2023 libcufft.so.11 -> libcufft.so.11.0.8.103
# -rw-r--r-- 1 root root 178387496 Aug 16  2023 libcufft.so.11.0.8.103
# lrwxrwxrwx 1 root root        17 Aug 16  2023 libcusolver.so -> libcusolver.so.11
# lrwxrwxrwx 1 root root        25 Aug 16  2023 libcusolver.so.11 -> libcusolver.so.11.5.2.141
# -rw-r--r-- 1 root root 115505432 Aug 16  2023 libcusolver.so.11.5.2.141
# lrwxrwxrwx 1 root root        17 Aug 16  2023 libcusparse.so -> libcusparse.so.12
# lrwxrwxrwx 1 root root        25 Aug 16  2023 libcusparse.so.12 -> libcusparse.so.12.1.2.141
# -rw-r--r-- 1 root root 263825056 Aug 16  2023 libcusparse.so.12.1.2.141

### STEP 4: Modify the Commands

In [None]:
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so.12.2.5.6 /usr/lib64-nvidia/libcublas.so.10
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufft.so.11.0.8.103 /usr/lib64-nvidia/libcufft.so.10
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusolver.so.11.5.2.141 /usr/lib64-nvidia/libcusolver.so.10
!cp -P /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusparse.so.12.1.2.141 /usr/lib64-nvidia/libcusparse.so.10
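
Because the version suffixes change whenever Colab updates CUDA, the source and destination pairs can be derived from the directory contents instead of typed by hand. The sketch below is one possible way to do that, not the notebook's own method. `plan_copies` is a hypothetical helper, and it assumes each missing name has the form `<stem>.so.<version>` and that the real library file in the CUDA folder starts with the same `<stem>.so.` prefix.

```python
import os
import tempfile


def plan_copies(missing, cuda_lib_dir, target_dir="/usr/lib64-nvidia"):
    """Pair each missing library name with a real file from cuda_lib_dir.

    Returns (source_path, destination_path) tuples. Names without a
    candidate are skipped so they can be handled by hand.
    """
    plan = []
    for name in missing:
        # 'libcublas.so.10' -> 'libcublas.so'
        stem = name.split(".so")[0] + ".so"
        candidates = sorted(
            entry.name
            for entry in os.scandir(cuda_lib_dir)
            # Keep regular files only; the versioned symlinks point at them.
            if entry.name.startswith(stem + ".")
            and entry.is_file(follow_symlinks=False)
        )
        if candidates:
            plan.append((os.path.join(cuda_lib_dir, candidates[-1]),
                         os.path.join(target_dir, name)))
    return plan


# Demo on a throwaway directory that mimics part of the STEP 3 listing.
with tempfile.TemporaryDirectory() as lib_dir:
    for filename in ("libcublas.so.12.2.5.6", "libcufft.so.11.0.8.103"):
        open(os.path.join(lib_dir, filename), "w").close()
    for src, dst in plan_copies(["libcublas.so.10", "libcufft.so.10"], lib_dir):
        print(f"cp -P {src} {dst}")
```

On Colab, pass the real CUDA library folder from STEP 3 as `cuda_lib_dir` and review the printed commands before running them.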

### Try Again

In [None]:
%%shell
eval "$(conda shell.bash hook)"
conda activate myenv
python test_tensorflow_gpu.py