# GPU-Jupyter

This JupyterLab instance is connected to the GPU via the CUDA drivers. In this notebook, we test the installation and perform some basic operations on the GPU.

In [1]:
!jupyter --version

Selected Jupyter core packages...
IPython          : 8.22.2
ipykernel        : 6.29.3
ipywidgets       : 8.1.2
jupyter_client   : 8.6.1
jupyter_core     : 5.7.2
jupyter_server   : 2.13.0
jupyterlab       : 4.1.5
nbclient         : 0.10.0
nbconvert        : 7.16.3
nbformat         : 5.10.4
notebook         : 7.1.2
qtconsole        : not installed
traitlets        : 5.14.2


## Test GPU connection

### The following command should list your GPU model, the NVIDIA driver version, and the supported CUDA version:

In [2]:
!nvidia-smi

Tue Apr  9 08:07:24 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.05              Driver Version: 545.84       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3060 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   38C    P8              10W /  60W |      0MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

### Now, test whether PyTorch can access the GPU via CUDA:

In [3]:
import numpy as np
import torch

print(torch.cuda.is_available())

a = torch.rand(5, 3)
a

True


tensor([[0.3254, 0.9917, 0.7786],
        [0.5244, 0.4340, 0.8261],
        [0.7216, 0.0024, 0.3113],
        [0.8638, 0.6087, 0.6495],
        [0.2394, 0.2831, 0.6866]])
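
The tensor printed above lives on the CPU, so that cell only confirms that CUDA is visible to PyTorch. A minimal sketch of a follow-up check is given below, assuming you also want to see a small computation placed on the GPU; the fallback to the CPU and the variable names are illustrative choices, not part of the original notebook.

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(5, 3, device=device)  # allocate directly on the chosen device
b = a @ a.T                          # 5x5 product computed on that device

print(device, tuple(b.shape), b.device.type)
if device.type == "cuda":
    # Name of the first visible CUDA device
    print(torch.cuda.get_device_name(0))
```

If the printed device type is `cuda`, the matrix product was placed on the GPU rather than on the CPU.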

### Now, test whether TensorFlow can access the GPU via CUDA:

In [4]:
import tensorflow as tf
# NUMA warnings can be safely ignored
print(tf.test.is_built_with_cuda(), tf.config.list_physical_devices('GPU'))

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

2024-04-09 08:07:27.416927: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


True [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7894643083535347790
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3774873600
locality {
  bus_id: 1
  links {
  }
}
incarnation: 9669600989926253149
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6"
xla_global_id: 416903419
]


2024-04-09 08:07:30.376208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /device:GPU:0 with 3600 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6


In [5]:
# Test a GPU calculation in TensorFlow
with tf.device('/gpu:0'):
    a = tf.Variable(tf.random.uniform(shape=(100, 100)), name="a")
    b = tf.matmul(a, a)
    del a

np.mean(b)

2024-04-09 08:07:30.385786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3600 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6


25.202852

## Performance test

#### Next, we measure how much faster a typical operation runs on the GPU by performing the same computation in NumPy, in PyTorch on the CPU, and in PyTorch with CUDA. The test operation is the hat (projection) matrix $H = X (X^T X)^{-1} X^T$, which maps observed values to predictions in linear regression.
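
As a sanity check of the test operation, the sketch below builds the same matrix product for a small random matrix and verifies three properties a projection matrix should have. The matrix size, the seed, and the use of `np.linalg.solve` in place of an explicit inverse are assumptions made for illustration; the benchmark cells themselves use `inv`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((50, 4))         # small hypothetical design matrix

# H = X (X^T X)^{-1} X^T; solve() avoids forming the inverse explicitly.
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H, H.T))                  # symmetric
print(np.allclose(H @ H, H))                # idempotent: projecting twice changes nothing
print(np.isclose(np.trace(H), X.shape[1]))  # trace equals the number of columns of X
```

Note that `H` has shape `(n, n)`, so for the benchmark input of 10000 rows the result is a 10000 x 10000 matrix, which is what makes the operation expensive.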

### 1) NumPy

In [6]:
x = np.random.rand(10000, 256)

In [7]:
%%timeit
# Calculate the projection matrix of x with NumPy on the CPU
H = x.dot(np.linalg.inv(x.transpose().dot(x))).dot(x.transpose())

973 ms ± 151 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### 2) PyTorch

In [8]:
x = torch.rand(10000, 256)

In [9]:
%%timeit
# Calculate the projection matrix of x on the CPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

290 ms ± 44.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### 3) PyTorch on GPU via CUDA

In [10]:
# Run this cell only if CUDA is available.
# We use ``torch.device`` objects to move tensors to and from the GPU.
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.rand(10000, 256, device=device) # directly create a tensor on GPU
    y = x.to(device)                       # or just use strings ``.to("cuda")``
    print(x[0:5, 0:5])
    print(y.to("cpu", torch.double)[0:5, 0:5])

tensor([[0.3845, 0.0880, 0.5156, 0.4110, 0.5014],
        [0.7350, 0.1050, 0.2858, 0.1222, 0.2921],
        [0.3644, 0.0735, 0.8485, 0.8745, 0.8900],
        [0.8959, 0.6109, 0.7280, 0.2928, 0.9440],
        [0.0158, 0.1069, 0.9924, 0.0682, 0.6148]], device='cuda:0')
tensor([[0.3845, 0.0880, 0.5156, 0.4110, 0.5014],
        [0.7350, 0.1050, 0.2858, 0.1222, 0.2921],
        [0.3644, 0.0735, 0.8485, 0.8745, 0.8900],
        [0.8959, 0.6109, 0.7280, 0.2928, 0.9440],
        [0.0158, 0.1069, 0.9924, 0.0682, 0.6148]], dtype=torch.float64)


In [11]:
%%timeit
# Calculate the projection matrix of x on the GPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

51.4 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
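
To put the three measurements side by side, the short sketch below computes how many times slower each variant is than the CUDA run, using only the mean timings printed above. Treat the ratios as rough indications: `np.random.rand` produces float64 data while `torch.rand` defaults to float32, so the NumPy run does more work per element, and CUDA kernels are launched asynchronously, so a stricter GPU measurement would likely call `torch.cuda.synchronize()` before stopping the clock.

```python
# Mean timings reported by %%timeit in the cells above, in milliseconds.
timings_ms = {
    "NumPy (CPU)": 973.0,
    "PyTorch (CPU)": 290.0,
    "PyTorch (CUDA)": 51.4,
}

gpu_ms = timings_ms["PyTorch (CUDA)"]

# Ratio of each mean runtime to the CUDA mean runtime.
ratios = {name: ms / gpu_ms for name, ms in timings_ms.items()}

for name, ms in timings_ms.items():
    print(f"{name:<15} {ms:7.1f} ms   {ratios[name]:5.1f}x the CUDA runtime")
```

On this machine the CUDA run comes out roughly 19 times faster than NumPy and between 5 and 6 times faster than PyTorch on the CPU; other hardware will give different ratios.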
