# GPU-Jupyter

This JupyterLab instance is connected to the GPU via the CUDA drivers. In this notebook, we test the installation and perform some basic operations on the GPU.

## Test GPU connection

#### The following command should list your GPU type and its NVIDIA driver version:

In [1]:
!nvidia-smi

Thu Dec 14 17:16:30 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A6000    On   | 00000000:41:00.0 Off |                  Off |
| 30%   49C    P8    27W / 300W |      5MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    On   | 00000000:61:00.0 Off |                  Off |
| 35%   63C    P2    90W / 300W |   9635MiB / 49140MiB |      0%      Default |
|       
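The table above is formatted for people to read. For scripting, `nvidia-smi` can print selected fields as CSV via `--query-gpu` and `--format=csv`. The sketch below calls it through `subprocess` and returns one dict per GPU; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, chosen for this example.

```python
import shutil
import subprocess

# Fields to request; all are standard nvidia-smi --query-gpu properties.
FIELDS = ["index", "name", "driver_version", "memory.used", "memory.total"]

def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus

def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],   # nounits drops the "MiB" suffix
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)

for gpu in query_gpus():
    print(f"GPU {gpu['index']}: {gpu['name']}, driver {gpu['driver_version']}, "
          f"{gpu['memory.used']} / {gpu['memory.total']} MiB used")
```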

#### Now, test whether PyTorch and TensorFlow can access the GPU via CUDA:

In [2]:
import torch
torch.cuda.is_available()

True
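`torch.cuda.is_available()` only answers yes or no. To see which devices PyTorch can actually use, and which CUDA version the installed PyTorch build targets, a short sketch along these lines may help. `cuda_device_summary` is a hypothetical helper; `torch.cuda.device_count`, `torch.cuda.get_device_properties` and `torch.version.cuda` are standard PyTorch attributes. The import guard is only there so the snippet can be loaded in an environment without PyTorch.

```python
try:
    import torch
except ImportError:      # guard only so the sketch loads without PyTorch
    torch = None

def cuda_device_summary():
    """Return (index, name, total memory in GiB) for each visible CUDA device."""
    if torch is None or not torch.cuda.is_available():
        return []
    summary = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        summary.append((i, props.name, props.total_memory / 1024**3))
    return summary

if torch is not None:
    # torch.version.cuda is None for CPU-only builds of PyTorch
    print("PyTorch", torch.__version__, "built for CUDA", torch.version.cuda)
for index, name, mem_gib in cuda_device_summary():
    print(f"cuda:{index}  {name}  {mem_gib:.1f} GiB")
```

On the machine above, one would expect both RTX A6000 cards to appear unless `CUDA_VISIBLE_DEVICES` restricts which GPUs the process can see.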

In [3]:
import tensorflow as tf
from tensorflow.python.client import device_lib
print(tf.config.list_physical_devices('GPU'))  # empty list if TensorFlow cannot use a GPU
device_lib.list_local_devices()

2023-12-14 17:16:32.448916: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-14 17:16:32.472734: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-14 17:16:32.472758: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-14 17:16:32.473445: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-14 17:16:32.477355: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-14 17:16:32.477730: I tensorflow/core/platform/cpu_feature_guard.cc:1

[]


2023-12-14 17:16:33.915841: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-14 17:16:33.916057: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-14 17:16:33.916926: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required l

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 14747982026689315297
 xla_global_id: -1]
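The log above shows that TensorFlow did not find the CUDA libraries ("Could not find cuda drivers on your machine, GPU will not be used", "Cannot dlopen some GPU libraries"), so in this environment it only lists the CPU even though PyTorch sees the GPU. To tell a CPU-only TensorFlow build apart from a CUDA build that cannot load its libraries, a sketch like the following may help. `tf_gpu_report` is a hypothetical name; `tf.sysconfig.get_build_info`, `tf.test.is_built_with_cuda` and `tf.config.list_physical_devices` are APIs of recent TensorFlow 2.x releases.

```python
try:
    import tensorflow as tf
except ImportError:      # guard only so the sketch loads without TensorFlow
    tf = None

def tf_gpu_report():
    """Summarize whether this TensorFlow build supports CUDA and which GPUs it sees."""
    if tf is None:
        return {"installed": False}
    build = tf.sysconfig.get_build_info()   # dict-like; CUDA keys exist only on CUDA builds
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }

for key, value in tf_gpu_report().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is `True` but `gpus` is empty, as the log suggests here, the CUDA/cuDNN libraries this TensorFlow version expects are most likely missing or mismatched in the container; the guide at https://www.tensorflow.org/install/gpu referenced in the log lists the required versions.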

In [4]:
import numpy as np
import torch
a = torch.rand(5, 3)
a

tensor([[0.3446, 0.0452, 0.2264],
        [0.7986, 0.7481, 0.9437],
        [0.0514, 0.0179, 0.9945],
        [0.6514, 0.9786, 0.4902],
        [0.9525, 0.8661, 0.2606]])

## Performance test

#### Now we want to know how much faster a typical operation runs on the GPU. We therefore perform the same operation in NumPy, in PyTorch on the CPU, and in PyTorch on the GPU via CUDA. The test operation is the computation of the prediction (hat) matrix used in linear regression.
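For reference, the quantity being timed is the prediction (hat) matrix of ordinary least squares. With a data matrix $X$ of shape $n \times p$ (here $n = 10000$, $p = 256$), it maps the observed targets $y$ to the fitted values $\hat{y}$:

```latex
\begin{aligned}
H &= X\,(X^\top X)^{-1} X^\top \in \mathbb{R}^{n \times n}, \qquad \hat{y} = H\,y \\
H^\top &= H, \qquad H^2 = H, \qquad \operatorname{tr}(H) = p \quad \text{(if } X \text{ has full column rank)} \\
\text{cost of the final } (n \times p)(p \times n) \text{ product} &\approx n^2 p = 10000^2 \cdot 256 = 2.56 \times 10^{10} \text{ multiply-adds} \\
\text{memory}(H) &= n^2 \cdot 8 \text{ bytes} = 8 \times 10^{8} \text{ bytes} \approx 0.8 \text{ GB (float64)}
\end{aligned}
```

The other steps cost only about $np^2$ (forming $X^\top X$ and multiplying $X$ by the inverse) and $p^3$ (the inversion), so the final product and the $n^2$ storage for $H$ dominate. In float32, the default of `torch.rand`, $H$ needs half as much memory.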

### 1) NumPy

In [5]:
x = np.random.rand(10000, 256)

In [6]:
%%timeit
# Calculate the projection matrix of x with NumPy on the CPU (float64)
H = x.dot(np.linalg.inv(x.transpose().dot(x))).dot(x.transpose())

260 ms ± 61.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
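A common alternative to the explicit `np.linalg.inv` call is to solve the linear system $(X^\top X)B = X^\top$ with `np.linalg.solve`, which is generally more numerically stable than forming the inverse. The following is a minimal NumPy sketch (the helper name `hat_matrix` is our own, not part of the notebook) that also checks the defining properties of $H$ on a small random matrix.

```python
import numpy as np

def hat_matrix(x):
    """Return H = X (X^T X)^{-1} X^T for a full-column-rank x of shape (n, p)."""
    gram = x.T @ x                      # (p, p) Gram matrix X^T X
    # Solve (X^T X) B = X^T for B = (X^T X)^{-1} X^T instead of forming the inverse.
    b = np.linalg.solve(gram, x.T)      # (p, n)
    return x @ b                        # (n, n)

# Check the defining properties on a small random matrix (n = 200, p = 16).
rng = np.random.default_rng(0)
x_small = rng.random((200, 16))
H_small = hat_matrix(x_small)
print("symmetric:", np.allclose(H_small, H_small.T))
print("idempotent:", np.allclose(H_small @ H_small, H_small))
print("trace:", np.trace(H_small))      # equals p = 16 up to rounding
```

On the full-size data, `%timeit hat_matrix(x)` can be compared with the cell above; whether it is faster depends on the BLAS/LAPACK library NumPy is linked against.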


### 2) PyTorch

In [7]:
x = torch.rand(10000, 256)

In [8]:
%%timeit
# Calculate the projection matrix of x on the CPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

76.7 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
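The two timings above do not use the same precision: `np.random.rand` returns float64 values, while `torch.rand` defaults to float32. For a like-for-like comparison, either create the PyTorch tensor with `torch.rand(10000, 256, dtype=torch.float64)` or cast the NumPy array to float32, as in this sketch (`prediction_matrix` is a hypothetical helper name wrapping the same expression as the NumPy cell).

```python
import numpy as np

def prediction_matrix(x):
    """Same expression as the NumPy cell above; the result keeps the input dtype."""
    return x.dot(np.linalg.inv(x.T.dot(x))).dot(x.T)

x64 = np.random.rand(10000, 256)     # NumPy default precision: float64
x32 = x64.astype(np.float32)         # single precision, matching torch.rand's default
print(x64.dtype, x32.dtype)
```

In a notebook cell, `%timeit prediction_matrix(x32)` then times the float32 variant. The result need not match PyTorch's CPU timing exactly, since the two libraries may use different BLAS/LAPACK backends and threading settings.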


### 3) PyTorch on GPU via CUDA

In [9]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.rand(10000, 256, device=device) # directly create a tensor on GPU
    y = x.to(device)                       # no-op here, x is already on the GPU; ``.to("cuda")`` also works
    print(x[0:5, 0:5])
    print(y.to("cpu", torch.double)[0:5, 0:5])

tensor([[0.3524, 0.4564, 0.5821, 0.0973, 0.7754],
        [0.7047, 0.2262, 0.4790, 0.1555, 0.5360],
        [0.0142, 0.1699, 0.9471, 0.2035, 0.9215],
        [0.5230, 0.0497, 0.8534, 0.3936, 0.3059],
        [0.8031, 0.8541, 0.3866, 0.6828, 0.7291]], device='cuda:0')
tensor([[0.3524, 0.4564, 0.5821, 0.0973, 0.7754],
        [0.7047, 0.2262, 0.4790, 0.1555, 0.5360],
        [0.0142, 0.1699, 0.9471, 0.2035, 0.9215],
        [0.5230, 0.0497, 0.8534, 0.3936, 0.3059],
        [0.8031, 0.8541, 0.3866, 0.6828, 0.7291]], dtype=torch.float64)


In [10]:
%%timeit
# Calculate the projection matrix of x on the GPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

3.37 ms ± 23.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
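CUDA kernels are launched asynchronously, so a `%%timeit` over GPU tensors can return before the GPU has finished, and a measured time such as the 3.37 ms above may partly reflect launch overhead rather than completed work. A timing helper therefore has to wait for the device before reading the clock. The sketch below (the name `time_op` is hypothetical) uses only the standard library and takes the synchronization call as a parameter, e.g. `torch.cuda.synchronize` for PyTorch CUDA tensors. `torch.utils.benchmark.Timer` is a ready-made alternative that handles warm-up and CUDA synchronization itself.

```python
import statistics
import time

def time_op(fn, sync=None, repeats=7, warmup=1):
    """Return (mean, stdev) wall-clock seconds of fn() over `repeats` runs.

    `sync` should block until all queued device work has finished, e.g.
    torch.cuda.synchronize for CUDA tensors. It is called after the warm-up
    runs and after every timed call, so the clock is only read once the
    asynchronous work has actually completed.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):             # the first call may include one-off setup costs
        fn()
    sync()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()                          # wait for the device before stopping the clock
        times.append(time.perf_counter() - start)
    spread = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), spread
```

With the GPU tensor `x` from above, `time_op(lambda: x.mm((x.t().mm(x)).inverse()).mm(x.t()), sync=torch.cuda.synchronize)` times the full computation rather than only the kernel launches.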


## Extensive Testing on the GPU

In [11]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
import torch
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.rand(10000, 10, device=device) # directly create a tensor on GPU

In [12]:
if torch.cuda.is_available():
    y = x.to(device)                       # or just use strings ``.to("cuda")``
    print(x[0:5, 0:5])

tensor([[0.4466, 0.0260, 0.0687, 0.6375, 0.9676],
        [0.2974, 0.0200, 0.0621, 0.4341, 0.0167],
        [0.1146, 0.3012, 0.9246, 0.1484, 0.8045],
        [0.4448, 0.5577, 0.4649, 0.2364, 0.7051],
        [0.0479, 0.7472, 0.2121, 0.9418, 0.7699]], device='cuda:0')


In [13]:
if torch.cuda.is_available():
    # GPU memory is the limiting factor here, because H has n x n entries.
    # With 100000 rows, H alone needs about 37 GB in float32, more than the 8 GB that were available when this was tested.
    H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())
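The 37 GB figure in the comment above follows from the size of $H$: with $n$ rows it holds $n^2$ entries, at 4 bytes each for float32 (the `torch.rand` default). A small standard-library helper (the name `hat_matrix_bytes` is hypothetical) makes such estimates explicit before launching a computation that may not fit.

```python
def hat_matrix_bytes(n_rows, bytes_per_element=4):
    """Bytes needed to store the n_rows x n_rows matrix H (4 = float32, 8 = float64)."""
    return n_rows * n_rows * bytes_per_element

GIB = 1024 ** 3
for n in (10_000, 30_000, 100_000):
    print(f"n = {n:>7}: float32 {hat_matrix_bytes(n) / GIB:6.2f} GiB, "
          f"float64 {hat_matrix_bytes(n, 8) / GIB:6.2f} GiB")
```

For 100000 rows this gives about 37.25 GiB in float32, which matches the comment's figure. On a live GPU, `torch.cuda.mem_get_info()` returns the free and total device memory in bytes for comparison.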

In [14]:
if torch.cuda.is_available():
    print(H[0:5, 0:5])

tensor([[ 1.2995e-03,  1.6008e-04,  3.7637e-04,  1.3155e-04,  4.5707e-05],
        [ 1.6008e-04,  8.3649e-04,  4.2130e-05,  9.5201e-05,  1.6981e-04],
        [ 3.7637e-04,  4.2130e-05,  1.1736e-03,  3.9943e-04, -2.7599e-04],
        [ 1.3155e-04,  9.5201e-05,  3.9942e-04,  4.7651e-04,  1.6600e-04],
        [ 4.5707e-05,  1.6981e-04, -2.7599e-04,  1.6600e-04,  1.3608e-03]],
       device='cuda:0')


In [15]:
if torch.cuda.is_available():
    # This step is expensive: the whole symmetric n x n matrix H is copied back
    # to the CPU and converted to float64 before slicing. It worked up to about 30000 rows.
    print(H.to("cpu", torch.double)[0:5, 0:5])

tensor([[ 1.2995e-03,  1.6008e-04,  3.7637e-04,  1.3155e-04,  4.5707e-05],
        [ 1.6008e-04,  8.3649e-04,  4.2130e-05,  9.5201e-05,  1.6981e-04],
        [ 3.7637e-04,  4.2130e-05,  1.1736e-03,  3.9943e-04, -2.7599e-04],
        [ 1.3155e-04,  9.5201e-05,  3.9942e-04,  4.7651e-04,  1.6600e-04],
        [ 4.5707e-05,  1.6981e-04, -2.7599e-04,  1.6600e-04,  1.3608e-03]],
       dtype=torch.float64)
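Because slicing is cheap on the device, the transfer in the last cell can be limited to the values that are actually printed by slicing before moving to the CPU. A minimal sketch, assuming PyTorch is installed (the helper name `preview_on_cpu` is hypothetical; the import guard only lets the snippet load where PyTorch is missing):

```python
try:
    import torch
except ImportError:      # guard only so the sketch loads without PyTorch
    torch = None

def preview_on_cpu(t, rows=5, cols=5):
    """Copy only the top-left rows x cols block of a tensor to the CPU as float64.

    Slicing first means only rows * cols values are copied from device to host,
    instead of the full n x n matrix.
    """
    return t[:rows, :cols].to("cpu", torch.double)

if torch is not None:
    print(preview_on_cpu(torch.arange(100.0).reshape(10, 10)))
```

In the notebook, `print(preview_on_cpu(H))` would print the same 5 x 5 block as the last cell without moving all of `H` off the GPU.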
