# GPU-Jupyter

This JupyterLab instance has GPU support through the NVIDIA CUDA drivers and the NVIDIA Container Toolkit.
This notebook tests the installation and runs a few basic operations on the GPU.

In [1]:
!jupyter --version

Selected Jupyter core packages...
IPython          : 8.32.0
ipykernel        : 6.29.5
ipywidgets       : 8.1.5
jupyter_client   : 8.6.3
jupyter_core     : 5.7.2
jupyter_server   : 2.14.2
jupyterlab       : 4.3.5
nbclient         : 0.10.2
nbconvert        : 7.16.6
nbformat         : 5.10.4
notebook         : 7.3.2
qtconsole        : not installed
traitlets        : 5.14.3


## Test GPU support

### The following command should list your NVIDIA GPU together with its driver and CUDA versions:

In [2]:
!nvidia-smi

Fri Feb 28 13:45:24 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.04              Driver Version: 561.17         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3060 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   40C    P8             12W /   60W |     329MiB /   6144MiB |     47%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0


### PyTorch with access to the NVIDIA GPU:

In [4]:
import numpy as np
import torch

# Seed every RNG so the experiment is repeatable (ACM's R1)
RANDOM_SEED = 23
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed_all(RANDOM_SEED)

# Print the PyTorch version and whether the GPU is accessible
print(f"PyTorch version: '{torch.__version__}'")
print(torch.cuda.is_available())

a = torch.rand(5, 3)
a

PyTorch version: '2.6.0+cu126'
True


tensor([[0.4283, 0.2889, 0.4224],
        [0.3571, 0.9577, 0.1100],
        [0.2933, 0.9205, 0.5876],
        [0.1299, 0.6729, 0.1028],
        [0.7876, 0.5540, 0.4653]])

### TensorFlow with access to the NVIDIA GPU:

In [5]:
import tensorflow as tf

tf.random.set_seed(RANDOM_SEED)
# print TensorFlow version
print(f"Tensorflow version: '{tf.__version__}'")

# Print whether TensorFlow was built with CUDA and which GPUs it sees (NUMA warnings are harmless)
print(tf.test.is_built_with_cuda(), tf.config.list_physical_devices('GPU'))

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

2025-02-28 13:45:27.152695: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1740750327.168237    2069 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1740750327.172865    2069 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-28 13:45:27.191600: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Tensorflow version: '2.18.0'
True [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 3457522988752085538
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3760193536
locality {
  bus_id: 1
  links {
  }
}
incarnation: 18019360743414494751
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6"
xla_global_id: 416903419
]


I0000 00:00:1740750330.604077    2069 gpu_device.cc:2022] Created device /device:GPU:0 with 3586 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6


In [6]:
# Test a matrix multiplication on the GPU in TensorFlow
with tf.device('/gpu:0'):
    a = tf.Variable(tf.random.uniform(shape=(100, 100)), name="a")
    b = tf.matmul(a, a)
    del a

np.mean(b)

I0000 00:00:1740750330.614717    2069 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3586 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6


np.float32(25.162207)

## Performance test

#### Next, we measure how much faster a typical operation runs on the GPU by performing the same computation in NumPy, in PyTorch on the CPU, and in PyTorch with CUDA. The test operation is the projection (hat) matrix H = X (X^T X)^-1 X^T used in linear regression.
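
To make the test operation concrete, here is a minimal NumPy sketch on a small random matrix. The 200 x 8 shape and the name `X_small` are illustrative assumptions and not part of the benchmark. The sketch checks three standard properties of the matrix H = X (X^T X)^-1 X^T: it is symmetric, it is idempotent, and its trace equals the number of columns of X.

```python
import numpy as np

rng = np.random.default_rng(23)      # same seed value as the notebook; any seed works
X_small = rng.random((200, 8))       # hypothetical small design matrix

# H = X (X^T X)^-1 X^T, the same expression the timed cells evaluate
H = X_small @ np.linalg.inv(X_small.T @ X_small) @ X_small.T

print(H.shape)                                     # (200, 200)
print(np.allclose(H, H.T))                         # symmetry check
print(np.allclose(H @ H, H))                       # idempotence: projecting twice changes nothing
print(np.isclose(np.trace(H), X_small.shape[1]))   # trace equals the column count
```

Note that H has as many rows and columns as X has rows, so for the 10000 x 256 input used in the benchmark the result is a 10000 x 10000 matrix.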

### 1) NumPy

In [7]:
x = np.random.rand(10000, 256)

In [8]:
%%timeit
# Calculate the projection matrix of x with NumPy on the CPU
H = x.dot(np.linalg.inv(x.transpose().dot(x))).dot(x.transpose())

1.1 s ± 469 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
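
The cell above forms the explicit inverse, which is fine for a timing benchmark. As a hedged aside, a common alternative is to solve the linear system (X^T X) Z = X^T with `np.linalg.solve`, which avoids materializing the inverse. The sketch below uses a small hypothetical matrix `X_small` and only checks that both routes give the same result; it makes no claim about relative speed.

```python
import numpy as np

rng = np.random.default_rng(23)
X_small = rng.random((500, 16))      # hypothetical, much smaller than the benchmark matrix

gram = X_small.T @ X_small           # X^T X, shape (16, 16)

# Route 1: explicit inverse, as in the timed cell
H_inv = X_small @ np.linalg.inv(gram) @ X_small.T

# Route 2: solve (X^T X) Z = X^T for Z, then H = X Z
H_solve = X_small @ np.linalg.solve(gram, X_small.T)

print(np.allclose(H_inv, H_solve))   # both routes should agree up to rounding
```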


### 2) PyTorch on CPU

In [9]:
x = torch.rand(10000, 256)

In [10]:
%%timeit
# Calculate the projection matrix of x on the CPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

289 ms ± 9.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### 3) PyTorch on GPU via CUDA

In [11]:
# Run this cell only if CUDA is available.
# ``torch.device`` objects are used to move tensors on and off the GPU.
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.rand(10000, 256, device=device) # directly create a tensor on GPU
    y = x.to(device)                       # or just use strings ``.to("cuda")``
    print(x[0:5, 0:5])
    print(y.to("cpu", torch.double)[0:5, 0:5])

tensor([[0.3186, 0.6480, 0.0896, 0.6143, 0.8915],
        [0.5869, 0.9017, 0.4452, 0.4422, 0.7567],
        [0.2252, 0.6995, 0.4897, 0.3269, 0.5074],
        [0.8776, 0.7383, 0.6485, 0.0821, 0.6425],
        [0.9088, 0.3180, 0.8072, 0.1974, 0.5014]], device='cuda:0')
tensor([[0.3186, 0.6480, 0.0896, 0.6143, 0.8915],
        [0.5869, 0.9017, 0.4452, 0.4422, 0.7567],
        [0.2252, 0.6995, 0.4897, 0.3269, 0.5074],
        [0.8776, 0.7383, 0.6485, 0.0821, 0.6425],
        [0.9088, 0.3180, 0.8072, 0.1974, 0.5014]], dtype=torch.float64)
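
A note on timing GPU code: CUDA kernels are launched asynchronously, so a wall-clock timer can stop before the GPU has finished unless the host synchronizes first. The sketch below shows one way to time the same expression with explicit `torch.cuda.synchronize()` calls. The helper name `time_projection`, the repeat count, and the reduced 2000 x 64 shape are assumptions chosen for illustration; the code falls back to the CPU when CUDA is unavailable.

```python
import time
import torch

def time_projection(x: torch.Tensor, repeats: int = 5) -> float:
    """Return mean wall-clock seconds per evaluation of H = X (X^T X)^-1 X^T."""
    on_gpu = x.is_cuda
    # Warm-up run so one-time initialisation cost is not measured
    x.mm((x.t().mm(x)).inverse()).mm(x.t())
    if on_gpu:
        torch.cuda.synchronize()      # wait for the warm-up kernels to finish
    start = time.perf_counter()
    for _ in range(repeats):
        x.mm((x.t().mm(x)).inverse()).mm(x.t())
    if on_gpu:
        torch.cuda.synchronize()      # wait for all queued kernels before stopping the clock
    return (time.perf_counter() - start) / repeats

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_small = torch.rand(2000, 64, device=device)   # hypothetical reduced size
seconds = time_projection(x_small)
print(f"{device.type}: {seconds * 1e3:.2f} ms per evaluation")
```

The warm-up call keeps one-time setup costs out of the measurement, and the second synchronization ensures the timer covers the full GPU work rather than only the kernel launches.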


In [12]:
%%timeit
# Calculate the projection matrix of x on the GPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

51.1 ms ± 234 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
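
To put the three measurements side by side, the short sketch below computes ratios from the mean timings reported above (1.1 s, 289 ms, 51.1 ms). The dictionary keys and variable names are illustrative. These are means from a single machine, and the NumPy figure has a large spread (± 469 ms), so the ratios are rough indications only. Note also that `np.random.rand` produces float64 values while `torch.rand` defaults to float32, so the NumPy and PyTorch runs do not operate on the same data type.

```python
# Mean timings in seconds, copied from the %%timeit outputs in this notebook
timings = {
    "NumPy (CPU)": 1.1,
    "PyTorch (CPU)": 0.289,
    "PyTorch (CUDA)": 0.0511,
}

baseline = timings["PyTorch (CUDA)"]
# Ratio of each timing to the CUDA timing (1.0 for CUDA itself)
speedups = {name: t / baseline for name, t in timings.items()}

for name, ratio in speedups.items():
    print(f"{name:<15} {timings[name] * 1e3:8.1f} ms   {ratio:5.1f}x the CUDA time")
```

With these numbers, the CUDA run is roughly 21 times faster than the NumPy run and between 5 and 6 times faster than PyTorch on the CPU.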
