# GPU-Jupyter

[GPU-Jupyter](https://github.com/iot-salzburg/gpu-jupyter): Your GPU-accelerated JupyterLab with PyTorch, TensorFlow, and a rich data science toolstack for your reproducible deep learning experiments.

This notebook verifies that the NVIDIA driver and the CUDA toolkit are installed and that PyTorch and TensorFlow can access the GPU.

In [1]:
!jupyter --version

Selected Jupyter core packages...
IPython          : 9.9.0
ipykernel        : 7.1.0
ipywidgets       : 8.1.5
jupyter_client   : 8.7.0
jupyter_core     : 5.9.1
jupyter_server   : 2.15.0
jupyterlab       : 4.5.1
nbclient         : 0.10.4
nbconvert        : 7.16.6
nbformat         : 5.10.4
notebook         : 7.5.1
qtconsole        : not installed
traitlets        : 5.14.3


## Test GPU Support

`nvidia-smi` lists the available GPUs, the installed driver version, and the highest CUDA version that this driver supports.

In [2]:
!nvidia-smi

Fri Jan  9 07:57:09 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA RTX A6000               On  |   00000000:41:00.0 Off |                  Off |
| 30%   48C    P8             54W /  300W |       1MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A6000               On  |   00

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
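The two commands report different things: `nvidia-smi` shows the highest CUDA version the driver supports (13.0 above), while `nvcc` shows the release of the installed CUDA toolkit (12.9 above). The following is a minimal sketch for reading both values programmatically. The helper names `parse_smi_cuda_version`, `parse_nvcc_release`, and `run_if_installed` are hypothetical, and the regular expressions assume the output formats shown above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the toolkit release (e.g. '12.9') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_cuda_version(text):
    """Return the driver's maximum CUDA version from the `nvidia-smi` header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def run_if_installed(command):
    """Run a command and return its stdout, or '' if the executable is not on the PATH."""
    if shutil.which(command[0]) is None:
        return ""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# Both values are None on a machine without the respective tool
print("Driver supports CUDA up to:", parse_smi_cuda_version(run_if_installed(["nvidia-smi"])))
print("Installed CUDA toolkit    :", parse_nvcc_release(run_if_installed(["nvcc", "--version"])))
```

A toolkit release that is lower than the driver's maximum, as in this environment, is generally expected to work, because the driver only sets an upper bound.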


## Test TensorFlow Installation

In [4]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'  # hide TensorFlow's INFO-level C++ log messages

import tensorflow as tf

# set a seed so that the experiment is repeatable (ACM's R1)
tf.random.set_seed(seed=23)

print(f"Tensorflow version: '{tf.__version__}'")
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))
print(tf.reduce_sum(tf.random.normal([1000, 1000])))


Tensorflow version: '2.20.0'
True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
tf.Tensor(389.54736, shape=(), dtype=float32)


I0000 00:00:1767945432.665107    1118 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 46731 MB memory:  -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:41:00.0, compute capability: 8.6
I0000 00:00:1767945432.666721    1118 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 46731 MB memory:  -> device: 1, name: NVIDIA RTX A6000, pci bus id: 0000:61:00.0, compute capability: 8.6


## Test PyTorch Installation

In [5]:
import numpy as np
import torch

# set a seed so that the experiment is repeatable (ACM's R1)
RANDOM_SEED = 23
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed_all(RANDOM_SEED)

# print the PyTorch version and whether the GPU is accessible
print(f"PyTorch version: '{torch.__version__}'")
print(torch.cuda.is_available())

a = torch.rand(5, 3)
a

PyTorch version: '2.9.1+cu128'
True


tensor([[0.4283, 0.2889, 0.4224],
        [0.3571, 0.9577, 0.1100],
        [0.2933, 0.9205, 0.5876],
        [0.1299, 0.6729, 0.1028],
        [0.7876, 0.5540, 0.4653]])

## Performance test

To measure how much faster a typical operation runs on a GPU, we compute the same operation with NumPy, PyTorch (CPU), PyTorch (GPU), and TensorFlow (GPU). The operation is the projection matrix (also called the hat matrix) of linear regression, H = x (x^T x)^{-1} x^T.
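As a sanity check on the operation itself, the sketch below builds H for a small random matrix and verifies the defining properties of a projection matrix. The names `x_small`, `gram`, `H_inv`, and `H_solve` are illustrative and not part of the benchmark. The sketch also assumes that `np.linalg.solve` is an acceptable alternative to forming the explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(23)
x_small = rng.random((50, 4))            # small stand-in for the 10000 x 256 matrix

gram = x_small.T @ x_small               # x^T x, shape (4, 4)
H_inv = x_small @ np.linalg.inv(gram) @ x_small.T
H_solve = x_small @ np.linalg.solve(gram, x_small.T)  # same matrix without forming the inverse

print(H_inv.shape)                         # one row and one column per sample
print(np.allclose(H_inv, H_solve))         # both formulations agree
print(np.allclose(H_inv, H_inv.T))         # H is symmetric
print(np.allclose(H_inv @ H_inv, H_inv))   # H is idempotent: projecting twice changes nothing
print(np.isclose(np.trace(H_inv), x_small.shape[1]))  # trace equals the number of columns
```

Note that H has one row and one column per sample, so for the benchmark input it is a 10000 x 10000 matrix, which is what makes the operation costly.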

### 1) NumPy

In [6]:
x = np.random.rand(10000, 256)

In [7]:
%%timeit -n 3 -r 3
# Calculate the projection matrix of x with NumPy on the CPU
H = x.dot(np.linalg.inv(x.transpose().dot(x))).dot(x.transpose())

159 ms ± 11.7 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)


### 2) PyTorch on CPU

In [8]:
x = torch.rand(10000, 256)

In [9]:
%%timeit -n 3 -r 5
# Calculate the projection matrix of x on the CPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

67.8 ms ± 3.13 ms per loop (mean ± std. dev. of 5 runs, 3 loops each)


### 3) PyTorch on GPU via CUDA

In [10]:
# Run this cell only if CUDA is available.
# ``torch.device`` objects are used to move tensors to and from the GPU.
if torch.cuda.is_available():
    device = torch.device("cuda")             # a CUDA device object
    x = torch.rand(10000, 256, device=device) # create a tensor directly on the GPU
    y = x.to(device)                          # no-op here, since x is already on the GPU; ``.to("cuda")`` works too
    print(x[0:5, 0:5])
    print(y.to("cpu", torch.double)[0:5, 0:5])

tensor([[0.3186, 0.6480, 0.0896, 0.6143, 0.8915],
        [0.5869, 0.9017, 0.4452, 0.4422, 0.7567],
        [0.2252, 0.6995, 0.4897, 0.3269, 0.5074],
        [0.8776, 0.7383, 0.6485, 0.0821, 0.6425],
        [0.9088, 0.3180, 0.8072, 0.1974, 0.5014]], device='cuda:0')
tensor([[0.3186, 0.6480, 0.0896, 0.6143, 0.8915],
        [0.5869, 0.9017, 0.4452, 0.4422, 0.7567],
        [0.2252, 0.6995, 0.4897, 0.3269, 0.5074],
        [0.8776, 0.7383, 0.6485, 0.0821, 0.6425],
        [0.9088, 0.3180, 0.8072, 0.1974, 0.5014]], dtype=torch.float64)


In [11]:
%%timeit -n 10 -r 7
# Calculate the projection matrix of x on the GPU
H = x.mm( (x.t().mm(x)).inverse() ).mm(x.t())

4.63 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
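PyTorch queues CUDA kernels asynchronously, so a timer can stop before the GPU has finished its work. The measurement above may already include implicit synchronization, but the sketch below makes the wait explicit. `time_op` is a hypothetical helper, not part of PyTorch, and `x_gpu` is an illustrative name. The GPU part only runs if PyTorch and a CUDA device are available.

```python
import time


def time_op(fn, sync=lambda: None, warmup=3, repeats=10):
    """Return the mean wall-clock seconds per call of fn.

    sync is called before the clock starts and before it stops, so that
    pending asynchronous device work is included in the measurement.
    """
    for _ in range(warmup):          # warm-up runs are not timed
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    sync()                           # wait for queued work before reading the clock
    return (time.perf_counter() - start) / repeats


try:
    import torch
except ImportError:                  # keep the sketch runnable without PyTorch
    torch = None

if torch is not None and torch.cuda.is_available():
    x_gpu = torch.rand(10000, 256, device="cuda")
    seconds = time_op(
        lambda: x_gpu.mm((x_gpu.t().mm(x_gpu)).inverse()).mm(x_gpu.t()),
        sync=torch.cuda.synchronize,
    )
    print(f"{seconds * 1e3:.2f} ms per loop (synchronized)")
```

If the synchronized time is noticeably higher than the `%%timeit` result, the latter likely measured only part of the GPU work.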


### 4) TensorFlow on GPU via CUDA

In [12]:
if tf.config.list_physical_devices("GPU"):
    # Create a tensor on GPU (TF will place it on GPU by default when available)
    x = tf.random.uniform([10000, 256], dtype=tf.float32)
    y = tf.identity(x)  # equivalent to "move/copy" (will stay on GPU if x is on GPU)
    # Print slices
    print(x[:5, :5].numpy())

[[0.9438342  0.3385067  0.7677115  0.6680695  0.3649447 ]
 [0.27590752 0.59973633 0.60091686 0.69706666 0.6652031 ]
 [0.05486917 0.60015345 0.7303884  0.86384666 0.1850121 ]
 [0.25567186 0.04193914 0.43598843 0.30942154 0.5328213 ]
 [0.07237387 0.17294586 0.9945524  0.182423   0.22695827]]


In [13]:
%%timeit -n 10 -r 7
# Projection matrix: H = x (x^T x)^{-1} x^T
H = tf.matmul(x, tf.linalg.solve(tf.matmul(tf.transpose(x), x), tf.transpose(x)))

2.89 ms ± 303 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


I0000 00:00:1767945437.550987    1118 cuda_solvers.cc:175] Creating GpuSolver handles for stream 0x5c5bb8bcbfa0
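To answer the opening question of this section, the measured means can be turned into speedup factors relative to NumPy. This is a rough sketch that only reuses the numbers printed above. The comparison is not strictly like for like: `np.random.rand` produces float64 data while the PyTorch and TensorFlow tensors are float32, the TensorFlow cell uses `tf.linalg.solve` instead of an explicit inverse, and the cells use different loop counts.

```python
# Mean runtimes in milliseconds, copied from the %%timeit outputs above
timings_ms = {
    "NumPy (CPU)": 159.0,
    "PyTorch (CPU)": 67.8,
    "PyTorch (GPU)": 4.63,
    "TensorFlow (GPU)": 2.89,
}

baseline = timings_ms["NumPy (CPU)"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name:<18} {timings_ms[name]:>7.2f} ms   {factor:>5.1f}x")
```

With these measurements, PyTorch on the CPU is roughly 2.3 times faster than NumPy, PyTorch on the GPU roughly 34 times, and TensorFlow on the GPU roughly 55 times.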
