# Confirm Notebook Can Access GPU 

This notebook provides a few small examples that confirm your notebook environment can connect to a GPU, either locally or on a cluster. The final section also includes a short example comparing GPU and CPU performance on an element-wise multiply-and-sum over a large matrix.


The one prerequisite for running this notebook and communicating with the GPU is that PyTorch must be installed:

```
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```

In the near future this requirement will be handled automatically. 
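Before running the cells below, it can help to confirm that the install succeeded. The following is a minimal sketch, not part of the original notebook; the helper name `describe_torch_install` is hypothetical. It reports the installed PyTorch version and the CUDA version the build targets, and returns a message instead of raising if `torch` is missing.

```python
import importlib.util


def describe_torch_install():
    """Return a short status string about the local PyTorch install."""
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None for CPU-only builds.
    return f"torch {torch.__version__}, built for CUDA {torch.version.cuda}"


print(describe_torch_install())
```

If the reported CUDA version is `None`, the installed wheel is a CPU-only build, and `torch.cuda.is_available()` will return `False` even on a machine with a GPU.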

## Am I connected to a GPU?

In [1]:
import torch
import timeit

In [2]:
x = torch.rand(5, 3)
print(x)

tensor([[0.3546, 0.9113, 0.7730],
        [0.1329, 0.8297, 0.8176],
        [0.4446, 0.3666, 0.1261],
        [0.4993, 0.8514, 0.9197],
        [0.7537, 0.0293, 0.4049]])


In [3]:
use_cuda = torch.cuda.is_available()
use_cuda

True

In [4]:
if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:', torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory / 1e9)

__CUDNN VERSION: 8005
__Number CUDA Devices: 1
__CUDA Device Name: Tesla K80
__CUDA Device Total Memory [GB]: 11.99702016


The outputs above indicate that PyTorch is installed and can see the GPU, and they show some details about the device we are connected to.
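The cell above only inspects device 0. On a node with several GPUs, the same queries can be looped over every visible device. This is a hedged sketch: the helper name `list_cuda_devices` is an assumption for illustration, and it returns an empty list rather than failing when PyTorch or a GPU is unavailable.

```python
try:
    import torch
except ImportError:  # PyTorch missing: report no devices instead of crashing
    torch = None


def list_cuda_devices():
    """Return (index, name, total memory in GB) for each visible CUDA device."""
    if torch is None or not torch.cuda.is_available():
        return []
    devices = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        devices.append((i, props.name, props.total_memory / 1e9))
    return devices


for index, name, mem_gb in list_cuda_devices():
    print(f"cuda:{index}  {name}  {mem_gb:.2f} GB")
```

An empty result means no CUDA device is visible from this environment.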

## Does the GPU speed things up?

In [5]:
def batched_dot_mul_sum(a, b):
    # Row-wise dot product: multiply element-wise, then sum over the last dimension.
    return a.mul(b).sum(-1)

In [6]:
cpu = torch.device("cpu")
gpu = torch.device("cuda")

In [7]:
x_cpu = torch.randn(4000,4000,device=cpu)
x_gpu = torch.randn(4000,4000, device=gpu)

In [8]:
t0 = timeit.Timer(stmt='batched_dot_mul_sum(x,x)', setup='from __main__ import batched_dot_mul_sum',
                  globals={'x': x_cpu})

t1 = timeit.Timer(stmt='batched_dot_mul_sum(x,x)', setup='from __main__ import batched_dot_mul_sum',
                  globals={'x': x_gpu})

print(f'mul_sum(x, x) cpu:  {t0.timeit(100)} seconds')
print(f'mul_sum(x, x) gpu:  {t1.timeit(100)} seconds')

mul_sum(x, x) cpu:  2.2202524829190224 seconds
mul_sum(x, x) gpu:  0.003196620149537921 seconds


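The output above shows that, on this particular node, the GPU runs this element-wise multiply-and-sum over a 4000 x 4000 matrix orders of magnitude faster than the CPU. Note that the operation is not a true matrix multiplication, and that CUDA kernels launch asynchronously: because the timed statement never calls `torch.cuda.synchronize()`, the GPU figure may mostly reflect kernel launch time and overstate the real speedup.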
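One way to make the comparison more robust is to wait for the device to finish before stopping the clock. The sketch below is an illustration under stated assumptions, not the notebook's original method: `time_mul_sum` is a hypothetical helper, and it adds a warm-up call plus `torch.cuda.synchronize()` after each GPU call.

```python
import timeit

try:
    import torch
except ImportError:  # lets the sketch load even if PyTorch is missing
    torch = None


def batched_dot_mul_sum(a, b):
    # Row-wise dot product: multiply element-wise, then sum over the last dimension.
    return a.mul(b).sum(-1)


def time_mul_sum(device, size=4000, repeats=100):
    """Time `repeats` calls on `device`, waiting for any GPU work to finish."""
    x = torch.randn(size, size, device=device)
    is_cuda = torch.device(device).type == "cuda"

    def run():
        batched_dot_mul_sum(x, x)
        if is_cuda:
            # Block until all queued CUDA kernels have completed.
            torch.cuda.synchronize()

    run()  # warm-up call, excluded from the measurement
    return timeit.timeit(run, number=repeats)


if torch is not None:
    # Hypothetical demo: 10 repeats keeps the CPU run short.
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for device in devices:
        print(f"mul_sum(x, x) {device}: {time_mul_sum(device, repeats=10):.4f} seconds")
```

Timings measured this way will differ from the figures shown above, since they include the time the GPU actually spends computing.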

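If `torch.cuda.is_available()` returns `False`, or the cells that use the `cuda` device fail, PyTorch cannot see a GPU from this environment. If `import torch` itself fails, PyTorch is not installed.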