# Using [torch](https://pytorch.org/docs/stable/cuda.html) to compare CPU/GPU speeds
stough 202-

The [Graphics Processing Unit](https://www.extremetech.com/gaming/269335-how-graphics-cards-work) (GPU)
is a common [coprocessor](https://en.wikipedia.org/wiki/Coprocessor) designed to do parallel floating point
arithmetic. It was originally built for computer graphics, but the same massively parallel math is useful
across much of scientific computing.

We will also use JupyterLab [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html), such as `%%timeit`.

In [None]:
import torch
import torch.cuda as cuda
import numpy as np

import time
from IPython.display import display, Markdown

In [None]:
cuda.is_available()

In [None]:
cuda.get_device_name(0)

In [None]:
cuda.get_device_properties(0)
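
The device queries above will typically fail on a machine without a CUDA device. A minimal sketch of a guard follows; the variable names `device` and `x` are illustrative and not part of the original notebook.

```python
import torch

# Pick the GPU when one is visible to torch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if device.type == "cuda":
    print(torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; running on the CPU.")

# Tensors can then be created directly on the chosen device.
x = torch.ones(3, device=device)
```

With a guard like this, later cells could use `.to(device)` in place of `.cuda()` and still run on a CPU-only machine.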

&nbsp;

## We'll do a large matrix multiply operation
in NumPy, in torch on the CPU, and in torch on the GPU.

In [None]:
# A batch of 400 matrix pairs: (1000 x 200) times (200 x 1000), all float64
A = np.random.rand(400,1000,200)
B = np.random.rand(400,200,1000)

In [None]:
(8*A.size)/(1024**2)  # size of A in MiB (8 bytes per float64 element)

In [None]:
C = np.matmul(A,B)
C.shape

In [None]:
8*C.size/(1024**2)  # size of C in MiB (8 bytes per float64 element)
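
The sizes computed above can also be worked out from the shapes alone, before allocating anything. A small sketch, assuming float64 (8 bytes per element); the helper names `mib` and `batched_matmul_shape` are made up for illustration.

```python
from math import prod

def mib(shape, itemsize=8):
    """Size in MiB of a dense array with this shape and bytes per element."""
    return prod(shape) * itemsize / 1024**2

def batched_matmul_shape(a_shape, b_shape):
    """Result shape of matmul for two 3-D operands with the same batch size."""
    batch_a, n, k_a = a_shape
    batch_b, k_b, m = b_shape
    if batch_a != batch_b or k_a != k_b:
        raise ValueError("incompatible shapes for batched matmul")
    return (batch_a, n, m)

a_shape, b_shape = (400, 1000, 200), (400, 200, 1000)
c_shape = batched_matmul_shape(a_shape, b_shape)

print(c_shape)        # (400, 1000, 1000)
print(mib(a_shape))   # 610.3515625
print(mib(c_shape))   # 3051.7578125
```

The result is five times larger than either input, roughly 3 GiB, which is worth knowing before copying everything to a GPU with limited memory.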

In [None]:
%%timeit -n 5 -r 4
# C = np.matmul(A,B)
np.matmul(A,B, out=C)

&nbsp;

### Test in Torch

In [None]:
At_cpu = torch.from_numpy(A)
Bt_cpu = torch.from_numpy(B)

In [None]:
At_cpu.is_cuda
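
`torch.from_numpy` does not copy the data: the tensor shares memory with the NumPy array, so the CPU tensors above need no extra RAM. A toy illustration (the names `a` and `t` are just for this example):

```python
import numpy as np
import torch

a = np.zeros((2, 3))      # float64 by default
t = torch.from_numpy(a)   # no copy: t uses the same buffer as a

a[0, 0] = 7.0             # modify the NumPy array in place...
print(t[0, 0].item())     # ...and the tensor sees the change: 7.0
print(t.dtype)            # torch.float64
print(t.is_cuda)          # False
```

The dtype is carried over as well, so everything in this notebook runs in double precision unless it is converted explicitly.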

In [None]:
Ct_cpu = torch.matmul(At_cpu, Bt_cpu)
Ct_cpu.shape

In [None]:
%%timeit -n 5 -r 4
# Ct_cpu = torch.matmul(At_cpu, Bt_cpu)
torch.matmul(At_cpu, Bt_cpu, out = Ct_cpu)

&nbsp;

### Now test in torch, on the GPU

In [None]:
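# Copy the inputs to GPU memory
At_gpu = At_cpu.cuda()
Bt_gpu = Bt_cpu.cuda()
# Allocate the output directly on the GPU instead of creating it on the CPU and copying it over
Ct_gpu = torch.zeros_like(Ct_cpu, device='cuda')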

In [None]:
At_gpu.is_cuda

In [None]:
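%%timeit -n 5 -r 4
torch.matmul(At_gpu, Bt_gpu, out=Ct_gpu)
# CUDA calls return before the GPU is done; wait so the timing covers the actual compute
cuda.synchronize()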
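
CUDA operations are queued asynchronously, so a timer that stops before the GPU has finished can under-report the real cost. Below is a minimal sketch of a timing helper that waits with `torch.cuda.synchronize()`. The name `time_matmul` is hypothetical, and the sketch uses small matrices and falls back to the CPU when no GPU is present.

```python
import time
import torch

def time_matmul(a, b, repeats=5):
    """Mean seconds per matmul, waiting for the GPU to finish when one is used."""
    out = torch.matmul(a, b)          # warm-up run; also allocates the output
    if a.is_cuda:
        torch.cuda.synchronize()      # finish the warm-up before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b, out=out)
    if a.is_cuda:
        torch.cuda.synchronize()      # wait for all queued kernels to complete
    return (time.perf_counter() - start) / repeats

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.rand(8, 64, 32, dtype=torch.float64, device=device)
b = torch.rand(8, 32, 64, dtype=torch.float64, device=device)

elapsed = time_matmul(a, b)
print(f"{1e6 * elapsed:.2f} microseconds per matmul on {device}")
```

The warm-up call keeps one-time setup costs out of the measurement.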

&nbsp;

### Try again, with our own timing.

- [math expressions in markdown](https://stackoverflow.com/questions/48422762/is-it-possible-to-show-print-output-as-latex-in-jupyter-notebook)

In [None]:
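n_iters = 100
times = []

cuda.synchronize()  # make sure earlier GPU work is done before starting the clock
st = time.time()

for i in range(n_iters):
    t0 = time.time()
    torch.zeros(Ct_gpu.shape, out=Ct_gpu)
    torch.matmul(At_gpu, Bt_gpu, out=Ct_gpu)
    cuda.synchronize()  # CUDA calls are asynchronous; wait for the result
    times.append(time.time() - t0)  # seconds for this iteration

et = time.time()

# Why not be more complicated...
# print(f'{n_iters} iters took {1000000*(et-st)/n_iters:.2f} us per.')
display(Markdown(rf'{n_iters} iters took {1000000*(et-st)/n_iters:.2f}$\mu$s per.'))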

In [None]:
times
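
An alternative to wall-clock timing is to let the GPU timestamp its own work with CUDA events. The sketch below is one way to do that, not the notebook's method: `per_iteration_ms` is a hypothetical helper, the matrices are deliberately small, and a plain `time.perf_counter()` path is used when no GPU is available.

```python
import time
import torch

def per_iteration_ms(a, b, iters=10):
    """Time each matmul separately and return a list of milliseconds."""
    out = torch.matmul(a, b)  # warm-up run; also allocates the output
    results = []
    for _ in range(iters):
        if a.is_cuda:
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            torch.matmul(a, b, out=out)
            end.record()
            torch.cuda.synchronize()  # both events must complete before reading them
            results.append(start.elapsed_time(end))  # elapsed_time returns ms
        else:
            t0 = time.perf_counter()
            torch.matmul(a, b, out=out)
            results.append(1000 * (time.perf_counter() - t0))
    return results

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.rand(8, 64, 32, dtype=torch.float64, device=device)
b = torch.rand(8, 32, 64, dtype=torch.float64, device=device)

ms = per_iteration_ms(a, b)
print(f"min {min(ms):.4f} ms, max {max(ms):.4f} ms over {len(ms)} runs")
```

Looking at the minimum and maximum rather than only the mean makes outliers from other activity on the machine easier to spot.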