### Comparing matrix multiplication speed with CUDA (GPU) vs CPU
Video: https://www.youtube.com/watch?v=r9IqwpMR9TE

Notebook: https://github.com/MariyaSha/CUDA_speedtest/blob/main/CUDA_speedtest.ipynb

#### CPU configuration
```
Intel(R) Core(TM) i9-9900 CPU @ 3.10GHz with 62 GB DDR4 memory
```

#### GPU configuration
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| 32%   72C    P0    48W / 125W |   2295MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
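As a quick sanity check on this card, the benchmark below keeps 8192 x 8192 matrices on the GPU (`matrix_size = 32*256`). The arithmetic sketch assumes float32 (4 bytes per element, the default dtype of `torch.randn`) and ignores caching-allocator and cuBLAS workspace overhead, so it is a lower bound rather than an exact figure.

```python
# Back-of-the-envelope check that the benchmark's tensors fit on the GPU.
# Assumption: float32 elements; allocator and cuBLAS workspace overhead ignored.
MIB = 1024 ** 2

matrix_size = 32 * 256      # same value as in the benchmark cell
bytes_per_element = 4       # float32
# x_gpu and y_gpu, plus up to two result_gpu tensors: in the timing loop the
# previous result is still referenced while the next one is being allocated.
tensors_on_gpu = 4

per_matrix_mib = matrix_size * matrix_size * bytes_per_element / MIB
needed_mib = tensors_on_gpu * per_matrix_mib

total_mib = 8192            # from the nvidia-smi output above
already_used_mib = 2295     # from the nvidia-smi output above
free_mib = total_mib - already_used_mib

print(f"one {matrix_size}x{matrix_size} float32 matrix: {per_matrix_mib:.0f} MiB")
print(f"peak for the benchmark tensors: {needed_mib:.0f} MiB")
print(f"free before the run: {free_mib} MiB")
print("fits:", needed_mib <= free_mib)
```

Each matrix takes 256 MiB, so even the peak of about 1 GiB leaves plenty of headroom on an 8 GiB card.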

```python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print("using", device, "device")
```

```
using cuda device
```
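Two details matter when timing GPU work like the matmul in the next cell: CUDA kernels are launched asynchronously, so the clock must only stop after `torch.cuda.synchronize()`, and the first call pays one-time setup costs. The helper below is a hypothetical, dependency-free sketch (`benchmark` is not a PyTorch function) that synchronizes around each run, discards warm-up calls and reports the median. PyTorch's own `torch.utils.benchmark.Timer` covers the same ground if a library solution is preferred.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 1,
              repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in seconds.

    Hypothetical helper. `sync` is called before starting and before stopping
    the clock (pass torch.cuda.synchronize for GPU work). The first `warmup`
    calls are executed but not timed, which keeps one-time setup costs out of
    the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if sync is None:
        sync = lambda: None  # CPU work needs no synchronization

    for _ in range(warmup):
        fn()
    sync()

    timings = []
    for _ in range(repeats):
        sync()  # make sure no earlier work is still running
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        sync()  # wait for the timed work itself to finish
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without PyTorch installed.
    t = benchmark(lambda: sum(i * i for i in range(100_000)), repeats=3)
    print(f"median: {t:.4f} s")
```

With the notebook's tensors this would be called as `benchmark(lambda: torch.matmul(x_gpu, y_gpu), sync=torch.cuda.synchronize)` for the GPU and `benchmark(lambda: torch.matmul(x, y))` for the CPU, so both sides get the same warm-up and repeat treatment.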


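```python
import time

# 32 * 256 = 8192, so each matrix is 8192 x 8192 float32 (256 MiB)
matrix_size = 32*256

# torch.randn creates the tensors on the CPU by default
x = torch.randn(matrix_size, matrix_size)
y = torch.randn(matrix_size, matrix_size)

print("************ CPU SPEED ***************")
start = time.time()
result = torch.matmul(x, y)
print(time.time() - start)
print("verify device:", result.device)

# Copy the inputs to the selected device; the copy is not part of the timing
x_gpu = x.to(device)
y_gpu = y.to(device)
if device.type == "cuda":
    torch.cuda.synchronize()  # make sure the copies are done before timing

for i in range(3):
    print("************ GPU SPEED ***************")
    start = time.time()
    result_gpu = torch.matmul(x_gpu, y_gpu)
    # CUDA kernels run asynchronously; synchronize so the timer covers the matmul
    if device.type == "cuda":
        torch.cuda.synchronize()
    print(time.time() - start)
    print("verify device:", result_gpu.device)
```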

```
************ CPU SPEED ***************
2.0924489498138428
verify device: cpu
************ GPU SPEED ***************
0.20512938499450684
verify device: cuda:0
************ GPU SPEED ***************
0.17242193222045898
verify device: cuda:0
************ GPU SPEED ***************
0.17338085174560547
verify device: cuda:0
```
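The recorded timings can be turned into throughput with the usual count of 2n³ floating-point operations for a square n x n matmul (n³ multiply-adds). This sketch hard-codes the numbers above and treats the first GPU run as warm-up; that is an assumption (the extra ~30 ms is consistent with one-time CUDA/cuBLAS setup, but the notebook does not isolate it).

```python
# Convert the recorded timings into throughput and speedup.
# A square n x n matmul performs n**3 multiply-adds, counted as 2 * n**3 FLOPs.
n = 32 * 256                          # matrix_size from the benchmark cell
flops = 2 * n ** 3

cpu_seconds = 2.0924489498138428      # the single CPU run
gpu_seconds = [0.20512938499450684,   # first GPU run (assumed to include warm-up)
               0.17242193222045898,
               0.17338085174560547]
gpu_steady = min(gpu_seconds[1:])     # fastest run after the first one

print(f"work per matmul: {flops / 1e12:.2f} TFLOP")
print(f"CPU: {flops / cpu_seconds / 1e9:.0f} GFLOP/s")
print(f"GPU: {flops / gpu_steady / 1e12:.2f} TFLOP/s")
print(f"speedup: {cpu_seconds / gpu_steady:.1f}x")
```

With these timings the CPU reaches roughly 525 GFLOP/s and the GPU roughly 6.4 TFLOP/s, a speedup of about 12x. Only one CPU run was recorded, so that side of the comparison has no warm-up or repeats.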
