# The CPU

Colab link [here](https://colab.research.google.com/drive/1uyPnxmF6-_zs_DWmar1JzRweQi-SiwFU?usp=sharing)

We all know that the CPU is the heart of a computer. However, it isn't the best processor for training models. PyTorch lets us use the GPU, which can perform many computations in parallel, to speed up training. So, how do we do this?

***
# CUDA

CUDA is NVIDIA's parallel computing platform and the most widely used way to accelerate training in PyTorch. It requires an NVIDIA GPU.

Using CUDA with PyTorch is extremely simple. Let's look at an example below.

In [2]:
# import necessary module
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cuda
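Once CUDA is detected, it can help to check which GPU(s) PyTorch actually sees. The sketch below uses `torch.cuda.device_count()`, `torch.cuda.get_device_name()` and `torch.version.cuda`; it is written so that it also runs on a CPU-only machine, where it simply reports that no CUDA device was found.

```python
import torch

if torch.cuda.is_available():
    count = torch.cuda.device_count()
    # each GPU is addressed as cuda:0, cuda:1, ...
    for i in range(count):
        print(f'cuda:{i} -> {torch.cuda.get_device_name(i)}')
else:
    count = 0
    print('No CUDA device found; falling back to the CPU.')

# CUDA version this PyTorch build was compiled against (None for CPU-only builds)
print(f'PyTorch built with CUDA: {torch.version.cuda}')
```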


***
# MPS

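MPS (Metal Performance Shaders) is PyTorch's GPU backend for Apple silicon, playing a role similar to CUDA. A nice feature of Apple silicon is unified memory: the GPU shares the machine's RAM rather than being limited to the separate, usually smaller, VRAM of a discrete GPU.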

It is selected just like CUDA.

In [None]:
device = 'mps' if torch.backends.mps.is_available() else 'cpu'
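The two checks above can be combined so the same notebook runs unchanged on an NVIDIA machine, a Mac, or a CPU-only machine. This is a minimal sketch; the helper name `pick_device` is hypothetical, and the preference order (CUDA, then MPS, then CPU) is just one reasonable choice.

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then Apple's MPS backend, then the CPU."""
    if torch.cuda.is_available():
        return torch.device('cuda')
    # getattr guard: older PyTorch builds (before 1.12) have no torch.backends.mps
    mps = getattr(torch.backends, 'mps', None)
    if mps is not None and mps.is_available():
        return torch.device('mps')
    return torch.device('cpu')

device = pick_device()
print(device)
```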

If CUDA or MPS is available, `device` now refers to the GPU; otherwise it falls back to the CPU.

<br>

Now that our device is defined, we need to move our tensors to the GPU. This is done with `.to(device)`.

In [3]:
tensor1 = torch.ones(3, 3)
print(f'Device upon initialization: {tensor1.device}')

# moving the tensor to the gpu
tensor1 = tensor1.to(device)
print(f'Device after moving: {tensor1.device}')

# we can also specify the device when creating a tensor
tensor2 = torch.ones(2, 2, device=device)
print(f'Device upon initialization: {tensor2.device}')

Device upon initialization: cpu
Device after moving: cuda:0
Device upon initialization: cuda:0
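Note that the cell above reassigns `tensor1 = tensor1.to(device)`. `Tensor.to()` returns a tensor on the target device and leaves the original untouched, so calling it without keeping the result has no effect. The sketch below shows this, and also moving data back with `.cpu()`, which is needed before converting to NumPy because NumPy arrays live in CPU memory (it assumes NumPy is installed).

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

t = torch.ones(3, 3)
t.to(device)          # the returned tensor is discarded here
print(t.device)       # still cpu: Tensor.to() does not modify t in place

t = t.to(device)      # reassign to keep the tensor on the target device
print(t.device)

# .numpy() only works on CPU tensors, so move back with .cpu() first
arr = t.cpu().numpy()
print(type(arr))
```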


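Operations on these tensors now run on the GPU, which is usually much faster for large, parallel workloads such as matrix multiplication. Keep in mind that by default everything is created on the CPU and must be moved to the GPU manually. This includes models and any tensors used in loss calculations, such as targets.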
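To see the speedup on your own hardware, you can time the same operation on each device. This is a rough sketch, not a rigorous benchmark; the helper `time_matmul` and its default sizes are hypothetical choices. One detail matters: CUDA kernels are launched asynchronously, so the sketch calls `torch.cuda.synchronize()` before reading the clock, otherwise it would only measure launch time.

```python
import time
import torch

def time_matmul(device: str, size: int = 1024, repeats: int = 5) -> float:
    """Hypothetical helper: average seconds per (size x size) matmul on `device`."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    a @ b  # warm-up run (e.g. CUDA context and kernel initialization)
    if device == 'cuda':
        torch.cuda.synchronize()  # wait for queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device == 'cuda':
        torch.cuda.synchronize()  # make sure all timed kernels have finished
    return (time.perf_counter() - start) / repeats

print(f'cpu:  {time_matmul("cpu"):.4f} s per matmul')
if torch.cuda.is_available():
    print(f'cuda: {time_matmul("cuda"):.4f} s per matmul')
```

Small matrices may show little or no GPU advantage, since data transfer and kernel launch overhead dominate; the gap typically grows with problem size.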

<br>

Let's see how to move a model to the GPU.

In [4]:
# import necessary modules
import torch.nn as nn
import torch.nn.functional as F

class TestModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1 = nn.Linear(10, 5)
    self.layer2 = nn.Linear(5, 1)

  def forward(self, x):
    x = self.layer1(x)
    x = F.relu(x)
    x = self.layer2(x)
    return x

model = TestModel()
print(f'Model device upon initialization: {model.layer1.weight.device}')
# unlike Tensor.to(), Module.to() moves the parameters in place (and returns the module)
model.to(device)
print(f'Model device after moving: {model.layer1.weight.device}')

Model device upon initialization: cpu
Model device after moving: cuda:0
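Putting the pieces together, a training step needs the model, the inputs, and the targets all on the same device before the loss is computed. The sketch below assumes a small stand-in model with the same layer sizes as above, random hypothetical data, `nn.MSELoss`, and SGD. It moves the model before building the optimizer, which is the order the PyTorch docs recommend, and moves each batch inside the step.

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# stand-in model with the same shapes as TestModel (10 -> 5 -> 1)
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
model.to(device)  # move parameters first ...
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # ... then build the optimizer
loss_fn = nn.MSELoss()

# hypothetical batch: 8 samples with 10 features each, created on the CPU
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

def train_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    # the batch must be on the same device as the model's parameters
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    preds = model(inputs)
    loss = loss_fn(preds, targets)
    loss.backward()
    optimizer.step()
    return loss.item()  # .item() copies the scalar back to a Python float

loss_value = train_step(inputs, targets)
print(f'loss: {loss_value:.4f}')
```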


***
# Important Note

Operations between tensors on different devices generally fail with a `RuntimeError` (a zero-dimensional CPU tensor, i.e. a scalar, combined with a CUDA tensor is a notable exception). Make sure all tensors involved in a calculation are on the same device, using `.to(device)` where needed.

Here is the error you will get when mixing CPU and GPU tensors.

In [5]:
cpu_tensor = torch.ones(3, 3)

gpu_tensor = torch.ones(3, 3, device=device)

try:
  print(cpu_tensor + gpu_tensor)
except Exception as e:
  print(e)

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
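The fix is to move one operand to the other's device before the operation. A convenient pattern, sketched below, is to use `other.device` as the target so the code does not depend on a global `device` variable. The last lines illustrate the scalar exception mentioned above: a 0-dim CPU tensor can be combined with a CUDA tensor, and the result stays on the CUDA tensor's device.

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

cpu_tensor = torch.ones(3, 3)
gpu_tensor = torch.ones(3, 3, device=device)

# move the CPU tensor to wherever the other operand lives
result = cpu_tensor.to(gpu_tensor.device) + gpu_tensor
print(result.device)

# a 0-dim (scalar) CPU tensor is accepted as an operand alongside a CUDA tensor
scaled = gpu_tensor * torch.tensor(2.0)
print(scaled.device)
```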
