## The previous 9 notebooks used PyTorch 1.x features; this notebook summarises the new features introduced in PyTorch 2.x

## `torch.compile` introduces various speed-ups to model training and prediction, especially for transformer-based models

## Before 2.0

In [None]:
import torch
import torchvision

model = torchvision.models.resnet50()

In [None]:
torch.__version__

'2.1.0+cu121'

## After 2.0

In [None]:
import torch
import torchvision

model = torchvision.models.resnet50() # could be any model
compiled_model = torch.compile(model) # NEW IN 2.0 --> faster model



The speedups are relative to how heavily the GPU is used: the more parameters a model has, the greater the speedup tends to be. A larger model still takes longer to train in absolute terms, but it trains relatively faster. E.g. a model with 1M parameters may take 10 min to train, while a model with 10M parameters may take 20 min (instead of 100 min).

To make fuller use of the GPU, you can:

* Increase batch size
* Increase data size (e.g. 224x224 instead of 32x32)
* Increase model size (use more parameters)
* Decrease data transfer: bandwidth costs (transferring data) slow down a GPU (it wants to compute on data)

As a result of doing the above, the **relative** speedups should be better.
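The 1M versus 10M parameter example above can be worked through as plain arithmetic. This is a minimal sketch that assumes training time would otherwise scale linearly with parameter count (which is where the 100 min figure comes from); the variable names are illustrative only.

```python
# Figures taken from the example in the text
small_params, small_minutes = 1_000_000, 10
large_params, large_minutes = 10_000_000, 20

# Assumption: without any speedup, time grows linearly with parameter count
expected_large_minutes = small_minutes * (large_params / small_params)

# Relative speedup = time we would have expected / time actually observed
relative_speedup = expected_large_minutes / large_minutes

print(f"Expected without speedup: {expected_large_minutes:.0f} min")
print(f"Observed: {large_minutes} min")
print(f"Relative speedup: {relative_speedup:.1f}x")
```

The larger model is slower in absolute terms (20 min versus 10 min) but five times faster than the linear estimate, which is the sense in which the speedup is relative.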

## Globally set devices + context manager

Previously we set the device of our models/tensors with `tensor.to(device)` or `model.to(device)`.

You can now set the device with a context manager, or set it globally.

In [None]:
import torch

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Set device with context manager NEW
with torch.device(device):
  # All tensors or pytorch objects created in this context manager will be on target device
  layer = torch.nn.Linear(20, 30)
  print(f"Layer weights are on device: {layer.weight.device}")
  print(f"Layer creating data on device: {layer(torch.randn(128, 20)).device}")

Layer weights are on device: cpu
Layer creating data on device: cpu


In [None]:
import torch

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Set device globally NEW
torch.set_default_device(device)

# All tensors or PyTorch objects created will be on the target device
layer = torch.nn.Linear(20, 30)
print(f"Layer weights are on device: {layer.weight.device}")
print(f"Layer creating data on device: {layer(torch.randn(128, 20)).device}")

Layer weights are on device: cpu
Layer creating data on device: cpu
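Because the outputs above were produced on a CPU-only machine, the effect of the two approaches is hard to see. The sketch below uses PyTorch's `meta` device (tensors that carry shape but no data) purely as a stand-in target, which is an assumption of this example and not something the notebook itself uses. It suggests that the context manager is scoped to its block, while `torch.set_default_device` stays in effect until it is changed again.

```python
import torch

# Context manager: only tensors created inside the block use the target device
with torch.device("meta"):
  inside = torch.zeros(2, 3)
outside = torch.zeros(2, 3)  # back to the default device

print(f"Inside context manager: {inside.device.type}")
print(f"Outside context manager: {outside.device.type}")

# Global setting: applies to everything created afterwards
torch.set_default_device("meta")
global_tensor = torch.zeros(2, 3)

# Restore the usual default so later code is unaffected
torch.set_default_device("cpu")
restored = torch.zeros(2, 3)

print(f"After set_default_device('meta'): {global_tensor.device.type}")
print(f"After restoring the default: {restored.device.type}")
```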


## Other speedups with TF32

TF32 = TensorFloat32, a datatype that bridges Float32 (FP32, the default in PyTorch, more accurate) and Float16 (FP16, faster computation).

TensorFloat32 is a format from NVIDIA that aims to combine the range of float32 with the speed of float16: it keeps float32's 8 exponent bits but only 10 mantissa bits, the same as float16.

Available on NVIDIA Ampere GPUs and newer.

Usually works when the GPU compute capability score is >= (8, 0).
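To get a feel for the precision trade-off, the sketch below imitates TF32 in pure Python by keeping only 10 of float32's 23 mantissa bits. It assumes the commonly documented TF32 layout (1 sign bit, 8 exponent bits, 10 mantissa bits) and uses simple truncation, whereas real hardware may round differently; `truncate_to_tf32` is a hypothetical helper, not a PyTorch function.

```python
import struct

def truncate_to_tf32(value: float) -> float:
  """Approximate TF32 by zeroing the lowest 13 mantissa bits of a float32."""
  bits = struct.unpack("<I", struct.pack("<f", value))[0]
  bits &= 0xFFFFE000  # keep sign (1) + exponent (8) + top 10 mantissa bits
  return struct.unpack("<f", struct.pack("<I", bits))[0]

# Precision: steps smaller than 2**-10 (relative to 1.0) are lost
print(truncate_to_tf32(1.0 + 2**-10))
print(truncate_to_tf32(1.0 + 2**-11))

# Range: values far beyond float16's maximum (65504) are still representable
print(truncate_to_tf32(1e38))
```

The second value collapses to `1.0` because its extra bit sits below the 10 kept mantissa bits, while the large value survives because the exponent bits are untouched.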


In [None]:
# Get GPU capability score (requires a CUDA-capable GPU and an NVIDIA driver)
GPU_SCORE = torch.cuda.get_device_capability()
print(f"GPU capability score: {GPU_SCORE}")
if GPU_SCORE >= (8, 0):
  torch.backends.cuda.matmul.allow_tf32 = True # if results are inconsistent, change back to False
else:
  torch.backends.cuda.matmul.allow_tf32 = False

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
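The error above comes from calling `torch.cuda.get_device_capability()` on a machine without an NVIDIA GPU or driver. One way to avoid it is to check `torch.cuda.is_available()` first. The following is a minimal sketch of that guard; `configure_tf32` is a hypothetical helper name introduced here for illustration.

```python
import torch

def configure_tf32() -> bool:
  """Enable TF32 matmuls only on a CUDA GPU with capability >= (8, 0)."""
  if not torch.cuda.is_available():
    # No GPU or driver: skip the capability query, which would raise RuntimeError
    torch.backends.cuda.matmul.allow_tf32 = False
    return False

  gpu_score = torch.cuda.get_device_capability()
  print(f"GPU capability score: {gpu_score}")
  use_tf32 = gpu_score >= (8, 0)  # tuple comparison, e.g. (8, 6) >= (8, 0)
  torch.backends.cuda.matmul.allow_tf32 = use_tf32
  return use_tf32

tf32_enabled = configure_tf32()
print(f"TF32 matmuls enabled: {tf32_enabled}")
```

On a CPU-only machine this reports that TF32 is disabled instead of raising, so the rest of the notebook can keep running.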