<a href="https://colab.research.google.com/github/gauthiermartin/pytorch-deep-learning-course/blob/main/10_Introduction_to_pytorch_2_0_and_torch_compile.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch 2.0 Quick Intro

* Reference Book Chapter - https://www.learnpytorch.io/pytorch_2_intro/
* PyTorch 2.0 Release Notes - https://pytorch.org/blog/pytorch-2.0-release/

In [1]:
import torch
print(torch.__version__)

2.1.0+cu118


In [2]:
!nvidia-smi

Thu Nov  2 11:39:30 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    24W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Quick code examples

### Before PyTorch 2.0

In [1]:
import torch
import torchvision

model = torchvision.models.resnet50()

### After PyTorch 2.0

In [2]:
import torch
import torchvision

model = torchvision.models.resnet50() # note: this could be any model
compiled_model = torch.compile(model)

## 0. Getting Setup

In [3]:
import torch
import torchvision

# Check PyTorch version
pt_version = torch.__version__
print(f"[INFO] Current PyTorch version: {pt_version} (should be 2.x+)")

# Install PyTorch 2.0 if necessary
if pt_version.split(".")[0] == "1": # Check if PyTorch version begins with 1
    !pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    print("[INFO] PyTorch 2.x installed, if you're on Google Colab, you may need to restart your runtime.\
          Though as of April 2023, Google Colab comes with PyTorch 2.0 pre-installed.")
    import torch
    pt_version = torch.__version__
    print(f"[INFO] Current PyTorch version: {pt_version} (should be 2.x+)")
else:
    print("[INFO] PyTorch 2.x installed, you'll be able to use the new features.")



[INFO] Current PyTorch version: 2.1.0+cu118 (should be 2.x+)
[INFO] PyTorch 2.x installed, you'll be able to use the new features.
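
The check above compares the first field of the version string to `"1"`. A slightly more general variant, sketched below with only the standard library, parses the major version as an integer so that any version below 2 is treated the same way. The helper name `major_version` is hypothetical and not part of PyTorch.

```python
def major_version(version: str) -> int:
    """Return the major version number from a string such as '2.1.0+cu118'."""
    # Drop any local build suffix (e.g. '+cu118'), then take the first dotted field
    public_part = version.split("+")[0]
    return int(public_part.split(".")[0])


# Example strings in the style of torch.__version__
for example in ["1.13.1+cu117", "2.1.0+cu118"]:
    supported = major_version(example) >= 2
    print(f"{example}: PyTorch 2.x features available = {supported}")
```

In the notebook you would pass `torch.__version__` instead of the hard-coded example strings.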


## 1. Get GPU info

Why get GPU info?

Because PyTorch 2.0 features leverage the hardware features of newer NVIDIA GPUs.

Well, what's a newer NVIDIA GPU?

To find out, look up your GPU's NVIDIA compute capability score - https://developer.nvidia.com/cuda-gpus


If your GPU has a score of 8.0+, it can leverage **most** of the new PyTorch 2.0 features. Some of these features will still have an impact on GPUs with a lower score, but the impact will be smaller.

**Note:** If you are wondering which GPU to use for deep learning, check out Tim Dettmers' blog post "Which GPU for deep learning" - https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
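
`torch.cuda.get_device_capability()` returns the score as a `(major, minor)` tuple, and Python compares tuples element by element, so the 8.0+ threshold can be checked with a plain comparison. A minimal sketch with hard-coded scores, so it runs without a GPU (the helper name `has_full_speedups` is hypothetical; the V100 and A100 scores are the ones listed in NVIDIA's table):

```python
# Threshold used in this notebook: compute capability 8.0 or higher
MIN_SCORE = (8, 0)


def has_full_speedups(score):
    """Return True if a (major, minor) capability score meets the 8.0+ threshold."""
    return tuple(score) >= MIN_SCORE


# Hard-coded example scores; on a real machine use torch.cuda.get_device_capability()
example_scores = {"Tesla V100": (7, 0), "NVIDIA A100": (8, 0)}

for name, score in example_scores.items():
    print(f"{name} {score}: score of 8.0+ = {has_full_speedups(score)}")
```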

In [4]:
# Make sure we're using a NVIDIA GPU
if torch.cuda.is_available():
  gpu_info = !nvidia-smi
  gpu_info = '\n'.join(gpu_info)
  if gpu_info.find("failed") >= 0:
    print("Not connected to a GPU, to leverage the best of PyTorch 2.0, you should connect to a GPU.")

  # Get GPU name
  gpu_name = !nvidia-smi --query-gpu=gpu_name --format=csv
  gpu_name = gpu_name[1]
  GPU_NAME = gpu_name.replace(" ", "_") # replace spaces with underscores for easier saving
  print(f'GPU name: {GPU_NAME}')

  # Get GPU capability score
  GPU_SCORE = torch.cuda.get_device_capability()
  print(f"GPU capability score: {GPU_SCORE}")
  if GPU_SCORE >= (8, 0):
    print(f"GPU score higher than or equal to (8, 0), PyTorch 2.x speedup features available.")
  else:
    print(f"GPU score lower than (8, 0), PyTorch 2.x speedup features will be limited (PyTorch 2.x speedups happen most on newer GPUs).")

  # Print GPU info
  print(f"GPU information:\n{gpu_info}")

else:
  print("PyTorch couldn't find a GPU, to leverage the best of PyTorch 2.0, you should connect to a GPU.")

GPU name: NVIDIA_A100-SXM4-40GB
GPU capability score: (8, 0)
GPU score higher than or equal to (8, 0), PyTorch 2.x speedup features available.
GPU information:
Fri Nov  3 11:40:59 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    42W / 400W |      3MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
        

### 1.1 Globally set devices

Previously, we've set the device of our tensors/models using `.to(device)`:

* `tensor.to(device)`
* `model.to(device)`

But in PyTorch 2.0 it is also possible to set the device with a context manager, or globally with a default device.

Docs:
- https://pytorch.org/tutorials/recipes/recipes/changing_default_device.html?highlight=device%20context%20manager
- https://pytorch.org/docs/stable/generated/torch.set_default_device.html

In [6]:
import torch

# Set the device

device = "cuda" if torch.cuda.is_available() else "cpu" # note the call: without (), the function object is always truthy

# Set the device with context manager (Requires PyTorch 2.x +)
with torch.device(device):
    # All tensors / models created within context manager will be on device without using the `to` method
    layer = torch.nn.Linear(10, 10)

    print(f"Model on Device: {layer.weight.device}")

Model on Device: cuda:0


In [10]:
# Set the device globally (Requires PyTorch 2.x +)
torch.set_default_device(device)

# All tensors / models created from here on will be on the default device without using the `to` method
layer = torch.nn.Linear(10, 10)

print(f"Model on Device: {layer.weight.device}")

Model on Device: cuda:0


In [11]:
# Set the device globally (Requires PyTorch 2.x +)
torch.set_default_device("cpu")

# With the default reset to "cpu", new tensors / models are created on the CPU again
layer = torch.nn.Linear(10, 10)

print(f"Model on Device: {layer.weight.device}")

Model on Device: cpu
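
One difference between the two approaches is scope: the context manager only affects code inside the `with` block, while `torch.set_default_device` stays in effect until it is changed again. The sketch below illustrates this, assuming PyTorch 2.x is installed. It uses the `"meta"` device (tensors that carry a shape but no data) instead of `"cuda"`, purely so that it also runs on a machine without a GPU.

```python
import torch

# Inside the context manager, new tensors are created on the given device
with torch.device("meta"):
    inside = torch.empty(2, 2)

# Outside the block, the previous default device applies again
outside = torch.empty(2, 2)

# A global default persists across statements until it is changed
torch.set_default_device("meta")
persistent = torch.empty(2, 2)
torch.set_default_device("cpu")  # reset so later code is unaffected
after_reset = torch.empty(2, 2)

print(f"Inside context manager:  {inside.device}")
print(f"Outside context manager: {outside.device}")
print(f"With global default set: {persistent.device}")
print(f"After resetting default: {after_reset.device}")
```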


## 2. Setting up the experiments

Time to test speed!

To keep things simple, we will run 4 experiments, all using:


* Model: ResNet50 from torchvision
* Data: CIFAR10 from torchvision
* Epochs: 4 (single run) and 3x5 (multi-run)
* Batch size: 128

**Note:** You may want to change these settings (the batch size in particular) depending on the amount of GPU memory available.



In [12]:
!nvidia-smi

Fri Nov  3 11:59:38 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    47W / 400W |   1137MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [15]:
import torch
import torchvision

print(f"PyTorch version: {torch.__version__}")
print(f"TorchVision version: {torchvision.__version__}")

# Set the target device
device = "cuda" if torch.cuda.is_available() else "cpu"

torch.set_default_device(device)
print(f"Using Default Device: {device}")

PyTorch version: 2.1.0+cu118
TorchVision version: 0.16.0+cu118
Using Default Device: cuda


In [17]:
# Create model weights and transforms
model_weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2 # <- use the latest weights (could also use .DEFAULT)
transforms = model_weights.transforms()

# Setup model
model = torchvision.models.resnet50(weights=model_weights)

# Count the number of parameters in the model
total_params = sum(
    param.numel() for param in model.parameters() # <- all params
)

total_trainable_parms = sum(
    param.numel() for param in model.parameters() if param.requires_grad # <- only trainable params
)

print(f"Total parameters of model: {total_params} (the more parameters, the more GPU memory the model will use, the more *relative* of a speedup you'll get)")
print(f"Total trainable parameters of model: {total_trainable_parms}")
print(f"Model transforms:\n{transforms}")

Total parameters of model: 25557032 (the more parameters, the more GPU memory the model will use, the more *relative* of a speedup you'll get)
Total trainable parameters of model: 25557032
Model transforms:
ImageClassification(
    crop_size=[224]
    resize_size=[232]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BILINEAR
)


**Note:** PyTorch 2.x *relative* speedups will be most noticeable when as much of the GPU as possible is being used. This means a larger model (more trainable parameters) will tend to have a larger relative speedup.


In [19]:
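def create_model(num_classes=10):
  """Creates a ResNet50 with pretrained weights and a new classifier head."""
  model_weights = torchvision.models.ResNet50_Weights.DEFAULT
  transforms = model_weights.transforms()

  # Setup model
  model = torchvision.models.resnet50(weights=model_weights)

  # Replace the classifier head so the output size matches the number of classes
  model.fc = torch.nn.Linear(in_features=2048, out_features=num_classes)

  return model, transforms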

In [22]:
model, transforms = create_model()
transforms

ImageClassification(
    crop_size=[224]
    resize_size=[232]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BILINEAR
)

### 2.2 Speedups are most noticeable when a large portion of the GPU is being used

Since modern GPUs are so *fast* at performing operations, you will often notice the majority of *relative* speedups when as much data as possible is on the GPU.

In practice, you generally want to use as much of the GPU memory as possible.

* Increasing batch size - With a larger-memory GPU you can increase the batch size to 64, 128, 256, 512 ...
* Increasing data size - Use 224x224 images instead of 32x32, or even 336x336
* Increasing model size - For example, from 1M params to 10M params
* Decreasing data transfer - Bandwidth costs (transferring data) slow down a GPU, because it sits waiting for data to compute on

As a result of doing the above, your relative speedup should be better.

E.g. overall training time may increase, but not linearly.
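
To get a rough feel for how much more data the GPU receives when the batch size and image size go up, here is a back-of-the-envelope sketch. It assumes 3-channel images stored as float32 (4 bytes per value) and counts only the input batch, not activations, gradients or model weights. The helper name `batch_mebibytes` is hypothetical.

```python
def batch_mebibytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Size of one image batch in MiB (assumes float32 values by default)."""
    return batch_size * channels * height * width * bytes_per_value / 1024**2


small = batch_mebibytes(batch_size=128, height=32, width=32)
large = batch_mebibytes(batch_size=128, height=224, width=224)

print(f"32x32 batch:   {small:.1f} MiB")
print(f"224x224 batch: {large:.1f} MiB ({large / small:.0f}x more data per batch)")
```

With a batch size of 128, moving from 32x32 to 224x224 images gives the GPU 49x more input data per batch to work on.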


Resource




