# Introduction

In this notebook, we will practice using **PyTorch** to **save and load models**. We will train a **CNN** on the **CIFAR-10 image classification** task, save a checkpoint, and then reload it for evaluation.

<br>

**Libraries**

In [1]:
# Core
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_style('darkgrid')
sns.set(style='darkgrid', font_scale=1.4)
import matplotlib.pyplot as plt
%matplotlib inline
import random

# Sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer

# Pytorch
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.data import Dataset, DataLoader
import torchvision
from torchvision import transforms

**Reproducibility**

In [2]:
# Random seeds
def set_seed(seed=0):
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
set_seed()
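
A quick way to check that seeding works is to reseed and draw the same random numbers twice. The sketch below repeats the `set_seed` definition so that it runs on its own; the names `first` and `second` are illustrative only.

```python
import random

import numpy as np
import torch


def set_seed(seed=0):
    # Seed every random number generator used in this notebook
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)


# Draw random numbers, reseed, then draw again
set_seed(0)
first = (random.random(), np.random.rand(), torch.rand(3))

set_seed(0)
second = (random.random(), np.random.rand(), torch.rand(3))

# The two draws match because each generator restarted from the same seed
print(first[0] == second[0])
print(first[1] == second[1])
print(torch.equal(first[2], second[2]))
```

Note that fixed seeds make runs repeatable on the same setup, but some GPU operations may still be non-deterministic.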

**Config**

In [3]:
# Hyperparameters
BATCH_SIZE = 64
LEARNING_RATE = 0.01
N_EPOCHS = 20

Connect to **GPU** if available, otherwise use the **CPU**.

In [4]:
# Config device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

To use the **device**, in general we need to send the **model**, **features** and **labels** to the device.
1. `model.to(device)`
2. `features = features.to(device)`
3. `labels = labels.to(device)`
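
The three steps above can be sketched with a tiny stand-in model. The names `toy_model`, `features` and `labels` and the tensor sizes are assumptions made up for this illustration.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A tiny stand-in model and batch (hypothetical, for illustration only)
toy_model = nn.Linear(in_features=4, out_features=2).to(device)
features = torch.randn(8, 4).to(device)
labels = torch.randint(0, 2, (8,)).to(device)

# The forward pass only works when model and inputs share a device
outputs = toy_model(features)
print(outputs.device.type == device.type)
print(outputs.shape)
```

One detail worth remembering: `.to(device)` moves a module in place, whereas for a tensor it returns a new tensor, which is why the tensors are reassigned.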

# Data

A **dataloader** feeds **batches** of training examples to the model. This is especially **useful** for very **large datasets** that don't fit in memory all at once.

In [5]:
# Transformations
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
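
To see what this transformation does numerically: `ToTensor` scales pixel values to the range [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then computes `(x - 0.5) / 0.5` per channel, which maps the values to [-1, 1]. The sketch below reproduces that arithmetic with plain tensors; the fake image `img` is an assumption for illustration.

```python
import torch

# Fake 3-channel "image" with pixels already in [0, 1], as ToTensor would produce
img = torch.tensor([0.0, 0.5, 1.0]).repeat(3, 1).unsqueeze(-1)  # shape (3, 3, 1)

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Per-channel normalisation: (x - mean) / std
normalised = (img - mean) / std

print(normalised[0].flatten().tolist())  # [-1.0, 0.0, 1.0]
```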

In [6]:
# Dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform, download=False)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data


In [7]:
# Dataloader
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)
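
To get a feel for how a dataloader splits a dataset into batches, here is a minimal sketch on synthetic data. The random tensors stand in for CIFAR-10 and the dataset size of 150 is chosen arbitrarily.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10: 150 random "images" of shape 3x32x32
fake_imgs = torch.randn(150, 3, 32, 32)
fake_labels = torch.randint(0, 10, (150,))
fake_dataset = TensorDataset(fake_imgs, fake_labels)

loader = DataLoader(dataset=fake_dataset, batch_size=64, shuffle=False)

# 150 examples with batch size 64 gives batches of 64, 64 and 22
batch_sizes = [imgs.shape[0] for imgs, labels in loader]
print(len(loader), batch_sizes)  # 3 [64, 64, 22]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size, so code that loops over a batch should not assume it always has `BATCH_SIZE` elements. Passing `drop_last=True` to `DataLoader` discards the incomplete batch instead.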

In [8]:
# List of class names
classes = ['plane','car','bird','cat','deer','dog','frog','horse','ship','truck']

# Model

When working with **neural networks**, we **inherit** from `nn.Module` and define a `forward` method. Inheriting gives us access to **methods** like `model.parameters()`, which are used in training.

Note that we **don't** need to apply **softmax** at the end because `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally.
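
The softmax point can be checked directly: cross-entropy on raw scores gives the same value as log-softmax followed by negative log-likelihood. This is a small sketch with made-up logits and targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Raw, unnormalised scores (logits) for 4 examples and 10 classes
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# CrossEntropyLoss applied directly to the logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, manual))
```

If class probabilities are needed at inference time, `torch.softmax(logits, dim=1)` can be applied to the model output separately.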

In [9]:
# Model
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        
        # Layers
        self.conv1=nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.pool=nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2=nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.flat=nn.Flatten()
        self.fc1=nn.Linear(in_features=32*5*5, out_features=128)
        self.fc2=nn.Linear(in_features=128, out_features=64)
        self.fc3=nn.Linear(in_features=64, out_features=10)
        self.relu=nn.ReLU()
    
    def forward(self, x):
        # Block 1
        out = self.conv1(x)
        out = self.relu(out)
        out = self.pool(out)
        
        # Block 2
        out = self.conv2(out)
        out = self.relu(out)
        out = self.pool(out)
        
        # FC1
        out = self.flat(out)
        out = self.fc1(out)
        out = self.relu(out)
        
        # FC2
        out = self.fc2(out)
        out = self.relu(out)
        
        # FC3
        out = self.fc3(out)
        return out

model = ConvNet().to(device)
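
The value `32*5*5` in `fc1` follows from how each layer shrinks the image. For an input of side `n`, a convolution or pooling layer outputs side `floor((n + 2*padding - kernel) / stride) + 1`. The sketch below traces the sizes and cross-checks them with real layers; the helper `conv_out` is a hypothetical name introduced here, and the ReLU layers are left out because they do not change shapes.

```python
import torch
import torch.nn as nn


def conv_out(n, kernel, stride=1, padding=0):
    # Output side length of a convolution or pooling layer on an n x n input
    return (n + 2 * padding - kernel) // stride + 1


side = 32                                  # CIFAR-10 images are 32x32
side = conv_out(side, kernel=5)            # conv1 -> 28
side = conv_out(side, kernel=2, stride=2)  # pool  -> 14
side = conv_out(side, kernel=5)            # conv2 -> 10
side = conv_out(side, kernel=2, stride=2)  # pool  -> 5
print(side, 32 * side * side)              # 5 800

# Cross-check with real layers on a dummy batch of one image
x = torch.zeros(1, 3, 32, 32)
feature_layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.MaxPool2d(2, 2),
    nn.Conv2d(16, 32, kernel_size=5), nn.MaxPool2d(2, 2),
    nn.Flatten(),
)
flat = feature_layers(x)
print(tuple(flat.shape))                   # (1, 800)
```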

# Loss & optimiser

The **loss** function comes from the `torch.nn` module and the **optimiser** from the `torch.optim` module. Note that we need to pass the model **parameters** to the optimiser.

We also use a **learning rate scheduler** (cosine annealing) to gradually reduce the learning rate over the `N_EPOCHS` epochs.

In [10]:
# Loss
loss = nn.CrossEntropyLoss()

# Optimiser
optimiser = optim.SGD(params=model.parameters(), lr=LEARNING_RATE)

# Learning rate scheduler
scheduler = lr_scheduler.CosineAnnealingLR(optimiser, T_max=N_EPOCHS)
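
To see what the cosine schedule does to the learning rate, the sketch below steps a throwaway optimiser and scheduler and compares the result with the closed-form expression `0.5 * lr * (1 + cos(pi * t / T_max))`, which holds for the default `eta_min=0`. The names `demo_optimiser`, `demo_scheduler`, `base_lr` and `n_epochs` are assumptions chosen so that nothing defined above is overwritten.

```python
import math

import torch
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

base_lr = 0.01
n_epochs = 20

# A single dummy parameter is enough to drive the optimiser and scheduler
param = torch.nn.Parameter(torch.zeros(1))
demo_optimiser = optim.SGD(params=[param], lr=base_lr)
demo_scheduler = lr_scheduler.CosineAnnealingLR(demo_optimiser, T_max=n_epochs)

lrs = []
for epoch in range(n_epochs):
    lrs.append(demo_scheduler.get_last_lr()[0])  # learning rate used in this epoch
    demo_optimiser.step()                        # optimiser steps before the scheduler
    demo_scheduler.step()

# Closed-form value of the schedule with eta_min=0
expected = [0.5 * base_lr * (1 + math.cos(math.pi * t / n_epochs)) for t in range(n_epochs)]

print(f'first: {lrs[0]:.5f}, middle: {lrs[10]:.5f}, last: {lrs[-1]:.5f}')
```

The learning rate starts at `base_lr`, reaches half of it midway through training, and is close to zero in the final epoch.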

# Train model

The most **important** thing to remember with PyTorch is that you have to **zero the gradients** on every iteration, i.e. for every batch, because `backward()` **adds** to the existing gradients rather than replacing them. Other than that, each iteration performs the **forward pass**, the **backward pass**, and the **parameter update**.

In [11]:
# Loop over epochs
for epoch in range(N_EPOCHS):
    loss_acc = 0
    train_count = 0
    
    # Loop over batches
    for i, (imgs, labels) in enumerate(train_loader):
        # Move batch to device
        imgs = imgs.to(device)
        labels = labels.to(device)
        
        # Forward pass
        preds = model(imgs)
        L = loss(preds,labels)
        
        # Backprop
        L.backward()
        
        # Update parameters
        optimiser.step()
        
        # Zero gradients
        optimiser.zero_grad()
        
        # Track loss
        loss_acc += L.item()
        train_count += 1
    
    # Update learning rate
    scheduler.step()
        
    # Print loss
    if (epoch+1)%1==0:
        print(f'Epoch {epoch+1}/{N_EPOCHS}, loss {loss_acc/train_count:.5f}')
        
print('')
print('Training complete')

Epoch 1/20, loss 2.29980
Epoch 2/20, loss 2.18798
Epoch 3/20, loss 1.91091
Epoch 4/20, loss 1.71794
Epoch 5/20, loss 1.59358
Epoch 6/20, loss 1.50920
Epoch 7/20, loss 1.44012
Epoch 8/20, loss 1.38323
Epoch 9/20, loss 1.33632
Epoch 10/20, loss 1.29362
Epoch 11/20, loss 1.25689
Epoch 12/20, loss 1.22460
Epoch 13/20, loss 1.19628
Epoch 14/20, loss 1.17518
Epoch 15/20, loss 1.15627
Epoch 16/20, loss 1.14290
Epoch 17/20, loss 1.13267
Epoch 18/20, loss 1.12412
Epoch 19/20, loss 1.11993
Epoch 20/20, loss 1.11819

Training complete
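
The reason the training loop calls `zero_grad()` once per batch can be seen on a one-parameter example: each call to `backward()` adds to the stored gradient. This is a minimal sketch; `w` and `demo_optimiser` are hypothetical names, and the optimiser is never stepped, so the weight itself does not change.

```python
import torch

# One trainable weight and a loss whose gradient is easy to check: d(3w)/dw = 3
w = torch.tensor([1.0], requires_grad=True)
demo_optimiser = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zeroing: the gradients add up
(3 * w).sum().backward()
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Zero the gradients, then run one more backward pass
demo_optimiser.zero_grad()
(3 * w).sum().backward()
fresh = w.grad.clone()

print(accumulated.item(), fresh.item())  # 6.0 3.0
```

Calling `zero_grad()` right after `step()`, as the loop above does, has the same effect as calling it just before the next `backward()`.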


# Save model

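`torch.save` lets us store the **current state** of the **model** together with other objects such as the **optimiser** and **learning rate scheduler** states, the epoch number and the latest loss. Bundling everything into one **dictionary** (a checkpoint) makes it possible to resume training later, not just run inference.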

In [12]:
# Path
PATH = "ConvNet.pt"

# Save
torch.save({
            'epoch': N_EPOCHS,
            'model_state_dict': model.state_dict(),
            'optimiser_state_dict': optimiser.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'loss': loss_acc/train_count,
            }, PATH)
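
It can help to look at what a state dictionary actually contains. The sketch below uses a small stand-in model rather than `ConvNet`; `toy_model` and `toy_optimiser` are assumptions for illustration.

```python
import torch.nn as nn
import torch.optim as optim

# Small stand-in model (hypothetical; the notebook's ConvNet works the same way)
toy_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
toy_optimiser = optim.SGD(params=toy_model.parameters(), lr=0.01)

# The model state dict maps parameter names to tensors
model_keys = list(toy_model.state_dict().keys())
for name, tensor in toy_model.state_dict().items():
    print(name, tuple(tensor.shape))

# The optimiser state dict holds hyperparameters and per-parameter state
optimiser_keys = list(toy_optimiser.state_dict().keys())
print(optimiser_keys)
```

Because a state dict holds only tensors and plain Python values, not the class definition, the model class has to be available again when the checkpoint is loaded.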

# Load model

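First we need to **initialise** the (same) **model**, **optimiser** and **scheduler**, and then load the saved **dictionary** from disk and restore each state. Call `model.eval()` before running **inference** so that layers such as dropout and batch normalization switch to **evaluation mode** (this model has neither, but it is a good habit); call `model.train()` instead to resume training.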

In [13]:
# Initialise model and optimiser
model = ConvNet().to(device)
optimiser = optim.SGD(params=model.parameters(), lr=LEARNING_RATE)
scheduler = lr_scheduler.CosineAnnealingLR(optimiser, T_max=N_EPOCHS)

# Load checkpoint
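# map_location lets a checkpoint saved on GPU be loaded on a CPU-only machine (and vice versa)
checkpoint = torch.load(PATH, map_location=device)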

# Load states
model.load_state_dict(checkpoint['model_state_dict'])
optimiser.load_state_dict(checkpoint['optimiser_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
epoch = checkpoint['epoch']
# Stored under a new name so the `loss` function defined earlier is not overwritten
last_loss = checkpoint['loss']

# Keep on training or evaluate model
#model.train()
model.eval()

ConvNet(
  (conv1): Conv2d(3, 16, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1))
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=800, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
  (relu): ReLU()
)
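
A simple sanity check after loading is to confirm that the restored model gives exactly the same outputs as the original. The sketch below does a full save and load round trip with a tiny stand-in model in a temporary directory; `make_model`, `toy_checkpoint` and the file name are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for ConvNet, kept tiny so the sketch runs quickly
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


original = make_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_checkpoint.pt')

    # Save a checkpoint dictionary, as in the notebook
    torch.save({'epoch': 5, 'model_state_dict': original.state_dict()}, path)

    # Rebuild the architecture (with fresh random weights), then restore the saved ones
    restored = make_model()
    toy_checkpoint = torch.load(path, map_location='cpu')
    restored.load_state_dict(toy_checkpoint['model_state_dict'])

original.eval()
restored.eval()

# Both models should now give identical outputs for the same input
x = torch.randn(2, 4)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print(toy_checkpoint['epoch'], same)  # 5 True
```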

# Evaluate

Let's explore our model's **accuracy** on the **test set**.

In [14]:
with torch.no_grad():
    n_correct=0
    n_samples=0
    
    n_class_correct = [0 for i in range(10)]
    n_class_sample = [0 for i in range(10)]
    
    for imgs, labels in test_loader:
        imgs = imgs.to(device)
        labels = labels.to(device)
        output = model(imgs)
        
        _, preds = torch.max(output, 1)
        
        n_samples += labels.shape[0]
        n_correct += (preds == labels).sum().item()
        
        # Per-class counts; zip stops at the end of the batch, so a smaller final batch is handled
        for label, pred in zip(labels.tolist(), preds.tolist()):
            if label == pred:
                n_class_correct[label] += 1
            n_class_sample[label] += 1
    
    acc = 100 * n_correct/n_samples
    print(f'Overall accuracy on test set: {acc:.1f} %')
    
    for i in range(10):
        print(f'Accuracy of {classes[i]}: {100* n_class_correct[i]/n_class_sample[i]:.1f} %')

Overall accuracy on test set: 59.3 %
Accuracy of plane: 63.9 %
Accuracy of car: 71.0 %
Accuracy of bird: 45.9 %
Accuracy of cat: 39.1 %
Accuracy of deer: 42.3 %
Accuracy of dog: 50.4 %
Accuracy of frog: 74.4 %
Accuracy of horse: 66.8 %
Accuracy of ship: 74.2 %
Accuracy of truck: 65.0 %
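
As an alternative to counting in a Python loop, per-class accuracy can be computed with `torch.bincount`. This is a sketch on made-up labels and predictions for a 3-class problem, not the CIFAR-10 results above; `demo_labels` and `demo_preds` are hypothetical.

```python
import torch

# Hypothetical labels and predictions for a 3-class problem
demo_labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2, 2])
demo_preds = torch.tensor([0, 1, 1, 1, 0, 2, 2, 0, 2])
n_classes = 3

# Count examples per class, and correct predictions per class
n_class_sample = torch.bincount(demo_labels, minlength=n_classes)
correct_mask = demo_preds == demo_labels
n_class_correct = torch.bincount(demo_labels[correct_mask], minlength=n_classes)

class_acc = 100 * n_class_correct / n_class_sample
overall_acc = 100 * correct_mask.sum().item() / demo_labels.shape[0]

print(n_class_sample.tolist(), n_class_correct.tolist())  # [2, 3, 4] [1, 2, 3]
print(f'Overall accuracy: {overall_acc:.1f} %')            # Overall accuracy: 66.7 %
```

In a real evaluation loop, the labels and predictions from every batch would be collected (for example with `torch.cat`) before counting.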


**Check out my other PyTorch tutorials**

1. [PT1 - Linear Regression with PyTorch](https://www.kaggle.com/code/samuelcortinhas/pt1-linear-regression-with-pytorch/notebook)
2. [PT2 - Logistic Regression with PyTorch](https://www.kaggle.com/code/samuelcortinhas/pt2-logistic-regression-with-pytorch)
3. [PT3 - Neural Networks with PyTorch](https://www.kaggle.com/code/samuelcortinhas/pt3-neural-networks-with-pytorch)
4. [PT4 - CNNs with PyTorch](https://www.kaggle.com/samuelcortinhas/pt4-cnns-with-pytorch)
5. [PT5 - Save & load models with PyTorch](https://www.kaggle.com/samuelcortinhas/pt5-save-load-models-with-pytorch)