## Spoilers

- build a function to perform training steps.
- implement our own dataset class.
- use data loaders to generate mini-batches.
- build a function to perform mini-batch gradient descent.
- evaluate the model on the validation set.
- intergrate TensorBoard to monitor model training and evaluation.
- save/checkpoint our model to disk.
- load our model from disk to resume training or to deploy.

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression
import torch
import torch.optim as optim
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, Dataset
from torch.utils.data.dataset import random_split
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')

2023-08-06 15:32:15.341024: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Rethinking the Training Loop

In [2]:
%run -i '../data_generation/simple_linear_regression.py'
%run -i '../data_preparation/v0.py'
%run -i '../model_configuration/v0.py'

#### Training Step

In [3]:
# Use higheer-order function to build training step
def make_train_step(model, loss_fn, optimizer):
    # Builds function that performs a step in the training loop
    def train_step(x, y):
        # Sets model to TRAIN mode
        model.train()
        # Makes predictions
        yhat = model(x)
        # Computes loss
        loss = loss_fn(y, yhat)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        optimizer.step()
        optimizer.zero_grad()
        # Returns the loss
        return loss.item()
    # Returns the function that will be called inside the training loop
    return train_step

In [4]:
%%writefile '../model_configuration/v1.py'

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Set learning parameters
learning_rate = 1e-3

torch.manual_seed(42)

# Create model and send to device
model = nn.Sequential(nn.Linear(1,1)).to(device)

# Defines a SGD optimizer to update the parameters
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Defines a MSE loss function
loss_fn = nn.MSELoss(reduction='mean')

# Creates the training_step function for our model, loss function and optimizer
train_step = make_train_step(model, loss_fn, optimizer)

Overwriting ../model_configuration/v1.py


In [5]:
%run -i '../model_configuration/v1.py'

In [6]:
%%writefile '../model_training/v1.py'

# Defines number of epochs
n_epochs = 1000

losses = []

# Training loop
for epoch in range(n_epochs):
    loss = train_step(x_train_tensor, y_train_tensor)
    losses.append(loss)

Overwriting ../model_training/v1.py


In [7]:
%run -i '../model_training/v1.py'

In [8]:
print(model.state_dict())

OrderedDict([('0.weight', tensor([[1.2197]], device='cuda:0')), ('0.bias', tensor([1.4200], device='cuda:0'))])


## Dataset

In PyTorch, a dataset is represented by a regular Python class that inherits from the Dataset class. A dataset class must implement three methods:
- __init__(self) : The constructor, it is called when we instantiate the dataset object. There is no need to load the whole dataset in the constructor. If your dataset is big, loading it at once would not be memory efficient. It is recommended to load the data lazily in the ___get_item___ method.

- __get_item__(self, index): it allows the dataset to be indexed so that it can work like a list (dataset[i]).
- __len__(self): it returns the length of the dataset.

In [9]:
class CustomDataset(Dataset):
    def __init__(self, x_tensor, y_tensor):
        self.x = x_tensor
        self.y = y_tensor

    def __getitem__(self, index):
        return (self.x[index], self.y[index])
    
    def __len__(self):
        return len(self.x)
    
x_train_tensor = torch.from_numpy(x_train).float()
y_train_tensor = torch.from_numpy(y_train).float()

train_data = CustomDataset(x_train_tensor, y_train_tensor)
print(train_data[0])

(tensor([0.6398]), tensor([2.2890]))


#### TensorDataset

In [10]:
#if a dataset is nothing more than a couple of tensors, we can use PyTorch's TensorDataset class
train_datatensor = TensorDataset(x_train_tensor, y_train_tensor)
print(train_datatensor[0])

(tensor([0.6398]), tensor([2.2890]))


Why we need build a dataset anyway? -> Because we want to use a ...

## DataLoader

We tell DataLoader which dataset to use, the desired mini-batch size and if we'd like to shuffle it or not, and that all. PyTorch's DataLoader will automatically create mini-batches for our.

In [11]:
train_loader = DataLoader(
    dataset=train_data,
    batch_size=16,
    shuffle=True,
)

In [12]:
next(iter(train_loader))[0], next(iter(train_loader))[1]


(tensor([[0.8150],
         [0.9011],
         [0.4282],
         [0.3418],
         [0.5195],
         [0.8504],
         [0.5994],
         [0.8295],
         [0.3350],
         [0.7141],
         [0.8421],
         [0.2950],
         [0.0872],
         [0.0349],
         [0.0795],
         [0.8969]]),
 tensor([[2.5023],
         [1.0689],
         [2.9750],
         [1.5896],
         [2.1105],
         [2.0180],
         [2.0363],
         [2.6549],
         [2.2896],
         [1.8805],
         [1.9373],
         [2.8198],
         [3.0079],
         [2.4757],
         [2.9086],
         [2.7999]]))

In [13]:
list(train_loader)[0]

[tensor([[0.6398],
         [0.9011],
         [0.0349],
         [0.4886],
         [0.5162],
         [0.3418],
         [0.2417],
         [0.1033],
         [0.4842],
         [0.9766],
         [0.8504],
         [0.5994],
         [0.6324],
         [0.3668],
         [0.0795],
         [0.3338]]),
 tensor([[2.2890],
         [2.8792],
         [1.1649],
         [2.0180],
         [2.0511],
         [1.7058],
         [1.4962],
         [1.3058],
         [2.0114],
         [3.0231],
         [2.7999],
         [2.2896],
         [2.3468],
         [1.7644],
         [1.2484],
         [1.7144]])]

In [14]:
%%writefile '../data_preparation/v1.py'

x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)

# Builds Dataset
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)

# Builds DataLoader
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=16,
    shuffle=True
)

Overwriting ../data_preparation/v1.py


In [15]:
%run -i '../data_preparation/v1.py'

In [16]:
%run -i '../model_configuration/v1.py'

In [17]:
%%writefile '../model_training/v2.py'

# Defines number of epochs
n_epochs = 1000

losses = []

for epoch in range(n_epochs):
    # inner loop
    mini_batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        mini_batch_loss = train_step(x_batch, y_batch)
        mini_batch_losses.append(mini_batch_loss)
    
    # Computes average loss over all mini-batches
    loss = np.mean(mini_batch_losses)

    losses.append(loss)

Overwriting ../model_training/v2.py


In [18]:
%run -i '../model_training/v2.py'

In [19]:
model.state_dict()

OrderedDict([('0.weight', tensor([[1.5415]], device='cuda:0')),
             ('0.bias', tensor([1.3110], device='cuda:0'))])

In [20]:
%run -i '../data_preparation/v1.py'
%run -i '../model_configuration/v1.py'
%run -i '../model_training/v2.py'

In [21]:
model.state_dict()

OrderedDict([('0.weight', tensor([[1.5415]], device='cuda:0')),
             ('0.bias', tensor([1.3110], device='cuda:0'))])

#### Mini-Batch Inner Loop

In [22]:
def mini_batch(device, data_loader, step):
    mini_batch_losses = []
    for x_batch, y_batch in data_loader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        mini_batch_loss = step(x_batch, y_batch)
        mini_batch_losses.append(mini_batch_loss)
    
    loss = np.mean(mini_batch_losses)

    return loss

In [23]:
%run -i '../data_preparation/v1.py'
%run -i '../model_configuration/v1.py'

In [24]:
%%writefile '../model_training/v3.py'

# Defines number of epochs
n_epochs = 1000

losses = []

for epoch in range(n_epochs):
    # Training
    loss = mini_batch(device=device,
                      data_loader=train_loader,
                      step=train_step)

    losses.append(loss)

Overwriting ../model_training/v3.py


In [25]:
%run -i '../model_training/v3.py'

In [26]:
model.state_dict()

OrderedDict([('0.weight', tensor([[1.5415]], device='cuda:0')),
             ('0.bias', tensor([1.3110], device='cuda:0'))])

In [27]:
%%writefile '../data_preparation/v2.py'

torch.manual_seed(0)

# Builds tensors from numpy arrays BEFORE split
x_tensor = torch.as_tensor(x_train, dtype=torch.float32)
y_tensor = torch.as_tensor(y_train, dtype=torch.float32)

# Builds dataset containing ALL data points
dataset = TensorDataset(x_tensor, y_tensor)

# Perform split
ration = 0.8
n_total = len(dataset)
n_train = int(n_total * ration)
n_val = n_total - n_train
train_data, val_data = random_split(dataset, [n_train, n_val])

# Builds a loader for each split
batch_size = 64
train_loader = DataLoader(
                        dataset=train_data, 
                        batch_size=batch_size, 
                        shuffle=True)

val_loader = DataLoader(val_data, batch_size=batch_size)

Overwriting ../data_preparation/v2.py


In [28]:
%run -i '../data_preparation/v2.py'

## Evaluation

In [29]:
def make_val_step(model, loss_fn):
    # Builds function that performs a step in the validation loop
    def perform_val_step(x, y):
        # Set model to eval mode
        model.eval()
        
        # Step 1 - Computes our model's predicted output
        # forwad pass
        yhat = model(x)

        # Step 2 - Computes the validation loss
        loss = loss_fn(y, yhat)

        return loss.item()
    
    # Returns the function that will be called inside the validation loop
    return perform_val_step

In [30]:
%%writefile '../model_configuration/v2.py'


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set learning rate
lr = 1e-3

torch.manual_seed(42)

# Now we can create a model and send it at once to the device
model = nn.Sequential(nn.Linear(1,1)).to(device=device)

# Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

# Define loss function
loss_fn = nn.MSELoss(reduction='mean')

# Create train step
tran_step = make_train_step(model, loss_fn, optimizer)

# Create validation step
val_step = make_val_step(model, loss_fn)

Overwriting ../model_configuration/v2.py


In [31]:
%run -i '../model_configuration/v2.py'

In [32]:
%%writefile '../model_training/v4.py'

n_epochs = 200
losses = []
val_losses = []

for epoch in range(n_epochs):
    # inner loop - training
    loss = mini_batch(device, train_loader, train_step)
    losses.append(loss)

    # VALIDATION - no gradients in validation!
    with torch.no_grad():
        val_loss = mini_batch(device, val_loader, val_step)
        val_losses.append(val_loss)

Overwriting ../model_training/v4.py


In [33]:
%run -i '../model_training/v4.py'

In [34]:
model.state_dict()

OrderedDict([('0.weight', tensor([[0.7645]], device='cuda:0')),
             ('0.bias', tensor([0.8300], device='cuda:0'))])

In [35]:
%run -i '../data_preparation/v2.py'
%run -i '../model_configuration/v2.py'
%run -i '../model_training/v4.py'

## TensorBoard

#### SummaryWriter

In [36]:
wrtier = SummaryWriter('../runs/test')

add_graph

In [37]:
dummy_x, dummy_y = next(iter(train_loader))
wrtier.add_graph(model, dummy_x.to(device))

add_scalars

In [38]:
wrtier.add_scalars(
    main_tag = 'loss',
    tag_scalar_dict={'training': loss,
                     'validation': val_loss},
    global_step=epoch
)

In [39]:
%run -i '../data_preparation/v2.py'

In [40]:
%%writefile '../model_configuration/v3.py'

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# learning rate
lr = 1e-3

torch.manual_seed(42)

# model
model = nn.Sequential(nn.Linear(1,1)).to(device)

# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

# MSE loss
loss_fn = nn.MSELoss(reduction='mean')

# training step
train_step = make_train_step(model, loss_fn, optimizer)

# val step
val_step = make_val_step(model, loss_fn)

# Creates a Summary Writer to inferface with TensorBoard
writer = SummaryWriter('../runs/simple_linear_regression')
x_dummy, y_dummy = next(iter(train_loader))
x_dummy = x_dummy.to(device)
y_dummy = y_dummy.to(device)
writer.add_graph(model, x_dummy)

Overwriting ../model_configuration/v3.py


In [41]:
%run -i '../model_configuration/v3.py'

In [42]:
%%writefile '../model_training/v5.py'

# Defines epochs
n_epochs = 200

losses = []
val_losses = []

for epoch in range(n_epochs):
    # training
    loss = mini_batch(device=device,
                      data_loader=train_loader,
                      step=train_step)
    losses.append(loss)

    # validation - no gradients in validation!
    with torch.no_grad():
        val_loss = mini_batch(device=device,
                              data_loader=val_loader,
                              step=val_step)
        val_losses.append(val_loss)

    # Records both losses for each epoch under tag 'loss'
    writer.add_scalars(main_tag='loss',
                        tag_scalar_dict={'train': loss,
                                         'val': val_loss},
                        global_step=epoch)

# close writer
writer.close()

Overwriting ../model_training/v5.py


In [43]:
%run -i '../model_training/v5.py'

In [44]:
model.state_dict()

OrderedDict([('0.weight', tensor([[0.9330]], device='cuda:0')),
             ('0.bias', tensor([1.0949], device='cuda:0'))])

## Saving and Loading Models

#### Model State
- model.state_dict() # Returns a dictionary containing a whole state of the module.
- optimizer.state_dict() # Returns a dictionary containing a whole state of the optimizer.
- losses
- epoch

#### Saving

In [45]:
checkpoint = {
    'epoch': n_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': losses,
    'val_loss': val_losses,
}

torch.save(checkpoint, 'checkpoint.pth')

#### Resuming Training

In [46]:
%run -i '../data_preparation/v2.py'
%run -i '../model_configuration/v3.py'
model.state_dict()

OrderedDict([('0.weight', tensor([[0.7645]], device='cuda:0')),
             ('0.bias', tensor([0.8300], device='cuda:0'))])

- LOAD THE DICTIONARY BACK USING torch.load()
- LOAD MODEL AND OPTIMIZER STATE DICTIONARIES BACK USING THE load_state_dict() METHOD
- LOAD EVERYTHING ELSE INTO THEIR CORRESPONDING VARIABLES

In [47]:
checkpoint = torch.load('checkpoint.pth')

model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

# Load the last epoch
saved_epoch = checkpoint['epoch']
saved_losses = checkpoint['loss']
saved_val_losses = checkpoint['val_loss']

# REMEMBER TO SET THE MODEL TO TRAIN FOR RESUMING TRAINING
model.train()

Sequential(
  (0): Linear(in_features=1, out_features=1, bias=True)
)

In [48]:
model.state_dict()

OrderedDict([('0.weight', tensor([[0.9330]], device='cuda:0')),
             ('0.bias', tensor([1.0949], device='cuda:0'))])

In [49]:
%run -i '../model_training/v5.py'

In [50]:
model.state_dict()

OrderedDict([('0.weight', tensor([[1.0419]], device='cuda:0')),
             ('0.bias', tensor([1.2501], device='cuda:0'))])

#### Deploying/Making Predictions

In [51]:
%run -i '../model_configuration/v3.py'

In [52]:
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
model.state_dict()

OrderedDict([('0.weight', tensor([[0.9330]], device='cuda:0')),
             ('0.bias', tensor([1.0949], device='cuda:0'))])

In [53]:
# predict
new_inputs = torch.tensor([[0.20], [0.34], [0.57]])

model.eval()
model(new_inputs.to(device=device))

tensor([[1.2815],
        [1.4121],
        [1.6267]], device='cuda:0', grad_fn=<AddmmBackward0>)

* **NOTE: ALWAY SET MODEL MODE:**
    - **checkpointing: model.train()**
    - **deploying/making predictions: model.eval()**

## Putting It All Together

- Data Preparation V2
- Model Configuration V3
- Model Training V5

In [54]:
%%writefile '../data_preparation/v2.py'

torch.manual_seed(0)

# Builds tensors from numpy arrays BEFORE splitting into train and test
x_train_tensor = torch.as_tensor(x_train).float()
y_train_tensor = torch.as_tensor(y_train).float()

# Builds dataset containing ALL data points
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)

# Perform split
ratio = 0.8
n_total = len(train_dataset)
n_train = int(n_total * ratio)
n_val = n_total - n_train
train_dataset, val_dataset = random_split(train_dataset, [n_train, n_val])

# Builds data loaders
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=64,
    shuffle=True)

val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=64,
    shuffle=False)

Overwriting ../data_preparation/v2.py


In [55]:
%%writefile '../model_configuration/v3.py'

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# learning rate
lr = 1e-3

torch.manual_seed(42)

# model
model = nn.Sequential(nn.Linear(1, 1)).to(device)

# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

# loss function
loss_fn = nn.MSELoss(reduction='mean')

# train step
train_step = make_train_step(
    model=model,
    loss_fn=loss_fn,
    optimizer=optimizer
)

# val step
val_step = make_val_step(
    model=model,
    loss_fn=loss_fn
)


# Create SummaryWriter to interface with TensorBoard
writer = SummaryWriter('../runs/simple_linear_regression')

# Fetches a sungle mini-batch of data so we can add graph
x, y = next(iter(train_loader))
writer.add_graph(model, x.to(device))

Overwriting ../model_configuration/v3.py


In [56]:
%%writefile '../model_training/v5.py'

# defines number of epochs
n_epochs = 200

losses = []
val_losses = []

for epoch in range(n_epochs):
    # Training 
    loss = mini_batch(device, train_loader, train_step)
    losses.append(loss)

    # VALIDATION - no gradient tracking needed
    with torch.no_grad():
        val_loss = mini_batch(device, val_loader, val_step)
        val_losses.append(val_loss)

    # Records both losses for each epoch under tag 'loss'
    writer.add_scalars(
                        main_tag='loss', 
                        tag_scalar_dict = {'train': loss, 'val': val_loss}, 
                        global_step=epoch)

# close writer
writer.close()

Overwriting ../model_training/v5.py


In [57]:
%run -i '../data_preparation/v2.py'
%run -i '../model_configuration/v3.py'
%run -i '../model_training/v5.py'

In [58]:
model.state_dict()

OrderedDict([('0.weight', tensor([[0.9330]], device='cuda:0')),
             ('0.bias', tensor([1.0949], device='cuda:0'))])

## SUMMARY

- writing a higher-order function that builds functions to perform training steps.
- understanding PyTorch's Dataset and DataLoader classes, implementing a custom dataset, and using a DataLoader with a custom dataset.
- using PyTorch's DataLoader class to generate mini-batches for training a neural network.
- writing higher-order function that builds functions to perform validation steps.
- realizeing the importance of including model.eval() and model.train() in the appropriate places when training and validating a model.
- remember the purpose of no_grad() and using it to prevent any kind of gradient computation during validation.
- using SummaryWriter to interface with TensorBaord for logging.
- adding a graph representation of the model to TensorBoard.
- using TensorBoard to plot the loss curves for training and validation.
- saving/checkpointing a model during training, resuming training from a checkpoint, and loading a model for inference or deployment.
- realizeing the importance of setting the mode of the model: train() or eval() for checkpointing or deploying for prediction.