In [None]:
import numpy as np
import torch
import torch.optim as optim
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Chapter 4: Rethinking the Training Loop

Now that you've dived a little bit deeper into PyTorch's Datasets and DataLoaders, it's time to put your knowledge into action :-)

We're using the same synthetic dataset from the previous challenges (*b = 0.5* and *w = -3* for a **linear regression with a single feature (x)**), but this time you'll be implementing mini-batch gradient descent in PyTorch.

$$
\Large
y = b + w x
$$

## Data Generation

In [None]:
true_b = .5
true_w = -3
N = 100

# Data Generation
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = (.1 * np.random.randn(N, 1))
y = true_b + true_w * x + epsilon

# Shuffles the indices
idx = np.arange(N)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:int(N*.8)]
# Uses the remaining indices for validation
val_idx = idx[int(N*.8):]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
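
A quick sanity check (not part of the original exercise): since this is a plain linear regression, an ordinary least-squares fit on the training set should land close to the true values of *b* and *w*. The sketch below regenerates the same data and uses `np.polyfit` as an assumed, convenient stand-in for that fit; the names `w_fit` and `b_fit` are hypothetical.

```python
import numpy as np

true_b = .5
true_w = -3
N = 100

# Same data generation and split as in the cell above
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = (.1 * np.random.randn(N, 1))
y = true_b + true_w * x + epsilon

idx = np.arange(N)
np.random.shuffle(idx)
train_idx = idx[:int(N*.8)]
x_train, y_train = x[train_idx], y[train_idx]

# Degree-1 polynomial fit returns [slope, intercept]
w_fit, b_fit = np.polyfit(x_train.ravel(), y_train.ravel(), 1)
print(b_fit, w_fit)
```

The parameters you learn later with gradient descent should end up in the same neighborhood.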

## Data Preparation

The preparation of data starts by **converting the data points** from Numpy arrays to PyTorch tensors and sending them to the available **device**:

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Our data was in Numpy arrays, but we need to transform them 
# into PyTorch's Tensors and then we send them to the 
# chosen device
x_train_tensor = torch.as_tensor(x_train).float().to(device)
y_train_tensor = torch.as_tensor(y_train).float().to(device)

x_val_tensor = torch.as_tensor(x_val).float().to(device)
y_val_tensor = torch.as_tensor(y_val).float().to(device)

But, this time, the data preparation also includes creating datasets and data loaders for both training and validation sets. That's your task now: you're free to choose any mini-batch size you want (and we encourage you to play with different values), but we suggest you start with 16:

Hint: you can use a simple `TensorDataset` for this task
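
If you'd like a refresher before solving it, here is a minimal sketch of how a `TensorDataset` and a `DataLoader` behave, using made-up toy tensors (`toy_x`, `toy_y`, and every other name here are hypothetical, not part of the exercise):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy tensors (hypothetical): 10 samples, one feature and one label each
toy_x = torch.arange(10, dtype=torch.float32).reshape(-1, 1)
toy_y = 2 * toy_x

toy_data = TensorDataset(toy_x, toy_y)
# Indexing a TensorDataset returns a tuple: (features, label)
first_x, first_y = toy_data[0]

# Without drop_last, the final mini-batch holds the leftover samples
toy_loader = DataLoader(dataset=toy_data, batch_size=4)
batch_sizes = [len(x_batch) for x_batch, y_batch in toy_loader]
print(batch_sizes)  # [4, 4, 2]
```

Notice that the last mini-batch may be smaller than the others whenever the dataset size is not a multiple of the batch size.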

### Answer

In [None]:
train_data = TensorDataset(x_train_tensor, y_train_tensor)
val_data = TensorDataset(x_val_tensor, y_val_tensor)

train_loader = DataLoader(dataset=train_data, batch_size=16, shuffle=True)
val_loader = DataLoader(dataset=val_data, batch_size=16)

## Model Configuration

The model configuration includes not only the definition of the model, optimizer, and loss function, but also the creation of functions to perform both **training and validation steps**. You can use the **helper functions** below for that:

In [None]:
def make_train_step_fn(model, loss_fn, optimizer):
    # Builds function that performs a step in the train loop
    def perform_train_step_fn(x, y):
        # Sets model to TRAIN mode
        model.train()
        
        # Step 1 - Computes our model's predicted output - forward pass
        yhat = model(x)
        # Step 2 - Computes the loss
        loss = loss_fn(yhat, y)
        # Step 3 - Computes gradients for both "b" and "w" parameters
        loss.backward()
        # Step 4 - Updates parameters using gradients and the learning rate
        optimizer.step()
        optimizer.zero_grad()
        
        # Returns the loss
        return loss.item()
    
    # Returns the function that will be called inside the train loop
    return perform_train_step_fn
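
Why does the training step zero the gradients after every update? PyTorch **accumulates** gradients across calls to `backward()`, so stale gradients would leak into the next mini-batch. A minimal sketch with a single made-up parameter (the name `w` below is just for illustration):

```python
import torch

# A single hypothetical parameter
w = torch.tensor(1.0, requires_grad=True)

# d(2w)/dw = 2, so the first backward pass stores a gradient of 2
(2 * w).backward()
first = w.grad.item()

# A second backward pass ADDS to the stored gradient: 2 + 2 = 4
(2 * w).backward()
second = w.grad.item()

# Zeroing in place is what optimizer.zero_grad() takes care of for us
w.grad.zero_()
after_zero = w.grad.item()

print(first, second, after_zero)  # 2.0 4.0 0.0
```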

In [None]:
def make_val_step_fn(model, loss_fn):
    # Builds function that performs a step in the validation loop
    def perform_val_step_fn(x, y):
        # Sets model to EVAL mode
        model.eval()
        
        # Step 1 - Computes our model's predicted output - forward pass
        yhat = model(x)
        # Step 2 - Computes the loss
        loss = loss_fn(yhat, y)
        # There is no need to compute Steps 3 and 4, since we don't update parameters during evaluation
        return loss.item()
    
    return perform_val_step_fn

Your task is, once again, to define a **model**, an **optimizer**, and a **loss function** to tackle our **linear** regression with a **single input** and **single output**. Then, you should use these elements (and the helper methods above) to create your `train_step_fn` and `val_step_fn` functions:

### Answer

In [None]:
torch.manual_seed(42)

lr = 0.1

model = nn.Sequential(nn.Linear(1, 1)).to(device)
optimizer = optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.MSELoss(reduction='mean')

In [None]:
train_step_fn = make_train_step_fn(model, loss_fn, optimizer)
val_step_fn = make_val_step_fn(model, loss_fn)
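
To connect the model back to the equation *y = b + w x*, the sketch below (an illustration only, run on the CPU with a made-up input `x`) pulls the weight and bias out of the model's `state_dict` and checks that the forward pass matches the formula:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
model = nn.Sequential(nn.Linear(1, 1))

# Inside nn.Sequential, the linear layer is registered under index "0"
state = model.state_dict()
w = state['0.weight']  # shape (1, 1)
b = state['0.bias']    # shape (1,)

x = torch.tensor([[0.0], [1.0], [2.0]])  # hypothetical inputs
with torch.no_grad():
    manual = b + x @ w.T  # y = b + w x, computed by hand
    output = model(x)     # the same thing, computed by the model

print(torch.allclose(manual, output))  # True
```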

## Mini-Batch Inner Loop

Your task is to implement a function that **executes the mini-batch inner loop**. Given a *data loader*, a device, and a **step function** (that could be either `train_step_fn` or `val_step_fn`), the function should:

- loop over the mini-batches yielded by the data loader
- send the mini-batch data (x and y) to the device
- execute the `step_fn` using x and y
- append the returned loss to the list of `mini_batch_losses`

In the end, the `mini_batch` function will return the **average loss over all mini-batches**.
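
One subtlety worth keeping in mind: a plain average over mini-batches equals the average over all data points only when every mini-batch has the same size. The sketch below uses made-up per-sample errors (all names are hypothetical) split into chunks of 16 and 4, and compares the plain average with a size-weighted one:

```python
import torch

torch.manual_seed(0)
errors = torch.randn(20)           # hypothetical per-sample errors
batches = torch.split(errors, 16)  # two chunks: 16 and 4 samples

# MSE of each chunk, as a step function would return it
batch_losses = [(b ** 2).mean().item() for b in batches]
batch_sizes = [len(b) for b in batches]

# Plain average over mini-batches: the small chunk counts as much as the big one
mean_of_batches = sum(batch_losses) / len(batch_losses)

# Size-weighted average: recovers the MSE over all 20 samples
weighted_mean = (sum(l * n for l, n in zip(batch_losses, batch_sizes))
                 / sum(batch_sizes))
full_mse = (errors ** 2).mean().item()

print(mean_of_batches, weighted_mean, full_mse)
```

For this exercise the plain average is what we're asking for; the difference only matters if you need the exact loss over the whole set.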

### Answer

In [None]:
def mini_batch(device, data_loader, step_fn):
    mini_batch_losses = []
    for x_batch, y_batch in data_loader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        mini_batch_loss = step_fn(x_batch, y_batch)
        mini_batch_losses.append(mini_batch_loss)

    # Converts to a plain Python float so the saved losses hold no NumPy objects
    loss = float(np.mean(mini_batch_losses))
    return loss

## Model Training

Your task is to implement mini-batch gradient descent using the `mini_batch` function you've just implemented above to execute both **training** and **validation steps**.

Note: the parameter update now happens **inside the training step function**, which is why you only see the losses in the loop below.

### Answer

In [None]:
# Defines number of epochs
n_epochs = 200

losses = []
val_losses = []

for epoch in range(n_epochs):
    # inner loop
    loss = mini_batch(device, train_loader, train_step_fn)
    losses.append(loss)
    
    # VALIDATION
    # no gradients in validation!
    with torch.no_grad():
        val_loss = mini_batch(device, val_loader, val_step_fn)
        val_losses.append(val_loss)    
        
print(model.state_dict())
print(losses[-1], val_losses[-1])
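
What does `torch.no_grad()` actually buy us during validation? Inside that context, PyTorch does not build a computation graph, so the outputs do not track gradients. A minimal sketch with a throwaway model (the names `tiny_model` and `inputs` are hypothetical):

```python
import torch
import torch.nn as nn

tiny_model = nn.Linear(1, 1)  # throwaway model, just for illustration
inputs = torch.ones(2, 1)

# Regular forward pass: the output is attached to the computation graph
tracked = tiny_model(inputs).requires_grad

# Forward pass under no_grad: no graph is built
with torch.no_grad():
    untracked = tiny_model(inputs).requires_grad

print(tracked, untracked)  # True False
```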

## Saving Models

Once the model is fully trained, you may **save it to disk**. Your task is to build a dictionary containing all relevant information and use `torch.save` to save this dictionary to a file:

### Answer

In [None]:
checkpoint = {'model_state_dict': model.state_dict(),
              'optimizer_state_dict': optimizer.state_dict(),
              'epoch': n_epochs,
              'loss': losses,
              'val_loss': val_losses}

torch.save(checkpoint, 'model_checkpoint.pth')
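
If you'd like to see the save/load round trip without touching the disk, the sketch below writes a small checkpoint to an in-memory buffer and reads it back. The buffer, the throwaway model, and the epoch value are assumptions made for illustration; the exercise itself saves to a file.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(42)
tiny_model = nn.Sequential(nn.Linear(1, 1))  # throwaway model

# Same idea as the checkpoint above, reduced to two entries
tiny_checkpoint = {'model_state_dict': tiny_model.state_dict(),
                   'epoch': 3}

# torch.save accepts a file-like object as well as a file name
buffer = io.BytesIO()
torch.save(tiny_checkpoint, buffer)

# Rewinds the buffer and loads the dictionary back
buffer.seek(0)
restored = torch.load(buffer)

same_weights = torch.equal(restored['model_state_dict']['0.weight'],
                           tiny_model.state_dict()['0.weight'])
print(restored['epoch'], same_weights)  # 3 True
```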

## Loading Models and Making Predictions

Once your model is saved to disk, you can load it back either to continue training or to make predictions (if it's already fully trained). Your first task is to load both the model and the optimizer states from a file and restore them into `new_model` and `new_optimizer`, respectively. Then, you should make predictions for the `new_inputs` tensor (assuming the loaded model was already fully trained).

Hint: don't forget to set the proper mode before making predictions.

In [None]:
lr = 0.01

new_model = nn.Sequential(nn.Linear(1, 1)).to(device)
new_optimizer = optim.SGD(new_model.parameters(), lr=lr)
loss_fn = nn.MSELoss(reduction='mean')

### Answer

In [None]:
checkpoint = torch.load('model_checkpoint.pth')

new_model.load_state_dict(checkpoint['model_state_dict'])
new_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

saved_epoch = checkpoint['epoch']
saved_losses = checkpoint['loss']
saved_val_losses = checkpoint['val_loss']

new_model.state_dict()
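
A detail that is easy to miss: an optimizer's state dictionary also carries its hyperparameters, so loading it replaces the learning rate the new optimizer was created with. A minimal sketch with two throwaway optimizers (all names below are hypothetical):

```python
import torch.nn as nn
import torch.optim as optim

# "Trained" side: optimizer created with lr=0.1
model_a = nn.Linear(1, 1)
optimizer_a = optim.SGD(model_a.parameters(), lr=0.1)

# "New" side: optimizer created with a different lr
model_b = nn.Linear(1, 1)
optimizer_b = optim.SGD(model_b.parameters(), lr=0.01)
lr_before = optimizer_b.param_groups[0]['lr']

# Loading the saved state brings the saved learning rate along
optimizer_b.load_state_dict(optimizer_a.state_dict())
lr_after = optimizer_b.param_groups[0]['lr']

print(lr_before, lr_after)  # 0.01 0.1
```

So, if you resume training from a checkpoint, the optimizer continues with the learning rate that was saved, not the one passed to its constructor.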

In [None]:
new_inputs = torch.tensor([[.20], [.34], [.57]])

new_model.eval() # always use EVAL for fully trained models!
new_model(new_inputs.to(device))

Congratulations! You successfully trained a PyTorch model using **mini-batch** gradient descent, saved it to disk, and "deployed" it to make predictions!