## Optimizing Model Parameters

In [1]:
print("Optimizing Model Parameters")

Optimizing Model Parameters


### Pre-requisite Code


In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor


In [4]:
training_data = datasets.FashionMNIST(
    root="data/", download=True, train=True, transform=ToTensor(),
)

test_data = datasets.FashionMNIST(
    root="data/", download=True, train=False, transform=ToTensor(),
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

In [5]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

#### Hyperparameters

Hyperparameters are adjustable parameters that let us control the model optimization process.

- Epochs: the number of times to iterate over the dataset.
- Batch Size: the number of data samples propagated through the network before the parameters are updated.
- Learning Rate: how much to update the model parameters at each batch. Smaller values yield slow learning, while large values may result in unpredictable behaviour during training.

In [6]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
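
To make the relationship between these hyperparameters concrete, here is a small sketch of how many parameter updates they imply. It assumes the 60,000-image FashionMNIST training set used above and the default `DataLoader` behaviour of keeping the final, smaller batch; the variable names are illustrative only.

```python
import math

# Values from this notebook: 60,000 training images, batch size 64, 5 epochs.
dataset_size = 60_000
batch_size = 64
epochs = 5

# One parameter update happens per batch; the last batch may be smaller.
updates_per_epoch = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch_size, total_updates)  # 938 32 4690
```

Halving the batch size would roughly double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.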

## Optimization Loop

Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:
- Train Loop: iterate over the training dataset and try to converge to optimal parameters.
- Validation/Test Loop: iterate over the test dataset to check whether model performance is improving.

#### Loss Function

A loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the quantity we want to minimize during training. To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true label.

Common loss functions:
- nn.MSELoss (Mean Square Error), typically used for regression tasks
- nn.NLLLoss (Negative Log Likelihood), used for classification
- nn.CrossEntropyLoss (combines nn.LogSoftmax and nn.NLLLoss): normalizes the logits and computes the prediction error.

In [7]:
loss_fn = nn.CrossEntropyLoss()
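
As a quick sanity check on how these losses relate, the sketch below compares `nn.CrossEntropyLoss` applied to raw logits with `nn.NLLLoss` applied to log-softmax outputs. The logits and labels are made-up values for illustration, not data from this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical batch: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # hypothetical class labels

# Cross-entropy computed directly on the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity in two steps: log-softmax, then negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

matches = torch.allclose(ce, nll)
print(ce.item(), nll.item(), matches)
```

This is why the model's `forward` returns raw logits: the normalization happens inside the loss function.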

#### Optimizer

Optimization is the process of adjusting the model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed. Here we use stochastic gradient descent (SGD).

Inside the training loop, optimization happens in three steps:
- Call `optimizer.zero_grad()` to reset the gradients of the model parameters. Gradients accumulate by default, so we explicitly zero them at each iteration to prevent double-counting.

- Backpropagate the prediction loss with a call to `loss.backward()`. PyTorch deposits the gradient of the loss with respect to each parameter in that parameter's `.grad` attribute.

- Once we have our gradients, call `optimizer.step()` to adjust the parameters using the gradients collected in the backward pass.

In [8]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
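
The following minimal sketch shows why gradients need to be reset between iterations. It uses a hypothetical two-element parameter `w` and the toy loss `sum(w**2)` rather than the notebook's model, so the gradients are easy to verify by hand (the gradient is `2 * w`).

```python
import torch

# Toy parameter and optimizer, for illustration only
w = torch.tensor([1.0, 2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# First backward pass: gradient of sum(w**2) is 2*w = [2, 4]
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: gradients add up to [4, 8]
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Reset, backpropagate once, then update: w <- w - lr * grad = [0.8, 1.6]
opt.zero_grad()
(w ** 2).sum().backward()
opt.step()

print(first, accumulated, w.detach())
```

Without the `zero_grad()` call, the update would have used the accumulated gradient and taken a step larger than intended.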

## Full Implementation

- train_loop : loops over the optimization code
- test_loop : evaluates the model's performance against our test data

In [9]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()   # important for batch normalization and dropout layers
    for batch, (X,y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            # Use the dataloader's own batch size rather than a global variable
            loss, current = loss.item(), batch * dataloader.batch_size + len(X)
            print(f"Loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

In [10]:
def test_loop(dataloader, model, loss_fn):
    model.eval()    # Set the model to evaluation mode: important for batch normalization and dropout layers
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0,0

    # torch.no_grad() disables gradient tracking inside this block: no gradients are
    # needed for evaluation, and skipping them saves computation and memory for
    # tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()  # accumulate as a Python float
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100 * correct):>0.1f}%, Avg loss: {test_loss:>8f}\n")
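
The accuracy computation in `test_loop` can be easier to follow on a tiny example. The sketch below applies the same `pred.argmax(1) == y` expression to hand-written logits for four samples and three classes; the values are invented for illustration.

```python
import torch

# Hypothetical logits: 4 samples, 3 classes
pred = torch.tensor([[2.0, 0.1, 0.3],
                     [0.2, 1.5, 0.1],
                     [0.9, 0.8, 0.7],
                     [0.1, 0.2, 3.0]])
y = torch.tensor([0, 1, 2, 2])  # hypothetical true labels

# argmax over dim 1 picks the highest-scoring class for each sample
predicted_classes = pred.argmax(1)

# Compare with the labels, count the matches, and divide by the sample count
num_correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = num_correct / len(y)

print(predicted_classes.tolist(), num_correct, accuracy)  # [0, 1, 0, 2] 3.0 0.75
```

In `test_loop` the same count is summed over all batches and divided by the dataset size rather than by the size of a single batch.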

In [11]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10  # overrides the value of 5 set in the hyperparameters cell
for t in range(epochs):
    print(f"Epoch {t+1} \n---------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1 
---------------------------
Loss: 2.293435 [   64/60000]
Loss: 2.277121 [ 6464/60000]
Loss: 2.254801 [12864/60000]
Loss: 2.255815 [19264/60000]
Loss: 2.228968 [25664/60000]
Loss: 2.201147 [32064/60000]
Loss: 2.207932 [38464/60000]
Loss: 2.169936 [44864/60000]
Loss: 2.171769 [51264/60000]
Loss: 2.118714 [57664/60000]
Test Error: 
 Accuracy: 37.5%, Avg loss: 2.122446

Epoch 2 
---------------------------
Loss: 2.147232 [   64/60000]
Loss: 2.127120 [ 6464/60000]
Loss: 2.062356 [12864/60000]
Loss: 2.078024 [19264/60000]
Loss: 2.018129 [25664/60000]
Loss: 1.962888 [32064/60000]
Loss: 1.977720 [38464/60000]
Loss: 1.901686 [44864/60000]
Loss: 1.912248 [51264/60000]
Loss: 1.799344 [57664/60000]
Test Error: 
 Accuracy: 56.6%, Avg loss: 1.821400

Epoch 3 
---------------------------
Loss: 1.880958 [   64/60000]
Loss: 1.832691 [ 6464/60000]
Loss: 1.713471 [12864/60000]
Loss: 1.743969 [19264/60000]
Loss: 1.630108 [25664/60000]
Loss: 1.603620 [32064/60000]
Loss: 1.598930 [38464/60000]
Loss