##### Name: K Lalith Aditya
##### Regd No: 22231
##### OPTIMIZING MODEL PARAMETERS


Now that we have a model and data, it is time to train, validate, and test the model by optimizing its parameters on our data. Training a model is an iterative process.

In each iteration, the model makes a guess about the output, calculates the error in its guess (the loss), collects the derivatives of the error with respect to its parameters, and optimizes those parameters using gradient descent.
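
The four steps of an iteration can be sketched on a toy problem. This is only an illustration, not the FashionMNIST model used below: the data, the single parameter `w`, and the learning rate `lr` are all made-up assumptions.

```python
import torch

# Hypothetical toy problem: fit y = w * x with a single parameter w.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])          # generated with w = 2
w = torch.tensor(0.0, requires_grad=True)  # start from a poor guess
lr = 0.05                                  # illustrative learning rate

losses = []
for step in range(50):
    pred = w * x                           # 1. make a guess
    loss = ((pred - y) ** 2).mean()        # 2. measure the error (mean squared error)
    loss.backward()                        # 3. collect d(loss)/dw in w.grad
    with torch.no_grad():
        w -= lr * w.grad                   # 4. gradient descent update
    w.grad.zero_()                         # clear the gradient for the next iteration
    losses.append(loss.item())

print(f"w = {w.item():.4f}, first loss = {losses[0]:.4f}, last loss = {losses[-1]:.6f}")
```

The loss shrinks from one iteration to the next and `w` approaches 2, the value used to generate the toy data.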

##### Prerequisite Code


In [31]:
import torch
# Import the PyTorch library
from torch import nn
# Import the `nn` module from PyTorch, which contains the neural network classes
from torch.utils.data import DataLoader
# Import the `DataLoader` class from PyTorch, which is used to load data in batches
from torchvision import datasets
# Import the `datasets` module from TorchVision, which contains the FashionMNIST dataset
from torchvision.transforms import ToTensor
# Import the `ToTensor` transform from TorchVision, which converts images to tensors


training_data = datasets.FashionMNIST(
    root = "data",
    train = True,
    download = True,
    transform = ToTensor()
)
# Load the FashionMNIST dataset, specifying that we want the training data and to download it if it doesn't exist already. We also apply the `ToTensor` transform to convert the images to tensors.


test_data = datasets.FashionMNIST(
    root = 'data',
    train = False,
    download = True,
    transform = ToTensor()
)
# Load the FashionMNIST dataset, specifying that we want the test data and to download it if it doesn't exist already. We also apply the `ToTensor` transform to convert the images to tensors.

train_dataloader = DataLoader(training_data, batch_size = 64)
# Create a `DataLoader` object for the training data, specifying that we want to load the data in batches of size 64.

test_dataloader = DataLoader(test_data, batch_size = 64)
# Create a `DataLoader` object for the test data, specifying that we want to load the data in batches of size 64.

class NeuralNetwork(nn.Module):
    # Define a class called `NeuralNetwork` that inherits from the `nn.Module` class.

    def __init__(self):
        super().__init__()
        # Call the `__init__()` method of the `nn.Module` class.
        self.flatten = nn.Flatten()
        # Create a `nn.Flatten` object to flatten the input images.
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28,512),# linear layer with 28*28 inputs and 512 outputs
            nn.ReLU(), # ReLU activation
            nn.Linear(512,512), # linear layer with 512 inputs and 512 outputs
            nn.ReLU(), # ReLU activation
            nn.Linear(512,10) # linear layer with 512 inputs and 10 outputs, one per FashionMNIST class
        )
        
    def forward(self, x): # Define the forward pass
        x = self.flatten(x) # Flatten each 28x28 image into a 784-element vector
        logits = self.linear_relu_stack(x) # Pass the flattened input through the layer stack
        return logits

model = NeuralNetwork() # Create an instance of NeuralNetwork

#### Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can affect model training and convergence rates.

We define the following hyperparameters for training:

Number of Epochs: The number of times to iterate over the dataset.

Batch Size: The number of data samples propagated through the network before the parameters are updated.

Learning Rate: How much to update the model parameters at each batch/epoch. Smaller values yield slow learning, while large values may result in unpredictable behaviour during training.
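
The effect of the learning rate can be seen on a one-dimensional example. The sketch below is an assumption-laden illustration: the function `f(w) = w**2`, the helper `gradient_descent`, and the three learning rates are hypothetical and unrelated to the network in this notebook.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) starting at w0; return the final w."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w   # plain gradient descent update
    return w

small = gradient_descent(lr=0.001)    # barely moves away from the start value
moderate = gradient_descent(lr=0.1)   # approaches the minimum at w = 0
large = gradient_descent(lr=1.1)      # overshoots more on every step and diverges

print(f"small lr: {small:.4f}, moderate lr: {moderate:.4f}, large lr: {large:.4f}")
```

With the same number of steps, the small rate has barely made progress, the moderate rate is close to the minimum, and the large rate ends further from the minimum than where it started.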

In [32]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
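
To get a feel for what these values imply, the sketch below counts parameter updates. It assumes the 60,000-sample FashionMNIST training set (the dataset size reported in the training output) and that the last, smaller batch is kept, which is the `DataLoader` default.

```python
import math

dataset_size = 60000   # FashionMNIST training samples
batch_size = 64
epochs = 5

# One parameter update per batch; the final batch holds the leftover samples.
updates_per_epoch = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch_size, total_updates)
```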

#### Optimization Loop

Once the hyperparameters are set, we can train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:
1. The Train Loop: Iterate over the training dataset and try to converge to optimal parameters.
2. The Validation/Test Loop: Iterate over the test dataset to check whether model performance is improving.



#### Loss Function


The loss function measures the dissimilarity between the obtained result and the target value, and it is the quantity we want to minimize during training.
To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true label.

Common Loss Functions:
1. nn.MSELoss (Mean Square Error) for regression
2. nn.NLLLoss (Negative Log Likelihood) for classification
3. nn.CrossEntropyLoss, which combines nn.LogSoftmax and nn.NLLLoss
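
The relationship in the third item can be checked numerically. This is a minimal sketch; the random logits and the target labels are hypothetical stand-ins for real model outputs.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical batch: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # hypothetical class labels

# Cross entropy applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity in two steps: log-softmax, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))
```

This is why the model can return raw logits: nn.CrossEntropyLoss applies the log-softmax itself.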



In [33]:
# initializing the cross entropy loss function

loss_fn = nn.CrossEntropyLoss()


#### Optimizer

Optimization is the process of adjusting the model parameters to reduce the model error in each training step. Optimization algorithms define how this process is performed. Here we use Stochastic Gradient Descent (SGD).
All optimization logic is encapsulated in the optimizer object.
Many other optimizers are available in PyTorch, such as:
1. Adam
2. RMSprop

We initialize the optimizer by registering the model's parameters that need to be trained and passing in the learning rate hyperparameter.

In [34]:
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
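
Other optimizers are constructed in the same way. The sketch below uses a small hypothetical `demo_model` so that it runs on its own; note that the PyTorch class is spelled `RMSprop`. Which optimizer and learning rate work best depends on the problem, so the values here are placeholders rather than recommendations.

```python
import torch
from torch import nn

demo_model = nn.Linear(4, 2)   # hypothetical stand-in for the real model
lr = 1e-3                      # same learning rate for all three, for illustration

sgd = torch.optim.SGD(demo_model.parameters(), lr=lr)
adam = torch.optim.Adam(demo_model.parameters(), lr=lr)
rmsprop = torch.optim.RMSprop(demo_model.parameters(), lr=lr)

for opt in (sgd, adam, rmsprop):
    # Every optimizer stores its settings in param_groups
    print(type(opt).__name__, opt.param_groups[0]["lr"])
```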

Inside the training loop, optimization happens in three steps:

1. Call optimizer.zero_grad() to reset the gradients of the model parameters. Gradients add up by default; to prevent double-counting, we explicitly zero them at each iteration.

2. Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss w.r.t. each parameter.

3. Once we have our gradients, call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.
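
The reason for resetting gradients can be seen on a single parameter. In this sketch the parameter `w` and the loss `3 * w` are hypothetical, and `optimizer.step()` is deliberately never called so that `w` stays fixed and only the gradient bookkeeping is visible.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = 3 * w
loss.backward()
first = w.grad.item()         # 3.0, since d(3w)/dw = 3

loss = 3 * w
loss.backward()               # no reset in between
accumulated = w.grad.item()   # 6.0, the new gradient was added to the old one

optimizer.zero_grad()         # reset before the next backward pass
loss = 3 * w
loss.backward()
after_reset = w.grad.item()   # 3.0 again

print(first, accumulated, after_reset)
```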

In [37]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset) # Get the total size of the dataset
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader): # Iterate over the data batches
        # Compute prediction and loss
        pred = model(X) # Make predictions using the model
        loss = loss_fn(pred, y) # Calculate the loss between predictions and true labels

        # Backpropagation
        loss.backward() # Compute gradients for model parameters using backpropagation
        optimizer.step() # Update model parameters using the specified optimizer
        optimizer.zero_grad() # Reset gradients to zero for the next iteration

        if batch % 100 == 0: # Print loss information every 100 batches
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset) # Get the total size of the dataset
    num_batches = len(dataloader) # Get the number of batches in the dataloader
    test_loss, correct = 0, 0 # Initialize variables to track test loss and correct predictions

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader: # Iterate over the data batches
            pred = model(X) # Make predictions using the model
            test_loss += loss_fn(pred, y).item() # Calculate the loss on test data
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()  # Count correct predictions

    test_loss /= num_batches # Calculate the average test loss
    correct /= size # Calculate the accuracy by dividing the correct predictions by the total dataset size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
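
The accuracy computation in test_loop can be traced on a small example. The logits and labels below are made up for illustration: four samples and three classes.

```python
import torch

# Hypothetical logits: one row per sample, one column per class
pred = torch.tensor([[2.0, 0.1, 0.3],
                     [0.2, 1.5, 0.1],
                     [0.9, 0.8, 0.7],
                     [0.1, 0.2, 3.0]])
y = torch.tensor([0, 1, 2, 2])   # hypothetical true labels

predicted_class = pred.argmax(1)  # index of the largest logit in each row
correct = (predicted_class == y).type(torch.float).sum().item()  # number of matches
accuracy = correct / len(y)

print(predicted_class.tolist(), correct, accuracy)
```

Three of the four predictions match their labels, so the accuracy is 0.75. The third sample is the miss: its largest logit is in column 0, while its label is 2.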

In [38]:
loss_fn = nn.CrossEntropyLoss() # cross entropy loss is the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) # Stochastic Gradient descent is the optimizer

epochs = 10 # 10 epochs
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer) # Train the model for one epoch using the train_loop function
    test_loop(test_dataloader, model, loss_fn) #Evaluate the model on the test dataset using the test_loop function
print("Done!")

Epoch 1
-------------------------------
loss: 0.797336  [   64/60000]
loss: 0.861056  [ 6464/60000]
loss: 0.639708  [12864/60000]
loss: 0.836981  [19264/60000]
loss: 0.736338  [25664/60000]
loss: 0.739897  [32064/60000]
loss: 0.819592  [38464/60000]
loss: 0.806062  [44864/60000]
loss: 0.796040  [51264/60000]
loss: 0.768770  [57664/60000]
Test Error: 
 Accuracy: 71.8%, Avg loss: 0.759868 

Epoch 2
-------------------------------
loss: 0.760314  [   64/60000]
loss: 0.832050  [ 6464/60000]
loss: 0.609199  [12864/60000]
loss: 0.813219  [19264/60000]
loss: 0.714624  [25664/60000]
loss: 0.715436  [32064/60000]
loss: 0.795119  [38464/60000]
loss: 0.790725  [44864/60000]
loss: 0.774587  [51264/60000]
loss: 0.748458  [57664/60000]
Test Error: 
 Accuracy: 73.1%, Avg loss: 0.738050 

Epoch 3
-------------------------------
loss: 0.727878  [   64/60000]
loss: 0.805841  [ 6464/60000]
loss: 0.582827  [12864/60000]
loss: 0.793039  [19264/60000]
loss: 0.695862  [25664/60000]
loss: 0.695012  [32064/600