<a href="https://colab.research.google.com/github/kameshcodes/deep-learning-codes/blob/main/pytorch_day_4_training_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

$$\textbf{Training Neural Network}$$

---
---

<br>
<br>
<br>


# $\textbf{1. Loading a Dataset From TorchVision}$

----


In [2]:
import torch
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

In [3]:
train = datasets.FashionMNIST(
    root='data',
    train = True,
    download = True,
    transform = ToTensor()
)

test = datasets.FashionMNIST(
    root = 'data',
    train = False,
    download = False,
    transform = ToTensor()
)


train_dataloader = DataLoader(train, batch_size = 64, shuffle = True)
# Shuffling is unnecessary for evaluation: the metrics do not depend on sample order
test_dataloader = DataLoader(test, batch_size = 64, shuffle = False)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:02<00:00, 12735099.59it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 200340.31it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:03<00:00, 1252438.10it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 14698622.87it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
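
To see what a `DataLoader` yields without downloading anything, here is a minimal sketch that uses a synthetic stand-in for the dataset. The names `toy_dataset` and `toy_loader` and the sample count of 200 are illustrative assumptions, not part of the notebook; only the `1 x 28 x 28` image shape and the batch size of 64 mirror the cell above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for FashionMNIST: 200 random 1x28x28 "images", labels in 0..9
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
toy_dataset = TensorDataset(images, labels)

# Same batch size as the notebook's loaders
toy_loader = DataLoader(toy_dataset, batch_size=64, shuffle=True)

# Each iteration yields one (inputs, targets) batch
batch_shapes = [(X.shape, y.shape) for X, y in toy_loader]
num_batches = len(toy_loader)  # ceil(200 / 64) batches, the last one is smaller

for X_shape, y_shape in batch_shapes:
    print(f"X: {tuple(X_shape)}  y: {tuple(y_shape)}")
```

The last batch holds only the leftover samples because `drop_last` defaults to `False`.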






# $\textbf{2. Build the Neural Network}$

---

In [4]:
import torch
from torch import nn

In [13]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using {device} device.')

Using cpu device.


In [33]:
import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()

        # First fully connected layer
        self.fc1 = nn.Linear(28 * 28, 64)
        self.batch_norm1 = nn.BatchNorm1d(64)
        self.relu = nn.ReLU()
        self.dropout1 = nn.Dropout(0.5)

        # Second fully connected layer
        self.fc2 = nn.Linear(64, 32)
        self.batch_norm2 = nn.BatchNorm1d(32)
        self.dropout2 = nn.Dropout(0.5)

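        # Output layer
        self.fc3 = nn.Linear(32, 10)
        # Note: log_softmax is registered but never applied in forward();
        # nn.CrossEntropyLoss expects raw logits and applies log-softmax itself
        self.log_softmax = nn.LogSoftmax(dim=1)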

    def forward(self, x):
        x = self.flatten(x)

        x = self.fc1(x)
        x = self.batch_norm1(x)
        x = self.relu(x)
        x = self.dropout1(x)

        x = self.fc2(x)
        x = self.batch_norm2(x)
        x = self.relu(x)
        x = self.dropout2(x)

        x = self.fc3(x)
        return x

In [34]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (batch_norm1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (dropout1): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=64, out_features=32, bias=True)
  (batch_norm2): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc3): Linear(in_features=32, out_features=10, bias=True)
  (log_softmax): LogSoftmax(dim=1)
)


<br>
**Note:** To use the model, pass it the input data. This executes the model's `forward` method, along with some background operations.

Do not call `model.forward()` directly!
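
One of those background operations is running registered hooks. The sketch below, which uses a small hypothetical `nn.Linear` layer rather than the notebook's model, suggests why calling the module is preferred: a forward hook fires for `layer(x)` but not for `layer.forward(x)`.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # hypothetical small layer, for illustration only
calls = []

def record_call(module, inputs, output):
    # Forward hooks receive the module, its positional inputs and its output
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_call)

x = torch.rand(3, 4)
out_via_call = layer(x)              # goes through __call__: the hook fires
out_via_forward = layer.forward(x)   # bypasses __call__: the hook does not fire

handle.remove()
print(f"Hook fired {len(calls)} time(s)")
```

Both calls compute the same values; only the first one triggers the hook machinery.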

<br>
<br>

For Example:


In [40]:
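X = torch.rand(1, 28, 28, device=device)

# Evaluation mode: dropout is disabled and batch norm uses its running statistics,
# which is required here because the batch contains a single sample
model.eval()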

logits = model(X)

pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([9])
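
The call to `model.eval()` in the example matters because some layers behave differently in training and evaluation mode. As a minimal sketch on a standalone `nn.Dropout` layer (not the notebook's full model), the snippet below compares the two modes on an input of ones.

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# Training mode: each entry is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is the identity function
dropout.eval()
eval_out = dropout(x)

print("train:", train_out)
print("eval: ", eval_out)
```

`nn.BatchNorm1d` switches in a similar way: it normalizes with batch statistics in training mode and with its stored running statistics in evaluation mode.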


### $2.1$ $Model$ $Parameter$

---

In [43]:
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Layer: fc1.weight | Size: torch.Size([64, 784]) | Values : tensor([[ 0.0133, -0.0237,  0.0206,  ...,  0.0139,  0.0252, -0.0354],
        [-0.0050, -0.0281, -0.0206,  ...,  0.0111,  0.0132,  0.0084]],
       grad_fn=<SliceBackward0>) 

Layer: fc1.bias | Size: torch.Size([64]) | Values : tensor([ 0.0268, -0.0006], grad_fn=<SliceBackward0>) 

Layer: batch_norm1.weight | Size: torch.Size([64]) | Values : tensor([1., 1.], grad_fn=<SliceBackward0>) 

Layer: batch_norm1.bias | Size: torch.Size([64]) | Values : tensor([0., 0.], grad_fn=<SliceBackward0>) 

Layer: fc2.weight | Size: torch.Size([32, 64]) | Values : tensor([[ 0.0019,  0.0728,  0.0160, -0.1096, -0.0224, -0.1016,  0.0629,  0.0023,
         -0.0318,  0.0864,  0.0658, -0.0057,  0.0155, -0.1068,  0.1052,  0.0478,
          0.0003,  0.0469,  0.0825, -0.1178,  0.0335,  0.1099, -0.0461,  0.0363,
         -0.0271, -0.1058, -0.0787, -0.0416, -0.0631, -0.0724,  0.0916,  0.0602,
          0.0233,  0.0690, -0.0083,  0.1036,  0.0065, -0.0189,  
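
Beyond printing individual tensors, it is often useful to count the parameters. The following sketch rebuilds the same layer sizes with `nn.Sequential` so that it runs on its own; the name `sketch` and the use of `nn.Sequential` are assumptions made for illustration, not the notebook's `NeuralNetwork` class.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class, expressed as a Sequential stand-in
sketch = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(32, 10),
)

# Number of elements in each learnable tensor, keyed by parameter name
per_tensor = {name: p.numel() for name, p in sketch.named_parameters()}
total_params = sum(per_tensor.values())
trainable_params = sum(p.numel() for p in sketch.parameters() if p.requires_grad)

for name, count in per_tensor.items():
    print(f"{name:>10}: {count}")
print(f"total: {total_params}, trainable: {trainable_params}")
```

The batch norm running mean and variance do not appear in this count because they are stored as buffers, not parameters.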

# $\textbf{3. Training the Neural Network}$

---

In [44]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode: required here, because the model
    # contains batch normalization and dropout layers
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            # Samples seen so far, using the loader's own batch size instead of a global
            loss, current = loss.item(), batch * dataloader.batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode: required here, because the model
    # contains batch normalization and dropout layers
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # torch.no_grad() disables gradient tracking during evaluation,
    # which avoids unnecessary computation and reduces memory usage
    with torch.no_grad():
        for X, y in dataloader:
            # Move the batch to the same device as the model
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
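
The training loop calls `optimizer.zero_grad()` after every step because PyTorch accumulates gradients across `backward()` calls. The following sketch illustrates this on a single hypothetical parameter `w` with a made-up loss of `3 * w`; it is not the notebook's model.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)  # hypothetical single parameter
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
first_grad = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(3 * w).sum().backward()
accumulated_grad = w.grad.clone()

# zero_grad() clears the gradient (set to None or to zeros, depending on the PyTorch version)
optimizer.zero_grad()
cleared = w.grad is None or bool((w.grad == 0).all())

print(first_grad, accumulated_grad, cleared)
```

Without the reset, each update would use the sum of all gradients computed so far rather than the gradient of the current batch.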

$Hyperparameters$

---
**Number of Epochs** - the number of times to iterate over the dataset

**Batch Size** - the number of data samples propagated through the network before the parameters are updated

**Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values slow down learning, while large values may make training unstable or cause it to diverge.
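
The effect of the learning rate can be sketched on a toy problem. The example below minimizes the hypothetical function f(w) = w² with plain gradient descent; the function, the step count, and the three learning rates are illustrative assumptions and are unrelated to the values used for the network.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # each step multiplies w by (1 - 2 * lr)
    return w

slow = gradient_descent(lr=0.01)     # factor 0.98 per step: still far from the minimum at 0
good = gradient_descent(lr=0.3)      # factor 0.4 per step: converges quickly
diverged = gradient_descent(lr=1.1)  # factor -1.2 per step: oscillates and grows

print(f"lr=0.01 -> {slow:.4f}, lr=0.3 -> {good:.2e}, lr=1.1 -> {diverged:.2f}")
```

On this toy problem any learning rate above 1 diverges; for a neural network the threshold is not known in advance, which is why the value is usually tuned experimentally.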

In [45]:
learning_rate = 1e-2
batch_size = 64
epochs = 5

In [46]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10  # overrides the value of 5 set in the previous cell
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.403424  [   64/60000]
loss: 1.871727  [ 6464/60000]
loss: 1.647410  [12864/60000]
loss: 1.457512  [19264/60000]
loss: 1.266085  [25664/60000]
loss: 1.254902  [32064/60000]
loss: 1.196556  [38464/60000]
loss: 1.069652  [44864/60000]
loss: 0.945549  [51264/60000]
loss: 1.130857  [57664/60000]
Test Error: 
 Accuracy: 79.4%, Avg loss: 0.733250 

Epoch 2
-------------------------------
loss: 1.128931  [   64/60000]
loss: 0.953985  [ 6464/60000]
loss: 1.011680  [12864/60000]
loss: 1.121553  [19264/60000]
loss: 0.850140  [25664/60000]
loss: 0.854010  [32064/60000]
loss: 0.987238  [38464/60000]
loss: 0.876438  [44864/60000]
loss: 0.645891  [51264/60000]
loss: 0.864375  [57664/60000]
Test Error: 
 Accuracy: 81.3%, Avg loss: 0.577402 

Epoch 3
-------------------------------
loss: 0.788171  [   64/60000]
loss: 0.895216  [ 6464/60000]
loss: 0.922458  [12864/60000]
loss: 0.764669  [19264/60000]
loss: 0.712194  [25664/60000]
loss: 0.699676  [32064/600

# $\textbf{4. Saving Model}$

---

In [47]:
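# Saves the entire model object via pickle; the NeuralNetwork class
# definition must be available when the file is loaded again
torch.save(model, 'model.pth')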
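
A commonly used alternative is to save only the learned weights through `state_dict()`. The sketch below assumes a small hypothetical `nn.Linear` model and an in-memory buffer so that it runs without writing files; a file path such as `'model_weights.pth'` (also an assumed name) could be passed to `torch.save` and `torch.load` instead.

```python
import io

import torch
from torch import nn

small_model = nn.Linear(4, 2)  # hypothetical stand-in for the trained network

# Save only the parameter tensors, not the pickled Python object
buffer = io.BytesIO()
torch.save(small_model.state_dict(), buffer)
buffer.seek(0)

# To restore, build the same architecture first, then load the weights into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))
restored.eval()  # switch to evaluation mode before inference

print(torch.equal(small_model.weight, restored.weight))
```

With this approach the saved data contains only tensors, and the architecture is recreated from code at load time.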