### Weight Initialization

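* In PyTorch, layers like `nn.Linear` and `nn.Conv2d` initialize their weights by default with a variant of **He (Kaiming) Uniform** initialization: `kaiming_uniform_` called with `a=math.sqrt(5)`, which samples from the range `[-1/sqrt(fan_in), 1/sqrt(fan_in)]`.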
* You can explicitly initialize weights using functions like `nn.init.kaiming_uniform_()` or `nn.init.kaiming_normal_()`.
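
To see where the default range comes from, the sketch below recomputes the bound used by Kaiming uniform initialization. It assumes the formula given in the `torch.nn.init` documentation, `bound = gain * sqrt(3 / fan_in)` with `gain = sqrt(2 / (1 + a^2))` for leaky ReLU. The helper name `kaiming_uniform_bound` is illustrative and is not a PyTorch function.

```python
import math

def kaiming_uniform_bound(fan_in, a=0.0):
    """Bound of the uniform range U(-bound, bound) for Kaiming uniform init.

    `a` is the negative slope of the leaky ReLU assumed after the layer.
    """
    gain = math.sqrt(2.0 / (1.0 + a ** 2))
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std

fan_in = 12  # e.g. nn.Linear(in_features=12, ...)

# Layer default: a = sqrt(5), which simplifies to 1 / sqrt(fan_in)
default_bound = kaiming_uniform_bound(fan_in, a=math.sqrt(5))

# Plain ReLU setting: a = 0, which simplifies to sqrt(6 / fan_in)
relu_bound = kaiming_uniform_bound(fan_in, a=0.0)

print("default bound:", default_bound)
print("ReLU bound:   ", relu_bound)
```

The default range is narrower than the one obtained with `a=0`, which is one reason to call `nn.init.kaiming_uniform_()` or `nn.init.kaiming_normal_()` explicitly with `nonlinearity="relu"` for ReLU networks.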

**Why weight initialization matters:**

* Proper initialization helps the network **converge faster** during training.
* It helps prevent **vanishing or exploding gradients**, especially in deep networks.
* Good initialization ensures that signals (activations) and gradients maintain a reasonable scale across layers, improving **training stability** and final performance.
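
The effect of the weight scale on signal propagation can be illustrated with a small experiment. The sketch below is not part of the original notebook: it pushes random data through a stack of bias-free linear layers with ReLU and compares three weight scales. The function name `final_activation_std`, the depth, and the width are arbitrary choices made for illustration.

```python
import torch

def final_activation_std(weight_std_fn, depth=20, width=256, seed=0):
    """Std of the activations after `depth` (linear -> ReLU) layers.

    weight_std_fn maps fan_in to the std of the Gaussian weights.
    """
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(512, width, generator=g)
    for _ in range(depth):
        w = torch.randn(width, width, generator=g) * weight_std_fn(width)
        x = torch.relu(x @ w)
    return x.std().item()

too_small = final_activation_std(lambda fan_in: 0.01)                # activations shrink layer by layer
kaiming = final_activation_std(lambda fan_in: (2.0 / fan_in) ** 0.5)  # He scaling for ReLU
too_large = final_activation_std(lambda fan_in: 0.2)                 # activations grow layer by layer

print(f"std=0.01         -> {too_small:.3e}")
print(f"std=sqrt(2/fan)  -> {kaiming:.3e}")
print(f"std=0.2          -> {too_large:.3e}")
```

Only the He-scaled weights keep the activation scale roughly constant with depth; the same argument applies to gradients in the backward pass.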


In [1]:
import torch
import torch.nn as nn
import math

torch.manual_seed(2025) 

linear_01 = nn.Linear(in_features=12, out_features=6)
print(f'weight boundary: {linear_01.weight.min().item()} ~ {linear_01.weight.max().item()}')

# PyTorch's default weight bound is 1/sqrt(fan_in)
fan_in = linear_01.in_features  
bound = 1 / math.sqrt(fan_in)
print('bound:', bound)

weight boundary: -0.2870391011238098 ~ 0.2778533101081848
bound: 0.2886751345948129


In [2]:
nn.Linear(in_features=12, out_features=6).weight

Parameter containing:
tensor([[ 0.0659, -0.2521,  0.2000, -0.1711,  0.2313,  0.2874, -0.1012,  0.2303,
          0.0141,  0.1053, -0.0433,  0.0949],
        [-0.2482, -0.2393,  0.1905,  0.1161, -0.2317, -0.0941,  0.2304, -0.0812,
          0.2458,  0.2154, -0.2783, -0.1832],
        [-0.1151, -0.1292,  0.0518, -0.0755,  0.1770,  0.1658,  0.2855,  0.0584,
         -0.2340, -0.2408,  0.0152,  0.0131],
        [ 0.2644, -0.2522, -0.1562, -0.2101,  0.1018,  0.1315, -0.0366,  0.1072,
         -0.0036,  0.1219, -0.1012, -0.1467],
        [-0.1274, -0.1332,  0.1563, -0.2631,  0.1822,  0.0242, -0.1544, -0.1338,
         -0.0272,  0.0235,  0.2836, -0.2072],
        [-0.1529,  0.2757, -0.1078,  0.0923, -0.2552,  0.2812, -0.1020,  0.2624,
         -0.1282,  0.1250,  0.0238,  0.1085]], requires_grad=True)

In [3]:
torch.manual_seed(2025)

conv_01 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3)
print(f'weight boundary: {conv_01.weight.min().item()} ~ {conv_01.weight.max().item()}')

# PyTorch's default weight bound is 1/sqrt(fan_in). For Conv2d, fan_in = in_channels * kernel_height * kernel_width
fan_in = conv_01.in_channels * conv_01.kernel_size[0] * conv_01.kernel_size[1]
bound = 1 / math.sqrt(fan_in)
print('bound:', bound)

weight boundary: -0.09610466659069061 ~ 0.0962221547961235
bound: 0.09622504486493763


In [4]:
import torch.nn.functional as F  # required by forward(), which calls F.relu

class SimpleCNN_01(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # First Conv2d layer: input channels 3, output channels 32, kernel size 3x3, stride 1
        self.conv_1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1)
        # Second Conv2d layer: input channels 32, output channels 64, kernel size 3x3, stride 1
        self.conv_2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1)
        # Max pooling layer with kernel size 2x2
        self.pool = nn.MaxPool2d(kernel_size=2)
        # Flatten layer to convert 3D feature map to 1D vector
        self.flatten = nn.Flatten()
        # Fully connected classification layer
        self.classifier = nn.Linear(in_features=12544, out_features=num_classes)

        # Weight initialization for all Conv2d layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                print('kaiming normal initialization applied')
                # Default Conv2d uses kaiming_uniform; here we replace with kaiming_normal
                # Note the trailing underscore (_) in nn.init.kaiming_normal_ to modify in-place
                nn.init.kaiming_normal_(m.weight, mode="fan_in", nonlinearity="relu")

    def forward(self, x):
        # Apply first conv layer followed by ReLU activation
        x = F.relu(self.conv_1(x))
        # Apply second conv layer followed by ReLU activation
        x = F.relu(self.conv_2(x))
        # Apply max pooling
        x = self.pool(x)
        # Flatten the feature map
        x = self.flatten(x)
        # Pass through classifier to get final output
        x = self.classifier(x)

        return x


In [5]:
simple_cnn_01 = SimpleCNN_01(num_classes=10)

kaiming normal initialization applied
kaiming normal initialization applied
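
The value `in_features=12544` in the classifier follows from the spatial size of the feature map after the two unpadded convolutions and the pooling layer. The sketch below works through that arithmetic, assuming 32×32 inputs such as CIFAR-10 images; `conv_out` is an illustrative helper, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # assumed input height/width
size = conv_out(size, kernel=3)             # conv_1, no padding
size = conv_out(size, kernel=3)             # conv_2, no padding
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(kernel_size=2)

out_channels = 64                           # out_channels of conv_2
flat_features = out_channels * size * size  # what nn.Flatten() feeds to the Linear layer
print(size, flat_features)
```

If the input resolution or the padding changes, `in_features` of the classifier has to be recomputed the same way.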


### Applying Batch Normalization

* After a Linear layer, use `BatchNorm1d(num_features)`, where `num_features` equals the Linear layer's `out_features`.
* After a Conv2d layer, use `BatchNorm2d(num_features)`, where `num_features` equals the Conv2d layer's `out_channels`.
* In an existing network, replace `Conv -> Activation` with `Conv -> BatchNorm -> Activation`.
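
The rules above can be checked on two small blocks. This is a minimal sketch with arbitrary layer sizes chosen for illustration; it only verifies that `num_features` matches `out_features` or `out_channels` and that BatchNorm leaves the tensor shape unchanged.

```python
import torch
import torch.nn as nn

# Linear -> BatchNorm1d -> Activation: num_features equals out_features
linear_block = nn.Sequential(
    nn.Linear(in_features=20, out_features=8),
    nn.BatchNorm1d(num_features=8),
    nn.ReLU(),
)

# Conv -> BatchNorm2d -> Activation: num_features equals out_channels
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.BatchNorm2d(num_features=16),
    nn.ReLU(),
)

out_flat = linear_block(torch.randn(4, 20))
out_img = conv_block(torch.randn(4, 3, 32, 32))
print(out_flat.shape, out_img.shape)

# BatchNorm2d holds one learnable scale and one shift per channel
print(conv_block[1].weight.shape, conv_block[1].bias.shape)
```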

**Why Batch Normalization?**
Batch Normalization normalizes each channel (feature) of a layer's activations over the mini-batch to zero mean and unit variance, then applies a learnable scale and shift. This stabilizes and accelerates training, helps mitigate vanishing/exploding gradients, and can provide slight regularization that reduces overfitting.
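
As a worked example of the normalization step, the sketch below applies the training-mode formula `y = gamma * (x - mean) / sqrt(var + eps) + beta` to a single feature over a batch of four values. It is plain Python for readability; `batch_norm_1feature` is an illustrative name, and `eps=1e-5` mirrors the default of PyTorch's BatchNorm layers.

```python
import math

def batch_norm_1feature(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across the batch, then scale and shift."""
    mean = sum(values) / len(values)
    # Biased variance (divide by N), as used for normalization during training
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # one feature, batch of 4

normalized = batch_norm_1feature(batch)
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(normalized, out_mean, out_var)

# gamma and beta let the layer move away from zero mean / unit variance if useful
shifted = batch_norm_1feature(batch, gamma=2.0, beta=1.0)
shifted_mean = sum(shifted) / len(shifted)
print(shifted, shifted_mean)
```

At evaluation time the layer uses running estimates of the mean and variance instead of the current batch statistics, which is why `model.eval()` matters for networks with BatchNorm.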


In [6]:
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
from torch.utils.data import random_split

train_dataset = CIFAR10(root='./data', train=True, download=True, transform=ToTensor())
test_dataset = CIFAR10(root='./data', train=False, download=True, transform=ToTensor())

tr_size = int(0.85 * len(train_dataset))
val_size = len(train_dataset) - tr_size
tr_dataset, val_dataset = random_split(train_dataset, [tr_size, val_size])
print('tr:', len(tr_dataset), 'valid:', len(val_dataset))

tr_loader = DataLoader(tr_dataset, batch_size=32, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4)

100%|██████████| 170M/170M [00:03<00:00, 48.3MB/s]


tr: 42500 valid: 7500


In [7]:
from tqdm import tqdm
import torch.nn.functional as F

class Trainer:
    def __init__(self, model, loss_fn, optimizer, train_loader, val_loader, device=None):
        self.model = model.to(device)  # Move model to specified device
        self.loss_fn = loss_fn          # Loss function
        self.optimizer = optimizer      # Optimizer
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.device = device
    
    def train_epoch(self, epoch):
        self.model.train()  # Set model to training mode
        
        # Initialize running loss and accuracy metrics
        accu_loss = 0.0
        running_avg_loss = 0.0
        num_total = 0.0
        accu_num_correct = 0.0
        accuracy = 0.0
        
        # Use tqdm to visualize training progress
        with tqdm(total=len(self.train_loader), desc=f"Epoch {epoch+1} [Training..]", leave=True) as progress_bar:
            for batch_idx, (inputs, targets) in enumerate(self.train_loader):
                # Move inputs and targets to the specified device
                inputs = inputs.to(self.device)
                targets = targets.to(self.device)
                
                # Forward pass
                outputs = self.model(inputs)
                loss = self.loss_fn(outputs, targets)
                
                # Backward pass
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

                # Update running average loss
                accu_loss += loss.item()
                running_avg_loss = accu_loss /(batch_idx + 1)

                # Calculate accuracy
                # Compare predicted class (argmax) with targets
                num_correct = (outputs.argmax(-1) == targets).sum().item()
                num_total += inputs.shape[0]
                accu_num_correct += num_correct
                accuracy = accu_num_correct / num_total

                # Update tqdm progress bar with loss and accuracy
                progress_bar.update(1)
                if batch_idx % 20 == 0 or (batch_idx + 1) == progress_bar.total:
                    progress_bar.set_postfix({"Loss": running_avg_loss, 
                                              "Accuracy": accuracy})
        
        return running_avg_loss, accuracy
                
    def validate_epoch(self, epoch):
        if not self.val_loader:
            # Return a pair so that `val_loss, val_acc = ...` in fit() still unpacks
            return None, None
            
        self.model.eval()  # Set model to evaluation mode

        # Initialize running loss and accuracy metrics
        accu_loss = 0
        running_avg_loss = 0
        num_total = 0.0
        accu_num_correct = 0.0
        accuracy = 0.0

        with tqdm(total=len(self.val_loader), desc=f"Epoch {epoch+1} [Validating]", leave=True) as progress_bar:
            with torch.no_grad():  # Disable gradient computation for validation
                for batch_idx, (inputs, targets) in enumerate(self.val_loader):
                    inputs = inputs.to(self.device)
                    targets = targets.to(self.device)
                    
                    outputs = self.model(inputs)
                    loss = self.loss_fn(outputs, targets)

                    # Update running average loss
                    accu_loss += loss.item()
                    running_avg_loss = accu_loss /(batch_idx + 1)

                    # Calculate accuracy
                    num_correct = (outputs.argmax(-1) == targets).sum().item()
                    num_total += inputs.shape[0]
                    accu_num_correct += num_correct
                    accuracy = accu_num_correct / num_total

                    # Update tqdm progress bar with loss and accuracy
                    progress_bar.update(1)
                    if batch_idx % 20 == 0 or (batch_idx + 1) == progress_bar.total:
                        progress_bar.set_postfix({"Loss": running_avg_loss, 
                                                  "Accuracy":accuracy})
        return running_avg_loss, accuracy
    
    def fit(self, epochs):
        # Initialize history dict to store train/validation loss and accuracy
        history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
        for epoch in range(epochs):
            train_loss, train_acc = self.train_epoch(epoch)
            val_loss, val_acc = self.validate_epoch(epoch)
            print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f} Train Accuracy: {train_acc:.4f}",
                  f", Val Loss: {val_loss:.4f} Val Accuracy: {val_acc:.4f}" if val_loss is not None else "")
            
            # Record metrics for this epoch
            history['train_loss'].append(train_loss)
            history['train_acc'].append(train_acc)
            history['val_loss'].append(val_loss)
            history['val_acc'].append(val_acc)
            
        return history 
    
    # Return the trained model
    def get_trained_model(self):
        return self.model


#### Applying Batch Normalization to the Model and Evaluating Performance

* Modify the original network so that each Conv -> Activation block becomes Conv -> BatchNorm -> Activation.


In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchinfo import summary

NUM_INPUT_CHANNELS = 3

class SimpleCNNWithBN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Conv block 1: Two Conv2d layers with 32 filters, followed by BatchNorm and ReLU, then MaxPooling
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=NUM_INPUT_CHANNELS, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        
        # Conv block 2: Two Conv2d layers with 64 filters, followed by BatchNorm and ReLU, then MaxPooling
        # padding='same' keeps the output spatial size equal to the input (string padding was added in PyTorch 1.9 and requires stride=1)
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Conv block 3: Two Conv2d layers with 128 filters, followed by BatchNorm and ReLU, then MaxPooling
        self.conv_block_3 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Classifier block: Global Average Pooling -> Flatten -> Linear layer for final classification
        self.classifier_block = nn.Sequential(
            nn.AdaptiveAvgPool2d(output_size=(1, 1)),
            nn.Flatten(),
            nn.Linear(in_features=128, out_features=num_classes)
        )
        
    def forward(self, x):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.conv_block_3(x)
        x = self.classifier_block(x)

        return x

# Instantiate the model with Batch Normalization
simple_cnn = SimpleCNNWithBN(num_classes=10)

# Print model summary: shows input/output sizes and number of parameters per layer
summary(model=simple_cnn, input_size=(1, 3, 32, 32), 
        col_names=['input_size', 'output_size', 'num_params'], 
        row_settings=['var_names'])


Layer (type (var_name))                  Input Shape               Output Shape              Param #
SimpleCNNWithBN (SimpleCNNWithBN)        [1, 3, 32, 32]            [1, 10]                   --
├─Sequential (conv_block_1)              [1, 3, 32, 32]            [1, 32, 16, 16]           --
│    └─Conv2d (0)                        [1, 3, 32, 32]            [1, 32, 32, 32]           896
│    └─BatchNorm2d (1)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (2)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─Conv2d (3)                        [1, 32, 32, 32]           [1, 32, 32, 32]           9,248
│    └─BatchNorm2d (4)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (5)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─MaxPool2d (6)                     [1, 32, 32, 32]           [1, 32, 16, 16]           --
├─Sequential (conv_block_2)    

#### Using a helper function to create the repeated Sequential blocks


In [9]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchinfo import summary

NUM_INPUT_CHANNELS = 3

class SimpleCNNWithBN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Each block is (Conv -> BatchNorm -> ReLU) x 2 -> MaxPool; padding=1 keeps the spatial size in the convs.
        # Block 1: 3 -> 32 -> 32 channels
        self.conv_block_1 = self.create_convbn_block(first_channels=NUM_INPUT_CHANNELS, middle_channels=32, last_channels=32)

        # Block 2: 32 -> 64 -> 64 channels
        self.conv_block_2 = self.create_convbn_block(first_channels=32, middle_channels=64, last_channels=64)

        # Block 3: 64 -> 128 -> 128 channels
        self.conv_block_3 = self.create_convbn_block(first_channels=64, middle_channels=128, last_channels=128)
        
        # Global Average Pooling (GAP) and final classifier layer.
        self.classifier_block = nn.Sequential(
            nn.AdaptiveAvgPool2d(output_size=(1, 1)),  # GAP converts feature map to 1x1 per channel
            nn.Flatten(),                               # Flatten to feed into Linear layer
            nn.Linear(in_features=128, out_features=num_classes)  # Final classification
        )

    # Function to create repeated Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> MaxPool blocks
    def create_convbn_block(self, first_channels, middle_channels, last_channels):
        conv_bn_block = nn.Sequential(
            nn.Conv2d(in_channels=first_channels, out_channels=middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),
            nn.Conv2d(in_channels=middle_channels, out_channels=last_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(last_channels),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        return conv_bn_block
        
    def forward(self, x):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.conv_block_3(x)
        x = self.classifier_block(x)

        return x

# Instantiate model and summarize architecture
simple_cnn = SimpleCNNWithBN(num_classes=10)

summary(model=simple_cnn, input_size=(1, 3, 32, 32), 
        col_names=['input_size', 'output_size', 'num_params'], 
        row_settings=['var_names'])


Layer (type (var_name))                  Input Shape               Output Shape              Param #
SimpleCNNWithBN (SimpleCNNWithBN)        [1, 3, 32, 32]            [1, 10]                   --
├─Sequential (conv_block_1)              [1, 3, 32, 32]            [1, 32, 16, 16]           --
│    └─Conv2d (0)                        [1, 3, 32, 32]            [1, 32, 32, 32]           896
│    └─BatchNorm2d (1)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (2)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─Conv2d (3)                        [1, 32, 32, 32]           [1, 32, 32, 32]           9,248
│    └─BatchNorm2d (4)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (5)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─MaxPool2d (6)                     [1, 32, 32, 32]           [1, 32, 16, 16]           --
├─Sequential (conv_block_2)    

In [10]:
import torch 
import torch.nn as nn
from torch.optim import SGD, Adam

NUM_INPUT_CHANNELS = 3
NUM_CLASSES = 10

model = SimpleCNNWithBN(num_classes=NUM_CLASSES)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
optimizer = Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

trainer = Trainer(model=model, loss_fn=loss_fn, optimizer=optimizer,
       train_loader=tr_loader, val_loader=val_loader, device=device)

history = trainer.fit(30)

Epoch 1 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.34it/s, Loss=1.23, Accuracy=0.553]
Epoch 1 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 249.82it/s, Loss=1.14, Accuracy=0.595]


Epoch 1/30, Train Loss: 1.2296 Train Accuracy: 0.5535 , Val Loss: 1.1428 Val Accuracy: 0.5952


Epoch 2 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 139.91it/s, Loss=0.822, Accuracy=0.71]
Epoch 2 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 251.97it/s, Loss=0.785, Accuracy=0.728]


Epoch 2/30, Train Loss: 0.8223 Train Accuracy: 0.7104 , Val Loss: 0.7853 Val Accuracy: 0.7279


Epoch 3 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.04it/s, Loss=0.665, Accuracy=0.769]
Epoch 3 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 240.69it/s, Loss=0.699, Accuracy=0.753]


Epoch 3/30, Train Loss: 0.6648 Train Accuracy: 0.7692 , Val Loss: 0.6990 Val Accuracy: 0.7528


Epoch 4 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.15it/s, Loss=0.57, Accuracy=0.805]
Epoch 4 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 252.80it/s, Loss=0.591, Accuracy=0.792]


Epoch 4/30, Train Loss: 0.5704 Train Accuracy: 0.8051 , Val Loss: 0.5911 Val Accuracy: 0.7924


Epoch 5 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.66it/s, Loss=0.493, Accuracy=0.829]
Epoch 5 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 255.78it/s, Loss=0.597, Accuracy=0.791]


Epoch 5/30, Train Loss: 0.4928 Train Accuracy: 0.8291 , Val Loss: 0.5969 Val Accuracy: 0.7915


Epoch 6 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.98it/s, Loss=0.432, Accuracy=0.85]
Epoch 6 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 252.31it/s, Loss=0.51, Accuracy=0.825]


Epoch 6/30, Train Loss: 0.4324 Train Accuracy: 0.8501 , Val Loss: 0.5095 Val Accuracy: 0.8249


Epoch 7 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.35it/s, Loss=0.382, Accuracy=0.868]
Epoch 7 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 255.03it/s, Loss=0.707, Accuracy=0.766]


Epoch 7/30, Train Loss: 0.3818 Train Accuracy: 0.8677 , Val Loss: 0.7074 Val Accuracy: 0.7659


Epoch 8 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.35it/s, Loss=0.33, Accuracy=0.884]
Epoch 8 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 252.08it/s, Loss=0.658, Accuracy=0.79]


Epoch 8/30, Train Loss: 0.3302 Train Accuracy: 0.8838 , Val Loss: 0.6581 Val Accuracy: 0.7903


Epoch 9 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.21it/s, Loss=0.288, Accuracy=0.899]
Epoch 9 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 246.66it/s, Loss=0.59, Accuracy=0.805]


Epoch 9/30, Train Loss: 0.2885 Train Accuracy: 0.8993 , Val Loss: 0.5899 Val Accuracy: 0.8055


Epoch 10 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 139.59it/s, Loss=0.247, Accuracy=0.914]
Epoch 10 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 255.67it/s, Loss=0.57, Accuracy=0.822]


Epoch 10/30, Train Loss: 0.2470 Train Accuracy: 0.9138 , Val Loss: 0.5698 Val Accuracy: 0.8220


Epoch 11 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.13it/s, Loss=0.211, Accuracy=0.926]
Epoch 11 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 249.11it/s, Loss=0.647, Accuracy=0.812]


Epoch 11/30, Train Loss: 0.2113 Train Accuracy: 0.9261 , Val Loss: 0.6471 Val Accuracy: 0.8119


Epoch 12 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 142.45it/s, Loss=0.187, Accuracy=0.935]
Epoch 12 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 251.44it/s, Loss=0.565, Accuracy=0.831]


Epoch 12/30, Train Loss: 0.1871 Train Accuracy: 0.9350 , Val Loss: 0.5655 Val Accuracy: 0.8307


Epoch 13 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 142.55it/s, Loss=0.157, Accuracy=0.947]
Epoch 13 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 251.74it/s, Loss=0.593, Accuracy=0.827]


Epoch 13/30, Train Loss: 0.1570 Train Accuracy: 0.9468 , Val Loss: 0.5934 Val Accuracy: 0.8265


Epoch 14 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 142.02it/s, Loss=0.142, Accuracy=0.95]
Epoch 14 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 236.76it/s, Loss=0.588, Accuracy=0.828]


Epoch 14/30, Train Loss: 0.1417 Train Accuracy: 0.9505 , Val Loss: 0.5876 Val Accuracy: 0.8279


Epoch 15 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.81it/s, Loss=0.124, Accuracy=0.957]
Epoch 15 [Validating]: 100%|██████████| 235/235 [00:01<00:00, 228.28it/s, Loss=0.708, Accuracy=0.809]


Epoch 15/30, Train Loss: 0.1238 Train Accuracy: 0.9569 , Val Loss: 0.7076 Val Accuracy: 0.8093


Epoch 16 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.43it/s, Loss=0.113, Accuracy=0.961]
Epoch 16 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.86it/s, Loss=0.727, Accuracy=0.812]


Epoch 16/30, Train Loss: 0.1132 Train Accuracy: 0.9611 , Val Loss: 0.7273 Val Accuracy: 0.8117


Epoch 17 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.82it/s, Loss=0.0952, Accuracy=0.967]
Epoch 17 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 251.25it/s, Loss=0.784, Accuracy=0.803]


Epoch 17/30, Train Loss: 0.0952 Train Accuracy: 0.9668 , Val Loss: 0.7845 Val Accuracy: 0.8033


Epoch 18 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 139.83it/s, Loss=0.0901, Accuracy=0.969]
Epoch 18 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 248.40it/s, Loss=0.7, Accuracy=0.825]


Epoch 18/30, Train Loss: 0.0901 Train Accuracy: 0.9693 , Val Loss: 0.6999 Val Accuracy: 0.8247


Epoch 19 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 139.01it/s, Loss=0.084, Accuracy=0.971]
Epoch 19 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 240.61it/s, Loss=0.629, Accuracy=0.836]


Epoch 19/30, Train Loss: 0.0840 Train Accuracy: 0.9712 , Val Loss: 0.6292 Val Accuracy: 0.8364


Epoch 20 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 138.88it/s, Loss=0.0763, Accuracy=0.974]
Epoch 20 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 250.70it/s, Loss=0.674, Accuracy=0.829]


Epoch 20/30, Train Loss: 0.0763 Train Accuracy: 0.9736 , Val Loss: 0.6739 Val Accuracy: 0.8293


Epoch 21 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 136.03it/s, Loss=0.0775, Accuracy=0.973]
Epoch 21 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 253.20it/s, Loss=0.798, Accuracy=0.813]


Epoch 21/30, Train Loss: 0.0775 Train Accuracy: 0.9726 , Val Loss: 0.7978 Val Accuracy: 0.8128


Epoch 22 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.69it/s, Loss=0.0673, Accuracy=0.978]
Epoch 22 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 249.87it/s, Loss=0.71, Accuracy=0.839]


Epoch 22/30, Train Loss: 0.0673 Train Accuracy: 0.9776 , Val Loss: 0.7105 Val Accuracy: 0.8393


Epoch 23 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.06it/s, Loss=0.0677, Accuracy=0.977]
Epoch 23 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 251.64it/s, Loss=0.746, Accuracy=0.829]


Epoch 23/30, Train Loss: 0.0677 Train Accuracy: 0.9766 , Val Loss: 0.7465 Val Accuracy: 0.8289


Epoch 24 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 139.54it/s, Loss=0.0629, Accuracy=0.978]
Epoch 24 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.69it/s, Loss=0.679, Accuracy=0.835]


Epoch 24/30, Train Loss: 0.0629 Train Accuracy: 0.9780 , Val Loss: 0.6788 Val Accuracy: 0.8351


Epoch 25 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.55it/s, Loss=0.0499, Accuracy=0.984]
Epoch 25 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 246.13it/s, Loss=0.738, Accuracy=0.825]


Epoch 25/30, Train Loss: 0.0499 Train Accuracy: 0.9840 , Val Loss: 0.7376 Val Accuracy: 0.8252


Epoch 26 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.04it/s, Loss=0.0636, Accuracy=0.978]
Epoch 26 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.59it/s, Loss=0.81, Accuracy=0.815]


Epoch 26/30, Train Loss: 0.0636 Train Accuracy: 0.9780 , Val Loss: 0.8104 Val Accuracy: 0.8145


Epoch 27 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 140.28it/s, Loss=0.0557, Accuracy=0.981]
Epoch 27 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.36it/s, Loss=0.675, Accuracy=0.839]


Epoch 27/30, Train Loss: 0.0557 Train Accuracy: 0.9807 , Val Loss: 0.6746 Val Accuracy: 0.8391


Epoch 28 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.65it/s, Loss=0.0475, Accuracy=0.984]
Epoch 28 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 246.58it/s, Loss=0.726, Accuracy=0.836]


Epoch 28/30, Train Loss: 0.0475 Train Accuracy: 0.9841 , Val Loss: 0.7265 Val Accuracy: 0.8357


Epoch 29 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 141.58it/s, Loss=0.0543, Accuracy=0.981]
Epoch 29 [Validating]: 100%|██████████| 235/235 [00:01<00:00, 229.58it/s, Loss=0.73, Accuracy=0.84]


Epoch 29/30, Train Loss: 0.0543 Train Accuracy: 0.9810 , Val Loss: 0.7295 Val Accuracy: 0.8401


Epoch 30 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 139.46it/s, Loss=0.0457, Accuracy=0.984]
Epoch 30 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.82it/s, Loss=0.771, Accuracy=0.83]

Epoch 30/30, Train Loss: 0.0457 Train Accuracy: 0.9844 , Val Loss: 0.7710 Val Accuracy: 0.8297
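
In this log the training loss keeps falling while the validation loss stops improving after the first several epochs, which is the usual sign of overfitting. One simple way to read such a history is to locate the epoch with the lowest validation loss. The sketch below does this for epochs 1 to 10, with the numbers copied from the log above; in practice the lists would come from the `history` dict returned by `trainer.fit()`.

```python
# Losses for epochs 1-10, copied from the training log
train_loss = [1.2296, 0.8223, 0.6648, 0.5704, 0.4928, 0.4324, 0.3818, 0.3302, 0.2885, 0.2470]
val_loss = [1.1428, 0.7853, 0.6990, 0.5911, 0.5969, 0.5095, 0.7074, 0.6581, 0.5899, 0.5698]

# Epoch (1-based) with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Gap between validation and training loss at the last listed epoch
gap_at_end = val_loss[-1] - train_loss[-1]

print(f"best epoch: {best_epoch}, val loss: {val_loss[best_epoch - 1]:.4f}")
print(f"val - train loss at epoch {len(val_loss)}: {gap_at_end:.4f}")
```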





#### Predicting with a Predictor class

In [11]:
class Predictor:
    def __init__(self, model, device):
        self.model = model.to(device)
        self.device = device

    def evaluate(self, loader):
        self.model.eval()
        eval_metric = 0.0
        num_total = 0.0
        accu_num_correct = 0.0

        with tqdm(total=len(loader), desc="[Evaluating]", leave=True) as progress_bar:
            with torch.no_grad():
                for batch_idx, (inputs, targets) in enumerate(loader):
                    inputs = inputs.to(self.device)
                    targets = targets.to(self.device)
                    pred = self.model(inputs)

                    num_correct = (pred.argmax(-1) == targets).sum().item()
                    num_total += inputs.shape[0]
                    accu_num_correct += num_correct
                    eval_metric = accu_num_correct / num_total

                    progress_bar.update(1)
                    if batch_idx % 20 == 0 or (batch_idx + 1) == progress_bar.total:
                        progress_bar.set_postfix({"Accuracy": eval_metric})
        
        return eval_metric

    def predict_proba(self, inputs):
        self.model.eval()
        with torch.no_grad():
            inputs = inputs.to(self.device)
            outputs = self.model(inputs)
            pred_proba = F.softmax(outputs, dim=-1) 

        return pred_proba

    def predict(self, inputs):
        pred_proba = self.predict_proba(inputs)
        pred_class = torch.argmax(pred_proba, dim=-1)

        return pred_class

In [12]:
trained_model = trainer.get_trained_model()

predictor = Predictor(model=trained_model, device=device)
eval_metric = predictor.evaluate(test_loader)
print(f'test dataset evaluation:{eval_metric:.4f}')

[Evaluating]: 100%|██████████| 313/313 [00:01<00:00, 250.71it/s, Accuracy=0.828]

test dataset evaluation:0.8279





### Dropout

* PyTorch provides the `nn.Dropout(p)` layer for implementing dropout.
* In training mode, `nn.Dropout(p)` randomly sets each element of the input tensor to 0 with probability `p` and scales the remaining elements by `1/(1-p)`, so the expected value of each activation is unchanged. In evaluation mode (`model.eval()`), it passes the input through unchanged.
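
The difference between training and evaluation mode can be checked directly. The following is a minimal sketch on a tensor of ones, chosen so that the `1/(1-p)` scaling is easy to read off; the seed and tensor size are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

p = 0.25
dropout = nn.Dropout(p=p)
x = torch.ones(1000)

# Training mode: elements are zeroed with probability p, survivors are scaled by 1/(1-p)
dropout.train()
train_out = dropout(x)
kept = train_out[train_out != 0]
zero_fraction = (train_out == 0).float().mean().item()

# Evaluation mode: dropout is the identity
dropout.eval()
eval_out = dropout(x)

print("fraction zeroed in train mode:", zero_fraction)
print("value of surviving elements:  ", kept[0].item())
print("eval output equals input:     ", torch.equal(eval_out, x))
```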

**Why use Dropout:**
Dropout is a regularization technique that helps prevent overfitting by randomly "dropping" units during training. This forces the network to not rely too heavily on any single neuron, promoting redundancy and improving generalization.


In [13]:
import torch
import torch.nn as nn
import torch.nn.functional as F

input_tensor = torch.randn(4, 10)
print(f"input Tensor:\n{input_tensor}")

# Count the number of zeros in the original input tensor
num_zeros = torch.sum(input_tensor == 0).item()
print(f"Number of zeros in input_tensor: {num_zeros}")

# Define a Dropout layer with p=0.25
dropout = nn.Dropout(p=0.25)

# Apply the Dropout layer
output_tensor = dropout(input_tensor)

# Count the number of elements that became zero after applying Dropout
num_zeros = torch.sum(output_tensor == 0).item()

# Get the total number of elements in the output tensor
total_elements = output_tensor.numel()

# Calculate the percentage of zeros in the output tensor
percentage_zeros = (num_zeros / total_elements) * 100

print(f"Output Tensor:\n{output_tensor}")
print(f"Number of zeros in output tensor: {num_zeros}")
print(f"Percentage of zeros: {percentage_zeros:.2f}%")


input Tensor:
tensor([[ 0.5081, -0.5714,  0.3153,  0.1583,  0.3019,  0.7540,  0.2886,  0.0384,
         -0.3487, -0.8386],
        [-0.4652, -1.4711,  0.5225, -1.7037,  0.1465, -0.8918, -0.3246, -0.4334,
          0.8285,  0.8772],
        [-1.0809, -1.2962,  1.5066, -1.0503,  1.5361,  0.4981, -1.1635, -0.8305,
         -1.8033, -0.7408],
        [-1.1327,  0.1192, -1.1528,  0.4611,  0.3449, -0.7675, -1.0406, -0.1629,
          0.3756,  0.7246]])
Number of zeros in input_tensor: 0
Output Tensor:
tensor([[ 0.0000, -0.7618,  0.4203,  0.2111,  0.4025,  0.0000,  0.3848,  0.0512,
         -0.4649, -1.1181],
        [-0.6203, -1.9614,  0.6967, -2.2716,  0.1953, -0.0000, -0.4328, -0.0000,
          1.1047,  1.1696],
        [-1.4413, -1.7282,  0.0000, -0.0000,  2.0481,  0.6641, -1.5513, -1.1074,
         -0.0000, -0.9878],
        [-1.5103,  0.1589, -1.5371,  0.6148,  0.4599, -0.0000, -1.3874, -0.2172,
          0.0000,  0.9662]])
Number of zeros in output tensor: 9
Percentage of zeros: 22.50%

### Connecting the Classifier with Dropout and Linear Layer

* PyTorch lets you modify a model after it is constructed by reassigning one of its submodules (here, `classifier_block`).
* Below, only the classification block is replaced, so that Dropout is applied before each Linear layer while the convolutional blocks stay unchanged.


In [14]:
NUM_CLASSES = 10

simple_cnnbn_base = SimpleCNNWithBN(num_classes=NUM_CLASSES)
simple_cnnbn_base.classifier_block

Sequential(
  (0): AdaptiveAvgPool2d(output_size=(1, 1))
  (1): Flatten(start_dim=1, end_dim=-1)
  (2): Linear(in_features=128, out_features=10, bias=True)
)

In [15]:
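# The new head omits AdaptiveAvgPool2d, so Flatten receives the full
# 128 x 4 x 4 feature map (2048 values) instead of 128 pooled values.
do_classifier_block = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(in_features=128*4*4, out_features=300),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(in_features=300, out_features=10),
)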
simple_cnnbn_base.classifier_block = do_classifier_block
print(simple_cnnbn_base.classifier_block)

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Dropout(p=0.5, inplace=False)
  (2): Linear(in_features=2048, out_features=300, bias=True)
  (3): ReLU()
  (4): Dropout(p=0.3, inplace=False)
  (5): Linear(in_features=300, out_features=10, bias=True)
)
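The value `in_features=128*4*4` has to match the flattened size of whatever the last convolutional block emits. One way to avoid computing it by hand is to push a dummy batch through the feature extractor and read the shape. `SimpleCNNWithBN` is not defined in this section, so the sketch below uses a hypothetical stand-in (`conv_stage` and `features` are illustrative names) that assumes three pooling stages ending in 128 channels, which is consistent with the 2048 input features shown above.

```python
import torch
import torch.nn as nn

def conv_stage(in_ch, out_ch):
    # Hypothetical stand-in for one conv block: conv, batch norm, ReLU, 2x2 pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
    )

# Assumed layout: 3 -> 32 -> 64 -> 128 channels, halving the spatial size each time.
features = nn.Sequential(conv_stage(3, 32), conv_stage(32, 64), conv_stage(64, 128))

# Run a dummy 32x32 RGB image through the extractor to discover the output shape.
features.eval()
with torch.no_grad():
    out = features(torch.zeros(1, 3, 32, 32))

flat_features = out.flatten(start_dim=1).shape[1]
print(tuple(out.shape), flat_features)  # (1, 128, 4, 4) 2048
```

The resulting `flat_features` can then be passed as `in_features` of the first Linear layer, which keeps the head correct if the input resolution or the number of pooling stages changes.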


In [16]:
summary(model=simple_cnnbn_base, input_size=(1, 3, 32, 32), 
        col_names=['input_size', 'output_size', 'num_params'], 
        row_settings=['var_names'])

Layer (type (var_name))                  Input Shape               Output Shape              Param #
SimpleCNNWithBN (SimpleCNNWithBN)        [1, 3, 32, 32]            [1, 10]                   --
├─Sequential (conv_block_1)              [1, 3, 32, 32]            [1, 32, 16, 16]           --
│    └─Conv2d (0)                        [1, 3, 32, 32]            [1, 32, 32, 32]           896
│    └─BatchNorm2d (1)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (2)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─Conv2d (3)                        [1, 32, 32, 32]           [1, 32, 32, 32]           9,248
│    └─BatchNorm2d (4)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (5)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─MaxPool2d (6)                     [1, 32, 32, 32]           [1, 32, 16, 16]           --
├─Sequential (conv_block_2)    

In [17]:
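def create_do_classifier_block(first_features, second_features, first_dos, second_dos, num_classes=10):
    """Build a classifier head: Flatten -> Dropout -> Linear -> ReLU -> Dropout -> Linear.

    first_features:  flattened size of the incoming feature map
    second_features: width of the hidden Linear layer
    first_dos, second_dos: dropout probabilities before the first and second Linear layers
    """
    return nn.Sequential(
        nn.Flatten(),
        nn.Dropout(p=first_dos),
        nn.Linear(in_features=first_features, out_features=second_features),
        nn.ReLU(),
        nn.Dropout(p=second_dos),
        nn.Linear(in_features=second_features, out_features=num_classes),
    )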

simple_cnnbn_base = SimpleCNNWithBN(num_classes=NUM_CLASSES)
do_classifier_block = create_do_classifier_block(first_features=128*4*4, second_features=300,
                                                 first_dos=0.5, second_dos=0.3, num_classes=10)
simple_cnnbn_base.classifier_block = do_classifier_block

summary(model=simple_cnnbn_base, input_size=(1, 3, 32, 32), 
        col_names=['input_size', 'output_size', 'num_params'], 
        row_settings=['var_names'])

Layer (type (var_name))                  Input Shape               Output Shape              Param #
SimpleCNNWithBN (SimpleCNNWithBN)        [1, 3, 32, 32]            [1, 10]                   --
├─Sequential (conv_block_1)              [1, 3, 32, 32]            [1, 32, 16, 16]           --
│    └─Conv2d (0)                        [1, 3, 32, 32]            [1, 32, 32, 32]           896
│    └─BatchNorm2d (1)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (2)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─Conv2d (3)                        [1, 32, 32, 32]           [1, 32, 32, 32]           9,248
│    └─BatchNorm2d (4)                   [1, 32, 32, 32]           [1, 32, 32, 32]           64
│    └─ReLU (5)                          [1, 32, 32, 32]           [1, 32, 32, 32]           --
│    └─MaxPool2d (6)                     [1, 32, 32, 32]           [1, 32, 16, 16]           --
├─Sequential (conv_block_2)    
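Because the head is now built by `create_do_classifier_block`, it is easy to compare how the hidden width affects the number of trainable parameters. The sketch below repeats the factory so that it runs on its own. `count_params` is a hypothetical helper introduced for illustration, and the width of 100 is an arbitrary comparison point, not a setting used in this notebook.

```python
import torch.nn as nn

def create_do_classifier_block(first_features, second_features, first_dos, second_dos, num_classes=10):
    # Same structure as the factory defined above, repeated to keep this block self-contained.
    return nn.Sequential(
        nn.Flatten(),
        nn.Dropout(p=first_dos),
        nn.Linear(in_features=first_features, out_features=second_features),
        nn.ReLU(),
        nn.Dropout(p=second_dos),
        nn.Linear(in_features=second_features, out_features=num_classes),
    )

def count_params(module):
    # Hypothetical helper: total number of trainable parameters in a module.
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

head_300 = create_do_classifier_block(128 * 4 * 4, 300, first_dos=0.5, second_dos=0.3)
head_100 = create_do_classifier_block(128 * 4 * 4, 100, first_dos=0.5, second_dos=0.3)

print(count_params(head_300))  # 2048*300 + 300 + 300*10 + 10 = 617710
print(count_params(head_100))  # 2048*100 + 100 + 100*10 + 10 = 205910
```

Dropout layers add no parameters, so the totals come entirely from the two Linear layers.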

In [18]:
import torch 
import torch.nn as nn
from torch.optim import SGD, Adam

NUM_INPUT_CHANNELS = 3
NUM_CLASSES = 10

model = SimpleCNNWithBN(num_classes=NUM_CLASSES)
do_classifier_block = create_do_classifier_block(first_features=128*4*4, second_features=300,
                                                 first_dos=0.5, second_dos=0.3, num_classes=NUM_CLASSES)
model.classifier_block = do_classifier_block
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
optimizer = Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

trainer = Trainer(model=model, loss_fn=loss_fn, optimizer=optimizer,
       train_loader=tr_loader, val_loader=val_loader, device=device)
 
history = trainer.fit(30)

Epoch 1 [Training..]: 100%|██████████| 1329/1329 [00:10<00:00, 132.59it/s, Loss=1.37, Accuracy=0.506]
Epoch 1 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 244.22it/s, Loss=1.03, Accuracy=0.618]


Epoch 1/30, Train Loss: 1.3663 Train Accuracy: 0.5058 , Val Loss: 1.0258 Val Accuracy: 0.6177


Epoch 2 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.45it/s, Loss=0.972, Accuracy=0.662]
Epoch 2 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 245.14it/s, Loss=0.89, Accuracy=0.685]


Epoch 2/30, Train Loss: 0.9719 Train Accuracy: 0.6621 , Val Loss: 0.8900 Val Accuracy: 0.6855


Epoch 3 [Training..]: 100%|██████████| 1329/1329 [00:10<00:00, 132.50it/s, Loss=0.83, Accuracy=0.71]
Epoch 3 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 243.10it/s, Loss=0.752, Accuracy=0.735]


Epoch 3/30, Train Loss: 0.8301 Train Accuracy: 0.7103 , Val Loss: 0.7519 Val Accuracy: 0.7347


Epoch 4 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.78it/s, Loss=0.743, Accuracy=0.745]
Epoch 4 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 246.09it/s, Loss=0.767, Accuracy=0.735]


Epoch 4/30, Train Loss: 0.7426 Train Accuracy: 0.7453 , Val Loss: 0.7667 Val Accuracy: 0.7347


Epoch 5 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 135.10it/s, Loss=0.665, Accuracy=0.772]
Epoch 5 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 248.96it/s, Loss=0.642, Accuracy=0.772]


Epoch 5/30, Train Loss: 0.6646 Train Accuracy: 0.7724 , Val Loss: 0.6424 Val Accuracy: 0.7716


Epoch 6 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.87it/s, Loss=0.6, Accuracy=0.793]
Epoch 6 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.63it/s, Loss=0.599, Accuracy=0.793]


Epoch 6/30, Train Loss: 0.6004 Train Accuracy: 0.7934 , Val Loss: 0.5989 Val Accuracy: 0.7935


Epoch 7 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.72it/s, Loss=0.55, Accuracy=0.812]
Epoch 7 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 236.38it/s, Loss=0.573, Accuracy=0.803]


Epoch 7/30, Train Loss: 0.5501 Train Accuracy: 0.8116 , Val Loss: 0.5727 Val Accuracy: 0.8035


Epoch 8 [Training..]: 100%|██████████| 1329/1329 [00:10<00:00, 132.06it/s, Loss=0.504, Accuracy=0.83]
Epoch 8 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 245.70it/s, Loss=0.523, Accuracy=0.821]


Epoch 8/30, Train Loss: 0.5040 Train Accuracy: 0.8300 , Val Loss: 0.5232 Val Accuracy: 0.8215


Epoch 9 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 132.92it/s, Loss=0.466, Accuracy=0.841]
Epoch 9 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 248.49it/s, Loss=0.543, Accuracy=0.817]


Epoch 9/30, Train Loss: 0.4657 Train Accuracy: 0.8409 , Val Loss: 0.5428 Val Accuracy: 0.8171


Epoch 10 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.29it/s, Loss=0.425, Accuracy=0.856]
Epoch 10 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 248.23it/s, Loss=0.54, Accuracy=0.814]


Epoch 10/30, Train Loss: 0.4253 Train Accuracy: 0.8557 , Val Loss: 0.5402 Val Accuracy: 0.8139


Epoch 11 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.05it/s, Loss=0.399, Accuracy=0.864]
Epoch 11 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 245.90it/s, Loss=0.505, Accuracy=0.829]


Epoch 11/30, Train Loss: 0.3994 Train Accuracy: 0.8636 , Val Loss: 0.5050 Val Accuracy: 0.8295


Epoch 12 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.56it/s, Loss=0.366, Accuracy=0.874]
Epoch 12 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 250.94it/s, Loss=0.522, Accuracy=0.832]


Epoch 12/30, Train Loss: 0.3661 Train Accuracy: 0.8744 , Val Loss: 0.5222 Val Accuracy: 0.8323


Epoch 13 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.89it/s, Loss=0.344, Accuracy=0.882]
Epoch 13 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 249.33it/s, Loss=0.5, Accuracy=0.833]


Epoch 13/30, Train Loss: 0.3437 Train Accuracy: 0.8820 , Val Loss: 0.5004 Val Accuracy: 0.8327


Epoch 14 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.80it/s, Loss=0.325, Accuracy=0.888]
Epoch 14 [Validating]: 100%|██████████| 235/235 [00:01<00:00, 229.02it/s, Loss=0.539, Accuracy=0.825]


Epoch 14/30, Train Loss: 0.3252 Train Accuracy: 0.8876 , Val Loss: 0.5385 Val Accuracy: 0.8255


Epoch 15 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 135.18it/s, Loss=0.301, Accuracy=0.898]
Epoch 15 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 238.78it/s, Loss=0.511, Accuracy=0.838]


Epoch 15/30, Train Loss: 0.3009 Train Accuracy: 0.8975 , Val Loss: 0.5109 Val Accuracy: 0.8377


Epoch 16 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.95it/s, Loss=0.279, Accuracy=0.904]
Epoch 16 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 245.03it/s, Loss=0.499, Accuracy=0.84]


Epoch 16/30, Train Loss: 0.2792 Train Accuracy: 0.9044 , Val Loss: 0.4989 Val Accuracy: 0.8399


Epoch 17 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.89it/s, Loss=0.262, Accuracy=0.908]
Epoch 17 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 244.30it/s, Loss=0.493, Accuracy=0.843]


Epoch 17/30, Train Loss: 0.2620 Train Accuracy: 0.9085 , Val Loss: 0.4932 Val Accuracy: 0.8429


Epoch 18 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.45it/s, Loss=0.249, Accuracy=0.914]
Epoch 18 [Validating]: 100%|██████████| 235/235 [00:01<00:00, 232.58it/s, Loss=0.498, Accuracy=0.841]


Epoch 18/30, Train Loss: 0.2491 Train Accuracy: 0.9140 , Val Loss: 0.4979 Val Accuracy: 0.8409


Epoch 19 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.22it/s, Loss=0.236, Accuracy=0.919]
Epoch 19 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 244.52it/s, Loss=0.478, Accuracy=0.851]


Epoch 19/30, Train Loss: 0.2359 Train Accuracy: 0.9194 , Val Loss: 0.4782 Val Accuracy: 0.8515


Epoch 20 [Training..]: 100%|██████████| 1329/1329 [00:10<00:00, 132.89it/s, Loss=0.218, Accuracy=0.925]
Epoch 20 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 244.74it/s, Loss=0.55, Accuracy=0.845]


Epoch 20/30, Train Loss: 0.2182 Train Accuracy: 0.9249 , Val Loss: 0.5497 Val Accuracy: 0.8445


Epoch 21 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.96it/s, Loss=0.213, Accuracy=0.925]
Epoch 21 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 242.98it/s, Loss=0.571, Accuracy=0.831]


Epoch 21/30, Train Loss: 0.2132 Train Accuracy: 0.9251 , Val Loss: 0.5712 Val Accuracy: 0.8309


Epoch 22 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.76it/s, Loss=0.204, Accuracy=0.93]
Epoch 22 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 243.06it/s, Loss=0.525, Accuracy=0.848]


Epoch 22/30, Train Loss: 0.2039 Train Accuracy: 0.9304 , Val Loss: 0.5253 Val Accuracy: 0.8476


Epoch 23 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.54it/s, Loss=0.2, Accuracy=0.932]
Epoch 23 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 246.84it/s, Loss=0.547, Accuracy=0.842]


Epoch 23/30, Train Loss: 0.1998 Train Accuracy: 0.9319 , Val Loss: 0.5466 Val Accuracy: 0.8417


Epoch 24 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.33it/s, Loss=0.186, Accuracy=0.935]
Epoch 24 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 247.36it/s, Loss=0.55, Accuracy=0.845]


Epoch 24/30, Train Loss: 0.1861 Train Accuracy: 0.9355 , Val Loss: 0.5500 Val Accuracy: 0.8451


Epoch 25 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.29it/s, Loss=0.178, Accuracy=0.938]
Epoch 25 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 245.37it/s, Loss=0.57, Accuracy=0.844]


Epoch 25/30, Train Loss: 0.1778 Train Accuracy: 0.9380 , Val Loss: 0.5696 Val Accuracy: 0.8441


Epoch 26 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 133.82it/s, Loss=0.171, Accuracy=0.942]
Epoch 26 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 239.33it/s, Loss=0.55, Accuracy=0.848]


Epoch 26/30, Train Loss: 0.1713 Train Accuracy: 0.9423 , Val Loss: 0.5501 Val Accuracy: 0.8483


Epoch 27 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 136.02it/s, Loss=0.163, Accuracy=0.944]
Epoch 27 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 246.23it/s, Loss=0.574, Accuracy=0.846]


Epoch 27/30, Train Loss: 0.1632 Train Accuracy: 0.9445 , Val Loss: 0.5741 Val Accuracy: 0.8457


Epoch 28 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.76it/s, Loss=0.153, Accuracy=0.947]
Epoch 28 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 236.27it/s, Loss=0.55, Accuracy=0.853]


Epoch 28/30, Train Loss: 0.1534 Train Accuracy: 0.9474 , Val Loss: 0.5497 Val Accuracy: 0.8525


Epoch 29 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 134.75it/s, Loss=0.152, Accuracy=0.948]
Epoch 29 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 242.08it/s, Loss=0.593, Accuracy=0.841]


Epoch 29/30, Train Loss: 0.1521 Train Accuracy: 0.9482 , Val Loss: 0.5933 Val Accuracy: 0.8405


Epoch 30 [Training..]: 100%|██████████| 1329/1329 [00:09<00:00, 135.24it/s, Loss=0.147, Accuracy=0.951]
Epoch 30 [Validating]: 100%|██████████| 235/235 [00:00<00:00, 237.64it/s, Loss=0.588, Accuracy=0.843]

Epoch 30/30, Train Loss: 0.1465 Train Accuracy: 0.9507 , Val Loss: 0.5881 Val Accuracy: 0.8429
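Reading the log above, the training loss falls steadily while the validation loss stops improving partway through. The sketch below locates that point. The structure of the `history` object returned by `trainer.fit` is not shown in this section, so the values are copied by hand from the per-epoch summary lines, and the variable names are illustrative.

```python
# Per-epoch losses copied from the "Epoch N/30" summary lines above.
train_loss = [1.3663, 0.9719, 0.8301, 0.7426, 0.6646, 0.6004, 0.5501, 0.5040, 0.4657, 0.4253,
              0.3994, 0.3661, 0.3437, 0.3252, 0.3009, 0.2792, 0.2620, 0.2491, 0.2359, 0.2182,
              0.2132, 0.2039, 0.1998, 0.1861, 0.1778, 0.1713, 0.1632, 0.1534, 0.1521, 0.1465]
val_loss = [1.0258, 0.8900, 0.7519, 0.7667, 0.6424, 0.5989, 0.5727, 0.5232, 0.5428, 0.5402,
            0.5050, 0.5222, 0.5004, 0.5385, 0.5109, 0.4989, 0.4932, 0.4979, 0.4782, 0.5497,
            0.5712, 0.5253, 0.5466, 0.5500, 0.5696, 0.5501, 0.5741, 0.5497, 0.5933, 0.5881]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Gap between validation and training loss at the final epoch.
final_gap = val_loss[-1] - train_loss[-1]

print(f"best epoch by val loss: {best_epoch} (val loss {val_loss[best_epoch - 1]:.4f})")
print(f"final val/train loss gap: {final_gap:.4f}")
```

Validation loss is lowest at epoch 19 (0.4782) and drifts upward afterwards while training loss keeps falling, so even with Dropout in the head the model still overfits in the later epochs of this run.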





In [19]:
trained_model = trainer.get_trained_model()

predictor = Predictor(model=trained_model, device=device)
eval_metric = predictor.evaluate(test_loader)
print(f'test dataset evaluation:{eval_metric:.4f}')

[Evaluating]: 100%|██████████| 313/313 [00:01<00:00, 250.20it/s, Accuracy=0.844]

test dataset evaluation:0.8442



