<u>Unresolved Questions</u>

- <i>Geometric intuition of the perceptron: how does the decision boundary change, geometrically, when the weights change?</i>
    - Without gradient descent and with gradient descent
    
- <i>Derive the backpropagation equations for multiclass classification (softmax output with cross-entropy loss)</i>
    - Two cases arise when differentiating softmax output p_i with respect to logit z_j: 1. i == j, 2. i != j

- <i>When to use `categorical cross-entropy` and when to use `sparse categorical cross-entropy`</i>
    - categorical cross-entropy: when the target column is one-hot encoded
    - sparse categorical cross-entropy: when the target column is label (integer) encoded

- <i>Why are GPUs preferred over CPUs for training neural networks? What is the difference between a TPU and a GPU?</i>
    - "GPUs are versatile sports cars good for all roads; TPUs are Formula 1 cars that only race on perfect tracks but are unbeatable there."
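
For the backpropagation question, the two cases come from the softmax Jacobian. A worked derivation, assuming a softmax output layer over logits $z$ with probabilities $p$ and a one-hot target $y$ under categorical cross-entropy:

$$
p_i = \frac{e^{z_i}}{\sum_k e^{z_k}}, \qquad L = -\sum_k y_k \log p_k
$$

$$
\frac{\partial p_i}{\partial z_j} =
\begin{cases}
p_i\,(1 - p_i), & i = j \\
-\,p_i\,p_j, & i \neq j
\end{cases}
\;=\; p_i\,(\delta_{ij} - p_j)
$$

$$
\frac{\partial L}{\partial z_j}
= -\sum_i \frac{y_i}{p_i}\,\frac{\partial p_i}{\partial z_j}
= -\sum_i y_i\,(\delta_{ij} - p_j)
= -y_j + p_j \sum_i y_i
= p_j - y_j
$$

The last step uses $\sum_i y_i = 1$ for a one-hot target, so the gradient entering the output layer is simply $p - y$.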

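The two cross-entropy variants differ only in the target format; for the same logits they give the same loss. A minimal NumPy sketch, assuming integer labels in `0..K-1` (the helpers `softmax`, `categorical_ce`, and `sparse_categorical_ce` are illustrative names, not a library API). It also checks by finite differences that the gradient of the mean loss with respect to the logits is `(p - y) / N`. In PyTorch, `nn.CrossEntropyLoss` takes integer class indices directly, which is the "sparse" form used later in this notebook.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability; the result is unchanged
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def categorical_ce(logits: np.ndarray, one_hot: np.ndarray) -> float:
    # Targets are one-hot rows, shape [N, K]
    p = softmax(logits)
    return float(-(one_hot * np.log(p)).sum(axis=1).mean())

def sparse_categorical_ce(logits: np.ndarray, labels: np.ndarray) -> float:
    # Targets are integer class ids, shape [N]; pick log p of the true class
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels]).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 2])
one_hot = np.eye(3)[labels]          # label-encoded -> one-hot

loss_dense = categorical_ce(logits, one_hot)
loss_sparse = sparse_categorical_ce(logits, labels)
print(loss_dense, loss_sparse)       # same value, different target format

# Central finite-difference check of dL/dz for the mean loss: (p - y) / N
analytic = (softmax(logits) - one_hot) / len(labels)
numeric = np.zeros_like(logits)
eps = 1e-6
for i in range(logits.shape[0]):
    for j in range(logits.shape[1]):
        z_plus, z_minus = logits.copy(), logits.copy()
        z_plus[i, j] += eps
        z_minus[i, j] -= eps
        numeric[i, j] = (categorical_ce(z_plus, one_hot)
                         - categorical_ce(z_minus, one_hot)) / (2 * eps)
print(np.max(np.abs(analytic - numeric)))  # close to 0
```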
---

In [1]:
# Importing necessary libraries
import warnings  # used by DataManager._validate_data

import numpy as np
import pandas as pd

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
import torchvision.transforms as transforms # Transforming images

from typing import Tuple, Optional, Callable

In [2]:
# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cuda


In [3]:
# Dataset Class with Features
class FashionMNISTDataset(Dataset):
    """
    Dataset class with transforms, validation, and error handling
    """
    def __init__(self, 
                 features: torch.Tensor, 
                 labels: torch.Tensor, 
                 transform: Optional[Callable] = None, # Callable: anything that can be called like a function
                 normalize: bool = True,
                 mode: str = 'train') -> None:
        """
        Args:
            features: Feature tensor
            labels: Label tensor  
            transform: Optional transform to be applied on features
            normalize: Whether to normalize pixel values to [0,1]
            mode: 'train', 'val', or 'test' - affects data augmentation
        """
        super().__init__()
        
        # Input validation
        if features.shape[0] != labels.shape[0]:
            raise ValueError(f"Features and labels must have same length. "
                           f"Got {features.shape} and {labels.shape}")
        
        self.features = features
        self.labels = labels.long()  # CrossEntropyLoss expects int64 (long) class indices
        self.normalize = normalize
        self.mode = mode
        
        # Set default transforms if none provided
        if transform is None:
            self.transform = self._get_default_transforms() # Our own transformation function
        else:
            self.transform = transform # Function variable is passed which is callable
            
        # Normalize pixel values to [0,1] if requested
        if self.normalize:
            self.features = self.features.float() / 255.0
            
        print(f"Dataset created: {self.mode} mode, {len(self)} samples")
        
    def _get_default_transforms(self): # <- private method
        """Get default transforms based on mode"""
        if self.mode == 'train': # Transformation only applied while training
            # Data augmentation for training
            return transforms.Compose([
                transforms.RandomRotation(degrees=10),
                transforms.RandomHorizontalFlip(p=0.1),  # Less common for fashion items
                transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
                transforms.Normalize(mean=[0.5], std=[0.5])  # Normalize to [-1, 1]
            ])
        else:
            # Only normalization for val/test
            return transforms.Compose([
                transforms.Normalize(mean=[0.5], std=[0.5]) # Normalization is preprocessing, not augmentation, so val/test data need it too
            ])
    
    def __len__(self) -> int:
        return self.features.shape[0]
    
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        """Get item with error handling"""
        try:
            feature = self.features[index]
            label = self.labels[index]
            
            # Apply transforms if any
            if self.transform:
                feature = self.transform(feature)
                
            return feature, label
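        except Exception as e:
            print(f"Error loading index {index}: {e}")
            # Fall back to sample 0, transformed like every other sample so
            # value ranges stay consistent within a batch
            fallback = self.features[0]
            if self.transform:
                fallback = self.transform(fallback)
            return fallback, self.labels[0]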
    
    def get_class_distribution(self):
        """Get class distribution for analysis"""
        unique, counts = torch.unique(self.labels, return_counts=True) # return_counts=True also returns each class's frequency
        return dict(zip(unique.tolist(), counts.tolist()))

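The dataset rescales pixels in two steps: `normalize=True` divides raw values by 255 to reach [0, 1], then `transforms.Normalize(mean=[0.5], std=[0.5])` computes `(x - 0.5) / 0.5`, mapping [0, 1] onto [-1, 1]. Note that 0.5/0.5 is a fixed rescaling, not the dataset's measured mean and standard deviation. A NumPy sketch of just that arithmetic (it mirrors the formula `Normalize` documents, not torchvision's code):

```python
import numpy as np

raw = np.array([0, 64, 128, 255], dtype=np.float32)  # example raw pixel intensities

scaled = raw / 255.0               # step 1: normalize=True -> values in [0, 1]
normalized = (scaled - 0.5) / 0.5  # step 2: Normalize(mean=0.5, std=0.5) -> values in [-1, 1]

print(scaled)
print(normalized)
```

This is consistent with the `Image range: [-1.000, 1.000]` reported when the DataLoaders are tested.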
In [4]:
# Data Loading and Splitting Pipeline
class DataManager:
    """
    Data management class for loading and splitting
    """
    def __init__(self, train_csv_path: str, test_csv_path: str, random_state: int = 42):
        self.train_csv_path = train_csv_path
        self.test_csv_path = test_csv_path
        self.random_state = random_state
        self.class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
        
    def load_raw_data(self) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Load raw CSV data with error handling"""
        try:
            train_df = pd.read_csv(self.train_csv_path)
            test_df = pd.read_csv(self.test_csv_path)
            
            print(f"Loaded training data: {train_df.shape}")
            print(f"Loaded test data: {test_df.shape}")
            
            # Validate data
            self._validate_data(train_df, test_df)
            
            return train_df, test_df
        except FileNotFoundError as e:
            # Only wrap missing-file errors; validation errors (ValueError) propagate unchanged
            raise FileNotFoundError(f"Error loading data: {e}") from e
    
    def _validate_data(self, train_df: pd.DataFrame, test_df: pd.DataFrame):
        """Validate loaded data"""
        # Check for missing values
        if train_df.isnull().any().any():
            warnings.warn("Training data contains missing values")
        if test_df.isnull().any().any():
            warnings.warn("Test data contains missing values")
            
        # Check label range
        train_labels = train_df.iloc[:, 0]
        test_labels = test_df.iloc[:, 0]
        
        if train_labels.min() < 0 or train_labels.max() > 9:
            raise ValueError("Training labels should be between 0-9") # Specific to this dataset
        if test_labels.min() < 0 or test_labels.max() > 9:
            raise ValueError("Test labels should be between 0-9")
    
    def create_stratified_splits(self, 
                               train_df: pd.DataFrame,
                               val_size: float = 0.2,
                               stratify: bool = True) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """
        Create stratified train/validation split
        
        Args:
            train_df: Training dataframe
            val_size: Proportion for validation set
            stratify: Whether to maintain class distribution
        """
        features = train_df.iloc[:, 1:].values
        labels = train_df.iloc[:, 0].values
        
        if stratify:
            X_train, X_val, y_train, y_val = train_test_split(
                features, labels,
                test_size=val_size,
                stratify=labels,
                random_state=self.random_state
            )
        else:
            X_train, X_val, y_train, y_val = train_test_split(
                features, labels,
                test_size=val_size,
                random_state=self.random_state
            )
        
        # Convert back to DataFrames
        train_split = pd.DataFrame(np.column_stack([y_train, X_train]))
        val_split = pd.DataFrame(np.column_stack([y_val, X_val]))
        
        print(f"Training set: {train_split.shape}")
        print(f"Validation set: {val_split.shape}")
        
        return train_split, val_split
    
    def prepare_tensors(self, df: pd.DataFrame) -> Tuple[torch.Tensor, torch.Tensor]:
        """Convert DataFrame to tensors with proper shapes"""
        # Separate features and labels
        labels = torch.tensor(df.iloc[:, 0].values, dtype=torch.long)
        features = torch.tensor(df.iloc[:, 1:].values, dtype=torch.float32)
        
        # Reshape features to image format [N, 1, 28, 28]
        features = features.reshape(-1, 1, 28, 28)
        
        return features, labels
    
    # OPTIONAL
    def get_data_statistics(self, features: torch.Tensor, labels: torch.Tensor, name: str):
        """Print dataset statistics"""
        print(f"\n{name} Dataset Statistics:")
        print(f"Features shape: {features.shape}")
        print(f"Labels shape: {labels.shape}")
        print(f"Feature range: [{features.min():.2f}, {features.max():.2f}]")
        
        # Class distribution
        unique, counts = torch.unique(labels, return_counts=True)
        print("Class distribution:")
        for class_idx, count in zip(unique, counts):
            class_name = self.class_names[class_idx]
            print(f"  {class_idx} ({class_name}): {count} samples ({count/len(labels)*100:.1f}%)")

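`stratify=labels` asks `train_test_split` to keep each class's share the same in both splits, which is why every class ends up with exactly 5100 training and 900 validation samples. The idea can be sketched in NumPy as "shuffle within each class, then hold out `val_size` of it". This is a simplified illustration with a hypothetical helper `stratified_split_indices`; scikit-learn's own allocation logic differs in the details.

```python
import numpy as np

def stratified_split_indices(labels: np.ndarray, val_size: float, seed: int = 42):
    """Hypothetical helper: per-class shuffle, then hold out val_size of each class."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # positions of this class
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_size))
        val_idx.append(idx[:n_val])
        train_idx.append(idx[n_val:])
    return np.concatenate(train_idx), np.concatenate(val_idx)

# Balanced toy labels: 10 classes x 6000 samples, the class counts of the 60000-row training CSV
labels = np.repeat(np.arange(10), 6000)
train_idx, val_idx = stratified_split_indices(labels, val_size=0.15)
print(len(train_idx), len(val_idx))
print(np.bincount(labels[val_idx]))       # per-class validation counts
```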
In [5]:
# Initialize Data Manager and Load Data
data_manager = DataManager(
    train_csv_path="data/fashion-mnist_train.csv",
    test_csv_path="data/fashion-mnist_test.csv",
    random_state=42
)

# Load raw data
train_df, test_df = data_manager.load_raw_data()

# Create stratified train/validation split
train_split, val_split = data_manager.create_stratified_splits(
    train_df, 
    val_size=0.15,  # 15% for validation
    stratify=True
)

# Prepare tensors
train_features, train_labels = data_manager.prepare_tensors(train_split)
val_features, val_labels = data_manager.prepare_tensors(val_split)
test_features, test_labels = data_manager.prepare_tensors(test_df)

# Print statistics
data_manager.get_data_statistics(train_features, train_labels, "Training")
data_manager.get_data_statistics(val_features, val_labels, "Validation") 
data_manager.get_data_statistics(test_features, test_labels, "Test")

Loaded training data: (60000, 785)
Loaded test data: (10000, 785)
Training set: (51000, 785)
Validation set: (9000, 785)

Training Dataset Statistics:
Features shape: torch.Size([51000, 1, 28, 28])
Labels shape: torch.Size([51000])
Feature range: [0.00, 255.00]
Class distribution:
  0 (T-shirt/top): 5100 samples (10.0%)
  1 (Trouser): 5100 samples (10.0%)
  2 (Pullover): 5100 samples (10.0%)
  3 (Dress): 5100 samples (10.0%)
  4 (Coat): 5100 samples (10.0%)
  5 (Sandal): 5100 samples (10.0%)
  6 (Shirt): 5100 samples (10.0%)
  7 (Sneaker): 5100 samples (10.0%)
  8 (Bag): 5100 samples (10.0%)
  9 (Ankle boot): 5100 samples (10.0%)

Validation Dataset Statistics:
Features shape: torch.Size([9000, 1, 28, 28])
Labels shape: torch.Size([9000])
Feature range: [0.00, 255.00]
Class distribution:
  0 (T-shirt/top): 900 samples (10.0%)
  1 (Trouser): 900 samples (10.0%)
  2 (Pullover): 900 samples (10.0%)
  3 (Dress): 900 samples (10.0%)
  4 (Coat): 900 samples (10.0%)
  5 (Sandal): 900 samples 

In [6]:
# Training dataset with augmentation
train_dataset = FashionMNISTDataset(
    features=train_features,
    labels=train_labels,
    mode='train',
    normalize=True
)

# Validation dataset without augmentation
val_dataset = FashionMNISTDataset(
    features=val_features,
    labels=val_labels,
    mode='val',
    normalize=True
)

# Test dataset without augmentation
test_dataset = FashionMNISTDataset(
    features=test_features,
    labels=test_labels,
    mode='test',
    normalize=True
)

# Print class distributions
print("\nClass distributions:")
print("Training:", train_dataset.get_class_distribution())
print("Validation:", val_dataset.get_class_distribution())
print("Test:", test_dataset.get_class_distribution())

Dataset created: train mode, 51000 samples
Dataset created: val mode, 9000 samples
Dataset created: test mode, 10000 samples

Class distributions:
Training: {0: 5100, 1: 5100, 2: 5100, 3: 5100, 4: 5100, 5: 5100, 6: 5100, 7: 5100, 8: 5100, 9: 5100}
Validation: {0: 900, 1: 900, 2: 900, 3: 900, 4: 900, 5: 900, 6: 900, 7: 900, 8: 900, 9: 900}
Test: {0: 1000, 1: 1000, 2: 1000, 3: 1000, 4: 1000, 5: 1000, 6: 1000, 7: 1000, 8: 1000, 9: 1000}


In [7]:
# DataLoader Configuration
class DataLoaderManager:
    """Manager for creating optimized DataLoaders"""
    
    @staticmethod
    def create_dataloaders(train_dataset: Dataset, 
                           val_dataset: Dataset, 
                           test_dataset: Dataset,
                           batch_size: int = 64,
                           num_workers: Optional[int] = None,
                           pin_memory: Optional[bool] = None) -> Tuple[DataLoader, DataLoader, DataLoader]:
        """
        Create train/validation/test DataLoaders with sensible defaults
        
        Args:
            train_dataset, val_dataset, test_dataset: Dataset objects
            batch_size: Batch size for training
            num_workers: Number of worker processes
            pin_memory: Whether to pin memory for GPU transfer
        """
        
        # Default to loading in the main process; a positive value such as
        # min(4, os.cpu_count()) would use parallel worker processes (needs `import os`)
        if num_workers is None:
            num_workers = 0
            
        if pin_memory is None:
            pin_memory = torch.cuda.is_available()
        
        # Training DataLoader - with shuffling
        train_loader = DataLoader(
            dataset=train_dataset,
            batch_size=batch_size,
            shuffle=True,  # Always shuffle training data
            num_workers=num_workers,
            pin_memory=pin_memory,
            drop_last=True,  # Drop incomplete batches for consistent batch norm
            persistent_workers=num_workers > 0  # Keep workers alive
        )
        
        # Validation DataLoader - no shuffling, potentially larger batch
        val_loader = DataLoader(
            dataset=val_dataset,
            batch_size=batch_size * 2,  # Can use larger batch for inference
            shuffle=False,
            num_workers=num_workers,
            pin_memory=pin_memory,
            drop_last=False
        )
        
        # Test DataLoader - no shuffling, potentially larger batch
        test_loader = DataLoader(
            dataset=test_dataset,
            batch_size=batch_size * 2,
            shuffle=False,
            num_workers=num_workers,
            pin_memory=pin_memory,
            drop_last=False
        )
        
        print(f"DataLoaders created:")
        print(f"  Training: {len(train_loader)} batches of size {batch_size}")
        print(f"  Validation: {len(val_loader)} batches of size {batch_size * 2}")
        print(f"  Test: {len(test_loader)} batches of size {batch_size * 2}")
        print(f"  Workers: {num_workers}, Pin Memory: {pin_memory}")
        
        return train_loader, val_loader, test_loader

# Create DataLoaders
batch_size = 64
train_loader, val_loader, test_loader = DataLoaderManager.create_dataloaders(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    batch_size=batch_size
)

# Test the DataLoaders
print("\nTesting DataLoaders:")
for batch_idx, (images, labels) in enumerate(train_loader):
    print(f"Training Batch {batch_idx}:")
    print(f"  Images shape: {images.shape}")
    print(f"  Labels shape: {labels.shape}")
    print(f"  Image range: [{images.min():.3f}, {images.max():.3f}]")
    print(f"  Unique labels: {torch.unique(labels).tolist()}")
    
    if batch_idx == 2:  # Just show first 3 batches
        break

DataLoaders created:
  Training: 796 batches of size 64
  Validation: 71 batches of size 128
  Test: 79 batches of size 128
  Workers: 0, Pin Memory: True

Testing DataLoaders:
Training Batch 0:
  Images shape: torch.Size([64, 1, 28, 28])
  Labels shape: torch.Size([64])
  Image range: [-1.000, 1.000]
  Unique labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Training Batch 1:
  Images shape: torch.Size([64, 1, 28, 28])
  Labels shape: torch.Size([64])
  Image range: [-1.000, 1.000]
  Unique labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Training Batch 2:
  Images shape: torch.Size([64, 1, 28, 28])
  Labels shape: torch.Size([64])
  Image range: [-1.000, 1.000]
  Unique labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
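The batch counts above follow from `drop_last`: with `drop_last=True` the final partial batch is discarded (floor division), otherwise it is kept (ceiling). For training, that drops the last 56 of the 51000 shuffled samples each epoch, which avoids a small final batch; `BatchNorm1d` in training mode cannot normalize a batch of a single sample. A plain-Python sketch of the arithmetic (the helper `num_batches` is hypothetical):

```python
import math

def num_batches(n_samples: int, batch_size: int, drop_last: bool) -> int:
    # drop_last=True discards the final partial batch (floor); otherwise it is kept (ceil)
    return n_samples // batch_size if drop_last else math.ceil(n_samples / batch_size)

train_batches = num_batches(51000, 64, drop_last=True)
val_batches = num_batches(9000, 128, drop_last=False)
test_batches = num_batches(10000, 128, drop_last=False)
print(train_batches, val_batches, test_batches)
```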


In [8]:
# Model Architecture
class ImprovedCustomModel(nn.Module):
    """
    MLP classifier: three Linear -> BatchNorm1d -> ReLU -> Dropout blocks followed by a linear output layer
    """
    def __init__(self, input_shape: tuple = (1, 28, 28), num_classes: int = 10, dropout_rate: float = 0.3):
        super().__init__()
        
        # Calculate flattened input size
        self.input_size = int(np.prod(input_shape))  # 1*28*28 = 784; int() avoids passing a NumPy integer to nn.Linear
        
        self.network = nn.Sequential(
            nn.Flatten(start_dim=1, end_dim=-1), # After flattening, input shape becomes [batch_size, 784] for 28x28 images
            
            # First hidden layer
            nn.Linear(in_features=self.input_size, out_features=512),
            nn.BatchNorm1d(num_features=512),
            nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            
            # Second hidden layer
            nn.Linear(in_features=512, out_features=256),
            nn.BatchNorm1d(num_features=256),
            nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            
            # Third hidden layer
            nn.Linear(in_features=256, out_features=128),
            nn.BatchNorm1d(num_features=128),
            nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            
            # Output layer (NO SOFTMAX - handled by CrossEntropyLoss)
            nn.Linear(in_features=128, out_features=num_classes)
        )
        
        # Initialize weights
        self._initialize_weights()
    
    def _initialize_weights(self):
        """Initialize weights using He initialization for ReLU networks"""
        for module in self.modules():
            if isinstance(module, nn.Linear):
                nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0)
            elif isinstance(module, nn.BatchNorm1d):
                nn.init.constant_(module.weight, 1)
                nn.init.constant_(module.bias, 0)
    
    def forward(self, x):
        return self.network(x)

# Create model
model = ImprovedCustomModel(
    input_shape=(1, 28, 28),
    num_classes=10,
    dropout_rate=0.3
).to(device)

print(f"Model created with {sum(p.numel() for p in model.parameters()):,} parameters")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

Model created with 569,226 parameters
Trainable parameters: 569,226
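The reported 569,226 parameters can be checked by hand: each `nn.Linear(n_in, n_out)` has `n_in * n_out` weights plus `n_out` biases, and each `nn.BatchNorm1d(n)` has `n` learnable scales and `n` shifts (its running mean and variance are buffers, not parameters). `Flatten`, `ReLU`, and `Dropout` have no parameters. A plain-Python sketch of that arithmetic, with hypothetical helper names:

```python
def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out          # weight matrix + bias vector

def batchnorm_params(n: int) -> int:
    return 2 * n                         # learnable scale (weight) + shift (bias)

layer_sizes = [28 * 28, 512, 256, 128]   # input and hidden widths of the model above
total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += linear_params(n_in, n_out) + batchnorm_params(n_out)
total += linear_params(128, 10)          # output layer, no BatchNorm
print(f"{total:,}")
```

The first hidden layer alone contributes 401,920 of these parameters, which is typical for an MLP on flattened images.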


In [10]:
# Training Loop with Validation
import torch.nn.functional as F
from tqdm import tqdm
import time

class Trainer:
    """Training class with validation and monitoring"""
    
    def __init__(self, model, train_loader, val_loader, criterion, optimizer, device, scheduler=None):
        self.model = model
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.criterion = criterion
        self.optimizer = optimizer
        self.device = device
        self.scheduler = scheduler
        
        # Metrics tracking
        self.train_losses = []
        self.val_losses = []
        self.train_accuracies = []
        self.val_accuracies = []
        
    def train_epoch(self):
        """Train for one epoch"""
        self.model.train()
        total_loss = 0
        correct = 0
        total = 0
        
        pbar = tqdm(self.train_loader, desc='Training')
        for batch_idx, (images, labels) in enumerate(pbar):
            # Move data to device
            images, labels = images.to(self.device), labels.to(self.device)
            
            # Forward pass
            outputs = self.model(images)
            loss = self.criterion(outputs, labels)
            
            # Backward pass
            self.optimizer.zero_grad()
            loss.backward()
            
            # Gradient clipping (optional but good practice)
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
            
            self.optimizer.step()
            
            # Statistics
            total_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # index of the largest logit; no need for the deprecated .data
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            # Update progress bar
            pbar.set_postfix({
                'Loss': f'{loss.item():.4f}',
                'Acc': f'{100.*correct/total:.2f}%'
            })
        
        avg_loss = total_loss / len(self.train_loader)
        accuracy = 100. * correct / total
        
        return avg_loss, accuracy
    
    def validate_epoch(self):
        """Validate for one epoch"""
        self.model.eval()
        total_loss = 0
        correct = 0
        total = 0
        
        with torch.no_grad():
            pbar = tqdm(self.val_loader, desc='Validation')
            for batch_idx, (images, labels) in enumerate(pbar):
                images, labels = images.to(self.device), labels.to(self.device)
                
                outputs = self.model(images)
                loss = self.criterion(outputs, labels)
                
                total_loss += loss.item()
                predicted = outputs.argmax(dim=1)  # index of the largest logit
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
                
                pbar.set_postfix({
                    'Loss': f'{loss.item():.4f}',
                    'Acc': f'{100.*correct/total:.2f}%'
                })
        
        avg_loss = total_loss / len(self.val_loader)
        accuracy = 100. * correct / total
        
        return avg_loss, accuracy
    
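    def train(self, epochs, patience=5):
        """Complete training loop with validation and early stopping on validation accuracy"""
        print(f"Starting training for {epochs} epochs...")
        start_time = time.time()
        
        best_val_acc = 0
        patience_counter = 0  # epochs since the last improvement in validation accuracy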
        
        for epoch in range(epochs):
            print(f"\nEpoch [{epoch+1}/{epochs}]")
            
            # Training
            train_loss, train_acc = self.train_epoch()
            
            # Validation
            val_loss, val_acc = self.validate_epoch()
            
            # Update learning rate scheduler
            if self.scheduler:
                self.scheduler.step(val_loss)
            
            # Store metrics
            self.train_losses.append(train_loss)
            self.val_losses.append(val_loss)
            self.train_accuracies.append(train_acc)
            self.val_accuracies.append(val_acc)
            
            # Print epoch summary
            print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%")
            print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")
            
            # Early stopping
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                patience_counter = 0
                # Save best model
                torch.save(self.model.state_dict(), 'best_model.pth')
                print(f"New best validation accuracy: {best_val_acc:.2f}%")
            else:
                patience_counter += 1
                
            if patience_counter >= patience:
                print(f"Early stopping after {epoch+1} epochs")
                break
        
        total_time = time.time() - start_time
        print(f"\nTraining completed in {total_time:.2f} seconds")
        print(f"Best validation accuracy: {best_val_acc:.2f}%")

# Setup training components
learning_rate = 0.001  # initial learning rate; ReduceLROnPlateau may lower it
epochs = 5

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(  # AdamW: Adam with decoupled weight decay
    model.parameters(), 
    lr=learning_rate,
    weight_decay=1e-4  # decoupled weight decay; not identical to adding an L2 penalty under Adam
)

# Learning rate scheduler
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, 
    mode='min', 
    factor=0.5, 
    patience=3
)

# Create trainer and start training
trainer = Trainer(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    device=device,
    scheduler=scheduler
)

# Train the model
trainer.train(epochs=epochs)

Starting training for 5 epochs...

Epoch [1/5]


Training: 100%|██████████| 796/796 [01:07<00:00, 11.78it/s, Loss=0.8317, Acc=60.68%]
Validation: 100%|██████████| 71/71 [00:01<00:00, 55.14it/s, Loss=0.4429, Acc=77.91%]


Train Loss: 1.3248, Train Acc: 60.68%
Val Loss: 0.6067, Val Acc: 77.91%
New best validation accuracy: 77.91%

Epoch [2/5]


Training: 100%|██████████| 796/796 [01:01<00:00, 12.86it/s, Loss=0.6742, Acc=70.13%]
Validation: 100%|██████████| 71/71 [00:01<00:00, 59.22it/s, Loss=0.3809, Acc=79.19%]


Train Loss: 0.8086, Train Acc: 70.13%
Val Loss: 0.5413, Val Acc: 79.19%
New best validation accuracy: 79.19%

Epoch [3/5]


Training: 100%|██████████| 796/796 [01:01<00:00, 12.97it/s, Loss=0.4860, Acc=73.38%]
Validation: 100%|██████████| 71/71 [00:01<00:00, 60.05it/s, Loss=0.3555, Acc=80.61%]


Train Loss: 0.7122, Train Acc: 73.38%
Val Loss: 0.5079, Val Acc: 80.61%
New best validation accuracy: 80.61%

Epoch [4/5]


Training: 100%|██████████| 796/796 [01:00<00:00, 13.19it/s, Loss=0.5397, Acc=75.03%]
Validation: 100%|██████████| 71/71 [00:01<00:00, 54.42it/s, Loss=0.3347, Acc=81.39%]


Train Loss: 0.6738, Train Acc: 75.03%
Val Loss: 0.4866, Val Acc: 81.39%
New best validation accuracy: 81.39%

Epoch [5/5]


Training: 100%|██████████| 796/796 [01:02<00:00, 12.83it/s, Loss=0.7258, Acc=76.24%]
Validation: 100%|██████████| 71/71 [00:01<00:00, 56.39it/s, Loss=0.3508, Acc=82.27%]

Train Loss: 0.6431, Train Acc: 76.24%
Val Loss: 0.4687, Val Acc: 82.27%
New best validation accuracy: 82.27%

Training completed in 319.64 seconds
Best validation accuracy: 82.27%
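
The `weight_decay=1e-4` passed to AdamW is applied directly to the weights, whereas `torch.optim.Adam(weight_decay=...)` adds `λθ` to the gradient (classic L2), where Adam's per-coordinate normalization then rescales it. A NumPy sketch of both update rules, simplified from the published Adam/AdamW updates, for a single weight whose loss gradient is zero, using this notebook's `lr=1e-3` and `weight_decay=1e-4` (the helpers `adam_l2_step` and `adamw_step` are hypothetical, not PyTorch code):

```python
import numpy as np

def adam_l2_step(theta, grad, m, v, t, lr=1e-3, wd=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    g = grad + wd * theta                 # L2: decay folded into the gradient
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def adamw_step(theta, grad, m, v, t, lr=1e-3, wd=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    theta = theta - lr * wd * theta       # decoupled: shrink the weight directly
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta_l2, m1, v1 = 1.0, 0.0, 0.0
theta_w, m2, v2 = 1.0, 0.0, 0.0
for t in range(1, 101):
    grad = 0.0                            # a weight the loss does not depend on
    theta_l2, m1, v1 = adam_l2_step(theta_l2, grad, m1, v1, t)
    theta_w, m2, v2 = adamw_step(theta_w, grad, m2, v2, t)
print(theta_l2, theta_w)
```

Under L2-in-the-gradient, the tiny decay gradient is normalized to a step of roughly `lr`, so the weight drops by about 0.1 in 100 steps; with decoupled decay it shrinks only by the factor `(1 - lr * wd)` per step.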

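The early-stopping rule in `Trainer.train` resets a counter whenever validation accuracy improves and stops once the counter reaches `patience`. Since epoch 1 always sets a new best, the counter can reach at most 4 in a 5-epoch run, so with `patience` at 5 and `epochs = 5` early stopping cannot fire here. A plain-Python sketch of the same rule (the function `early_stop_epoch` is hypothetical; the second input is a made-up plateau for illustration):

```python
def early_stop_epoch(val_accs, patience):
    """Return the 1-based epoch at which the patience rule would stop, or None."""
    best, counter = 0.0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best, counter = acc, 0   # new best: reset the counter
        else:
            counter += 1             # no improvement this epoch
        if counter >= patience:
            return epoch
    return None

# Validation accuracies logged in the run above
logged = [77.91, 79.19, 80.61, 81.39, 82.27]
print(early_stop_epoch(logged, patience=5))
# Hypothetical plateau: three epochs without improvement stops training with patience=3
print(early_stop_epoch([80.0, 81.0, 80.5, 80.9, 81.0, 82.0], patience=3))
```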




In [11]:
# Testing with Detailed Metrics
from sklearn.metrics import classification_report

class ModelEvaluator:
    """Evaluates a trained model on the test set: loss, accuracy, and a per-class classification report"""
    
    def __init__(self, model, test_loader, device, class_names):
        self.model = model
        self.test_loader = test_loader
        self.device = device
        self.class_names = class_names
    
    def evaluate(self):
        """Comprehensive model evaluation"""
        self.model.eval()
        all_predictions = []
        all_labels = []
        total_loss = 0
        
        criterion = nn.CrossEntropyLoss()
        
        print("Evaluating model on test set...")
        with torch.no_grad():
            for images, labels in tqdm(self.test_loader, desc='Testing'):
                images, labels = images.to(self.device), labels.to(self.device)
                
                outputs = self.model(images)
                loss = criterion(outputs, labels)
                total_loss += loss.item()
                
                _, predicted = torch.max(outputs, 1)
                
                all_predictions.extend(predicted.cpu().numpy())
                all_labels.extend(labels.cpu().numpy())
        
        # Calculate metrics
        accuracy = 100. * sum(p == l for p, l in zip(all_predictions, all_labels)) / len(all_labels)
        avg_loss = total_loss / len(self.test_loader)
        
        print(f"\nTest Results:")
        print(f"Average Loss: {avg_loss:.4f}")
        print(f"Accuracy: {accuracy:.2f}%")
        
        # Detailed classification report
        print("\nClassification Report:")
        report = classification_report(
            all_labels, 
            all_predictions, 
            target_names=self.class_names,
            digits=3
        )
        print(report)
        
        return {
            'accuracy': accuracy,
            'loss': avg_loss,
            'predictions': all_predictions,
            'labels': all_labels,
            'report': report
        }

# Load the best checkpoint (highest validation accuracy) and evaluate
model.load_state_dict(torch.load('best_model.pth', map_location=device))

evaluator = ModelEvaluator(
    model=model,
    test_loader=test_loader,
    device=device,
    class_names=data_manager.class_names
)

results = evaluator.evaluate()

Evaluating model on test set...


Testing: 100%|██████████| 79/79 [00:01<00:00, 75.17it/s]


Test Results:
Average Loss: 0.4585
Accuracy: 82.55%

Classification Report:
              precision    recall  f1-score   support

 T-shirt/top      0.733     0.822     0.775      1000
     Trouser      0.982     0.959     0.970      1000
    Pullover      0.699     0.728     0.713      1000
       Dress      0.801     0.905     0.850      1000
        Coat      0.736     0.807     0.770      1000
      Sandal      0.964     0.834     0.894      1000
       Shirt      0.661     0.416     0.511      1000
     Sneaker      0.848     0.880     0.864      1000
         Bag      0.946     0.947     0.947      1000
  Ankle boot      0.871     0.957     0.912      1000

    accuracy                          0.826     10000
   macro avg      0.824     0.826     0.820     10000
weighted avg      0.824     0.826     0.820     10000
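
The report shows Shirt with the lowest recall (0.416); a confusion matrix shows which classes those shirts are predicted as instead. scikit-learn's `confusion_matrix` produces the same table; the NumPy sketch below (hypothetical helpers `confusion_matrix_np` and `top_confusions`, run on toy labels) just makes the counting explicit. With this notebook you would pass `results['labels']` and `results['predictions']`.

```python
import numpy as np

def confusion_matrix_np(labels, predictions, num_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(labels), np.asarray(predictions)), 1)  # count each (true, pred) pair
    return cm

def top_confusions(cm, class_names, k=3):
    off = cm.copy()
    np.fill_diagonal(off, 0)                      # ignore correct predictions
    flat = np.argsort(off, axis=None)[::-1][:k]   # largest off-diagonal counts
    rows, cols = np.unravel_index(flat, off.shape)
    return [(class_names[r], class_names[c], int(off[r, c])) for r, c in zip(rows, cols)]

# Toy example with three of the Fashion-MNIST class names
names = ['T-shirt/top', 'Shirt', 'Coat']
y_true = [0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 1, 0, 0, 1, 2, 2, 2]
cm = confusion_matrix_np(y_true, y_pred, num_classes=3)
print(cm)
print(top_confusions(cm, names, k=1))             # (true class, predicted class, count)
```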






### **Possible Reasons for Sub-100 % Accuracy**

<i>Why might the network never reach perfect accuracy?</i>

1. **Data-related limitations**
    1. **Insufficient training examples** – too few samples prevent the model from learning robust patterns.  
    2. **Excessive intrinsic complexity** – the underlying function is highly non-linear or chaotic, exceeding the data’s explanatory power.  
    3. **Non-representative sampling** – training set distribution diverges from real-world or test distribution.  
    4. **Irreducible label noise or measurement error** – random errors in targets set an unavoidable error floor.

2. **Model-related constraints**
    1. **Mismatched inductive bias** – architectural assumptions (linearity, locality, etc.) clash with true data structure.  
    2. **Under-capacity** – model too simple to express the necessary decision boundaries.  
    3. **Sub-optimal hyper-parameters** – learning rate, depth, width, regularization, etc. are not tuned to the problem’s sweet spot.

3. **Operational & economic barriers**
    1. **Limited compute/GPU budget** – restricted training time or hardware caps achievable convergence.  
    2. **Diminishing returns** – each extra 1 % accuracy may demand exponentially more data, parameters, or FLOPs.  
    3. **Annotation inconsistencies** – conflicting or subjective labels inject noise that no optimizer can eliminate.
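
The label-noise point under data-related limitations can be made concrete. Assuming symmetric noise, where a fraction ε of labels is replaced by a uniformly random class (real annotation noise is usually class-dependent, e.g. Shirt vs T-shirt/top), even a model that always predicts the clean label agrees with the noisy labels only about 1 - ε(K - 1)/K of the time. A NumPy simulation under that assumed noise model:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, eps = 10, 100_000, 0.1          # classes, samples, fraction of labels resampled

true = rng.integers(0, K, size=n)
noisy = true.copy()
flip = rng.random(n) < eps            # which labels get resampled
noisy[flip] = rng.integers(0, K, size=flip.sum())  # may land on the true class again

perfect_model_acc = (true == noisy).mean()  # accuracy of a model that predicts the clean label
expected_ceiling = 1 - eps * (K - 1) / K
print(perfect_model_acc, expected_ceiling)
```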