# CNN Average Model Performance
This notebook trains four instantiations of a model, each with a different random sample of training and test data, to estimate how this CNN performs on average. The performance of these four instances is used to help check the correctness of our bias metrics. Additionally, the last set of models is used to test a potential mitigation tactic. Five types of training/test splits are used to cover a wider range of data biases.

The five types of training/testing splits are (0.1 and 0.9 refer to the "full_0.1/trainval" and "full_0.9/trainval" sets):
1. training sampled from 1st half of 0.1; test sampled from 2nd half of 0.1 (9900 train - 990 test)
2. training sampled from 1st half of 0.9; test sampled from 2nd half of 0.9 (9900 train - 990 test)
3. training sampled from all of 0.1; test sampled from "full/test"          (9960 train - 1000 test)
4. training sampled from all of 0.9; test sampled from "full/test"          (9960 train - 1000 test)
5. training sampled from 0.1 and 0.9 combined (roughly 50% from each); test sampled from "full/test" (9996 train - 1000 test)
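Split types 1 and 2 keep train and test images disjoint by drawing them from opposite halves of the same 60,000-image trainval folder (the size reported in the log below). A minimal, framework-free sketch of that index logic, assuming the fractions from the commented-out options in `run_biased_mnist_cnn_subset()` (0.33 for train, 0.033 for test); `half_split_indices` and `sample_indices` are hypothetical helpers, not functions from this notebook.

```python
import random

def half_split_indices(n):
    """Split range(n) into its first and second halves (hypothetical helper)."""
    half = n // 2
    return list(range(half)), list(range(half, n))

def sample_indices(indices, fraction, seed=None):
    """Draw int(len(indices) * fraction) indices without replacement,
    using the same size rule as create_subset_loader()."""
    rng = random.Random(seed)
    k = int(len(indices) * fraction)
    return rng.sample(indices, k)

# Types 1 and 2: train from the first half, test from the second half
first_half, second_half = half_split_indices(60000)
train_idx = sample_indices(first_half, 0.33, seed=0)
test_idx = sample_indices(second_half, 0.033, seed=1)

print(len(train_idx), len(test_idx))
print(set(train_idx).isdisjoint(test_idx))  # halves never overlap
```

For type 5, the same size rule gives `int(120000 * 0.0833) = 9996` images drawn from the concatenated 0.1 and 0.9 sets, which matches the "Using 9996 training images" line in the log.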

Each type has four instantiations, so that the results do not hinge on one particular random sample, for a total of 20 models.

The following changes were made to CNN_CrossEntropy.ipynb to produce this version:
- added an additional convolutional layer (3 convolutional layers total)
- added an adaptive average pooling layer that always outputs a 5x5 feature map regardless of input size (also intended to reduce noise)
- added a dropout layer that randomly sets 50% of features to 0 during training to prevent overfitting (overreliance on particular neurons)
- during model training, added a learning rate scheduler that halves the learning rate every 10 epochs, so training takes larger steps early on and finer steps later
- model training now runs for 50 epochs
- an evaluate_model_classwise() function that reports a model's overall and class-wise accuracy on the train and test sets
- run_biased_mnist_cnn_subset() now has commented-out options for sampling train/test from the first/second half of a dataset
- run_biased_mnist_cnn_subset() now returns the overall and class-wise train and test accuracies
- a small cell with a for-loop that runs four model instantiations
- a small cell that cleanly prints the results of the four model instantiations
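With `StepLR(optimizer, step_size=10, gamma=0.5)` and `scheduler.step()` called once at the end of every epoch (as in `train_model()`), the learning rate in effect during 0-indexed epoch `e` is `0.001 * 0.5 ** (e // 10)`. A small sketch that tabulates this closed form over the 50 epochs without PyTorch; `step_lr` is a hypothetical helper.

```python
def step_lr(epoch, base_lr=0.001, step_size=10, gamma=0.5):
    """Learning rate in effect during a 0-indexed epoch under StepLR,
    assuming scheduler.step() is called once at the end of every epoch."""
    return base_lr * gamma ** (epoch // step_size)

num_epochs = 50
schedule = [step_lr(e) for e in range(num_epochs)]

# Print the rate used in each 10-epoch block (epochs shown 1-indexed)
for start in range(0, num_epochs, 10):
    print(f"epochs {start + 1}-{start + 10}: lr = {schedule[start]:g}")
```

Over 50 epochs the rate is halved four times, ending at 0.001 / 16 = 6.25e-05 for the last ten epochs.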

The final results of the 16 models can be found at the bottom of this notebook underneath "OFFICIAL RUN FOR 4 MODELS" section.
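Summarizing the four instantiations as a mean and spread is what turns individual runs into an estimate of average performance. A minimal standard-library sketch of that step, assuming per-run lists shaped like `tot_test_accuracies` (one float per run) and `classwise_test_accuracies` (one 10-element list per run) from the training loop; `summarize_runs` and `classwise_means` are hypothetical helpers, not the notebook's own printing cell.

```python
import statistics

def summarize_runs(accuracies):
    """Mean and sample standard deviation of one accuracy across runs."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std

def classwise_means(rows):
    """Average each class's accuracy over runs; rows[i][c] is run i, class c."""
    return [statistics.mean(col) for col in zip(*rows)]

# Usage with the lists collected by the four-run loop:
# mean, std = summarize_runs(tot_test_accuracies)
# print(f"Test accuracy over {len(tot_test_accuracies)} runs: {mean:.2f}% +/- {std:.2f}")
# per_class = classwise_means(classwise_test_accuracies)
```

The sample standard deviation (`stdev`) is used because the four runs are a sample of possible random splits, not the full population of them.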

In [288]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import json
from PIL import Image
from torch.utils.data import Dataset, DataLoader, Subset, ConcatDataset
from torchvision import transforms
from tqdm import tqdm
import psutil
import threading
import time
import torchvision
import torch.nn.functional as F
from torch.optim.lr_scheduler import StepLR

In [228]:
"""
# Biased MNIST CNN Training - CrossEntropy Version

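## Model Architecture:
- SimpleCNN with 3 convolutional layers, adaptive average pooling, dropout, and a fully connected layer
- Input: RGB images (3 channels, 160x160)
- Conv1: 3 -> 16 channels, 3x3 kernel, stride 2, padding 1, followed by BatchNorm, ReLU and MaxPool
- Conv2: 16 -> 32 channels, 3x3 kernel, stride 2, padding 1, followed by BatchNorm, ReLU and MaxPool
- Conv3: 32 -> 64 channels, 3x3 kernel, stride 2, padding 1, followed by BatchNorm, ReLU and MaxPool
- Feature map after Conv3 + MaxPool: 64 x 2 x 2 (for 160x160 input)
- AdaptiveAvgPool2d resizes it to 64 x 5 x 5, followed by Dropout (p=0.5)
- Fully connected layer: 1600 -> 10 (one for each digit)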

## Loss Function:
- CrossEntropyLoss
- Works directly with class indices (0-9)
- Good for multi-class classification problems

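## Training Details:
- Optimizer: Adam with learning rate 0.001 and weight decay 1e-4
- Scheduler: StepLR, halving the learning rate every 10 epochs
- Batch size: 32
- Epochs: 50 for subset training (about 10,000 sampled training images)
- Dataset: Biased MNIST; the correlation level(s) are set by the paths in run_biased_mnist_cnn_subset()
- Uses data normalization based on dataset statistics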

## How to Run:
1. Place your biased MNIST dataset in the correct paths:
   - Default paths assume: 'biased_mnist/full_0.5/trainval'
2. Run the subset training function for faster iteration:
   - run_biased_mnist_cnn_subset()

## Expected Outputs:
- Training/validation accuracies and loss will be displayed during training
- Model will be saved as 'simple_biased_mnist_cnn_subset_01.pth'

## Notes:
- Memory monitoring is included to track RAM usage
- You can adjust the correlation level by changing paths and filenames
- Testing on standard MNIST may not be a good idea; I am not sure yet.
"""

In [229]:
# Memory monitoring class
class MemoryMonitor:
    def __init__(self, interval=1.0):
        self.interval = interval
        self.running = False
        self.thread = None
        self.max_memory = 0
        self.current_memory = 0
    
    def memory_monitor_func(self):
        while self.running:
            # Get memory info
            process = psutil.Process(os.getpid())
            memory_info = process.memory_info()
            memory_mb = memory_info.rss / (1024 * 1024)  # converting to mb
            
            self.current_memory = memory_mb
            self.max_memory = max(self.max_memory, memory_mb)
            time.sleep(self.interval)
    
    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self.memory_monitor_func)
        self.thread.daemon = True
        self.thread.start()
    
    def stop(self):
        self.running = False
        if self.thread:
            self.thread.join(timeout=2.0)
    
    def get_memory_usage(self):
        return {
            'current': self.current_memory,
            'max': self.max_memory
        }

In [230]:
# Define the BiasedMNISTDataset class with JSON integration
class BiasedMNISTDataset(Dataset):
    def __init__(self, root_dir, transform=None, json_path=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = []
        self.labels = []
        
        # load labels from JSON
        label_dict = {}
        if json_path and os.path.exists(json_path):
            try:
                with open(json_path, 'r') as f:
                    json_data = json.load(f)

                if isinstance(json_data, list):
                    for item in json_data:
                        if isinstance(item, dict) and 'index' in item and 'digit' in item:
                            label_dict[item['index']] = item['digit']
                print(f"Loaded {len(label_dict)} labels from JSON file")
            except Exception as e:
                print(f"Error loading JSON: {e}")
        
        # Load the images
        if os.path.exists(root_dir):
            for filename in os.listdir(root_dir):
                if filename.endswith('.jpg'):
                    self.image_paths.append(os.path.join(root_dir, filename))

                    try:
                        index = int(os.path.basename(filename).split('.')[0])
                        
                        # get label from JSON
                        if index in label_dict:
                            label = label_dict[index]
                        else:
                            label = index % 10
                        
                    except (ValueError, IndexError) as e:
                        print(f"Error parsing filename {filename}: {e}")
                        label = 0
                    
                    self.labels.append(label)
        else:
            print(f"Directory not found: {root_dir}")
    
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path)
        label = self.labels[idx]
        
        if self.transform:
            image = self.transform(image)
            
        return image, label

In [267]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        
        # First conv layer: 3 -> 16 channels
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        
        # Second conv layer: 16 -> 32 channels
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)
        self.bn2 = nn.BatchNorm2d(32)

        # Third conv layer: 32 -> 64 channels (added)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.bn3 = nn.BatchNorm2d(64)
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.5)
        
        # Adaptive pooling to make input size more flexible
        self.pool = nn.AdaptiveAvgPool2d((5, 5))
        
        # Final linear layer: 64 x 5 x 5 -> num_classes
        self.fc = nn.Linear(64 * 5 * 5, num_classes)

    def forward(self, x):
        # Conv1 + BN + ReLU + MaxPool
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.max_pool2d(x, 2)

        # Conv2 + BN + ReLU + MaxPool
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.max_pool2d(x, 2)

        # Conv3 + BN + ReLU + MaxPool
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.max_pool2d(x, 2)

        # Adaptive pooling
        x = self.pool(x)

        # Apply dropout
        x = self.dropout(x)

        # Flatten and fully connected layer
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        
        return x
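
The spatial size after each stage follows the Conv2d output-size rule `floor((n + 2p - k) / s) + 1` and, for `F.max_pool2d(x, 2)`, `floor((n - 2) / 2) + 1` (that is, `floor(n / 2)`). The sketch below traces a 160x160 input, the size stated in the docstring above and used in `load_standard_mnist()`; `conv_out`, `pool_out` and `trace` are hypothetical helpers. Under that assumption the map entering `AdaptiveAvgPool2d((5, 5))` is only 2x2, so the pooling layer expands it to 5x5 by repeating and averaging values rather than shrinking it, and the 1,600 inputs to `fc` are derived from 64 x 2 x 2 = 256 activations.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output size of a Conv2d along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, kernel=2):
    """Output size of max_pool2d(x, kernel): stride defaults to kernel, no padding."""
    return (n - kernel) // kernel + 1

def trace(n):
    """Spatial size after each conv and pool stage of SimpleCNN for an n x n input."""
    sizes = []
    for name in ("conv1", "conv2", "conv3"):
        n = conv_out(n)
        sizes.append((name, n))
        n = pool_out(n)
        sizes.append((name + " + maxpool", n))
    return sizes

for stage, size in trace(160):
    print(f"{stage:>16}: {size}x{size}")
```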


In [232]:
# Calculate dataset statistics with a small sample
def calculate_stats_fast(dataset, sample_size=1000):
    indices = torch.randperm(len(dataset))[:sample_size]
    
    # Create a temporary dataloader
    mini_loader = DataLoader(
        Subset(dataset, indices),
        batch_size=100,
        shuffle=False,
        num_workers=0
    )
    
    print(f"Calculating statistics using {sample_size} random samples (fast mode)...")
    
    channels_sum = torch.zeros(3)
    channels_squared_sum = torch.zeros(3)
    num_batches = 0
    
    # memory monitoring
    memory_monitor = MemoryMonitor()
    memory_monitor.start()
    
    try:
        # progress bar
        progress_bar = tqdm(mini_loader, desc="Calculating stats")
        
        for data, _ in progress_bar:
            # [batch_size, 3, height, width]
            channels_sum += torch.mean(data, dim=[0, 2, 3])
            channels_squared_sum += torch.mean(data**2, dim=[0, 2, 3])
            num_batches += 1
            
            mem_usage = memory_monitor.get_memory_usage()
            progress_bar.set_postfix({'RAM': f"{mem_usage['current']:.1f}MB"})
    
    finally:
        memory_monitor.stop()

        mem_usage = memory_monitor.get_memory_usage()
        print(f"Statistics calculation - Maximum RAM usage: {mem_usage['max']:.1f}MB")
    
    mean = channels_sum / num_batches
    std = (channels_squared_sum / num_batches - mean**2)**0.5
    
    print(f"Fast statistics calculation complete. Using sample of {sample_size} images.")
    return mean, std

def create_subset_loader(dataset, fraction=0.1, batch_size=32):
    subset_size = int(len(dataset) * fraction)
    indices = torch.randperm(len(dataset))[:subset_size]
    
    subset_dataset = Subset(dataset, indices)
    
    # DataLoader
    subset_loader = DataLoader(
        subset_dataset, 
        batch_size=batch_size,
        shuffle=True,
        num_workers=0
    )
    
    return subset_loader, subset_size

def check_label_range(dataset):
    try:
        sample_size = min(1000, len(dataset))
        indices = torch.randperm(len(dataset))[:sample_size]
        
        labels = [dataset[i.item()][1] for i in indices]
        min_label = min(labels)
        max_label = max(labels)
        
        print(f"Label range (from sample of {sample_size}): {min_label} to {max_label}")

        if min_label < 0 or max_label >= 10:
            print(f"WARNING: Found labels outside expected range (0-9): min={min_label}, max={max_label}")
            return False
        return True
    except Exception as e:
        print(f"Error checking label range: {e}")
        return False
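
`calculate_stats_fast()` estimates each channel's standard deviation as `sqrt(E[x^2] - E[x]^2)` after averaging per-batch means. Averaging batch means equals the overall mean only when every batch has the same size, which holds here because 1,000 samples split evenly into batches of 100. A small pure-Python check of both points on made-up, single-channel pixel values; `batched_mean_std` is a hypothetical helper, not part of the notebook.

```python
import math
import statistics

def batched_mean_std(values, batch_size):
    """Mean/std computed like calculate_stats_fast(): average the per-batch
    means of x and x**2, then take sqrt(E[x^2] - E[x]^2)."""
    batches = [values[i:i + batch_size] for i in range(0, len(values), batch_size)]
    mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)
    mean_of_squares = sum(sum(v * v for v in b) / len(b) for b in batches) / len(batches)
    return mean_of_means, math.sqrt(mean_of_squares - mean_of_means ** 2)

# Made-up "pixel" values in [0, 1] for a single channel (illustration only)
pixels = [(i * 37 % 101) / 100 for i in range(1000)]

mean, std = batched_mean_std(pixels, batch_size=100)
print(mean, statistics.fmean(pixels))   # equal up to float rounding
print(std, statistics.pstdev(pixels))   # population std, not sample std
```

If `sample_size` were not a multiple of the batch size, the smaller last batch would be over-weighted and the estimate would drift slightly from the true sample statistics.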

In [269]:
def train_model(model, train_loader, test_loader, num_epochs=50):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")
    model = model.to(device)

    # Using CrossEntropyLoss
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

    # Learning rate scheduler
    scheduler = StepLR(optimizer, step_size=10, gamma=0.5)

    # Memory monitoring setup
    memory_monitor = MemoryMonitor()
    memory_monitor.start()

    try:
        for epoch in range(num_epochs):
            model.train()
            running_loss = 0.0
            correct = 0
            total = 0

            print(f"Epoch {epoch+1}/{num_epochs}")
            progress_bar = tqdm(train_loader, desc="Training")
            
            for inputs, labels in progress_bar:
                inputs, labels = inputs.to(device), labels.to(device)

                optimizer.zero_grad()

                outputs = model(inputs)

                # Calculate CrossEntropy loss directly with integer labels
                loss = criterion(outputs, labels)

                loss.backward()
                optimizer.step()
                
                running_loss += loss.item()

                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                mem_usage = memory_monitor.get_memory_usage()

                progress_bar.set_postfix({
                    'loss': f"{loss.item():.4f}",
                    'acc': f"{100 * correct / total:.2f}%",
                    'RAM': f"{mem_usage['current']:.1f}MB"
                })

            epoch_loss = running_loss / len(train_loader)
            epoch_acc = 100 * correct / total
            mem_usage = memory_monitor.get_memory_usage()
            print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%, Max RAM: {mem_usage["max"]:.1f}MB')

            # Evaluate on the test set
            test_accuracy = evaluate_model(model, test_loader, device, memory_monitor)[0]
            print(f'Test Accuracy: {test_accuracy:.2f}%')

            # Step the learning rate scheduler
            scheduler.step()
    
    finally:
        memory_monitor.stop()
        mem_usage = memory_monitor.get_memory_usage()
        print(f"Maximum RAM usage: {mem_usage['max']:.1f}MB")
    
    return model


In [236]:
# version that returns class-wise accuracy as well
def evaluate_model(model, test_loader, device=None, memory_monitor=None, num_classes=10, show_progress=True):
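    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # keep the model on the same device as the inputs
    # (a model returned by load_trained_model() starts on the CPU)
    model = model.to(device)
    model.eval()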
    total_correct = 0
    total_samples = 0

    class_correct = [0 for _ in range(num_classes)]
    class_total = [0 for _ in range(num_classes)]

    progress_iter = test_loader
    if show_progress:
        progress_iter = tqdm(test_loader, desc="Evaluating")

    with torch.no_grad():
        for inputs, labels in progress_iter:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)

            total_samples += labels.size(0)
            total_correct += (predicted == labels).sum().item()

            for label, prediction in zip(labels, predicted):
                class_total[label.item()] += 1
                if label == prediction:
                    class_correct[label.item()] += 1

            if show_progress:
                postfix = {'acc': f"{100 * total_correct / total_samples:.2f}%"}
                if memory_monitor:
                    mem_usage = memory_monitor.get_memory_usage()
                    postfix['RAM'] = f"{mem_usage['current']:.1f}MB"
                progress_iter.set_postfix(postfix)

    accuracy = 100 * total_correct / total_samples
    classwise_accuracy = {
        f"class_{i}": 100 * class_correct[i] / class_total[i] if class_total[i] > 0 else 0.0
        for i in range(num_classes)
    }

    return accuracy, classwise_accuracy
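
The per-class bookkeeping in `evaluate_model()` reduces to counting, for each true label, how many predictions matched it. A plain-Python sketch of the same computation on lists of integer labels, including the 0.0 fallback for classes that never appear; `classwise_accuracy` is a hypothetical function that mirrors the `class_0`, `class_1`, ... keys the notebook uses.

```python
from collections import Counter

def classwise_accuracy(labels, predictions, num_classes=10):
    """Overall and per-class accuracy (in %), keyed like evaluate_model()."""
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    overall = 100 * sum(correct.values()) / len(labels)
    per_class = {
        f"class_{c}": 100 * correct[c] / total[c] if total[c] > 0 else 0.0
        for c in range(num_classes)
    }
    return overall, per_class

labels      = [0, 0, 1, 1, 1, 2]
predictions = [0, 1, 1, 1, 0, 2]
overall, per_class = classwise_accuracy(labels, predictions, num_classes=4)
print(f"overall: {overall:.2f}%")
print(per_class)
```

Note that a class with no samples reports 0.0, the same value as a class that is always misclassified, so keeping the per-class totals alongside the accuracies makes the two cases distinguishable.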

In [237]:
# evaluate model on train and test set; obtain overall accuracies and class-wise accuracies
def evaluate_model_classwise(model, train_loader, test_loader):
    train_accuracy, train_classwise = evaluate_model(model, train_loader, device=None, memory_monitor=None, num_classes=10)
    test_accuracy, test_classwise = evaluate_model(model, test_loader, device=None, memory_monitor=None, num_classes=10)

    print(f"Train Accuracy: {train_accuracy:.2f}%")
    print("Train Class-wise Accuracy:")
    for cls, acc in train_classwise.items():
        print(f"  {cls}: {acc:.2f}%")

    print(f"Test Accuracy: {test_accuracy:.2f}%")
    print("Test Class-wise Accuracy:")
    for cls, acc in test_classwise.items():
        print(f"  {cls}: {acc:.2f}%")
    
    return train_accuracy, train_classwise, test_accuracy, test_classwise

In [238]:
# CHANGES MADE:
# - Removed "mse" from the model filename

# Run the complete workflow with full dataset
def run_biased_mnist_cnn():
    # CHANGE THIS PATH AND RENAME MODEL FILE FOR EACH BIAS LEVEL
    base_dir = 'biased_mnist'
    train_folder = f"{base_dir}/full_0.5/trainval"  # Using full_0.5 for now
    test_folder = f"{base_dir}/full/test"
    train_json_path = f"{base_dir}/full_0.5/trainval.json" # Using full_0.5 for now
    test_json_path = f"{base_dir}/full/test.json"
    
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
    
    print("Creating datasets...")
    train_dataset = BiasedMNISTDataset(train_folder, transform=transform, json_path=train_json_path)
    test_dataset = BiasedMNISTDataset(test_folder, transform=transform, json_path=test_json_path)

    print(f"Training dataset size: {len(train_dataset)}")
    print(f"Test dataset size: {len(test_dataset)}")
    
    if len(train_dataset) == 0 or len(test_dataset) == 0:
        print("Error: Empty dataset found. Please check the file paths.")
        return

    print("Checking label ranges...")
    train_labels_ok = check_label_range(train_dataset)
    test_labels_ok = check_label_range(test_dataset)
    
    if not (train_labels_ok and test_labels_ok):
        print("WARNING: Label range check failed. Please check the dataset.")

    mean, std = calculate_stats_fast(train_dataset, sample_size=1000)
    print(f"Dataset mean: {mean}")
    print(f"Dataset std: {std}")
    
    normalized_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std)
    ])
    
    train_dataset_normalized = BiasedMNISTDataset(train_folder, transform=normalized_transform, json_path=train_json_path)
    test_dataset_normalized = BiasedMNISTDataset(test_folder, transform=normalized_transform, json_path=test_json_path)
    
    # dataloaders
    train_loader = DataLoader(train_dataset_normalized, batch_size=32, shuffle=True, num_workers=0)
    test_loader = DataLoader(test_dataset_normalized, batch_size=32, shuffle=False, num_workers=0)
    
    print(f"Using FULL training dataset with {len(train_dataset_normalized)} images")
    print(f"Using FULL test dataset with {len(test_dataset_normalized)} images")
    
    # the simplified CNN model
    model = SimpleCNN(num_classes=10)
    print("Model architecture:")
    print(model)
    
    # Train the model
    print("\nStarting training...")
    trained_model = train_model(model, train_loader, test_loader, num_epochs=3)
    
    # Final EVAL
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    final_accuracy, classwise_acc = evaluate_model(trained_model, test_loader, device, None, num_classes=10)
    print(f"\nFinal Test Accuracy: {final_accuracy:.2f}%")
    print(f"\nFinal Classwise Accuracy: {classwise_acc}")
    
    # Save model
    torch.save(trained_model.state_dict(), 'simple_biased_mnist_cnn_full.pth')
    print("Model saved as 'simple_biased_mnist_cnn_full.pth'")

In [293]:
# CHANGES MADE:
# - Removed "mse" from the model filename

# Run the complete workflow with subset of data
def run_biased_mnist_cnn_subset():
    # SET PATHS CORRECTLY FOR CORRECT BIAS LEVEL
    base_dir = 'biased_mnist'
    #train_folder = f"{base_dir}/full_0.9/trainval"  # single-dataset option; change level as needed
    #test_folder = f"{base_dir}/full/test"
    #test_folder = f"{base_dir}/full_0.9/trainval"
    #train_json_path = f"{base_dir}/full_0.9/trainval.json" # Change when changing correlation levels
    #test_json_path = f"{base_dir}/full/test.json"
    #test_json_path = f"{base_dir}/full_0.9/trainval.json"
    train_folder1 = f"{base_dir}/full_0.1/trainval"
    train_folder2 = f"{base_dir}/full_0.9/trainval"
    test_folder = f"{base_dir}/full/test"
    train_json_path1 = f"{base_dir}/full_0.1/trainval.json"
    train_json_path2 = f"{base_dir}/full_0.9/trainval.json"
    test_json_path = f"{base_dir}/full/test.json"
    
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
    
    print("Creating datasets...")
    train_dataset1 = BiasedMNISTDataset(train_folder1, transform=transform, json_path=train_json_path1)
    train_dataset2 = BiasedMNISTDataset(train_folder2, transform=transform, json_path=train_json_path2)
    test_dataset = BiasedMNISTDataset(test_folder, transform=transform, json_path=test_json_path)
    train_dataset = ConcatDataset([train_dataset1, train_dataset2]) 


    print(f"Training dataset size: {len(train_dataset)}")
    print(f"Test dataset size: {len(test_dataset)}")
    
    if len(train_dataset) == 0 or len(test_dataset) == 0:
        print("Error: Empty dataset found. Please check the file paths.")
        return

    print("Checking label ranges...")
    train_labels_ok = check_label_range(train_dataset)
    test_labels_ok = check_label_range(test_dataset)
    
    if not (train_labels_ok and test_labels_ok):
        print("WARNING: Label range check failed. Please check the dataset.")

    mean, std = calculate_stats_fast(train_dataset, sample_size=1000)
    print(f"Dataset mean: {mean}")
    print(f"Dataset std: {std}")

    normalized_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std)
    ])
    
    train_dataset_normalized1 = BiasedMNISTDataset(train_folder1, transform=normalized_transform, json_path=train_json_path1)
    train_dataset_normalized2 = BiasedMNISTDataset(train_folder2, transform=normalized_transform, json_path=train_json_path2)
    train_dataset_normalized = ConcatDataset([train_dataset_normalized1, train_dataset_normalized2]) 
    test_dataset_normalized = BiasedMNISTDataset(test_folder, transform=normalized_transform, json_path=test_json_path)

    train_fraction = 0.0833  # Use ~8.33% of the 120,000 combined training images (~10,000 IMAGES)
    #test_fraction = 0.2   # Use 20% of test data (2,000 IMAGES)
    test_fraction = 0.1   # Use 10% of test data (1,000 IMAGES)
    #train_fraction = 0.33   # Use 33% of the first half of the 0.1 set (9,900 IMAGES)
    #test_fraction = 0.033   # Use 3.3% of the second half of the same 0.1 set (990 IMAGES)

    # Split indices
    half_len_train = len(train_dataset_normalized) // 2
    half_len_test = len(test_dataset_normalized) // 2

    # Create Subsets
    #train_half_dataset = Subset(train_dataset_normalized, list(range(half_len_train)))
    #test_half_dataset = Subset(test_dataset_normalized, list(range(half_len_test, len(test_dataset_normalized))))
    
    train_loader, train_subset_size = create_subset_loader(
        train_dataset_normalized, 
        #train_half_dataset, # only sample from first half of dataset for train
        fraction=train_fraction, 
        batch_size=32
    )
    
    test_loader, test_subset_size = create_subset_loader(
        test_dataset_normalized, 
        #test_half_dataset, #only sample from second half
        fraction=test_fraction, 
        batch_size=32
    )
    
    print(f"Using {train_subset_size} training images ({train_fraction*100:.1f}% of dataset)")
    print(f"Using {test_subset_size} test images ({test_fraction*100:.1f}% of dataset)")
    
    # the simplified CNN model
    model = SimpleCNN(num_classes=10)
    print("Model architecture:")
    print(model)
    
    # Train the model
    print("\nStarting training...")
    trained_model = train_model(model, train_loader, test_loader, num_epochs=50)
    
    # Final EVAL
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    final_accuracy, classwise_acc = evaluate_model(trained_model, test_loader, device, None, num_classes=10)
    print(f"\nFinal Test Accuracy: {final_accuracy:.2f}%")
    print(f"\nFinal Classwise Accuracy: {classwise_acc}")
    
    # Save model
    torch.save(trained_model.state_dict(), 'simple_biased_mnist_cnn_subset_01.pth')
    print("Model saved as 'simple_biased_mnist_cnn_subset_01.pth'")

    # return accuracies
    return evaluate_model_classwise(trained_model, train_loader, test_loader)

In [240]:
def load_trained_model(model_path):
    model = SimpleCNN(num_classes=10)
    model.load_state_dict(torch.load(model_path, map_location="cpu"))
    model.eval()
    return model

def load_standard_mnist():
    # Standard normalization for MNIST
    transform = transforms.Compose([
        transforms.Resize((160, 160)),  # Resize to match our biased MNIST images
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    mnist_test = torchvision.datasets.MNIST(
        root='./data', 
        train=False, 
        download=True, 
        transform=transform
    )
    
    class MNISTtoRGB(torch.utils.data.Dataset):
        def __init__(self, mnist_dataset):
            self.mnist_dataset = mnist_dataset
            
        def __len__(self):
            return len(self.mnist_dataset)
            
        def __getitem__(self, idx):
            img, label = self.mnist_dataset[idx]
            rgb_img = torch.cat([img, img, img], dim=0)
            return rgb_img, label
    
    rgb_mnist_test = MNISTtoRGB(mnist_test)
    return rgb_mnist_test

In [241]:
# CHANGES MADE:
# - Removed "mse" from model filenames

def test_on_standard_mnist():
    model_01_path = 'simple_biased_mnist_cnn_subset_01.pth'
    model_05_path = 'simple_biased_mnist_cnn_subset_05.pth'
    try:
        model_01 = load_trained_model(model_01_path)
        print(f"Successfully loaded model trained on correlation level 0.1")
    except:
        print(f"Could not load model from {model_01_path}")
        model_01 = None
        
    try:
        model_05 = load_trained_model(model_05_path)
        print(f"Successfully loaded model trained on correlation level 0.5")
    except:
        print(f"Could not load model from {model_05_path}")
        model_05 = None

    print("Loading standard MNIST test set...")
    mnist_dataset = load_standard_mnist()
    mnist_loader = DataLoader(mnist_dataset, batch_size=64, shuffle=False, num_workers=0)
    print(f"Loaded {len(mnist_dataset)} standard MNIST test images")

    print("\nEvaluating models on standard (unbiased) MNIST:")
    
    results = {}
    
    if model_01 is not None:
        print("\nEvaluating model trained on correlation level 0.1:")
        accuracy_01 = evaluate_model(model_01, mnist_loader, None, None, num_classes=10)[0]
        results["0.1"] = accuracy_01
        print(f"Accuracy on standard MNIST: {accuracy_01:.2f}%")
        
    if model_05 is not None:
        print("\nEvaluating model trained on correlation level 0.5:")
        accuracy_05 = evaluate_model(model_05, mnist_loader, None, None, num_classes=10)[0]
        results["0.5"] = accuracy_05
        print(f"Accuracy on standard MNIST: {accuracy_05:.2f}%")
    
    # Print summary
    print("\nSummary of model performance on standard MNIST:")
    print("-" * 50)
    print("| Correlation Level | Biased Test Acc | Standard MNIST Acc |")
    print("|-------------------|-----------------|-------------------|")
    
    if "0.1" in results:
        print(f"| 0.1               | 37.55%          | {results['0.1']:.2f}%             |")
    
    if "0.5" in results:
        print(f"| 0.5               | 18.00%          | {results['0.5']:.2f}%             |")
    
    print("-" * 50)
    print("Note: A model that relies heavily on bias features will perform poorly on standard MNIST")

In [242]:
# Execute the function for full dataset training
# run_biased_mnist_cnn()

# Execute the function for subset training
#run_biased_mnist_cnn_subset()

# Run the test on standard mnist as counterfactual set
# test_on_standard_mnist()

In [294]:
tot_train_accuracies = []
classwise_train_accuracies = []
tot_test_accuracies = []
classwise_test_accuracies = []
num_models = 4

for i in range(num_models):
    trainA, class_trainA, testA, class_testA = run_biased_mnist_cnn_subset()
    tot_train_accuracies.append(trainA)
    classwise_train_accuracies.append(list(class_trainA.values()))
    tot_test_accuracies.append(testA)
    classwise_test_accuracies.append(list(class_testA.values()))

Creating datasets...
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Training dataset size: 120000
Test dataset size: 10000
Checking label ranges...
Label range (from sample of 1000): 0 to 9
Label range (from sample of 1000): 0 to 9
Calculating statistics using 1000 random samples (fast mode)...


Calculating stats: 100%|███████████| 10/10 [00:00<00:00, 12.01it/s, RAM=471.6MB]


Statistics calculation - Maximum RAM usage: 471.6MB
Fast statistics calculation complete. Using sample of 1000 images.
Dataset mean: tensor([0.0601, 0.0498, 0.0577])
Dataset std: tensor([0.1498, 0.1411, 0.1414])
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Using 9996 training images (8.3% of dataset)
Using 1000 test images (10.0% of dataset)
Model architecture:
SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (pool): AdaptiveAvgPool

Training: 100%|█| 313/313 [00:26<00:00, 11.76it/s, loss=1.8317, acc=46.43%, RAM=


Epoch 1/50, Loss: 1.7501, Accuracy: 46.43%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.51it/s, acc=14.70%, RAM=214.4MB]


Test Accuracy: 14.70%
Epoch 2/50


Training: 100%|█| 313/313 [00:27<00:00, 11.49it/s, loss=1.5800, acc=55.90%, RAM=


Epoch 2/50, Loss: 1.3969, Accuracy: 55.90%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.27it/s, acc=27.10%, RAM=232.8MB]


Test Accuracy: 27.10%
Epoch 3/50


Training: 100%|█| 313/313 [00:26<00:00, 11.73it/s, loss=0.9426, acc=61.21%, RAM=


Epoch 3/50, Loss: 1.1993, Accuracy: 61.21%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.30it/s, acc=34.90%, RAM=222.7MB]


Test Accuracy: 34.90%
Epoch 4/50


Training: 100%|█| 313/313 [00:27<00:00, 11.20it/s, loss=1.7838, acc=67.05%, RAM=


Epoch 4/50, Loss: 1.0187, Accuracy: 67.05%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.51it/s, acc=38.90%, RAM=252.2MB]


Test Accuracy: 38.90%
Epoch 5/50


Training: 100%|█| 313/313 [00:26<00:00, 11.89it/s, loss=0.6032, acc=71.71%, RAM=


Epoch 5/50, Loss: 0.8665, Accuracy: 71.71%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.55it/s, acc=41.30%, RAM=231.2MB]


Test Accuracy: 41.30%
Epoch 6/50


Training: 100%|█| 313/313 [00:26<00:00, 11.83it/s, loss=0.7891, acc=75.03%, RAM=


Epoch 6/50, Loss: 0.7594, Accuracy: 75.03%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.71it/s, acc=48.70%, RAM=217.4MB]


Test Accuracy: 48.70%
Epoch 7/50


Training: 100%|█| 313/313 [00:26<00:00, 11.94it/s, loss=0.2578, acc=76.95%, RAM=


Epoch 7/50, Loss: 0.6897, Accuracy: 76.95%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.57it/s, acc=51.20%, RAM=248.8MB]


Test Accuracy: 51.20%
Epoch 8/50


Training: 100%|█| 313/313 [00:27<00:00, 11.59it/s, loss=0.6812, acc=79.71%, RAM=


Epoch 8/50, Loss: 0.6152, Accuracy: 79.71%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.25it/s, acc=50.40%, RAM=159.3MB]


Test Accuracy: 50.40%
Epoch 9/50


Training: 100%|█| 313/313 [00:26<00:00, 11.87it/s, loss=0.8889, acc=81.30%, RAM=


Epoch 9/50, Loss: 0.5629, Accuracy: 81.30%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.69it/s, acc=56.70%, RAM=224.0MB]


Test Accuracy: 56.70%
Epoch 10/50


Training: 100%|█| 313/313 [00:26<00:00, 11.82it/s, loss=0.4250, acc=82.24%, RAM=


Epoch 10/50, Loss: 0.5204, Accuracy: 82.24%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 13.28it/s, acc=54.80%, RAM=154.4MB]


Test Accuracy: 54.80%
Epoch 11/50


Training: 100%|█| 313/313 [00:26<00:00, 11.89it/s, loss=0.6331, acc=86.37%, RAM=


Epoch 11/50, Loss: 0.4165, Accuracy: 86.37%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.81it/s, acc=61.20%, RAM=217.8MB]


Test Accuracy: 61.20%
Epoch 12/50


Training: 100%|█| 313/313 [00:26<00:00, 11.99it/s, loss=0.1604, acc=87.85%, RAM=


Epoch 12/50, Loss: 0.3709, Accuracy: 87.85%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.12it/s, acc=58.60%, RAM=235.2MB]


Test Accuracy: 58.60%
Epoch 13/50


Training: 100%|█| 313/313 [00:26<00:00, 11.68it/s, loss=0.2411, acc=88.54%, RAM=


Epoch 13/50, Loss: 0.3422, Accuracy: 88.54%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.54it/s, acc=60.40%, RAM=180.5MB]


Test Accuracy: 60.40%
Epoch 14/50


Training: 100%|█| 313/313 [00:26<00:00, 11.82it/s, loss=0.2263, acc=89.54%, RAM=


Epoch 14/50, Loss: 0.3233, Accuracy: 89.54%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.15it/s, acc=60.30%, RAM=225.6MB]


Test Accuracy: 60.30%
Epoch 15/50


Training: 100%|█| 313/313 [00:26<00:00, 11.75it/s, loss=0.5748, acc=90.40%, RAM=


Epoch 15/50, Loss: 0.3024, Accuracy: 90.40%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.47it/s, acc=59.80%, RAM=225.8MB]


Test Accuracy: 59.80%
Epoch 16/50


Training: 100%|█| 313/313 [00:26<00:00, 11.95it/s, loss=0.0853, acc=90.73%, RAM=


Epoch 16/50, Loss: 0.2831, Accuracy: 90.73%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.33it/s, acc=58.70%, RAM=230.1MB]


Test Accuracy: 58.70%
Epoch 17/50


Training: 100%|█| 313/313 [00:26<00:00, 11.62it/s, loss=0.0455, acc=91.67%, RAM=


Epoch 17/50, Loss: 0.2585, Accuracy: 91.67%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.51it/s, acc=60.10%, RAM=231.1MB]


Test Accuracy: 60.10%
Epoch 18/50


Training: 100%|█| 313/313 [00:27<00:00, 11.56it/s, loss=0.1022, acc=91.84%, RAM=


Epoch 18/50, Loss: 0.2451, Accuracy: 91.84%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.37it/s, acc=58.90%, RAM=206.2MB]


Test Accuracy: 58.90%
Epoch 19/50


Training: 100%|█| 313/313 [00:26<00:00, 11.68it/s, loss=0.0834, acc=92.23%, RAM=


Epoch 19/50, Loss: 0.2338, Accuracy: 92.23%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.95it/s, acc=58.20%, RAM=212.8MB]


Test Accuracy: 58.20%
Epoch 20/50


Training: 100%|█| 313/313 [00:26<00:00, 11.86it/s, loss=0.1284, acc=93.11%, RAM=


Epoch 20/50, Loss: 0.2159, Accuracy: 93.11%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.35it/s, acc=60.60%, RAM=204.8MB]


Test Accuracy: 60.60%
Epoch 21/50


Training: 100%|█| 313/313 [00:26<00:00, 11.74it/s, loss=0.0389, acc=94.84%, RAM=


Epoch 21/50, Loss: 0.1743, Accuracy: 94.84%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.52it/s, acc=61.90%, RAM=218.0MB]


Test Accuracy: 61.90%
Epoch 22/50


Training: 100%|█| 313/313 [00:26<00:00, 11.67it/s, loss=0.3927, acc=94.93%, RAM=


Epoch 22/50, Loss: 0.1660, Accuracy: 94.93%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.64it/s, acc=60.90%, RAM=208.3MB]


Test Accuracy: 60.90%
Epoch 23/50


Training: 100%|█| 313/313 [00:26<00:00, 11.62it/s, loss=0.1067, acc=95.11%, RAM=


Epoch 23/50, Loss: 0.1531, Accuracy: 95.11%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.39it/s, acc=61.70%, RAM=221.2MB]


Test Accuracy: 61.70%
Epoch 24/50


Training: 100%|█| 313/313 [00:26<00:00, 11.86it/s, loss=0.1453, acc=95.61%, RAM=


Epoch 24/50, Loss: 0.1529, Accuracy: 95.61%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.08it/s, acc=61.20%, RAM=227.6MB]


Test Accuracy: 61.20%
Epoch 25/50


Training: 100%|█| 313/313 [00:27<00:00, 11.51it/s, loss=0.1219, acc=95.67%, RAM=


Epoch 25/50, Loss: 0.1462, Accuracy: 95.67%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.26it/s, acc=61.40%, RAM=219.2MB]


Test Accuracy: 61.40%
Epoch 26/50


Training: 100%|█| 313/313 [00:26<00:00, 11.84it/s, loss=0.1115, acc=95.53%, RAM=


Epoch 26/50, Loss: 0.1481, Accuracy: 95.53%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.34it/s, acc=61.00%, RAM=230.3MB]


Test Accuracy: 61.00%
Epoch 27/50


Training: 100%|█| 313/313 [00:27<00:00, 11.43it/s, loss=0.2119, acc=95.95%, RAM=


Epoch 27/50, Loss: 0.1361, Accuracy: 95.95%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.98it/s, acc=61.40%, RAM=174.7MB]


Test Accuracy: 61.40%
Epoch 28/50


Training: 100%|█| 313/313 [00:26<00:00, 11.98it/s, loss=0.0528, acc=96.33%, RAM=


Epoch 28/50, Loss: 0.1302, Accuracy: 96.33%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.49it/s, acc=61.60%, RAM=178.9MB]


Test Accuracy: 61.60%
Epoch 29/50


Training: 100%|█| 313/313 [00:26<00:00, 11.72it/s, loss=0.2092, acc=96.52%, RAM=


Epoch 29/50, Loss: 0.1255, Accuracy: 96.52%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 13.63it/s, acc=62.10%, RAM=160.9MB]


Test Accuracy: 62.10%
Epoch 30/50


Training: 100%|█| 313/313 [00:26<00:00, 11.88it/s, loss=0.1963, acc=96.59%, RAM=


Epoch 30/50, Loss: 0.1175, Accuracy: 96.59%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.43it/s, acc=60.90%, RAM=200.8MB]


Test Accuracy: 60.90%
Epoch 31/50


Training: 100%|█| 313/313 [00:26<00:00, 11.75it/s, loss=0.1548, acc=96.88%, RAM=


Epoch 31/50, Loss: 0.1109, Accuracy: 96.88%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.39it/s, acc=60.70%, RAM=210.2MB]


Test Accuracy: 60.70%
Epoch 32/50


Training: 100%|█| 313/313 [00:26<00:00, 11.78it/s, loss=0.0419, acc=97.31%, RAM=


Epoch 32/50, Loss: 0.1001, Accuracy: 97.31%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.39it/s, acc=61.50%, RAM=239.4MB]


Test Accuracy: 61.50%
Epoch 33/50


Training: 100%|█| 313/313 [00:26<00:00, 11.79it/s, loss=0.0131, acc=97.31%, RAM=


Epoch 33/50, Loss: 0.0991, Accuracy: 97.31%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 13.29it/s, acc=61.70%, RAM=152.3MB]


Test Accuracy: 61.70%
Epoch 34/50


Training: 100%|█| 313/313 [00:26<00:00, 11.87it/s, loss=0.1498, acc=97.46%, RAM=


Epoch 34/50, Loss: 0.0938, Accuracy: 97.46%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.11it/s, acc=60.40%, RAM=236.2MB]


Test Accuracy: 60.40%
Epoch 35/50


Training: 100%|█| 313/313 [00:26<00:00, 11.95it/s, loss=0.0148, acc=97.71%, RAM=


Epoch 35/50, Loss: 0.0904, Accuracy: 97.71%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.22it/s, acc=61.60%, RAM=206.4MB]


Test Accuracy: 61.60%
Epoch 36/50


Training: 100%|█| 313/313 [00:26<00:00, 11.64it/s, loss=0.1213, acc=97.55%, RAM=


Epoch 36/50, Loss: 0.0936, Accuracy: 97.55%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.67it/s, acc=61.20%, RAM=225.1MB]


Test Accuracy: 61.20%
Epoch 37/50


Training: 100%|█| 313/313 [00:25<00:00, 12.10it/s, loss=0.0145, acc=97.81%, RAM=


Epoch 37/50, Loss: 0.0855, Accuracy: 97.81%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.17it/s, acc=60.90%, RAM=228.3MB]


Test Accuracy: 60.90%
Epoch 38/50


Training: 100%|█| 313/313 [00:26<00:00, 11.78it/s, loss=0.0113, acc=97.64%, RAM=


Epoch 38/50, Loss: 0.0884, Accuracy: 97.64%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.40it/s, acc=61.10%, RAM=210.9MB]


Test Accuracy: 61.10%
Epoch 39/50


Training: 100%|█| 313/313 [00:26<00:00, 11.89it/s, loss=0.1047, acc=97.88%, RAM=


Epoch 39/50, Loss: 0.0813, Accuracy: 97.88%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.20it/s, acc=61.30%, RAM=227.3MB]


Test Accuracy: 61.30%
Epoch 40/50


Training: 100%|█| 313/313 [00:28<00:00, 11.10it/s, loss=0.0042, acc=97.82%, RAM=


Epoch 40/50, Loss: 0.0850, Accuracy: 97.82%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.00it/s, acc=61.20%, RAM=230.6MB]


Test Accuracy: 61.20%
Epoch 41/50


Training: 100%|█| 313/313 [00:26<00:00, 12.00it/s, loss=0.1840, acc=98.12%, RAM=


Epoch 41/50, Loss: 0.0803, Accuracy: 98.12%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.47it/s, acc=61.20%, RAM=239.7MB]


Test Accuracy: 61.20%
Epoch 42/50


Training: 100%|█| 313/313 [00:26<00:00, 11.94it/s, loss=0.0691, acc=98.08%, RAM=


Epoch 42/50, Loss: 0.0749, Accuracy: 98.08%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.80it/s, acc=61.80%, RAM=230.5MB]


Test Accuracy: 61.80%
Epoch 43/50


Training: 100%|█| 313/313 [00:26<00:00, 11.91it/s, loss=0.0932, acc=97.99%, RAM=


Epoch 43/50, Loss: 0.0778, Accuracy: 97.99%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.19it/s, acc=62.20%, RAM=171.4MB]


Test Accuracy: 62.20%
Epoch 44/50


Training: 100%|█| 313/313 [00:27<00:00, 11.45it/s, loss=0.0857, acc=98.10%, RAM=


Epoch 44/50, Loss: 0.0728, Accuracy: 98.10%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.36it/s, acc=61.40%, RAM=207.9MB]


Test Accuracy: 61.40%
Epoch 45/50


Training: 100%|█| 313/313 [00:26<00:00, 11.80it/s, loss=0.0796, acc=98.35%, RAM=


Epoch 45/50, Loss: 0.0699, Accuracy: 98.35%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 12.43it/s, acc=62.00%, RAM=132.8MB]


Test Accuracy: 62.00%
Epoch 46/50


Training: 100%|█| 313/313 [00:27<00:00, 11.37it/s, loss=0.0211, acc=98.28%, RAM=


Epoch 46/50, Loss: 0.0707, Accuracy: 98.28%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.91it/s, acc=61.60%, RAM=227.5MB]


Test Accuracy: 61.60%
Epoch 47/50


Training: 100%|█| 313/313 [00:26<00:00, 11.81it/s, loss=0.1318, acc=97.98%, RAM=


Epoch 47/50, Loss: 0.0758, Accuracy: 97.98%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.31it/s, acc=61.40%, RAM=240.6MB]


Test Accuracy: 61.40%
Epoch 48/50


Training: 100%|█| 313/313 [00:27<00:00, 11.24it/s, loss=0.3114, acc=98.38%, RAM=


Epoch 48/50, Loss: 0.0708, Accuracy: 98.38%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.27it/s, acc=61.80%, RAM=222.9MB]


Test Accuracy: 61.80%
Epoch 49/50


Training: 100%|█| 313/313 [00:26<00:00, 11.84it/s, loss=0.1435, acc=98.42%, RAM=


Epoch 49/50, Loss: 0.0663, Accuracy: 98.42%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.68it/s, acc=62.00%, RAM=213.8MB]


Test Accuracy: 62.00%
Epoch 50/50


Training: 100%|█| 313/313 [00:27<00:00, 11.54it/s, loss=0.0429, acc=98.38%, RAM=


Epoch 50/50, Loss: 0.0681, Accuracy: 98.38%, Max RAM: 551.9MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.00it/s, acc=61.80%, RAM=218.7MB]


Test Accuracy: 61.80%
Maximum RAM usage: 551.9MB
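According to the notes at the top of the notebook, the scheduler halves the learning rate every 10 epochs. The training curve above fits that description. Training accuracy jumps from 82.24% (epoch 10) to 86.37% (epoch 11), and again from 93.11% (epoch 20) to 94.84% (epoch 21). In PyTorch this would typically be `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)` stepped once per epoch. The sketch below just evaluates the resulting schedule. The base learning rate is hypothetical, because it does not appear in this log.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate used during a 1-indexed epoch under a StepLR-style schedule
    that is stepped once at the end of every epoch."""
    return base_lr * gamma ** ((epoch - 1) // step_size)

BASE_LR = 1e-3  # hypothetical: the notebook's base learning rate is not shown here
for epoch in (1, 10, 11, 21, 31, 41, 50):
    print(f"epoch {epoch:2d}: lr = {step_lr(BASE_LR, epoch):.2e}")
```

Over 50 epochs the rate is halved four times, so the last ten epochs train at 1/16 of the base rate. This matches the small epoch-to-epoch changes in test accuracy late in training.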


Evaluating: 100%|███████████████████| 32/32 [00:01<00:00, 16.42it/s, acc=61.80%]



Final Test Accuracy: 61.80%

Final Classwise Accuracy: {'class_0': 65.17857142857143, 'class_1': 75.60975609756098, 'class_2': 51.401869158878505, 'class_3': 63.63636363636363, 'class_4': 65.55555555555556, 'class_5': 56.79012345679013, 'class_6': 73.46938775510205, 'class_7': 56.86274509803921, 'class_8': 53.84615384615385, 'class_9': 51.54639175257732}
Model saved as 'simple_biased_mnist_cnn_subset_01.pth'


Evaluating: 100%|█████████████████| 313/313 [00:21<00:00, 14.73it/s, acc=99.92%]
Evaluating: 100%|███████████████████| 32/32 [00:02<00:00, 14.99it/s, acc=61.80%]


Train Accuracy: 99.92%
Train Class-wise Accuracy:
  class_0: 100.00%
  class_1: 100.00%
  class_2: 100.00%
  class_3: 99.70%
  class_4: 99.90%
  class_5: 99.90%
  class_6: 99.90%
  class_7: 100.00%
  class_8: 99.78%
  class_9: 100.00%
Test Accuracy: 61.80%
Test Class-wise Accuracy:
  class_0: 65.18%
  class_1: 75.61%
  class_2: 51.40%
  class_3: 63.64%
  class_4: 65.56%
  class_5: 56.79%
  class_6: 73.47%
  class_7: 56.86%
  class_8: 53.85%
  class_9: 51.55%
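The class-wise figures above are per-class accuracies. The unrounded values in the dictionary suggest that class counts differ in the test subset: for example, 65.17857142857143 equals 73/112. As a result, the overall 61.80% is a sample-weighted average. The unweighted mean of the ten class accuracies in this run is about 61.39%. The notebook's `evaluate_model_classwise()` is not shown here. The sketch below covers only the counting step, assuming predictions and labels have already been collected as integer lists.

```python
from collections import defaultdict

def classwise_accuracy(preds, labels, num_classes=10):
    """Per-class accuracy in percent, keyed like the log output ('class_0', ...)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y in zip(preds, labels):
        total[y] += 1
        correct[y] += int(p == y)
    return {
        f"class_{c}": 100.0 * correct[c] / total[c]
        for c in range(num_classes)
        if total[c] > 0  # skip classes absent from this split
    }

print(classwise_accuracy([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3))
```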
Creating datasets...
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Training dataset size: 120000
Test dataset size: 10000
Checking label ranges...
Label range (from sample of 1000): 0 to 9
Label range (from sample of 1000): 0 to 9
Calculating statistics using 1000 random samples (fast mode)...


Calculating stats: 100%|███████████| 10/10 [00:00<00:00, 12.79it/s, RAM=532.1MB]


Statistics calculation - Maximum RAM usage: 532.1MB
Fast statistics calculation complete. Using sample of 1000 images.
Dataset mean: tensor([0.0599, 0.0495, 0.0591])
Dataset std: tensor([0.1486, 0.1401, 0.1429])
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Using 9996 training images (8.3% of dataset)
Using 1000 test images (10.0% of dataset)
Model architecture:
SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (pool): AdaptiveAvgPool

Training: 100%|█| 313/313 [00:28<00:00, 10.98it/s, loss=2.0094, acc=46.05%, RAM=


Epoch 1/50, Loss: 1.7681, Accuracy: 46.05%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.46it/s, acc=19.70%, RAM=195.1MB]


Test Accuracy: 19.70%
Epoch 2/50


Training: 100%|█| 313/313 [00:27<00:00, 11.50it/s, loss=1.9803, acc=58.06%, RAM=


Epoch 2/50, Loss: 1.3292, Accuracy: 58.06%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.80it/s, acc=32.90%, RAM=228.7MB]


Test Accuracy: 32.90%
Epoch 3/50


Training: 100%|█| 313/313 [00:27<00:00, 11.50it/s, loss=1.3342, acc=65.23%, RAM=


Epoch 3/50, Loss: 1.0880, Accuracy: 65.23%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.86it/s, acc=38.80%, RAM=236.1MB]


Test Accuracy: 38.80%
Epoch 4/50


Training: 100%|█| 313/313 [00:27<00:00, 11.31it/s, loss=0.3933, acc=70.22%, RAM=


Epoch 4/50, Loss: 0.9104, Accuracy: 70.22%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.61it/s, acc=46.30%, RAM=237.0MB]


Test Accuracy: 46.30%
Epoch 5/50


Training: 100%|█| 313/313 [00:27<00:00, 11.32it/s, loss=0.4226, acc=73.81%, RAM=


Epoch 5/50, Loss: 0.8076, Accuracy: 73.81%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.92it/s, acc=49.80%, RAM=210.9MB]


Test Accuracy: 49.80%
Epoch 6/50


Training: 100%|█| 313/313 [00:28<00:00, 11.06it/s, loss=0.9827, acc=75.84%, RAM=


Epoch 6/50, Loss: 0.7264, Accuracy: 75.84%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.77it/s, acc=50.40%, RAM=256.2MB]


Test Accuracy: 50.40%
Epoch 7/50


Training: 100%|█| 313/313 [00:27<00:00, 11.37it/s, loss=0.6527, acc=78.88%, RAM=


Epoch 7/50, Loss: 0.6505, Accuracy: 78.88%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.04it/s, acc=54.00%, RAM=221.4MB]


Test Accuracy: 54.00%
Epoch 8/50


Training: 100%|█| 313/313 [00:27<00:00, 11.56it/s, loss=0.4726, acc=79.05%, RAM=


Epoch 8/50, Loss: 0.6152, Accuracy: 79.05%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.88it/s, acc=53.80%, RAM=209.2MB]


Test Accuracy: 53.80%
Epoch 9/50


Training: 100%|█| 313/313 [00:27<00:00, 11.50it/s, loss=0.3235, acc=81.60%, RAM=


Epoch 9/50, Loss: 0.5504, Accuracy: 81.60%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.70it/s, acc=58.00%, RAM=183.1MB]


Test Accuracy: 58.00%
Epoch 10/50


Training: 100%|█| 313/313 [00:26<00:00, 11.71it/s, loss=0.5450, acc=83.00%, RAM=


Epoch 10/50, Loss: 0.5035, Accuracy: 83.00%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.06it/s, acc=57.20%, RAM=231.4MB]


Test Accuracy: 57.20%
Epoch 11/50


Training: 100%|█| 313/313 [00:27<00:00, 11.55it/s, loss=0.2917, acc=86.86%, RAM=


Epoch 11/50, Loss: 0.4007, Accuracy: 86.86%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.02it/s, acc=59.00%, RAM=155.1MB]


Test Accuracy: 59.00%
Epoch 12/50


Training: 100%|█| 313/313 [00:26<00:00, 11.76it/s, loss=0.9913, acc=87.83%, RAM=


Epoch 12/50, Loss: 0.3728, Accuracy: 87.83%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.89it/s, acc=57.60%, RAM=213.9MB]


Test Accuracy: 57.60%
Epoch 13/50


Training: 100%|█| 313/313 [00:27<00:00, 11.45it/s, loss=0.5570, acc=88.69%, RAM=


Epoch 13/50, Loss: 0.3438, Accuracy: 88.69%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.01it/s, acc=59.80%, RAM=207.7MB]


Test Accuracy: 59.80%
Epoch 14/50


Training: 100%|█| 313/313 [00:26<00:00, 11.74it/s, loss=0.2957, acc=88.98%, RAM=


Epoch 14/50, Loss: 0.3294, Accuracy: 88.98%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.12it/s, acc=58.90%, RAM=169.8MB]


Test Accuracy: 58.90%
Epoch 15/50


Training: 100%|█| 313/313 [00:27<00:00, 11.52it/s, loss=0.6492, acc=90.05%, RAM=


Epoch 15/50, Loss: 0.3032, Accuracy: 90.05%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.81it/s, acc=59.80%, RAM=212.4MB]


Test Accuracy: 59.80%
Epoch 16/50


Training: 100%|█| 313/313 [00:27<00:00, 11.50it/s, loss=0.1838, acc=90.22%, RAM=


Epoch 16/50, Loss: 0.2866, Accuracy: 90.22%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.99it/s, acc=59.30%, RAM=233.8MB]


Test Accuracy: 59.30%
Epoch 17/50


Training: 100%|█| 313/313 [00:27<00:00, 11.37it/s, loss=0.6653, acc=90.66%, RAM=


Epoch 17/50, Loss: 0.2782, Accuracy: 90.66%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.87it/s, acc=59.20%, RAM=156.2MB]


Test Accuracy: 59.20%
Epoch 18/50


Training: 100%|█| 313/313 [00:26<00:00, 11.66it/s, loss=0.1006, acc=91.78%, RAM=


Epoch 18/50, Loss: 0.2512, Accuracy: 91.78%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.87it/s, acc=59.20%, RAM=206.4MB]


Test Accuracy: 59.20%
Epoch 19/50


Training: 100%|█| 313/313 [00:27<00:00, 11.57it/s, loss=0.0404, acc=92.48%, RAM=


Epoch 19/50, Loss: 0.2352, Accuracy: 92.48%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.86it/s, acc=59.40%, RAM=194.1MB]


Test Accuracy: 59.40%
Epoch 20/50


Training: 100%|█| 313/313 [00:26<00:00, 11.65it/s, loss=0.1134, acc=92.51%, RAM=


Epoch 20/50, Loss: 0.2269, Accuracy: 92.51%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.68it/s, acc=59.60%, RAM=190.7MB]


Test Accuracy: 59.60%
Epoch 21/50


Training: 100%|█| 313/313 [00:27<00:00, 11.46it/s, loss=0.5378, acc=94.51%, RAM=


Epoch 21/50, Loss: 0.1862, Accuracy: 94.51%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 12.73it/s, acc=59.90%, RAM=159.1MB]


Test Accuracy: 59.90%
Epoch 22/50


Training: 100%|█| 313/313 [00:27<00:00, 11.44it/s, loss=0.0154, acc=95.12%, RAM=


Epoch 22/50, Loss: 0.1640, Accuracy: 95.12%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.89it/s, acc=60.40%, RAM=184.9MB]


Test Accuracy: 60.40%
Epoch 23/50


Training: 100%|█| 313/313 [00:26<00:00, 11.61it/s, loss=0.0788, acc=95.14%, RAM=


Epoch 23/50, Loss: 0.1667, Accuracy: 95.14%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 12.90it/s, acc=59.60%, RAM=111.9MB]


Test Accuracy: 59.60%
Epoch 24/50


Training: 100%|█| 313/313 [00:28<00:00, 10.98it/s, loss=0.0651, acc=95.20%, RAM=


Epoch 24/50, Loss: 0.1512, Accuracy: 95.20%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.59it/s, acc=59.70%, RAM=196.5MB]


Test Accuracy: 59.70%
Epoch 25/50


Training: 100%|█| 313/313 [00:26<00:00, 11.67it/s, loss=0.2621, acc=95.82%, RAM=


Epoch 25/50, Loss: 0.1441, Accuracy: 95.82%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 13.05it/s, acc=59.40%, RAM=139.1MB]


Test Accuracy: 59.40%
Epoch 26/50


Training: 100%|█| 313/313 [00:27<00:00, 11.52it/s, loss=0.0657, acc=95.70%, RAM=


Epoch 26/50, Loss: 0.1430, Accuracy: 95.70%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.57it/s, acc=59.20%, RAM=238.9MB]


Test Accuracy: 59.20%
Epoch 27/50


Training: 100%|█| 313/313 [00:26<00:00, 11.61it/s, loss=0.0093, acc=95.98%, RAM=


Epoch 27/50, Loss: 0.1328, Accuracy: 95.98%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.88it/s, acc=58.60%, RAM=234.0MB]


Test Accuracy: 58.60%
Epoch 28/50


Training: 100%|█| 313/313 [00:27<00:00, 11.53it/s, loss=0.0868, acc=96.24%, RAM=


Epoch 28/50, Loss: 0.1308, Accuracy: 96.24%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.90it/s, acc=60.10%, RAM=217.3MB]


Test Accuracy: 60.10%
Epoch 29/50


Training: 100%|█| 313/313 [00:27<00:00, 11.54it/s, loss=0.1866, acc=96.32%, RAM=


Epoch 29/50, Loss: 0.1298, Accuracy: 96.32%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.95it/s, acc=59.80%, RAM=210.9MB]


Test Accuracy: 59.80%
Epoch 30/50


Training: 100%|█| 313/313 [00:30<00:00, 10.15it/s, loss=0.5705, acc=96.39%, RAM=


Epoch 30/50, Loss: 0.1241, Accuracy: 96.39%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 11.78it/s, acc=59.20%, RAM=129.0MB]


Test Accuracy: 59.20%
Epoch 31/50


Training: 100%|█| 313/313 [00:28<00:00, 11.12it/s, loss=0.0405, acc=96.99%, RAM=


Epoch 31/50, Loss: 0.1094, Accuracy: 96.99%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 12.15it/s, acc=59.30%, RAM=155.5MB]


Test Accuracy: 59.30%
Epoch 32/50


Training: 100%|█| 313/313 [00:27<00:00, 11.50it/s, loss=0.0138, acc=97.50%, RAM=


Epoch 32/50, Loss: 0.0967, Accuracy: 97.50%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.49it/s, acc=59.70%, RAM=185.0MB]


Test Accuracy: 59.70%
Epoch 33/50


Training: 100%|█| 313/313 [00:27<00:00, 11.32it/s, loss=0.1577, acc=97.43%, RAM=


Epoch 33/50, Loss: 0.0981, Accuracy: 97.43%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 12.75it/s, acc=59.10%, RAM=142.0MB]


Test Accuracy: 59.10%
Epoch 34/50


Training: 100%|█| 313/313 [00:28<00:00, 10.84it/s, loss=0.1942, acc=97.51%, RAM=


Epoch 34/50, Loss: 0.0956, Accuracy: 97.51%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.44it/s, acc=59.50%, RAM=213.5MB]


Test Accuracy: 59.50%
Epoch 35/50


Training: 100%|█| 313/313 [00:31<00:00,  9.91it/s, loss=0.0789, acc=97.31%, RAM=


Epoch 35/50, Loss: 0.0981, Accuracy: 97.31%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.46it/s, acc=59.60%, RAM=235.1MB]


Test Accuracy: 59.60%
Epoch 36/50


Training: 100%|█| 313/313 [00:26<00:00, 11.61it/s, loss=0.0362, acc=97.35%, RAM=


Epoch 36/50, Loss: 0.0978, Accuracy: 97.35%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.00it/s, acc=59.50%, RAM=201.7MB]


Test Accuracy: 59.50%
Epoch 37/50


Training: 100%|█| 313/313 [00:27<00:00, 11.42it/s, loss=0.3030, acc=97.54%, RAM=


Epoch 37/50, Loss: 0.0944, Accuracy: 97.54%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.32it/s, acc=59.50%, RAM=190.1MB]


Test Accuracy: 59.50%
Epoch 38/50


Training: 100%|█| 313/313 [00:26<00:00, 11.65it/s, loss=0.0674, acc=97.58%, RAM=


Epoch 38/50, Loss: 0.0882, Accuracy: 97.58%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.02it/s, acc=59.60%, RAM=196.6MB]


Test Accuracy: 59.60%
Epoch 39/50


Training: 100%|█| 313/313 [00:27<00:00, 11.40it/s, loss=0.0610, acc=97.45%, RAM=


Epoch 39/50, Loss: 0.0924, Accuracy: 97.45%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.79it/s, acc=59.10%, RAM=214.2MB]


Test Accuracy: 59.10%
Epoch 40/50


Training: 100%|█| 313/313 [00:26<00:00, 11.70it/s, loss=0.0171, acc=97.63%, RAM=


Epoch 40/50, Loss: 0.0899, Accuracy: 97.63%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.75it/s, acc=59.90%, RAM=233.4MB]


Test Accuracy: 59.90%
Epoch 41/50


Training: 100%|█| 313/313 [00:26<00:00, 11.85it/s, loss=0.2151, acc=97.93%, RAM=


Epoch 41/50, Loss: 0.0810, Accuracy: 97.93%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 12.98it/s, acc=59.80%, RAM=154.7MB]


Test Accuracy: 59.80%
Epoch 42/50


Training: 100%|█| 313/313 [00:26<00:00, 11.66it/s, loss=0.0868, acc=98.05%, RAM=


Epoch 42/50, Loss: 0.0781, Accuracy: 98.05%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.90it/s, acc=59.50%, RAM=242.5MB]


Test Accuracy: 59.50%
Epoch 43/50


Training: 100%|█| 313/313 [00:26<00:00, 11.81it/s, loss=0.1113, acc=97.96%, RAM=


Epoch 43/50, Loss: 0.0780, Accuracy: 97.96%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.58it/s, acc=58.90%, RAM=224.8MB]


Test Accuracy: 58.90%
Epoch 44/50


Training: 100%|█| 313/313 [00:27<00:00, 11.22it/s, loss=0.0614, acc=97.93%, RAM=


Epoch 44/50, Loss: 0.0777, Accuracy: 97.93%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.01it/s, acc=59.00%, RAM=204.8MB]


Test Accuracy: 59.00%
Epoch 45/50


Training: 100%|█| 313/313 [00:27<00:00, 11.35it/s, loss=0.0257, acc=98.12%, RAM=


Epoch 45/50, Loss: 0.0746, Accuracy: 98.12%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.89it/s, acc=59.70%, RAM=203.5MB]


Test Accuracy: 59.70%
Epoch 46/50


Training: 100%|█| 313/313 [00:27<00:00, 11.38it/s, loss=0.2867, acc=98.01%, RAM=


Epoch 46/50, Loss: 0.0775, Accuracy: 98.01%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.98it/s, acc=58.60%, RAM=235.2MB]


Test Accuracy: 58.60%
Epoch 47/50


Training: 100%|█| 313/313 [00:27<00:00, 11.21it/s, loss=0.0499, acc=98.00%, RAM=


Epoch 47/50, Loss: 0.0776, Accuracy: 98.00%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.72it/s, acc=59.60%, RAM=226.1MB]


Test Accuracy: 59.60%
Epoch 48/50


Training: 100%|█| 313/313 [00:27<00:00, 11.46it/s, loss=0.3646, acc=98.14%, RAM=


Epoch 48/50, Loss: 0.0709, Accuracy: 98.14%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.03it/s, acc=60.00%, RAM=216.5MB]


Test Accuracy: 60.00%
Epoch 49/50


Training: 100%|█| 313/313 [00:28<00:00, 11.17it/s, loss=0.0495, acc=98.10%, RAM=


Epoch 49/50, Loss: 0.0712, Accuracy: 98.10%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 14.64it/s, acc=59.80%, RAM=200.1MB]


Test Accuracy: 59.80%
Epoch 50/50


Training: 100%|█| 313/313 [00:27<00:00, 11.49it/s, loss=0.1442, acc=98.28%, RAM=


Epoch 50/50, Loss: 0.0726, Accuracy: 98.28%, Max RAM: 629.2MB


Evaluating: 100%|██████| 32/32 [00:02<00:00, 15.21it/s, acc=59.30%, RAM=216.9MB]


Test Accuracy: 59.30%
Maximum RAM usage: 629.2MB


Evaluating: 100%|███████████████████| 32/32 [00:01<00:00, 16.52it/s, acc=59.30%]



Final Test Accuracy: 59.30%

Final Classwise Accuracy: {'class_0': 55.10204081632653, 'class_1': 76.85950413223141, 'class_2': 51.37614678899082, 'class_3': 69.52380952380952, 'class_4': 57.4468085106383, 'class_5': 45.333333333333336, 'class_6': 71.91011235955057, 'class_7': 55.932203389830505, 'class_8': 46.808510638297875, 'class_9': 56.70103092783505}
Model saved as 'simple_biased_mnist_cnn_subset_01.pth'
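Both completed runs in this log print `Model saved as 'simple_biased_mnist_cnn_subset_01.pth'`. If that filename is fixed in the code, each instance overwrites the previous checkpoint, and the `_01` suffix does not identify this 50/50 split either. The sketch below builds one filename per split and instance. Only the `simple_biased_mnist_cnn_subset` prefix comes from the log; the rest of the naming pattern and the `checkpoint_path` helper are hypothetical. The result could be passed to `torch.save(model.state_dict(), path)`.

```python
from pathlib import Path

def checkpoint_path(split_name, instance, directory="."):
    """Per-run checkpoint path so later instances do not overwrite earlier ones.

    Hypothetical naming scheme: <prefix>_<split>_run<instance>.pth
    """
    return Path(directory) / f"simple_biased_mnist_cnn_subset_{split_name}_run{instance}.pth"

for i in range(1, 5):  # four instances per split
    print(checkpoint_path("mixed", i))
```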


Evaluating: 100%|█████████████████| 313/313 [00:21<00:00, 14.61it/s, acc=99.95%]
Evaluating: 100%|███████████████████| 32/32 [00:02<00:00, 14.81it/s, acc=59.30%]


Train Accuracy: 99.95%
Train Class-wise Accuracy:
  class_0: 100.00%
  class_1: 100.00%
  class_2: 99.90%
  class_3: 99.90%
  class_4: 100.00%
  class_5: 100.00%
  class_6: 99.90%
  class_7: 100.00%
  class_8: 99.79%
  class_9: 100.00%
Test Accuracy: 59.30%
Test Class-wise Accuracy:
  class_0: 55.10%
  class_1: 76.86%
  class_2: 51.38%
  class_3: 69.52%
  class_4: 57.45%
  class_5: 45.33%
  class_6: 71.91%
  class_7: 55.93%
  class_8: 46.81%
  class_9: 56.70%
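The goal of this notebook is average performance over four instances per split. So far this split has two completed runs, with final test accuracies of 61.80% and 59.30%. A third run begins below. A minimal way to summarize the instances is the mean plus the sample standard deviation, sketched here with the two values from this log. The same function can be applied to each class-wise accuracy in turn. `summarize_runs` is a hypothetical helper.

```python
import statistics

def summarize_runs(accuracies):
    """Mean and sample standard deviation of per-instance accuracies (in %)."""
    mean = statistics.fmean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std

# Final test accuracies of the two completed runs in this log
mean, std = summarize_runs([61.80, 59.30])
print(f"{mean:.2f}% ± {std:.2f}%")
```

Even with all four instances, the standard deviation is only a rough estimate. It is still useful for judging whether a gap between splits is larger than run-to-run sampling noise.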
Creating datasets...
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Training dataset size: 120000
Test dataset size: 10000
Checking label ranges...
Label range (from sample of 1000): 0 to 9
Label range (from sample of 1000): 0 to 9
Calculating statistics using 1000 random samples (fast mode)...


Calculating stats: 100%|███████████| 10/10 [00:00<00:00, 12.33it/s, RAM=471.2MB]


Statistics calculation - Maximum RAM usage: 471.2MB
Fast statistics calculation complete. Using sample of 1000 images.
Dataset mean: tensor([0.0577, 0.0465, 0.0579])
Dataset std: tensor([0.1461, 0.1356, 0.1405])
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Using 9996 training images (8.3% of dataset)
Using 1000 test images (10.0% of dataset)
Model architecture:
SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (pool): AdaptiveAvgPool

Epoch 1/50, Loss: 1.7774, Accuracy: 46.61%, Max RAM: 266.9MB
Test Accuracy: 15.60%
Epoch 2/50, Loss: 1.3626, Accuracy: 56.96%, Max RAM: 266.9MB
Test Accuracy: 29.20%
Epoch 3/50, Loss: 1.1362, Accuracy: 63.83%, Max RAM: 266.9MB
Test Accuracy: 33.40%
Epoch 4/50, Loss: 0.9786, Accuracy: 68.46%, Max RAM: 266.9MB
Test Accuracy: 43.00%
Epoch 5/50, Loss: 0.8498, Accuracy: 72.26%, Max RAM: 266.9MB
Test Accuracy: 47.80%
Epoch 6/50, Loss: 0.7571, Accuracy: 74.68%, Max RAM: 266.9MB
Test Accuracy: 48.80%
Epoch 7/50, Loss: 0.6904, Accuracy: 76.89%, Max RAM: 266.9MB
Test Accuracy: 49.70%
Epoch 8/50, Loss: 0.6298, Accuracy: 79.13%, Max RAM: 266.9MB
Test Accuracy: 53.60%
Epoch 9/50, Loss: 0.5699, Accuracy: 81.15%, Max RAM: 266.9MB
Test Accuracy: 52.50%
Epoch 10/50, Loss: 0.5222, Accuracy: 82.71%, Max RAM: 266.9MB
Test Accuracy: 54.60%
Epoch 11/50, Loss: 0.4080, Accuracy: 86.49%, Max RAM: 266.9MB
Test Accuracy: 54.90%
Epoch 12/50, Loss: 0.3699, Accuracy: 88.01%, Max RAM: 266.9MB
Test Accuracy: 54.10%
Epoch 13/50, Loss: 0.3504, Accuracy: 88.58%, Max RAM: 266.9MB
Test Accuracy: 57.30%
Epoch 14/50, Loss: 0.3225, Accuracy: 89.13%, Max RAM: 273.8MB
Test Accuracy: 56.50%
Epoch 15/50, Loss: 0.3066, Accuracy: 89.91%, Max RAM: 273.8MB
Test Accuracy: 57.10%
Epoch 16/50, Loss: 0.2954, Accuracy: 90.74%, Max RAM: 273.8MB
Test Accuracy: 57.80%
Epoch 17/50, Loss: 0.2717, Accuracy: 91.27%, Max RAM: 273.8MB
Test Accuracy: 57.90%
Epoch 18/50, Loss: 0.2555, Accuracy: 91.82%, Max RAM: 273.8MB
Test Accuracy: 59.20%
Epoch 19/50, Loss: 0.2451, Accuracy: 92.05%, Max RAM: 273.8MB
Test Accuracy: 57.40%
Epoch 20/50, Loss: 0.2266, Accuracy: 92.55%, Max RAM: 273.8MB
Test Accuracy: 56.90%
Epoch 21/50, Loss: 0.1832, Accuracy: 94.34%, Max RAM: 273.8MB
Test Accuracy: 59.30%
Epoch 22/50, Loss: 0.1697, Accuracy: 95.03%, Max RAM: 273.8MB
Test Accuracy: 58.40%
Epoch 23/50, Loss: 0.1556, Accuracy: 95.45%, Max RAM: 273.8MB
Test Accuracy: 58.10%
Epoch 24/50, Loss: 0.1486, Accuracy: 95.67%, Max RAM: 273.8MB
Test Accuracy: 58.40%
Epoch 25/50, Loss: 0.1543, Accuracy: 95.22%, Max RAM: 273.8MB
Test Accuracy: 58.40%
Epoch 26/50, Loss: 0.1494, Accuracy: 95.52%, Max RAM: 273.8MB
Test Accuracy: 57.70%
Epoch 27/50, Loss: 0.1389, Accuracy: 95.81%, Max RAM: 273.8MB
Test Accuracy: 58.00%
Epoch 28/50, Loss: 0.1291, Accuracy: 96.45%, Max RAM: 273.8MB
Test Accuracy: 59.70%
Epoch 29/50, Loss: 0.1287, Accuracy: 96.00%, Max RAM: 273.8MB
Test Accuracy: 58.70%
Epoch 30/50, Loss: 0.1233, Accuracy: 96.37%, Max RAM: 273.8MB
Test Accuracy: 56.50%
Epoch 31/50, Loss: 0.1135, Accuracy: 96.79%, Max RAM: 273.8MB
Test Accuracy: 57.80%
Epoch 32/50, Loss: 0.1010, Accuracy: 97.39%, Max RAM: 273.8MB
Test Accuracy: 58.70%
Epoch 33/50, Loss: 0.1008, Accuracy: 97.40%, Max RAM: 273.8MB
Test Accuracy: 59.40%
Epoch 34/50, Loss: 0.1004, Accuracy: 97.26%, Max RAM: 273.8MB
Test Accuracy: 59.10%
Epoch 35/50, Loss: 0.1002, Accuracy: 97.23%, Max RAM: 273.8MB
Test Accuracy: 58.60%
Epoch 36/50, Loss: 0.0986, Accuracy: 97.34%, Max RAM: 273.8MB
Test Accuracy: 58.30%
Epoch 37/50, Loss: 0.0976, Accuracy: 97.43%, Max RAM: 273.8MB
Test Accuracy: 59.70%
Epoch 38/50, Loss: 0.0879, Accuracy: 97.92%, Max RAM: 273.8MB
Test Accuracy: 59.30%
Epoch 39/50, Loss: 0.0921, Accuracy: 97.54%, Max RAM: 273.8MB
Test Accuracy: 57.50%
Epoch 40/50, Loss: 0.0879, Accuracy: 97.69%, Max RAM: 273.8MB
Test Accuracy: 58.30%
Epoch 41/50, Loss: 0.0855, Accuracy: 97.79%, Max RAM: 273.8MB
Test Accuracy: 59.40%
Epoch 42/50, Loss: 0.0795, Accuracy: 97.97%, Max RAM: 273.8MB
Test Accuracy: 58.80%
Epoch 43/50, Loss: 0.0802, Accuracy: 97.93%, Max RAM: 273.8MB
Test Accuracy: 59.00%
Epoch 44/50, Loss: 0.0800, Accuracy: 97.98%, Max RAM: 273.8MB
Test Accuracy: 59.00%
Epoch 45/50, Loss: 0.0765, Accuracy: 98.17%, Max RAM: 273.8MB
Test Accuracy: 59.10%
Epoch 46/50, Loss: 0.0759, Accuracy: 98.18%, Max RAM: 273.8MB
Test Accuracy: 59.50%
Epoch 47/50, Loss: 0.0752, Accuracy: 98.26%, Max RAM: 273.8MB
Test Accuracy: 59.10%
Epoch 48/50, Loss: 0.0694, Accuracy: 98.49%, Max RAM: 273.8MB
Test Accuracy: 59.40%
Epoch 49/50, Loss: 0.0707, Accuracy: 98.28%, Max RAM: 273.8MB
Test Accuracy: 58.80%
Epoch 50/50, Loss: 0.0714, Accuracy: 98.20%, Max RAM: 273.8MB
Test Accuracy: 58.90%
Maximum RAM usage: 273.8MB
Final Test Accuracy: 58.90%

Final Classwise Accuracy: {'class_0': 59.61538461538461, 'class_1': 78.76106194690266, 'class_2': 58.333333333333336, 'class_3': 56.32183908045977, 'class_4': 56.52173913043478, 'class_5': 52.04081632653061, 'class_6': 67.46987951807229, 'class_7': 58.11965811965812, 'class_8': 52.04081632653061, 'class_9': 48.0}
Model saved as 'simple_biased_mnist_cnn_subset_01.pth'
Train Accuracy: 99.97%
Train Class-wise Accuracy:
  class_0: 100.00%
  class_1: 100.00%
  class_2: 99.90%
  class_3: 100.00%
  class_4: 100.00%
  class_5: 100.00%
  class_6: 100.00%
  class_7: 100.00%
  class_8: 99.90%
  class_9: 99.90%
Test Accuracy: 58.90%
Test Class-wise Accuracy:
  class_0: 59.62%
  class_1: 78.76%
  class_2: 58.33%
  class_3: 56.32%
  class_4: 56.52%
  class_5: 52.04%
  class_6: 67.47%
  class_7: 58.12%
  class_8: 52.04%
  class_9: 48.00%
Creating datasets...
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Training dataset size: 120000
Test dataset size: 10000
Checking label ranges...
Label range (from sample of 1000): 0 to 9
Label range (from sample of 1000): 0 to 9
Calculating statistics using 1000 random samples (fast mode)...
Statistics calculation - Maximum RAM usage: 343.8MB
Fast statistics calculation complete. Using sample of 1000 images.
Dataset mean: tensor([0.0609, 0.0484, 0.0574])
Dataset std: tensor([0.1506, 0.1384, 0.1403])
Loaded 60000 labels from JSON file
Loaded 60000 labels from JSON file
Loaded 10000 labels from JSON file
Using 9996 training images (8.3% of dataset)
Using 1000 test images (10.0% of dataset)
Model architecture:
SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (pool): AdaptiveAvgPool

Epoch 1/50, Loss: 1.7513, Accuracy: 46.31%, Max RAM: 533.4MB
Test Accuracy: 20.30%
Epoch 2/50, Loss: 1.3422, Accuracy: 57.75%, Max RAM: 533.4MB
Test Accuracy: 29.20%
Epoch 3/50, Loss: 1.1204, Accuracy: 64.17%, Max RAM: 533.4MB
Test Accuracy: 35.30%
Epoch 4/50, Loss: 0.9614, Accuracy: 67.95%, Max RAM: 533.4MB
Test Accuracy: 42.60%
Epoch 5/50, Loss: 0.8386, Accuracy: 72.07%, Max RAM: 533.4MB
Test Accuracy: 48.00%
Epoch 6/50, Loss: 0.7337, Accuracy: 75.54%, Max RAM: 533.4MB
Test Accuracy: 50.10%
Epoch 7/50, Loss: 0.6610, Accuracy: 78.05%, Max RAM: 533.4MB
Test Accuracy: 55.00%
Epoch 8/50, Loss: 0.6031, Accuracy: 80.09%, Max RAM: 533.4MB
Test Accuracy: 56.60%
Epoch 9/50, Loss: 0.5559, Accuracy: 81.41%, Max RAM: 533.4MB
Test Accuracy: 57.60%
Epoch 10/50, Loss: 0.4977, Accuracy: 83.53%, Max RAM: 533.4MB
Test Accuracy: 59.10%
Epoch 11/50, Loss: 0.3876, Accuracy: 87.47%, Max RAM: 533.4MB
Test Accuracy: 62.60%
Epoch 12/50, Loss: 0.3528, Accuracy: 88.65%, Max RAM: 533.4MB
Test Accuracy: 60.30%
Epoch 13/50, Loss: 0.3261, Accuracy: 89.36%, Max RAM: 533.4MB
Test Accuracy: 58.80%
Epoch 14/50, Loss: 0.3111, Accuracy: 89.72%, Max RAM: 533.4MB
Test Accuracy: 59.10%
Epoch 15/50, Loss: 0.2807, Accuracy: 90.96%, Max RAM: 533.4MB
Test Accuracy: 61.20%
Epoch 16/50, Loss: 0.2658, Accuracy: 91.37%, Max RAM: 533.4MB
Test Accuracy: 61.40%
Epoch 17/50, Loss: 0.2613, Accuracy: 91.65%, Max RAM: 533.4MB
Test Accuracy: 60.10%
Epoch 18/50, Loss: 0.2470, Accuracy: 92.13%, Max RAM: 533.4MB
Test Accuracy: 60.80%
Epoch 19/50, Loss: 0.2277, Accuracy: 92.77%, Max RAM: 533.4MB
Test Accuracy: 61.00%
Epoch 20/50, Loss: 0.2082, Accuracy: 93.50%, Max RAM: 533.4MB
Test Accuracy: 60.60%
Epoch 21/50, Loss: 0.1704, Accuracy: 94.87%, Max RAM: 533.4MB
Test Accuracy: 62.50%
Epoch 22/50, Loss: 0.1523, Accuracy: 95.73%, Max RAM: 533.4MB
Test Accuracy: 60.70%
Epoch 23/50, Loss: 0.1464, Accuracy: 95.89%, Max RAM: 533.4MB
Test Accuracy: 60.20%
Epoch 24/50, Loss: 0.1417, Accuracy: 96.13%, Max RAM: 533.4MB
Test Accuracy: 60.40%
Epoch 25/50, Loss: 0.1342, Accuracy: 96.23%, Max RAM: 533.4MB
Test Accuracy: 60.80%
Epoch 26/50, Loss: 0.1266, Accuracy: 96.48%, Max RAM: 533.4MB
Test Accuracy: 60.20%
Epoch 27/50, Loss: 0.1333, Accuracy: 96.37%, Max RAM: 533.4MB
Test Accuracy: 59.80%
Epoch 28/50, Loss: 0.1218, Accuracy: 96.59%, Max RAM: 533.4MB
Test Accuracy: 60.60%
Epoch 29/50, Loss: 0.1156, Accuracy: 96.87%, Max RAM: 533.4MB
Test Accuracy: 60.80%
Epoch 30/50, Loss: 0.1151, Accuracy: 96.67%, Max RAM: 533.4MB
Test Accuracy: 61.60%
Epoch 31/50, Loss: 0.1011, Accuracy: 97.13%, Max RAM: 533.4MB
Test Accuracy: 62.50%
Epoch 32/50, Loss: 0.0958, Accuracy: 97.52%, Max RAM: 533.4MB
Test Accuracy: 61.90%
Epoch 33/50, Loss: 0.0906, Accuracy: 97.71%, Max RAM: 533.4MB
Test Accuracy: 62.00%
Epoch 34/50, Loss: 0.0907, Accuracy: 97.61%, Max RAM: 533.4MB
Test Accuracy: 61.20%
Epoch 35/50, Loss: 0.0854, Accuracy: 97.72%, Max RAM: 533.4MB
Test Accuracy: 61.50%
Epoch 36/50, Loss: 0.0886, Accuracy: 97.86%, Max RAM: 533.4MB
Test Accuracy: 60.70%
Epoch 37/50, Loss: 0.0865, Accuracy: 97.76%, Max RAM: 533.4MB
Test Accuracy: 61.40%
Epoch 38/50, Loss: 0.0847, Accuracy: 97.76%, Max RAM: 533.4MB
Test Accuracy: 60.90%
Epoch 39/50, Loss: 0.0750, Accuracy: 98.15%, Max RAM: 533.4MB
Test Accuracy: 61.60%
Epoch 40/50, Loss: 0.0819, Accuracy: 97.88%, Max RAM: 533.4MB
Test Accuracy: 61.90%
Epoch 41/50, Loss: 0.0761, Accuracy: 98.02%, Max RAM: 533.4MB
Test Accuracy: 61.40%
Epoch 42/50, Loss: 0.0707, Accuracy: 98.36%, Max RAM: 533.4MB
Test Accuracy: 61.30%
Epoch 43/50, Loss: 0.0714, Accuracy: 98.40%, Max RAM: 533.4MB
Test Accuracy: 60.90%
Epoch 44/50, Loss: 0.0688, Accuracy: 98.46%, Max RAM: 533.4MB
Test Accuracy: 61.40%
Epoch 45/50, Loss: 0.0710, Accuracy: 98.34%, Max RAM: 533.4MB
Test Accuracy: 61.20%
Epoch 46/50, Loss: 0.0719, Accuracy: 98.29%, Max RAM: 533.4MB
Test Accuracy: 61.30%
Epoch 47/50, Loss: 0.0666, Accuracy: 98.41%, Max RAM: 533.4MB
Test Accuracy: 61.30%
Epoch 48/50, Loss: 0.0651, Accuracy: 98.51%, Max RAM: 533.4MB
Test Accuracy: 60.60%
Epoch 49/50, Loss: 0.0641, Accuracy: 98.52%, Max RAM: 533.4MB
Test Accuracy: 60.40%
Epoch 50/50, Loss: 0.0631, Accuracy: 98.56%, Max RAM: 533.4MB
Test Accuracy: 61.80%
Maximum RAM usage: 533.4MB
Final Test Accuracy: 61.80%

Final Classwise Accuracy: {'class_0': 65.59139784946237, 'class_1': 66.93548387096774, 'class_2': 62.83185840707964, 'class_3': 69.72477064220183, 'class_4': 67.74193548387096, 'class_5': 56.043956043956044, 'class_6': 61.61616161616162, 'class_7': 61.05263157894737, 'class_8': 53.65853658536585, 'class_9': 49.504950495049506}
Model saved as 'simple_biased_mnist_cnn_subset_01.pth'
Train Accuracy: 99.93%
Train Class-wise Accuracy:
  class_0: 100.00%
  class_1: 100.00%
  class_2: 99.90%
  class_3: 99.91%
  class_4: 99.80%
  class_5: 100.00%
  class_6: 100.00%
  class_7: 100.00%
  class_8: 99.68%
  class_9: 100.00%
Test Accuracy: 61.80%
Test Class-wise Accuracy:
  class_0: 65.59%
  class_1: 66.94%
  class_2: 62.83%
  class_3: 69.72%
  class_4: 67.74%
  class_5: 56.04%
  class_6: 61.62%
  class_7: 61.05%
  class_8: 53.66%
  class_9: 49.50%
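The printed `SimpleCNN` stacks three 3x3 convolutions with stride 2 and padding 1. Each one roughly halves the spatial size, and then the adaptive average pool resizes the map to the fixed 5x5 grid given in the notebook description. The sketch below works through that arithmetic using PyTorch's documented `Conv2d` output-size formula. It makes three assumptions that this section does not confirm: a 28x28 input (standard MNIST resolution), that the pooled 64x5x5 map is flattened straight into the classifier (the printout is truncated after the pool), and that the hypothetical `adaptive_bins` helper matches how PyTorch picks adaptive-pooling windows. Under the 28x28 assumption, conv3 outputs a 4x4 map. The pool still produces 5x5 outputs by averaging overlapping windows rather than by cropping.

```python
def conv2d_out(size, kernel=3, stride=2, padding=1, dilation=1):
    # PyTorch Conv2d output size along one spatial dimension (floor division)
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def spatial_after_convs(size, n_convs=3):
    # conv1, conv2 and conv3 all use kernel 3, stride 2, padding 1 in the printout
    for _ in range(n_convs):
        size = conv2d_out(size)
    return size


def adaptive_bins(in_size, out_size):
    # Window [start, end) for each output cell: start = floor(i * in / out),
    # end = ceil((i + 1) * in / out). Windows overlap when in_size < out_size.
    bins = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = -((-(i + 1) * in_size) // out_size)  # integer ceiling division
        bins.append((start, end))
    return bins


POOL = 5        # 5x5 target of the adaptive average pool (notebook description)
CHANNELS = 64   # conv3 output channels from the printed architecture

for input_size in (28, 64):  # 28 assumes standard MNIST; 64 is purely illustrative
    s = spatial_after_convs(input_size)
    print(f"{input_size}x{input_size} input -> {s}x{s} after conv3 "
          f"-> {POOL}x{POOL} after pooling, {CHANNELS * POOL * POOL} flattened features")

print("Pooling windows for a 4x4 map:", adaptive_bins(4, 5))
```

The flattened feature count stays the same for any input size. That size independence is what the adaptive pool is meant to provide.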


In [295]:
for i in range(num_models):
    # Header for model i (models are numbered from 1 in the printout)
    print("*********************************")
    print(f"   ACCURACIES FOR MODEL {i+1}   ")
    # Overall train vs. test accuracy side by side
    print("Overall Train\t| Overall Test")
    print(f"{tot_train_accuracies[i]:.2f}%\t\t| {tot_test_accuracies[i]:.2f}%")
    print("----------------------------")
    # Per-class accuracy for the 10 digit classes: train on the left, test on the right
    print("Train Classwise\t| Test Classwise")
    for j in range(10):
        print(f"{j}: {classwise_train_accuracies[i][j]:.2f}%\t| {j}: {classwise_test_accuracies[i][j]:.2f}%")
    print("*********************************")

*********************************
   ACCURACIES FOR MODEL 1   
Overall Train	| Overall Test
99.92%		| 61.80%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 65.18%
1: 100.00%	| 1: 75.61%
2: 100.00%	| 2: 51.40%
3: 99.70%	| 3: 63.64%
4: 99.90%	| 4: 65.56%
5: 99.90%	| 5: 56.79%
6: 99.90%	| 6: 73.47%
7: 100.00%	| 7: 56.86%
8: 99.78%	| 8: 53.85%
9: 100.00%	| 9: 51.55%
*********************************
*********************************
   ACCURACIES FOR MODEL 2   
Overall Train	| Overall Test
99.95%		| 59.30%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 55.10%
1: 100.00%	| 1: 76.86%
2: 99.90%	| 2: 51.38%
3: 99.90%	| 3: 69.52%
4: 100.00%	| 4: 57.45%
5: 100.00%	| 5: 45.33%
6: 99.90%	| 6: 71.91%
7: 100.00%	| 7: 55.93%
8: 99.79%	| 8: 46.81%
9: 100.00%	| 9: 56.70%
*********************************
*********************************
   ACCURACIES FOR MODEL 3   
Overall Train	| Overall Test
99.97%		| 58.90%
----------------------------
T
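Each class-wise figure in these tables is the percentage of images of a given digit that the model labels correctly, i.e. per-class recall. The notebook computes this in `evaluate_model_classwise()`, whose code is not part of this section. The following is only a minimal pure-Python sketch of the same calculation on plain label and prediction lists. It uses a hypothetical `classwise_accuracy` helper that returns a dict in the same `class_k` format as the logged output.

```python
from collections import Counter


def classwise_accuracy(labels, preds, num_classes=10):
    """Hypothetical helper: return {'class_k': accuracy in %} for each class k.

    Accuracy for class k = (# samples with true label k predicted as k)
                           / (# samples with true label k) * 100
    """
    totals = Counter(labels)
    correct = Counter(y for y, p in zip(labels, preds) if y == p)
    result = {}
    for k in range(num_classes):
        if totals[k] == 0:
            continue  # class absent from this split; skip to avoid dividing by zero
        result[f"class_{k}"] = 100.0 * correct[k] / totals[k]
    return result


labels = [0, 0, 1, 1, 1, 2]
preds = [0, 1, 1, 1, 0, 2]
print(classwise_accuracy(labels, preds, num_classes=3))
```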

In [297]:
print(tot_train_accuracies)
print(classwise_train_accuracies)
print(tot_test_accuracies)
print(classwise_test_accuracies)

[99.91996798719488, 99.9499799919968, 99.96998799519808, 99.92997198879551]
[[100.0, 100.0, 100.0, 99.7005988023952, 99.90118577075098, 99.89669421487604, 99.89711934156378, 100.0, 99.78354978354979, 100.0], [100.0, 100.0, 99.90049751243781, 99.90319457889642, 100.0, 100.0, 99.89722507708119, 100.0, 99.79188345473464, 100.0], [100.0, 100.0, 99.89888776541962, 100.0, 100.0, 100.0, 100.0, 100.0, 99.90176817288801, 99.8984771573604], [100.0, 100.0, 99.90375360923966, 99.90503323836657, 99.79550102249489, 100.0, 100.0, 100.0, 99.68152866242038, 100.0]]
[61.8, 59.3, 58.9, 61.8]
[[65.17857142857143, 75.60975609756098, 51.401869158878505, 63.63636363636363, 65.55555555555556, 56.79012345679013, 73.46938775510205, 56.86274509803921, 53.84615384615385, 51.54639175257732], [55.10204081632653, 76.85950413223141, 51.37614678899082, 69.52380952380952, 57.4468085106383, 45.333333333333336, 71.91011235955057, 55.932203389830505, 46.808510638297875, 56.70103092783505], [59.61538461538461, 78.761061946
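The four instances are trained to estimate how this CNN performs on average. The overall accuracies printed above can therefore be condensed into a mean and a sample standard deviation. The sketch below copies those values by hand and uses only Python's `statistics` module. The train-minus-test gap is included as a rough indicator of the overfitting suggested by the near-100% training accuracies. The variable names differ from the notebook's so that running it would not overwrite them.

```python
import statistics

# Overall accuracies (%) copied from the tot_train_accuracies / tot_test_accuracies printout above
train_acc = [99.91996798719488, 99.9499799919968, 99.96998799519808, 99.92997198879551]
test_acc = [61.8, 59.3, 58.9, 61.8]


def mean_and_std(values):
    # Mean and sample standard deviation (n - 1 denominator) across model instances
    return statistics.mean(values), statistics.stdev(values)


test_mean, test_std = mean_and_std(test_acc)

# Generalization gap per model, in percentage points
gaps = [tr - te for tr, te in zip(train_acc, test_acc)]
gap_mean, gap_std = mean_and_std(gaps)

print(f"Test accuracy: {test_mean:.2f}% +/- {test_std:.2f} over {len(test_acc)} models")
print(f"Train - test gap: {gap_mean:.2f} +/- {gap_std:.2f} percentage points")
```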

# OFFICIAL RUNS FOR 4 MODELS

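Using a 50/50 split of 0.1 (training sampled from the first half, testing from the second half), the newest architecture, a dropout rate of 0.5, and a learning-rate scheduler step size of 10 epochs.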
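With a step size of 10 and a factor of 0.5, the learning rate is halved every 10 epochs. In PyTorch, one standard way to get this behavior is `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)` with `scheduler.step()` called once at the end of each epoch. We assume the notebook does something equivalent. The pure-Python sketch below reproduces the resulting schedule so it can be inspected without a model. `INITIAL_LR = 1e-3` is a hypothetical placeholder because the notebook's actual value is not shown in this section. Over 50 epochs the rate is reduced four times, at the start of epochs 11, 21, 31 and 41. The larger jumps in training accuracy at epochs 11 and 21 in the earlier logs are consistent with the first two reductions.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.5):
    # Learning rate in effect during `epoch` (1-indexed), assuming the scheduler
    # is stepped once at the end of every epoch.
    return initial_lr * gamma ** ((epoch - 1) // step_size)


INITIAL_LR = 1e-3   # hypothetical placeholder; the notebook's value is not shown here
NUM_EPOCHS = 50     # from the notebook description

# Epochs whose learning rate differs from the previous epoch's
drops = [e for e in range(2, NUM_EPOCHS + 1)
         if step_lr(INITIAL_LR, e) != step_lr(INITIAL_LR, e - 1)]

print("LR drops at epochs:", drops)
print(f"LR in final epoch: {step_lr(INITIAL_LR, NUM_EPOCHS):.2e}")
```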
Using 9900 training images (33.0% of dataset)
Using 990 test images (3.3% of dataset)
[99.62626262626263, 99.51515151515152, 99.61616161616162, 99.74747474747475]
[[99.89858012170386, 99.9107939339875, 99.69040247678019, 99.51028403525955, 99.46004319654428, 99.41176470588235, 99.60474308300395, 99.90300678952474, 99.39577039274924, 99.39455095862765], [99.69356486210418, 99.72652689152234, 99.6023856858847, 99.71098265895954, 99.07502569373072, 99.66517857142857, 100.0, 99.6003996003996, 98.82854100106496, 99.19354838709677], [99.46294307196563, 99.91007194244604, 99.60591133004927, 99.8995983935743, 99.69167523124358, 99.32885906040268, 99.79338842975207, 99.7104247104247, 99.29506545820745, 99.38900203665987], [99.59839357429719, 100.0, 99.89583333333333, 99.8015873015873, 99.68085106382979, 99.57446808510639, 99.70588235294117, 100.0, 99.57671957671958, 99.59390862944163]]
[69.8989898989899, 68.68686868686869, 71.61616161616162, 72.62626262626263]
[[80.68181818181819, 83.6734693877551, 53.93258426966292, 70.43478260869566, 67.88990825688073, 67.05882352941177, 83.17757009345794, 70.10309278350516, 56.73076923076923, 64.28571428571429], [71.95121951219512, 71.875, 71.42857142857143, 67.5925925925926, 73.33333333333333, 62.365591397849464, 78.57142857142857, 72.64150943396227, 46.236559139784944, 68.90756302521008], [71.875, 93.57798165137615, 62.857142857142854, 67.88990825688073, 80.0, 62.244897959183675, 77.41935483870968, 71.0, 57.77777777777778, 69.47368421052632], [79.34782608695652, 79.33884297520662, 64.51612903225806, 70.96774193548387, 71.27659574468085, 69.62025316455696, 83.49514563106796, 75.96153846153847, 60.0, 68.96551724137932]]

*********************************
   ACCURACIES FOR MODEL 1   
Overall Train	| Overall Test
99.63%		| 69.90%
----------------------------
Train Classwise	| Test Classwise
0: 99.90%	| 0: 80.68%
1: 99.91%	| 1: 83.67%
2: 99.69%	| 2: 53.93%
3: 99.51%	| 3: 70.43%
4: 99.46%	| 4: 67.89%
5: 99.41%	| 5: 67.06%
6: 99.60%	| 6: 83.18%
7: 99.90%	| 7: 70.10%
8: 99.40%	| 8: 56.73%
9: 99.39%	| 9: 64.29%
*********************************
*********************************
   ACCURACIES FOR MODEL 2   
Overall Train	| Overall Test
99.52%		| 68.69%
----------------------------
Train Classwise	| Test Classwise
0: 99.69%	| 0: 71.95%
1: 99.73%	| 1: 71.88%
2: 99.60%	| 2: 71.43%
3: 99.71%	| 3: 67.59%
4: 99.08%	| 4: 73.33%
5: 99.67%	| 5: 62.37%
6: 100.00%	| 6: 78.57%
7: 99.60%	| 7: 72.64%
8: 98.83%	| 8: 46.24%
9: 99.19%	| 9: 68.91%
*********************************
*********************************
   ACCURACIES FOR MODEL 3   
Overall Train	| Overall Test
99.62%		| 71.62%
----------------------------
Train Classwise	| Test Classwise
0: 99.46%	| 0: 71.88%
1: 99.91%	| 1: 93.58%
2: 99.61%	| 2: 62.86%
3: 99.90%	| 3: 67.89%
4: 99.69%	| 4: 80.00%
5: 99.33%	| 5: 62.24%
6: 99.79%	| 6: 77.42%
7: 99.71%	| 7: 71.00%
8: 99.30%	| 8: 57.78%
9: 99.39%	| 9: 69.47%
*********************************
*********************************
   ACCURACIES FOR MODEL 4   
Overall Train	| Overall Test
99.75%		| 72.63%
----------------------------
Train Classwise	| Test Classwise
0: 99.60%	| 0: 79.35%
1: 100.00%	| 1: 79.34%
2: 99.90%	| 2: 64.52%
3: 99.80%	| 3: 70.97%
4: 99.68%	| 4: 71.28%
5: 99.57%	| 5: 69.62%
6: 99.71%	| 6: 83.50%
7: 100.00%	| 7: 75.96%
8: 99.58%	| 8: 60.00%
9: 99.59%	| 9: 68.97%
*********************************
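Each split uses four instantiations so that results do not depend on one lucky or unlucky sample. For that reason, it can help to average each class across the four models to see which digits are consistently weak. Below is a minimal sketch using the split 1 test values copied (rounded) from the tables above. `classwise_summary` is a hypothetical helper, and this summary is not the bias metric the notebook evaluates.

```python
from statistics import mean, stdev

# Split 1 test classwise accuracies (%), copied (rounded) from the tables above.
# Rows are models 1-4; columns are digit classes 0-9.
split1_test_cls = [
    [80.68, 83.67, 53.93, 70.43, 67.89, 67.06, 83.18, 70.10, 56.73, 64.29],
    [71.95, 71.88, 71.43, 67.59, 73.33, 62.37, 78.57, 72.64, 46.24, 68.91],
    [71.88, 93.58, 62.86, 67.89, 80.00, 62.24, 77.42, 71.00, 57.78, 69.47],
    [79.35, 79.34, 64.52, 70.97, 71.28, 69.62, 83.50, 75.96, 60.00, 68.97],
]


def classwise_summary(rows):
    """Return (mean, sample stdev) for each class across the models."""
    per_class = list(zip(*rows))  # transpose: one tuple of model scores per class
    return [(mean(scores), stdev(scores)) for scores in per_class]


summary = classwise_summary(split1_test_cls)
for digit, (m, s) in enumerate(summary):
    print(f"{digit}: {m:.2f}% +/- {s:.2f}")

means = [m for m, _ in summary]
weakest = min(range(len(means)), key=means.__getitem__)
strongest = max(range(len(means)), key=means.__getitem__)
print("weakest class:", weakest, "| strongest class:", strongest)
```

For split 1, this picks out digit 8 as the weakest class on average (about 55.19%) and digit 1 as the strongest (about 82.12%).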

Split 2 (train from 1st half of 0.9, test from 2nd half of 0.9): newest architecture, dropout rate = 0.5, learning rate scheduler step size = 10 epochs
Using 9900 training images (33.0% of dataset)
Using 990 test images (3.3% of dataset)
[100.0, 100.0, 100.0, 99.98989898989899]
[[100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 99.89949748743719, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]]
[97.77777777777777, 97.97979797979798, 97.27272727272727, 97.27272727272727]
[[94.56521739130434, 94.69026548672566, 97.97979797979798, 98.14814814814815, 100.0, 95.45454545454545, 100.0, 100.0, 97.89473684210526, 99.00990099009901], [97.24770642201835, 97.36842105263158, 96.84210526315789, 94.56521739130434, 97.9381443298969, 97.9381443298969, 97.67441860465117, 100.0, 100.0, 100.0], [96.42857142857143, 96.36363636363636, 96.875, 97.9381443298969, 99.02912621359224, 96.66666666666667, 96.15384615384616, 97.97979797979798, 97.67441860465117, 97.84946236559139], [92.78350515463917, 97.47899159663865, 98.0, 95.0, 97.84946236559139, 97.19626168224299, 97.16981132075472, 96.62921348314607, 100.0, 100.0]]
*********************************
   ACCURACIES FOR MODEL 1   
Overall Train	| Overall Test
100.00%		| 97.78%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 94.57%
1: 100.00%	| 1: 94.69%
2: 100.00%	| 2: 97.98%
3: 100.00%	| 3: 98.15%
4: 100.00%	| 4: 100.00%
5: 100.00%	| 5: 95.45%
6: 100.00%	| 6: 100.00%
7: 100.00%	| 7: 100.00%
8: 100.00%	| 8: 97.89%
9: 100.00%	| 9: 99.01%
*********************************
*********************************
   ACCURACIES FOR MODEL 2   
Overall Train	| Overall Test
100.00%		| 97.98%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 97.25%
1: 100.00%	| 1: 97.37%
2: 100.00%	| 2: 96.84%
3: 100.00%	| 3: 94.57%
4: 100.00%	| 4: 97.94%
5: 100.00%	| 5: 97.94%
6: 100.00%	| 6: 97.67%
7: 100.00%	| 7: 100.00%
8: 100.00%	| 8: 100.00%
9: 100.00%	| 9: 100.00%
*********************************
*********************************
   ACCURACIES FOR MODEL 3   
Overall Train	| Overall Test
100.00%		| 97.27%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 96.43%
1: 100.00%	| 1: 96.36%
2: 100.00%	| 2: 96.88%
3: 100.00%	| 3: 97.94%
4: 100.00%	| 4: 99.03%
5: 100.00%	| 5: 96.67%
6: 100.00%	| 6: 96.15%
7: 100.00%	| 7: 97.98%
8: 100.00%	| 8: 97.67%
9: 100.00%	| 9: 97.85%
*********************************
*********************************
   ACCURACIES FOR MODEL 4   
Overall Train	| Overall Test
99.99%		| 97.27%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 92.78%
1: 100.00%	| 1: 97.48%
2: 100.00%	| 2: 98.00%
3: 99.90%	| 3: 95.00%
4: 100.00%	| 4: 97.85%
5: 100.00%	| 5: 97.20%
6: 100.00%	| 6: 97.17%
7: 100.00%	| 7: 96.63%
8: 100.00%	| 8: 100.00%
9: 100.00%	| 9: 100.00%
*********************************

Split 3 (train from all of 0.1, test from full/test): newest architecture, dropout rate = 0.5, learning rate scheduler step size = 10 epochs
Using 9960 training images (16.6% of dataset)
Using 1000 test images (10.0% of dataset)
[99.66867469879519, 99.41767068273093, 99.66867469879519, 99.60843373493977]
[[99.59432048681542, 99.91063449508489, 100.0, 99.39148073022312, 99.36908517350157, 99.22737306843267, 99.89722507708119, 99.80824544582934, 99.79959919839679, 99.59308240081384], [99.09456740442656, 99.53789279112755, 99.49647532729104, 99.5136186770428, 99.38461538461539, 99.00881057268722, 99.89868287740629, 99.60591133004927, 98.96157840083073, 99.60591133004927], [99.70414201183432, 100.0, 99.9001996007984, 99.39271255060729, 99.5987963891675, 99.35553168635876, 100.0, 99.80119284294234, 99.2964824120603, 99.58246346555323], [99.79166666666667, 99.91197183098592, 99.70588235294117, 99.52651515151516, 99.56756756756756, 99.42196531791907, 99.79879275653923, 99.80601357904946, 99.19273461150352, 99.28716904276986]]
[75.0, 72.6, 75.5, 75.2]
[[72.52747252747253, 90.08264462809917, 77.45098039215686, 65.26315789473684, 72.63157894736842, 69.04761904761905, 81.11111111111111, 77.11864406779661, 69.0721649484536, 71.02803738317758], [72.34042553191489, 83.62068965517241, 72.22222222222223, 78.84615384615384, 71.56862745098039, 59.57446808510638, 82.55813953488372, 73.7864077669903, 62.37623762376238, 67.3913043478261], [71.76470588235294, 88.49557522123894, 68.26923076923077, 81.13207547169812, 82.4074074074074, 77.21518987341773, 81.31868131868131, 73.07692307692308, 64.81481481481481, 65.68627450980392], [83.50515463917526, 86.06557377049181, 79.0, 75.22123893805309, 76.59574468085107, 69.0909090909091, 81.52173913043478, 77.10843373493977, 56.12244897959184, 65.93406593406593]]
*********************************
   ACCURACIES FOR MODEL 1   
Overall Train	| Overall Test
99.67%		| 75.00%
----------------------------
Train Classwise	| Test Classwise
0: 99.59%	| 0: 72.53%
1: 99.91%	| 1: 90.08%
2: 100.00%	| 2: 77.45%
3: 99.39%	| 3: 65.26%
4: 99.37%	| 4: 72.63%
5: 99.23%	| 5: 69.05%
6: 99.90%	| 6: 81.11%
7: 99.81%	| 7: 77.12%
8: 99.80%	| 8: 69.07%
9: 99.59%	| 9: 71.03%
*********************************
*********************************
   ACCURACIES FOR MODEL 2   
Overall Train	| Overall Test
99.42%		| 72.60%
----------------------------
Train Classwise	| Test Classwise
0: 99.09%	| 0: 72.34%
1: 99.54%	| 1: 83.62%
2: 99.50%	| 2: 72.22%
3: 99.51%	| 3: 78.85%
4: 99.38%	| 4: 71.57%
5: 99.01%	| 5: 59.57%
6: 99.90%	| 6: 82.56%
7: 99.61%	| 7: 73.79%
8: 98.96%	| 8: 62.38%
9: 99.61%	| 9: 67.39%
*********************************
*********************************
   ACCURACIES FOR MODEL 3   
Overall Train	| Overall Test
99.67%		| 75.50%
----------------------------
Train Classwise	| Test Classwise
0: 99.70%	| 0: 71.76%
1: 100.00%	| 1: 88.50%
2: 99.90%	| 2: 68.27%
3: 99.39%	| 3: 81.13%
4: 99.60%	| 4: 82.41%
5: 99.36%	| 5: 77.22%
6: 100.00%	| 6: 81.32%
7: 99.80%	| 7: 73.08%
8: 99.30%	| 8: 64.81%
9: 99.58%	| 9: 65.69%
*********************************
*********************************
   ACCURACIES FOR MODEL 4   
Overall Train	| Overall Test
99.61%		| 75.20%
----------------------------
Train Classwise	| Test Classwise
0: 99.79%	| 0: 83.51%
1: 99.91%	| 1: 86.07%
2: 99.71%	| 2: 79.00%
3: 99.53%	| 3: 75.22%
4: 99.57%	| 4: 76.60%
5: 99.42%	| 5: 69.09%
6: 99.80%	| 6: 81.52%
7: 99.81%	| 7: 77.11%
8: 99.19%	| 8: 56.12%
9: 99.29%	| 9: 65.93%
*********************************

Split 4 (train from all of 0.9, test from full/test): newest architecture, dropout rate = 0.5, learning rate scheduler step size = 10 epochs
Using 9960 training images (16.6% of dataset)
Using 1000 test images (10.0% of dataset)
[100.0, 100.0, 100.0, 100.0]
[[100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]]
[16.5, 15.2, 15.7, 16.4]
[[19.54022988505747, 15.0, 12.037037037037036, 16.037735849056602, 19.51219512195122, 14.606741573033707, 20.952380952380953, 18.103448275862068, 15.74074074074074, 13.924050632911392], [13.25301204819277, 14.40677966101695, 6.896551724137931, 13.043478260869565, 16.50485436893204, 9.090909090909092, 24.175824175824175, 26.605504587155963, 17.346938775510203, 10.784313725490197], [11.578947368421053, 13.675213675213675, 15.88785046728972, 11.627906976744185, 11.403508771929825, 18.39080459770115, 26.881720430107528, 18.181818181818183, 19.753086419753085, 11.818181818181818], [24.46808510638298, 10.56910569105691, 12.121212121212121, 16.853932584269664, 15.238095238095237, 12.631578947368421, 14.953271028037383, 24.50980392156863, 16.49484536082474, 17.97752808988764]]
*********************************
   ACCURACIES FOR MODEL 1   
Overall Train	| Overall Test
100.00%		| 16.50%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 19.54%
1: 100.00%	| 1: 15.00%
2: 100.00%	| 2: 12.04%
3: 100.00%	| 3: 16.04%
4: 100.00%	| 4: 19.51%
5: 100.00%	| 5: 14.61%
6: 100.00%	| 6: 20.95%
7: 100.00%	| 7: 18.10%
8: 100.00%	| 8: 15.74%
9: 100.00%	| 9: 13.92%
*********************************
*********************************
   ACCURACIES FOR MODEL 2   
Overall Train	| Overall Test
100.00%		| 15.20%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 13.25%
1: 100.00%	| 1: 14.41%
2: 100.00%	| 2: 6.90%
3: 100.00%	| 3: 13.04%
4: 100.00%	| 4: 16.50%
5: 100.00%	| 5: 9.09%
6: 100.00%	| 6: 24.18%
7: 100.00%	| 7: 26.61%
8: 100.00%	| 8: 17.35%
9: 100.00%	| 9: 10.78%
*********************************
*********************************
   ACCURACIES FOR MODEL 3   
Overall Train	| Overall Test
100.00%		| 15.70%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 11.58%
1: 100.00%	| 1: 13.68%
2: 100.00%	| 2: 15.89%
3: 100.00%	| 3: 11.63%
4: 100.00%	| 4: 11.40%
5: 100.00%	| 5: 18.39%
6: 100.00%	| 6: 26.88%
7: 100.00%	| 7: 18.18%
8: 100.00%	| 8: 19.75%
9: 100.00%	| 9: 11.82%
*********************************
*********************************
   ACCURACIES FOR MODEL 4   
Overall Train	| Overall Test
100.00%		| 16.40%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 24.47%
1: 100.00%	| 1: 10.57%
2: 100.00%	| 2: 12.12%
3: 100.00%	| 3: 16.85%
4: 100.00%	| 4: 15.24%
5: 100.00%	| 5: 12.63%
6: 100.00%	| 6: 14.95%
7: 100.00%	| 7: 24.51%
8: 100.00%	| 8: 16.49%
9: 100.00%	| 9: 17.98%
*********************************
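Splits 3 and 4 share the same full/test set, so one quick way to compare how well each training set transfers to it is the gap between mean train accuracy and mean test accuracy. Below is a minimal sketch using the overall lists copied from the two runs above. It is for illustration only: `mean_gap` is a hypothetical helper, and this gap is not the bias metric the notebook evaluates.

```python
from statistics import mean

# Overall (train, test) accuracies (%) for the four models, copied from the raw lists above.
# "split 3": train on all of 0.1; "split 4": train on all of 0.9; both test on full/test.
runs = {
    "split 3": (
        [99.66867469879519, 99.41767068273093, 99.66867469879519, 99.60843373493977],
        [75.0, 72.6, 75.5, 75.2],
    ),
    "split 4": (
        [100.0, 100.0, 100.0, 100.0],
        [16.5, 15.2, 15.7, 16.4],
    ),
}


def mean_gap(train_accs, test_accs):
    """Mean train accuracy minus mean test accuracy, in percentage points."""
    return mean(train_accs) - mean(test_accs)


gaps = {name: mean_gap(tr, te) for name, (tr, te) in runs.items()}
for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap:.2f} points")
```

Both runs fit their training data almost perfectly. The gap is therefore driven almost entirely by test accuracy: about 25 points for split 3 and about 84 points for split 4.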

Split 5, potential mitigation (50% of training from 0.1, 50% of training from 0.9; test from full/test)
Using 9996 training images (8.3% of dataset)
Using 1000 test images (10.0% of dataset)
[99.91996798719488, 99.9499799919968, 99.96998799519808, 99.92997198879551]
[[100.0, 100.0, 100.0, 99.7005988023952, 99.90118577075098, 99.89669421487604, 99.89711934156378, 100.0, 99.78354978354979, 100.0], [100.0, 100.0, 99.90049751243781, 99.90319457889642, 100.0, 100.0, 99.89722507708119, 100.0, 99.79188345473464, 100.0], [100.0, 100.0, 99.89888776541962, 100.0, 100.0, 100.0, 100.0, 100.0, 99.90176817288801, 99.8984771573604], [100.0, 100.0, 99.90375360923966, 99.90503323836657, 99.79550102249489, 100.0, 100.0, 100.0, 99.68152866242038, 100.0]]
[61.8, 59.3, 58.9, 61.8]
[[65.17857142857143, 75.60975609756098, 51.401869158878505, 63.63636363636363, 65.55555555555556, 56.79012345679013, 73.46938775510205, 56.86274509803921, 53.84615384615385, 51.54639175257732], [55.10204081632653, 76.85950413223141, 51.37614678899082, 69.52380952380952, 57.4468085106383, 45.333333333333336, 71.91011235955057, 55.932203389830505, 46.808510638297875, 56.70103092783505], [59.61538461538461, 78.76106194690266, 58.333333333333336, 56.32183908045977, 56.52173913043478, 52.04081632653061, 67.46987951807229, 58.11965811965812, 52.04081632653061, 48.0], [65.59139784946237, 66.93548387096774, 62.83185840707964, 69.72477064220183, 67.74193548387096, 56.043956043956044, 61.61616161616162, 61.05263157894737, 53.65853658536585, 49.504950495049506]]
*********************************
   ACCURACIES FOR MODEL 1   
Overall Train	| Overall Test
99.92%		| 61.80%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 65.18%
1: 100.00%	| 1: 75.61%
2: 100.00%	| 2: 51.40%
3: 99.70%	| 3: 63.64%
4: 99.90%	| 4: 65.56%
5: 99.90%	| 5: 56.79%
6: 99.90%	| 6: 73.47%
7: 100.00%	| 7: 56.86%
8: 99.78%	| 8: 53.85%
9: 100.00%	| 9: 51.55%
*********************************
*********************************
   ACCURACIES FOR MODEL 2   
Overall Train	| Overall Test
99.95%		| 59.30%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 55.10%
1: 100.00%	| 1: 76.86%
2: 99.90%	| 2: 51.38%
3: 99.90%	| 3: 69.52%
4: 100.00%	| 4: 57.45%
5: 100.00%	| 5: 45.33%
6: 99.90%	| 6: 71.91%
7: 100.00%	| 7: 55.93%
8: 99.79%	| 8: 46.81%
9: 100.00%	| 9: 56.70%
*********************************
*********************************
   ACCURACIES FOR MODEL 3   
Overall Train	| Overall Test
99.97%		| 58.90%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 59.62%
1: 100.00%	| 1: 78.76%
2: 99.90%	| 2: 58.33%
3: 100.00%	| 3: 56.32%
4: 100.00%	| 4: 56.52%
5: 100.00%	| 5: 52.04%
6: 100.00%	| 6: 67.47%
7: 100.00%	| 7: 58.12%
8: 99.90%	| 8: 52.04%
9: 99.90%	| 9: 48.00%
*********************************
*********************************
   ACCURACIES FOR MODEL 4   
Overall Train	| Overall Test
99.93%		| 61.80%
----------------------------
Train Classwise	| Test Classwise
0: 100.00%	| 0: 65.59%
1: 100.00%	| 1: 66.94%
2: 99.90%	| 2: 62.83%
3: 99.91%	| 3: 69.72%
4: 99.80%	| 4: 67.74%
5: 100.00%	| 5: 56.04%
6: 100.00%	| 6: 61.62%
7: 100.00%	| 7: 61.05%
8: 99.68%	| 8: 53.66%
9: 100.00%	| 9: 49.50%
*********************************
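The notebook's stated goal is to estimate how this CNN performs on average for each type of split. Below is a minimal sketch that reduces the four overall test accuracies per split to a mean and a sample standard deviation. The values are copied from the raw lists in the five runs above. The dictionary layout and labels are an illustrative choice, not the notebook's own code.

```python
from statistics import mean, stdev

# Short descriptions of the five splits, as listed at the top of the notebook.
split_names = {
    1: "1st/2nd half of 0.1",
    2: "1st/2nd half of 0.9",
    3: "all of 0.1 -> full/test",
    4: "all of 0.9 -> full/test",
    5: "50/50 mix of 0.1 and 0.9 -> full/test",
}

# Overall test accuracies (%) of the four models per split, copied from the raw lists above.
overall_test = {
    1: [69.8989898989899, 68.68686868686869, 71.61616161616162, 72.62626262626263],
    2: [97.77777777777777, 97.97979797979798, 97.27272727272727, 97.27272727272727],
    3: [75.0, 72.6, 75.5, 75.2],
    4: [16.5, 15.2, 15.7, 16.4],
    5: [61.8, 59.3, 58.9, 61.8],
}

for split, accs in overall_test.items():
    print(f"split {split} ({split_names[split]}): "
          f"{mean(accs):.2f}% +/- {stdev(accs):.2f} (n={len(accs)})")

best = max(overall_test, key=lambda s: mean(overall_test[s]))
worst = min(overall_test, key=lambda s: mean(overall_test[s]))
print("highest mean test accuracy: split", best, "| lowest: split", worst)
```

With only four models per split, the standard deviation is a rough indicator of sampling variability, not a confidence interval.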