## Importing Libraries


torch – the main PyTorch library for building and training neural networks.

numpy – used for numerical operations and array manipulations.

torchvision.datasets and torchvision.transforms – for loading and preprocessing image datasets.

TensorDataset and DataLoader from torch.utils.data – to wrap feature and label tensors (multi-dimensional arrays) into a dataset and load it efficiently in batches during training.

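train_test_split from sklearn.model_selection – to split our dataset into training, validation, and test subsets.

confusion_matrix from sklearn.metrics – to tabulate predicted versus true labels when evaluating a classifier.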

In [1]:
import torch
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

## Setting Hyperparameters and Configuration Values

#### RANDOM_SEED = 42
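Fixes the random seed for reproducibility. In this notebook it is passed as random_state to train_test_split, so the data splits are identical on every run. Other sources of randomness, such as weight initialization and DataLoader shuffling, are reproducible only if PyTorch's own generator is seeded as well (for example with torch.manual_seed(RANDOM_SEED)).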
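
As a minimal sketch of seeding every random number generator at once, the helper below seeds Python, NumPy, and PyTorch together. The name seed_everything is hypothetical and is not defined anywhere in this notebook; it only illustrates the idea.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds PyTorch's default generator


RANDOM_SEED = 42

seed_everything(RANDOM_SEED)
first_draw = torch.rand(3)

seed_everything(RANDOM_SEED)
second_draw = torch.rand(3)

# Re-seeding reproduces the same random numbers
print(torch.equal(first_draw, second_draw))
```

Calling such a helper once, before the model is created, would make weight initialization repeatable in addition to the data splits.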

#### BATCH_SIZE = 64
The number of samples processed before the model updates its parameters.
64 is a common starting point: large enough for efficient, reasonably stable gradient estimates, small enough to keep memory use low.
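
To make the batch size concrete, this short calculation shows how many parameter updates one epoch involves. It assumes the 42,000-sample training set reported in the notebook's output and a DataLoader that keeps the final partial batch (the default).

```python
import math

BATCH_SIZE = 64
n_train = 42_000  # training-set size reported by the notebook's output

# Number of batches (parameter updates) per epoch
batches_per_epoch = math.ceil(n_train / BATCH_SIZE)

# The final batch holds whatever is left over
last_batch_size = n_train - (batches_per_epoch - 1) * BATCH_SIZE

print(batches_per_epoch)  # 657
print(last_batch_size)    # 16
```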

#### LR = 0.01
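The learning rate: the step size the optimizer uses when updating the parameters. Larger values speed up learning but can overshoot or diverge; smaller values are more stable but converge more slowly.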
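
The effect of the learning rate can be illustrated with plain gradient descent on a toy loss. This is only a sketch: the loss f(w) = w² and the starting point are made up for illustration and have nothing to do with the model trained in this notebook.

```python
LR = 0.01


def grad(w: float) -> float:
    """Gradient of the toy loss f(w) = w**2."""
    return 2.0 * w


w = 1.0
for step in range(3):
    w = w - LR * grad(w)  # gradient-descent update: step against the gradient
    print(f"step {step + 1}: w = {w:.6f}")
```

With LR = 0.01 each update shrinks w by a factor of 0.98, so progress toward the minimum at w = 0 is steady but slow.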

#### INPUT_SIZE = 784  # 28 × 28
Each MNIST image (28×28 pixels) is flattened into a 784-dimensional vector, which serves as the input size for our model.

#### TEST_VALID_SIZE = 0.4
Specifies that 40% of the total data will be temporarily held out, to later be split evenly into validation (20%) and test (20%) sets.
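
The resulting subset sizes follow directly from these fractions. A quick arithmetic check, assuming the combined 70,000 MNIST images:

```python
TOTAL = 70_000  # 60,000 MNIST training images + 10,000 test images
TEST_VALID_SIZE = 0.4

n_temp = round(TOTAL * TEST_VALID_SIZE)  # held-out portion (40%)
n_train = TOTAL - n_temp                 # remaining 60%
n_val = n_temp // 2                      # half of the held-out portion
n_test = n_temp - n_val                  # the other half

print(n_train, n_val, n_test)  # 42000 14000 14000
```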

In [4]:
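RANDOM_SEED = 42        # seed for reproducible data splits
BATCH_SIZE = 64         # samples per parameter update
LR = 0.01               # learning rate (optimizer step size)
INPUT_SIZE = 784        # 28 x 28 pixels, flattened
TEST_VALID_SIZE = 0.4   # fraction held out for validation + test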

## Data Preparation (A1)

#### 1. Define Transformations and Load Full Data
    ToTensor() – converts each image to a float PyTorch tensor of shape (1, 28, 28) and scales pixel values from [0, 255] to the range [0, 1].

    Lambda(lambda x: x.flatten()) – reshapes each 28×28 grayscale image into a single vector of 784 values.

The MNIST training (60k) and test (10k) sets are both loaded, giving a total of 70,000 images.
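
The effect of the two transformations can be sketched on a single synthetic image. The random uint8 tensor below is a stand-in for a raw MNIST image and is an assumption made for illustration.

```python
import torch

# Synthetic stand-in for one raw MNIST image: uint8 values in [0, 255]
raw_image = torch.randint(0, 256, (28, 28), dtype=torch.uint8)

scaled = raw_image.float().div(255)  # same scaling that ToTensor() applies to uint8 images
flat = scaled.flatten()              # 28 x 28 -> 784 values

print(flat.shape)
print(flat.min().item() >= 0.0, flat.max().item() <= 1.0)
```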

#### 2. Combine and Split the Dataset

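The training and test images are concatenated into one array, scaled to [0, 1], and flattened. This is done directly on the raw .data tensors, because the transform pipeline is only applied when items are accessed through the dataset object. A two-step stratified split is then performed using train_test_split: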

    Step 1: 60% Training, 40% Temporary

    Step 2: Temporary (40%) → 20% Validation + 20% Test

Stratifying on the labels (stratify=y_combined in step 1, stratify=y_temp_np in step 2) ensures all subsets preserve the same class distribution across digits 0–9.
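
The two-step stratified split can be tried out on a small synthetic label array. The data below is a hypothetical stand-in (10 balanced classes of 100 samples each), chosen so that the per-class counts are easy to check by eye.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 classes with 100 samples each (stand-in for the MNIST labels)
y = np.repeat(np.arange(10), 100)
X = np.arange(len(y)).reshape(-1, 1)

# Step 1: 60% train, 40% temporary
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

# Step 2: split the temporary set evenly into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Per-class sample counts in each subset
print(np.bincount(y_train))
print(np.bincount(y_val))
print(np.bincount(y_test))
```

Because the synthetic classes are perfectly balanced, every class contributes the same number of samples to each subset; with real MNIST labels the counts differ slightly per digit, but the proportions are preserved.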

After splitting, all NumPy arrays are converted back to PyTorch tensors:

    X_train = torch.tensor(X_train_np).float()
    y_train = torch.tensor(y_train_np)
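
A small sketch of what this conversion does to the data types, using made-up arrays. The explicit .long() cast on the labels is an assumption added here for illustration (integer class labels are what losses such as nn.CrossEntropyLoss expect); the notebook itself converts the labels without a cast.

```python
import numpy as np
import torch

X_np = np.random.rand(4, 784)   # NumPy defaults to float64
y_np = np.array([3, 1, 4, 1])   # integer class labels

X = torch.tensor(X_np).float()  # copies the data, then casts to float32
y = torch.tensor(y_np).long()   # explicit int64 labels (assumption, see above)

print(X.dtype, y.dtype)
print(X.shape, y.shape)
```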

#### 3. Create DataLoaders (10-Class)

The train, validation, and test tensors are wrapped into TensorDatasets and passed into DataLoaders for efficient batching and shuffling:

    DATA_LOADERS_FULL = {
        'train': DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True),
        'val': DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False),
        'test': DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
    }

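shuffle=True → reshuffles the training samples every epoch, so batches differ between epochs, which generally helps generalization

shuffle=False → keeps the validation and test order fixed, so evaluation is deterministic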
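
To see what a DataLoader yields, the sketch below wraps synthetic tensors (a stand-in for the real training data, assumed here for illustration) and pulls a single batch.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

BATCH_SIZE = 64

# Synthetic stand-in data: 200 flattened "images" with labels 0-9
X = torch.rand(200, 784)
y = torch.randint(0, 10, (200,))

loader = DataLoader(TensorDataset(X, y), batch_size=BATCH_SIZE, shuffle=True)

# Each iteration yields one (features, labels) batch
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)

# Number of batches per epoch, including the final partial batch
print(len(loader))
```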

In [None]:
# 1. Define Transformation & Load Full Data
linear_model_transforms = transforms.Compose([
    transforms.ToTensor(), 
    transforms.Lambda(lambda x: x.flatten())
])

train_set_full = datasets.MNIST('./data', train=True, download=True, transform=linear_model_transforms)
test_set_full = datasets.MNIST('./data', train=False, download=True, transform=linear_model_transforms)

# Combine datasets into single arrays for the stratified split
# NOTE: .data holds the raw uint8 images and bypasses the transform above,
# so scaling to [0, 1] and flattening are applied manually here
X_combined = torch.cat([train_set_full.data.float().div(255).flatten(start_dim=1),
                        test_set_full.data.float().div(255).flatten(start_dim=1)], dim=0).numpy()
y_combined = torch.cat([train_set_full.targets, test_set_full.targets], dim=0).numpy()

# Step 1: Split into Training (60%) and Temporary (40%)
X_train_np, X_temp_np, y_train_np, y_temp_np = train_test_split(
    X_combined, y_combined, 
    test_size=TEST_VALID_SIZE, 
    random_state=RANDOM_SEED, 
    stratify=y_combined 
)

# Step 2: Split Temporary (40%) into Validation (20%) and Test (20%)
X_val_np, X_test_np, y_val_np, y_test_np = train_test_split(
    X_temp_np, y_temp_np, 
    test_size=0.5, 
    random_state=RANDOM_SEED, 
    stratify=y_temp_np 
)

# Convert arrays back to PyTorch Tensors (for A2 filtering)
X_train = torch.tensor(X_train_np).float()
X_val = torch.tensor(X_val_np).float()
X_test = torch.tensor(X_test_np).float()

y_train = torch.tensor(y_train_np)
y_val = torch.tensor(y_val_np)
y_test = torch.tensor(y_test_np)

# 3. Create Full DataLoaders (10-Class)
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
test_dataset = TensorDataset(X_test, y_test)

DATA_LOADERS_FULL = {
    'train': DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True),
    'val': DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False),
    'test': DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
}

print(f"Part A1 Complete. Train Size: {len(X_train)}, Validation Size: {len(X_val)}, Test Size: {len(X_test)}")

Part A1 Complete. Train Size: 42000, Validation Size: 14000, Test Size: 14000
