# Homework 1: Neural Networks from Scratch (Problems 4, 5, 6)

**CS-GY 6953 Deep Learning | Spring 2026**

--- 

## Overview

This notebook contains the starter code for Problems 4, 5, and 6 of Homework 1. 

**Objectives:**
1.  **Problem 4 (15 pts):** Train an MLP on the BloodMNIST medical dataset and perform detailed error analysis.
2.  **Problem 5 (15 pts):** Investigate the impact of weight initialization on training dynamics and activation statistics.
3.  **Problem 6 (25 pts):** Build a small modular neural network library (`mytorch`) from scratch using NumPy and train it on MNIST.

## Submission Instructions

1.  **Complete the code**: Fill in all `TODO` blocks in this notebook and in the `mytorch/` Python files.
2.  **Run all cells**: Ensure all outputs, plots, and analyses are visible.
3.  **Export/Zip**: Zip this notebook and the `mytorch/` folder together.

## Academic Integrity

You may discuss concepts with classmates, but all code and written analysis must be your own. Your experimental results (accuracies, confusion matrices, plots) should reflect your actual training runs. Do not fabricate or copy results.


--- 
## Setup

First, we install the necessary packages. We will use `medmnist` for Problem 4.

In [6]:
import sys

# Install into the same interpreter that runs this kernel (includes medmnist)
!{sys.executable} -m pip install numpy matplotlib torch torchvision scikit-learn medmnist


In [11]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, TensorDataset
import medmnist
from medmnist import INFO, Evaluator
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay
import math

# Set random seed for reproducibility
def set_seed(seed=42):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cpu


## Data Setup

**BloodMNIST**: This dataset will be automatically downloaded by the `medmnist` library when you run the code in Problem 4.

**MNIST**: This dataset (used in Problems 5 and 6) will be downloaded to the `./data` directory by `torchvision` in the cell below. Please ensure you have an internet connection.

In [None]:
# Download MNIST data (used in Problems 5 and 6)
import os

# Create data directory
os.makedirs('./data', exist_ok=True)

print("Downloading MNIST...")
mnist_train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())
print("MNIST Ready.")

--- 
# Problem 4: Blood Cell Classification with MLPs (15 points)

In this problem, you will train a multi-layer perceptron (MLP) to classify microscopy images of blood cells using the **BloodMNIST** dataset.

### Q4.1: Data Loading and Exploration (2 points)

In [None]:
# TODO: Load BloodMNIST dataset
current_data_flag = 'bloodmnist'
info = INFO[current_data_flag]
n_channels = info['n_channels']
n_classes = len(info['label'])
class_labels = info['label']

DataClass = getattr(medmnist, info['python_class'])

# Define transforms (convert to tensor and normalize)
data_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[.5], std=[.5])
])

# Load train, validation, and test sets
# download=True ensures it is downloaded if not present
train_dataset = DataClass(split='train', transform=data_transform, download=True)
val_dataset = DataClass(split='val', transform=data_transform, download=True)
test_dataset = DataClass(split='test', transform=data_transform, download=True)

# TODO: Report number of samples
print(f"Train samples: {len(train_dataset)}")
print(f"Val samples: {len(val_dataset)}")
print(f"Test samples: {len(test_dataset)}")

# Create DataLoaders
BATCH_SIZE = 64
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
# TODO: Display a grid of 16 random training images (2 per class)
# Iterate through the dataset or loader to collect 2 examples per class

plt.figure(figsize=(10, 8))
# ... implementation here ...
plt.suptitle("BloodMNIST Examples")
plt.show()

In [None]:
# TODO: Plot the class distribution (bar chart) for the training set
# Count occurrences of each class index in train_dataset.labels

# ... implementation here ...
plt.title("Class Distribution (Training Set)")
plt.show()

**Question:** Is the dataset balanced? 

*TODO: Write your answer here.*
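
One way to put a number on balance is the ratio between the largest and smallest class counts. The following is a minimal sketch on synthetic labels, assuming labels are stored as an `(N, 1)` integer array (as the training-loop comment in Q4.2 indicates). The helper names `class_counts` and `imbalance_ratio` are hypothetical and not part of `medmnist`.

```python
import numpy as np

def class_counts(labels, n_classes):
    """Count samples per class; accepts labels shaped (N,) or (N, 1)."""
    flat = np.asarray(labels).reshape(-1).astype(np.int64)
    return np.bincount(flat, minlength=n_classes)

def imbalance_ratio(counts):
    """Largest class count divided by the smallest non-empty class count."""
    nonzero = counts[counts > 0]
    return nonzero.max() / nonzero.min()

# Synthetic stand-in for train_dataset.labels (NOT real BloodMNIST data).
demo_labels = np.array([[0], [0], [0], [1], [2], [2]])
counts = class_counts(demo_labels, n_classes=3)
print("counts per class:", counts)
print("max/min ratio:", imbalance_ratio(counts))
```

A ratio close to 1 suggests a balanced dataset; a much larger ratio suggests that accuracy alone may hide poor performance on rare classes.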

### Q4.2: Build and Train an MLP (5 points)

Architecture:
*   Input: 2352 (flattened 28x28x3)
*   Hidden 1: 256 (ReLU)
*   Hidden 2: 128 (ReLU)
*   Output: 8
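
As a sanity check on the layer sizes above, the number of trainable parameters can be worked out by hand. This sketch assumes every fully connected layer has a bias term (the default for `nn.Linear`); `count_parameters` is a hypothetical helper.

```python
# Layer sizes taken from the architecture list above.
layer_sizes = [2352, 256, 128, 8]

def count_parameters(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# The input size is the flattened 28x28 image with 3 channels.
assert 28 * 28 * 3 == layer_sizes[0]

total_params = count_parameters(layer_sizes)
print(total_params)  # 636296
```

Comparing this figure with `sum(p.numel() for p in model.parameters())` is a quick way to confirm that the layers were defined as intended.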

In [None]:
class BloodMLP(nn.Module):
    def __init__(self):
        super(BloodMLP, self).__init__()
        # TODO: Define layers
        # self.flatten = ...
        # self.fc1 = ...
        # self.relu = ...
        # self.fc2 = ...
        # self.fc3 = ...
        pass

    def forward(self, x):
        # TODO: Implement forward pass
        return x

In [None]:
# TODO: Initialize model, optimizer (Adam, lr=1e-3), and loss function (CrossEntropy)
model = BloodMLP().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

In [None]:
# TODO: Training Loop
num_epochs = 30
train_losses, val_losses = [], []
train_accs, val_accs = [], []

for epoch in range(num_epochs):
    # Training
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.squeeze(1).long()  # MedMNIST labels are (N, 1); squeeze(1) keeps the batch dim even when N == 1
        
        # Zero gradients, forward, backward, optimize
        # ...
        pass

    # Validation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.squeeze(1).long()  # (N, 1) -> (N,), safe for a batch of one
            # ...
            pass
    
    # Print stats
    print(f"Epoch {epoch+1}/{num_epochs} - Train Loss: {running_loss:.4f} Acc: {correct/total:.4f} | Val Loss: {val_loss:.4f} Acc: {val_correct/val_total:.4f}")

In [None]:
# TODO: Plot Training Loss vs Validation Loss
# TODO: Plot Training Acc vs Validation Acc

### Q4.3: Evaluation and Analysis (5 points)

In [None]:
# TODO: Evaluate on Test Set
model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        preds = torch.argmax(outputs, dim=1)
        
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy().flatten())

# 1. Accuracy
# ...

# 2. Confusion Matrix
# cm = confusion_matrix(all_labels, all_preds)
# disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=list(class_labels.values()))
# disp.plot(xticks_rotation='vertical')

# 3. Classification Report
# print(classification_report(...))

In [None]:
# TODO: Identify 2 most confused pairs
# TODO: Find class with lowest recall
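
Both quantities can be read directly off a confusion matrix whose rows are true classes and whose columns are predictions (the convention used by `sklearn.metrics.confusion_matrix`). The sketch below runs on a small made-up matrix, not on real results; `most_confused_pairs` and `per_class_recall` are hypothetical helper names.

```python
import numpy as np

def most_confused_pairs(cm, k=2):
    """Return the k largest off-diagonal entries as (true, predicted, count)."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c])) for r, c in zip(rows, cols)]

def per_class_recall(cm):
    """Recall per class: diagonal entry divided by the row total."""
    row_totals = cm.sum(axis=1)
    return np.diag(cm) / np.maximum(row_totals, 1)

# Made-up 3-class confusion matrix for illustration only.
demo_cm = np.array([
    [50,  2,  8],
    [ 1, 40,  9],
    [ 3, 12, 35],
])

pairs = most_confused_pairs(demo_cm)
recall = per_class_recall(demo_cm)
print("most confused (true, pred, count):", pairs)
print("lowest-recall class:", int(np.argmin(recall)))
```

Note that this treats (true=2, pred=1) and (true=1, pred=2) as separate directed pairs; summing `cm + cm.T` first would give undirected pairs instead.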

In [None]:
# TODO: Display 5 misclassified examples of the lowest-recall class


### Q4.4: Prediction Confidence Analysis (3 points)

Categorize predictions by confidence (taken here to be the maximum softmax probability) into:
*   Confident & Correct (> 0.9)
*   Confident & Incorrect (> 0.9)
*   Uncertain & Correct (< 0.6)
*   Uncertain & Incorrect (< 0.6)
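
The four groups above can be computed from the model's logits in a few lines. This is a sketch under the assumption that "confidence" means the maximum softmax probability; the function `confidence_quadrants` and the demo logits are illustrative and not part of the starter code.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def confidence_quadrants(logits, labels, high=0.9, low=0.6):
    """Return sample indices for each of the four confidence/correctness groups."""
    probs = softmax(logits)
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = preds == labels
    return {
        "confident_correct": np.flatnonzero((conf > high) & correct),
        "confident_incorrect": np.flatnonzero((conf > high) & ~correct),
        "uncertain_correct": np.flatnonzero((conf < low) & correct),
        "uncertain_incorrect": np.flatnonzero((conf < low) & ~correct),
    }

# Made-up logits: two peaked rows and two nearly flat rows.
demo_logits = np.array([
    [5.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [0.2, 0.0, 0.1],
    [0.0, 0.3, 0.1],
])
demo_labels = np.array([0, 1, 0, 2])

quadrants = confidence_quadrants(demo_logits, demo_labels)
for name, idx in quadrants.items():
    print(name, idx.tolist())
```

Predictions with confidence between 0.6 and 0.9 fall into none of the four groups, which matches the thresholds listed above.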

In [None]:
# TODO: Find and display 2 examples from each quadrant


**Analysis:** What visual characteristics distinguish the "Confident & Incorrect" examples? Why might the model be overconfident?

*TODO: Write your analysis here.*

--- 
# Problem 5: Weight Initialization and Training Dynamics (15 points)

### Q5.1: Implement Initialization Schemes (3 points)

In [12]:
def initialize_weights(shape, method):
    """
    Args:
        shape: tuple of (fan_in, fan_out)
        method: 'zero', 'small_random', 'xavier', 'he'
    Returns:
        torch.Tensor of initialized weights
    """

    if len(shape) != 2:
        raise ValueError("Shape must be a tuple of (fan_in, fan_out)")

    fan_in, fan_out = shape

    if method == 'zero':
        return torch.zeros(shape)
    elif method == 'small_random':
        return torch.randn(shape) * 0.01
    elif method == 'xavier':
        sigma = math.sqrt(2 / (fan_in + fan_out))
        return torch.randn(shape) * sigma
    elif method == 'he':
        sigma = math.sqrt(2 / fan_in)
        return torch.randn(shape) * sigma
    else:
        raise ValueError("Unknown method")
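
To make the two formulas concrete, the standard deviations they produce can be worked out for a few layer shapes. This sketch uses only the `math` module and mirrors the expressions in `initialize_weights`; the helper names `xavier_std` and `he_std` are hypothetical.

```python
import math

def xavier_std(fan_in, fan_out):
    """Standard deviation used by the 'xavier' branch: sqrt(2 / (fan_in + fan_out))."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in):
    """Standard deviation used by the 'he' branch: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)

for fan_in, fan_out in [(10, 10), (784, 256), (256, 256)]:
    print(f"{fan_in} -> {fan_out}: "
          f"xavier={xavier_std(fan_in, fan_out):.4f}, he={he_std(fan_in):.4f}")
```

For a square layer the He standard deviation is larger than the Xavier one by a factor of sqrt(2), and both are well above the fixed 0.01 of `small_random` for the layer widths used in this homework.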


In [16]:
# Testing the initialize_weights function
print(initialize_weights((10, 10), 'zero'))

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])


In [17]:
print(initialize_weights((10, 10), 'small_random'))

tensor([[ 2.9100e-03, -7.9642e-04,  1.3200e-02, -1.5197e-02, -1.2531e-02,
         -2.0160e-03, -1.9768e-02,  9.2746e-03,  7.8943e-03,  7.8247e-03],
        [-6.4659e-04, -2.2984e-06,  5.6931e-03,  7.4762e-03,  2.1337e-02,
          5.0145e-03,  2.9843e-03,  1.3448e-02,  1.4614e-02,  1.0566e-02],
        [-5.4614e-03, -2.1778e-03, -2.8094e-03, -3.6046e-03, -3.5718e-03,
         -1.1568e-02, -1.7660e-02, -2.5380e-02, -3.3437e-04, -1.7017e-02],
        [ 5.8634e-03, -1.7527e-02, -8.9146e-03,  5.2475e-03,  3.5178e-03,
          2.4913e-03,  4.2356e-04,  8.9666e-03, -2.3369e-03,  6.0499e-04],
        [-1.8495e-03, -1.0381e-02, -1.0130e-03, -9.2718e-03,  7.3442e-03,
          3.0971e-04, -5.8653e-03, -3.1545e-03,  2.0147e-03,  3.8398e-03],
        [ 1.2310e-02,  1.2287e-02, -1.5806e-03,  6.9485e-03, -1.2785e-02,
         -1.2692e-02,  3.2581e-03, -1.4584e-02,  1.8989e-02, -4.0566e-04],
        [ 6.4671e-03, -2.0813e-02, -9.3036e-03, -1.3950e-02, -4.1754e-03,
          1.1060e-02, ...  (output truncated)

In [19]:
print(initialize_weights((10, 10), 'xavier'))

tensor([[ 0.4920,  0.2822,  0.2231,  0.4482, -0.0932, -0.0885,  0.3427,  0.0547,
          0.5285, -0.4683],
        [-0.3549,  0.1330, -0.1451, -0.2936, -0.0543, -0.2171, -0.0080, -0.2694,
         -0.6162, -0.4822],
        [ 0.0088,  0.0288,  0.2125,  0.3115,  0.8013, -0.0971,  0.0028, -0.4918,
          0.0746, -0.1412],
        [-0.3725,  0.4467, -0.2295, -0.3645,  0.3404, -0.7741,  0.2426, -0.2964,
         -0.7370, -0.2469],
        [ 0.2965, -0.0794,  0.3166, -0.2754,  0.3863, -0.0992, -0.2288, -0.1147,
          0.1247,  0.0518],
        [ 0.3398,  0.2646,  0.0075, -0.4808,  0.1674, -0.1236, -0.0277, -0.0490,
          0.3827, -0.6771],
        [-0.1074,  0.0023, -0.0165,  0.3692,  0.2881, -0.2104, -0.4110, -0.1313,
         -0.1634, -0.2176],
        [ 0.1504, -0.4527,  0.1174,  0.1120,  0.0286,  0.4277, -0.3585, -0.0447,
         -0.2042,  0.1297],
        [-0.0035,  0.5740,  0.1918,  0.1886,  0.9113, -0.3691,  0.6139, -0.3680,
         -0.5049,  0.0263],
        ...  (output truncated)

In [18]:
print(initialize_weights((10, 10), 'he'))

tensor([[ 0.3253, -0.5671, -0.9269, -0.9499, -0.1901, -1.0219,  0.2642,  0.1398,
          0.4514,  0.4669],
        [ 0.1332,  0.3748,  0.1029, -0.2585, -0.0113, -0.5375, -0.7209, -0.2325,
         -0.9325, -0.3825],
        [-0.4918,  0.2770, -0.0820,  0.1640, -0.2923,  0.6540,  0.0538,  0.3360,
         -0.2617, -0.5410],
        [ 0.1404, -0.1184,  0.0156, -0.1205, -0.0925,  0.2000, -0.1265,  0.2396,
         -0.1304, -0.4289],
        [ 0.9489,  0.1869,  0.1995, -0.0119,  0.2940,  0.2972, -0.4809,  0.0787,
          0.6541, -0.8177],
        [-0.0634, -0.1091,  0.7641, -0.7038, -0.2024,  0.5256, -0.6555, -0.0281,
         -1.1301,  0.2135],
        [ 0.3244, -0.7162, -0.2752,  1.0036, -0.4838, -0.0892,  0.7555, -0.2816,
         -0.4836, -0.1881],
        [-0.6948,  0.4869, -0.5555, -0.1494, -0.7034,  1.2568,  0.4129, -0.2170,
         -0.2476, -0.3390],
        [-0.0479, -0.4891, -0.5604,  0.4277, -0.5706,  0.9905, -0.2680, -0.1472,
         -0.7930, -0.4431],
        ...  (output truncated)

### Q5.2: Activation Statistics Before Training (5 points)

*   Architecture: 784 -> 256 -> 256 -> 256 -> 256 -> 256 -> 10
*   Activations: Tanh
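
The idea of tracking activation statistics with depth can be sketched in plain NumPy before writing the PyTorch version. The example below is only an illustration under stated assumptions: it feeds Gaussian noise rather than MNIST, uses no biases, and the helper `layer_stats` is hypothetical.

```python
import numpy as np

def layer_stats(sizes, std_fn, activation=np.tanh, batch=256, seed=0):
    """Push random inputs through the hidden layers; return (mean, std) per layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, sizes[0]))  # stand-in for a batch of inputs
    stats = []
    # Iterate over hidden layers only (the final 256 -> 10 layer is skipped).
    for fan_in, fan_out in zip(sizes[:-2], sizes[1:-1]):
        w = rng.standard_normal((fan_in, fan_out)) * std_fn(fan_in, fan_out)
        x = activation(x @ w)
        stats.append((float(x.mean()), float(x.std())))
    return stats

sizes = [784, 256, 256, 256, 256, 256, 10]
small = layer_stats(sizes, lambda fi, fo: 0.01)
xavier = layer_stats(sizes, lambda fi, fo: np.sqrt(2.0 / (fi + fo)))

for depth, ((_, s_small), (_, s_xavier)) in enumerate(zip(small, xavier), start=1):
    print(f"layer {depth}: std small_random={s_small:.5f}, std xavier={s_xavier:.5f}")
```

With the fixed 0.01 scale the activation spread shrinks at every layer, while the Xavier scale keeps it at a usable magnitude; the real experiment should confirm or refute this on MNIST inputs.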

In [None]:
# TODO: Define a custom MLP class that lets you inspect forward passes
class InitMLP(nn.Module):
    def __init__(self, init_method, activation='tanh'):
        super().__init__()
        # ...

# TODO: Collect stats (mean/std) for each layer across the 4 init methods
# Loop through init_methods = ['zero', 'small_random', 'xavier', 'he']
#   Instantiate model
#   Pass one batch of MNIST data
#   Record layer outputs (hooks or manual forward)
#   Note: Use the MNIST data downloaded in the 'Data Setup' section: 
#         loader = DataLoader(mnist_train, batch_size=256, shuffle=True)


In [None]:
# TODO: Plot Mean Activation vs Depth (subplot 1)
# TODO: Plot Std Activation vs Depth (subplot 2)


**Analysis:** Which methods show vanishing/exploding gradients?

*TODO: Write your analysis here.*

### Q5.3: Training Dynamics Comparison (4 points)

Train 4 networks (one per init) for 10 epochs on MNIST.

In [None]:
# TODO: Training loop for the 4 models
# Plot all 4 training loss curves
# Report final test accuracy


### Q5.4: ReLU Activation Experiment (3 points)

Repeat Q5.2 and Q5.3 with **ReLU** instead of Tanh.
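
A short variance argument may help when interpreting the ReLU results. It is a sketch under simplifying assumptions (zero biases, weights drawn i.i.d. with zero mean and variance $\sigma^2$, independent of the inputs, and pre-activations symmetric about zero), not a full derivation.

$$
z_j = \sum_{i=1}^{\mathrm{fan\_in}} w_{ij}\, x_i
\quad\Longrightarrow\quad
\mathrm{Var}(z_j) = \mathrm{fan\_in}\cdot \sigma^2 \cdot \mathbb{E}[x_i^2]
$$

$$
x_i = \max(0, z'_i)
\quad\Longrightarrow\quad
\mathbb{E}[x_i^2] = \tfrac{1}{2}\,\mathrm{Var}(z'_i)
\quad\Longrightarrow\quad
\mathrm{Var}(z_j) = \frac{\mathrm{fan\_in}\cdot\sigma^2}{2}\,\mathrm{Var}(z'_i)
$$

Here $z'_i$ denotes a pre-activation of the previous layer. The variance is preserved from layer to layer when $\sigma^2 = 2/\mathrm{fan\_in}$, which is the `'he'` branch of `initialize_weights`. For Tanh near zero, $\tanh(z) \approx z$, so $\mathbb{E}[x_i^2] \approx \mathrm{Var}(z'_i)$ and the matching condition is $\sigma^2 \approx 1/\mathrm{fan\_in}$; the `'xavier'` value $2/(\mathrm{fan\_in} + \mathrm{fan\_out})$ averages this forward condition with the analogous backward one.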

In [None]:
# TODO: Activation statistics with ReLU
# TODO: Training comparison with ReLU


**Analysis:** Compare Tanh vs ReLU dynamics and best initialization methods.

*TODO: Write your analysis here.*

--- 
# Problem 6: Building a Neural Network Library (25 points)

You will implement `mytorch` in the provided auxiliary files and then use it here.

In [None]:
# Add current directory to path so we can import mytorch
import sys
import os
sys.path.append(os.getcwd())

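# Note: this rebinds `nn` to mytorch.nn, shadowing the torch.nn import from Setup.
# Re-run the Setup imports before re-running any Problem 4 or 5 cell.
import mytorch.nn as nn
import mytorch.utils as utils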

### Q6.1 - Q6.5: Implementation

Please implement the classes in:
1.  `mytorch/nn/modules/linear.py`
2.  `mytorch/nn/modules/activation.py`
3.  `mytorch/nn/modules/loss.py`
4.  `mytorch/nn/sequential.py`
5.  `mytorch/nn/optim.py`

*(Editor's note: You should edit these files directly.)*

### Q6.6: Gradient Checking (3 points)

In [None]:
from mytorch.utils import gradient_check

# Create a small dummy network for checking
input_size = 5
hidden_size = 10
output_size = 3
batch_size = 4

model = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, output_size)
)

loss_fn = nn.SoftmaxCrossEntropy()

# Random data
x_dummy = np.random.randn(batch_size, input_size)
y_dummy = np.eye(output_size)[np.random.choice(output_size, batch_size)]

# Check gradients
try:
    error = gradient_check(model, loss_fn, x_dummy, y_dummy)
    print(f"Gradient check Max Relative Error: {error:.2e}")
    if error < 1e-5:
        print("Gradient check PASSED!")
    else:
        print("Gradient check FAILED!")
except NotImplementedError:
    print("Gradient check not implemented yet.")
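
For intuition about what a gradient check measures, here is a self-contained sketch of a central-difference check on a toy function. It does not reproduce `mytorch.utils.gradient_check`, whose implementation is not shown here; `numerical_gradient`, `max_relative_error`, and the toy loss are assumptions made for illustration.

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-5):
    """Central-difference estimate of df/dw for a scalar-valued f."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        f_plus = f(w)
        w[idx] = original - eps
        f_minus = f(w)
        w[idx] = original  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

def max_relative_error(a, b, tiny=1e-12):
    """Largest elementwise |a - b| / (|a| + |b|), guarded against division by zero."""
    return float(np.max(np.abs(a - b) / np.maximum(np.abs(a) + np.abs(b), tiny)))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))   # batch of 4 inputs with 5 features
w = rng.standard_normal((5, 3))   # weights of a single linear map

def toy_loss(weights):
    """Toy scalar loss: sum of tanh(x @ weights)."""
    return float(np.sum(np.tanh(x @ weights)))

# Analytic gradient of the toy loss with respect to the weights.
analytic = x.T @ (1.0 - np.tanh(x @ w) ** 2)
numeric = numerical_gradient(toy_loss, w)
error = max_relative_error(analytic, numeric)
print(f"max relative error: {error:.2e}")
```

Central differences have an error that scales with `eps**2`, which is why a correct backward pass in float64 typically lands far below the `1e-5` threshold used in the cell above.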

### Q6.7: Train and Evaluate on MNIST (4 points)

In [None]:
# TODO: Train on MNIST using mytorch
# 1. Load MNIST (flattened to 784)
#    Note: Use the MNIST data downloaded in the 'Data Setup' section.
#    mytorch operates on NumPy arrays, so convert the already-loaded tensors:
#    X_train = mnist_train.data.numpy().reshape(-1, 784).astype(np.float32) / 255.0
#    y_train = mnist_train.targets.numpy()  # integer labels; one-hot encode if your loss expects it

# 2. Create model: 784 -> 128 (ReLU) -> 64 (ReLU) -> 10
# 3. Optimizer: SGD(lr=0.1)
# 4. Loss: SoftmaxCrossEntropy
# 5. Train for 3 epochs

# Plot loss
# Report test accuracy
# Display 10 random test predictions
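
Two small utilities tend to be needed for the steps above: one-hot encoding of the integer labels (the gradient-check cell passes one-hot targets to the loss) and shuffled mini-batch iteration. The sketch below uses random stand-in data, and both helper names, `one_hot` and `iterate_minibatches`, are hypothetical rather than part of `mytorch`.

```python
import numpy as np

def one_hot(targets, n_classes):
    """Convert integer class labels of shape (N,) into an (N, n_classes) array."""
    targets = np.asarray(targets, dtype=np.int64).reshape(-1)
    encoded = np.zeros((targets.size, n_classes), dtype=np.float64)
    encoded[np.arange(targets.size), targets] = 1.0
    return encoded

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

rng = np.random.default_rng(0)
demo_x = rng.random((10, 784))                        # stand-in for flattened images
demo_y = one_hot(rng.integers(0, 10, size=10), n_classes=10)

batches = list(iterate_minibatches(demo_x, demo_y, batch_size=4, rng=rng))
print([xb.shape for xb, _ in batches])
```

The last batch is smaller when the dataset size is not a multiple of `batch_size`, so any loss averaging should divide by the actual batch length.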