# DATASCI 315, Homework 7: Regularization for Image Classification

In this homework assignment, you will train a neural network to infer the number of galaxies in an image, as in group-work assignment 6. Now, however, the images will be noisier and you will need to use various types of regularization to achieve the target level of accuracy.

To submit your work, upload the HTML output of the executed notebook to Canvas.

# Getting started

## Import relevant packages and initialize

In [None]:
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

plt.rcParams["axes.grid"] = False
torch.manual_seed(42);

## Load the data

Follow these steps to load the data.

  1. Download the following files from Canvas (`Files/homework/hw7 data`):
      - [dataset_train_0.5_norm.pt](https://umich.instructure.com/courses/733177/files/39931923/download?download_frd=1)
      - [dataset_test_0.5_norm.pt](https://umich.instructure.com/courses/733177/files/39931924/download?download_frd=1)
  2. If you're using Google Colab, go to the `Files` tab on the left.
  3. Create a directory named `data` (or name it something else and change the path below).
  4. Upload the dataset files to this directory either by clicking the `Upload` button or dragging the files to the directory.

Alternatively, you may place the dataset in your Google Drive for persistent storage and connect to it with the following code:
```python
from google.colab import drive
drive.mount('/content/drive')
```
The following code block should now load the data. You may need to update the file paths (for example, if you placed the files on your mounted Google Drive).
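The loading code unpacks each file into a pair `(images, counts)` of tensors. As a sketch of that layout, the snippet below saves and reloads a tiny made-up dataset in a temporary file (the shapes, values, and file name are invented for illustration). `weights_only=True` restricts `torch.load` to tensors and basic containers instead of arbitrary pickled objects.

```python
import os
import tempfile

import torch

# Hypothetical stand-in data: 4 images of size 50x50 and their galaxy counts.
images = torch.rand(4, 50, 50)
counts = torch.tensor([0, 2, 5, 6])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_dataset.pt")
    # Save the pair as a tuple, the layout the unpacking in the load cell assumes.
    torch.save((images, counts), path)
    # weights_only=True only unpickles tensors and basic containers.
    loaded_images, loaded_counts = torch.load(path, weights_only=True)

print(loaded_images.shape, loaded_counts.tolist())
```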

In [None]:
train_val_images, train_val_counts = torch.load("data/dataset_train_0.5_norm.pt", weights_only=True)
test_images, test_counts = torch.load("data/dataset_test_0.5_norm.pt", weights_only=True)

In [None]:
# Determine split indices
split_index = int(0.9 * len(train_val_images))

# Split the data
train_images = train_val_images[:split_index]
train_counts = train_val_counts[:split_index]
val_images = train_val_images[split_index:]
val_counts = train_val_counts[split_index:]

print("Training set size:", len(train_images))
print("Validation set size:", len(val_images))
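
Because this split takes the first 90% of the samples for training and the last 10% for validation, the validation set is only representative if the saved data are not ordered by count. A quick check is to compare the class frequencies of the two parts with `torch.bincount`. The sketch below uses made-up labels in place of `train_counts` and `val_counts`; the helper `class_fractions` is hypothetical.

```python
import torch

n_classes = 7  # counts range from 0 to 6 galaxies

# Hypothetical labels standing in for train_counts / val_counts.
torch.manual_seed(0)
toy_train_counts = torch.randint(0, n_classes, (900,))
toy_val_counts = torch.randint(0, n_classes, (100,))


def class_fractions(counts, n_classes):
    """Fraction of samples in each class 0..n_classes-1."""
    # bincount needs non-negative integers; .long() also handles float labels.
    hist = torch.bincount(counts.long(), minlength=n_classes)
    return hist.float() / hist.sum()


train_frac = class_fractions(toy_train_counts, n_classes)
val_frac = class_fractions(toy_val_counts, n_classes)
for k in range(n_classes):
    print(f"{k} galaxies: train {train_frac[k].item():.2f}  val {val_frac[k].item():.2f}")
```

Large differences between the two columns would suggest shuffling the samples (for example with `torch.randperm`) before splitting.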

## Inspect the data

Let's display two random images, one from the training set and one from the validation set, together with their corresponding number of galaxies:

In [None]:
i = torch.randint(len(train_images), (1,)).item()
plt.imshow(train_images[i], cmap="gray")
plt.title(f"Number of galaxies: {int(train_counts[i])}");

In [None]:
i = torch.randint(len(val_images), (1,)).item()
plt.imshow(val_images[i], cmap="gray")
plt.title(f"Number of galaxies: {int(val_counts[i])}");

## Problem 1: Specifying the model architecture

We will predict the number of galaxies in each image using a simple feedforward neural network.

The input to the model will be the image, and the output will be the number of galaxies in the image.
As the latter is a discrete variable, we will treat this as a classification problem, that is, the numbers will be treated as classes.
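Concretely, a count of k galaxies becomes class index k, and the network outputs one score (logit) per class. `nn.CrossEntropyLoss` takes raw logits of shape `(batch, n_classes)` and integer targets of shape `(batch,)`, and the predicted count is the index of the largest logit. A small sketch with invented logits:

```python
import torch
from torch import nn

# Hypothetical logits for a batch of 3 images and 7 classes (0..6 galaxies).
logits = torch.tensor([
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0],
])
# True galaxy counts, used directly as class indices (must be an integer dtype).
counts = torch.tensor([0, 3, 5])

loss = nn.CrossEntropyLoss()(logits, counts)  # mean loss over the batch
pred_counts = logits.argmax(dim=1)            # predicted number of galaxies

print(loss.item(), pred_counts.tolist())
```

The third image is misclassified (predicted 6, true 5), which shows up as a larger contribution to the loss.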

In [None]:
# the image dimension
dim = 50
# the number of classes (maximum number of galaxies is 6)
n_classes = 7

Fill in the cell below with a model architecture that you think will work well.

We are still limiting ourselves to linear layers (no convolutions), but you can now also use regularization.
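Because every pixel of the flattened 50 × 50 image connects to every unit of the first hidden layer, that layer usually holds most of a fully connected model's parameters. The sketch below counts parameters for one example stack of widths (2500 → 256 → 256 → 256 → 7, chosen only for illustration, not as a recommendation).

```python
from torch import nn

dim, n_classes = 50, 7

# Example architecture; the widths are illustrative only.
example = nn.Sequential(
    nn.Flatten(),
    nn.Linear(dim * dim, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, n_classes),
)

# Weights plus biases of every layer (ReLU and Flatten have no parameters).
total = sum(p.numel() for p in example.parameters())
first = sum(p.numel() for p in example[1].parameters())
print(f"total parameters: {total:,}, first layer: {first:,} ({first / total:.0%})")
```

A model this large relative to the dataset is exactly where regularization becomes important.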

One example of a regularization technique is [dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html), which randomly zeroes a fraction `p` of its inputs at each training step; this helps prevent overfitting.
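In PyTorch, `nn.Dropout(p)` only acts in training mode: each element is zeroed with probability `p`, and the surviving elements are scaled by `1 / (1 - p)` so the expected activation is unchanged. After `.eval()` it passes its input through untouched. A small sketch on a tensor of ones:

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(10_000)

drop.train()  # training mode: random zeroing plus rescaling
y_train = drop(x)
drop.eval()   # evaluation mode: identity
y_eval = drop(x)

zero_frac = (y_train == 0).float().mean().item()
print(f"fraction zeroed: {zero_frac:.3f}")                           # close to 0.5
print("surviving values:", y_train[y_train != 0].unique().tolist())  # [2.0]
print("eval output unchanged:", torch.equal(y_eval, x))
```

This is why the model must be switched with `model.train()` / `model.eval()` before training and evaluation.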


In [None]:
# BEGIN SOLUTION
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(dim * dim, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, n_classes),
)
# END SOLUTION

In [None]:
# Test assertions
assert isinstance(model, nn.Module), "model should be an nn.Module"
# BEGIN HIDDEN TESTS
test_input = torch.randn(1, dim, dim)
test_output = model(test_input)
expected_shape = (1, n_classes)
assert test_output.shape == expected_shape, f"output shape should be {expected_shape}"
# END HIDDEN TESTS

## Problem 2: Training the model

The following function re-initializes the model parameters. It's useful if you are re-training a model after changing the training hyperparameters.
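Resetting the weights does not touch the optimizer. Adam, for example, keeps running averages of past gradients in `optimizer.state`, so for a completely fresh run you may also want to re-create the optimizer after resetting the model. A small sketch (the tiny `nn.Linear` model is just a stand-in):

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
model = nn.Linear(4, 2)  # stand-in for the galaxy classifier
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# One training step populates Adam's per-parameter state (step count, moment estimates).
loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
n_after_step = len(optimizer.state)

# Re-initializing the weights in place leaves that state untouched ...
model.reset_parameters()
n_after_reset = len(optimizer.state)

# ... so build a new optimizer to discard the accumulated statistics as well.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
n_fresh = len(optimizer.state)

print(n_after_step, n_after_reset, n_fresh)  # 2 2 0
```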

In [None]:
def reset_model_parameters(model):
    for module in model.modules():
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()

Now fill out the function below. Unlike in GroupWork 6, we will be passing an *optimizer* to the function.

This lets us experiment with optimizers (e.g., [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam)) and different values of their `weight_decay` parameter, which applies L2 regularization to penalize large weights.
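One way to see what `weight_decay` does in `torch.optim.Adam`: it adds `weight_decay * param` to each gradient, so a step with `weight_decay=wd` matches a plain Adam step on the loss plus an explicit penalty `(wd / 2) * (sum of squared parameters)`. The sketch checks this on a small random problem (all data and sizes are made up). Note that `torch.optim.AdamW` instead applies decoupled weight decay, which is not equivalent to this penalty.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
x = torch.randn(8, 4)          # made-up inputs
y = torch.randint(0, 3, (8,))  # made-up class labels
wd = 0.1


def params_after_one_step(built_in_decay):
    torch.manual_seed(1)  # identical initial weights in both runs
    layer = nn.Linear(4, 3)
    loss = nn.functional.cross_entropy(layer(x), y)
    if built_in_decay:
        # Adam adds wd * param to each gradient before computing its update.
        opt = optim.Adam(layer.parameters(), lr=1e-2, weight_decay=wd)
    else:
        # Same effect via an explicit L2 penalty whose gradient is wd * param.
        opt = optim.Adam(layer.parameters(), lr=1e-2)
        loss = loss + 0.5 * wd * sum((p ** 2).sum() for p in layer.parameters())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return torch.cat([p.detach().flatten() for p in layer.parameters()])


a = params_after_one_step(built_in_decay=True)
b = params_after_one_step(built_in_decay=False)
print("max difference:", (a - b).abs().max().item())
```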

The function should return the per-epoch training and validation losses so you can evaluate the model.

In [None]:
# BEGIN SOLUTION
def train(model, optimizer, train_dataloader, val_dataloader, num_epochs=10):
    loss_fn = nn.CrossEntropyLoss()
    # Start from freshly initialized weights so repeated calls are comparable.
    # (The optimizer's internal state, e.g. Adam's moment estimates, is not reset here.)
    reset_model_parameters(model)

    train_losses = []
    val_losses = []

    for epoch in range(num_epochs):
        train_loss = 0.0
        val_loss = 0.0

        # Training pass: dropout active, gradients tracked.
        model.train()
        for images, counts in train_dataloader:
            optimizer.zero_grad()
            outputs = model(images)
            # CrossEntropyLoss expects integer class indices as targets.
            loss = loss_fn(outputs, counts.long())
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation pass: dropout disabled, no gradients needed.
        model.eval()
        with torch.no_grad():
            for images, counts in val_dataloader:
                outputs = model(images)
                loss = loss_fn(outputs, counts.long())
                val_loss += loss.item()

        # Average of the per-batch mean losses.
        train_loss /= len(train_dataloader)
        val_loss /= len(val_dataloader)

        train_losses.append(train_loss)
        val_losses.append(val_loss)

        print(f"epoch:{epoch} train:{train_loss:.4} val:{val_loss:.4}")

    return train_losses, val_losses


# END SOLUTION

In [None]:
# Test assertions
import inspect

assert callable(train), "train should be a function"
# BEGIN HIDDEN TESTS
sig = inspect.signature(train)
assert len(sig.parameters) >= 4, "train should have at least 4 parameters"
# END HIDDEN TESTS

Set up your optimizer with the appropriate parameters.

In [None]:
lr = 2e-3  # SOLUTION
weight_decay = 1e-5  # SOLUTION
optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)  # SOLUTION

In [None]:
# Test assertions
assert isinstance(optimizer, optim.Optimizer), "optimizer should be an Optimizer"
# BEGIN HIDDEN TESTS
assert len(optimizer.param_groups) > 0, "optimizer should have param groups"
# END HIDDEN TESTS

Set up your data loaders.
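A `DataLoader` cuts the dataset into consecutive batches of `batch_size` samples (after shuffling, if `shuffle=True`), so an epoch has `ceil(N / batch_size)` batches and the last one may be smaller. The sketch below uses a made-up dataset of 1000 random 50 × 50 images in place of the real training data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for (train_images, train_counts).
images = torch.rand(1000, 50, 50)
counts = torch.randint(0, 7, (1000,))

batch_size = 256
loader = DataLoader(TensorDataset(images, counts), batch_size=batch_size, shuffle=True)

batch_sizes = [len(batch_counts) for _, batch_counts in loader]
print("batches per epoch:", len(loader))  # ceil(1000 / 256) = 4
print("batch sizes:", batch_sizes)        # [256, 256, 256, 232]

first_images, first_counts = next(iter(loader))
print(first_images.shape, first_counts.shape)
```

Passing `drop_last=True` would discard the smaller final batch; smaller batch sizes mean more (and noisier) optimizer steps per epoch.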

In [None]:
batch_size = 256  # SOLUTION

train_dataset = TensorDataset(train_images, train_counts)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_dataset = TensorDataset(val_images, val_counts)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

In [None]:
# Test assertions
assert isinstance(batch_size, int) and batch_size > 0, "batch_size must be positive"
# BEGIN HIDDEN TESTS
assert len(train_dataloader) > 0, "train_dataloader should not be empty"
# END HIDDEN TESTS

Train the model. If the validation loss is still decreasing in the final epochs, the model is likely undertrained; increase `num_epochs` and retrain.
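One way to decide whether to train longer is to find the epoch where the validation loss bottoms out. The sketch below uses invented loss values in place of the `train_losses` and `val_losses` returned by `train`: if the minimum falls on the last epoch, more epochs may help, while a rising validation loss alongside a falling training loss points to overfitting (and a cue for more regularization or fewer epochs).

```python
# Invented loss histories standing in for the output of train(...).
train_losses = [1.60, 1.20, 1.00, 0.88, 0.80, 0.74, 0.69, 0.65]
val_losses = [1.55, 1.25, 1.10, 1.05, 1.03, 1.06, 1.10, 1.15]

# Index of the smallest validation loss.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
print(f"lowest validation loss {val_losses[best_epoch]:.2f} at epoch {best_epoch}")

if best_epoch == len(val_losses) - 1:
    print("validation loss still falling: consider more epochs")
else:
    print("validation loss rose after its minimum: possible overfitting")
```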

In [None]:
train_losses, val_losses = train(model, optimizer, train_dataloader, val_dataloader, num_epochs=10)

Plot the training and validation losses.

In [None]:
plt.plot(train_losses, label="train")
plt.plot(val_losses, label="validation")
plt.xlabel("epoch")
plt.ylabel("cross-entropy loss")
plt.legend();

## Problem 3: Evaluating the accuracy

Run the code below; your model must reach at least **63% accuracy** on the test set. If it falls short, go back to Problems 1 and 2, try different model architectures and training hyperparameters, and retrain until you reach the target test accuracy.
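Since the model outputs one logit per class, accuracy is the fraction of images whose largest logit sits at the true count. For large image tensors it can help to evaluate in batches under `torch.no_grad()`; the helper below is a hypothetical sketch of that (the name `batched_accuracy` and the toy model are illustrative only).

```python
import torch
from torch import nn


def batched_accuracy(model, images, counts, batch_size=512):
    """Fraction of images whose predicted class (argmax logit) equals the true count."""
    model.eval()  # disable dropout
    correct = 0
    with torch.no_grad():  # no gradient bookkeeping needed for evaluation
        for start in range(0, len(images), batch_size):
            logits = model(images[start:start + batch_size])
            preds = logits.argmax(dim=1)
            correct += (preds == counts[start:start + batch_size].long()).sum().item()
    return correct / len(images)


# Toy model that always predicts class 2: zero weights, largest bias at index 2.
toy = nn.Sequential(nn.Flatten(), nn.Linear(4, 3))
with torch.no_grad():
    toy[1].weight.zero_()
    toy[1].bias.copy_(torch.tensor([0.0, 0.0, 1.0]))

images = torch.randn(10, 2, 2)
counts = torch.tensor([2] * 7 + [0] * 3)
acc = batched_accuracy(toy, images, counts, batch_size=4)
print(acc)  # 0.7
```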


In [None]:
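model.eval()
# Inference only: disable gradient tracking to save memory.
# The outputs are per-class logits; argmax over dim=1 gives the predicted count.
with torch.no_grad():
    pred_train_counts = model(train_images)
    pred_val_counts = model(val_images)
    pred_test_counts = model(test_images)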

In [None]:
# BEGIN SOLUTION
train_accuracy = (pred_train_counts.argmax(dim=1) == train_counts).float().mean().item()
val_accuracy = (pred_val_counts.argmax(dim=1) == val_counts).float().mean().item()
test_accuracy = (pred_test_counts.argmax(dim=1) == test_counts).float().mean().item()
print(f"Train accuracy: {train_accuracy:.2f}")
print(f"Validation accuracy: {val_accuracy:.2f}")
print(f"Test accuracy: {test_accuracy:.2f}")
# END SOLUTION

In [None]:
# Test assertions
assert test_accuracy >= 0.63, f"Test accuracy >= 63% required, got {test_accuracy:.2%}"
# BEGIN HIDDEN TESTS
assert train_accuracy >= 0.5, f"Train accuracy >= 50%, got {train_accuracy:.2%}"
# END HIDDEN TESTS