<a href="https://colab.research.google.com/github/kankkw/229352-StatisticalLearning/blob/main/Lab08.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Statistical Learning for Data Science 2 (229352)
#### Instructor: Donlapark Ponnoprat

#### [Course website](https://donlapark.pages.dev/229352/)

## Lab #9

[Recipe for Training Neural Networks](https://karpathy.github.io/2019/04/25/recipe/)

In [None]:
%%capture
!git clone https://github.com/donlapark/ds352-labs.git

In [None]:
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
from torch.utils.tensorboard import SummaryWriter
import matplotlib.pyplot as plt

## Training a neural network in PyTorch

### Chihuahua or Muffin?

<center><img src="https://donlapark.pages.dev/229352/lab09-preview.jpg" width="500"/></center>

### 1. Data preparation

#### Load images, resize them to 128x128, and scale the pixel values to the range 0 to 1

In [None]:
transform = transforms.Compose([transforms.Resize((128, 128)),
                                transforms.ToTensor()])  # transform pixels to be in 0 - 1 range

dataset = datasets.ImageFolder(root="ds352-labs/lab09-data/train",
                                         transform=transform)

#### Split the dataset into training (80%) and validation (20%) sets

In [None]:
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
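
`random_split` draws a new random partition every time the cell runs, so the validation set can differ between runs. The following is a minimal sketch of a reproducible split, assuming a fixed seed is acceptable for the lab. The 50 blank placeholder images, the seed value `0`, and the names `dummy_dataset`, `train_subset`, and `val_subset` are assumptions for illustration only and are not part of the lab data.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset standing in for the ImageFolder dataset: 50 blank "images"
dummy_dataset = TensorDataset(torch.zeros(50, 3, 8, 8), torch.zeros(50))

train_size = int(0.8 * len(dummy_dataset))   # 80% of the samples
val_size = len(dummy_dataset) - train_size   # the remaining 20%

# Passing a seeded generator makes the partition identical on every run
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(
    dummy_dataset, [train_size, val_size], generator=generator
)

print(len(train_subset), len(val_subset))
```

The two subsets never share an index, so no validation image leaks into training.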

#### Load the datasets into DataLoader

In [None]:
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=10,
                          shuffle=True)
val_loader = DataLoader(dataset=val_dataset,
                        batch_size=len(val_dataset),
                        shuffle=False)
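
With `batch_size=10`, one pass over the training loader yields `ceil(N / 10)` minibatches, and the last one is smaller when `N` is not a multiple of 10. A small sketch of this, assuming a placeholder dataset of 43 blank samples (the size 43 and the names `dummy` and `demo_loader` are hypothetical):

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset with 43 samples (not the lab data)
dummy = TensorDataset(torch.zeros(43, 3, 8, 8), torch.zeros(43))
demo_loader = DataLoader(dataset=dummy, batch_size=10, shuffle=True)

num_batches = len(demo_loader)                       # number of minibatches per epoch
batch_sizes = [X.shape[0] for X, _ in demo_loader]   # size of each minibatch

print(num_batches, math.ceil(43 / 10))
print(batch_sizes)
```

Passing `drop_last=True` to `DataLoader` would discard the final, smaller minibatch instead.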

#### Do the same for the test images

In [None]:
test_dataset = datasets.ImageFolder(root="ds352-labs/lab09-data/test",
                                    transform=transform)
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=len(test_dataset),
                         shuffle=False)

#### Looking at the first minibatch

In [None]:
train_batches = iter(train_loader)
X, y = next(train_batches)

print(X.shape)  # (batch_size, channel, height, width)
print(y.shape)

#### Visualize the first four images in the batch

In [None]:
X = X[:4]  # Select the first 4 images
X = X.numpy().transpose(0, 2, 3, 1)  # Convert from (B, C, H, W) to (B, H, W, C)

# Plot images
fig, axes = plt.subplots(1, 4, figsize=(12, 4))
for i in range(4):
    axes[i].imshow(X[i])
    axes[i].axis('off')
plt.show()

print(y[:4])

### 2. Build a simple logistic regression

<center><img src="https://donlapark.pages.dev/229352/logistic.png" width="300"/></center>

The most important components of the model class are the `__init__` method, which defines the layers, and the `forward` method, which defines how an input passes through them.  

[Linear layer in PyTorch](https://docs.pytorch.org/docs/stable/generated/torch.nn.Linear.html)

[Activation functions in PyTorch](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity). The most important ones are [ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html), [Sigmoid](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html), [Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html), [Tanh](https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html).

In [None]:
class SimpleLogisticRegression(nn.Module):
    def __init__(self):
        super(SimpleLogisticRegression, self).__init__()
        self.flatten = nn.Flatten()
        self.linear = nn.Linear(3*128*128, 1)

    def forward(self, x):
        x = self.flatten(x)
        x = self.linear(x)
        return x
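
A quick sanity check of the model is to pass a random batch through it and count its parameters. The sketch below repeats the class definition so that it runs on its own; the random input `X_demo` and the name `demo_model` are placeholders, not lab data.

```python
import torch
import torch.nn as nn

class SimpleLogisticRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear = nn.Linear(3 * 128 * 128, 1)

    def forward(self, x):
        x = self.flatten(x)   # (B, 3, 128, 128) -> (B, 49152)
        x = self.linear(x)    # (B, 49152) -> (B, 1), one raw logit per image
        return x

demo_model = SimpleLogisticRegression()
X_demo = torch.rand(10, 3, 128, 128)   # random placeholder batch of 10 images

demo_logits = demo_model(X_demo)
n_params = sum(p.numel() for p in demo_model.parameters())

print(demo_logits.shape)   # torch.Size([10, 1])
print(n_params)            # 49153 = 3*128*128 weights + 1 bias
```

The output has a trailing dimension of size 1, which is why the training code later selects `y_hat[:, 0]`.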

### 3. Initialize training components

#### Initialize the model and loss function

[Loss functions in PyTorch](https://pytorch.org/docs/stable/nn.html#loss-functions). The most important ones are [MSE](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html), [Binary cross entropy](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html), and [Categorical cross entropy](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html). Because our model outputs raw logits (no sigmoid in `forward`), the cell below uses `BCEWithLogitsLoss`, which applies the sigmoid and the binary cross entropy in one step.

In [None]:
model = SimpleLogisticRegression()
criterion = nn.BCEWithLogitsLoss()
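
`BCEWithLogitsLoss` expects raw logits and applies the sigmoid internally, whereas `BCELoss` expects probabilities. A minimal sketch comparing the two on hand-picked values (the tensors `toy_logits` and `toy_targets` are made up for illustration):

```python
import torch
import torch.nn as nn

toy_logits = torch.tensor([-2.0, 0.0, 3.0])    # raw model outputs
toy_targets = torch.tensor([0.0, 1.0, 1.0])    # binary labels as floats

# One step: sigmoid and cross entropy fused
loss_with_logits = nn.BCEWithLogitsLoss()(toy_logits, toy_targets)

# Two steps: explicit sigmoid, then plain binary cross entropy
loss_two_step = nn.BCELoss()(torch.sigmoid(toy_logits), toy_targets)

print(loss_with_logits.item(), loss_two_step.item())
```

The two values agree up to floating-point error. The fused version is generally preferred because it is numerically more stable for large-magnitude logits.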

#### Manually setting initial weights to zero for demonstration

In [None]:
with torch.no_grad():
  for layer in model.modules():
      if isinstance(layer, nn.Linear):
          layer.weight.zero_()
          layer.bias.zero_()
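
With all weights and the bias set to zero, every logit is 0, every predicted probability is 0.5, and the binary cross-entropy loss equals $\ln 2 \approx 0.693$ regardless of the labels. This gives a known starting value to check the first recorded loss against. A small sketch, assuming a random placeholder batch (`X_demo`, `y_demo`, and `demo_linear` are hypothetical names):

```python
import math
import torch
import torch.nn as nn

demo_linear = nn.Linear(3 * 128 * 128, 1)
with torch.no_grad():
    demo_linear.weight.zero_()
    demo_linear.bias.zero_()

X_demo = torch.rand(10, 3 * 128 * 128)   # placeholder batch, already flattened
y_demo = torch.tensor([0., 1., 0., 1., 1., 0., 0., 1., 1., 0.])

zero_logits = demo_linear(X_demo)[:, 0]                     # all zeros
initial_loss = nn.BCEWithLogitsLoss()(zero_logits, y_demo)  # -log(0.5) per sample

print(initial_loss.item(), math.log(2))
```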

#### Create two lists to collect training and validation losses

In [None]:
# List to store the loss values for plotting
train_losses = []
val_losses = []

#### Specify the learning rate

In [None]:
learning_rate = 1e-3
optimizer = optim.AdamW(model.parameters(), lr=learning_rate)

### 4. Training the model with gradient descent

#### Take one minibatch from the DataLoader

In [None]:
for X, y in train_loader:
    break

#### Make a prediction on the minibatch (Forward pass)

In [None]:
y_hat = model(X)
y_hat = y_hat[:, 0]
y = y.float()

#### Calculate the loss function

Recall that `criterion` is our binary cross-entropy loss (`BCEWithLogitsLoss`), which applies the sigmoid to the raw model outputs internally.

In [None]:
# Compute the loss
loss = criterion(y_hat, y)
train_losses.append(loss.item())

#### Calculate the gradient (Backward pass)

In [None]:
# Backward pass: compute the gradient of the loss w.r.t. model parameters
loss.backward()
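
For the mean binary cross-entropy on logits $z_i$ with labels $y_i$, the gradient with respect to each logit is $(\sigma(z_i) - y_i)/N$. The sketch below checks that `backward()` produces this value on a hand-made example; the tensors are illustrative and not taken from the lab data.

```python
import torch
import torch.nn as nn

toy_logits = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
toy_targets = torch.tensor([0.0, 1.0, 1.0])

toy_loss = nn.BCEWithLogitsLoss()(toy_logits, toy_targets)
toy_loss.backward()   # fills toy_logits.grad

# Closed-form gradient of the mean loss with respect to each logit
manual_grad = (torch.sigmoid(toy_logits.detach()) - toy_targets) / len(toy_targets)

print(toy_logits.grad)
print(manual_grad)
```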

#### Perform a gradient descent step

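Careful! The weight update itself must not be tracked in the gradient calculation. `optimizer.step()` takes care of this internally, so no explicit `with torch.no_grad()` block is needed here. Afterwards, reset the gradients so that they do not accumulate into the next step.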

In [None]:
# Update the weights using the gradients computed by loss.backward()
optimizer.step()

# Zero the gradients after updating so they do not accumulate
model.zero_grad()
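
The role of `torch.no_grad()` in a weight update is easiest to see by writing the update by hand. The sketch below assumes plain SGD, swapped in for the AdamW optimizer used above because its update rule is a single line, and checks that a manual step matches `optimizer.step()`. The tiny 4-feature model, the random data, and all names in the block are hypothetical.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lr = 0.1
model_a = nn.Linear(4, 1)
model_b = copy.deepcopy(model_a)          # identical starting weights

X_toy = torch.rand(8, 4)
y_toy = torch.randint(0, 2, (8,)).float()
bce = nn.BCEWithLogitsLoss()

# (a) Let the optimizer perform the update
sgd = optim.SGD(model_a.parameters(), lr=lr)
bce(model_a(X_toy)[:, 0], y_toy).backward()
sgd.step()
sgd.zero_grad()

# (b) Perform the same update by hand
bce(model_b(X_toy)[:, 0], y_toy).backward()
with torch.no_grad():                     # the update must not be tracked by autograd
    for p in model_b.parameters():
        p -= lr * p.grad                  # gradient descent rule
        p.grad.zero_()                    # reset for the next step

print(torch.allclose(model_a.weight, model_b.weight))
```

Without the `torch.no_grad()` block in (b), PyTorch raises an error because an in-place operation on a leaf tensor that requires gradients is not allowed.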

#### Do the same for the validation set

Careful! Anything in the validation step must not be included in the gradient calculation, hence the use of `with torch.no_grad()`.

In [None]:
with torch.no_grad():
    for X, y in val_loader:
        y = y.float()
        y_hat = model(X)
        y_hat = y_hat[:, 0]
        val_loss = criterion(y_hat, y)
        val_losses.append(val_loss.item())

In [None]:
print(train_losses)
print(val_losses)

#### Combine everything together.

Repeat the previous steps for 10 **epochs** and plot the training and validation losses.

In [None]:
model = SimpleLogisticRegression()
criterion = nn.BCEWithLogitsLoss()  # binary labels with a single logit output
learning_rate = 1e-4
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

with torch.no_grad():
  for layer in model.modules():
      if isinstance(layer, nn.Linear):
          layer.weight.zero_()
          layer.bias.zero_()

train_losses = []
val_losses = []


for epoch in range(10):

    model.train()
    for X, y in train_loader:
        y = y.float()
        y_hat = model(X)
        y_hat = y_hat[:, 0]

        loss = criterion(y_hat, y)
        train_losses.append(loss.item())

        loss.backward()
        optimizer.step()
        model.zero_grad()

    model.eval()
    with torch.no_grad():
        for X, y in val_loader:
            y = y.float()
            y_hat = model(X)
            y_hat = y_hat[:, 0]

            val_loss = criterion(y_hat, y)
            val_losses.append(val_loss.item())

    print(f"Epoch {epoch+1}")
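
The loop above records one training loss per minibatch and one validation loss per epoch (the validation loader holds the whole validation set in a single batch). A possible way to plot both on a common axis is sketched below. The helper `plot_losses` and its `batches_per_epoch` argument are hypothetical names introduced for this example.

```python
import matplotlib.pyplot as plt

def plot_losses(train_losses, val_losses, batches_per_epoch):
    """Plot per-minibatch training losses and per-epoch validation losses."""
    fig, ax = plt.subplots(figsize=(8, 4))

    # One training loss per minibatch step
    train_steps = range(1, len(train_losses) + 1)
    ax.plot(train_steps, train_losses, label="training loss (per minibatch)")

    # One validation loss per epoch, placed at the step where that epoch ends
    val_steps = [(epoch + 1) * batches_per_epoch for epoch in range(len(val_losses))]
    ax.plot(val_steps, val_losses, marker="o", label="validation loss (per epoch)")

    ax.set_xlabel("minibatch step")
    ax.set_ylabel("loss")
    ax.legend()
    return fig, ax
```

After training, it could be called as `plot_losses(train_losses, val_losses, len(train_loader))` followed by `plt.show()`.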

# Exercise

In this exercise, we will add more layers to our classification model.

<img src="https://donlapark.pages.dev/229352/lab09-architecture.png" width="450"/>

1. Create a neural network with 3 hidden layers as shown in the picture.

2. Train the model with learning rate = 1e-2, 1e-3, 1e-4, 1e-5, and answer the following questions.
    - 2.1 What value of learning rate do you **think** is the best? Please explain your reason.
    - 2.2 What happens to the training losses if your learning rate is too large?
    - 2.3 What happens to the training losses if your learning rate is too small?

3. After you finish training your model, make predictions on the test set and compute the accuracy. You may use the provided code below.

4. Use `plt.imshow()` to display at least four images that are incorrectly classified by this model.

In [None]:
class DeepNeuralNetwork(nn.Module):
    def __init__(self):
        super(DeepNeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.net = nn.Sequential(
            nn.Linear(3*128*128, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        x = self.flatten(x)
        x = self.net(x)
        return x

2.1
I think 1e-3 is the best learning rate because it converges faster than 1e-4 and 1e-5,
while still maintaining stable training compared to 1e-2 which may cause oscillation.

2.2
If the learning rate is too large, the training loss may oscillate or even diverge.
The model may overshoot the minimum and fail to converge.

2.3
If the learning rate is too small, the training loss decreases very slowly.
The model may need many more epochs to converge and may stall on a plateau or in a poor local minimum.
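
The behaviour described in 2.2 and 2.3 can be reproduced on a toy problem. The sketch below runs gradient descent on $f(w) = w^2$, where each step multiplies $w$ by $(1 - 2 \cdot \text{lr})$. This is an illustration only, not the lab's network, and the three learning rates are chosen for this toy function rather than taken from the exercise.

```python
def gradient_descent_losses(lr, steps=20, w0=1.0):
    """Run gradient descent on f(w) = w**2 and return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # equivalent to w <- (1 - 2*lr) * w
        losses.append(w ** 2)
    return losses

for lr in [1.1, 0.4, 0.001]:
    final_loss = gradient_descent_losses(lr)[-1]
    print(f"lr={lr}: loss after 20 steps = {final_loss:.3g}")
```

With `lr=1.1` the factor has magnitude 1.2, so the loss grows at every step (divergence). With `lr=0.4` the factor is 0.2 and the loss shrinks quickly. With `lr=0.001` the factor is 0.998 and the loss has barely moved after 20 steps.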

In [None]:
# Use this code to calculate test accuracy
with torch.no_grad():
  test_batches = iter(test_loader)
  X, y = next(test_batches)

  y_hat = model(X)
  y_hat = y_hat[:, 0]

  y_hat = torch.sigmoid(y_hat)
  y_hat = (y_hat > 0.5).float() # the prediction
  # Accuracy: fraction of predictions that match the true labels
  accuracy = (y_hat == y).float().mean()
  print("Test Accuracy:", accuracy.item())
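
Since the sigmoid is monotonic and `sigmoid(0) = 0.5`, thresholding the probability at 0.5 is the same as thresholding the raw logit at 0. A small sketch with made-up logits and labels (`demo_logits` and `demo_labels` are illustrative values, not model outputs) that also walks through the accuracy computation:

```python
import torch

demo_logits = torch.tensor([-1.5, 0.2, 3.0, -0.1])   # made-up raw model outputs
demo_labels = torch.tensor([0, 1, 0, 0])             # integer class labels

pred_from_prob = (torch.sigmoid(demo_logits) > 0.5).float()   # threshold probabilities
pred_from_logit = (demo_logits > 0).float()                   # threshold logits directly

# Accuracy: fraction of predictions equal to the labels (3 of 4 here)
demo_accuracy = (pred_from_prob == demo_labels).float().mean()

print(pred_from_prob, pred_from_logit)
print(demo_accuracy.item())   # 0.75
```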

In [None]:
y = y.float()

incorrect = y_hat != y
indices = torch.where(incorrect)[0][:4]

for idx in indices:
    img = X[idx].permute(1,2,0).numpy()
    plt.imshow(img)
    plt.title(f"True: {int(y[idx].item())}, Pred: {int(y_hat[idx].item())}")
    plt.axis('off')
    plt.show()