# Convolutional Neural Networks in PyTorch

Architecture design becomes more complicated as we add in more types of layers. We can take advantage of successful architectures and their weights by importing *pretrained models*.

In this activity, we will load well-known CNN architectures and their pretrained weights, and fine-tune them for classification of the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset.

We will start by loading a ResNet CNN architecture, which was first introduced in [this paper](https://arxiv.org/abs/1512.03385). The authors introduce a small mathematical trick (residual, or "skip", connections) that allows for fewer trainable parameters than earlier CNNs, which is useful as we are all sharing GPU resources :).

## To Do:
#### 1. Take a moment to read about the model linked above. Figure 3 compares the ResNet architecture to VGG, a more standard CNN architecture. We will load in ResNet18, an 18-layer version of this model.

#### 2. In Section 3.4 note that the authors preprocessed ImageNet (RGB) data into a shape of $224 \times 224 \times 3$ pixels per image. Our first task is to map our image data into the same shape so we can use the architecture without much modification.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from torchvision import models
from torchvision.io import decode_image
from sklearn.metrics import confusion_matrix



To map our images into this shape, we can chain several transformations with the `transforms.Compose` class from `torchvision`. I've done this for you here.

*Note: I didn't find the scaling used on the images in the paper. A quick test of a couple of scalings led me to choose the one here... it may not be the best!*

In [14]:
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define transformations
transform = transforms.Compose([
    transforms.Resize(224),  # ResNet expects at least 224x224 images
    transforms.Grayscale(num_output_channels=3),  # Convert to 3-channel RGB format
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # (mean, std)
])

# Load Fashion MNIST dataset
trainset = torchvision.datasets.FashionMNIST(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True)

testset = torchvision.datasets.FashionMNIST(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False)

### 3. Visualize the images to make sure you understand the data. (E.g. what do you expect to look different from last time?)

In [18]:

# Class labels that associate with numeric labels in dataset: 0,1,2,3,4,5,6,7,8,9
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')
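
One possible way to do the visualization is sketched below. `show_batch` is a hypothetical helper (not part of `torchvision`), and it assumes the images were normalized with mean 0.5 and standard deviation 0.5 as in the transform above.

```python
import matplotlib.pyplot as plt
import torch

def show_batch(images, labels, class_names, n=8):
    """Plot the first n images of a batch with their class names as titles."""
    n = min(n, images.shape[0])
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2.5), squeeze=False)
    for ax, img, label in zip(axes[0], images[:n], labels[:n]):
        img = img * 0.5 + 0.5                   # undo Normalize((0.5,), (0.5,))
        img = img.clamp(0, 1).permute(1, 2, 0)  # (C, H, W) -> (H, W, C) for imshow
        ax.imshow(img.cpu().numpy())
        ax.set_title(class_names[int(label)], fontsize=8)
        ax.axis("off")
    fig.tight_layout()
    return fig

# Usage in the notebook (assumes trainloader and classes from the cells above):
# images, labels = next(iter(trainloader))
# show_batch(images, labels, classes)
# plt.show()
```

Compared with the raw dataset, expect each image to be upsampled to 224x224 and to carry three identical channels.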





### 4. We now load the ResNet18 model with its pretrained ImageNet weights. Look through the architecture.

Note that there is only one "Dense" or fully connected layer in this architecture!

In [16]:
model_res = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1) # need to explicitly import and load weights
print(model_res)



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

### 5. We then re-define the last layer of the model for *our* classification task.

Print your new architecture and check that it is what you expect.

In [17]:
num_features = model_res.fc.in_features # how many values going into last layer?
model_res.fc = nn.Linear(# complete me)
print(model_res)

SyntaxError: incomplete input (2235788832.py, line 3)

### 6. We can choose to train only the last layer or let all the layers fine-tune.

In [12]:
# let's start by just training the final layer:
for param in model_res.parameters():
    param.requires_grad = False
for param in model_res.fc.parameters():
    param.requires_grad = True

In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Am I using a GPU? Training is very slow without one :(
print(torch.cuda.is_available())

model = model_res.to(device)

loss_f = # complete me
optimizer = optim.Adam(model.fc.parameters(), lr=1e-4) # good starting point

# Train the model
for epoch in range(5):  # Keep it short for the activity
    model_res.train()
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)

        outputs = model_res(images)
        loss = loss_f(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Note: this reports the loss of the last mini-batch only, not an epoch average
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

True
Epoch 1, Loss: 0.6011
Epoch 2, Loss: 0.3849
Epoch 3, Loss: 0.3469
Epoch 4, Loss: 0.4370
Epoch 5, Loss: 0.3912
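
For the loss function left blank in the training cell, a common choice for multi-class classification with raw model outputs (logits) and integer labels is `nn.CrossEntropyLoss`. This is an assumption about the intended answer, not something the activity states. The sketch below uses made-up logits and labels to show what that loss computes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical raw outputs: 4 images, 10 classes
labels = torch.tensor([0, 3, 9, 1])  # hypothetical integer class labels

loss_f = nn.CrossEntropyLoss()       # assumed choice for the blank above
loss = loss_f(logits, labels)

# Equivalent manual computation: mean negative log-softmax at the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), labels].mean()

print(loss.item(), manual.item())    # the two values agree
```

Because the softmax is applied inside the loss, the model's last layer should output raw scores without its own softmax.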


### 7. Evaluate your model as before. Then, try to tune the model for increased performance.
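
A minimal evaluation sketch is given below. `evaluate` is a hypothetical helper; it reports overall accuracy and a confusion matrix using the `confusion_matrix` function imported at the top of the notebook.

```python
import numpy as np
import torch
from sklearn.metrics import confusion_matrix

def evaluate(model, loader, device, num_classes=10):
    """Return (accuracy, confusion matrix) of `model` over all batches in `loader`."""
    model.eval()           # use running BatchNorm statistics, disable dropout
    all_preds, all_labels = [], []
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            outputs = model(images.to(device))
            all_preds.append(outputs.argmax(dim=1).cpu())
            all_labels.append(labels.cpu())
    preds = torch.cat(all_preds).numpy()
    targets = torch.cat(all_labels).numpy()
    accuracy = float((preds == targets).mean())
    # Rows are true classes, columns are predicted classes
    cm = confusion_matrix(targets, preds, labels=np.arange(num_classes))
    return accuracy, cm

# Usage in the notebook (assumes model_res, testloader, device from above):
# acc, cm = evaluate(model_res, testloader, device)
# print(f"Test accuracy: {acc:.3f}")
```

The off-diagonal entries of the confusion matrix show which classes get mixed up, which can guide what to tune next.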

### 8. If you have time, you can swap out the model architecture for another pretrained model. Check the `torchvision` documentation to find all the available models.