# Transfer Learning

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
#from torchvision import datasets, models, transforms
from torchvision.models import resnet18,ResNet18_Weights
from torchvision.transforms import v2
from torch.utils.data import DataLoader

The next cell downloads the ResNet18 model with weights pretrained on the ImageNet image classification task.  When we print the model, note the final layer, `model.fc` (fc stands for fully connected), which maps the extracted features to the 1000 ImageNet class scores.  That's the part we're going to replace with a network of our own.

In [2]:
model = resnet18(weights=ResNet18_Weights.DEFAULT)
print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Here we build our Datasets from the train and test folders (`ImageFolder` is a torchvision dataset class that builds a classification dataset from images organized with one subdirectory per class).  You will almost certainly want to add transformations to `train_transforms` once you get it working.

In [3]:
train_transforms = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Resize(size=(256,256),antialias=True)
])
test_transforms = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Resize(size=(256,256),antialias=True)
])
training_data = torchvision.datasets.ImageFolder('train',
                                                transform=train_transforms)
testing_data = torchvision.datasets.ImageFolder('test',
                                               transform=test_transforms)

In [4]:
train_loader = DataLoader(training_data, batch_size=64, shuffle=True,
                          num_workers=4)
test_loader = DataLoader(testing_data, batch_size=64, num_workers=4)

In [5]:
len(training_data),len(testing_data),training_data.classes

(1669, 553, ['cows', 'eagles', 'fish', 'goats', 'horse', 'mules'])

We want PyTorch to know it doesn't need to keep track of gradients for the pretrained portion of the network.  The easiest way to do that is to set `requires_grad` to False on every parameter, and then replace the `.fc` portion with a new network (whose parameters will by default have `requires_grad` set to True).  Note that our new `.fc` portion now has the correct number of output nodes for our classification task.

In [6]:
for param in model.parameters():
    param.requires_grad = False

In [None]:
num_feats = model.fc.in_features
model.fc=nn.Linear(num_feats,len(training_data.classes))
model = model.to('cuda')

Train it!

In [None]:
criterion = nn.CrossEntropyLoss()

# Only the new head's parameters are optimized; the rest of the network is frozen.
optimizer = optim.Adam(model.fc.parameters(), lr=.001)

EPOCHS = 40

for epoch in range(EPOCHS):
    model.train()
    totalloss = 0
    for batch, (X, y) in enumerate(train_loader):
        X = X.to('cuda')
        # The DataLoader already yields integer class labels as a LongTensor,
        # which is what CrossEntropyLoss expects, so no conversion is needed.
        y = y.to('cuda')
        predictions = model(X)
        loss = criterion(predictions, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        totalloss += loss.item()

    # Report the average train and test loss every 20 epochs.
    if epoch % 20 == 0:
        totalloss /= len(train_loader)
        print('Loss/train', totalloss, epoch)
        test_loss = 0

        # Switch to eval mode so BatchNorm uses its running statistics.
        model.eval()
        with torch.no_grad():
            for batch, (X, y) in enumerate(test_loader):
                X = X.to('cuda')
                y = y.to('cuda')
                pred = model(X)
                test_loss += criterion(pred, y).item()
            test_loss /= len(test_loader)
        print('Loss/test', test_loss, epoch)

### The assignment

Your goal here is to evaluate transfer learning with ResNet18 on a number of different datasets.  To do this, you'll need a function that outputs an accuracy measure and builds a confusion matrix.
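
One possible shape for such a function is sketched below. It is only a starting point under stated assumptions: the name `evaluate`, its signature, and the convention that rows are true classes and columns are predicted classes are all choices made here, and the demo uses a tiny linear model on random data rather than ResNet18.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, num_classes, device='cpu'):
    """Return (accuracy, confusion) where confusion[t, p] counts samples
    with true class t that were predicted as class p."""
    model.eval()
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    with torch.no_grad():
        for X, y in loader:
            preds = model(X.to(device)).argmax(dim=1).cpu()
            for t, p in zip(y.tolist(), preds.tolist()):
                confusion[t, p] += 1
    # Correct predictions sit on the diagonal.
    accuracy = confusion.diag().sum().item() / confusion.sum().item()
    return accuracy, confusion

# Tiny stand-in data and model, just to exercise the function.
torch.manual_seed(0)
demo_X = torch.randn(20, 4)
demo_y = torch.randint(0, 3, (20,))
demo_loader = DataLoader(TensorDataset(demo_X, demo_y), batch_size=8)
demo_model = nn.Linear(4, 3)

acc, cm = evaluate(demo_model, demo_loader, num_classes=3)
print(acc)
print(cm)
```

With the notebook's objects, a call might look like `evaluate(model, test_loader, len(testing_data.classes), device='cuda')`; large off-diagonal entries then point at the class pairs the model confuses.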

Build several datasets, and evaluate success.  Here are some questions to explore:

- How does adding more classes impact the performance of your learned model? Are some classes harder to differentiate from each other than others (a confusion matrix would help)?
- How does adding more layers impact the speed of convergence and the ultimate performance of the model?
- Can you find some classes that you are unable to learn particularly well?
- If you add random transformations to the training set, how does test performance change? Does it take longer to overfit?