<a href="https://colab.research.google.com/github/ricglz/CE888_activities/blob/main/assignment/Project_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
! pip install torch torchvision skorch timm



## Preparations

Before we begin, let's mount Google Drive so that we can read data from it later:

---



In [None]:
from google.colab import drive

drive_path = '/content/gdrive'
drive.mount(drive_path, force_remount=False)
drive_path += '/MyDrive'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
import torch
import random
import numpy as np

seed = 42
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)
torch.backends.cudnn.deterministic = True

## The Problem

We are going to train a neural network to classify images from the Flame dataset as **Fire** or **No_Fire**. The `Training` folder contains 80,922 images and the `Test` folder contains 8,617 images. First we create the training and test datasets:

In [None]:
import torchvision.transforms as T
from os import path

data_dir = path.join(drive_path, 'Flame')
resize = T.Resize((254, 254))
normalize = T.Normalize([0.485, 0.456, 0.406], 
                         [0.229, 0.224, 0.225])

In [None]:
train_transforms = T.Compose([
  resize,
  T.RandomHorizontalFlip(),
  T.RandomVerticalFlip(),
  T.ToTensor(),
  normalize
])
transforms = T.Compose([
  resize,
  T.ToTensor(),
  normalize
])

In [None]:
import torchvision.datasets as datasets
train_ds = datasets.ImageFolder(path.join(data_dir, 'Training'),
                                train_transforms)
len(train_ds)

80922

In [None]:
test_ds = datasets.ImageFolder(path.join(data_dir, 'Test'), transforms)
len(test_ds)

8617

The training dataset is augmented with random horizontal and vertical flips after every image is resized to 254×254. Both the training and test datasets are normalized with mean `[0.485, 0.456, 0.406]` and standard deviation `[0.229, 0.224, 0.225]`. These values are the per-channel means and standard deviations of the ImageNet images. We use them because the pretrained models were trained on ImageNet.
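
To make the normalization step concrete, here is a minimal sketch of the arithmetic that `T.Normalize` applies to each channel, written in plain Python. The helper `normalize_pixel` and the mid-grey pixel are hypothetical and exist only for illustration; the notebook itself relies on `torchvision`.

```python
# Per-channel ImageNet statistics, the same values passed to T.Normalize above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


# Hypothetical mid-grey pixel, scaled to [0, 1] the way T.ToTensor() scales 8-bit images.
pixel = [128 / 255] * 3
print([round(v, 3) for v in normalize_pixel(pixel)])
```

A pixel equal to the channel means maps to zero in every channel, which is why the normalized inputs are roughly centred for a network pretrained with these statistics.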

## Loading pretrained model

We use a pretrained model from `timm`, either `rexnet_200` or `tf_efficientnet_b8`, with its classification head replaced by a fully connected layer that has a single output:

In [None]:
from torch import load, FloatTensor
from torch.nn import Linear, Module
import timm

f_params = None

class PretrainedModel(Module):
    def __init__(self, model='rexnet'):
        super().__init__()
        model_name = self.get_model_name(model)
        self.model = timm.create_model(model_name, pretrained=True, num_classes=1)
        # if use_pretrained:
        #    self.model.load_state_dict(self.get_state_dict())
    
    def get_state_dict(self):
        remove_model_prefix = lambda string: string[6:]
        return { remove_model_prefix(k): v for k, v in load(f_params).items() }
      
    def get_model_name(self, general_model):
        return 'rexnet_200' if general_model == 'rexnet' else \
               'tf_efficientnet_b8' if general_model == 'efficientnet' else ''

    def forward(self, x):
        return self.model(x).squeeze(-1)

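Since we are training a binary classifier, the final fully connected layer has a single output (`num_classes=1`): one logit per image, which `forward` squeezes into a vector with one entry per image in the batch.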
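
A single logit is turned into a class decision by passing it through a sigmoid and comparing the result with a threshold. The sketch below illustrates that idea in plain Python; the helper names and the example logits are hypothetical, and the threshold of 0.5 is assumed to match the default of skorch's `NeuralNetBinaryClassifier`.

```python
import math


def sigmoid(logit):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def predict_label(logit, threshold=0.5):
    """Return 1 if the probability of class 1 exceeds the threshold, else 0."""
    return int(sigmoid(logit) > threshold)


# Hypothetical logits, as the model might emit for three images.
for logit in (-2.0, 0.0, 3.0):
    print(logit, round(sigmoid(logit), 4), predict_label(logit))
```

Because `ImageFolder` assigns class indices in alphabetical order of the folder names, class 1 would correspond to `No_Fire` if the folders are named as in the confusion-matrix labels used later.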

## Defining the API

---

### Callbacks

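Every model uses the same callbacks: a `Freezer` that freezes all parameters outside the classification head, an `EarlyStopping` callback with a patience of 3 epochs, and a `StepLR` learning-rate scheduler that multiplies the learning rate by 0.9 after every epoch. A `ProgressBar` is also created, but it is not passed to the classifier in `create_model`.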

In [None]:
from skorch.callbacks import EarlyStopping, Freezer, LRScheduler, ProgressBar

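# Freezer freezes every parameter whose name satisfies the predicate, so the
# predicate selects the backbone: everything except the classification head.
is_backbone_param = lambda x: not x.startswith('model.fc') and \
                              not x.startswith('model._fc') and \
                              not x.startswith('model.head') and \
                              not x.startswith('model.classifier')
freezer = Freezer(is_backbone_param)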
early_stopping = EarlyStopping(patience=3)
scheduler = LRScheduler(policy='StepLR', gamma=9e-1, step_size=1)
progress_bar = ProgressBar()
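
The name-based predicate passed to `Freezer` can be checked in isolation. The sketch below reproduces the same prefix test in plain Python and applies it to a few parameter names; the names in `example_names` are hypothetical and are not read from an actual `timm` model.

```python
# Prefixes that identify the classification head in the wrapped models.
HEAD_PREFIXES = ('model.fc', 'model._fc', 'model.head', 'model.classifier')


def is_frozen(param_name):
    """True for parameters outside the head, i.e. those that get frozen."""
    return not param_name.startswith(HEAD_PREFIXES)


# Hypothetical parameter names, for illustration only.
example_names = [
    'model.stem.conv.weight',
    'model.features.3.conv.weight',
    'model.head.fc.weight',
    'model.classifier.bias',
]
for name in example_names:
    print(f'{name}: {"frozen" if is_frozen(name) else "trainable"}')
```

Only the head stays trainable, so each epoch updates a small number of weights while the pretrained backbone acts as a fixed feature extractor.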

### Helper functions classifier

The following code defines helper functions to create, fit, and evaluate different CNN architectures:

In [None]:
from torch import float64
from skorch.classifier import NeuralNetBinaryClassifier
from skorch.utils import to_tensor, to_numpy
import sklearn.metrics as sk_metrics 
import numpy as np

class MyClassifier(NeuralNetBinaryClassifier):
    def infer(self, x, **fit_params):
        x = to_tensor(x, device=self.device)
        if isinstance(x, dict):
            x_dict = self._merge_x_and_fit_params(x, fit_params)
            return self.module_(**x_dict).to(device=self.device, dtype=float64)
        return self.module_(x, **fit_params).to(device=self.device, dtype=float64)

    def train_step_single(self, Xi, yi, **fit_params):
        self.module_.train()
        y_pred = self.infer(Xi, **fit_params)
        yi = yi.to(device=self.device, dtype=float64)
        loss = self.get_loss(y_pred, yi, X=Xi, training=True)
        loss.backward()
        return { 'loss': loss, 'y_pred': y_pred }

    def validation_step(self, Xi, yi, **fit_params):
        self.module_.eval()
        y_pred = self.infer(Xi, **fit_params)
        yi = yi.to(device=self.device, dtype=float64)
        loss = self.get_loss(y_pred, yi, X=Xi, training=False)
        return { 'loss': loss,'y_pred': y_pred }

    def _get_y_values(self, X):
        # Collect the true labels and the hard (0/1) predictions for a dataset.
        y_true, y_pred = [], []
        nonlinearity = self._get_predict_nonlinearity()
        self.module_.eval()
        # Gradients are not needed for evaluation, so skip tracking them.
        with torch.no_grad():
            for images, labels in self.get_iterator(X):
                images = images.to(self.device)
                outputs = nonlinearity(self.module_(images))
                _, predicted = torch.max(outputs, 1)
                y_true.append(to_numpy(labels))
                y_pred.append(to_numpy(predicted))
        y_true = np.concatenate(y_true)
        y_pred = np.concatenate(y_pred)
        return y_true, y_pred

    def score(self, X, y=None):
        y_true, y_pred = self._get_y_values(X)
        return sk_metrics.roc_auc_score(y_true, y_pred)
    
    def scores(self, X, y=None):
        y_true, y_pred = self._get_y_values(X)
        accuracy = sk_metrics.accuracy_score(y_true, y_pred)
        confusion_matrix = sk_metrics.confusion_matrix(y_true, y_pred)
        f1 = sk_metrics.f1_score(y_true, y_pred)
        auc = sk_metrics.roc_auc_score(y_true, y_pred)
        return accuracy, confusion_matrix, f1, auc 
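
To show what `scores` reports, here is a minimal plain-Python sketch that computes accuracy, the confusion matrix, and the F1 score from hard predictions. The function `binary_scores` and the toy labels are hypothetical; the notebook itself uses `sklearn.metrics`, whose confusion matrix has the same layout (rows are true classes, columns are predicted classes).

```python
def binary_scores(y_true, y_pred):
    """Return accuracy, [[tn, fp], [fn, tp]] and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, [[tn, fp], [fn, tp]], f1


# Toy labels, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
accuracy, confusion, f1 = binary_scores(y_true, y_pred)
print(accuracy, confusion, f1)
```

Note that `score` and `scores` compute the AUC from these hard predictions rather than from predicted probabilities, so the value reflects a single operating point.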

In [None]:
from torch.optim import Adam
from skorch.callbacks import Checkpoint
from skorch.dataset import CVSplit

def create_model(module_model):
    global f_params

    f_params = path.join(drive_path, f'Models/best_{module_model}.pt')
    checkpoint = Checkpoint(f_params=f_params, monitor='valid_acc_best')
    callbacks = [checkpoint, freezer, early_stopping, scheduler]
    lr = 2e-3

    return MyClassifier(
        PretrainedModel,
        module__model=module_model,
        optimizer=Adam,
        lr=lr,
        batch_size=28,
        max_epochs=10,
        iterator_train__shuffle=True,
        iterator_train__num_workers=16,
        iterator_valid__shuffle=True,
        iterator_valid__num_workers=16,
        train_split=CVSplit(0.2, random_state=seed),
        callbacks=callbacks,
        device='cuda'
    )

In [None]:
def create_and_fit(model_name):
    net = create_model(model_name)
    net.fit(train_ds, y=None)
    return net

In [None]:
def print_and_plot_scores(net):
    accuracy, confusion_matrix, f1, auc = net.scores(test_ds, y=None)
    print(f'Accuracy: {accuracy}')
    print(f'F1 Score: {f1}')
    print(f'AUC: {auc}')
    disp = sk_metrics.ConfusionMatrixDisplay(
      confusion_matrix, display_labels=['Fire', 'No_Fire'])
    disp.plot()

That is quite a few parameters! Let's walk through each one:

1. `PretrainedModel`: Our module class, which wraps a pretrained `timm` model.
2. `module__model`: Passed to `__init__` of `PretrainedModel` to select the architecture (`'rexnet'` or `'efficientnet'`).
3. `optimizer`: Our optimizer, `Adam`.
4. `lr`: Initial learning rate.
5. `batch_size`: Size of a batch.
6. `max_epochs`: Maximum number of epochs to train.
7. `iterator_{train,valid}__{shuffle,num_workers}`: Parameters that are passed to the dataloaders.
8. `train_split`: A `CVSplit` that holds out 20% of the training data for validation.
9. `callbacks`: Our callbacks.
10. `device`: Set to `cuda` to train on the GPU.

The loss function is not set explicitly, so the default criterion of `NeuralNetBinaryClassifier`, `BCEWithLogitsLoss`, is used.
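
Several of the parameters above use a double underscore to route a value to a component, for example `module__model` reaching `PretrainedModel.__init__` as `model`. The sketch below illustrates that naming convention in plain Python. It is not skorch's implementation; `split_prefixed_kwargs` and the `settings` dictionary are hypothetical.

```python
def split_prefixed_kwargs(kwargs, prefix):
    """Collect the `prefix__name` entries of kwargs as {name: value}."""
    marker = prefix + '__'
    return {key[len(marker):]: value
            for key, value in kwargs.items()
            if key.startswith(marker)}


# A subset of the settings used in create_model, for illustration only.
settings = {
    'module__model': 'rexnet',
    'iterator_train__shuffle': True,
    'iterator_train__num_workers': 16,
    'lr': 2e-3,
}
print(split_prefixed_kwargs(settings, 'module'))
print(split_prefixed_kwargs(settings, 'iterator_train'))
```

Under this convention the module receives `model='rexnet'` and the training dataloader receives `shuffle=True` and `num_workers=16`, while unprefixed settings such as `lr` stay with the classifier itself.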

Now we are ready to train our neural network:

## ReXNet model

----

In [None]:
rexnet = create_and_fit('rexnet')

  epoch    train_loss    valid_acc    valid_loss    cp      lr        dur
-------  ------------  -----------  ------------  ----  ------  ---------
      1        0.3469       0.9665        0.1197     +  0.0020  2425.1572
      2        0.1279       0.9781        0.0741     +  0.0018   700.8887
      3        0.0905       0.9841        0.0559     +  0.0016   699.2225
      4        0.0727       0.9854        0.0468     +  0.0015   697.6781
      5        0.0618       0.9873        0.0408     +  0.0013   699.2267
      6        0.0584       0.9896        0.0370     +  0.0012   695.9418
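
The `lr` column in the log is consistent with a step decay that starts at `2e-3` and multiplies the learning rate by `0.9` after every epoch, as configured with `gamma=9e-1` and `step_size=1`. The sketch below reproduces that schedule in plain Python; `step_lr` is a hypothetical helper, not the PyTorch scheduler itself.

```python
def step_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate used in a given 1-based epoch under a step decay."""
    return initial_lr * gamma ** ((epoch - 1) // step_size)


# Learning rates for the six epochs shown in the log, rounded like the log.
schedule = [round(step_lr(2e-3, 0.9, 1, epoch), 4) for epoch in range(1, 7)]
print(schedule)
```

With these settings the learning rate has dropped to roughly 60% of its initial value by the sixth epoch.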


In [None]:
print_and_plot_scores(rexnet)

## EfficientNet

----

In [None]:
efficientnet = create_and_fit('efficientnet')

In [None]:
print_and_plot_scores(efficientnet)