# Weighted Ensemble Method

Models used (for the specific setup see file "models.py"):
- Pretrained ResNet18, trained entire model
- Non-pretrained ResNet18, trained entire model
- Pretrained AlexNet, trained entire model
- Pretrained SqueezeNet, trained entire model

Each model is run in a separate cell. Two cells below this one we set all hyperparameters to the values that the hyperparameter optimization on the Weights & Biases platform found to be best. Because the models differ in accuracy and ROC-AUC, we assign each model a weight that depends on its relative accuracy; more on this in the last cell. The main idea behind such a weighted ensemble is generalization: a bias that is present in one model but not in the others is dampened, because each model has only a limited influence on the final predictions.
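
To make the idea concrete, here is a minimal sketch of a weighted average over softmax outputs. The helper `weighted_average` and the toy arrays are hypothetical illustrations; the notebook itself uses `preds_average` from "utils.py", whose implementation is not shown here and may differ.

```python
import numpy as np

def weighted_average(preds, weights):
    """Weighted mean of per-model softmax outputs (hypothetical helper)."""
    # Stack to shape (n_models, n_samples, n_classes)
    stacked = np.stack([np.asarray(p, dtype=float) for p in preds])
    # Normalise the weights so the result is still a probability distribution
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the model axis: result has shape (n_samples, n_classes)
    return np.tensordot(w, stacked, axes=1)

# Toy example: two models, two samples, two classes (made-up numbers)
model_a = np.array([[0.9, 0.1], [0.4, 0.6]])
model_b = np.array([[0.2, 0.8], [0.3, 0.7]])

ensemble = weighted_average([model_a, model_b], [3, 1])
labels = ensemble.argmax(axis=1)  # predicted class per sample
```

Because the weights are normalised, only their ratios matter: weights of `[3, 1]` and `[6, 2]` give the same ensemble.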

Three of the four models were pretrained on ImageNet, which might introduce biases present in that dataset. To reduce these biases, we used a non-pretrained ResNet18 as the fourth model.

At the end of the notebook we load, from csv files, all of the softmax predictions that led to the final submission. These softmax predictions are the outputs of the models described above.
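
As a rough sketch of what storing and reloading softmax outputs can look like, the snippet below round-trips a small array through a csv file. The helpers `save_softmax` and `load_softmax` are hypothetical, and the layout (one row per image, one column per class, no header) is an assumption; the actual format written by `preds_to_file` and read by `preds_from_file` in "utils.py" is not shown in this notebook.

```python
import os
import tempfile

import numpy as np

def save_softmax(preds, path):
    """Write softmax outputs as plain csv (assumed layout: rows = images, columns = classes)."""
    np.savetxt(path, np.asarray(preds, dtype=float), delimiter=",")

def load_softmax(path):
    """Read softmax outputs back as a 2D array, even if the file holds a single row."""
    return np.loadtxt(path, delimiter=",", ndmin=2)

# Made-up softmax outputs for two images and three classes
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "softmax_example.csv")
    save_softmax(probs, path)
    restored = load_softmax(path)
```

Saving the full softmax vectors rather than the argmax labels is what makes it possible to re-weight the ensemble later without retraining any model.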

Note that we initially tried to include a GoogLeNet as well, but we removed it because it performed badly.

In [None]:
"""
Imports
"""

import numpy as np

from sklearn.model_selection import StratifiedShuffleSplit
from livelossplot import PlotLosses

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision import models, datasets, transforms
from torch.optim.lr_scheduler import StepLR

from pycm import *
from utils import *
from datasets import *
from models import *

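# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'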

In [None]:
"""

Optimal hyperparameters as found by the hyperparameter optimization on the Weights & Biases platform.

"""
alex_batch_size = 64
res_batch_size = 64
squeeze_batch_size = 128

res_momentum = 0.8
squeeze_momentum = 0.9

alex_epochs = 40
res_epochs = 30
squeeze_epochs = 30

res_weight_decay = 1e-2
squeeze_weight_decay = 0.001

alex_lr = 0.0005
res_lr = 1e-2
squeeze_lr = 1e-2

set_seed(42)

In [None]:
"""

Setting up the datasets and transforms. 

These are just the basic transforms, different models might use different transforms.

"""

# Optimal transforms as found by data augmentation tests - see presentation and other notebook for more details
train_transform = transforms.Compose([
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.5094, 0.5094, 0.5094], [0.2314, 0.2314, 0.2314]),
                                      transforms.RandomHorizontalFlip(p=0.1),
                                      transforms.RandomRotation(10),
                                      transforms.Resize(224)
                                     ])

# Test transforms should only include 'preprocessing', no data augmentation
test_transform = transforms.Compose([
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.5094, 0.5094, 0.5094], [0.2314, 0.2314, 0.2314]),
                                     transforms.Resize(224)
                                    ])

covid_train_full = datasets.ImageFolder('xray-data/train', transform=train_transform)
covid_test = TestDataSet('xray-data/test', transform=test_transform)
test_loader = DataLoader(covid_test, batch_size=1, shuffle=False, num_workers=1, drop_last=False)

In [None]:
"""

AlexNet

"""

# Instantiate the dataloader
train_loader = DataLoader(covid_train_full, batch_size=alex_batch_size, shuffle=True, num_workers=6)

# Loading in the model
model = CustomAlexnet().to(device)

# Setting up optimizer and loss function, Adam turned out to be best for AlexNet
optimizer = torch.optim.Adam(model.parameters(), lr=alex_lr)
criterion = nn.CrossEntropyLoss()

# Prepare model for training
model.train()

liveloss = PlotLosses()
# Training subroutine, training on full dataset
for epoch in range(alex_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader, device=device, imshape=(-1, 3, 224, 224))

    # Update the logs for the training data
    logs['' + 'log loss'] = train_loss.item()
    logs['' + 'accuracy'] = train_accuracy.item()

    liveloss.update(logs)

    liveloss.draw()

# Get predictions (softmax)
preds_alex = predict(model, test_loader, max=False, imshape=(-1, 3, 224, 224), device=device)

# Save predictions to a file
preds_to_file(preds_alex, "preds_alex")

# Deleting the model to free up GPU memory
del model

In [None]:
"""

SqueezeNet

"""

# Instantiate the dataloader
train_loader = DataLoader(covid_train_full, batch_size=squeeze_batch_size, shuffle=True, num_workers=6)

# Loading in the model
model = CustomSqueezenet().to(device)

# Setting up optimizer and loss function, we included adaptive learning rate. We investigated the effects of this
# adaptive learning rate, and in the end decided to just use a very small learning rate for the final submission
# which achieved the same thing. We left the adaptive learning rate in here for presentation purposes.
optimizer = torch.optim.SGD(model.parameters(), lr=squeeze_lr, weight_decay=squeeze_weight_decay, momentum=squeeze_momentum)
criterion = nn.CrossEntropyLoss()
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

# Prepare model for training
model.train()

liveloss = PlotLosses()
# Training subroutine, training on full dataset
for epoch in range(squeeze_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader, device=device, imshape=(-1, 3, 224, 224))

    # Advance the learning rate schedule once per epoch
    scheduler.step()

    # Update the logs for the training data
    logs['' + 'log loss'] = train_loss.item()
    logs['' + 'accuracy'] = train_accuracy.item()

    liveloss.update(logs)

    liveloss.draw()
    
# Get predictions (softmax)
preds_squeeze = predict(model, test_loader, max=False, imshape=(-1, 3, 224, 224), device=device)

# Save predictions to a file
preds_to_file(preds_squeeze, "preds_squeeze")

# Deleting the model to free up GPU memory
del model
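
The step schedule used in the SqueezeNet cell above (`step_size=5`, `gamma=0.5`, stepped once per epoch) can be written out in plain Python to see which learning rate each epoch uses. This is a sketch of the closed-form rule only, not a replacement for `StepLR`; the function name `step_lr` is hypothetical.

```python
def step_lr(base_lr, epoch, step_size=5, gamma=0.5):
    """Learning rate used during a 0-indexed epoch when the scheduler steps once per epoch."""
    return base_lr * gamma ** (epoch // step_size)

# Values taken from the SqueezeNet setup: squeeze_lr = 1e-2, squeeze_epochs = 30
schedule = [step_lr(1e-2, epoch) for epoch in range(30)]

# Print the learning rate at the start of each 5-epoch block
for epoch in range(0, 30, 5):
    print(f"epochs {epoch:2d}-{epoch + 4:2d}: lr = {schedule[epoch]:.6f}")
```

The rate is halved five times over the 30 epochs, so the last block trains at 1/32 of the initial learning rate.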

In [None]:
"""

ResNet

Note that, for presentation purposes, we only include the pretrained version of the ResNet that we used for the
final results. It is probably not interesting to have two of the exact same models in here.

"""

# The resnet uses a single input channel, which is why the transforms are a little different
train_transform = transforms.Compose([
                                      transforms.ToTensor(),
                                      transforms.Grayscale(num_output_channels=1),
                                      transforms.Normalize([0.5094], [0.2314]),
                                      transforms.RandomHorizontalFlip(p=0.1),
                                      transforms.RandomRotation(10),
                                      transforms.Resize(224)
                                     ])

test_transform = transforms.Compose([
                                     transforms.ToTensor(),
                                     transforms.Grayscale(num_output_channels=1),
                                     transforms.Normalize([0.5094], [0.2314]),
                                     transforms.Resize(224)
                                    ])

# Setting the transforms on the data
covid_train_full.transform = train_transform
covid_test.transform = test_transform

# Making the dataloaders again
train_loader = DataLoader(covid_train_full, batch_size=res_batch_size, shuffle=True, num_workers=6)
test_loader = DataLoader(covid_test, batch_size=1, shuffle=False, num_workers=1, drop_last=False)


model = CustomResnet().to(device)

# Setting up optimizer and loss function, the ResNet is trained with SGD with momentum
optimizer = torch.optim.SGD(model.parameters(), lr=res_lr, momentum=res_momentum)  # define an optimiser
criterion = nn.CrossEntropyLoss()

# Prepare model for training
model.train()

liveloss = PlotLosses()
# Training subroutine, training on full dataset
for epoch in range(res_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader, device=device, imshape=(-1, 1, 224, 224))

    # Update the logs for the training data
    logs['' + 'log loss'] = train_loss.item()
    logs['' + 'accuracy'] = train_accuracy.item()

    liveloss.update(logs)

    liveloss.draw()

# Get predictions (softmax)
preds_res = predict(model, test_loader, max=False, imshape=(-1, 1, 224, 224), device=device)

# Save predictions to a file
preds_to_file(preds_res, "preds_res")

# Deleting the model to free up GPU memory
del model

In [None]:
"""

Execute this cell if the predictions are no longer among the kernel's variables.

THIS CELL WILL NOT RUN IF THE CELLS ABOVE HAVE NOT BEEN RUN

"""

preds_alex = preds_from_file("preds_alex")
preds_squeeze = preds_from_file("preds_squeeze")
preds_res = preds_from_file("preds_res")

In [None]:
"""

Get a weighted average of predictions by different models

THIS CELL WILL NOT RUN IF THE CELLS ABOVE HAVE NOT BEEN RUN

"""

preds_average([preds_alex, preds_res, preds_squeeze], [1.2, 1, 2]) # The predictions and the respective weights

In [None]:
"""

For reproducibility of the final results, here are the softmax outputs of the models described above. Run this cell
to get our predictions for our best submission. No need to run above cells before this, except for the imports.

- "softmax_alexnet.csv": Pretrained AlexNet
- "softmax_resnet18_1.csv": Non-pretrained ResNet18
- "softmax_resnet18_2.csv": Pretrained ResNet18
- "softmax_squeezenet.csv": Pretrained SqueezeNet

"""

preds_alex = preds_from_file("predictions/softmax_alexnet.csv")
preds_squeeze = preds_from_file("predictions/softmax_squeezenet.csv")
preds_res_1 = preds_from_file("predictions/softmax_resnet18_1.csv")
preds_res_2 = preds_from_file("predictions/softmax_resnet18_2.csv")

# Why did we choose these weights? 
# 1. Trial and error based on accuracies in validation sets
# 2. Intuition based on what we wanted to achieve. SqueezeNet and AlexNet had low generalization error. Resnets
#    had more generalization error but achieved higher accuracies. The ResNet18_1 was mostly there to relieve biases
#    from the pretrained models.
preds_new = preds_average([preds_res_1, preds_alex, preds_squeeze, preds_res_2], [0.5, 5, 5, 0.1])

# Our team dropped one place on the private leaderboard relative to the public leaderboard. This might
# indicate that these weights were overfit to the public part of the test set.
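
To see how much influence each model has under the final weights, the weights can be expressed as shares of their total. The sketch below assumes the ensemble depends only on the ratios between the weights, which holds for any normalised weighted average; whether `preds_average` normalises internally is not shown here. The model labels follow the file description above.

```python
# Final weights, in the same order as the preds_average call above
weights = {
    "ResNet18 (non-pretrained)": 0.5,
    "AlexNet": 5,
    "SqueezeNet": 5,
    "ResNet18 (pretrained)": 0.1,
}

total = sum(weights.values())
shares = {name: w / total for name, w in weights.items()}

for name, share in shares.items():
    print(f"{name:28s} {share:6.1%}")
```

AlexNet and SqueezeNet together account for roughly 94% of the total weight, so the two ResNets mainly act as tie-breakers on samples where the other two models are uncertain or disagree.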