
## Submission Notes
Group members:

Jeffrey Feng, 20704800

Joey Kuang, 20726074

# Progress and Methodology

### Progress

The first method we tried for this domain adaptation task was unsupervised domain adaptation by backpropagation (DANN) [1, 2]. The feature extractor was a vanilla convolutional neural network, and the domain and label classifiers were simple two- to three-layer dense networks. Batch normalization, ReLU, and dropout (p = 0.5) layers were used throughout the network, and the loss was negative log likelihood. We first experimented with replacing the feature extractor with pretrained state-of-the-art networks such as ResNet-34, VGG16, and EfficientNet-b0 [3, 4, 5]. We also tried larger ResNet and EfficientNet variants; EfficientNet-b4 gave the best result. We also modified the two classifiers by adding layers, but this did not improve performance, so we reverted to the original classifiers for subsequent experiments.
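
As a small illustration of the negative log likelihood loss mentioned above, the following sketch computes it from log-softmax outputs in NumPy. The scores and targets are made up for illustration; the notebook itself relies on PyTorch's `nn.LogSoftmax` and `nn.NLLLoss` rather than these helper functions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, targets):
    """Mean negative log likelihood of the target classes."""
    rows = np.arange(len(targets))
    return -log_probs[rows, targets].mean()

# Made-up scores for two samples and three classes (illustration only)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])

loss = nll_loss(log_softmax(logits), targets)
print(loss)
```

Because the classifier heads end in a log-softmax layer, applying the negative log likelihood to their outputs is equivalent to the usual cross-entropy on raw scores.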

We also experimented with other training algorithms: adversarial discriminative domain adaptation (ADDA), maximum classifier discrepancy (MCD), and DIRT-T [6, 7, 8]. None of these gave stable accuracies, nor did they achieve higher training accuracy than DANN with EfficientNet-b0. Additionally, we explored the effects of different data preprocessing. We first applied dataset-wide normalization. We also tried domain-specific normalization, but this did not significantly improve results. To make the domains look more alike, we applied edge detection to the inputs so that the network saw object contours rather than the full image content. This performed reasonably well but did not outperform training on the unmodified images. Finally, we applied data augmentation based on the winning entry of the VisDA-2017 visual domain adaptation challenge [9], which significantly improved performance over all of our experimental baselines.
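
To make the difference between dataset-wide and domain-specific normalization concrete, here is a minimal sketch of how the per-channel statistics could be computed. The function names `channel_stats` and `domain_stats` and the random stand-in images are assumptions for illustration; they are not the code we used to obtain the constants in this notebook.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std for an array shaped (N, H, W, 3) with values in [0, 1]."""
    pixels = images.reshape(-1, images.shape[-1])
    return pixels.mean(axis=0), pixels.std(axis=0)

def domain_stats(images, domain_ids):
    """Per-domain statistics as {domain_id: (mean, std)}."""
    return {int(d): channel_stats(images[domain_ids == d])
            for d in np.unique(domain_ids)}

# Random stand-in data: 8 tiny RGB images spread over 3 hypothetical domains
rng = np.random.default_rng(0)
images = rng.random((8, 4, 4, 3))
domain_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])

dataset_mean, dataset_std = channel_stats(images)   # one set of statistics
per_domain = domain_stats(images, domain_ids)       # one set per domain
```

In this notebook, per-domain values of this kind would be passed as `mus` and `stds` to `preprocess_multiple_fn`, and `GetLoader` then picks the transform that matches each sample's domain label.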

### Final Method

Our setup is adapted from the DANN GitHub repository [10]. The final method uses the DANN algorithm to fine-tune an EfficientNet-b4 feature extractor pretrained on ImageNet, together with two vanilla dense networks that classify the domain and the label. A negative log likelihood loss is computed for each classifier, and their sum is backpropagated. We apply data augmentation (random crop, rotation, scale, horizontal flip, brightness scaling, and desaturation) before dataset-wide normalization. The final input image size is 224x224 rather than the original 227x227 to accommodate the EfficientNet-b4 input size. We train with batch normalization and a batch size of 32 for 300 epochs at a learning rate of 1e-4. Training was run in the Kaggle environment.
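
The training loop in this notebook ramps the gradient-reversal weight `alpha` from 0 towards 1 over training, following the schedule in [1], and sums the three loss terms before backpropagation. The sketch below isolates that arithmetic in plain Python. The function names `grl_alpha` and `total_loss` and the parameter name `gamma` are our own labels for illustration; the training cell computes the same expressions inline.

```python
import math

def grl_alpha(epoch, step, steps_per_epoch, n_epochs, gamma=10.0):
    """Gradient-reversal weight: 0 at the start of training, close to 1 at the end."""
    # p is the fraction of training completed, in [0, 1]
    p = (step + epoch * steps_per_epoch) / (n_epochs * steps_per_epoch)
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0

def total_loss(err_s_label, err_s_domain, err_t_domain):
    """Sum of the source label, source domain, and target domain losses."""
    return err_s_label + err_s_domain + err_t_domain

# Example values for a short hypothetical run (5 epochs of 10 steps)
for epoch in (0, 2, 4):
    print(epoch, round(grl_alpha(epoch, 0, 10, 5), 4))
```

A small `alpha` early on lets the label classifier settle before the domain classifier's reversed gradient starts to shape the features.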

### References

[1] Y. Ganin and V. Lempitsky, “Unsupervised Domain Adaptation by Backpropagation,” arXiv:1409.7495 [cs, stat], Feb. 2015, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1409.7495

[2] Y. Ganin et al., “Domain-Adversarial Training of Neural Networks,” arXiv:1505.07818 [cs, stat], May 2016, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1505.07818

[3] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” arXiv:1512.03385 [cs], Dec. 2015, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1512.03385

[4] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556 [cs], Apr. 2015, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1409.1556

[5] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” arXiv:1905.11946 [cs, stat], Sep. 2020, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1905.11946

[6] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial Discriminative Domain Adaptation,” arXiv:1702.05464 [cs], Feb. 2017, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1702.05464

[7] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, “Maximum Classifier Discrepancy for Unsupervised Domain Adaptation,” arXiv:1712.02560 [cs], Apr. 2018, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1712.02560

[8] R. Shu, H. H. Bui, H. Narui, and S. Ermon, “A DIRT-T Approach to Unsupervised Domain Adaptation,” arXiv:1802.08735 [cs, stat], Mar. 2018, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1802.08735

[9] G. French, M. Mackiewicz, and M. Fisher, “Self-ensembling for visual domain adaptation,” arXiv:1706.05208 [cs], Sep. 2018, Accessed: Apr. 14, 2022. [Online]. Available: http://arxiv.org/abs/1706.05208

[10] “DANN/model.py at master · fungtion/DANN,” GitHub. https://github.com/fungtion/DANN (accessed Apr. 14, 2022).



# Import statements

In [None]:
# This version is required to use EfficientNet-b0 through b7
!pip install torchvision==0.11.1

In [None]:
import os
import pandas as pd
import numpy as np
import random
from PIL import Image
from tqdm import tqdm
from datetime import datetime

import torch
import torch.nn as nn
import torch.utils.data as data
from torch.autograd import Function
import torch.backends.cudnn as cudnn
from torchvision import transforms
from torchvision import datasets
from torchvision import models
import torch.optim as optim

# Config

In [None]:
cuda = torch.cuda.is_available()  # fall back to CPU when no GPU is present
cudnn.benchmark = True

model_name = "efficientnet_b4"

LR = 1e-4
BATCH_SIZE = 32
IMAGE_SIZE = 224
N_EPOCH = 200

# Data loader

In [None]:
dataset_root = '../input/syde522a3data/data' # path to input data uploaded to kaggle directory
output_root = './04092022_efficientnetb4_aug_run1'  # desired output pathname that will be saved under Output /kaggle/working
source_dataset_name = 'train_set'
target_dataset_name = 'test_set'

source_image_root = os.path.join(dataset_root, source_dataset_name)
target_image_root = os.path.join(dataset_root, target_dataset_name)

train_label_list = os.path.join(dataset_root, 'train_labels.csv')

os.makedirs(output_root, exist_ok=True)

In [None]:
class GetLoader(data.Dataset):
    def __init__(self, data_root, data_list=None, transform=None):
        self.root = data_root
        self.transform = transform

        # we only pass data_list if it's training set
        if data_list is not None:
            df = pd.read_csv(data_list)
            self.img_paths = df['dir'].to_list()

            if 'label2' in df.columns:
                self.img_labels = df['label2'].to_list()
            else: 
                self.img_labels = ['0' for i in range(len(self.img_paths))]

            if 'label1' in df.columns:
                self.domain_labels = df['label1'].to_list()
            else: 
                self.domain_labels = ['0' for i in range(len(self.img_paths))]
        else:
            # Walk through test folder - we don't need labels
            self.img_paths = [f for root,dirs,files in os.walk(data_root) for f in files if f.endswith('.png')]
            self.img_labels = ['0' for i in range(len(self.img_paths))]
            self.domain_labels = ['0' for i in range(len(self.img_paths))]

        self.n_data = len(self.img_paths)

    def __getitem__(self, item):
        idx = item % self.n_data
        img_path = self.img_paths[idx]
        # Cast unconditionally so labels collate as integer tensors even without a transform
        label = int(self.img_labels[idx])
        domain_label = int(self.domain_labels[idx])
        img = Image.open(os.path.join(self.root, img_path)).convert('RGB')

        if self.transform is not None:
            # A list of transforms is indexed by domain (domain-specific normalization)
            if isinstance(self.transform, list):
                tform = self.transform[domain_label]
            else:
                tform = self.transform

            img = tform(img)

        return img, label, domain_label, img_path

    def __len__(self):
        return self.n_data

In [None]:
# Preprocess data
def preprocess_multiple_fn(mus, stds):
    tforms = []

    for i in range(len(mus)):
        tforms.append(preprocess_fn(mu=mus[i], std=stds[i]))
    
    return tforms

def preprocess_fn(mu=(0.6399, 0.6076, 0.5603), std=(0.3065, 0.3082, 0.3353), aug=False):
    if aug:
        img_transform = transforms.Compose([
            transforms.RandomCrop(IMAGE_SIZE),
            transforms.RandomHorizontalFlip(0.5),
            transforms.RandomRotation(15),
            transforms.ColorJitter(brightness=0.5, saturation=0.5),
            transforms.Resize(IMAGE_SIZE),
            transforms.ToTensor(),
            transforms.Normalize(mean=mu, std=std)
        ])
    else:
        img_transform = transforms.Compose([
            transforms.Resize(IMAGE_SIZE),
            transforms.ToTensor(),
            transforms.Normalize(mean=mu, std=std)
        ])

    return img_transform

def prep_dataloader(image_root, label_list=None, img_transform=None, 
                    drop_last=False, shuffle=True):
    dataset = GetLoader(
        data_root=image_root,
        data_list=label_list,
        transform=img_transform
    )

    dataloader = data.DataLoader(
        dataset=dataset,
        batch_size=BATCH_SIZE,
        shuffle=shuffle,
        num_workers=4,
        drop_last=drop_last)
    
    return dataset, dataloader


# Model definition

In [None]:
# if False, then we are feature extracting
def set_parameter_requires_grad(model, finetune):
    for param in model.parameters():
        param.requires_grad = finetune

In [None]:
class ReverseLayerF(Function):

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha

        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        output = grad_output.neg() * ctx.alpha

        return output, None


class CNNModel(nn.Module):

    def __init__(self, model_name="resnet18"):
        super(CNNModel, self).__init__()
        self.ft_out_size = 0

        if model_name == "resnet18":
            self.feature = models.resnet18(pretrained=True) 
            self.feature.fc = nn.Identity()
            self.ft_out_size = 512
        elif model_name == "resnet50":
            self.feature = models.resnet50(pretrained=True) 
            self.feature.fc = nn.Identity()
            self.ft_out_size = 2048
        elif model_name == "vgg16":
            self.feature = models.vgg16_bn(pretrained=True) 
            self.feature.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1)) # original is (7,7)
            self.feature.classifier = nn.Identity()
            self.ft_out_size = 512
        elif model_name == "efficientnet_b4":
            self.feature = models.efficientnet_b4(pretrained=True) 
            self.feature.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1)) # already (1, 1) in torchvision; kept explicit
            self.feature.classifier = nn.Identity()
            self.ft_out_size = 1792
        else:
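            # Default feature extracting model
            # Five conv/pool stages reduce a 224x224 input to a 512x5x5 feature map
            self.ft_out_size = 512 * 5 * 5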
            self.feature = nn.Sequential()
            self.feature.add_module('f_conv1', nn.Conv2d(3, 64, kernel_size=5))
            self.feature.add_module('f_bn1', nn.BatchNorm2d(64))
            self.feature.add_module('f_pool1', nn.MaxPool2d(2))
            self.feature.add_module('f_relu1', nn.ReLU(True))
            self.feature.add_module('f_conv2', nn.Conv2d(64, 128, kernel_size=3))
            self.feature.add_module('f_bn2', nn.BatchNorm2d(128))
            self.feature.add_module('f_drop1', nn.Dropout2d())
            self.feature.add_module('f_pool2', nn.MaxPool2d(2))
            self.feature.add_module('f_relu2', nn.ReLU(True))
            self.feature.add_module('f_conv3', nn.Conv2d(128, 256, kernel_size=3))
            self.feature.add_module('f_bn3', nn.BatchNorm2d(256))
            self.feature.add_module('f_drop3', nn.Dropout2d())
            self.feature.add_module('f_pool3', nn.MaxPool2d(2))
            self.feature.add_module('f_relu3', nn.ReLU(True))
            self.feature.add_module('f_conv4', nn.Conv2d(256, 256, kernel_size=3))
            self.feature.add_module('f_bn4', nn.BatchNorm2d(256))
            self.feature.add_module('f_drop4', nn.Dropout2d())
            self.feature.add_module('f_pool4', nn.MaxPool2d(2))
            self.feature.add_module('f_relu4', nn.ReLU(True))
            self.feature.add_module('f_conv5', nn.Conv2d(256, 512, kernel_size=3))
            self.feature.add_module('f_bn5', nn.BatchNorm2d(512))
            self.feature.add_module('f_drop5', nn.Dropout2d())
            self.feature.add_module('f_pool5', nn.MaxPool2d(2))
            self.feature.add_module('f_relu5', nn.ReLU(True))

        self.class_classifier = nn.Sequential()
        self.class_classifier.add_module('c_fc1', nn.Linear(self.ft_out_size, 100))
        self.class_classifier.add_module('c_bn1', nn.BatchNorm1d(100))
        self.class_classifier.add_module('c_relu1', nn.ReLU(True))
        self.class_classifier.add_module('c_drop1', nn.Dropout())  # features are 1D here, so plain dropout applies
        self.class_classifier.add_module('c_fc2', nn.Linear(100, 100))
        self.class_classifier.add_module('c_bn2', nn.BatchNorm1d(100))
        self.class_classifier.add_module('c_relu2', nn.ReLU(True))
        self.class_classifier.add_module('c_fc3', nn.Linear(100, 7))
        self.class_classifier.add_module('c_softmax', nn.LogSoftmax(dim=1))

        self.domain_classifier = nn.Sequential()
        self.domain_classifier.add_module('d_fc1', nn.Linear(self.ft_out_size, 100))
        self.domain_classifier.add_module('d_bn1', nn.BatchNorm1d(100))
        self.domain_classifier.add_module('d_relu1', nn.ReLU(True))
        self.domain_classifier.add_module('d_fc2', nn.Linear(100, 4))
        self.domain_classifier.add_module('d_softmax', nn.LogSoftmax(dim=1))
        


    def forward(self, input_data, alpha):
        input_data = input_data.expand(input_data.data.shape[0], 3, IMAGE_SIZE, IMAGE_SIZE)
        feature = self.feature(input_data)
        feature = feature.view(-1, self.ft_out_size)
        reverse_feature = ReverseLayerF.apply(feature, alpha)
        class_output = self.class_classifier(feature)
        domain_output = self.domain_classifier(reverse_feature)

        return class_output, domain_output

# Test pipeline

In [None]:
def test(net, epoch):

    # load data
    img_transform_source = preprocess_fn()
    img_transform_target = preprocess_fn(mu=(0.9566, 0.9566, 0.9566), std=(0.1752, 0.1752, 0.1752))


    dataset_source, dataloader_source = prep_dataloader(
        image_root=os.path.join(source_image_root, 'train_set'),
        label_list=train_label_list,
        img_transform=img_transform_source,
        shuffle=False
    )

    dataset_target, dataloader_target = prep_dataloader(
        image_root=os.path.join(target_image_root, 'test_set'),
        img_transform=img_transform_target,
        shuffle=False
    )

    net.eval()

    if cuda:
        net = net.cuda()

    # alpha only scales the reversed gradient, so inference can use its default value
    train_pths, train_preds = inference(net, dataloader_source, cuda=cuda)
    train_results = pd.DataFrame({'id': train_pths, 'label': train_preds})
    train_results_pth = os.path.join(output_root, '%s_train_epoch%s.csv' % (datetime.now().strftime("%m%d%Y"), epoch))
    train_results.to_csv(train_results_pth, index=False)

    test_pths, test_preds = inference(net, dataloader_target, cuda=cuda)
    test_results = pd.DataFrame({'id': test_pths, 'label': test_preds})
    test_results_pth = os.path.join(output_root, '%s_test_epoch%s.csv' % (datetime.now().strftime("%m%d%Y"), epoch))
    test_results.to_csv(test_results_pth, index=False)

    print('epoch: %d, accuracy of the train dataset: %f' % (epoch, compare(train_label_list, train_results_pth)))

def inference(net, dataloader, cuda=True, alpha=0.0):
    preds = []
    pths = []
    # No gradients are needed at inference time; this saves memory and time
    with torch.no_grad():
        for input_img, _, _, img_paths in dataloader:

            if cuda:
                input_img = input_img.cuda()

            class_output, _ = net(input_data=input_img, alpha=alpha)
            pred = class_output.argmax(dim=1)  # index of the most likely class
            pths.extend(img_paths)
            preds.extend(pred.cpu().numpy())
    return pths, preds

def compare(true_labels, predicted_labels):
    combined_df = pd.read_csv(true_labels)
    predicted_df = pd.read_csv(predicted_labels)

    combined_df['label'] = combined_df['dir'].map(predicted_df.set_index('id')['label'])

    true_labels = np.array(combined_df['label2'].to_list())
    pred_labels = np.array(combined_df['label'].to_list())

    return np.sum(true_labels == pred_labels) / len(true_labels)

# Training pipeline

In [None]:
manual_seed = random.randint(1, 10000)
random.seed(manual_seed)
torch.manual_seed(manual_seed)

# load data
img_transform_source = preprocess_fn(aug=True)
img_transform_target = preprocess_fn(mu=(0.9566, 0.9566, 0.9566), std=(0.1752, 0.1752, 0.1752), aug=True)


dataset_source, dataloader_source = prep_dataloader(
    image_root=os.path.join(source_image_root, 'train_set'), 
    label_list=train_label_list,
    img_transform=img_transform_source
)

dataset_target, dataloader_target = prep_dataloader(
    image_root=os.path.join(target_image_root, 'test_set'),
    img_transform=img_transform_target
)

# load model
my_net = CNNModel(model_name=model_name)

# setup optimizer
optimizer = optim.Adam(my_net.parameters(), lr=LR)

loss_class = torch.nn.NLLLoss()
loss_domain = torch.nn.NLLLoss()

if cuda:
    my_net = my_net.cuda()
    loss_class = loss_class.cuda()
    loss_domain = loss_domain.cuda()

set_parameter_requires_grad(my_net, True)

# training
for epoch in range(N_EPOCH):

    len_dataloader = max(len(dataloader_source), len(dataloader_target))
    data_source_iter = iter(dataloader_source)
    data_target_iter = iter(dataloader_target)

    i = 0
    while i < len_dataloader:

        p = float(i + epoch * len_dataloader) / N_EPOCH / len_dataloader
        alpha = 2. / (1. + np.exp(-10 * p)) - 1

        # training model using source data
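        # Restart the source loader if it is the shorter of the two
        if i > 0 and i % len(dataloader_source) == 0:
            data_source_iter = iter(dataloader_source)
        data_source = next(data_source_iter)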
        s_img, s_label, s_domain_label, _ = data_source

        my_net.zero_grad()

        # The loader already yields tensors of the right shape and dtype,
        # so they only need to be moved to the GPU
        input_img, class_label, domain_label = s_img, s_label, s_domain_label

        if cuda:
            input_img = input_img.cuda()
            class_label = class_label.cuda()
            domain_label = domain_label.cuda()

        class_output, domain_output = my_net(input_data=input_img, alpha=alpha)
        err_s_label = loss_class(class_output, class_label)
        err_s_domain = loss_domain(domain_output, domain_label)

        # training model using target data
        # Restart the target loader every time it is exhausted, not just once
        if i > 0 and i % len(dataloader_target) == 0:
            data_target_iter = iter(dataloader_target)
        data_target = next(data_target_iter)
        t_img, _, _, _ = data_target

        batch_size = len(t_img)

        # All target samples share domain label 3
        domain_label = torch.full((batch_size,), 3, dtype=torch.long)
        input_img = t_img

        if cuda:
            input_img = input_img.cuda()
            domain_label = domain_label.cuda()

        _, domain_output = my_net(input_data=input_img, alpha=alpha)
        err_t_domain = loss_domain(domain_output, domain_label)
        err = err_t_domain + err_s_domain + err_s_label
        err.backward()
        optimizer.step()

        i += 1

        print('epoch: %d, [iter: %d / all %d], err_s_label: %f, err_s_domain: %f, err_t_domain: %f' \
            % (epoch, i, len_dataloader, err_s_label.cpu().data.numpy(),
                err_s_domain.cpu().data.numpy(), err_t_domain.cpu().data.numpy()))

    test(my_net, epoch)
    my_net.train()

print('done')

In [None]:
# Use the same path as output_root
!zip -r ./04092022_efficientnetb4_aug_run1.zip ./04092022_efficientnetb4_aug_run1/