# **Deep Learning Assignment 2022**

## Unsupervised Domain Adaptation

Student team:
*   Laiti Francesco
*   Lobba Davide

---

## Introduction

In this notebook we build, train and evaluate two deep learning frameworks for Unsupervised Domain Adaptation (UDA) and compare them against a baseline. For this assignment we use a UDA benchmark consisting of two domains, Product $P$ and Real World $RW$, each treated in turn as the source domain and as the target domain. The aim of this project is to "propose a UDA technique to counteract the negative impact of the domain gap when training the model on the source distribution and evaluating it on the target distribution".

### Dataset

We use the [Adaptiope](https://drive.google.com/file/d/1FmdsvetC0oVyrFJ9ER7fcN-cXPOWx2gq/view) object recognition dataset, composed of 3 distinct domains: synthetic, product, and real world. The original dataset has 123 object categories for each domain, but for this assignment we use only 20 randomly chosen categories. We use the standard 80%/20% train/test split, as suggested in the assignment.

### Method choices

We decided to implement two different approaches to investigate the results and the performances on the Adaptiope dataset with respect to a baseline.

For the delivery we implemented:
- A baseline trained on the source domain and evaluated on the target domain **without** any domain alignment strategy, which gives the lower bound for the accuracy (the upper bound comes from training and testing on the same domain);
- The method proposed in [Maximum Classifier Discrepancy for Unsupervised Domain Adaptation (MCD_DA)](https://arxiv.org/abs/1712.02560), released in 2018 by Saito et al.;
- The method proposed in [Deep Subdomain Adaptation Network for Image Classification (DSAN)](https://arxiv.org/abs/2106.09388), released in 2021 by Zhu et al.

For all these experiments we use ResNet18 as the backbone model because of computational restrictions.

### Experiments

For each approach we trained and tested the methods in both directions, one at a time.
- Regarding the baseline, we trained on the $P$ training set and tested on the $RW$ test set. We compute the accuracy to obtain the lower bound for the domain adaptation task, and vice versa for $RW$ to $P$. We also trained the baseline on each domain and tested it on the test set of the same domain to get the upper bound accuracy;
- Regarding the second method, MCD_DA, we proceeded in a similar way but without computing the upper bound accuracy;
- For the last method, DSAN, we conducted the same experiments as for MCD_DA.

In the notebook we declare a section for the global constants, valid for every method. Each approach then has its own constants section that matches the settings (epochs, learning rate, optimizer...) of the method under consideration.

### Requirements

To load the dataset, we use the built-in function of the `google.colab` library to link the personal Drive to the Google Colab virtual machine, and the `rsync` tool to show a progress bar while the archive is copied. We expect users to adjust the dataset path according to where the file is saved in their own Drive. The file must be named `Adaptiope.zip`, which is the default name of the compressed folder.

We added a path to save the best model weights for each approach.

> Note that the entire workspace folder of this project is available [here](https://drive.google.com/drive/folders/1yyg4pHmEk3Jyc3T9xVX8M6z5nHpdpnhA?usp=sharing). You can simply create an alias or create a copy of the folder in your GDrive.

## Initialization

In this section we define the requirements to run the notebook correctly and to load the Adaptiope dataset properly.

To keep track of model performances and create charts regarding loss and accuracy, we use [WandB](https://wandb.ai/dlfl/projects).

In [None]:
!pip3 install wandb -qU

### Import libraries

We import the necessary Python libraries.

In [None]:
import os
import shutil
from tqdm import tqdm
import yaml
import copy
import math

import torch
import torchvision.transforms as T
import torchvision
import torch.nn.functional as F

import numpy as np
import matplotlib.pyplot as plt

import wandb
wandb.login()

### Declare global constants

We declare the global constants used in this notebook valid for all the methods.

In [None]:
# dataset settings
NUM_CLASSES = 20 # subset of classes requested by the assignment
CLASSES = ["backpack", "bookcase", "car jack", "comb", "crown", "file cabinet", "flat iron", "game controller", "glasses",
           "helicopter", "ice skates", "letter tray", "monitor", "mug", "network switch", "over-ear headphones", "pen",
           "purse", "stand mixer", "stroller"]
BATCH_SIZE = 256
SPLIT_RATIO = 0.8 # 80/20, as requested by the assignment
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# resnet18 parameters
WEIGHTS_RESNET18 = "IMAGENET1K_V1"  # reference: https://pytorch.org/vision/stable/models.html 

# env
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# wandb setup
WANDB_MODE = "disabled" # "online" to enable WandB
PROJECT_NAME = "DL_UDA_2022"
ENTITY = "dlfl"

# paths
DIRECTORY_P = "/content/adaptiope_small/product_images"
DIRECTORY_RW = "/content/adaptiope_small/real_life"
WEIGHTS_PATH = "/content/weights/"
ASSETS_PATH = "/content/visualizer/"

# for reproducibility
g = torch.Generator()
g.manual_seed(0)

### Mount the drive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

### Prepare the Adaptiope dataset

We now download the dataset on the virtual machine and then create a subset of the dataset as requested by the assignment. Code adapted from the one provided.

In [None]:
!mkdir dataset
!rsync --info=progress "gdrive/My Drive/DL_UDA_2022/dataset/Adaptiope.zip" dataset/
!ls dataset
!unzip -qq dataset/Adaptiope.zip
!mkdir adaptiope_small

for d, td in zip(["Adaptiope/product_images", "Adaptiope/real_life"], ["adaptiope_small/product_images", "adaptiope_small/real_life"]):
  os.makedirs(td)
  for c in tqdm(CLASSES):
    c_path = os.path.join(d, c)
    c_target = os.path.join(td, c)
    shutil.copytree(c_path, c_target)

### Utility functions

We experienced an issue with GPU memory: after several runs, CUDA interrupts the execution of cells with an out-of-memory error. To partially address it, we release the references to most of the GPU-related variables and free the cache.

Unfortunately, after some runs we still need to restart the notebook kernel to use it again.
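One likely reason the cleanup only partially helps is that `del` inside a helper removes the helper's local name, not the caller's variable. The sketch below illustrates this with a plain Python object and a weak reference; `Payload` and `drop_local` are hypothetical names used only for illustration, and the immediate release after the caller's `del` assumes CPython reference counting.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large GPU tensor (hypothetical, illustration only)."""


def drop_local(*args):
    # Mirrors the loop used in the helper: each `del` unbinds only the
    # loop variable, the caller still holds its own reference.
    for arg in args:
        del arg


obj = Payload()
ref = weakref.ref(obj)          # lets us observe whether the object is alive

drop_local(obj)
alive_after_helper = ref() is not None   # the caller's `obj` keeps it alive

del obj                          # drop the last strong reference
gc.collect()                     # make sure unreachable objects are collected
alive_after_caller_del = ref() is not None

print(alive_after_helper, alive_after_caller_del)
```

In practice this means the memory can only be returned to the allocator once the caller's own variables are deleted or go out of scope.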

In [None]:
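import gc

def free_GPU_memory(*args):
    # NOTE: deleting here only drops this function's references; the caller's
    # variables must also go out of scope before the memory can be released.
    del args
    gc.collect()                    # collect unreachable objects that still hold GPU tensors

    if torch.cuda.is_available():
        torch.cuda.empty_cache()    # return cached, unused blocks to the CUDA driver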

## Dataset & Dataloaders



We create four dataloaders, two for each domain, returned as dictionaries for easy and orderly access:
- Product $P$ train & test;
- Real World $RW$ train & test.

We also apply transformations to the images, both for data augmentation and to match the input size expected by ResNet18. We then normalize the images with the ImageNet mean and standard deviation.

The split ratio is set to 80/20. We decided to drop the last incomplete training batch because it could negatively affect the performance of the model and to avoid bias issues.
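As a rough check of what the 80/20 split and `drop_last` imply, the following sketch computes the split sizes and the number of batches per epoch. It assumes 2000 images per domain, a figure inferred from the 400-sample test support in the classification reports rather than stated in the notebook; `split_sizes` and `batches_per_epoch` are hypothetical helper names.

```python
def split_sizes(num_samples, split_ratio):
    """Same arithmetic as the dataloader code: floor for train, rest for test."""
    train = int(num_samples * split_ratio)
    return train, num_samples - train


def batches_per_epoch(num_samples, batch_size, drop_last):
    """Number of batches a DataLoader yields for the given settings."""
    if drop_last:
        return num_samples // batch_size          # incomplete batch discarded
    return -(-num_samples // batch_size)           # ceiling division


NUM_SAMPLES = 2000   # assumption: inferred from the 400-sample test sets
BATCH_SIZE = 256
SPLIT_RATIO = 0.8

n_train, n_test = split_sizes(NUM_SAMPLES, SPLIT_RATIO)
train_batches = batches_per_epoch(n_train, BATCH_SIZE, drop_last=True)
test_batches = batches_per_epoch(n_test, BATCH_SIZE, drop_last=False)
dropped_per_epoch = n_train - train_batches * BATCH_SIZE

print(n_train, n_test, train_batches, test_batches, dropped_per_epoch)
```

Under these assumptions, each training epoch skips a small number of samples, and because the training loader reshuffles every epoch, the skipped samples change from epoch to epoch.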

In [None]:
# structure adapted from the labs
def get_data(batch_size, img_root_product, img_root_realworld):

  transform = list()
  transform.append(T.Resize((256, 256)))                      # resize each PIL image to 256 x 256
  transform.append(T.RandomCrop((224, 224)))                  # randomly crop a 224 x 224 patch
  transform.append(T.ToTensor())                              # convert PIL image to PyTorch tensor
  transform.append(T.Normalize(mean=IMAGENET_MEAN, 
                               std=IMAGENET_STD))             # normalize with ImageNet mean & std
  transform = T.Compose(transform)                            # compose the above transformations into one
    
  # load data
  product_images_dataset = torchvision.datasets.ImageFolder(root=img_root_product, transform=transform)
  real_images_dataset = torchvision.datasets.ImageFolder(root=img_root_realworld, transform=transform)

  # create train and test splits
  num_samples_product = len(product_images_dataset)
  training_samples_product = int(num_samples_product * SPLIT_RATIO)
  test_samples_product = num_samples_product - training_samples_product
  
  num_samples_real = len(real_images_dataset)
  training_samples_real = int(num_samples_real * SPLIT_RATIO)
  test_samples_real = num_samples_real - training_samples_real

  training_data_product, test_data_product = torch.utils.data.random_split(product_images_dataset, [training_samples_product, test_samples_product], generator=g)
  training_data_real, test_data_real = torch.utils.data.random_split(real_images_dataset, [training_samples_real, test_samples_real], generator=g)

  # initialize dataloaders
  product_train_loader = torch.utils.data.DataLoader(training_data_product, batch_size, shuffle=True, drop_last=True, generator=g) # we decided to drop the last incomplete batch
  product_test_loader = torch.utils.data.DataLoader(test_data_product, batch_size, shuffle=False, generator=g)

  realword_train_loader = torch.utils.data.DataLoader(training_data_real, batch_size, shuffle=True, drop_last=True, generator=g)
  realworld_test_loader = torch.utils.data.DataLoader(test_data_real, batch_size, shuffle=False, generator=g)
  
  prod_dictionary = {'name': 'product',
                     'train': product_train_loader,
                     'test': product_test_loader}
  rw_dictionary   = {'name': 'realworld',
                     'train': realword_train_loader,
                     'test': realworld_test_loader}
  
  return prod_dictionary, rw_dictionary

We create 2 dictionaries, one for Product and one for Real World. Each dictionary has three keys: 
- ``name`` stores the name of the domain;
- ``train`` stores the train set dataloader of the domain; 
- ``test`` stores the test set dataloader of the domain.

In [None]:
product_data, rw_data = get_data(batch_size=BATCH_SIZE, img_root_product=DIRECTORY_P, img_root_realworld=DIRECTORY_RW)

## Loss & Optimizer

In this section we define the losses and optimizers used in this project.

### Loss function

We declare one main loss, the cross-entropy loss $\mathcal{L}_{ce}$ for supervised tasks. The objective is as follows:
$$\begin{aligned} 
\mathcal{L}_{ce} = -\sum_{c=1}^{M} y_{o,c}\log(p_{o,c})
\end{aligned}
$$
where $M$ is the number of classes, $y_{o,c}$ is the binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$.
 
The other losses for UDA tasks are described and declared when the methods are presented.
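To make the cross-entropy formula concrete, here is a minimal pure-Python sketch for a single observation. With a one-hot label the sum reduces to the negative log-probability of the correct class. The `softmax` and `cross_entropy` helpers are illustrative only; `torch.nn.CrossEntropyLoss` works on raw logits in the same spirit and averages over the batch by default.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """-sum_c y_c * log(p_c) with a one-hot y, i.e. -log(p_target)."""
    probs = softmax(logits)
    return -math.log(probs[target])


logits = [2.0, 0.5, -1.0]                 # illustrative scores for 3 classes
loss_if_class0 = cross_entropy(logits, 0)  # label agrees with the top score
loss_if_class2 = cross_entropy(logits, 2)  # label is the lowest-scored class

# An untrained 20-class model with uniform scores has loss log(20).
uniform_loss = cross_entropy([0.0] * 20, 3)

print(loss_if_class0, loss_if_class2, uniform_loss)
```

The uniform case is a useful sanity value: a loss near $\log(20) \approx 3.0$ at the start of training corresponds to random guessing over the 20 classes.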

In [None]:
def get_cost_function():
  return torch.nn.CrossEntropyLoss()

### Optimizer

We declare two functions:
- An optimizer, chosen through the ``selection`` parameter, that applies the learning rate to the model parameters:
  - SGD: returns a Stochastic Gradient Descent optimizer;
  - Adam: returns an Adam optimizer.
- A scheduler. The papers do not introduce a scheduler, but to potentially achieve better performance we adopted the exponential scheduler with a defined gamma.
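An exponential scheduler multiplies the learning rate by `gamma` at every step, so the rate at epoch $e$ is $lr_0 \cdot \gamma^{e}$. The sketch below reproduces that arithmetic in plain Python; the values 0.01 and 0.99 match the baseline constants declared later in the notebook, and `exponential_lr` is a hypothetical helper, not part of PyTorch.

```python
def exponential_lr(initial_lr, gamma, epochs):
    """Learning rate used in each epoch when decaying by `gamma` once per epoch."""
    return [initial_lr * gamma ** epoch for epoch in range(epochs)]


schedule = exponential_lr(initial_lr=0.01, gamma=0.99, epochs=15)

for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

With a gamma this close to 1 the decay is gentle: over 15 epochs the learning rate falls by roughly 13%.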

In [None]:
def get_optimizer(parameters, selection, lr, weight_decay=0, momentum=0):
    if selection == 'SGD':
        optimizer = torch.optim.SGD(parameters, lr=lr, weight_decay=weight_decay, momentum=momentum)
    elif selection == 'Adam':
        optimizer = torch.optim.Adam(parameters, lr=lr, weight_decay=weight_decay)
    else:
        raise NameError(f"Optimizer {selection} not recognized")
    return optimizer

def get_scheduler(optimizer, gamma):
    return torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma)

## Performance visualizer

In this section we first declare a method to get the predictions, then we define three functions to visualize the results:
- The t-SNE algorithm, to plot the samples in a 2D space;
- Classification report;
- Confusion matrix.

Finally, we define a function that calls all three methods to produce the visualizations.
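The confusion matrix in this section is built with `normalize='true'`, which divides each row by the number of true samples of that class, so the diagonal reads as per-class recall. The sketch below shows that normalization on a toy 2-class matrix; the counts are illustrative (chosen to resemble the bookcase and file cabinet supports) and `normalize_rows` is a hypothetical helper.

```python
def normalize_rows(matrix):
    """Divide every row by its sum, leaving all-zero rows as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Rows are true classes, columns are predicted classes (illustrative counts).
counts = [
    [13, 7],   # true "bookcase": 13 correct, 7 predicted as "file cabinet"
    [8, 14],   # true "file cabinet": 8 predicted as "bookcase", 14 correct
]

normalized = normalize_rows(counts)
recall_per_class = [normalized[i][i] for i in range(len(normalized))]

print(normalized)
print(recall_per_class)
```

Row normalization makes classes with different supports comparable, at the cost of hiding the absolute number of errors.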

> Note that all the results will be saved in a local directory of the VM. We uploaded them on Google Drive to link them in the notebook.

In [None]:
def get_predictions(model, dataloader):
    y_features_list = []
    y_true_list = []
    y_pred_list = []

    if type(model) is dict: # manage the 3-model case of MCD_DA
        model["gen"].eval()
        model["clf1"].eval()
        model["clf2"].eval()
    else:
        model.eval()

    with torch.no_grad():
        for batch_idx, (inputs, labels) in enumerate(dataloader):
            inputs = inputs.to(DEVICE)
            labels = labels.to(DEVICE)
            
            if type(model) is dict:
                feat     = model["gen"](inputs)
                features = model["clf1"].forward_features(feat)
                outputs  = model["clf1"](feat) # we choose the classifier 1 to display the results, as the relative paper did
            else:
                features = model.forward_features(inputs) # get features for t-SNE visualization
                outputs = model(inputs) # get probability from the linear classifier
            preds = outputs.argmax(dim=1)

            y_features_list.append(features)
            y_true_list.append(labels)
            y_pred_list.append(preds)

    y_features = torch.cat(y_features_list).cpu().numpy()
    y_true = torch.cat(y_true_list).cpu().numpy()
    y_pred = torch.cat(y_pred_list).cpu().numpy()

    free_GPU_memory(inputs, labels, y_features_list, y_true_list, y_pred_list)
    
    return y_features, y_true, y_pred

#### T-SNE

We define two functions:
- ``get_tsne`` generates the t-SNE regarding the classes of target samples;
- ``get_tsne_src_tgt`` generates the t-SNE regarding the source and target samples.

Both functions take as input the features extracted from the last layer of the network, before the final linear classifier.

In [None]:
from sklearn.manifold import TSNE
from matplotlib.pyplot import cm

def get_tsne(labels, features, method_name): 
    tsne = TSNE(perplexity=20, n_components=2, init="pca", learning_rate='auto', random_state=42).fit_transform(features)

    category_to_color = {}
    color = cm.rainbow(np.linspace(0, 1, NUM_CLASSES))
    for i, c in zip(range(0,NUM_CLASSES), color):
        category_to_color[i] = c

    category_to_label = {}
    for i, c in zip(range(0,NUM_CLASSES), CLASSES):
        category_to_label[i] = c

    # plot each category with a distinct label
    fig = plt.figure(figsize = (12, 12))
    ax = fig.add_subplot(111)
    ax.set_title("t-SNE - " + method_name.split('/')[-2])
    for category, color in category_to_color.items():
        mask = labels == category
        ax.plot(tsne[mask, 0], tsne[mask, 1], 'o', 
                color=color, label=category_to_label[category])
        
    ax.legend(loc='best')
    fig.savefig(method_name + 'tsne.png')
    plt.close()


def get_tsne_src_tgt(source_y_features, target_y_features, method_name):
  source_y_features = torch.from_numpy(source_y_features)
  target_y_features = torch.from_numpy(target_y_features)

  tsne = TSNE(n_components=2, init="random", learning_rate='auto').fit_transform(torch.cat([source_y_features, target_y_features]))

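  # pass explicit section sizes so that source and target sets of different sizes still yield exactly two chunks
  tsne_source, tsne_target = torch.split(torch.from_numpy(tsne), [source_y_features.shape[0], target_y_features.shape[0]])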
  fig = plt.figure()
  ax = fig.add_subplot(111)
  ax.set_title("t-SNE - Source & Target of " + method_name.split('/')[-2])

  ax.plot(tsne_source[:, 0], tsne_source[:, 1], 'o', color="blue", label="source")
  ax.plot(tsne_target[:, 0], tsne_target[:, 1], 'o', color="red", label="target")
      
  ax.legend(loc='best')
  fig.savefig(method_name + 'source_target_tsne.png')
  plt.close()

#### Confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import pandas as pd

def get_confusion_matrix(y_true, y_pred, method_name):
    cf_matrix = confusion_matrix(y_true, y_pred, normalize='true')
    df_cm = pd.DataFrame(cf_matrix, index = CLASSES,columns = CLASSES)
    plt.figure(figsize = (13, 8))
    plt.title("Confusion matrix - " + method_name.split('/')[-2])
    sns.heatmap(df_cm, annot=True)
    plt.savefig(method_name + 'confusion_matrix.png')
    plt.close()

#### Classification report

In [None]:
from sklearn.metrics import classification_report
import pandas as pd

def get_classification_report(y_trues, y_preds, method_name):
    report = classification_report(y_trues, y_preds, target_names=CLASSES, output_dict=True)
    pd.DataFrame(report).transpose().to_html(method_name + 'classification_report.html')


#### Generate and save results

In [None]:
def visualize_results(model, dl_source, dl_target, method_name):
    if not os.path.exists(method_name):
        os.makedirs(method_name, exist_ok = True)
    
    y_features_src, _, _ = get_predictions(model, dl_source)
    y_features_tgt, y_trues, y_preds = get_predictions(model, dl_target)

    get_tsne(y_trues, y_features_tgt, method_name)
    get_tsne_src_tgt(y_features_src, y_features_tgt, method_name)
    get_confusion_matrix(y_trues, y_preds, method_name)
    get_classification_report(y_trues, y_preds, method_name)

___
___

## 1° implementation: Baseline using ResNet18

The first implementation is the Baseline. We fine-tune a ResNet18 in a supervised manner on the source domain and evaluate it, as is, on the target domain **without** any domain alignment strategy.

We also compute the upper bound accuracy of each domain.

### Local constants for Baseline

In [None]:
# training
EPOCHS = 15
OPTIMIZER = 'SGD'
LR = 0.01
WD = 0 # default PyTorch parameter
MOMENTUM = 0.9
GAMMA = 0.99    

### Network architecture

We replace the linear classifier with a new one whose number of output neurons equals the number of classes.

We also defined a ``forward_features`` function to get the features extracted by the ResNet18 before applying the linear classifier, used when we want to plot the t-SNE.

In [None]:
class ResNet18(torch.nn.Module):
    def __init__(self, num_classes):
        super(ResNet18, self).__init__()
        self.resnet = torchvision.models.resnet18(weights=WEIGHTS_RESNET18)
        self.feature_extractor = torch.nn.Sequential(*(list(self.resnet.children())[:-1]))
        # we get the last hidden layer to extract features before the classification. We feed these features to the Classification layer
        # Reference: https://stackoverflow.com/questions/55083642/extract-features-from-last-hidden-layer-pytorch-resnet18
        self.cls = torch.nn.Linear(512, num_classes) 
    
    def forward(self, x):
        x = self.feature_extractor(x)
        x = x.view(x.size(0), 512)
        x = self.cls(x)
        return x
    
    def forward_features(self, x):
        x = self.feature_extractor(x)
        x = x.view(x.size(0), 512)
        return x

### Training and Test steps

In [None]:
def training_step_baseline(model, optimizer, device, train_loader):
  samples = 0.
  total_loss = 0.
  total_acc = 0.
  cost_function = get_cost_function()

  model.train()

  for batch_idx, (inputs, labels) in enumerate(train_loader):
      
      inputs = inputs.to(device)
      labels = labels.to(device)

      optimizer.zero_grad()

      # forward pass
      outputs = model(inputs)
      loss = cost_function(outputs, labels)
      pred = outputs.argmax(dim=1)

      total_acc += pred.eq(labels).sum().item()

      loss.backward()
      optimizer.step()

      samples += inputs.shape[0]
      total_loss += loss.item()
  free_GPU_memory(inputs, labels)
  
  return {"train/train_acc":(total_acc/samples) * 100, 
          "train/train_loss": total_loss/samples} 


In [None]:
def test_step_baseline(model, optimizer, device, test_loader):
  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.
  cost_function = get_cost_function()

  model.eval()

  with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(test_loader):

        inputs = inputs.to(device)
        targets = targets.to(device)

        samples += inputs.shape[0]

        outputs = model(inputs)

        loss = cost_function(outputs, targets)

        pred = outputs.argmax(dim=1)

        cumulative_loss += loss.item()
        cumulative_accuracy += pred.eq(targets).sum().item()

  free_GPU_memory(inputs, targets)

  return {"test/test_acc": (cumulative_accuracy/samples) * 100, 
          "test/test_loss": cumulative_loss/samples}


### Declare the training loop

In [None]:
def training_baseline(data_source, data_target, wandb_setup):

    config = wandb_setup.config
    print('CONFIGS\n', yaml.dump(config._items, default_flow_style=False)) # pretty print of configs

    model = ResNet18(NUM_CLASSES).to(DEVICE)
    optimizer = get_optimizer(model.parameters(), config.optimizer, config.lr, WD, MOMENTUM)
    
    best_acc = 0.
    best_loss = 0.
    
    for e in range(config.epochs):
        print(f'-- Epoch [{e+1}/{config.epochs}] --')
        train_metrics = training_step_baseline(model, optimizer, DEVICE, data_source['train'])
        test_metrics = test_step_baseline(model, optimizer, DEVICE, data_target['test'])
        wandb.log({**train_metrics, **test_metrics})
        print(f'Train -> \tLoss:{train_metrics["train/train_loss"]:.4f} \tAccuracy: {train_metrics["train/train_acc"]:.2f}')
        print(f'Test -> \tLoss:{test_metrics["test/test_loss"]:.4f} \tAccuracy: {test_metrics["test/test_acc"]:.2f}')

        if (best_acc < test_metrics["test/test_acc"]):
            best_model = copy.deepcopy(model)
            best_acc = test_metrics["test/test_acc"]
            best_loss = test_metrics["test/test_loss"]

    os.makedirs(WEIGHTS_PATH + 'baseline/', exist_ok = True) 
    torch.save(best_model.state_dict(), WEIGHTS_PATH + 'baseline/' + config.name + '.pt')
    
    visualize_results(best_model, data_source['test'], data_target['test'], ASSETS_PATH + 'baseline/' + config.name + '/')

    wandb.summary["test_best_acc"] = best_acc
    wandb.summary["test_best_loss"] = best_loss
    wandb.finish()

    free_GPU_memory(model, best_model)

### Let's train the Baseline!

**Train**: Product <br>
**Test**: Real World 

$P_{train} \rightarrow RW_{test}$

Best test accuracy $acc = 76\% $

In [None]:
NAME_RUN = "baseline_P_to_RW"
config={
    "model": "ResNet18",
    "version": "Source only",
    "name": NAME_RUN,
    "batch_size": BATCH_SIZE,
    "epochs": EPOCHS,
    "lr": LR,
    "optimizer": OPTIMIZER
}

training_baseline(product_data, rw_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix |
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1-X8dCHJjw8ogp4A-kgaVXSkj51JiLn65) | ![cm](https://drive.google.com/uc?export=view&id=1-Xj5EDf_BGJ9_J5g5HsPR8Mg_SXBtkMd)|

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.516129</td>
      <td>0.941176</td>
      <td>0.666667</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.928571</td>
      <td>0.650000</td>
      <td>0.764706</td>
      <td>20.00</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>0.541667</td>
      <td>0.866667</td>
      <td>0.666667</td>
      <td>15.00</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.888889</td>
      <td>0.727273</td>
      <td>0.800000</td>
      <td>22.00</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>1.000000</td>
      <td>0.952381</td>
      <td>0.975610</td>
      <td>21.00</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.608696</td>
      <td>0.636364</td>
      <td>0.622222</td>
      <td>22.00</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.764706</td>
      <td>0.812500</td>
      <td>0.787879</td>
      <td>16.00</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>0.789474</td>
      <td>0.714286</td>
      <td>0.750000</td>
      <td>21.00</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>1.000000</td>
      <td>0.526316</td>
      <td>0.689655</td>
      <td>19.00</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>1.000000</td>
      <td>0.894737</td>
      <td>0.944444</td>
      <td>19.00</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>0.708333</td>
      <td>0.809524</td>
      <td>0.755556</td>
      <td>21.00</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.677419</td>
      <td>0.777778</td>
      <td>0.724138</td>
      <td>27.00</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>0.600000</td>
      <td>0.750000</td>
      <td>0.666667</td>
      <td>20.00</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>1.000000</td>
      <td>0.833333</td>
      <td>0.909091</td>
      <td>24.00</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.909091</td>
      <td>0.588235</td>
      <td>0.714286</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>0.875000</td>
      <td>0.823529</td>
      <td>0.848485</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.650000</td>
      <td>0.764706</td>
      <td>0.702703</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>0.684211</td>
      <td>0.565217</td>
      <td>0.619048</td>
      <td>23.00</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.833333</td>
      <td>0.952381</td>
      <td>0.888889</td>
      <td>21.00</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>0.823529</td>
      <td>0.666667</td>
      <td>0.736842</td>
      <td>21.00</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.760000</td>
      <td>0.760000</td>
      <td>0.760000</td>
      <td>0.76</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.789952</td>
      <td>0.762653</td>
      <td>0.761678</td>
      <td>400.00</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.793269</td>
      <td>0.760000</td>
      <td>0.763174</td>
      <td>400.00</td>
    </tr>
  </tbody>
</table>

| Loss | Accuracy |
|-|-|
| ![loss](https://drive.google.com/uc?export=view&id=1HdIhyivMMctJ5nFGHiNkbI9uis2_VLFB) | ![accuracy](https://drive.google.com/uc?export=view&id=1orumdytadQMAgoXPDqLlE0IMj8rT2wpM)|

The t-SNE plot and the confusion matrix indicate that the extracted features are not sufficient for accurate classification on the target domain. The overall accuracy is low (76%), and the confusion matrix is very noisy. Additionally, the t-SNE visualization is not reliable.

The "bookcase" category is particularly challenging to classify correctly, as it is often mistaken for a "file cabinet". This is likely due to similarities in patterns, such as vertical and horizontal lines, between the two objects. The Product domain images are of high quality, with controlled lighting and removed backgrounds, which may contribute to the model's poor performance when tested on Real World domain images. 

Furthermore, it appears that the model begins to overfit after 3 epochs.
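The per-class figures in the classification report above can be recomputed from raw counts. As a worked example, the "backpack" row (precision 0.516129, recall 0.941176, support 17) is consistent with 16 true positives, 15 false positives and 1 false negative; these counts are inferred from the reported ratios and are not printed by the notebook. `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp)   # share of predicted "backpack" that is correct
    recall = tp / (tp + fn)      # share of true "backpack" that is found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts inferred from the "backpack" row of the P -> RW report.
precision, recall, f1 = precision_recall_f1(tp=16, fp=15, fn=1)

print(f"precision={precision:.6f} recall={recall:.6f} f1={f1:.6f}")
```

The gap between the two values shows that the model finds almost every backpack but also assigns that label to many images of other classes.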


___

**Train**: Real World <br>
**Test**: Product

$RW_{train} \rightarrow P_{test}$

Best test accuracy $acc = 92\% $

In [None]:
NAME_RUN = "baseline_RW_to_P"
config={
    "model": "ResNet18",
    "version": "Source only",
    "name": NAME_RUN,
    "batch_size": BATCH_SIZE,
    "epochs": EPOCHS,
    "lr": LR,
    "optimizer": OPTIMIZER
}

training_baseline(rw_data, product_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix |
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1-TtXK1Byfx_Z-CEAmBSw_X4jGWUwi_Pw) | ![cm](https://drive.google.com/uc?export=view&id=1-Ud9NVUq4bBPLXjLYrvA2OqqXKnrJQsx)|

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.964286</td>
      <td>0.931034</td>
      <td>0.947368</td>
      <td>29.00</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.826087</td>
      <td>0.904762</td>
      <td>0.863636</td>
      <td>21.00</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>1.000000</td>
      <td>0.764706</td>
      <td>0.866667</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.826087</td>
      <td>1.000000</td>
      <td>0.904762</td>
      <td>19.00</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>20.00</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.875000</td>
      <td>0.777778</td>
      <td>0.823529</td>
      <td>18.00</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.789474</td>
      <td>0.937500</td>
      <td>0.857143</td>
      <td>16.00</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>1.000000</td>
      <td>0.875000</td>
      <td>0.933333</td>
      <td>24.00</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>0.950000</td>
      <td>1.000000</td>
      <td>0.974359</td>
      <td>19.00</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>0.894737</td>
      <td>1.000000</td>
      <td>0.944444</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.00</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.933333</td>
      <td>0.875000</td>
      <td>0.903226</td>
      <td>16.00</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>1.000000</td>
      <td>0.900000</td>
      <td>0.947368</td>
      <td>20.00</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.944444</td>
      <td>1.000000</td>
      <td>0.971429</td>
      <td>17.00</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.960000</td>
      <td>1.000000</td>
      <td>0.979592</td>
      <td>24.00</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>0.789474</td>
      <td>1.000000</td>
      <td>0.882353</td>
      <td>15.00</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.923077</td>
      <td>0.827586</td>
      <td>0.872727</td>
      <td>29.00</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>0.857143</td>
      <td>0.857143</td>
      <td>0.857143</td>
      <td>21.00</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.937500</td>
      <td>0.789474</td>
      <td>0.857143</td>
      <td>19.00</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>0.952381</td>
      <td>1.000000</td>
      <td>0.975610</td>
      <td>20.00</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.920000</td>
      <td>0.920000</td>
      <td>0.920000</td>
      <td>0.92</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.921151</td>
      <td>0.921999</td>
      <td>0.918092</td>
      <td>400.00</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.925376</td>
      <td>0.920000</td>
      <td>0.919515</td>
      <td>400.00</td>
    </tr>
  </tbody>
</table>

| Loss | Accuracy |
|-|-|
| ![loss](https://drive.google.com/uc?export=view&id=10wKBqiLWtr1spSJR2tjjdpz0xp6MqVM_) | ![accuracy](https://drive.google.com/uc?export=view&id=1AP49Fd4L4ZpFCbrzahE7uYB9bkv_egYJ)|

In contrast to the previous experiment, we have observed an increase in accuracy when training on Real World images and testing on Product images. 

The t-SNE and confusion matrix results show a clear improvement in classification accuracy. Some object categories are classified perfectly: "crown" and "ice skates", for example, reach a precision and recall of 1.0. This improvement can be attributed to the Real World domain images providing better generalization for the model, resulting in a higher accuracy when tested on Product images. 

As in the previous experiment, we also observe that the model tends to overfit after 3-4 epochs.
___

**Train**: Product <br>
**Test**: Product 

$P_{train} \rightarrow P_{test}$

Upper bound accuracy $acc = 96\% $

In [None]:
NAME_RUN = "baseline_P_to_P"
config={
    "model": "ResNet18",
    "version": "Source only",
    "name": NAME_RUN,
    "batch_size": BATCH_SIZE,
    "epochs": EPOCHS,
    "lr": LR,
    "optimizer": OPTIMIZER
}

training_baseline(product_data, product_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix | 
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1-R2SWSIcg2CzrbKKvYds89aAJJzAbFYc) | ![cm](https://drive.google.com/uc?export=view&id=1-S60LG_CV88_jnU4Org6x_j7ieOCAGa-) |

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.966667</td>
      <td>1.000000</td>
      <td>0.983051</td>
      <td>29.000</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.869565</td>
      <td>0.952381</td>
      <td>0.909091</td>
      <td>21.000</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>0.937500</td>
      <td>0.882353</td>
      <td>0.909091</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.904762</td>
      <td>1.000000</td>
      <td>0.950000</td>
      <td>19.000</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>20.000</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>1.000000</td>
      <td>0.833333</td>
      <td>0.909091</td>
      <td>18.000</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.941176</td>
      <td>1.000000</td>
      <td>0.969697</td>
      <td>16.000</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>0.954545</td>
      <td>0.875000</td>
      <td>0.913043</td>
      <td>24.000</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.000</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>0.894737</td>
      <td>1.000000</td>
      <td>0.944444</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.000</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.937500</td>
      <td>0.937500</td>
      <td>0.937500</td>
      <td>16.000</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>1.000000</td>
      <td>0.900000</td>
      <td>0.947368</td>
      <td>20.000</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.944444</td>
      <td>1.000000</td>
      <td>0.971429</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>24.000</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>15.000</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.862069</td>
      <td>0.862069</td>
      <td>0.862069</td>
      <td>29.000</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>1.000000</td>
      <td>0.904762</td>
      <td>0.950000</td>
      <td>21.000</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.950000</td>
      <td>1.000000</td>
      <td>0.974359</td>
      <td>19.000</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>20.000</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.955000</td>
      <td>0.955000</td>
      <td>0.955000</td>
      <td>0.955</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.958148</td>
      <td>0.957370</td>
      <td>0.956512</td>
      <td>400.000</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.956765</td>
      <td>0.955000</td>
      <td>0.954689</td>
      <td>400.000</td>
    </tr>
  </tbody>
</table>


___

**Train**: Real World <br>
**Test**: Real World 

$RW_{train} \rightarrow RW_{test}$

Upper bound accuracy $acc = 91\% $

In [None]:
NAME_RUN = "baseline_RW_to_RW"
config={
        "model": "ResNet18",
        "version": "Source only",
        "name": NAME_RUN,
        "batch_size": BATCH_SIZE,
        "epochs": EPOCHS,
        "lr": LR,
        "optimizer": OPTIMIZER
    }

training_baseline(rw_data, rw_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix | 
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1-VHZFUVkQniNZbZ2UTdxvp_OCH5Z1yQw) | ![cm](https://drive.google.com/uc?export=view&id=1-WIzxpDSEtDhVXJlcEama4OyrSpZXxDz) |

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.809524</td>
      <td>1.000000</td>
      <td>0.894737</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.941176</td>
      <td>0.800000</td>
      <td>0.864865</td>
      <td>20.000</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>15.000</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.909091</td>
      <td>0.909091</td>
      <td>0.909091</td>
      <td>22.000</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>0.954545</td>
      <td>1.000000</td>
      <td>0.976744</td>
      <td>21.000</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.818182</td>
      <td>0.818182</td>
      <td>0.818182</td>
      <td>22.000</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.937500</td>
      <td>0.937500</td>
      <td>0.937500</td>
      <td>16.000</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>0.909091</td>
      <td>0.952381</td>
      <td>0.930233</td>
      <td>21.000</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>0.947368</td>
      <td>0.947368</td>
      <td>0.947368</td>
      <td>19.000</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>1.000000</td>
      <td>0.894737</td>
      <td>0.944444</td>
      <td>19.000</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>1.000000</td>
      <td>0.761905</td>
      <td>0.864865</td>
      <td>21.000</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.793103</td>
      <td>0.851852</td>
      <td>0.821429</td>
      <td>27.000</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>0.904762</td>
      <td>0.950000</td>
      <td>0.926829</td>
      <td>20.000</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.916667</td>
      <td>0.916667</td>
      <td>0.916667</td>
      <td>24.000</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.809524</td>
      <td>1.000000</td>
      <td>0.894737</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>0.941176</td>
      <td>0.941176</td>
      <td>0.941176</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.941176</td>
      <td>0.941176</td>
      <td>0.941176</td>
      <td>17.000</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>0.882353</td>
      <td>0.652174</td>
      <td>0.750000</td>
      <td>23.000</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.913043</td>
      <td>1.000000</td>
      <td>0.954545</td>
      <td>21.000</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>0.909091</td>
      <td>0.952381</td>
      <td>0.930233</td>
      <td>21.000</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.905000</td>
      <td>0.905000</td>
      <td>0.905000</td>
      <td>0.905</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.911869</td>
      <td>0.911330</td>
      <td>0.908241</td>
      <td>400.000</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.908879</td>
      <td>0.905000</td>
      <td>0.903542</td>
      <td>400.000</td>
    </tr>
  </tbody>
</table>


___

### Observations

The test accuracies obtained by finetuning a pre-trained ResNet18 model without using any unsupervised domain adaptation framework are already quite high. This is likely due to the fact that ResNet18 is a model that has been trained on a large dataset of images (ImageNet in this case) and is able to generalize well, allowing for the transfer of learned features to other image datasets such as Adaptiope. 

The experiments show that training on the Real World domain and testing on the Product domain leads to better results ($92\%$) than the opposite direction ($76\%$).

Additionally, it is observed that the model tends to overfit after 3-4 epochs in all experiments.

We report in a table the results obtained by the Baseline implementation: 

|       | Baseline | 
|-------|----------|
| $P\rightarrow RW$ | $76\%$ |
| $RW\rightarrow P$ | $92\%$ |
| $P\rightarrow P$  | $96\%$ |
| $RW\rightarrow RW$| $91\%$ |
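
One way to read this table is as a domain gap: the difference between the upper bound obtained on the target domain and the source-only accuracy on that same domain. The snippet below is a small sketch of that arithmetic using only the rounded percentages reported above; the dictionary and variable names are ours and purely illustrative.

```python
# Rounded accuracies (%) copied from the baseline table above.
baseline = {"P->RW": 76, "RW->P": 92, "P->P": 96, "RW->RW": 91}

# The upper bound for a transfer direction is the run trained and tested on the target domain.
upper_bound = {"P->RW": baseline["RW->RW"], "RW->P": baseline["P->P"]}

# Domain gap in percentage points: accuracy lost by training on the other domain.
domain_gap = {k: upper_bound[k] - baseline[k] for k in upper_bound}

for direction, gap in domain_gap.items():
    print(f"{direction}: {gap} points below the upper bound")
```

With these rounded numbers the gap is about 15 points for $P \rightarrow RW$ and about 4 points for $RW \rightarrow P$, which is the margin a UDA method can try to recover.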

___
___

## 2° implementation: Maximum Classifier Discrepancy for UDA (MCD_DA)

To improve the baseline we decided to implement the [Maximum Classifier Discrepancy for Unsupervised Domain Adaptation](https://arxiv.org/abs/1712.02560) approach proposed by Saito et al. in 2018.

Many UDA algorithms, particularly those for training neural networks, attempt to match the distribution of the source features with that of the target without considering the category of the samples. These methods use two players to align the distributions in an adversarial manner: a domain classifier and a feature generator. Source and target samples are input to the same feature generator.
The authors point out that such methods assume that, once the target features are aligned with the source features, the object classifier will classify them correctly.

The proposed method tries to overcome two main problems:
- previous approaches may fail to extract discriminative features because they do not consider the relationship between target samples and the object decision boundary when aligning distributions;
- the generator can generate ambiguous features near the boundary because it simply tries to make the two distributions similar.

This approach, instead, attempts to align the distributions of source and target by utilizing the object decision boundaries. The authors propose to maximize the discrepancy between the two classifiers' outputs in order to detect target samples that are far from the support of the source. A feature generator then learns to generate target features near the support so as to minimize the discrepancy.

We can clearly see from the figure below what the authors proposed and wanted to achieve with this approach:
- Left. Previous methods try to match different distributions by mimicking the domain classifier without considering the decision boundaries;
- Right. The proposed method tries to detect target samples outside the support of the source distribution using an object classifier.

![figure_1](https://drive.google.com/uc?export=view&id=1vXFsoSgyCa-tVA1qFSRVS-xZwAbD3qyA)

Now we give **more details** of the overall process. 

First of all, we want to remark that we have access to a labeled source image $x_s$ and its corresponding label $y_s$, drawn from a set of labeled source images $\{X_s, Y_s\}$, as well as an unlabeled target image $x_t$ drawn from the unlabeled target images $X_t$.

The forward pipeline is the following:
- The generator $G$ extracts features from the inputs $\mathbf{x_s}$ or $\mathbf{x_t}$;
- Two classifiers take the features from the generator and try to classify them into $K$ classes (in this case 20 classes);
- The output of each classifier is a $K$-dimensional vector of logits, to which we apply the softmax function to obtain class probabilities. We use the notation $p_1(\mathbf{y|x})$ and $p_2(\mathbf{y|x})$ for the outputs of $F_1$ and $F_2$, respectively, given the input $\mathbf{x}$.
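
The forward pipeline above can be sketched with toy modules. This is only an illustration under assumptions: `toy_G`, `toy_F1` and `toy_F2` are hypothetical single linear layers standing in for the ResNet18 generator and the two classifier heads, and the input is random noise rather than images.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K = 20                      # number of classes used in this notebook
batch = torch.randn(4, 32)  # 4 fake "images", already flattened to 32 values

toy_G = torch.nn.Linear(32, 16)   # hypothetical stand-in for the feature generator G
toy_F1 = torch.nn.Linear(16, K)   # hypothetical stand-in for classifier F1
toy_F2 = torch.nn.Linear(16, K)   # hypothetical stand-in for classifier F2

features = toy_G(batch)                   # shared features for both classifiers
p1 = F.softmax(toy_F1(features), dim=1)   # p_1(y|x): class probabilities from F1
p2 = F.softmax(toy_F2(features), dim=1)   # p_2(y|x): class probabilities from F2

print(p1.shape, p2.shape)   # both are (batch size, K)
print(p1.sum(dim=1))        # every row sums to 1 (up to floating point error)
```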

The two classifiers are trained to classify the source samples correctly and, simultaneously, to detect the target samples that are far from the support of the source. In this way we align source and target features while considering the relationship between class boundaries and target samples. Samples far from the support do not have discriminative features, because they are not clearly assigned to any class. To identify these target samples, which are likely to be misclassified, we use the disagreement of the two classifiers on their predictions for target samples. In this sense the two classifiers can be treated as discriminators. The figure below illustrates the idea:
- Leftmost side. Two classifiers, initialized differently, are assumed to classify source samples correctly (a realistic assumption, because we train the network in a supervised way on the source dataset). Target samples that fall in the *Discrepancy region* are likely to be misclassified;
- Right side. The generator is trained to make the two distributions similar by generating target features near the support of the source, while considering the classifiers' outputs for target samples. In this way the generator avoids generating features outside the support of the source. To achieve this we use the *discrepancy* $d\left(p_1(\mathbf{y|x_t}), p_2(\mathbf{y|x_t})\right)$ to measure the disagreement of the two classifiers on their predictions.

![overview](https://mil-tokyo.github.io/MCD_DA/overview.png)

In the end, the proposed method trains the discriminators $F_1$ and $F_2$ to maximize the discrepancy (*Maximize Discrepancy* in the figure) given target features, and then trains the generator to fool the discriminators by minimizing the discrepancy (*Minimize Discrepancy* in the figure). The goal is to obtain features for which the support of the target is included in that of the source (*Obtained Distributions* in the figure). This allows the generator to generate discriminative features for target samples, because it considers the relationship between the decision boundaries and the target samples. This training is carried out in an adversarial manner.

Each step of the paper is discussed in detail in the following sections.

### Local constants for MCD_DA

In [None]:
# training
EPOCHS = 15
OPTIMIZER = 'Adam' 
LR = 0.001
WD = 5e-4
MOMENTUM = 0.9
GAMMA = 0.99
NUM_K = 4

### Loss function

The loss function applied for the unsupervised DA task is the discrepancy loss $\mathcal{L_d}$, defined in the paper as the mean absolute difference between the two classifiers' probabilistic outputs:

$$
\begin{aligned}
\mathcal{L_d}\left(p_1, p_2\right)=\frac{1}{K} \sum_{k=1}^K\left|p_{1_k}-p_{2_k}\right|
\end{aligned}
$$

where $p_{1_k}$ and $p_{2_k}$ denote probability output of $p_1$ and $p_2$ for class $k$ respectively.

In [None]:
def get_discrepancy_loss(p1,p2):
  d_loss = torch.mean(torch.abs(F.softmax(p1, dim=1) - F.softmax(p2, dim=1)))
  return d_loss
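
As a quick sanity check of the formula, the discrepancy can be computed by hand for a single sample. The sketch below uses plain Python and made-up probability vectors with $K = 3$ classes; the function name `discrepancy` is illustrative and is not part of the notebook.

```python
def discrepancy(p1, p2):
    """Mean absolute difference between two probability vectors of equal length."""
    assert len(p1) == len(p2)
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)

# Made-up softmax outputs of the two classifiers for one target sample.
p1 = [0.7, 0.2, 0.1]
p2 = [0.5, 0.3, 0.2]

d = discrepancy(p1, p2)   # (0.2 + 0.1 + 0.1) / 3
print(round(d, 4))

# Identical predictions give zero discrepancy.
print(discrepancy(p1, p1))
```

The function ``get_discrepancy_loss`` above performs the same computation on logits: it applies the softmax first and averages over the batch as well as over the classes.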

### Network architecture

The network is separated into two modules:
- Generator ($G$): the feature generator is a ResNet18 without the classification layer. Its output is a 512-dimensional feature vector;
- Discriminator ($F_1$, $F_2$): the two classifiers are feed-forward networks. Each outputs a vector of 20 logits, one per class, for the input image.

In the ``Discriminator`` class we have also added a ``forward_features`` function to extract the features before the classification layer. These features are useful for the t-SNE visualization.

> We have not fixed any random seed for the networks, so the classifiers' weights are initialized randomly by PyTorch. In this way the two classifiers start from different weights, as the paper requires.

In [None]:
class Generator(torch.nn.Module):
    def __init__(self): 
        super(Generator, self).__init__()
        self.resnet = torchvision.models.resnet18(weights=WEIGHTS_RESNET18)
        self.feature_extractor = torch.nn.Sequential(*(list(self.resnet.children())[:-1])) 
        
    def forward(self, x):
        x = self.feature_extractor(x)
        x = x.view(x.size(0), 512)
        return x

class Discriminator(torch.nn.Module):
    def __init__(self, num_classes):
        super(Discriminator, self).__init__()
        self.num_classes = num_classes
        self.dropout = torch.nn.Dropout(p=0.5)
        self.bn = torch.nn.BatchNorm1d(256,affine=True)
        self.relu = torch.nn.ReLU(inplace=True)
        self.fc1 = torch.nn.Linear(512, 256)
        self.fc2 = torch.nn.Linear(256, 256)
        self.cls = torch.nn.Linear(256, num_classes)

    def forward(self, x):
        # Classification logits computed on top of the penultimate features
        return self.cls(self.forward_features(x))

    def forward_features(self, x):
        # Features before the classification layer (used for the t-SNE plots)
        x = self.relu(self.bn(self.fc1(self.dropout(x))))
        x = self.relu(self.bn(self.fc2(self.dropout(x))))
        return x
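
A quick shape check can make the interface between the two modules concrete. The sketch below is a stand-in built on assumptions, not the notebook's classes: `ToyHead` is a hypothetical, simplified classifier head (no dropout or batch normalization) that takes a 512-dimensional feature vector and returns 20 logits. It also shows that two heads created without a fixed seed start from different weights.

```python
import torch

class ToyHead(torch.nn.Module):
    """Hypothetical simplified head: 512 features -> 256 hidden units -> num_classes logits."""
    def __init__(self, num_classes):
        super().__init__()
        self.fc = torch.nn.Linear(512, 256)
        self.cls = torch.nn.Linear(256, num_classes)

    def forward(self, x):
        return self.cls(torch.relu(self.fc(x)))

features = torch.randn(8, 512)        # fake batch of generator outputs
head1, head2 = ToyHead(20), ToyHead(20)

logits1, logits2 = head1(features), head2(features)
print(logits1.shape)                                   # (batch size, number of classes)
print(torch.equal(head1.fc.weight, head2.fc.weight))   # independent random initializations
```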

### Training step

For the training process, the authors proposed 3 steps:
- **Step A**. We train both classifiers $F_1$ and $F_2$ and generator $G$ to classify the source samples correctly. We compute the cross-entropy loss between the outputs of the classifiers and the corresponding source labels;
- **Step B**. We train both classifiers as discriminators while keeping the generator fixed. We compute the cross-entropy loss between the outputs and the source labels, and the discrepancy loss between the classifiers' outputs for target images. In this step $F_1$ and $F_2$ try to maximize the discrepancy loss;
- **Step C**. We train the generator to minimize the discrepancy while keeping the classifiers fixed. The parameter ``num_k`` indicates the number of times we repeat this step for the same batch. This term controls the trade-off between the generator and the classifiers.

These three steps are repeated for every batch throughout training. According to the authors, the order of the steps is not relevant. We apply them in the order presented in the paper.

In [None]:
def reset_grad(opt):
    opt["gen"].zero_grad()
    opt["clf1"].zero_grad()
    opt["clf2"].zero_grad()

#### Step A

We train the classifiers $F_1$ and $F_2$ and the generator $G$ in supervised mode to classify the source samples correctly. We compute the cross-entropy loss between the outputs of the classifiers and the corresponding source labels, then we backpropagate the error. We train the network to minimize the cross-entropy loss. The objective is as follows:

$$\begin{aligned}
\min_{G, F_1, F_2} \mathcal{L_{ce}}\left(X_s, Y_s\right)
\end{aligned}$$

where $\mathcal{L_{ce}}$ is the cross-entropy loss declared in the *Loss function* section of this notebook, and $X_s$ and $Y_s$ are the source images and their labels.

In [None]:
def stepA(net, opt, img_source, label_source, cost_function):

      # Forward pass
      feat_s    = net["gen"](img_source)
      output_s1 = net["clf1"](feat_s)
      output_s2 = net["clf2"](feat_s)

      # Apply the losses
      loss_s1 = cost_function(output_s1, label_source)
      loss_s2 = cost_function(output_s2, label_source)
      loss_s = loss_s1 + loss_s2

      # Backward pass
      loss_s.backward()

      # Update parameters
      opt["gen"].step()
      opt["clf1"].step()
      opt["clf2"].step()

      # Reset grad
      reset_grad(opt)
      
      return [loss_s1, loss_s2]


#### Step B 

We train both classifiers $F_1$ and $F_2$ as discriminators while keeping the generator $G$ fixed. We compute the cross-entropy loss between the outputs and the source labels, and the discrepancy loss between the classifiers' outputs for target images. By training the classifiers to increase the discrepancy, they can detect the target samples that lie outside the support of the source. In this step $F_1$ and $F_2$ try to maximize the discrepancy loss.

<img src="https://drive.google.com/uc?export=view&id=1j18HhFRW22YXKTAOGqHvQ3BGukBIDu94" alt="stepB" width="500">

The objective is as follows:

$$\begin{aligned}
\min_{F_1, F_2} [\mathcal{L_{ce_{F_1}}}\left(X_s, Y_s\right) + \mathcal{L_{ce_{F_2}}}\left(X_s, Y_s\right)] - \mathcal{L_{d}}(X_t)
\end{aligned}$$

where $\mathcal{L_{ce_{F_1}}}$ and $\mathcal{L_{ce_{F_2}}}$ are the cross-entropy losses computed on the outputs of the classifiers $F_1$ and $F_2$ respectively, $\mathcal{L_{d}}$ is the discrepancy loss, and $X_t$ denotes the target samples.


In [None]:
def stepB(net, opt, img_source, label_source, img_target, cost_function):

    # Forward pass
    feat_s    = net["gen"](img_source)
    output_s1 = net["clf1"](feat_s)
    output_s2 = net["clf2"](feat_s)

    feat_t    = net["gen"](img_target)
    output_t1 = net["clf1"](feat_t)
    output_t2 = net["clf2"](feat_t)

    # Apply the losses
    loss_s1 = cost_function(output_s1,label_source)
    loss_s2 = cost_function(output_s2,label_source)
    loss_disc = get_discrepancy_loss(output_t1, output_t2)

    loss = (loss_s1 + loss_s2) - loss_disc
    loss.backward()

    # We don't step the generator because we keep the weights fixed
    opt["clf1"].step() 
    opt["clf2"].step()

    # Reset the optimizers
    reset_grad(opt)

    return [loss_s1, loss_s2, loss_disc]
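
Because the generator is not updated in this step, its gradients are computed and then discarded by ``reset_grad``. A possible alternative, which is not what the cell above does, is to detach the features so that no gradient flows into the generator at all. The sketch below uses hypothetical toy linear layers to check that the classifier gradients are the same in both variants.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
gen = torch.nn.Linear(10, 6)    # toy stand-in for the generator
clf1 = torch.nn.Linear(6, 3)    # toy stand-ins for the two classifiers
clf2 = torch.nn.Linear(6, 3)
x_t = torch.randn(5, 10)        # fake target batch

def neg_discrepancy(features):
    """Negative discrepancy, i.e. the target-domain part of the Step B loss."""
    p1 = F.softmax(clf1(features), dim=1)
    p2 = F.softmax(clf2(features), dim=1)
    return -torch.mean(torch.abs(p1 - p2))

# Variant 1: gradients also flow into the generator (and would then be discarded).
neg_discrepancy(gen(x_t)).backward()
grad_full = clf1.weight.grad.clone()
gen_grad_was_computed = gen.weight.grad is not None

# Clear all gradients before the second variant.
for module in (gen, clf1, clf2):
    module.zero_grad(set_to_none=True)

# Variant 2: detached features, so the generator receives no gradient.
neg_discrepancy(gen(x_t).detach()).backward()
grad_detached = clf1.weight.grad.clone()

print(torch.allclose(grad_full, grad_detached))   # classifier gradients agree
print(gen.weight.grad is None)                    # generator untouched in variant 2
```

Detaching saves the backward pass through the generator, at the cost of departing from the structure of the original cell.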


#### Step C

We train the generator to minimize the discrepancy while keeping the classifiers fixed. The parameter ``num_k`` indicates the number of times we repeat this step for the same batch. This term controls the trade-off between the generator and the classifiers.

<img src="https://drive.google.com/uc?export=view&id=1DgaxB2wVHxp51NoHJSFgGX369zDQeFok" alt="stepC" width="500">

The objective is as follows:
$$\begin{aligned}
\min_{G} \mathcal{L_{d}}(X_t)
\end{aligned}$$

In [None]:
def stepC(net, opt, img_target, cost_function, num_k):
      loss_disc = None
      for i in range(num_k):
        # Forward pass
        feat_t    = net["gen"](img_target)
        output_t1 = net["clf1"](feat_t)
        output_t2 = net["clf2"](feat_t)

        loss_disc = get_discrepancy_loss(output_t1, output_t2)
        loss_disc.backward()

        # We don't step the classifiers because we keep the weights fixed
        opt["gen"].step()

        # Reset the optimizers
        reset_grad(opt)

      return [loss_disc]
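
To see the three steps interact, the sketch below runs one A/B/C round on random toy data. Everything in it is an assumption made for illustration: the generator and classifiers are single linear layers, the optimizer is plain SGD, and the data is noise. It only checks which parameters each step is allowed to change.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
net = {"gen": torch.nn.Linear(10, 6),
       "clf1": torch.nn.Linear(6, 3),
       "clf2": torch.nn.Linear(6, 3)}
opt = {name: torch.optim.SGD(module.parameters(), lr=0.1) for name, module in net.items()}

x_s, y_s = torch.randn(8, 10), torch.randint(0, 3, (8,))   # fake labeled source batch
x_t = torch.randn(8, 10)                                   # fake unlabeled target batch

def zero_all():
    for o in opt.values():
        o.zero_grad()

def source_ce():
    f = net["gen"](x_s)
    return F.cross_entropy(net["clf1"](f), y_s) + F.cross_entropy(net["clf2"](f), y_s)

def target_discrepancy():
    f = net["gen"](x_t)
    p1 = F.softmax(net["clf1"](f), dim=1)
    p2 = F.softmax(net["clf2"](f), dim=1)
    return torch.mean(torch.abs(p1 - p2))

def snapshot(name):
    return net[name].weight.detach().clone()

# Step A: all three modules minimize the source cross-entropy.
zero_all()
source_ce().backward()
for o in opt.values():
    o.step()

# Step B: only the classifiers are stepped; they maximize the target discrepancy.
gen_before_B = snapshot("gen")
zero_all()
(source_ce() - target_discrepancy()).backward()
opt["clf1"].step()
opt["clf2"].step()
gen_unchanged_in_B = torch.equal(gen_before_B, snapshot("gen"))

# Step C: only the generator is stepped; it minimizes the target discrepancy.
clf1_before_C, gen_before_C = snapshot("clf1"), snapshot("gen")
for _ in range(4):   # plays the role of NUM_K
    zero_all()
    target_discrepancy().backward()
    opt["gen"].step()
clf1_unchanged_in_C = torch.equal(clf1_before_C, snapshot("clf1"))
gen_changed_in_C = not torch.equal(gen_before_C, snapshot("gen"))

print(gen_unchanged_in_B, clf1_unchanged_in_C, gen_changed_in_C)
```

In this sketch, as in the cells above, a module is kept "fixed" simply by not calling ``step()`` on its optimizer.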


In the following function ``training_step_MCD_DA``, we train the three networks on all the training set batches by using the three steps presented above.

Following the course lab, we wrap the call that fetches the next target batch in a ``try``/``except`` block, so that the target iterator is restarted when it reaches the end of the target set.

It is worth noticing that we do not extract the target labels from the target batch. This confirms that we are working in unsupervised mode with the target samples.
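
The restart logic described above can be isolated in a few lines of plain Python. In this sketch the two lists are hypothetical stand-ins for the source and target loaders, with the target deliberately shorter than the source.

```python
source_batches = ["s0", "s1", "s2", "s3", "s4"]   # stand-in for the source loader
target_batches = ["t0", "t1"]                     # stand-in for a shorter target loader

pairs = []
target_iter = iter(target_batches)
for source_batch in source_batches:
    try:
        target_batch = next(target_iter)
    except StopIteration:
        # Target iterator exhausted: start a new pass over the target data.
        target_iter = iter(target_batches)
        target_batch = next(target_iter)
    pairs.append((source_batch, target_batch))

print(pairs)
```

Catching ``StopIteration`` specifically, rather than every exception, keeps unrelated errors (for example a failing data transform) visible.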

In [None]:
def training_step_MCD_DA(net, opt, scheduler, cost_function, source_train_loader, target_train_loader):
  source_samples = 0.
  target_samples = 0.
  cumulative_ce_loss  = np.zeros(2)
  cumulative_discrepancy = 0.

  target_iter = iter(target_train_loader)

  net["gen"].train()
  net["clf1"].train()
  net["clf2"].train()

  for batch_idx, (img_source, label_source) in enumerate(source_train_loader):

      # get target data. If the target iterator reaches the end, restart it
      try:
        img_target, _ = next(target_iter)
      except StopIteration:
        target_iter = iter(target_train_loader)
        img_target, _ = next(target_iter)

      img_source = img_source.to(DEVICE)
      label_source = label_source.to(DEVICE)
      img_target = img_target.to(DEVICE)

      loss_stepA = stepA(net, opt, img_source, label_source, cost_function)
      loss_stepB = stepB(net, opt, img_source, label_source, img_target, cost_function)
      loss_stepC = stepC(net, opt, img_target, cost_function, num_k = NUM_K)
      
      source_samples += img_source.shape[0]
      target_samples += img_target.shape[0]

      cumulative_ce_loss[0]  += loss_stepB[0].item()
      cumulative_ce_loss[1]  += loss_stepB[1].item()
      cumulative_discrepancy += loss_stepC[0].item()

  scheduler['clf1'].step()
  scheduler['clf2'].step()
  
  free_GPU_memory(img_source, label_source, img_target)

  return {"train/train_loss1": cumulative_ce_loss[0]/source_samples, 
          "train/train_loss2": cumulative_ce_loss[1]/source_samples, 
          "train/train_disc_loss": cumulative_discrepancy/target_samples}

### Test step

The **ensemble accuracy** refers to the accuracy of a group of models when they are used together to make predictions. In this test step we compute it by summing the output logits of the two classifiers and taking the class with the highest summed score.
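
The effect of summing logits can be seen on a single made-up sample. The sketch below uses plain Python lists in place of tensors; the numbers and the helper `argmax` are illustrative only.

```python
def argmax(values):
    """Index of the largest value (the first one in case of ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up logits of the two classifiers for one sample with 3 classes.
logits_c1 = [2.0, 1.9, 0.0]   # classifier 1 narrowly prefers class 0
logits_c2 = [0.0, 1.0, 0.5]   # classifier 2 prefers class 1

# Ensemble: element-wise sum of the two logit vectors.
logits_ensemble = [a + b for a, b in zip(logits_c1, logits_c2)]

print(argmax(logits_c1), argmax(logits_c2), argmax(logits_ensemble))
```

Here the ensemble follows the second classifier, because the first one is almost undecided between classes 0 and 1 and the summed logits therefore favor class 1.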

In [None]:
def test_step_MCD_DA(net, cost_function, target_test_loader):
  samples = 0.
  test_loss = 0.
  cumulative_accuracy = np.zeros(3)

  net["gen"].eval()
  net["clf1"].eval()
  net["clf2"].eval()

  with torch.no_grad():
    for batch_idx, (img, label) in enumerate(target_test_loader):
      
      img = img.to(DEVICE)
      label = label.to(DEVICE)
        
      # Forward pass
      feat      = net["gen"](img)
      output_c1 = net["clf1"](feat)
      output_c2 = net["clf2"](feat)

      # Apply the loss
      test_loss += cost_function(output_c1, label).item()
      output_ensemble = output_c1 + output_c2

      # Predictions
      predicted_1   = output_c1.argmax(dim=1)
      predicted_2   = output_c2.argmax(dim=1)
      pred_ensemble = output_ensemble.argmax(dim=1)

      # Calculate accuracy
      cumulative_accuracy[0] += predicted_1.eq(label).sum().item()
      cumulative_accuracy[1] += predicted_2.eq(label).sum().item()
      cumulative_accuracy[2] += pred_ensemble.eq(label).sum().item()

      samples += img.shape[0]

  test_loss = test_loss / samples
  
  free_GPU_memory(img, label)

  return {"test/test_loss": test_loss, 
          "test/test_accuracy1": (cumulative_accuracy[0]/samples)*100,
          "test/test_accuracy2": (cumulative_accuracy[1]/samples)*100,
          "test/test_accuracy_ensemble": (cumulative_accuracy[2]/samples)*100}

### Declare the training loop

In [None]:
def training_MCD_DA(data_source, data_target, wandb_setup):
    config = wandb_setup.config
    print('CONFIGS\n', yaml.dump(config._items, default_flow_style=False)) # Pretty print of configs

    # Instantiates the network architecture 
    net = {
      "gen" : Generator().to(DEVICE), 
      "clf1": Discriminator(NUM_CLASSES).to(DEVICE), 
      "clf2": Discriminator(NUM_CLASSES).to(DEVICE)
      }

    # Instantiates the optimizers
    opt = {
      "gen" : get_optimizer(net["gen"].parameters() , config.optimizer, config.lr/10, config.wd), # small ResNet18 finetune 
      "clf1": get_optimizer(net["clf1"].parameters(), config.optimizer, config.lr, config.wd), 
      "clf2": get_optimizer(net["clf2"].parameters(), config.optimizer, config.lr, config.wd)
      }

    # Instantiates the schedulers
    scheduler = {
       "clf1": get_scheduler(opt['clf1'], GAMMA),
       "clf2": get_scheduler(opt['clf2'], GAMMA)
    }

    # Instantiates the cost function
    cost_function = get_cost_function().to(DEVICE)

    best_acc = 0.
    best_loss = 0.

    # Loop epochs
    for e in range(config.epochs):
        print(f'-- Epoch [{e+1}/{config.epochs}] --')
        train_metrics = training_step_MCD_DA(net, opt, scheduler, cost_function, data_source['train'], data_target['train'])
        test_metrics = test_step_MCD_DA(net, cost_function, data_target['test'])
        wandb.log({**train_metrics, **test_metrics})
        print('Train: \tLoss1: {:.6f}\t Loss2: {:.6f}\t Discrepancy: {:.6f}'.format(train_metrics["train/train_loss1"], train_metrics["train/train_loss2"], train_metrics["train/train_disc_loss"]))
        print('Test: \tAverage loss: {:.6f}\t Accuracy C1: {:.2f}%\t Accuracy C2: {:.2f}%\t Accuracy Ensemble: {:.2f}%'.format(test_metrics["test/test_loss"], test_metrics["test/test_accuracy1"], test_metrics["test/test_accuracy2"], test_metrics["test/test_accuracy_ensemble"]))
    
        if (best_acc < test_metrics["test/test_accuracy1"]): # test_accuracy1 reported. Also the paper used it to display results
            best_net = copy.deepcopy(net)
            best_acc = test_metrics["test/test_accuracy1"]
            best_loss = test_metrics["test/test_loss"]

    os.makedirs(WEIGHTS_PATH + 'mcd_da/', exist_ok = True) 
    torch.save(best_net['gen'].state_dict(), WEIGHTS_PATH + 'mcd_da/' + config.name + '_G.pt')
    torch.save(best_net['clf1'].state_dict(), WEIGHTS_PATH + 'mcd_da/' + config.name + '_C1.pt')
    torch.save(best_net['clf2'].state_dict(), WEIGHTS_PATH + 'mcd_da/' + config.name + '_C2.pt')

    visualize_results(best_net, data_source['test'], data_target['test'], ASSETS_PATH + 'mcd_da/' + config.name + '/')
    
    wandb.summary["test_best_acc"] = best_acc
    wandb.summary["test_best_loss"] = best_loss
    wandb.finish()

    free_GPU_memory(net, best_net)
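
The schedulers are attached only to the two classifier optimizers, so the generator keeps its learning rate of ``config.lr/10`` for the whole run. The definition of ``get_scheduler`` is not shown in this section; assuming it wraps an exponential decay that multiplies the learning rate by ``GAMMA`` after every epoch, the classifier learning rate would evolve as in the sketch below.

```python
LR = 0.001     # values from the local constants of this section
GAMMA = 0.99
EPOCHS = 15

# Assumption: lr_e = LR * GAMMA ** e, with one scheduler step per epoch.
classifier_lr = [LR * GAMMA ** epoch for epoch in range(EPOCHS + 1)]
generator_lr = LR / 10   # no scheduler is attached to the generator optimizer

print(f"classifier lr at the start:  {classifier_lr[0]:.6f}")
print(f"classifier lr after {EPOCHS} epochs: {classifier_lr[-1]:.6f}")
print(f"generator lr (constant):     {generator_lr:.6f}")
```

Under this assumption the decay is mild: the classifier learning rate drops by roughly 14% over the 15 epochs.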

### Let's train the MCD_DA!

**Train**: Product <br>
**Test**: Real World 

Best test accuracy $acc = 85\% $

In [None]:
NAME_RUN = "MCD_DA_P_to_RW"
config={
        "backbone": "ResNet18",
        "version": "DA",
        "name": NAME_RUN,
        "batch_size": BATCH_SIZE,
        "epochs": EPOCHS,
        "lr": LR,
        "optimizer": OPTIMIZER,
        "wd": WD,
        "momentum": MOMENTUM
    }

training_MCD_DA(product_data, rw_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix |
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1C7QauhRC335nGtimn1VXSE5e0yqT0aha) | ![cm](https://drive.google.com/uc?export=view&id=1yzh2zi5pLSBLLb-NmCDEZD8wAbITi-cd)|

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.750000</td>
      <td>0.882353</td>
      <td>0.810811</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.857143</td>
      <td>0.600000</td>
      <td>0.705882</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>0.750000</td>
      <td>0.800000</td>
      <td>0.774194</td>
      <td>15.0000</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.857143</td>
      <td>0.818182</td>
      <td>0.837209</td>
      <td>22.0000</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.761905</td>
      <td>0.727273</td>
      <td>0.744186</td>
      <td>22.0000</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.750000</td>
      <td>0.937500</td>
      <td>0.833333</td>
      <td>16.0000</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>1.000000</td>
      <td>0.809524</td>
      <td>0.894737</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>1.000000</td>
      <td>0.894737</td>
      <td>0.944444</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>0.904762</td>
      <td>1.000000</td>
      <td>0.950000</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>0.894737</td>
      <td>0.809524</td>
      <td>0.850000</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.718750</td>
      <td>0.851852</td>
      <td>0.779661</td>
      <td>27.0000</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>0.761905</td>
      <td>0.800000</td>
      <td>0.780488</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.916667</td>
      <td>0.916667</td>
      <td>0.916667</td>
      <td>24.0000</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.923077</td>
      <td>0.705882</td>
      <td>0.800000</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>0.944444</td>
      <td>1.000000</td>
      <td>0.971429</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.764706</td>
      <td>0.764706</td>
      <td>0.764706</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>0.894737</td>
      <td>0.739130</td>
      <td>0.809524</td>
      <td>23.0000</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.875000</td>
      <td>1.000000</td>
      <td>0.933333</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>0.840000</td>
      <td>1.000000</td>
      <td>0.913043</td>
      <td>21.0000</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.852500</td>
      <td>0.852500</td>
      <td>0.852500</td>
      <td>0.8525</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.858249</td>
      <td>0.852866</td>
      <td>0.850682</td>
      <td>400.0000</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.859320</td>
      <td>0.852500</td>
      <td>0.851100</td>
      <td>400.0000</td>
    </tr>
  </tbody>
</table>

| Loss | Accuracy |
|-|-|
| ![loss](https://drive.google.com/uc?export=view&id=1O9ZxR0MH8CmfK989LoY5fcjn542ZnpMz) | ![accuracy](https://drive.google.com/uc?export=view&id=1TkLYfi79vBYV7VIydQlpH7N5QbpR0iJ7)|

As expected, the accuracy obtained with the UDA framework is much higher than the baseline ($85\%$ against $76\%$).

The t-SNE and confusion matrix visualizations confirm that the unsupervised domain adaptation framework provides better results. In the t-SNE chart the clusters are tighter and better separated, indicating stronger class discrimination. In addition, the confusion matrix shows that some classes are classified entirely correctly (for example *crown*, with precision and recall equal to 1), further supporting the effectiveness of the UDA framework.

The accuracy chart shows that both discriminators achieve the same level of accuracy. This is consistent with the authors' decision in the paper to adopt the first discriminator to compute the overall accuracy.
___

**Train**: Real World <br>
**Test**: Product 

Best test accuracy $acc = 94\% $

In [None]:
NAME_RUN = "MCD_DA_RW_to_P"
config={
        "backbone": "ResNet18",
        "version": "DA",
        "name": NAME_RUN,
        "batch_size": BATCH_SIZE,
        "epochs": EPOCHS,
        "lr": LR,
        "optimizer": OPTIMIZER,
        "wd": WD,
        "momentum": MOMENTUM
    }

training_MCD_DA(rw_data, product_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix |
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1IOM1Om8YWK1IYKDTwJuF2BKcW7pLJEcW) | ![cm](https://drive.google.com/uc?export=view&id=1x07d3VLT39QLA4h5E_ik8fMUrJV5Iu19)|

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.965517</td>
      <td>0.965517</td>
      <td>0.965517</td>
      <td>29.0000</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.869565</td>
      <td>0.952381</td>
      <td>0.909091</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>1.000000</td>
      <td>0.882353</td>
      <td>0.937500</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.863636</td>
      <td>1.000000</td>
      <td>0.926829</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.882353</td>
      <td>0.833333</td>
      <td>0.857143</td>
      <td>18.0000</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.941176</td>
      <td>1.000000</td>
      <td>0.969697</td>
      <td>16.0000</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>1.000000</td>
      <td>0.916667</td>
      <td>0.956522</td>
      <td>24.0000</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>0.894737</td>
      <td>1.000000</td>
      <td>0.944444</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.866667</td>
      <td>0.812500</td>
      <td>0.838710</td>
      <td>16.0000</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>1.000000</td>
      <td>0.850000</td>
      <td>0.918919</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.944444</td>
      <td>1.000000</td>
      <td>0.971429</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.958333</td>
      <td>0.958333</td>
      <td>0.958333</td>
      <td>24.0000</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>0.882353</td>
      <td>1.000000</td>
      <td>0.937500</td>
      <td>15.0000</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.923077</td>
      <td>0.827586</td>
      <td>0.872727</td>
      <td>29.0000</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>1.000000</td>
      <td>0.904762</td>
      <td>0.950000</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.950000</td>
      <td>1.000000</td>
      <td>0.974359</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>0.909091</td>
      <td>1.000000</td>
      <td>0.952381</td>
      <td>20.0000</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.942500</td>
      <td>0.942500</td>
      <td>0.942500</td>
      <td>0.9425</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.942548</td>
      <td>0.945172</td>
      <td>0.942055</td>
      <td>400.0000</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.944951</td>
      <td>0.942500</td>
      <td>0.941970</td>
      <td>400.0000</td>
    </tr>
  </tbody>
</table>

| Loss | Accuracy |
|-|-|
| ![loss](https://drive.google.com/uc?export=view&id=1v3V_eEVMkltbL4i0kQ8S6fBdRk8Atm-o) | ![accuracy](https://drive.google.com/uc?export=view&id=10_Sb_EsYz7xKn4ibztQLnGzSYGY6pZZq)|

Also for the $RW → P$ task, the UDA framework reaches a higher accuracy than the baseline ($94\%$ against $92\%$).

As in the previous experiment, the clusters in the t-SNE chart are tighter and better separated. The confusion matrix also shows better results, further supporting the effectiveness of the UDA framework.

Moreover, for $RW → P$ as well, the two discriminators achieve the same level of accuracy.
___

### Observations

As expected, the method proposed in the paper [Maximum Classifier Discrepancy for Unsupervised Domain Adaptation](https://arxiv.org/abs/1712.02560) outperforms the baseline. The results we obtained show the effectiveness of the UDA method we implemented.

We first trained the networks with the SGD optimizer, but did not obtain significant improvements over the baseline. We then switched to the Adam optimizer and ran some tests to tune the hyperparameters. It is worth mentioning that for the feature extraction layers, i.e. ResNet18 without the classification layer, we used a learning rate 10 times lower than the one used for the other layers.
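
The per-layer learning rate described above can be expressed with optimizer parameter groups. The following is a minimal sketch, not the notebook's actual setup: `backbone` and `head` are hypothetical stand-ins for the pretrained feature extractor and the newly added layers, and the values of `base_lr` and `weight_decay` are placeholders chosen only for illustration.

```python
import torch

# hypothetical stand-ins for the pretrained feature extractor and the new layers
backbone = torch.nn.Linear(8, 4)
head = torch.nn.Linear(4, 2)

base_lr = 1e-3  # placeholder value, assumed for this example
optimizer = torch.optim.Adam(
    [
        # pretrained layers: learning rate 10 times lower than the default
        {"params": backbone.parameters(), "lr": base_lr / 10},
        # new layers: no "lr" key, so they fall back to the default below
        {"params": head.parameters()},
    ],
    lr=base_lr,
    weight_decay=5e-4,  # placeholder value, assumed for this example
)

for group in optimizer.param_groups:
    print(group["lr"])
```

A group without its own `lr` entry inherits the default passed to the optimizer, so only the pretrained layers need an explicit override.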

It is interesting to visualize, using t-SNE, the features obtained by the last layer of the discriminator $F_1$ before applying a fully connected layer. Blue dots and red dots are, respectively, the test set source features and the test set target features. We report the $RW\rightarrow P$ experiment.

| Source only | Adapted |
|-|-|
| ![source-only](https://drive.google.com/uc?export=view&id=1-pOPCZTK-aYwIbN6HNyC2VtBWwcwTx5W) | ![adapted](https://drive.google.com/uc?export=view&id=1-HkjO2t1LmL5pdW923WJ1Ng_IbeucJN-)|

We can appreciate the overlap of the source and target features in the *Adapted* version, which indicates that the two domains are well aligned in the feature space.
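
As a rough illustration of how such a plot can be produced, the sketch below embeds source and target features jointly with scikit-learn's `TSNE`. It is not the notebook's `visualize_results` code: the random arrays `source_feats` and `target_feats` are hypothetical stand-ins for features extracted from the two test sets, and the perplexity value is an arbitrary choice for this toy data.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# hypothetical stand-ins for features extracted from the source and target test sets
source_feats = rng.normal(loc=0.0, size=(40, 16))
target_feats = rng.normal(loc=0.5, size=(40, 16))

feats = np.concatenate([source_feats, target_feats], axis=0)
domain = np.array([0] * len(source_feats) + [1] * len(target_feats))  # 0 = source, 1 = target

# embed both domains in a single call so that their 2D positions are comparable
embedded = TSNE(n_components=2, perplexity=15, init="pca", random_state=0).fit_transform(feats)

source_2d, target_2d = embedded[domain == 0], embedded[domain == 1]
print(source_2d.shape, target_2d.shape)
```

The two resulting point sets can then be drawn in blue and red on the same scatter plot. Embedding the domains in one call matters, because two separate t-SNE runs would produce coordinates that cannot be compared.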

In the end, we obtained a gain of $9\%$ in $P\rightarrow RW$ and a gain of $2\%$ in $RW\rightarrow P$. Below, we report a table to summarize the results:

|       | Baseline | MCD-DA | Gain |
|-------|----------|--------|------|
| $P\rightarrow RW$ | $76\%$      | $85\%$    | $+9\%$  |
| $RW\rightarrow P$ | $92\%$      | $94\%$  | $+2\%$ |

Overall, the paper presents a clear theoretical framework and we did not encounter any major challenges during the implementation.
___
___

## 3° implementation: Deep Subdomain Adaptation Network for image classification (DSAN)

The paper [Deep Subdomain Adaptation Network](https://arxiv.org/abs/2106.09388) (DSAN), released in 2021 by Zhu et al., presents a method for adapting deep neural networks to new domains in image classification tasks. It uses a Local Maximum Mean Discrepancy (LMMD) loss to align the relevant subdomain distributions of domain-specific layer activations across domains, instead of aligning only the global source and target distributions as earlier methods do. According to the paper, this subdomain alignment improves classification accuracy on benchmark datasets.

![](https://fuzhenzhuang.github.io/img/transfer/TNNLS2020_1.png)

The loss of the subdomain adaptation method is formulated as:
$$ \min_{f} \frac{1}{n_s} \sum_{i=1}^{n_s} \mathcal{L}_{ce}(f(x_i^s), y_i^s) + \lambda \sum_{l\in L} \hat{d}_l(p,q)$$

where $\mathcal{L}_{ce}$ is the cross-entropy loss function, $\hat{d}_l$ is the LMMD loss and $\lambda$ is the adaptation factor that weights it.

The LMMD loss is formulated as: 
$$
\begin{aligned}
& \hat{d}_l(p, q)=\frac{1}{C} \sum_{c=1}^C\left[\sum_{i=1}^{n_s} \sum_{j=1}^{n_s} w_i^{s c} w_j^{s c} k\left(\mathbf{z}_i^{s l}, \mathbf{z}_j^{s l}\right)\right. \left.+\sum_{i=1}^{n_t} \sum_{j=1}^{n_t} w_i^{t c} w_j^{t c} k\left(\mathbf{z}_i^{t l}, \mathbf{z}_j^{t l}\right)-2 \sum_{i=1}^{n_s} \sum_{j=1}^{n_t} w_i^{s c} w_j^{t c} k\left(\mathbf{z}_i^{s l}, \mathbf{z}_j^{t l}\right)\right]
\end{aligned} $$

where $\mathbf{z}^{l}$ is the activation of the $l$-th layer, $k$ is the kernel function, and $w_i^{sc}$ and $w_j^{tc}$ are the weights of $x_i^s$ and $x_j^t$ belonging to class $c$.

The weight $w_i^{c}$ is computed as:
$$w_i^{c} = \frac{y_{ic}}{\sum_{(x_j, y_j)\in D} y_{jc}}$$

where $y_{ic}$ is the $c$-th entry of the vector $y_i$. For samples in the source domain, the true label $y_i^s$ is used as a one-hot vector to compute $w_i^{sc}$. In unsupervised adaptation, however, the target samples have no labels, only the output of the neural network $\hat{y}_i^t = f(x_i^t)$. We can use $\hat{y}_i^t$ as the probability of assigning $x_i^t$ to each of the $C$ classes and then compute $w_i^{tc}$ for each target sample.
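
To make the weighting concrete, here is a minimal NumPy sketch of the two formulas above. It is not the code used in this notebook: it computes the class weights from one-hot source labels and from soft target predictions, then evaluates the LMMD with a single Gaussian kernel. The toy labels and probabilities, the fixed bandwidth `sigma`, and the helper names `class_weights` and `lmmd` are all assumptions made for illustration.

```python
import numpy as np

def class_weights(y):
    """w_i^c = y_ic / sum_j y_jc, for one-hot or soft labels y of shape (n, C)."""
    col_sum = y.sum(axis=0, keepdims=True)
    col_sum[col_sum == 0] = 1.0  # empty class: avoid 0/0, its weights stay 0
    return y / col_sum

def lmmd(zs, zt, ws, wt, sigma=1.0):
    """LMMD between source and target activations with one Gaussian kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / sigma)
    n_classes = ws.shape[1]
    # (ws @ ws.T)[i, j] = sum_c w_i^c w_j^c, so the sum over classes is folded in
    total = (
        ((ws @ ws.T) * k(zs, zs)).sum()
        + ((wt @ wt.T) * k(zt, zt)).sum()
        - 2 * ((ws @ wt.T) * k(zs, zt)).sum()
    )
    return total / n_classes

# toy batch: 4 source samples with true labels, 3 target samples with predicted probabilities
y_s = np.eye(2)[[0, 0, 1, 1]]
y_t = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
w_s, w_t = class_weights(y_s), class_weights(y_t)

rng = np.random.default_rng(0)
z_s = rng.normal(size=(4, 8))
z_t = rng.normal(size=(3, 8))
print(w_s.sum(axis=0), w_t.sum(axis=0))  # the weights of each class sum to 1
print(lmmd(z_s, z_t, w_s, w_t))
```

The `LMMDloss` class in this notebook follows the same structure, but it sums several Gaussian kernels with an estimated bandwidth and restricts the computation to the classes that appear in both batches.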


### Local constants for DSAN

In [None]:
# training
EPOCHS = 15
OPTIMIZER = 'Adam' 
LR = 0.001
WD = 5e-4
MOMENTUM = 0.9
GAMMA = 0.99
GAMMA_DSAN = 10

# LMMD loss
KERNEL_NUM = 5 # number of Gaussian kernels summed in the multi-kernel estimate
KERNEL_MUL = 2. # ratio between the bandwidths of consecutive kernels

### Loss function

In [None]:
# Code adapted from https://github.com/easezyc/deep-transfer-learning/blob/master/UDA/pytorch1.0/DSAN/lmmd.py

class LMMDloss(torch.nn.Module):
    def __init__(self, class_num, kernel_mul=KERNEL_MUL, kernel_num=KERNEL_NUM, fix_sigma=None):
        super(LMMDloss, self).__init__()
        self.class_num = class_num
        self.kernel_num = kernel_num
        self.kernel_mul = kernel_mul
        self.fix_sigma = fix_sigma

    def guassian_kernel(self, source, target, kernel_mul=KERNEL_MUL, kernel_num=KERNEL_NUM, fix_sigma=None):
        n_samples = int(source.size()[0]) + int(target.size()[0])
        total = torch.cat([source, target], dim=0)
        total0 = total.unsqueeze(0).expand(
            int(total.size(0)), int(total.size(0)), int(total.size(1)))
        total1 = total.unsqueeze(1).expand(
            int(total.size(0)), int(total.size(0)), int(total.size(1)))
        L2_distance = ((total0-total1)**2).sum(2)
        if fix_sigma:
            bandwidth = fix_sigma
        else:
            bandwidth = torch.sum(L2_distance.data) / (n_samples**2-n_samples)
        bandwidth /= kernel_mul ** (kernel_num // 2)
        bandwidth_list = [bandwidth * (kernel_mul**i)
                          for i in range(kernel_num)]
        kernel_val = [torch.exp(-L2_distance / bandwidth_temp)
                      for bandwidth_temp in bandwidth_list]
        return sum(kernel_val)

    def get_loss(self, source, target, s_label, t_label):
        batch_size = source.size()[0]
        weight_ss, weight_tt, weight_st = self.cal_weight(s_label, t_label, batch_size=batch_size, class_num=self.class_num)
        weight_ss = torch.from_numpy(weight_ss).to(DEVICE)
        weight_tt = torch.from_numpy(weight_tt).to(DEVICE)
        weight_st = torch.from_numpy(weight_st).to(DEVICE)

        kernels = self.guassian_kernel(source, target,kernel_mul=self.kernel_mul, kernel_num=self.kernel_num, fix_sigma=self.fix_sigma)
        loss = torch.zeros(1, device=DEVICE)
        # skip the alignment term for this batch if the kernel matrix contains NaNs
        if torch.isnan(kernels).any():
            return loss
        SS = kernels[:batch_size, :batch_size]
        TT = kernels[batch_size:, batch_size:]
        ST = kernels[:batch_size, batch_size:]

        loss += torch.sum(weight_ss * SS + weight_tt * TT - 2 * weight_st * ST)
        return loss

    def convert_to_onehot(self, sca_label, class_num=31):
        return np.eye(class_num)[sca_label]

    def cal_weight(self, s_label, t_label, batch_size=32, class_num=31):
        batch_size = s_label.size()[0]
        s_sca_label = s_label.cpu().data.numpy()
        s_vec_label = self.convert_to_onehot(s_sca_label, class_num=self.class_num)
        s_sum = np.sum(s_vec_label, axis=0).reshape(1, class_num)
        s_sum[s_sum == 0] = 100
        s_vec_label = s_vec_label / s_sum

        t_sca_label = t_label.cpu().data.max(1)[1].numpy()
        t_vec_label = t_label.cpu().data.numpy()
        t_sum = np.sum(t_vec_label, axis=0).reshape(1, class_num)
        t_sum[t_sum == 0] = 100
        t_vec_label = t_vec_label / t_sum

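        # keep only the classes present in both the source labels and the target pseudo-labels
        index = list(set(s_sca_label) & set(t_sca_label))
        # a (1, class_num) mask broadcasts over both batches, even if their sizes differ
        mask_arr = np.zeros((1, class_num))
        mask_arr[:, index] = 1
        t_vec_label = t_vec_label * mask_arr
        s_vec_label = s_vec_label * mask_arr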

        weight_ss = np.matmul(s_vec_label, s_vec_label.T)
        weight_tt = np.matmul(t_vec_label, t_vec_label.T)
        weight_st = np.matmul(s_vec_label, t_vec_label.T)

        length = len(index)
        if length != 0:
            weight_ss = weight_ss / length
            weight_tt = weight_tt / length
            weight_st = weight_st / length
        else:
            weight_ss = np.array([0])
            weight_tt = np.array([0])
            weight_st = np.array([0])
        return weight_ss.astype('float32'), weight_tt.astype('float32'), weight_st.astype('float32')

### Network architecture

The backbone model we use is ResNet18. The network architecture is organized as follows:
- Feature extractor: ResNet18 without the classification layer;
- Fully connected layer;
- Classification layer.

Moreover, we implemented 3 forward methods for the model:
- ``forward_DSAN``: used during training. It takes source samples, target samples and source labels as input, and returns the source predictions together with the Local Maximum Mean Discrepancy (LMMD) loss, which aligns the relevant subdomain distributions of the layer activations across the two domains;
- ``forward_features``: returns the features of the input samples, i.e. the output of the fully connected layer. It is used to plot the t-SNE;
- ``forward``: the standard forward pass of the network, without the LMMD loss computation. It is used during the testing phase.

In [None]:
class DSAN_Network(torch.nn.Module):
    def __init__(self, num_class):
        super(DSAN_Network, self).__init__()
        self.resnet = torchvision.models.resnet18(weights=WEIGHTS_RESNET18)
        self.resnet_features = torch.nn.Sequential(*(list(self.resnet.children())[:-1]))
        self.fc1 = torch.nn.Linear(512, 256)
        self.cls = torch.nn.Linear(256, num_class)
        self.lmmd_loss = LMMDloss(class_num = num_class)

    def forward_DSAN(self, source, target, source_label):
        source = self.resnet_features(source)
        source = source.view(source.size(0), 512)
        source = self.fc1(source)

        target = self.resnet_features(target)
        target = target.view(target.size(0), 512)
        target = self.fc1(target)

        source_pred = self.cls(source)
        target_pred = self.cls(target)

        lmmd = self.lmmd_loss.get_loss(source, target, source_label, F.softmax(target_pred, dim=1))
        return source_pred, lmmd

    def forward_features(self, x):
        x = self.resnet_features(x)
        x = x.view(x.size(0), 512)
        x = self.fc1(x)
        return x

    def forward(self, x):
        x = self.resnet_features(x)
        x = x.view(x.size(0), 512)
        x = self.fc1(x)
        x = self.cls(x)
        return x

### Training step

Consistently with the Unsupervised Domain Adaptation setting, the training phase uses only source samples, source labels and target samples. The model is trained with supervision on the labeled source samples, while the unlabeled target samples are used to align the feature representations and improve generalization to the target domain.
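
One practical detail of feeding both domains in the same step is that the two loaders may not have the same number of batches. The sketch below shows one way to pair them, restarting the target loader whenever it runs out, which mirrors the pattern in the training step. The generator name `paired_batches` and the toy lists standing in for data loaders are hypothetical.

```python
def paired_batches(source_loader, target_loader):
    """Yield (source_batch, target_batch), restarting the target loader when it runs out."""
    target_iter = iter(target_loader)
    for source_batch in source_loader:
        try:
            target_batch = next(target_iter)
        except StopIteration:
            target_iter = iter(target_loader)
            target_batch = next(target_iter)
        yield source_batch, target_batch

# toy stand-ins: 5 labeled source batches, 2 unlabeled target batches
source = [("s0", 0), ("s1", 1), ("s2", 0), ("s3", 1), ("s4", 0)]
target = ["t0", "t1"]
pairs = [(s, t) for (s, _label), t in paired_batches(source, target)]
print(pairs)
```

Catching `StopIteration` specifically, rather than every exception, keeps genuine data-loading errors visible.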

Moreover, the authors of the paper propose a progressive schedule for the adaptation factor $\lambda$. Instead of fixing $\lambda$ at a constant value, they gradually increase it from 0 to 1 to suppress noisy activations at the early stages of training.

The schedule is defined by the following equation:
$$\lambda_\theta = \frac{2}{1 + \exp(-\gamma\theta)} - 1$$

Following the paper, $γ$ is fixed to 10 throughout the experiments, and $θ$ is the training progress, which changes linearly from 0 to 1. This progressive schedule makes training more stable by gradually increasing the influence of the LMMD loss over time.
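
To get a feeling for how quickly the adaptation factor grows, the sketch below evaluates the schedule used in the training step, `2 / (1 + exp(-γθ)) - 1`, once per epoch. The function name `adaptation_factor` is hypothetical, and the progress is assumed to be `epoch / total_epochs` with 15 epochs, as in the constants of this section.

```python
import math

def adaptation_factor(progress, gamma=10.0):
    """lambda = 2 / (1 + exp(-gamma * progress)) - 1, with progress in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0

total_epochs = 15
schedule = [adaptation_factor(epoch / total_epochs) for epoch in range(total_epochs)]
for epoch, lam in enumerate(schedule):
    print(f"epoch {epoch:2d}: lambda = {lam:.4f}")
```

Since the epoch index starts at 0, the factor is exactly 0 during the first epoch, so the network is initially trained with the cross-entropy loss alone. The factor then rises steeply and approaches 1 well before the end of training.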

In [None]:
def training_step_DSAN(model, optimizer, cost_function, source_train_loader, target_train_loader, total_epochs, current_epoch, scheduler):
  source_samples = 0.
  target_samples = 0.
  cumulative_lmmd_loss = 0.
  cumulative_total_loss = 0.
  cumulative_ce_loss = 0.

  target_iter = iter(target_train_loader)
  model.train()

  for batch_idx, (source, source_label) in enumerate(source_train_loader):
      try:
        target, _ = next(target_iter)
      except StopIteration:
        # target loader exhausted before the source loader: restart it
        target_iter = iter(target_train_loader)
        target, _ = next(target_iter)
      
      source = source.to(DEVICE)
      target = target.to(DEVICE)
      source_label = source_label.to(DEVICE)

      optimizer.zero_grad()

      source_samples += source.shape[0]
      target_samples += target.shape[0]

      # forward pass
      source_output, lmmd_loss = model.forward_DSAN(source, target, source_label)

      ce_loss = cost_function(source_output, source_label)

      lambd = 2 / (1 + math.exp(-GAMMA_DSAN * (current_epoch) / total_epochs)) - 1

      total_loss = ce_loss + (lambd * lmmd_loss)

      total_loss.backward()
      optimizer.step()

      cumulative_lmmd_loss += lmmd_loss.item()
      cumulative_total_loss += total_loss.item()
      cumulative_ce_loss += ce_loss.item()
  
  scheduler.step()

  free_GPU_memory(source, target, source_label)
  
  return {"train/train_loss": cumulative_total_loss/(source_samples + target_samples), 
          "train/train_lmmd_loss": cumulative_lmmd_loss/(source_samples + target_samples)}

### Test step

The testing phase, as stated in the introduction, uses only target samples, so that we correctly evaluate the performance of the UDA method implemented for the project.

In [None]:
def test_step_DSAN(model, cost_function, target_test_loader):
  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.
  
  model.eval()

  with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(target_test_loader):

        inputs = inputs.to(DEVICE)
        targets = targets.to(DEVICE)

        samples += inputs.shape[0]

        output = model(inputs)
        loss = cost_function(output, targets)

        pred = output.argmax(dim=1)

        cumulative_loss += loss.item()
        cumulative_accuracy += pred.eq(targets).sum().item()

  free_GPU_memory(inputs, targets)
  
  return {"test/test_loss": (cumulative_loss/samples), 
          "test/test_acc": (cumulative_accuracy/samples) * 100}

### Declare the training loop

In [None]:
def training_DSAN(data_source, data_target, wandb_setup):
  config = wandb_setup.config
  print('CONFIGS\n', yaml.dump(config._items, default_flow_style=False))

  model = DSAN_Network(num_class = NUM_CLASSES).to(DEVICE)

  optimizer = get_optimizer([{'params': model.resnet_features.parameters(), 'lr': config.lr/50},
                             {'params': model.fc1.parameters()},
                             {'params': model.cls.parameters()}], 
                             config.optimizer, config.lr, config.wd)

  scheduler = get_scheduler(optimizer, GAMMA)
  cost_function = get_cost_function()
  
  best_acc = 0.
  best_loss = 0.

  # Loop epochs
  for e in range(config.epochs):
      print(f'-- Epoch [{e+1}/{config.epochs}] --')
      train_metrics = training_step_DSAN(model, optimizer, cost_function, data_source['train'], data_target['train'], config.epochs, e, scheduler)
      test_metrics = test_step_DSAN(model, cost_function, data_target['test'])
      wandb.log({**train_metrics, **test_metrics})
      print('Train: \tLoss: {:.6f}\t LMMD loss: {:.6f}'.format(train_metrics["train/train_loss"], train_metrics["train/train_lmmd_loss"]))
      print('Test: \tAverage loss: {:.6f}\t Accuracy: {:.2f}%'.format(test_metrics["test/test_loss"], test_metrics["test/test_acc"]))

      if (best_acc < test_metrics["test/test_acc"]):
          best_model = copy.deepcopy(model)
          best_acc = test_metrics["test/test_acc"]
          best_loss = test_metrics["test/test_loss"]

  os.makedirs(WEIGHTS_PATH + 'dsan/', exist_ok = True) 
  torch.save(best_model.state_dict(), WEIGHTS_PATH + 'dsan/' + config.name + '.pt')

  visualize_results(best_model, data_source['test'], data_target['test'], ASSETS_PATH + 'dsan/' + config.name + '/')

  wandb.summary["test_best_acc"] = best_acc
  wandb.summary["test_best_loss"] = best_loss
  wandb.finish()

  free_GPU_memory(model, best_model)

### Let's train the DSAN!

**Train**: Product <br>
**Test**: Real World 

Best test accuracy $acc = 87\% $

In [None]:
NAME_RUN = "DSAN_P_to_RW"
config={
        "backbone": "ResNet18",
        "version": "DA",
        "name": NAME_RUN,
        "batch_size": BATCH_SIZE,
        "epochs": EPOCHS,
        "lr": LR,
        "optimizer": OPTIMIZER,
        "wd": WD,
        "momentum": MOMENTUM
    }

training_DSAN(product_data, rw_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix |
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1-cFR5VTD6YgKOj3tcOfo2s66ZZOcRfYt) | ![cm](https://drive.google.com/uc?export=view&id=1-aDZdGGWJenYg2DRWVo7sYYlakkayCgj)|

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.739130</td>
      <td>1.000000</td>
      <td>0.850000</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.933333</td>
      <td>0.700000</td>
      <td>0.800000</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>0.933333</td>
      <td>0.933333</td>
      <td>0.933333</td>
      <td>15.0000</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>0.888889</td>
      <td>0.727273</td>
      <td>0.800000</td>
      <td>22.0000</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>0.954545</td>
      <td>1.000000</td>
      <td>0.976744</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.642857</td>
      <td>0.818182</td>
      <td>0.720000</td>
      <td>22.0000</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.882353</td>
      <td>0.937500</td>
      <td>0.909091</td>
      <td>16.0000</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>0.944444</td>
      <td>0.809524</td>
      <td>0.871795</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>1.000000</td>
      <td>0.842105</td>
      <td>0.914286</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>0.900000</td>
      <td>0.857143</td>
      <td>0.878049</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>0.793103</td>
      <td>0.851852</td>
      <td>0.821429</td>
      <td>27.0000</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>0.761905</td>
      <td>0.800000</td>
      <td>0.780488</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.880000</td>
      <td>0.916667</td>
      <td>0.897959</td>
      <td>24.0000</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.750000</td>
      <td>0.882353</td>
      <td>0.810811</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.933333</td>
      <td>0.823529</td>
      <td>0.875000</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>0.894737</td>
      <td>0.739130</td>
      <td>0.809524</td>
      <td>23.0000</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>0.840000</td>
      <td>1.000000</td>
      <td>0.913043</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>0.944444</td>
      <td>0.809524</td>
      <td>0.871795</td>
      <td>21.0000</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.867500</td>
      <td>0.867500</td>
      <td>0.867500</td>
      <td>0.8675</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.880820</td>
      <td>0.872406</td>
      <td>0.871667</td>
      <td>400.0000</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.878169</td>
      <td>0.867500</td>
      <td>0.867910</td>
      <td>400.0000</td>
    </tr>
  </tbody>
</table>

| Loss |
|-|
| ![loss](https://drive.google.com/uc?export=view&id=1J-n-YeVbYW8bU2PrCUz9ZdYjaxZK7Q56)|

The accuracy obtained with this UDA framework ($87\%$) is higher than both the baseline ($76\%$) and the MCD-DA method ($85\%$).

The t-SNE and confusion matrix visualizations show that this framework also yields good results: the t-SNE chart shows compact and well-separated clusters, and the confusion matrix is consistent with the accuracy reported above.

___

**Train**: Real World <br>
**Test**: Product 

Best test accuracy $acc = 94\% $

In [None]:
NAME_RUN = "DSAN_RW_to_P"
config={
        "backbone": "ResNet18",
        "version": "DA",
        "name": NAME_RUN,
        "batch_size": BATCH_SIZE,
        "epochs": EPOCHS,
        "lr": LR,
        "optimizer": OPTIMIZER,
        "wd": WD,
        "momentum": MOMENTUM
    }

training_DSAN(rw_data, product_data, wandb.init(project=PROJECT_NAME, entity=ENTITY, name=NAME_RUN, mode=WANDB_MODE, config=config))

| t-SNE | Confusion matrix |
|-|-|
| ![tsne](https://drive.google.com/uc?export=view&id=1-jni6DdmqYE46zEdOl5-GlV1AkGloONI) | ![cm](https://drive.google.com/uc?export=view&id=1-jEj-xX3dyBZ34Z67bpY-09-qD0w_ne2)|

<br>

|Classification report |
|-|

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>precision</th>
      <th>recall</th>
      <th>f1-score</th>
      <th>support</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>backpack</th>
      <td>0.933333</td>
      <td>0.965517</td>
      <td>0.949153</td>
      <td>29.0000</td>
    </tr>
    <tr>
      <th>bookcase</th>
      <td>0.833333</td>
      <td>0.952381</td>
      <td>0.888889</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>car jack</th>
      <td>1.000000</td>
      <td>0.882353</td>
      <td>0.937500</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>comb</th>
      <td>1.000000</td>
      <td>0.947368</td>
      <td>0.972973</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>crown</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>file cabinet</th>
      <td>0.875000</td>
      <td>0.777778</td>
      <td>0.823529</td>
      <td>18.0000</td>
    </tr>
    <tr>
      <th>flat iron</th>
      <td>0.941176</td>
      <td>1.000000</td>
      <td>0.969697</td>
      <td>16.0000</td>
    </tr>
    <tr>
      <th>game controller</th>
      <td>1.000000</td>
      <td>0.875000</td>
      <td>0.933333</td>
      <td>24.0000</td>
    </tr>
    <tr>
      <th>glasses</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>helicopter</th>
      <td>0.944444</td>
      <td>1.000000</td>
      <td>0.971429</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>ice skates</th>
      <td>1.000000</td>
      <td>0.947368</td>
      <td>0.972973</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>letter tray</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>16.0000</td>
    </tr>
    <tr>
      <th>monitor</th>
      <td>1.000000</td>
      <td>0.950000</td>
      <td>0.974359</td>
      <td>20.0000</td>
    </tr>
    <tr>
      <th>mug</th>
      <td>0.944444</td>
      <td>1.000000</td>
      <td>0.971429</td>
      <td>17.0000</td>
    </tr>
    <tr>
      <th>network switch</th>
      <td>0.920000</td>
      <td>0.958333</td>
      <td>0.938776</td>
      <td>24.0000</td>
    </tr>
    <tr>
      <th>over-ear headphones</th>
      <td>0.789474</td>
      <td>1.000000</td>
      <td>0.882353</td>
      <td>15.0000</td>
    </tr>
    <tr>
      <th>pen</th>
      <td>0.888889</td>
      <td>0.827586</td>
      <td>0.857143</td>
      <td>29.0000</td>
    </tr>
    <tr>
      <th>purse</th>
      <td>0.857143</td>
      <td>0.857143</td>
      <td>0.857143</td>
      <td>21.0000</td>
    </tr>
    <tr>
      <th>stand mixer</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>19.0000</td>
    </tr>
    <tr>
      <th>stroller</th>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>20.0000</td>
    </tr>
    <tr class="blank_row">
      <td colspan="6"></td>
    </tr>
    <tr>
      <th>accuracy</th>
      <td>0.942500</td>
      <td>0.942500</td>
      <td>0.942500</td>
      <td>0.9425</td>
    </tr>
    <tr>
      <th>macro avg</th>
      <td>0.946362</td>
      <td>0.947041</td>
      <td>0.945034</td>
      <td>400.0000</td>
    </tr>
    <tr>
      <th>weighted avg</th>
      <td>0.945466</td>
      <td>0.942500</td>
      <td>0.942450</td>
      <td>400.0000</td>
    </tr>
  </tbody>
</table>

| Loss |
|-|
| ![loss](https://drive.google.com/uc?export=view&id=1DKvQX2QYsVB0fBThEwlGsIt4zkUWNhFE)|

For the $RW \rightarrow P$ task as well, the accuracy obtained with this UDA framework is higher than the baseline.

This is confirmed by the t-SNE and confusion matrix visualizations. As in the previous task, the clusters in the t-SNE plot are compact and well separated, and the confusion matrix likewise shows that this method is superior to the baseline.

___

### Observations

The method proposed in the paper [Deep Subdomain Adaptation Network for Image Classification (DSAN)](https://arxiv.org/abs/2106.09388) also outperforms the baseline.

We followed the same pipeline as for the MCD-DA approach. We first trained the network with the SGD optimizer, but we did not obtain significant improvements over the baseline. Then, with the Adam optimizer and properly tuned hyperparameters, we obtained a substantial improvement. As for the MCD-DA method, we used a much smaller learning rate for the feature extraction network than for the other parts of the model, specifically 50 times smaller.
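
The learning-rate split described above can be expressed with PyTorch parameter groups. The following is only a minimal sketch: the tiny `feature_extractor` and `classifier` modules are hypothetical stand-ins for the ResNet18 backbone and the classification head, and `base_lr` is an assumed value, not the one used in our runs.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real networks: the notebook uses a ResNet18
# backbone, tiny layers are used here only to keep the sketch fast to run.
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU())
classifier = nn.Linear(32, 20)  # 20 Adaptiope categories

base_lr = 1e-3        # assumed value, for illustration only
backbone_factor = 50  # the feature extractor uses a 50 times smaller learning rate

optimizer = torch.optim.Adam(
    [
        # Group 0: feature extractor, with its own reduced learning rate.
        {"params": feature_extractor.parameters(), "lr": base_lr / backbone_factor},
        # Group 1: classifier, which falls back to the default lr given below.
        {"params": classifier.parameters()},
    ],
    lr=base_lr,
)

for group in optimizer.param_groups:
    print(group["lr"])
```

Each parameter group keeps its own learning rate, so a scheduler attached to this optimizer would scale both groups while preserving their ratio.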

For this method too, it is interesting to visualize the features produced by the last fully connected layer before the linear classifier. Blue dots and red dots are, respectively, the test set source features and the test set target features. We report the $RW\rightarrow P$ experiment.

| Source only | Adapted |
|-|-|
| ![source-only](https://drive.google.com/uc?export=view&id=1-pOPCZTK-aYwIbN6HNyC2VtBWwcwTx5W) | ![adapted](https://drive.google.com/uc?export=view&id=1-g1308d0JtJBSeV2Sfm1hw8e2uzuu2M1)|

We can appreciate the overlap of the source and target features in the *Adapted* version, where the feature representation is more accurate.

The results we obtained with DSAN are very good: we have a gain of $11\%$ in $P\rightarrow RW$ and a gain of $2\%$ in $RW\rightarrow P$. Below, we report a table to summarize the results:

|       | Baseline | DSAN | Gain |
|-------|----------|--------|------|
| $P\rightarrow RW$ | $76\%$      | $87\%$    | $+11\%$  |
| $RW\rightarrow P$ | $92\%$      | $94\%$  | $+2\%$ |

Overall, the paper presents a clear theoretical framework and we did not encounter any major challenges during the implementation. Even though it is a simple method, the results obtained are better than those of both the MCD-DA method and the baseline.

___
___

## Final results consideration

We experienced several issues related to the computational resources provided by the basic plan of Google Colab. The most annoying one was the GPU memory: we had to restart the kernel several times to complete all the tests for each method. We did not find any good solution online to fix the problem, but we tried to limit it.

In the following table we recap the gains obtained from each method, trained on Product $P$ and tested on Real World $RW$, and vice versa:


|                         | MCD-DA | DSAN    |
|-------------------------|--------|---------|
| $P \rightarrow RW$ Gain | $+9\%$ | $+11\%$ |
| $RW \rightarrow P$ Gain | $+2\%$ | $+2\%$  |
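
The gains above are differences in accuracy with respect to the baseline, expressed in percentage points. As a small sanity check, they can be recomputed from the accuracies reported in the summary tables of this notebook; the dictionary layout and the `gains` helper are only illustrative.

```python
# Accuracies (in %) taken from the summary tables of this notebook.
accuracy = {
    "P -> RW": {"Baseline": 76, "MCD-DA": 85, "DSAN": 87},
    "RW -> P": {"Baseline": 92, "MCD-DA": 94, "DSAN": 94},
}


def gains(acc):
    """Return the gain over the baseline, in percentage points, per task and method."""
    return {
        task: {
            method: value - row["Baseline"]
            for method, value in row.items()
            if method != "Baseline"
        }
        for task, row in acc.items()
    }


result = gains(accuracy)
for task, row in result.items():
    print(task, row)
```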

Even without any Unsupervised Domain Adaptation (UDA) technique, the baseline model produces quite good results. This suggests that the features it extracts from the source domain are relevant enough to allow accurate predictions in the target domain.

The upper bound results suggest that learning from the Product domain is easier than learning from the Real World one. We attribute this to the fact that it is easier for the network to extract meaningful features from the Product domain.

From the UDA perspective, the gain we obtain from Product to Real World is much higher than the one from Real World to Product. Furthermore, improving the $RW \rightarrow P$ task is very challenging because the baseline accuracy is already 92%, which is close to the upper bound of the Product domain, 96%.

The method proposed in [Maximum Classifier Discrepancy for Unsupervised Domain Adaptation](https://arxiv.org/abs/1712.02560) is a great improvement over the baseline: we observe a gain on both the $P\rightarrow RW$ and $RW\rightarrow P$ tasks ($+9\%$ and $+2\%$ respectively). This can be explained by the fact that, thanks to the discrepancy loss, the generator learns to produce discriminative features for target samples by taking into account the relationship between the decision boundary and the target samples.
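
In the MCD paper the discrepancy between the two classifiers is the absolute difference of their softmax outputs, averaged over classes. The snippet below is a minimal sketch of that quantity; the `discrepancy` function name and the random logits are only illustrative and do not reproduce our training code.

```python
import torch
import torch.nn.functional as F


def discrepancy(logits_1, logits_2):
    """Mean absolute difference between the softmax outputs of two classifiers."""
    probs_1 = F.softmax(logits_1, dim=1)
    probs_2 = F.softmax(logits_2, dim=1)
    return torch.mean(torch.abs(probs_1 - probs_2))


# Illustrative logits for a batch of 4 target samples and 20 classes.
torch.manual_seed(0)
logits_a = torch.randn(4, 20)
logits_b = torch.randn(4, 20)

same = discrepancy(logits_a, logits_a)  # two classifiers that agree perfectly
diff = discrepancy(logits_a, logits_b)  # two classifiers that disagree
print(same.item(), diff.item())
```

During training the two classifiers are updated to maximize this value on target samples, while the generator is updated to minimize it.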

Regarding the paper [Deep Subdomain Adaptation Network for Image Classification (DSAN)](https://arxiv.org/abs/2106.09388), the UDA framework consistently improves the performance of the model on both the $P\rightarrow RW$ and $RW\rightarrow P$ tasks. This suggests that the LMMD loss succeeds in aligning the relevant subdomain distributions of the domain-specific layer activations across domains. This method is much simpler than MCD-DA, yet it performs slightly better on $P\rightarrow RW$ and equally well on $RW\rightarrow P$. Moreover, its training is noticeably faster, because we observed that the steps of the MCD-DA method are not optimized in terms of GPU computation.
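
To make the idea behind the LMMD loss more concrete, the sketch below computes a class-weighted MMD between source and target features: source samples are weighted by their ground-truth labels and target samples by their predicted class probabilities. It is a simplified illustration under our own assumptions (a single Gaussian kernel with fixed bandwidth `sigma`, averaging only over the classes present in the source batch, hypothetical function names), not the implementation used in our experiments.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(dim=-1)
    return torch.exp(-sq_dist / (2 * sigma ** 2))


def class_weights(class_probs):
    """Normalise every class column so that it sums to 1 over the batch."""
    totals = class_probs.sum(dim=0, keepdim=True)
    return class_probs / totals.clamp_min(1e-8)


def lmmd(source_feats, target_feats, source_labels, target_probs, num_classes):
    """Class-weighted MMD, averaged over the classes present in the source batch."""
    one_hot = F.one_hot(source_labels, num_classes).float()
    w_s = class_weights(one_hot)       # (n_s, C), from ground-truth labels
    w_t = class_weights(target_probs)  # (n_t, C), from soft predictions
    k_ss = gaussian_kernel(source_feats, source_feats)
    k_tt = gaussian_kernel(target_feats, target_feats)
    k_st = gaussian_kernel(source_feats, target_feats)
    # For every class c: w_s' K_ss w_s + w_t' K_tt w_t - 2 w_s' K_st w_t
    per_class = (
        torch.einsum("ic,ij,jc->c", w_s, k_ss, w_s)
        + torch.einsum("ic,ij,jc->c", w_t, k_tt, w_t)
        - 2 * torch.einsum("ic,ij,jc->c", w_s, k_st, w_t)
    )
    present = one_hot.sum(dim=0) > 0   # assumption: skip classes absent from the batch
    return per_class[present].mean()


torch.manual_seed(0)
feats = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
probs = F.one_hot(labels, 4).float()

aligned = lmmd(feats, feats, labels, probs, num_classes=4)
shifted = lmmd(feats, feats + 3.0, labels, probs, num_classes=4)
print(aligned.item(), shifted.item())
```

When the source and target features of each class coincide the loss vanishes, while shifting the target features away from the source ones makes it grow, which is the behaviour the alignment term is meant to penalize.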

Thanks to both methods, we almost reach the upper-bound accuracy on the Product $P$ domain ($94\%$ against $96\%$). What a time to be alive! *-Two Minute Papers moment-* This achievement can be related to the quality of the Product test data, where there are no occlusions, the textures are clearly visible and there is no background noise.

## Conclusions

We trained, tested and analyzed three methods for Unsupervised Domain Adaptation. We started with a baseline using a [ResNet18](https://pytorch.org/hub/pytorch_vision_resnet/), then we moved to a method presented in 2018, [Maximum Classifier Discrepancy for Unsupervised Domain Adaptation](https://arxiv.org/abs/1712.02560), and finally to a more recent method presented in 2021, [Deep Subdomain Adaptation Network for Image Classification](https://arxiv.org/abs/2106.09388).

We report again the table to summarize the results obtained with this project:

|       | Baseline | MCD-DA | DSAN   |
|-------|----------|--------|--------|
| $P\rightarrow RW$ | $76\%$      | $85\%$  | $\mathbf{87\%}$ |
| $RW\rightarrow P$ | $92\%$      | $94\%$  | $94\%$ |

We created a [GDrive folder](https://drive.google.com/drive/folders/1yyg4pHmEk3Jyc3T9xVX8M6z5nHpdpnhA?usp=sharing) to collect the entire project with the result assets, model weights, link to the Adaptiope dataset and much more.

We also share the interactive [WandB workspace](https://wandb.ai/dlfl/DL_UDA_2022), complete with all the model runs.

As future work, we would like to test these methods with different ResNet models to observe whether a different feature generator improves the performance. Moreover, because we adopted only a subset of the Adaptiope dataset, we would like to see how the performance is affected by a larger number of classes.