# Exploration and Comparison of Transformers for Image Classification

## Fine-tuning

This notebook compares the fine-tuning results across all datasets for each of the following models:
- ViT
- DeiT
- Swin
- CLIP

The results are shown in a separate notebook to avoid information and visual clutter.

NOTE: Since variables are not shared across Jupyter notebooks, the values must be copied from the individual notebooks and hardcoded here.

### Prerequisites

Load necessary packages.

In [2]:
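import os
# Move up three directories so that the `utils` package is importable.
os.chdir('../../../')
# The star import supplies the helpers used below (create_accuracy_dict, bold_string).
from utils.data_utils import *
import pandas as pd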

### Results

Record the accuracy that each model achieved on each dataset.

NOTE: The values are taken from the "Results" section of each model's notebook.

In [3]:
resisc45_vit = 0.946032
food101_vit = 0.877861
fer2013_vit = 0.697687
pcam_vit = 0.888702
sun397_vit = 0.763540
dtd_vit = 0.788830

resisc45_deit = 0.931429
food101_deit = 0.857505
fer2013_deit = 0.698245
pcam_deit = 0.838531
sun397_deit = 0.721149
dtd_deit = 0.771277

resisc45_swin = 0.958095
food101_swin = 0.895842
fer2013_swin = 0.703817
pcam_swin = 0.850311
sun397_swin = 0.778621
dtd_swin = 0.821809

resisc45_clip = 0.879524
food101_clip = 0.794693
fer2013_clip = 0.686542
pcam_clip = 0.833099
sun397_clip = 0.661333
dtd_clip = 0.657447

Group the results by dataset, keeping the model order ViT, DeiT, Swin, CLIP in every list.

In [4]:
results_resisc45 = [resisc45_vit, resisc45_deit, resisc45_swin, resisc45_clip]
results_food101 = [food101_vit, food101_deit, food101_swin, food101_clip]
results_fer2013 = [fer2013_vit, fer2013_deit, fer2013_swin, fer2013_clip]
results_pcam = [pcam_vit, pcam_deit, pcam_swin, pcam_clip]
results_sun397 = [sun397_vit, sun397_deit, sun397_swin, sun397_clip]
results_dtd = [dtd_vit, dtd_deit, dtd_swin, dtd_clip]

Collect the per-dataset lists into a single nested list (one row per dataset, one column per model).

In [5]:
results = [
    results_resisc45,
    results_food101,
    results_fer2013,
    results_pcam,
    results_sun397,
    results_dtd,
]

In [6]:
labels = ['RESISC45', 'Food-101', 'FER2013', 'PatchCamelyon', 'SUN397', 'DTD']
models = ['ViT', 'DeiT', 'Swin', 'CLIP']

In [7]:
acc_dict = create_accuracy_dict(
    results,
    labels
)

In [8]:
acc_dict

{'RESISC45': [0.946032, 0.931429, 0.958095, 0.879524],
 'Food-101': [0.877861, 0.857505, 0.895842, 0.794693],
 'FER2013': [0.697687, 0.698245, 0.703817, 0.686542],
 'PatchCamelyon': [0.888702, 0.838531, 0.850311, 0.833099],
 'SUN397': [0.76354, 0.721149, 0.778621, 0.661333],
 'DTD': [0.78883, 0.771277, 0.821809, 0.657447]}
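
`create_accuracy_dict` comes from the `utils.data_utils` import and its source is not shown in this notebook. Judging from the output above, it pairs each dataset label with the matching row of `results`. A minimal sketch of an equivalent helper, assuming that behaviour; the name `create_accuracy_dict_sketch` is hypothetical and the real implementation may differ:

```python
def create_accuracy_dict_sketch(results, labels):
    """Map each dataset label to its list of per-model accuracies.

    Hypothetical stand-in for the helper imported from utils.data_utils.
    """
    if len(results) != len(labels):
        raise ValueError("results and labels must have the same length")
    # zip pairs the i-th label with the i-th row of accuracies.
    return {label: list(scores) for label, scores in zip(labels, results)}


# Two rows copied from the results above, in the order ViT, DeiT, Swin, CLIP.
example_results = [
    [0.946032, 0.931429, 0.958095, 0.879524],
    [0.877861, 0.857505, 0.895842, 0.794693],
]
example_labels = ['RESISC45', 'Food-101']

example_dict = create_accuracy_dict_sketch(example_results, example_labels)
print(example_dict)
```

The length check is an addition for safety: a plain `zip` would silently drop unmatched rows if the two lists ever got out of sync.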

Display a dataframe containing the results for each model and dataset combination.

In [9]:
df = pd.DataFrame(results, columns=models, index=labels)
df

| Dataset | ViT | DeiT | Swin | CLIP |
|---|---|---|---|---|
| RESISC45 | 0.946032 | 0.931429 | 0.958095 | 0.879524 |
| Food-101 | 0.877861 | 0.857505 | 0.895842 | 0.794693 |
| FER2013 | 0.697687 | 0.698245 | 0.703817 | 0.686542 |
| PatchCamelyon | 0.888702 | 0.838531 | 0.850311 | 0.833099 |
| SUN397 | 0.763540 | 0.721149 | 0.778621 | 0.661333 |
| DTD | 0.788830 | 0.771277 | 0.821809 | 0.657447 |


Print the model that achieved the highest accuracy for each dataset.

In [10]:
for dataset, values in acc_dict.items():
    max_value = max(values)
    best_model = models[values.index(max_value)]
    print(f"For dataset {bold_string(dataset)}, the best model is {bold_string(best_model)} with an accuracy of {bold_string(f'{max_value:.4f}')}")

For dataset RESISC45, the best model is Swin with an accuracy of 0.9581
For dataset Food-101, the best model is Swin with an accuracy of 0.8958
For dataset FER2013, the best model is Swin with an accuracy of 0.7038
For dataset PatchCamelyon, the best model is ViT with an accuracy of 0.8887
For dataset SUN397, the best model is Swin with an accuracy of 0.7786
For dataset DTD, the best model is Swin with an accuracy of 0.8218
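
The same lookup can also be done with pandas instead of an explicit loop. The following is a minimal, self-contained sketch that rebuilds the accuracy table from the values above and uses `DataFrame.idxmax` to find the best model per dataset; the variable names `acc`, `df_acc`, and `summary` are illustrative and do not appear elsewhere in the notebook:

```python
import pandas as pd

models = ['ViT', 'DeiT', 'Swin', 'CLIP']

# Accuracies copied from the results above, in the order given by `models`.
acc = {
    'RESISC45': [0.946032, 0.931429, 0.958095, 0.879524],
    'Food-101': [0.877861, 0.857505, 0.895842, 0.794693],
    'FER2013': [0.697687, 0.698245, 0.703817, 0.686542],
    'PatchCamelyon': [0.888702, 0.838531, 0.850311, 0.833099],
    'SUN397': [0.763540, 0.721149, 0.778621, 0.661333],
    'DTD': [0.788830, 0.771277, 0.821809, 0.657447],
}

# One row per dataset, one column per model.
df_acc = pd.DataFrame.from_dict(acc, orient='index', columns=models)

# idxmax(axis=1) returns, for each row, the column label holding the maximum.
best_model = df_acc.idxmax(axis=1)
best_accuracy = df_acc.max(axis=1)

summary = pd.DataFrame({'best_model': best_model, 'accuracy': best_accuracy.round(4)})
print(summary)
```

Unlike `values.index(max_value)`, this keeps the model names attached to the data as column labels, so the result does not depend on a separate `models` list staying in the same order as the hardcoded values.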
