# Exploration and Comparison of Transformers for Image Classification

This notebook compares the results of fine-tuning with data augmentations across six datasets for each of the following models:
- ViT
- DeiT
- Swin
- CLIP

The results are shown in a separate notebook to avoid information overload and visual clutter.

NOTE: Since variables are not shared across Jupyter notebook files, the data must be copied from the individual notebooks and hardcoded here.

### Prerequisites

Load the necessary packages.

In [2]:
import os

import pandas as pd

# Move three directory levels up so that the `utils` package can be imported.
os.chdir('../../../')
from utils.data_utils import create_accuracy_dict

### Results

Collect the accuracy of each model on each dataset.

NOTE: These values are taken from the "Results" section of each model's notebook.

In [3]:
resisc45_vit = 0.950635
food101_vit = 0.872317
fer2013_vit = 0.709111
pcam_vit = 0.871582
sun397_vit = 0.770805
dtd_vit = 0.786702

resisc45_deit = 0.955397
food101_deit = 0.845109
fer2013_deit = 0.686821
pcam_deit = 0.881317
sun397_deit = 0.732966
dtd_deit = 0.772340

resisc45_swin = 0.948889
food101_swin = 0.901703
fer2013_swin = 0.715798
pcam_swin = 0.898834
sun397_swin = 0.789425
dtd_swin = 0.807447

resisc45_clip = 0.947778
food101_clip = 0.815089
fer2013_clip = 0.677069
pcam_clip = 0.840698
sun397_clip = 0.685517
dtd_clip = 0.703191
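
As an alternative to copying values by hand, each notebook could write its results to a small file that the comparison notebook reads back. The following is only a sketch of that idea, not what the notebooks here do: the directory, the file name `vit.json`, and the variable names are all hypothetical, and a temporary directory stands in for a location shared by the notebooks.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location; in practice this would be a directory
# that every notebook can reach.
results_dir = Path(tempfile.mkdtemp())

# In a model notebook (ViT shown): write that model's accuracies to disk.
vit_results = {'RESISC45': 0.950635, 'Food-101': 0.872317}
(results_dir / 'vit.json').write_text(json.dumps(vit_results))

# In the comparison notebook: read every model's file back,
# keyed by file name without the extension.
loaded = {
    path.stem: json.loads(path.read_text())
    for path in sorted(results_dir.glob('*.json'))
}
print(loaded)
```

With this layout, adding a model means adding one file rather than six more hardcoded variables.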

Group the results by dataset, listing the models in the order ViT, DeiT, Swin, CLIP.

In [4]:
results_resisc45 = [resisc45_vit, resisc45_deit, resisc45_swin, resisc45_clip]
results_food101 = [food101_vit, food101_deit, food101_swin, food101_clip]
results_fer2013 = [fer2013_vit, fer2013_deit, fer2013_swin, fer2013_clip]
results_pcam = [pcam_vit, pcam_deit, pcam_swin, pcam_clip]
results_sun397 = [sun397_vit, sun397_deit, sun397_swin, sun397_clip]
results_dtd = [dtd_vit, dtd_deit, dtd_swin, dtd_clip]

Collect the per-dataset lists into a single nested list.

In [5]:
results = [
    results_resisc45,
    results_food101,
    results_fer2013,
    results_pcam,
    results_sun397,
    results_dtd,
]

In [6]:
labels = ['RESISC45', 'Food-101', 'FER2013', 'PatchCamelyon', 'SUN397', 'DTD']
models = ['ViT', 'DeiT', 'Swin', 'CLIP']

In [7]:
acc_dict = create_accuracy_dict(
    results,
    labels
)

In [8]:
acc_dict

{'RESISC45': [0.950635, 0.955397, 0.948889, 0.947778],
 'Food-101': [0.872317, 0.845109, 0.901703, 0.815089],
 'FER2013': [0.709111, 0.686821, 0.715798, 0.677069],
 'PatchCamelyon': [0.871582, 0.881317, 0.898834, 0.840698],
 'SUN397': [0.770805, 0.732966, 0.789425, 0.685517],
 'DTD': [0.786702, 0.77234, 0.807447, 0.703191]}
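
The source of `create_accuracy_dict` lives in `utils.data_utils` and is not shown in this notebook. Judging only by the output above, it pairs each dataset label with its list of accuracies. A minimal stand-in under that assumption (the name `create_accuracy_dict_sketch` is hypothetical, and the real helper may do more) could look like this:

```python
def create_accuracy_dict_sketch(results, labels):
    """Hypothetical stand-in: map each dataset label to its list of accuracies."""
    if len(results) != len(labels):
        raise ValueError("results and labels must have the same length")
    return dict(zip(labels, results))


# Two datasets with four models each (values copied from the cells above).
labels = ['RESISC45', 'Food-101']
results = [
    [0.950635, 0.955397, 0.948889, 0.947778],
    [0.872317, 0.845109, 0.901703, 0.815089],
]
acc_dict = create_accuracy_dict_sketch(results, labels)
print(acc_dict)
```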

Display a dataframe containing the results for each model and dataset.

In [9]:
df = pd.DataFrame(results, columns=models, index=labels)
df

| Dataset | ViT | DeiT | Swin | CLIP |
|---|---|---|---|---|
| RESISC45 | 0.950635 | 0.955397 | 0.948889 | 0.947778 |
| Food-101 | 0.872317 | 0.845109 | 0.901703 | 0.815089 |
| FER2013 | 0.709111 | 0.686821 | 0.715798 | 0.677069 |
| PatchCamelyon | 0.871582 | 0.881317 | 0.898834 | 0.840698 |
| SUN397 | 0.770805 | 0.732966 | 0.789425 | 0.685517 |
| DTD | 0.786702 | 0.77234 | 0.807447 | 0.703191 |
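
Since `acc_dict` already maps each dataset to its accuracies, the same table can also be built from it directly. The sketch below uses `pd.DataFrame.from_dict` with `orient='index'` on a two-dataset excerpt; the name `df_from_dict` and the percentage formatting are illustrative additions, not part of the original notebook.

```python
import pandas as pd

models = ['ViT', 'DeiT', 'Swin', 'CLIP']
acc_dict = {
    'RESISC45': [0.950635, 0.955397, 0.948889, 0.947778],
    'Food-101': [0.872317, 0.845109, 0.901703, 0.815089],
}

# orient='index' turns each dict key into a row; `columns` names the models.
df_from_dict = pd.DataFrame.from_dict(acc_dict, orient='index', columns=models)

# Show accuracies as percentages with two decimals for easier reading.
print((df_from_dict * 100).round(2))
```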


Print the best-performing model for each dataset.

In [10]:
for dataset, values in acc_dict.items():
    max_value = max(values)
    best_model = models[values.index(max_value)]
    print(f"For dataset \033[1m{dataset}\033[0m, the best model is \033[1m{best_model}\033[0m with an accuracy of \033[1m{max_value:.6}\033[0m")

For dataset RESISC45, the best model is DeiT with an accuracy of 0.955397
For dataset Food-101, the best model is Swin with an accuracy of 0.901703
For dataset FER2013, the best model is Swin with an accuracy of 0.715798
For dataset PatchCamelyon, the best model is Swin with an accuracy of 0.898834
For dataset SUN397, the best model is Swin with an accuracy of 0.789425
For dataset DTD, the best model is Swin with an accuracy of 0.807447
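
The same lookup can be expressed on the dataframe itself, which avoids terminal escape codes in the output. This is a sketch of an alternative rather than the notebook's own approach: it rebuilds the dataframe from the values above and uses `idxmax` and `max` along the rows; the name `summary` and its column names are illustrative.

```python
import pandas as pd

models = ['ViT', 'DeiT', 'Swin', 'CLIP']
labels = ['RESISC45', 'Food-101', 'FER2013', 'PatchCamelyon', 'SUN397', 'DTD']
results = [
    [0.950635, 0.955397, 0.948889, 0.947778],
    [0.872317, 0.845109, 0.901703, 0.815089],
    [0.709111, 0.686821, 0.715798, 0.677069],
    [0.871582, 0.881317, 0.898834, 0.840698],
    [0.770805, 0.732966, 0.789425, 0.685517],
    [0.786702, 0.772340, 0.807447, 0.703191],
]
df = pd.DataFrame(results, columns=models, index=labels)

# For each dataset (row), find the column with the highest accuracy
# and the accuracy value itself.
summary = pd.DataFrame({
    'best_model': df.idxmax(axis=1),
    'accuracy': df.max(axis=1),
})
print(summary)
```

The resulting table carries the same information as the printed lines: Swin leads on five of the six datasets, and DeiT leads on RESISC45.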
