![](https://scontent.fblr2-1.fna.fbcdn.net/v/t39.2365-6/131715266_303592217719628_2247522524990492321_n.png?_nc_cat=101&ccb=1-3&_nc_sid=ad8a9d&_nc_ohc=Pt6GSI30vO0AX_YI3Nd&_nc_ht=scontent.fblr2-1.fna&oh=a39cdc80e4052f0a8507b596e83318bf&oe=606E10D5)



Data-efficient image Transformers (DeiT) is a technique that requires less data and fewer computing resources to produce a high-performance image classification model. A DeiT model trained over 3 days achieved 84.2% top-1 accuracy on the widely used ImageNet benchmark without using any external data for training. This result is competitive with the performance of cutting-edge convolutional neural networks (CNNs), which have been the dominant approach to image classification for many years.

DeiT is an important step forward in using Transformers to advance computer vision. Its performance is already competitive with that of CNNs, even though CNNs have dominated computer vision tasks for the last eight years and have benefited from many improvements and adjustments. This suggests that further research could produce significant additional gains.


# Install and Import Libraries

In [None]:
!pip install -q timm

In [None]:
# Python library used for working with arrays.
import numpy as np

# Python library to interact with the file system.
import os

# Software library written for data manipulation and analysis. 
import pandas as pd

# fastai library for computer vision tasks
from fastai.vision.all import *
from fastai.metrics import *

# Developing and training neural network based deep learning models.
import torch

# Import Data

In [None]:
dataset_path = Path('../input/ranzcr-clip-catheter-line-classification')

In [None]:
train_df = pd.read_csv(dataset_path/'train.csv')

In [None]:
train_df.head()

# Data Preprocessing

In [None]:
train_df['path'] = train_df['StudyInstanceUID'].map(lambda x:str(dataset_path/'train'/x)+'.jpg')
train_df = train_df.drop(columns=['StudyInstanceUID'])
train_df.head(10)

In [None]:
# Transforms we need to do for each image in the dataset (ex: resizing).
item_tfms = RandomResizedCrop(384, min_scale=0.75, ratio=(1.,1.)) 

# Transforms that can take place on a batch of images (ex: many augmentations).
batch_tfms = [*aug_transforms(size=384, max_warp=0), Normalize.from_stats(*imagenet_stats)]
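`Normalize.from_stats(*imagenet_stats)` standardizes each colour channel with the ImageNet mean and standard deviation, so the inputs match the statistics the pretrained model was trained on. The plain-Python sketch below shows that arithmetic for a single pixel. The helper names `normalize_pixel` and `denormalize_pixel` are made up for illustration and are not fastai functions; the constants are the commonly published ImageNet statistics.

```python
# Commonly published ImageNet per-channel statistics (RGB order),
# for pixel values already scaled to the range [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Hypothetical helper: standardize one RGB pixel channel by channel."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Hypothetical helper: invert normalize_pixel, e.g. for display."""
    return tuple(c * s + m for c, m, s in zip(rgb, mean, std))


pixel = (0.2, 0.5, 0.9)
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

A pixel equal to the channel means maps to zero in every channel, and values above or below the mean become positive or negative multiples of the standard deviation.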

In [None]:
label_names = list(train_df.columns[:11])

In [None]:
data = DataBlock(blocks=(ImageBlock, MultiCategoryBlock(encoded=True, vocab=label_names)), # multi-label target
                 splitter = RandomSplitter(seed=42),# split data into training and validation subsets.
                 get_x = ColReader(12),# obtain the input images.
                 get_y = ColReader(list(range(11))), # obtain the targets.
                 item_tfms = item_tfms,
                 batch_tfms = batch_tfms)

Get dataloader and show the data

In [None]:
dls = data.dataloaders(train_df,bs=16)

# We can call show_batch() to see what a sample of a batch looks like.
dls.show_batch()

# Model

In [None]:
model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_384', pretrained=True)


In [None]:
model.head

In [None]:
model.head = nn.Sequential(nn.Dropout(0.25), 
                           nn.Linear(768, 11))

model.head

In [None]:
learn = Learner(dls, model, metrics = [accuracy_multi])
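As a rough illustration of what `accuracy_multi` measures: it applies a sigmoid to the raw model outputs, thresholds the result (0.5 by default), and averages correctness over every image-label pair. The sketch below reimplements that idea in plain Python. The function name `multi_label_accuracy` and the toy logits and targets are made up for illustration and are not part of fastai.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_accuracy(logits, targets, thresh=0.5):
    """Hypothetical re-implementation: fraction of (sample, label) pairs
    where the thresholded sigmoid output matches the 0/1 target."""
    correct = 0
    total = 0
    for row_logits, row_targets in zip(logits, targets):
        for z, t in zip(row_logits, row_targets):
            predicted = sigmoid(z) > thresh
            correct += int(predicted == bool(t))
            total += 1
    return correct / total


# Toy example: 2 samples, 3 labels each (made-up values).
logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]
targets = [[1, 0, 0], [0, 1, 1]]
acc = multi_label_accuracy(logits, targets)
print(f"multi-label accuracy: {acc:.4f}")
```

Because every label counts separately, this metric can look high when most labels are negative, so it is worth reading alongside the validation loss.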

In [None]:
learn.lr_find()

In [None]:
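# Learning rate presumably read off the lr_find() plot above; re-check it if the data or batch size changes.
learn.fine_tune(1, base_lr=1.2022644114040304e-05)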

# Submission File

In [None]:
sample_df = pd.read_csv(dataset_path/'sample_submission.csv')
sample_df.head()

In [None]:
_sample_df = sample_df.copy()
_sample_df['PatientID'] = 'None'
_sample_df['path'] = _sample_df['StudyInstanceUID'].map(lambda x:str(dataset_path/'test'/x)+'.jpg')
_sample_df = _sample_df.drop(columns=['StudyInstanceUID'])
test_dl = dls.test_dl(_sample_df)

In [None]:
test_dl.show_batch()

# Test Time Augmentation (TTA)
Much as data augmentation does for the training set, Test Time Augmentation applies random modifications to the test images. Instead of showing the trained model each regular, “clean” image only once, we show it several augmented versions of that image. We then average the predictions for each image and take that average as our final guess.

This works because averaging predictions over randomly modified images also averages their errors. The error in any single prediction can be large enough to produce a wrong answer, but errors that differ between augmented views tend to partly cancel when averaged, so the correct answer is more likely to stand out.
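The averaging argument above can be illustrated with a small, self-contained sketch. It does not use the real model: `noisy_predict` is a hypothetical stand-in for one forward pass over one augmented image, and the noise level and the "true" probability are assumptions chosen only for the demonstration.

```python
import random


def noisy_predict(true_prob, rng, noise=0.2):
    """Hypothetical stand-in for one model pass over one augmented image:
    the true probability plus a random error, clipped to [0, 1]."""
    p = true_prob + rng.uniform(-noise, noise)
    return min(1.0, max(0.0, p))


def tta_average(true_prob, n_views, rng):
    """Average the predictions over n_views augmented copies of one image."""
    views = [noisy_predict(true_prob, rng) for _ in range(n_views)]
    return sum(views) / len(views), views


rng = random.Random(0)  # fixed seed so the run is repeatable
true_prob = 0.7         # assumed "correct" probability for one label
avg, views = tta_average(true_prob, n_views=8, rng=rng)

avg_error = abs(avg - true_prob)
mean_single_error = sum(abs(v - true_prob) for v in views) / len(views)
print(f"error of averaged prediction: {avg_error:.4f}")
print(f"mean error of single views:   {mean_single_error:.4f}")
```

The error of the averaged prediction is never larger than the mean error of the individual views, and it is usually smaller when the individual errors point in different directions.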

In [None]:
# Predict on the test DataLoader using Test Time Augmentation with n=3 augmented passes.
preds, _ = learn.tta(dl=test_dl,n=3)

In [None]:
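# Work on a copy so sample_df stays unchanged, and fill all label columns in one vectorized assignment.
submission_df = sample_df.copy()
label_cols = submission_df.columns[1:len(label_names) + 1]  # label columns follow StudyInstanceUID
submission_df[label_cols] = preds.numpy().astype(np.float32)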

In [None]:
submission_df.head(10)

In [None]:
submission_df.to_csv('submission.csv', index=False)