### Image Classification Using DeiT
Image classification is the canonical computer vision task of determining if an image contains a specific object, feature, or activity.

This notebook demonstrates training an image classifier using Vision Transformers. It uses Data-efficient Image Transformers (DeiT) pretrained on ImageNet.


The DeiT model was proposed in [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. The Vision Transformer (ViT), introduced by Dosovitskiy et al. (2020), showed that a BERT-like Transformer encoder can match or even outperform existing convolutional neural networks. However, the ViT models introduced in that paper required training on expensive infrastructure for multiple weeks, using external data. DeiT models are transformers for image classification that train more efficiently, requiring far less data and far fewer computing resources than the original ViT models.
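
For intuition about the checkpoint used later in this notebook (deit-base-distilled-patch16-224), the name encodes a 16x16 patch size and a 224x224 input. Here is a back-of-the-envelope sketch of the resulting sequence length, assuming square non-overlapping patches plus the class token and DeiT's extra distillation token. The helper name `deit_sequence_length` is made up for illustration.

```python
def deit_sequence_length(image_size: int = 224, patch_size: int = 16,
                         distilled: bool = True) -> int:
    """Number of tokens the Transformer encoder sees for one image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # One class token, plus one distillation token for distilled DeiT models.
    special_tokens = 2 if distilled else 1
    return num_patches + special_tokens


print(deit_sequence_length())  # 14 * 14 patches + 2 special tokens = 198
```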


We first install roboflow.
Checkpoints will be saved in /content/checkpoints, so let's create that folder as well.

In [None]:
import os
!pip3 install roboflow
# makedirs with exist_ok=True does not fail if the cell is run a second time
os.makedirs('/content/checkpoints', exist_ok=True)

## Import section

In [2]:
import sys
import torch
import torch.nn as nn
import requests
from PIL import Image
from tqdm import tqdm
import torchvision.transforms as transforms
import torchvision.datasets as datasets

from roboflow import Roboflow
from transformers import AutoImageProcessor, DeiTForImageClassification

## Load the model


In [6]:
image_processor = AutoImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
model = DeiTForImageClassification.from_pretrained("facebook/deit-base-distilled-patch16-224")

Some weights of DeiTForImageClassification were not initialized from the model checkpoint at facebook/deit-base-distilled-patch16-224 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Vanilla inference
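We run inference with the downloaded model on a test image. As the warning above indicates, the classifier head of DeiTForImageClassification is newly initialized for this checkpoint, so the predicted label is essentially arbitrary until the model is fine-tuned.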

In [4]:
print("Starting inference on pretrained model...")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
print("Completed inference on pretrained model!\n")

Starting inference on pretrained model...
Predicted class: bucket, pail
Completed inference on pretrained model!
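
To look beyond the single argmax, the logits can be turned into probabilities and the top few classes listed. The sketch below uses a made-up logits tensor and a hypothetical four-entry `fake_id2label` dict in place of `outputs.logits` and `model.config.id2label`, so it runs without downloading anything. The helper `top_k_predictions` is my own illustration, not part of transformers.

```python
import torch


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one image."""
    probs = torch.softmax(logits, dim=-1)[0]  # first (only) image in the batch
    k = min(k, probs.numel())
    top_probs, top_idx = torch.topk(probs, k)
    return [(id2label[i], p) for i, p in zip(top_idx.tolist(), top_probs.tolist())]


# Made-up stand-ins for outputs.logits and model.config.id2label.
fake_logits = torch.tensor([[2.0, 0.5, -1.0, 0.1]])
fake_id2label = {0: "tabby", 1: "remote control", 2: "bucket", 3: "sofa"}

for label, prob in top_k_predictions(fake_logits, fake_id2label):
    print(f"{label}: {prob:.3f}")
```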



## Load Data
We will be using a Roboflow dataset. I demonstrate the training pipeline with the Tumor-Classification-1 dataset, but you can use any dataset you like.

We will also prepare train and val dataloaders in this section.
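
One detail worth knowing when building the transforms: as far as I understand torchvision's behaviour, transforms.Resize given a single int scales the shorter edge to that value and keeps the aspect ratio, while an (h, w) pair forces exactly that size. The pure-Python sketch below mirrors that size arithmetic. The helper `resized_size` is hypothetical and not a torchvision function.

```python
def resized_size(height, width, size):
    """Approximate output (height, width) of transforms.Resize(size)."""
    if isinstance(size, int):
        # Shorter edge becomes `size`; the longer edge is scaled proportionally.
        short, long = (height, width) if height <= width else (width, height)
        new_short, new_long = size, int(size * long / short)
        return (new_short, new_long) if height <= width else (new_long, new_short)
    # An explicit (h, w) pair is used as-is.
    return tuple(size)


print(resized_size(480, 640, 224))         # (224, 298)
print(resized_size(480, 640, (224, 224)))  # (224, 224)
```

With the int form, images of different aspect ratios come out with different shapes, which the default DataLoader collation cannot stack into one batch.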

In [10]:
print("Commencing data download from roboflow...")
rf = Roboflow(api_key="<YOUR_API_KEY>")
project = rf.workspace("brain-tumor-c6lzv").project("tumor-classification-ufzoh")
dataset = project.version(1).download("folder")
print("Data download complete!\n")

print("Preparing dataloaders...")

train_dir = './Tumor-Classification-1/train'
val_dir = './Tumor-Classification-1/valid'

# Resize to exactly 224x224. Resize(224) only scales the shorter edge, so
# non-square images would not end up at the 224x224 size the model expects.
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

train_dataset = datasets.ImageFolder(train_dir, data_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, shuffle=True)

val_dataset = datasets.ImageFolder(val_dir, data_transforms)
# Validation data does not need to be shuffled.
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=16, shuffle=False)

print("classes: ", train_dataset.classes)
print("Dataloaders prepared!")

Commencing data download from roboflow...
loading Roboflow workspace...
loading Roboflow project...
Data download complete!

Preparing dataloaders...
classes:  ['glioma', 'meningioma', 'notumor', 'pituitary']
Dataloaders prepared!
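
The class list printed above comes from the folder names: ImageFolder treats each subdirectory of the root as a class and assigns integer labels in sorted order. Here is a minimal stdlib-only sketch of that mapping, using a temporary directory with the same four folder names. The helper `folder_classes` is illustrative and not part of torchvision.

```python
import os
import tempfile


def folder_classes(root):
    """Map each subdirectory of `root` to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Create the class folders in arbitrary order; the labels are still sorted.
    for name in ["pituitary", "glioma", "notumor", "meningioma"]:
        os.mkdir(os.path.join(root, name))
    classes, class_to_idx = folder_classes(root)

print(classes)       # ['glioma', 'meningioma', 'notumor', 'pituitary']
print(class_to_idx)  # {'glioma': 0, 'meningioma': 1, 'notumor': 2, 'pituitary': 3}
```

These indices are what the dataloader yields as the target, so index i of the model's 4-way output corresponds to classes[i].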


## Training
Note: I have set the number of epochs to only 5 for demonstration purposes. Use as many epochs as you need to reach the desired fine-tuning quality.

In [None]:

class DeiT():

    def train(self, train_loader, model, criterion, optimizer, epoch):
        model.train()
        running_loss = 0.0
        running_corrects = 0

        print("training:")
        for inputs, targets in tqdm(train_loader):
            # Move the batch to the GPU (torch.autograd.Variable is no longer needed).
            inputs = inputs.cuda()
            targets = targets.cuda()

            outputs = model(inputs)
            preds = outputs.logits.argmax(dim=1)
            loss = criterion(outputs.logits, targets)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Accumulate the summed loss and the number of correct predictions.
            running_loss += loss.item() * inputs.size(0)
            running_corrects += (preds == targets).sum().item()

        return running_loss, running_corrects

    def validate(self, val_loader, model, criterion):
        model.eval()
        running_loss = 0.0
        running_corrects = 0

        print("validation:")
        # Disable gradient tracking (this replaces the removed `volatile=True` flag).
        with torch.no_grad():
            for inputs, targets in tqdm(val_loader):
                inputs = inputs.cuda()
                targets = targets.cuda()

                outputs = model(inputs)
                preds = outputs.logits.argmax(dim=1)
                loss = criterion(outputs.logits, targets)

                running_loss += loss.item() * inputs.size(0)
                running_corrects += (preds == targets).sum().item()

        return running_loss, running_corrects

    def run(self):
        # Replace the classification head with a 4-class layer for this dataset.
        model.classifier = nn.Linear(model.classifier.in_features, 4)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
        epochs = 5
        model.cuda()

        print("Commencing training epochs...")
        for epoch in range(epochs):
            print("\nEpoch: ", epoch)

            # train for one epoch
            train_loss, train_acc = self.train(train_loader, model, criterion, optimizer, epoch)
            # Average over the number of samples, not the number of batches.
            epoch_train_loss = train_loss / len(train_loader.dataset)
            epoch_train_acc = train_acc / len(train_loader.dataset)

            # evaluate on validation set
            val_loss, val_acc = self.validate(val_loader, model, criterion)
            epoch_val_loss = val_loss / len(val_loader.dataset)
            epoch_val_acc = val_acc / len(val_loader.dataset)

            print(f'train_acc: {epoch_train_acc:.4f} train_loss: {epoch_train_loss:.4f} val_acc: {epoch_val_acc:.4f} val_loss: {epoch_val_loss:.4f}')
            # Save the full model after every epoch; the file name records the training loss.
            PATH = '/content/checkpoints/epoch{epoch}_{loss:.2f}.pth'.format(epoch=epoch, loss=epoch_train_loss)
            torch.save(model, PATH)


if __name__ == "__main__":
    deit = DeiT()
    deit.run()
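
A note on reading the per-epoch numbers: the train and validate methods accumulate loss * batch_size and the count of correct predictions, so dividing by the number of samples gives a per-sample mean loss and an accuracy between 0 and 1. Here is a small worked example with made-up batch values (no model involved):

```python
# Each tuple is (mean loss of the batch, batch size, correct predictions).
# The values are made up for illustration.
batches = [(0.90, 16, 10), (0.70, 16, 12), (0.50, 8, 7)]

running_loss = sum(loss * size for loss, size, _ in batches)
running_corrects = sum(correct for _, _, correct in batches)
num_samples = sum(size for _, size, _ in batches)

epoch_loss = running_loss / num_samples
epoch_acc = running_corrects / num_samples
print(f"loss={epoch_loss:.4f} acc={epoch_acc:.4f}")  # loss=0.7400 acc=0.7250

# Dividing by the number of batches instead inflates both values;
# the "accuracy" can even exceed 1.
print(running_corrects / len(batches))
```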

## Testing
We will use the test data from the downloaded Roboflow dataset.
Note: For the sake of demonstration, I have used the checkpoint from epoch 4 (the last of the five epochs). You can use your best model for testing.
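
Since the checkpoint file names follow the epoch{epoch}_{loss:.2f}.pth pattern used when saving, one way to pick a checkpoint programmatically is to parse the loss back out of the name. The helper below is a hypothetical sketch, and the example file names are made up apart from epoch4_0.37.pth. Note that the value in the file name is the training loss, so this is only a rough proxy for the best model; validation metrics would be a better criterion.

```python
import re
from pathlib import Path

# Matches names such as "epoch4_0.37.pth" and captures the epoch and the loss.
CKPT_PATTERN = re.compile(r"epoch(\d+)_(\d+\.\d+)\.pth$")


def lowest_loss_checkpoint(names):
    """Return the checkpoint name with the smallest loss in its file name."""
    best_name, best_loss = None, None
    for name in names:
        match = CKPT_PATTERN.search(Path(name).name)
        if match is None:
            continue  # skip files that are not checkpoints
        loss = float(match.group(2))
        if best_loss is None or loss < best_loss:
            best_name, best_loss = name, loss
    return best_name


example_files = ["epoch0_1.21.pth", "epoch1_0.84.pth", "epoch4_0.37.pth", "notes.txt"]
print(lowest_loss_checkpoint(example_files))  # epoch4_0.37.pth
```

In the notebook, the list of names could come from os.listdir('/content/checkpoints').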

In [None]:
# inference
import matplotlib.pyplot as plt
import numpy as np

EVAL_BATCH = 1

test_dir = '/content/Tumor-Classification-1/test'
test_dataset = datasets.ImageFolder(test_dir, transforms.Compose([
    # Resize to exactly 224x224, the input size the model expects.
    transforms.Resize((224, 224)),
    transforms.ToTensor()
]))
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=EVAL_BATCH, shuffle=False, num_workers=4)

# The checkpoint stores the full pickled model. Recent PyTorch versions default
# to weights_only=True, so it is disabled here; only load checkpoints you trust.
model = torch.load('/content/checkpoints/epoch4_0.37.pth', weights_only=False)
model.eval()

running_corrects = 0
classes = test_dataset.classes
print("classes: ", classes)
print("testing:")
with torch.no_grad():
    for inputs, targets in test_loader:
        inputs = inputs.cuda()
        targets = targets.cuda()

        outputs = model(inputs)
        probabilities = torch.softmax(outputs.logits, dim=1)
        preds = probabilities.argmax(dim=1)
        running_corrects += (preds == targets).sum().item()

        # EVAL_BATCH is 1, so each batch holds a single image.
        print("PREDICTED: ", classes[preds[0].item()])
        print("TARGET: ", classes[targets[0].item()])
        print("\n\n")

# Accuracy is the fraction of correctly classified test images.
test_acc = running_corrects / len(test_dataset)
print("test accuracy: ", test_acc)
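
Overall accuracy can hide weak classes. A per-class breakdown could be sketched as follows, assuming the PREDICTED and TARGET label strings from the loop above were collected into two lists. The lists and the helper `per_class_accuracy` here are made up for illustration.

```python
from collections import Counter


def per_class_accuracy(targets, predictions):
    """Fraction of correctly predicted samples for each target class."""
    totals = Counter(targets)
    hits = Counter(t for t, p in zip(targets, predictions) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Made-up labels standing in for the collected TARGET / PREDICTED values.
targets = ["glioma", "glioma", "notumor", "pituitary", "meningioma", "notumor"]
predictions = ["glioma", "meningioma", "notumor", "pituitary", "meningioma", "glioma"]

for label, acc in per_class_accuracy(targets, predictions).items():
    print(f"{label}: {acc:.2f}")
```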
