This document is a hands-on exercise from a study group, [從 Python 到 TensorFlow 線上讀書會](https://bookclubtensorflow.github.io/) (From Python to TensorFlow Online Study Group), based in Taiwan.

The image classification models compared are:
- [BEiT](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k) by Microsoft
- [ViT](https://huggingface.co/google/vit-base-patch16-224) by Google
- [SegFormer](https://huggingface.co/nvidia/mit-b0) by NVIDIA
- [DeiT](https://huggingface.co/facebook/deit-base-distilled-patch16-224) by Meta

I use this [Shiba Inu picture](https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/440px-Shiba_inu_taiki.jpg) to compare the results from each model.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/440px-Shiba_inu_taiki.jpg)

# Set Up Environment

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [2]:
pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [3]:
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

## BEiT by Microsoft

In [4]:
from transformers import BeitImageProcessor, BeitForImageClassification
from PIL import Image
import requests

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/440px-Shiba_inu_taiki.jpg'
image = Image.open(requests.get(url, stream=True).raw)

processor = BeitImageProcessor.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 21,841 ImageNet-22k classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Predicted class: dingo, warrigal, warragal, Canis_dingo
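
The cell above prints only the top-1 class from `logits.argmax(-1)`. To see how close the runner-up classes are, the logits can be turned into probabilities with a softmax and ranked. The following is a minimal sketch in plain Python, not part of the original notebook: `top_k_labels` is a hypothetical helper, and the toy logits and labels are made-up stand-ins for `logits[0].tolist()` and `model.config.id2label`.

```python
import math

def top_k_labels(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy values (assumptions for illustration only, not real model output).
toy_logits = [2.0, 0.5, 1.0, -1.0]
toy_id2label = {0: "dingo", 1: "Eskimo dog", 2: "Pembroke", 3: "tabby"}

top3 = top_k_labels(toy_logits, toy_id2label, k=3)
for label, prob in top3:
    print(f"{label}: {prob:.3f}")
```

With a real model, the call would presumably look like `top_k_labels(logits[0].tolist(), model.config.id2label)`, since `logits` has a leading batch dimension of size 1.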


## ViT by Google

In [5]:
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/440px-Shiba_inu_taiki.jpg'
image = Image.open(requests.get(url, stream=True).raw)

processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Predicted class: Eskimo dog, husky


## SegFormer by NVIDIA

In [6]:
# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
from transformers import SegformerImageProcessor, SegformerForImageClassification
from PIL import Image
import requests

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/440px-Shiba_inu_taiki.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = SegformerImageProcessor.from_pretrained("nvidia/mit-b0")
model = SegformerForImageClassification.from_pretrained("nvidia/mit-b0")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Predicted class: dingo, warrigal, warragal, Canis dingo


## DeiT by Meta

In [7]:
# AutoImageProcessor replaces the deprecated AutoFeatureExtractor for vision models
from transformers import AutoImageProcessor, DeiTForImageClassificationWithTeacher
from PIL import Image
import requests

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/440px-Shiba_inu_taiki.jpg'
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained('facebook/deit-base-distilled-patch16-224')
model = DeiTForImageClassificationWithTeacher.from_pretrained('facebook/deit-base-distilled-patch16-224')

inputs = processor(images=image, return_tensors="pt")

# forward pass
outputs = model(**inputs)
logits = outputs.logits

# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Predicted class: dingo, warrigal, warragal, Canis dingo
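
To put the four results side by side, the printed labels can be tallied. The sketch below is an illustration I added rather than a cell from the original run: the label strings are copied from the outputs above, and `normalize` is a hypothetical helper that treats underscores as spaces, because the ImageNet-22k label from BEiT is written `Canis_dingo` while the other label sets write `Canis dingo`.

```python
from collections import Counter

# Top-1 labels copied from the outputs printed in the cells above.
predictions = {
    "BEiT": "dingo, warrigal, warragal, Canis_dingo",
    "ViT": "Eskimo dog, husky",
    "SegFormer": "dingo, warrigal, warragal, Canis dingo",
    "DeiT": "dingo, warrigal, warragal, Canis dingo",
}

def normalize(label):
    """Lower-case a label and treat underscores as spaces."""
    return label.replace("_", " ").lower()

# Count how many models agree on each normalized label.
votes = Counter(normalize(label) for label in predictions.values())
majority_label, majority_count = votes.most_common(1)[0]

# Models whose prediction differs from the majority label.
dissenting = [
    name for name, label in predictions.items()
    if normalize(label) != majority_label
]

print(f"{majority_count} of {len(predictions)} models predict: {majority_label}")
print("Models that disagree:", dissenting)
```

None of the four models names the Shiba Inu itself. This tally only shows which models agree with each other on this one picture, not which one is more accurate in general.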
