<a href="https://colab.research.google.com/github/linhle32/Interactive-Models-with-Widget/blob/main/image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification on Image Data

This is a small project on image classification.

We will load and fine-tune the auto model `TFAutoModelForImageClassification` from `transformers` on a custom dataset. The data processing and modeling code mostly comes from the HuggingFace tutorial at https://huggingface.co/docs/transformers/tasks/image_classification

I mainly modified the data loading part so that we can load a custom dataset. I also added a small GUI application at the end so we can interact with the model.

### Load data

Image data should be in a `zip` file, organized as one folder per label. More specifically, all images with the same label are placed in the same folder, and the folder name is the label name.
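
Before zipping, it can help to confirm that the folder really follows the one-folder-per-label layout. The snippet below is a minimal sketch using only the standard library; the helper name `count_images_per_label`, the list of image suffixes, and the `cat`/`dog` demo folders are assumptions made for illustration and are not part of the original notebook.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; extend it if your data uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}

def count_images_per_label(root):
    """Return {label_folder_name: number_of_image_files} for a dataset root."""
    counts = Counter()
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue  # ignore stray files next to the label folders
        counts[label_dir.name] = sum(
            1 for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return dict(counts)

# Build a tiny fake dataset (empty placeholder files) to show the layout.
demo_root = Path(tempfile.mkdtemp()) / "animals"
for label, n in [("cat", 3), ("dog", 2)]:
    (demo_root / label).mkdir(parents=True)
    for i in range(n):
        (demo_root / label / f"{label}_{i}.jpg").touch()

layout_counts = count_images_per_label(demo_root)
print(layout_counts)
```

A label with a count of zero, or a label missing from the result, usually means the images sit one folder level too deep or too shallow.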

Please set `data_path` to the path of the `zip` file in your Google Drive. The curly brackets `{}` let us use a Python variable inside a shell command (`!unzip`) in Google Colab.
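
The `!unzip '{data_path}'` syntax only works inside a notebook. As a rough equivalent for a plain Python script, the standard-library `zipfile` module can extract the archive. This is a sketch, not the notebook's own method: the helper `extract_zip` and the throwaway archive built for the demonstration are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_zip(zip_path, target_dir):
    """Extract zip_path into target_dir and return the top-level entry names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted({name.split("/")[0] for name in archive.namelist()})

# Demonstration with a throwaway archive holding two empty placeholder images.
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "animals.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("animals/cat/cat_0.jpg", b"")
    archive.writestr("animals/dog/dog_0.jpg", b"")

top_level = extract_zip(demo_zip, work_dir / "extracted")
print(top_level)
```

The returned top-level name is the folder to pass as the data directory when loading the images.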

In this example, I extracted a small subset of the `Animal Image Dataset` from Kaggle https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals

In [None]:
data_path = ''

In [None]:
from google.colab import drive
drive.mount('/content/drive')
!unzip '{data_path}'

### Process data

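This part can be run as is, provided the archive extracts to a folder named `animals/` (otherwise change `data_dir` in the cell below). All code is from the HuggingFace tutorial for image classification.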

In [None]:
!pip install datasets evaluate transformers

import evaluate
from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="animals/")
dataset = dataset['train'].train_test_split(test_size=0.3)
labels = dataset["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

from transformers import AutoImageProcessor
checkpoint = "google/vit-base-patch16-224-in21k"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)

from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import tensorflow as tf
from PIL import Image
from transformers import DefaultDataCollator

size = (image_processor.size["height"], image_processor.size["width"])
train_data_augmentation = keras.Sequential(
    [
        layers.RandomCrop(size[0], size[1]),
        layers.Rescaling(scale=1.0 / 127.5, offset=-1),
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(factor=0.02),
        layers.RandomZoom(height_factor=0.2, width_factor=0.2),
    ],
    name="train_data_augmentation",
)
val_data_augmentation = keras.Sequential(
    [
        layers.CenterCrop(size[0], size[1]),
        layers.Rescaling(scale=1.0 / 127.5, offset=-1),
    ],
    name="val_data_augmentation",
)

def convert_to_tf_tensor(image: Image.Image):
    # Convert a PIL image to a tensor and add a leading batch dimension
    np_image = np.array(image)
    tf_image = tf.convert_to_tensor(np_image)
    return tf.expand_dims(tf_image, 0)

def preprocess_train(example_batch):
    images = [
        train_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
    ]
    example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
    return example_batch

def preprocess_val(example_batch):
    images = [
        val_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
    ]
    example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
    return example_batch

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair; take the highest-scoring class per example
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

dataset["train"].set_transform(preprocess_train)
dataset["test"].set_transform(preprocess_val)
data_collator = DefaultDataCollator(return_tensors="tf")
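
Both augmentation pipelines above include `layers.Rescaling(scale=1.0 / 127.5, offset=-1)`, which maps 8-bit pixel values from the range [0, 255] to roughly [-1, 1] by computing `pixel * scale + offset`. The plain-Python sketch below reproduces only that arithmetic for three sample values; the function name `rescale` is made up for illustration and is not a Keras API.

```python
def rescale(pixel, scale=1.0 / 127.5, offset=-1.0):
    """Apply the same affine map as the Rescaling layer: pixel * scale + offset."""
    return pixel * scale + offset

# Black, mid-grey, and white map to about -1, 0, and 1 respectively.
rescaled = [rescale(p) for p in (0, 127.5, 255)]
print(rescaled)
```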

### Modeling

As with data processing, I reused the code from the HuggingFace tutorial here. Mostly, there is nothing to change. However, we can adjust a few hyperparameters to see if the performance improves. Save the model when you are happy with its performance.
- `num_epochs`: as in the previous module, the number of full passes over the training data
- `learning_rate`: the step size of each weight update, i.e. how fast the model changes in each iteration
- `batch_size`: how many images are used in one iteration (one weight update)
- `weight_decay_rate`: the strength of weight decay, a regularization term that shrinks the weights toward zero at each update; it does not control how the learning rate drops, which is set by the schedule from `create_optimizer`
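
To make these four values concrete, here is a small sketch of how they interact. It assumes a hypothetical training set of 700 images and that the last, partial batch is kept. The update rule shown is plain gradient descent with decoupled weight decay, a deliberate simplification of the Adam-based optimizer used in the tutorial; the helper names are invented for illustration.

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """Number of weight updates in one pass over the data (partial batch kept)."""
    return math.ceil(num_examples / batch_size)

def sgd_step_with_weight_decay(weight, grad, learning_rate, weight_decay_rate):
    """One simplified update: a gradient step plus a shrink of the weight itself."""
    return weight - learning_rate * grad - learning_rate * weight_decay_rate * weight

num_examples = 700          # hypothetical dataset size
num_epochs = 3
learning_rate = 3e-5
batch_size = 32
weight_decay_rate = 0.01

per_epoch = steps_per_epoch(num_examples, batch_size)
total_steps = per_epoch * num_epochs
print(per_epoch, total_steps)

# With a zero gradient, only the weight decay term changes the weight.
decayed = sgd_step_with_weight_decay(1.0, 0.0, learning_rate, weight_decay_rate)
print(decayed)
```

A larger `batch_size` means fewer updates per epoch, so `learning_rate` and `num_epochs` often need to be revisited together with it.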

In [None]:
num_epochs = 3
learning_rate = 3e-5
batch_size = 32
weight_decay_rate = 0.01

In [None]:
from transformers import create_optimizer, TFAutoModelForImageClassification
from keras.losses import SparseCategoricalCrossentropy
from transformers.keras_callbacks import KerasMetricCallback

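# One optimizer step per batch, so count batches (rounding up), not images
steps_per_epoch = (len(dataset["train"]) + batch_size - 1) // batch_size
num_train_steps = steps_per_epoch * num_epochs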

optimizer, lr_schedule = create_optimizer(
    init_lr=learning_rate,
    num_train_steps=num_train_steps,
    weight_decay_rate=weight_decay_rate,
    num_warmup_steps=0,
)
model = TFAutoModelForImageClassification.from_pretrained(
    checkpoint,
    id2label=id2label,
    label2id=label2id,
)
tf_train_dataset = dataset["train"].to_tf_dataset(
    columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator
)
tf_eval_dataset = dataset["test"].to_tf_dataset(
    columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator
)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_eval_dataset)
callbacks = [metric_callback]
model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=num_epochs, callbacks=callbacks)

### Save the Model

Set `model_path` to the location where you want to save your model. After this cell, you are done with this notebook.

In [None]:
model_path = ''
model.save_pretrained(model_path)

# Image Classification Application

In [None]:
model_path = '.../animal_model'

In [None]:
!pip install transformers
from google.colab import drive
drive.mount('/content/drive')
import io
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image
from transformers import AutoImageProcessor
from transformers import TFAutoModelForImageClassification

checkpoint = "google/vit-base-patch16-224-in21k"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = TFAutoModelForImageClassification.from_pretrained(model_path)

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

button_predict = widgets.Button(description="Predict")
uploader = widgets.FileUpload(multiple=False)
output = widgets.Output()
display(button_predict, uploader, output)


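@output.capture()
def on_predict_clicked(b):
  output.clear_output()
  # Nothing uploaded yet: uploader.value is empty
  if not uploader.value:
    print('please upload an image first')
    return
  try:
    # ipywidgets 7 style: uploader.value is a dict keyed by file name
    image = Image.open(io.BytesIO(list(uploader.value.values())[0]['content']))
    image = image.convert("RGB")
    inputs = image_processor(image, return_tensors="tf")
    logits = model(**inputs).logits
    # The class with the highest logit is the prediction
    predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
    label = model.config.id2label[predicted_class_id]
    plt.imshow(image)
    plt.title('this image is classified as ' + label, y=-0.2)
    plt.show()
  except Exception as e:
    # Report the actual problem instead of hiding it
    print('could not classify the uploaded file:', e)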

button_predict.on_click(on_predict_clicked)
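
The callback above reports only the winning label. If a confidence value is also wanted, the logits can be passed through a softmax first. The sketch below shows that step in plain Python on made-up numbers; `softmax`, `top_prediction`, the three-class `demo_id2label` map, and the logits are all hypothetical and stand in for the model's real output.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

demo_id2label = {0: "cat", 1: "dog", 2: "fox"}   # hypothetical label map
demo_logits = [1.2, 3.4, 0.3]                    # hypothetical model output

label, confidence = top_prediction(demo_logits, demo_id2label)
print(label, round(confidence, 3))
```

A low top probability is a useful hint that the uploaded image may not belong to any of the classes the model was trained on.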
