<a href="https://colab.research.google.com/github/ilsilfverskiold/smaller-models-docs/blob/main/computer-vision/cook/image-classification/Image_classification_resnet_trainer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Image classification with a CNN using Hugging Face**

---

The pre-trained model we'll fine-tune here is `microsoft/resnet-50`; other similar checkpoints will likely work too.

The batch size is 32 and training runs for 3 epochs.

**Make sure you change the dataset to the one you need.** The dataset used here has both a training and a validation split, so adjust the code accordingly if yours has no validation split.

In [None]:
!pip install -q datasets transformers evaluate

In [None]:
dataset_url = "ilsilfverskiold/traffic-camera-norway-images" # public dataset
model_checkpoint = "microsoft/resnet-50" # decide on your pre-trained model
learning_rate = 5e-5
weight_decay = 0.01
epochs = 3
batch_size= 32

Load the dataset from the Hugging Face Hub below.

In [None]:
from datasets import load_dataset

dataset = load_dataset(dataset_url) # possible to import private too by setting token=your_token

dataset

Check the features and get the labels. Make sure the images are in PIL format.

In [None]:
dataset["train"].features["label"]

In [None]:
labels = dataset["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i] = label
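
As a small illustration of what these two dictionaries end up holding, here is the same mapping built with dictionary comprehensions. The class names in `example_labels` are made up for the sketch; the real ones come from `dataset["train"].features["label"].names`.

```python
# Hypothetical class names, for illustration only.
example_labels = ["clear", "snow", "wet"]

# Map each class name to an integer id, and each id back to its name.
example_label2id = {label: i for i, label in enumerate(example_labels)}
example_id2label = {i: label for i, label in enumerate(example_labels)}

print(example_label2id)
print(example_id2label)
```

The model config stores both directions, so predictions can later be reported as readable class names instead of bare integers.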

Next we preprocess the data: resizing and cropping to the input size the model expects, a light augmentation (random horizontal flips) for the training set, and normalization. These steps put the images into the format ResNet-50 expects and keep the new training and validation data close to the conditions of the original training set.

In [None]:
from torchvision.transforms import Compose, Normalize, Resize, CenterCrop, RandomHorizontalFlip, ToTensor
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained(model_checkpoint)

normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)

train_transform = Compose([
    Resize(256),
    CenterCrop(224),
    RandomHorizontalFlip(),
    ToTensor(),
    normalize,
])

val_transform = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),
    normalize,
])

def apply_transform(examples, transform):
    """Applies the transform to each image under the 'image' key and stores the result in 'pixel_values'."""
    examples['pixel_values'] = [transform(image.convert('RGB')) for image in examples['image']]
    return examples

def set_dataset_transform(dataset, transform):
    dataset.set_transform(lambda examples: apply_transform(examples, transform))

In [None]:
set_dataset_transform(dataset['train'], train_transform)
set_dataset_transform(dataset['validation'], val_transform)
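
To get a feel for the geometry of `Resize(256)` followed by `CenterCrop(224)`, the sketch below computes the approximate output sizes by hand. The helper `resized_then_cropped` is hypothetical, not part of torchvision, and it assumes the shorter edge is scaled to 256 with the aspect ratio preserved.

```python
def resized_then_cropped(width, height, resize_to=256, crop_to=224):
    """Approximate (width, height) after Resize(resize_to) and after CenterCrop(crop_to)."""
    short, long = min(width, height), max(width, height)
    new_short = resize_to
    new_long = int(resize_to * long / short)  # the aspect ratio is preserved
    resized = (new_long, new_short) if width >= height else (new_short, new_long)
    cropped = (crop_to, crop_to)  # the center crop is always square
    return resized, cropped

demo_resized, demo_cropped = resized_then_cropped(1024, 512)
print(demo_resized, demo_cropped)  # (512, 256) (224, 224)
```

For wide images such as traffic-camera frames, the center crop discards the left and right edges, which is worth keeping in mind if the object of interest sits near the border.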

Check that the first item in the training data now has a new field called `pixel_values` containing a tensor.

In [None]:
dataset['train'][0]
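
As a rough sketch of what `ToTensor` followed by `Normalize` does to each image, the NumPy code below applies the same arithmetic to a dummy image. The mean and standard deviation are the usual ImageNet statistics, assumed here for illustration; the notebook reads the actual values from the feature extractor.

```python
import numpy as np

# ImageNet statistics (an assumption for this sketch).
demo_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
demo_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# A dummy 224x224 RGB image with 8-bit values, in height x width x channel order.
rng = np.random.default_rng(0)
image_hwc = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# ToTensor: scale to [0, 1] and move the channel axis first (C, H, W).
scaled_chw = (image_hwc.astype(np.float32) / 255.0).transpose(2, 0, 1)

# Normalize: subtract the per-channel mean and divide by the per-channel std.
demo_pixel_values = (scaled_chw - demo_mean[:, None, None]) / demo_std[:, None, None]

print(demo_pixel_values.shape)  # (3, 224, 224)
```

The resulting array has the channels-first layout and the roughly zero-centered value range that the transforms above produce for `pixel_values`.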

We pass the label mappings we set up earlier when loading the pre-trained model below. Setting `ignore_mismatched_sizes=True` tells it to drop the classification head for the labels it was originally trained on and to initialize a new one for our labels.

In [None]:
from transformers import TFAutoModelForImageClassification

model = TFAutoModelForImageClassification.from_pretrained(
    model_checkpoint,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

Set up the metrics to compute; feel free to keep fewer of them.

In [None]:
import numpy as np
import evaluate  # `load_metric` is gone from recent `datasets` releases; needs `pip install evaluate`

accuracy_metric = evaluate.load("accuracy")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)

    # Compute metrics
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average='macro')
    recall = recall_metric.compute(predictions=predictions, references=labels, average='macro')
    f1 = f1_metric.compute(predictions=predictions, references=labels, average='macro')

    metrics = {
        "accuracy": accuracy['accuracy'],
        "precision": precision['precision'],
        "recall": recall['recall'],
        "f1": f1['f1']
    }
    return metrics
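
To make the `average='macro'` setting concrete, here is a hand-rolled sketch of macro-averaged precision and recall on made-up logits. The helper `macro_precision_recall` is hypothetical, and scoring a class with no predictions as 0 is an assumption of this sketch.

```python
import numpy as np

def macro_precision_recall(predictions, references, num_classes):
    """Return (macro precision, macro recall) for integer class ids."""
    precisions, recalls = [], []
    for c in range(num_classes):
        true_positives = np.sum((predictions == c) & (references == c))
        predicted_c = np.sum(predictions == c)
        actual_c = np.sum(references == c)
        precisions.append(true_positives / predicted_c if predicted_c else 0.0)
        recalls.append(true_positives / actual_c if actual_c else 0.0)
    # Macro averaging: every class counts equally, regardless of its size.
    return float(np.mean(precisions)), float(np.mean(recalls))

# Made-up logits for four images and three classes.
demo_logits = np.array([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.9, 0.2, 0.4],
                        [0.1, 0.3, 2.2]])
demo_references = np.array([0, 1, 2, 2])
demo_predictions = np.argmax(demo_logits, axis=1)  # class ids [0, 1, 0, 2]

demo_precision, demo_recall = macro_precision_recall(demo_predictions, demo_references, 3)
demo_accuracy = float(np.mean(demo_predictions == demo_references))
print(demo_accuracy, demo_precision, demo_recall)
```

Because each class contributes equally to the macro average, a rare class that the model handles badly pulls these numbers down much more than it pulls down plain accuracy.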

We'll set up an optimizer as well. An optimizer is an algorithm that adjusts the parameters of a neural network based on feedback from the loss function.

In [None]:
from transformers import AdamWeightDecay

optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)

In [None]:
model.compile(optimizer=optimizer)
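
For intuition about what an Adam-style optimizer with decoupled weight decay does on each step, here is a minimal NumPy sketch. The function `adamw_step` is hypothetical and is not the library implementation; the `beta1`, `beta2` and `eps` values are typical defaults assumed for illustration.

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=5e-5, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update with decoupled weight decay; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: shrink the weights directly, not through the gradient.
    param = param - lr * weight_decay * param
    # Adam step, scaled per parameter by the gradient magnitude estimate.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

demo_param, demo_m, demo_v = adamw_step(
    np.array([1.0]), np.array([1.0]), np.zeros(1), np.zeros(1), t=1
)
print(demo_param)
```

With a learning rate of 5e-5, each step moves a weight only slightly, which suits fine-tuning: the pre-trained features are adjusted gently rather than overwritten.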

A data collator turns a list of individual examples into a batch. `DefaultDataCollator` stacks the examples and converts them to the requested format (NumPy arrays here). It does no padding, which is fine because every image has already been cropped to the same size.

In [None]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator(return_tensors="np")

train_set = dataset['train'].to_tf_dataset(
    columns=["pixel_values", "label"],
    shuffle=True,
    batch_size=batch_size,
    collate_fn=data_collator
)
val_set = dataset['validation'].to_tf_dataset(
    columns=["pixel_values", "label"],
    shuffle=False,
    batch_size=batch_size,
    collate_fn=data_collator
)
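
The collation step can be sketched in a few lines of NumPy. The helper `simple_collate` below is hypothetical and covers only the two fields used in this notebook; it mimics stacking the examples and renaming `label` to `labels`.

```python
import numpy as np

def simple_collate(features):
    """Stack a list of per-example dicts into one dict of batched arrays."""
    batch = {
        # Every image has the same shape, so the examples can be stacked directly.
        "pixel_values": np.stack([np.asarray(f["pixel_values"]) for f in features]),
        # The per-example "label" field becomes a batched "labels" array.
        "labels": np.array([f["label"] for f in features], dtype=np.int64),
    }
    return batch

# Four dummy examples with blank images and alternating labels.
demo_examples = [
    {"pixel_values": np.zeros((3, 224, 224), dtype=np.float32), "label": i % 2}
    for i in range(4)
]
demo_batch = simple_collate(demo_examples)
print(demo_batch["pixel_values"].shape, demo_batch["labels"])
```

The renamed `labels` key is why the metric callback later refers to `label_cols=['labels']` rather than `label`.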

In [None]:
batch = next(iter(train_set))

Train the model below. Keep an eye on the training loss and the accuracy: the loss should trend downward over the epochs while the accuracy goes up.

In [None]:
from transformers.keras_callbacks import PushToHubCallback, KerasMetricCallback
from tensorflow.keras.callbacks import TensorBoard

metric_callback = KerasMetricCallback(
    metric_fn=compute_metrics, eval_dataset=val_set, batch_size=batch_size, label_cols=['labels']
)

tensorboard_callback = TensorBoard(log_dir="./ic_from_scratch_model_save/logs")

model_name = model_checkpoint.split("/")[-1]

callbacks = [metric_callback, tensorboard_callback]

# The datasets are already batched by to_tf_dataset, so batch_size is not passed again here.
model.fit(
    train_set,
    validation_data=val_set,
    callbacks=callbacks,
    epochs=epochs,
)

Let's evaluate on the validation set.

In [None]:
eval_loss = model.evaluate(val_set)
eval_loss

In [None]:
# Accumulate accuracy batch by batch, reusing the accuracy metric loaded earlier.
for batch in iter(val_set):
    predictions = model.predict(batch)
    predicted_labels = np.argmax(predictions.logits, -1)
    accuracy_metric.add_batch(predictions=predicted_labels, references=batch['labels'])

accuracy_metric.compute()

In [None]:
model.save_pretrained("my_model", saved_model=True)

In [None]:
from transformers import TFAutoModelForImageClassification

model = TFAutoModelForImageClassification.from_pretrained("my_model")

In [None]:
from transformers import pipeline

pipe = pipeline("image-classification", model=model, feature_extractor=feature_extractor)

(Optional) Run inference on a few new images from your Google Drive to see how the model performs.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
from PIL import Image

image_path = '/content/drive/MyDrive/my_new_image.png' # change this to the correct path

image = Image.open(image_path)
image

results = pipe(image)
results
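
The pipeline returns a list of dictionaries with a `label` and a `score` for the top classes. As a rough sketch of how such scores relate to the raw model output, the code below applies a softmax to made-up logits and picks the top entries. The helper `top_k_from_logits` and the class names are hypothetical.

```python
import numpy as np

def top_k_from_logits(logits, id2label, k=3):
    """Convert one vector of logits into the k most likely labels with probabilities."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # softmax
    order = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    return [{"label": id2label[int(i)], "score": float(probs[i])} for i in order]

# Made-up logits and class names, for illustration only.
demo_id2label = {0: "clear", 1: "snow", 2: "wet"}
demo_results = top_k_from_logits(np.array([1.0, 3.0, 2.0]), demo_id2label, k=2)
print(demo_results)
```

A top score close to 1 means the model is confident; several similar scores suggest the image is ambiguous or unlike the training data.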