Tutorial: https://huggingface.co/docs/transformers/tasks/image_classification

### Load Food-101 dataset

In [1]:
from datasets import load_dataset

In [2]:
food = load_dataset("food101", split="train[:5000]")

In [3]:
food = food.train_test_split(test_size=0.2)

In [4]:
# Each example in the dataset has two fields: image and label
food["train"][0]

{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x384>,
 'label': 81}

In [5]:
# creating a dictionary that maps the label name to an integer and vice versa
labels = food["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

In [6]:
# converting label id to a label name:
id2label[str(79)]

'prime_rib'
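
The round trip between label names and ids can be checked on a small example. The sketch below uses a short, hypothetical label list in place of the full Food-101 names and `demo_`-prefixed variables so it does not overwrite the dictionaries built above. It follows the cell above in storing ids as strings:

```python
# Hypothetical label list, standing in for food["train"].features["label"].names
demo_labels = ["apple_pie", "baby_back_ribs", "baklava"]

demo_label2id, demo_id2label = {}, {}
for i, label in enumerate(demo_labels):
    demo_label2id[label] = str(i)   # name -> id, kept as a string as in the cell above
    demo_id2label[str(i)] = label   # id (string) -> name

# Mapping a name to its id and back returns the original name.
for name in demo_labels:
    assert demo_id2label[demo_label2id[name]] == name

print(demo_label2id["baklava"])  # 2
print(demo_id2label["0"])        # apple_pie
```

Because the keys of the id-to-name dictionary are strings here, an integer label from the dataset has to be wrapped in str() before the lookup, as in id2label[str(79)].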

### Preprocess

In [7]:
# loading a ViT image processor to process the image into a tensor:
from transformers import AutoImageProcessor

In [8]:
checkpoint = "google/vit-base-patch16-224-in21k"

In [9]:
image_processor = AutoImageProcessor.from_pretrained(checkpoint)


##### PyTorch

Apply some image transformations with torchvision's transforms module to make the model more robust against overfitting. We'll crop a random part of the image, resize it, and normalize it with the image mean and standard deviation:

In [11]:
from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

In [12]:
normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)

In [13]:
size = (
    image_processor.size["shortest_edge"]
    if "shortest_edge" in image_processor.size
    else (image_processor.size["height"], image_processor.size["width"])
)
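
The conditional above handles the two layouts the processor's size dictionary can take. Here is a minimal sketch of the same branch on plain dictionaries. resolve_size is a hypothetical helper, and the example values are assumptions for illustration, not values read from the checkpoint:

```python
def resolve_size(size_dict):
    """Return an int for a shortest-edge size, or a (height, width) tuple."""
    if "shortest_edge" in size_dict:
        return size_dict["shortest_edge"]
    return (size_dict["height"], size_dict["width"])

# Assumed example dictionaries covering both branches.
print(resolve_size({"shortest_edge": 256}))         # 256
print(resolve_size({"height": 224, "width": 224}))  # (224, 224)
```

Either result can be passed to RandomResizedCrop, which accepts an int (square crop) or a (height, width) sequence.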

In [14]:
_transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])
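
To see what the last two steps of this pipeline do to a single value: ToTensor scales 8-bit pixel values from [0, 255] to [0.0, 1.0], and Normalize then computes (x - mean) / std per channel. The sketch below reproduces that arithmetic in plain Python. The mean and std of 0.5 are assumptions for illustration; the real values come from image_processor.image_mean and image_processor.image_std:

```python
def to_unit_range(pixel):
    """Mimic ToTensor's scaling of an 8-bit value to [0.0, 1.0]."""
    return pixel / 255.0

def normalize_value(x, mean, std):
    """Mimic Normalize for a single value: (x - mean) / std."""
    return (x - mean) / std

# Illustrative values only, not read from the checkpoint.
ASSUMED_MEAN, ASSUMED_STD = 0.5, 0.5

for pixel in (0, 255):
    x = to_unit_range(pixel)
    print(pixel, "->", normalize_value(x, ASSUMED_MEAN, ASSUMED_STD))
# 0 -> -1.0
# 255 -> 1.0
```

Under these assumed statistics, the darkest and brightest pixels map to -1.0 and 1.0.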

Then we create a preprocessing function to apply the transforms and return the pixel_values (the inputs to the model) of the image:

In [15]:
def transforms(examples):
    examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples

To apply the preprocessing function over the entire dataset, use the 🤗 Datasets with_transform method. The transforms are applied on the fly when you load an element of the dataset:

In [16]:
food = food.with_transform(transforms)

Now create a batch of examples using DefaultDataCollator. Unlike other data collators in 🤗 Transformers, the DefaultDataCollator does not apply additional preprocessing such as padding.

In [17]:
from transformers import DefaultDataCollator

In [18]:
data_collator = DefaultDataCollator()

##### TensorFlow

To avoid overfitting and to make the model more robust, add some data augmentation to the training part of the dataset. Here we use Keras preprocessing layers to define the transformations for the training data (which include data augmentation) and the transformations for the validation data (only center cropping, resizing, and normalizing). You can use tf.image or any other library you prefer.

In [19]:
from tensorflow import keras
from tensorflow.keras import layers

In [20]:
size = (image_processor.size["height"], image_processor.size["width"])

In [21]:
train_data_augmentation = keras.Sequential(
    [
        layers.RandomCrop(size[0], size[1]),
        layers.Rescaling(scale=1.0 / 127.5, offset=-1),
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(factor=0.02),
        layers.RandomZoom(height_factor=0.2, width_factor=0.2),
    ],
    name="train_data_augmentation",
)

In [22]:
val_data_augmentation = keras.Sequential(
    [
        layers.CenterCrop(size[0], size[1]),
        layers.Rescaling(scale=1.0 / 127.5, offset=-1),
    ],
    name="val_data_augmentation",
)
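
Both pipelines use the same Rescaling layer, which computes x * scale + offset. As a plain-Python sketch of that formula (rescale is a hypothetical helper, not part of Keras), scale=1.0 / 127.5 with offset=-1 maps 8-bit pixel values from [0, 255] to approximately [-1, 1]:

```python
def rescale(x, scale=1.0 / 127.5, offset=-1.0):
    """Same formula as the Rescaling layer: x * scale + offset."""
    return x * scale + offset

# The lower bound, midpoint, and upper bound of the 8-bit range.
for pixel in (0.0, 127.5, 255.0):
    print(pixel, "->", rescale(pixel))
```

The printed values are close to -1, 0, and 1; small floating-point rounding differences are possible.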

Next, create functions to apply appropriate transformations to a batch of images, instead of one image at a time.

In [23]:
import numpy as np
import tensorflow as tf
from PIL import Image

In [24]:
def convert_to_tf_tensor(image: Image.Image):
    np_image = np.array(image)
    tf_image = tf.convert_to_tensor(np_image)
    # `expand_dims()` adds a batch dimension, since
    # the TF augmentation layers operate on batched inputs.
    return tf.expand_dims(tf_image, 0)

In [25]:
def preprocess_train(example_batch):
    """Apply train_data_augmentation across a batch."""
    images = [
        train_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
    ]
    example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
    return example_batch

In [26]:
def preprocess_val(example_batch):
    """Apply val_data_augmentation across a batch."""
    images = [
        val_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
    ]
    example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
    return example_batch
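
The squeeze and transpose calls in both functions change the tensor layout. As a shape-only sketch, NumPy's squeeze and transpose stand in for the TensorFlow calls, on the assumption that both libraries reverse the axis order by default. The array below is a hypothetical non-square batch, chosen so the axis order is visible in the shapes:

```python
import numpy as np

# Hypothetical batch of one image: (batch, height, width, channels)
batched = np.zeros((1, 4, 6, 3))

# Drop the size-1 batch axis: (4, 6, 3)
squeezed = np.squeeze(batched)

# Reverse the remaining axes: (3, 6, 4), i.e. channels come first
channels_first = np.transpose(squeezed)

print(squeezed.shape)        # (4, 6, 3)
print(channels_first.shape)  # (3, 6, 4)
```

With the default axis reversal, the result is ordered (channels, width, height). For the square crops produced by the augmentation pipelines above, the shape is the same as (channels, height, width).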