## Albumentations
![](https://neurohive.io/wp-content/uploads/2019/03/Screenshot-from-2019-03-14-00-53-12.png)

## + CutMix(up)
![](https://miro.medium.com/max/4176/1*IR3uTsclxKdzKIXDlTiVgg.png)

Hi everyone, welcome back to another TensorFlow implementation of state-of-the-art image augmentations. I previously shared two notebooks with augmentations done through ImageDataGenerator, but that approach was limited in flexibility. Today, I'm going to show you how to use the customizability of tf.data to implement Albumentations with CutMix(up).

Before I start, do check out my previous notebooks for this competition if you haven't, as they provide good background:
- https://www.kaggle.com/junyingsg/step-by-step-guide-to-denoising-your-labels
- https://www.kaggle.com/junyingsg/end-to-end-cassava-disease-classification-in-keras

I also want to give a shoutout to the notebook that provided the implementation of CutMix(up) I drew from:
- https://www.kaggle.com/itsuki9180/efficientnet-and-cutmixup-with-tpu-train-phase

# Importing libraries and data

In [None]:
import numpy as np
import pandas as pd
from PIL import Image
import os
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from sklearn.utils import shuffle
from sklearn.utils import class_weight
import random
import cv2
import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.layers import Dense, Flatten, Dropout, Activation, Input
from tensorflow.keras.layers import BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.applications import EfficientNetB0
from sklearn.model_selection import StratifiedShuffleSplit, StratifiedKFold
import tensorflow_addons as tfa
import albumentations as A
from functools import partial
import gc

In [None]:
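# This experimental API was later deprecated; on TF >= 2.4 the equivalent call is
# tf.keras.mixed_precision.set_global_policy('mixed_float16').
from tensorflow.keras.mixed_precision import experimental as mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)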

AUTOTUNE = tf.data.experimental.AUTOTUNE

In [None]:
import random
from numpy.random import seed
from tensorflow.random import set_seed

seed_value = 42
random.seed(seed_value)
seed(seed_value)
set_seed(seed_value)

In [None]:
df_train = pd.read_csv("../input/cassava-leaf-disease-classification/train.csv")
training_folder = '../input/cassava-leaf-disease-classification/train_images/'
df_train["filepath"] = training_folder+df_train["image_id"]
df_train.head()

# Converting data to tf.data

Some of you might be wondering why we go through all the trouble of converting our data to a tf.data pipeline when we could just use the built-in TensorFlow/Keras functions and avoid the hassle. Here are two main reasons:

1. A plain Python input loop (including any pandas and numpy preprocessing) runs sequentially on the CPU, so the GPU often sits idle while it waits for the next batch. tf.data traces your preprocessing functions into TensorFlow graphs, which lets it run them in parallel across CPU cores and overlap data preparation with training, so the whole pipeline runs faster.

2. As I mentioned earlier, tf.data offers much more functionality, flexibility and customization than built-in utilities such as ImageDataGenerator. In our case, you can manipulate images much more intricately with tf.data, which is exactly what lets us implement CutMix(up).

Here are some links in case you want to read up more:
- https://www.tensorflow.org/guide/data_performance
- https://stackoverflow.com/questions/54894799/why-should-i-use-tf-data

*Disclaimer: I'm not making full use of tf.data functionality in this notebook. The problem with many notebooks utilizing tf.data is that they are too complex and not beginner-friendly. I aim to change that in this notebook and offer an introductory example of tf.data for further exploration. That is to say, tf.data can be used much more efficiently than it is here, and would run faster as a result. This notebook only has barebones functionality to accommodate the necessary image augmentations.*

I'll be using EfficientNetB0 and a smaller image size for demonstration purposes. Feel free to scale up both the model and the image size with additional tweaks for better results.

In [None]:
batch_size = 16
image_size = 224
input_shape = (image_size, image_size, 3)

In [None]:
skf = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=seed_value)
for train_index, val_index in skf.split(df_train["image_id"], df_train["label"]):
    train_data = df_train.iloc[train_index]  # split indices are positional, so use iloc
    val_data = df_train.iloc[val_index]

In [None]:
def load_image_and_label_from_path(image_path, label):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    return img, label

![](https://storage.googleapis.com/jalammar-ml/tf.data/images/tf.data-pipeline-1.png)

The first step is to read the file paths and labels from our csv file and turn them into a tf.data.Dataset, in which each element is a (feature, label) pair of tensors. The images themselves are read and decoded lazily by the map step. Later on, batching groups these elements into blocks whose size we set with "batch size".
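To make the difference between elements and batches concrete, here is a minimal sketch on toy data. The file names and labels are made up for illustration and no images are actually read.

```python
import numpy as np
import tensorflow as tf

# Hypothetical file names and labels, standing in for the csv columns.
paths = np.array(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"])
labels = np.array([0, 3, 1, 4, 2])

# Each dataset element is one (path, label) pair of scalar tensors.
ds = tf.data.Dataset.from_tensor_slices((paths, labels))
first_path, first_label = next(iter(ds))
print(first_path.numpy(), int(first_label))

# Batching groups consecutive elements; the last batch may be smaller.
batched = ds.batch(2)
batch_shapes = [lab.numpy().shape for _, lab in batched]
print(batch_shapes)  # [(2,), (2,), (1,)]
```

With 5 elements and a batch size of 2, the last batch holds the single leftover element, which is why batch-level code should not assume every batch is full unless the dataset is repeated or the remainder is dropped.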

In [None]:
training_data = tf.data.Dataset.from_tensor_slices((train_data["filepath"].values, train_data["label"].values))
validation_data = tf.data.Dataset.from_tensor_slices((val_data["filepath"].values, val_data["label"].values))

training_data = training_data.map(load_image_and_label_from_path, num_parallel_calls=AUTOTUNE)
validation_data = validation_data.map(load_image_and_label_from_path, num_parallel_calls=AUTOTUNE)

# Image augmentation
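Next, we specify our image augmentations in Albumentations and map them onto our dataset. Albumentations works on NumPy arrays, so the transform is wrapped in tf.numpy_function, and the image shape is set again afterwards because the wrapper discards static shape information.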

In [None]:
def augment_train_data(train_ds):
    transforms = A.Compose([
            A.RandomResizedCrop(image_size, image_size),
            A.Transpose(p=0.5),
            A.HorizontalFlip(p=0.5),
            A.VerticalFlip(p=0.5),
            A.ShiftScaleRotate(p=0.5),
            A.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2, p=0.5),
            A.RandomBrightnessContrast(brightness_limit=(-0.1, 0.1), contrast_limit=(-0.1, 0.1), p=0.5),
            A.CoarseDropout(p=0.5),
            A.Cutout(p=0.5),
            ], p=1)
    
    def aug_fn(image):
        data = {"image":image}
        aug_data = transforms(**data)
        aug_img = aug_data["image"]
        aug_img = tf.cast(aug_img, tf.float32)
        return aug_img

    def process_data(image, label):
        aug_img = tf.numpy_function(func=aug_fn, inp=[image], Tout=tf.float32)
        return aug_img, label
    
    def set_shapes(img, label, img_shape=(image_size,image_size,3)):
        img.set_shape(img_shape)
        label.set_shape([])
        return img, label
    
    ds_alb = train_ds.map(process_data, num_parallel_calls=AUTOTUNE).prefetch(AUTOTUNE)
    ds_alb = ds_alb.map(set_shapes, num_parallel_calls=AUTOTUNE)
    ds_alb = ds_alb.repeat()
    ds_alb = ds_alb.batch(batch_size)
    return ds_alb
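
The reason for the separate set_shapes step can be seen in a small sketch. Here `flip_np` is a hypothetical NumPy function standing in for the Albumentations transform, and the 4x4 image size is chosen only to keep the example tiny.

```python
import numpy as np
import tensorflow as tf

def flip_np(image):
    # Plain NumPy op standing in for an Albumentations transform (assumption).
    return image[:, ::-1, :].astype(np.float32)

def wrap(image):
    # tf.numpy_function runs Python code inside the graph, but the result
    # carries no static shape information.
    return tf.numpy_function(func=flip_np, inp=[image], Tout=tf.float32)

def restore_shape(image):
    # Tell TensorFlow the shape we know the NumPy function produces.
    image.set_shape((4, 4, 3))
    return image

images = np.arange(2 * 4 * 4 * 3, dtype=np.uint8).reshape(2, 4, 4, 3)
ds_unknown = tf.data.Dataset.from_tensor_slices(images).map(wrap)
ds_known = ds_unknown.map(restore_shape)

print(ds_unknown.element_spec.shape)  # unknown shape
print(ds_known.element_spec.shape)    # (4, 4, 3)
```

Without the restored shape, later steps that need static dimensions (batch-level reshapes, or the Keras model itself) may fail or fall back to unknown shapes.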

In [None]:
def augment_val_data(val_ds):
    transforms = A.Compose([
                A.CenterCrop(image_size, image_size),
                ], p=1)
    
    def aug_fn(image):
        data = {"image":image}
        aug_data = transforms(**data)
        aug_img = aug_data["image"]
        aug_img = tf.cast(aug_img, tf.float32)
        return aug_img

    def process_data(image, label):
        aug_img = tf.numpy_function(func=aug_fn, inp=[image], Tout=tf.float32)
        return aug_img, label
    
    def set_shapes(img, label, img_shape=(image_size, image_size,3)):
        img.set_shape(img_shape)
        label.set_shape([])
        return img, label
    
    ds_alb = val_ds.map(process_data, num_parallel_calls=AUTOTUNE).prefetch(AUTOTUNE)
    ds_alb = ds_alb.map(set_shapes, num_parallel_calls=AUTOTUNE).batch(batch_size)
    return ds_alb

In [None]:
train_alb = augment_train_data(training_data)
val_alb = augment_val_data(validation_data)

In [None]:
def view_image(ds):
    image, label = next(iter(ds)) # extract 1 batch from the dataset
    image = image.numpy()/255
    label = label.numpy()

    fig = plt.figure(figsize=(22, 22))
    for i in range(batch_size):
        ax = fig.add_subplot(4, 4, i+1, xticks=[], yticks=[])
        ax.imshow(image[i])

### We can view the images post-augmentation with our view_image function.

In [None]:
view_image(train_alb)

### Check that the validation set is only center-cropped, with no other augmentation!

In [None]:
view_image(val_alb)

# CutMix(up)

Great! The primary augmentations through Albumentations are done. Time to implement CutMix(up) on top of them for better generalization and performance of our model.

In [None]:
IMAGE_SIZE = [image_size, image_size]
AUG_BATCH = batch_size

In [None]:
def cutmix(image, label, PROBABILITY=0.5):
    # input image - is a batch of images of size [n,dim,dim,3] not a single image of [dim,dim,3]
    # output - a batch of images with cutmix applied
    DIM = IMAGE_SIZE[0]
    CLASSES = 5
    
    imgs = []; labs = []
    for j in range(AUG_BATCH):
        # DO CUTMIX WITH PROBABILITY DEFINED ABOVE
        P = tf.cast( tf.random.uniform([],0,1)<=PROBABILITY, tf.int32)
        # CHOOSE RANDOM IMAGE TO CUTMIX WITH
        k = tf.cast( tf.random.uniform([],0,AUG_BATCH),tf.int32)
        # CHOOSE RANDOM LOCATION
        x = tf.cast( tf.random.uniform([],0,DIM),tf.int32)
        y = tf.cast( tf.random.uniform([],0,DIM),tf.int32)
        b = tf.random.uniform([],0,1) # this is beta dist with alpha=1.0
        WIDTH = tf.cast( DIM * tf.math.sqrt(1-b),tf.int32) * P
        ya = tf.math.maximum(0,y-WIDTH//2)
        yb = tf.math.minimum(DIM,y+WIDTH//2)
        xa = tf.math.maximum(0,x-WIDTH//2)
        xb = tf.math.minimum(DIM,x+WIDTH//2)
        # MAKE CUTMIX IMAGE
        one = image[j,ya:yb,0:xa,:]
        two = image[k,ya:yb,xa:xb,:]
        three = image[j,ya:yb,xb:DIM,:]
        middle = tf.concat([one,two,three],axis=1)
        img = tf.concat([image[j,0:ya,:,:],middle,image[j,yb:DIM,:,:]],axis=0)
        imgs.append(img)
        # MAKE CUTMIX LABEL
        a = tf.cast(WIDTH*WIDTH/DIM/DIM,tf.float32)
        if len(label.shape)==1:
            lab1 = tf.one_hot(label[j],CLASSES)
            lab2 = tf.one_hot(label[k],CLASSES)
        else:
            lab1 = label[j,]
            lab2 = label[k,]
        labs.append((1-a)*lab1 + a*lab2)
            
    image2 = tf.reshape(tf.stack(imgs),(AUG_BATCH,DIM,DIM,3))
    label2 = tf.reshape(tf.stack(labs),(AUG_BATCH,CLASSES))
    return image2,label2
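
To see the idea without the batch loop, here is a minimal NumPy sketch of CutMix on a single pair of images. `cutmix_pair` is a hypothetical helper, not part of the notebook. One deliberate difference: the function above weights the labels by WIDTH*WIDTH/DIM/DIM, while this sketch uses the area actually pasted after the box is clipped at the image border.

```python
import numpy as np

def cutmix_pair(img_a, img_b, lab_a, lab_b, rng):
    """Paste a random square patch of img_b into img_a and mix the one-hot labels."""
    dim = img_a.shape[0]
    # Random patch centre, and side length dim * sqrt(1 - b) with b ~ U(0, 1).
    x, y = rng.integers(0, dim, size=2)
    b = rng.uniform(0.0, 1.0)
    width = int(dim * np.sqrt(1.0 - b))
    # Clip the box to the image.
    ya, yb = max(0, y - width // 2), min(dim, y + width // 2)
    xa, xb = max(0, x - width // 2), min(dim, x + width // 2)
    mixed = img_a.copy()
    mixed[ya:yb, xa:xb, :] = img_b[ya:yb, xa:xb, :]
    # Label weight = fraction of pixels that now come from img_b.
    a = (yb - ya) * (xb - xa) / (dim * dim)
    label = (1.0 - a) * lab_a + a * lab_b
    return mixed, label, a

rng = np.random.default_rng(0)
dim, num_classes = 8, 5
img_a = np.zeros((dim, dim, 3), dtype=np.float32)  # all-black image
img_b = np.ones((dim, dim, 3), dtype=np.float32)   # all-white image
lab_a = np.eye(num_classes)[0]
lab_b = np.eye(num_classes)[3]

mixed, label, a = cutmix_pair(img_a, img_b, lab_a, lab_b, rng)
print(a, label)
```

Because the two toy images are all zeros and all ones, the mean of the mixed image equals the label weight `a`, which makes it easy to check that the image and its label stay consistent.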

In [None]:
def mixup(image, label, PROBABILITY=0.5):
    # input image - is a batch of images of size [n,dim,dim,3] not a single image of [dim,dim,3]
    # output - a batch of images with mixup applied
    DIM = IMAGE_SIZE[0]
    CLASSES = 5
    
    imgs = []; labs = []
    for j in range(AUG_BATCH):
        # DO MIXUP WITH PROBABILITY DEFINED ABOVE
        P = tf.cast( tf.random.uniform([],0,1)<=PROBABILITY, tf.float32)
        # CHOOSE RANDOM
        k = tf.cast( tf.random.uniform([],0,AUG_BATCH),tf.int32)
        a = tf.random.uniform([],0,1)*P # this is beta dist with alpha=1.0
        # MAKE MIXUP IMAGE
        img1 = image[j,]
        img2 = image[k,]
        imgs.append((1-a)*img1 + a*img2)
        # MAKE MIXUP LABEL
        if len(label.shape)==1:
            lab1 = tf.one_hot(label[j],CLASSES)
            lab2 = tf.one_hot(label[k],CLASSES)
        else:
            lab1 = label[j,]
            lab2 = label[k,]
        labs.append((1-a)*lab1 + a*lab2)
            
    image2 = tf.reshape(tf.stack(imgs),(AUG_BATCH,DIM,DIM,3))
    label2 = tf.reshape(tf.stack(labs),(AUG_BATCH,CLASSES))
    return image2,label2
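
MixUp is simpler than CutMix: the images and their one-hot labels are blended with the same weight. A minimal NumPy sketch for one pair follows; `mixup_pair` is a hypothetical helper and the weight is fixed at 0.25 here instead of being sampled.

```python
import numpy as np

def mixup_pair(img_a, img_b, lab_a, lab_b, a):
    """Blend two images and their one-hot labels with the same weight a."""
    mixed_img = (1.0 - a) * img_a + a * img_b
    mixed_lab = (1.0 - a) * lab_a + a * lab_b
    return mixed_img, mixed_lab

num_classes = 5
img_a = np.zeros((4, 4, 3), dtype=np.float32)        # all-black image
img_b = np.full((4, 4, 3), 255.0, dtype=np.float32)  # all-white image
lab_a = np.eye(num_classes, dtype=np.float32)[0]     # class 0
lab_b = np.eye(num_classes, dtype=np.float32)[3]     # class 3

mixed_img, mixed_lab = mixup_pair(img_a, img_b, lab_a, lab_b, a=0.25)
print(mixed_img[0, 0])  # every channel is 0.75 * 0 + 0.25 * 255 = 63.75
print(mixed_lab)        # 0.75 for class 0, 0.25 for class 3, 0 elsewhere
```

The soft label is why the training set needs CategoricalCrossentropy on one-hot targets rather than a sparse loss on integer labels.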

In [None]:
def transform(image1,label):
    # THIS FUNCTION APPLIES BOTH CUTMIX AND MIXUP
    DIM = IMAGE_SIZE[0]
    CLASSES = 5
    SWITCH = 0.5
    CUTMIX_PROB = 0.66
    MIXUP_PROB = 0.66
    # FOR SWITCH PERCENT OF TIME WE DO CUTMIX AND (1-SWITCH) WE DO MIXUP

    image2, label2 = cutmix(image1, label, CUTMIX_PROB)
    image3, label3 = mixup(image1, label, MIXUP_PROB)
    imgs = []; labs = []
    for j in range(AUG_BATCH):
        P = tf.cast( tf.random.uniform([],0,1)<=SWITCH, tf.float32)
        imgs.append(P*image2[j,]+(1-P)*image3[j,])
        labs.append(P*label2[j,]+(1-P)*label3[j,])

    image4 = tf.reshape(tf.stack(imgs),(AUG_BATCH,DIM,DIM,3))
    label4 = tf.reshape(tf.stack(labs),(AUG_BATCH,CLASSES))
    return image4,label4

In [None]:
def onehot(image,label):
    CLASSES = 5
    return image,tf.one_hot(label,CLASSES)

In [None]:
def optimize_data(train_alb, val_alb):

    def configure_for_train(ds):
        ds = ds.map(transform, num_parallel_calls=AUTOTUNE)
        ds = ds.unbatch()
        ds = ds.shuffle(buffer_size=1024)
        ds = ds.batch(batch_size)
        ds = ds.prefetch(buffer_size=AUTOTUNE)
        return ds
    
    def configure_for_val(ds):
        ds = ds.map(onehot, num_parallel_calls=AUTOTUNE)
        ds = ds.prefetch(buffer_size=AUTOTUNE)
        return ds

    train_ds = configure_for_train(train_alb)
    val_ds = configure_for_val(val_alb)
    
    return train_ds, val_ds

In [None]:
train_ds, val_ds = optimize_data(train_alb, val_alb)

### Let's see what our images look like after applying CutMix and Mixup:

In [None]:
view_image(train_ds)

### Again, make sure the validation set is untouched.

In [None]:
view_image(val_ds)

# Model creation & training

In [None]:
def create_model():
    # Model creation
    base_model = EfficientNetB0(include_top=False, weights="imagenet", input_shape=input_shape)
    
    # Rebuild top
    inputs = Input(shape=input_shape)
    aug_model = base_model(inputs)
    pooling = GlobalAveragePooling2D()(aug_model)
    dropout = Dropout(0.5)(pooling)
    outputs = Dense(5, activation="softmax", dtype='float32')(dropout)

    # Compile
    model = Model(inputs=inputs, outputs=outputs)
    optimizer = tfa.optimizers.RectifiedAdam()
    loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.01)

    model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
    return model

This is where tf.data works its magic. Consider the following image:
### Naive pipeline
![](https://dominikschmidt.xyz/tensorflow-data-pipeline/assets/feed_dict_pipeline.png)

This is the typical workflow of a naive data pipeline. Because loading and training take turns, the accelerator sits idle while each batch is prepared, and the CPU sits idle while each batch is trained on.

In contrast, consider:
### tf.data pipeline
![](https://dominikschmidt.xyz/tensorflow-data-pipeline/assets/tf_data_pipeline.png)

This is the workflow of a tf.data pipeline. As you can see, idle time is minimized because the next batch is prepared in parallel while the current one is being trained on.
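
As a rough back-of-the-envelope model of the two diagrams, the sketch below compares the time per epoch with and without overlap. The timings are made up for illustration and the model is idealized: it assumes one batch is prepared while the previous one trains, with no other overhead.

```python
def naive_epoch_time(n_batches, load_s, train_s):
    # Loading and training take turns, so their times add up for every batch.
    return n_batches * (load_s + train_s)

def prefetched_epoch_time(n_batches, load_s, train_s):
    # The first batch must be loaded up front; after that, loading batch i+1
    # overlaps with training on batch i, so each step costs the slower of the two.
    return load_s + (n_batches - 1) * max(load_s, train_s) + train_s

# Hypothetical timings: 30 ms to prepare a batch, 50 ms to train on it.
n_batches, load_s, train_s = 100, 0.03, 0.05
naive = naive_epoch_time(n_batches, load_s, train_s)
prefetched = prefetched_epoch_time(n_batches, load_s, train_s)
print(round(naive, 2), round(prefetched, 2))  # 8.0 5.03
```

Under these assumptions the epoch is bounded by the slower stage rather than by the sum of both, which is the effect that prefetch and parallel map calls aim for.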

In [None]:
def train_val_split(train_data, val_data):
    training_data = tf.data.Dataset.from_tensor_slices((train_data["filepath"].values, train_data["label"].values))
    validation_data = tf.data.Dataset.from_tensor_slices((val_data["filepath"].values, val_data["label"].values))

    training_data = training_data.map(load_image_and_label_from_path, num_parallel_calls=AUTOTUNE)
    validation_data = validation_data.map(load_image_and_label_from_path, num_parallel_calls=AUTOTUNE)
    
    return (training_data, validation_data)

In [None]:
epochs = 10
total_steps = (int(len(df_train)*0.8/batch_size)+1)
fold_number = 4
n_splits = 5
train_list = []
val_list = []

skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed_value)
for train_index, val_index in skf.split(df_train["image_id"], df_train["label"]):
    train_list.append(train_index)
    val_list.append(val_index)

In [None]:
for i in range(n_splits-fold_number): 
    tf.keras.backend.clear_session()
    gc.collect()
    train_set = df_train.iloc[train_list[fold_number]]  # fold indices are positional
    val_set = df_train.iloc[val_list[fold_number]]
    train_data, val_data = train_val_split(train_set, val_set)
    train_alb = augment_train_data(train_data)
    val_alb = augment_val_data(val_data)
    train_final, val_final = optimize_data(train_alb, val_alb)

    model = create_model()
    print("Training fold no.: " + str(fold_number+1))

    model_name = "effnetb0_"  # matches the EfficientNetB0 backbone built in create_model()
    fold_name = "fold.h5"
    filepath = model_name + str(fold_number+1) + fold_name
    callbacks = [ReduceLROnPlateau(monitor='val_loss', patience=1, verbose=1, factor=0.2),
                 EarlyStopping(monitor='val_loss', patience=2, verbose=1, restore_best_weights=True),
                 ModelCheckpoint(filepath=filepath, monitor='val_accuracy', save_best_only=True)]

    history = model.fit(train_final, steps_per_epoch=total_steps, epochs=epochs, validation_data=val_final, callbacks=callbacks)
    fold_number += 1
    if fold_number == n_splits:
        print("Training finished!")

And we're done! Hope you enjoyed this walkthrough and it motivates you to learn and try using tf.data. Please upvote this notebook if you liked it. It motivates me to continue producing high-quality notebooks. Thanks and stay tuned!