# Introduction
Note: this is a minimal working solution that shows how to combine `ImageDataGenerator` with augmentation libraries.

TL;DR: write a custom `AugmentDataGenerator`; it can be found in the last cell of this notebook.

This competition is a great example for learning how to use `ImageDataGenerator.flow_from_dataframe()`. Let's see: there is a `train.csv` file containing two columns, `image_id` and `label`. Read it into a pandas dataframe:

In [None]:
import os

import albumentations as A
import numpy as np
import pandas as pd
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import Sequence

In [None]:
INPUT_DIR = '/kaggle/input/cassava-leaf-disease-classification'
df = pd.read_csv(os.path.join(INPUT_DIR, 'train.csv'))
df['label'] = df['label'].astype(str)
print(df.head())

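The cast to `str` above matters: with its default `class_mode='categorical'`, `flow_from_dataframe` expects the `y_col` values to be strings (or lists/tuples), while `read_csv` parses the labels as integers. A minimal sketch of the cast on a tiny in-memory frame; the file names and labels here are made up for illustration:

```python
import pandas as pd

# Hypothetical rows in the same layout as train.csv (image_id, label).
demo = pd.DataFrame({
    'image_id': ['a.jpg', 'b.jpg', 'c.jpg'],
    'label': [0, 3, 1],
})

# Labels start out as integers, as they would after read_csv.
labels_before = [type(v).__name__ for v in demo['label'].tolist()]

# Same cast as in the cell above: every label becomes a Python string.
demo['label'] = demo['label'].astype(str)
labels_are_str = all(isinstance(v, str) for v in demo['label'])
print(labels_before, labels_are_str)
```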
# Baseline example
Now we take a simple model from `keras.applications` and prepare a data generator that creates batches in a couple of lines!

In [None]:
model = Sequential([
    EfficientNetB0(include_top=False),
    GlobalAveragePooling2D(),
    Dense(5, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

datagen = ImageDataGenerator()\
    .flow_from_dataframe(
        dataframe=df,
        directory=os.path.join(INPUT_DIR, 'train_images'),
        x_col='image_id',
        y_col='label',
    )
model.fit(datagen, epochs=1, workers=4)

# Augmentations

Now let's add an augmentation transform and pass it as `preprocessing_function` to the `ImageDataGenerator` constructor.

In [None]:
def transform(image):
    aug = A.Compose([
        A.Flip(),
        A.Rotate(),
    ])
    return aug(image=image)['image']

datagen = ImageDataGenerator(preprocessing_function=transform)\
    .flow_from_dataframe(
        dataframe=df,
        directory=os.path.join(INPUT_DIR, 'train_images'),
        x_col='image_id',
        y_col='label',
    )
model.fit(datagen, epochs=1, workers=4)

# Problem

Don't worry: the error in the cell below is intentional and illustrates the problem.

Imagine a situation: we want to take a random crop of the source image and resize it to (256, 256). Sounds easy, right?
For this, we should pass the image at its original size of (600, 800) into the transform, which then crops and resizes it.

In [None]:
def transform(image):
    aug = A.Compose([
        A.RandomResizedCrop(256, 256),
        A.Flip(),
        A.Rotate(),
    ])
    return aug(image=image)['image']

datagen = ImageDataGenerator(preprocessing_function=transform)\
    .flow_from_dataframe(
        dataframe=df,
        directory=os.path.join(INPUT_DIR, 'train_images'),
        x_col='image_id',
        y_col='label',
        target_size=(600, 800),
    )
model.fit(datagen, epochs=1, workers=4)

Wow, what the hell?
```
ValueError: could not broadcast input array from shape (256,256,3) into shape (600,800,3)
```
Why does this happen? Let's check the Keras docs:
> **preprocessing_function**: function that will be applied on each input. The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and **should output a Numpy tensor with the same shape**.

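The error message itself hints at the mechanism. As a simplified sketch (an assumption about the internals, not the actual Keras code): the iterator allocates the batch buffer from `target_size` and then copies each processed image into its slot, so an image that comes back smaller no longer fits. The `shrinking_transform` helper below is a hypothetical stand-in for `RandomResizedCrop(256, 256)`:

```python
import numpy as np

# Batch buffer allocated up front from target_size (assumed, simplified).
target_size = (600, 800)
batch = np.zeros((2, *target_size, 3), dtype='float32')


def shrinking_transform(image):
    # Hypothetical stand-in for a size-changing augmentation:
    # returns the top-left 256x256 region of the image.
    return image[:256, :256]


image = np.zeros((*target_size, 3), dtype='float32')

try:
    # Copying a (256, 256, 3) image into a (600, 800, 3) slot fails.
    batch[0] = shrinking_transform(image)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```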
So with `preprocessing_function`, resizing can only happen inside `ImageDataGenerator`. But what if we want a shape-changing augmentation from albumentations (or another augmentation library, like imgaug)?

# Solution

Let's write a simple wrapper with a minimum of code:

In [None]:
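class AugmentDataGenerator(Sequence):
    """Wrap a Keras data iterator and augment every batch it yields."""

    def __init__(self, datagen, augment=None):
        self.datagen = datagen
        if augment is None:
            # An empty pipeline returns images unchanged.
            self.augment = A.Compose([])
        else:
            self.augment = augment

    def __len__(self):
        # Same number of batches as the wrapped iterator.
        return len(self.datagen)

    def __getitem__(self, index):
        # A batch is (images, labels) or (images, labels, sample_weights).
        images, *rest = self.datagen[index]
        augmented = []
        for image in images:
            image = self.augment(image=image)['image']
            augmented.append(image)
        # Build a new array, so the augmented shape may differ from target_size.
        return (np.array(augmented), *rest)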


datagen = ImageDataGenerator()\
    .flow_from_dataframe(
        dataframe=df,
        directory=os.path.join(INPUT_DIR, 'train_images'),
        x_col='image_id',
        y_col='label',
        target_size=(600, 800),
    )

datagen = AugmentDataGenerator(datagen, A.Compose([
    A.RandomResizedCrop(256, 256),
    A.Flip(),
    A.Rotate(),
]))

model.fit(datagen, epochs=1, workers=4)

Now it works without any problems. Use this solution whenever you want augmentations that change the image size and come from outside Keras (like albumentations or imgaug).

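If you want to convince yourself without loading any images, here is a small NumPy-only sketch of the same idea: augment each image separately, then stack the results into a fresh array. Both `center_crop` and `augment_batch` are hypothetical helpers written for this illustration; `center_crop` only mimics the albumentations calling convention (`augment(image=...)['image']`).

```python
import numpy as np


def center_crop(image, size=256):
    # Hypothetical augmentation that changes the image size and returns
    # a dict with an 'image' key, like an albumentations pipeline does.
    top = (image.shape[0] - size) // 2
    left = (image.shape[1] - size) // 2
    return {'image': image[top:top + size, left:left + size]}


def augment_batch(images, augment):
    # Same loop as in AugmentDataGenerator.__getitem__: augment image by
    # image, then stack into a new array instead of a preallocated buffer.
    augmented = [augment(image=image)['image'] for image in images]
    return np.array(augmented)


# A fake batch of four 600x800 RGB images.
images = np.zeros((4, 600, 800, 3), dtype='float32')
out = augment_batch(images, center_crop)
print(out.shape)  # (4, 256, 256, 3)
```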
Hope this helps you save some time :)