<a href="https://colab.research.google.com/github/rahiakela/deep-learning-for-computer-vision/blob/main/1-image-data-preparation/5_loading_large_datasets_from_directories_with_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Loading Large Datasets From Directories with Keras

There are conventions for storing and structuring an image dataset on disk that make it fast and efficient to load when training and evaluating deep learning models. Once the dataset is structured, you can use tools like the **ImageDataGenerator** class in the Keras deep learning library to automatically load your train, test, and validation datasets. In addition, the generator loads the images in your dataset progressively (i.e. just-in-time), allowing you to work with both small datasets and very large datasets containing thousands or millions of images that may not fit into system memory. In this tutorial, you will discover how to structure an image dataset and how to load it progressively when fitting and evaluating a deep learning model.

This tutorial is divided into three parts; they are:

1. Dataset Directory Structure
2. Example Dataset Structure
3. How to Progressively Load Images

## Setup

In [1]:
import tensorflow as tf

from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [2]:
%%shell

wget -q https://machinelearningmastery.com/wp-content/uploads/2019/01/red_car_01.jpg
wget -q https://machinelearningmastery.com/wp-content/uploads/2019/01/blue_car_01.jpg

# create the directory structure: one subdirectory per class inside each split
# (-p creates parent directories and makes the cell safe to re-run)
mkdir -p data/train/red data/train/blue
mkdir -p data/test/red data/test/blue
mkdir -p data/validation/red data/validation/blue

# copy one image per class into each split
cp red_car_01.jpg data/train/red/
cp blue_car_01.jpg data/train/blue/

cp red_car_01.jpg data/test/red/
cp blue_car_01.jpg data/test/blue/

cp red_car_01.jpg data/validation/red/
cp blue_car_01.jpg data/validation/blue/
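
The layout created above follows the convention that flow_from_directory() relies on: each split directory contains one subdirectory per class, and the subdirectory names become the class labels, indexed in alphanumeric order. The sketch below uses only the standard library to show how labels can be derived from such a layout. The helper names `class_indices_from_dir` and `count_images` are hypothetical, and the temporary directory with placeholder files is an assumption made so the example runs on its own.

```python
import tempfile
from pathlib import Path


def class_indices_from_dir(split_dir):
    """Map each class subdirectory name to an integer label (hypothetical helper)."""
    class_names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}


def count_images(split_dir):
    """Count the files stored under each class subdirectory (hypothetical helper)."""
    return {
        p.name: sum(1 for f in p.iterdir() if f.is_file())
        for p in sorted(Path(split_dir).iterdir())
        if p.is_dir()
    }


# Build a throwaway copy of the train/ layout so the sketch is self-contained.
# The files are placeholders, not real images.
root = Path(tempfile.mkdtemp())
for class_name in ("red", "blue"):
    class_dir = root / "train" / class_name
    class_dir.mkdir(parents=True)
    (class_dir / f"{class_name}_car_01.jpg").write_bytes(b"placeholder")

class_indices = class_indices_from_dir(root / "train")
image_counts = count_images(root / "train")
print(class_indices)
print(image_counts)
```

With a real Keras iterator, the corresponding mapping is available as the `class_indices` attribute of the object returned by flow_from_directory().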



## Progressively Load Images

Instead of loading all images into memory, the generator loads just enough images for the current and perhaps the next few mini-batches when training and evaluating a deep learning model. I refer to this as progressive loading (or lazy loading), because the dataset is loaded from file progressively, retrieving only the data that is needed immediately.
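
The idea of progressive loading can be illustrated with a plain Python generator. This is a minimal sketch of the concept only, not how Keras implements it: the function name `iter_batches` and the stand-in binary files are assumptions, and a real loader would also decode, resize, and possibly shuffle the images.

```python
import tempfile
from pathlib import Path


def iter_batches(file_paths, batch_size):
    """Yield batches of (path, bytes) pairs, reading files only when a batch is requested."""
    for start in range(0, len(file_paths), batch_size):
        batch_paths = file_paths[start:start + batch_size]
        # Files are read here, one batch at a time, never all at once.
        yield [(path, path.read_bytes()) for path in batch_paths]


# Create five small stand-in files (placeholders for images on disk).
root = Path(tempfile.mkdtemp())
paths = []
for i in range(5):
    path = root / f"image_{i}.bin"
    path.write_bytes(bytes([i]) * 4)
    paths.append(path)

batches = iter_batches(paths, batch_size=2)
first = next(batches)  # only the first two files have been read so far
batch_sizes = [len(first)] + [len(batch) for batch in batches]
print(batch_sizes)
```

Because the generator reads a file only when its batch is requested, peak memory use depends on the batch size rather than on the size of the dataset.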

Two additional benefits of using the ImageDataGenerator class are that it can automatically scale the pixel values of images and automatically generate augmented versions of images.
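
The first of these benefits, pixel scaling, is plain arithmetic. ImageDataGenerator accepts a `rescale` argument, a factor by which each pixel value is multiplied. The NumPy sketch below reproduces that arithmetic on a made-up batch; the array, its shape, and the variable names are assumptions for illustration, not output from the generator.

```python
import numpy as np

# Stand-in batch: 2 images of 4x4 pixels with 3 channels, values in [0, 255].
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 4, 4, 3)).astype("float32")
batch[0, 0, 0, 0] = 0.0    # force the minimum possible pixel value
batch[0, 0, 0, 1] = 255.0  # force the maximum possible pixel value

# Multiplying by 1/255 maps the range [0, 255] onto approximately [0, 1].
rescale = 1.0 / 255.0
scaled = batch * rescale

print("before: min=%.3f, max=%.3f" % (batch.min(), batch.max()))
print("after:  min=%.3f, max=%.3f" % (scaled.min(), scaled.max()))
```

Passing `rescale=1.0/255` when constructing the generator applies the same multiplication to every batch it yields.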

The pattern for using the ImageDataGenerator class is as follows:

- Construct and configure an instance of the ImageDataGenerator class.
- Retrieve an iterator by calling the flow_from_directory() function.
- Use the iterator in the training or evaluation of a model.

In [3]:
# create generator
datagen = ImageDataGenerator()

# prepare an iterator for each dataset
train_itr = datagen.flow_from_directory("data/train/", class_mode="binary")
val_itr = datagen.flow_from_directory("data/validation/", class_mode="binary")
test_itr = datagen.flow_from_directory("data/test/", class_mode="binary")

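# confirm the iterator works by pulling one batch
# (the built-in next() works whether or not the iterator exposes a .next() method)
batch_x, batch_y = next(train_itr)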

print("Batch shape=%s, min=%.3f, max=%.3f" % (batch_x.shape, batch_x.min(), batch_x.max()))

Found 2 images belonging to 2 classes.
Found 2 images belonging to 2 classes.
Found 2 images belonging to 2 classes.
Batch shape=(2, 256, 256, 3), min=0.000, max=255.000


References:

https://keras.io/api/preprocessing/image/

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html