# Introduction
With a very large number of images to load and process, there will not be enough RAM to hold and process all of them at once. The solution is to defer reading the data until it is needed: the images are loaded and processed in batches, as and when the model requires them. The preprocessing therefore also has to happen on the fly, batch by batch, as part of the training pipeline. In Keras this job is done by ImageDataGenerator. Pass the generator to the fit method instead of the data arrays (x_train, y_train), and it takes care of loading and unloading the images in batches. ImageDataGenerator follows the same lazy, one-batch-at-a-time pattern as Python generators.
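
To make the batching idea concrete, here is a minimal sketch of a plain Python generator that produces one batch at a time. The names batch_generator and file_names are hypothetical and used only for illustration; this is not how Keras implements ImageDataGenerator internally.

```python
def batch_generator(items, batch_size):
    """Yield successive batches from items, one batch per iteration."""
    for start in range(0, len(items), batch_size):
        # A real image pipeline would read and preprocess the files here
        yield items[start:start + batch_size]


# Hypothetical file names standing in for images on disk
file_names = [f"img_{i}.jpg" for i in range(10)]

batches = list(batch_generator(file_names, batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4; the same happens with image batches when the dataset size is not a multiple of the batch size.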

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2, os, shutil

In [2]:
# ImageDataGenerator needs the data organised in a specific folder structure:
# a root folder containing one sub-folder per category,
# each holding the images of its respective category
os.makedirs("master_data/cat", exist_ok=True)  # also creates master_data; safe to re-run
os.makedirs("master_data/dog", exist_ok=True)

In [4]:
# copy the images into the folder structure based on their category
src = "train/"
dst_cat = "master_data/cat/"
dst_dog = "master_data/dog/"

for file_name in os.listdir(src):
    src_path = os.path.join(src, file_name)
    if "dog" in file_name:
        shutil.copy(src_path, dst_dog)
    else:
        # every file without "dog" in its name is treated as a cat image
        shutil.copy(src_path, dst_cat)

In [7]:
print(len(os.listdir(dst_dog)))
print(len(os.listdir(dst_cat)))

12500
12500


In [10]:
batch_size = 32

idg = tf.keras.preprocessing.image.ImageDataGenerator(validation_split=0.1, rescale=1/255.0)
# class_mode defaults to "categorical", so the generator one-hot encodes the labels itself
train_idg = idg.flow_from_directory(directory="master_data", target_size=(150,150), batch_size=batch_size, 
                                    subset="training")

Found 22500 images belonging to 2 classes.
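
As an illustration of the one-hot encoding mentioned in the comment above, the sketch below mimics in plain Python how the labels end up encoded. flow_from_directory assigns class indices in alphanumeric order of the folder names, and with the default class_mode of "categorical" each label becomes a one-hot vector. The helper one_hot is hypothetical; in the notebook itself, train_idg.class_indices shows the actual mapping.

```python
# Folder names under master_data, as created earlier in the notebook
class_names = sorted(["dog", "cat"])
class_indices = {name: index for index, name in enumerate(class_names)}


def one_hot(index, num_classes):
    """Return a list with 1.0 at position index and 0.0 elsewhere (hypothetical helper)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


labels = {name: one_hot(idx, len(class_names)) for name, idx in class_indices.items()}
print(class_indices)  # {'cat': 0, 'dog': 1}
print(labels)         # {'cat': [1.0, 0.0], 'dog': [0.0, 1.0]}
```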


In [11]:
val_idg = idg.flow_from_directory(directory="master_data", target_size=(150,150), batch_size=batch_size, 
                                  subset="validation")

Found 2500 images belonging to 2 classes.
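
The two image counts reported by flow_from_directory follow from validation_split=0.1 applied to the 12,500 images in each class folder. The sketch below reproduces that arithmetic and derives the number of batches per epoch. It assumes the split is taken per class folder, which is consistent with the counts above; the variable names are illustrative.

```python
import math

images_per_class = 12500
num_classes = 2
validation_split = 0.1
batch_size = 32

# 10% of each class folder is held out for validation
val_per_class = round(images_per_class * validation_split)
val_images = val_per_class * num_classes
train_images = images_per_class * num_classes - val_images

# The last batch of an epoch may be smaller, hence the ceiling
train_steps = math.ceil(train_images / batch_size)
val_steps = math.ceil(val_images / batch_size)

print(train_images, val_images)  # 22500 2500
print(train_steps, val_steps)    # 704 79
```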


In [12]:
# Modelling
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input((150,150,3), name="Input"))
model.add(tf.keras.layers.Flatten(name="Flatten"))
model.add(tf.keras.layers.Dense(128, activation=tf.keras.activations.relu, name="Hidden"))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.softmax, name="Output"))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 Flatten (Flatten)           (None, 67500)             0         
                                                                 
 Hidden (Dense)              (None, 128)               8640128   
                                                                 
 Output (Dense)              (None, 2)                 258       
                                                                 
Total params: 8,640,386
Trainable params: 8,640,386
Non-trainable params: 0
_________________________________________________________________
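
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. The following sketch redoes that arithmetic; the helper dense_params is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


flattened = 150 * 150 * 3              # values per image after Flatten
hidden = dense_params(flattened, 128)  # Hidden layer
output = dense_params(128, 2)          # Output layer
total = hidden + output

print(flattened, hidden, output, total)  # 67500 8640128 258 8640386
```

Almost all of the parameters sit in the Hidden layer, because Flatten feeds it every pixel value directly.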


In [13]:
model.compile(optimizer=tf.keras.optimizers.SGD(), loss=tf.keras.losses.categorical_crossentropy, metrics=["acc"])

In [14]:
# the generator already yields batches of batch_size images, so batch_size is not passed to fit
model.fit(train_idg, epochs=10, validation_data=val_idg)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x258c96a79a0>