## 1. Import libraries and preprocess dataset

To visualize (or recall) what shear does,

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Academ_Study_about_a_periodic_tiling_by_regular_polygons.svg/330px-Academ_Study_about_a_periodic_tiling_by_regular_polygons.svg.png" width="200" height="auto" />

(Source: [Baelde, 2013](https://en.wikipedia.org/wiki/Shear_mapping))

What rescale does,

<img src="https://www.researchgate.net/profile/Idrissa-Coulibaly/publication/263179118/figure/fig2/AS:669296715890712@1536584180478/Multi-scale-example-processing-by-octave.png" width="300" height="auto" />

(Source: [Idrissa Coulibaly et al., 2014](https://www.researchgate.net/publication/263179118_A_novel_approach_for_road_damage_assessment_in_case_of_major_disaster_based_on_multi-resolution_analysis))

What zoom range does,

<img src="https://s3.ap-south-1.amazonaws.com/s3.studytonight.com/curious/uploads/pictures/1611473961-74364.png" width="450" height="auto" />

(Source: [@shekharpandey, 2021](https://www.studytonight.com/post/random-zoom-image-augmentation-keras-imagedatagenerator))

What flipping does,

<img src="https://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/GUID-5EA301F4-0E6D-47DE-9F28-D9E754BD8784-web.gif" width="300" height="auto" />

(Source: [arcgis.com, n.d.](https://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/changing-the-orientation-of-a-raster.htm))

### Why do we use the image data generator?

The image data generator augments the training set with synthetic variants of the existing images, which exposes the model to additional variations without the cost of collecting and annotating more data. This tends to reduce overfitting and improve the model's ability to generalize.

Intuitively, an object should be just as recognizable in its mirror image, so flipping teaches the CNN that left-right orientation does not matter. Zooming changes how much of the frame the object fills, which reduces the CNN's reliance on the background when locating the object. Shearing lets the CNN recognize the object despite minor distortions.
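To make these transformations concrete, here is a minimal pure-Python sketch of a horizontal flip, a textbook horizontal shear, and the window kept by a centered zoom. The helper names are hypothetical, and this is not how Keras implements them internally (Keras applies affine transforms to whole arrays and interpolates); it only illustrates the geometry.

```python
def flip_horizontal(image):
    """Mirror a row-major image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def shear_point(x, y, k):
    """Textbook horizontal shear: shift x in proportion to y, keep y unchanged."""
    return (x + k * y, y)


def center_zoom_window(size, zoom):
    """Return (start, stop) of the centered window kept when zooming in by `zoom`."""
    kept = round(size / zoom)
    start = (size - kept) // 2
    return (start, start + kept)


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_horizontal(image))       # [[3, 2, 1], [6, 5, 4]]
print(shear_point(2, 4, 0.5))       # (4.0, 4)
print(center_zoom_window(64, 2.0))  # (16, 48)
```

The flip only reorders pixels, while shear and zoom move pixels to new coordinates, which is why a real implementation has to interpolate or fill in the gaps.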

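Rescaling (done by setting `rescale=1./255`) normalizes the pixel values from the range [0, 255] to [0, 1]. Resizing every image to a fixed size (done by setting the `target_size`) gives the network a uniform input shape, and a small size such as 64 × 64 also reduces training time.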
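As a small illustration of what a `rescale` factor of `1./255` does to 8-bit pixel values, the sketch below multiplies a few sample intensities by that factor. The function name and the sample values are made up for illustration.

```python
RESCALE = 1.0 / 255  # same factor as the generator's rescale argument


def rescale_pixels(pixels, factor=RESCALE):
    """Multiply every pixel intensity by the rescale factor."""
    return [p * factor for p in pixels]


row = [0, 51, 255]  # illustrative 8-bit intensities
scaled = rescale_pixels(row)
print([round(v, 3) for v in scaled])  # [0.0, 0.2, 1.0]
```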

The `batch_size` is the number of images processed per training step (the full dataset is too large to process at once), and the number of iterations per epoch is the number of batches needed to cover the dataset once. For `flow_from_directory`, the classes are determined from the names of the subfolders. In our case, we only need a binary classification since there are only 2 classes, which is why the output layer uses a single sigmoid unit rather than a softmax.
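The two points above can be checked with a few lines of arithmetic. This sketch, using hypothetical helper names, computes the number of iterations per epoch for the dataset sizes reported in this notebook (8000 training and 2000 test images) and shows that a single sigmoid unit gives the same probability as a two-class softmax whose first logit is fixed at 0.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(steps_per_epoch(8000, 32))  # 250
print(steps_per_epoch(2000, 32))  # 63

z = 0.7  # arbitrary logit for illustration
print(abs(sigmoid(z) - softmax([0.0, z])[1]) < 1e-12)  # True
```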

In [1]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Parameters
input_n_size = (64, 64)
batch_size = 32

# Training dataset
train_datagen = ImageDataGenerator(rescale=1./255,         
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size=input_n_size,
                                                 batch_size=batch_size,
                                                 class_mode='binary')

# Test dataset
test_datagen = ImageDataGenerator(rescale=1./255)

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size=input_n_size,
                                            batch_size=batch_size,
                                            class_mode='binary')

Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.


## 2. Create the CNN model

The architecture loosely follows the convolution and pooling pattern of AlexNet, at a much smaller scale.

In [2]:
model = tf.keras.models.Sequential()

# L1: Convolution Layer
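L1_f_size = 3   # 3 × 3 filter
n_channels = 3  # RGB input images
L1_params = {
    'filters': 32,
    'kernel_size': L1_f_size,
    'activation': 'relu',
    # (height, width, channels); the channel count is unrelated to the filter size
    'input_shape': (*input_n_size, n_channels)
}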

model.add(tf.keras.layers.Conv2D(**L1_params))

# L2: Max-Pooling Layer
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# L3: Convolution Layer
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=2, activation='relu'))

# L4: Max-Pooling Layer
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# L5-pre: Flattening layer
model.add(tf.keras.layers.Flatten())

# L5: Full connection layer
model.add(tf.keras.layers.Dense(units=128, activation='relu'))

# L6: Output layer
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compile with the optimizer (Adam) and the loss function
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
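To see how the layers above change the tensor shape, the following sketch traces the spatial size and counts the trainable parameters by hand. It assumes the Keras defaults of `'valid'` padding and stride 1 for `Conv2D`, and a 3-channel (RGB) 64 × 64 input; the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2, stride=2):
    """Output size of a 'valid' max-pooling layer."""
    return (size - pool) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


size = 64
size = conv_out(size, 3)   # L1: 62
size = pool_out(size)      # L2: 31
size = conv_out(size, 2)   # L3: 30
size = pool_out(size)      # L4: 15
flat = size * size * 32    # Flatten: 15 * 15 * 32 = 7200

total = (conv_params(3, 3, 32)       # L1: 896
         + conv_params(2, 32, 32)    # L3: 4128
         + dense_params(flat, 128)   # L5: 921728
         + dense_params(128, 1))     # L6: 129
print(size, flat, total)  # 15 7200 926881
```

Almost all of the parameters sit in the first dense layer, which is typical for small CNNs that flatten directly after the last pooling layer. Calling `model.summary()` on the actual model is the authoritative way to confirm these numbers.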

## 3. Train the CNN model

In [3]:
model.fit(x=training_set, validation_data=test_set, epochs=25)

Epoch 1/25
...
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x14c923ad0>
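The `History` object returned by `model.fit` exposes a `history` dictionary of per-epoch metrics. The sketch below shows one way to inspect such a dictionary. The dictionary here is a stand-in with invented placeholder numbers, not results from this notebook, and the key names assume the TF 2.x convention for `metrics=['accuracy']`.

```python
# Stand-in for `history.history`; the values are invented placeholders,
# NOT the metrics produced by the training run above.
history = {
    "accuracy":     [0.61, 0.70, 0.76, 0.81],
    "val_accuracy": [0.63, 0.71, 0.74, 0.73],
}


def best_epoch(values):
    """Return the 1-based epoch with the highest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__) + 1


best = best_epoch(history["val_accuracy"])
# A growing gap between training and validation accuracy hints at overfitting.
gap = history["accuracy"][-1] - history["val_accuracy"][-1]
print(best)          # 3
print(f"{gap:.2f}")  # 0.08
```

With a real run, replace the stand-in with `model.fit(...).history`.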

## 4. Predict the following observation

In [5]:
import numpy as np
from tensorflow.keras.preprocessing import image

test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size=input_n_size)
test_image = image.img_to_array(test_image) / 255.0  # Same rescaling as the training generator
test_image = np.expand_dims(test_image, axis=0)      # Add the batch dimension: (1, 64, 64, 3)

# Make prediction
result = model.predict(test_image)

print('Classes: ', training_set.class_indices, '\n')
# The sigmoid output is the probability of class 1 ('dogs'), so threshold it at 0.5
print(f'Prediction: {"dog" if result[0][0] > 0.5 else "cat"}')

Classes:  {'cats': 0, 'dogs': 1} 

Prediction: dog
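Because the output layer is a single sigmoid unit, the model returns a probability for class index 1 rather than a hard label. A minimal sketch of turning that probability into a label, assuming a 0.5 threshold and the `class_indices` mapping printed above, could look like this. `decode_prediction` is a hypothetical helper, and the probabilities are made-up inputs.

```python
def decode_prediction(probability, class_indices, threshold=0.5):
    """Map a sigmoid probability to the class label it favors."""
    index_to_label = {index: label for label, index in class_indices.items()}
    return index_to_label[int(probability > threshold)]


class_indices = {'cats': 0, 'dogs': 1}
print(decode_prediction(0.93, class_indices))  # dogs
print(decode_prediction(0.08, class_indices))  # cats
```

Comparing against a threshold is more robust than testing for equality with 1, since a sigmoid output only approaches 1 and rarely equals it exactly.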
