### <b>Table of Contents</b>

0. Import functions

1. Download ZIP file from Google Drive and unzip it into a local drive

2. Load image files

3. Define a CNN (Convolutional Neural Network)

    3-1. Initialize a Sequential model from Keras and add layers to it

    3-2. Compile the model with accuracy and f1 score as the evaluation metrics during training and testing

4. Train the CNN model and evaluate model performance

5. Save models for later use

6. Conclusion

### <b>0. Import functions</b>

In [1]:
from utils.load import extract_zip_file, load_images

import os
import sys
import warnings
warnings.filterwarnings("ignore")

### <b>1. Download ZIP file from Google Drive and unzip it into a local drive</b>

In [2]:
# Details of the source file in G Drive
file_id = "1KDQBTbo5deKGCdVV_xIujscn5ImxW4dm"
file_url = f"https://drive.google.com/file/d/{file_id}"
zip_file_name = "images.zip"

# Details of local directories
root_path = sys.path[0]
# Build paths with os.path.join so the separator matches the operating system.
download_path = os.path.join(root_path, "data")
zip_file_path = os.path.join(download_path, zip_file_name)

Download the source file from G Drive if the file does not exist.

In [3]:
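if os.path.exists(zip_file_path):
    print(f"File {zip_file_name} already exists in {download_path}.")
else:
    print("Downloading file from Google Drive.")
    print("This could take a few minutes.")
    # Make sure the target directory exists, then download directly to zip_file_path
    # so that the existence check above finds the file on the next run.
    os.makedirs(download_path, exist_ok=True)
    !gdown {file_id} -O "{zip_file_path}"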

File images.zip already exists in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data.


Extract the zip file.

In [4]:
extract_zip_file(zip_file_path, download_path, zip_file_name)

images.zip already extracted in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data.


### <b>2. Load image files</b>

Load the images as they are, without converting them to arrays, to save time and memory.

In [5]:
array_dict = load_images(download_path, as_array=False)

Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images
Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\testing
Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\testing\flip


100%|██████████| 290/290 [00:00<00:00, 1319.88it/s]


Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\testing\notflip


100%|██████████| 307/307 [00:00<00:00, 2746.37it/s]


Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\training
Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\training\flip


100%|██████████| 1162/1162 [00:00<00:00, 2641.52it/s]


Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\training\notflip


100%|██████████| 1230/1230 [00:00<00:00, 1996.13it/s]


Check the shape of the images.

In [6]:
from numpy import asarray

image_shape = None
for k, v in array_dict.items():
    for k2, v2 in v.items():
        for k3, v3 in v2.items():
            if image_shape is None:
                image_array = asarray(v3)
                image_shape = image_array.shape
                print(f"Image shape: {image_shape}")

Image shape: (1920, 1080, 3)


### <b>3. Define a CNN (Convolutional Neural Network)</b>

##### 3-1.Initialize a Sequential model from Keras and add layers to it

In [7]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    
# 0. Initialize a Sequential model from Keras.
model = Sequential()

# 1. Add a convolutional layer. The first convolutional layer also defines the input via input_shape.
# Height and width are downscaled by a factor of 10 (1920 x 1080 -> 192 x 108).
reduced_image_shape = (image_shape[0] // 10, image_shape[1] // 10, image_shape[2])
model.add(Conv2D(filters=8, kernel_size=(7, 7), activation='relu', input_shape=reduced_image_shape))

# 2. Add a max pooling layer.
model.add(MaxPooling2D(pool_size=(3, 3)))

# Add another set of convolutional and pooling layers.
model.add(Conv2D(filters=16, kernel_size=(7, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))

# Add another set of convolutional and pooling layers.
model.add(Conv2D(filters=64, kernel_size=(7, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# 3. Add a flatten layer.
model.add(Flatten())

# 4. Add a dense (i.e. fully connected) layer with 32 neurons and a ReLU activation function.
model.add(Dense(units=32, activation='relu'))

# A dropout layer can be added to deal with overfitting.
# The below line of code will randomly drop 50% of the neurons during training, which helps to reduce overfitting.
# model.add(Dropout(0.5))

# 5. Add an output layer, which is another dense layer with 1 neuron and a sigmoid activation function.
model.add(Dense(units=1, activation='sigmoid'))

# Print out the summary of the model. model.summary() prints directly and returns None,
# so it does not need to be wrapped in print().
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 186, 102, 8)       1184      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 62, 34, 8)        0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 56, 28, 16)        6288      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 18, 9, 16)        0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 12, 3, 64)         50240     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 6, 1, 64)         0

It is generally better to have a different number of filters in different convolutional layers in a CNN.

In the earlier layers of the network, it is common to use a small number of filters, such as 32 or 64, to extract simple and general features from the input images. In the later layers of the network, a larger number of filters, such as 128 or 256, is often used to extract more complex and specific features. Our model follows the same pattern at a smaller scale, with 8, 16, and 64 filters.

Using different numbers of filters in different convolutional layers can help the model learn more efficiently and effectively. It allows the network to identify simple and general features in the early layers, and then build on those features with more complex and specific features in the later layers. Additionally, using fewer filters in the early layers can help to reduce the number of parameters in the network, which can help to prevent overfitting.

Here's an explanation of the architecture of the network. Simply put, it is a CNN with multiple convolutional and max pooling layers, followed by a flatten layer, a fully connected layer and a binary classification output layer, which is commonly used for image classification tasks.

<b>0. Sequential model</b>

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor, which is the case for our CNN.

<b>1-1. Input layer</b>

This layer accepts the input image data, which is typically a 2D array for grayscale images or a 3D array for color images. Our original images are RGB arrays of shape (1920, 1080, 3). Because the code downscales the height and width by a factor of 10, the input_shape passed to the first layer is (192, 108, 3).

<b>1-2. Convolutional layer</b>

This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.
To put it differently, this layer performs feature extraction by applying a set of filters to the input image. Each filter detects a specific feature, such as edges, corners, or blobs. The output of each filter is a feature map, which highlights the presence of that feature in different parts of the input image.
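To make the idea of a filter producing a feature map concrete, here is a minimal pure-Python sketch of a single-channel convolution with no padding and a stride of 1. The toy image and the `vertical_edge` filter are hypothetical values chosen for illustration. A real Conv2D layer additionally learns its filter values, handles multiple channels, adds a bias, and applies an activation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 4x4 grayscale image: dark left half, bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]

# Hypothetical hand-written filter that responds where brightness increases from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the middle column, which is where the dark-to-bright edge sits in the toy image.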

In our CNN, Conv2D from Keras is used, which stands for 2-dimensional convolution.

The first parameter of Conv2D (i.e. filters) is the dimensionality of the output space, that is the number of output filters in the convolution. In the code, the first Conv2D layer has 8 filters, the second has 16 filters, and the third has 64 filters. These filters are applied to the input image to extract features that are relevant to the classification task. Increasing the number of filters can help the model learn more complex and abstract features, but also increases the number of parameters in the model, which can make training slower and more computationally intensive.
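The link between the number of filters and the number of parameters can be checked with a little arithmetic. The sketch below assumes the standard count for a convolutional layer with a bias term, (kernel height x kernel width x input channels + 1) x filters, and applies it to the three Conv2D layers defined in the code cell (8, 16, and 64 filters with 7x7 kernels). The helper name `conv2d_param_count` is made up for this example.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights per filter (kh * kw * in_channels) plus one bias, times the number of filters."""
    kh, kw = kernel_size
    return (kh * kw * in_channels + 1) * filters

# (kernel_size, input channels, filters) for the three Conv2D layers in the code cell.
conv_layers = [
    ((7, 7), 3, 8),    # RGB input -> 8 filters
    ((7, 7), 8, 16),   # 8 feature maps -> 16 filters
    ((7, 7), 16, 64),  # 16 feature maps -> 64 filters
]

param_counts = [conv2d_param_count(*layer) for layer in conv_layers]
print(param_counts)
```

The three values agree with the Param # column of the model summary for conv2d, conv2d_1, and conv2d_2.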

The second parameter (i.e. kernel_size) is the kernel size, specifying the height and width of the 2D convolution window. For binary image classification problems, the typical kernel sizes for the first convolutional layer are in the range of 3x3 to 7x7. Larger kernel sizes may be used for input images with larger spatial dimensions. Smaller kernel sizes can capture fine-grained details in the input image, while larger kernel sizes can capture more global features. Our model uses a 7x7 kernel in all three convolutional layers.

The activation parameter specifies the non-linear function applied to the output of a layer. A layer first applies a linear transformation to its input using its weights and biases, and the result is then passed through the activation function. This non-linearity allows the model to learn more complex features from the input data.

ReLU (Rectified Linear Unit) is a popular choice for most applications due to its simplicity and its effectiveness in mitigating the vanishing gradient problem, while sigmoid is commonly used in the output layer of binary classification models. Both activation functions are available in Keras.
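For reference, both activation functions are short enough to write out by hand. This is a scalar sketch for illustration only. Keras applies the same functions element-wise to whole tensors.

```python
import math

def relu(x):
    """Pass positive values through unchanged and clamp negative values to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for value in (-5.0, 0.0, 5.0):
    print(f"x={value:+.1f}  relu={relu(value):.4f}  sigmoid={sigmoid(value):.4f}")
```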

<b>2. Pooling layer</b>

This layer downsamples the feature maps produced by the convolutional layers by taking the maximum or average value within small regions of the feature maps. This helps to reduce the dimensionality of the feature maps and makes the network more computationally efficient.

In a Convolutional Neural Network (CNN), pooling layers are commonly used to reduce the spatial dimensions of the input volume (i.e., the height and width dimensions) while preserving the depth dimension. Max pooling and average pooling are two common types of pooling operations used in CNNs.

Max pooling takes the maximum value of each non-overlapping rectangular sub-region in the input volume and uses that as the output value for that region. This operation is called "max" pooling because it retains the largest (max) value from each region. Max pooling is useful for detecting the presence of a particular feature or pattern in an input volume, as it retains the strongest activation signal in each region.

Average pooling takes the average value of each non-overlapping rectangular sub-region in the input volume and uses that as the output value for that region. This operation is called "average" pooling because it takes the average value from each region. Average pooling is useful for reducing the spatial dimensions of an input volume while preserving the overall structure of the input, as it retains a more generalized representation of the input volume.
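The difference between the two operations is easiest to see on a small example. The sketch below applies non-overlapping 2x2 pooling to a hypothetical 4x4 feature map in pure Python. The helper `pool2d` is written for this illustration and is not part of Keras.

```python
def pool2d(feature_map, pool_size, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping window of size `pool_size`."""
    ph, pw = pool_size
    out_h = len(feature_map) // ph
    out_w = len(feature_map[0]) // pw
    return [
        [
            reduce_fn([feature_map[i * ph + di][j * pw + dj]
                       for di in range(ph) for dj in range(pw)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def mean(values):
    return sum(values) / len(values)

# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [0, 2, 9, 1],
    [4, 6, 3, 3],
]

max_pooled = pool2d(feature_map, (2, 2), max)
avg_pooled = pool2d(feature_map, (2, 2), mean)
print(max_pooled)
print(avg_pooled)
```

Max pooling keeps the single strongest activation of each window (for example, the 9 in the bottom-right window), whereas average pooling smooths each window into its mean.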

In general, max pooling is more commonly used in CNNs because it has been found to work better in practice, especially for tasks like object recognition. However, average pooling can also be useful in some cases, such as for tasks like semantic segmentation where spatial resolution is important.

In our CNN, max pooling is used. The first two pooling layers use a 3x3 pooling window and the third uses a 2x2 pooling window, as specified in the pool_size parameter. Each pooling layer therefore outputs the max value within each pooling window.
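The combined effect of the convolutional and pooling layers on the spatial dimensions can be traced by hand. The sketch below assumes the Keras defaults used in the code cell: convolutions without padding and with a stride of 1, and pooling with a stride equal to the pool size. It starts from the 192 x 108 input and uses the 7x7 kernels and the 3x3, 3x3, and 2x2 pooling windows from the model definition.

```python
def conv_out(size, kernel):
    """Output length of an unpadded, stride-1 convolution along one dimension."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output length of non-overlapping pooling along one dimension."""
    return size // pool

def trace_shapes(height, width, layers):
    """Return the (height, width) produced after each layer in `layers`."""
    shapes = []
    for kind, k in layers:
        step = conv_out if kind == "conv" else pool_out
        height, width = step(height, k), step(width, k)
        shapes.append((height, width))
    return shapes

# Layer stack from the model definition: (layer kind, kernel or pool size).
layers = [("conv", 7), ("pool", 3), ("conv", 7), ("pool", 3), ("conv", 7), ("pool", 2)]

shapes = trace_shapes(192, 108, layers)
for (kind, k), shape in zip(layers, shapes):
    print(f"{kind} {k}x{k} -> {shape}")
```

The traced shapes agree with the Output Shape column of the model summary, ending at 6 x 1 after the last pooling layer.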

<b>3. Flatten layer</b>

This layer reshapes the output of the previous layers into a 1D array (or one-dimensional vector), which can be fed into a fully connected layer. Without the flatten layer, the output of the final convolutional layer would be a 3D tensor with a fixed spatial structure, which cannot be directly fed into a dense layer (or fully connected layer) that expects a 1D tensor. 
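As a small illustration, the sketch below flattens a dummy tensor with the same shape as the output of the last pooling layer in the model summary, (6, 1, 64). The dense-layer parameter count at the end is derived from the stated architecture (32 units, one bias per unit) rather than copied from the notebook output.

```python
def flatten(tensor):
    """Recursively flatten nested lists into a single 1D list."""
    flat = []
    for item in tensor:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# Dummy tensor shaped like the last pooling output: 6 rows x 1 column x 64 channels.
tensor = [[[0.0 for _ in range(64)] for _ in range(1)] for _ in range(6)]

flat = flatten(tensor)
print(len(flat))  # 6 * 1 * 64 values in one vector

# Each of the 32 dense units has one weight per flattened value plus one bias.
dense_params = len(flat) * 32 + 32
print(dense_params)
```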

<b>4. Fully connected (dense) layer</b>

This layer combines the features extracted by the convolutional layers into a compact representation used for classification. In our code, it is a dense layer with 32 neurons and a ReLU activation function, and its output is passed to the output layer. By fully connected, it means that every neuron in the previous layer is connected to every neuron in the current layer.
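What "fully connected" means in terms of computation can be sketched in a few lines: every output neuron takes a weighted sum over all inputs, adds a bias, and applies the activation. The weights, biases, and inputs below are hypothetical toy values. In the real model they are learned during training.

```python
def relu(x):
    return max(0.0, x)

def dense_forward(inputs, weights, biases, activation):
    """Compute activation(w . x + b) for every neuron; `weights` holds one row per neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_weights)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Toy example: 3 input features feeding 2 neurons.
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, 0.5, 0.0],   # neuron 0 is connected to all three inputs
    [-1.0, 0.0, 0.0],  # neuron 1 is connected to all three inputs as well
]
biases = [0.5, 0.0]

outputs = dense_forward(inputs, weights, biases, relu)
print(outputs)
```

Neuron 1 receives a negative weighted sum, so ReLU clamps its output to zero.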

<b>5. Output layer</b>

This layer produces the final prediction from the features computed by the previous layers. In our code, it is another dense layer with 1 neuron and a sigmoid activation function. The sigmoid function squashes the output between 0 and 1, which can be interpreted as the probability of the input image belonging to the positive class. A binary decision is then obtained by thresholding this probability, for example at 0.5.

##### 3-2. Compile the model with accuracy and f1 score as the evaluation metrics during training and testing

In [8]:
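# sklearn.metrics.f1_score is not used here. The metric is written with Keras backend
# operations so that it can run on tensors during training.
from tensorflow.keras import backend as K

def f1_score(y_true, y_pred):
    # Round predicted probabilities to hard 0/1 labels.
    y_true = K.round(y_true)
    y_pred = K.round(y_pred)
    # Count true positives, false positives, and false negatives.
    tp = K.sum(y_true * y_pred)
    fp = K.sum((1 - y_true) * y_pred)
    fn = K.sum(y_true * (1 - y_pred))
    # K.epsilon() guards against division by zero.
    precision = tp / (tp + fp + K.epsilon())
    recall = tp / (tp + fn + K.epsilon())
    # Note: Keras computes this per batch and reports the average over batches.
    f1 = 2 * precision * recall / (precision + recall + K.epsilon())
    return f1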

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    # metrics=['accuracy', f1_score],
    metrics=[f1_score]
    )

Binary cross-entropy is the most commonly used loss function for binary image classification tasks, where the output of the model is a probability distribution over two classes (i.e., flip or notflip in our case). Binary cross-entropy measures the difference between the predicted and true labels for each binary classification instance.
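To see how the loss behaves, the sketch below computes the mean binary cross-entropy, -(y log p + (1 - y) log(1 - p)), in pure Python for two hypothetical sets of predictions. The clipping constant `eps` is an assumption of this example and only serves to avoid taking the logarithm of zero.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y * log(p) + (1 - y) * log(1 - p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0]

# Confident and correct predictions give a small loss.
confident_right = binary_crossentropy(labels, [0.9, 0.1])
# Confident but wrong predictions are penalized heavily.
confident_wrong = binary_crossentropy(labels, [0.1, 0.9])

print(round(confident_right, 4), round(confident_wrong, 4))
```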

Adam is a popular optimizer that is often used for binary classification problems. It is an adaptive learning rate optimization algorithm that is well-suited for large datasets and high-dimensional parameter spaces, and it usually works well with its default settings, which is what we use here.
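The "adaptive" part of Adam can be illustrated with the textbook update rule for a single scalar parameter. This is a simplified sketch, not the Keras implementation. The default values below (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-7) are the ones commonly documented for the Keras Adam optimizer, and the quadratic toy objective is an assumption of this example.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective f(x) = x^2 with gradient 2x, starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
    print(t, x)
```

Because the update divides by the running gradient magnitude, the first step has a size close to the learning rate regardless of how large the raw gradient is.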

For evaluating model performance during training and testing, we track only the F1 score (the accuracy metric is commented out in the code above), since it is the success metric of the project.
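As a sanity check on the metric, the same F1 computation can be reproduced in plain Python on a handful of hypothetical labels and predicted probabilities. The helper name `f1_from_probabilities` and the sample values are assumptions made for this illustration.

```python
def f1_from_probabilities(y_true, y_prob, threshold=0.5):
    """Threshold probabilities into labels, then compute F1 from tp, fp, and fn."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels (1 = positive class) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]

f1 = f1_from_probabilities(y_true, y_prob)
print(round(f1, 4))
```

Here two positives are found, one is missed, and one negative is flagged by mistake, so precision and recall are both 2/3 and so is the F1 score.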

### <b>4. Train the CNN model and evaluate model performance</b>

In [12]:
from tensorflow.keras.utils import image_dataset_from_directory

train_data_dir = './data/images/training'
test_data_dir = './data/images/testing'
# n_train_samples = len(array_dict["training"]["flip"]) + len(array_dict["training"]["notflip"])
# n_test_samples = len(array_dict["testing"]["flip"]) + len(array_dict["testing"]["notflip"])
batch_size = 32
epochs = 10 # 10~50

train_data, validate_data = image_dataset_from_directory(
    directory=train_data_dir,
    labels='inferred',
    label_mode='binary',
    color_mode='rgb',
    batch_size=batch_size,
    # Target size (height, width) that images are resized to after they are read from disk.
    image_size=reduced_image_shape[:2],
    shuffle=True,
    seed=1,
    # Reserve 20% of the data for validation.
    validation_split=0.2,
    # Return both subsets as a tuple: (training dataset, validation dataset).
    subset='both',
    # Resize without distorting the aspect ratio. If the original and target aspect ratios differ,
    # the image is cropped to the largest possible window that matches the target aspect ratio.
    crop_to_aspect_ratio=True
)

test_data = image_dataset_from_directory(
    directory=test_data_dir,
    labels='inferred',
    label_mode='binary',
    color_mode='rgb',
    batch_size=batch_size,
    image_size=reduced_image_shape[:2],
    shuffle=True,
    seed=1,
    crop_to_aspect_ratio=True
)

history = model.fit(
    x=train_data, # 'x' is a tf.data.Dataset that already yields (images, labels) batches, so 'y' and 'batch_size' should not be specified.
    epochs=epochs, # 10~50
    verbose=2, # This will output one line per epoch.
    # steps_per_epoch=n_train_samples // batch_size,
    validation_data=validate_data # The validation subset was already split off above, so 'validation_split' is not needed here.
)

Found 2392 files belonging to 2 classes.
Using 1914 files for training.
Using 478 files for validation.
Found 597 files belonging to 2 classes.
Epoch 1/10
60/60 - 43s - loss: 0.4020 - f1_score: 0.8126 - val_loss: 0.4062 - val_f1_score: 0.8048 - 43s/epoch - 716ms/step
Epoch 2/10
60/60 - 44s - loss: 0.3126 - f1_score: 0.8783 - val_loss: 0.3556 - val_f1_score: 0.8761 - 44s/epoch - 735ms/step
Epoch 3/10
60/60 - 41s - loss: 0.2751 - f1_score: 0.8921 - val_loss: 0.3116 - val_f1_score: 0.8456 - 41s/epoch - 688ms/step
Epoch 4/10
60/60 - 46s - loss: 0.2602 - f1_score: 0.9053 - val_loss: 0.3164 - val_f1_score: 0.8584 - 46s/epoch - 767ms/step
Epoch 5/10
60/60 - 41s - loss: 0.1797 - f1_score: 0.9337 - val_loss: 0.2992 - val_f1_score: 0.9033 - 41s/epoch - 683ms/step
Epoch 6/10
60/60 - 48s - loss: 0.1553 - f1_score: 0.9379 - val_loss: 0.3128 - val_f1_score: 0.8643 - 48s/epoch - 797ms/step
Epoch 7/10
60/60 - 47s - loss: 0.1109 - f1_score: 0.9617 - val_loss: 0.1656 - val_f1_score: 0.9358 - 47s/epoch -

The batch size in a Convolutional Neural Network (CNN) refers to the number of images that are processed in a single forward/backward pass. The choice of batch size can impact the performance of your model, as well as the training time and memory requirements.

There is no hard and fast rule for choosing the batch size for a CNN model, as it depends on the specific architecture and dataset being used. However, here are some general guidelines that may help:

Consider your hardware limitations: If you have limited memory resources, you may need to choose a smaller batch size to prevent running out of memory. On the other hand, if you have a powerful GPU, you may be able to use a larger batch size to speed up training.

Consider the size of your dataset: If you have a large dataset, you may be able to use a larger batch size without overfitting. However, if your dataset is small, a smaller batch size may be necessary to prevent overfitting.

Consider the complexity of your model: If your model has a large number of parameters or is very deep, a smaller batch size may be necessary to prevent overfitting. On the other hand, if your model is relatively simple, you may be able to use a larger batch size without overfitting.

Experiment with different batch sizes: It's a good idea to experiment with different batch sizes and evaluate the performance of your model on a validation set. This will help you find the batch size that works best for your specific problem.

In general, batch sizes between 32 and 128 are commonly used for CNN models for image classification. However, the optimal batch size can vary depending on your specific problem and dataset, so it's important to experiment and find the best option for your situation.

During training, the model iterates over the training data in batches, computes the gradients, and updates the model parameters to minimize the loss. At the end of each epoch, the model is also evaluated on the validation data, which shows how well it performs on data it was not trained on and helps detect overfitting.
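The number of batches per epoch follows directly from the dataset size and the batch size. The sketch below redoes that arithmetic using the file counts printed in the training log above (2392 training files with a 20% validation split) and the batch size of 32. Truncating the validation count with `int()` is an assumption of this example that happens to reproduce the logged split.

```python
import math

n_files = 2392          # "Found 2392 files belonging to 2 classes."
validation_split = 0.2
batch_size = 32

n_validation = int(n_files * validation_split)  # files held out for validation
n_training = n_files - n_validation             # files used for training

# The last batch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(n_training / batch_size)

print(n_training, n_validation, steps_per_epoch)
```

The result matches the log: 1914 training files, 478 validation files, and 60 steps per epoch.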

Once the training is complete, you can use the model.evaluate() method to compute the final loss and metrics (the F1 score in our case) on held-out data such as the test set, or use the model.predict() method to make predictions on new data. You can also save the trained model to disk using the model.save() method, so that you can reload it later and use it to make predictions on new data.

In [13]:
model.evaluate(test_data)



[0.16468897461891174, 0.9368169903755188]

In [14]:
import numpy as np
import pandas as pd

train_loss = history.history['loss']
train_f1 = history.history['f1_score']
val_loss = history.history['val_loss']
val_f1 = history.history['val_f1_score']

scores_list = []
for result in [train_loss, val_loss, train_f1, val_f1]:
    scores = [round(np.mean(result), 4), round(np.std(result), 4), round(np.max(result), 4), round(np.min(result), 4)]
    scores_list.append(scores)

pd.DataFrame(scores_list,
             columns=['Mean', 'Std', 'Max', 'Min'],
             index=['Train loss', 'Validate loss', 'Train f1', 'Validate f1']).T

| | Train loss | Validate loss | Train f1 | Validate f1 |
|---|---|---|---|---|
| Mean | 0.1906 | 0.2686 | 0.9256 | 0.8889 |
| Std | 0.111 | 0.085 | 0.0514 | 0.0437 |
| Max | 0.402 | 0.4062 | 0.9826 | 0.9411 |
| Min | 0.0603 | 0.1545 | 0.8126 | 0.8048 |


### <b>5. Save models for later use</b>

In [15]:
model.save('mon_reader')



INFO:tensorflow:Assets written to: mon_reader\assets




### <b>6. Conclusion</b>