### <b>0. Import functions</b>

In [1]:
from utils.load import check_file_downloaded, extract_zip_file, load_images

import os
import warnings
warnings.filterwarnings("ignore")

### <b>1. Download ZIP file from Google Drive and unzip in into local drive</b>

In [2]:
# Details of the source file in G Drive
file_id = "1KDQBTbo5deKGCdVV_xIujscn5ImxW4dm"
file_url = f"https://drive.google.com/file/d/{file_id}"
zip_file_name = "images.zip"

# Details of local directories
root_path = os.getcwd()
download_path = root_path + "\\" + "data"
zip_file_path = download_path + "\\" + zip_file_name

Download the source file from G Drive if the file does not exist.

In [3]:
os.chdir(download_path)
file_exists = os.path.exists(zip_file_name)
if file_exists:
    print(f"File {zip_file_name} already exists in {download_path}.")
else:
    print("Downloading file from Google Drive.")
    print("This could take a few minutes.")
    !gdown 1KDQBTbo5deKGCdVV_xIujscn5ImxW4dm

File images.zip already exists in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data.


Check if the downloading was successful.

In [4]:
check_file_downloaded(file_name=zip_file_name, default_path=root_path, download_path=download_path)

File images.zip exist in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data!


Extract the zip file.

In [5]:
extract_zip_file(zip_file_path, download_path, zip_file_name)

images.zip already extracted in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data.


### <b>2. Load image files</b>

Load images as is without any transformation such as converting to arrays for efficiency and memory saving.

In [6]:
array_dict = load_images(download_path, as_array=False)

Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images
Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\testing
Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\testing\flip


100%|██████████| 290/290 [00:00<00:00, 1313.37it/s]


Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\testing\notflip


100%|██████████| 307/307 [00:00<00:00, 2249.07it/s]


Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\training
Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\training\flip


100%|██████████| 1162/1162 [00:00<00:00, 2411.35it/s]


Loading files in c:\Users\Admin\Documents\GitHub\Apziva\lnaNWaYIRf6JhvHJ\data\images\training\notflip


100%|██████████| 1230/1230 [00:00<00:00, 2300.92it/s]


Check the shape of the images.

In [7]:
from numpy import asarray

image_shape = None
for k, v in array_dict.items():
    for k2, v2 in v.items():
        for k3, v3 in v2.items():
            while image_shape == None:
                image_array = asarray(v3)
                image_shape = image_array.shape
                print(f"Image shape: {image_shape}")

Image shape: (1920, 1080, 3)


### <b>3. Define a CNN (Convolutional Neural Network)</b>

In [10]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    
# 0. Initialize a Sequential model from Keras
model = Sequential()

# 1.  Add a convolutional layer. The first convolutional layer includes an input layer as specified by input_shape.
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=image_shape))

# 2. Add a max pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))

# Add another set of convolutional and pooling layers. For this convolutional layer, the number of output filer is 64.
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Add another set of convolutional and pooling layers. For this convolutional layer, the number of output filer is 128.
model.add(Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# 3. Add a flatten layer
model.add(Flatten())

# 4. Add a dense (i.e. fully connected) layer with 128 neurons and a ReLU activation function
model.add(Dense(units=128, activation='relu'))

# A dropout layer can be added to deal with overfitting.
# It will randomly drop 50% of the neurons during training, which helps to reduce overfitting.
# model.add(Dropout(0.5))

# 5. Add an output layer, which is another dense layer with 1 neurons and a sigmoid activation function
model.add(Dense(units=1, activation='sigmoid'))

Here's an explanation of the architecture of the network. Simply put, it is a CNN with multiple convolutional and max pooling layers, followed by a flatten layer, a fully connected layer and a binary classification output layer, which is commonly used for image classification tasks.

<b>0. Sequential model</b>

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.  This allows us to build a linear stack of layers.

<b>1-1. Input layer</b>

This layer accepts the input image data, which is typically in the form of a 2D or 3D array, depending on the color channels of the image. In our case, we have 1920 x 1080 RGB pictures so the input_shape would be (1920, 1080, 3).

<b>1-2. Convolutional layer</b>

This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.
To put it differently, this layer performs feature extraction by applying a set of filters to the input image. Each filter detects a specific feature, such as edges, corners, or blobs. The output of each filter is a feature map, which highlights the presence of that feature in different parts of the input image.

In our CNN, Conv2D from Keras is used, which stands for 2-dimensional convolution.

The first parameter of Conv2D (i.e. filters) is the dimensionality of the output space, that is the number of output filters in the convolution. In the code, the first Conv2D layer has 32 filters, the second has 64 filters, and the third has 128 filters. These filters are applied to the input image to extract features that are relevant to the classification task. Increasing the number of filters can help the model learn more complex and abstract features, but also increases the number of parameters in the model, which can make training slower and more computationally intensive.

The second parameter (i.e. kernel_size) is the kernel size, specifying the height and width of the 2D convolution window. For binary image classification problems, the typical kernel sizes for the first convolutional layer are in the range of 3x3 to 7x7. Larger kernel sizes may be used for input images with larger spatial dimensions. Smaller kernel sizes can capture fine-grained details in the input image, while larger kernel sizes can capture more global features.

The Activation parameter refers to the non-linear function applied to the output of a layer, which adds non-linearity to the model,  allowing it to learn more complex features from the input data. Activation functions are typically applied after the linear transformation of the input data by a layer's weights and biases. This output is then passed through the activation function, which transforms the input into a new output.

ReLU (Rectified Linear Unit) is a popular choice for most applications due to its simplicity and effectiveness in reducing the vanishing gradient problem, and sigmoid can be used for binary classification problems. Both activation functions are available in Keras.

<b>2. Pooling layer</b>

This layer downsamples the feature maps produced by the convolutional layers by taking the maximum or average value within small regions of the feature maps. This helps to reduce the dimensionality of the feature maps and makes the network more computationally efficient.

In a Convolutional Neural Network (CNN), pooling layers are commonly used to reduce the spatial dimensions of the input volume (i.e., the height and width dimensions) while preserving the depth dimension. Max pooling and average pooling are two common types of pooling operations used in CNNs.

Max pooling takes the maximum value of each non-overlapping rectangular sub-region in the input volume and uses that as the output value for that region. This operation is called "max" pooling because it retains the largest (max) value from each region. Max pooling is useful for detecting the presence of a particular feature or pattern in an input volume, as it retains the strongest activation signal in each region.

Average pooling takes the average value of each non-overlapping rectangular sub-region in the input volume and uses that as the output value for that region. This operation is called "average" pooling because it takes the average value from each region. Average pooling is useful for reducing the spatial dimensions of an input volume while preserving the overall structure of the input, as it retains a more generalized representation of the input volume.

In general, max pooling is more commonly used in CNNs because it has been found to work better in practice, especially for tasks like object recognition. However, average pooling can also be useful in some cases, such as for tasks like semantic segmentation where spatial resolution is important.

In our CNN, max pooling with a 2x2 pooling window, as specified in the pool_size parameter, is used. This means that the pooling layer will take the max value over a 2x2 pooling window.

<b>3. Flatten layer</b>

This layer reshapes the output of the previous layers into a 1D array (or one-dimensional vector), which can be fed into a fully connected layer. Without the flatten layer, the output of the final convolutional layer would be a 3D tensor with a fixed spatial structure, which cannot be directly fed into a dense layer (or fully connected layer) that expects a 1D tensor. 

<b>4. Fully connected (dense) layer</b>

This layer performs the final classification by combining the features extracted by the convolutional layers and making a prediction based on them. The output of the final fully connected layer is a probability score indicating the likelihood of the input image belonging to each of the two classes. By fully connected, it means that every neuron in the previous layer is connected to every neuron in the current layer.

<b>5. Output layer</b>

This layer produces the final binary classification decision based on the probability scores generated by the previous layers. In our code, it is another dense layer with 1 neurons and sigmoid activation function. The sigmoid function squashes the output between 0 and 1, which can be interpreted as the probability of the input image belonging to the positive class.

In our CNN, the final layer is another dense layer with a single unit and 'sigmoid' activation function, which outputs the predicted probability of the input belonging to a certain class.

In [None]:
# source: https://aakashgoel12.medium.com/how-to-add-user-defined-function-get-f1-score-in-keras-metrics-3013f979ce0d

def get_f1(y_true, y_pred): #taken from old keras source code
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())
    f1_val = 2*(precision*recall)/(precision+recall+K.epsilon())
    return f1_val

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
            #   metrics=[Precision(), Recall(), F1Score(num_classes=2)])
              metrics=[get_f1])

Finally, model.compile() is used to compile the model with a binary cross-entropy loss function, the RMSprop optimizer, and the accuracy metric, which will be used to evaluate the performance of the model during training.

* RMSprop stands for Root Mean Square Propagation. It is a gradient descent-based optimization algorithm for neural networks, and it is used to update the weights of the network during training. RMSprop tries to resolve the problems of AdaGrad by using an exponentially decaying average of past gradients.

    In RMSprop, the running average of the squared gradient is used to normalize the gradient before updating the weights. This has the effect of scaling down the learning rate for dimensions with high variance and scaling up the learning rate for dimensions with low variance.

    RMSprop has been found to be effective in deep learning, particularly in recurrent neural networks, where it has been shown to converge faster than other optimization algorithms.

### <b>4. Train the CNN model with train and validation/test data</b>

In [None]:
train_data_dir = './data/images/training'
validation_data_dir = './data/images/testing'
nb_train_samples = len(array_dict["training"]["flip"]) + len(array_dict["training"]["notflip"])
nb_validation_samples = len(array_dict["testing"]["flip"]) + len(array_dict["testing"]["notflip"])
epochs = 10
batch_size = 32

train_datagen = ImageDataGenerator(
	# rescale=1. / 255,
	# shear_range=0.2,
	# zoom_range=0.2,
	# horizontal_flip=True
    )

test_datagen = ImageDataGenerator(
    # rescale=1. / 255
    )

import random
random.seed(1)
train_generator = train_datagen.flow_from_directory(
	train_data_dir,
	target_size=(img_width_reduced, img_height_reduced),
	batch_size=batch_size,
	class_mode='binary',
	seed=random.seed(1))

validation_generator = test_datagen.flow_from_directory(
	validation_data_dir,
	target_size=(img_width_reduced, img_height_reduced),
	batch_size=batch_size,
	class_mode='binary',
	seed=random.seed(1))

Found 2392 images belonging to 2 classes.
Found 597 images belonging to 2 classes.


In [None]:
model.fit_generator(
	train_generator,
	steps_per_epoch=nb_train_samples // batch_size,
	epochs=epochs,
	validation_data=validation_generator,
	validation_steps=nb_validation_samples // batch_size)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x245bc058af0>

The next step after calling model.fit_generator() with the specified arguments is to wait for the training process to complete. During training, the model will iterate over the training data in batches, compute the gradients, and update the model parameters to minimize the loss. The validation data is also used periodically to evaluate the model performance on unseen data and prevent overfitting.

Once the training is complete, you can use the model.evaluate() method to compute the final loss and accuracy on the validation set, or use the model.predict() method to make predictions on new data. You can also save the trained model to disk using the model.save() method, so that you can reload it later and use it to make predictions on new data.

In [None]:
from sklearn.metrics import classification_report

# make predictions on the test set
y_pred = model.predict(validation_generator)

# convert predictions from probabilities to labels
y_pred = [1 if pred > 0.5 else 0 for pred in y_pred]

# print the classification report containing precision, recall and F1 score
y_true = []
for i in range(len(validation_generator)):
    _, labels = validation_generator[i]
    y_true.extend(labels)
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00       290
         1.0       0.51      1.00      0.68       307

    accuracy                           0.51       597
   macro avg       0.26      0.50      0.34       597
weighted avg       0.26      0.51      0.35       597



In [None]:
from sklearn.metrics import f1_score

# Generate predictions
y_pred = model.predict(validation_generator)
y_pred = y_pred.round()

# Extract true labels
y_true = []
for i in range(len(validation_generator)):
    _, labels = validation_generator[i]
    y_true.extend(labels)

# Calculate F1 score
f1 = f1_score(y_true, y_pred, average='macro')
f1



0.3396017699115044

In [None]:
model.evaluate(validation_generator)



[0.6927505135536194, 0.6705861687660217]

In [None]:
# from keras.models import load_model
# from keras.preprocessing.image import load_img
# from keras.preprocessing.image import img_to_array
# from keras.applications.vgg16 import preprocess_input
# from keras.applications.vgg16 import decode_predictions
# from keras.applications.vgg16 import VGG16
# import numpy as np

# from keras.models import load_model

# model = load_model('model_saved.h5')

# image = load_img('v_data/test/planes/5.jpg', target_size=(224, 224))
# img = np.array(image)
# img = img / 255.0
# img = img.reshape(1,224,224,3)
# label = model.predict(img)
# print("Predicted Class (0 - Cars , 1- Planes): ", label[0][0])


https://www.geeksforgeeks.org/python-image-classification-using-keras/

https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697

In [None]:
# # # Python program to create
# # # Image Classifier using CNN

# # # Importing the required libraries
# # import cv2
# import os
# import numpy as np
# from random import shuffle
# from tqdm import tqdm

# # '''Setting up the env'''

# # TRAIN_DIR = 'E:/dataset / Cats_vs_Dogs / train'
# # TEST_DIR = 'E:/dataset / Cats_vs_Dogs / test1'
# # IMG_SIZE = 50
# LR = 1e-3


# # '''Setting up the model which will help with tensorflow models'''
# # MODEL_NAME = 'dogsvscats-{}-{}.model'.format(LR, '6conv-basic')

# # '''Labelling the dataset'''
# # def label_img(img):
# # 	word_label = img.split('.')[-3]
# # 	# DIY One hot encoder
# # 	if word_label == 'cat': return [1, 0]
# # 	elif word_label == 'dog': return [0, 1]

# # '''Creating the training data'''
# # def create_train_data():
# # 	# Creating an empty list where we should store the training data
# # 	# after a little preprocessing of the data
# # 	training_data = []

# # 	# tqdm is only used for interactive loading
# # 	# loading the training data
# # 	for img in tqdm(os.listdir(TRAIN_DIR)):

# # 		# labeling the images
# # 		label = label_img(img)

# # 		path = os.path.join(TRAIN_DIR, img)

# # 		# loading the image from the path and then converting them into
# # 		# grayscale for easier covnet prob
# # 		img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

# # 		# resizing the image for processing them in the covnet
# # 		img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

# # 		# final step-forming the training data list with numpy array of the images
# # 		training_data.append([np.array(img), np.array(label)])

# # 	# shuffling of the training data to preserve the random state of our data
# # 	shuffle(training_data)

# # 	# saving our trained data for further uses if required
# # 	np.save('train_data.npy', training_data)
# # 	return training_data

# # '''Processing the given test data'''
# # # Almost same as processing the training data but
# # # we dont have to label it.
# # def process_test_data():
# # 	testing_data = []
# # 	for img in tqdm(os.listdir(TEST_DIR)):
# # 		path = os.path.join(TEST_DIR, img)
# # 		img_num = img.split('.')[0]
# # 		img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
# # 		img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
# # 		testing_data.append([np.array(img), img_num])
		
# # 	shuffle(testing_data)
# # 	np.save('test_data.npy', testing_data)
# # 	return testing_data

# # '''Running the training and the testing in the dataset for our model'''
# # train_data = create_train_data()
# # test_data = process_test_data()

# # # train_data = np.load('train_data.npy')
# # # test_data = np.load('test_data.npy')
# '''Creating the neural network using tensorflow'''
# # Importing the required libraries
# import tflearn
# from tflearn.layers.conv import conv_2d, max_pool_2d
# from tflearn.layers.core import input_data, dropout, fully_connected
# from tflearn.layers.estimator import regression

# import tensorflow as tf
# tf.compat.v1.reset_default_graph()
# convnet = input_data(shape =[None, 1920, 1080, 3], name ='input')

# convnet = conv_2d(convnet, 32, 5, activation ='relu')
# convnet = max_pool_2d(convnet, 5)

# # convnet = conv_2d(convnet, 64, 5, activation ='relu')
# # convnet = max_pool_2d(convnet, 5)

# # convnet = conv_2d(convnet, 128, 5, activation ='relu')
# # convnet = max_pool_2d(convnet, 5)

# # convnet = conv_2d(convnet, 64, 5, activation ='relu')
# # convnet = max_pool_2d(convnet, 5)

# # convnet = conv_2d(convnet, 32, 5, activation ='relu')
# # convnet = max_pool_2d(convnet, 5)

# convnet = fully_connected(convnet, 1024, activation ='relu')
# convnet = dropout(convnet, 0.8)

# convnet = fully_connected(convnet, 2, activation ='softmax')
# convnet = regression(convnet, optimizer ='adam', learning_rate = LR,
# 	loss ='categorical_crossentropy', name ='targets')

# model = tflearn.DNN(convnet, tensorboard_dir ='log')

# # Splitting the testing data and training data
# # train = train_data[:-500]
# # test = train_data[-500:]

# '''Setting up the features and labels'''
# # X-Features & Y-Labels

# train_X = np.array([i[0] for i in train_data]).reshape(-1, 1920, 1080, 3)
# train_y = np.array([i[1] for i in train_data])
# test_X = np.array([i[0] for i in test_data]).reshape(-1, 1920, 1080, 3)
# test_y = np.array([i[1] for i in test_data])

# '''Fitting the data into our model'''
# # epoch = 5 taken
# model.fit({'input': train_X}, {'targets': train_y}, n_epoch = 5,
# 	validation_set =({'input': test_X}, {'targets': test_y}),
# 	snapshot_step = 500, show_metric = True, run_id = "initial.model")
# model.save("initial.model")

# '''Testing the data'''
# import matplotlib.pyplot as plt
# # if you need to create the data:
# # test_data = process_test_data()
# # if you already have some saved:
# test_data = np.load('test_data.npy')

# fig = plt.figure()

# for num, data in enumerate(test_data[:20]):
# 	# cat: [1, 0]
# 	# dog: [0, 1]
	
# 	img_num = data[1]
# 	img_data = data[0]
	
# 	y = fig.add_subplot(4, 5, num + 1)
# 	orig = img_data
# 	data = img_data.reshape(1920, 1080, 3)

# 	# model_out = model.predict([data])[0]
# 	model_out = model.predict([data])[0]
	
# 	if np.argmax(model_out) == 1: str_label ='Dog'
# 	else: str_label ='Cat'
		
# 	y.imshow(orig, cmap ='gray')
# 	plt.title(str_label)
# 	y.axes.get_xaxis().set_visible(False)
# 	y.axes.get_yaxis().set_visible(False)
# plt.show()