Hello everyone and welcome. 

In this notebook I will explain convolutional neural networks (CNNs): how they work and why we use them in computer vision problems 
instead of the traditional Multi-Layer Perceptron (MLP). 

First we will try to classify the images using a traditional MLP (the results will be bad ☹️) and explain why the results are 
bad when we use an MLP for image classification. 

Next we will look at the components of a Convolutional Neural Network (CNN), see how it works, and see why it works better than the MLP
in image classification problems. 

Finally we will build an example CNN and compare its results with those of the MLP. 

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import tensorflow as tf 
import tensorflow_datasets as tfds
import pathlib
import os
from sklearn.model_selection import train_test_split

# 1- Input pipeline

In [None]:
data_path = pathlib.Path(r"../input/a-large-scale-fish-dataset/Fish_Dataset/Fish_Dataset")

In [None]:
# list of paths for all images 
all_images = list(data_path.glob(r'*/*/*.jpg')) + list(data_path.glob(r'*/*/*.png'))

images = []
labels = []

# loop through the paths to collect the image paths and their labels
for item in all_images:
    
    path = os.path.normpath(item)
    splits = path.split(os.sep)
    
    # the parent folder name is the label; skip folders whose name contains 'GT'
    if 'GT' not in splits[-2]:
    
        images.append(item)
    
        label = splits[-2]
        labels.append(label)

In [None]:
# Dataframe with two columns: image path, label 
image_paths = pd.Series(images).astype(str)
labels = pd.Series(labels)

dataframe = pd.concat([image_paths, labels], axis=1)

dataframe.columns = ['images', 'labels']

dataframe.head()

In [None]:
fig, axes = plt.subplots(nrows=3, ncols=5, figsize=(15,10), subplot_kw={'xticks':[], 'yticks':[]})
for i, ax in enumerate(axes.flat):
    ax.imshow(plt.imread(dataframe.images[i]))
    ax.set_title(dataframe.labels[i])
    
plt.show()

In [None]:
# Shuffle the dataframe rows and split them into train, val, and test sets

In [None]:
shuffled_dataframe = dataframe.sample(frac = 1)

In [None]:
all_train, test = train_test_split(shuffled_dataframe, test_size=0.15, random_state=0)
train, val = train_test_split(all_train, test_size=0.17, random_state=0)
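
Because the second split is applied to what remains after the first one, the final proportions are not 15% and 17% of the whole dataset. The arithmetic can be checked with a small sketch (the variable names are only for illustration, and the exact row counts depend on how `train_test_split` rounds):

```python
# Fractions produced by the two successive splits above.
test_frac = 0.15                      # first split: 15% of all rows go to test
remaining = 1 - test_frac             # 85% of the rows are left
val_frac = remaining * 0.17           # second split: 17% of the remainder goes to validation
train_frac = remaining - val_frac     # the rest is used for training

print(f"train: {train_frac:.4f}, val: {val_frac:.4f}, test: {test_frac:.4f}")
```

So the data ends up split roughly 70% / 14.5% / 15% between training, validation, and test.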

In [None]:
training_data_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255.0,
                                                                   rotation_range=40,
                                                                   zoom_range=0.2,
                                                                   width_shift_range=0.2,
                                                                   height_shift_range=0.2,
                                                                   shear_range=0.2,
                                                                   horizontal_flip=True,
                                                                   vertical_flip=True)

training_generator = training_data_gen.flow_from_dataframe(dataframe=train,
                                                           x_col='images', y_col='labels',
                                                           target_size=(224, 224),
                                                           color_mode='rgb',
                                                           class_mode='categorical',
                                                           batch_size=64)
#------------------------------------------------------------------------------------------------------
val_data_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255.0)
validation_generator = val_data_gen.flow_from_dataframe(dataframe=val,
                                                        x_col='images', y_col='labels',
                                                        target_size=(224, 224),
                                                        color_mode='rgb',
                                                        class_mode='categorical',
                                                        batch_size=64)
#------------------------------------------------------------------------------------------------------
test_data_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255.0)
test_generator = test_data_gen.flow_from_dataframe(dataframe=test,
                                                   x_col='images', y_col='labels',
                                                   target_size=(224, 224),
                                                   color_mode='rgb',
                                                   class_mode='categorical',
                                                  batch_size=64)

# 2- Image classification using Multi Layer Perceptron (MLP)

We will first discuss the use of MLP in image classification problems. 

- An MLP starts with an **input layer**, and its input must be a 1D vector, so the first thing we should do is flatten the input images from a 3D array (height, width, channels) into a 1D vector before training the network. We will use the Keras Flatten layer, which takes the image array and converts it into a 1D vector. <br>


- The second thing we add is the **hidden layers**. You can add as many as you want, but in general the number depends on the complexity of the data and the task you have. We also choose the activation function; ReLU is a common default that usually performs well in hidden layers. <br> 


- Finally the **output layer**. The number of neurons is equal to the number of classes we have, and the activation function is softmax since we have a multi-class classification problem. <br>

You can learn more about MLP at: 
    
- <a href="https://www.kaggle.com/general/265058">Multilayer Perceptron (MLP).</a>  <br>


- <a href="https://www.kaggle.com/general/265073">summary for Multilayer Perceptron (MLP) and backpropagation algorithm.</a> <br>


- <a href="https://www.kaggle.com/general/265279">Activation function, why do we need activation function in neural networks.</a> <br>


- <a href="https://www.kaggle.com/general/265294">Loss Function & Optimization.</a>

In [None]:
mlp_model = tf.keras.models.Sequential()

#Flatten layer
mlp_model.add(tf.keras.layers.Flatten(input_shape=(224, 224, 3)))

# 3 Hidden Layers with (256, 256, 128) neurons and relu activation function
mlp_model.add(tf.keras.layers.Dense(256, activation='relu'))
# dropout layer to reduce the overfitting 
mlp_model.add(tf.keras.layers.Dropout(0.4))
mlp_model.add(tf.keras.layers.Dense(256, activation='relu'))
mlp_model.add(tf.keras.layers.Dense(128, activation='relu'))

# output layer with 9 neurons and softmax activation function
mlp_model.add(tf.keras.layers.Dense(9, activation = 'softmax'))

In [None]:
# we can visualize the network using: 

tf.keras.utils.plot_model(mlp_model,
                          show_shapes=True,
                          show_dtype=True,
                          show_layer_names=True)

In [None]:
mlp_model.summary()

We can see the trainable params, which are the parameters that will be adjusted and learned during the training process. <br>


- The first hidden layer has (224 * 224 * 3 nodes in the input layer) * (256 nodes in this hidden layer) + 256 bias terms = **38535424 learnable parameters.** <br>


- The second Hidden layer has (256 nodes in the previous hidden layer) * (256 nodes in this hidden layer) + 256 bias terms =  **65792 learnable parameters.** <br>


- The last hidden layer has (256 nodes in the previous hidden layer) * (128 nodes in this hidden layer) + 128 bias terms =  **32896 learnable parameters.**  <br>


- Finally the output layer has (128 nodes in the previous hidden layer) * (9 nodes in the output layer) + 9 bias terms =  **1161 learnable parameters.** <br>


The total number of trainable params is **38,635,273**, which is a huge number 💔 for such a small neural network. What will happen if we add more and more layers or resize the images to a larger size? The number of parameters will grow even further. 
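
The per-layer counts listed above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (the helper `dense_params` is a name made up for this example, not a Keras function); the Dropout layer is left out because it has no trainable parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer sizes of the MLP defined above: flattened input, three hidden layers, output.
layer_sizes = [224 * 224 * 3, 256, 256, 128, 9]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)      # parameters of each Dense layer
print(total_params)   # total trainable parameters
```

Almost all of the parameters sit in the first Dense layer, because every one of the 150,528 input values is connected to every one of its 256 units.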

Next we compile the model. You can learn more about loss functions and optimization at: <a href="https://www.kaggle.com/general/265294">Loss Function & Optimization.</a>

In [None]:
mlp_model.compile(loss='categorical_crossentropy',
                 optimizer='rmsprop',
                 metrics=['acc'])

In [None]:
mlp_model.fit(training_generator, 
             steps_per_epoch=24, 
             validation_data=validation_generator,
             validation_steps=20,
             epochs=5)

We lose the spatial features of the image when we flatten it into a 1D vector. A lot of information is lost, 
and the network does not relate neighboring pixel values to each other when it tries to find patterns; 
that is why we get very poor accuracy when we use an MLP on a problem like this. 

**Why??** 


**Loss of information** 🤕

When we flatten the image into a 1D vector, the pixel values that represent the fish 🐟 end up in particular positions 
of the vector, say the positions that correspond to the left side of the image. If a new image shows the same object at a different 
location, the neural network will not recognize it, because different neurons need to fire in order to detect 
the fish; the network has no idea that this is the same fish. Then why does an MLP do better than that on the MNIST dataset? Because the MNIST images are well prepared for this task (the digits are centered and size-normalized). The MLP will not learn the fish shape. 
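
A toy illustration of this point, assuming a tiny 6x6 single-channel "image" instead of a real fish photo: the same 2x2 object placed in two different corners activates completely different positions of the flattened vector, so a Dense layer sees two unrelated inputs.

```python
import numpy as np

# Toy 6x6 "image" with a 2x2 bright object in the top-left corner.
image_a = np.zeros((6, 6), dtype=int)
image_a[0:2, 0:2] = 1

# The same object moved to the bottom-right corner.
image_b = np.zeros((6, 6), dtype=int)
image_b[4:6, 4:6] = 1

# Flatten both images row by row, as a Flatten layer would.
flat_a = image_a.flatten()
flat_b = image_b.flatten()

# Positions of the flattened vector that are "on" for each image.
active_a = set(np.flatnonzero(flat_a).tolist())
active_b = set(np.flatnonzero(flat_b).tolist())

print("active inputs for image A:", sorted(active_a))
print("active inputs for image B:", sorted(active_b))
print("inputs shared by both:", sorted(active_a & active_b))
```

The two sets do not overlap at all, which is why the weights learned for one position of the object do not help with the other.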


**Very large number of parameters** 🤕

Another problem with the MLP is that it is built from fully connected layers, where every node in a layer is connected to all nodes 
of the previous layer and all nodes in the next layer. You saw that even this simple network has more than 38 million parameters to learn; with a more complex network and a larger image size we could end up with billions of parameters to train, which is very computationally expensive. 

**Next we will use Convolutional neural networks to train the classifier**

# 3- Image classification using Convolutional Neural Networks (CNN)

## 3.1 Introduction

**Convolutional Neural Networks (CNNs)** are locally connected: each node is connected to a small subset of the units in the previous layer.
They can learn object features such as a car wheel or a cat's nose. 

In [None]:
from IPython.display import Image
Image(url='https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRwed5zvnSDt0zrFd_gf-kUIMoF7Nm6FXIwDw&usqp=CAU',
      width=750,
      height=500)

A Convolutional Neural Network (CNN) has an input layer, a stack of hidden layers, and an output layer, just like a fully connected network: the weights are randomly initialized, activation functions are applied, and we use backpropagation and an optimization algorithm to update the weights. The CNN does not need a Flatten layer at its input; it takes the whole image array as input. 

The network consists of 2 main parts: 

1- Feature extraction part <br>
2- Classification part <br>

- The image is passed through the convolutional layers to extract features, called **feature maps**. Through this process the height and width of the image shrink and the number of feature maps increases. 

- The output of the final layer of the feature extractor is the set of features extracted from the image; it represents the image, and it is what we are going to classify. A Flatten layer is used to flatten the feature maps, which are then passed to fully connected layers for classification. 

- The earlier layers in the CNN learn basic features such as edges, and the later layers learn more complex features such as the wheel of a car or the nose of a cat. 

So, **to summarize** what we have learned so far: the input image is passed directly to the network without flattening it, then the first part (the feature extractor) extracts features that represent the input image and gives us what we call feature maps, and finally these feature maps are passed to the classifier, which consists of fully connected layers that perform the classification. 

In [None]:
Image(url='https://www.researchgate.net/profile/Lavender-Jiang-2/publication/343441194/figure/fig2/AS:921001202311168@1596595206463/Basic-CNN-architecture-and-kernel-A-typical-CNN-consists-of-several-component-types.ppm',
      width=750,
      height=500)

So as I said, the network consists of 2 main parts: the CNN (the feature extractor) and the FCN (the fully connected classifier). 
    
The CNN part consists of **two main components:**
    
    1- Convolutional Layer (CONV)
    2- Pooling Layer  (POOL)

**INPUT   >   CONV   >   POOL   >   CONV   >   POOL   >   CONV   >   POOL >  FC >   FC (softmax)**

So the full architecture consists of convolutional layers, pooling layers, and fully connected layers. 

## 3.2- Convolutional Layers

The convolutional layer is the basis of convolutional networks. Its units are **filters** (also called kernels), and they work by sliding the filter over the input image, applying some processing at each location, and storing the result in a new processed image, as we see in the figure below. 

In a CNN these filters are the weights: their values are initialized randomly and learned during the training process. The area of the image that the filter processes at a given position is called the receptive field. 


The math here is very simple: at each location where the filter stops, we multiply each pixel value in the receptive field by the corresponding value in the filter and then sum the products to get the value of the center pixel in the new image, as we see in the figure below. 

If the images are colored, say with a size of (520, 520, 3) where 3 is the number of channels (Red, Green, and Blue (RGB)), the filter will also have 3 channels, for example (3, 3, 3), where the last 3 is the number of channels in the input image. We have a (3, 3) slice of the filter for each channel of the input image; for each channel we do the same calculation, and then we add up the results from the 3 channels. 

In some applications colors are very important for identifying objects. 
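
The multiply-and-sum operation described above can be sketched in NumPy. This is a minimal illustration, not how Keras implements `Conv2D`: the function name `conv2d_valid` is made up for this example, the filter values are picked by hand instead of being learned, and there is no bias or activation. Strictly speaking the operation is a cross-correlation, which is what deep learning libraries compute under the name convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    image:  array of shape (H, W, C)
    kernel: array of shape (k, k, C)
    Returns one feature map of shape (H - k + 1, W - k + 1).
    """
    h, w, _ = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            receptive_field = image[i:i + k, j:j + k, :]
            # Multiply element by element, then sum over height, width and channels.
            out[i, j] = np.sum(receptive_field * kernel)
    return out

# Toy 5x5 RGB image: three dark columns (0) followed by two bright columns (1).
image = np.zeros((5, 5, 3))
image[:, 3:, :] = 1.0

# A hand-made vertical-edge filter, repeated for each of the 3 channels.
edge_2d = np.array([[-1.0, 0.0, 1.0],
                    [-1.0, 0.0, 1.0],
                    [-1.0, 0.0, 1.0]])
kernel = np.stack([edge_2d] * 3, axis=-1)   # shape (3, 3, 3)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
print(feature_map)
```

The feature map is zero where the window covers only dark pixels and large where the window straddles the dark-to-bright boundary, which is what "detecting a vertical edge" means here.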

In [None]:
Image(url='https://i.stack.imgur.com/CQtHP.gif',
      width=750,
      height=500)

Filters are parameters that are learned during the training process. Some filters can detect vertical edges, while others might 
detect horizontal edges, and others detect different things. 

So we can see that each filter produces its own feature map. The conv layer has many of those filters, so the number of filters
in the conv layer determines the number of feature maps that are produced when the layer is applied to the input. The number 
of filters is therefore the depth of the layer's output. As we increase the number of filters, the complexity of the network
increases, which enables us to detect more complex features. 


A kernel is just a matrix of weights that are learned during the training process; it works by sliding over
the image to extract features. Kernels are almost always square and can have different sizes, such as (3x3), (5x5), or (7x7). The kernel size is a
hyperparameter that you can tune, and the best choice depends on the problem that you are solving. 

**Strides and Padding**

In [None]:
Image(url='https://miro.medium.com/max/658/0*jLoqqFsO-52KHTn9.gif',
      width=750,
      height=500)

#Source: https://towardsdatascience.com/cnn-part-i-9ec412a14cb1

As we can see from the figure above, the output of the conv operation is smaller than the input. We can control the shape 
of the output of this operation with the **stride and padding**.

Stride is the amount by which the filter slides over the image. For example, if stride = 1 the filter slides one pixel at a time, and if stride = 2 it slides 2 pixels at a time. 


Padding: we add zeros around the border of the image to preserve its spatial size, so we can train a deeper network and avoid losing information from the edges of the image. 

Padding takes one of two values: 1: 'same', where we add zeros around the image border so that (with a stride of 1) the output has the same size as the input; 2: 'valid', which means no padding. 
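
To make zero padding concrete, here is a small sketch using `np.pad` on a toy 4x4 single-channel image. The image values and the 3x3 filter size are assumptions chosen for the illustration.

```python
import numpy as np

image = np.arange(1, 17).reshape(4, 4)   # toy 4x4 single-channel image

# One ring of zeros around the border, which is what 'same' padding
# amounts to for a 3x3 filter with stride 1.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

k = 3                                        # filter size
positions_valid = image.shape[0] - k + 1     # filter positions per row without padding
positions_same = padded.shape[0] - k + 1     # filter positions per row with padding

print(padded)
print(positions_valid, positions_same)
```

Without padding the 3x3 filter fits in only 2 positions per row, so the output shrinks to 2x2; with the ring of zeros it fits in 4 positions, so the output keeps the 4x4 size of the input.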

In [None]:
Image(url='https://res.cloudinary.com/practicaldev/image/fetch/s--nUoflRuG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.ibb.co/kG5vPdn/final-cnn.png',
      width=750,
      height=500)

#Source: https://towardsdatascience.com/cnn-part-i-9ec412a14cb1

So, **to summarize** what we have learned so far: the first component of the CNN is the Conv layer, which has many hyperparameters that we can tune to control its output. Conv layers contain filters (kernels) whose weights are learned during training. These filters work by sliding over the image, and each filter detects a certain feature: the filters in the first layers detect simple things like edges, and as we go deeper in the network the filters detect more complex features. Each filter in the layer produces a feature map, and the feature maps of all filters in the layer are stacked together to form the output of the layer. To control the shape of the output of the layer we have two additional hyperparameters: the stride, which controls the amount by which the filter slides over the image, and the padding, where we add zeros around the border of the image. 

Example: if an input image of size (224, 224, 3) is passed to a Conv layer with 64 filters of size (3, 3), a stride of 1, and no padding ('valid'), the output height and width will be (224 + 2*0 - 3)/1 + 1 = 222, so the output size is (222, 222, 64), where the 3 in the input size is the number of RGB channels and the 64 in the output means 64 feature maps produced by the 64 filters. 
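
The same calculation can be wrapped in a small helper to try other strides and paddings. This is a sketch; `conv_output_size` is a name made up for this example, and `padding` here is the number of zeros added on each side.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(224, 3, stride=1, padding=0))  # 222, the example above
print(conv_output_size(224, 3, stride=1, padding=1))  # 224, one ring of zeros keeps the size
print(conv_output_size(224, 3, stride=2, padding=0))  # 111, a stride of 2 roughly halves the size
```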

## 3.3- Pooling Layers

As we go deeper in the ConvNet, the number of feature maps (depth) increases, which increases the number 
of parameters and, with it, the computational power and memory needed. 

A pooling layer reduces the size of the feature maps (not the depth; it reduces the height and the width). There are two common types of pooling:
max pooling and average pooling. 

In [None]:
Image(url='https://nico-curti.github.io/NumPyNet/NumPyNet/images/maxpool.gif',
      width=750,
      height=500)

In pooling we also have a window of a specified size that slides over the feature map, but it has no weights. In max pooling the window takes the 
maximum value and ignores the rest; in average pooling it takes the average of the values. In this way the pooling
layer helps us keep the important information and pass it to the next layer. 
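
A minimal NumPy sketch of both pooling types on a toy 4x4 feature map, assuming a 2x2 window that moves 2 pixels at a time (the default behavior of the Keras pooling layers when only `pool_size=(2,2)` is given). The helper name `pool2d` and the feature-map values are made up for this example.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a (size x size) window and stride = size."""
    h, w = feature_map.shape
    # Drop rows/columns that do not fill a whole window, then split into blocks.
    blocks = feature_map[:h - h % size, :w - w % size]
    blocks = blocks.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 0],
                        [7, 2, 9, 8],
                        [3, 4, 6, 5]])

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)   # [[6 4] [7 9]]
print(avg_pooled)   # [[3.75 1.75] [4.   7.  ]]
```

The 4x4 map becomes 2x2 in both cases: the height and width are halved, and nothing had to be learned to do it.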

**Where do we put the pooling layers in the CNN?**

Usually a pooling layer is placed after one or two convolutional layers.

The output of the final pooling layer, after a sequence of conv and pool layers, will be for example (7, 7, 120), where 120 is the depth (number of feature maps). This output is ready for classification, but first we need to flatten it and then feed it to the fully connected layers. 

Pooling layers don't have any parameters to train. 

## 3.4- Fully Connected Layers

We pass the image through our feature extractor, which is composed of convolutional and pooling layers and gives us the features 
that we pass through the fully connected layers (FCL) for classification. 

In this stage we flatten the output of the feature extractor. Say, for example, that it has shape (7, 7, 40); when we flatten it we get a vector
of size 7*7*40 = 1960. We then feed this vector to fully connected layers (for example, one layer with 256 units
and a ReLU activation function, followed by another layer with 9 units and a softmax activation function for classification).

The number of units in the last layer equals the number of classes, and each node outputs the probability of one class.
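
A small NumPy sketch of these two steps, flattening and softmax. The feature values and the 9 raw scores are random stand-ins (assumptions for the illustration, not outputs of the real model), and the hidden Dense layer is skipped to keep the example short.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)     # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Stand-in for the feature extractor output: random values with shape (7, 7, 40).
rng = np.random.default_rng(0)
features = rng.random((7, 7, 40))

flat = features.reshape(-1)               # what the Flatten layer does
print(flat.shape)                         # (1960,)

# Hypothetical raw outputs of the last layer, one per class.
scores = rng.normal(size=9)
probabilities = softmax(scores)
print(probabilities.sum())                # 1.0 up to floating-point error
print(int(np.argmax(probabilities)))      # index of the predicted class
```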

## 3.5 Image classification using CNN

In [None]:
# Building the model 
cnn_model = tf.keras.models.Sequential()

#----------------------------------------------------------------------------------------------

# Conv layer: 32 filters of size (3, 3), with strides = 1 and relu activation
cnn_model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1, 
                                     activation='relu', input_shape=(224,224,3)))
# max-pooling layer with pool_size of (2,2)
cnn_model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))

#----------------------------------------------------------------------------------------------

# Conv layer: 64 filters of size (3, 3), with strides = 1 and relu activation
cnn_model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, 
                                    activation='relu'))
# max-pooling layer with pool_size of (2,2)
cnn_model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))

#----------------------------------------------------------------------------------------------

# Conv layer: 64 filters of size (3, 3), with strides = 1 and relu activation
cnn_model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, 
                                    activation='relu'))
# max-pooling layer with pool_size of (2,2)
cnn_model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))

#----------------------------------------------------------------------------------------------

# Conv layer: 128 filters of size (3, 3), with strides = 1 and relu activation
cnn_model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1, 
                                    activation='relu'))
# max-pooling layer with pool_size of (2,2)
cnn_model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))

#----------------------------------------------------------------------------------------------

# Conv layer: 256 filters of size (3, 3), with strides = 1 and relu activation
cnn_model.add(tf.keras.layers.Conv2D(256, kernel_size=(3, 3), strides=1, 
                                    activation='relu'))
# max-pooling layer with pool_size of (2,2)
cnn_model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))

#----------------------------------------------------------------------------------------------
#Flattening the output of the last pooling layer 
cnn_model.add(tf.keras.layers.Flatten())

#cnn_model.add(tf.keras.layers.GlobalAveragePooling2D())


#Fully connected layer with 256 units and relu activation
cnn_model.add(tf.keras.layers.Dense(256, activation='relu'))

# Dropout layer with a dropout rate of 0.4 to reduce overfitting
cnn_model.add(tf.keras.layers.Dropout(0.4))

#Fully connected layer with 9 units and softmax activation
cnn_model.add(tf.keras.layers.Dense(9, activation = 'softmax'))

In [None]:
cnn_model.summary()

As we can see, the number of parameters drops from 38,635,273 in the MLP to 2,066,313 in the CNN, which is only about 5.3% of the number of parameters in the MLP!
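
The CNN total can be reproduced by hand by following the shapes through the network. This is a sketch of the arithmetic only (the helper names `conv_params` and `dense_params` are made up for this example); it assumes the layer settings used in the model above: 3x3 filters, stride 1, 'valid' padding, and 2x2 max pooling after every conv layer.

```python
def conv_params(k, c_in, n_filters):
    """Each filter has k*k*c_in weights plus one bias."""
    return (k * k * c_in + 1) * n_filters

def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

size, channels = 224, 3                       # input height/width and depth
total = 0
for n_filters in [32, 64, 64, 128, 256]:      # filters of the five Conv2D layers
    total += conv_params(3, channels, n_filters)
    size = size - 3 + 1                       # 3x3 conv, stride 1, 'valid' padding
    size = size // 2                          # 2x2 max pooling (no trainable parameters)
    channels = n_filters

flat_size = size * size * channels            # length of the vector after Flatten
total += dense_params(flat_size, 256)
total += dense_params(256, 9)

print(size, channels, flat_size)              # 5 256 6400
print(total)                                  # 2066313
```

The five conv layers together account for only about 425 thousand of these parameters, because each filter is reused at every position of the image; most of the rest comes from the first Dense layer after Flatten.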

In [None]:
cnn_model.compile(loss='categorical_crossentropy',
                 optimizer='rmsprop',
                 metrics=['acc'])

In [None]:
history = cnn_model.fit(training_generator, 
                         steps_per_epoch=99, 
                         validation_data=validation_generator,
                         validation_steps=20,
                         epochs=25)

As we can see, we get much better accuracy when we use the convolutional network. 

We can get even better accuracy with transfer learning, which will be the next topic to discuss.

In [None]:
cnn_train_loss = history.history['loss']
cnn_val_loss = history.history['val_loss']


plt.plot(history.epoch, cnn_train_loss, label='Training Loss')
plt.plot(history.epoch, cnn_val_loss, label='Validation Loss')
plt.grid(True)
plt.legend()

In [None]:
train_acc = history.history['acc']
val_acc = history.history['val_acc']

plt.plot(history.epoch, train_acc, label='Training Accuracy')
plt.plot(history.epoch, val_acc, label='Validation Accuracy')
plt.grid(True)
plt.legend()

In [None]:
cnn_model.evaluate(test_generator)

**In the next notebook I will explain transfer learning in computer vision problems.**


**Thank you for reading, I hope you enjoyed it and benefited from it.**

**If you have any questions or notes, please leave them in the comment section.**

**If you like this notebook please press upvote and thanks again.**
