In [None]:
# Convolutional Neural Networks (CNNs):
- Deep learning models that extract features from images using convolutional layers, followed by pooling and fully connected layers, for tasks such as image classification.
- They excel at capturing spatial hierarchies and patterns, which makes them well suited to analyzing visual data.


# CNN ARCHITECTURE:
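1. Convolutional Layer: applies learned filters (kernels) to the input to produce feature maps.
2. Pooling Layer: downsamples the feature maps, reducing their spatial size.
3. Flattening: converts the final feature maps into a one-dimensional vector.
4. Fully Connected Layer: takes the flattened output of the convolution and pooling stages and predicts the class of the image from the features extracted there.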

** Feature Extraction: the process of transforming an image into features (edges, textures, shapes) for analysis, using convolution operations.
** In this sense, feature extraction reduces the number of features in a dataset: it creates new features that summarise the features contained in the original set (here, the raw pixels).
** CNNs are also used for image classification, object detection, and image segmentation.
** Apart from these layers, two other important components are the dropout layer and the activation function.
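A minimal NumPy sketch of these two components, assuming ReLU as the activation (the one used later in this notebook) and *inverted* dropout, which is how Keras' `Dropout` behaves during training: dropped units are set to 0 and the surviving units are scaled by `1 / (1 - rate)`. At inference time Keras passes inputs through unchanged. The function names are illustrative, not Keras APIs.

```python
import numpy as np

def relu(x):
    """ReLU activation: keep positive values, replace negatives with 0."""
    return np.maximum(x, 0.0)

def inverted_dropout(x, rate, rng):
    """Training-time dropout: zero each unit with probability `rate` and
    scale the kept units by 1 / (1 - rate) so the expected sum is unchanged."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(activations)
print(inverted_dropout(activations, rate=0.25, rng=rng))
```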


#### Sequential() : A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.
#### Conv2D : This layer creates a convolution kernel that is convolved with the layer input over two spatial dimensions (height and width) to produce a tensor of outputs.
- A filter (kernel) in a Conv2D layer “slides” over the 2D input. At each position it multiplies its weights elementwise with the input patch it covers and sums the products into a single output pixel.

### Kernel (filter):
- The kernel size refers to the width x height of the filter mask, e.g. `(3, 3)`.
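To make the sliding-window description concrete, here is a minimal NumPy sketch of a single-channel convolution with no padding. It assumes one input channel, one kernel and no bias, unlike a real `Conv2D` layer, which learns many kernels spanning all input channels and adds a bias per filter. Like Keras, it does not flip the kernel (strictly, this is cross-correlation). The name `conv2d_single` and the all-ones kernel are illustrative.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding); at each position, multiply
    elementwise with the covered patch and sum into one output pixel."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3))                          # hypothetical 3x3 kernel
print(conv2d_single(image, kernel))
```

With the all-ones kernel, each output pixel is the sum of the 3x3 patch it covers, so the 5x5 input yields a 3x3 output.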



In [3]:
from keras.layers import Dense, Input, Dropout, GlobalAveragePooling2D, Flatten, Conv2D, BatchNormalization, Activation, MaxPooling2D
from keras.models import Model, Sequential
from keras.optimizers import Adam

# Number of labels (output classes):
num_labels = 7

# Creating a sequential model:
model = Sequential()


#### Choosing the number of filters (kernels) in each convolutional layer: general principles and guidelines
1. Increasing Complexity: As you go deeper into the network, the features captured by the filters become more abstract and complex. Increasing the number of filters in deeper layers lets the network learn more detailed and varied features. Typically, you start with a smaller number of filters (e.g., 32 or 64) and increase it in deeper layers (e.g., 128, 256, 512).
* For a CNN: begin with 32 or 64 filters in the first convolutional layer. These numbers are large enough to capture a variety of features in the initial layers without overburdening the computational resources.
2. Empirical guidelines: Common practice involves starting with a base number (like 32 or 64 filters) and then doubling the number of filters after each pooling layer or every few convolutional layers. 
- Layer 1: 32 filters
- Layer 2: 64 filters
- Layer 3: 128 filters
- Layer 4: 256 filters, etc.
3. Network Depth and Computational Resources: The number of filters is also limited by the computational resources available (e.g., GPU memory). More filters mean more parameters and a higher computational cost. Balance the network's capacity to learn against the available resources: an oversized network trains slowly and is more prone to overfitting.
4. Experimentation and Tuning: The optimal number of filters can vary depending on the specific dataset and task. It often requires experimentation and tuning. Techniques like hyperparameter optimization and cross-validation can help in finding the best configuration for the number of filters at each layer.
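The cost side of point 3 can be quantified. A `Conv2D` layer with bias has `(kernel_height * kernel_width * input_channels + 1) * filters` trainable parameters, independent of the image size. The sketch below applies that formula to the doubling schedule above, assuming 3x3 kernels, a grayscale (1-channel) input, and layers stacked so that each layer's input channels equal the previous layer's filters (pooling does not change the channel count).

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a Conv2D layer: kernel weights plus one bias per filter."""
    kh, kw = kernel_size
    weights = kh * kw * in_channels * filters
    return weights + (filters if use_bias else 0)

filters_per_layer = [32, 64, 128, 256]  # doubling schedule from the list above
in_channels = 1                          # grayscale input (assumption)
total = 0
for filters in filters_per_layer:
    params = conv2d_params((3, 3), in_channels, filters)
    print(f"{in_channels:>3} -> {filters:>3} filters: {params:,} parameters")
    total += params
    in_channels = filters                # the next layer sees this many channels
print(f"total: {total:,}")
```

Each doubling of filters roughly quadruples a layer's weights, because both its input and its output channel counts double.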

** Kernel Size:
- Common practice is to use 3x3 kernels.
- Larger kernels such as 5x5 can be used in the initial layers for broader feature extraction.

* For this project, we will start with:
1. Number of Convolutional Layers: Start with 2-3 convolutional layers for a simpler model and increase as needed based on performance.
2. Number of Filters:
- Initial Layers: Use 32 or 64 filters.
- Intermediate Layers: Increase to 128 or 256 filters.
- Deeper Layers: Use 512 or more filters if the network is very deep.
3. Kernel size: 3x3
4. Pooling size: 2x2
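A quick way to sanity-check this starting configuration is to trace the feature-map shapes. The sketch below assumes three blocks, each with one 3x3 'same'-padded convolution (stride 1) followed by 2x2 max pooling, using 32, 64 and 128 filters on a 48x48 input. The block structure and the `trace_shapes` name are illustrative choices, not a fixed recipe.

```python
def trace_shapes(input_hw, filters_per_block, pool=2):
    """Return (layer, height, width, channels) after each conv and pool step."""
    h, w = input_hw
    shapes = []
    for filters in filters_per_block:
        # 3x3 conv, 'same' padding, stride 1: height/width unchanged, depth = filters
        shapes.append(("conv", h, w, filters))
        # 2x2 max pooling with stride 2: halves height and width (remainder dropped)
        h, w = h // pool, w // pool
        shapes.append(("pool", h, w, filters))
    return shapes

for layer, h, w, c in trace_shapes((48, 48), [32, 64, 128]):
    print(f"{layer:<4} -> {h}x{w}x{c}")
```

Three 2x2 pooling steps take a 48x48 input down to 6x6, which is one reason not to stack many more pooling layers on images this small.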


### Strides: 
- Stride determines how many pixels the kernel shifts over the input at a time. 
- E.g. stride = 1 means the dot product is computed on a k x k window of the 2D input, then the kernel shifts by one pixel for the next operation, along both axes.
- Smaller stride: the windows overlap more and the output feature maps are larger, so more spatial detail is kept at a higher computational cost.
- Larger stride: the output feature map dimensions shrink.
- Purpose: To control the overlap of receptive fields, reduce the spatial dimensions of the output, and potentially speed up the computations.
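The effect of stride on output size follows the standard formula `floor((n + 2p - k) / s) + 1`, for input size `n`, kernel size `k`, padding `p` on each side and stride `s`. A minimal sketch in plain Python, assuming square inputs and kernels:

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - k) // stride + 1

# 48x48 input, 3x3 kernel, no padding: a larger stride gives a smaller output.
for stride in (1, 2, 3):
    size = conv_output_size(48, 3, stride)
    print(f"stride={stride}: {size}x{size}")
```

With no padding, stride 1 gives 46x46, stride 2 gives 23x23 and stride 3 gives 16x16.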

*When to use stride:*
- In the early layers of a CNN, it's common to use strides of 1 to preserve as much spatial information as possible. These layers typically extract low-level features like edges and textures.
- Strides are often used in the deeper layers of the network, especially after several convolutional operations with strides of 1. By this stage, you might want to reduce the spatial dimensions of your feature maps to make the network more computationally efficient and to increase the receptive field of the neurons. A common choice for these layers is to use strides of 2.

** Use of Strides: 

1. Reducing spatial dimensions: When the input image is large, strides can reduce the spatial dimensions more quickly than pooling layers alone.
2. Efficient Computation: To decrease computational cost, especially in deeper networks. Larger strides produce smaller output feature maps, leading to fewer computations in subsequent layers.
3. Avoiding Pooling Layers: Some architectures, such as ResNet, use strided convolutions instead of pooling layers for most of their downsampling, so the network learns how to downsample rather than applying a fixed pooling operation.

*NOTE* : 
- While strides help in reducing dimensions, excessive downsampling can lead to loss of important spatial information.
- The choice of using strides depends on the overall design and objective of the CNN architecture.


For small images, such as the 48x48-pixel images in this project, strides can be beneficial but should be used carefully:
- Downsampling: With small images, large strides (e.g., a stride of 2) in early layers reduce the spatial dimensions quickly, which can lose important information. Use strides cautiously, or rely on pooling layers for downsampling.
- Pooling Layers: Alternatively, use pooling layers (e.g., max pooling) to reduce the spatial dimensions while retaining the strongest activations. Pooling also reduces the computational load and can help control overfitting.
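A minimal NumPy sketch of 2x2 max pooling with stride 2 (the Keras `MaxPooling2D` default, where the stride equals the pool size), assuming a single-channel feature map. Rows or columns that do not fill a complete 2x2 window are dropped, matching 'valid' pooling; `max_pool_2x2` is an illustrative name.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling, stride 2, on a 2D feature map (incomplete edge windows dropped)."""
    h2, w2 = fmap.shape[0] // 2, fmap.shape[1] // 2
    trimmed = fmap[:h2 * 2, :w2 * 2]
    # Split into non-overlapping 2x2 blocks, then take the max inside each block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [3, 2, 4, 6]])
print(max_pool_2x2(fmap))
```

Each output value is the largest activation in its 2x2 window, so the map shrinks by half in each dimension while the strongest responses survive.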

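### Padding : Controls the spatial dimensions of the feature maps after convolution.
1. Valid padding (no padding): the kernel is applied only where it fits entirely inside the input, so the output shrinks (a 3x3 kernel with stride 1 turns 48x48 into 46x46).
2. Same padding (zero padding): the input is padded with zeros so that, with stride 1, the output has the same height and width as the input (48x48 stays 48x48).
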
NOTE:
- For small images (48x48 pixels), same padding helps preserve spatial dimensions, particularly in deeper layers, where shrinking the spatial size too quickly can lose important information.
- In more complex architectures, like those with multiple convolutional layers, same padding helps maintain a consistent feature map size, which can be advantageous for architectural consistency and ease of debugging.
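In Keras, the two padding modes give the following output sizes (dilation 1): `ceil((n - k + 1) / s)` for `'valid'` and `ceil(n / s)` for `'same'`. The sketch below encodes those rules and shows the ring of zeros that 'same' padding adds for a 3x3 kernel at stride 1. It is a hand calculation for illustration, not a call into Keras; `keras_output_size` is a hypothetical helper.

```python
import math
import numpy as np

def keras_output_size(n, k, stride=1, padding="valid"):
    """Output height/width of a Keras-style convolution (dilation 1)."""
    if padding == "valid":
        return math.ceil((n - k + 1) / stride)
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")

for padding in ("valid", "same"):
    for stride in (1, 2):
        print(padding, stride, keras_output_size(48, 3, stride, padding))

# 'same' padding with a 3x3 kernel at stride 1 adds one ring of zeros:
# a 48x48 input becomes 50x50, and a 3x3 window fits at 50 - 3 + 1 = 48 positions.
x = np.ones((48, 48))
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)
```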

| Layer Type | Typical Range or Value | Notes |
|---|---|---|
| Convolutional layers | 3-10 layers | Start with 2-3 layers for simpler models; increase as needed |
| Filters | 32, 64, 128, 256, 512 | Increase with the depth of the network |
| Kernel size | 3x3 (most common) | 5x5 can be used in initial layers for broader feature extraction |
| Pooling layers | 2x2 MaxPooling | After every few convolutional layers |
| Strides | 1 (default), 2 for downsampling | Use larger strides to reduce spatial dimensions |

In [None]:
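# 1st convolution block:
# 64 filters of size 3x3; 'same' padding with stride 1 keeps the 48x48 spatial size
model.add(Conv2D(64, (3, 3), padding='same', strides=1, input_shape=(48, 48, 1)))

model.add(BatchNormalization())            # normalize activations per channel
model.add(Activation('relu'))              # non-linearity applied after batch norm
model.add(MaxPooling2D(pool_size=(2, 2)))  # downsample 48x48 -> 24x24
model.add(Dropout(0.25))                   # randomly drop 25% of activations during training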
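To check this block's output against `model.summary()`, the shapes and parameter counts can be worked out by hand. The sketch below assumes Keras defaults: `Conv2D` uses a bias, and `BatchNormalization` keeps four vectors per channel (gamma, beta, moving mean, moving variance), the last two non-trainable. `first_block_trace` is a hypothetical helper, not part of Keras.

```python
def first_block_trace(input_shape=(48, 48, 1), filters=64, kernel=(3, 3), pool=(2, 2)):
    """Hand-computed (layer, output shape, parameter count) for the block above."""
    h, w, c = input_shape
    rows = []
    # Conv2D, 'same' padding, stride 1: spatial size unchanged; weights + one bias per filter
    rows.append(("Conv2D", (h, w, filters), kernel[0] * kernel[1] * c * filters + filters))
    # BatchNormalization: gamma, beta, moving mean, moving variance per channel
    rows.append(("BatchNormalization", (h, w, filters), 4 * filters))
    # Activation has no weights and keeps the shape
    rows.append(("Activation", (h, w, filters), 0))
    # MaxPooling2D 2x2 (stride 2): halves height and width
    h, w = h // pool[0], w // pool[1]
    rows.append(("MaxPooling2D", (h, w, filters), 0))
    # Dropout has no weights and keeps the shape
    rows.append(("Dropout", (h, w, filters), 0))
    return rows

rows = first_block_trace()
for layer, shape, params in rows:
    print(f"{layer:<20} {str(shape):<15} {params}")
print("total params:", sum(p for _, _, p in rows))
```

Under these assumptions the block turns a 48x48x1 image into a 24x24x64 tensor using 896 parameters, 768 of them trainable.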


*Reference:*
- https://www.upgrad.com/blog/basic-cnn-architecture/
- https://www.simplilearn.com/tutorials/deep-learning-tutorial/convolutional-neural-network#layers_in_a_convolutional_neural_network
- https://learnopencv.com/understanding-convolutional-neural-networks-cnn/