## **Welcome!**
Welcome to this beginner-friendly notebook on Convolutional Neural Networks (CNNs). If you've ever wondered how computers can "see" and understand images, you're in the right place! In this notebook, we'll embark on a journey to demystify the world of CNNs and learn how to build a simple image classification model step by step.

<div style="text-align:center;">
    <img src="https://www.kaggle.com/static/images/site-logo.png" alt="Kaggle Logo">
</div>

## Objective
Our goal in this notebook is to introduce you to the fundamental concepts of CNNs and guide you through the process of building your first image classification model. Don't worry if you're new to deep learning – we'll take it one step at a time, explaining each concept along the way.

## Prerequisites
To make the most out of this tutorial, you should have basic knowledge of Python and a curiosity to explore the world of machine learning. No prior experience with deep learning is required – we'll cover the essentials together.
<br>


In [1]:
from sklearn.datasets import load_sample_images
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
import numpy as np




# **What is a neural network?**
A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.

Every neural network consists of layers of nodes, or artificial neurons—an input layer, one or more hidden layers, and an output layer. Each node connects to others, and has its own associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.

Neural networks are sometimes called artificial neural networks (ANNs) or simulated neural networks (SNNs). They are a subset of machine learning, and at the heart of deep learning models.

<div style="text-align:center;">
    <img src="https://lh4.googleusercontent.com/bHUcWaeVAzZA-sQLGNbO8lX__7eEvRJX7XB3RM8MQbT-bd9KSBDNzzjD83y3lnn0UbyYp_QzFXb7OETKgJMRc8X6ZRu6UCgI9VQzbeWr9l0ptv0OcY-n62URBsLfbu91YqnbXfnA" alt="neural network">
</div>

# How do neural networks work?
Think of each individual node as its own linear regression model, composed of input data, weights, a bias (or threshold), and an output. The formula would look something like this:

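$$\sum_{i} w_i x_i + b = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$$

$$\text{output} = f(x) = \begin{cases} 1 & \text{if } \sum_{i} w_i x_i + b \ge 0 \\ 0 & \text{if } \sum_{i} w_i x_i + b < 0 \end{cases}$$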

<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:1400/1*SCz0aTETjTYC864Bqjt6Og.png" alt="neural network">
</div>

Once an input layer is determined, weights are assigned. These weights help determine the importance of any given variable, with larger ones contributing more significantly to the output than other inputs. All inputs are multiplied by their respective weights and then summed. The sum is then passed through an activation function, which determines the node's output. If that output exceeds a given threshold, it “fires” (or activates) the node, passing data to the next layer in the network. As a result, the output of one node becomes the input of the next node. Because data is passed from one layer to the next in this way, this neural network is called a feedforward network.
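The formula above can be sketched as a single node in NumPy. This is a minimal illustration, assuming a step activation with threshold 0; the function name `neuron_output` and all input, weight, and bias values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def neuron_output(x, w, b):
    """One artificial neuron: weighted sum plus bias, then a step (threshold) activation."""
    z = np.dot(w, x) + b        # sum_i w_i * x_i + b
    return 1 if z >= 0 else 0   # the node "fires" (outputs 1) only when z >= 0

# Illustrative values, not taken from any dataset
x = np.array([1.0, 0.0, 1.0])   # three inputs
w = np.array([0.5, -0.4, 0.3])  # one weight per input
print(neuron_output(x, w, b=-0.6))  # z = 0.8 - 0.6 = 0.2  -> 1
print(neuron_output(x, w, b=-0.9))  # z = 0.8 - 0.9 = -0.1 -> 0
```

Changing only the bias flips the decision, which is why the bias is often described as a learnable threshold.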

Ultimately, the goal is to minimize the cost function to ensure a good fit for any given observation. As the model adjusts its weights and bias, it uses the cost function and gradient descent to reach the point of convergence, or a local minimum. Gradient descent is the process by which the algorithm adjusts its weights: it lets the model determine the direction to take to reduce errors (or minimize the cost function). With each training example, the parameters of the model adjust to gradually converge at the minimum.

Most deep neural networks are feedforward, meaning data flows in one direction only, from input to output. Training them, however, relies on backpropagation, which moves in the opposite direction, from output to input. Backpropagation allows us to calculate and attribute the error associated with each neuron, so that we can adjust and fit the parameters of the model(s) appropriately.
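To make gradient descent concrete, here is a minimal sketch that trains a single linear neuron (no activation function) with a mean squared error cost. The data, the "true" rule `y = 2x + 0.5`, the learning rate, and the number of steps are all assumptions for illustration; the gradients are derived by hand with the chain rule, which is what backpropagation automates for larger networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)   # 100 one-dimensional inputs
y = 2.0 * X + 0.5                  # targets from an assumed "true" rule: w = 2, b = 0.5

w, b = 0.0, 0.0   # start from arbitrary parameters
lr = 0.1          # learning rate (step size)

for step in range(500):
    y_pred = w * X + b              # forward pass
    error = y_pred - y
    loss = np.mean(error ** 2)      # mean squared error: the cost function
    # Gradients of the cost with respect to w and b (chain rule)
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Move the parameters a small step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

The parameters end up very close to the rule that generated the data, because each step moves them in the direction that lowers the cost.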



# **Convolutional Neural Networks (CNN)**
CNNs, or convnets for short, are a special case of feedforward neural networks. They are very similar to ordinary neural networks in the sense that they are made up of neurons with learnable weights and biases. The essential difference is that the CNN architecture makes the implicit assumption that the inputs are image-like, which allows us to encode certain properties into the architecture.


<div style="text-align:center;">
    <img src="https://www.tomasbeuzen.com/deep-learning-with-pytorch/_images/cnn-6.png" alt="neural network">
</div>

# Why CNN?
Full connectivity is a problem for image inputs:<br>
• Scalability: 200x200x3 images imply 120,000 weights per neuron in the first hidden layer<br>
• Overfitting: Too many parameters would lead to overfitting

**Convolutional Neural Networks** are specialized for the case where inputs are images (more generally, data with a grid-like topology):<br>
• Sparse connections, parameter sharing<br>
• Efficient to train<br>
• Less prone to overfitting<br>
• Generalize across spatial translations of the input<br>
• Slide “filters” that learn distinct patterns (edges, blobs of color, etc.)
<br><br>
**Key idea**<br>
• Replace matrix multiplication in neural networks with convolution<br>
• Everything else remains the same<br>
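The scalability argument can be checked with simple arithmetic. The sketch below assumes the usual counting rule (one weight per input connection plus one bias per neuron or filter); the helper names `dense_params` and `conv2d_params` are hypothetical, introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    # Every unit has one weight per input, plus one bias
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same weights are reused at every image position (parameter sharing)
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

n_inputs = 200 * 200 * 3                 # a flattened 200x200 RGB image
print(dense_params(n_inputs, 1))         # 120001
print(conv2d_params(3, 3, 3, 1))         # 28
print(dense_params(n_inputs, 32))        # 3840032
print(conv2d_params(3, 3, 3, 32))        # 896
```

The convolutional layer's parameter count does not depend on the image size at all, only on the kernel size and the number of channels and filters.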


# How Does a CNN Work?
Before we look at how Convolutional Neural Networks (CNNs) work, let’s cover the basics: what an image is and how it is represented. An RGB image is simply a matrix of pixel values with three planes (one each for red, green, and blue), whereas a grayscale image has a single plane.
<br>
<br>
<div style="text-align:center;">
    <img src="https://editor.analyticsvidhya.com/uploads/306461_15yDvGKV47a0nkf5qLKOOQ.png" alt="neural network">
</div>
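In NumPy this representation is just an array whose last axis holds the color planes. The tiny 2x3 image below is made up for illustration, and averaging the three planes is only one simple way to get a grayscale version (other conversions weight the channels differently).

```python
import numpy as np

# A tiny 2x3 RGB image: height x width x channels, values in 0..255
rgb = np.array([
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[255, 255, 255], [0, 0, 0], [128, 128, 128]],
], dtype=np.uint8)
print(rgb.shape)      # (2, 3, 3): 2 rows, 3 columns, 3 color planes
print(rgb[:, :, 0])   # the red plane on its own

# A simple grayscale version: average the three planes
gray = rgb.mean(axis=2)
print(gray.shape)     # (2, 3): a single plane
```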
<br>
For simplicity, let’s stick with grayscale images as we try to understand how CNNs work.
<br>
<div style="text-align:center;">
    <img src="https://editor.analyticsvidhya.com/uploads/750710_QS1ArBEUJjjySXhE.png" alt="neural network">
</div>
<br>
We take a filter/kernel (a 3×3 matrix) and slide it over the input image to get the convolved feature. This convolved feature is passed on to the next layer.
<br>
<div style="text-align:center;">
    <img src="https://editor.analyticsvidhya.com/uploads/419681_GcI7G-JLAQiEoCON7xFbhg.gif" alt="neural network">
</div>
<br>

# Filters on multiple channels
Images are generally RGB!<br>
How would a filter work on an image with RGB channels?<br>
The filter should also have 3 channels.<br>
The output then has one channel for every filter we have used.<br>
<br>


In [2]:
images = load_sample_images()["images"]  # two sample RGB photos shipped with scikit-learn

# Center crop the images to 70x120 pixels (adjust height and width as needed)
images = tf.keras.layers.CenterCrop(height=70, width=120)(images)

# Rescale the pixel values to [0, 1]
images = tf.keras.layers.Rescaling(scale=1/255)(images)

# Dummy labels for illustration, assuming binary classification
labels = [0, 1] * (len(images) // 2)
# Convert the labels to a NumPy array
labels = np.array(labels)

print(images.shape)

(2, 70, 120, 3)


# ConvNet architecture
• A ConvNet is made up of Layers<br>
• Every Layer transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters<br>
• Neurons in a layer will only be connected to a small region of the layer before it


![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*vkQ0hXDaQv57sALXAJquxA.jpeg)

Explaining the full architecture of a Convolutional Neural Network (ConvNet) can be quite extensive as there are various architectures with different depths and complexities. However, I'll provide a general overview of a typical ConvNet architecture, which may vary depending on the specific task and dataset.<br>

Let's break down a simplified version of a ConvNet architecture:<br>

1. Input Layer:<br>
This is where the raw input data, usually an image, is fed into the network.<br>
The input layer has dimensions corresponding to the size of the input image (e.g., width, height, and number of color channels).<br>

2. Convolutional Layers:<br>
Convolutional layers are the key building blocks of a ConvNet.<br>
Each convolutional layer applies a set of learnable filters (kernels) to the input data.<br>
These filters detect various features such as edges, textures, or patterns within the input image.<br>
The output of each convolutional layer is a set of feature maps, one per filter, where each value represents the activation of that filter's feature at a particular location.<br>

3. Activation Function:<br>
Typically, an activation function such as ReLU (Rectified Linear Unit) follows each convolutional operation.<br>
ReLU introduces non-linearity into the network, allowing it to learn complex patterns effectively.<br>

4. Pooling Layers:<br>
Pooling layers are used to reduce the spatial dimensions (width and height) of the feature maps while retaining important information.<br>
Common pooling operations include max pooling or average pooling.<br>
Pooling makes the representation smaller and more manageable, which reduces the number of parameters and the computation in the layers that follow (the pooling layer itself has no learnable parameters).<br>

5. Fully Connected Layers:<br>
After several convolutional and pooling layers, the feature maps are flattened into a vector and fed into one or more fully connected layers.<br>
Fully connected layers perform high-level reasoning on the extracted features.<br>
These layers map the features to the output classes in classification tasks.<br>
The final fully connected layer usually employs a softmax activation function for multi-class classification, producing class probabilities.<br>

6. Output Layer:<br>
The output layer provides the final predictions of the network.<br>
For classification tasks, the output layer typically has as many neurons as there are classes, with softmax activation to produce class probabilities.<br>
For tasks like object detection or semantic segmentation, the output layer might have a different structure tailored to the specific requirements of the task.<br>

7. Optional Layers:<br>
Depending on the architecture and requirements, additional layers such as dropout layers for regularization, batch normalization layers for improved training stability, or skip connections for facilitating the training of deeper networks may be included.<br>

![](https://analyticsindiamag.com/wp-content/uploads/2018/01/nural-network-05.jpg)



# **Convolution layers**

This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution, which is the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called kernel or filter, and both can be used interchangeably.

In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.

Put simply, in the convolution layer, we use small grids (called filters or kernels) that move over the image. Each small grid is like a mini magnifying glass that looks for specific patterns in the photo, like lines, curves, or shapes. As it moves across the photo, it creates a new grid that highlights where it found these patterns.

For example, one filter might be good at finding straight lines, another might find curves, and so on. By using several different filters, the CNN can get a good idea of all the different patterns that make up the image.

Let’s consider this 32x32 grayscale image of a handwritten digit. The values in the matrix are given for illustration purposes.

<br>
<div style="text-align:center;">
    <img src="https://images.datacamp.com/image/upload/v1700043954/image5_b9b4c3cb25.png" alt="neural network">
</div>
<br>

Also, let’s consider the kernel used for the convolution. It is a 3x3 matrix. The weight of each element of the kernel is shown in the grid: zero weights are shown as black cells and ones as white cells.

**Do we have to manually find these weights?**

In real life, the weights of the kernels are determined during the training process of the neural network.

Using these two matrices, we can perform the convolution operation by taking dot products, as follows:

1. Place the kernel over the top-left corner of the image.
2. Perform element-wise multiplication between the kernel and the image patch it covers.
3. Sum the values of the products.
4. The resulting value becomes the corresponding entry of the convolved matrix (the first one is its top-left corner).
5. Slide the kernel one step to the right; when it reaches the end of a row, move it back to the left edge and one step down.

Repeat steps 2 to 5 until the image matrix is fully covered.

NOTE: The dimensions of the convolved matrix depend on the size of the sliding window (the kernel): the larger the kernel, the smaller the output.
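The steps above can be sketched as a naive loop in NumPy. This is a teaching sketch, not how TensorFlow implements convolution; the function name `conv2d_valid`, the `stride` parameter, and the 5x5 image and 3x3 kernel values are assumptions chosen for illustration. Like deep-learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Naive 2-D convolution with no padding ("valid"), as used in CNNs."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):                      # move down, one step at a time
        for j in range(out_w):                  # move right, one step at a time
            r, c = i * stride, j * stride       # top-left corner of the current window
            patch = image[r:r + kh, c:c + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# Illustrative 5x5 binary image and 3x3 kernel
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

print(conv2d_valid(image, kernel))
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
print(conv2d_valid(image, np.ones((5, 5))).shape)  # (1, 1): a larger kernel gives a smaller output
print(conv2d_valid(image, kernel, stride=2))       # 2x2 output with values [[4, 4], [2, 4]]
```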

<br>
<div style="text-align:center;">
    <img src="https://images.datacamp.com/image/upload/v1700043998/image9_fbc98b6c6e.png" alt="neural network">
</div>
<br>


# The Kernel

We also have a feature detector, also known as a kernel or a filter, which will move across the receptive fields of the image, checking if the feature is present. This process is known as a convolution.

The feature detector is a two-dimensional (2-D) array of weights, which represents part of the image. While they can vary in size, the filter size is typically a 3x3 matrix; this also determines the size of the receptive field. The filter is then applied to an area of the image, and a dot product is calculated between the input pixels and the filter. This dot product is then fed into an output array. Afterwards, the filter shifts by a stride, repeating the process until the kernel has swept across the entire image. The final output from the series of dot products from the input and the filter is known as a feature map, activation map, or a convolved feature.

<br>
<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:786/format:webp/1*GcI7G-JLAQiEoCON7xFbhg.gif" alt="neural network">
</div>
<br>

Image Dimensions = 5 (Height) x 5 (Width) x 1 (Number of channels; an RGB image would have 3)

In the above demonstration, the green section represents our 5x5x1 input image, I. The element involved in the convolution operation in the first part of a Convolutional Layer is called the Kernel/Filter, K, shown in yellow. We have selected K as a 3x3x1 matrix.

<br>
<div style="text-align:center;">
    <img src="https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTn9ZEVAqEutz1hbm01ZkZJVEr5g6mDDaIqeo-tIR8zfDJrYHts" alt="neural network">
</div>
<br>

The Kernel visits 9 positions because Stride Length = 1 (Non-Strided), each time performing an element-wise multiplication (Hadamard Product) between K and the portion P of the image it is hovering over, and then summing the products.

<br>
<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:640/format:webp/1*NsiYxt8tPDQyjyH3C08PVA@2x.png" alt="neural network">
</div>
<br>

The filter moves to the right with a certain Stride Value till it parses the complete width. Moving on, it hops down to the beginning (left) of the image with the same Stride Value and repeats the process until the entire image is traversed.

<br>
<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*ciDgQEjViWLnCbmX-EeSrA.gif" alt="neural network">
</div>
<br>


In the case of images with multiple channels (e.g. RGB), the Kernel has the same depth as the input image. Element-wise multiplication is performed between each kernel channel Kn and the matching input channel In ([K1, I1]; [K2, I2]; [K3, I3]), and all the results are summed together with the bias to give a squashed, one-depth-channel Convolved Feature Output.
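The multi-channel case can be sketched the same way. In this minimal NumPy illustration, the function name `conv2d_multichannel`, the random input, and the random filters are assumptions; it shows that one filter of depth 3 collapses the channels into a single map, that this equals summing per-channel results plus the bias, and that stacking several filters gives one output channel per filter.

```python
import numpy as np

def conv2d_multichannel(image, kernel, bias=0.0):
    """image: (H, W, C); kernel: (kh, kw, C). Returns one (H-kh+1, W-kw+1) feature map."""
    kh, kw, c = kernel.shape
    if image.shape[2] != c:
        raise ValueError("kernel depth must equal the number of input channels")
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]        # kh x kw x C block of the input
            out[i, j] = np.sum(patch * kernel) + bias   # multiply, sum over all channels, add bias
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))    # illustrative 5x5 input with 3 channels (e.g. RGB)
kernel = rng.random((3, 3, 3))   # one 3x3 filter with matching depth 3
fmap = conv2d_multichannel(image, kernel, bias=1.0)
print(fmap.shape)                # (3, 3): the depth is squashed to one channel

# Same result: convolve each channel with its own kernel slice, add the maps, then add the bias
per_channel_sum = sum(
    conv2d_multichannel(image[:, :, k:k + 1], kernel[:, :, k:k + 1]) for k in range(3)
) + 1.0
print(np.allclose(fmap, per_channel_sum))   # True

# Several filters: each produces one map, and stacking them gives one output channel per filter
filters = rng.random((4, 3, 3, 3))          # 4 illustrative filters
stacked = np.stack([conv2d_multichannel(image, f) for f in filters], axis=-1)
print(stacked.shape)                        # (3, 3, 4)
```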



Note that the weights in the feature detector remain fixed as it moves across the image, which is also known as parameter sharing. Some parameters, like the weight values, adjust during training through the process of backpropagation and gradient descent. However, there are three hyperparameters which affect the volume size of the output that need to be set before the training of the neural network begins. These include:

1. The number of filters affects the depth of the output. For example, three distinct filters would yield three different feature maps, creating a depth of three. 

2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.
*  Step size with which we slide the filters
*  When the stride is 1, we move the filters one pixel at a time. When the stride is 2 (or, uncommonly, 3 or more), the filters jump 2 (or more) pixels at a time as we slide them around

3. Zero-padding is usually used when the filters do not fit the input image. This sets all elements that fall outside of the input matrix to zero, producing a larger or equally sized output. There are three types of padding:

* Valid padding: This is also known as no padding. In this case, the last convolution is dropped if dimensions do not align.
* Same padding: This padding ensures that the output layer has the same size as the input layer.
* Full padding: This type of padding increases the size of the output by adding zeros to the border of the input.
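All three hyperparameters meet in one formula for the output size along each spatial dimension, $\lfloor (n + 2p - k)/s \rfloor + 1$, where $n$ is the input size, $k$ the kernel size, $s$ the stride, and $p$ the number of zeros added on each side. The sketch below assumes the same padding on every side; the helper name `conv_output_size` is hypothetical. With stride 1 and an odd kernel, same padding corresponds to $p = (k-1)/2$ and full padding to $p = k - 1$.

```python
def conv_output_size(n, k, s=1, p=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1.

    n: input size, k: kernel size, s: stride, p: zeros added on EACH side.
    """
    return (n + 2 * p - k) // s + 1

n, k = 5, 3
print(conv_output_size(n, k))                   # 3  valid padding (p = 0)
print(conv_output_size(n, k, p=(k - 1) // 2))   # 5  same padding (stride 1, odd kernel)
print(conv_output_size(n, k, p=k - 1))          # 7  full padding
print(conv_output_size(n, k, s=2))              # 2  a larger stride gives a smaller output

# The 70x120 images with a 7x7 kernel and no padding
print(conv_output_size(70, 7), conv_output_size(120, 7))   # 64 114
```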

<br>
<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:640/format:webp/1*nYf_cUIHFEWU1JXGwnz-Ig.gif" alt="neural network">
</div>

<div align="center">
  SAME padding: 5x5x1 image is padded with 0s to create a 7x7x1 image
</div>
<br>

When we pad the 5x5x1 image into a 7x7x1 image (one row or column of zeros on each side) and then apply the 3x3x1 kernel over it, the convolved matrix turns out to have dimensions 5x5x1, the same as the input. Hence the name: Same Padding.

On the other hand, if we perform the same operation without padding, we get a 3x3x1 output (5 - 3 + 1 = 3), which here happens to equal the size of the Kernel itself. This is Valid Padding.


In [3]:
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=7)
# The default is padding="valid", which means no zero padding at all
fmaps = conv_layer(images)
print(fmaps.shape)

(2, 64, 114, 32)


In [4]:
# If we set padding="same" instead, the inputs are padded with enough zeros on all sides to
# ensure the output feature maps end up with the same spatial size as the input
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=7, padding="same")
fmaps = conv_layer(images)
print(fmaps.shape)


(2, 70, 120, 32)


# **Pooling layer**

Similar to the Convolutional Layer, the Pooling layer reduces the spatial size of the Convolved Feature. This decreases the computational power required to process the data through dimensionality reduction. Furthermore, it is useful for extracting dominant features that are robust to small shifts in position, which helps the model train effectively.

<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:640/format:webp/1*uoWYsCV5vBU8SHFPAPao-w.gif" alt="neural network">
</div>

The goal of the pooling layer is to pull the most significant features from the convolved matrix. This is done by applying an aggregation operation, which reduces the dimensions of the feature map (convolved matrix), hence reducing the memory used while training the network. Pooling is also relevant for mitigating overfitting.

The most common aggregation functions that can be applied are:

* Max pooling takes the maximum value in each pooling window of the feature map
* Sum pooling takes the sum of all the values in each window
* Average pooling takes the average of all the values in each window

Max Pooling also acts as a Noise Suppressant: it discards weaker, noisy activations altogether, performing de-noising along with dimensionality reduction. Average Pooling, by contrast, only performs dimensionality reduction, smoothing noise by averaging. For this reason Max Pooling often performs better than Average Pooling in practice, although the best choice depends on the task.

<div style="text-align:center;">
    <img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*KQIEqhxzICU7thjaQBfPBQ.png" alt="neural network">
</div>
<br>
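The three aggregation functions can be sketched in NumPy for non-overlapping 2x2 windows (window size equal to stride, which is also the default behavior of Keras `MaxPool2D(pool_size=2)`). The helper name `pool2d` and the 4x4 feature-map values are assumptions for illustration.

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling (window = stride = `size`) on a 2-D feature map.

    Rows/columns that do not fill a complete window are dropped ("valid" pooling).
    """
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    # Reshape so that each size x size window gets its own pair of axes (1 and 3)
    windows = fmap[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    if mode == "sum":
        return windows.sum(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

fmap = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 8, 0],
    [1, 0, 3, 4],
])
print(pool2d(fmap, mode="max"))   # values [[6, 4], [7, 8]]
print(pool2d(fmap, mode="avg"))   # values [[3.75, 2.25], [2.5, 3.75]]
print(pool2d(fmap, mode="sum"))   # values [[15, 9], [10, 15]]
```

Each 2x2 window is reduced to one number, so the 4x4 map shrinks to 2x2 and pooling adds no learnable parameters.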

The Convolutional Layer and the Pooling Layer together form the i-th layer of a Convolutional Neural Network. Depending on the complexity of the images, the number of such layers may be increased to capture low-level details even further, but at the cost of more computational power.

After going through the above process, we have successfully enabled the model to understand the features. Moving on, we are going to flatten the final output and feed it to a regular Neural Network for classification purposes.

Also, the dimension of the feature map becomes smaller as the pooling function is applied.

The output of the last pooling layer is flattened into a one-dimensional vector so that it can be processed by the fully connected layer.

In [5]:
# Implementing a max pooling layer with Keras
max_pool = tf.keras.layers.MaxPool2D(pool_size=2)
print(max_pool(images))

tf.Tensor(
[[[[0.4901961  0.54901963 0.5686275 ]
   [0.3647059  0.47450984 0.4784314 ]
   [0.24705884 0.39607847 0.3529412 ]
   ...
   [0.90196085 0.8980393  0.9176471 ]
   [0.9058824  0.90196085 0.9215687 ]
   [0.9058824  0.9058824  0.9215687 ]]

  [[0.48627454 0.4039216  0.46274513]
   [0.26666668 0.27058825 0.28627452]
   [0.43137258 0.5137255  0.49411768]
   ...
   [0.90196085 0.90196085 0.909804  ]
   [0.90196085 0.90196085 0.909804  ]
   [0.9058824  0.9058824  0.91372555]]

  [[0.32156864 0.30588236 0.23137257]
   [0.3254902  0.29803923 0.19607845]
   [0.36078432 0.30980393 0.27450982]
   ...
   [0.90196085 0.90196085 0.909804  ]
   [0.90196085 0.90196085 0.909804  ]
   [0.9058824  0.9058824  0.91372555]]

  ...

  [[0.5137255  0.25490198 0.19215688]
   [0.4666667  0.33333334 0.227451  ]
   [0.53333336 0.34117648 0.18823531]
   ...
   [0.7686275  0.81568635 0.8196079 ]
   [0.77647066 0.8196079  0.82745105]
   [0.7686275  0.8117648  0.8196079 ]]

  ...

In [6]:
global_avg_pool = tf.keras.layers.GlobalAveragePooling2D()
output = global_avg_pool(images)
print(output)

tf.Tensor(
[[0.64338624 0.5971759  0.5824972 ]
 [0.76306933 0.2601113  0.10849128]], shape=(2, 3), dtype=float32)


# **Fully connected layers**

These layers form the last part of the convolutional neural network, and their inputs correspond to the flattened one-dimensional vector generated by the last pooling layer. ReLU activation functions are applied to them for non-linearity.

The name of the fully-connected layer aptly describes itself. As mentioned earlier, in partially connected layers the pixel values of the input image are not directly connected to the output layer. In the fully-connected layer, however, each node in the output layer connects directly to every node in the previous layer.

Adding a Fully-Connected layer is a (usually) cheap way of learning non-linear combinations of the high-level features as represented by the output of the convolutional layer. The Fully-Connected layer is learning a possibly non-linear function in that space.

Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
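The flatten, dense, and softmax steps can be sketched in NumPy. The 5x5x64 pooled output, the layer sizes (64 hidden units, 10 classes), and the random weights are assumptions for illustration; the weights are untrained, so the predicted class is meaningless and only the shapes and the probability normalization matter here.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow and does not change the result
    return e / e.sum()

rng = np.random.default_rng(0)
pooled = rng.random((5, 5, 64))   # illustrative output of the last pooling layer
x = pooled.reshape(-1)            # flatten: 5 * 5 * 64 = 1600 values

# Hidden fully connected layer: every one of the 64 units has a weight for every input
W1, b1 = rng.normal(0, 0.01, size=(64, x.size)), np.zeros(64)
h = relu(W1 @ x + b1)

# Output layer: one unit per class, followed by softmax
num_classes = 10
W2, b2 = rng.normal(0, 0.1, size=(num_classes, 64)), np.zeros(num_classes)
probs = softmax(W2 @ h + b2)

print(x.shape, h.shape, probs.shape)   # (1600,) (64,) (10,)
print(probs.sum())                     # 1.0, up to floating-point rounding
print("predicted class:", int(np.argmax(probs)))
```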

# **Activation Function**

Activation functions play a crucial role in Convolutional Neural Networks (CNNs) by introducing non-linearity into the network, allowing it to learn complex patterns and relationships within the data. Here's an explanation of some commonly used activation functions in CNNs:

**1. ReLU (Rectified Linear Unit):**
* f(x)=max(0,x)
* ReLU is the most widely used activation function in CNNs due to its simplicity and effectiveness.
* It introduces non-linearity by outputting zero for negative input values and leaving positive values unchanged.
* Benefits include faster convergence during training and alleviation of the vanishing gradient problem.
* However, ReLU can suffer from the "dying ReLU" problem where neurons can become inactive (output zero) indefinitely during training if the weighted sum of inputs is consistently negative.

**2. Sigmoid Function:**
* Sigmoid, σ(x) = 1 / (1 + e^(-x)), squashes input values to the range (0, 1).
* Historically used in neural networks for binary classification tasks where the output needs to be interpreted as a probability.
* However, it saturates for large positive or negative inputs, which causes the vanishing gradient problem, so it is rarely used in the hidden layers of deep neural networks.

**3. Tanh (Hyperbolic Tangent):**
* Tanh squashes input values to the range (-1, 1).
* Similar to the sigmoid function but centered around zero, which allows it to output negative values.
* Like sigmoid, tanh also suffers from vanishing gradient, particularly for deep networks.

**4. Leaky ReLU:**
* Leaky ReLU, f(x) = max(αx, x), addresses the dying ReLU problem by allowing a small gradient when the input is negative (α is a small constant, typically around 0.01).
* This function helps to mitigate the issues with ReLU while maintaining its benefits of fast convergence.

**5. Softmax:**
* Softmax is commonly used in the output layer of CNNs for multi-class classification tasks.
* It normalizes the output into a probability distribution over multiple classes, ensuring that the sum of the output probabilities equals one.
* Softmax is useful when the network needs to make mutually exclusive predictions across multiple classes.

**6. ELU (Exponential Linear Unit):**
* ELU, f(x) = x for x > 0 and α(e^x - 1) for x ≤ 0, also tries to alleviate the dying ReLU problem, like Leaky ReLU, and can push mean unit activations closer to zero, which can speed up learning.

These activation functions contribute to the non-linear mapping of inputs to outputs in CNNs, enabling them to learn complex representations of data and perform effectively in various tasks such as image classification, object detection, and segmentation. Choosing the appropriate activation function depends on the specific characteristics of the problem at hand and empirical performance on the dataset.

<div style="text-align:center;">
    <img src="https://media.licdn.com/dms/image/C4E12AQFqIFCj71YJPw/article-cover_image-shrink_600_2000/0/1620764635917?e=2147483647&v=beta&t=E6iCHNUyncJu9QUjCX4EmVLgQanUw_WS6KnWrlV4Roc" alt="neural network">
</div>
<br>
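For reference, the activation functions above can be written in a few lines of NumPy. This is a sketch for comparing their behavior on a few sample inputs; the default α values (0.01 for Leaky ReLU, 1.0 for ELU) are common choices assumed here, not fixed rules.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
for name, f in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh),
                ("Leaky ReLU", leaky_relu), ("ELU", elu), ("Softmax", softmax)]:
    print(f"{name:>10}: {np.round(f(x), 4)}")
```

Note how ReLU zeroes the negative input while Leaky ReLU and ELU keep a small negative output, and how softmax turns the three values into probabilities that sum to one.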


# **Overfitting and Regularization in CNNs**
Overfitting is a common challenge in machine learning models and CNN deep learning projects. It happens when the model learns the training data too well (“learning by heart”), including its noise and outliers. This leads to a model that performs well on the training data but badly on new, unseen data.

This can be observed when the performance on the training data is much better than the performance on validation or test data (equivalently, the training error is much lower than the validation error), as illustrated graphically below:

<div style="text-align:center;">
    <img src="https://images.datacamp.com/image/upload/v1700044100/image3_93b1b7c0d9.png" alt="neural network">
</div>
<br>
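One common regularization technique against overfitting is dropout, which the model below also uses. Here is a minimal NumPy sketch of "inverted" dropout, assuming the variant that rescales the kept units by 1 / (1 - rate) during training so the expected activation stays the same, which is the scaling the Keras `Dropout` layer documentation describes. The function name `dropout` and its arguments are hypothetical.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: during training, zero a fraction `rate` of units and
    rescale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0:
        return x                              # at inference time the layer does nothing
    keep = rng.random(x.shape) >= rate        # True with probability 1 - rate
    return x * keep / (1 - rate)

rng = np.random.default_rng(0)
a = np.ones(100_000)
out = dropout(a, rate=0.5, training=True, rng=rng)
print(np.unique(out))   # [0. 2.]: units are either dropped or doubled
print(out.mean())       # close to 1.0, the mean of the input
print(np.array_equal(dropout(a, 0.5, training=False, rng=rng), a))   # True
```

Because a different random subset of units is dropped at every step, the network cannot rely too heavily on any single unit, which tends to reduce overfitting.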


In [7]:
# Here is how we can implement a basic CNN
import tensorflow as tf
from tensorflow.keras import layers, models

# Define the input shape (e.g., image dimensions)
input_shape = (28, 28, 3)  # Example input shape for an RGB image with size 28x28

num_classes=10
# Initialize a Sequential model
model = models.Sequential()

# Add the first convolutional layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
# Explanation:
# - Conv2D: This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.
# - 32: Number of filters/kernels to use.
# - (3, 3): Size of the convolutional window.
# - 'relu': Activation function (Rectified Linear Unit).
# - input_shape: Shape of input data.

# Add a max pooling layer
model.add(layers.MaxPooling2D((2, 2)))
# Explanation:
# - MaxPooling2D: This layer downsamples the input along its spatial dimensions (height and width) using max pooling.
# - (2, 2): Pooling window size.

# Add another convolutional layer
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Explanation: Same as the first convolutional layer but with 64 filters.

# Add another max pooling layer
model.add(layers.MaxPooling2D((2, 2)))

# Flatten the output to feed into fully connected layers
model.add(layers.Flatten())
# Explanation: This layer flattens the input, transforming it into a 1D array.

# Add fully connected layers
model.add(layers.Dense(64, activation='relu'))
# Explanation:
# - Dense: Fully connected layer.
# - 64: Number of neurons in the layer.
# - 'relu': Activation function.

# Add a dropout layer for regularization
model.add(layers.Dropout(0.5))
# Explanation: Dropout is a regularization technique where a random fraction of input units are dropped out (set to zero) during training.

# Add the output layer
model.add(layers.Dense(num_classes, activation='softmax'))
# Explanation:
# - Dense: Fully connected output layer.
# - num_classes: Number of classes in the classification task.
# - 'softmax': Activation function for multi-class classification, producing class probabilities.

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Explanation:
# - Optimizer: Algorithm used for optimizing the network weights (e.g., Adam).
# - Loss: Loss function used for training the model (e.g., categorical_crossentropy for multi-class classification).
# - Metrics: Evaluation metrics to be monitored during training (e.g., accuracy).

# Print model summary
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_2 (Conv2D)           (None, 26, 26, 32)        896       
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 13, 13, 32)        0         
 g2D)                                                            
                                                                 
 conv2d_3 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 5, 5, 64)          0         
 g2D)                                                            
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dense (Dense)               (None, 64)                1