# Building The CNN And Learning Almost Everything About It.

#### 1. Why CNN and Not MLP For Images?
#### 2. How CNN Works? -> Layers: Convolutional, Pooling and Fully Connected Layers
#### 3. How Backprop Happens In CNN?

![img](../../images/cnn.png)

### 1. Why We Use The CNN And Not MLP For Images?

CNNs handle image datasets better than MLPs,
and there are two main reasons for this: ***1. Sparse Connectivity and 2. Parameter Sharing***


`Sparse Connectivity:` Each neuron in a convolutional layer of a CNN is connected only to a small, localized region of the input image, rather than being fully connected to the entire input. This localized region is known as the receptive field.

Local Receptive Fields: In CNNs, each convolutional kernel operates over a small patch of the input image. Neurons in the convolutional layer therefore look only at small regions of the input, so each one needs only as many weights as the kernel has entries, which reduces both the number of parameters and the amount of computation.

`Parameter Sharing:` The same convolutional filter is applied across the entire input image. Because the same weights are reused at every position, a CNN can detect features like edges, textures, and patterns regardless of where they appear in the image.

#### Why Not MLP?

`High Number of Parameters:` For an image of size 32x32 with 3 color channels, a single neuron in the first hidden layer of an MLP would have ***32 × 32 × 3 = 3072*** connections. Every additional neuron adds another 3072 weights, so with many neurons or larger images this quickly becomes impractical.

`Lack of Locality:` MLPs treat all input pixels equally without considering the spatial structure of the image. This lack of locality makes it harder for MLPs to capture local patterns and structures that are important for image recognition tasks.
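To make the difference in parameter counts concrete, here is a small back-of-the-envelope sketch. The layer sizes (100 hidden units, 100 filters of size 3x3) are assumptions chosen only for illustration; they do not come from any particular model.

```python
# Hypothetical layer sizes, chosen only for illustration.
height, width, channels = 32, 32, 3
hidden_units = 100             # assumed size of the first MLP hidden layer
num_filters, kernel = 100, 3   # assumed conv layer: 100 filters of size 3x3

# Dense layer: every hidden unit connects to every input value (plus one bias each).
inputs_per_neuron = height * width * channels
mlp_params = inputs_per_neuron * hidden_units + hidden_units

# Conv layer: each filter has kernel * kernel * channels shared weights (plus one bias each).
conv_params = kernel * kernel * channels * num_filters + num_filters

print(inputs_per_neuron)  # 3072
print(mlp_params)         # 307300
print(conv_params)        # 2800
```

Under these assumptions the convolutional layer needs roughly 100 times fewer parameters, and its count does not grow with the image size.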

### 2. How CNN Works? -> Layers: Convolutional, Pooling and Fully Connected Layers

The CNN model consists primarily of a stack of layers: convolutional and pooling layers, followed by fully connected layers (MLP). Depending on the specific problem, the output layer is chosen as either sigmoid or softmax.

#### How Images Go Through the CNN Model:

***Step 1: Convolutional Layer***

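The image first passes through the convolutional layer, which uses a kernel (filled with weights) that slides over the entire image. At each position, the kernel's weights are multiplied element-wise with the image patch underneath and the products are summed, giving one value of the output feature map. This process extracts features such as edges, textures, and patterns.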
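How large the resulting feature map is depends on the input size, the kernel size, and two settings not described above: zero padding `p` and stride `s`. A minimal sketch, assuming the standard formula floor((n + 2p - k) / s) + 1 along each axis (the helper name `conv_output_size` is made up for this example):

```python
def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis for input size n, kernel size k,
    zero padding p on each side, and stride s."""
    return (n + 2 * p - k) // s + 1

print(conv_output_size(4, 3, p=1, s=1))   # 4  (padding of 1 keeps a 4x4 input at 4x4)
print(conv_output_size(32, 3))            # 30 (no padding shrinks the map)
print(conv_output_size(32, 3, p=1, s=2))  # 16 (stride 2 roughly halves it)
```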

***Step 2: Pooling Layer***

After the convolutional layer, the output is passed to the pooling layer. Pooling reduces the spatial dimensions of the feature maps, which decreases the computational load and the number of parameters in the later layers of the network. It also makes the network more robust to small translations of the input image. There are two common types of pooling: MaxPooling, which selects the maximum value from each patch of the feature map, and AveragePooling, which computes the average value of each patch. This layer summarizes the presence of features in patches of the feature map, rather than their precise location.
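The two pooling types can be sketched in a few lines of NumPy. This is a simplified illustration, assuming non-overlapping windows (stride equal to the window size) on a single 2D feature map; the function name `pool2d` and its parameters are made up for this example.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling on a 2D array (stride equals window size)."""
    fm = np.asarray(feature_map, dtype=float)
    h, w = fm.shape[0] // size, fm.shape[1] // size
    # Crop rows/columns that do not fill a whole window, then expose each
    # window as axes 1 and 3 of a 4D view.
    windows = fm[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")

fm = [[1, 3, 2, 4],
      [5, 6, 1, 3],
      [1, 2, 0, 2],
      [3, 4, 3, 2]]
print(pool2d(fm, mode="max"))  # [[6. 4.] [4. 3.]]
print(pool2d(fm, mode="avg"))  # [[3.75 2.5 ] [2.5  1.75]]
```

A 4x4 map becomes 2x2 in both cases: each output value stands for one 2x2 patch of the input.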

***Step 3: Fully Connected Layer***

The output from the stack of convolutional and pooling layers is then flattened into a one-dimensional vector and passed to the fully connected layer (MLP). This layer performs high-level reasoning by combining all the features detected in previous layers. Each neuron in a fully connected layer is connected to every neuron in the previous layer, allowing the network to learn complex representations and relationships between features.
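As a rough sketch of the flatten-then-dense step, assume 8 pooled feature maps of size 4x4 and a hidden layer of 10 neurons with a ReLU activation. All of these sizes, the random weights, and the choice of ReLU are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 8 pooled feature maps of size 4x4, and 10 hidden neurons.
feature_maps = rng.standard_normal((8, 4, 4))
flat = feature_maps.reshape(-1)                   # one-dimensional vector, shape (128,)

W = rng.standard_normal((10, flat.size)) * 0.01   # one row of weights per neuron
b = np.zeros(10)

# Every neuron sees every flattened feature; ReLU is an assumed activation.
hidden = np.maximum(0.0, W @ flat + b)

print(flat.shape, hidden.shape)  # (128,) (10,)
```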

***Step 4: Output Layer***

Finally, the output from the fully connected layer is passed to the output layer. The type of output layer depends on the nature of the problem. For a binary classification problem, a sigmoid activation function is often used, which outputs a probability between 0 and 1. For multiclass classification problems, a softmax activation function is typically used, which outputs a probability distribution over all classes, allowing the network to predict the class with the highest probability.
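The two output activations can be written directly in NumPy. This is a minimal sketch; the example scores in `logits` are made up, and subtracting the maximum inside `softmax` is a common numerical-stability trick rather than part of the definition.

```python
import numpy as np

def sigmoid(z):
    """Squash a score into (0, 1) for binary classification."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of scores into a probability distribution over classes."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # shift by the max to avoid overflow
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
probs = softmax(logits)

print(sigmoid(0.0))            # 0.5
print(probs)                   # three probabilities that sum to 1
print(int(np.argmax(probs)))   # 0, the class with the highest score
```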

This layered approach allows CNNs to effectively process and classify images by learning and focusing on important features.

A naive NumPy implementation of the convolution step, checked against `scipy.signal.convolve2d`:

import numpy as np
import scipy.signal
def conv2d(X, W, p=(0, 0), s=(1, 1)):
    """Naive 2D convolution of X with kernel W, zero padding p and strides s."""
    # Rotate the kernel by 180 degrees: true convolution flips the kernel,
    # unlike cross-correlation.
    Wr = np.array(W)[::-1, ::-1]
    X_orig = np.array(X)

    # Zero-pad the input with p[0] rows and p[1] columns on each side.
    n1 = X_orig.shape[0] + 2 * p[0]
    n2 = X_orig.shape[1] + 2 * p[1]
    X_padded = np.zeros(shape=(n1, n2))
    X_padded[p[0]:p[0] + X_orig.shape[0], p[1]:p[1] + X_orig.shape[1]] = X_orig

    res = []
    # Slide the kernel over every valid top-left position, stepping by the stride.
    # The upper bound is the last valid start index + 1 (not the output size),
    # so strides larger than 1 still cover the whole input.
    for i in range(0, X_padded.shape[0] - Wr.shape[0] + 1, s[0]):
        row = []
        for j in range(0, X_padded.shape[1] - Wr.shape[1] + 1, s[1]):
            X_sub = X_padded[i:i + Wr.shape[0], j:j + Wr.shape[1]]
            row.append(np.sum(X_sub * Wr))
        res.append(row)

    return np.array(res)

X = [[1, 3, 2, 4], [5, 6, 1, 3], [1, 2, 0, 2], [3, 4, 3, 2]]
W = [[1, 0, 3], [1, 2, 1], [0, 1, 1]]
print('Conv2d Implementation:\n', conv2d(X, W, p=(1, 1), s=(1, 1)))
print('SciPy Results:\n', scipy.signal.convolve2d(X, W, mode='same'))

Conv2d Implementation:
 [[11. 25. 32. 13.]
 [19. 25. 24. 13.]
 [13. 28. 25. 17.]
 [11. 17. 14.  9.]]
SciPy Results:
 [[11 25 32 13]
 [19 25 24 13]
 [13 28 25 17]
 [11 17 14  9]]
