# Establish CNNs from scratch
This file focuses on **Forward Propagation** and **Backward Propagation** in the convolutional layers of CNNs. Here the convolutional layer is treated separately from the convolution-pooling layer, which is composed of three layers: the "convolutional layer", the "activation layer", and the "pooling layer".


A convolutional layer is a core building block of CNNs, where the input image is processed using a set of learnable filters, also known as kernels. This operation allows the network to be sensitive to spatial hierarchies of features in the input data, making it well-suited for processing data with a grid-like topology, such as images. 

In [4]:
import numpy as np
from scipy import signal
from layer import Layer

### 1.Some definitions
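Define the operator of "Cross-Correlation" as "$\star$". It is used in two modes: "$\underset{\scriptstyle full}{\star}$", which evaluates every position where the kernel overlaps the input by at least one element, and "$\underset{\scriptstyle valid}{\star}$", which evaluates only the positions where the kernel lies entirely inside the input.  
\
Define the operator of "Full Convolution" as "$\underset{\scriptstyle full}{\ast}$", which is defined by $ I \; \underset{\scriptstyle full}{\ast} \; K = I \; \underset{\scriptstyle full}{\star} \; rot180(K) $, where $rot180(K)$ is the kernel $K$ rotated by 180 degrees.  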
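
The identity above can be checked numerically. The following is a minimal sketch, assuming `scipy.signal.convolve2d` and `scipy.signal.correlate2d` play the roles of $\ast$ and $\star$, and that `np.rot90(K, 2)` plays the role of $rot180(K)$; the toy arrays `I` and `K` are illustrative only.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
I = rng.standard_normal((4, 4))   # toy input (illustrative values)
K = rng.standard_normal((3, 3))   # toy kernel (illustrative values)

# rot180(K): rotate the kernel by 180 degrees (flip both axes).
K_rot = np.rot90(K, 2)

full_conv = signal.convolve2d(I, K, mode="full")           # I (full conv) K
full_corr_rot = signal.correlate2d(I, K_rot, mode="full")  # I (full corr) rot180(K)
valid_corr = signal.correlate2d(I, K, mode="valid")        # I (valid corr) K

print(np.allclose(full_conv, full_corr_rot))  # True
print(full_conv.shape, valid_corr.shape)      # (6, 6) (2, 2)
```

For a 4x4 input and a 3x3 kernel, the "full" mode grows each dimension to 4 + 3 - 1 = 6, while the "valid" mode shrinks it to 4 - 3 + 1 = 2.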

### 2.Formulas in **Forward Propagation** and **Backward Propagation**
The formula of **Forward Propagation** goes as follows:
$$ 
Y_{i} = B_{i} +  \sum_{j=1}^{n_{c}}X_{j} \underset{\scriptstyle valid}{\star} K_{ij} \qquad i = 1,\ldots,d \tag{1}
$$  
where $X_{j}$ stands for the $j$-th channel of the input, which may be the original image or the output inherited from the previous layer; $n_{c}$ stands for the number of input channels; $K_{ij}$ stands for the kernel connecting input channel $j$ to output channel $i$, applied with the cross-correlation operation; $B_{i}$ stands for the bias of output channel $i$; $Y_{i}$ stands for the $i$-th channel of the output; $d$ stands for the depth of the output.  
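
As a worked instance of equation (1), the sketch below evaluates the forward pass by hand for $n_{c} = 2$ input channels and $d = 1$ output channel. The helper `valid_cross_correlate` is a hypothetical naive implementation written only for this illustration (it is not part of the notebook), and the input, kernel, and bias values are made up.

```python
import numpy as np

def valid_cross_correlate(x, k):
    """Naive 'valid' cross-correlation of a 2-D input x with a 2-D kernel k."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Elementwise product of the kernel with the window it covers.
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

# Input with n_c = 2 channels of size 3x3 (illustrative values).
X = np.stack([np.arange(1.0, 10.0).reshape(3, 3), np.ones((3, 3))])
# Kernels with shape (d, n_c, 2, 2) = (1, 2, 2, 2).
K = np.array([[[[1.0, 0.0], [0.0, -1.0]],
               [[1.0, 0.0], [0.0, 0.0]]]])
# Biases with the same shape as the output: (d, 2, 2).
B = np.full((1, 2, 2), 0.5)

# Equation (1): Y_i = B_i + sum_j X_j (valid cross-correlation) K_ij
Y = B.copy()
for i in range(K.shape[0]):
    for j in range(K.shape[1]):
        Y[i] += valid_cross_correlate(X[j], K[i, j])

print(Y.shape)  # (1, 2, 2)
print(Y[0])     # every entry is -2.5
```

Each output entry is 0.5 (bias) - 4 (first channel, for example 1 - 5) + 1 (second channel), which gives -2.5.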
\
The formula of **Backward Propagation** for the gradient with respect to the input goes as follows:
$$ 
\frac{\partial E}{ \partial X_{j}}=\sum_{i=1}^{d} \frac{\partial E}{\partial Y_{i}}\underset{\scriptstyle full}{\ast} K_{ij} \qquad j = 1,\ldots,n_{c} \tag{2}
$$  
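
Equation (2) can be sanity-checked against finite differences. The sketch below is not part of the original notebook: it assumes a toy error $E = \sum_{i} \langle G_{i}, Y_{i} \rangle$ with a fixed random array `G`, chosen so that $\partial E / \partial Y_{i} = G_{i}$ exactly. The bias is left out because it does not affect the gradient with respect to the input.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n_c, d, size, k = 2, 3, 5, 3
X = rng.standard_normal((n_c, size, size))
K = rng.standard_normal((d, n_c, k, k))
G = rng.standard_normal((d, size - k + 1, size - k + 1))  # stands in for dE/dY

def error(x):
    """Toy error E = sum_i <G_i, Y_i>, so that dE/dY_i equals G_i."""
    total = 0.0
    for i in range(d):
        for j in range(n_c):
            total += np.sum(G[i] * signal.correlate2d(x[j], K[i, j], "valid"))
    return total

# Analytic gradient from equation (2): sum over i of dE/dY_i (full conv) K_ij.
analytic = np.zeros_like(X)
for i in range(d):
    for j in range(n_c):
        analytic[j] += signal.convolve2d(G[i], K[i, j], "full")

# Numerical gradient by central finite differences, one input element at a time.
eps = 1e-6
numeric = np.zeros_like(X)
for idx in np.ndindex(*X.shape):
    step = np.zeros_like(X)
    step[idx] = eps
    numeric[idx] = (error(X + step) - error(X - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

The "full" mode restores the input size: a 3x3 output gradient convolved with a 3x3 kernel yields a 5x5 array, matching the shape of $X_{j}$.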
### 3.**Gradient Descent** method
Weights and biases are updated with the classic "Gradient Descent" strategy, which requires the gradients of the error $E$ with respect to the kernels and the biases:
$$ 
\frac{\partial E}{ \partial K_{ij}}=X_{j} \underset{\scriptstyle valid}{\star} \frac{\partial E}{\partial Y_{i}} \tag{3}
$$
$$ 
\frac{\partial E}{ \partial B_{i}}=\frac{\partial E}{\partial Y_{i}} \tag{4}
$$  
Hence
$$
K_{ij} \leftarrow K_{ij} - \alpha \frac{\partial E}{ \partial K_{ij}} \tag{5}
$$
$$
B_{i} \leftarrow B_{i} - \alpha \frac{\partial E}{\partial B_{i}} \tag{6}
$$  
where $\alpha$ is the learning rate.
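
A single update step can be sketched as follows, for one input channel and one output channel. This is only an illustration under stated assumptions: the squared error $E = \frac{1}{2}\lVert Y - T \rVert^{2}$, the target `T`, and the learning rate value are all hypothetical choices and do not appear in the notebook.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))   # one input channel
K = rng.standard_normal((3, 3))   # one kernel
B = rng.standard_normal((3, 3))   # bias with the same shape as the output
T = rng.standard_normal((3, 3))   # hypothetical target for the toy error
alpha = 1e-3                      # learning rate (assumed value)

def forward(kernel, bias):
    # Equation (1) with n_c = 1 and d = 1.
    return bias + signal.correlate2d(X, kernel, "valid")

def error(kernel, bias):
    # Assumed toy error: E = 0.5 * ||Y - T||^2, so dE/dY = Y - T.
    return 0.5 * np.sum((forward(kernel, bias) - T) ** 2)

before = error(K, B)

dE_dY = forward(K, B) - T
dE_dK = signal.correlate2d(X, dE_dY, "valid")  # equation (3)
dE_dB = dE_dY                                  # equation (4)

K = K - alpha * dE_dK                          # equation (5)
B = B - alpha * dE_dB                          # equation (6)

after = error(K, B)
print(dE_dK.shape)     # (3, 3), the same shape as the kernel
print(after < before)  # the error decreases for this small step
```

Because this toy error is quadratic in the kernel and the bias, a sufficiently small learning rate reduces it at every step; larger values of `alpha` may overshoot.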

In [None]:
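class Convolutional(Layer):
    """Convolutional layer implementing equations (1)-(6)."""

    def __init__(self, input_shape:tuple, kernel_size:int, depth:int):
        # input_shape is (channels, height, width); depth is the number of output channels d.
        input_depth, input_height, input_width = input_shape
        self.depth = depth
        self.input_shape = input_shape
        self.input_depth = input_depth
        # "valid" cross-correlation shrinks each spatial dimension by kernel_size - 1.
        self.output_shape = (depth, input_height - kernel_size + 1, input_width - kernel_size + 1)
        # One kernel K_ij per (output channel i, input channel j) pair.
        self.kernels_shape = (depth, input_depth, kernel_size, kernel_size)
        self.kernels = np.random.randn(*self.kernels_shape)
        # One bias per output element, consistent with equation (4).
        self.biases = np.random.randn(*self.output_shape)

    def forward(self, input:np.ndarray):
        # Cache the input; backward() needs it for the kernel gradient.
        self.input = input
        # Equation (1): start from the biases, then add each channel's contribution.
        self.output = np.copy(self.biases)
        for i in range(self.depth):
            for j in range(self.input_depth):
                self.output[i] += signal.correlate2d(self.input[j], self.kernels[i, j], "valid")
        return self.output

    def backward(self, output_gradient, learning_rate):
        # output_gradient is dE/dY and has shape self.output_shape.
        kernels_gradient = np.zeros(self.kernels_shape)
        input_gradient = np.zeros(self.input_shape)

        for i in range(self.depth):
            for j in range(self.input_depth):
                # Equation (3): dE/dK_ij = X_j (valid cross-correlation) dE/dY_i
                kernels_gradient[i, j] = signal.correlate2d(self.input[j], output_gradient[i], "valid")
                # Equation (2): dE/dX_j accumulates dE/dY_i (full convolution) K_ij over i
                input_gradient[j] += signal.convolve2d(output_gradient[i], self.kernels[i, j], "full")

        # Equations (5) and (6): gradient descent update; dE/dB equals dE/dY by equation (4).
        self.kernels -= learning_rate * kernels_gradient
        self.biases -= learning_rate * output_gradient
        # Return dE/dX so the previous layer can continue backpropagation.
        return input_gradient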