# Building CNNs from scratch
The dense layer is also called the fully connected layer. Its input is the one-column vector obtained by flattening (reshaping) the output of the last convolution-pooling layer. Its output is an N-dimensional vector, where N is the number of categories. For the dense layer, **Forward Propagation** and **Backward Propagation** are the same as those of a single-layer perceptron.
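As a rough sketch of the flattening step, NumPy's `reshape` turns a stack of pooled feature maps into the column vector the dense layer expects. The sizes (2 feature maps of 3x3) and the channel-first `(depth, height, width)` layout are assumptions made for illustration, not values taken from this notebook.

```python
import numpy as np

# Hypothetical output of the last convolution-pooling layer:
# 2 feature maps (channels), each 3x3 after pooling.
feature_maps = np.arange(18, dtype=float).reshape(2, 3, 3)

# Flatten into a single column vector of shape (2*3*3, 1) = (18, 1),
# which is the input X of the dense layer.
x = feature_maps.reshape(-1, 1)
print(feature_maps.shape, "->", x.shape)  # (2, 3, 3) -> (18, 1)

# The reshape loses no information, so a gradient arriving in the
# (18, 1) shape can be reshaped back to the feature-map shape.
restored = x.reshape(feature_maps.shape)
```

Because flattening only changes the shape, the backward pass through this step is just the inverse reshape.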

```python
import numpy as np
from layer import Layer  # base class from the project's own layer.py module (not shown in this section)
```

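**Forward Propagation** is computed as follows, where $X$ is the $i$-dimensional input, $Y$ the $j$-dimensional output (for the final layer, $j = N$), $W$ the $j \times i$ weight matrix whose entry $\omega_{ab}$ connects input $x_a$ to output $y_b$, and $B$ the bias vector: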
$$
Y=W \cdot X+B
$$  
$$
Y=
\begin{bmatrix}
y_{1} \\
y_{2} \\
\vdots \\
y_{j} \\
\end{bmatrix}  
\;
W=
\begin{bmatrix}
\omega_{11} & \omega_{21} & \dots & \omega_{i1}\\
\omega_{12} & \omega_{22} & \dots & \omega_{i2}\\
\vdots & \vdots & \ddots & \vdots\\
\omega_{1j} & \omega_{2j} & \dots & \omega_{ij}\\
\end{bmatrix}
\;
X=
\begin{bmatrix}
x_{1} \\
x_{2} \\
\vdots \\
x_{i} \\
\end{bmatrix}  
\;
B=
\begin{bmatrix}
b_{1} \\
b_{2} \\
\vdots \\
b_{j} \\
\end{bmatrix}  
\;
$$
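A small worked example of the forward formula, using made-up values for $i = 3$ inputs and $j = 2$ outputs. Row $b$ of `W` holds the weights $\omega_{1b}, \dots, \omega_{ib}$ that feed output $y_b$, matching the matrix layout above.

```python
import numpy as np

# Illustrative values: i = 3 inputs, j = 2 outputs.
W = np.array([[1.0, 0.0, -1.0],    # w_11, w_21, w_31 feed y_1
              [2.0, 1.0,  0.0]])   # w_12, w_22, w_32 feed y_2
X = np.array([[1.0], [2.0], [3.0]])  # input column vector, shape (3, 1)
B = np.array([[0.5], [-1.0]])        # bias column vector, shape (2, 1)

Y = np.dot(W, X) + B  # shape (2, 1)
# y_1 = 1*1 + 0*2 - 1*3 + 0.5 = -1.5
# y_2 = 2*1 + 1*2 + 0*3 - 1.0 = 3.0
print(Y.ravel())
```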
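In **Backward Propagation**, the layer receives $\frac{\partial E}{\partial Y}$, the gradient of the error $E$ with respect to its output, and passes the gradient with respect to its input back to the previous layer: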
$$
\frac{\partial E}{\partial X} = W^T \cdot \frac{\partial E}{\partial Y}
$$  
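The backward formula can be checked numerically. This sketch assumes a squared error $E = \frac{1}{2}\sum (Y - T)^2$ with a hypothetical target `T` (the section does not fix a particular error function), so that $\frac{\partial E}{\partial Y} = Y - T$, and compares $W^T \cdot \frac{\partial E}{\partial Y}$ against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
i, j = 4, 3
W = rng.standard_normal((j, i))
B = rng.standard_normal((j, 1))
X = rng.standard_normal((i, 1))
T = rng.standard_normal((j, 1))  # hypothetical target output

def error(x):
    """Assumed error E = 0.5 * sum((Y - T)^2), so dE/dY = Y - T."""
    y = W @ x + B
    return 0.5 * np.sum((y - T) ** 2)

# Analytic gradient from the backward formula dE/dX = W^T . dE/dY
dE_dY = (W @ X + B) - T
dE_dX = W.T @ dE_dY

# Central finite differences, perturbing one input component at a time
eps = 1e-6
numeric = np.zeros_like(X)
for k in range(i):
    step = np.zeros_like(X)
    step[k] = eps
    numeric[k] = (error(X + step) - error(X - step)) / (2 * eps)

print(np.max(np.abs(dE_dX - numeric)))  # should be close to zero
```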
**Gradient Descent** also needs the gradients of $E$ with respect to the parameters $W$ and $B$:
$$
\frac{\partial E}{\partial W} = \frac{\partial E}{\partial Y} \cdot X^T
$$  
$$
\frac{\partial E}{\partial B} = \frac{\partial E}{\partial Y}
$$  
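The same kind of numerical check applies to the parameter gradients. Note that $\frac{\partial E}{\partial Y} \cdot X^T$ is the outer product of a $j \times 1$ and a $1 \times i$ vector, so it has the same $j \times i$ shape as $W$. The squared error, the target `T`, and the helper `numeric_grad` are all hypothetical, introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
i, j = 4, 3
W = rng.standard_normal((j, i))
B = rng.standard_normal((j, 1))
X = rng.standard_normal((i, 1))
T = rng.standard_normal((j, 1))  # hypothetical target output

def error(w, b):
    # Assumed error E = 0.5 * sum((Y - T)^2), so dE/dY = Y - T
    return 0.5 * np.sum((w @ X + b - T) ** 2)

dE_dY = W @ X + B - T
dE_dW = dE_dY @ X.T  # outer product, shape (j, i) like W
dE_dB = dE_dY        # shape (j, 1) like B

def numeric_grad(f, p, eps=1e-6):
    """Central-difference gradient of f() w.r.t. array p, perturbing p in place."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        plus = f()
        p[idx] = old - eps
        minus = f()
        p[idx] = old  # restore the original value
        g[idx] = (plus - minus) / (2 * eps)
    return g

num_W = numeric_grad(lambda: error(W, B), W)
num_B = numeric_grad(lambda: error(W, B), B)
print(np.allclose(dE_dW, num_W, atol=1e-6), np.allclose(dE_dB, num_B, atol=1e-6))  # True True
```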
With these gradients and a learning rate $\alpha$, gradient descent updates the parameters as:
$$
W \leftarrow W-\alpha \frac{\partial E}{\partial W}
$$  
$$
B \leftarrow B-\alpha \frac{\partial E}{\partial B}
$$
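To see the update rules at work, here is a minimal sketch of stochastic gradient descent on a toy regression problem. It assumes a squared error $E = \frac{1}{2}\sum (Y - T)^2$; the targets come from a hypothetical linear map `W_true`, `B_true`, and the learning rate and number of epochs are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
i, j = 3, 2
alpha = 0.05  # learning rate (illustrative value)
W = rng.standard_normal((j, i))
B = rng.standard_normal((j, 1))

# Hypothetical training data: targets generated by a fixed linear map
W_true = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]])
B_true = np.array([[0.2], [-0.4]])
samples = [rng.standard_normal((i, 1)) for _ in range(50)]

def total_error():
    # Assumed per-sample error E = 0.5 * sum((Y - T)^2), summed over samples
    return sum(0.5 * np.sum((W @ x + B - (W_true @ x + B_true)) ** 2) for x in samples)

before = total_error()
for _ in range(100):  # epochs
    for x in samples:
        t = W_true @ x + B_true
        dE_dY = (W @ x + B) - t      # gradient of the assumed error w.r.t. Y
        W -= alpha * (dE_dY @ x.T)   # W <- W - alpha * dE/dW
        B -= alpha * dE_dY           # B <- B - alpha * dE/dB
after = total_error()
print(before, "->", after)
```

Since the targets are produced exactly by a linear map, a single dense layer can fit them and the error shrinks toward zero. Each inner iteration is the same per-sample update that `Dense.backward` performs.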

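```python
class Dense(Layer):
    """Fully connected layer computing Y = W . X + B for a column-vector input X."""

    def __init__(self, input_size, output_size):
        # W has shape (output_size, input_size), i.e. (j, i); B has shape (j, 1).
        # Both are initialized from a standard normal distribution.
        self.weights = np.random.randn(output_size, input_size)
        self.bias = np.random.randn(output_size, 1)

    def forward(self, input):
        # Cache X, which is needed for dE/dW in the backward pass.
        self.input = input
        return np.dot(self.weights, self.input) + self.bias

    def backward(self, output_gradient, learning_rate):
        # output_gradient is dE/dY, shape (j, 1).
        weights_gradient = np.dot(output_gradient, self.input.T)  # dE/dW = dE/dY . X^T
        # dE/dX = W^T . dE/dY, computed with W before it is updated
        input_gradient = np.dot(self.weights.T, output_gradient)
        # Gradient descent updates; dE/dB = dE/dY
        self.weights -= learning_rate * weights_gradient
        self.bias -= learning_rate * output_gradient
        return input_gradient
```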