# Deep L-Layer Neural Network

Here, we define a deep neural network as a network that has at least 2 layers. A "shallow" network such as logistic regression contains a single layer, as illustrated in the image below:

<img src="images/logistic_regression.svg" width="30%" align="center"/>

Consider the deep neural network with 4 layers illustrated in the image below:

<img src="images/deep_neural_network.svg" width="50%" align="center"/>

In this deep network, we have 4 layers (denoted $L=4$; the input layer is not counted), and the number of neurons in each layer is denoted $n^{[l]}$, where $l$ is the layer index. We index the input of the network as layer zero ($l=0$), followed by the first hidden layer ($l=1$), the second hidden layer ($l=2$), the third hidden layer ($l=3$), and the output layer ($l=4$). Thus, we have $n^{[1]}=5$ since there are 5 units in layer 1, $n^{[2]}=5$ since there are 5 units in layer 2, $n^{[3]}=3$ since there are 3 units in layer 3, and $n^{[4]}=n^{[L]}=1$ since there is 1 unit in the last layer. For the input, we have $n^{[0]}=3$ since the input has 3 features.

We also use $a^{[l]}$ to denote the activations in layer $l$. In forward propagation, for example, we have $a^{[l]}=g^{[l]}(z^{[l]})$, where $g^{[l]}$ is the activation function of layer $l$. We use $W^{[l]}$ to denote the weights in layer $l$ and $b^{[l]}$ to denote the bias. Finally, we denote $X=a^{[0]}$ and $\hat{y}=a^{[L]}$.


# Forward Propagation in a Deep Network

For the 4-layer deep network illustrated above, forward propagation is computed as:

$
Z^{[1]} = W^{[1]}X + b^{[1]} \\
A^{[1]} = g^{[1]}(Z^{[1]}) \\
Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]} \\
A^{[2]} = g^{[2]}(Z^{[2]}) \\
Z^{[3]} = W^{[3]}A^{[2]} + b^{[3]} \\
A^{[3]} = g^{[3]}(Z^{[3]}) \\
Z^{[4]} = W^{[4]}A^{[3]} + b^{[4]} \\
A^{[4]} = g^{[4]}(Z^{[4]}) \\
$

Since $X=A^{[0]}$, these equations generalize, for every layer $l = 1, \dots, L$, to:

$$
Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]} \\
A^{[l]} = g^{[l]}(Z^{[l]}) \\
$$

The prediction is the activation of the last layer:

$$
\hat{Y} = A^{[L]} = g^{[L]}(Z^{[L]})
$$
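The generalized equations above map naturally onto a loop over layers. The following is a minimal NumPy sketch, not a definitive implementation: the helper names (`init_params`, `forward`), the choice of ReLU for hidden layers and sigmoid for the output, the small random initialization, and the batch of 4 examples are all assumptions made for illustration. Only the layer sizes come from the 4-layer network described earlier.

```python
import numpy as np

def relu(z):
    # Assumed hidden-layer activation g^[l] for l < L
    return np.maximum(0, z)

def sigmoid(z):
    # Assumed output activation g^[L]
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_dims, seed=0):
    # Hypothetical helper: W^[l] has shape (n^[l], n^[l-1]), b^[l] has shape (n^[l], 1)
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def forward(X, params, L):
    # A^[0] = X, then Z^[l] = W^[l] A^[l-1] + b^[l] and A^[l] = g^[l](Z^[l])
    A = X
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        g = sigmoid if l == L else relu
        A = g(Z)
    return A  # A^[L], i.e. the prediction Y_hat

layer_dims = [3, 5, 5, 3, 1]   # n^[0] ... n^[4] from the 4-layer example
L = len(layer_dims) - 1        # the input layer is not counted
params = init_params(layer_dims)
X = np.random.default_rng(1).standard_normal((layer_dims[0], 4))  # 4 made-up examples
Y_hat = forward(X, params, L)
print(Y_hat.shape)  # (1, 4)
```

The explicit loop over $l$ is kept on purpose: each layer depends on the activations of the previous one, so this loop cannot be vectorized away, unlike the loop over examples.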

# Getting your matrix dimensions right

Consider the 5-layer neural network illustrated below:

<img src="images/deep_neural_network_5layers.svg" width="60%" align="center"/>

In this network, the number of units per layer is $n^{[0]} = n_x = 2$, $n^{[1]} = 3$, $n^{[2]} = 5$, $n^{[3]} = 4$, $n^{[4]} = 2$, and $n^{[5]} = 1$. For a single example, the dimensions in the first layer are:

$
z^{[1]} = W^{[1]} x + b^{[1]} \\
(3, 1) = (3, 2) (2, 1) + (3, 1) \\
(n^{[1]}, 1) = (n^{[1]}, n^{[0]}) (n^{[0]}, 1) + (n^{[1]}, 1) \\
$

From this example, we can see that $W^{[1]} : (n^{[1]}, n^{[0]})$ and, in more general terms, $W^{[l]} : (n^{[l]}, n^{[l-1]})$. Taking the second layer as an example, we can see that:

$
z^{[2]} = W^{[2]}a^{[1]} + b^{[2]} \\
(5, 1) = (5, 3) (3, 1) + (5, 1) \\
(n^{[2]}, 1) = (n^{[2]}, n^{[1]}) (n^{[1]}, 1) + (n^{[2]}, 1) \\
$
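The shape rules shown in these equations can be checked numerically. The sketch below builds zero-filled placeholder parameters for the 5-layer network (layer sizes taken from the text; the dictionary names `W_shapes` and `b_shapes` are hypothetical) and asserts that every $z^{[l]}$ for a single example has shape $(n^{[l]}, 1)$.

```python
import numpy as np

layer_dims = [2, 3, 5, 4, 2, 1]  # n^[0] ... n^[5] from the text

# W^[l] : (n^[l], n^[l-1]) and b^[l] : (n^[l], 1)
W_shapes = {l: (layer_dims[l], layer_dims[l - 1]) for l in range(1, len(layer_dims))}
b_shapes = {l: (layer_dims[l], 1) for l in range(1, len(layer_dims))}

a = np.zeros((layer_dims[0], 1))  # a single example, shape (n^[0], 1)
for l in range(1, len(layer_dims)):
    W = np.zeros(W_shapes[l])     # placeholder values; only the shapes matter here
    b = np.zeros(b_shapes[l])
    z = W @ a + b
    assert z.shape == (layer_dims[l], 1)
    a = z  # activations are applied elementwise, so a^[l] has the same shape as z^[l]

print(W_shapes)
# {1: (3, 2), 2: (5, 3), 3: (4, 5), 4: (2, 4), 5: (1, 2)}
```

If any shape were wrong, the matrix product would raise a `ValueError` before the assertion is reached, which is why writing down the dimensions first is a useful debugging habit.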

For the bias, we can see that $b^{[1]} : (n^{[1]}, 1)$ and $b^{[2]} : (n^{[2]}, 1)$. In the general case, $b^{[l]} : (n^{[l]}, 1)$. In a vectorized implementation over $m$ examples, the vectors $z$ and $a$ become the matrices $Z$ and $A$, with dimensions:

$
Z^{[1]} = W^{[1]} X + b^{[1]} \\
(n^{[1]}, m) = (n^{[1]}, n^{[0]}) (n^{[0]}, m) + (n^{[1]}, 1) \\
$

where $b^{[1]}$ contains a single column but is broadcast across the $m$ examples, effectively becoming $(n^{[1]}, m)$ in the addition.
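To make the broadcasting step concrete, here is a small NumPy sketch for the first layer. The batch size `m = 4` and the random values are assumptions for illustration; `n0` and `n1` follow the 5-layer example. It compares the broadcast sum with an explicit copy of the bias column repeated $m$ times.

```python
import numpy as np

n0, n1, m = 2, 3, 4  # n^[0], n^[1] from the text; m = 4 is a made-up batch size
rng = np.random.default_rng(0)

W1 = rng.standard_normal((n1, n0))  # (n^[1], n^[0])
X = rng.standard_normal((n0, m))    # (n^[0], m), one example per column
b1 = rng.standard_normal((n1, 1))   # (n^[1], 1)

# NumPy broadcasts b1 across the m columns of W1 @ X
Z1 = W1 @ X + b1

# Equivalent computation with the bias column tiled explicitly to (n^[1], m)
Z1_explicit = W1 @ X + np.tile(b1, (1, m))

print(Z1.shape)                      # (3, 4)
print(np.allclose(Z1, Z1_explicit))  # True
```

Keeping `b1` with shape `(n1, 1)` rather than `(n1,)` avoids ambiguous rank-1 broadcasting and matches the $(n^{[l]}, 1)$ convention used in these notes.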