# Deep Neural Network

### Deep Neural Network Notation
* L = number of layers
* $n^{[l]}$ = number of units in layer $l$
    - $n^{[0]} = n_x = $ number of units in the input layer
* $a^{[l]}$ = activations in layer $l$
    - we compute $a^{[l]} = g^{[l]}(z^{[l]})$
* $w^{[l]}$ = weights used to compute $z^{[l]}$ in layer $l$
* $b^{[l]}$ = bias used to compute $z^{[l]}$ in layer $l$
* Input features are called $X$, and $X = a^{[0]}$. The activation of the final layer is the prediction: $a^{[L]} = \hat{y}$.

## Forward Propagation in a Deep Neural Network

The general equation for forward propagation for a single training example looks like:
$$ 
z^{[l]} = w^{[l]} a^{[l-1]} + b^{[l]}\\
a^{[l]} = g^{[l]}(z^{[l]})
$$

The general equation for forward propagation, vectorized over the entire training set, looks like:
$$ 
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}\\
A^{[l]} = g^{[l]}(Z^{[l]})
$$
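To make the relationship between the two forms concrete, here is a minimal NumPy sketch. It assumes each training example is stored as one column of $A^{[l-1]}$ and uses ReLU as a stand-in for $g^{[l]}$; the layer sizes and random values are hypothetical. It checks that stacking the per-example results column by column gives the same matrix as the vectorized equation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_curr, m = 3, 4, 5  # hypothetical n^{[l-1]}, n^{[l]}, and number of examples

W = rng.standard_normal((n_curr, n_prev))
b = rng.standard_normal((n_curr, 1))
A_prev = rng.standard_normal((n_prev, m))  # one training example per column


def relu(z):
    """Stand-in for the activation g^{[l]} (an assumption for this sketch)."""
    return np.maximum(0, z)


# Per-example form: apply the single-example equations to each column.
columns = []
for i in range(m):
    a_prev = A_prev[:, [i]]        # shape (n_prev, 1)
    z = W @ a_prev + b             # shape (n_curr, 1)
    columns.append(relu(z))
A_loop = np.hstack(columns)        # shape (n_curr, m)

# Vectorized form: b is broadcast across all m columns.
Z = W @ A_prev + b
A_vec = relu(Z)

same = np.allclose(A_loop, A_vec)
```

Here `same` should evaluate to `True`, since the vectorized product computes every column in a single matrix multiplication.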



## Getting the matrix dimensions right

- The dimensions of parameter $W^{[l]}$ are: $(n^{[l]}, n^{[l-1]})$.
- The dimensions of parameter $b^{[l]}$ are: $(n^{[l]}, 1)$.
- If we are implementing backward propagation, the dimensions of $dW^{[l]}$ are: $(n^{[l]}, n^{[l-1]})$ (same dimensions as $W^{[l]}$).
- If we are implementing backward propagation, the dimensions of $db^{[l]}$ are: $(n^{[l]}, 1)$ (same dimensions as $b^{[l]}$).

Now let's check the dimensions of $a^{[l]},\ z^{[l]}$ and $A^{[l]},\ Z^{[l]}$.

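For a single training example, the forward propagation equation has the following dimensions (for $l = 1$, $a^{[0]} = x$):
$$ 
\underbrace{z^{[l]}}_{(n^{[l]}, 1)} = \underbrace{w^{[l]}}_{(n^{[l]}, n^{[l-1]})} \cdot \underbrace{a^{[l-1]}}_{(n^{[l-1]}, 1)} + \underbrace{b^{[l]}}_{(n^{[l]}, 1)}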
$$

In the vectorized form, it becomes (with $b^{[l]}$ broadcast across the $m$ columns, and $A^{[0]} = X$ for $l = 1$):
$$
\underbrace{Z^{[l]}}_{(n^{[l]}, m)} = \underbrace{W^{[l]}}_{(n^{[l]}, n^{[l-1]})} \cdot \underbrace{A^{[l-1]}}_{(n^{[l-1]}, m)} + \underbrace{b^{[l]}}_{(n^{[l]}, 1)}
$$
- Thus the dimensions of $Z^{[l]}$ and $A^{[l]}$ are $(n^{[l]}, m)$.
- The dimensions of $A^{[0]} = X$ are $(n^{[0]}, m)$
- If we are implementing backward propagation, the dimensions of $dZ^{[l]},\ dA^{[l]}$ are: $(n^{[l]}, m)$ (same dimensions as $Z^{[l]}$ and $A^{[l]}$).
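The dimension rules above can be checked mechanically. The following is a small sketch, assuming a hypothetical network with layer sizes `[5, 4, 3, 1]`, `m = 10` examples, and `tanh` as a placeholder activation; `init_params` is an illustrative helper, not something defined in these notes.

```python
import numpy as np

layer_dims = [5, 4, 3, 1]  # hypothetical n^{[0]}, n^{[1]}, n^{[2]}, n^{[3]}
m = 10                     # hypothetical number of training examples


def init_params(layer_dims, seed=0):
    """Create W^{[l]} with shape (n^{[l]}, n^{[l-1]}) and b^{[l]} with shape (n^{[l]}, 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params


params = init_params(layer_dims)
A = np.zeros((layer_dims[0], m))  # A^{[0]} = X has shape (n^{[0]}, m)

shapes = []
for l in range(1, len(layer_dims)):
    W, b = params[f"W{l}"], params[f"b{l}"]
    assert W.shape == (layer_dims[l], layer_dims[l - 1])
    assert b.shape == (layer_dims[l], 1)
    Z = W @ A + b                 # b broadcasts over the m columns
    A = np.tanh(Z)                # placeholder activation for this sketch
    assert Z.shape == A.shape == (layer_dims[l], m)
    shapes.append(Z.shape)
```

Placing assertions like these inside the forward loop is a cheap way to catch a transposed weight matrix early.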

## Building Blocks of Deep Neural Network

### Forward and Backward Propagation

Suppose we are at the layer $l$ in the network. Then we have: 
- $W^{[l]},\ b^{[l]}$
- For forward propagation: input $a^{[l-1]}$, $\quad$ output $a^{[l]}$
- For backward propagation: input $da^{[l]}$, cache$(z^{[l]})$, $\qquad$ output $da^{[l-1]}$, $dw^{[l]}$, $db^{[l]}$

## Backward and Forward Propagation

### Forward Propagation for layer $l$

- Input $a^{[l-1]}$
- Output $a^{[l]}$, cache($z^{[l]}$, along with $w^{[l]}$, and $b^{[l]}$)

Implementation for a single training example:
$$ 
z^{[l]} = w^{[l]} \cdot a^{[l-1]} + b^{[l]}\\
a^{[l]} = g^{[l]}(z^{[l]})
$$

and in vectorized form:
$$
Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}\\
A^{[l]} = g^{[l]}(Z^{[l]})
$$
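A single forward step with its cache might be sketched as follows. The function name `layer_forward`, the `activation` argument, and the exact contents of the cache tuple are assumptions made for illustration; the notes only require that $z^{[l]}$, $w^{[l]}$, and $b^{[l]}$ be saved for the backward pass.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0, z)


def layer_forward(A_prev, W, b, activation="relu"):
    """Compute A^{[l]} from A^{[l-1]} and return it with a cache for backprop.

    The cache layout (A_prev, W, b, Z) is a choice made for this sketch.
    """
    Z = W @ A_prev + b
    if activation == "relu":
        A = relu(Z)
    elif activation == "sigmoid":
        A = sigmoid(Z)
    else:
        raise ValueError(f"unsupported activation: {activation}")
    cache = (A_prev, W, b, Z)
    return A, cache


# Tiny usage example with hand-checkable numbers.
A_prev = np.array([[1.0], [-2.0]])
W = np.array([[1.0, 1.0], [1.0, -1.0]])
b = np.zeros((2, 1))
A, cache = layer_forward(A_prev, W, b, activation="relu")
# Z = [[-1], [3]], so ReLU gives A = [[0], [3]]
```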

### Backward Propagation for layer $l$

- Input $da^{[l]}$
- Output $da^{[l-1]},\ dW^{[l]},\ db^{[l]}$ along with $a^{[l-1]}$  in cache

Implementation for a single training example:
$$ 
dz^{[l]} = da^{[l]} \times g^{[l]'}(z^{[l]})\\
dW^{[l]} = dz^{[l]} \cdot a^{[l-1]T}\\
db^{[l]} = dz^{[l]}\\
da^{[l-1]} = W^{[l]T} dz^{[l]}\\
\text{Repeat the steps above for layer } l-1
$$

and in vectorized form:
$$
dZ^{[l]} = dA^{[l]} \times g^{[l]'}(Z^{[l]})\\
dW^{[l]} = \frac{1}{m} dZ^{[l]} \cdot A^{[l-1]T}\\
db^{[l]} = \frac{1}{m} \text{ np.sum } ( dZ^{[l]} , \text{ axis = 1}, \text{ keepdims =True})\\
dA^{[l-1]} = W^{[l]T} dZ^{[l]}
$$
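The vectorized backward equations translate almost line for line into NumPy. The sketch below assumes a ReLU activation (so $g^{[l]'}(Z)$ is 1 where $Z > 0$ and 0 elsewhere) and a cache tuple `(A_prev, W, b, Z)` saved during the forward pass; both the function name `layer_backward` and the cache layout are illustrative choices.

```python
import numpy as np


def relu_grad(Z):
    """Derivative of ReLU: 1 where Z > 0, else 0 (activation assumed for this sketch)."""
    return (Z > 0).astype(float)


def layer_backward(dA, cache):
    """Given dA^{[l]} and the forward cache, return dA^{[l-1]}, dW^{[l]}, db^{[l]}."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]                             # number of training examples
    dZ = dA * relu_grad(Z)                          # element-wise product
    dW = (dZ @ A_prev.T) / m                        # shape (n^{[l]}, n^{[l-1]})
    db = np.sum(dZ, axis=1, keepdims=True) / m      # shape (n^{[l]}, 1)
    dA_prev = W.T @ dZ                              # shape (n^{[l-1]}, m)
    return dA_prev, dW, db


# Tiny usage example with hand-checkable numbers.
W = np.array([[1.0, 2.0]])                  # n^{[l]} = 1, n^{[l-1]} = 2
b = np.zeros((1, 1))
A_prev = np.array([[1.0, 2.0], [3.0, 4.0]])  # m = 2 examples
Z = W @ A_prev + b                           # [[7, 10]], all positive
dA = np.ones((1, 2))
dA_prev, dW, db = layer_backward(dA, (A_prev, W, b, Z))
# dW = [[1.5, 3.5]], db = [[1.0]], dA_prev = [[1, 1], [2, 2]]
```

Note that each gradient has the same shape as the quantity it differentiates, which matches the dimension rules given earlier.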

## Parameters vs Hyperparameters

Parameters: $W^{[l]},\ b^{[l]}, \dots$

Hyperparameters: 
- Learning rate $\alpha$
- Number of iterations
- Number of hidden layers (which determines $L$)
- Hidden units $n^{[1]},\ n^{[2]}, \dots$
- Choice of activation function

Later, we will also see other hyperparameters, such as the momentum term, mini-batch size, and various regularization parameters.
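One way to keep the distinction clear in code is to store the two groups separately: hyperparameters are chosen before training, while parameters are changed by the learning algorithm. The sketch below assumes the standard gradient descent update $W := W - \alpha\, dW$; the dictionary keys and the `update_parameters` helper are hypothetical names used only for illustration.

```python
import numpy as np

# Hyperparameters: set by hand before training (values here are arbitrary).
hyperparams = {
    "learning_rate": 0.1,
    "num_iterations": 1000,
    "layer_dims": [2, 1],      # controls the number of layers and hidden units
    "activation": "relu",
}

# Parameters: learned from data, updated on every iteration.
params = {"W1": np.ones((1, 2)), "b1": np.zeros((1, 1))}


def update_parameters(params, grads, learning_rate):
    """One gradient descent step: theta := theta - alpha * d_theta."""
    return {name: value - learning_rate * grads["d" + name]
            for name, value in params.items()}


# Hypothetical gradients, as would be produced by backward propagation.
grads = {"dW1": np.full((1, 2), 2.0), "db1": np.ones((1, 1))}
new_params = update_parameters(params, grads, hyperparams["learning_rate"])
# W1 becomes 1 - 0.1 * 2 = 0.8 in every entry; b1 becomes -0.1
```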