* [Neural Network Overview](#nn)
* [Neural Network Representation](#nnrep)
* [Vectorizing Across Multiple Examples](#vecm)
* [Explanation for Vectorized Implementation](#veci)
* [Activation Functions](#af)
* [Why do you need non-linear activation functions?](#why)
* [Derivatives of activation functions](#dr)
* [Gradient descent for Neural Networks](#gd)
* [Backpropagation intuition](#back)
* [Random Initialization](#rand)

<div id="nn">
    <img src="https://i.imgur.com/73oPtj6.png" style="width:700px;height:400px; float: left;">
</div>

<div id="nnrep">
    <img src="https://i.imgur.com/rtdBLyj.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/kTd417e.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/tYB9G3H.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/F9kMmTs.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/psB29lQ.png" style="width:700px;height:400px; float: left;">

<div id="vecm">
    <img src="https://i.imgur.com/9hqY728.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/KAjtoZ0.png" style="width:700px;height:400px; float: left;">

<div id="veci">
    <img src="https://i.imgur.com/k7g9ZLH.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/19VaWoe.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/qVQp7SH.png" style="width:280px;height:160px; float: top;">

$$ X = \begin{bmatrix}
    \vdots & &\vdots \\
    x^{(1)} & \dots & x^{(m)} \\
    \vdots & & \vdots
\end{bmatrix} $$

$$ z^{[1]} = \begin{bmatrix}
    z_1^{[1]} \\
    z_2^{[1]} \\
    z_3^{[1]} \\
    z_4^{[1]}
\end{bmatrix} = \begin{bmatrix}
    \dots & {w_1^{[1]}}^T & \dots \\
    \dots & {w_2^{[1]}}^T & \dots \\
    \dots & {w_3^{[1]}}^T & \dots \\
    \dots & {w_4^{[1]}}^T & \dots
\end{bmatrix}
\begin{bmatrix}
    x_1 \\
    x_2 \\
    x_3
\end{bmatrix} + \begin{bmatrix}
    b_1^{[1]} \\
    b_2^{[1]} \\
    b_3^{[1]} \\
    b_4^{[1]}
\end{bmatrix}$$

$$ z^{[1]} = W^{[1]}x + b^{[1]} $$

$$ a^{[1]} = g^{[1]}(z^{[1]}) $$

$$ z^{[2]} = W^{[2]}a^{[1]} + b^{[2]} $$

$$ a^{[2]} = g^{[2]}(z^{[2]}) = \sigma(z^{[2]}) $$<br>

$$ W^{[1]} \in \mathbb{R}^{(4,3)},\ x \in \mathbb{R}^{(3,1)},\ a^{[1]} \in \mathbb{R}^{(4,1)} $$
$$ W^{[2]} \in \mathbb{R}^{(1,4)},\ a^{[1]} \in \mathbb{R}^{(4,1)},\ a^{[2]} \in \mathbb{R}^{(1,1)} $$
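
The single-example forward pass above can be sketched in NumPy to check the shapes. This is only a minimal sketch: the hidden activation $g^{[1]}$ is assumed to be tanh (the equations do not fix it), the weights are random placeholders, and `forward_single` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_single(x, W1, b1, W2, b2):
    """Forward pass for one example x of shape (3, 1)."""
    z1 = W1 @ x + b1      # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
    a1 = np.tanh(z1)      # assumed hidden activation g^[1]
    z2 = W2 @ a1 + b2     # (1, 4) @ (4, 1) + (1, 1) -> (1, 1)
    a2 = sigmoid(z2)      # output activation g^[2] = sigma
    return z1, a1, z2, a2

# Placeholder data and parameters, chosen only to match the stated shapes
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))
W1 = rng.standard_normal((4, 3)) * 0.01
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

z1, a1, z2, a2 = forward_single(x, W1, b1, W2, b2)
print(a1.shape, a2.shape)  # (4, 1) (1, 1)
```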



$ for\ i\ in\ range(m):$<br>
> $ z^{[1](i)} = W^{[1]}x^{(i)} + b^{[1]} $<br>
> $ a^{[1](i)} = g^{[1]}(z^{[1](i)}) $<br>
> $ z^{[2](i)} = W^{[2]}a^{[1](i)} + b^{[2]} $<br>
> $ a^{[2](i)} = g^{[2]}(z^{[2](i)}) = \sigma(z^{[2](i)}) $<br>

$$ W^{[1]} = \begin{bmatrix}
    \dots & {w_1^{[1]}}^T & \dots \\
    \dots & {w_2^{[1]}}^T & \dots \\
    \dots & \vdots & \dots \\
    \dots & {w_{n^{[1]}}^{[1]}}^T & \dots
\end{bmatrix} $$

$$ X = A^{[0]} = \begin{bmatrix}
    \vdots & &\vdots \\
    x^{(1)} & \dots & x^{(m)} \\
    \vdots & & \vdots
\end{bmatrix} $$

$$ Z^{[1]} = \begin{bmatrix}
    \vdots & &\vdots \\
    z^{[1](1)} & \dots & z^{[1](m)} \\
    \vdots & & \vdots
\end{bmatrix} $$

$$ A^{[1]} = \begin{bmatrix}
    \vdots & &\vdots \\
    a^{[1](1)} & \dots & a^{[1](m)} \\
    \vdots & & \vdots
\end{bmatrix} $$<br>


$$ Z^{[1]} = W^{[1]}A^{[0]} + b^{[1]} $$

$$ A^{[1]} = g^{[1]}(Z^{[1]}) $$

$$ Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]} $$

$$ A^{[2]} = g^{[2]}(Z^{[2]})= \sigma(Z^{[2]}) $$<br>

$$ W^{[1]} \in \mathbb{R}^{(n^{[1]},n^{[0]})},\ A^{[0]} \in \mathbb{R}^{(n^{[0]},m)},\ A^{[1]} \in \mathbb{R}^{(n^{[1]},m)} $$
$$ W^{[2]} \in \mathbb{R}^{(1,n^{[1]})},\ A^{[1]} \in \mathbb{R}^{(n^{[1]},m)},\ A^{[2]} \in \mathbb{R}^{(1,m)} $$
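
To see that stacking the examples as columns gives the same result as looping over them, the two versions can be compared directly. The sketch below assumes tanh for $g^{[1]}$ and uses arbitrary sizes ($n^{[0]}=3$, $n^{[1]}=4$, $m=5$) and random placeholder values; none of these come from the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n0, n1, m = 3, 4, 5                      # illustrative sizes only
X = rng.standard_normal((n0, m))         # one example per column
W1 = rng.standard_normal((n1, n0)) * 0.01
b1 = np.zeros((n1, 1))
W2 = rng.standard_normal((1, n1)) * 0.01
b2 = np.zeros((1, 1))

# Explicit loop over the m examples
A2_loop = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i + 1]                  # slice keeps the (n0, 1) column shape
    a1_i = np.tanh(W1 @ x_i + b1)
    A2_loop[:, i:i + 1] = sigmoid(W2 @ a1_i + b2)

# Vectorized version: b1 and b2 are broadcast across the m columns
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

print(A1.shape, A2.shape)                # (4, 5) (1, 5)
print(np.allclose(A2, A2_loop))
```

Column $i$ of $A^{[2]}$ matches the loop's output for example $i$, which is what the column-stacked matrices above express.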


<div id="af">
    <img src="https://i.imgur.com/GAkZxzG.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/oSLHbTi.png" style="width:700px;height:400px; float: left;">

<div id="why">
    <img src="https://i.imgur.com/2eD0xhO.png" style="width:700px;height:400px; float: left;">
</div>

<div id="dr">
    <img src="https://i.imgur.com/4JG20BB.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/o3i8MrL.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/m3OdTLk.png" style="width:700px;height:400px; float: left;">
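
As a companion to the slides, the standard derivative formulas for three common activations (sigmoid, tanh, ReLU) can be checked numerically against a central finite difference. This is a hedged sketch: the helper names are hypothetical, and setting the ReLU derivative to 0 at $z = 0$ is a convention chosen here, not something the notes specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1.0 - a)             # sigma'(z) = a (1 - a)

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2     # tanh'(z) = 1 - tanh(z)^2

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(float)     # 1 for z > 0, else 0 (convention at z = 0)

def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.array([-2.0, -0.5, 0.5, 2.0])  # avoids z = 0, where ReLU has a kink
checks = [
    ("sigmoid", sigmoid, sigmoid_prime),
    ("tanh", np.tanh, tanh_prime),
    ("relu", relu, relu_prime),
]
for name, f, f_prime in checks:
    print(name, np.allclose(f_prime(z), numeric_grad(f, z), atol=1e-6))
```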

<div id="gd">
    <img src="https://i.imgur.com/corP5Jo.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/okskvtk.png" style="width:700px;height:400px; float: left;">

<div id="back">
    <img src="https://i.imgur.com/AsZoj55.png" style="width:700px;height:400px; float: left;">
</div>

<img src="https://i.imgur.com/RCMV3uW.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/yroXRKB.png" style="width:700px;height:400px; float: left;">

<img src="https://i.imgur.com/JNR4sYx.png" style="width:700px;height:400px; float: left;">

$$ z = w^Tx + b \rightarrow a = \sigma(z) \rightarrow \mathcal{L}(a, y) $$
$$ dz = da\cdot g{'}(z) = a - y $$
$$ dw = dz\cdot x $$
$$ db = dz $$
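
The step $dz = a - y$ can be worked out explicitly. The derivation below assumes the usual cross-entropy loss for logistic regression, which is not written out in the equations above:

$$ \mathcal{L}(a, y) = -\big(y\log a + (1-y)\log(1-a)\big) $$
$$ da = \frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} $$
$$ g{'}(z) = \sigma{'}(z) = a(1-a) $$
$$ dz = da\cdot g{'}(z) = -y(1-a) + (1-y)a = a - y $$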

$$ z^{[1]} = W^{[1]}x + b^{[1]} \rightarrow a^{[1]} = g^{[1]}(z^{[1]}) \rightarrow z^{[2]} = W^{[2]}a^{[1]} + b^{[2]} \rightarrow a^{[2]} = g^{[2]}(z^{[2]}) = \sigma(z^{[2]}) $$
$$ dz^{[2]} = a^{[2]} - y $$
$$ dW^{[2]} = dz^{[2]}{a^{[1]}}^T$$
$$ db^{[2]} = dz^{[2]} $$
$$ dz^{[1]} = {W^{[2]}}^Tdz^{[2]}*g^{[1]'}(z^{[1]}) $$
$$ dW^{[1]} = dz^{[1]}x^T $$
$$ db^{[1]} = dz^{[1]} $$<br>

$$ dZ^{[2]} = A^{[2]} - Y $$
$$ dW^{[2]} = \frac{1}{m}dZ^{[2]}{A^{[1]}}^T $$
$$ db^{[2]} = \frac{1}{m}np.sum(dZ^{[2]},\ axis=1,\ keepdims=True) $$
$$ dZ^{[1]} = {W^{[2]}}^T dZ^{[2]}*g^{[1]'}(Z^{[1]}) $$
$$ dW^{[1]} = \frac{1}{m}dZ^{[1]}X^T $$
$$ db^{[1]} = \frac{1}{m}np.sum(dZ^{[1]},\ axis=1,\ keepdims=True) $$
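
The six vectorized gradient formulas translate almost line for line into NumPy. The sketch below is one possible rendering, not a reference implementation: it assumes tanh for $g^{[1]}$ (so $g^{[1]'}(Z^{[1]}) = 1 - (A^{[1]})^2$), a cross-entropy cost averaged over the $m$ examples, and hypothetical helper names (`forward`, `cost`, `backward`). A finite-difference check on one weight is included as a sanity test.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])      # assumed hidden activation
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])
    return A1, A2

def cost(A2, Y):
    """Cross-entropy cost averaged over the m examples (assumed loss)."""
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward(X, Y, p):
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)  # element-wise product with tanh'(Z1)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

# Placeholder data: 3 features, 4 hidden units, 6 examples
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 6))
Y = (rng.random((1, 6)) > 0.5).astype(float)
params = {
    "W1": rng.standard_normal((4, 3)) * 0.5, "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((1, 4)) * 0.5, "b2": np.zeros((1, 1)),
}
grads = backward(X, Y, params)

# Finite-difference check on a single entry of W1
eps = 1e-6
plus = {k: v.copy() for k, v in params.items()}
minus = {k: v.copy() for k, v in params.items()}
plus["W1"][0, 0] += eps
minus["W1"][0, 0] -= eps
numeric = (cost(forward(X, plus)[1], Y) - cost(forward(X, minus)[1], Y)) / (2 * eps)
print(np.isclose(grads["W1"][0, 0], numeric, atol=1e-6))
```

Each gradient has the same shape as the parameter it updates, which is why `keepdims=True` is used for the bias gradients.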

<div id="rand">
    <img src="https://i.imgur.com/E4T0M44.png" style="width:700px;height:400px; float: left;">
</div>

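If the weights are initialized symmetrically (for example, all zeros), every hidden unit computes the same function, so the activations of the hidden units become identical.

As a result, $ W^{[1]} $ as well as $ dW^{[1]} $ will have identical rows after every gradient descent step, and the hidden units never learn different features.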
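
The symmetry problem can be reproduced in a few lines. The sketch below initializes every weight to the same constant (0.5 is an arbitrary choice; all zeros is the special case where the hidden-layer gradients are also exactly zero), runs a few gradient descent steps, and checks that the two rows of $W^{[1]}$ never diverge. The data, learning rate, and tanh hidden activation are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
X = rng.standard_normal((2, 8))                 # 2 features, 8 examples
Y = (rng.random((1, 8)) > 0.5).astype(float)
m = X.shape[1]

# Symmetric initialization: every weight has the same value
W1 = np.full((2, 2), 0.5)
b1 = np.zeros((2, 1))
W2 = np.full((1, 2), 0.5)
b2 = np.zeros((1, 1))

learning_rate = 0.1
for _ in range(10):
    # Forward pass
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # Gradient descent update
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(np.allclose(A1[0], A1[1]))    # hidden activations stay identical
print(np.allclose(dW1[0], dW1[1]))  # gradient rows stay identical
print(np.allclose(W1[0], W1[1]))    # weight rows stay identical
```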

<img src="https://i.imgur.com/qWQGqzS.png" style="width:700px;height:400px; float: left;">

Random Initialization:

$ W^{[1]} = np.random.randn(2,2)*0.01 $<br>
$ b^{[1]} = np.zeros((2,1)) $<br>
$ W^{[2]} = np.random.randn(1,2)*0.01 $<br>
$ b^{[2]} = 0 $
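
The initialization above can be wrapped in a small helper that works for any layer sizes. This is a sketch under stated assumptions: `initialize_parameters` and its arguments are hypothetical names, and it uses NumPy's `default_rng` generator with a seed for reproducibility instead of the `np.random.randn` call shown above.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, scale=0.01, seed=0):
    """Small random weights break symmetry; biases can safely start at zero."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_h, n_x)) * scale,
        "b1": np.zeros((n_h, 1)),
        "W2": rng.standard_normal((n_y, n_h)) * scale,
        "b2": np.zeros((n_y, 1)),
    }

# Same sizes as the example above: 2 inputs, 2 hidden units, 1 output
params = initialize_parameters(n_x=2, n_h=2, n_y=1)
for name, value in params.items():
    print(name, value.shape)
# W1 (2, 2)
# b1 (2, 1)
# W2 (1, 2)
# b2 (1, 1)
```

The factor 0.01 keeps the initial $z$ values small, so sigmoid or tanh units start in the region where their slope is large rather than in the flat, saturated tails.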

```python
import numpy as np

# Sum across each row of a (4, 3) matrix; keepdims=True keeps the
# result as a (4, 1) column vector instead of a rank-1 array of shape (4,)
A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)
print(A)
print(B)
print(A.shape)
print(B.shape)
```

```
[[-0.30337578 -0.34457447 -1.18786543]
 [-0.46428449 -1.08368126 -0.35498015]
 [ 0.61267569  1.32231855  0.43636428]
 [-1.45285995 -1.52168787  0.66469301]]
[[-1.83581568]
 [-1.90294591]
 [ 2.37135852]
 [-2.30985481]]
(4, 3)
(4, 1)
```
