# **Module 3: Shallow Neural Network**

## Section I Neural Network Representation and Vectorization

#### 1. Neural network representation

##### (1) One hidden layer neural network

![Single Hidden Layer](resource%20database%20for%20MD%20notes/Week3/SingleHiddenLayer.jpg)
- **Definition**: A neural network with exactly one hidden layer between its input layer and its output layer (also called a **2-layer network**, because the input layer is not counted)
- **Terms**:
    - Input layer (*layer 0*): $x_1, x_2, x_3 \Rightarrow X = a^{[0]}$
    - Hidden layer (*layer 1*): $a^{[1]}_1,a^{[1]}_2,a^{[1]}_3,a^{[1]}_4 \Rightarrow a^{[1]}$
        - *Hidden* indicates that the true values of these nodes are **not observed** in the training set.
    - Output layer (*layer 2*): $\hat{y} = a^{[2]}$

##### (2) Computing a neural network's output on a *single* sample

- For each node of *layer 1* $a_i^{[1]}$:
$$\boxed{z_i^{[1]} = \textbf{w}_i^{[1]T}\textbf{x} + b_i^{[1]},\quad a_i^{[1]} = \sigma(z_i^{[1]})}$$
where: *superscript* $^{[1]}$ indicates the *layer* number, and *subscript* $_i$ indicates the *node* number within that *layer*
- For the full *layer 1*:

$$\boxed{\left\{\begin{array}{c}
z_1^{[1]} = \textbf{w}_1^{[1]T}\textbf{x} + b_1^{[1]}\\
z_2^{[1]} = \textbf{w}_2^{[1]T}\textbf{x} + b_2^{[1]}\\
z_3^{[1]} = \textbf{w}_3^{[1]T}\textbf{x} + b_3^{[1]}\\
z_4^{[1]} = \textbf{w}_4^{[1]T}\textbf{x} + b_4^{[1]}
\end{array}\right\}\Rightarrow \left\{\begin{array}{c}
a_1^{[1]} = \sigma(z_1^{[1]})\\
a_2^{[1]} = \sigma(z_2^{[1]})\\
a_3^{[1]} = \sigma(z_3^{[1]})\\
a_4^{[1]} = \sigma(z_4^{[1]})
\end{array}\right\}}$$
- **Vectorization of the calculation**:
$$\boxed{\left[\begin{array}{c}
z_1^{[1]}\\
z_2^{[1]}\\
z_3^{[1]}\\
z_4^{[1]}
\end{array}\right] =  
\left[\begin{array}{ccc}
(w_1^{[1]})_{x_1} & (w_1^{[1]})_{x_2} & (w_1^{[1]})_{x_3}\\
(w_2^{[1]})_{x_1} & (w_2^{[1]})_{x_2} & (w_2^{[1]})_{x_3}\\
(w_3^{[1]})_{x_1} & (w_3^{[1]})_{x_2} & (w_3^{[1]})_{x_3}\\
(w_4^{[1]})_{x_1} & (w_4^{[1]})_{x_2} & (w_4^{[1]})_{x_3}\\
\end{array}\right]
\left[\begin{array}{c}
x_1\\
x_2\\
x_3
\end{array}\right] + \left[\begin{array}{c}
b_1^{[1]}\\
b_2^{[1]}\\
b_3^{[1]}\\
b_4^{[1]}
\end{array}\right]}$$
$$\boxed{\textbf{z}^{[1]} = \textbf{w}^{[1]T}\textbf{x} + \textbf{b}^{[1]}}$$
where: $\textbf{z}^{[1]}\mathrm{.shape} = (4,1), \textbf{w}^{[1]T}\mathrm{.shape} = (4,3), \textbf{x}\mathrm{.shape} = (3,1), \textbf{b}^{[1]}\mathrm{.shape} = (4,1)$
$$\boxed{\left[\begin{array}{c}
a_1^{[1]}\\
a_2^{[1]}\\
a_3^{[1]}\\
a_4^{[1]}
\end{array}\right] = \sigma\left[\begin{array}{c}
z_1^{[1]}\\
z_2^{[1]}\\
z_3^{[1]}\\
z_4^{[1]}
\end{array}\right]}$$
$$\boxed{\textbf{a}^{[1]} = \sigma(\textbf{z}^{[1]})}$$
where: $\textbf{a}^{[1]}\mathrm{.shape} = \textbf{z}^{[1]}\mathrm{.shape} = (4,1)$
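
The equivalence between the node-by-node equations and the vectorized form of *layer 1* can be checked numerically. The following is a minimal NumPy sketch, not a definitive implementation: the names `sigmoid`, `W1`, and `b1` and the random values are assumptions made for illustration. `W1` is assumed to store $\textbf{w}^{[1]}$ with shape (3, 4), one column per node, so that `W1.T` corresponds to $\textbf{w}^{[1]T}$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, k_1 = 3, 4                        # 3 inputs, 4 hidden nodes, as in the figure
x = rng.standard_normal((n_x, 1))      # one sample as a column vector
W1 = rng.standard_normal((n_x, k_1))   # column i holds w_i^{[1]} (assumed layout)
b1 = rng.standard_normal((k_1, 1))

# Node by node: z_i = w_i^T x + b_i
z_loop = np.zeros((k_1, 1))
for i in range(k_1):
    w_i = W1[:, i:i + 1]               # shape (3, 1)
    z_loop[i, 0] = (w_i.T @ x).item() + b1[i, 0]

# Vectorized: z = W^T x + b, then a = sigma(z)
z_vec = W1.T @ x + b1
a_vec = sigmoid(z_vec)
```

Both `z_loop` and `z_vec` have shape (4, 1) and hold the same values up to floating-point rounding.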
- For the full *layer 2*:
$$\boxed{z^{[2]} = \textbf{w}^{[2]T}\textbf{a}^{[1]} + b^{[2]}\Rightarrow a^{[2]} = \sigma(z^{[2]})}$$
where: $\textbf{w}^{[2]T}\mathrm{.shape} = (1,4), \textbf{a}^{[1]}\mathrm{.shape} = (4,1)$
- **Vectorization of the calculation**:
$$\boxed{z^{[2]} = \left[\begin{array}{cccc}
w_1^{[2]} & w_2^{[2]} & w_3^{[2]} & w_4^{[2]}\end{array}\right]
\left[\begin{array}{c}
a_1^{[1]}\\
a_2^{[1]}\\
a_3^{[1]}\\
a_4^{[1]}
\end{array}\right]+b^{[2]}}$$
- Full forward pass for a single sample:
$$\textbf{z}^{[1]} = \textbf{w}^{[1]T}\textbf{x} + \textbf{b}^{[1]}$$
$$\textbf{a}^{[1]} = \sigma(\textbf{z}^{[1]})$$
$$z^{[2]} = \textbf{w}^{[2]T}\textbf{a}^{[1]} + b^{[2]}$$
$$a^{[2]} = \sigma(z^{[2]})$$
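
The four equations above map directly onto four lines of NumPy. This is a hedged sketch under stated assumptions: the function name `forward_single`, the parameter names, and the small random initialization are hypothetical and chosen only for illustration. Weights are stored untransposed (`W1` with shape (3, 4), `W2` with shape (4, 1)) to match the $\textbf{w}^{[l]T}$ notation used in these notes.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_single(x, W1, b1, W2, b2):
    """Forward pass for one sample.

    x: (n_x, 1), W1: (n_x, k_1), b1: (k_1, 1), W2: (k_1, 1), b2: (1, 1).
    Returns a2 = y_hat with shape (1, 1).
    """
    z1 = W1.T @ x + b1      # (k_1, 1)
    a1 = sigmoid(z1)        # (k_1, 1)
    z2 = W2.T @ a1 + b2     # (1, 1)
    a2 = sigmoid(z2)        # (1, 1)
    return a2

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 1))
W1 = rng.standard_normal((3, 4)) * 0.01   # illustrative values only
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((4, 1)) * 0.01
b2 = np.zeros((1, 1))

y_hat = forward_single(x, W1, b1, W2, b2)
```

With all weights and biases set to zero, every $z$ is zero and the output is exactly $\sigma(0) = 0.5$, which is a convenient sanity check.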

##### (3) Vectorizing across *multiple* examples

- For *m* samples:
    - Notation for a *node*: $a_l^{[j](i)}$  
where: *j* is the index of the *layer*, *i* is the index of the *sample*, and *l* is the index of the *node* within that layer
    - *For* loop for all samples:
        - *for i = 1 to m*:
            - $z^{[1](i)} = w^{[1]T}x^{(i)}+b^{[1]}$
            - $a^{[1](i)} = \sigma(z^{[1](i)})$
            - $z^{[2](i)} = w^{[2]T}a^{[1](i)}+b^{[2]}$
            - $a^{[2](i)} = \sigma(z^{[2](i)})$
- **Vectorization of the calculation**:
$$\boxed{\textbf{X} = \left[\begin{array}{cccc}
x_1^{(1)} & x_1^{(2)} & ... & x_1^{(m)}\\
x_2^{(1)} & x_2^{(2)} & ... & x_2^{(m)}\\
\vdots & \vdots & \vdots & \vdots \\
x_{n_x}^{(1)} & x_{n_x}^{(2)} & ... & x_{n_x}^{(m)}
\end{array}\right], \textbf{X}.\mathrm{shape} = (n_x,m)}$$
$$\boxed{\textbf{w} = \left[\begin{array}{cccc}
w_1^{(1)} & w_1^{(2)} & ... & w_1^{(k_1)}\\
w_2^{(1)} & w_2^{(2)} & ... & w_2^{(k_1)}\\
\vdots & \vdots & \vdots & \vdots \\
w_{n_x}^{(1)} & w_{n_x}^{(2)} & ... & w_{n_x}^{(k_1)}
\end{array}\right], \textbf{w}.\mathrm{shape} = (n_x,k_1)}$$
By stacking the samples $x^{(1)}, \ldots, x^{(m)}$ as the columns of $\textbf{X}$, we can compute the equations for all *m* samples at once:
$$\textbf{Z}^{[1]} = \textbf{w}^{[1]T}\textbf{X} + \textbf{b}^{[1]}$$
$$\textbf{A}^{[1]} = \sigma(\textbf{Z}^{[1]})$$
where: $\textbf{A}^{[1]}.\mathrm{shape} = \textbf{Z}^{[1]}.\mathrm{shape} = (k_1,m), \textbf{w}^{[1]T}.\mathrm{shape} = (k_1,n_x),\textbf{X}.\mathrm{shape} = (n_x,m),\textbf{b}^{[1]}.\mathrm{shape} = (k_1,m)\ (\mathrm{broadcast}\ \mathrm{from}\ (k_1,1))$, and $k_1$ is the total number of *nodes* in *layer* 1 (i.e., 4 in this example).

$$\textbf{Z}^{[2]} = \textbf{w}^{[2]T}\textbf{A}^{[1]} + b^{[2]}$$
$$\textbf{A}^{[2]} = \sigma(\textbf{Z}^{[2]})$$
where: $\textbf{A}^{[2]}.\mathrm{shape} = \textbf{Z}^{[2]}.\mathrm{shape} = (1,m), \textbf{w}^{[2]T}.\mathrm{shape} = (1,k_1),\textbf{A}^{[1]}.\mathrm{shape} = (k_1,m),b^{[2]}.\mathrm{shape} = (1,m)\ (\mathrm{broadcast}\ \mathrm{from}\ (1,1))$, assuming *layer* 2 is the *output layer*, where $\textbf{A}^{[2]} = \hat{\textbf{Y}}$
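
The vectorized equations can be compared against the explicit *for* loop over samples. The sketch below is an illustration under assumptions, not a reference implementation: `forward_batch`, the parameter names, and the random data are hypothetical. The biases keep their stored shapes (`b1` is (k_1, 1), `b2` is (1, 1)) and NumPy broadcasts them across the *m* columns during the addition.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_batch(X, W1, b1, W2, b2):
    """Forward pass for all m samples at once; X has shape (n_x, m)."""
    Z1 = W1.T @ X + b1     # (k_1, m); b1 (k_1, 1) is broadcast over columns
    A1 = sigmoid(Z1)       # (k_1, m)
    Z2 = W2.T @ A1 + b2    # (1, m);   b2 (1, 1) is broadcast over columns
    A2 = sigmoid(Z2)       # (1, m)
    return A1, A2

rng = np.random.default_rng(2)
n_x, k_1, m = 3, 4, 5
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_x, k_1))
b1 = rng.standard_normal((k_1, 1))
W2 = rng.standard_normal((k_1, 1))
b2 = rng.standard_normal((1, 1))

A1, A2 = forward_batch(X, W1, b1, W2, b2)

# Same computation with an explicit loop over the m samples
A2_loop = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i + 1]                          # sample i as a column, (n_x, 1)
    a1_i = sigmoid(W1.T @ x_i + b1)              # (k_1, 1)
    A2_loop[:, i:i + 1] = sigmoid(W2.T @ a1_i + b2)
```

Column *i* of `A2` is the prediction for sample *i*, so `A2` and `A2_loop` agree up to floating-point rounding.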
- **Further explanation for vectorized implementation**:
$$\textbf{w}^{[1]T}\textbf{X} = \left[\begin{array}{cccc}
w_1^{[1](1)} & w_2^{[1](1)} & ... & w_{n_x}^{[1](1)}\\
w_1^{[1](2)} & w_2^{[1](2)} & ... & w_{n_x}^{[1](2)}\\
\vdots & \vdots & \vdots & \vdots \\
w_1^{[1](k_1)} & w_2^{[1](k_1)} & ... & w_{n_x}^{[1](k_1)}
\end{array}\right]\left[\begin{array}{cccc}
x_1^{(1)} & x_1^{(2)} & ... & x_1^{(m)}\\
x_2^{(1)} & x_2^{(2)} & ... & x_2^{(m)}\\
\vdots & \vdots & \vdots & \vdots \\
x_{n_x}^{(1)} & x_{n_x}^{(2)} & ... & x_{n_x}^{(m)}
\end{array}\right] = \left[\begin{array}{cccc}
\sum\limits_{i=1}^{n_x}w_i^{[1](1)}x_i^{(1)} & \sum\limits_{i=1}^{n_x}w_i^{[1](1)}x_i^{(2)} & ... & \sum\limits_{i=1}^{n_x}w_i^{[1](1)}x_i^{(m)}\\
\sum\limits_{i=1}^{n_x}w_i^{[1](2)}x_i^{(1)} & \sum\limits_{i=1}^{n_x}w_i^{[1](2)}x_i^{(2)} & ... & \sum\limits_{i=1}^{n_x}w_i^{[1](2)}x_i^{(m)}\\
\vdots & \vdots & \vdots & \vdots\\
\sum\limits_{i=1}^{n_x}w_i^{[1](k_1)}x_i^{(1)} & \sum\limits_{i=1}^{n_x}w_i^{[1](k_1)}x_i^{(2)} & ... & \sum\limits_{i=1}^{n_x}w_i^{[1](k_1)}x_i^{(m)}\\
\end{array}\right]$$
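
The entry-wise expansion above can be verified for one arbitrary entry. This is a small sketch with assumed names (`W1`, `X`, `P`) and random data: entry $(k, j)$ of the product is the sum over the $n_x$ features of node *k*'s weights times sample *j*'s features, and column *j* of the product is $\textbf{w}^{[1]T}x^{(j)}$. Indices in the code are 0-based, unlike the 1-based indices in the formula.

```python
import numpy as np

rng = np.random.default_rng(3)
n_x, k_1, m = 3, 4, 5
W1 = rng.standard_normal((n_x, k_1))   # column k holds the weights of node k
X = rng.standard_normal((n_x, m))      # column j holds sample j

P = W1.T @ X                           # product matrix, shape (k_1, m)

# One entry written out as the explicit sum over features
k, j = 2, 3                            # arbitrary node and sample (0-based)
entry = sum(W1[i, k] * X[i, j] for i in range(n_x))

# One column of the product equals W^T applied to a single sample
col_j = W1.T @ X[:, j:j + 1]           # shape (k_1, 1)
```

Here `P[k, j]` matches `entry`, and `P[:, j:j + 1]` matches `col_j`, which is why a single matrix product replaces the loop over samples.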