# Neural Networks - Representation

## 1. Model Representation

Neural networks were originally developed to simulate networks of neurons in the brain. To understand how these hypotheses are represented, let's first look at how a single neuron in the brain works:

![Biological neuron](https://upload.wikimedia.org/wikipedia/commons/4/44/Neuron3.png)

The components of the neuron are:

- A cell body;
- Input wires (dendrites);
- Output wire (axon).

The axon often connects to the dendrites of other neurons, forming a network.

The neurons communicate via pulses of electricity.

### Neuron model: Logistic unit

Given the above, we're going to use a very simple model of the neuron:

![single neuron](figures/single_neuron.png)

where 

$$
h_{\theta}(x) = \frac{1}{1 + e^{-\theta^T x}} = g(\theta^T x)
$$

with $x=[x_0 \quad x_1 \quad x_2 \quad x_3]^T$ and $\theta=[\theta_0 \quad \theta_1 \quad \theta_2 \quad \theta_3]^T$, where $x_0 = 1$ is the bias unit.

Here, we use the sigmoid (logistic) function as the **activation function**. Other functions, such as $\tanh(\cdot)$ or $\mathrm{ReLU}(\cdot)=\max\{0, \cdot\}$, are also commonly used.
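As a minimal sketch of the logistic unit above, the following NumPy snippet evaluates $h_{\theta}(x) = g(\theta^T x)$. The function names `sigmoid` and `logistic_unit` and the numeric values are assumptions chosen only for illustration, and the input vector is assumed to already contain the bias entry $x_0 = 1$.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(theta, x):
    """Output of a single neuron; x is assumed to include the bias entry x_0 = 1."""
    return sigmoid(theta @ x)

# Hypothetical parameter and input values, chosen only for illustration.
theta = np.array([-1.0, 0.5, 0.25, 0.0])
x = np.array([1.0, 2.0, 0.0, 3.0])  # x_0 = 1 is the bias input

h = logistic_unit(theta, x)  # theta^T x = 0 here, so h = g(0) = 0.5
```

Swapping `sigmoid` for `np.tanh` or `lambda z: np.maximum(0, z)` would give the other activation functions mentioned above.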

### Neural network

A neural network is a group of these neurons acting together:

![neural network](figures/neural_network.png)

In this network:

- $a_i^{(j)}$ is the activation of unit $i$ in layer $j$.
- $\Theta^{(j)}$ is the matrix of weights controlling the function mapping from layer $j$ to layer $j+1$.

Thus:

\begin{align}
a_1^{(2)} &= g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3) \\
a_2^{(2)} &= g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3) \\
a_3^{(2)} &= g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3) \\
& \\
h_{\Theta}(x) &= a_1^{(3)} = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})
\end{align}

In this setting $\Theta^{(1)} \in \mathbb{R}^{3 \times 4}$ and $\Theta^{(2)} \in \mathbb{R}^{1 \times 4}$.

In general, if a network has $s_j$ units in layer $j$, and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (1 + s_j)$.
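The dimension rule can be checked with a short sketch. The helper `theta_shapes` is a hypothetical name introduced here for illustration; it takes the number of units per layer (excluding bias units) and returns the shape of each $\Theta^{(j)}$.

```python
def theta_shapes(layer_sizes):
    """Shape of Theta^(j) for each pair of consecutive layers.

    layer_sizes[j] is the number of units s_j in a layer, not counting the bias unit.
    """
    return [
        (s_next, 1 + s_cur)
        for s_cur, s_next in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

# The network above: 3 inputs, 3 hidden units, 1 output unit.
shapes = theta_shapes([3, 3, 1])  # [(3, 4), (1, 4)]
```

The extra column in each matrix holds the weights applied to the bias unit of the source layer.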

In the above setting, we can define intermediate variables 

$$
z^{(j+1)}_i = \Theta_{i0}^{(j)} a_0^{(j)} + \Theta_{i1}^{(j)} a_1^{(j)} + \Theta_{i2}^{(j)} a_2^{(j)} + \Theta_{i3}^{(j)} a_3^{(j)},
$$

and in terms of $z^{(j+1)}_i$ we can define $a^{(j+1)}_i$ as:

$$
a^{(j+1)}_i = g(z^{(j+1)}_i).
$$

Moreover, we can write the above in an efficient vectorized form:

\begin{align}
a^{(1)} &= x \\
z^{(2)} &= \Theta^{(1)} \left[\begin{array}{c} 1 \\ a^{(1)} \end{array}\right] \\
a^{(2)} &= g(z^{(2)}) \\
z^{(3)} &= \Theta^{(2)} \left[\begin{array}{c} 1 \\ a^{(2)} \end{array}\right] \\
a^{(3)} &= g(z^{(3)}),
\end{align}

where

$$
x = \left[\begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array}\right], \qquad z^{(2)} = \left[\begin{array}{c} z^{(2)}_1 \\ z^{(2)}_2 \\ z^{(2)}_3 \end{array}\right], \qquad a^{(2)} = \left[\begin{array}{c} a^{(2)}_1 \\ a^{(2)}_2 \\ a^{(2)}_3 \end{array}\right], \qquad z^{(3)} = z^{(3)}_1, \qquad a^{(3)} = a^{(3)}_1,
$$

and

$$
\Theta^{(1)} = \left[
\begin{array}{cccc}
\Theta_{10}^{(1)} & \Theta_{11}^{(1)} & \Theta_{12}^{(1)} & \Theta_{13}^{(1)}  \\
\Theta_{20}^{(1)} & \Theta_{21}^{(1)} & \Theta_{22}^{(1)} & \Theta_{23}^{(1)}  \\
\Theta_{30}^{(1)} & \Theta_{31}^{(1)} & \Theta_{32}^{(1)} & \Theta_{33}^{(1)} 
\end{array}\right],
\qquad 
\Theta^{(2)} = \left[
\begin{array}{cccc}
\Theta_{10}^{(2)} & \Theta_{11}^{(2)} & \Theta_{12}^{(2)} & \Theta_{13}^{(2)} 
\end{array}\right].
$$

This algorithm is called **forward propagation**.
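The vectorized equations above can be sketched in NumPy as follows. This is an illustrative sketch, not a reference implementation: the function name `forward_propagation` and the randomly drawn weights are assumptions, since the notes do not specify values for $\Theta^{(1)}$ and $\Theta^{(2)}$.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(x, thetas):
    """Compute h_Theta(x) for an input x given WITHOUT its bias entry.

    thetas is a list [Theta1, Theta2, ...], where Theta_j has shape
    (s_{j+1}, 1 + s_j).
    """
    a = np.asarray(x, dtype=float)                # a^(1) = x
    for theta in thetas:
        a_with_bias = np.concatenate(([1.0], a))  # prepend the bias unit
        z = theta @ a_with_bias                   # z^(j+1) = Theta^(j) [1; a^(j)]
        a = sigmoid(z)                            # a^(j+1) = g(z^(j+1))
    return a

# Hypothetical weights for the 3-3-1 network above, drawn at random for illustration.
rng = np.random.default_rng(0)
theta1 = rng.normal(size=(3, 4))
theta2 = rng.normal(size=(1, 4))

x = np.array([0.5, -1.0, 2.0])
h = forward_propagation(x, [theta1, theta2])  # array of shape (1,), value in (0, 1)
```

Because the loop only relies on the shape rule $s_{j+1} \times (1 + s_j)$, the same function works for networks with more layers.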

<footer id="attribution" style="float:right; color:#808080; background:#fff;">
Created with Jupyter by Esteban Jiménez Rodríguez. Based on the content of the Machine Learning course offered through Coursera by Prof. Andrew Ng.
</footer>