## Linear Algebra in Neural Networks

The design of the artificial neural network was inspired by biological neural networks. The neurons used in the artificial network below are essentially mathematical functions.

Each network has:
* Input neurons, which we refer to as the input layer of neurons.
* Output neurons, which we refer to as the output layer of neurons.
* Internal neurons, which we refer to as the hidden layer of neurons. A neural network can have many hidden layers.

![](images/simple_neural_network.png)

This simplified artificial neural network is composed of:
* An input vector $\vec{x} = [x_1, x_2, x_3, \dots, x_n]$
* A hidden layer vector $\vec{h} = [h_1, h_2, h_3, \dots, h_m]$
* An output vector $\vec{y} = [y_1, y_2, y_3, \dots, y_k]$

Each element in the vectors is a mathematical argument.
The number of inputs, the number of neurons in the hidden layer, and the number of outputs are independent of one another.


Lines connecting the different layers:
* In practice, each line symbolizes a coefficient (a scalar) that mathematically connects one neuron to the next. These coefficients are called *__weights__*.
* The lines connect each neuron in a given layer to *__all__* of the neurons in the following layer. For example, in our network, each neuron in the hidden layer is connected to every neuron in the output layer.

Since there are so many *__weights__* connecting one layer to the next, we organize those coefficients in a matrix, called the *__weight matrix__*.

![](images/weight_matrix.png)

$W^k$ is weight matrix $k$

$W^k_{ij}$ is the $ij$ element of weight matrix $k$
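As a rough illustration of how the weights between two layers fit into a single matrix, the sketch below builds a weight matrix for a hypothetical network with 3 inputs and 2 hidden neurons. The layer sizes, the weight values, and the names `n_inputs`, `n_hidden` and `W1` are assumptions made for this example. It also assumes the row index is the source neuron and the column index is the destination neuron.

```python
# Hypothetical layer sizes, chosen only for illustration.
n_inputs = 3   # number of input neurons
n_hidden = 2   # number of hidden neurons

# One weight per (input neuron i, hidden neuron j) pair, so the matrix
# has n_inputs rows and n_hidden columns. The values are placeholders.
W1 = [
    [0.1, 0.4],   # weights leaving input neuron 1
    [0.2, 0.5],   # weights leaving input neuron 2
    [0.3, 0.6],   # weights leaving input neuron 3
]

rows, cols = len(W1), len(W1[0])

# Element ij = 21 connects input neuron 2 to hidden neuron 1.
# Python lists are 0-indexed, so it lives at W1[1][0].
w_21 = W1[1][0]

print(rows, cols, w_21)
```

Because the two layer sizes are independent, changing `n_hidden` only changes the number of columns, and changing `n_inputs` only changes the number of rows.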


![](images/activation_function.png)

When working with neural networks we have two primary phases:
* Training
* Evaluation

During the training phase, we take the data set (also called the training set), which includes many pairs of inputs and their corresponding targets (outputs). Our goal is to find a set of weights that best maps the inputs to the desired outputs.

In the evaluation phase, we use the network that was created in the training phase, apply our new inputs, and expect to obtain the desired outputs.

The training phase will include two steps:
* Feedforward
* Backpropagation

We repeat these steps as many times as needed, until we decide that our system has reached the best set of weights, giving us the best possible outputs.


### The Feedforward Process - Finding $\vec{h}$ - STEP 1

In this section we will look closely at the math behind the feedforward process. With the use of basic Linear Algebra tools, these calculations are pretty simple!
Assuming that we have a single hidden layer, we will need two steps in our calculations. The first is calculating the values of the hidden states, and the second is calculating the values of the outputs.

![](images/finding_h.png)

Both the hidden layer and the output layer are displayed as vectors, as they are both represented by more than a single neuron.

![](images/step1_finding_h.png)

Vector $\vec{h}$ of the hidden layer will be calculated by multiplying the input vector by the weight matrix $W^1$ in the following way:

![](images/finding_h_simple.png)

Using vector by matrix multiplication, we can look at this computation the following way:

![](images/finding_h_matrix_multi.png)

After finding $\vec{h^1}$ we need an activation function. The symbol we use for the activation function is the Greek letter phi: $\phi$

This activation function finalizes the computation of the hidden layer's values.

We can use the following two equations to express the final hidden vector $\vec{h}$:

$\vec{h} = \phi(\vec{x}W^1)$

or 

$\vec{h} = \phi(\vec{h^1})$
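The two equations above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. The text does not say which function $\phi$ is, so the logistic sigmoid here is an assumption. The helper names `sigmoid`, `vec_mat` and `hidden_layer` are hypothetical, as are the input and weight values.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as an assumed example of phi."""
    return 1.0 / (1.0 + math.exp(-z))

def vec_mat(v, W):
    """Row vector times matrix: result[j] = sum over i of v[i] * W[i][j]."""
    return [sum(v_i * row[j] for v_i, row in zip(v, W))
            for j in range(len(W[0]))]

def hidden_layer(x, W1, phi=sigmoid):
    """Compute h = phi(x W1) in two stages, mirroring the two equations."""
    h_pre = vec_mat(x, W1)            # pre-activation values, x W^1
    return [phi(v) for v in h_pre]    # apply phi element by element

# Hypothetical input (3 values) and weights (3 inputs x 2 hidden neurons).
x = [1.0, 0.0, -1.0]
W1 = [[0.5, -0.5],
      [0.3,  0.8],
      [0.5,  0.5]]

h = hidden_layer(x, W1)
print(h)
```

In practice a numerical library would normally replace the hand-written `vec_mat`. The pure-Python version is used here only to keep every multiplication and sum visible.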

Since $W_{ij}$ represents the weight component in the weight matrix connecting neuron $i$ of the input layer to neuron $j$ of the hidden layer, we can also write these calculations as a *__linear combination__*
(in this example we have $n$ inputs and only 3 hidden neurons):

$h_1 = \phi(x_1W_{11} + x_2W_{21} + \dots + x_nW_{n1})$

$h_2 = \phi(x_1W_{12} + x_2W_{22} + \dots + x_nW_{n2})$

$h_3 = \phi(x_1W_{13} + x_2W_{23} + \dots + x_nW_{n3})$
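To check that the linear combinations above agree with the matrix form, the sketch below writes out each $h_j$ term by term for a hypothetical case with $n = 4$ inputs and 3 hidden neurons, then compares the result with a general index-based computation. The sigmoid used for $\phi$ and all numeric values are assumptions made for this example.

```python
import math

def phi(z):
    """Assumed activation for this example: the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: n = 4 inputs, 3 hidden neurons.
x = [0.5, -1.0, 2.0, 1.5]
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]

# Each hidden value written out as an explicit linear combination.
# W[i][j] connects input i+1 to hidden neuron j+1 (Python is 0-indexed).
h1 = phi(x[0] * W[0][0] + x[1] * W[1][0] + x[2] * W[2][0] + x[3] * W[3][0])
h2 = phi(x[0] * W[0][1] + x[1] * W[1][1] + x[2] * W[2][1] + x[3] * W[3][1])
h3 = phi(x[0] * W[0][2] + x[1] * W[1][2] + x[2] * W[2][2] + x[3] * W[3][2])
h_explicit = [h1, h2, h3]

# The same computation written once for any j, as in the matrix form.
h_general = [phi(sum(x[i] * W[i][j] for i in range(len(x))))
             for j in range(len(W[0]))]

# Compare the two results within floating-point tolerance.
same = all(abs(a - b) < 1e-12 for a, b in zip(h_explicit, h_general))
print(same)
```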




### The Feedforward Process - Finding $\vec{y}$ - STEP 2

The process of calculating the output vector is mathematically similar to that of calculating the vector of the hidden layer. We use, again, a vector by matrix multiplication. The vector is the newly calculated hidden layer and the matrix is the one connecting the hidden layer to the output.

<img src="images/feedforward_step2.png" height=300 width=750/>

Essentially, each new layer in a neural network is calculated by a vector by matrix multiplication, where the vector represents the inputs to the new layer and the matrix is the one connecting these new inputs to the next layer.

That is, the input vector is $\vec{h}$ and the matrix is $W^2$, therefore $\vec{y} = \vec{h}W^2$

$\begin{bmatrix}y_1&y_2\end{bmatrix} = \begin{bmatrix}h_1 & h_2 & h_3\end{bmatrix}\begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22}\\ w_{31} & w_{32} \end{bmatrix}$

![](images/feedforward_step2_a.png)
![](images/feedforward_step2_b.png)
![](images/feedforward_step2_c.png)
![](images/feedforward_step2_d.png)
![](images/feedforward_step2_e.png)
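Putting both steps together, a full feedforward pass for a network with one hidden layer might look like the sketch below. It follows the equations in this section: $\phi$ is applied to the hidden layer, and the output is computed as $\vec{y} = \vec{h}W^2$ with no activation. The sigmoid, the function names and the numeric values are assumptions chosen so that the result is easy to check by hand.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, an assumed choice for phi."""
    return 1.0 / (1.0 + math.exp(-z))

def vec_mat(v, W):
    """Row vector times matrix: result[j] = sum over i of v[i] * W[i][j]."""
    return [sum(v_i * row[j] for v_i, row in zip(v, W))
            for j in range(len(W[0]))]

def feedforward(x, W1, W2, phi=sigmoid):
    """Step 1: h = phi(x W1). Step 2: y = h W2."""
    h = [phi(v) for v in vec_mat(x, W1)]
    y = vec_mat(h, W2)
    return h, y

# Hypothetical network: 2 inputs, 3 hidden neurons, 2 outputs.
x = [1.0, 2.0]
W1 = [[0.0,  1.0, -1.0],
      [0.0, -0.5,  0.5]]      # shape 2 x 3
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, 1.0]]             # shape 3 x 2

h, y = feedforward(x, W1, W2)
print(h, y)
```

With these values every pre-activation is 0, so each hidden neuron equals 0.5 and both outputs equal 1.0. The shapes chain together as (1 x 2)(2 x 3) for the hidden layer and (1 x 3)(3 x 2) for the output.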