## Neural Network - Architecture

To obtain a non-linear model from two linear models, we combine them.

Visually, it looks like superimposing both models to create a new one.

![](images/non_linear_models_1.png)

It's like doing arithmetic on models: this line plus this line equals that curve.

![](images/non_linear_models_2.png)

A linear model is a whole probability space: for every point, it gives us the probability of that point being blue.

![](images/non_linear_models_3.png)

For a given point, we calculate the probability from the first model and the probability from the second, add them, and then apply the sigmoid function to turn the sum back into a probability.

We can also weight the sum, giving more importance to either of the two models.

![](images/non_linear_models_4.png)
 
This is a linear combination of the two previous linear models. You can also think of it as the line between the two models.

This looks a lot like a perceptron.

![](images/non_linear_models_5.png)
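
The combination described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`linear_model`, `combined`) and all the weights are assumptions chosen for the example, not values taken from the figures.

```python
import math

def sigmoid(x):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_model(w1, w2, b):
    """Return a function giving the probability that (x1, x2) is blue."""
    def predict(x1, x2):
        return sigmoid(w1 * x1 + w2 * x2 + b)
    return predict

# Two hypothetical linear models (weights chosen only for illustration).
model_a = linear_model(5.0, -2.0, -8.0)
model_b = linear_model(7.0, -3.0, 1.0)

def combined(x1, x2, weight_a=7.0, weight_b=5.0, bias=-6.0):
    """Weighted sum of both probabilities, passed through the sigmoid again."""
    score = weight_a * model_a(x1, x2) + weight_b * model_b(x1, x2) + bias
    return sigmoid(score)

# Probability that the point (1, 1) is blue according to the combined model.
p = combined(1.0, 1.0)
```

Changing `weight_a` and `weight_b` is what "giving more importance to one of the models" means in practice.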

 
### Neural network
 
![](images/nn_example_1.png)
![](images/nn_example_2.png)
![](images/nn_example_3.png)
![](images/nn_example_4.png)

The weights on the left tell us the equations of the linear models. The weights on the right tell us which linear combination of the two models produces the curved, non-linear model on the right.

![](images/nn_example_5.png)
 
When you see a neural network like this, think about what nonlinear boundary it defines.

![](images/nn_example_6.png)

In every layer we have a bias unit coming from a node with a one in it. For example, the "-8" on the top node becomes an edge labelled "-8" coming from the bias node.
![](images/nn_example_7.png)
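
A small sketch of why both notations are equivalent: adding the bias separately gives the same result as treating it as one more weight on an input that is always 1. The function names and the sample numbers are assumptions for illustration only.

```python
def weighted_sum_with_bias(weights, bias, inputs):
    """Bias written inside the node: w . x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def weighted_sum_bias_as_edge(weights_with_bias, inputs):
    """Bias written as an extra edge coming from a node that always holds 1."""
    extended = list(inputs) + [1.0]  # append the constant bias node
    return sum(w * x for w, x in zip(weights_with_bias, extended))

# Hypothetical node with weights 5 and -2 and bias -8, evaluated at (2, 1).
a = weighted_sum_with_bias([5.0, -2.0], -8.0, [2.0, 1.0])
b = weighted_sum_bias_as_edge([5.0, -2.0, -8.0], [2.0, 1.0])
```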


### Multiple layers
We can do the following things to neural networks to make them more complicated:

* Add more nodes to the input, hidden, and output layers.
* Add more layers.

![](images/nn_architecture_1.png)

*__Input layer__*

*__Hidden layer:__* It's a set of linear models created with the inputs.

*__Output layer:__* where the linear models get combined to obtain a nonlinear model.

If the input layer has more nodes, for example 3 nodes, the hidden layer consists of planes in three dimensions and the output layer bounds a nonlinear region in three-dimensional space.

![](images/nn_architecture_3d.png)

If the output layer has more nodes, then we just have more outputs. This gives us a multi-class classification model.

![](images/nn_architecture_2.png)

If we have more layers, then we have a deep neural network.
![](images/nn_architecture_3.png)

The linear models combine to create nonlinear models, and these in turn combine to create even more nonlinear models.

![](images/nn_architecture_4.png)


In general we can do this many times and obtain highly complex models with lots of hidden layers. The neural network will just split the n-dimensional space with a highly nonlinear boundary.


#### Multi-class classification

Neural networks are really good at binary classification.
![](images/nn_architecture_b1.png)

We could have one neural network for each case, but that would be overkill.
![](images/nn_architecture_b2.png)

What we need to do is add more nodes to the output layer, so that each node gives us the probability that the image is one of the animals.
![](images/nn_architecture_b3.png)

Then we take the scores and apply the softmax function to obtain well-defined probabilities.
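
As a minimal sketch of that last step, the softmax function exponentiates each score and divides by the total, so the results are positive and sum to 1. The scores below are made up for the example, and subtracting the maximum score first is a common numerical-stability trick rather than something the text requires.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores from three output nodes (one per animal).
probs = softmax([2.0, 1.0, 0.0])
```

The node with the highest score always gets the highest probability, so the predicted class is simply the position of the largest value in `probs`.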


#### Feedforward
Feedforward is the process neural networks use to turn the input into an output. 

`To obtain the prediction from the input vector.`

Training means finding the parameters the edges should have in order to model our data well.

The perceptron plots the point (x1, x2) and outputs the probability that the point is blue. Since the point is in the red area, the output is a small number: the point is not very likely to be blue.

![](images/nn_feedforward_1.png)
*Bad model*

With more complex networks it's the same. The neural network plots the point in the top graph and also in the bottom graph. The top model outputs a small number, since the point lies in its red area and so has a small probability of being blue. The bottom model outputs a large number, since the point lies in its blue area and so has a large probability of being blue.
The two models get combined into a nonlinear model, and the output layer plots the point and gives the probability that the point is blue.

![](images/nn_feedforward_2.png)
*Bad model*

This is what neural networks do: they take the input vector and apply a sequence of linear models and sigmoid functions. When combined, these maps become a highly non-linear map, and the final formula is simply the prediction, y-hat.

![](images/nn_feedforward_3.png)

We do this again in a multi-layer perceptron.

![](images/nn_feedforward_4.png)
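
The feedforward process can be sketched as a loop over layers, where each layer is a set of linear models followed by a sigmoid. This is a minimal illustration under assumptions: the data layout (a list of `(weights, biases)` pairs), the function names, and every number are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(weights, biases, inputs):
    """Apply one layer; weights[j] holds the incoming weights of node j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feedforward(layers, inputs):
    """Pass the input vector through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(weights, biases, activations)
    return activations

# Hypothetical 2-2-1 network.
network = [
    ([[5.0, -2.0], [7.0, -3.0]], [-8.0, 1.0]),  # hidden layer: two linear models
    ([[7.0, 5.0]], [-6.0]),                      # output layer: combines them
]
y_hat = feedforward(network, [1.0, 1.0])[0]
```

Adding more entries to `network` gives a deeper network without changing `feedforward` itself.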


#### Error function

This function measures how badly each point is being classified. It is a very small number if the point is correctly classified, and a measure of how far the point is from the line if the point is incorrectly classified.

![](images/nn_error_function_5.png)
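
As a sketch only, and assuming the error function in the figure is the binary cross-entropy (log-loss), which is a common choice for this kind of network, the behaviour described above looks like this in code. The function name and the sample predictions are made up for the example.

```python
import math

def cross_entropy(y, y_hat):
    """Error for one point: y is the true label (1 = blue, 0 = red),
    y_hat is the predicted probability of being blue."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# A blue point that the model is fairly sure is blue: small error.
good = cross_entropy(1, 0.9)
# The same blue point when the model thinks it is red: large error.
bad = cross_entropy(1, 0.1)
```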


### Backpropagation
Backpropagation consists of:
* Doing a feedforward operation.
* Comparing the output of the model with the desired output.
* Calculating the error.
* Running the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
* Using this to update the weights and get a better model.
* Continuing until we have a model that is good.

Feedforward:
![](images/nn_feedforward_1.png)

We ask the point, "What do you want the model to do for you?"
Since it was misclassified, it wants the boundary to come closer to it.
The line gets closer to the point when we update the weights.

![](images/nn_backpropagations_1.png)

We obtain two new weights ($w_1'$ and $w_2'$).
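
A minimal sketch of how such new weights could be computed for a single perceptron, assuming a sigmoid output and the log-loss error, for which the gradient-descent update is `w_i' = w_i + learning_rate * (y - y_hat) * x_i`. The function name, starting weights, point, and learning rate are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update(weights, bias, point, label, learning_rate=0.1):
    """One gradient-descent step for a single point (label 1 = blue, 0 = red)."""
    y_hat = sigmoid(sum(w * x for w, x in zip(weights, point)) + bias)
    step = learning_rate * (label - y_hat)
    new_weights = [w + step * x for w, x in zip(weights, point)]
    new_bias = bias + step
    return new_weights, new_bias, y_hat

weights, bias = [1.0, -1.0], 0.0   # hypothetical starting model
point, label = [-1.0, 2.0], 1      # a blue point sitting in the red area
(w1_new, w2_new), bias_new, old_pred = update(weights, bias, point, label)

# After the update the model gives the point a higher probability of being blue.
new_pred = sigmoid(w1_new * point[0] + w2_new * point[1] + bias_new)
```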

#### In Multi-layer perceptrons:

We check the error:
This model is not good because it predicts that the point is red when in reality the point is blue.
We ask the point, "What do you want this model to do in order for you to be better classified?"
And the point says, "I kind of want this blue region to come closer to me."

![](images/nn_backpropagations_2.png)

It seems like the top model is badly misclassifying the point, whereas the bottom one is classifying it correctly. So we kind of want to listen to the bottom one more and to the top one less.

So what we want to do is *__to reduce the weight coming from the top model and increase the weight coming from the bottom model__*. 


![](images/nn_backpropagations_3.png)

We can actually go to the linear models and ask the point, "What can these models do to classify you better?"
And the point will say, "Well, the top model is misclassifying me, so I kind of want this line to move closer to me. And the second model is correctly classifying me, so I want this line to move farther away from me."

![](images/nn_backpropagations_4.png)

This change in the models is what updates the weights.

After we update all the weights, we have better predictions from all the models in the hidden layer and also a better prediction from the model in the output layer.
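
The whole procedure can be sketched for a tiny network with two inputs, two hidden nodes, and one output. This is an illustration under assumptions, not the exact method from the figures: it assumes sigmoid activations, the log-loss error, and plain gradient descent on a single point, and every name and number is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    """Feedforward: return the hidden activations and the prediction y_hat."""
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    y_hat = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, y_hat

def error(y, y_hat):
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def train_step(params, x, y, learning_rate=0.5):
    """One feedforward pass, one backward pass, one weight update."""
    W1, b1, W2, b2 = params
    hidden, y_hat = forward(params, x)
    # Error signal at the output node (derivative of the error w.r.t. its score).
    delta_out = y_hat - y
    # Spread the error backwards to each hidden node (chain rule).
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(W2, hidden)]
    # Move every weight a small step against its gradient.
    new_W2 = [w - learning_rate * delta_out * h for w, h in zip(W2, hidden)]
    new_b2 = b2 - learning_rate * delta_out
    new_W1 = [[w - learning_rate * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hidden)]
    new_b1 = [b - learning_rate * d for b, d in zip(b1, delta_hidden)]
    return new_W1, new_b1, new_W2, new_b2

# Hypothetical starting weights and one blue point that is predicted red.
params = ([[5.0, -2.0], [7.0, -3.0]], [-8.0, 1.0], [7.0, 5.0], -6.0)
x, y = [1.0, 1.0], 1
before = error(y, forward(params, x)[1])
for _ in range(20):
    params = train_step(params, x, y)
after = error(y, forward(params, x)[1])
```

With these starting values the top hidden model gives the point a low probability of being blue and the bottom one a high probability, which mirrors the situation described in this section.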


#### Backpropagation Math

![](images/nn_backpropagations_math_1.png)

![](images/nn_backpropagations_math_2.png)

When composing functions, the derivatives just multiply (this is the chain rule). Feedforward is literally composing a bunch of functions, and backpropagation is literally taking the derivative at each piece. Since the derivative of a composition is the product of the partial derivatives, all we need to do is multiply a bunch of partial derivatives to get what we want.
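
A quick numerical check of that idea for the smallest possible case, a single weight feeding a sigmoid: the derivative obtained by multiplying the two partial derivatives should match a finite-difference estimate. The values of `w`, `b`, and `x` are arbitrary choices for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1 - s)

# Composition: h = w * x + b, then y_hat = sigmoid(h).
w, b, x = 0.5, -0.2, 1.5
h = w * x + b

# Chain rule: d(y_hat)/dw = d(y_hat)/dh * dh/dw = sigmoid'(h) * x
analytic = sigmoid_prime(h) * x

# Finite-difference estimate of the same derivative, for comparison.
eps = 1e-6
numeric = (sigmoid((w + eps) * x + b) - sigmoid((w - eps) * x + b)) / (2 * eps)
```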


#### Training summary
![](images/nn_backpropagations_math_ff.png)

![](images/nn_backpropagations_math_bp_1.png)

![](images/nn_backpropagations_math_bp_2.png)


#### Calculation of the derivative of the sigmoid function
The sigmoid function has a beautiful derivative, which we can see in the following calculation. This will make our backpropagation step much cleaner.
![](images/nn_backpropagations_math_formula.gif)
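
Written out as text, and assuming the usual definition of the sigmoid, the calculation goes as follows:

$$
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}} \\
\sigma'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^2}
            = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
           &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
$$

The last step uses the fact that $\frac{e^{-x}}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}}$. In practice this means the derivative can be computed from the sigmoid value we already obtained during feedforward.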

