# Overview of Neural Networks
A supervised neural network can be viewed as a black box with two methods, **learn** and **predict**, as shown below. The learn method takes the inputs and the desired outputs and updates the network's internal state so that the calculated output gets closer to the desired output. The predict method takes an input and uses the internal state (the model) to generate the most likely output according to its **training experience**.

### Neural Network - learn and predict

![NN_Blackbox.jpg](attachment:NN_Blackbox.jpg)

Now, let's understand the steps involved in the learning process.

#### Step 1- Model initialization

In this first step, the weights and biases are initialized to random values. We start from this randomly initialized model and improve it iteratively until it converges to fit the desired model.
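As a minimal sketch of this step, assuming a single linear neuron, the snippet below draws random starting values. The function name `init_model`, the range `[-0.5, 0.5]` and the fixed seed are illustrative assumptions, not part of the original notes.

```python
import random

def init_model(n_inputs, seed=0):
    """Return random starting weights and a bias for one linear neuron.

    The range [-0.5, 0.5] and the seed are arbitrary choices for this sketch.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)
    return weights, bias

weights, bias = init_model(n_inputs=3)
print("initial weights:", weights)
print("initial bias:", bias)
```

A different seed gives a different starting model, which is one reason two training runs can take a different number of iterations.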

#### Step 2- Forward propagate

Using the model derived in step 1, we process the labelled inputs (either all inputs or in batches) and check the output given by the model.
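The forward pass can be sketched as follows, again assuming a single linear neuron (`output = w . x + b`). The names `forward` and `forward_batch` and the example values are hypothetical; a real network would chain several such layers with non-linear functions in between.

```python
def forward(weights, bias, x):
    """Compute the output of one linear neuron for a single input vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def forward_batch(weights, bias, batch):
    """Run the forward pass over a batch of input vectors."""
    return [forward(weights, bias, x) for x in batch]

# Example values chosen only for illustration.
weights, bias = [0.5, -1.0], 0.25
batch = [[1.0, 2.0], [0.0, 0.0], [2.0, 1.0]]
outputs = forward_batch(weights, bias, batch)
print(outputs)  # [-1.25, 0.25, 0.25]
```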

#### Step 3- Loss function

Now it's time to measure the correctness of the model. We define a metric, the loss function, which calculates the deviation (error, or loss) of the current model's output from the desired output for a given input. Examples include mean squared error and mean absolute error. The loss function indicates how much precision we lose if we replace the desired output with the output actually generated by the neural network model.

Loss = Difference(Desired Output, Computed Output)

The goal now is to minimise the loss function, bringing it as close as possible to 0. The machine learning problem thus becomes an optimisation problem: find the weights and biases that minimise the loss function.
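The two loss functions named above can be written in a few lines. This is a small sketch in plain Python; the function names and the sample values are assumptions made for illustration.

```python
def mean_squared_error(desired, computed):
    """Average of the squared differences between desired and computed outputs."""
    return sum((d - c) ** 2 for d, c in zip(desired, computed)) / len(desired)

def mean_absolute_error(desired, computed):
    """Average of the absolute differences between desired and computed outputs."""
    return sum(abs(d - c) for d, c in zip(desired, computed)) / len(desired)

desired = [1.0, 2.0, 3.0]
computed = [1.5, 2.0, 2.0]
print(mean_squared_error(desired, computed))   # (0.25 + 0 + 1) / 3
print(mean_absolute_error(desired, computed))  # (0.5 + 0 + 1) / 3
```

Both values are 0 only when the computed outputs match the desired outputs exactly, which is why minimising the loss pulls the model towards the desired behaviour.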

#### Step 4- Differentiation

In mathematics, the derivative of a function at a given point gives the rate at which the function's value is changing at that point: its magnitude tells how fast, and its sign tells in which direction.

We want to understand how the error changes in response to changes in the weights and biases.

- If the derivative is **positive**, the error increases when we increase the weight, so we should decrease the weight.
- If the derivative is **negative**, the error decreases when we increase the weight, so we should increase the weight.
- If the derivative is **zero (0)**, we do nothing: we have reached a stable point.

![Gradient.jpg](attachment:Gradient.jpg)
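The three cases can be checked numerically. The sketch below assumes a hypothetical one-weight loss, `loss(w) = (w - 3)^2`, and estimates the derivative with a central finite difference. The helper names and the tolerance are illustrative choices.

```python
def loss(w):
    # Hypothetical one-weight loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def numerical_derivative(f, w, eps=1e-6):
    # Central finite difference: approximates df/dw at the point w.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def suggested_move(slope, tol=1e-6):
    # Translate the sign of the derivative into the action from the list above.
    if slope > tol:
        return "decrease weight"
    if slope < -tol:
        return "increase weight"
    return "stay"

for w in (5.0, 1.0, 3.0):
    slope = numerical_derivative(loss, w)
    print(f"w={w}: slope={slope:.3f} -> {suggested_move(slope)}")
```

To the right of the minimum the slope is positive, to the left it is negative, and at the minimum it is (numerically) zero.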

The total error (loss) of the model has contributions from each layer, depending on that layer's weights and biases. To optimise the model, we therefore have to correct the error at each layer. Because the layers are chained, the error compounds as it propagates from one layer to the next, so we apply the corrections from the last layer back to the first.

#### Step 5- Back-propagation

We have the starting point of the errors, which is the loss function, and we know how to differentiate it. If we also know how to differentiate each function in the composition, we can propagate the error back from the end to the start. Luckily for us, the derivative of a composition decomposes into a product of the derivatives of its parts (the chain rule), so it can be back-propagated.

If we create a library of differentiable functions or layers where, for each function, we know how to forward-propagate (by directly applying the function) and how to back-propagate (by knowing the derivative of the function), we can compose any complex neural network. We only need to keep a stack of the function calls made during the forward pass, together with their parameters, in order to know the way back when propagating the errors using the derivatives of these functions. This is done by unwinding the stack of function calls. This technique is called **automatic differentiation**, and it requires only that each function is provided with the implementation of its derivative.
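The idea of a library of functions that each know their own derivative can be sketched with three scalar operations. The classes `Scale`, `Shift` and `Square` and the helpers `forward_pass` and `backward_pass` are hypothetical names invented for this illustration; real frameworks work on tensors and are considerably more elaborate.

```python
class Scale:
    """y = w * x"""
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        self.x = x                       # remember the input for the backward pass
        return self.w * x
    def backward(self, grad_out):
        self.grad_w = grad_out * self.x  # derivative with respect to the weight
        return grad_out * self.w         # derivative with respect to the input

class Shift:
    """y = x + b"""
    def __init__(self, b):
        self.b = b
    def forward(self, x):
        return x + self.b
    def backward(self, grad_out):
        self.grad_b = grad_out           # derivative with respect to the bias
        return grad_out

class Square:
    """y = x ** 2"""
    def forward(self, x):
        self.x = x
        return x * x
    def backward(self, grad_out):
        return grad_out * 2 * self.x

def forward_pass(layers, x):
    """Apply each layer in order and record the calls on a stack."""
    stack = []
    for layer in layers:
        x = layer.forward(x)
        stack.append(layer)
    return x, stack

def backward_pass(stack, grad=1.0):
    """Pop the stack, multiplying derivatives along the way (chain rule)."""
    while stack:
        grad = stack.pop().backward(grad)
    return grad

layers = [Scale(2.0), Shift(1.0), Square()]   # computes (2x + 1)^2
output, stack = forward_pass(layers, 3.0)
grad_x = backward_pass(stack)
print(output, grad_x, layers[0].grad_w, layers[1].grad_b)  # 49.0 28.0 42.0 14.0
```

For `x = 3` the output is `(2*3 + 1)^2 = 49`, and the chain rule gives `2 * 7 * 2 = 28` for the input, `2 * 7 * 3 = 42` for the weight and `2 * 7 = 14` for the bias, matching what the stack-based pass computes.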


#### Step 6- Weight update

Once we have back-propagated the errors, we need to update the model's weights and biases so that the model moves closer to the desired output. We shouldn't update the weights and biases in big steps; instead, we vary them in small steps over multiple iterations so that we stay in control of the process.

The general rule for weight updates is the delta rule:

New weight = old weight - derivative * learning rate

The learning rate is introduced as a constant (usually very small), in order to force the weight to get updated very smoothly and slowly (to avoid big steps and chaotic behaviour).

To sanity-check this equation:

- If the derivative is positive, an increase in the weight will increase the error, so the new weight should be smaller.
- If the derivative is negative, an increase in the weight will decrease the error, so we need to increase the weight.
- If the derivative is 0, the weight sits at a stationary point of the loss (ideally a minimum), so no update is needed: we have reached a stable state.
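The update rule and its three cases fit in one small function. This is a sketch; the name `update_weight` and the learning rate of 0.1 are assumptions chosen for illustration.

```python
def update_weight(old_weight, derivative, learning_rate=0.1):
    """New weight = old weight - derivative * learning rate."""
    return old_weight - derivative * learning_rate

old = 1.0
print(update_weight(old, derivative=2.0))   # positive derivative: weight gets smaller
print(update_weight(old, derivative=-2.0))  # negative derivative: weight gets larger
print(update_weight(old, derivative=0.0))   # zero derivative: weight is unchanged
```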

#### Step 7- Iterate until convergence

Now we iterate the above steps until the error no longer improves. Since we update the weights by a small delta at each step, it takes several iterations to learn. After each iteration, gradient descent pushes the weights towards a lower and lower value of the global loss function.

How many iterations are needed to converge?

- It depends on the learning rate we apply. A high learning rate means faster learning, but with a higher chance of instability.
- It depends on the meta-parameters (hyperparameters) of the network, such as how many layers it has and how complex the non-linear functions are. The more variables the network has, the longer it takes to converge, but the higher the precision it can reach.
- It depends on the optimisation method used: some weight update rules are proven to be faster than others.
- It depends on the random initialisation of the network.
- It depends on the quality of the training set. If the inputs and outputs have no correlation with each other, the neural network cannot do magic and will not learn a meaningful relationship.
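The effect of the learning rate on convergence can be seen on a toy problem. The sketch below assumes the hypothetical one-weight loss `(w - 3)^2` and stops when the loss changes by less than a tolerance or when an iteration cap is hit. The function `train`, the tolerance and the three learning rates are illustrative choices only.

```python
def loss(w):
    # Hypothetical one-weight loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def loss_derivative(w):
    return 2.0 * (w - 3.0)

def train(learning_rate, w=0.0, tolerance=1e-9, max_iterations=200):
    """Repeat the weight update until the loss stops changing or the cap is hit."""
    previous = loss(w)
    for iteration in range(1, max_iterations + 1):
        w = w - loss_derivative(w) * learning_rate   # weight update
        current = loss(w)
        if abs(previous - current) < tolerance:      # no more improvement
            break
        previous = current
    return w, iteration, current

for rate in (0.1, 0.4, 1.1):
    w, iterations, final_loss = train(rate)
    print(f"learning rate {rate}: w={w:.4g}, iterations={iterations}, loss={final_loss:.4g}")
```

On this toy loss, the moderate rate converges in fewer iterations than the small one, while the largest rate overshoots the minimum on every step, so the loss grows until the iteration cap stops the loop.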

### Summary

The overall process is summarised in the diagram below.

![NN_Steps.jpg](attachment:NN_Steps.jpg)
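To tie the steps together, here is a minimal end-to-end sketch of the learn/predict black box for a single linear neuron with one input, trained with mean squared error and plain gradient descent. The class name `TinyNetwork`, the hyperparameter values and the toy data (`y = 2x + 1`) are all assumptions made for this illustration.

```python
import random

class TinyNetwork:
    """One linear neuron, y = w * x + b, exposing learn and predict."""

    def __init__(self, seed=0):
        rng = random.Random(seed)            # Step 1: random initialisation
        self.w = rng.uniform(-0.5, 0.5)
        self.b = rng.uniform(-0.5, 0.5)

    def predict(self, x):
        return self.w * x + self.b           # Step 2: forward propagation

    def learn(self, inputs, targets, learning_rate=0.05,
              max_iterations=5000, tolerance=1e-12):
        n = len(inputs)
        previous_loss = float("inf")
        for _ in range(max_iterations):      # Step 7: iterate until convergence
            errors = [self.predict(x) - t for x, t in zip(inputs, targets)]
            loss = sum(e * e for e in errors) / n          # Step 3: MSE loss
            if abs(previous_loss - loss) < tolerance:
                break
            # Steps 4 and 5: derivatives of the loss with respect to w and b
            grad_w = sum(2 * e * x for e, x in zip(errors, inputs)) / n
            grad_b = sum(2 * e for e in errors) / n
            # Step 6: weight update
            self.w -= learning_rate * grad_w
            self.b -= learning_rate * grad_b
            previous_loss = loss
        return loss

inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [2 * x + 1 for x in inputs]        # toy data following y = 2x + 1
model = TinyNetwork()
final_loss = model.learn(inputs, targets)
print("w:", model.w, "b:", model.b, "loss:", final_loss)
print("prediction for x=10:", model.predict(10.0))
```

After training, `w` and `b` should be close to 2 and 1, so the prediction for an unseen input such as `x = 10` lands near 21.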

References
- Class Notes
- https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e


