<a href="https://colab.research.google.com/github/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/neural_network_algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Neural Network Algorithm

An artificial neural network is a supervised learning algorithm that approximates complex relationships between input and output. How well it does so depends on several hyper-parameters, including the following:

- Number of hidden layers
- Number of hidden units
- Activation function
- Learning rate

Neural networks came about because not everything can be approximated by linear or logistic regression: data may contain complex shapes that only complex functions can approximate. A more complex function can fit such data more accurately, provided overfitting is kept under control. We’ll start by looking at how a neural network fits a model to data.

## Structure of a Neural Network

The input level/layer in the figure typically holds the independent variables that are used to predict the output (dependent variable) level/layer. In a regression problem there is typically only one node in the output layer, while in a classification problem the output layer contains as many nodes as there are classes (distinct values) in the dependent variable.

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-1.png?raw=1' hieght='200' width='400'/>

The hidden level/layer is used to transform the input variables into a higher-order function. The way the hidden layer transforms its input is shown here:

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-2.png?raw=1' hieght='200' width='400'/>

$x_1$ and $x_2$ are the independent variables, and $b_0$ is the bias term
(similar to the bias in linear/logistic regression). $w_1$ and $w_2$ are the weights given to each of the input variables. 

If $a$ is one of the units/neurons in the hidden layer, it is computed as follows:


$$ a = f  \left( \sum_{i=0}^N w_i x_i \right) $$

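The function $f$ in the preceding equation is the activation function. We apply it on top of the weighted sum to obtain non-linearity, which the model needs in order to learn complex patterns. The sum starts at $i = 0$ so that it can absorb the bias: under the usual convention $x_0 = 1$, the term $w_0 x_0$ plays the role of $b_0$.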
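As a minimal sketch of the equation above, the following NumPy snippet computes a single hidden unit. The sigmoid activation, the helper name `neuron`, the explicit bias argument `b`, and the numeric values are assumptions chosen for illustration; they are not taken from the figure.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here as the activation function f."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    """Compute a = f(b + sum_i w_i * x_i) for one hidden unit."""
    return f(b + np.dot(w, x))

# Hypothetical values, for illustration only.
x = np.array([1.0, 1.0])   # inputs x_1, x_2
w = np.array([0.8, 0.2])   # weights w_1, w_2
b = 0.0                    # bias b_0

a = neuron(x, w, b)
print(round(float(a), 3))
```

Passing the bias separately is equivalent to folding it into the sum as an extra weight on a constant input of 1.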

Moreover, stacking more than one hidden layer increases the non-linearity the network can express. That non-linearity comes from the activation function: without it, any number of stacked layers would collapse into one giant linear function.
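The collapse into a linear function can be checked numerically. The sketch below assumes a 2-3-1 network with randomly drawn weights and no bias terms (the names `W1`, `W2` and the seed are arbitrary): two layers without an activation give the same result as one layer whose weights are the product of the two matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
x = rng.normal(size=(1, 2))      # one observation with two inputs
W1 = rng.normal(size=(2, 3))     # input -> hidden weights
W2 = rng.normal(size=(3, 1))     # hidden -> output weights

# Two layers with no activation in between...
two_linear_layers = (x @ W1) @ W2
# ...are the same as a single layer with weights W1 @ W2.
one_linear_layer = x @ (W1 @ W2)
collapses = np.allclose(two_linear_layers, one_linear_layer)

# With a sigmoid between the layers, the network is no longer
# a single matrix product of its input.
with_activation = sigmoid(x @ W1) @ W2

print(collapses)
print(two_linear_layers.item(), with_activation.item())
```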

Hidden layers are necessary when the neural network has to make sense of
something very complicated, contextual, or non-obvious, like image recognition. The term deep learning comes from having many hidden layers. These layers are known as hidden because they are not visible as a network output.



## Training a Neural Network

Training a neural network basically means calibrating all the weights by repeating two key steps: 

- forward propagation 
- back propagation

In forward propagation, we apply a set of weights to the input data and calculate an output. For the first forward propagation, the weights are initialized randomly.

In back propagation, we measure the margin of error of the output and adjust the
weights accordingly to decrease the error.

Neural networks repeat both forward and back propagation until the weights are
calibrated to accurately predict an output.

### Forward Propagation

Let’s go through a simple example of training a neural network to act as an exclusive or (XOR) operation, to illustrate each step in the training process. The XOR function can be represented by the mapping of inputs to outputs shown in the following table, which we’ll use as training data. Once trained, the network should produce the correct output for any input the XOR function accepts.

| Input | Output |
| ---   | ---    |
| (0, 0) |   0   |
| (0, 1) |   1   |
| (1, 0) |   1   |
| (1, 1) |   0   |
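The table above can be written as a pair of NumPy arrays, one row per training example. This is only a sketch of one possible representation; the names `X` and `y` are assumptions, and the check against `np.logical_xor` is there to confirm the table is transcribed correctly.

```python
import numpy as np

# Each row of X is one input pair; the matching row of y is its XOR output.
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0],
              [1],
              [1],
              [0]])

# Sanity check against NumPy's element-wise XOR.
expected = np.logical_xor(X[:, 0], X[:, 1]).astype(int).reshape(-1, 1)
print(np.array_equal(y, expected))
```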

Let’s use the last row from the preceding table, (1,1) => 0, to demonstrate forward propagation. Note that, while this is a classification problem,
we will still treat it as a regression problem, only to understand how forward and back propagation work.

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-3.png?raw=1' hieght='200' width='400'/>

We now assign weights to all the synapses. Because this is the first forward propagation, the weights are chosen randomly. In practice the most common way is to draw them from a Gaussian distribution; in this example they are drawn to lie between 0 and 1 (the final weights don’t need to stay between 0 and 1).
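Random initialization for a network with two inputs, three hidden units, and one output could be sketched as follows. The seed, the variable names, and the standard deviation of the Gaussian variant are arbitrary assumptions; the weights drawn here will not match the ones in the figure.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility

# Uniform draws in [0, 1), matching the range used in this example.
W_hidden = rng.uniform(0.0, 1.0, size=(2, 3))  # input -> hidden
W_output = rng.uniform(0.0, 1.0, size=(3, 1))  # hidden -> output

# Gaussian alternative (mean 0, standard deviation 1 chosen arbitrarily).
W_hidden_gauss = rng.normal(loc=0.0, scale=1.0, size=(2, 3))

print(W_hidden.shape, W_output.shape)
```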

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-4.png?raw=1' hieght='200' width='400'/>


We sum the product of the inputs with their corresponding set of weights to arrive at the first values for the hidden layer. You can think of the weights as
measures of influence that the input nodes have on the output:

```
1 × 0.8 + 1 × 0.2 = 1
1 × 0.4 + 1 × 0.9 = 1.3
1 × 0.3 + 1 × 0.5 = 0.8
```
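The three sums above are one matrix product. In the sketch below, the layout of `W_hidden` is an assumption: row `i` holds the weights leaving input `i`, and column `j` holds the weights entering hidden unit `j`, taking the first factor on each line above to belong to the first input.

```python
import numpy as np

x = np.array([[1.0, 1.0]])               # the (1, 1) training row, shape 1 x 2

# Assumed layout: rows = inputs, columns = hidden units.
W_hidden = np.array([[0.8, 0.4, 0.3],    # weights from the first input
                     [0.2, 0.9, 0.5]])   # weights from the second input

hidden_sums = x @ W_hidden               # shape 1 x 3
print(hidden_sums)
```

Up to floating-point rounding, `hidden_sums` holds the values 1, 1.3, and 0.8.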

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-5.png?raw=1' hieght='200' width='400'/>


### Applying the Activation Function

Activation functions are applied at the hidden layer of a neural network. The purpose of an activation function is to transform the input signal into an output signal. They are necessary for neural networks to model complex non-linear patterns that simpler models might miss.

For our example, let’s use the sigmoid function, $\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$, as the activation. Applying Sigmoid(x) to the three hidden layer sums, we get:

```
Sigmoid(1.0) = 0.731
Sigmoid(1.3) = 0.786
Sigmoid(0.8) = 0.690
```
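These activations can be reproduced with a few lines of NumPy. The helper name `sigmoid` is an assumption; the results agree with the values listed above to within rounding in the last digit.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

hidden_sums = np.array([1.0, 1.3, 0.8])   # sums from the previous step
hidden_out = sigmoid(hidden_sums)

print(np.round(hidden_out, 3))
```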

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-6.png?raw=1' hieght='200' width='400'/>

Then we sum the product of the hidden layer results with the second set of weights (also determined at random the first time around) to determine the output sum:

```
0.73 × 0.3 + 0.79 × 0.5 + 0.69 × 0.9 = 1.235
```

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-7.png?raw=1' hieght='200' width='400'/>

Because we used a random set of initial weights, the value of the output neuron is off the mark—in this case, by 1.235 (since the target is 0).
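The output sum and its distance from the target can be checked the same way. This sketch uses the two-decimal hidden activations from the calculation above; the names `W_output`, `target`, and `error` are assumptions for illustration.

```python
import numpy as np

# Hidden activations rounded to two decimals, as used in the text.
hidden_out = np.array([[0.73, 0.79, 0.69]])   # shape 1 x 3

# Second set of weights, one per hidden unit.
W_output = np.array([[0.3],
                     [0.5],
                     [0.9]])                  # shape 3 x 1

output_sum = (hidden_out @ W_output).item()   # scalar from the 1 x 1 result
target = 0.0                                  # XOR of (1, 1)
error = output_sum - target

print(round(output_sum, 3), round(error, 3))
```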

We now have the following matrices:

1. The input layer has two inputs, (1, 1), so it is a matrix of
dimension 1 × 2 (one observation with two input values).
2. The 1 × 2 input layer is multiplied by a randomly initialized weight
matrix of dimension 2 × 3.
3. The output of the hidden layer is a 1 × 3 matrix.

So, we can visualize the whole network as a sequence of matrix multiplications:

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/pro-machine-learning-algorithm/07-artificial-neural-network/images/ann-8.png?raw=1' hieght='200' width='400'/>

The output of the activation function is multiplied by a 3 × 1 dimensional randomly initialized weight matrix to get an output that is 1 × 1 in dimension.
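Putting the steps together, the whole forward pass is two matrix products with a sigmoid in between. The function name `forward` and the weight layout (rows = source units, columns = destination units) are assumptions for this sketch, and no activation is applied at the output, in line with the walkthrough above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, W_output):
    """One forward pass through a 2-3-1 network without bias terms."""
    hidden_sums = x @ W_hidden          # (1, 2) @ (2, 3) -> (1, 3)
    hidden_out = sigmoid(hidden_sums)   # (1, 3)
    output = hidden_out @ W_output      # (1, 3) @ (3, 1) -> (1, 1)
    return hidden_sums, hidden_out, output

x = np.array([[1.0, 1.0]])
W_hidden = np.array([[0.8, 0.4, 0.3],
                     [0.2, 0.9, 0.5]])
W_output = np.array([[0.3],
                     [0.5],
                     [0.9]])

hidden_sums, hidden_out, output = forward(x, W_hidden, W_output)
print(hidden_sums.shape, hidden_out.shape, output.shape)
print(round(output.item(), 3))
```

Because this version keeps the hidden activations at full precision, the output comes to roughly 1.233 rather than the 1.235 obtained from the two-decimal values.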





