
# Multi-Layer Perceptron

In this Neural Network tutorial, we take a step forward and discuss networks of Perceptrons, called Multi-Layer Perceptrons (Artificial Neural Networks).

We will be discussing the following topics in this Neural Network tutorial:

* Limitations of Single-Layer Perceptron
* What is Multi-Layer Perceptron (Artificial Neural Network)?
* How Do Artificial Neural Networks Work?
* Use-case

This tutorial ends with a use-case, which we will implement using TensorFlow.

Let us start by discussing the limitations of the Single-Layer Perceptron.

Reference: https://www.edureka.co/blog/neural-network-tutorial/

## Limitations of Single-Layer Perceptron

Well, there are two major problems:

* Single-Layer Perceptrons cannot classify non-linearly separable data points.
* Complex problems that involve a lot of parameters cannot be solved by Single-Layer Perceptrons.

**Single-Layer Perceptrons cannot classify non-linearly separable data points**

Let us understand this with the example of an XOR gate. Consider the diagram below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Limitaions-Of-Single-Layer-Perceptron-Neural-Network-Tutorial-Edureka.png?raw=1' width='800'/>

Here, you cannot separate the high and low points with a single straight line. However, you can separate them with two straight lines. Consider the diagram below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Solution-Single-Layer-Percpetron-Neural-Network-Tutorial-Edureka.png?raw=1' width='800'/>
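
To make the XOR limitation concrete, here is a minimal pure-Python sketch. It first trains a single perceptron on the XOR truth table using the classic perceptron learning rule. Because no single straight line separates XOR, that perceptron can never classify all four points correctly. The sketch then wires up a two-layer network with hand-picked weights. Its two hidden units play the role of the two straight lines from the diagram: one computes OR and the other computes NAND, and the output unit ANDs them together. The specific weights, the step activation, and the 100-epoch training budget are illustrative assumptions, not values taken from the diagram.

```python
# XOR truth table: inputs (x1, x2) and target output
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def step(z):
    """Heaviside step activation used by the classic perceptron."""
    return 1 if z >= 0 else 0


def perceptron(x, w, b):
    """Single neuron: weighted sum of inputs plus bias, then step activation."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_single_perceptron(data, epochs=100, lr=1.0):
    """Perceptron learning rule; it converges only on linearly separable data."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - perceptron(x, w, b)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


def accuracy(predict, data):
    """Fraction of examples where predict(x) equals the target."""
    return sum(predict(x) == t for x, t in data) / len(data)


def two_layer_xor(x):
    """Hypothetical hand-set weights: hidden layer draws two lines, output ANDs them."""
    h_or = perceptron(x, [1, 1], -0.5)     # fires unless both inputs are 0
    h_nand = perceptron(x, [-1, -1], 1.5)  # fires unless both inputs are 1
    return perceptron((h_or, h_nand), [1, 1], -1.5)  # AND of the two hidden units


w, b = train_single_perceptron(XOR_DATA)
single_acc = accuracy(lambda x: perceptron(x, w, b), XOR_DATA)
mlp_acc = accuracy(two_layer_xor, XOR_DATA)
print(f"single-layer perceptron accuracy: {single_acc:.2f}")
print(f"two-layer network accuracy:       {mlp_acc:.2f}")
```

No choice of weights lets a single linear unit get more than three of the four XOR points right. Adding one hidden layer removes that limit.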

**Complex problems that involve a lot of parameters cannot be solved by Single-Layer Perceptrons**

Again, let me explain with an example.

Suppose you run an e-commerce firm and have noticed a decline in your sales. To increase sales, you form a marketing team to market your products.

The marketing team can market your products in various ways, such as:

* Google Ads
* Personal emails
* Sale advertisement on relevant sites
* Reference program
* Blogs, and so on

Considering all the factors and options available, the marketing team has to decide on a strategy for optimal and efficient marketing. However, this task is too complex for a human to analyse because the number of parameters is very high, so this problem is better solved using Deep Learning. Consider the diagram below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Category_02-1.png?raw=1' width='800'/>

The team can use a single channel to market the products or a combination of several.

Each channel has its own advantages and disadvantages, so the team has to consider a variety of factors and options, such as:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Category.png?raw=1' width='800'/>

The number of sales depends on different categorical inputs, their subcategories, and their parameters. However, computing an output from so many inputs and sub-parameters is not possible with just one neuron (Perceptron).

That is why more than one neuron is needed to solve this problem. Consider the diagram below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Limitations-of-Single-Layer-Perceptron-Neural-Network-Tutorial-Edureka.png?raw=1' width='800'/>

For these reasons, a Single-Layer Perceptron cannot be used for complex non-linear problems.

Next in this Neural Network tutorial, I will focus on Multi-Layer Perceptrons (MLP).

## What is Multi-Layer Perceptron?

Just as our brain is made up of billions of interconnected neurons, a Neural Network is a composition of Perceptrons, connected in different ways and using different activation functions.

Consider the diagram below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Multi-Layer-Perceptron-Neural-Network-Tutorial-Edureka.png?raw=1' width='800'/>

* **Input Nodes** – The Input nodes provide information from the outside world to the network and are together referred to as the “Input Layer”. No computation is performed in any of the Input nodes – they just pass on the information to the hidden nodes.

* **Hidden Nodes** – The Hidden nodes have no direct connection with the outside world (hence the name “hidden”). They perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”. While a network will only have a single input layer and a single output layer, it can have zero or multiple Hidden Layers. A Multi-Layer Perceptron has one or more hidden layers.

* **Output Nodes** – The Output nodes are collectively referred to as the “Output Layer” and are responsible for computations and transferring information from the network to the outside world.
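
The three kinds of nodes map directly onto code. Below is a minimal pure-Python sketch of one forward pass through a network with 2 input nodes, one hidden layer of 3 nodes, and 2 output nodes. The input layer performs no computation and simply passes its values on. Each hidden and output node computes a weighted sum of its inputs, adds a bias, and applies an activation function. The layer sizes, the weights, and the choice of sigmoid as the activation are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` belongs to one node."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 3 hidden nodes -> 2 output nodes
hidden_w = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
hidden_b = [0.0, -0.1, 0.2]
output_w = [[0.6, -0.2, 0.4], [-0.3, 0.5, 0.1]]
output_b = [0.05, -0.05]

x = [1.0, 0.5]                                   # input layer: passed on unchanged
h = dense_layer(x, hidden_w, hidden_b, sigmoid)  # hidden layer computation
y = dense_layer(h, output_w, output_b, sigmoid)  # output layer computation
print("hidden:", [round(v, 3) for v in h])
print("output:", [round(v, 3) for v in y])
```

Stacking more `dense_layer` calls adds more hidden layers. The network has exactly one input layer and one output layer, matching the description above.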

As you might have guessed, I will use an example to explain how an Artificial Neural Network works.

Suppose we have data on a football team, Chelsea. The data contains three columns. The last column tells whether Chelsea won or lost the match. The other two columns are the goal lead in the first half and the possession in the second half. Possession is the percentage of time for which the team has the ball. So, if a team has 50% possession in one half (45 minutes), it means that the team had the ball for 22.5 of those 45 minutes.

<img src='https://github.com/rahiakela/img-repo/blob/master/chelsea-table.png?raw=1' width='800'/>

The Final Result column can take two values, 1 or 0, indicating whether or not Chelsea won the match. For example, we can see from the table that if there is a 0 goal lead in the first half and Chelsea has 80% possession in the second half, then Chelsea wins the match.

Now, suppose we want to predict whether Chelsea will win the match if the goal lead in the first half is 2 and the possession in the second half is 32%.

This is a binary classification problem, where a Multi-Layer Perceptron can learn from the given examples (training data) and make an informed prediction for a new data point. Below, we will see how a Multi-Layer Perceptron learns such relationships.

The process by which a Multi-Layer Perceptron learns is called the Backpropagation algorithm. I would recommend going through the [Backpropagation](https://www.edureka.co/blog/backpropagation/) blog.


Consider the diagram below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Multi-Layer-Perceptron-Example-Neural-Network-Tutorial-Edureka.png?raw=1' width='800'/>

### Forward Propagation

Here, we propagate forward, i.e. calculate the weighted sum of the inputs and add the bias at each node. In the output layer, we use the softmax function to get the probabilities of Chelsea winning or losing.
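
The sketch below shows the output-layer step in isolation. Each output node forms a weighted sum of the hidden-layer activations plus a bias. Softmax then turns these raw scores (logits) into probabilities that are non-negative and sum to 1. The hidden activations, weights, and biases are hypothetical values and are not the ones in the diagram.

```python
import math


def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(inputs, weights, bias):
    """Weighted sum of a node's inputs plus its bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical hidden-layer activations feeding the two output nodes
hidden = [0.6, 0.3, 0.8]
w_win, b_win = [0.5, -0.2, 0.1], 0.0
w_lose, b_lose = [0.4, 0.3, 0.5], -0.1

logits = [weighted_sum(hidden, w_win, b_win), weighted_sum(hidden, w_lose, b_lose)]
p_win, p_lose = softmax(logits)
print(f"P(win) = {p_win:.3f}, P(lose) = {p_lose:.3f}")
```

Subtracting the largest logit before calling `exp` does not change the result. It only guards against overflow when the scores are large.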

In the diagram, the winning probability is 0.4 and the losing probability is 0.6. But according to our data, we know that when the goal lead in the first half is 1 and the possession in the second half is 42%, Chelsea wins. Our network has made a wrong prediction.

If we compute the error (comparing the network output with the target), it is 0.6 and -0.6.
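
These two numbers are the element-wise difference between the target and the network output. Since Chelsea won, the target is P(win) = 1 and P(lose) = 0, while the network produced 0.4 and 0.6:

```python
target = [1.0, 0.0]   # Chelsea won: P(win) = 1, P(lose) = 0
output = [0.4, 0.6]   # network's softmax output from the diagram
error = [t - o for t, o in zip(target, output)]
print([round(e, 2) for e in error])  # [0.6, -0.6]
```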

### Backward Propagation and Weight Update

We calculate the total error at the output nodes and propagate these errors back through the network using Backpropagation to calculate the gradients. Then we use an optimization method such as Gradient Descent to ‘adjust’ all weights in the network with the aim of reducing the error at the output layer.
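
To illustrate what "propagating errors back to calculate the gradients" means, here is a minimal pure-Python sketch for a tiny 2-2-1 network. It uses sigmoid activations and a squared-error loss, both of which are assumptions; the tutorial's diagram uses softmax outputs instead. Applying the chain rule, the sketch computes the gradient of the loss with respect to every weight, working from the output layer back to the hidden layer. It then checks one gradient against a numerical finite-difference estimate. The parameters are hypothetical. The input `[1.0, 0.42]` simply mirrors the example above (goal lead 1, 42% possession).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def loss(x, t, params):
    """Squared error for a single example."""
    _, y = forward(x, *params)
    return 0.5 * (t - y) ** 2


def backprop(x, t, W1, b1, W2, b2):
    """Return gradients of the loss w.r.t. every weight and bias."""
    h, y = forward(x, W1, b1, W2, b2)
    # Output node: dL/dy * dy/dz, using sigmoid'(z) = y * (1 - y)
    delta_out = (y - t) * y * (1 - y)
    dW2 = [delta_out * hi for hi in h]
    db2 = delta_out
    # Hidden nodes: send delta_out back through W2, times sigmoid'(z) = h * (1 - h)
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    dW1 = [[d * xi for xi in x] for d in delta_hidden]
    db1 = delta_hidden
    return dW1, db1, dW2, db2


# Hypothetical parameters and one training example (target 1 = win)
W1, b1 = [[0.15, -0.2], [0.25, 0.3]], [0.35, -0.1]
W2, b2 = [0.4, -0.5], 0.6
x, t = [1.0, 0.42], 1.0

dW1, db1, dW2, db2 = backprop(x, t, W1, b1, W2, b2)

# Finite-difference check on W1[0][1]: nudge the weight both ways, measure the loss change
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus[0][1] -= eps
numeric = (loss(x, t, (W1_plus, b1, W2, b2)) - loss(x, t, (W1_minus, b1, W2, b2))) / (2 * eps)
print(f"backprop: {dW1[0][1]:.8f}  numerical: {numeric:.8f}")
```

Checking gradients against finite differences is a common way to catch mistakes in hand-written backpropagation. Frameworks such as TensorFlow compute these gradients automatically.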

Let me explain how the gradient descent optimizer works:

* **Step – 1**: First, we calculate the error using the equation below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Error-Neural-Network-Tutorial-Edureka-1-528x200.png?raw=1' width='800'/>

* **Step – 2**: Based on this error, we calculate the rate of change of the error with respect to a change in the weights (the gradient).

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Gradient-Neural-Network-Tutorial-Edureka-467x300.png?raw=1' width='800'/>

* **Step – 3**: Based on this gradient, we calculate the new value of each weight.
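
The three steps can be sketched for the simplest possible case: one sigmoid neuron with a single weight, a squared-error loss, and a learning rate of 0.5. These choices are assumptions for illustration, and the equations in the images above may use different notation. Step 1 computes the error. Step 2 computes the gradient dE/dw using the chain rule. Step 3 moves the weight against the gradient: w_new = w - learning_rate * dE/dw.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_descent_step(w, x, target, learning_rate):
    y = sigmoid(w * x)
    # Step 1: error for this example (squared error)
    error = 0.5 * (target - y) ** 2
    # Step 2: rate of change of the error w.r.t. the weight (chain rule)
    d_error_d_w = (y - target) * y * (1 - y) * x
    # Step 3: new weight = old weight - learning rate * gradient
    new_w = w - learning_rate * d_error_d_w
    return new_w, error


w, x, target = -0.5, 1.0, 1.0   # hypothetical starting weight, input, and target
errors = []
for i in range(5):
    w, err = gradient_descent_step(w, x, target, learning_rate=0.5)
    errors.append(err)
    print(f"iteration {i}: error = {err:.4f}, new w = {w:.4f}")
```

Here the gradient is negative, so each update increases `w`. This pushes the output towards the target and lowers the error at every iteration.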

If we now input the same example to the network again, the network should perform better than before, since the weights have been adjusted to minimize the error in prediction. As shown in the figure below, the errors at the output nodes now reduce to [0.2, -0.2], compared to [0.6, -0.6] earlier. This means that our network has learnt to correctly classify our first training example.

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/09/Neural-Network-Example-Neural-Network-Tutorial-Edureka.png?raw=1' width='800'/>

We repeat this process with all the other training examples in our dataset. Our network is then said to have learnt those examples.
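
The sketch below puts the pieces together. For every example in a small dataset, it runs forward propagation, then backpropagation, then a gradient-descent update, and it repeats this over many passes (epochs). The Chelsea table is only available as an image, so the sketch uses the XOR truth table from the XOR gate example above as a stand-in dataset. The network size (2-4-1), sigmoid activations, squared-error loss, learning rate, epoch count, and random seed are all assumptions. Training usually drives the total error down. However, an unlucky initialization can leave a small network stuck in a local minimum.

```python
import math
import random

# Stand-in dataset: XOR inputs and targets
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
N_IN, N_HIDDEN = 2, 4
rng = random.Random(0)  # fixed seed so runs are repeatable

W1 = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
W2 = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
b2 = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x):
    """Forward propagation through the current (global) weights."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def total_error():
    """Sum of squared errors over the whole dataset."""
    return sum(0.5 * (t - forward(x)[1]) ** 2 for x, t in DATA)


learning_rate = 0.5
initial_error = total_error()
for _ in range(5000):                       # epochs
    for x, t in DATA:                       # one update per training example
        h, y = forward(x)
        delta_out = (y - t) * y * (1 - y)   # backpropagated error at the output
        delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(N_HIDDEN)]
        for j in range(N_HIDDEN):           # gradient-descent weight updates
            W2[j] -= learning_rate * delta_out * h[j]
            b1[j] -= learning_rate * delta_hidden[j]
            for i in range(N_IN):
                W1[j][i] -= learning_rate * delta_hidden[j] * x[i]
        b2 -= learning_rate * delta_out
final_error = total_error()
print(f"total error: {initial_error:.4f} -> {final_error:.4f}")
print("predictions:", [round(forward(x)[1], 2) for x, _ in DATA])
```

Updating after every single example, as done here, is stochastic gradient descent. Libraries such as TensorFlow typically update after small batches of examples instead.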

Now we can feed new inputs to our network. If we feed in a goal lead of 2 in the first half and 32% possession in the second half, our network will predict whether Chelsea will win that match.

Now in this Neural Network Tutorial, we will have some fun with a hands-on example. I will be using TensorFlow to model a Multi-Layer Neural Network.

## Use-Case