<a href="https://colab.research.google.com/github/rahiakela/edureka-deep-learning-with-tensorflow/blob/module-3-deep-dive-into-neural-networks-with-tensorFlow/module_3_perceptron_learning_algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Perceptron Learning Algorithm

A perceptron is the basic building block of a deep neural network, so it makes sense to begin our journey of mastering Deep Learning with the perceptron and learn how to implement it using TensorFlow to solve different problems.

The following topics are covered in this blog on the Perceptron Learning Algorithm:

* Perceptron as a Linear Classifier
* Implementation of a Perceptron using TensorFlow Library
* SONAR Data Classification Using a Single Layer Perceptron

## Types of Classification Problems

One can categorize all kinds of classification problems that can be solved using neural networks into two broad categories:
* Linearly Separable Problems
* Non-Linearly Separable Problems

A problem is said to be linearly separable if you can split the data set into two classes using a single straight line (or, with more than two features, a single hyperplane). An example is separating cats from a group of cats and dogs, assuming the chosen features allow it. In contrast, in a non-linearly separable problem no single straight line can split the classes, so a non-linear decision boundary is required. An example is the classification of handwritten digits. Let us visualize the difference between the two by plotting a linearly separable data set and a non-linearly separable data set:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/07/Linear-528x264.jpg?raw=1' width='800'/>
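
To make the distinction concrete, here is a minimal sketch that searches a small grid of candidate lines `w1*x1 + w2*x2 + b = 0` for one that separates a data set. The helper `separable_on_grid`, the grid values and the XOR data set are assumptions introduced for illustration only; a failed grid search is not a proof in general, although XOR is a well-known case that no straight line can separate.

```python
from itertools import product

def separable_on_grid(samples, grid):
    """Return some (w1, w2, b) from the grid that separates the samples, or None."""
    for w1, w2, b in product(grid, repeat=3):
        # A point is assigned to class 1 when it lies strictly above the line
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(label)
               for (x1, x2), label in samples):
            return (w1, w2, b)
    return None

# Candidate coefficients: -2.0, -1.5, ..., 2.0 (hypothetical search range)
grid = [i / 2 for i in range(-4, 5)]

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(separable_on_grid(and_samples, grid) is not None)  # True: a separating line exists
print(separable_on_grid(xor_samples, grid) is not None)  # False: no line on the grid works
```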

Since you are all familiar with AND gates, I will use one as an example to explain how a perceptron works as a linear classifier.

**Note**: As you move on to much more complex problems such as image recognition, which I covered briefly in the previous blog, the relationship in the data that you want to capture becomes highly non-linear and therefore requires a network consisting of multiple artificial neurons, called an artificial neural network.

## Perceptron as AND Gate

As you know, an AND gate outputs 1 if both inputs are 1, and 0 in all other cases. Therefore, a perceptron can be used as a separator, or decision line, that divides the input set of the AND gate into two classes:

* **Class 1**: Inputs with output 0, which lie below the decision line.
* **Class 2**: Inputs with output 1, which lie above the decision line or separator.

The diagram below shows this idea of classifying the inputs of an AND gate using a perceptron:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/07/AND-Gate-Classifier-Deep-Learning-Tutorial-Edureka-528x194.png?raw=1' width='800'/>

So far, you have seen that a linear perceptron can be used to classify the input data set into two classes. But how does it actually classify the data?

Mathematically, one can represent a perceptron as a function of weights, inputs and bias (vertical offset):

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/06/Transfer-Function-Deep-Learning-Tutorial-Edureka-300x152.png?raw=1' width='800'/>

* Each input received by the perceptron is weighted according to how much it contributes to the final output.
* The bias allows us to shift the decision line so that it can best separate the inputs into two classes.
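
The two points above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used later in this blog: the function `perceptron`, the step threshold at 0, the weights and the two bias values are all hypothetical choices. It shows that, with the weights held fixed, changing only the bias shifts the decision line enough to turn an AND gate into an OR gate.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a step function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [1.0, 1.0]  # hypothetical: both inputs contribute equally

# Same weights, different bias: the decision line moves, the slope does not
and_outputs = [perceptron(x, weights, bias=-1.5) for x in inputs]
or_outputs = [perceptron(x, weights, bias=-0.5) for x in inputs]

print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```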

Enough of the theory. Let us look at the first example of this blog on the Perceptron Learning Algorithm, where I will implement an AND gate using a perceptron from scratch.

## Perceptron Learning Algorithm: Implementation of AND Gate

### 1. Import all the required libraries

In [1]:
# The cells below use the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
import tensorflow as tf

### 2. Define Vector Variables for Input and Output

Now, I will create variables for storing the input, output and bias for my perceptron:

In [0]:
# input1, input2 and bias
train_in = [
    [1., 1., 1.],
    [1., 0., 1.],
    [0., 1., 1.],
    [0., 0., 1.]
]

# target
train_out = [[1.], [0.], [0.], [0.]]

### 3. Define Weight Variable

Now, I need to define the weight variable and assign some random values to it initially. Since I have three inputs here (input 1, input 2 and bias), I need three weight values, one for each input. So, I will define a tensor variable of shape 3×1 for the weights, initialized with random values:

In [0]:
# weight variable initialized with random values using random_normal()
w = tf.Variable(tf.random_normal([3, 1], seed=12))

### 4. Define placeholders for Input and Output

In TensorFlow, you can specify placeholders that accept external inputs at runtime. So, I will define two placeholders: x for input and y for output. Later on, you will see how to feed inputs to a placeholder.

In [0]:
# Placeholder for input and output
x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])

### 5. Calculate Output and Activation Function

As discussed earlier, each input received by a perceptron is first multiplied by its respective weight, and then all these weighted inputs are summed together. This summed value is then fed to the activation function to obtain the final result, as shown in the image below, followed by the code:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/07/AND-Gate-Perceptron-Perceptron-Learning-Algorithm-Edureka-528x207.png?raw=1' width='800'/>

In [0]:
# calculate output
output = tf.nn.relu(tf.matmul(x, w))
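
To see what this matrix multiplication followed by ReLU computes, here is a plain-Python sketch of the same forward pass. The weight values in `w` are hypothetical and chosen by hand so the arithmetic is easy to follow; the real weights are random at this point. Note how the constant third input of 1 lets the third weight act as the bias.

```python
# Same inputs as train_in: input1, input2 and the constant bias input
train_in = [[1., 1., 1.], [1., 0., 1.], [0., 1., 1.], [0., 0., 1.]]

# Hypothetical weights: one per input, the last one plays the role of the bias
w = [0.5, 0.5, -0.5]

def relu(z):
    """Rectified linear unit: negative values are clipped to zero."""
    return max(z, 0.0)

# One weighted sum per row (what tf.matmul(x, w) produces), then ReLU
output = [relu(sum(x_i * w_i for x_i, w_i in zip(row, w))) for row in train_in]

print(output)  # [0.5, 0.0, 0.0, 0.0]
```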

### 6. Calculate the Cost or Error

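Now, I need to calculate the error between the perceptron output and the desired output. This error is often calculated as the Mean Squared Error, which is the average of the squared differences between the perceptron output and the desired output. The code below uses tf.reduce_sum, so it computes the sum of the squared differences rather than their mean; the two differ only by a constant factor:

In [0]:
# Squared error loss, summed over all training examples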
loss = tf.reduce_sum(tf.square(output - y))
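
As a small worked example of this loss, the sketch below computes the summed and the averaged squared error by hand. The values in `outputs` are hypothetical perceptron outputs picked for illustration; `targets` matches train_out.

```python
outputs = [0.5, 0.0, 0.0, 0.0]  # hypothetical perceptron outputs
targets = [1.0, 0.0, 0.0, 0.0]  # AND gate targets (train_out, flattened)

squared_errors = [(o - t) ** 2 for o, t in zip(outputs, targets)]

# Sum of squared errors: what tf.reduce_sum(tf.square(output - y)) computes
sum_squared_error = sum(squared_errors)

# Mean squared error: what tf.reduce_mean would compute instead
mean_squared_error = sum_squared_error / len(targets)

print(sum_squared_error)   # 0.25
print(mean_squared_error)  # 0.0625
```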

### 7. Minimize Error

TensorFlow provides optimizers that slowly change each variable (weight and bias) in order to minimize the loss in successive iterations. The simplest optimizer is gradient descent, which I will be using in this case.

In [0]:
# Minimize loss using GradientDescentOptimizer with a learning rate of 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
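
The following is a rough plain-Python sketch of what a single gradient descent step does for this perceptron, assuming the summed squared error loss and ReLU activation used above. The helper names (`forward`, `loss_fn`, `gradient_step`) and the starting weights are hypothetical; TensorFlow derives the gradient automatically, whereas here it is written out by hand.

```python
def relu(z):
    return max(z, 0.0)

def forward(w, row):
    """Weighted sum of one input row (bias included as the constant input)."""
    return sum(x_i * w_i for x_i, w_i in zip(row, w))

def loss_fn(w, inputs, targets):
    """Sum of squared errors over all training examples."""
    return sum((relu(forward(w, row)) - t) ** 2 for row, t in zip(inputs, targets))

def gradient_step(w, inputs, targets, learning_rate):
    """Return the weights after one gradient descent update."""
    grad = [0.0] * len(w)
    for row, t in zip(inputs, targets):
        z = forward(w, row)
        if z > 0:  # ReLU passes a gradient only where its input is positive
            error = relu(z) - t
            for j, x_j in enumerate(row):
                grad[j] += 2.0 * error * x_j
    # Move each weight a small step against its gradient
    return [w_j - learning_rate * g_j for w_j, g_j in zip(w, grad)]

train_in = [[1., 1., 1.], [1., 0., 1.], [0., 1., 1.], [0., 0., 1.]]
targets = [1., 0., 0., 0.]
w = [0.5, 0.5, -0.5]  # hypothetical starting weights

old_loss = loss_fn(w, train_in, targets)
w = gradient_step(w, train_in, targets, learning_rate=0.01)
new_loss = loss_fn(w, train_in, targets)

print(new_loss < old_loss)  # True: one step lowers the loss
```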

### 8. Initialize all the variables

Variables are not initialized when you call tf.Variable. So, I need to explicitly initialize all the variables in a TensorFlow program using the following code:

In [0]:
# Initialize all the global variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

### 9. Training Perceptron in Iterations

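Now, I need to train the perceptron, i.e. update the values of the weights and bias in successive iterations to minimize the error or loss. Here, I train the perceptron for 10 epochs. The loss is still falling after the tenth epoch, so increase the range (for example to 1000) to train it further.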

In [10]:
# Run one training step per epoch, then compute the loss on the same inputs
for i in range(10):
  sess.run(train, feed_dict={x: train_in, y: train_out})
  cost = sess.run(loss, feed_dict={x: train_in, y: train_out})
  print(f'Epoch-- {i} -- loss -- {cost}')

Epoch-- 0 -- loss -- 1.0213106
Epoch-- 1 -- loss -- 1.0033305
Epoch-- 2 -- loss -- 0.9856898
Epoch-- 3 -- loss -- 0.96837854
Epoch-- 4 -- loss -- 0.95138687
Epoch-- 5 -- loss -- 0.93470633
Epoch-- 6 -- loss -- 0.9183289
Epoch-- 7 -- loss -- 0.90224737
Epoch-- 8 -- loss -- 0.8864547
Epoch-- 9 -- loss -- 0.8709445


In the above code, you can see how I feed train_in (the input set of the AND gate) and train_out (the output set of the AND gate) to the placeholders x and y respectively, using feed_dict, to calculate the cost or error.

## Activation Functions

As discussed earlier, the activation function is applied to the output of a perceptron as shown in the image below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/06/Activation-Function-Deep-Learning-Tutorial-Edureka-300x121.png?raw=1' width='800'/>

In the previous example, I showed you how to use a single perceptron with the ReLU activation function to perform linear classification on the input set of an AND gate. But what if the classification that you wish to perform is non-linear in nature? In that case, you will rely on non-linear activation functions, typically in a network with more than one neuron. Some of the prominent non-linear activation functions are shown below:

<img src='https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/06/Activation-Functions-Deep-Learning-Tutorial-Edureka-528x177.png?raw=1' width='800'/>

TensorFlow provides built-in functions for applying activation functions. The built-in functions for the activation functions stated above are listed below:

* **tf.sigmoid(x, name=None)**
  * Computes sigmoid of x element-wise
  * For an element x, sigmoid is calculated as y = 1 / (1 + exp(-x))
* **tf.nn.relu(features, name=None)**
  * Computes rectified linear as max(features, 0)
* **tf.tanh(x, name=None)**
  * Computes hyperbolic tangent of x element-wise
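
For intuition, the three functions listed above can be written for a single number in plain Python as follows. This is only a sketch using the standard `math` module; the TensorFlow versions apply the same formulas element-wise to whole tensors.

```python
import math

def sigmoid(x):
    """Squashes x into the range (0, 1): y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear: max(x, 0)."""
    return max(x, 0.0)

def tanh(x):
    """Hyperbolic tangent: squashes x into the range (-1, 1)."""
    return math.tanh(x)

# Compare the three activations on a negative, zero and positive input
for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), relu(x), tanh(x))
```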

So far, you have learned how a perceptron works and how you can program it using TensorFlow. So, it's time to move ahead and apply our understanding of a perceptron to an interesting use case: SONAR data classification.

## SONAR Data Classification Using Single Layer Perceptrons