# <font color = red> Perceptrons </font>

<br>

**Reference URLs**
- [Towards Data Science](https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6)
- [Towards Data Science](https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53)
- [SimpliLearn](https://www.simplilearn.com/what-is-perceptron-tutorial)
- [Stanford](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-deep-learning)
- [Edureka Blog](https://www.edureka.co/blog/perceptron-learning-algorithm/)

- **A perceptron is a single-layer neural network**, and *a multi-layer perceptron is called a <b>neural network</b>*.
- A perceptron is a neural network unit (an artificial neuron) that performs certain computations to detect features or business intelligence in the input data.
- A perceptron is an algorithm for **supervised learning of binary classifiers**. The algorithm enables neurons to learn, and it processes the elements of the training set one at a time.
- The algorithm was designed to classify patterns and groups by finding a linear separation between different objects and patterns.

<div>
<img src="attachment:Perceptron.png" width="800"/>
</div>

<br>

<div>
<img src="attachment:Perceptron_Bias_ActFunction.png" width="600"/>
</div>

<br>

<div>
<img src="attachment:Perceptron-Learning-Algorithm.gif" width="600"/>
</div>

<br>

The perceptron consists of four parts:
1. Input values (one input layer)
2. Weights and Bias
3. Net sum
4. Activation Function

---

**Mathematically, one can represent a perceptron as a function of weights, inputs and bias (vertical offset)**
- Each input received by the perceptron is weighted according to how much it contributes to the final output.
- Bias allows us to shift the decision line so that it can best separate the inputs into two classes.
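
The effect of the bias can be sketched in a few lines of NumPy. This is only an illustration: the function name `perceptron_output` and the input, weight and bias values are made up for the example and do not come from the referenced articles.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Return +1 if the weighted sum plus bias is positive, else -1."""
    z = np.dot(w, x) + b          # net input: w . x + b
    return 1 if z > 0 else -1

x = np.array([1.0, 2.0])          # hypothetical input values
w = np.array([0.5, -0.25])        # hypothetical weights

# Same input and weights (w . x = 0.5 - 0.5 = 0.0), two different biases.
print(perceptron_output(x, w, b=0.1))    # bias lifts z above zero
print(perceptron_output(x, w, b=-0.1))   # bias pushes z below zero
```

With the weights held fixed, changing only the bias moves this input from one side of the decision line to the other.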


<div>
<img src="attachment:Perceptron%20Mathematically.png" width="400"/>
</div>

<br>

#### Why do we need Weights and Bias?
- **Weights** show the strength of a particular node.
- **Bias** lets you shift the activation function curve along the input axis (left or right), which moves the point at which the neuron switches on.

### <font color = orange> Single Layer Perceptrons </font> can learn only linearly separable patterns and cannot learn to separate data that are not linearly separable.
### <font color = orange> Multilayer Perceptrons or Feedforward Neural Networks </font> with two or more layers have greater processing power. The nodes never form a cycle. This kind of neural network has an input layer, hidden layers, and an output layer.

## Perceptron Learning Steps
- The algorithm automatically learns the optimal weight coefficients.
- The input features are then multiplied by these weights to determine whether a neuron fires or not.
- The activation function applies a step rule that checks whether the output of the weighting function is greater than zero.
- A linear decision boundary is drawn, making it possible to distinguish between the two linearly separable classes +1 and -1.
- If the sum of the input signals exceeds a certain threshold, the neuron outputs a signal; otherwise, there is no output.

<div>
<img src="attachment:Perceptron-Learning-Rule.jpg" width="600"/>
</div>

- A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. 
- It has only two values: Yes and No or True and False. 
- The summation function `∑` multiplies each input $x_{i}$ by its weight $w_{i}$ and then adds the products up as follows: $w_{0} + w_{1}x_{1} + w_{2}x_{2} + \dots + w_{n}x_{n}$ = <font color = red> k </font>

<br>

**Weighted Sum** - the sum of all the weighted inputs (*the products of each weight and its input*), plus the bias weight = strength of a node

\begin{equation*}
k = w_{0} + \sum_{i=1}^{n} w_{i} x_{i}
\end{equation*}
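
As a small worked example of the weighted sum, the snippet below computes it term by term and then as a dot product. The numbers are hypothetical and were chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

w0 = -1.0                           # bias weight w_0 (hypothetical)
w = np.array([0.5, 0.25, 1.0])      # weights w_1..w_n (hypothetical)
x = np.array([2.0, 4.0, 1.0])       # inputs  x_1..x_n (hypothetical)

# Term-by-term products w_i * x_i, then their sum plus the bias weight.
products = w * x                    # each product is 1.0 here
k = w0 + products.sum()

# The same value written as a dot product.
k_dot = w0 + np.dot(w, x)
print(products, k, k_dot)
```

Here every product equals 1, so the weighted sum is -1 + 3 = 2.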

<br>

<div>
<img src="attachment:Weighted%20Sum.gif" width="500"/>
</div>

<br>

The calculated **weighted sum** is then passed to the **activation function**.

---

**Sample Output of the Perceptron**
<br>

If

\begin{equation*}
w_{0} + \sum_{i=1}^{n} w_{i}x_{i} > 0
\end{equation*}

- then the final output "o" = 1 (YES)
- else the final output "o" = -1 (NO)

<font color = salmon> **Error in Perceptron** </font> - In the Perceptron Learning Rule, the predicted output is compared with the known output. If they do not match, the error is fed back and used to adjust the weights.
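
A minimal sketch of this learning rule, assuming the common update `w <- w + lr * (target - prediction) * x`. The learning rate `lr`, the epoch limit, the function names and the logical-AND data set are illustrative choices, not taken from the referenced articles.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """Fit weights w and bias b with the perceptron learning rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(w, xi) + b > 0 else -1
            update = lr * (target - prediction)   # zero when the prediction is right
            w += update * xi
            b += update
            errors += int(update != 0.0)
        if errors == 0:                           # a full pass with no mistakes
            break
    return w, b

def predict(X, w, b):
    """Apply the step rule to every row of X."""
    return np.where(X @ w + b > 0, 1, -1)

# Logical AND with labels in {-1, +1}: a linearly separable toy problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])

w, b = train_perceptron(X, y)
print(w, b, predict(X, w, b))
```

Because AND is linearly separable, the loop stops once a full pass over the training set produces no weight update.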

## Activation Functions of Perceptron
- The activation function applies a step rule (converting the numerical output into +1 or -1) to check whether the output of the weighting function is greater than zero or not.
- The activation function decides whether a neuron should be activated or not, based on the weighted sum of the inputs plus the bias.
- The purpose of the activation function is to introduce non-linearity into the output of a neuron.
- Without a non-linear activation function, a neural network reduces to a linear model and cannot learn and model more complicated kinds of data (image, audio, video, ...).
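
The point about non-linearity can be checked numerically: two layers with a linear activation collapse into a single linear layer. The matrices below are small hypothetical examples, and ReLU is used only as one possible non-linear activation.

```python
import numpy as np

W1 = np.array([[1.0, -2.0],
               [0.5,  1.0]])        # first-layer weights (hypothetical)
W2 = np.array([[1.0, 1.0]])         # second-layer weights (hypothetical)
x = np.array([1.0, 1.0])

# Two layers with a linear (identity) activation in between...
two_linear_layers = W2 @ (W1 @ x)
# ...give the same result as one layer with weight matrix W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_layer))

# With a non-linear activation in between, the two no longer agree here.
with_relu = W2 @ np.maximum(0.0, W1 @ x)
print(with_relu, one_layer)
```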

<div>
<img src="attachment:Activation%20Functions%20of%20Perceptrons.jpg" width="500"/>
</div>

**Activation functions can be divided into two basic types**
1. Linear Activation Function
2. Non-linear Activation Functions

---

**1. Linear Activation Function**
- The function is a straight line (linear), so its output is not confined to any range.
- Equation : f(x) = x
- Range : (-infinity to infinity)

<div>
<img src="attachment:Linear%20Activation%20Function.png" width="450"/>
</div>

**2. Non-linear Activation Function**
- Non-linear activation functions are the most commonly used activation functions.
- They make it easier for the model to generalize or adapt to a variety of data and to differentiate between outputs.

<div>
<img src="attachment:Non%20Linear%20Activation%20Function.png" width="450"/>
</div>

The main terms needed to understand non-linear functions are:
>**Derivative or Differential**: Change in y-axis w.r.t. change in x-axis. It is also known as slope.
<br>
>**Monotonic function**: A function which is either entirely non-increasing or entirely non-decreasing.
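
Both terms can be illustrated numerically. In the sketch below `np.tanh` is just a convenient smooth curve, and the grid of 601 points is an arbitrary choice.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)
y = np.tanh(x)

# Slope: change in y divided by change in x between neighbouring samples.
slope = np.diff(y) / np.diff(x)

# Monotonic (non-decreasing): no step ever goes down.
is_non_decreasing = bool(np.all(np.diff(y) >= 0))

# The slope rises up to x = 0 and falls afterwards, so it is not monotonic.
slope_steps = np.diff(slope)
slope_is_monotonic = bool(np.all(slope_steps >= 0) or np.all(slope_steps <= 0))

print(is_non_decreasing, slope_is_monotonic)
```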

### Types of Non-Linear Activation Functions

<br>

<div>
<img src="attachment:Non-Linear%20Activation%20Function%20Comparison.png" width="900"/>
</div>

<br>

1. **Sigmoid or Logistic Activation Function**
    - The curve looks like an S-shape
    - The output lies in the range (0 to 1)
    - Used for models where we have to predict the **probability as an output**.
    - Since the probability of anything exists only in the range of 0 to 1, sigmoid is the right choice
    - Function is `differentiable` - meaning we can find the slope of the sigmoid curve at any point
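
A short sketch of the sigmoid and its slope, using the standard formulas `1 / (1 + e^(-z))` and `s(z) * (1 - s(z))`. The function names and sample points are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Slope of the sigmoid at z: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-4.0, 0.0, 4.0])
print(sigmoid(z))             # values stay strictly between 0 and 1
print(sigmoid_derivative(z))  # the slope is largest (0.25) at z = 0
```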
---    
2. **Tanh or hyperbolic tangent Activation Function**
    - Similar to the logistic sigmoid, but its output is centered on zero, which often makes it the better choice
    - The range of the tanh function is (-1 to 1)
    - tanh is also sigmoidal (S-shaped)
    - The advantage is that negative inputs are mapped strongly negative and inputs near zero are mapped near zero in the tanh graph.
    - The function is differentiable.
    - The function is monotonic while its derivative is not monotonic
    - Mainly used for classification between two classes
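
The relationship between tanh and the sigmoid can be checked numerically with the identity `tanh(z) = 2 * sigmoid(2z) - 1`. The helper `sigmoid` and the sample points are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, used here only for the comparison."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)       # -3, -2, ..., 3

# tanh is a rescaled and shifted sigmoid.
print(np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0))

# Zero maps to zero, and negative inputs map to negative outputs.
print(np.tanh(0.0), np.tanh(-2.0))

# The derivative 1 - tanh(z)^2 peaks at z = 0, so it is not monotonic.
tanh_derivative = 1.0 - np.tanh(z) ** 2
print(tanh_derivative)
```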

<div>
<img src="attachment:tanh%20vs%20Logistic%20Sigmoid.jpeg" width="350"/>
</div>

---

3. **ReLU (Rectified Linear Unit) Activation Function**
    - The most widely used activation function
    - Used in almost all convolutional neural networks and deep learning models
    - The ReLU is half rectified (from bottom)
    - f(z) is zero when z is less than zero and f(z) is equal to z when z is above or equal to zero
    - The function and its derivative are both monotonic
    - Range: [0 to infinity)
    - Cons
        - All negative values become zero immediately, which decreases the ability of the model to fit or train on the data properly
        - Because every negative input is mapped to zero, negative values are not represented in the output, and a unit that only receives negative inputs stops updating (the dying ReLU problem).
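
A minimal sketch of ReLU and its slope. Taking the slope at exactly z = 0 to be 0 is a convention chosen for this example.

```python
import numpy as np

def relu(z):
    """f(z) = 0 for z < 0 and f(z) = z for z >= 0."""
    return np.maximum(0.0, z)

def relu_derivative(z):
    """Slope is 0 for negative z and 1 for positive z (0 at z = 0 by convention here)."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))              # negative inputs are clipped to zero
print(relu_derivative(z))   # zero slope for negative inputs
```

The zero slope on the negative side is what the cons above refer to: for those inputs there is no gradient to learn from.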
    
<div>
<img src="attachment:ReLU%20vs%20Logistic%20Sigmoid.png" width="500"/>
</div>

---

4. **Leaky ReLU**
    - An attempt to solve the dying ReLU problem
    - The leak helps to increase the range of the ReLU function. Here a is the slope applied to negative inputs; usually, the value of a is 0.01 or so
    - When a is not fixed but sampled at random during training, it is called `Randomized ReLU`
    - Range: (-infinity to infinity)
    - The function and its derivative are both monotonic
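
A minimal sketch of Leaky ReLU with the leak `a` as a parameter. The default of 0.01 follows the text above; the function names and sample inputs are illustrative.

```python
import numpy as np

def leaky_relu(z, a=0.01):
    """f(z) = z for z >= 0 and a * z for z < 0; `a` is the leak slope."""
    return np.where(z >= 0, z, a * z)

def leaky_relu_derivative(z, a=0.01):
    """Slope is 1 for positive z and `a` otherwise, so it never drops to zero."""
    return np.where(z > 0, 1.0, a)

z = np.array([-100.0, -1.0, 0.0, 1.0])
print(leaky_relu(z))             # negative inputs are scaled down, not zeroed
print(leaky_relu_derivative(z))  # a small but non-zero slope on the negative side
```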

<div>
<img src="attachment:ReLU%20vs%20Leaky%20ReLU.jpeg" width="500"/>
</div>

### Non-Linear Activation Function Cheatsheet

<br>

<div>
<img src="attachment:Activation%20Function%20Cheatsheet.png" width="900"/>
</div>

<div>
<img src="attachment:List%20of%20Activation%20Function.png" width="750"/>
</div>

### Perceptron Decision Function
The decision function `φ(z)` of the perceptron takes as its input z, a linear combination of the vectors x and w.

\begin{equation*}
w = \begin{bmatrix} w_{1} \\ w_{2} \\ w_{3} \\ \dots \\ w_{m} \end{bmatrix}
\end{equation*}
<br>
\begin{equation*}
x = \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \dots \\ x_{m} \end{bmatrix}
\end{equation*}


The value `z` in the decision function is the net input: $z = w_{1}x_{1} + w_{2}x_{2} + \dots + w_{m}x_{m}$

**The decision function is +1 if z is greater than or equal to a threshold θ, and it is -1 otherwise**

\begin{equation*}
\phi(z) =
  \begin{cases}
    1   & \quad \text{if } z \geq \theta \\
    -1  & \quad \text{otherwise}
  \end{cases}
\end{equation*}

#### Bias Unit
For simplicity, the threshold `θ` can be moved to the left-hand side and represented as $w_{0}x_{0}$, where $w_{0} = -\theta$ and $x_{0} = 1$. The comparison $z \geq \theta$ then becomes $z \geq 0$ with

$z = w_{0}x_{0} + w_{1}x_{1} + w_{2}x_{2} + \dots + w_{m}x_{m} = w^{T}x$

- $w_{0}$ - Bias Unit
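
The bias-unit trick can be verified numerically: comparing the weighted sum against θ gives the same labels as prepending $x_{0} = 1$ and $w_{0} = -\theta$ and comparing against zero. The threshold, weights and inputs below are hypothetical.

```python
import numpy as np

theta = 0.5                          # threshold (hypothetical)
w = np.array([0.4, 0.3])             # weights w_1..w_m (hypothetical)
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])           # three sample inputs, one per row

# Original form: compare w_1*x_1 + ... + w_m*x_m against theta.
with_threshold = np.where(X @ w >= theta, 1, -1)

# Bias-unit form: prepend x_0 = 1 and w_0 = -theta, then compare w^T x against 0.
w_aug = np.concatenate(([-theta], w))
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
with_bias_unit = np.where(X_aug @ w_aug >= 0, 1, -1)

print(with_threshold, with_bias_unit)
```

Folding the threshold into the weight vector means the whole decision rule is a single dot product followed by a comparison with zero.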

---

The figure below shows how the decision function squashes $w^{T}x$ to either +1 or -1 and how it can be used to discriminate between two linearly separable classes.

<br>

<div>
<img src="attachment:Decision%20Function.jpg" width="500"/>
</div>