## ANN: Perceptron

So far, we have seen how a Logistic Regression model and a linear SVM model can each be represented as a single-neuron Neural Network.

There is another historically important single-neuron model, known as the **Perceptron model**.


Suppose we have the same single-neuron structure:
- It gets $d$ inputs, plus one constant input of $1$ (for the bias).
- It gives out one output: $o_i$

But the activation function is: <br>
\begin{equation}
  f_{perceptron}(x_i,w,b)=\begin{cases}
    1, & \text{if $w^Tx_i+b>0$}.\\
    0, & \text{otherwise}.
  \end{cases}
\end{equation}

This activation function is called the **Perceptron activation function**.


**Note:**
- Here, we are assuming we already know the weights and bias values. This is a very simple activation function.

<br>

> **Q. What does forward propagation look like in a perceptron?**

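Here too, it is done in the same way, in 2 steps:
- $z = w_1x_1 + w_2x_2 + \dots + w_dx_d + b$, where $x_1, \dots, x_d$ are the $d$ input features of the datapoint
- $o_i = f_{perceptron}(z)$, i.e. $1$ if $z > 0$ and $0$ otherwise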
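
The two steps above can be sketched in NumPy as follows. This is only a minimal illustration: the function names `perceptron_activation` and `forward`, and the values of `w`, `b` and `X`, are made up for the example and are not part of the original notes.

```python
import numpy as np

def perceptron_activation(z):
    """Step function: 1 where z > 0, otherwise 0."""
    return (z > 0).astype(int)

def forward(X, w, b):
    """Forward propagation for a single perceptron neuron.

    X: (n, d) array of datapoints, w: (d,) weights, b: scalar bias.
    """
    z = X @ w + b                      # step 1: weighted sum of inputs plus bias
    return perceptron_activation(z)    # step 2: apply the step activation

# Illustrative (assumed) weights, bias and datapoints
w = np.array([2.0, -1.0])
b = -0.5
X = np.array([
    [1.0, 0.0],    # z =  1.5 -> output 1
    [0.0, 1.0],    # z = -1.5 -> output 0
    [0.25, 0.0],   # z =  0.0 -> output 0 (z must be strictly positive)
])

outputs = forward(X, w, b)
print(outputs)  # [1 0 0]
```

Note that a point lying exactly on the hyperplane ($z = 0$) falls into the "otherwise" case and gets output $0$.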



There are 3 ways of thinking about a model:
1. Based on **Equations**
2. Based on **Geometry**
3. Based on **Neural Network representation**

We have already seen the equation and the NN representation of the perceptron.

<br>

> **Q. How is the perceptron model different from the Logistic Regression model?**

Recall the geometrical meaning of the Logistic Regression model:
- The geometry comes from a hyperplane acting as a separator: $\Pi^d$
- And of course, there is **squashing** using the sigmoid function.

Coming back to the perceptron. Let's take a closer look at the perceptron activation function.
- Given a few positive and negative points, they would be divided by a hyperplane $\Pi^d$, which is $w^Tx_i + b = 0$.
  - So this function simply predicts all datapoints on the positive side of the hyperplane ($w^Tx_i + b > 0$) as positive, and all the others as negative.
- So, here also, a hyperplane is acting as a separator.
- The only difference is that there is **no squashing** here.
- Instead, the activation is a **step function**.
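
To make the "squashing vs. no squashing" difference concrete, here is a small sketch that applies both activations to the same values of $z = w^Tx_i + b$. The sample `z` values and the helper names `sigmoid` and `step` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic Regression activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    """Perceptron activation: hard 0/1 decision, no squashing."""
    return (z > 0).astype(int)

# Signed "distances" from the hyperplane (illustrative values)
z = np.array([-5.0, -0.1, 0.1, 5.0])

step_out = step(z)        # only tells us which side of the hyperplane
sigmoid_out = sigmoid(z)  # also tells us how confident the model is

for zi, s, p in zip(z, step_out, sigmoid_out):
    print(f"z = {zi:+.1f}  step = {s}  sigmoid = {p:.3f}")
```

Both models put the decision boundary at the same hyperplane (thresholding the sigmoid at $0.5$ gives the same labels as the step function), but the step function discards how far a point is from the boundary.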

> **Q. What are some cons of perceptron model?**

- The perceptron is one of the oldest models, and it generally doesn't perform even as well as the LogReg model.
- The impact of outliers can be massive, as there is no squashing. So, outliers can badly hurt this model.
- It is a linear model, so it can only separate classes with a hyperplane.
- It can't give probabilities, only 1 or 0.

Because of these shortcomings, the perceptron model is not widely used anymore. Its step activation has been replaced by better activation functions.

<br>

![picture](https://drive.google.com/uc?export=view&id=11tRrA3NlcC9Vy3qHfoHpShhBBxr4vX42)

#### Q. What would training look like in a perceptron model?

- In this model, the output values are only 0 or 1, not probabilities.
  - Hence, in order to compute the loss, we can use even basic loss functions like **MSE**.
  - With parameters $w$ and $b$, we need to minimise this loss across all $n$ datapoints.

- We can also add **L2 Regularization** on the weight values, to avoid overfitting.

This gives us our optimization problem as: 
$$\underset{w,b}{\min} \ \sum_{i=1}^n Loss(y_i,\hat{y}_i) + \lambda \cdot L_2Reg(w)$$

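We can solve this using **Gradient Descent**-style updates. Keep in mind that the step function is flat almost everywhere (its gradient is zero wherever it is defined), so in practice the updates come from the classic perceptron learning rule, or from a smooth surrogate activation, rather than from differentiating the step function directly.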

![picture](https://drive.google.com/uc?export=view&id=1H0fOgQZbQDWil0PKq99znvfJgI9s3X_J)
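
As a rough sketch of what such a training loop could look like, the code below uses the classic perceptron learning rule, $w \leftarrow w + \eta\,(y_i - \hat{y}_i)\,x_i$, with an optional L2 shrinkage term. This is a swapped-in technique for illustration, not the exact gradient of MSE through the step function. The function name `train_perceptron`, the hyperparameters `lr`, `lam` and `epochs`, and the AND-gate dataset are all assumptions made for this example.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, lam=0.0, epochs=50):
    """Train a single perceptron with the classic perceptron update rule.

    lr     : learning rate (assumed value)
    lam    : L2 regularization strength (0.0 disables it)
    epochs : maximum number of passes over the data
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            y_hat = 1 if x_i @ w + b > 0 else 0   # forward propagation
            error = y_i - y_hat                   # one of -1, 0, +1
            if error != 0:
                mistakes += 1
            # Move the hyperplane towards misclassified points,
            # and shrink the weights slightly if lam > 0 (L2 penalty).
            w += lr * (error * x_i - lam * w)
            b += lr * error
        if mistakes == 0:   # every point classified correctly: stop early
            break
    return w, b

# Toy, linearly separable dataset: the logical AND of two inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
preds = (X @ w + b > 0).astype(int)
print(preds)  # [0 0 0 1]
```

With `lam=0.0` and linearly separable data, this rule is guaranteed to stop after a finite number of updates. On data that is not linearly separable (for example XOR), it would keep making mistakes until `epochs` runs out, which reflects the "linear model" limitation mentioned earlier.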


**Note:-**
- Neural Networks are also sometimes referred to as **Multi-Layer Perceptrons (MLPs)**.
- Although the name MLP contains "perceptron", the activation functions in its neurons need not be perceptron activation functions. They can be anything.
- An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer.
- To avoid this confusion, we will continue to use the term Neural Network (NN).
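
To illustrate the point that an MLP's neurons need not use the perceptron activation, here is a minimal forward pass through a three-layer network (input, one hidden layer, output). The layer sizes, the random weights, and the choice of ReLU and sigmoid activations are assumptions picked purely for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
d, h = 3, 4                      # assumed sizes: 3 inputs, 4 hidden neurons

# Randomly initialised (untrained) parameters
W1, b1 = rng.normal(size=(d, h)), np.zeros(h)   # input  -> hidden
W2, b2 = rng.normal(size=(h, 1)), np.zeros(1)   # hidden -> output

def mlp_forward(X):
    hidden = relu(X @ W1 + b1)          # hidden layer uses ReLU, not a step
    return sigmoid(hidden @ W2 + b2)    # output layer uses sigmoid

X = rng.normal(size=(5, d))   # 5 made-up datapoints
out = mlp_forward(X)
print(out.shape)  # (5, 1)
```

No neuron here uses the step function, yet the network is still commonly called an MLP.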