## Deep Learning

### Perceptron
#### The Perceptron is a simple neural network architecture, invented in 1957 by Frank Rosenblatt, based on an artificial neuron called a `threshold logic unit` (`TLU`).
```
In a TLU, the inputs and output are numbers (instead of binary on/off values), and each input connection
is associated with a weight. The TLU computes a weighted sum of its inputs (z = w1 x1 + w2 x2 + ⋯ + wn xn = x⊺ w),
then applies a step function to that sum and outputs the
result: hw(x) = step(z), where z = x⊺ w.
```

![image.png](attachment:fb39f923-2e3c-4658-acdb-a186ae615eea.png)
![image.png](attachment:8a761881-6ffc-4686-bece-35b721b10150.png)
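
The TLU computation described above can be sketched in a few lines of NumPy. This is only an illustration: the weights `w`, the inputs `X`, and the helper names `heaviside` and `tlu_predict` are made up for the example, and the step function is assumed to be the heaviside function (1 when z ≥ 0, otherwise 0).

```python
import numpy as np

def heaviside(z):
    """Step function: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def tlu_predict(X, w):
    """Compute hw(x) = step(x⊺ w) for every row x of X."""
    z = X @ w  # weighted sum of the inputs
    return heaviside(z)

# Hypothetical weights and inputs, chosen only for illustration
w = np.array([0.5, -1.0])
X = np.array([[1.0, 0.0],   # z = 0.5  -> 1
              [0.0, 1.0],   # z = -1.0 -> 0
              [2.0, 0.5]])  # z = 0.5  -> 1

print(tlu_predict(X, w))  # [1 0 1]
```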

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0

per_clf = Perceptron()
per_clf.fit(X, y)

# Classify a flower with petal length 2 and petal width 0.5
y_pred = per_clf.predict([[2, 0.5]])
y_pred  # 1 = Iris setosa, 0 = not Iris setosa
```

```
array([0])
```

```
Scikit-Learn’s Perceptron class is equivalent to using an SGDClassifier with the following hyperparameters:
    loss="perceptron", learning_rate="constant", eta0=1 (the learning rate), and penalty=None (no regularization).

Note that, contrary to Logistic Regression classifiers, Perceptrons do not output a class probability;
rather, they make predictions based on a hard threshold.
This is one reason to prefer Logistic Regression over Perceptrons.
```

![image.png](attachment:4ba46144-7fd7-40cf-9417-084271c8e3a2.png)

#### The signal flows in one direction only (from the inputs to the outputs), so this architecture is called a `feedforward neural network (FNN)`
#### Layers close to the input layer are called `lower layers`; layers close to the output layer are called `upper layers`

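
The equivalence between `Perceptron` and `SGDClassifier` noted above can be checked with a short experiment. This is a sketch under assumptions: `random_state=42` is added to both models (it is not in the original example) so that they shuffle the data identically, and the check relies on the default values of the remaining hyperparameters matching in the installed Scikit-Learn version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron, SGDClassifier

iris = load_iris()
X = iris.data[:, [2, 3]]            # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0

# random_state is an assumption added here so both models see the same shuffling
per_clf = Perceptron(random_state=42).fit(X, y)
sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                        eta0=1, penalty=None, random_state=42).fit(X, y)

# Both models should end up with the same weights and bias
same_weights = (np.allclose(per_clf.coef_, sgd_clf.coef_)
                and np.allclose(per_clf.intercept_, sgd_clf.intercept_))
print("Same parameters:", same_weights)

# A Perceptron exposes a raw score and a hard decision, but no class probability
print("Has predict_proba:", hasattr(per_clf, "predict_proba"))
print("Score:", per_clf.decision_function([[2, 0.5]]))
```

The missing `predict_proba` method is the practical consequence of the hard threshold: only `decision_function` (the raw weighted sum) and `predict` are available.
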
Training MLPs was hard until the `backpropagation` training algorithm was introduced.
In just two passes through the network (one forward, one backward), it computes the gradient of the network's error
with regard to every single model parameter, that is, each connection weight and each bias, using reverse-mode autodiff.
A Gradient Descent step then uses these gradients to update the parameters.

Reverse-mode automatic differentiation is fast and precise, and is well suited when
the function to differentiate has many variables (e.g., connection
weights) and few outputs (e.g., one loss).

![image.png](attachment:a93b6c01-1033-402b-84a5-2ad204614e7f.png)
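
To make the forward and backward passes concrete, here is a minimal hand-written sketch in NumPy. Everything in it is an assumption made for illustration: the data is random, the network is a hypothetical 3-4-1 MLP with a `tanh` hidden layer, and the loss is the mean squared error. Real frameworks derive the backward pass automatically rather than by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # synthetic inputs: 8 samples, 3 features
y = rng.normal(size=(8, 1))  # synthetic regression targets

# Hypothetical tiny MLP: 3 inputs -> 4 tanh units -> 1 linear output
W1, b1 = 0.5 * rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = 0.5 * rng.normal(size=(4, 1)), np.zeros(1)

def forward(W1, b1, W2, b2):
    """Forward pass: return hidden activations, outputs, and the MSE loss."""
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    return h, out, np.mean((out - y) ** 2)

def backward(h, out, W2):
    """Backward pass: apply the chain rule from the loss back to each parameter."""
    d_out = 2 * (out - y) / len(X)  # gradient of the loss w.r.t. the outputs
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T
    d_pre = d_h * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0)
    return dW1, db1, dW2, db2

h, out, loss = forward(W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(h, out, W2)

# Sanity check: compare one backprop gradient with a finite-difference estimate
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[2]
           - forward(W1_minus, b1, W2, b2)[2]) / (2 * eps)
grad_ok = np.isclose(numeric, dW1[0, 0], atol=1e-6)

# One Gradient Descent step using the backprop gradients
lr = 0.01
new_loss = forward(W1 - lr * dW1, b1 - lr * db1,
                   W2 - lr * dW2, b2 - lr * db2)[2]
print("Gradient matches:", grad_ok, "| loss decreased:", new_loss < loss)
```

Note that a single backward pass yields the gradient for all four parameter arrays, whereas the finite-difference check needs two extra forward passes per individual weight. That difference is why reverse mode suits functions with many inputs and one output.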

### The key change from the single-layer Perceptron to the MLP was using an activation function such as `tanh` or `ReLU` instead of the `heaviside` step function
```
The heaviside activation function outputs only 0 or 1, so it is made of flat segments where the gradient is zero.
This was a major drawback for solving complex problems, because Gradient Descent cannot move on a flat surface.
tanh has a nonzero derivative everywhere, and ReLU has one for every positive input,
so Gradient Descent can make progress at each step.

The activation function also needs to be nonlinear. If f(x) = 2x + 3 and g(x) = 5x – 1, then chaining these two linear functions
gives you another linear function: f(g(x)) = 2(5x – 1) + 3 = 10x + 1. So if you don’t have some nonlinearity between layers,
then even a deep stack of layers is equivalent to a single layer, and you can’t solve very complex problems with that.
```
![image.png](attachment:a6ba290a-3cb4-4526-b57a-b4742b7db839.png)
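
The chaining argument above can be verified numerically, both for the scalar functions f and g and for whole layers. The layer sizes, the random weights, and the `relu` helper below are assumptions chosen only for this sketch.

```python
import numpy as np

def f(x):
    return 2 * x + 3

def g(x):
    return 5 * x - 1

x = np.linspace(-2, 2, 5)
print(np.allclose(f(g(x)), 10 * x + 1))  # True: f(g(x)) = 10x + 1

# Same idea with two linear layers (hypothetical sizes: 3 -> 4 -> 2)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
X = rng.normal(size=(5, 3))

two_layers = (X @ W1 + b1) @ W2 + b2

# The stack collapses into one layer with merged weights and bias
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b
print(np.allclose(two_layers, one_layer))  # True

def relu(z):
    return np.maximum(z, 0)

# With a nonlinearity between the layers, the merged layer no longer matches
with_relu = relu(X @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```

The last comparison is expected to fail whenever at least one hidden pre-activation is negative, which is what lets stacked layers represent more than a single linear map.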

## Regression MLPs

![image.png](attachment:2c81cdfb-9709-4e17-9aa1-36646c89826b.png)

## Classification MLPs

![image.png](attachment:675ba458-e101-410b-9a4e-88a8df5fb4e1.png)