### *Dense Layer* or *fully connected layer*
Every neuron in the layer is connected to every neuron in the previous layer.

### Input Layer
The inputs of a perceptron are fed to special passthrough neurons called *input neurons*: they output whatever input they are fed.

### Bias feature
Generally an extra bias feature $x_0 = 1$ is added to the inputs.
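
As a small illustration of why a constant feature works as a bias, the sketch below compares an explicit bias term with a bias folded into the weights through an extra column of ones. The values of `X`, `w` and `b` are made up for the example.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration.
X = np.array([[2.0, 3.0],
              [1.0, -1.0]])           # 2 instances, 2 features
w = np.array([0.5, -0.25])            # one weight per feature
b = 1.0                               # bias term

# Explicit bias: z = Xw + b
z_explicit = X @ w + b

# Bias feature: prepend a column x0 = 1 and put b in front of the weights
X_aug = np.hstack([np.ones((len(X), 1)), X])   # shape (2, 3)
w_aug = np.concatenate([[b], w])               # shape (3,)
z_folded = X_aug @ w_aug

print(z_explicit, z_folded)
```

Both forms give the same weighted sums, so the bias can be treated as just another weight attached to an input that is always 1.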



## Perceptron
One of the simplest ANN architectures, based on a slightly different artificial neuron called a *threshold logic unit (TLU)* or *linear threshold unit (LTU)*. A single TLU can be used for simple linear binary classification. It computes a linear combination of the inputs, and if the result exceeds a threshold, it outputs the positive class; otherwise it outputs the negative class.

A Perceptron is simply composed of a single layer of TLUs, with each TLU connected to all the inputs.

$$ h_{W, b}(X) = \phi(XW + b) $$

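- $X$ is the matrix of input features. It has one row per instance and one column per feature.
- $W$ is the weight matrix, which contains all the connection weights. It has one row per input neuron and one column per artificial neuron in the layer.
- $b$ is the bias vector. It has one bias term per artificial neuron.
- $\phi$ is the activation function. When the artificial neurons are TLUs, it is a step function.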
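
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `step` and `perceptron_forward` and all the numeric values are made up, and the step function is assumed to fire at $z \ge 0$.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def perceptron_forward(X, W, b):
    """Compute phi(XW + b) for a single layer of TLUs."""
    return step(X @ W + b)

# Hypothetical values: 3 instances, 2 input neurons, 2 TLUs.
X = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 0.5]])            # shape (instances, inputs)
W = np.array([[1.0, -1.0],
              [0.5, 1.0]])            # shape (inputs, neurons)
b = np.array([-2.0, 0.0])             # shape (neurons,)

outputs = perceptron_forward(X, W, b)
print(outputs.shape)   # (3, 2): one row per instance, one column per TLU
print(outputs)
```

Each column of the result is the output of one TLU, which is why `W` needs one column per neuron and `b` one entry per neuron.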

### How is the perceptron trained?

Perceptrons are trained using a variant of Hebb's rule that takes into account the error made by the network when it makes a prediction. The Perceptron learning rule reinforces the connections that help reduce the error:

$$
w_{i,j}^{(\text{next step})} = w_{i,j} + \alpha (y_{j} - \hat y_{j}) x_{i}
$$

- $w_{i,j}$ is the connection weight between the $i^{th}$ input neuron and the $j^{th}$ output neuron.
- $x_{i}$ is the $i^{th}$ input value of the current training instance.
- $\hat y_{j}$ is the output of the $j^{th}$ output neuron for the current training instance.
- $y_{j}$ is the target output of the $j^{th}$ output neuron for the current training instance.
- $\alpha$ is the learning rate.
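
A minimal NumPy sketch of this update rule, applied one training instance at a time, could look like the following. The function name `train_perceptron`, the choice of the AND function as toy data, the learning rate and the number of epochs are all assumptions made for the example. The bias is updated with the same rule, as if it were a weight on a constant input of 1.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def train_perceptron(X, y, alpha=0.5, n_epochs=20):
    """Apply w_ij <- w_ij + alpha * (y_j - y_hat_j) * x_i instance by instance."""
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    W = np.zeros((n_inputs, n_outputs))
    b = np.zeros(n_outputs)
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            y_hat = step(x_i @ W + b)          # current prediction
            error = y_i - y_hat                # (y_j - y_hat_j) for every output j
            W += alpha * np.outer(x_i, error)  # all w_ij updated at once
            b += alpha * error                 # bias: input is always 1
    return W, b

# Toy data (assumed for illustration): the logical AND of two inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0], [0], [0], [1]])

W, b = train_perceptron(X, y)
predictions = step(X @ W + b)
print(predictions.ravel())
```

Weights change only when a prediction is wrong, because the error term is zero otherwise. AND is linearly separable, so the rule settles on a separating boundary after a few passes over the data.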

The Perceptron learning algorithm strongly resembles Stochastic Gradient Descent. Note that, contrary to Logistic Regression classifiers, Perceptrons do not output a class probability; rather, they make predictions based on a hard threshold. This is one reason to prefer Logistic Regression over Perceptrons.
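
To make the difference concrete, the sketch below applies a hard threshold and a logistic (sigmoid) function to the same linear scores. The scores in `z` are made-up values standing in for $w \cdot x + b$.

```python
import numpy as np

# Hypothetical linear scores z = w.x + b for three instances.
z = np.array([-3.0, 0.1, 4.0])

# Perceptron-style output: only the class, no notion of confidence.
hard_prediction = (z >= 0).astype(int)

# Logistic Regression-style output: estimated probability of class 1.
probability = 1.0 / (1.0 + np.exp(-z))
soft_prediction = (probability >= 0.5).astype(int)

print(hard_prediction)
print(probability.round(3))
```

Both approaches assign the same classes here, but only the probabilities show that the second instance sits very close to the decision boundary while the third is far from it.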

In [2]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

In [12]:
iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
# select the irises labelled 0 (Iris setosa): boolean mask converted to int
# (np.int was removed from recent NumPy versions, so the built-in int is used)
y = (iris.target == 0).astype(int)

In [18]:
per_clf = Perceptron()
per_clf.fit(X,y)

y_pred = per_clf.predict([[2,0.5]])
y_pred

array([0])

Some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a *Multilayer Perceptron (MLP)*. An MLP can solve the XOR problem, which a single Perceptron cannot because XOR is not linearly separable. An MLP is composed of one *input layer*, one or more layers of TLUs called *hidden layers*, and one final layer of TLUs called the *output layer*.
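
As an illustration, a two-layer network of TLUs with hand-picked weights can compute XOR. The weights below are one possible choice, set by hand rather than learned: the hidden layer computes AND and OR of the inputs, and the output neuron fires when OR is true but AND is not.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: two TLUs (column 0 computes AND, column 1 computes OR).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-1.5, -0.5])

# Output layer: one TLU that fires for "OR and not AND".
W_out = np.array([[-1.0],
                  [1.0]])
b_out = np.array([-0.5])

hidden = step(X @ W_hidden + b_hidden)
xor = step(hidden @ W_out + b_out).ravel()
print(xor)
```

The hidden layer remaps the four input points so that the output TLU can separate them with a single linear boundary.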

When an ANN contains a deep stack of hidden layers, it is called a *deep neural network* (DNN). For many years researchers struggled to find a way to train MLPs, without success. But in 1986, David Rumelhart, Geoffrey Hinton and Ronald Williams published a **groundbreaking paper** that introduced the *backpropagation* training algorithm, which is still used today. In short, it is Gradient Descent using an efficient technique for computing the gradients automatically: in just two passes through the network (one forward, one backward), backpropagation is able to compute the gradient of the network's error with regard to every single model parameter. In other words, it can find out how each connection weight and each bias term should be tweaked in order to reduce the error. Once it has these gradients, it just performs a regular Gradient Descent step, and the whole process is repeated until the network converges to the solution.
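
The two passes can be sketched by hand for a tiny network. This is only an illustration under several assumptions: one hidden layer, a sigmoid activation instead of the step function (the step function is flat almost everywhere, so it gives Gradient Descent nothing to work with), a linear output, a mean squared error loss, and random toy data. All function and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)      # hidden layer activations
    Y_hat = H @ W2 + b2           # linear output layer
    return H, Y_hat

def loss(X, Y, params):
    _, Y_hat = forward(X, params)
    return 0.5 * np.mean(np.sum((Y_hat - Y) ** 2, axis=1))

def backprop(X, Y, params):
    W1, b1, W2, b2 = params
    H, Y_hat = forward(X, params)          # forward pass
    dY = (Y_hat - Y) / len(X)              # backward pass starts at the output
    dW2, db2 = H.T @ dY, dY.sum(axis=0)
    dZ1 = (dY @ W2.T) * H * (1 - H)        # chain rule through the sigmoid
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    return [dW1, db1, dW2, db2]

# Random toy data and parameters (2 inputs, 3 hidden neurons, 1 output).
X = rng.normal(size=(8, 2))
Y = rng.normal(size=(8, 1))
params = [rng.normal(size=(2, 3)), np.zeros(3),
          rng.normal(size=(3, 1)), np.zeros(1)]

grads = backprop(X, Y, params)

# Sanity check: compare one gradient entry with a central finite difference.
eps = 1e-6
plus = [p.copy() for p in params]
minus = [p.copy() for p in params]
plus[0][0, 0] += eps
minus[0][0, 0] -= eps
numeric = (loss(X, Y, plus) - loss(X, Y, minus)) / (2 * eps)

# One regular Gradient Descent step using the backpropagated gradients.
learning_rate = 0.01
new_params = [p - learning_rate * g for p, g in zip(params, grads)]
loss_before, loss_after = loss(X, Y, params), loss(X, Y, new_params)
print(grads[0][0, 0], numeric)
print(loss_before, loss_after)
```

The finite-difference check perturbs one weight at a time and so needs two forward passes per parameter, whereas backpropagation obtains every gradient from a single forward and backward pass.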