<h1>Perceptron</h1>

<h2>From BNNs to ANNs</h2>
<p>Where do Artificial Neural Networks come from? Like many other inventions, the concept behind them came from nature: in particular, from <b>our brain</b>.</p>

<h2><i>Biological Neurons</i></h2>
<img src="img/neurons.png" width="500" height="500"></img>
<h2><i>Perceptron</i></h2>
<img src="img/perceptron.png" width="500" height="500"></img>

<p>They are very similar, aren't they? Both have <b>inputs</b>, some kind of <b>computation</b> and an <b>output</b>. The Perceptron is one of the simplest ANN architectures. It is based on a particular artificial neuron called a <i>linear threshold unit</i> (LTU).</p>
<p><b>How does it work?</b></p>
<ul>
    <li>inputs and outputs are numbers (not binary values)</li>
    <li>each input connection is associated with a weight</li>
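    <li>the LTU computes a weighted sum of its inputs: z = w<sub>1</sub>x<sub>1</sub> + w<sub>2</sub>x<sub>2</sub> + ... + w<sub>n</sub>x<sub>n</sub> = <b>w</b><sup>T</sup><b>x</b></li>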
    <li>then applies a step function (the Heaviside step function or the sign function) to that sum and outputs the result</li>
</ul>
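<p>The LTU described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the weights are made-up values, there is no bias term (matching the formula above), and the step function is passed in as a parameter so that both the Heaviside and the sign variants can be tried.</p>

```python
import numpy as np

def heaviside(z):
    # Heaviside step: 1 if z >= 0, else 0 (the value at z = 0 is a convention)
    return 1 if z >= 0 else 0

def sign(z):
    # Sign function: -1 if z < 0, 0 if z == 0, +1 if z > 0
    return int(np.sign(z))

def ltu(x, w, step=heaviside):
    """Linear threshold unit: weighted sum z = w^T x, then a step function."""
    z = np.dot(w, x)
    return step(z)

w = np.array([0.5, -1.0])                  # one (made-up) weight per input connection
print(ltu(np.array([3.0, 1.0]), w))        # z = 0.5  -> 1
print(ltu(np.array([1.0, 1.0]), w))        # z = -0.5 -> 0
print(ltu(np.array([1.0, 1.0]), w, sign))  # z = -0.5 -> -1
```

<p>In practice a bias term is usually added, often by feeding an extra input x<sub>0</sub> = 1 whose weight plays the role of the threshold.</p>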

<p><b>Training a Perceptron</b></p>
<p>The Perceptron is fed one training instance at a time and makes a prediction for each one. For every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction.</p>
<img src="img/learning_rule.png" width="300" height="50"></img>

<p><b>What's the problem?</b></p>
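<p>Like any other linear classification model, the Perceptron cannot solve some trivial problems, such as the XOR classification problem: its decision boundary is a straight line (a hyperplane in higher dimensions), and no straight line can separate the two XOR classes.</p>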
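<p>A quick way to see this limitation is to train scikit-learn's <code>Perceptron</code> (the same class used later in this notebook) on the four points of AND and of XOR. The hyperparameters below are an arbitrary choice for the sketch; whatever the settings, a linear classifier can get at most 3 of the 4 XOR points right.</p>

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # NOT linearly separable

scores = {}
for name, y in [("AND", y_and), ("XOR", y_xor)]:
    # tol=None: run all max_iter epochs instead of stopping early
    clf = Perceptron(max_iter=1000, tol=None, random_state=42)
    clf.fit(X, y)
    scores[name] = clf.score(X, y)  # training accuracy

print(scores)
```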

<p><b>How can we solve it?</b></p>
<p>By stacking multiple Perceptrons, we can eliminate many of these limitations. This is how the Multi-Layer Perceptron (MLP) was born, and from it all the ANNs we know today.</p>
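<p>To make the idea concrete, here is a sketch of two stacked layers of LTUs that computes XOR. The weights are hand-picked for illustration (not learned): one hidden LTU acts as OR, the other as AND, and the output LTU fires for "OR but not AND".</p>

```python
import numpy as np

def step(z):
    # Heaviside step: 1 where z >= 0, else 0
    return np.where(z >= 0, 1, 0)

# Hidden layer: two LTUs with hand-picked weights
#   h1 = OR(x1, x2)  -> fires when x1 + x2 - 0.5 >= 0
#   h2 = AND(x1, x2) -> fires when x1 + x2 - 1.5 >= 0
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output LTU: fires when h1 - h2 - 0.5 >= 0, i.e. "OR but not AND"
w_out = np.array([1.0, -1.0])
b_out = -0.5

def mlp_xor(X):
    H = step(X @ W_hidden.T + b_hidden)  # hidden layer outputs
    return step(H @ w_out + b_out)       # output layer

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(mlp_xor(X))  # [0 1 1 0]
```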

<img src="img/multi.jpg" width="400" height="400"></img>

<p>As you can see, an MLP has an input layer, one or more layers of LTUs called <i>hidden layers</i>, and a final output layer. When an ANN has two or more hidden layers, it is called a Deep Neural Network (DNN).</p>
<p><b>Which activation functions are used here?</b></p>
<p>As you can see from the image, the hidden layers use a non-linear function, while the output layer uses a different one. Why not simply keep the step function of the LTU?</p>
<p>That's because of the training algorithm we are going to use with an MLP: <i>backpropagation</i>.</p>
<p>I won't explain here how it works (see <a href="https://en.wikipedia.org/wiki/Backpropagation">HERE</a>), but once it has computed the gradient of the error with respect to each weight, it performs a Gradient Descent step to reduce the error. The step function contains only flat segments, so its gradient is zero everywhere except at the jump, where it is undefined: there is no gradient to work with. This is why the step function is replaced by differentiable non-linear functions.</p>
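<p>The "no gradient" argument can be checked numerically. The sketch below estimates derivatives with a central finite difference (a helper written just for this example) at a few points away from z = 0, comparing the step function with the logistic function.</p>

```python
import numpy as np

def step(z):
    # Heaviside step: flat everywhere except the jump at z = 0
    return np.where(z >= 0, 1.0, 0.0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def numerical_derivative(f, z, eps=1e-6):
    # Central finite difference: (f(z + eps) - f(z - eps)) / (2 * eps)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.array([-2.0, -0.5, 0.5, 2.0])      # points away from z = 0
print(numerical_derivative(step, z))      # [0. 0. 0. 0.]: nothing to follow
print(numerical_derivative(logistic, z))  # all positive: a usable gradient
```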

<p><b>Common Activation Functions: </b></p>
<ul>
    <li>logistic function</li>
    <li>hyperbolic tangent function</li>
    <li>ReLU function</li>
</ul>
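<p>The three functions listed above are one-liners in NumPy. This is an illustrative sketch; deep learning frameworks ship their own optimized versions.</p>

```python
import numpy as np

def logistic(z):
    # sigma(z) = 1 / (1 + exp(-z)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # tanh(z) = 2 * sigma(2z) - 1, output in (-1, 1)
    return np.tanh(z)

def relu(z):
    # ReLU(z) = max(0, z): not differentiable at 0, but works well in practice
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(logistic(z))
print(tanh(z))
print(relu(z))  # [0. 0. 2.]
```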
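<p>Instead, in the output layer of a classifier whose classes are mutually exclusive, we typically use the <b>Softmax function</b>, which turns the output scores into probabilities that are all positive and sum to 1.</p>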
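<p>Softmax maps a vector of scores z to exp(z<sub>k</sub>) / Σ<sub>j</sub> exp(z<sub>j</sub>). A minimal sketch follows; the score values are made up, and subtracting the maximum score is a common numerical-stability trick that leaves the result unchanged.</p>

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow in exp and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up raw outputs of the last layer
p = softmax(scores)
print(p, p.sum())
```

<p>The predicted class is simply the one with the highest probability, i.e. <code>np.argmax(p)</code>.</p>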

<h1>Some code now!</h1>

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
```

```python
iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length, petal width (cm)
y = (iris.target == 0).astype(int)  # 1 if Iris Setosa, else 0
```

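```python
# Train a Perceptron to tell Iris Setosa apart from the other species
per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

# Predict for a flower with petal length 2 cm and petal width 0.5 cm
y_pred = per_clf.predict([[2, 0.5]])
print(y_pred)
```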

[1]
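<p>The trained model is just an LTU, so we can recover its weighted sum by hand. In scikit-learn the learned weights live in <code>coef_</code> and the bias in <code>intercept_</code>, and for a binary problem <code>predict</code> returns the positive class when the score z = <b>w</b><sup>T</sup><b>x</b> + b is strictly greater than 0. The sketch below re-runs the training above and checks that the hand-computed z matches <code>decision_function</code>.</p>

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length, petal width (cm)
y = (iris.target == 0).astype(int)  # 1 if Iris Setosa, else 0

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

# The learned LTU: z = w^T x + b
w = per_clf.coef_[0]
b = per_clf.intercept_[0]

x_new = np.array([[2, 0.5]])
z = x_new @ w + b                   # weighted sum computed by hand
print(z, per_clf.decision_function(x_new))
print((z > 0).astype(int), per_clf.predict(x_new))
```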
