# The Perceptron Learning Algorithm
 
## Learning And Machine Learning
Learning is the ability to improve one's behavior with experience. Focusing on the experience aspect of learning, machine learning can be defined as building computer systems that automatically improve with experience.

A typical algorithm solves a problem by giving a computer data and a program, which together produce a suitable output. Machine learning instead gives the computer the data together with examples of outcomes, and the computer outputs a model for producing suitable outputs.


<img align="center" width="500" height="300" src="PLA.png">


## Synopsis of The Perceptron  
The Perceptron learning algorithm is used for supervised learning of binary classifiers. Supervised learning means the algorithm is given inputs (conditions) together with documented outputs, and binary means that the output has only two possible values. The Perceptron was invented in 1957 by Frank Rosenblatt, Ph.D. The idea is that the Perceptron works like a neuron in a nervous system: each neuron receives thousands of signals from other neurons, connected via synapses, and once the sum of the signals received surpasses a certain threshold, a response is sent through the axon.


## Composition of the Perceptron Learning Algorithm
Implementing the perceptron learning algorithm involves using a collection of features to answer a question that has two possible answers (a binary classification). The algorithm learns to make this choice from previously collected data in which each example is labeled with one of the two outcomes. The running example here is a credit approval decision.

Formally, let $\mathcal{X} \subseteq \mathbb{R}^d$ with $d \in \mathbb{N}$ be the input space, and let $\mathcal{Y} = \{-1, 1\}$ be the output space. We then have:

* $x$: Input customer information that is used to make the credit decision.
* $f:\mathcal{X} \rightarrow \mathcal{Y}$: *Unknown target* function that is the ideal formula for credit approval. 
* $\mathcal{X}$: *Input space* consisting of all possible input $x$.
* $\mathcal{Y}$: *Output space* consisting of no or yes credit approval.
* $\mathcal{D}$: *Data set* of input-output examples of the form $(x_i, y_i)$, where $f(x_i) = y_i$ and $i = 1, \dots, N$.
* $\mathcal{A}$: *Learning algorithm* which uses $\mathcal{D}$ to pick a formula (hypothesis) $g:\mathcal{X}\rightarrow \mathcal{Y}$ so that $g\approx f$, where $g\in \mathcal{H}$. Here $\mathcal{H}$ is the *hypothesis space*.

For $h \in \mathcal{H}$, $h(x)$ gives different weights to the different coordinates of $x$, reflecting the relative importance of each coordinate to the credit decision. The combined weighted coordinates form a credit score, which is compared to some threshold, say $\theta$.

* Approve if
$$
\sum_{i=1}^{d}w_ix_i > \theta
$$

* Deny if
$$
\sum_{i=1}^{d}w_ix_i < \theta
$$

We next introduce a *bias* $b = -\theta$, which gives the following form for the hypothesis functions in $\mathcal{H}$:

$$
h(x) = \text{sign}\Big((\sum_{i=1}^{d}w_ix_i) + b\Big), 
$$

where $h(x) = 1$ means approve and $h(x) = -1 $ means deny. 

We next simplify notation by treating the bias $b$ as a weight, and modify $x$ so that 

$$
w = [b, w_1, \dots, w_d]^{T}
$$

$$
x = [1.0, x_1, \dots, x_d]^{T}
$$

Thus, $\mathcal{X} = \{1.0\}\times\mathbb{R}^d$, and $h(x) = \text{sign}(w^{T}x)$.
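
As a quick check of the augmented notation, the following sketch compares the two forms of $h(x)$ on one input. The weights, bias, and input are made-up numbers, and the names `sign`, `w_vec`, `b`, and `x_vec` are illustrative assumptions rather than part of any data set used here.

```python
import numpy as np

def sign(z):
    # Assumed convention: strictly positive scores map to +1, everything else to -1.
    return 1 if z > 0 else -1

# Illustrative (made-up) weights, bias, and input with d = 2 features.
w_vec = np.array([0.5, -1.0])
b = 0.25                      # b = -theta, so the threshold theta is -0.25
x_vec = np.array([2.0, 1.0])

# Original form: sign(sum_i w_i x_i + b)
h_original = sign(w_vec @ x_vec + b)

# Augmented form: prepend b to the weights and 1.0 to the input.
w_aug = np.concatenate(([b], w_vec))
x_aug = np.concatenate(([1.0], x_vec))
h_augmented = sign(w_aug @ x_aug)

print(h_original, h_augmented)
```

Both values agree because $w^{T}x$ on the augmented vectors is exactly $\sum_{i=1}^{d}w_ix_i + b$.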

### Perceptron Learning Algorithm (PLA)
This is an iterative method. Suppose that at iteration $t$ an example from $(x_1,y_1), \dots, (x_N, y_N)$ is misclassified by the current weights $w(t)$, and denote this misclassified example by $(x(t), y(t))$. Since $(x(t), y(t))$ is misclassified,

$$
y(t) \neq \text{sign}(w^{T}(t)x(t)). 
$$

**Update Rule:**

$$
w(t+1) = w(t) + y(t)x(t).
$$
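
To see why this update moves the weights in a helpful direction, it can be useful to look at the quantity $y(t)\,w^{T}x(t)$, which is positive exactly when $x(t)$ is classified correctly. Using the update rule and $y(t)^2 = 1$:

$$
y(t)\,w^{T}(t+1)\,x(t) = y(t)\,\big(w(t) + y(t)x(t)\big)^{T}x(t) = y(t)\,w^{T}(t)\,x(t) + \|x(t)\|^{2}.
$$

Since every $x$ has first coordinate $1.0$, $\|x(t)\|^{2} \geq 1$, so each update strictly increases this quantity for the chosen example. A single update is not guaranteed to fix the example, and it may change how other examples are classified.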

**Theorem.** When the training data is linearly separable, the perceptron learning algorithm converges after a finite number of updates to weights that classify every training example correctly.
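
The statement can be illustrated on a small example. The sketch below is a minimal variant of PLA that starts from zero weights and always updates on the first misclassified example until none remain; this differs from the random sampling used in the notebook's `Perceptron` function. The function name `pla_until_converged`, the `max_updates` safeguard, and the toy data are assumptions made for illustration.

```python
import numpy as np

def pla_until_converged(X, y, max_updates=10_000):
    """Run PLA from zero weights until no training example is misclassified.

    Returns the final weights and the number of updates performed.
    """
    w = np.zeros(X.shape[1])
    updates = 0
    while updates < max_updates:
        # sign(w^T x), with the convention that a score of 0 counts as -1.
        predictions = np.where(X @ w > 0, 1, -1)
        wrong = np.flatnonzero(predictions != y)
        if wrong.size == 0:
            return w, updates          # every example is classified correctly
        i = wrong[0]                   # pick the first misclassified example
        w = w + y[i] * X[i]            # update rule: w(t+1) = w(t) + y(t) x(t)
        updates += 1
    raise RuntimeError("no separating weights found within max_updates")

# Hand-made toy data (an assumption for illustration), already augmented with
# a leading 1.0. The two classes lie on opposite sides of the line x_1 + x_2 = 3.5.
X_toy = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 4.0, 3.0],
    [1.0, 3.0, 4.0],
])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

w_final, n_updates = pla_until_converged(X_toy, y_toy)
print("weights:", w_final, "updates:", n_updates)
```

If the data were not linearly separable, the loop would never find an error-free set of weights, which is why the sketch stops with an error after `max_updates` updates.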


In [82]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('iris_data.csv')
# Keep the first 100 rows (setosa and versicolor in the standard Iris ordering)
# and shuffle them so that the two species are mixed
df = df[0:100].sample(frac=1).reset_index(drop=True)
print(df)

    SepalLength  SepalWidth  PetalLength  PetalWidth     Species
0           6.2         2.2          4.5         1.5  versicolor
1           6.6         2.9          4.6         1.3  versicolor
2           5.1         3.8          1.5         0.3      setosa
3           4.9         3.6          1.4         0.1      setosa
4           5.1         2.5          3.0         1.1  versicolor
5           5.1         3.3          1.7         0.5      setosa
6           6.1         3.0          4.6         1.4  versicolor
7           6.0         2.2          4.0         1.0  versicolor
8           6.0         2.9          4.5         1.5  versicolor
9           5.8         4.0          1.2         0.2      setosa
10          4.3         3.0          1.1         0.1      setosa
11          5.6         2.5          3.9         1.1  versicolor
12          5.0         2.0          3.5         1.0  versicolor
13          6.1         2.8          4.0         1.3  versicolor
14          5.1         3

In [83]:
# Get sepal length and petal length from the iris data set,
# with a leading 1.0 so that the bias is treated as a weight
X = [np.array([1.0, df['SepalLength'][i], df['PetalLength'][i]]) for i in range(len(df))]

# Convert the species label to a numeric value
make_int = lambda label: 1 if label == 'setosa' else -1
Y = [make_int(df['Species'][i]) for i in range(len(df))]

In [84]:
# Set perceptron hypothesis: h(x) = sign(w^T x)
def h(w, x):
    if w @ x > 0:
        return 1
    else:
        return -1

In [85]:
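def Perceptron(x, y, iterations = 1000):
    # Initialize the weights randomly in [0, 1), one per input coordinate
    w = np.random.rand(len(x[0]))
    n = len(x)

    for _ in range(iterations):
        # Pick a random example and update only if it is misclassified
        i = np.random.randint(n)
        if h(w, x[i]) != y[i]:
            w += y[i]*x[i]    # update rule: w(t+1) = w(t) + y(t)x(t)

    return w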

In [87]:
# Run the perceptron learning algorithm for 10000 iterations
w = Perceptron(X, Y, 10000)


In [88]:
def predict(w, i):
    if h(w, X[i]) == 1:
        return 'Setosa'
    else:
        return 'Versicolor'

In [89]:
predict(w, 20)

'Setosa'
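
A single prediction says little about how well the learned weights fit the data. One way to check is to compute the fraction of training examples that are classified correctly. The sketch below assumes a hypothetical helper `training_accuracy`; the stand-in data and weights are made up so that the block runs on its own, and they are not results from the notebook.

```python
import numpy as np

def training_accuracy(w, X, Y):
    """Fraction of examples whose predicted label sign(w^T x) equals the true label."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    predictions = np.where(X @ w > 0, 1, -1)
    return float(np.mean(predictions == Y))

# Made-up stand-ins shaped like the notebook's X (1.0, SepalLength, PetalLength) and Y.
X_demo = [
    np.array([1.0, 5.0, 1.5]),
    np.array([1.0, 6.5, 4.5]),
    np.array([1.0, 4.8, 1.4]),
]
Y_demo = [1, -1, 1]
w_demo = np.array([5.0, 1.0, -3.0])   # hypothetical weights, not a trained result

print(training_accuracy(w_demo, X_demo, Y_demo))
```

In the notebook, the same helper could be called as `training_accuracy(w, X, Y)` with the weights returned by `Perceptron`.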