# Minimal Perceptron Test example

Inspired by https://www.kdnuggets.com/2018/09/6-steps-write-machine-learning-algorithm.html
and https://machinelearningmastery.com/implement-perceptron-algorithm-scratch-python/


* The single layer Perceptron is the most basic neural network. It’s typically used for binary classification problems (1 or 0, “yes” or “no”).
* It’s a linear classifier, so it can only really be used when there’s a linear decision boundary. Some simple uses might be sentiment analysis (positive or negative response) or loan default prediction (“will default”, “will not default”). For both cases, the decision boundary would need to be linear.
* If the decision boundary is non-linear, a single layer Perceptron can’t learn it. Those problems need a different model, for example a network with a hidden layer.
    

The Perceptron algorithm breaks down into the following steps:

- Initialize the weights
- Multiply the weights by the inputs and sum the results (this is a dot product)
- Compare the result against the threshold to compute the output (0 or 1)
- Update the weights
- Repeat
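
The steps above can be sketched for a single training example as follows. This is only an illustration: the input, target, learning rate and threshold values are assumptions picked for the example, and the leading 1 in `x` is a constant input so that the first weight acts as a bias.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate one pass through the steps
weights = np.zeros(3)            # step 1: initialize the weights (bias weight first)
x = np.array([1, 1, 1])          # constant bias input 1, then x1 = 1 and x2 = 1
target = 0                       # expected output for this input
learning_rate = 0.1
threshold = 0.0

activation = np.dot(weights, x)                             # step 2: weighted sum
output = 1 if activation >= threshold else 0                # step 3: compare with the threshold
weights = weights + learning_rate * (target - output) * x   # step 4: update the weights

print(output, weights)
```

Step 5 (repeat) means running the same three lines for every training example, several times over.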

Start by learning a simple function. The NAND function is a good first example for a perceptron: if both inputs are true (1) then the output is false (0); otherwise the output is true (1). Here is what the data set looks like:

<html>
<table align="left" style="width:20%">
  <tr>
    <th>x<sub>1</sub></th>
    <th>x<sub>2</sub></th>
    <th>y</th>
  </tr>
  <tr>
    <td>0</td> <td>0</td> <td>1</td>
  </tr>
  <tr>
    <td>0</td> <td>1</td> <td>1</td>
  </tr>
  <tr>
    <td>1</td> <td>0</td> <td>1</td>
  </tr>
  <tr>
    <td>1</td> <td>1</td> <td>0</td>
  </tr>
</table>
</html>
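
The same truth table can also be generated rather than typed by hand. The snippet below is a small sketch; the name `nand_rows` is made up for this example.

```python
# Build the NAND truth table: the output is 0 only when both inputs are 1
nand_rows = [[x1, x2, int(not (x1 and x2))] for x1 in (0, 1) for x2 in (0, 1)]

for row in nand_rows:
    print(row)
```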

Start implementing the algorithm defined above. We can import and use the dot product function from numpy.

In [1]:
import numpy as np

w = [0, 0, 0]
x = [1, 0, 1]

np.dot(w, x)

0

Next, implement a minimal perceptron that learns the NAND function. We can estimate the weight values for our training data using the stochastic gradient descent algorithm.

Stochastic gradient descent requires two parameters:

- Learning Rate: Limits how much each weight is corrected each time it is updated.
- Epochs: The number of times to run through the training data while updating the weights.

These, along with the training data, will be the arguments to the function.

There are 3 loops we need to perform in the function:

1. Loop over each epoch.
2. Loop over each row in the training data for an epoch.
3. Loop over each weight and update it for a row in an epoch.

As you can see, we update each weight for each row in the training data, in each epoch.

Weights are updated based on the error the model made. The error is calculated as the difference between the expected output value and the prediction made with the candidate weights.
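
Written out, the update applied for each row is roughly the following. The notation is introduced here only for illustration: $\eta$ is the learning rate (`l_rate` in the code), $y$ the expected output, $\hat{y}$ the prediction, $w_0$ the bias weight (`weights[0]`) and $w_i$ the weight for input $x_i$.

$$
\text{error} = y - \hat{y}, \qquad w_0 \leftarrow w_0 + \eta \cdot \text{error}, \qquad w_i \leftarrow w_i + \eta \cdot \text{error} \cdot x_i
$$

For example, with all weights at 0 and $\eta = 0.1$, the row $(x_1, x_2, y) = (1, 1, 0)$ is predicted as 1, so the error is $-1$ and the weights become $(-0.1, -0.1, -0.1)$. A correct prediction gives an error of 0 and leaves the weights unchanged.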

In [2]:
# Make a prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            sum_error += error**2
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        print('> epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
    return weights


# Define training dataset 
#       NAND function
#           x1,x2,y
dataset = [ [0,0,1],
            [0,1,1],
            [1,0,1],
            [1,1,0]]

# Calculate weights
l_rate = 0.1
n_epoch = 10
weights = train_weights(dataset, l_rate, n_epoch)
print('Final weights model = ', weights)

> epoch=0, lrate=0.100, error=1.000
> epoch=1, lrate=0.100, error=3.000
> epoch=2, lrate=0.100, error=3.000
> epoch=3, lrate=0.100, error=2.000
> epoch=4, lrate=0.100, error=1.000
> epoch=5, lrate=0.100, error=0.000
> epoch=6, lrate=0.100, error=0.000
> epoch=7, lrate=0.100, error=0.000
> epoch=8, lrate=0.100, error=0.000
> epoch=9, lrate=0.100, error=0.000
Final weights model =  [0.2, -0.2, -0.1]


In [3]:
# Make one-off predictions using our weights model
predict(row=[0,0,None], weights=[0.2, -0.2, -0.1])

1.0

In [4]:
predict(row=[0,1,None], weights=[0.2, -0.2, -0.1])

1.0

In [5]:
predict(row=[1,0,None], weights=[0.2, -0.2, -0.1])

1.0

In [6]:
predict(row=[1,1,None], weights=[0.2, -0.2, -0.1])

0.0

This is cool! The error is non-zero for the first 5 epochs (0 to 4) and stays at 0.0 from epoch 5 onwards, so it takes 5 epochs of weight updates to learn the function. The final weights are the model, and we can re-use them to make predictions for any input.
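
Instead of calling `predict` once per input, the four checks can be done in one loop. This is a sketch that repeats the `predict` function and the final weights from above so that it runs on its own; the name `all_correct` is made up for the example.

```python
# Same prediction logic as the predict function defined earlier
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Final weights and NAND dataset copied from the cells above
weights = [0.2, -0.2, -0.1]
dataset = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]]

# Compare the prediction for every row with its expected output
all_correct = all(predict(row, weights) == row[-1] for row in dataset)
print('all rows predicted correctly:', all_correct)
```

Note that the input (1, 0) gives an activation of exactly 0, so it is classified as 1 only because `predict` uses `>=` rather than `>`.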

We could do more here, like modify the learning rate or learn a different function, but this is a good start!
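
As one example of a different function, and to illustrate the earlier point about non-linear decision boundaries, here is a sketch that trains on XOR with the same update rule. XOR is not part of the original walk-through, and the names `count_errors_per_epoch` and `xor_dataset` are made up for this example. Because no straight line separates the XOR outputs, every epoch should misclassify at least one row, however long we train.

```python
# Same prediction logic as the predict function defined earlier
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Same update rule as train_weights, but it records the number of
# misclassified rows in each epoch instead of printing the squared error
def count_errors_per_epoch(train, l_rate, n_epoch):
    weights = [0.0] * len(train[0])
    history = []
    for _ in range(n_epoch):
        n_errors = 0
        for row in train:
            error = row[-1] - predict(row, weights)
            if error != 0:
                n_errors += 1
            weights[0] += l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] += l_rate * error * row[i]
        history.append(n_errors)
    return history

# XOR function: x1, x2, y
xor_dataset = [[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]]

xor_errors = count_errors_per_epoch(xor_dataset, l_rate=0.1, n_epoch=50)
print('fewest misclassified rows in any epoch:', min(xor_errors))
```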

### Compare with sklearn Perceptron

In theory, we should be able to check our perceptron implementation against the version implemented in the excellent scikit-learn library. If our implementation is correct then we should get identical results, right? Let's test it!

In [7]:
# The input format of our data is slightly different
# Each row in the training dataset gets an extra value '1' prepended
# This constant feature acts as the bias input, so its weight corresponds to weights[0] in our code
# and sklearn's own intercept is switched off below (fit_intercept=False)
# see sklearn docs (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html)
x_train = []
y_train = []
for d in dataset:
    one_list = [1]    # insert dummy value '1'
    one_list = one_list + d[0:-1]
    x_train.append(one_list)
    y_train = y_train + [d[-1]]   
    
# The expected outputs (labels) are split off into a separate list (y_train)
    
print('dataset = ', dataset)    
print('x_train = ', x_train)
print('y_train = ', y_train)


dataset =  [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
x_train =  [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y_train =  [1, 1, 1, 0]
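
The same split can also be written with numpy slicing instead of a loop. This is an alternative sketch, not what the cell above does; the `_np` names are made up to keep the two versions apart.

```python
import numpy as np

dataset = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
data = np.array(dataset)

# Prepend a column of ones (the constant bias input) to the feature columns
ones = np.ones((data.shape[0], 1), dtype=int)
x_train_np = np.hstack([ones, data[:, :-1]])

# The last column holds the expected outputs
y_train_np = data[:, -1]

print('x_train_np = ', x_train_np.tolist())
print('y_train_np = ', y_train_np.tolist())
```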


In [8]:
from sklearn.linear_model import Perceptron

# Train the sklearn Perceptron
# eta0 is the learning rate 
# max_iter is the number of epochs
# shuffle and fit_intercept are switched off so that training mirrors our implementation
clf = Perceptron(random_state=None, eta0=l_rate, shuffle=False, fit_intercept=False, max_iter=n_epoch)
        
clf.fit(x_train, y_train)

x_test = x_train
y_predict = clf.predict(x_test)
print("x_test = ", x_test)
print("y_predict = ", y_predict)


x_test =  [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y_predict =  [1 1 1 0]


If our original perceptron implementation is correct then its weights and the weights from the sklearn Perceptron should be the same...

In [9]:

print("sklearn.Perceptron weights:")
print(clf.coef_[0])

print("my perceptron weights:")
print(weights)

sklearn.Perceptron weights:
[ 0.2 -0.2 -0.1]
my perceptron weights:
[0.2, -0.2, -0.1]
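
Comparing floating point weights by eye can hide small rounding differences, so a tolerance-based check is a little safer. The sketch below uses the values copied from the printed output so that it runs on its own; in the notebook you would pass `clf.coef_[0]` and `weights` instead.

```python
import numpy as np

# Values copied from the printed output above
sklearn_weights = np.array([0.2, -0.2, -0.1])
my_weights = [0.2, -0.2, -0.1]

# True when every pair of weights agrees within numpy's default tolerance
weights_match = np.allclose(sklearn_weights, my_weights)
print('weights match:', weights_match)
```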
