# How To Implement The Perceptron Algorithm From Scratch In Python

By Jason Brownlee on August 13, 2019. Original post [here](https://machinelearningmastery.com/implement-perceptron-algorithm-scratch-python/), in [Code Algorithms From Scratch](https://machinelearningmastery.com/category/algorithms-from-scratch/).

The Perceptron algorithm is the __simplest type of artificial neural network__.

It is a model of a single neuron that can be used for two-class classification problems and provides the foundation for later developing much larger networks.

After completing this tutorial, you will know:

- How to train the network weights for the Perceptron.
- How to make predictions with the Perceptron.
- How to implement the Perceptron algorithm for a real-world classification problem.

## Tutorial
This tutorial is broken down into 5 parts:

1. Description
    - 1.1 Perceptron Algorithm
    - 1.2 Stochastic Gradient Descent
    - 1.3 Sonar Dataset
2. Making Predictions.
3. Training Network Weights.
4. Modeling the Sonar Dataset.
5. Extensions

## 1. Description

### 1.1 Perceptron Algorithm

The __Perceptron__ receives `input signals from examples of training data`, which are weighted and combined in a `linear equation` called the __activation__.

$activation = \sum_i (weight_i * x_i) + bias$

The activation is then transformed into an __output value__ or __prediction__ using a transfer function, such as the `step transfer function`.

$prediction = 1.0 \text{ if } activation \geq 0.0 \text{ else } 0.0$

In this way, the __Perceptron is a classification algorithm for problems with two classes__ (0 and 1) where a linear equation (a line or hyperplane) can be used to separate the two classes.

### 1.2 Stochastic Gradient Descent

__Gradient Descent__ is the process of minimizing a function by following the gradients of the cost function.

In machine learning, we can use a technique called stochastic gradient descent, which evaluates and updates the weights on every iteration, `to minimize the error of a model on our training data`.

The model makes a prediction for a training instance, the error is calculated and the model is updated in order to reduce the error for the next prediction.

This procedure can be used to find the set of weights in a model that result in the smallest error for the model on the training data.

For the Perceptron algorithm, the weights (__w__) are updated on each iteration using the equation:

$w = w + learning\_rate * (expected - predicted) * x$

Here __w__ is the weight being optimized, __learning_rate__ is a learning rate that you must configure (e.g. 0.01), (__expected__ - __predicted__) is the prediction error for the model on the training data attributed to the weight, and __x__ is the input value.
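
To make the update rule concrete, here is a minimal sketch of a single update step on one training row. The starting weights of zero, the learning rate of 0.1 and the helper name `update_weights` are assumptions chosen for illustration; the row is the first row of the small contrived dataset used in the sections that follow.

```python
# One stochastic gradient descent update for a single training row.
# update_weights is a hypothetical helper name used only for this sketch.
def update_weights(weights, row, l_rate):
    # weights[0] is the bias; the last value in row is the expected class
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    predicted = 1.0 if activation >= 0.0 else 0.0
    error = row[-1] - predicted
    # the bias has no input value, so it is updated with the error alone
    updated = [weights[0] + l_rate * error]
    for i in range(len(row) - 1):
        updated.append(weights[i + 1] + l_rate * error * row[i])
    return updated

row = [2.7810836, 2.550537003, 0]
new_weights = update_weights([0.0, 0.0, 0.0], row, 0.1)
print(new_weights)
```

With all weights at zero the activation is 0.0, so the prediction is 1.0 while the expected class is 0. The error is therefore -1, and each weight moves down in proportion to its input value.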


### 1.3 Sonar Dataset

This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

It is a well-understood dataset. All of the `variables are continuous and generally in the range of 0 to 1`. As such we __will not have to normalize the input data, which is often a good practice with the Perceptron algorithm__. The output variable is a string “__M__” for mine and “__R__” for rock, which will need to be converted to integers __1__ and __0__.
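
One way to make the class conversion described above explicit is a fixed lookup table. This is only a sketch: the two rows are made-up placeholders rather than real Sonar records, and the name `lookup` is an assumption. An explicit table may be worth considering because the `str_column_to_int()` helper shown later builds its mapping from a `set`, so which class ends up as 1 can vary between runs.

```python
# Map the Sonar class strings to integers with an explicit lookup.
# The two rows below are made-up placeholders, not real Sonar records.
rows = [[0.02, 0.03, "M"], [0.01, 0.05, "R"]]
lookup = {"M": 1, "R": 0}  # mine -> 1, rock -> 0, as described in the text
for row in rows:
    row[-1] = lookup[row[-1]]
print(rows)
```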

## 2. Making Predictions.
The first step is to develop a function that can make predictions.

This will be needed both in the evaluation of candidate weight values in stochastic gradient descent, and after the model is finalized and we wish to start making predictions on test data or new data.

In [1]:
# Make a prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

In [2]:
# test predictions
dataset = [[2.7810836,2.550537003,0],
           [1.465489372,2.362125076,0],
           [3.396561688,4.400293529,0],
           [1.38807019,1.850220317,0],
           [3.06407232,3.005305973,0],
           [7.627531214,2.759262235,1],
           [5.332441248,2.088626775,1],
           [6.922596716,1.77106367,1],
           [8.675418651,-0.242068655,1],
           [7.673756466,3.508563011,1]]

# first weight is always the bias as it is standalone and 
# not responsible for a specific input value
weights = [-0.1, 0.20653640140000007, -0.23418117710000003]

for row in dataset:
    prediction = predict(row, weights)
    print("Expected=%d, Predicted=%d" % (row[-1], prediction))

Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=1, Predicted=1
Expected=1, Predicted=1
Expected=1, Predicted=1
Expected=1, Predicted=1
Expected=1, Predicted=1


There are two input values (__X1__ and __X2__) and three weight values (__bias__, __w1__ and __w2__). The activation equation we have modeled for this problem is:

$activation = (w1 * X1) + (w2 * X2) + bias$

Or, with the specific weight values we chose by hand as:

$activation = (0.206 * X1) + (-0.234 * X2) + -0.1$
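
As a worked example of the equation above, the sketch below evaluates the activation for the first row of the contrived dataset using the full-precision weights from the test cell. The variable names are chosen here for illustration only.

```python
# Activation for the first row of the contrived dataset, using the weights above.
bias, w1, w2 = -0.1, 0.20653640140000007, -0.23418117710000003
x1, x2 = 2.7810836, 2.550537003  # inputs of the first row (expected class 0)
activation = (w1 * x1) + (w2 * x2) + bias
# step transfer function: 1.0 when the activation is non-negative, else 0.0
prediction = 1.0 if activation >= 0.0 else 0.0
print("activation=%.3f, prediction=%d" % (activation, prediction))
```

The activation comes out negative (about -0.12), so the step function returns 0, which matches the expected class for this row.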


## 3. Training Network Weights.
We can estimate the __weight values__ for our training data using stochastic gradient descent.

Stochastic gradient descent requires two parameters:

- __Learning Rate__: Used to limit the amount each weight is corrected each time it is updated.
- __[Epochs](https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/)__: The number of times to run through the training data while updating the weights.

There are 3 loops we need to perform in the function:

- Loop over each epoch.
- Loop over each row in the training data for an epoch.
- Loop over each weight and update it for a row in an epoch.

__We update each weight for each row in the training data, each epoch.__

__Weights__ are updated based on the error the model made. The __error__ is calculated as the difference between the expected output value and the prediction made with the candidate weights.

There is one weight for each input attribute, and these are updated in a consistent way, for example:

$w(t+1) = w(t) + learning\_rate * (expected(t) - predicted(t)) * x(t)$

The bias is updated in a similar way, except without an input as it is not associated with a specific input value:

$bias(t+1) = bias(t) + learning\_rate * (expected(t) - predicted(t))$


In [8]:
# Make a prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

In [9]:
# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            sum_error += error**2
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
    return weights

In [10]:
# Calculate weights
dataset = [[2.7810836,2.550537003,0],
           [1.465489372,2.362125076,0],
           [3.396561688,4.400293529,0],
           [1.38807019,1.850220317,0],
           [3.06407232,3.005305973,0],
           [7.627531214,2.759262235,1],
           [5.332441248,2.088626775,1],
           [6.922596716,1.77106367,1],
           [8.675418651,-0.242068655,1],
           [7.673756466,3.508563011,1]]

l_rate = 0.1
n_epoch = 5
weights = train_weights(dataset, l_rate, n_epoch)

print(weights)

>epoch=0, lrate=0.100, error=2.000
>epoch=1, lrate=0.100, error=1.000
>epoch=2, lrate=0.100, error=0.000
>epoch=3, lrate=0.100, error=0.000
>epoch=4, lrate=0.100, error=0.000
[-0.1, 0.20653640140000007, -0.23418117710000003]


## 4. Modeling the Sonar Dataset.
In this section, we will train a Perceptron model using stochastic gradient descent on the Sonar dataset.

The dataset is first loaded, the string values are converted to numeric values and the output column is converted from strings to the integer values 0 and 1. This is achieved with the helper functions __load_csv()__, __str_column_to_float()__ and __str_column_to_int()__.

We will use k-fold cross validation to estimate the performance of the learned model on unseen data. This means that we will construct and evaluate k models and estimate the performance as the mean of the k model scores. Classification accuracy will be used to evaluate each model. These behaviors are provided in the __cross_validation_split()__, __accuracy_metric()__ and __evaluate_algorithm()__ helper functions.
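
The fold size follows from an integer division of the number of rows by k, as done in __cross_validation_split()__. The sketch below works through that arithmetic for the 208 Sonar records with two illustrative values of k; the dictionary name `sizes` is an assumption made for this example.

```python
# Fold sizes as computed by the integer division in cross_validation_split().
n_rows = 208  # number of records in the Sonar dataset
sizes = {}
for n_folds in (3, 8):
    fold_size = int(n_rows / n_folds)
    # rows that do not fit evenly into the folds are never placed in any fold
    leftover = n_rows - fold_size * n_folds
    sizes[n_folds] = (fold_size, leftover)
    print("k=%d: fold_size=%d, rows left out=%d" % (n_folds, fold_size, leftover))
```

When the row count does not divide evenly by k, the remaining rows are simply not used in any fold.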

In [11]:
# Perceptron Algorithm on the Sonar Dataset
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

In [12]:
# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

In [13]:
# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

In [14]:
# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

In [15]:
# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

In [16]:
# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

In [17]:
# Make a prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

In [18]:
# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
    return weights

In [19]:
# Perceptron Algorithm With Stochastic Gradient Descent
def perceptron(train, test, l_rate, n_epoch):
    predictions = list()
    weights = train_weights(train, l_rate, n_epoch)
    for row in test:
        prediction = predict(row, weights)
        predictions.append(prediction)
    return predictions

In [35]:
# Test the Perceptron algorithm on the sonar dataset
seed(1)

# load and prepare data
filename = '..\\..\\..\\data\\sonar.all.data.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)

# convert string class to integers
str_column_to_int(dataset, len(dataset[0])-1)

# evaluate algorithm
n_folds = 8 # (3)72.947, (5):73.659, (**8**):74.519, (9):72.464, (10):72.000, (12):71.078
l_rate = 0.01
n_epoch = 300 # (200):73.558, (250):75.962 (**300**):75.962, (400):75.481, (500):74.519, (700):74.519

scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)

print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

Scores: [80.76923076923077, 76.92307692307693, 92.3076923076923, 73.07692307692307, 76.92307692307693, 76.92307692307693, 61.53846153846154, 69.23076923076923]
Mean Accuracy: 75.962%


A k value of 8 was used for cross-validation, giving each fold 208/8 = 26 records to be evaluated upon each iteration. A learning rate of 0.01 and 300 training epochs were chosen with a little experimentation.

## 5. Extensions
This section lists extensions to this tutorial that you may wish to consider exploring.

- `Tune The Example`. Tune the learning rate, number of epochs and even data preparation method to get an improved score on the dataset.
- `Batch Stochastic Gradient Descent`. Change the stochastic gradient descent algorithm to accumulate updates across each epoch and only update the weights in a batch at the end of the epoch.
- `Additional Classification Problems`. Apply the technique to other classification problems on the UCI machine learning repository.
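
As a starting point for the batch extension, here is a minimal sketch that accumulates the updates over a whole epoch and applies them once at the end. The function name `train_weights_batch` is hypothetical, and this is one possible reading of the extension rather than a definitive implementation. It reuses the small contrived dataset from earlier in the tutorial.

```python
# Make a prediction with weights (same logic as the tutorial's predict())
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Batch variant: accumulate the updates over an epoch, then apply them once.
# train_weights_batch is a hypothetical name used only for this sketch.
def train_weights_batch(train, l_rate, n_epoch):
    weights = [0.0 for _ in range(len(train[0]))]
    for epoch in range(n_epoch):
        updates = [0.0 for _ in weights]
        for row in train:
            # predictions within an epoch all use the same, unchanged weights
            error = row[-1] - predict(row, weights)
            updates[0] += l_rate * error
            for i in range(len(row) - 1):
                updates[i + 1] += l_rate * error * row[i]
        # apply the accumulated updates at the end of the epoch
        weights = [w + u for w, u in zip(weights, updates)]
    return weights

dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

weights = train_weights_batch(dataset, 0.1, 1)
print(weights)
```

After a single epoch from zero weights, only the five class 0 rows contribute an error, so the bias moves by 0.1 * -5 = -0.5. Whether this variant converges as quickly as the per-row updates on a given dataset is something to check by experiment.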