# Classification with a neuron

### Part 1: Perceptron classification

The perceptron, as we saw in class, is the simplest form of neural network, consisting of a single neuron. Because it's so simple, it can only be used for two-class classification problems.

The perceptron is inspired by a single neural cell, called a neuron, which accepts input signals via its dendrites. Similarly, the perceptron receives inputs from examples of training data, which we weight and combine in a linear equation called the activation.

activation = bias + sum(w(i) * x(i))

You should notice the similarity between this and the linear regression and logistic regression models that we've implemented so far.

Once we've computed the activation, we transform it into the output value using a transfer function, such as the step transfer function below:

prediction = 1.0 IF activation >= 0.0, ELSE 0.0
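As a minimal sketch of these two steps, the snippet below computes the activation for one instance and then applies the step transfer function. The names `activation_of` and `step`, and the sample bias, weights and features, are illustrative assumptions rather than part of the assignment.

```python
def activation_of(features, bias, weights):
    """Weighted sum of the inputs plus the bias."""
    return bias + sum(w * x for w, x in zip(weights, features))

def step(activation):
    """Step transfer function: 1.0 at or above zero, otherwise 0.0."""
    return 1.0 if activation >= 0.0 else 0.0

# Hypothetical values, chosen only to make the arithmetic easy to follow
bias = -0.5
weights = [1.0, -2.0]
features = [2.0, 0.5]

a = activation_of(features, bias, weights)  # -0.5 + 2.0 - 1.0 = 0.5
print(a, step(a))  # prints: 0.5 1.0
```

Note that an activation of exactly 0.0 is mapped to 1.0, because the comparison is `>=`.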

For this mechanism to work, we have to estimate the weights used in the activation. Fortunately, we know how to do that using stochastic gradient descent.

In each epoch, the weights are updated after every training instance using the equation:

w = w + learning_rate * (expected - predicted) * x

Here, (expected - predicted) is the measure of error.
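To make the update rule concrete, here is a sketch of a single update step starting from all-zero weights. The helper name `update_weights` and the sample instance are assumptions made for illustration; the layout (bias in `weights[0]`, class label in the last position of the instance) follows the convention used in this notebook.

```python
def update_weights(weights, instance, learning_rate):
    """Apply one perceptron update and return (new_weights, error)."""
    features, expected = instance[:-1], instance[-1]
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    predicted = 1.0 if activation >= 0.0 else 0.0
    error = expected - predicted
    # The bias has no input attached, so it moves by learning_rate * error alone
    new_weights = [weights[0] + learning_rate * error]
    new_weights += [w + learning_rate * error * x
                    for w, x in zip(weights[1:], features)]
    return new_weights, error

# Hypothetical instance: two features, class 0
new_weights, error = update_weights([0.0, 0.0, 0.0], [2.0, 3.0, 0], 0.1)
print(error)        # zero weights give activation 0.0, so the prediction is 1.0 and the error is -1.0
print(new_weights)  # approximately [-0.1, -0.2, -0.3]
```

When the prediction is already correct, the error is 0 and the weights are left unchanged.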

This is enough information for you to implement the following (which will be closely related to previous assignments):

- a predict function
    - that takes a single instance, and a list of weights, where weights[0] is the bias
- a stochastic gradient descent function
    - that takes training data, learning rate and a number of epochs
    - where the weights are first assigned zero scores, and then iteratively updated based on the formula
        - w(i) = w(i) + learning_rate * (expected - predicted) * x(i)
    - where you also update the bias based on the formula:
        - bias = bias + learning_rate * (expected - predicted)
- a perceptron function
    - that takes training set, test set, learning rate and epochs
    - that learns the weights using SGD
    - then makes predictions over the test set using these weights
    - and returns these predictions as a list

I've given you a contrived data set for testing both your predict function and your SGD function. I've included sample output below.

Then I want you to apply your classifier to the included sonar dataset, using the parameters given, and to run a reasonable baseline algorithm for comparison. You should perform 3-fold cross-validation. You can find out more about this data set here: https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)
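For reference, one possible way to produce the folds is sketched below. This is not the required implementation: the function name `k_fold_split`, the fixed `seed` parameter, and the stand-in rows are all assumptions made for illustration.

```python
import random

def k_fold_split(rows, k, seed=0):
    """Shuffle a copy of rows and deal them into k folds of near-equal size."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Every k-th row goes to the same fold, so fold sizes differ by at most one
    return [shuffled[i::k] for i in range(k)]

rows = list(range(10))  # stand-in for real instances
folds = k_fold_split(rows, 3)
for i, test_fold in enumerate(folds):
    # Train on every fold except the one held out for testing
    train = [row for j, fold in enumerate(folds) if j != i for row in fold]
    print("fold", i, "train size", len(train), "test size", len(test_fold))
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training in the other rounds.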

The extra twist here is that the data in the sonar data set should be converted to floats EXCEPT for the class (in the last position in each instance), which we should convert to an integer that represents...what? Currently, the class is a nominal category, and we should convert it to an integer: 1 for one class and 0 for the other. Also, we will not normalize this data. Why not?
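A sketch of that conversion is shown below, assuming each raw row is a list of strings with the class letter in the last position. The function name `encode_labels` and the sample rows are made up for illustration, and which class is mapped to 1 is an arbitrary choice.

```python
def encode_labels(rows, positive_label):
    """Return new rows with float features and a 0/1 integer class at the end."""
    encoded = []
    for row in rows:
        features = [float(value) for value in row[:-1]]
        label = 1 if row[-1] == positive_label else 0
        encoded.append(features + [label])
    return encoded

# Made-up rows in the same shape as rows read from a CSV file
raw = [["0.1", "0.9", "R"], ["0.4", "0.2", "M"]]
print(encode_labels(raw, "M"))  # prints: [[0.1, 0.9, 0], [0.4, 0.2, 1]]
```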

In [3]:
# Implement or copy your code here
def activation_predict(instance, weights):
    # weights[0] is the bias; instance[-1] is the class label and is skipped
    activation = weights[0]
    for i in range(len(instance)-1):
        activation += weights[i+1] * instance[i]
    # Step transfer function
    if activation >= 0.0:
        return 1.0
    else:
        return 0.0

def sgd(dataset, learning_rate, epochs):
    # One weight per feature plus the bias, all starting at zero
    length = len(dataset[0])
    weights = [0.0 for i in range(length)]

    for i in range(epochs):
        total_error = 0
        for instance in dataset:
            predictedY = activation_predict(instance, weights)
            error = instance[-1] - predictedY
            total_error += error**2

            # Update the bias, then each feature weight, after every instance
            weights[0] = weights[0] + learning_rate * error
            for index in range(length-1):
                weights[index+1] = weights[index+1] + learning_rate * error * instance[index]
        print("Epoch =", i, "lrate = ", learning_rate, "error =", total_error)
    return weights

def perceptron(train,test,learning_rate,epochs):
    prediction = []
    weights = sgd(train, learning_rate, epochs)
    for instance in test:
        prediction.append(activation_predict(instance, weights))
    return prediction

# Contrived data
# Predict should work, given the weights below
dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]

weights = [-0.1, 0.20653640140000007, -0.23418117710000003]

# Using your SGD function with a learning rate of 0.1, and 5 epochs, should give you:
#
#>epoch=0, lrate=0.100, error=2.000
#>epoch=1, lrate=0.100, error=1.000
#>epoch=2, lrate=0.100, error=0.000
#>epoch=3, lrate=0.100, error=0.000
#>epoch=4, lrate=0.100, error=0.000
#
#[-0.1, 0.20653640140000007, -0.23418117710000003]

print(sgd(dataset, 0.1, 5))

Epoch = 0 lrate =  0.1 error = 2.0
Epoch = 1 lrate =  0.1 error = 1.0
Epoch = 2 lrate =  0.1 error = 0.0
Epoch = 3 lrate =  0.1 error = 0.0
Epoch = 4 lrate =  0.1 error = 0.0
[-0.1, 0.20653640140000007, -0.23418117710000003]
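
The cell above exercises the SGD function but not the predict function on its own. A minimal, self-contained check of prediction against the given weights might look like the sketch below; `predict_row` is a hypothetical stand-in that mirrors the logic of the predict function in this notebook.

```python
def predict_row(instance, weights):
    """Step-function prediction; weights[0] is the bias, instance[-1] is the label."""
    activation = weights[0]
    for i in range(len(instance) - 1):
        activation += weights[i + 1] * instance[i]
    return 1.0 if activation >= 0.0 else 0.0

dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

weights = [-0.1, 0.20653640140000007, -0.23418117710000003]

# With these weights, every prediction should match the expected class
for row in dataset:
    print("Expected=%d, Predicted=%d" % (row[-1], predict_row(row, weights)))
```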


In [None]:
filename = 'sonar.all-data.csv'

#==============================================
import random 
import csv
import copy

def cross_validation_data(dataset, folds):
    new_list = []
    copy_list = copy.deepcopy(dataset)
    fold_len = len(dataset)/folds
    for i in range(folds):
        current_fold = []
        while len(current_fold) < fold_len and len(copy_list)!=0:
            random_inst = random.choice(copy_list) 
            current_fold.append(random_inst)
            copy_list.remove(random_inst)
        new_list.append(current_fold)
    return new_list

def evaluate_algorithm(dataset, algorithm, folds, metric, *args):
    new_data = cross_validation_data(dataset, folds)  
    scores = []
    for fold in new_data:
        train = copy.deepcopy(new_data)
        train.remove(fold)
        train = [element for sublist in train for element in sublist]
        test = [instance[:-1] + [None] for instance in fold]
        
        predicted = algorithm(train,test, *args)
        actual = [instance[-1] for instance in fold]
        result = metric(actual,predicted)
        scores.append(result)

    return scores

#=======================================================
def load_data(filename):
    # Read every CSV row as a list of strings; the file is closed on exit
    with open(filename, newline='') as csv_file:
        return list(csv.reader(csv_file, delimiter=','))

data_sonar = load_data(filename)

# Convert the features from strings to floats 
def column2Float(dataset, column):
    for row in dataset:
        row[column] = float(row[column])
        
for i in range(len(data_sonar[0])-1):  
    column2Float(data_sonar, i)
for row in data_sonar:
    if row[-1] == 'R':
        row[-1] = 0
    else:
        row[-1] = 1
        
def mean(listOfValues):
    return sum(listOfValues)/len(listOfValues)
#==============================================
def accuracy(actual, predicted):
    length = len(actual)
    counter = 0
    for i in range(length):
        if actual[i] == predicted[i]:
            counter += 1
    return (counter/length)*100

from collections import Counter

def zeroRC(train, test):
    valueY = [instance[-1] for instance in train]
    most_occur = Counter(valueY).most_common(1)[0][0] 
    return [most_occur for i in range(len(test))]

#==============================================
folds = 3
learning_rate = 0.01
epochs = 500

percep_scores = evaluate_algorithm(data_sonar, perceptron, folds, accuracy, learning_rate, epochs)
zeroRC_result = evaluate_algorithm(data_sonar, zeroRC, folds, accuracy)

print("\nNumber of instances:", len(data_sonar))
print("Number of features :", len(data_sonar[0])-1, "\n")

print("Perceptron highest score:", max(percep_scores))
print("Perceptron lowest score:", min(percep_scores))
print("Perceptron mean score:", mean(percep_scores), "\n")

print("zeroRC highest score:", max(zeroRC_result))
print("zeroRC lowest score:", min(zeroRC_result))
print("zeroRC mean score:", mean(zeroRC_result))

Epoch = 0 lrate =  0.01 error = 60.0
Epoch = 1 lrate =  0.01 error = 40.0
Epoch = 2 lrate =  0.01 error = 34.0
Epoch = 3 lrate =  0.01 error = 45.0
Epoch = 4 lrate =  0.01 error = 36.0
Epoch = 5 lrate =  0.01 error = 40.0
Epoch = 6 lrate =  0.01 error = 31.0
Epoch = 7 lrate =  0.01 error = 30.0
Epoch = 8 lrate =  0.01 error = 30.0
Epoch = 9 lrate =  0.01 error = 32.0
Epoch = 10 lrate =  0.01 error = 30.0
Epoch = 11 lrate =  0.01 error = 32.0
Epoch = 12 lrate =  0.01 error = 30.0
Epoch = 13 lrate =  0.01 error = 33.0
Epoch = 14 lrate =  0.01 error = 34.0
Epoch = 15 lrate =  0.01 error = 24.0
Epoch = 16 lrate =  0.01 error = 32.0
Epoch = 17 lrate =  0.01 error = 35.0
Epoch = 18 lrate =  0.01 error = 25.0
Epoch = 19 lrate =  0.01 error = 21.0
Epoch = 20 lrate =  0.01 error = 26.0
Epoch = 21 lrate =  0.01 error = 29.0
Epoch = 22 lrate =  0.01 error = 27.0
Epoch = 23 lrate =  0.01 error = 24.0
Epoch = 24 lrate =  0.01 error = 36.0
Epoch = 25 lrate =  0.01 error = 26.0
Epoch = 26 lrate =  0.

Epoch = 236 lrate =  0.01 error = 20.0
Epoch = 237 lrate =  0.01 error = 14.0
Epoch = 238 lrate =  0.01 error = 16.0
Epoch = 239 lrate =  0.01 error = 18.0
Epoch = 240 lrate =  0.01 error = 17.0
Epoch = 241 lrate =  0.01 error = 15.0
Epoch = 242 lrate =  0.01 error = 17.0
Epoch = 243 lrate =  0.01 error = 23.0
Epoch = 244 lrate =  0.01 error = 21.0
Epoch = 245 lrate =  0.01 error = 19.0
Epoch = 246 lrate =  0.01 error = 22.0
Epoch = 247 lrate =  0.01 error = 24.0
Epoch = 248 lrate =  0.01 error = 24.0
Epoch = 249 lrate =  0.01 error = 24.0
Epoch = 250 lrate =  0.01 error = 17.0
Epoch = 251 lrate =  0.01 error = 15.0
Epoch = 252 lrate =  0.01 error = 21.0
Epoch = 253 lrate =  0.01 error = 16.0
Epoch = 254 lrate =  0.01 error = 21.0
Epoch = 255 lrate =  0.01 error = 18.0
Epoch = 256 lrate =  0.01 error = 15.0
Epoch = 257 lrate =  0.01 error = 19.0
Epoch = 258 lrate =  0.01 error = 20.0
Epoch = 259 lrate =  0.01 error = 16.0
Epoch = 260 lrate =  0.01 error = 20.0
Epoch = 261 lrate =  0.01

- Here, class 'R' is 0 and class 'M' is 1. We don't need to normalize the data because the feature values are already in the range 0.0 to 1.0.

- The perceptron clearly outperforms zeroRC, with >70% accuracy compared to <55% for zeroRC.
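
The normalization point above can be checked directly by inspecting the minimum and maximum of each feature column. The sketch below assumes rows shaped like the ones in this notebook (float features followed by the class); the function names `column_ranges` and `needs_scaling` and the sample rows are made up for illustration.

```python
def column_ranges(rows):
    """Return (min, max) for every feature column, ignoring the class at the end."""
    n_features = len(rows[0]) - 1
    return [(min(row[i] for row in rows), max(row[i] for row in rows))
            for i in range(n_features)]

def needs_scaling(rows, low=0.0, high=1.0):
    """True if any feature value falls outside the interval [low, high]."""
    return any(lo < low or hi > high for lo, hi in column_ranges(rows))

# Made-up rows: two features followed by a 0/1 class
rows = [[0.1, 0.9, 0], [0.4, 0.2, 1], [0.7, 0.5, 1]]
print(column_ranges(rows))  # prints: [(0.1, 0.7), (0.2, 0.9)]
print(needs_scaling(rows))  # prints: False
```

Running the same check on the loaded sonar rows would show whether any feature falls outside that interval.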