# Introduction


In this homework you will implement a binary perceptron and train it on a simple classification task: labelling tumors as malignant or benign (encoded as 0 and 1, respectively) from 30 different measurements (features).

# Grading


1. As long as your perceptron classifier (or any other classifier that is not a naive one, such as predicting the most frequent class label for every test case) achieves at least 70% accuracy and runs in less than three minutes, you will get full credit!
2. If your classifier achieves more than 90% accuracy, you get 5 bonus points (20% of the full credit for Project 2).
3. Please do not use sklearn or any other off-the-shelf library for the perceptron classifier itself.
4. We have a solution with fewer than 12 lines of code that gets over 94% accuracy. This is not a complicated or long coding assignment.
5. Please make sure to document your code.
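
For reference, the naive classifier mentioned in item 1 can be sketched in a few lines of NumPy. The helper name `majority_baseline_accuracy` is hypothetical and the toy labels are made up; the point is only to show what "predicting the most frequent class label" means, since any submitted classifier should clearly beat this number. In the full dataset 357 of the 569 cases are benign, so this baseline is already well above 50%.

```python
import numpy as np

def majority_baseline_accuracy(train_labels, test_labels):
    """Hypothetical helper: accuracy of always predicting the most frequent training label."""
    values, counts = np.unique(train_labels, return_counts=True)
    majority = values[np.argmax(counts)]  # most frequent label in the training set
    return float(np.mean(np.asarray(test_labels) == majority))

# Toy labels (made up for illustration): +1 is the majority class in training.
train = np.array([1, 1, 1, -1])
test = np.array([1, -1, 1, 1, -1])
print(majority_baseline_accuracy(train, test))  # 0.6
```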


# Procedure

There are two phases for this project:
1. getting and loading the dataset
2. implementing, training and testing the perceptron.


# Phase One: Packages, Data, and Setup


The package sklearn (scikit-learn) is a popular machine learning library for Python. In addition to implementations of many algorithms and tools for statistical analysis, it also ships with a number of small example datasets. Please note that you will not use this library for the perceptron algorithm; implementing that is your task. You can, however, use it for loading data, scaling features, etc.


In [15]:
import numpy as np
import sklearn
from sklearn import datasets
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import random

In [16]:
from sklearn.preprocessing import StandardScaler

Next we load the data and split it into training and testing sets. The following lines load the UCI ML Breast Cancer Wisconsin (Diagnostic) Data Set, which contains 569 cases of tumors (each represented by 30 measurements). We use the first 511 cases for training and the remaining 58 for testing.

In [17]:
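# load the data set: img is a (569, 30) feature matrix, label holds 0 (malignant) or 1 (benign)
img, label = sklearn.datasets.load_breast_cancer(return_X_y=True)
# split the data set: first 511 cases for training, remaining 58 for testing
TRAIN_SIZE = 511
# map labels {0, 1} -> {-1, +1} to match the perceptron's step-function output
label = 2*label - 1
train_img, test_img = img[:TRAIN_SIZE], img[TRAIN_SIZE:]
train_label, test_label = label[:TRAIN_SIZE], label[TRAIN_SIZE:]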
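
A minimal sketch of what the cell above does, assuming the data is a NumPy feature matrix plus a 0/1 label vector. `split_and_remap` is a hypothetical helper, and the random data is only a shape-compatible stand-in for the real dataset. Note that the split is a plain slice, so it follows the dataset's stored order rather than a random draw.

```python
import numpy as np

def split_and_remap(features, labels01, train_size):
    """Hypothetical helper mirroring the cell above: remap labels {0, 1} -> {-1, +1},
    then take the first `train_size` rows for training and the rest for testing."""
    labels_pm = 2 * np.asarray(labels01) - 1
    return (features[:train_size], features[train_size:],
            labels_pm[:train_size], labels_pm[train_size:])

# Synthetic stand-in with the same shape as the real data: 569 cases x 30 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(569, 30))
y = rng.integers(0, 2, size=569)
X_tr, X_te, y_tr, y_te = split_and_remap(X, y, 511)
print(X_tr.shape, X_te.shape)  # (511, 30) (58, 30)

# With 58 test cases, each misclassification costs 1/58 of the accuracy.
print(round(1 / 58, 4))  # 0.0172
```

In other words, each test error moves the reported accuracy by about 1.7 percentage points, which is worth keeping in mind when comparing runs.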

# Phase Two: Perceptron Model

In [18]:
# Perceptron Class
class Perceptron(object):
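    # Initialize the perceptron (dim_out is accepted but unused: a single
    # weight vector and bias are enough to separate two classes)
    def __init__(self, dim_input = 30, dim_out = 2, learning_rate = 1):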
        # model parameters 
        self.w = np.zeros(dim_input)
        self.bias = 0.0
        
        # learning rate
        self.learning_rate = learning_rate
    
    
    def predict(self,input_array):
        # See the "Perceptron learning rule" slides: w * x
        # Calculate dot product of weights and the input array, add the bias
        output = np.dot(self.w, input_array) + self.bias
        # Apply the step function
        return np.where(output >= 0, 1, -1)
                    
            
    def train(self, training_inputs, labels):
        for inputs, label in zip(training_inputs, labels):
            prediction = self.predict(inputs)
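            # Perceptron learning rule: (label - prediction) is 0 when the
            # prediction is correct, so weights and bias only change on mistakes
            self.w += self.learning_rate * (label - prediction) * inputs
            self.bias += self.learning_rate * (label - prediction)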

    
    def test(self, testing_inputs, labels):
        # number of correct predictions
        count_correct = 0
        # a list of the predicted labels the same order as the input 
        pred_list = []
        for test_array, label in zip(testing_inputs,labels):
            prediction = self.predict(test_array)
            if prediction == label:
                count_correct += 1
            pred_list.append(prediction)
        # fraction of the given inputs that were classified correctly
        accuracy = float(count_correct)/len(labels)
        print('Accuracy is '+str(accuracy))
        return np.asarray(pred_list)
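
To see the learning rule in `Perceptron.train` on numbers small enough to check by hand, here is one update step written as a standalone function. `perceptron_step` is a hypothetical helper that returns new arrays instead of updating `self.w` in place; it uses the same tie-breaking as `predict` (an activation of exactly 0 counts as +1).

```python
import numpy as np

def perceptron_step(w, b, x, y, lr):
    """Hypothetical helper: one update of the rule used in Perceptron.train,
    w <- w + lr * (y - y_hat) * x and b <- b + lr * (y - y_hat),
    where y_hat = +1 if w.x + b >= 0 else -1. Returns new copies of w and b."""
    y_hat = 1 if np.dot(w, x) + b >= 0 else -1
    delta = lr * (y - y_hat)  # 0 when correct, +/- 2*lr when wrong
    return w + delta * x, b + delta

w, b = np.zeros(2), 0.0
x, y = np.array([1.0, -2.0]), -1  # zero weights predict +1, so this is a mistake
w, b = perceptron_step(w, b, x, y, lr=0.5)
print(w, b)  # [-1.  2.] -1.0

# The same example is now classified correctly, so a second step changes nothing.
w2, b2 = perceptron_step(w, b, x, y, lr=0.5)
print(np.array_equal(w2, w), b2 == b)  # True True
```

Because `label - prediction` is either 0 or ±2, the effective step on a mistake is twice the learning rate.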

In [19]:
# Feature scaling
scaler = StandardScaler()
train_img = scaler.fit_transform(train_img)
test_img = scaler.transform(test_img)

# fix the seed so the per-epoch shuffles (and the reported accuracy) are reproducible
np.random.seed(42)
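
StandardScaler is used here because the 30 measurements are on very different scales, and unscaled features with large magnitudes tend to dominate the dot product w · x. The sketch below re-implements the default behaviour in NumPy to show what it computes; `standardize` is a hypothetical name, and the handling of constant columns is a simplified version of what the library does.

```python
import numpy as np

def standardize(train, test):
    """Hypothetical NumPy sketch of StandardScaler.fit_transform / .transform defaults:
    mean and std come from the training set only and are reused for the test set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)              # population std (ddof=0)
    std = np.where(std == 0, 1.0, std)   # leave constant features unscaled
    return (train - mean) / std, (test - mean) / std

train = np.array([[1.0, 10.0], [3.0, 30.0]])
test = np.array([[2.0, 50.0]])
tr, te = standardize(train, test)
print(tr)  # values: [[-1, -1], [1, 1]]
print(te)  # [[0. 3.]]
```

Fitting on the training rows only keeps information about the test set out of training; the test values may then fall outside the range seen during fitting, as the second column shows.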

In [20]:
# Number of epochs (iterations over the training set)
NUM_EPOCH = 500

In [21]:
perceptron = Perceptron(learning_rate=0.5)
for ii in range(NUM_EPOCH):
    # Shuffle the training data
    permutation = np.random.permutation(len(train_img))
    train_img, train_label = train_img[permutation], train_label[permutation]
    perceptron.train(train_img, train_label)
print(str(NUM_EPOCH)+' epochs')
pred_array = perceptron.test(test_img, test_label)

500 epochs
Accuracy is 0.9827586206896551
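
Accuracy alone hides which kind of error the model makes, which matters when one class is a malignant tumor. `confusion_matrix` is imported at the top but not used; as an illustration, the NumPy sketch below builds the same kind of 2×2 table for the ±1 labels. `confusion_2x2` is a hypothetical helper, and the toy arrays are made up.

```python
import numpy as np

def confusion_2x2(y_true, y_pred, labels=(-1, 1)):
    """Hypothetical helper: rows are true labels, columns are predicted labels,
    both in the order given by `labels`. In this notebook -1 = malignant, +1 = benign."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for i, t in enumerate(labels):
        for j, p in enumerate(labels):
            cm[i, j] = np.sum((y_true == t) & (y_pred == p))  # count (true=t, pred=p)
    return cm

y_true = np.array([-1, -1, 1, 1, 1])
y_pred = np.array([-1, 1, 1, 1, -1])
cm = confusion_2x2(y_true, y_pred)
print(cm)
# [[1 1]
#  [1 2]]
print(np.trace(cm) / cm.sum())  # 0.6 (accuracy = diagonal / total)
```

In the notebook it could be called as `confusion_2x2(test_label, pred_array)`; sklearn's `confusion_matrix(test_label, pred_array)` should give the same table, since it also orders the labels as (-1, +1).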
