# Perceptron Algorithm

## Learning Task 1

Build a classifier using the perceptron algorithm and determine whether the dataset is linearly separable.
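
For reference, the perceptron update rule can be sketched as below. This is a minimal illustration under stated assumptions, not the `Models.Perceptron` class used in this notebook: the names `train_perceptron` and `predict` are hypothetical, weights start at zero, and labels are assumed to be -1 or +1. A full pass over the training data with no mistakes means the learned boundary separates that data.

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Hypothetical minimal perceptron; labels must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # A sample is misclassified (or on the boundary) when y * score <= 0
            if yi * (xi @ w + b) <= 0:
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:
            # A full pass without updates: the training data is separated
            break
    return w, b, mistakes == 0

def predict(X, w, b):
    """Return -1/+1 predictions for each row of X."""
    return np.where(X @ w + b > 0, 1, -1)

# Toy, linearly separable data: logical AND with -1/+1 labels
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w, b, converged = train_perceptron(X, y)
print(converged, predict(X, w, b))
```

If the loop never completes a mistake-free pass within the epoch budget, that does not prove the data is inseparable; it only shows that no separating boundary was found in that many passes.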

In [1]:
import numpy as np
import pandas as pd
import random
import sys
sys.path.append("..")
from preprocessor import Preprocessor
from Models.Perceptron import Perceptron
import warnings 
warnings.filterwarnings(action="ignore")

In [2]:
dataset = pd.read_csv("../dataset.csv")
dataset.drop(columns=["id"],inplace=True)
# Creating a dataframe to store the results
results = pd.DataFrame(columns=["threshold","delta","method","mean_accuracy","std_accuracy","epochs"])

In [3]:
preprocessor = Preprocessor(dataset,"diagnosis")
splits = preprocessor.preprocess(drop_na=True,n_splits=10,standardize=False,labels=[-1,1]) # splitting into training and testing

{-1: 'B', 1: 'M'}


### Using training data as given

#### Using an infinite loop for the model

In [4]:
#PM1 is perceptron model 1 without shuffling training data
pm1 = Perceptron()
accuracies:list[float]=[]
for split in splits:
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm1.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm1.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":100,"delta":0.001,"method":"PM1-Infinite",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
        "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":None}
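# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)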

Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving


#### Using epochs instead of infinite model

In [5]:
accuracies:list[float]=[]
for split in splits:
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm1.fit(X_train,y_train,False,epochs=1000)
    tp,tn,fp,fn=pm1.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":None,"delta":None,"method":"PM1-Epochs",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":1000}
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)
## As epochs increase, training breaks early if we reach 100 percent accuracy

### Shuffling the training data

In [6]:
pm2 = Perceptron()

#### Using an infinite loop with some threshold

In [7]:
accuracies:list[float]=[]
for split in splits:
    train, test = split
    train = train.sample(frac=1).reset_index(drop=True)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm2.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm2.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":100,"delta":0.001,"method":"PM2-Infinite",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":None
              }
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving


#### Using epochs

In [8]:
accuracies:list[float]=[]
for split in splits:
    train, test = split
    train = train.sample(frac=1).reset_index(drop=True)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm2.fit(X_train,y_train,False,epochs=1000)
    tp,tn,fp,fn=pm2.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":None,"delta":None,"method":"PM2-Epochs",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":1000}
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

## Learning Task 2

Build a perceptron model on normalized data.

In [9]:
splits = preprocessor.preprocess(n_splits=10,standardize=True,labels=[-1,1]) # splitting into training and testing

{-1: 'B', 1: 'M'}


### Using an infinite loop that terminates on some learning threshold

In [10]:
pm3 = Perceptron()
accuracies:list[float]=[]
for split in splits:
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm3.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm3.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":100,"delta":0.001,"method":"PM3-Infinite",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":None
              }
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary
Breaking as learnt perfect decision boundary


### Using Epochs

In [11]:
accuracies:list[float]=[]
for split in splits:
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm3.fit(X_train,y_train,inf_loop=False,epochs=1000)
    tp,tn,fp,fn=pm3.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":None,"delta":None,"method":"PM3-Epochs",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":1000}
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

## Learning Task 3

Randomly change the order of the features in the dataset and build a perceptron model.

In [12]:
splits = preprocessor.preprocess(drop_na=True,n_splits=10,standardize=False,labels=[-1,1]) # splitting into training and testing

{-1: 'B', 1: 'M'}


### Using an infinite loop 

In [13]:
pm4 = Perceptron()
accuracies:list[float]=[]
for split in splits:
    train, test = split
    train = train.sample(frac=1,axis = 1,random_state=23)
    test = test.sample(frac=1,axis = 1,random_state=23)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm4.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm4.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":100,"delta":0.001,"method":"PM4-Infinite",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":None
              }
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving
Breaking as not improving


### Using Epochs

In [14]:
accuracies:list[float]=[]
for split in splits:
    train, test = split
    train = train.sample(frac=1,axis = 1,random_state=23)
    test = test.sample(frac=1,axis = 1,random_state=23)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm4.fit(X_train,y_train,inf_loop=False,epochs=1000)
    tp,tn,fp,fn=pm4.score(X_test,y_test)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
result_dict ={"threshold":None,"delta":None,"method":"PM4-Epochs",
              "mean_accuracy": np.round((np.mean(accuracies)*100), 2),
                "std_accuracy": np.round((np.std(accuracies)*100), 2)
              ,"epochs":1000}
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

## Results

In [15]:
results.sort_values(by=["mean_accuracy"],ascending=False)

| | threshold | delta | method | mean_accuracy | std_accuracy | epochs |
|---|---|---|---|---|---|---|
| 4 | 100.0 | 0.001 | PM3-Infinite | 94.24 | 0.94 | |
| 5 | | | PM3-Epochs | 94.24 | 0.94 | 1000.0 |
| 1 | | | PM1-Epochs | 91.41 | 2.26 | 1000.0 |
| 7 | | | PM4-Epochs | 91.41 | 2.26 | 1000.0 |
| 0 | 100.0 | 0.001 | PM1-Infinite | 91.11 | 2.11 | |
| 6 | 100.0 | 0.001 | PM4-Infinite | 91.11 | 2.11 | |
| 3 | | | PM2-Epochs | 89.04 | 5.59 | 1000.0 |
| 2 | 100.0 | 0.001 | PM2-Infinite | 86.62 | 5.67 | |


## Conclusion

#### Learning Task 1

Since the weights a perceptron learns depend on the order in which the training samples are presented, the PM1 and PM2 accuracies differ: by about 2.4 percentage points with a fixed number of epochs (91.41 vs. 89.04) and by about 4.5 with the threshold-based loop (91.11 vs. 86.62). The training samples were shuffled in PM2, which produced a different model that classified the testing data differently.
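
The order dependence can be seen on a toy problem. The sketch below is an assumption-laden illustration, not the notebook's `Perceptron` class: `train_perceptron` is a hypothetical zero-initialized perceptron with -1/+1 labels. It trains on the same four points in two different row orders and compares the resulting weights.

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Hypothetical minimal perceptron; labels must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:  # stop once a full pass makes no updates
            break
    return w, b

# Toy, linearly separable data: logical AND with -1/+1 labels
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])

# Same samples, two presentation orders (original and reversed rows)
w_a, b_a = train_perceptron(X, y)
w_b, b_b = train_perceptron(X[::-1], y[::-1])

# Both models classify the training data, but the weights differ
pred_a = np.where(X @ w_a + b_a > 0, 1, -1)
pred_b = np.where(X @ w_b + b_b > 0, 1, -1)
print(w_a, b_a)
print(w_b, b_b)
```

Both runs separate the training points, yet they end at different weight vectors, so points near the boundary in unseen data may be classified differently.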

#### Learning Task 2

In this model we normalize the data twice: once with respect to the training sample's mean and once with respect to the testing sample's mean (the code is available in the preprocessor file). Since normalization puts all the features on a comparable scale and so gives them equal weightage, PM3 achieves a higher accuracy than PM1 (94.24 percent vs. about 91 percent).
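
As an illustration of what standardization does to feature scales, here is a minimal z-score sketch. It is not the `Preprocessor` code: the helper names `fit_standardizer` and `standardize` are hypothetical, and, unlike the approach described above, this variant reuses the training fold's statistics for the test fold, which is a common alternative.

```python
import numpy as np

def fit_standardizer(X):
    """Compute per-feature mean and standard deviation (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def standardize(X, mean, std):
    """Apply z-score scaling with the given statistics."""
    return (X - mean) / std

# Two features on very different scales
X_train = np.array([[1000., 0.1], [2000., 0.2], [3000., 0.3]])
X_test = np.array([[1500., 0.15]])

mean, std = fit_standardizer(X_train)
Z_train = standardize(X_train, mean, std)
Z_test = standardize(X_test, mean, std)  # assumption: reuse training statistics

print(Z_train.mean(axis=0), Z_train.std(axis=0))
```

After scaling, each training column has zero mean and unit variance, so a perceptron update of the form `w += y * x` is no longer dominated by the feature with the largest raw magnitude.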

#### Learning Task 3

The order of the features does not matter to the perceptron: permuting the columns yields the same weights in a correspondingly permuted order, which describes the same decision boundary. Hence we get the same accuracy for PM1 and PM4 (91.11 percent with the threshold-based loop and 91.41 percent with epochs).
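
The invariance to feature order can be checked directly. The sketch below assumes a hypothetical zero-initialized perceptron (`train_perceptron`, not the notebook's `Perceptron` class) and made-up toy data; it trains once on the original columns and once on permuted columns, then compares the weights.

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Hypothetical minimal perceptron run for a fixed number of epochs."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
    return w, b

# Made-up toy data with three features and -1/+1 labels
X = np.array([[2., 0., 1.], [0., 1., 3.], [1., 1., 0.], [3., 2., 2.]])
y = np.array([1, -1, -1, 1])

perm = np.array([2, 0, 1])  # new column order
X_perm = X[:, perm]         # same rows, shuffled feature order

w, b = train_perceptron(X, y)
w_perm, b_perm = train_perceptron(X_perm, y)

# The permuted model's weights are the original weights in permuted order
print(w[perm], w_perm, b, b_perm)
```

Every score `x @ w + b` is the same sum of products in both runs, so the two models make identical updates and identical predictions.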