# Perceptron Algorithm

## Learning Task 1

Build a classifier using the perceptron algorithm. Determine whether the dataset is linearly separable.
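The `Perceptron` class imported below lives in `Models.Perceptron` and its source is not shown here. As a rough reference for what a perceptron does, here is a minimal sketch under stated assumptions: zero-initialized weights, a learning rate of 1, and labels in {-1, +1}. The function name `perceptron_fit` and the toy data are illustrative and are not part of the project code.

```python
import numpy as np

def perceptron_fit(X, y, epochs=1000):
    """Train a perceptron on labels in {-1, +1}; return (w, b, converged)."""
    w = np.zeros(X.shape[1])  # assumption: weights start at zero
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # A sample is misclassified when its signed margin is not positive.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += y_i * x_i  # move the boundary towards the sample
                b += y_i
                mistakes += 1
        if mistakes == 0:
            # A full pass without updates: every training sample is separated.
            return w, b, True
    return w, b, False

# Tiny, linearly separable toy set (illustrative only).
X_toy = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])
w, b, converged = perceptron_fit(X_toy, y_toy)
print("converged:", converged, "weights:", w, "bias:", b)
```

If a pass over the training data finishes with no mistakes, the training set is linearly separable. The converse is weaker: failing to converge within a fixed budget only suggests, and does not prove, that the data is not separable.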

In [1]:
import numpy as np
import pandas as pd
import random
import sys
sys.path.append("..")
from preprocessor import Preprocessor
from Models.Perceptron import Perceptron
import warnings 
warnings.filterwarnings(action="ignore")

In [2]:
dataset = pd.read_csv("../dataset.csv")
dataset.drop(columns=["id"],inplace=True)
# Creating a dataframe to store the results
results = pd.DataFrame(columns=["threshold","delta","method","mean_accuracy","std_accuracy","epochs"])

In [3]:
preprocessor = Preprocessor(dataset,"diagnosis")
splits = preprocessor.preprocess(drop_na=True,n_splits=10,standardize=False,labels=[-1,1]) # splitting into training and testing

{-1: 'B', 1: 'M'}


### Using training data as given

#### Using an infinite loop for the model

In [4]:
# PM1: perceptron model 1, trained without shuffling the training data
pm1 = Perceptron()
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm1.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm1.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": 100, "delta": 0.001, "method": "PM1-Infinite",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": None}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
Breaking as not improving
--------Results--------
Accuracy: 0.9343434343434344
----Class 1----
Precision: 0.9587628865979382
Recall: 0.9117647058823529
----Class 0----
Precision: 0.9108910891089109
Recall: 0.9583333333333334
-----------------------
Split 1:
Breaking as not improving
--------Results--------
Accuracy: 0.9191919191919192
----Class 1----
Precision: 0.897196261682243
Recall: 0.9504950495049505
----Class 0----
Precision: 0.945054945054945
Recall: 0.8865979381443299
-----------------------
Split 2:
Breaking as not improving
--------Results--------
Accuracy: 0.9242424242424242
----Class 1----
Precision: 0.9019607843137255
Recall: 0.9484536082474226
----Class 0----
Precision: 0.9479166666666666
Recall: 0.900990099009901
-----------------------
Split 3:
Breaking as not improving
--------Results--------
Accuracy: 0.9292929292929293
----Class 1----
Precision: 0.9183673469387755
Recall: 0.9375
----Class 0----
Precision: 0.94
Recall: 0.9215686274509803
---------------------

#### Using epochs instead of infinite model

In [5]:
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm1.fit(X_train,y_train,False,epochs=1000)
    tp,tn,fp,fn=pm1.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": None, "delta": None, "method": "PM1-Epochs",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": 1000}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)
# As the epochs increase, training breaks early if we reach 100 percent accuracy

Split 0:
--------Results--------
Accuracy: 0.9191919191919192
----Class 1----
Precision: 0.8981481481481481
Recall: 0.9509803921568627
----Class 0----
Precision: 0.9444444444444444
Recall: 0.8854166666666666
-----------------------
Split 1:
--------Results--------
Accuracy: 0.9343434343434344
----Class 1----
Precision: 0.9230769230769231
Recall: 0.9504950495049505
----Class 0----
Precision: 0.9468085106382979
Recall: 0.9175257731958762
-----------------------
Split 2:
--------Results--------
Accuracy: 0.9343434343434344
----Class 1----
Precision: 0.92
Recall: 0.9484536082474226
----Class 0----
Precision: 0.9489795918367347
Recall: 0.9207920792079208
-----------------------
Split 3:
--------Results--------
Accuracy: 0.9292929292929293
----Class 1----
Precision: 0.8942307692307693
Recall: 0.96875
----Class 0----
Precision: 0.9680851063829787
Recall: 0.8921568627450981
-----------------------
Split 4:
--------Results--------
Accuracy: 0.9141414141414141
----Class 1----
Precision: 0.861111

In [6]:
results

| | threshold | delta | method | mean_accuracy | std_accuracy | epochs |
|---|---|---|---|---|---|---|
| 0 | 100.0 | 0.001 | PM1-Infinite | 91.11 | 2.11 | |
| 1 | | | PM1-Epochs | 91.41 | 2.26 | 1000.0 |


### Shuffling the training data

In [7]:
pm2 = Perceptron()

#### Using an infinite loop with some threshold

In [8]:
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    train = train.sample(frac=1,random_state=42).reset_index(drop=True)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm2.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm2.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": 100, "delta": 0.001, "method": "PM2-Infinite",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": None}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
Breaking as not improving
--------Results--------
Accuracy: 0.9242424242424242
----Class 1----
Precision: 0.9065420560747663
Recall: 0.9509803921568627
----Class 0----
Precision: 0.945054945054945
Recall: 0.8958333333333334
-----------------------
Split 1:
Breaking as not improving
--------Results--------
Accuracy: 0.9242424242424242
----Class 1----
Precision: 0.967391304347826
Recall: 0.8811881188118812
----Class 0----
Precision: 0.8867924528301887
Recall: 0.9690721649484536
-----------------------
Split 2:
Breaking as not improving
--------Results--------
Accuracy: 0.9242424242424242
----Class 1----
Precision: 0.9270833333333334
Recall: 0.9175257731958762
----Class 0----
Precision: 0.9215686274509803
Recall: 0.9306930693069307
-----------------------
Split 3:
Breaking as not improving
--------Results--------
Accuracy: 0.9292929292929293
----Class 1----
Precision: 0.9361702127659575
Recall: 0.9166666666666666
----Class 0----
Precision: 0.9230769230769231
Recall: 0.94117647058

#### Using epochs

In [9]:
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    train = train.sample(frac=1).reset_index(drop=True)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm2.fit(X_train,y_train,False,epochs=1000)
    tp,tn,fp,fn=pm2.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": None, "delta": None, "method": "PM2-Epochs",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": 1000}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
--------Results--------
Accuracy: 0.9090909090909091
----Class 1----
Precision: 0.9666666666666667
Recall: 0.8529411764705882
----Class 0----
Precision: 0.8611111111111112
Recall: 0.96875
-----------------------
Split 1:
--------Results--------
Accuracy: 0.8484848484848485
----Class 1----
Precision: 1.0
Recall: 0.7029702970297029
----Class 0----
Precision: 0.7637795275590551
Recall: 1.0
-----------------------
Split 2:
--------Results--------
Accuracy: 0.9040404040404041
----Class 1----
Precision: 0.9875
Recall: 0.8144329896907216
----Class 0----
Precision: 0.847457627118644
Recall: 0.9900990099009901
-----------------------
Split 3:
--------Results--------
Accuracy: 0.8333333333333334
----Class 1----
Precision: 1.0
Recall: 0.65625
----Class 0----
Precision: 0.7555555555555555
Recall: 1.0
-----------------------
Split 4:
--------Results--------
Accuracy: 0.8888888888888888
----Class 1----
Precision: 0.9866666666666667
Recall: 0.7789473684210526
----Class 0----
Precision: 0.829

In [10]:
results

| | threshold | delta | method | mean_accuracy | std_accuracy | epochs |
|---|---|---|---|---|---|---|
| 0 | 100.0 | 0.001 | PM1-Infinite | 91.11 | 2.11 | |
| 1 | | | PM1-Epochs | 91.41 | 2.26 | 1000.0 |
| 2 | 100.0 | 0.001 | PM2-Infinite | 92.22 | 0.61 | |
| 3 | | | PM2-Epochs | 88.23 | 4.69 | 1000.0 |


## Learning Task 2

Build a perceptron model on normalized (standardized) data.
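How `Preprocessor.preprocess(standardize=True)` scales the data is not shown in this notebook. A common choice, assumed here, is z-score standardization with statistics computed on the training split only and then reused for the test split. The helper name `standardize` and the demo arrays are hypothetical.

```python
import numpy as np

def standardize(X_train, X_test):
    """Z-score both splits using the training mean and standard deviation."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X_train - mean) / std, (X_test - mean) / std

# Two features on very different scales (illustrative values).
X_train_demo = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
X_test_demo = np.array([[2.0, 300.0]])
Z_train, Z_test = standardize(X_train_demo, X_test_demo)
print(Z_train)
print(Z_test)
```

After this transformation each training feature has zero mean and unit variance, so a perceptron update `w += y_i * x_i` changes all weights by comparable amounts instead of being dominated by the feature with the largest raw magnitude.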

In [11]:
splits = preprocessor.preprocess(n_splits=10,standardize=True, drop_na=False, labels=[-1,1]) # splitting into training and testing

{-1: 'B', 1: 'M'}


### Using an infinite loop that terminates on some learning threshold

In [12]:
pm3 = Perceptron()
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm3.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm3.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": 100, "delta": 0.001, "method": "PM3-Infinite",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": None}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
Breaking as learnt perfect decision boundary
--------Results--------
Accuracy: 0.9545454545454546
----Class 1----
Precision: 0.9696969696969697
Recall: 0.9411764705882353
----Class 0----
Precision: 0.9393939393939394
Recall: 0.96875
-----------------------
Split 1:
Breaking as learnt perfect decision boundary
--------Results--------
Accuracy: 0.9545454545454546
----Class 1----
Precision: 0.9791666666666666
Recall: 0.9306930693069307
----Class 0----
Precision: 0.9313725490196079
Recall: 0.979381443298969
-----------------------
Split 2:
Breaking as learnt perfect decision boundary
--------Results--------
Accuracy: 0.9494949494949495
----Class 1----
Precision: 0.9578947368421052
Recall: 0.9381443298969072
----Class 0----
Precision: 0.941747572815534
Recall: 0.9603960396039604
-----------------------
Split 3:
Breaking as learnt perfect decision boundary
--------Results--------
Accuracy: 0.9444444444444444
----Class 1----
Precision: 0.956989247311828
Recall: 0.9270833333333334
---

### Using Epochs

In [13]:
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm3.fit(X_train,y_train,inf_loop=False,epochs=1000)
    tp,tn,fp,fn=pm3.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": None, "delta": None, "method": "PM3-Epochs",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": 1000}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
--------Results--------
Accuracy: 0.9545454545454546
----Class 1----
Precision: 0.9696969696969697
Recall: 0.9411764705882353
----Class 0----
Precision: 0.9393939393939394
Recall: 0.96875
-----------------------
Split 1:
--------Results--------
Accuracy: 0.9545454545454546
----Class 1----
Precision: 0.9791666666666666
Recall: 0.9306930693069307
----Class 0----
Precision: 0.9313725490196079
Recall: 0.979381443298969
-----------------------
Split 2:
--------Results--------
Accuracy: 0.9494949494949495
----Class 1----
Precision: 0.9578947368421052
Recall: 0.9381443298969072
----Class 0----
Precision: 0.941747572815534
Recall: 0.9603960396039604
-----------------------
Split 3:
--------Results--------
Accuracy: 0.9444444444444444
----Class 1----
Precision: 0.956989247311828
Recall: 0.9270833333333334
----Class 0----
Precision: 0.9333333333333333
Recall: 0.9607843137254902
-----------------------
Split 4:
--------Results--------
Accuracy: 0.9444444444444444
----Class 1----
Precisio

In [14]:
results

| | threshold | delta | method | mean_accuracy | std_accuracy | epochs |
|---|---|---|---|---|---|---|
| 0 | 100.0 | 0.001 | PM1-Infinite | 91.11 | 2.11 | |
| 1 | | | PM1-Epochs | 91.41 | 2.26 | 1000.0 |
| 2 | 100.0 | 0.001 | PM2-Infinite | 92.22 | 0.61 | |
| 3 | | | PM2-Epochs | 88.23 | 4.69 | 1000.0 |
| 4 | 100.0 | 0.001 | PM3-Infinite | 94.24 | 0.94 | |
| 5 | | | PM3-Epochs | 94.24 | 0.94 | 1000.0 |


## Learning Task 3

Randomly change the order of the features in the dataset and build a perceptron model.

In [15]:
splits = preprocessor.preprocess(drop_na=True,n_splits=10,standardize=False,labels=[-1,1]) # splitting into training and testing

{-1: 'B', 1: 'M'}


### Using an infinite loop 

In [16]:
pm4 = Perceptron()
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    train = train.sample(frac=1,axis = 1,random_state=23)
    test = test.sample(frac=1,axis = 1,random_state=23)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm4.fit(X_train,y_train,True)
    tp,tn,fp,fn=pm4.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": 100, "delta": 0.001, "method": "PM4-Infinite",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": None}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
Breaking as not improving
--------Results--------
Accuracy: 0.9343434343434344
----Class 1----
Precision: 0.9587628865979382
Recall: 0.9117647058823529
----Class 0----
Precision: 0.9108910891089109
Recall: 0.9583333333333334
-----------------------
Split 1:
Breaking as not improving
--------Results--------
Accuracy: 0.9191919191919192
----Class 1----
Precision: 0.897196261682243
Recall: 0.9504950495049505
----Class 0----
Precision: 0.945054945054945
Recall: 0.8865979381443299
-----------------------
Split 2:
Breaking as not improving
--------Results--------
Accuracy: 0.9242424242424242
----Class 1----
Precision: 0.9019607843137255
Recall: 0.9484536082474226
----Class 0----
Precision: 0.9479166666666666
Recall: 0.900990099009901
-----------------------
Split 3:
Breaking as not improving
--------Results--------
Accuracy: 0.9292929292929293
----Class 1----
Precision: 0.9183673469387755
Recall: 0.9375
----Class 0----
Precision: 0.94
Recall: 0.9215686274509803
---------------------

### Using Epochs

In [17]:
accuracies:list[float]=[]
for i, split in enumerate(splits):
    print(f"Split {i}:")
    train, test = split
    train = train.sample(frac=1,axis = 1,random_state=23)
    test = test.sample(frac=1,axis = 1,random_state=23)
    X_train, y_train = train.drop(columns=["diagnosis"]).to_numpy(), train["diagnosis"].to_numpy()
    X_test, y_test = test.drop(columns=["diagnosis"]).to_numpy(), test["diagnosis"].to_numpy()
    pm4.fit(X_train,y_train,inf_loop=False,epochs=1000)
    tp,tn,fp,fn=pm4.score(X_test,y_test, _print=True)
    accuracies.append((tp + tn) / (tp + tn + fp + fn))
    print("-----------------------")
result_dict = {"threshold": None, "delta": None, "method": "PM4-Epochs",
               "mean_accuracy": np.round(np.mean(accuracies) * 100, 2),
               "std_accuracy": np.round(np.std(accuracies) * 100, 2),
               "epochs": 1000}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
results = pd.concat([results, pd.DataFrame([result_dict])], ignore_index=True)

Split 0:
--------Results--------
Accuracy: 0.9191919191919192
----Class 1----
Precision: 0.8981481481481481
Recall: 0.9509803921568627
----Class 0----
Precision: 0.9444444444444444
Recall: 0.8854166666666666
-----------------------
Split 1:
--------Results--------
Accuracy: 0.9343434343434344
----Class 1----
Precision: 0.9230769230769231
Recall: 0.9504950495049505
----Class 0----
Precision: 0.9468085106382979
Recall: 0.9175257731958762
-----------------------
Split 2:
--------Results--------
Accuracy: 0.9343434343434344
----Class 1----
Precision: 0.92
Recall: 0.9484536082474226
----Class 0----
Precision: 0.9489795918367347
Recall: 0.9207920792079208
-----------------------
Split 3:
--------Results--------
Accuracy: 0.9292929292929293
----Class 1----
Precision: 0.8942307692307693
Recall: 0.96875
----Class 0----
Precision: 0.9680851063829787
Recall: 0.8921568627450981
-----------------------
Split 4:
--------Results--------
Accuracy: 0.9141414141414141
----Class 1----
Precision: 0.861111

## Results

In [18]:
results.sort_values(by=["mean_accuracy"],ascending=False)

| | threshold | delta | method | mean_accuracy | std_accuracy | epochs |
|---|---|---|---|---|---|---|
| 4 | 100.0 | 0.001 | PM3-Infinite | 94.24 | 0.94 | |
| 5 | | | PM3-Epochs | 94.24 | 0.94 | 1000.0 |
| 2 | 100.0 | 0.001 | PM2-Infinite | 92.22 | 0.61 | |
| 1 | | | PM1-Epochs | 91.41 | 2.26 | 1000.0 |
| 7 | | | PM4-Epochs | 91.41 | 2.26 | 1000.0 |
| 0 | 100.0 | 0.001 | PM1-Infinite | 91.11 | 2.11 | |
| 6 | 100.0 | 0.001 | PM4-Infinite | 91.11 | 2.11 | |
| 3 | | | PM2-Epochs | 88.23 | 4.69 | 1000.0 |


## Conclusion

#### Learning Task 1

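The weights a perceptron learns depend on the order in which the training samples are presented, so PM1 (original order) and PM2 (shuffled rows) end up as different models that classify the test data differently. Their mean accuracies differ by **roughly 1 to 3 percentage points**: 91.11% versus 92.22% with the threshold-based loop, and 91.41% versus 88.23% with 1000 epochs. On the unstandardized data, every threshold-based run shown above stopped with "Breaking as not improving", so the perceptron did not reach a perfect decision boundary on the raw features.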
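The order dependence can be reproduced on a toy problem. The sketch below is not the project's `Perceptron` class; it assumes zero-initialized weights and a learning rate of 1, and the function name `perceptron_fit` and the data are illustrative. The same four samples are presented in two different orders.

```python
import numpy as np

def perceptron_fit(X, y, epochs=100):
    """Minimal perceptron for labels in {-1, +1}; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified sample
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:  # stop once a full pass makes no updates
            break
    return w, b

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Same samples, two presentation orders (a fixed permutation keeps this deterministic).
order_a = [0, 1, 2, 3]
order_b = [1, 0, 2, 3]
w_a, b_a = perceptron_fit(X[order_a], y[order_a])
w_b, b_b = perceptron_fit(X[order_b], y[order_b])
print("order a:", w_a, b_a)
print("order b:", w_b, b_b)
```

Both runs separate the toy training set, yet they stop at different weight vectors because the first misclassified sample differs. Two such boundaries can disagree on unseen points, which is consistent with the accuracy gap between PM1 and PM2.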

#### Learning Task 2

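In this model the features are normalized (standardized) before training. Standardization puts all features on a comparable scale, so no feature dominates the weight updates merely because of its magnitude. PM3 reaches a mean accuracy of 94.24% with a standard deviation of 0.94, compared with 91.11% and 91.41% (standard deviations 2.11 and 2.26) for PM1. The PM3 runs shown above stopped with "Breaking as learnt perfect decision boundary", which suggests that the standardized training sets are linearly separable.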

#### Learning Task 3

The order of the features (columns) does not matter to the perceptron: permuting the columns yields the same weights in the correspondingly permuted order, which describes the same decision boundary. Hence PM4 reproduces the PM1 results exactly (91.11% with the threshold-based loop and 91.41% with 1000 epochs).
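The invariance to feature order can be checked on a toy problem. The sketch below is not the project's `Perceptron` class; it assumes zero-initialized weights and a learning rate of 1 (with random initialization the match would not be exact), and the function name `perceptron_fit`, the data, and the permutation are illustrative.

```python
import numpy as np

def perceptron_fit(X, y, epochs=100):
    """Minimal perceptron for labels in {-1, +1}; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified sample
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:  # stop once a full pass makes no updates
            break
    return w, b

X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [-1.0, -2.0, 0.0],
              [-2.0, -1.0, -1.0]])
y = np.array([1, 1, -1, -1])
perm = [2, 0, 1]  # new column order

w, b = perceptron_fit(X, y)
w_perm, b_perm = perceptron_fit(X[:, perm], y)

# The weights come out permuted in the same way as the columns.
print(w, w_perm)
same_predictions = np.array_equal(np.sign(X @ w + b),
                                  np.sign(X[:, perm] @ w_perm + b_perm))
print("same predictions:", same_predictions)
```

Because the dot product `w . x` is unchanged when `w` and `x` are permuted together, every update happens at the same sample in both runs, so the two models make identical predictions.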