### Assignment 01: Perceptron vs Logistic Regression

### (1.1) Applying Perceptron Classifier Model on the Breast Cancer Wisconsin dataset for binary classification

In [1]:
# Import Library
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from sklearn.datasets import load_breast_cancer
from sklearn.datasets import load_iris

from sklearn.linear_model import Perceptron
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.multiclass import OneVsRestClassifier

from sklearn.model_selection import train_test_split 
from sklearn.metrics import mean_absolute_error 
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

In [2]:
# Load Breast Cancer dataset
data = load_breast_cancer()

In [3]:
# Assigning X to feature values, and y to target values
X = data.data
y = data.target

# Splitting Data into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, 
                                                    random_state=0)

In [4]:
# Perceptron Model Fitting
clf = Perceptron(tol=1e-3, max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

In [5]:
# Predicting the test set results and calculating the accuracy
y_pred = clf.predict(X_test)
print('Accuracy of Perceptron classifier on test set:{: .2f}'.
      format(clf.score(X_test, y_test)))

Accuracy of Perceptron classifier on test set: 0.91


In [6]:
# Computing Confusion Matrix
clf_cm = confusion_matrix(y_test, y_pred)
print(clf_cm)

[[ 48  15]
 [  1 107]]


The confusion matrix shows 48 + 107 = 155 correct predictions (the diagonal entries) and 15 + 1 = 16 incorrect predictions (the off-diagonal entries).
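
As a quick check, the counts above can be read off the matrix programmatically: the diagonal holds the correct predictions and every other entry is an error. The following is a minimal sketch in plain Python, with the matrix values copied from the output above; the variable names are illustrative and not part of the notebook.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes.
cm = [[48, 15],
      [1, 107]]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal entries
incorrect = total - correct                      # everything off the diagonal

print(correct, incorrect, round(correct / total, 2))
```

The ratio `correct / total` is 155/171, which rounds to the 0.91 accuracy reported earlier.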

In [7]:
# Printing Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.98      0.76      0.86        63
           1       0.88      0.99      0.93       108

    accuracy                           0.91       171
   macro avg       0.93      0.88      0.89       171
weighted avg       0.91      0.91      0.90       171



The f1-score column is the F-beta score with beta = 1. The F-beta score is a weighted harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.
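
To make the harmonic-mean definition concrete, here is a small sketch that recomputes the class 0 scores of the Perceptron from its confusion matrix `[[48, 15], [1, 107]]`. The helper `f_beta` is a hypothetical function written for this illustration, not a scikit-learn call.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Class 0 of the Perceptron, using counts from the confusion matrix
precision_0 = 48 / (48 + 1)   # truly 0 among all samples predicted as 0
recall_0 = 48 / (48 + 15)     # predicted as 0 among all samples that are truly 0
f1_0 = f_beta(precision_0, recall_0)

print(round(precision_0, 2), round(recall_0, 2), round(f1_0, 2))
```

The rounded values agree with the first row of the classification report (0.98, 0.76, 0.86).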

In [8]:
# Compute MAE
print('Mean Absolute Error:{: .2f}'.
      format(mean_absolute_error(y_test, y_pred)))

Mean Absolute Error: 0.09


Mean Absolute Error (MAE) is the average absolute difference between the true and the predicted values. With 0/1 class labels, every wrong prediction contributes an error of exactly 1, so the MAE equals the misclassification rate (here 16/171 ≈ 0.09, i.e. 1 minus the accuracy). The closer the MAE is to 0, the more accurate the model is.
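
For 0/1 labels the MAE and the accuracy carry the same information, which the sketch below checks. It rebuilds label vectors that match the Perceptron confusion matrix rather than using the real test set, and `mean_abs_error` is a hypothetical stand-in for `mean_absolute_error`.

```python
def mean_abs_error(y_true, y_pred):
    """Average of the absolute differences between paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Label vectors consistent with the confusion matrix [[48, 15], [1, 107]]
y_true = [0] * 63 + [1] * 108
y_pred = [0] * 48 + [1] * 15 + [0] * 1 + [1] * 107

mae = mean_abs_error(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# For binary 0/1 labels, MAE is the fraction of wrong predictions
print(round(mae, 2), round(1 - accuracy, 2))
```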


### (1.2) Applying Logistic Regression Model on the Breast Cancer Wisconsin dataset for binary classification

In [9]:
# Assigning X to feature values, and y to target values
lr_X = data.data
lr_y = data.target

# Splitting Data into Training and Test Sets
lr_X_train, lr_X_test, lr_y_train, lr_y_test = train_test_split(lr_X, lr_y, test_size=0.3, 
                                                    random_state=0)

In [10]:
# Logistic Regression Model Fitting
logreg = LogisticRegression(tol=1e-3, max_iter=1000, random_state=0, 
                            solver='liblinear')
logreg.fit(lr_X_train, lr_y_train)

In [11]:
# Predicting the test set results and calculating the accuracy
lr_y_pred = logreg.predict(lr_X_test)
print('Accuracy of LogisticRegression classifier on test set:{: .2f}'.
      format(logreg.score(lr_X_test, lr_y_test)))

Accuracy of LogisticRegression classifier on test set: 0.95


In [12]:
# Computing Confusion Matrix
lr_cm = confusion_matrix(lr_y_test, lr_y_pred)
print(lr_cm)

[[ 61   2]
 [  7 101]]


The confusion matrix shows 61 + 101 = 162 correct predictions (the diagonal entries) and 2 + 7 = 9 incorrect predictions (the off-diagonal entries).

In [13]:
# Printing Classification Report
print(classification_report(lr_y_test, lr_y_pred))

              precision    recall  f1-score   support

           0       0.90      0.97      0.93        63
           1       0.98      0.94      0.96       108

    accuracy                           0.95       171
   macro avg       0.94      0.95      0.94       171
weighted avg       0.95      0.95      0.95       171




In [14]:
# Compute MAE
print('Mean Absolute Error:{: .2f}'.
      format(mean_absolute_error(lr_y_test, lr_y_pred)))

Mean Absolute Error: 0.05


The closer MAE is to 0, the more accurate the model is.

Logistic regression and the perceptron are closely related: both are linear classifiers that compute a weighted sum of the features. The main difference is the output: a logistic model predicts a probability, while a perceptron only predicts a hard yes or no. Thresholding a logistic model's predicted probability at 0.5 (values below 0.5 become 0, values above become 1) gives a decision rule of the same form as the perceptron's. The two models are trained differently, however, so the learned weights and the resulting predictions can differ, as the results above show.
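
The thresholding remark can be illustrated without scikit-learn. In the sketch below the weights, bias, and samples are made-up values chosen only for illustration; it shows that for the same weights, cutting the sigmoid output at 0.5 yields the same labels as checking the sign of the linear score, which is what a perceptron does at prediction time.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(weights, bias, x):
    """Weighted sum of the features plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical weights and inputs, not taken from the fitted models
weights, bias = [0.8, -1.5], 0.2
samples = [[1.0, 0.2], [0.1, 0.9], [2.0, 1.0], [0.0, 0.5]]

logistic_labels = []
perceptron_labels = []
for x in samples:
    z = linear_score(weights, bias, x)
    logistic_labels.append(int(sigmoid(z) > 0.5))  # threshold the probability
    perceptron_labels.append(int(z > 0))           # threshold the raw score

print(logistic_labels, perceptron_labels)
```

In practice the two models learn different weights from the same data, which is why their test results differ.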

For binary classification, Logistic Regression performed somewhat better than the Perceptron on this test set in terms of Mean Absolute Error (0.05 vs 0.09), accuracy (0.95 vs 0.91), and the macro- and weighted-average precision, recall, and f1-score. The Perceptron does score higher on two per-class figures: class 0 precision (0.98 vs 0.90) and class 1 recall (0.99 vs 0.94). With an accuracy of 0.95, the Logistic Regression model made 162 correct and 9 incorrect predictions on the test set, against the Perceptron's 155 correct and 16 incorrect predictions at an accuracy of 0.91.



### (2.1) Applying Perceptron Classifier Model on the Iris dataset for multiclass classification

In [15]:
# Loading Iris dataset
iris = load_iris()

In [16]:
# Assigning X to feature values, and y to target values
pmc_X = iris.data
pmc_y = iris.target

# Splitting Data into Training and Test Sets
pmc_X_train, pmc_X_test, pmc_y_train, pmc_y_test = train_test_split(pmc_X, pmc_y, test_size=0.3, 
                                                    random_state=0)

In [17]:

# Defining Perceptron model for multi-class classification using scikit-learn OneVsOneClassifier
model = Perceptron(tol=1e-3, max_iter=1000, random_state=0)
ovo_clf = OneVsOneClassifier(model)

# fit model
ovo_clf.fit(pmc_X_train, pmc_y_train)

In [18]:
# Predicting the test set results and calculating the accuracy
pmc_y_pred = ovo_clf.predict(pmc_X_test)
print('Accuracy of Perceptron classifier on test set:{: .2f}'.
      format(ovo_clf.score(pmc_X_test, pmc_y_test)))

Accuracy of Perceptron classifier on test set: 0.96


In [19]:
# Computing Confusion Matrix
pmc_cm = confusion_matrix(pmc_y_test, pmc_y_pred)
cm_df = pd.DataFrame(pmc_cm,
                     index = ['SETOSA','VERSICOLOR','VIRGINICA'], 
                     columns = ['SETOSA','VERSICOLOR','VIRGINICA'])
cm_df

            SETOSA  VERSICOLOR  VIRGINICA
SETOSA          15           1          0
VERSICOLOR       0          17          1
VIRGINICA        0           0         11


In [20]:
# Printing Classification Report
print(classification_report(pmc_y_test, pmc_y_pred, target_names=['SETOSA','VERSICOLOR','VIRGINICA']))

              precision    recall  f1-score   support

      SETOSA       1.00      0.94      0.97        16
  VERSICOLOR       0.94      0.94      0.94        18
   VIRGINICA       0.92      1.00      0.96        11

    accuracy                           0.96        45
   macro avg       0.95      0.96      0.96        45
weighted avg       0.96      0.96      0.96        45



In [21]:
# Compute MAE
print('Mean Absolute Error:{: .2f}'.
      format(mean_absolute_error(pmc_y_test, pmc_y_pred)))

Mean Absolute Error: 0.04


### (2.2) Applying Logistic Regression Model on the Iris dataset for multiclass classification

In [22]:
# Assigning X to feature values, and y to target values
lrm_X = iris.data
lrm_y = iris.target

# Splitting Data into Training and Test Sets
lrm_X_train, lrm_X_test, lrm_y_train, lrm_y_test = train_test_split(lrm_X, lrm_y, test_size=0.3, 
                                                    random_state=0)

In [23]:
# Defining Logistic Regression model for multi-class classification using scikit-learn OneVsOneClassifier
lr_model = LogisticRegression(tol=1e-3, max_iter=1000, random_state=0, 
                              solver='liblinear')
ovo_lr = OneVsOneClassifier(lr_model)

# fit model
ovo_lr.fit(lrm_X_train, lrm_y_train)

In [24]:
# Predicting the test set results and calculating the accuracy
lrm_y_pred = ovo_lr.predict(lrm_X_test)
print('Accuracy of LogisticRegression classifier on test set:{: .2f}'.
      format(ovo_lr.score(lrm_X_test, lrm_y_test)))

Accuracy of LogisticRegression classifier on test set: 0.96


In [25]:
# Computing Confusion Matrix
lrm_cm = confusion_matrix(lrm_y_test, lrm_y_pred)
# Build the table from the Logistic Regression matrix (lrm_cm), not the Perceptron one (pmc_cm)
lrm_cm_df = pd.DataFrame(lrm_cm,
                     index = ['SETOSA','VERSICOLOR','VIRGINICA'], 
                     columns = ['SETOSA','VERSICOLOR','VIRGINICA'])
lrm_cm_df

            SETOSA  VERSICOLOR  VIRGINICA
SETOSA          16           0          0
VERSICOLOR       0          16          2
VIRGINICA        0           0         11


In [26]:
# Printing Classification Report
print(classification_report(lrm_y_test, lrm_y_pred, target_names=['SETOSA','VERSICOLOR','VIRGINICA']))

              precision    recall  f1-score   support

      SETOSA       1.00      1.00      1.00        16
  VERSICOLOR       1.00      0.89      0.94        18
   VIRGINICA       0.85      1.00      0.92        11

    accuracy                           0.96        45
   macro avg       0.95      0.96      0.95        45
weighted avg       0.96      0.96      0.96        45



In [27]:
# Compute MAE
print('Mean Absolute Error:{: .2f}'.
      format(mean_absolute_error(lrm_y_test, lrm_y_pred)))

Mean Absolute Error: 0.04


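For multiclass classification, the Perceptron and Logistic Regression models produced the same overall performance on this test set: an accuracy of 0.96 (43 of 45 correct) and a Mean Absolute Error of 0.04. The classification reports show that they differ per class, though: the Perceptron misclassifies one setosa and one versicolor sample (recall 0.94 for each), while Logistic Regression classifies every setosa sample correctly but has a versicolor recall of 0.89. On this split, neither model is clearly better than the other. Both score slightly higher here than in the binary task, but the two tasks use different datasets, so those numbers are not directly comparable.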
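
One caveat when reading the MAE for a multiclass task: the Iris targets are the integers 0, 1, and 2, so the MAE weights each error by the distance between the class labels. The sketch below uses the Perceptron confusion matrix from section 2.1 plus a hypothetical variant with the same number of errors; the helper `rates` is illustrative and not part of the notebook.

```python
def rates(cm):
    """Return (error rate, MAE on integer labels) for a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    errors = sum(cm[i][j] for i in range(n) for j in range(n) if i != j)
    abs_sum = sum(abs(i - j) * cm[i][j] for i in range(n) for j in range(n))
    return errors / total, abs_sum / total

# Perceptron confusion matrix from section 2.1 (rows: true, columns: predicted)
cm = [[15, 1, 0],
      [0, 17, 1],
      [0, 0, 11]]

# Hypothetical variant: still two errors, but one setosa is predicted as virginica
cm_far = [[15, 0, 1],
          [0, 17, 1],
          [0, 0, 11]]

error_rate, mae = rates(cm)
error_rate_far, mae_far = rates(cm_far)
print(round(error_rate, 2), round(mae, 2))
print(round(error_rate_far, 2), round(mae_far, 2))
```

In the reported results every error is between neighbouring labels, so the MAE and the error rate coincide at 2/45 ≈ 0.04. In the hypothetical variant the error rate is unchanged while the MAE rises, which is why accuracy is usually the safer headline number for unordered classes.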