# Performance Evaluation Methods

**Aim**: The aim of this notebook is to understand the performance evaluation methods that can be used to assess a model.

## Table of contents

1. Performance evaluation methods for classification algorithms
2. Performance evaluation methods for regression algorithms 
3. Performance evaluation methods for unsupervised algorithms


## Package Requirements

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import linear_model
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn import model_selection
from sklearn import metrics
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.cluster import KMeans

## Performance evaluation methods for classification algorithms

**Building the K-Nearest Neighbors Model**

In [None]:
#Reading in the fraud detection dataset 

df = pd.read_csv('fraud_prediction.csv')

In [None]:
#Creating the features 

features = df.drop('isFraud', axis = 1).values
target = df['isFraud'].values

In [None]:
#Splitting the data into training and test sets 

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.3, random_state = 42, stratify = target)

In [None]:
# Building the K-NN Classifier 

knn_classifier = KNeighborsClassifier(n_neighbors=3)

knn_classifier.fit(X_train, y_train)

**Building the Logistic Regression Model**

In [None]:
#Initializing a logistic regression object

logistic_regression = linear_model.LogisticRegression()

#Fitting the model to the training set

logistic_regression.fit(X_train, y_train)

**Confusion Matrix**

In [None]:
#Creating predictions on the test set 

prediction = knn_classifier.predict(X_test)

#Creating the confusion matrix 

print(confusion_matrix(y_test, prediction))

In [None]:
#Creating the classification report 

print(classification_report(y_test, prediction))
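
The precision, recall and F1 values in the classification report can be derived directly from the four cells of the confusion matrix. The sketch below illustrates this on a small set of made-up labels (`y_true` and `y_pred` are illustrative only and do not come from `fraud_prediction.csv`).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels, made up for illustration (not the fraud dataset)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision: share of predicted positives that are truly positive
precision = tp / (tp + fp)

# Recall: share of true positives that the model found
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)          # 5 1 1 3
print(precision, recall, f1)   # 0.75 0.75 0.75
```

On the fraud data, the same calculation would use `y_test` and `prediction` in place of the toy arrays.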

**Normalized Confusion Matrix**

In [None]:
#Normalized confusion matrix for the K-NN model

prediction_labels = knn_classifier.predict(X_test)
skplt.metrics.plot_confusion_matrix(y_test, prediction_labels, normalize=True)
plt.show()

In [None]:
#Normalized confusion matrix for the logistic regression model

prediction_labels = logistic_regression.predict(X_test)
skplt.metrics.plot_confusion_matrix(y_test, prediction_labels, normalize=True)
plt.show()

**ROC curve and area under the curve (AUC)**

In [None]:
#Probabilities for each prediction output 

target_prob = knn_classifier.predict_proba(X_test)[:,1]

#Plotting the ROC curve 

fpr, tpr, thresholds = roc_curve(y_test, target_prob)

plt.plot([0,1], [0,1], 'k--')

plt.plot(fpr, tpr)

plt.xlabel('False Positive Rate')

plt.ylabel('True Positive Rate')

plt.title('ROC Curve')

plt.show()

In [None]:
#Computing the auc score 

roc_auc_score(y_test, target_prob)
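
One way to read the AUC score is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below checks this interpretation on made-up scores (the arrays are illustrative, not model output from this notebook).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and scores, made up for illustration
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score;
# a tie counts as half a win
diff = pos[:, None] - neg[None, :]
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

auc = roc_auc_score(y_true, scores)

# Both values are 8/9: eight of the nine positive/negative pairs are ranked correctly
print(pairwise_auc, auc)
```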

**Cumulative Gains Plot**

In [None]:
# Cumulative gains plot for the K-NN model

target_prob = knn_classifier.predict_proba(X_test)
skplt.metrics.plot_cumulative_gain(y_test, target_prob)
plt.show()

In [None]:
#Cumulative gains plot for the logistic regression model

target_prob = logistic_regression.predict_proba(X_test)
skplt.metrics.plot_cumulative_gain(y_test, target_prob)
plt.show()

**Lift Curve**

In [None]:
# Lift curve for the K-NN model

target_prob = knn_classifier.predict_proba(X_test)
skplt.metrics.plot_lift_curve(y_test, target_prob)
plt.show()

In [None]:
#Lift curve for the logistic regression model

target_prob = logistic_regression.predict_proba(X_test)
skplt.metrics.plot_lift_curve(y_test, target_prob)
plt.show()
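
The cumulative gains and lift curves are built from the same ranking of the test samples by predicted probability. A minimal sketch of that calculation, using made-up labels and scores rather than the fraud data, might look like this (the exact binning used by `scikitplot` may differ).

```python
import numpy as np

# Toy labels and scores, made up for illustration
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1])

# Rank the samples from the highest to the lowest score
order = np.argsort(-scores)
sorted_true = y_true[order]

# Fraction of the sample examined after each additional row
frac_sample = np.arange(1, len(y_true) + 1) / len(y_true)

# Gain: fraction of all positives captured so far
gain = np.cumsum(sorted_true) / sorted_true.sum()

# Lift: gain relative to what random selection would capture
lift = gain / frac_sample

# After the top 30% of samples, 2 of the 3 positives have been captured
print(gain[2])   # about 0.667
print(lift[2])   # about 2.22
```

A lift above 1 at a given fraction means the model concentrates positives in its top-ranked samples better than random selection would.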

**KS-Statistic plot**

In [None]:
#KS plot for the K-NN model

target_proba = knn_classifier.predict_proba(X_test)
skplt.metrics.plot_ks_statistic(y_test, target_proba)
plt.show()

In [None]:
#KS plot for the logistic regression model

target_proba = logistic_regression.predict_proba(X_test)
skplt.metrics.plot_ks_statistic(y_test, target_proba)
plt.show()
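
The KS statistic is the largest gap between the cumulative score distributions of the two classes. For a binary classifier this equals the largest difference between the true positive rate and the false positive rate over all thresholds, so it can be sketched with `roc_curve`. The labels and scores below are made up for illustration, and the value reported by `scikitplot` may differ slightly depending on how it handles thresholds.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and scores, made up for illustration
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# Keep every threshold so that no candidate point is skipped
fpr, tpr, thresholds = roc_curve(y_true, scores, drop_intermediate=False)

# KS statistic: maximum vertical distance between the TPR and FPR curves
ks_statistic = np.max(tpr - fpr)

print(ks_statistic)   # about 0.667
```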

**Calibration Plot**

In [None]:
#Extracting the predicted class probabilities from both models

knn_proba = knn_classifier.predict_proba(X_test)
log_proba = logistic_regression.predict_proba(X_test)

#Storing probabilities in a list

probas = [knn_proba, log_proba]

# Storing the model names in a list 

model_names = ["k_nn", "Logistic Regression"]

#Creating the calibration plot

skplt.metrics.plot_calibration_curve(y_test, probas, model_names)

plt.show()

**Learning Curve**

In [None]:
#Learning curve for the K-NN model

skplt.estimators.plot_learning_curve(knn_classifier, features, target)

plt.show()

**Cross-validated box plot**

In [None]:
#List of models

models = [('k-NN', knn_classifier), ('LR', logistic_regression)]

In [None]:
#Initializing empty lists in order to store the results
cv_scores = []
model_name_list = []

for name, model in models:
    
    #5-fold cross validation (shuffle=True is needed for random_state to have an effect)
    cv_5 = model_selection.KFold(n_splits=5, shuffle=True, random_state=50)
    # Evaluating the accuracy scores
    cv_score = model_selection.cross_val_score(model, X_test, y_test, cv = cv_5, scoring= 'accuracy')
    cv_scores.append(cv_score)
    model_name_list.append(name)
    
# Plotting the cross-validated box plot 

fig = plt.figure()
fig.suptitle('Boxplot of 5-fold cross validated scores for all the models')
ax = fig.add_subplot(111)
plt.boxplot(cv_scores)
ax.set_xticklabels(model_name_list)
plt.show()

## Performance evaluation methods for regression algorithms

**Building a linear regression model**

In [None]:
## Building a simple linear regression model

#Reading in the dataset

df = pd.read_csv('fraud_prediction.csv')

#Define the feature and target arrays

feature = df['oldbalanceOrg'].values
target = df['amount'].values

In [None]:
#Initializing a linear regression model 

linear_reg = linear_model.LinearRegression()

#Reshaping the array since we only have a single feature

feature = feature.reshape(-1, 1)
target = target.reshape(-1, 1)

#Fitting the model on the data

linear_reg.fit(feature, target)

In [None]:
predictions = linear_reg.predict(feature)

**Computing the Mean Absolute Error**

In [None]:
metrics.mean_absolute_error(target, predictions)

**Computing the Mean Squared Error**

In [None]:
metrics.mean_squared_error(target, predictions)

**Computing the Root Mean Squared Error**

In [None]:
np.sqrt(metrics.mean_squared_error(target, predictions))
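
The three regression metrics above are simple functions of the prediction errors. The sketch below computes them by hand with NumPy on made-up values (illustrative only, not the fraud dataset) and compares the results with `sklearn.metrics`.

```python
import numpy as np
from sklearn import metrics

# Toy targets and predictions, made up for illustration
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred            # [ 1.  0. -2. -1.]

# Mean absolute error: average size of the errors
mae = np.mean(np.abs(errors))       # (1 + 0 + 2 + 1) / 4 = 1.0

# Mean squared error: average squared error, penalises large errors more
mse = np.mean(errors ** 2)          # (1 + 0 + 4 + 1) / 4 = 1.5

# Root mean squared error: back in the units of the target
rmse = np.sqrt(mse)

# The hand-computed values should match scikit-learn
print(np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred)))
print(np.isclose(mse, metrics.mean_squared_error(y_true, y_pred)))
```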

## Performance evaluation methods for unsupervised algorithms

**Building a K-Means model**

In [None]:
#Reading in the dataset

df = pd.read_csv('fraud_prediction.csv')

#Dropping the target feature & the index

df = df.drop(['Unnamed: 0', 'isFraud'], axis = 1)

In [None]:
#Initializing K-means with 2 clusters

k_means = KMeans(n_clusters = 2)

**Elbow plot**

In [None]:
skplt.cluster.plot_elbow_curve(k_means, df, cluster_ranges=range(1, 20))
plt.show()
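
An elbow plot shows how the within-cluster sum of squared distances falls as the number of clusters grows. A rough equivalent can be sketched with plain scikit-learn by reading `inertia_` from a fitted `KMeans` model for each cluster count. The data below is synthetic (generated with `make_blobs`, not the fraud dataset), and the `random_state` and `n_init` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups, for illustration only
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

cluster_counts = range(1, 7)
inertias = []

for k in cluster_counts:
    # inertia_ is the sum of squared distances of samples to their closest centre
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(cluster_counts, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `cluster_counts` gives the elbow curve. The point where the drop flattens out (around 3 clusters for this synthetic data) suggests a reasonable number of clusters.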