# A Benchmark of Supervised Learning Algorithms

**Abstract:**
Supervised Learning is the task of searching a predefined hypothesis space for the hypothesis that best captures the relationship underlying the input instances and that generalizes from observed training examples to classify unseen instances. This paper introduces the paradigm of Supervised Learning by benchmarking five distinct Supervised Learning algorithms on two different real-world classification problems.

**Keywords**: *Machine Learning, Supervised Learning, Classification, Benchmark, Data Analysis.*

**The Classification Problem:**
Classification is a concept learning problem, where we aim to infer a general definition of a concept from a set of examples. According to Mitchell (1997): "Concept Learning is acquiring the definition of a general category given a sample of positive and negative training examples of the category".


In [None]:
# Importing required Python libraries
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

from helperfunctions import *
from learningmodels import *

The two problems that will be explored in this paper are:
1. UCI Credit Card Default Dataset: This problem is about using the training data to predict the default of credit card clients in Taiwan (Default: 1, No Default: 0). Predicting default probability is a crucial activity that has a critical influence on the banking business. This is real-world data extracted from a bank in Taiwan in 2005. The dataset consists of 30,000 examples with 23 numeric attributes (24 columns when the default label is included). We have split the dataset into three subsets: Training set (18,000 examples), Validation set (6,000 examples) and Test set (6,000 examples). The Test set will stay untouched during the training process until we produce the final model. The data has enough distinct attributes to enable meaningful learning. The correlation matrix of the data is shown in Figure 1. Darker color indicates higher correlation.

2. The MNIST Database of handwritten digits: This is a classical dataset for classifying images of handwritten digits. It is a subset of the larger dataset from NIST, and was introduced in 1998 (LeCun, Bottou, Bengio and Haffner – 1998). The data consists of 70,000 examples in total. In order to save computation time, we will use a subset (roughly 25%) of this dataset in our paper. We have split this subset into three parts: Training set (8,400 examples), Validation set (3,600 examples) and Test set (5,000 examples). The Test set will stay untouched during the training process until we produce the final model. The learners reach relatively good accuracy on this data, allowing for informative benchmarking. The examples are almost equally distributed over the different classes. The histogram in Figure 2 shows the distribution of the examples.
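
The three-way split described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming a stratified random split with scikit-learn's `train_test_split`; the notebook's own `ReadDataSet_Credit()` helper is not shown here and may split the data differently.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Credit Card data: 30,000 rows, 23 numeric attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30_000, 23))
y = rng.integers(0, 2, size=30_000)

# First carve out the Test set, then split the remainder into Training and Validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=6_000, stratify=y, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=6_000, stratify=y_rest, random_state=0)

print(len(X_train), len(X_valid), len(X_test))
```

Stratifying on the label keeps the class proportions similar across the three subsets, which matters for an imbalanced target such as default / no default.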


In [None]:
# Reading Credit Card Default dataset
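# ReadDataSet_Credit() returns a list; as used in this notebook:
# [0] data for the correlation matrix, [2]/[3] training X/y,
# [4]/[5] validation X/y, [6]/[7] test X/y, [8] dataset name
credit_dataset = ReadDataSet_Credit()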

# Dataset analysis
print("Size of Training set (Credit):", len(credit_dataset[2]))
print("Size of Validation set (Credit):", len(credit_dataset[4]))
print("Size of Test set (Credit):", len(credit_dataset[6]))

# Correlation Matrix
plot_Correlation_Matrix(credit_dataset[0], 'Credit_CorrMat.png')

In [None]:
# Reading MNIST Database of handwritten digits
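# ReadDataSet_MNIST() returns a list with the same layout as the Credit Card one:
# [2]/[3] training X/y, [4]/[5] validation X/y, [6]/[7] test X/y, [8] dataset name
MNIST_dataset = ReadDataSet_MNIST()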

# Dataset analysis
print("Size of Training set (MNIST):", len(MNIST_dataset[2]))
print("Size of Validation set (MNIST):", len(MNIST_dataset[4]))
print("Size of Test set (MNIST):", len(MNIST_dataset[6]))

# Dataset histogram
histogram_plot(MNIST_dataset[3])

## Decision Tree

**Model Description:**
Decision Trees are a method for inductive inference whose goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the input data. Over the last three decades, several algorithms have been developed to learn decision trees. ID3 (Quinlan – 1979) constructs decision trees by performing a simple-to-complex, hill-climbing search through the hypothesis space. ID3 creates a multiway tree, finding for each node the categorical feature that yields the largest information gain for categorical targets.
C4.5 (Quinlan – 1993) is a successor to ID3 that removes the restriction that features must be categorical by dynamically defining a discrete attribute (based on numerical variables) that partitions the continuous attribute value into a discrete set of intervals.
CART (Breiman, Friedman, Olshen and Stone – 1984) is another decision tree algorithm, similar to C4.5. CART constructs trees that have only binary splits. This restriction simplifies the splitting criterion because there need not be a penalty for multi-way splits. A pruning technique called minimal cost complexity pruning is used (Quinlan – 1993). It assumes that the bias in the re-substitution error of a tree increases linearly with the number of leaf nodes. In this paper, we will apply CART to our datasets, and we will use pruning to reduce the possibility of overfitting.
In terms of the inductive bias of Decision Trees, shorter trees are preferred over longer trees. In addition, trees that place high information gain attributes close to the root are preferred over those that do not.
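
To make the notion of information gain concrete, the following sketch computes it for a categorical feature on a toy example. The function names `entropy` and `information_gain` are illustrative and not part of the notebook's helper modules. Note that scikit-learn's `DecisionTreeClassifier`, used below, defaults to the Gini criterion rather than entropy.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting `labels` on each distinct value of `feature`."""
    weighted = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each branch's entropy by the fraction of examples it receives.
        weighted += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted

# Toy example: feature "a" separates the classes perfectly, "b" carries no information.
y = np.array([0, 0, 1, 1])
a = np.array(["x", "x", "y", "y"])
b = np.array(["u", "v", "u", "v"])
gain_a = information_gain(a, y)
gain_b = information_gain(b, y)
print(gain_a, gain_b)
```

An ID3-style learner would place feature `a` at the root here, since it yields the larger gain.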

**Model Analysis:**
The hyperparameter we will tune to improve the model’s performance is max_depth, which is the maximum depth of the tree. We will use the validation set to determine the best maximum depth to be used with each dataset, and then we will use the Test set to check the accuracy of the final model.
We tuned the max_depth parameter over a range between 1 and 40 with a step of 2 to determine the best setting for each dataset. From the Model Complexity Curve in Figure 3, we can observe that beyond a certain maximum depth, the model’s accuracy decreases when measured using Cross Validation or the Validation set. This behavior is expected, since a deeper tree can overfit the training data and then fail to generalize. It can be observed on both datasets.
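
The tuning procedure can be sketched as a simple sweep over max_depth scored on a held-out validation set. This is an assumption about what the `get_DT_VC` helper does, shown on synthetic data rather than the two benchmark datasets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the real datasets.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)

depths = list(range(1, 40, 2))  # 1, 3, ..., 39
valid_scores = []
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    valid_scores.append(tree.score(X_valid, y_valid))

# Pick the depth with the highest validation accuracy (ties go to the shallower tree).
best_depth = depths[valid_scores.index(max(valid_scores))]
print("best max_depth on the validation set:", best_depth)
```

Breaking ties in favor of the shallower tree matches the preference for shorter trees stated in the model description.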


### Credit Card Default dataset

In [None]:
# Credit Card Default dataset
model_credit_DT = Model_Validation(credit_dataset[2], credit_dataset[3], DecisionTreeClassifier(), len(credit_dataset[2]))
model_credit_DT.set_valid_data(credit_dataset[4], credit_dataset[5])
get_DT_VC(model_credit_DT, credit_dataset[8])
model_credit_DT.plot_learning_curve(title = "Learning Curve (Decision Tree) - " + credit_dataset[8], cv = 5, n_jobs=-1)

# Decision Tree with the best max_depth applied to Credit Card Default dataset
classifier_credit_DT = DecisionTreeClassifier(max_depth=5)
classifier_credit_DT.fit(credit_dataset[2], credit_dataset[3])
# Predicting new results
y_pred = classifier_credit_DT.predict(credit_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(credit_dataset[7], y_pred)) * 100)+"%"
print(acc)

# Plot classifier graph
plot_classifier_graph(classifier_credit_DT, credit_dataset[8]+" Decision Tree graph")

### MNIST Database of handwritten digits

In [None]:
model_MNIST = Model_Validation(MNIST_dataset[2], MNIST_dataset[3], DecisionTreeClassifier(), len(MNIST_dataset[2]))
model_MNIST.set_valid_data(MNIST_dataset[4], MNIST_dataset[5])
get_DT_VC(model_MNIST, MNIST_dataset[8])
model_MNIST.plot_learning_curve(title = "Learning Curve (Decision Tree) - " + MNIST_dataset[8], cv = 5, n_jobs=-1)

# Decision Tree with the best max_depth applied to MNIST Database of handwritten digits
classifier_MNIST_DT = DecisionTreeClassifier(max_depth=9)
classifier_MNIST_DT.fit(MNIST_dataset[2], MNIST_dataset[3])
# Predicting new results
y_pred = classifier_MNIST_DT.predict(MNIST_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(MNIST_dataset[7], y_pred)) * 100)+"%"
print(acc)

# Plot classifier graph
plot_classifier_graph(classifier_MNIST_DT, MNIST_dataset[8]+" Decision Tree graph")

We can observe from Figure 3 that the best max_depth for Credit Card dataset is (5). As shown in the Model Complexity Curve, the model suffers overfitting after crossing this threshold, which explains the increase in the Cross-Validation and the Validation Set Errors.
We can also observe that the best max_depth for MNIST Database is (9), after which the Error values did not decrease anymore. Applying these settings to the datasets resulted in a prediction accuracy of (81.60%) on Credit Card test set, and (80.56%) on MNIST test set.

To analyze the relation between the model’s accuracy and the size of the training data, a Learning Curve is presented in the accompanying figure. We can observe that with the Credit Card dataset, the error increases beyond roughly 11,500 training examples, which indicates overfitting and explains why we get better accuracy with pruning.
On the other hand, with the MNIST Database, the error keeps decreasing as more training examples are added. This indicates that the model needs more input examples to converge. The reason for this behavior is the large number of attributes in the dataset, which is typical of image data.
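
A learning curve of this kind can be computed with scikit-learn's `learning_curve`. The sketch below uses synthetic data and a fixed max_depth of 5; it is an illustration of the technique, not the notebook's `plot_learning_curve` method, whose implementation is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Score the model with 5-fold cross-validation at five increasing training-set sizes.
train_sizes, train_scores, cv_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

# Error = 1 - accuracy, averaged over the folds.
train_error = 1.0 - train_scores.mean(axis=1)
cv_error = 1.0 - cv_scores.mean(axis=1)
for n, tr, cv in zip(train_sizes, train_error, cv_error):
    print(f"{int(n):5d} examples  train error {tr:.3f}  cv error {cv:.3f}")
```

A cross-validation error that is still falling at the largest training size suggests that more data would help, which is the pattern described for MNIST.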

In Figure 5 we have the Decision Tree structure used for Credit Card dataset. We can observe that the attribute (PAY_2) is the most important attribute when predicting Default. Hence it is the tree root.
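
The attribute chosen at the root can also be read directly from a fitted scikit-learn tree. The sketch below uses synthetic data with hypothetical feature names (`f0` to `f5`), so it does not reproduce the PAY_2 result. The root split and the top entry of `feature_importances_` often coincide but need not.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# shuffle=False keeps the two informative features in the first two columns.
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Node 0 of the fitted tree is the root; tree_.feature holds the split feature per node.
root_index = int(clf.tree_.feature[0])
root_name = feature_names[root_index]
top_importance = feature_names[int(clf.feature_importances_.argmax())]
print("root split on:", root_name, "| highest importance:", top_importance)
```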

In [None]:
combine_images(['../Figures/Learning Curve (Decision Tree) - Credit Card Default dataset.png',
                '../Figures/Learning Curve (Decision Tree) - MNIST Database of handwritten digits.png'],
                '../Figures/Learning Curve (Decision Tree).png')
combine_images(['../Figures/Model Complexity Curve (Decision Trees) - Credit Card Default dataset.png',
                '../Figures/Model Complexity Curve (Decision Trees) - MNIST Database of handwritten digits.png'],
                '../Figures/Model Complexity Curve (Decision Trees).png')

## Neural Networks

**Model Description:**
Neural Networks (Lippmann – 1987) provide a robust method for approximating target functions. In terms of bias, Neural Networks have a low Restriction Bias since they can model a wide variety of functions. This, however, increases the possibility of overfitting.
In addition, a Neural Network can overfit not only because of excessive complexity but also because of excessive training. As for the Preference Bias, Neural Networks prefer simpler and generalizable representations, which means few hidden layers and small weights. In this paper, we will use a Multi-layer Perceptron classifier (S. K. Pal and S. Mitra – 1992) with one hidden layer.

**Model Analysis:**
The hyperparameter we will tune to improve the model’s performance is hidden_layer_sizes, which is the number of neurons in the hidden layer. We will use the validation set to determine the best hidden layer sizes to be used with each dataset, and then we will use the Test set to check the accuracy of the final model.
We tuned the hidden_layer_sizes parameter over a range between 1 and 200 with a step of 5 to determine the best setting for each dataset. From the Model Complexity Curve in Figure 6, we can observe that on the Credit Card dataset, a hidden_layer_sizes value of more than (10) results in an increase in the Cross Validation and Validation Set Errors. On the MNIST Database, on the other hand, we could not observe an overfitting point, so we chose a hidden_layer_sizes value of (21), after which the Error rates stabilized. The accuracies produced by MLP on the Test set are (76.60%) and (89.54%) respectively.
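
A sweep over the hidden layer width can be sketched as below. This is an assumption about what the `get_NN_VC` helper does. It uses synthetic data, standardizes the inputs, and only tries the first five values of the range to keep the run short.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training data only, then apply it to both subsets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_valid_s = scaler.transform(X_train), scaler.transform(X_valid)

sizes = list(range(1, 200, 5))[:5]  # 1, 6, 11, 16, 21
valid_error = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    for n in sizes:
        mlp = MLPClassifier(hidden_layer_sizes=(n,), max_iter=300, random_state=0)
        mlp.fit(X_train_s, y_train)
        valid_error[n] = 1.0 - mlp.score(X_valid_s, y_valid)

best_size = min(valid_error, key=valid_error.get)
print("validation error per width:", valid_error)
print("best hidden layer width:", best_size)
```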


### Credit Card Default dataset

In [None]:
# Credit Card Default dataset
model_credit_NN = Model_Validation(credit_dataset[2], credit_dataset[3], MLPClassifier(), len(credit_dataset[2]))
model_credit_NN.set_valid_data(credit_dataset[4], credit_dataset[5])
get_NN_VC(model_credit_NN, credit_dataset[8])
model_credit_NN.plot_learning_curve(title = "Learning Curve (Neural Networks) - " + credit_dataset[8], cv = 5, n_jobs=-1)

# Neural Network with the best hidden_layer_sizes applied to Credit Card Default dataset
classifier_credit_NN = MLPClassifier(hidden_layer_sizes=(10,))
classifier_credit_NN.fit(credit_dataset[2], credit_dataset[3])

# Predicting new results
y_pred = classifier_credit_NN.predict(credit_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(credit_dataset[7], y_pred)) * 100)+"%"
print(acc)

### MNIST Database of handwritten digits

In [None]:
# MNIST Database of handwritten digits
model_MNIST_NN = Model_Validation(MNIST_dataset[2], MNIST_dataset[3], MLPClassifier(), len(MNIST_dataset[2]))
model_MNIST_NN.set_valid_data(MNIST_dataset[4], MNIST_dataset[5])
get_NN_VC(model_MNIST_NN, MNIST_dataset[8])
model_MNIST_NN.plot_learning_curve(title = "Learning Curve (Neural Networks) - " + MNIST_dataset[8], cv = 5, n_jobs=-1)

# Neural Network with the best hidden_layer_sizes applied to MNIST Database of handwritten digits
classifier_MNIST_NN = MLPClassifier(hidden_layer_sizes=(21,))
classifier_MNIST_NN.fit(MNIST_dataset[2], MNIST_dataset[3])

# Predicting new results
y_pred = classifier_MNIST_NN.predict(MNIST_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(MNIST_dataset[7], y_pred)) * 100)+"%"
print(acc)

The Learning Curve of MLP is presented in Figure 7. We can observe that with both datasets, the more training examples we add, the better prediction accuracy the model produces.

In [None]:
combine_images(['../Figures/Learning Curve (Neural Networks) - Credit Card Default dataset.png',
                '../Figures/Learning Curve (Neural Networks) - MNIST Database of handwritten digits.png'],
                '../Figures/Learning Curve (Neural Network).png')
combine_images(['../Figures/Model Complexity Curve (Neural Networks) - Credit Card Default dataset.png',
                '../Figures/Model Complexity Curve (Neural Networks) - MNIST Database of handwritten digits.png'],
                '../Figures/Model Complexity Curve (Neural Network).png')

## Boosting

**Model Description:**
Boosting (Schapire – 1990) is a method for enhancing the performance of a weak learning algorithm. It adaptively changes the distribution of the training set based on the performance of previous classifiers. In this paper, we will use a particular boosting algorithm called “AdaBoost” (Freund and Schapire – 1995). AdaBoost generates a set of classifiers sequentially and combines them by voting. It adjusts adaptively to the errors of the weak hypotheses. The goal is to force the learner to minimize expected error over different input distributions.
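
The reweighting of the training distribution can be illustrated with a single round of binary AdaBoost on a toy example. This sketch follows the classic two-class formulation with labels in {-1, +1}; scikit-learn's `AdaBoostClassifier` uses a multi-class variant whose details differ.

```python
import numpy as np

# True labels and the predictions of one weak hypothesis (it gets the last example wrong).
y = np.array([1, 1, -1, -1, 1])
h = np.array([1, 1, -1, -1, -1])

# Start from the uniform distribution over the training examples.
w = np.full(len(y), 1.0 / len(y))

# Weighted error of the weak hypothesis and its vote weight alpha.
err = w[h != y].sum()
alpha = 0.5 * np.log((1.0 - err) / err)

# Increase the weight of misclassified examples, decrease the others, then renormalize.
w_new = w * np.exp(-alpha * y * h)
w_new /= w_new.sum()

print("weighted error:", err, "alpha:", alpha)
print("new distribution:", w_new)
```

After the update, the misclassified examples carry half of the total weight, so the next weak learner cannot do well by repeating the same mistakes.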

**Model Analysis:**
In this paper, the hyperparameter we will tune to improve the model’s performance is n_estimators, which is the maximum number of weak learners at which boosting is terminated. We will use the validation set to determine the best number of estimators to be used with each dataset, and then we will use the Test set to check the accuracy of the final model.
We tuned the n_estimators parameter over a range between 10 and 300 with a step of 10 to determine the best setting for each dataset. From the Model Complexity Curve in Figure 8, we can observe that on the Credit Card dataset, an n_estimators value of more than (20) results in an increase in the Cross Validation and Validation Set Errors. Similarly, on the MNIST Database, crossing a threshold of (40) results in a drastic increase in both the Cross Validation and Validation Set Errors.
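
The effect of the number of estimators can also be inspected from a single fitted ensemble using `staged_score`, which reports the accuracy after each boosting round. This is an alternative to refitting one model per n_estimators value and is not necessarily how the `get_Boost_VC` helper works. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)

booster = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# staged_score yields the validation accuracy after 1, 2, ... boosting rounds.
staged = list(booster.staged_score(X_valid, y_valid))
best_n = staged.index(max(staged)) + 1
print("rounds evaluated:", len(staged), "| best n_estimators:", best_n)
```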

### Credit Card Default dataset

In [None]:
# Credit Card Default dataset
model_credit_AdaB = Model_Validation(credit_dataset[2], credit_dataset[3], AdaBoostClassifier(), len(credit_dataset[2]))
model_credit_AdaB.set_valid_data(credit_dataset[4], credit_dataset[5])
get_Boost_VC(model_credit_AdaB, credit_dataset[8])
model_credit_AdaB.plot_learning_curve(title = "Learning Curve (AdaBoost) - " + credit_dataset[8], cv = 5, n_jobs=-1)

# AdaBoost with the best n_estimators applied to Credit Card Default dataset
classifier_credit_AdaB = AdaBoostClassifier(n_estimators=20)
classifier_credit_AdaB.fit(credit_dataset[2], credit_dataset[3])

# Predicting new results
y_pred = classifier_credit_AdaB.predict(credit_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(credit_dataset[7], y_pred)) * 100)+"%"
print(acc)

### MNIST Database of handwritten digits

In [None]:
# MNIST Database of handwritten digits
model_MNIST_AdaB = Model_Validation(MNIST_dataset[2], MNIST_dataset[3], AdaBoostClassifier(), len(MNIST_dataset[2]))
model_MNIST_AdaB.set_valid_data(MNIST_dataset[4], MNIST_dataset[5])
get_Boost_VC(model_MNIST_AdaB, MNIST_dataset[8])
model_MNIST_AdaB.plot_learning_curve(title = "Learning Curve (AdaBoost) - " + MNIST_dataset[8], cv = 5, n_jobs=-1)

# AdaBoost with the best n_estimators applied to MNIST Database of handwritten digits
classifier_MNIST_AdaB = AdaBoostClassifier(n_estimators=40)
classifier_MNIST_AdaB.fit(MNIST_dataset[2], MNIST_dataset[3])

# Predicting new results
y_pred = classifier_MNIST_AdaB.predict(MNIST_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(MNIST_dataset[7], y_pred)) * 100)+"%"
print(acc)

The Learning Curve of AdaBoost is presented in Figure 9. Similar to previous observations, we can see that with both datasets, the more training examples we add, the better prediction accuracy the model produces. This trend is more obvious on MNIST Database given the huge number of attributes.

In [None]:
combine_images(['../Figures/Learning Curve (AdaBoost) - Credit Card Default dataset.png',
                '../Figures/Learning Curve (AdaBoost) - MNIST Database of handwritten digits.png'],
                '../Figures/Learning Curve (AdaBoost).png')
combine_images(['../Figures/Model Complexity Curve (AdaBoost) - Credit Card Default dataset.png',
                '../Figures/Model Complexity Curve (AdaBoost) - MNIST Database of handwritten digits.png'],
                '../Figures/Model Complexity Curve (AdaBoost).png')

## Support Vector Machines

**Model Description:**
Support Vector Machines (Cortes and Vapnik – 1995) are basically linear classifiers, that is, they separate two sets of points by constructing a hyperplane (a line in two dimensions) between the two classes. If the data is not linearly separable, the points are first mapped through a non-linear function and SVM is then applied in the transformed space. In terms of bias, the Restriction Bias of SVM depends on the underlying kernel. On the other hand, the Preference Bias of SVM is maximizing the Margin to avoid overfitting.
In this paper, the hyperparameters we will tune to improve the model’s performance are the underlying kernel and its coefficient “gamma”. We will use the validation set to determine the parameters to be used with each dataset, and then we will use the Test set to check the accuracy of the final model.

**Model Analysis:**
In this paper, we analyzed the performance of different SVM kernels on both datasets. Figure 10 shows the variations in accuracy and computation time. We have normalized the input instances before doing this comparison to reduce the execution time. This will not affect the conclusion since we are trying to capture the trends in accuracy, not a specific accuracy score.
As shown, on the Credit Card dataset, the Polynomial Kernel and the Radial Basis Function (RBF) Kernel have the highest accuracy. We decided to use the RBF kernel since it has the highest accuracy and the lowest running time. This is a result of the natural ability of RBF to produce wider and more inclusive hypothesis spaces. On the other hand, with the MNIST Database, the Linear Kernel outperforms all other kernels.
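
A kernel comparison of this kind can be sketched as follows. This is an assumption about what the `get_SVM_compare` helper measures. It uses synthetic data, and standardization with `StandardScaler` stands in for the normalization step mentioned above.

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale features using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_valid_s = scaler.transform(X_train), scaler.transform(X_valid)

results = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    start = time.perf_counter()
    clf = SVC(kernel=kernel).fit(X_train_s, y_train)
    accuracy = clf.score(X_valid_s, y_valid)
    results[kernel] = (accuracy, time.perf_counter() - start)

for kernel, (accuracy, seconds) in results.items():
    print(f"{kernel:8s} accuracy {accuracy:.3f}  time {seconds:.3f}s")
```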

### SVC Kernels Comparison

In [None]:
get_SVM_compare(credit_dataset, 'Credit Card Default dataset')

In [None]:
get_SVM_compare(MNIST_dataset, 'MNIST Database of handwritten digits')

In [None]:
combine_images(['../Figures/Credit Card Default dataset - SVM kernels comparison.png',
                '../Figures/MNIST Database of handwritten digits - SVM kernels comparison.png'],
                '../Figures/SVM kernels comparison.png')

After determining the best kernel for each dataset, the hyperparameter we will tune to improve the model’s performance is gamma, which is the kernel coefficient for the “rbf”, “poly” and “sigmoid” kernels. Given that the MNIST Database uses the “linear” kernel, it will not be included in this analysis. We will use the validation set to determine the best gamma value for the Credit Card dataset, and then we will use the Test set to check the accuracy of the final model.

We started tuning the gamma parameter, in a range between (10^-6) and (10^1.2), to determine the best setting for Credit Card dataset. From the Model Complexity Curve in Figure 11, we can observe that after a gamma value of (10^0.193), the model’s accuracy starts to decrease. Using this value produced an accuracy of (77.05%) on the Test set.
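
For the RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2), so a larger gamma makes each training point's influence more local. A sweep over a logarithmic grid of gamma values can be sketched as below, on synthetic data and with a much coarser grid than a real search would use. The grid size of 8 is an arbitrary choice for illustration, and this is not the `get_SVC_VC` helper itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_s, X_valid_s = scaler.transform(X_train), scaler.transform(X_valid)

# Logarithmically spaced grid from 10^-6 to 10^1.2.
gammas = np.logspace(-6, 1.2, 8)
valid_scores = [
    SVC(kernel="rbf", gamma=g).fit(X_train_s, y_train).score(X_valid_s, y_valid)
    for g in gammas
]

best_gamma = float(gammas[int(np.argmax(valid_scores))])
print("best gamma on the validation set:", best_gamma)
```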

### Credit Card Default dataset

In [None]:
model_credit_SVM = Model_Validation(credit_dataset[2], credit_dataset[3], SVC(), len(credit_dataset[2]))
model_credit_SVM.set_valid_data(credit_dataset[4], credit_dataset[5])
bestgamma_credit_SVM = get_SVC_VC(model_credit_SVM, credit_dataset[8], ker='rbf')
model_credit_SVM.plot_learning_curve(title = "Learning Curve (SVM) - " + credit_dataset[8], cv = 5, n_jobs=-1)

# SVC with the best gamma applied to Credit Card Default dataset
classifier_credit_SVM = SVC(gamma=bestgamma_credit_SVM, kernel='rbf')
classifier_credit_SVM.fit(credit_dataset[2], credit_dataset[3])

# Predicting new results
y_pred = classifier_credit_SVM.predict(credit_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(credit_dataset[7], y_pred)) * 100)+"%"
print(acc)

### MNIST Database of handwritten digits

In [None]:
model_MNIST_SVM = Model_Validation(MNIST_dataset[2], MNIST_dataset[3], SVC(), len(MNIST_dataset[2]))
model_MNIST_SVM.set_valid_data(MNIST_dataset[4], MNIST_dataset[5])
model_MNIST_SVM.plot_learning_curve(title = "Learning Curve (SVM) - " + MNIST_dataset[8], cv = 5, n_jobs=-1)

# SVC with the linear kernel (no gamma to tune) applied to MNIST Database of handwritten digits
classifier_MNIST_SVM = SVC(kernel='linear')
classifier_MNIST_SVM.fit(MNIST_dataset[2], MNIST_dataset[3])

# Predicting new results
y_pred = classifier_MNIST_SVM.predict(MNIST_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(MNIST_dataset[7], y_pred)) * 100)+"%"
print(acc)

The Learning Curve of SVM is presented in Figure 12. We can observe that on the Credit Card dataset, adding more training examples did not result in a big improvement in accuracy. On the other hand, on the MNIST Database, the more training examples we add, the better the prediction accuracy the model produces.

In [None]:
combine_images(['../Figures/Model Complexity Curve (SVM) - Credit Card Default dataset.png',
                '../Figures/Model Complexity Curve (SVM) - MNIST Database of handwritten digits.png'],
                '../Figures/Model Complexity Curve (SVM).png')

## K-Nearest Neighbors

**Model Description:**
The basic principle behind K-Nearest Neighbors (Cover and Hart – 1967) is to find a predefined number of training samples closest in distance to the new point and predict the label from these. The Nearest Neighbors algorithm is a lazy learner: it defers learning until it is queried. In terms of Restriction Bias, K-NN is in general good for distance-based approximations, but it is prone to the Curse of Dimensionality. As for the Preference Bias, K-NN assumes that the classification of an instance x will be most similar to the classification of other instances that are nearby. In addition, averaging produces smooth functions.
In this paper, the hyperparameter we will tune to improve the model’s performance is the number of neighbors “n_neighbors”. We will use the validation set to determine the best value to be used with each dataset, and then we will use the Test set to check the accuracy of the final model.
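
The principle can be shown with a few lines of NumPy. This is a minimal sketch using Euclidean distance and an unweighted majority vote; the function name `knn_predict` is illustrative, and the benchmark itself uses scikit-learn's `KNeighborsClassifier`.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    """Predict the label of x_query by majority vote among its k nearest training points."""
    # No training step: all work happens at query time (lazy learning).
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Two small clusters: class 0 near the origin, class 1 near (1, 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([0.05, 0.05]), k=3)
pred_b = knn_predict(X_train, y_train, np.array([1.0, 0.95]), k=3)
print(pred_a, pred_b)
```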

**Model Analysis:**
We tuned the n_neighbors parameter over a range between 1 and 50 with a step of 2 to determine the best setting for each dataset. From the Model Complexity Curve in Figure 13, we can observe that on the Credit Card dataset, an n_neighbors value of more than (37) results in an increase in the Cross Validation and Validation Set Errors. Similarly, on the MNIST Database, crossing a threshold of (5) results in a drastic increase in the Error score. Applying these settings on the Test set resulted in accuracies of (77.47%) and (94.54%) respectively.

### Credit Card Default dataset

In [None]:
# Credit Card Default dataset
model_credit_KNN = Model_Validation(credit_dataset[2], credit_dataset[3], KNeighborsClassifier(), len(credit_dataset[2]))
model_credit_KNN.set_valid_data(credit_dataset[4], credit_dataset[5])
get_KNN_VC(model_credit_KNN, credit_dataset[8])
model_credit_KNN.plot_learning_curve(title = "Learning Curve (KNN) - " + credit_dataset[8], cv = 5, n_jobs=-1)

# KNN with the best n_neighbors applied to Credit Card Default dataset
classifier_credit_KNN = KNeighborsClassifier(n_neighbors=37)
classifier_credit_KNN.fit(credit_dataset[2], credit_dataset[3])

# Predicting new results
y_pred = classifier_credit_KNN.predict(credit_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(credit_dataset[7], y_pred)) * 100)+"%"
print(acc)

### MNIST Database of handwritten digits

In [None]:
# MNIST Database of handwritten digits
model_MNIST_KNN = Model_Validation(MNIST_dataset[2], MNIST_dataset[3], KNeighborsClassifier(), len(MNIST_dataset[2]))
model_MNIST_KNN.set_valid_data(MNIST_dataset[4], MNIST_dataset[5])
get_KNN_VC(model_MNIST_KNN, MNIST_dataset[8])
model_MNIST_KNN.plot_learning_curve(title = "Learning Curve (KNN) - " + MNIST_dataset[8], cv = 5, n_jobs=-1)

# KNN with the best n_neighbors applied to MNIST Database of handwritten digits
classifier_MNIST_KNN = KNeighborsClassifier(n_neighbors=5)
classifier_MNIST_KNN.fit(MNIST_dataset[2], MNIST_dataset[3])

# Predicting new results
y_pred = classifier_MNIST_KNN.predict(MNIST_dataset[6])
acc = "Accuracy = %.2f" % ((accuracy_score(MNIST_dataset[7], y_pred)) * 100)+"%"
print(acc)

The Learning Curve of K-NN is presented in Figure 14. Similar to the trend observed with other algorithms, we can see that with both datasets, the more training examples we add, the better prediction accuracy the model produces. This trend is more obvious on MNIST Database given the huge number of attributes.


In [None]:
combine_images(['../Figures/Learning Curve (KNN) - Credit Card Default dataset.png',
                '../Figures/Learning Curve (KNN) - MNIST Database of handwritten digits.png'],
                '../Figures/Learning Curve (KNN).png')
combine_images(['../Figures/Model Complexity Curve (KNN) - Credit Card Default dataset.png',
                '../Figures/Model Complexity Curve (KNN) - MNIST Database of handwritten digits.png'],
                '../Figures/Model Complexity Curve (KNN).png')

## Conclusion

In this part, we compare the performance of the five learning algorithms on both datasets. The results are shown in Figure 15. On the Credit Card dataset, AdaBoost and SVM have the highest accuracy. However, if we take the running time into account, AdaBoost outperforms SVM by a significant margin.
On the MNIST Database, we can observe that KNN produces the highest accuracy, with Neural Networks in second place. However, Neural Networks offer very good performance in a fraction of KNN’s running time (13%).

We have also plotted the ROC curve of the five algorithms on the Credit Card dataset and compared the area under the curve (AUC) as a measure of how well each algorithm separates the two classes. As shown in Figure 4, AdaBoost has the highest area under the curve. This indicates that the AdaBoost classifier is able to strengthen its confidence in the correct classification given enough input examples.
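
The ROC curve and its AUC can be computed from a classifier's predicted probabilities, as sketched below for a single model on synthetic data. This is an illustration of the metric, not the notebook's `get_ROC_curve` helper, whose implementation is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = AdaBoostClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

# ROC analysis needs a continuous score: use the predicted probability of the positive class.
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(f"AUC = {auc:.3f} over {len(thresholds)} thresholds")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.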

### Credit Card Default dataset

In [None]:
# Credit Card Default dataset
model_credit_Overview = Model_Validation(credit_dataset[2], credit_dataset[3], DecisionTreeClassifier(), len(credit_dataset[2]))
model_credit_Overview.set_valid_data(credit_dataset[4], credit_dataset[5])
get_overview(model_credit_Overview, 'Credit Card Default dataset')

# Receiver Operating Characteristic (ROC) Curve
get_ROC_curve(credit_dataset, 'Credit Card Default dataset')

### MNIST Database of handwritten digits

In [None]:
# MNIST Database of handwritten digits
model_MNIST_Overview = Model_Validation(MNIST_dataset[2], MNIST_dataset[3], DecisionTreeClassifier(), len(MNIST_dataset[2]))
model_MNIST_Overview.set_valid_data(MNIST_dataset[4], MNIST_dataset[5])
get_overview(model_MNIST_Overview, 'MNIST Database of handwritten digits')

In [None]:
combine_images(['../Figures/Credit Card Default dataset overview.png',
                '../Figures/MNIST Database of handwritten digits overview.png'],
                '../Figures/Datasets overview.png')

## References

[1] Thomas M. Mitchell (1997), “Machine Learning”, McGraw Hill.

[2] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner (1998), “Gradient-Based Learning Applied to Document Recognition”, IEEE.

[3] J. R. Quinlan (1979), “Discovering rules from large collections of examples: A case study”, Edinburgh University Press.

[4] J. R. Quinlan (1993), “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers.

[5] L. Breiman, J. Friedman, R. A. Olshen and C. J. Stone (1984), “Classification and Regression Trees”, Chapman and Hall.

[6] R. Lippmann (1987), “An introduction to computing with neural nets”, IEEE.

[7] S. K. Pal and S. Mitra (1992), “Multilayer perceptron, fuzzy sets, and classification”, IEEE.

[8] R. E. Schapire (1990), “The Strength of Weak Learnability”, Springer.

[9] Y. Freund and R. E. Schapire (1995), “A decision-theoretic generalization of on-line learning and an application to boosting”, Springer.

[10] C. Cortes and V. Vapnik (1995), “Support-Vector Networks”, Springer.

[11] T. Cover and P. Hart (1967), “Nearest neighbor pattern classification”, IEEE.