## Exercise 2: One versus all MNIST

In [10]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from utils import A3, mnist_reader
from joblib import dump, load
from sklearn.metrics import confusion_matrix

The function load_mnist_dataset in mnist_reader saves the CSV file as a pickle file and returns the dataset as a NumPy array.
X_train has the shape (60000, 784): there are 60000 images, each with 784 columns, and every column holds a value in the range 0-255.<br>
0 is the colour white and 255 is the colour black. Numbers in between are different shades of gray. To make the model easier to train,
all values > 0 are converted to 1.

In [11]:
data_train =  mnist_reader.load_mnist_dataset('mnist_train.csv') # load the mnist train set as a numpy array
data_test = mnist_reader.load_mnist_dataset('mnist_test.csv') # load the test set as a numpy array
X_train = data_train[:, 1:]
y_train = data_train[:, 0]
X_test = data_test[:, 1:]
y_test = data_test[:, 0]
X_train[X_train>0] = 1
X_test[X_test>0] = 1

Loading data...
Loading data...
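
The binarisation step can be illustrated on a tiny array. This is only a sketch: the array `pixels` is a made-up example, not MNIST data, and it works on a copy so the original values stay available for comparison.

```python
import numpy as np

# Hypothetical 2x4 block of pixel intensities in the range 0-255.
pixels = np.array([[0, 12, 128, 255],
                   [0, 0, 3, 200]])

# Boolean-mask assignment: every non-zero pixel becomes 1, zeros stay 0.
binary = pixels.copy()   # copy so the original array is left untouched
binary[binary > 0] = 1

print(binary)
```

After this step every feature is either 0 (background) or 1 (part of the digit), so all 784 columns share the same scale.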


For the parameter tuning I set aside a validation set (1% of the training data, 600 images).
This set is used in the parameter search and is not included when training the model.

In [5]:
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.01, random_state=42)

params = {
    'C': [0.01, 0.1, 1, 10, 100],
    'gamma': [0.0001, 0.001, 0.01, 0.1],
    'kernel':['rbf']} 

svc = SVC()
print('Hyperparameter tuning...')
grid_search = GridSearchCV(svc, params, cv=5)
grid_search.fit(X_validation, y_validation)
df = pd.DataFrame(grid_search.cv_results_)

print(df[['param_C', 'param_gamma', 'mean_test_score']])
print('Best params:',grid_search.best_params_)
print('Best score:', grid_search.best_score_)
print('Best estimator:', grid_search.best_estimator_)

Hyperparameter tuning...
   param_C param_gamma  mean_test_score
0     0.01      0.0001         0.135000
1     0.01       0.001         0.135000
2     0.01        0.01         0.135000
3     0.01         0.1         0.125000
4      0.1      0.0001         0.135000
5      0.1       0.001         0.135000
6      0.1        0.01         0.371667
7      0.1         0.1         0.125000
8        1      0.0001         0.156667
9        1       0.001         0.795000
10       1        0.01         0.865000
11       1         0.1         0.270000
12      10      0.0001         0.796667
13      10       0.001         0.851667
14      10        0.01         0.885000
15      10         0.1         0.291667
16     100      0.0001         0.848333
17     100       0.001         0.861667
18     100        0.01         0.885000
19     100         0.1         0.291667
Best params: {'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
Best score: 0.885
Best estimator: SVC(C=10, break_ties=False, cache_size=200, cl
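
The printed table is easier to read as a C-by-gamma grid. The sketch below copies the mean_test_score column from the output above and reshapes it. It assumes the rows are ordered with gamma varying fastest, which matches the printed table. The names `grid`, `best_C` and `best_gamma` are only for illustration.

```python
import numpy as np

C_values = [0.01, 0.1, 1, 10, 100]
gamma_values = [0.0001, 0.001, 0.01, 0.1]

# mean_test_score copied from the printed output, rows 0..19.
scores = np.array([
    0.135000, 0.135000, 0.135000, 0.125000,
    0.135000, 0.135000, 0.371667, 0.125000,
    0.156667, 0.795000, 0.865000, 0.270000,
    0.796667, 0.851667, 0.885000, 0.291667,
    0.848333, 0.861667, 0.885000, 0.291667,
])

# One row per C value, one column per gamma value.
grid = scores.reshape(len(C_values), len(gamma_values))

# np.argmax returns the first maximum, so ties go to the smaller C.
i, j = np.unravel_index(np.argmax(grid), grid.shape)
best_C, best_gamma = C_values[i], gamma_values[j]

print(grid)
print('Best C:', best_C, 'Best gamma:', best_gamma)
```

In this layout it is visible that gamma = 0.01 gives the best score for every C of 0.1 or larger, and that C = 10 and C = 100 tie at 0.885.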

### One-vs-one classifiers, sklearn
<ul>
   <li>Using the best hyperparameters to train the model with the training set</li>
   <li>Saving the trained classifier</li>
</ul>
The first time, I trained and saved the model using the commented-out code.

In [12]:
# print('Training model...')
# clf = SVC(kernel='rbf', C=10, gamma=0.01)
# clf.fit(X_train, y_train)
# x = clf.score(X_test, y_test)
# print('Result', x)
# # Joblib is more efficient on big data
# print('Saving model...')
# dump(clf, 'mnist_clf.joblib')
clf = load('utils/classifiers/mnist_clf.joblib')
print('Model successfully loaded')

Model successfully loaded
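
Instead of commenting code in and out, the train-once-then-load pattern could be wrapped in a small helper. This is a sketch under assumptions: `load_or_train` is a hypothetical helper that is not part of the notebook, and the tiny random data set only stands in for MNIST so the example runs quickly.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.svm import SVC


def load_or_train(path, X, y):
    """Return a fitted SVC; train and save it only if `path` does not exist yet."""
    if os.path.exists(path):
        return load(path)
    clf = SVC(kernel='rbf', C=10, gamma=0.01)
    clf.fit(X, y)
    dump(clf, path)
    return clf


# Small synthetic data set standing in for MNIST (assumption for the demo).
rng = np.random.default_rng(0)
X_demo = rng.random((40, 5))
y_demo = np.array([0, 1] * 20)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'demo_clf.joblib')
    first = load_or_train(model_path, X_demo, y_demo)    # trains and saves
    second = load_or_train(model_path, X_demo, y_demo)   # loads from disk
    same_predictions = np.array_equal(first.predict(X_demo),
                                      second.predict(X_demo))

print('Loaded model matches trained model:', same_predictions)
```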


<ul>
   <li>Create a confusion matrix</li>
</ul>

In [13]:
prediction = clf.predict(X_test)
print('Model accuracy:', str(100 * np.mean(prediction == y_test)) + '%')
cm = confusion_matrix(y_test, prediction)
print(cm)

Model accuracy: 98.13%
[[ 972    0    0    0    0    1    2    1    3    1]
 [   1 1123    2    1    1    1    2    0    4    0]
 [   4    0 1018    1    1    0    0    4    4    0]
 [   0    0    2  996    0    2    0    3    5    2]
 [   2    0    2    0  961    0    3    1    1   12]
 [   2    0    0   10    0  865    5    1    7    2]
 [   4    2    0    0    1    4  944    0    3    0]
 [   1    2    6    1    1    0    0 1009    1    7]
 [   4    1    3    6    3    1    2    3  949    2]
 [   1    6    1    7    8    4    0    4    2  976]]
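
A few summary numbers can be read directly off the matrix. The sketch below copies the matrix printed above (rows are true digits, columns are predicted digits, following sklearn's confusion_matrix convention) and computes the accuracy, the recall per digit and the most frequent confusion. The variable names are only for illustration.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([
    [972,    0,    0,   0,   0,   1,   2,    1,   3,   1],
    [  1, 1123,    2,   1,   1,   1,   2,    0,   4,   0],
    [  4,    0, 1018,   1,   1,   0,   0,    4,   4,   0],
    [  0,    0,    2, 996,   0,   2,   0,    3,   5,   2],
    [  2,    0,    2,   0, 961,   0,   3,    1,   1,  12],
    [  2,    0,    0,  10,   0, 865,   5,    1,   7,   2],
    [  4,    2,    0,   0,   1,   4, 944,    0,   3,   0],
    [  1,    2,    6,   1,   1,   0,   0, 1009,   1,   7],
    [  4,    1,    3,   6,   3,   1,   2,    3, 949,   2],
    [  1,    6,    1,   7,   8,   4,   0,    4,   2, 976],
])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Recall per digit: correct predictions divided by the row total.
recall = np.diag(cm) / cm.sum(axis=1)

# Most frequent confusion: largest off-diagonal entry.
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
true_digit, predicted_digit = (int(v) for v in
                               np.unravel_index(np.argmax(off_diagonal), cm.shape))

print('Accuracy:', accuracy)
print('Recall per digit:', np.round(recall, 4))
print('Most common mistake: true', true_digit, 'predicted as', predicted_digit)
```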


### One-vs-all classifiers
<ul>
   <li>Training and testing the classifiers</li>
   <li>Create a confusion matrix</li>
</ul>
The first time, I trained and saved the classifiers using the commented-out code.

In [14]:
# X = X_train
# y = y_train
X = X_test
y = y_test
print('Running one vs all...')
yy = A3.one_vs_all(X,y, 10)
print('Model accuracy:', str(100 * np.mean(yy == y)) + '%')
print('Building confusion matrix...')
cm = confusion_matrix(y, yy)
print(cm)

Running one vs all...
Model accuracy: 98.22%
Building confusion matrix...
[[ 972    1    0    1    1    0    2    1    1    1]
 [   1 1123    3    1    0    2    2    0    3    0]
 [   4    0 1014    3    1    0    0    5    5    0]
 [   0    0    1  990    1    3    0    3    9    3]
 [   0    0    3    0  962    0    2    1    3   11]
 [   2    0    0    9    0  871    4    0    5    1]
 [   3    2    0    0    1    4  943    0    5    0]
 [   1    0    5    2    0    0    0 1013    2    5]
 [   2    1    3    3    2    1    1    3  956    2]
 [   4    5    1    8    7    2    0    2    2  978]]
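
The implementation of A3.one_vs_all is not shown here, so the following is only a sketch of the general one-vs-all idea, not the actual code. It assumes one binary SVC per class (class k against the rest) and picks the class whose classifier returns the largest decision score. The function `one_vs_all_predict` and the synthetic blob data are hypothetical and were chosen so the example runs in a moment.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def one_vs_all_predict(X_train, y_train, X_test, n_classes):
    """Train one binary SVC per class and predict the class with the largest score."""
    scores = np.zeros((X_test.shape[0], n_classes))
    for k in range(n_classes):
        clf = SVC(kernel='rbf', C=10, gamma=0.01)
        clf.fit(X_train, (y_train == k).astype(int))   # class k vs. the rest
        scores[:, k] = clf.decision_function(X_test)   # positive means "is class k"
    return np.argmax(scores, axis=1)


# Three well-separated synthetic clusters standing in for the ten digits.
X, y = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=1.0, random_state=0)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

pred = one_vs_all_predict(X_tr, y_tr, X_te, 3)
accuracy = np.mean(pred == y_te)
print('One-vs-all accuracy on the synthetic data:', accuracy)
```

With ten digits this means training ten classifiers, each on the full training set, which is why this approach takes noticeably longer to train than a single SVC call.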


### One-vs-one vs One-vs-all
I get a slightly better result using one-vs-all for the predictions, but the results differ by only 0.09 percentage points (98.22% versus 98.13%), so the extra time it took to train the one-vs-all classifiers was not worth it. Studying the confusion matrices shows that the two methods give very similar results: they have high and low error rates on the same digits.
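
The comparison can be made concrete by putting the per-digit error rates side by side. The sketch below uses only the diagonals of the two confusion matrices above, together with the row totals, which are the same for both because the test set is the same. The variable names are only for illustration.

```python
import numpy as np

# Number of test images per digit (row sums of either confusion matrix).
support = np.array([980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009])

# Correctly classified images per digit (diagonals of the two matrices).
ovo_correct = np.array([972, 1123, 1018, 996, 961, 865, 944, 1009, 949, 976])
ova_correct = np.array([972, 1123, 1014, 990, 962, 871, 943, 1013, 956, 978])

# Error rate per digit, in percent.
ovo_error = 100 * (1 - ovo_correct / support)
ova_error = 100 * (1 - ova_correct / support)

# Overall accuracy difference, in percentage points.
extra_correct = int((ova_correct - ovo_correct).sum())
accuracy_gain_pp = 100 * extra_correct / support.sum()

for digit in range(10):
    print(digit, round(ovo_error[digit], 2), round(ova_error[digit], 2))
print('Extra correct images with one-vs-all:', extra_correct)
print('Accuracy gain in percentage points:', accuracy_gain_pp)
```

One-vs-all classifies 9 more of the 10000 test images correctly, and for both methods the digit 9 has the highest error rate.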