## SVM, MLP and CNN

In [1]:
import pyreadr
import xarray
mnist = pyreadr.read_r("mnist.RData")

In [2]:
import numpy as np
x_train = np.array(mnist['x_train'])
x_test = np.array(mnist['x_test'])
y_train = np.array(mnist['y_train'])
y_test = np.array(mnist['y_test'])

In [3]:
# rescale
x_train = x_train/255.0
x_test = x_test/255.0
# reshape
x_train = x_train.reshape((10000, 28*28))
x_test = x_test.reshape((60000, 28*28))
y_train = y_train.reshape((10000,))
y_test = y_test.reshape((60000,))

### 1. SVM

In [4]:
from sklearn import svm
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

In [5]:
# keep 10% of the train and test data
# use a stratified split so each subset keeps the class proportions of the full set

# training data
x_train_small, xt, y_train_small, yt = train_test_split(x_train, y_train, test_size=0.9, 
                                                        shuffle=True, stratify=y_train, random_state=503)

# test data
x_test_small, xt, y_test_small, yt = train_test_split(x_test, y_test, test_size=0.9, 
                                                      shuffle=True, stratify=y_test, random_state=503)
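
A quick way to check what the stratified split does is to compare class shares before and after subsampling. The sketch below uses synthetic labels (`y_demo`, `X_demo` and the class probabilities are made up for illustration, not taken from the MNIST file) and assumes scikit-learn is installed. It shows that `stratify` preserves the class proportions of the full set rather than forcing equal class sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labels: 10 classes with unequal frequencies.
rng = np.random.default_rng(503)
class_probs = [0.19] + [0.09] * 9
y_demo = rng.choice(10, size=10000, p=class_probs)
X_demo = rng.random((10000, 4))  # placeholder features

# Keep 10% of the rows, stratified on the label (same call pattern as above).
X_small, _, y_small, _ = train_test_split(
    X_demo, y_demo, test_size=0.9, shuffle=True, stratify=y_demo, random_state=503
)

# Class shares in the full set and in the 10% subset.
full_share = np.bincount(y_demo, minlength=10) / len(y_demo)
small_share = np.bincount(y_small, minlength=10) / len(y_small)
print(np.round(full_share, 3))
print(np.round(small_share, 3))
```

The two printed rows should agree to within about one sample per class.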

#### Linear Kernel

In [6]:
C_range = np.array([0.001, 0.01, 0.1, 1, 10, 100])
gamma_range = np.array([0.0001, 0.001, 0.01, 0.1, 1, 10, 100])
parameters = {'C': C_range, 'gamma':gamma_range}

In [7]:
svm_linear = svm.SVC(kernel='linear')
grid_linear = GridSearchCV(estimator=svm_linear, param_grid=parameters, n_jobs = 10, verbose = 1)
grid_linear.fit(x_train_small, y_train_small)

Fitting 5 folds for each of 42 candidates, totalling 210 fits


GridSearchCV(estimator=SVC(kernel='linear'), n_jobs=10,
             param_grid={'C': array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02]),
                         'gamma': array([1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])},
             verbose=1)

In [8]:
import pandas as pd
res = pd.DataFrame(grid_linear.cv_results_['params'])
res['cv_accuracy'] = grid_linear.cv_results_['mean_test_score']
res.transpose()

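| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.01 | 0.01 | 0.01 | ... | 10.0 | 10.0 | 10.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| gamma | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 100.0 | 0.0001 | 0.001 | 0.01 | ... | 1.0 | 10.0 | 100.0 | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 100.0 |
| cv_accuracy | 0.727 | 0.727 | 0.727 | 0.727 | 0.727 | 0.727 | 0.727 | 0.875 | 0.875 | 0.875 | ... | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 | 0.877 |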


In [9]:
grid_linear.best_params_

{'C': 0.1, 'gamma': 0.0001}

In [10]:
grid_linear.best_score_

0.8779999999999999

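The best linear-kernel SVM achieves a 5-fold cross-validation accuracy of 0.878 with `C = 0.1`. The linear kernel does not use `gamma`, so the reported `gamma = 0.0001` is simply the first value in the grid; the table above shows identical scores across all `gamma` values for each `C`.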
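
To see that `gamma` has no effect on a linear-kernel `SVC`, the sketch below fits two models that differ only in `gamma` and compares their decision values. It uses a small synthetic dataset from `make_classification` (an assumption for illustration, not the MNIST subset) and requires scikit-learn.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Small synthetic binary problem, only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Two linear-kernel SVMs with the same C and very different gamma.
fit_a = svm.SVC(kernel='linear', C=0.1, gamma=0.0001).fit(X_demo, y_demo)
fit_b = svm.SVC(kernel='linear', C=0.1, gamma=100.0).fit(X_demo, y_demo)

# The decision values agree, so gamma only multiplies the number of fits.
same = np.allclose(fit_a.decision_function(X_demo), fit_b.decision_function(X_demo))
print(same)
```

For the linear kernel, searching over `C` alone would cover the same models with 6 candidates instead of 42.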

In [11]:
svm_linear_best = grid_linear.best_estimator_
y_pred = svm_linear_best.predict(x_test_small)
svm_linear_error = np.mean(y_pred!=y_test_small)
svm_linear_error

0.11233333333333333

The test error of the best linear kernel SVM is 0.1123.

#### Radial Kernel

In [12]:
svm_rbf = svm.SVC(kernel='rbf')
grid_rbf = GridSearchCV(estimator=svm_rbf, param_grid=parameters, n_jobs=10, verbose=1)
grid_rbf.fit(x_train_small, y_train_small)

Fitting 5 folds for each of 42 candidates, totalling 210 fits


GridSearchCV(estimator=SVC(), n_jobs=10,
             param_grid={'C': array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02]),
                         'gamma': array([1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])},
             verbose=1)

In [13]:
res = pd.DataFrame(grid_rbf.cv_results_['params'])
res['cv_accuracy'] = grid_rbf.cv_results_['mean_test_score']
res.transpose()

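| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.01 | 0.01 | 0.01 | ... | 10.0 | 10.0 | 10.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| gamma | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 100.0 | 0.0001 | 0.001 | 0.01 | ... | 1.0 | 10.0 | 100.0 | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 100.0 |
| cv_accuracy | 0.114 | 0.114 | 0.114 | 0.114 | 0.114 | 0.114 | 0.114 | 0.114 | 0.114 | 0.114 | ... | 0.114 | 0.114 | 0.114 | 0.879 | 0.889 | 0.917 | 0.697 | 0.114 | 0.114 | 0.114 |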


In [14]:
grid_rbf.best_params_

{'C': 10.0, 'gamma': 0.01}

In [15]:
grid_rbf.best_score_

0.917

The best radial-kernel SVM achieves a 5-fold cross-validation accuracy of 0.917 with `C = 10` and `gamma = 0.01`.
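
Many rows in the table above sit at 0.114, including every row with `gamma` of 1 or more. One plausible reading, offered here as an interpretation rather than something the notebook verifies, is that for large `gamma` the RBF kernel $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$ treats every pair of distinct images as unrelated, so the kernel matrix approaches the identity. The sketch below illustrates this on random 784-dimensional vectors (`X_demo` is synthetic, not MNIST).

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Return K with K[i, j] = exp(-gamma * ||X[i] - X[j]||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X_demo = rng.random((5, 784))  # five fake "images" with values in [0, 1)

# Mean off-diagonal kernel value for several gamma values from the grid.
off_diag_mask = ~np.eye(len(X_demo), dtype=bool)
for gamma in (0.0001, 0.01, 1.0, 100.0):
    K = rbf_kernel_matrix(X_demo, gamma)
    print(gamma, K[off_diag_mask].mean())
```

At `gamma = 0.0001` the off-diagonal entries stay close to 1, while at `gamma = 100` they are numerically zero.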

In [16]:
svm_rbf_best = grid_rbf.best_estimator_
y_pred = svm_rbf_best.predict(x_test_small)
svm_rbf_error = np.mean(y_pred!=y_test_small)
svm_rbf_error

0.07883333333333334

The test error of the best radial kernel SVM is 0.0788.

#### Polynomial Kernel

In [17]:
degree_range = np.array([3, 5, 7, 10])
gamma_range_poly = np.array([0.0001, 0.001, 0.01, 0.1, 1, 10, 30])
parameters_poly = {'C': C_range, 'gamma':gamma_range_poly, 'degree':degree_range}

In [18]:
svm_poly = svm.SVC(kernel='poly')
grid_poly = GridSearchCV(estimator=svm_poly, param_grid=parameters_poly, n_jobs=10, verbose=2)
grid_poly.fit(x_train_small, y_train_small)

Fitting 5 folds for each of 168 candidates, totalling 840 fits


GridSearchCV(estimator=SVC(kernel='poly'), n_jobs=10,
             param_grid={'C': array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02]),
                         'degree': array([ 3,  5,  7, 10]),
                         'gamma': array([1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 3.e+01])},
             verbose=2)

In [19]:
res = pd.DataFrame(grid_poly.cv_results_['params'])
res['cv_accuracy'] = grid_poly.cv_results_['mean_test_score']
res.transpose()

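| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | ... | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| degree | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 5.0 | 5.0 | 5.0 | ... | 7.0 | 7.0 | 7.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| gamma | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 30.0 | 0.0001 | 0.001 | 0.01 | ... | 1.0 | 10.0 | 30.0 | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 30.0 |
| cv_accuracy | 0.114 | 0.114 | 0.114 | 0.813 | 0.873 | 0.873 | 0.873 | 0.114 | 0.114 | 0.114 | ... | 0.681 | 0.681 | 0.681 | 0.114 | 0.114 | 0.497 | 0.576 | 0.576 | 0.576 | 0.576 |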


In [20]:
grid_poly.best_params_

{'C': 0.001, 'degree': 3, 'gamma': 1.0}

In [21]:
grid_poly.best_score_

0.873

The best polynomial-kernel SVM achieves a 5-fold cross-validation accuracy of 0.873 with `C = 0.001`, `degree = 3` and `gamma = 1.0`.
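
In the table above, `gamma` values of 1, 10 and 30 tie at 0.873 for `C = 0.001` and `degree = 3`. A likely reason, stated here as an interpretation, is that scikit-learn's polynomial kernel is $(\gamma \langle x, x' \rangle + c_0)^d$ with `coef0` defaulting to 0, so changing `gamma` only rescales the kernel by $\gamma^d$, which acts like a rescaling of `C`. The sketch below checks the rescaling identity on synthetic vectors (`X_demo` and `poly_kernel` are illustrative names, not part of the notebook).

```python
import numpy as np

def poly_kernel(X, gamma, degree, coef0=0.0):
    """Polynomial kernel matrix (gamma * <x, x'> + coef0) ** degree."""
    return (gamma * (X @ X.T) + coef0) ** degree

rng = np.random.default_rng(0)
X_demo = rng.random((4, 784))  # four fake "images" with values in [0, 1)
degree = 3

# With coef0 = 0, K(gamma) equals gamma**degree times K(gamma=1).
K1 = poly_kernel(X_demo, gamma=1.0, degree=degree)
for gamma in (0.1, 10.0, 30.0):
    K = poly_kernel(X_demo, gamma, degree)
    print(gamma, np.allclose(K, gamma ** degree * K1))
```

Under this identity the `gamma` axis of the grid largely duplicates the `C` axis for this kernel, so a search over `C` and `degree` with `gamma` fixed would probe much the same set of models.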

In [22]:
svm_poly_best = grid_poly.best_estimator_
y_pred = svm_poly_best.predict(x_test_small)
svm_poly_error = np.mean(y_pred!=y_test_small)
svm_poly_error

0.119

The test error of the best polynomial kernel SVM is 0.119.

#### Summary

In [57]:
summary = pd.DataFrame({'Model':['SVM(linear kernel)', 'SVM(radial kernel)', 'SVM(polynomial kernel)'],
                        # cost, gamma and degree are copied from each grid's best_params_
                        'cost':[0.1, 10, 0.001], 'gamma':[0.0001, 0.01, 1.0], 'degree':[np.nan, np.nan, 3],
                        'cv score': [grid_linear.best_score_, grid_rbf.best_score_, grid_poly.best_score_],
                        'test score': [1-svm_linear_error, 1-svm_rbf_error, 1-svm_poly_error]
                       })
summary

| | Model | cost | gamma | degree | cv score | test score |
|---|---|---|---|---|---|---|
| 0 | SVM(linear kernel) | 0.1 | 0.0001 | NaN | 0.878 | 0.887667 |
| 1 | SVM(radial kernel) | 10.0 | 0.01 | NaN | 0.917 | 0.921167 |
| 2 | SVM(polynomial kernel) | 0.001 | 1.0 | 3.0 | 0.873 | 0.881 |


We can see that the SVM with the radial kernel performs best among the three kernels, with a cross-validation accuracy of 0.917 and a test accuracy of 0.921.

### 2.1. MLP

In [24]:
from tensorflow import keras
from tensorflow.keras.layers import Activation, Dense, Input
from tensorflow.keras import utils 
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.losses import sparse_categorical_crossentropy
from scikeras.wrappers import KerasClassifier

In [25]:
def create_mlp(hid_layer_n, lr, opt, acti):
    model = Sequential()
    model.add(Input(shape=(784, )))
    model.add(Dense(hid_layer_n, activation = acti))
    model.add(Dense(10, activation = 'softmax'))
    model.compile(optimizer=opt(learning_rate=lr), loss=sparse_categorical_crossentropy, metrics=['accuracy'])
    return model

In [26]:
# parameters
actis = ['relu', 'tanh', 'sigmoid']
hidden_ns = [32, 64, 128, 256]
lrs = [0.001, 0.01, 0.1]
opts = [Adam, SGD]
grid_mlp = [(a, hn, lr, opt) for a in actis for hn in hidden_ns for lr in lrs for opt in opts]

In [None]:
from sklearn.model_selection import cross_val_score

mlp_res = dict()
count = 0
for g in grid_mlp:
    count += 1
    print(count)
    mlp = KerasClassifier(create_mlp(g[1], g[2], g[3], g[0]),
                          epochs=10,
                          batch_size=50,
                          verbose=1
                         )
    cv_score = np.mean(cross_val_score(mlp, x_train, y_train, cv = 5))
    mlp_res[g] = cv_score

_The training log of the MLP is too long to show in the exported HTML file, so I cleared the output of this cell._
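
To get a sense of how much work the MLP search loop does, the grid size can be counted with `itertools.product`. This is a minimal sketch: the optimizer classes are replaced by plain strings (`opt_names`) so it runs without TensorFlow, and `n_folds` is an illustrative name for the `cv = 5` setting.

```python
from itertools import product

actis = ['relu', 'tanh', 'sigmoid']
hidden_ns = [32, 64, 128, 256]
lrs = [0.001, 0.01, 0.1]
opt_names = ['Adam', 'SGD']  # strings stand in for the optimizer classes

# Same ordering as the nested list comprehension: the last factor varies fastest.
grid = list(product(actis, hidden_ns, lrs, opt_names))
n_folds = 5

print(len(grid))            # number of hyperparameter combinations
print(len(grid) * n_folds)  # number of model fits in the cross-validation loop
```

With 10 epochs per fit, this is where the long training log comes from.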

In [36]:
# best MLP
mlp_res_sort = [(k, v) for k, v in sorted(mlp_res.items(), key=lambda item: -item[1])]
mlp_res_sort[0]

(('sigmoid', 256, 0.01, keras.optimizer_v2.adam.Adam), 0.9558)

We can see that the best parameters are sigmoid activation function, 256 neurons in the hidden layer and Adam optimizer with 0.01 learning rate. This group of parameters achieves 0.9558 accuracy in the 5-fold cross validation.
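
For reference, the size of each candidate MLP follows directly from the layer widths: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below (with `dense_params` as an illustrative helper name) computes the trainable parameter count for each hidden-layer width in the grid.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# 784 inputs -> hidden layer -> 10 softmax outputs, as in create_mlp.
for hidden in (32, 64, 128, 256):
    total = dense_params(784, hidden) + dense_params(hidden, 10)
    print(hidden, total)
```

The selected 256-unit model is the largest in the grid, at roughly 0.2 million parameters.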

In [50]:
mlp_best = KerasClassifier(create_mlp(256, 0.01, Adam, 'sigmoid'), epochs=10, batch_size=50, verbose=0)
mlp_best.fit(x_train, y_train)
y_pred = mlp_best.predict(x_test)



In [52]:
mlp_error = np.mean(y_pred!=y_test)
mlp_error

0.042616666666666664

The best MLP model reaches a test accuracy of 0.9574 (test error 0.0426).

### 2.2. CNN

In [37]:
from tensorflow.keras.layers import Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.optimizers import Adadelta

In [38]:
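x_train = np.array(mnist['x_train'])
x_test = np.array(mnist['x_test'])
# rescale
x_train = x_train/255.0
x_test = x_test/255.0
# add an explicit channel axis to match Input(shape=(28, 28, 1))
x_train = x_train.reshape((-1, 28, 28, 1))
x_test = x_test.reshape((-1, 28, 28, 1))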

In [39]:
def create_cnn(num_cnn_layers, drop_out, max_neuron, opt, acti):
    NUM_FILTERS = 32
    KERNEL = (3,3)
    
    model = Sequential()
    model.add(Input(shape=(28, 28, 1)))
    for i in range(1, num_cnn_layers+1):
        model.add(Conv2D(NUM_FILTERS*i, kernel_size = KERNEL, activation = 'relu', padding = 'same'))
    
    model.add(MaxPooling2D(pool_size = (2,2)))
    model.add(Dropout(drop_out))
    model.add(Flatten())
    model.add(Dense(max_neuron, activation = acti))
    model.add(Dense(10, activation = 'softmax'))
    
    model.compile(optimizer=opt(learning_rate = 0.001), loss=sparse_categorical_crossentropy, metrics=['accuracy'])
    
    return model

In [40]:
num_cnn_layerss = [1, 2]
drop_outs = [0.25, 0.5]
max_neurons = [120, 240]
opts = [Adam, Adadelta]
actis = ['relu', 'sigmoid']
grid_cnn = [(nc, do, mn, opt, ac) for nc in num_cnn_layerss for do in drop_outs for mn in max_neurons 
                                  for opt in opts for ac in actis]

In [None]:
cnn_res = dict()
count = 0
for g in grid_cnn:
    count += 1
    print("^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
    print(count)
    print("^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
    cnn = KerasClassifier(create_cnn(g[0], g[1], g[2], g[3], g[4]),
                          epochs=10,
                          batch_size=50,
                          verbose=1
                         )
    cv_score = np.mean(cross_val_score(cnn, x_train, y_train, cv = 5))
    cnn_res[g] = cv_score

_The training log of the CNN is too long to show in the exported HTML file, so I cleared the output of this cell._

In [43]:
# best CNN
cnn_res_sort = [(k, v) for k, v in sorted(cnn_res.items(), key=lambda item: -item[1])]
cnn_res_sort[0]

((2, 0.25, 240, keras.optimizer_v2.adam.Adam, 'sigmoid'), 0.9792)

We can see that the best parameters are 2 convolution layers, a dropout rate of 0.25, 240 neurons with sigmoid activation in the dense layer after flattening, and the Adam optimizer. This group of parameters achieves 0.9792 accuracy in the 5-fold cross validation.
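
The parameter count of each CNN candidate can be worked out by hand from the architecture in `create_cnn`: `'same'` padding keeps the 28x28 spatial size through the convolutions, the single 2x2 max-pooling layer halves it to 14x14, and dropout and pooling add no parameters. The sketch below mirrors that structure; the helper names are illustrative and it does not build a Keras model.

```python
def conv2d_params(k_h, k_w, c_in, c_out):
    """Kernel weights plus one bias per output filter."""
    return k_h * k_w * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def cnn_param_count(num_cnn_layers, max_neuron, num_filters=32, kernel=(3, 3)):
    total = 0
    channels = 1  # grayscale input
    for i in range(1, num_cnn_layers + 1):
        filters = num_filters * i  # 32, 64, ... as in create_cnn
        total += conv2d_params(kernel[0], kernel[1], channels, filters)
        channels = filters
    flat = 14 * 14 * channels  # 28x28 after 'same' convs, halved by 2x2 pooling
    total += dense_params(flat, max_neuron)
    total += dense_params(max_neuron, 10)
    return total

print(cnn_param_count(2, 240))  # the selected configuration
print(cnn_param_count(1, 120))  # the smallest configuration in the grid
```

Almost all of the parameters sit in the dense layer that follows `Flatten`, since pooling is applied only once.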

In [45]:
cnn_best = KerasClassifier(create_cnn(2, 0.25, 240, Adam, 'sigmoid'), epochs=10, batch_size=50, verbose=0)
cnn_best.fit(x_train, y_train)
y_pred = cnn_best.predict(x_test)



In [48]:
cnn_error = np.mean(y_pred!=y_test)
cnn_error

0.0266

The best CNN model achieves a 0.9734 test accuracy.

### Summary

In [58]:
nn_summary = pd.DataFrame({'Model': ['MLP', 'CNN'], 
                            'cv score': [mlp_res_sort[0][1], cnn_res_sort[0][1]], 
                            'test score': [1-mlp_error, 1-cnn_error]})
pd.concat([summary, nn_summary], ignore_index=True)

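| | Model | cost | gamma | degree | cv score | test score |
|---|---|---|---|---|---|---|
| 0 | SVM(linear kernel) | 0.1 | 0.0001 | NaN | 0.878 | 0.887667 |
| 1 | SVM(radial kernel) | 10.0 | 0.01 | NaN | 0.917 | 0.921167 |
| 2 | SVM(polynomial kernel) | 0.001 | 1.0 | 3.0 | 0.873 | 0.881 |
| 3 | MLP | NaN | NaN | NaN | 0.9558 | 0.957383 |
| 4 | CNN | NaN | NaN | NaN | 0.9792 | 0.9734 |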


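Both neural networks perform better than all three SVMs in terms of cv accuracy and test accuracy. Among the three SVM models, the radial kernel performs best, with 0.92 test accuracy; among the NN models, the CNN performs better than the MLP. Overall, the CNN performs best among the five models, giving 0.97 test accuracy. Note that the SVMs were tuned, trained and tested on 10% stratified subsamples (1,000 training and 6,000 test images), while the networks used the full 10,000 training and 60,000 test images, so part of the gap may come from the difference in training-set size.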
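
One way to judge whether these gaps are larger than sampling noise is a binomial standard error on each test accuracy. This is a rough sketch under the assumption that test images are independent draws. The test-set sizes come from the splits earlier in the notebook (6,000 images for the SVMs, 60,000 for the networks), and `accuracy_std_error` is an illustrative helper name.

```python
import math

def accuracy_std_error(accuracy, n_samples):
    """Binomial standard error of an accuracy estimated on n_samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

# (test accuracy, number of test images) taken from the summary table.
results = {
    'SVM (radial kernel)': (0.921167, 6000),  # evaluated on the 10% test subset
    'MLP': (0.957383, 60000),
    'CNN': (0.9734, 60000),
}

for name, (acc, n) in results.items():
    half_width = 1.96 * accuracy_std_error(acc, n)  # approximate 95% interval
    print(f"{name}: {acc:.4f} +/- {half_width:.4f}")
```

The intervals are well under one percentage point, so the ordering of the three models is unlikely to be a test-sampling artifact, although this says nothing about the effect of the different training-set sizes.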