# MNIST one-to-one: DL Vs Classic ML

Interesting test. Who will be the best at it? On the right corner we have DL, which will use only a very simple model, no convolutions, no strange normalization. Nothing. On the left side we have our old friends, SVM, RF, PCA and so on, ready to battle. Let's see what happens.

## The dataset
You know, It's MNIST


<font color=red><b> Import the dataset
</font>

In [25]:
import os, time
from numpy import expand_dims
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

tf.keras.backend.clear_session() 
from tensorflow.keras.datasets import mnist

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

## Round 1: DL

As mentioned above, we will make a very simplisticDL model

<font color=red><b> Build your own model and train it.
</font>

In [36]:
x_train_dl = x_train.astype('float32')
x_test_dl = x_test.astype('float32')
x_train_dl /= 255
x_test_dl /= 255

from keras.utils import to_categorical
y_train_binary = to_categorical(y_train)
y_test_binary = to_categorical(y_test)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
batch_size = 256
num_classes = 10
epochs = 10

model = Sequential()
model.add (Dense(128, activation='relu', input_shape =(28, 28)))
model.add (Dense(32, activation='relu', input_shape =(28, 28)))

model.add(Flatten())
model.add (Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train_dl, y_train_binary,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test_dl, y_test_binary))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f41a06386a0>

## Round 2: Classic baseline

Let's see what can the classic guys do with not a big effort:

<font color=red><b> Build a couple of classic models and see how it goes. How long does it take?
</font>

In [21]:
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

x_train_classic = x_train.reshape(x_train.shape[0],-1)
x_test_classic = x_test.reshape(x_test.shape[0], -1)

## Training 
lr = LogisticRegression()
lr.fit(x_train_classic, y_train)

## Predicting
y_pred_lr = lr.predict(x_test_classic)
logistic_regression_score = accuracy_score(y_test, y_pred_lr)
logistic_regression_score

0.9255

In [22]:
rf = RandomForestClassifier()
rf.fit(x_train_classic, y_train)

## Predicting
y_pred_rf = rf.predict(x_test_classic)

random_forest_score = accuracy_score(y_test, y_pred_rf)
random_forest_score

0.9704

## Round 3: lets get convolutional
Now DL will still being simplistic, but in this case, let's use some convolutions

<font color=red><b> Build a simple CNN and see if you can beat classic ML
</font>

In [48]:
from tensorflow.keras.layers import Conv2D, Dropout, MaxPooling2D


x_train_dl_conv = x_train_dl.reshape(x_train.shape[0], 28, 28, 1)
x_test_dl_conv = x_test_dl.reshape(x_test.shape[0], 28, 28, 1)

# Creating a Sequential Model and adding the layers
model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.2))
model.add(Dense(10,activation=tf.nn.softmax))

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy', 
              metrics=['accuracy'])
model.fit(x_train_dl_conv, y_train,
          batch_size=batch_size,
          epochs=10,
          verbose=1,
          validation_data=(x_test_dl_conv, y_test))


Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f418ca62cc0>

## Round 4: The Empire Strikes Back
Let's reduce dimensionality so that we can be sure we just use the interesting information

<font color=red><b> Use PCA and reduce the dataset dimensions to something, let's say 95% of variability. Then, train again on this new data. Add KNN to the equation
</font>

In [51]:
from sklearn.decomposition import PCA
pca = PCA()
pca.fit_transform(x_train_classic)

# Calculating optimal k to have 95% (say) variance 

k = 0
total = sum(pca.explained_variance_)
current_sum = 0

while(current_sum / total < 0.95):
    current_sum += pca.explained_variance_[k]
    k += 1
k

154

In [52]:
pca = PCA(n_components=k, whiten=True)

x_train_pca = pca.fit_transform(x_train_classic)
x_test_pca = pca.transform(x_test_classic)

In [57]:
## Training 
lr = LogisticRegression()
lr.fit(x_train_pca, y_train)

## Predicting
y_pred_lr = lr.predict(x_test_pca)
logistic_regression_score = accuracy_score(y_test, y_pred_lr)
logistic_regression_score

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


0.9246

In [58]:
rf = RandomForestClassifier()
rf.fit(x_train_pca, y_train)

## Predicting
y_pred_rf = rf.predict(x_test_pca)

random_forest_score = accuracy_score(y_test, y_pred_rf)
random_forest_score

0.9491

In [59]:
from sklearn.neighbors import KNeighborsClassifier
## Training 
knn = KNeighborsClassifier()
knn.fit(x_train_pca, y_train)
## Predicting
y_pred_knn = knn.predict(x_test_pca)

knn_score = accuracy_score(y_test, y_pred_knn)
knn_score

0.9017

## Final Round: SVM and RF grid search

These guys are now taking it seriously. 

<font color=red><b> Let's see what SVM and a grid search on RF can do. Maybe it is a good idea to train on SVM with even less features... or maybe let the hard training to be at home.
</font>

In [62]:
from sklearn.model_selection import GridSearchCV

param_grid = {'max_depth': [7, 14, 28],
              'n_estimators': [100, 200, 400]}

rf = RandomForestClassifier()
gs = GridSearchCV(estimator=rf, param_grid=param_grid, scoring='accuracy', cv=2, n_jobs=-1, verbose=1)
gs = gs.fit(x_train_classic, y_train)

print(gs.best_score_)
print(gs.best_params_)

Fitting 2 folds for each of 9 candidates, totalling 18 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  14 out of  18 | elapsed:  1.9min remaining:   32.9s
[Parallel(n_jobs=-1)]: Done  18 out of  18 | elapsed:  2.5min finished


0.9639
{'max_depth': 28, 'n_estimators': 400}


In [56]:
clf = SVC(C=0.1, kernel='rbf', gamma=0.1)
clf = clf.fit(x_train_pca, y_train)
y_pred_svc = clf.predict(x_test_pca)
svm_score = accuracy_score(y_test, y_pred_lr)
svm_score

0.9255