# MNIST Models

In this notebook we build two machine learning models that recognize handwritten digits, both trained on the **MNIST** data set.

In [1]:
import pandas as pd

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Dense Neural Network

The **MNIST** data set ships with _**Keras**_ (in `keras.datasets`), so we can load it directly. Pixel values are integers from 0 to 255, so we divide each one by 255 to normalize the data to the range [0, 1]. The first model is a densely connected neural network, which expects each image as a single one-dimensional array, so we also flatten every $28\times 28$ image into a vector of 784 values.

In [2]:
from keras.datasets import mnist

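# Load the train/test split bundled with Keras
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 784-element vector
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)

# Cast to float32 so the in-place division below yields fractional values
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Scale pixel values from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255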

Using TensorFlow backend.


Once the data is loaded, we can define and train the model. The input layer ($28\cdot 28=784$ pixels) feeds two hidden layers of 128 and 64 units with ReLU activations, followed by a 10-unit output layer with a _**softmax**_ activation.
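To make the role of softmax in the output layer concrete, here is a minimal pure-Python sketch. The function name `softmax` and the example scores are illustrative assumptions, not part of the Keras API; Keras computes the same quantity internally on whole batches.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; it does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for the 10 digit classes (made up for illustration)
probs = softmax([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, 0.2, 0.3, 0.4, 0.6])

# The predicted digit is the index with the highest probability
predicted_digit = probs.index(max(probs))
```

The largest raw score always receives the largest probability, so taking the index of the maximum gives the predicted class.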

To match the softmax output and the categorical cross-entropy loss, we also convert the integer labels to one-hot (dummy) vectors.
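As an illustration of what this conversion produces, the sketch below builds one-hot rows in plain Python. The helper `one_hot` is a hypothetical stand-in written for this example; the notebook itself uses `keras.utils.to_categorical`.

```python
def one_hot(labels, num_classes):
    """Return one row per label with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

# Three example labels (made up for illustration)
encoded = one_hot([3, 0, 9], num_classes=10)
```

Each row has exactly one non-zero entry, so it can be compared position by position with the 10 probabilities produced by the output layer.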

Next we compile the network, specifying the **loss**, **optimizer** and **metrics**.

Then we are ready to train the model. We use the Keras _**evaluate**_ method to measure loss and accuracy on the test set.

In [3]:
import keras

from keras.models import Sequential
from keras.layers import Dense

num_classes=10

# convert class vectors to binary class matrices
y_train_cat=keras.utils.to_categorical(y_train, num_classes)
y_test_cat =keras.utils.to_categorical(y_test , num_classes)

model=Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(64 , activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train_cat, epochs=3, verbose=1, validation_data=(X_test, y_test_cat))

score=model.evaluate(X_test, y_test_cat, verbose=0)
print('Test loss:'    , score[0])
print('Test accuracy:', score[1])

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650       
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Test loss: 0.07671645826920867
Test accuracy: 0.9760000109672546
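
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A minimal sketch, where `dense_params` is a helper name introduced here for illustration:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size, the two hidden layers, and the output layer
layer_sizes = [784, 128, 64, 10]

# Pair each layer size with the next one to get (inputs, units) per dense layer
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```

The three values match the `Param #` column, and their sum matches `Total params`.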


The test accuracy of about 97.6% is good, but a single number hides where the errors occur, so we also compute the _**confusion matrix**_ to see how the errors are distributed across the classes.
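
To illustrate how a confusion matrix is read, the following sketch builds one in plain Python from a handful of made-up labels and derives per-class recall from it. The helper names and the toy labels are assumptions for this example; the notebook itself relies on `sklearn.metrics.confusion_matrix`, which uses the same convention of true classes as rows and predicted classes as columns.

```python
def confusion(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair; rows are true classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def recall_per_class(matrix):
    """Diagonal entry divided by the row total (all samples of that true class)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy labels for a 3-class problem (made up for illustration)
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 2, 0, 2]

matrix = confusion(y_true, y_pred, num_classes=3)
recalls = recall_per_class(matrix)
```

Off-diagonal entries show which classes are mistaken for which, and that is the information the overall accuracy hides.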

In [4]:
from sklearn.metrics import classification_report, confusion_matrix

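# Predicted class = index of the highest softmax probability
# (predict_classes is not available in newer TensorFlow/Keras releases)
y_pred=model.predict(X_test).argmax(axis=1)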

print(classification_report(y_test, y_pred))

conf=pd.DataFrame(
    confusion_matrix(y_test, y_pred),
)
print(conf)

              precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.99      0.98      0.99      1135
           2       0.96      0.98      0.97      1032
           3       0.97      0.97      0.97      1010
           4       0.98      0.98      0.98       982
           5       0.98      0.96      0.97       892
           6       0.98      0.98      0.98       958
           7       0.98      0.97      0.97      1028
           8       0.95      0.98      0.96       974
           9       0.98      0.96      0.97      1009

    accuracy                           0.98     10000
   macro avg       0.98      0.98      0.98     10000
weighted avg       0.98      0.98      0.98     10000

     0     1     2    3    4    5    6    7    8    9
0  973     1     0    0    0    0    2    1    2    1
1    0  1115     5    0    0    1    2    1   11    0
2    4     0  1016    1    3    0    1    5    2    0
3    0     0    11  979  

## Convolutional Neural Network

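Now we will try convolutional layers. Because the input layer is a convolutional layer, each image must keep its two-dimensional structure instead of being flattened into a one-dimensional vector. We reshape the data to `(samples, channels, height, width)`, with a single channel because the images are grayscale.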

In [5]:
X_train=X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test =X_test.reshape (X_test.shape [0], 1, 28, 28)

The next cell works around a compatibility problem between the installed Keras and TensorFlow versions: it replaces the Keras backend helper `_get_available_gpus` with our own version.

In [6]:
import tensorflow as tf
import keras.backend.tensorflow_backend as tfback

print("tf.__version__ is", tf.__version__)
print("tf.keras.__version__ is:", tf.keras.__version__)

def _get_available_gpus():
    """Get a list of available gpu devices (formatted as strings).

    # Returns
        A list of available GPU devices.
    """
    # Early return: always report no GPUs, so the lookup below is never reached.
    return []
    #global _LOCAL_DEVICES
    if tfback._LOCAL_DEVICES is None:
        devices = tf.config.list_logical_devices()
        tfback._LOCAL_DEVICES = [x.name for x in devices]
    return [x for x in tfback._LOCAL_DEVICES if 'device:gpu' in x.lower()]

tfback._get_available_gpus = _get_available_gpus

tf.__version__ is 2.0.0
tf.keras.__version__ is: 2.2.4-tf


In [7]:
import keras

from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

num_classes=10

# convert class vectors to binary class matrices
y_train_cat=keras.utils.to_categorical(y_train, num_classes)
y_test_cat =keras.utils.to_categorical(y_test , num_classes)

model=Sequential()
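# Only this first layer sets data_format='channels_first'; the layers below use the
# default data format, which is why the summary lists shapes such as (None, 10, 12, 24).
model.add(Conv2D(20, (5, 5), activation='relu', input_shape=(1, 28, 28), data_format='channels_first'))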
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(50, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train_cat, epochs=3, verbose=1, validation_data=(X_test, y_test_cat))

score=model.evaluate(X_test, y_test_cat, verbose=0)
print('Test loss:'    , score[0])
print('Test accuracy:', score[1])

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 20, 24, 24)        520       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 10, 12, 24)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 8, 50)          30050     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 4, 50)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 600)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 500)               300500    
_________________________________________________________________
dense_5 (Dense)              (None, 10)               

This model gives better results than the dense network (overall accuracy in the classification report rises from 0.98 to 0.99), but it is slower to train and computationally more expensive.
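
The output shapes in the summary above can be reproduced with simple arithmetic. The sketch below assumes 'valid' padding (the Keras default) and, as the printed shapes suggest, that only the first `Conv2D` uses `channels_first` while the remaining layers fall back to `channels_last`. The helper names are introduced here for illustration.

```python
def conv_out(size, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2, stride=2):
    """Output length of a 'valid' max-pooling window."""
    return (size - pool) // stride + 1

# First Conv2D, channels_first: (channels, height, width) = (1, 28, 28) -> (20, 24, 24)
shape = (20, conv_out(28, 5), conv_out(28, 5))

# The remaining layers read the same tensor as channels_last: (height, width, channels)
h, w, c = shape
h, w = pool_out(h), pool_out(w)               # (10, 12, 24)
conv2_params = 5 * 5 * c * 50 + 50            # kernel weights plus one bias per filter
h, w, c = conv_out(h, 5), conv_out(w, 5), 50  # (6, 8, 50)
h, w = pool_out(h), pool_out(w)               # (3, 4, 50)
flat = h * w * c                              # inputs to the first Dense layer
dense_params = flat * 500 + 500               # weights plus biases of Dense(500)
```

Setting the same `data_format` on every layer would keep the 20 feature maps as channels throughout; the numbers here only mirror the summary as printed.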

In [8]:
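# Predicted class = index of the highest softmax probability
# (predict_classes is not available in newer TensorFlow/Keras releases)
y_pred=model.predict(X_test).argmax(axis=1)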

print(classification_report(y_test, y_pred))

conf=pd.DataFrame(
    confusion_matrix(y_test, y_pred),
)
print(conf)

              precision    recall  f1-score   support

           0       1.00      0.99      0.99       980
           1       0.99      1.00      1.00      1135
           2       0.99      0.99      0.99      1032
           3       0.99      0.99      0.99      1010
           4       0.99      1.00      0.99       982
           5       0.99      0.99      0.99       892
           6       0.99      0.99      0.99       958
           7       0.99      0.99      0.99      1028
           8       0.98      0.99      0.99       974
           9       0.99      0.98      0.98      1009

    accuracy                           0.99     10000
   macro avg       0.99      0.99      0.99     10000
weighted avg       0.99      0.99      0.99     10000

     0     1     2    3    4    5    6     7    8    9
0  968     1     0    0    0    1    5     2    2    1
1    0  1132     2    0    0    1    0     0    0    0
2    1     1  1023    0    0    0    0     4    2    1
3    0     0     2  9

## Save the model

Finally, we save the trained convolutional model to disk in **.pkl** (pickle) format.

In [9]:
import pickle

# Use a context manager so the file is closed once the model is written
with open('convolutional_nn_model.pkl', 'wb') as f:
    pickle.dump(model, f)

And we can load the model again like this:

In [10]:
# Use a context manager so the file is closed once the model is read
with open('convolutional_nn_model.pkl', 'rb') as f:
    model = pickle.load(f)

## You can try this model through a UI here: <a href="https://mnistplayground.herokuapp.com/" target="_blank">MNIST Playground</a>