# **MNIST Dataset** 🔢

### Introduction
**MNIST** stands for **Modified National Institute of Standards and Technology**, and the dataset is often called the “Hello World” of computer vision.
It consists of 70,000 images of handwritten digits (60,000 training images and 10,000 test images), each 28 × 28 pixels.

The goal of this notebook is to build a simple neural network that predicts digits from handwritten images using Keras (with TensorFlow as the backend). In particular, I use Keras' Functional API to build the model.

My first model has six layers (an input layer, four hidden layers, and an output layer) and is trained with a stochastic gradient descent optimizer. In the second model I switched to Adam, and in the final model I added dropout layers to reduce overfitting.

### Import libraries and read data

In [1]:
import pandas as pd 

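# Import everything from the TensorFlow-bundled Keras and name the layers explicitly
from tensorflow import keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout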

from sklearn.model_selection import train_test_split

In [2]:
train_df = pd.read_csv('./Dataset/mnist_train.csv')
test_df = pd.read_csv('./Dataset/mnist_test.csv')

print('Shape of training data: ', train_df.shape)
print('Shape of test data: ', test_df.shape)

Shape of training data:  (60000, 785)
Shape of test data:  (10000, 785)


Each CSV file has 785 columns: the first is the target label and the remaining 784 are the pixel values of a 28 × 28 image.

In [3]:
train_df.head(3)

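|   | label | 1x1 | 1x2 | 1x3 | 1x4 | 1x5 | 1x6 | 1x7 | 1x8 | 1x9 | ... | 28x19 | 28x20 | 28x21 | 28x22 | 28x23 | 28x24 | 28x25 | 28x26 | 28x27 | 28x28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |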


In [4]:
test_df.head(3)

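|   | label | 1x1 | 1x2 | 1x3 | 1x4 | 1x5 | 1x6 | 1x7 | 1x8 | 1x9 | ... | 28x19 | 28x20 | 28x21 | 28x22 | 28x23 | 28x24 | 28x25 | 28x26 | 28x27 | 28x28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |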


### Separate features and label  

I separated the label (the first column) from the pixel features (the remaining 784 columns).

In [5]:
train_features = train_df.iloc[:, 1:]
train_labels = train_df.iloc[:, 0]

test_features = test_df.iloc[:, 1:]
test_labels = test_df.iloc[:, 0]

In [6]:
train_features.head(3)

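|   | 1x1 | 1x2 | 1x3 | 1x4 | 1x5 | 1x6 | 1x7 | 1x8 | 1x9 | 1x10 | ... | 28x19 | 28x20 | 28x21 | 28x22 | 28x23 | 28x24 | 28x25 | 28x26 | 28x27 | 28x28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |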


In [7]:
train_labels.head()

0    5
1    0
2    4
3    1
4    9
Name: label, dtype: int64

### Divide the training data into the training and validation sets

15% of the training data was used for validation and the rest for training.

In [8]:
X_train, X_validation , y_train, y_validation = train_test_split(
    train_features, 
    train_labels,
    test_size=0.15, 
    random_state=42
)
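As a quick sanity check on the split, the arithmetic can be sketched in plain Python. The rounding rules here (held-out count rounded up, steps per epoch rounded up) are my understanding of how scikit-learn and Keras behave, so treat them as assumptions; the batch size of 100 is the one passed to fit() later in this notebook.

```python
import math

n_samples = 60_000   # rows in the training CSV (from the printed shape)
test_size = 0.15     # fraction held out for validation
batch_size = 100     # batch size used later in fit()

# Assumed rounding: the held-out count is ceil(test_size * n_samples)
n_validation = math.ceil(test_size * n_samples)
n_train = n_samples - n_validation

# Assumed rounding: one epoch takes ceil(n_train / batch_size) steps
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_validation, steps_per_epoch)  # 51000 9000 510
```

The 510 steps agree with the `17/510` progress counter that appears in the training output further down.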

### Data Preprocessing

Because the CSV files were read with pandas, the datasets are pandas DataFrames. I used to_numpy() to convert each DataFrame to a NumPy ndarray, and I also cast the values to 'float32'.

The images are grayscale (each pixel has a value between 0 and 255), so I normalized the data by dividing each value by 255.

In [9]:
X_train = X_train.to_numpy().astype('float32') / 255.0
X_validation = X_validation.to_numpy().astype('float32') / 255.0
X_test = test_features.to_numpy().astype('float32') / 255.0
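To see what this scaling does, here is a minimal sketch on a tiny made-up array. The `pixels` values are hypothetical stand-ins, not rows from the dataset.

```python
import numpy as np

# Hypothetical 2-row stand-in for the pixel matrix (real rows have 784 values)
pixels = np.array([[0, 51, 255], [128, 204, 0]], dtype=np.uint8)

# Same operation as in the cell above: cast to float32, then divide by 255
scaled = pixels.astype('float32') / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

After scaling, every value lies in the range [0, 1], which keeps the inputs on a scale that dense layers with default initializers usually train well on.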

I also used to_categorical() to one-hot encode the labels into 10 categories (digits 0 to 9).

In [10]:
y_train = keras.utils.to_categorical(y_train, 10)
y_validation = keras.utils.to_categorical(y_validation, 10)
y_test = keras.utils.to_categorical(test_labels, 10)
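One-hot encoding can be illustrated without Keras. The sketch below uses NumPy's identity matrix as a lookup table, which should give the same kind of array that to_categorical() returns; it is an illustration of the idea, not the Keras implementation.

```python
import numpy as np

labels = np.array([5, 0, 4])   # the first three training labels shown earlier

# Row i of the 10x10 identity matrix is the one-hot vector for digit i
one_hot = np.eye(10, dtype='float32')[labels]
print(one_hot[0])

# argmax undoes the encoding and recovers the original digit
recovered = one_hot.argmax(axis=1)
print(recovered)
```

The same argmax step is what turns the model's 10 softmax outputs back into a predicted digit.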

## Train Model ✅

### Model 1
Simple Neural Network with 4 hidden layers (200, 150, 100, 100) and stochastic gradient descent optimizer.

In [11]:
input = Input(shape=(784,))
x = Dense(200, activation='relu', name='Hidden-Layer-1')(input)
x = Dense(150, activation='relu', name='Hidden-Layer-2')(x)
x = Dense(100, activation='relu', name='Hidden-Layer-3')(x)
x = Dense(100, activation='relu', name='Hidden-Layer-4')(x)
output = Dense(10, activation='softmax', name='Output-Layer')(x)


In [12]:
model_1 = Model(inputs=input, outputs=output, name='Model-1')
model_1.summary()

Model: "Model-1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 784)]             0         
                                                                 
 Hidden-Layer-1 (Dense)      (None, 200)               157000    
                                                                 
 Hidden-Layer-2 (Dense)      (None, 150)               30150     
                                                                 
 Hidden-Layer-3 (Dense)      (None, 100)               15100     
                                                                 
 Hidden-Layer-4 (Dense)      (None, 100)               10100     
                                                                 
 Output-Layer (Dense)        (None, 10)                1010      
                                                                 
Total params: 213,360
Trainable params: 213,360
Non-trainab
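The parameter counts in the summary can be reproduced by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A short check, using the layer sizes from this model:

```python
# Input size followed by the units of each Dense layer in Model 1
layer_sizes = [784, 200, 150, 100, 100, 10]

# weights (n_in * n_out) + biases (n_out) for each consecutive pair of layers
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params)       # [157000, 30150, 15100, 10100, 1010]
print(sum(params))  # 213360
```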

In [13]:
model_1.compile(
    loss='categorical_crossentropy',
    optimizer='sgd',
    metrics=['accuracy']
)
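For intuition about the loss, categorical cross-entropy for a single sample is the negative log of the probability assigned to the true class. The function below is a simplified pure-Python sketch (the name and the `eps` clipping value are my own choices, not the Keras internals).

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # one-hot vector for the digit 5

uniform = [0.1] * 10                      # an untrained model guessing evenly
confident = [0.01] * 10
confident[5] = 0.91                       # most of the probability on the true class

print(categorical_crossentropy(y_true, uniform))
print(categorical_crossentropy(y_true, confident))
```

A model that guesses uniformly over 10 classes has a loss of ln(10), roughly 2.3, which is why the loss starts near that value before training makes progress.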

In [14]:
history_1 = model_1.fit(
    X_train, 
    y_train, 
    epochs=20,
    batch_size=100,
    validation_data=(X_validation, y_validation)
)

Epoch 1/20




Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


### Model 2
For my second model I kept the same layers (200, 150, 100, 100) but switched to the Adam optimizer to improve the model's performance.
Compared with Model 1, the accuracy improved.
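One way to see how the two optimizers differ is to compare the size of a single update for gradients of very different magnitudes. The sketch below works on a scalar gradient. The hyperparameters (learning rate 0.01 for SGD, 0.001 for Adam, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7) are what I understand the Keras defaults to be, so treat them as assumptions, and the function names are hypothetical.

```python
import math

def sgd_update(g, lr=0.01):
    """Plain SGD: the step is the gradient scaled by the learning rate."""
    return lr * g

def adam_update(g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam step for a scalar gradient g at step t (t starts at 1)."""
    m = beta_1 * m + (1 - beta_1) * g        # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * g * g    # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)            # bias correction for the zero start
    v_hat = v / (1 - beta_2 ** t)
    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    return step, m, v

for g in (1e-3, 1.0, 1e3):
    adam_step, _, _ = adam_update(g, m=0.0, v=0.0, t=1)
    print(f"grad={g:g}  sgd step={sgd_update(g):g}  adam step={adam_step:g}")
```

The SGD step grows in proportion to the gradient, while the first Adam step stays close to the learning rate regardless of the gradient's scale. This per-parameter normalization is one common explanation for why Adam often converges faster without tuning.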

In [15]:
input = Input(shape=(784, ))
x = Dense(200, activation='relu', name='Hidden-Layer-1')(input)
x = Dense(150, activation='relu', name='Hidden-Layer-2')(x)
x = Dense(100, activation='relu', name='Hidden-Layer-3')(x)
x = Dense(100, activation='relu', name='Hidden-Layer-4')(x)
output = Dense(10, activation='softmax', name='Output-Layer')(x)

model_2 = Model(inputs=input, outputs=output, name='Model-2')
model_2.summary()

Model: "Model-2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 784)]             0         
                                                                 
 Hidden-Layer-1 (Dense)      (None, 200)               157000    
                                                                 
 Hidden-Layer-2 (Dense)      (None, 150)               30150     
                                                                 
 Hidden-Layer-3 (Dense)      (None, 100)               15100     
                                                                 
 Hidden-Layer-4 (Dense)      (None, 100)               10100     
                                                                 
 Output-Layer (Dense)        (None, 10)                1010      
                                                                 
Total params: 213,360
Trainable params: 213,360
Non-trainab

In [16]:
model_2.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

In [17]:
history_2 = model_2.fit(
    X_train, 
    y_train, 
    epochs=20,
    batch_size=100,
    validation_data=(X_validation, y_validation)
)

Epoch 1/20
 17/510 [>.............................] - ETA: 3s - loss: 1.7343 - accuracy: 0.5035



Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


### Model 3 
To reduce overfitting, I added dropout layers (rate 0.2) after the first three hidden layers in my final model.
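As a rough illustration of what a dropout layer does during training, here is a NumPy sketch of "inverted" dropout: a random fraction of activations is zeroed and the survivors are scaled up so the expected value stays the same. The `dropout` function is a hypothetical helper for illustration, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of x and rescale the rest by 1 / (1 - rate)."""
    if not training:
        return x                            # dropout is a no-op at inference time
    keep = rng.random(x.shape) >= rate      # True for the units that survive
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 200), dtype='float32')

dropped = dropout(activations, rate=0.2, rng=rng)
print((dropped == 0).mean())   # fraction of zeroed units, close to 0.2
print(dropped.mean())          # mean activation, close to 1.0
```

Because different units are dropped on every batch, the network cannot rely on any single unit, which tends to reduce overfitting.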

In [18]:
input = Input(shape=(784, ))
x = Dense(200, activation='relu', name='Hidden-Layer-1')(input)
x = Dropout(0.2)(x)
x = Dense(150, activation='relu', name='Hidden-Layer-2')(x)
x = Dropout(0.2)(x)
x = Dense(100, activation='relu', name='Hidden-Layer-3')(x)
x = Dropout(0.2)(x)
x = Dense(100, activation='relu', name='Hidden-Layer-4')(x)
output = Dense(10, activation='softmax', name='Output-Layer')(x)

model_3 = Model(inputs=input, outputs=output, name='Model-3')
model_3.summary()

Model: "Model-3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 784)]             0         
                                                                 
 Hidden-Layer-1 (Dense)      (None, 200)               157000    
                                                                 
 dropout (Dropout)           (None, 200)               0         
                                                                 
 Hidden-Layer-2 (Dense)      (None, 150)               30150     
                                                                 
 dropout_1 (Dropout)         (None, 150)               0         
                                                                 
 Hidden-Layer-3 (Dense)      (None, 100)               15100     
                                                                 
 dropout_2 (Dropout)         (None, 100)               0   

In [19]:
model_3.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

In [20]:
history_3 = model_3.fit(
    X_train, 
    y_train, 
    epochs=20,
    batch_size=100,
    validation_data=(X_validation, y_validation)
)

Epoch 1/20
 17/510 [>.............................] - ETA: 3s - loss: 2.0247 - accuracy: 0.3400



Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [23]:
history_3.history

{'loss': [0.39831963181495667,
  0.15745437145233154,
  0.11554968357086182,
  0.09480706602334976,
  0.07778573781251907,
  0.06841371208429337,
  0.06319032609462738,
  0.059195686131715775,
  0.050510697066783905,
  0.047499485313892365,
  0.0433037169277668,
  0.043293070048093796,
  0.04016260802745819,
  0.03628402575850487,
  0.0359664261341095,
  0.03260982036590576,
  0.03396379202604294,
  0.030067596584558487,
  0.0304805189371109,
  0.02617982029914856],
 'accuracy': [0.8777058720588684,
  0.9523921608924866,
  0.9650196433067322,
  0.9712941646575928,
  0.9760000109672546,
  0.9782353043556213,
  0.9801765084266663,
  0.9810980558395386,
  0.984333336353302,
  0.9855294227600098,
  0.9857059121131897,
  0.9863137602806091,
  0.9870392084121704,
  0.988588273525238,
  0.9890588521957397,
  0.9892941117286682,
  0.9891961216926575,
  0.9901372790336609,
  0.9905294179916382,
  0.9914118051528931],
 'val_loss': [0.13892154395580292,
  0.112286277115345,
  0.09719885885715485,
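A history dictionary like this can be scanned for the epoch with the lowest validation loss. The helper below is a hypothetical sketch. It is run on the first three `val_loss` values only, since the rest of the output is cut off here.

```python
def best_epoch(history, metric='val_loss'):
    """Return (1-based epoch, value) where the given metric is lowest."""
    values = history[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Only the first three val_loss values are visible above; the remaining ones are elided
partial_history = {
    'val_loss': [0.13892154395580292, 0.112286277115345, 0.09719885885715485],
}

print(best_epoch(partial_history))
```

With the full dictionary, the same call would be `best_epoch(history_3.history)`.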

## Test and save model ✅

In [25]:
result = model_3.evaluate(X_test, y_test, batch_size=100)
print('test loss, test acc: ', result)

test loss, test acc:  [0.0806407481431961, 0.9794000387191772]
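To put the test accuracy in concrete terms, it can be converted into a count of misclassified images and compared with the final training accuracy. Both numbers are copied from the outputs in this notebook, and the variable names are my own.

```python
test_accuracy = 0.9794000387191772          # from the evaluate() output
final_train_accuracy = 0.9914118051528931   # last 'accuracy' entry in history_3.history
n_test = 10_000                             # number of test images

# Number of test images the model gets wrong
n_errors = round((1 - test_accuracy) * n_test)

# Difference between final training accuracy and test accuracy
gap = final_train_accuracy - test_accuracy

print(n_errors)        # 206
print(round(gap, 4))   # 0.012
```

Keras computes the training accuracy with dropout active, so the two numbers are not measured under identical conditions, and the gap is only a rough indicator.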


In [26]:
model_3.save('final-model.h5')