## MNIST Dataset

The Modified National Institute of Standards and Technology ([MNIST](https://en.wikipedia.org/wiki/MNIST_database)) database is a large database of handwritten digits. It was created by mixing samples taken from American Census Bureau employees with samples taken from American high school students.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

In [None]:
# Load dataset
# The path needs to be updated with the path to the data files
df_train = pd.read_csv('../input/mnist-in-csv/mnist_train.csv')
df_test = pd.read_csv('../input/mnist-in-csv/mnist_test.csv')

The training and test datasets consist of 60,000 and 10,000 grayscale images of handwritten digits, respectively. Each image has been scaled to 28 x 28 pixels, and each pixel has a value in the range [0, 255]. The pixels of each image have been flattened into an array of size 28 x 28 = 784. Each image in both sets is labelled with its digit, [0-9], in the `label` column.

In [None]:
print(df_train.shape)
print(df_test.shape)

In [None]:
df_train.head()

In [None]:
df_test.head()

In [None]:
df_train.describe()

In [None]:
df_test.describe()

## Data Visualization
Let us plot the first few images of the training dataset along with their labels. In order to do that, we need to reshape the images from 784 into 28 x 28 pixels.
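
The reshape step can be illustrated on synthetic data. The sketch below assumes the flattened pixels are stored in row-major order (NumPy's default); the array `flat` is a made-up stand-in, not real MNIST data.

```python
import numpy as np

# Synthetic stand-in for two flattened images (not real MNIST data).
flat = np.arange(2 * 784).reshape(2, 784)

# -1 lets NumPy infer the number of images from the array size.
images = flat.reshape(-1, 28, 28)

# With row-major order, flat index k lands at row k // 28, column k % 28.
k = 100
row, col = divmod(k, 28)
print(images.shape)                       # (2, 28, 28)
print(images[0, row, col] == flat[0, k])  # True
```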

In [None]:
# -1 lets NumPy infer the number of images instead of hard-coding it
train_images = np.reshape(df_train.drop(columns='label').values,(-1,28,28))
test_images  = np.reshape(df_test.drop(columns='label').values,(-1,28,28))

In [None]:
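plt.figure(figsize=(10,10))
for i in range(25):
  plt.subplot(5,5,i+1)
  plt.imshow(train_images[i,:,:], cmap='gray')
  plt.title(str(df_train['label'].values[i]))  # show the label above each image
  plt.axis('off')
plt.subplots_adjust(wspace=0.0, hspace=0.3)
plt.show()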

Prepare the features (X_Train, X_Test) and labels (Y_Train, Y_Test) for model fitting and evaluation. Re-scale the pixel values from [0, 255] to [0, 1].
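
As a small illustration of the re-scaling, the sketch below divides a few hypothetical pixel values by 255. The `pixels` array is made up for the example.

```python
import numpy as np

# Hypothetical integer pixel values standing in for a few MNIST pixels.
pixels = np.array([0, 64, 128, 255])

# Dividing by 255 converts the integers to floating-point values in [0, 1].
scaled = pixels / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```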

In [None]:
X_Train = df_train.drop(columns='label').values/255
X_Test  = df_test.drop(columns='label').values/255
Y_Train = df_train['label'].values
Y_Test  = df_test['label'].values

## Model Building and Compilation

Now let us build the model using a feed-forward neural network that takes the 784 pixel values (28 x 28) as input and passes them through 3 dense layers:
* First hidden layer: 64 neurons
* Second hidden layer: 64 neurons
* Output layer: 10 nodes corresponding to the digits [0-9]
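
To get a feel for the size of such a network, the sketch below counts trainable parameters. It assumes the layer widths used in this notebook's model-building code (784 inputs, then Dense layers of 64, 64, and 10 units); `dense_param_count` is a hypothetical helper, not a Keras function.

```python
# Layer widths assumed from the model code in this notebook:
# 784 inputs, two hidden layers of 64 units, 10 outputs.
layer_sizes = [784, 64, 64, 10]

def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [50240, 4160, 650]
print(sum(counts))  # 55050
```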

We use the Rectified Linear Unit (ReLU) activation for the first two Dense layers. For the output layer we use the 'softmax' activation to convert the outputs into class probabilities.

The model is compiled with the 'Adam' optimizer, with accuracy as the reported metric. The loss is the categorical cross-entropy function, which is suitable for multi-class classification problems.
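
For intuition, softmax and categorical cross-entropy can be sketched in plain NumPy. This is an illustrative version under simple assumptions, not Keras's internal implementation. The function names, the clipping constant `eps`, and the sample values are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true_one_hot, y_prob, eps=1e-12):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true_one_hot * np.log(y_prob), axis=-1).mean())

# Made-up scores for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]])
y_true = np.array([[1, 0, 0], [0, 0, 1]])

probs = softmax(logits)
print(probs.sum(axis=1))
print(categorical_cross_entropy(y_true, probs))
```

The loss is small when the probability assigned to the true class is close to 1, and it grows as that probability shrinks.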

In [None]:
# Build the model
# 3 Dense layers: 2 hidden layers with 64 neurons each + ReLU activation function
# 1 output layer with 10 neurons and softmax function (maximum entropy)
model = Sequential()
model.add(Dense(64,activation='relu',input_dim = 784))
model.add(Dense(64,activation='relu'))
model.add(Dense(10,activation='softmax'))

In [None]:
# Compile the model
model.compile(
    optimizer = 'adam',
    loss = 'categorical_crossentropy',
    metrics = ['accuracy']
    )

In [None]:
# Train the model
X_Train_fit = model.fit(
    X_Train,
    to_categorical(Y_Train), # Ex. 2 -> [0,0,1,0,0,0,0,0,0,0]
    epochs = 10,
    batch_size = 50
)
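
The one-hot encoding mentioned in the comment above can be sketched in NumPy as follows. `one_hot` is a hypothetical helper written for illustration; the notebook itself relies on Keras's `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

print(one_hot([2]))  # [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]
```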

## Training Performance

In [None]:
# Plot the training performance
plt.figure(figsize=(12,4))
plt.suptitle('Training Performance')

plt.subplot(121)
plt.plot(X_Train_fit.epoch,X_Train_fit.history['accuracy'])
plt.xlabel('Epochs')
plt.ylabel('Accuracy')

plt.subplot(122)
plt.plot(X_Train_fit.epoch,X_Train_fit.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')

plt.show()

## Model Evaluation

In [None]:
# Evaluate the model
model.evaluate(
    X_Test,
    to_categorical(Y_Test)
)

The accuracy on the test set is slightly lower than that on the training set (by about 2%). This may be due to overfitting of the training data. Holding out part of the training set as a validation set would make it possible to monitor this gap during training and tune the model to reduce it.
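
A minimal sketch of such a hold-out split is shown below, using synthetic arrays in place of the MNIST data. `train_val_split`, the 10% fraction, and the seed are assumptions chosen for illustration.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.1, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(round(len(X) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Synthetic stand-in data: 100 samples with 784 features each.
X = np.zeros((100, 784))
y = np.arange(100) % 10
X_tr, y_tr, X_val, y_val = train_val_split(X, y)
print(X_tr.shape, X_val.shape)  # (90, 784) (10, 784)
```

Keras's `fit` also accepts `validation_data` and `validation_split` arguments, so the held-out arrays (or a fraction) could be passed there to report validation metrics after each epoch.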

## Result Visualization
Let us visualize a random set of test images along with the predicted classifications.
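
The predicted class for each image is the index of the largest softmax output. The sketch below shows that step, plus a way to locate misclassified samples, on made-up probabilities and labels rather than real model output.

```python
import numpy as np

# Made-up class probabilities for four samples and three classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])  # hypothetical true labels

predicted = np.argmax(probs, axis=1)         # most probable class per sample
accuracy = np.mean(predicted == labels)      # fraction of correct predictions
wrong = np.flatnonzero(predicted != labels)  # indices of misclassified samples

print(predicted)  # [0 1 2 0]
print(accuracy)   # 0.75
print(wrong)      # [3]
```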

In [None]:
indices = np.random.randint(10000,size=25)

In [None]:
# Predict on the 25 randomly selected test images
predictions = model.predict(X_Test[indices,:])
# Print model predictions
print(np.argmax(predictions, axis = 1))
print(Y_Test[indices])

In [None]:
plt.figure(figsize=(10,10))
j = 0
for i in indices:
  plt.subplot(5,5,j+1)
  plt.imshow(test_images[i,:,:])
  plt.axis('off')
  plt.title(str(np.argmax(predictions, axis = 1)[j]))
  j = j+1
plt.show()