# Handwritten Digits Classifier

<p align="center">
<img src="img/digits.gif">
</p>

The MNIST database contains grey-level images of **handwritten digits**. The original black-and-white images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so that this point sits at the center of the 28x28 field.

The database has a training set of 60,000 examples and a test set of 10,000 examples. There are 10 classes (one for each of the 10 digits). **The task at hand is to train a model using the 60,000 training images and subsequently test its classification accuracy on the 10,000 test images**.

# 1. Imports 

In [4]:
# imports
import pandas as pd
import numpy  as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics         import confusion_matrix
from sklearn.metrics         import accuracy_score
from sklearn.neural_network  import MLPClassifier
from sklearn.linear_model    import Perceptron

from yellowbrick.classifier  import ConfusionMatrix

import random
import warnings
warnings.filterwarnings( "ignore" )

## 1.1. Load data

In [19]:
# Load data
from mnist import MNIST

mndata = MNIST( 'datasets/images_handwritten_digits' )

image_train, label_train = mndata.load_training( )
image_test, label_test = mndata.load_testing( )

In [23]:
# View a randomly chosen training image as ASCII art
index = random.randrange( 0, len( image_train ) )
print( mndata.display( image_train[index] ) )


............................
............................
............................
............................
.............@@@@...........
..........@@@@@@@...........
.........@@@@.@@@...........
..........@@..@@@...........
..............@@@...........
..............@@............
.............@@@............
.............@@.............
............@@@.............
............@@@.............
...........@@@..............
...........@@@..............
..........@@@...............
.........@@@................
........@@@@................
........@@@.....@@@@........
.......@@@@@@@@@@@@@@@@@....
........@@@@@@@@@@@@@@@@....
...........@@@@.............
............................
............................
............................
............................
............................
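Each image is loaded as a flat sequence of 784 pixel values. As a rough sketch of how such a sequence can be turned into a text picture like the one above, the snippet below cuts it into 28 rows and marks bright pixels with `@`. The function name `render_digit` and the threshold of 200 are assumptions chosen for illustration, not a description of how `mndata.display` is implemented.

```python
def render_digit(pixels, width=28, threshold=200):
    """Render a flat list of grey levels (0-255) as rows of '.' and '@'.
    Hypothetical helper; the threshold value is an arbitrary choice."""
    if len(pixels) % width != 0:
        raise ValueError("pixel count is not a multiple of the row width")
    # Cut the flat sequence into consecutive rows of `width` pixels.
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    return "\n".join(
        "".join("@" if value > threshold else "." for value in row)
        for row in rows
    )

# Toy image: a blank 28x28 field with one vertical stroke in column 14.
toy = [0] * (28 * 28)
for row in range(4, 24):
    toy[row * 28 + 14] = 255

art = render_digit(toy)
print(art)
```

Pixel `k` of the flat sequence sits at row `k // 28` and column `k % 28`, which is why slicing in steps of 28 recovers the image rows.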


# 2. Preprocessing

In [24]:
# Data transformation
X_train = pd.DataFrame( image_train )
X_test = pd.DataFrame( image_test )
y_train = pd.DataFrame( label_train )
y_test = pd.DataFrame( label_test )

In [25]:
# Join the image DataFrames (training rows first, then test rows)
X = pd.concat( [X_train, X_test], ignore_index=True )

# Join the target DataFrames in the same order
y = pd.concat( [y_train, y_test], ignore_index=True )
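The effect of `ignore_index=True` is easiest to see on a toy example. The sketch below uses two tiny made-up frames (not MNIST data) to show that the stacked frame gets a fresh 0..n-1 index, and that the original test rows can still be recovered by position because they are appended after the training rows.

```python
import pandas as pd

# Made-up stand-ins for the image frames: 3 "training" rows, 2 "test" rows.
train_part = pd.DataFrame([[0, 255], [128, 64], [12, 7]])
test_part = pd.DataFrame([[255, 0], [3, 99]])

# With ignore_index=True the result is renumbered 0..4.
stacked = pd.concat([train_part, test_part], ignore_index=True)

# Without it, each frame keeps its own labels, so 0 and 1 appear twice.
kept = pd.concat([train_part, test_part])

# Rows from position len(train_part) onward are the original test rows.
recovered_test = stacked.iloc[len(train_part):]

print(list(stacked.index))
print(list(kept.index))
```

Applied to the frames above, the same reasoning suggests that the first 60,000 rows of `X` and `y` are the original training examples and the last 10,000 are the original test examples.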

In [14]:
# Data dimensions of image dataframe
print( 'Number of rows: ', X.shape[0] )
print( 'Number of columns: ', X.shape[1] )

Number of rows:  70000
Number of columns:  784


In [26]:
# Data dimensions of target dataframe
print( 'Number of rows: ', y.shape[0] )
print( 'Number of columns: ', y.shape[1] )

Number of rows:  70000
Number of columns:  1
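The 784 columns correspond to the 28 x 28 pixels of each image, and the single target column holds the digit label. A small sanity check along these lines can catch a mismatched load early. The helper name `check_digit_frames` and the random demo data are assumptions for illustration; it is a sketch, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def check_digit_frames(X, y, side=28, n_classes=10):
    """Return a list of human-readable problems (empty if none are found).
    Hypothetical helper for illustration."""
    problems = []
    if X.shape[0] != y.shape[0]:
        problems.append("X and y have different numbers of rows")
    if X.shape[1] != side * side:
        problems.append(f"expected {side * side} pixel columns, got {X.shape[1]}")
    # Labels are expected to be the integers 0 .. n_classes - 1.
    labels = {int(v) for v in y.iloc[:, 0].unique()}
    if not labels <= set(range(n_classes)):
        problems.append("labels fall outside the expected class range")
    return problems

# Demo on random stand-in data with the same layout as X and y.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.integers(0, 256, size=(5, 28 * 28)))
y_demo = pd.DataFrame([3, 1, 4, 1, 5])

ok_report = check_digit_frames(X_demo, y_demo)
bad_report = check_digit_frames(X_demo.iloc[:, :100], y_demo)
```

Calling `check_digit_frames( X, y )` on the frames built above should return an empty list if the data was loaded as described.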
