Digit Recognizer:

The competition is about classifying images from the popular MNIST dataset of handwritten digits, with ten classes from 0 to 9. Each image is 28 x 28 pixels, but the images are flattened and given as row vectors in the train and test files. The train and test files have 42K and 28K rows respectively, corresponding to 42K training images and 28K test images. The training file has 785 columns: the first column is the class label and the next 784 are the pixel intensities of the flattened image. The test file has 784 columns because it has no class labels. The sample submission file has 2 columns, ImageId and Label. We read this file into a dataframe, overwrite its Label column with our 28K predicted labels, and save the dataframe as submission.csv.
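As a small illustration of the flattened layout described above, the sketch below builds one synthetic row in the train.csv format (a label followed by 784 pixel values) and reshapes the pixels back into a 28 x 28 image. The random pixel values and the label 7 are made up, and row-major order (pixel index = row * 28 + column) is an assumption.

```python
import numpy as np

# Hypothetical stand-in for one row of train.csv: a label followed by 784 pixel values.
rng = np.random.default_rng(0)
row = np.concatenate(([7], rng.integers(0, 256, size=784)))

label = row[0]                  # first column: class label
pixels = row[1:]                # remaining 784 columns: flattened image
image = pixels.reshape(28, 28)  # assumed row-major: pixels 0..27 form the first image row

# Flattening again recovers the original 784-length row vector.
assert np.array_equal(image.reshape(-1), pixels)
print(label, pixels.shape, image.shape)
```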

In this notebook, we use a feedforward neural network (FFNN) to create a baseline. In the next notebook, we will use a convolutional neural network with data augmentation.

Steps:
1. Import the required modules/packages/libraries.
2. Read the training and test data.
3. Separate X_train (images) and y_train (class labels), and get NumPy arrays from the training and test data.
4. Normalize the train and test images.
5. One-hot encode the class labels.
6. Set up callbacks. We only use ReduceLROnPlateau, which reduces the learning rate (LR) during training when the monitored metric stops improving, based on the conditions specified in its setup.
7. Define the FFNN model using Sequential or Model, and compile it.
8. Display the model summary and model plot.
9. Fit (train) the model.
10. Predict labels for the test images.
11. Read sample_submission.csv into a dataframe.
12. Overwrite the "Label" column in the dataframe with the predictions.
13. Write the dataframe as submission.csv.

Please upvote the notebook if you find it useful.

In [None]:
#import the required modules/packages/libraries
import pandas as pd #for reading csv files
import numpy as np
import tensorflow as tf

from tensorflow import keras #we are going to code using keras (bundled with TensorFlow)
from tensorflow.keras.models import Sequential, Model #we can use Sequential or Model to define the model
from tensorflow.keras.layers import Dense, BatchNormalization, Activation, Input #layers used in the model
from tensorflow.keras.utils import to_categorical #for one-hot encoding target labels (keras.utils.np_utils no longer exists in recent Keras)
from tensorflow.keras.callbacks import ReduceLROnPlateau #allows the learning rate to be changed during training


In [None]:
#Let us read the data first
trdf=pd.read_csv('/kaggle/input/digit-recognizer/train.csv', header='infer')
tsdf=pd.read_csv('/kaggle/input/digit-recognizer/test.csv', header='infer')

#Separate X_train (image) and y_train (class label) and get numpy arrays
y_train=trdf["label"].values
X_train=trdf.iloc[:,1:].values

#numpy arrays of test images
X_test=tsdf.values

#delete the dataframes, they are no longer required
del trdf, tsdf

#Check the data type and size
print(type(X_train), X_train.shape) #ndarray of size 42K X 784
print(type(X_test), X_test.shape) #ndarray of size 28K X 784
print(type(y_train), y_train.shape) #ndarray of size 42K

In [None]:
#Normalize images so that pixel intensities are in the range -1 to 1. At present the images are
#8-bit grayscale images with intensities between 0 and 255.

X_train=(X_train-127.5)/127.5
X_test=(X_test-127.5)/127.5

#One-hot encode the class labels (y_train)
y_train=to_categorical(y_train)

print(type(y_train), y_train.shape) #Notice the output, y_train is now one-hot encoded
print(X_train.dtype, y_train.dtype, X_test.dtype)
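The scaling and encoding above can be checked on a few values with plain NumPy. This is a sketch rather than the notebook's code: indexing into np.eye is used here as a stand-in for to_categorical, and the sample pixels and labels are made up.

```python
import numpy as np

# (x - 127.5) / 127.5 maps 0 -> -1, 127.5 -> 0 and 255 -> 1.
pixels = np.array([0, 127.5, 255])
scaled = (pixels - 127.5) / 127.5
print(scaled)

# One-hot encoding: row i of the 10 x 10 identity matrix is the encoding of label i.
labels = np.array([3, 0, 9])
one_hot = np.eye(10, dtype=np.float32)[labels]
print(one_hot.shape)

# argmax undoes the encoding.
assert (one_hot.argmax(axis=1) == labels).all()
```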

In [None]:
#set up the callback
reduceLROnPlateau=ReduceLROnPlateau(monitor='val_accuracy', mode='max', patience=3, verbose=1, factor=0.5, min_lr=0.00001)
#We monitor validation accuracy. With metrics=['accuracy'], TF2 Keras logs it as 'val_accuracy'; the older name
#'val_acc' is not found, so the callback would only warn and never reduce the LR.
#patience=3 means we wait for 3 epochs without improvement before we set new lr = lr*factor.
#verbose=1 displays the update messages; with verbose=0, no messages are displayed.
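The plateau rule described in the comments can be sketched in plain Python. The function simulate_reduce_lr is hypothetical: a simplified model of the callback assuming mode='max', no cooldown and a min_delta of 1e-4 (the Keras default). The starting LR of 0.001 is Adam's default learning rate in Keras, and the accuracy history is invented.

```python
def simulate_reduce_lr(val_acc_history, lr=0.001, factor=0.5, patience=3,
                       min_lr=0.00001, min_delta=1e-4):
    """Hypothetical, simplified plateau rule (mode='max', cooldown=0).

    Returns the learning rate in effect after each epoch.
    """
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_acc_history:
        if acc > best + min_delta:      # improvement: remember it and reset the counter
            best = acc
            wait = 0
        else:
            wait += 1                   # one more epoch without improvement
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # reduce, but never below min_lr
                wait = 0
        lrs.append(lr)
    return lrs


history = [0.95, 0.96, 0.965, 0.965, 0.964, 0.963, 0.97]  # invented val accuracies
print(simulate_reduce_lr(history))
```

With this made-up history, the LR is halved after the sixth epoch, following three epochs without an improvement larger than min_delta. On a long plateau it keeps halving every three epochs until the min_lr floor of 0.00001 stops it.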

In [None]:
#Create model using Sequential
ffnn=Sequential() #allows us to add layers sequentially
ffnn.add(Dense(units=512, input_shape=(784,))) #Dense (fully connected) layer with 512 neurons
ffnn.add(BatchNormalization()) #batch normalization before activation
ffnn.add(Activation('relu')) #applying ReLU activation
ffnn.add(Dense(units=256))
ffnn.add(BatchNormalization())
ffnn.add(Activation('relu'))
ffnn.add(Dense(units=128))
ffnn.add(BatchNormalization())
ffnn.add(Activation('relu'))
ffnn.add(Dense(units=64))
ffnn.add(BatchNormalization())
ffnn.add(Activation('relu'))
ffnn.add(Dense(units=32))
ffnn.add(BatchNormalization())
ffnn.add(Activation('relu'))
ffnn.add(Dense(units=10))
ffnn.add(BatchNormalization())
ffnn.add(Activation('softmax')) #to output probability vector

#Set compile settings
ffnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#display summary of the model
ffnn.summary()

#display the model graph (plot_model requires pydot and graphviz to be installed)
tf.keras.utils.plot_model(ffnn)
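The parameter counts reported by ffnn.summary() can be reproduced by hand. Each Dense layer has n_in * n_out weights plus n_out biases, and each BatchNormalization layer has 4 parameters per unit: gamma and beta are trainable, while the moving mean and moving variance are not. The layer widths below are copied from the model definition above.

```python
# Layer widths from the model above; 784 is the flattened input size.
widths = [784, 512, 256, 128, 64, 32, 10]

dense_params = 0
bn_trainable = 0
bn_non_trainable = 0
for n_in, n_out in zip(widths[:-1], widths[1:]):
    dense_params += n_in * n_out + n_out  # weight matrix + bias vector
    bn_trainable += 2 * n_out             # gamma and beta
    bn_non_trainable += 2 * n_out         # moving mean and moving variance

total = dense_params + bn_trainable + bn_non_trainable
print("Total params:", total)                            # 580818
print("Trainable params:", dense_params + bn_trainable)  # 578814
print("Non-trainable params:", bn_non_trainable)         # 2004
```

Almost all of the parameters sit in the first Dense layer (784 * 512 + 512 = 401,920), which is typical for a fully connected network on raw pixels.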

In [None]:
'''
#Alternatively, the model can be created with the functional API using Model

inp=Input(shape=(784,)) #start with input layer

d1=Dense(units=512)(inp)
b1=BatchNormalization()(d1)
a1=Activation('relu')(b1)

d2=Dense(units=256)(a1)
b2=BatchNormalization()(d2)
a2=Activation('relu')(b2)

d3=Dense(units=128)(a2)
b3=BatchNormalization()(d3)
a3=Activation('relu')(b3)

d4=Dense(units=64)(a3)
b4=BatchNormalization()(d4)
a4=Activation('relu')(b4)

d5=Dense(units=32)(a4)
b5=BatchNormalization()(d5)
a5=Activation('relu')(b5)

d6=Dense(units=10)(a5)
b6=BatchNormalization()(d6)
a6=Activation('softmax')(b6)

ffnn=Model(inputs=inp, outputs=a6)
ffnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

ffnn.summary()
tf.keras.utils.plot_model(ffnn)

'''
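Both versions of the model end in a softmax and are compiled with loss='categorical_crossentropy'. For one-hot targets, this loss is the mean over samples of -sum_k y_k log(p_k), which reduces to -log of the probability assigned to the true class. A minimal NumPy sketch follows; the clipping constant eps is an assumption added to avoid log(0), and the logits and targets are made up.

```python
import numpy as np


def softmax(logits):
    # Subtracting the row maximum does not change the result but avoids overflow in exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # eps is an assumed clipping constant so that log(0) never occurs.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_prob), axis=1).mean()


logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.2]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype=float)  # one-hot targets: classes 0 and 1

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.round(3))
print(loss)
```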

In [None]:
#Fit the model. We set the number of epochs to 150 and the batch size to 120. validation_split=0.2 means
#that 20% of the training data is held out for validation.
ffnn.fit(x=X_train, y=y_train, epochs=150, callbacks=[reduceLROnPlateau], batch_size=120, validation_split=0.2)
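Keras documents that validation_split holds out the last fraction of the samples, taken before shuffling. The sketch below works through the resulting sizes for this notebook's numbers; the split-index arithmetic approximates what Keras does internally, and np.arange stands in for the rows of X_train.

```python
import math

import numpy as np

n_samples, validation_split, batch_size = 42000, 0.2, 120

# Index where the held-out validation rows begin.
split_at = int(n_samples * (1 - validation_split))

x = np.arange(n_samples)                  # stand-in for the row indices of X_train
x_fit, x_val = x[:split_at], x[split_at:]  # validation = the LAST 20% of rows

steps_per_epoch = math.ceil(len(x_fit) / batch_size)
print(len(x_fit), len(x_val), steps_per_epoch)  # 33600 8400 280
```

Because the held-out rows are simply the last 20% of train.csv, the split is only representative if the rows are not ordered (for example by label); shuffling X_train and y_train together before fitting is a common precaution.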

In [None]:
predictions=ffnn.predict(X_test) #make predictions on test images
print(predictions.shape) #This will be 28K x 10. For each test image, probability of 10 classes

#Decide the class label for each test image based on the maximum probability.
#This will be a vector of size 28K.
predictions=np.argmax(predictions,axis=1) 
print(predictions.shape) 


#read sample_submission.csv into a dataframe. The dataframe has 2 columns: ImageId and Label
sub=pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv', header='infer')

#overwrite the Label column with the 28K predicted labels
sub["Label"]=predictions

sub.to_csv('submission.csv', index=False)

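The last cell can be sketched end to end without a trained model: np.argmax turns each probability row into a label, and the submission can alternatively be built directly with pandas instead of reading the sample file. The probabilities below are made up, the direct construction assumes ImageId runs from 1 to the number of test images (as in sample_submission.csv), and the file name submission_sketch.csv is hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up class probabilities for 3 test images (each row sums to 1).
probs = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.70, 0.05, 0.05, 0.02, 0.02, 0.04, 0.04, 0.03, 0.03, 0.02],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55],
])
predictions = np.argmax(probs, axis=1)  # index of the largest probability in each row
print(predictions)  # [2 0 9]

# Assumption: ImageId is 1-based, matching sample_submission.csv.
sub = pd.DataFrame({"ImageId": np.arange(1, len(predictions) + 1), "Label": predictions})
out_path = os.path.join(tempfile.gettempdir(), "submission_sketch.csv")  # hypothetical file name
sub.to_csv(out_path, index=False)  # index=False keeps only the two required columns

check = pd.read_csv(out_path)
print(check.columns.tolist())  # ['ImageId', 'Label']
```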