Digit Recognizer:

The competition is about classifying the popular MNIST images of handwritten digits into ten classes, 0 to 9. Each image is 28 x 28 pixels, but the images are flattened and stored as row vectors in the train and test files. The train and test files have 42K and 28K rows respectively, one row per image. The training file has 785 columns: the first column is the class label and the next 784 are the pixel intensities of the flattened image. The test file has 784 columns because it has no class labels. The sample submission file has 2 columns: ImageId and Label. We need to overwrite the Label column with the 28K predicted labels and save the file as submission.csv.
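
The layout described above can be illustrated with a small synthetic array standing in for train.csv. The names `fake_train`, `fake_labels` and `fake_pixels` and their random contents are assumptions for illustration only; they are not real MNIST data.

```python
import numpy as np

# Synthetic stand-in for train.csv: 5 rows, 785 columns (label + 784 pixels).
# The values are random and purely illustrative, not real MNIST data.
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=(5, 1))
fake_pixels = rng.integers(0, 256, size=(5, 784))
fake_train = np.hstack([fake_labels, fake_pixels])

# The first column is the class label, the remaining 784 are pixel intensities.
labels = fake_train[:, 0]
pixels = fake_train[:, 1:]

# Each 784-long row unflattens into a 28 x 28 image.
# NumPy's default reshape is row-major, so pixel 28 becomes row 1, column 0.
images = pixels.reshape(-1, 28, 28)

print(fake_train.shape)  # (5, 785)
print(images.shape)      # (5, 28, 28)
```

The test file would be handled the same way, minus the label column.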

If you are interested in an FFNN baseline in Keras, see https://www.kaggle.com/priyankdl/ffnn-baseline-in-keras. For a CNN baseline in Keras, see https://www.kaggle.com/priyankdl/cnn-baseline-in-keras


Features of the model in this notebook:
1. 99.6%+ accuracy
2. No Leaks
3. Data Augmentation using ImageDataGenerator
4. Ensemble of 3 heterogeneous CNNs: CNN1 with 3 x 3 valid convolutions, CNN2 with 5 x 5 valid convolutions and CNN3 with 7 x 7 valid convolutions
5. No Pooling Layers
6. BatchNormalization Layers
7. ReLU activations
8. Valid convolutions
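
Because the networks use valid convolutions with stride 1 and no pooling, each layer shrinks the feature map by kernel_size - 1 pixels per side length. The sketch below works out the resulting spatial sizes. The helper `valid_conv_sizes` is a hypothetical name introduced here for illustration, and the layer counts (10, 5 and 4) are read off the three model definitions in this notebook.

```python
def valid_conv_sizes(input_size, kernel_size, num_layers, stride=1):
    """Spatial size after each 'valid' convolution (no padding, no pooling)."""
    sizes = [input_size]
    for _ in range(num_layers):
        # Standard output-size formula for a convolution without padding.
        sizes.append((sizes[-1] - kernel_size) // stride + 1)
    return sizes

# (name, kernel size, number of conv layers) for the three CNNs.
for name, k, n in [("CNN1 (3x3)", 3, 10), ("CNN2 (5x5)", 5, 5), ("CNN3 (7x7)", 7, 4)]:
    print(name, valid_conv_sizes(28, k, n))
# CNN1 (3x3) [28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8]
# CNN2 (5x5) [28, 24, 20, 16, 12, 8]
# CNN3 (7x7) [28, 22, 16, 10, 4]
```

So the shrinking feature map does the downsampling that pooling layers would otherwise do: CNN1 and CNN2 end at 8 x 8 and CNN3 at 4 x 4 before the Flatten layer.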

Steps:
1. Import required modules/packages/libraries
2. Load train and test data
3. Separate image label from train data
4. Normalize train and test images so that pixel intensities lie between -1 and 1.
5. Reshape training and test images to size 28 x 28 x 1 (1 channel)
6. One-hot encode the target variable
7. Define models for 3 different CNNs and create an instance of each type (model architectures are as discussed in https://arxiv.org/pdf/2008.10400v2.pdf).
8. Create an instance of ImageDataGenerator for image augmentation with rotation_range=10, zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1. We don't augment images with anything else. Augmentation is important for reducing overfitting.
9. Use the flow method of ImageDataGenerator with batch_size=120. This lets the model's fit method receive images in batches of size 120.
10. Set ReduceLROnPlateau
11. Train the 3 CNNs using the fit method for 150 epochs
12. Predict using the 3 CNNs (each model's predictions form a 28K x 10 matrix, as there are 10 classes)
13. Average the predicted probabilities from the 3 models.
14. Decide the final predictions using argmax on the averaged predictions.
15. Read sample_submission into a dataframe, overwrite the Label column with the final predictions and write the updated dataframe as submission.csv
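
Steps 4, 6, 13 and 14 can be sketched in plain NumPy, without Keras. The helper names `normalize`, `one_hot` and `soft_vote`, and the made-up probability matrices `p_a`, `p_b` and `p_c`, are assumptions for illustration; the notebook itself uses `to_categorical` and the real model outputs.

```python
import numpy as np

def normalize(pixels):
    """Map intensities from [0, 255] to [-1, 1] (step 4)."""
    return (pixels.astype(np.float32) - 127.5) / 127.5

def one_hot(labels, num_classes=10):
    """One-hot encode integer labels (step 6); a NumPy stand-in for to_categorical."""
    return np.eye(num_classes, dtype=np.float32)[labels]

def soft_vote(*probability_matrices):
    """Average class probabilities across models, then take argmax (steps 13-14)."""
    return np.argmax(np.mean(probability_matrices, axis=0), axis=1)

# Made-up probabilities for 2 images and 3 classes from 3 hypothetical models.
p_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_b = np.array([[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
p_c = np.array([[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]])

print(normalize(np.array([0, 255])))  # [-1.  1.]
print(one_hot(np.array([3]))[0])      # 1.0 at index 3, zeros elsewhere
print(soft_vote(p_a, p_b, p_c))       # [1 2]
```

For the first made-up image, two of the three models individually prefer class 0, yet the averaged probabilities favor class 1. This is how averaging can differ from a majority vote: it takes each model's confidence into account.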

That's it.

Submit.

Please Upvote if you find it useful.

In [None]:
import numpy as np 
import pandas as pd 

from tensorflow import keras
from keras import Sequential
from keras.layers import Dense, Conv2D, Flatten, BatchNormalization, Activation
from keras.utils import to_categorical  #keras.utils.np_utils is not importable in recent Keras releases
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
#from scipy.stats import mode
from sklearn.model_selection import train_test_split

#Define model with 3 x 3 valid convolution, kernel_size=3, stride 1, and ReLU activation. 
#Also use BatchNormalization
def my_model3():
    model=Sequential()
    model.add( Conv2D(filters=32, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False, input_shape=(X_train.shape[1:])) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=48, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=80, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=96, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=112, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=144, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=160, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=176, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add(Flatten())
    
    model.add(Dense(units=10))
    model.add(BatchNormalization())
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

#Define model with 5 x 5 valid convolution, kernel_size=5, stride 1, and ReLU activation. 
#Also use BatchNormalization
def my_model5():
    model=Sequential()
    
    model.add( Conv2D(filters=32, kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=False, input_shape=(X_train.shape[1:])) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=64, kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=96, kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=False ) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=128, kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=160, kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
        
    model.add(Flatten())
    
    model.add(Dense(units=10))
    model.add(BatchNormalization())
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

#Define model with 7 x 7 valid convolution, kernel_size=7, stride 1, and ReLU activation. 
#Also use BatchNormalization
def my_model7():
    model=Sequential()
    
    model.add( Conv2D(filters=48, kernel_size=(7,7), strides=(1,1), padding='valid', activation=None, use_bias=False, input_shape=(X_train.shape[1:])) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=96, kernel_size=(7,7), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=144, kernel_size=(7,7), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add( Conv2D(filters=192, kernel_size=(7,7), strides=(1,1), padding='valid', activation=None, use_bias=False) )
    model.add(BatchNormalization())
    model.add(Activation('relu'))
        
    model.add(Flatten())
    
    model.add(Dense(units=10))
    model.add(BatchNormalization())
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

'''
#The following function is useful if you want to implement the ensemble through majority vote
def mostCommon(mostC):
    val, count = mode(mostC, axis=1)
    return val.ravel()#.tolist() '''

#Read training and test data
X_train_full=pd.read_csv('/kaggle/input/digit-recognizer/train.csv', header='infer').values
X_test=pd.read_csv('/kaggle/input/digit-recognizer/test.csv', header='infer').values

#Separate label and images from the training data
X_train=X_train_full[:,1:]
y_train=X_train_full[:,0]

#Normalize train and test images
X_train = (X_train.astype(np.float32) - 127.5)/127.5
X_test = (X_test.astype(np.float32) - 127.5)/127.5

#If you wish to normalize intensities in the range of 0 to 1 use following
#X_train=X_train/255.
#X_test=X_test/255.

#delete X_train_full, you don't need it further
del X_train_full

#Reshape train and test images from 784 to 28 x 28 x 1
X_train=X_train.reshape(-1,28,28,1)
X_test=X_test.reshape(-1,28,28,1)

#One-hot encode class labels
y_train_vectors=to_categorical(y_train)

print(X_train.shape)
print(X_test.shape)

X_train, X_val, y_train, y_val= train_test_split(X_train, y_train_vectors, test_size=0.2, random_state=2)

#Create instance of 3 CNNs
model3=my_model3()
model5=my_model5()
model7=my_model7()


#Create instance of ImageDataGenerator for augmenting training images.
#Augmentation can help avoid overfitting
#We are using rotation_range=10,zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1. 
#Nothing else for augmentation

train_datagen = ImageDataGenerator(featurewise_center=False,
                             samplewise_center=False,
                             featurewise_std_normalization=False,
                             samplewise_std_normalization=False,
                             zca_whitening=False,
                             rotation_range=10,
                             zoom_range=0.1,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=False,
                             vertical_flip=False
                            )

#Use the flow method to pass images to the fit method in batches of size 120
train_generator = train_datagen.flow(X_train, y_train,
                                     batch_size=120,
                                     shuffle=True)

val_datagen = ImageDataGenerator()
val_generator = val_datagen.flow(X_val, y_val,
                                 batch_size=120,
                                 shuffle=True)

#Set how we plan to reduce learning rate on plateau
#With metrics=['accuracy'], TF 2.x Keras logs validation accuracy as 'val_accuracy'
reduceLROnPlateau = ReduceLROnPlateau(monitor='val_accuracy', 
                                patience=3,
                                verbose=1, 
                                factor=0.5,
                                min_lr=0.00001)


#fit 3 CNNs
model3.fit(train_generator, epochs=150, callbacks=[reduceLROnPlateau], validation_data=val_generator)
model5.fit(train_generator, epochs=150, callbacks=[reduceLROnPlateau], validation_data=val_generator)
model7.fit(train_generator, epochs=150, callbacks=[reduceLROnPlateau], validation_data=val_generator)

#Use the 3 trained CNNs to make predictions. 
#Each prediction variable is a matrix of size 28K x 10 as there are 10 classes
prediction_vectors3=model3.predict(X_test)
prediction_vectors5=model5.predict(X_test)
prediction_vectors7=model7.predict(X_test)

print(prediction_vectors3.shape)
print(prediction_vectors5.shape)
print(prediction_vectors7.shape)

#One way of ensembling: average the predictions of the 3 models and then use argmax to pick
#the class with the highest average probability
average_prediction_vectors=(prediction_vectors3+prediction_vectors5+prediction_vectors7)/3.
predictions_final=np.argmax(average_prediction_vectors, axis=1)

'''
Another way of ensembling
Decide prediction from individual model and then take the majority vote

#Following 3 lines decide prediction from individual models, each prediction variable will be 
#now vector of size 28K
predictions3=np.argmax(prediction_vectors3,axis=1)
predictions5=np.argmax(prediction_vectors5,axis=1)
predictions7=np.argmax(prediction_vectors7,axis=1)

print(predictions3.shape)
print(predictions5.shape)
print(predictions7.shape)

#Combine predictions from individual model in 1 matrix, 
#number of rows will be 28K but now number of columns is 3
predictions=np.stack([predictions3,predictions5,predictions7], axis=1)

#mostCommon is the function written to take the majority vote
#After call to it, predictions_final is back to vector of size 28K
predictions_final=mostCommon(predictions)
print(predictions_final.shape)

'''

#Read sample_submission.csv in dataframe sub
sub = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')

#Overwrite labels in dataframe sub
sub["Label"] = predictions_final

#Write updated dataframe as submission.csv
sub.to_csv('submission.csv',index=False)
#sub.head()

#Submit Now.
#Please Upvote if you find it useful.