Digit Recognizer:

The competition is about classifying the popular MNIST images of handwritten digits into ten classes, 0 to 9. Each image is 28 x 28 pixels, but the images are flattened and stored as row vectors in the train and test files. The train and test files have 42K and 28K rows respectively, one row per image. The training file has 785 columns: the first column is the class label and the next 784 are the pixel intensities of the flattened image. The test file has 784 columns because it has no class labels. The sample submission file has 2 columns: the first is ImageId and the second is Label. We read this file into a dataframe, overwrite the Label column with the 28K predicted labels, and save the dataframe as submission.csv.
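
As a small illustration of the layout described above, the sketch below builds one synthetic training row (not real competition data) and recovers the label and the 28 x 28 image from it. The helper name `split_row` is hypothetical, and the row-major pixel order is an assumption.

```python
import numpy as np

def split_row(row):
    """Split one training row into (label, 28 x 28 image).

    Assumes column 0 is the class label and columns 1..784 are the
    row-major flattened pixel intensities.
    """
    label = int(row[0])
    image = row[1:].reshape(28, 28)
    return label, image

# Synthetic row for illustration only: label 7, pixel values 0..783.
row = np.concatenate(([7], np.arange(784)))
label, image = split_row(row)

# With row-major order, pixel (r, c) comes from flat index r * 28 + c.
print(label, image.shape, image[1, 0])
```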

In this notebook, we use a convolutional neural network (CNN) to create a baseline. In the next notebook, we will use a better CNN with ensemble learning. For a feed-forward neural network (FFNN) baseline, see the notebook: https://www.kaggle.com/priyankdl/ffnn-baseline-in-keras

Steps:
1. Import required modules/packages/libraries
2. Read training and test data
3. Separate X_train (image) and y_train (class label) and get numpy arrays from training data
4. Normalize train and test images
5. One-hot encode class labels and create train and validation split for training.
6. Define CNN model using Sequential or Model. Compile model
7. Display model summary and model plot. 
8. Create an instance of ImageDataGenerator for image augmentation with rotation_range=10, zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1. We do not augment images with anything else. Augmentation helps reduce overfitting. We create one generator for training and one, without augmentation, for validation.
9. Use the flow method of ImageDataGenerator with batch_size=120. This lets the fit method of the model receive images in batches of 120.
10. Setup callbacks
11. Fit Model
12. Predict for test images
13. Read sample_submission.csv in a dataframe
14. Overwrite "Label" column in the dataframe with predictions
15. Write dataframe as submission.csv
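
To make the augmentation settings in step 8 more concrete, here is a rough sketch of what they mean for a 28 x 28 image. It assumes the usual ImageDataGenerator conventions: rotation is drawn from plus or minus `rotation_range` degrees, a float `zoom_range` of z means a zoom factor in [1 - z, 1 + z], and a float shift range below 1 is a fraction of the image size. The helper `augmentation_bounds` is hypothetical and is not part of Keras.

```python
def augmentation_bounds(image_size=28, rotation_range=10, zoom_range=0.1,
                        width_shift_range=0.1, height_shift_range=0.1):
    """Translate the augmentation settings into concrete bounds.

    Hypothetical helper for illustration; it only does arithmetic and
    does not transform any image.
    """
    return {
        "rotation_degrees": (-rotation_range, rotation_range),
        "zoom_factor": (1 - zoom_range, 1 + zoom_range),
        "max_width_shift_px": width_shift_range * image_size,
        "max_height_shift_px": height_shift_range * image_size,
    }

bounds = augmentation_bounds()
for name, value in bounds.items():
    print(name, value)
```

Under these assumptions, each digit is shifted by at most about 2.8 pixels in either direction, which keeps it inside the frame.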

Please Upvote the notebook, if you find it useful.

In [None]:
import numpy as np 
import pandas as pd
import tensorflow as tf

#Import everything from tensorflow.keras so all pieces come from the same Keras build
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, BatchNormalization, Activation, MaxPooling2D
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split

In [None]:
!nvidia-smi

In [None]:
#Read training and test data
X_train_full=pd.read_csv('/kaggle/input/digit-recognizer/train.csv', header='infer').values
X_test=pd.read_csv('/kaggle/input/digit-recognizer/test.csv', header='infer').values

#Separate label and images from the training data
X_train=X_train_full[:,1:]
y_train=X_train_full[:,0]

#Normalize train and test images
X_train = (X_train.astype(np.float32) - 127.5)/127.5
X_test = (X_test.astype(np.float32) - 127.5)/127.5
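
The normalization above maps pixel intensities from [0, 255] to [-1, 1]. The sketch below checks this on a few hand-picked values and shows the inverse mapping. The function names `normalize` and `denormalize` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

def normalize(pixels):
    """Map intensities in [0, 255] to floats in [-1, 1]."""
    return (pixels.astype(np.float32) - 127.5) / 127.5

def denormalize(scaled):
    """Inverse of normalize: map [-1, 1] back to uint8 intensities."""
    return np.rint(scaled * 127.5 + 127.5).astype(np.uint8)

# Hand-picked values for illustration: darkest, two mid-greys, brightest.
raw = np.array([0, 127, 128, 255], dtype=np.uint8)
scaled = normalize(raw)
restored = denormalize(scaled)

print(scaled)
print(restored)
```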

In [None]:
#Reshape train and test images from 784 to 28 x 28 x 1
X_train=X_train.reshape(-1,28,28,1)
X_test=X_test.reshape(-1,28,28,1)

#One-hot encode class labels
y_train_vectors=to_categorical(y_train)

#Create train and validation split
X_train, X_val, y_train, y_val= train_test_split(X_train, y_train_vectors, test_size=0.2, random_state=2)
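
For readers who want to see what one-hot encoding produces, here is a minimal NumPy sketch. It is an illustration of the idea only, not the implementation of `to_categorical`, and the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[labels]

# Three made-up labels for illustration.
labels = np.array([3, 0, 9])
vectors = one_hot(labels)

print(vectors.shape)
print(vectors[0])

# argmax along the class axis recovers the original labels.
recovered = np.argmax(vectors, axis=1)
```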

In [None]:
#Define CNN
cnn=Sequential()

cnn.add(Conv2D(filters=32, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False, input_shape=(X_train.shape[1:])))
cnn.add(BatchNormalization())
cnn.add(Activation('relu'))
cnn.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False))
cnn.add(BatchNormalization())
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2)))

cnn.add(Conv2D(filters=96, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False))
cnn.add(BatchNormalization())
cnn.add(Activation('relu'))
cnn.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='valid', activation=None, use_bias=False))
cnn.add(BatchNormalization())
cnn.add(Activation('relu'))


cnn.add(Flatten())
    
cnn.add(Dense(units=10))
cnn.add(BatchNormalization())
cnn.add(Activation('softmax'))

cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
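
As a sanity check on the architecture above, the sketch below traces the spatial size of the feature maps by hand. It assumes 'valid' padding, 3 x 3 kernels with stride 1, and a 2 x 2 max pool whose stride equals its pool size. The helper names are hypothetical.

```python
def conv_valid(size, kernel=3, stride=1):
    """Output width/height of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

def max_pool(size, pool_size=2):
    """Output width/height of a max pool with stride equal to pool_size."""
    return size // pool_size

size = 28
size = conv_valid(size)  # 26 after the 32-filter convolution
size = conv_valid(size)  # 24 after the 64-filter convolution
size = max_pool(size)    # 12 after max pooling
size = conv_valid(size)  # 10 after the 96-filter convolution
size = conv_valid(size)  # 8 after the 128-filter convolution

# Flatten turns the final 8 x 8 x 128 volume into one vector.
flat_units = size * size * 128
print(size, flat_units)
```

The numbers can be compared against the output shapes printed by `cnn.summary()`.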

In [None]:
#Create instance of ImageDataGenerator for augmenting training images.
#Augmentation can help avoid overfitting
#We are using rotation_range=10,zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1. 
#Nothing else for augmentation

train_datagen = ImageDataGenerator(featurewise_center=False,
                             samplewise_center=False,
                             featurewise_std_normalization=False,
                             samplewise_std_normalization=False,
                             zca_whitening=False,
                             rotation_range=10,
                             zoom_range=0.1,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=False,
                             vertical_flip=False
                            )

#Use flow method to pass images to fit method in batches of size 120
train_generator = train_datagen.flow(X_train, y_train,
                                     batch_size=120,
                                     shuffle=True)

val_datagen = ImageDataGenerator()
val_generator = val_datagen.flow(X_val, y_val,
                                 batch_size=120,
                                 shuffle=True)

In [None]:
#Set how we plan to reduce learning rate on plateau
#With metrics=['accuracy'] in TF 2.x, the logged validation metric is 'val_accuracy'
reduceLROnPlateau = ReduceLROnPlateau(monitor='val_accuracy', 
                                patience=3,
                                verbose=1, 
                                factor=0.5,
                                min_lr=0.00001)
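
To give a feel for how this schedule behaves, here is a simplified pure-Python simulation of the reduce-on-plateau rule. It is a sketch, not the Keras implementation: it ignores cooldown, assumes a starting learning rate of 0.001 and a minimum improvement of 1e-4, and the validation accuracies are made up. The function name `simulate_reduce_lr` is hypothetical.

```python
def simulate_reduce_lr(val_accuracies, initial_lr=0.001, factor=0.5,
                       patience=3, min_lr=0.00001, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_accuracies:
        if acc > best + min_delta:
            # Validation accuracy improved: remember it and reset the counter.
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: halve, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Made-up accuracies: two improving epochs, then a long plateau.
lr_history = simulate_reduce_lr([0.98, 0.99] + [0.99] * 30)
print(lr_history[4], lr_history[-1])
```

In this simulation the rate is halved every three stalled epochs until it is clipped at min_lr.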

In [None]:
#Display model summary and plot
cnn.summary()

tf.keras.utils.plot_model(cnn)

In [None]:
#Fit/Train CNN
cnn.fit(train_generator, epochs=150, callbacks=[reduceLROnPlateau], validation_data=val_generator, steps_per_epoch=400)
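
The arithmetic below relates steps_per_epoch=400 to the size of the training split. It assumes 42,000 training rows, a 20% validation split, and a generator that keeps yielding augmented batches past one pass over the data. All variable names are for illustration only.

```python
import math

n_total = 42_000        # rows in train.csv, per the description above
val_fraction = 0.2      # test_size passed to train_test_split
batch_size = 120
steps_per_epoch = 400

n_val = round(n_total * val_fraction)
n_train = n_total - n_val

# Batches needed to see every training image once.
batches_per_pass = math.ceil(n_train / batch_size)

# Images drawn from the generator in one epoch of 400 steps.
images_per_epoch = steps_per_epoch * batch_size
passes_per_epoch = images_per_epoch / n_train

print(n_train, n_val, batches_per_pass, images_per_epoch)
print(round(passes_per_epoch, 2))
```

Under these assumptions, one epoch of 400 steps covers roughly 1.43 passes over the training images, each pass with fresh random augmentations.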

In [None]:
#Predict on test images, 10 probabilities for each test image as there are 10 classes.
#So, basically prediction_vectors is 28K x 10
prediction_vectors=cnn.predict(X_test)

#Decide the label from probabilities, so now, predictions_final is a vector of 28K labels
predictions_final=np.argmax(prediction_vectors, axis=1)
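
The toy example below shows the argmax step on a made-up probability matrix with three classes instead of ten. The values are invented for illustration and are not model outputs.

```python
import numpy as np

# Four made-up samples, three classes each; every row sums to 1.
prediction_vectors = np.array([
    [0.05, 0.90, 0.05],
    [0.10, 0.20, 0.70],
    [0.60, 0.30, 0.10],
    [0.25, 0.25, 0.50],
])

# argmax over the class axis gives one label per sample.
predictions_final = np.argmax(prediction_vectors, axis=1)

print(prediction_vectors.shape, predictions_final.shape)
print(predictions_final)
```

The class axis disappears, so a 28K x 10 matrix becomes a vector of 28K labels.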

In [None]:
#Read sample_submission.csv in dataframe sub
sub = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')

#Overwrite labels in dataframe sub
sub["Label"] = predictions_final

#Write updated dataframe as submission.csv
sub.to_csv('submission.csv',index=False)
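
As a quick check of the submission format, the sketch below builds a tiny stand-in for the sample submission and writes it to an in-memory buffer instead of a file. It assumes ImageId starts at 1, and the four predictions are made up.

```python
import io

import numpy as np
import pandas as pd

# Made-up predictions for four images.
predictions = np.array([2, 0, 9, 4])

# Stand-in for sample_submission.csv: ImageId 1..n with placeholder labels.
sub = pd.DataFrame({
    "ImageId": np.arange(1, len(predictions) + 1),
    "Label": np.zeros(len(predictions), dtype=int),
})

# Overwrite the placeholder labels with the predictions.
sub["Label"] = predictions

# Write to a buffer so the example leaves no file behind.
buffer = io.StringIO()
sub.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Passing index=False keeps the dataframe index out of the file, so only the two expected columns are written.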
