This notebook trains a convolutional neural network (CNN) to recognize the MNIST handwritten digits 0 through 9.

In [None]:
import pandas as pd
import numpy as np

The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.

Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.

The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel-values of the associated image.

Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, decompose x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (zero-indexed).
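The index arithmetic above can be sketched in a few lines. The helper names `pixel_position` and `pixel_index` are made up for this illustration and are not part of the data set or any library.

```python
IMAGE_WIDTH = 28  # each image is 28 x 28 pixels


def pixel_position(x, width=IMAGE_WIDTH):
    """Return (row i, column j) for flat pixel index x, zero-indexed."""
    return divmod(x, width)


def pixel_index(i, j, width=IMAGE_WIDTH):
    """Inverse mapping: flat index x = i * width + j."""
    return i * width + j


print(pixel_position(0))     # (0, 0)
print(pixel_position(29))    # (1, 1)
print(pixel_position(783))   # (27, 27)
print(pixel_index(27, 27))   # 783
```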

# Load the data

In [None]:
train_data = pd.read_csv('../input/digit-recognizer/train.csv')
test_data = pd.read_csv('../input/digit-recognizer/test.csv')

train_data.head()

# Separate the label and features 

In [None]:
y_train = train_data['label']
x_train = train_data.drop(['label'], axis=1)

del train_data

# Plot the data

In [None]:
import seaborn as sns

sns.set(style='white', context='notebook', palette='Paired')

# Pass the labels by keyword; recent seaborn versions no longer accept them positionally
sns.countplot(x=y_train)

y_train.value_counts()

The plot shows that the ten classes are roughly balanced, so no resampling is needed and we can proceed.

# Handle null values or missing values 

In [None]:
x_train.isnull().any().describe()

In [None]:
test_data.isnull().any().describe()

There are no null or missing values, so we can move on.

# Normalization

Normalization is an important step when training a deep learning model, because it helps the model converge faster. Dividing by 255 rescales the pixel values from the range [0, 255] to [0, 1]. This grayscale normalization also reduces the effect of illumination differences between images.

In [None]:
x_train = x_train / 255.0
test_data = test_data / 255.0

x_train.head()
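As a quick sanity check on the scaling step, the sketch below applies the same division to a handful of made-up pixel values (not taken from the data set) and confirms that the result lies in [0, 1].

```python
import numpy as np

# Made-up 8-bit pixel values covering the full range
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Same operation as in the notebook: divide by the maximum possible value
scaled = pixels / 255.0

print(scaled.min(), scaled.max())  # 0.0 1.0
```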

# Reshaping

Convert each image from a 1D vector of 784 values to a 3D array of shape (28, 28, 1), where the third dimension is the channel (color) axis. An RGB image would have 3 channels; MNIST images are grayscale, which is why the channel dimension is 1.

In [None]:
x_train = x_train.values.reshape(-1, 28 , 28, 1)
test_data = test_data.values.reshape(-1, 28 , 28, 1)

In [None]:
test_data.shape

In [None]:
x_train.shape
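To see what the reshape does to individual pixels, here is a small sketch on two fake "images" filled with consecutive integers (placeholder data, not MNIST). It checks that flat pixel x = i * 28 + j ends up at row i, column j of the reshaped array.

```python
import numpy as np

# Two fake flattened images with 784 values each (placeholder data)
flat = np.arange(2 * 784).reshape(2, 784)

# Same reshape as in the notebook: -1 lets NumPy infer the number of images
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)  # (2, 28, 28, 1)

# Flat pixel index i * 28 + j maps to row i, column j, channel 0
i, j = 3, 5
print(images[1, i, j, 0] == flat[1, i * 28 + j])  # True
```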

# Encode the y_train (labels) 

In [None]:
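# keras.utils.np_utils was removed in recent Keras releases; import from keras.utils instead
from keras.utils import to_categorical

# One-hot encode the labels, e.g. 2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes=10)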

In [None]:
y_train[0]
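For intuition, one-hot encoding can be reproduced with plain NumPy. This is only a sketch of the idea, not how Keras implements `to_categorical`; the function name `one_hot` and the example labels are made up.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Map integer labels to rows of the identity matrix."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]


# Hypothetical labels, for illustration only
encoded = one_hot([1, 0, 4])
print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
```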

# Split the data into training and validation sets

In [None]:
from sklearn.model_selection import train_test_split

random_seed = 4

x, x_val, y, y_val = train_test_split(x_train, y_train, test_size=0.1, random_state=random_seed)

In [None]:
import matplotlib.pyplot as plt

# Show the first six training images
plt.figure(figsize=(10, 10))

for i in range(6):
    plt.subplot(3, 3, i + 1)
    plt.imshow(x[i][:, :, 0])

plt.show()

# Model Architecture 

In [None]:
# Import only what this notebook uses; Conv2D and MaxPooling2D live in keras.layers
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.utils import plot_model
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

In [None]:
model = Sequential()

# Block 1: two 5x5 convolutions with 32 filters, then 2x2 max pooling (28x28 -> 14x14)
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same',
                 activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same',
                 activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Block 2: two 3x3 convolutions with 64 filters, then 2x2 max pooling (14x14 -> 7x7)
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same',
                 activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same',
                 activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))

# Classifier head: flatten, one hidden dense layer, softmax over the 10 digits
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.summary()
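The parameter counts reported by the summary can be cross-checked by hand. The sketch below recomputes them from the layer definitions using the standard formulas for convolutional and dense layers with biases; the helper names are made up for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kh * kw * in_channels * filters) plus one bias per filter."""
    kh, kw = kernel
    return kh * kw * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


side = 28  # input images are 28 x 28 x 1
counts = {}

counts["conv 5x5, 32 (a)"] = conv2d_params((5, 5), 1, 32)
counts["conv 5x5, 32 (b)"] = conv2d_params((5, 5), 32, 32)
side //= 2  # 2x2 max pooling: 28 -> 14

counts["conv 3x3, 64 (a)"] = conv2d_params((3, 3), 32, 64)
counts["conv 3x3, 64 (b)"] = conv2d_params((3, 3), 64, 64)
side //= 2  # 2x2 max pooling: 14 -> 7

flat_size = side * side * 64  # 'same' padding keeps the spatial size, so 7 * 7 * 64
counts["dense 256"] = dense_params(flat_size, 256)
counts["dense 10"] = dense_params(256, 10)

for name, count in counts.items():
    print(f"{name:18s} {count:>8,d}")

total = sum(counts.values())
print("flattened size:", flat_size)  # 3136
print("total parameters:", total)    # 887530
```

Most of the parameters sit in the first dense layer, which is typical when a flattened feature map feeds a fully connected layer.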

In [None]:
plot_model(model, show_shapes=True)

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

ReduceLROnPlateau helps training settle once progress stalls: it multiplies the learning rate by a factor of 0.5 (i.e. halves it) whenever the monitored value (here, validation accuracy) has not improved for three epochs (patience), and it never goes below min_lr.

ModelCheckpoint saves the model whenever the monitored value (here, validation loss) is lower than the best value seen so far (mode='min' with save_best_only=True).

In [None]:
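filepath = './model-ep{epoch:02d}-acc{val_accuracy:.3f}.h5'
callbacks = [
    # The metric is logged as 'val_accuracy' (matching the filename template above),
    # so the callback must monitor that key rather than 'val_acc'
    ReduceLROnPlateau(monitor='val_accuracy',
                      patience=3,
                      verbose=1,
                      factor=0.5,
                      min_lr=0.00001),
    ModelCheckpoint(filepath=filepath, save_best_only=True,
                    monitor='val_loss', mode='min'),
]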
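To make the learning-rate schedule concrete, here is a simplified pure-Python simulation of the "reduce on plateau" rule with the same factor, patience, and min_lr as above. It is a sketch of the idea, not Keras's implementation (it ignores options such as cooldown and min_delta). The starting rate of 0.001 assumes Adam's default, and the validation accuracies are made up.

```python
def simulate_reduce_on_plateau(values, initial_lr=0.001, factor=0.5,
                               patience=3, min_lr=0.00001):
    """Return the learning rate in effect after each epoch (simplified)."""
    lr = initial_lr
    best = float("-inf")  # higher is better for an accuracy-like metric
    wait = 0              # epochs since the last improvement
    schedule = []
    for value in values:
        if value > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never drop below min_lr
                wait = 0
        schedule.append(lr)
    return schedule


# Made-up validation accuracies, for illustration only
val_accuracy = [0.90, 0.95, 0.95, 0.94, 0.95, 0.96]
schedule = simulate_reduce_on_plateau(val_accuracy)
print(schedule)  # [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005]
```

In this example the accuracy fails to beat 0.95 for three epochs in a row, so the rate is halved at the fifth epoch.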

In [None]:
EPOCHS = 20

history = model.fit(x,
                    y,
                    verbose=1,
                    epochs=EPOCHS,
                    validation_data=(x_val, y_val),
                    callbacks=callbacks)

# Plot the validation loss and training loss

In [None]:
plt.plot(history.history['loss'], color='r')
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epochs')
plt.legend(['training', 'validation'], loc='upper right')
plt.show()
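Besides plotting, the recorded curves can be queried directly to see which epoch had the lowest validation loss and which had the highest validation accuracy. The sketch below uses a made-up dictionary with the same keys that `history.history` is assumed to contain; the numbers are illustrative, not results from this notebook.

```python
# Stand-in for history.history, with made-up values for illustration
history_dict = {
    "val_loss": [0.060, 0.041, 0.035, 0.038, 0.036],
    "val_accuracy": [0.981, 0.987, 0.989, 0.990, 0.991],
}

epochs = range(len(history_dict["val_loss"]))

# Index of the epoch with the lowest validation loss
best_loss_epoch = min(epochs, key=lambda e: history_dict["val_loss"][e])

# Index of the epoch with the highest validation accuracy
best_acc_epoch = max(epochs, key=lambda e: history_dict["val_accuracy"][e])

# Epochs are reported 1-based, matching the Keras training log
print("lowest val_loss at epoch", best_loss_epoch + 1)          # 3
print("highest val_accuracy at epoch", best_acc_epoch + 1)      # 5
```

The two epochs need not coincide, which is why the choice between the final model and the best checkpoint is worth checking.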

# Evaluate the model 

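Inspect the model's predictions on the test set. The test set has no labels, so this is a visual check rather than a measured score. I am using the model from the final epoch, which is assumed here to have the highest accuracy, rather than the checkpoint with the lowest validation loss.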

In [None]:
rows = 5
cols = 5

plt.figure(figsize=(10,10))
for index in range(rows*cols):
    img = test_data[index].reshape(1, 28, 28, 1)
    pred = np.argmax(model.predict(img))
    plt.subplot(rows, cols, index+1)
    plt.imshow(test_data[index][:,:,0])
    plt.xlabel('Predicted : {}'.format(pred))

plt.tight_layout()
plt.show()

# Submission

In [None]:
%%time

# Predict all test images in one batched call instead of one image at a time
probabilities = model.predict(test_data)

# Pick the most likely digit for each image
results = np.argmax(probabilities, axis=1)

In [None]:
# ImageId is 1-based; derive its length from the predictions instead of hard-coding it
submission = pd.DataFrame({
    'ImageId': np.arange(1, len(results) + 1),
    'Label': results,
})

In [None]:
submission.to_csv('submission.csv', index=False)