# MNIST Classification using CNN

**This notebook is a basic introduction to image classification in TensorFlow. Here you can learn about:**
* Reading images stored as rows of a CSV file
* Visualizing the image
* Reshaping the data and handling image color channels
* Data Augmentation using TensorFlow ImageDataGenerator
* Building a CNN model with an introduction to the layers used
* Plotting the accuracy and loss curves
* Reusable helper functions

So, go ahead and read the notebook for a quick and easy introduction to CNN image classification.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import model_selection
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Input, BatchNormalization, Dropout, Conv2D, MaxPooling2D

**DATA EXPLORATION**

In [None]:
train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
train.head(5)

In [None]:
x_train = train.iloc[:,1:]
y_train = train.iloc[:,0]

After inspecting the dataset given as a CSV file, I separate the image inputs (784 features) and the labels (1 feature).

In [None]:
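# Count samples per digit and sort by digit so that labels and bar heights line up
# (unique() and value_counts() return their results in different orders)
digit_counts = y_train.value_counts().sort_index()
sns.barplot(x=digit_counts.index, y=digit_counts.values)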
plt.xlabel('Digits')
plt.ylabel('Number of image samples')

Looking at this class distribution graph, I can see that the classes are almost evenly distributed. Therefore, no resampling needs to be applied.
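The visual impression can be backed by a number. The sketch below computes the ratio between the most and the least frequent class; the helper name `imbalance_ratio` and the toy labels are illustrative assumptions and not part of the original notebook.

```python
import numpy as np

def imbalance_ratio(labels, num_classes=10):
    """Return (counts, ratio), where ratio = largest class count / smallest class count."""
    counts = np.bincount(np.asarray(labels), minlength=num_classes)
    return counts, counts.max() / counts.min()

# Hypothetical toy labels; in the notebook you would pass y_train instead
toy_labels = [0, 1, 1, 2, 2, 2]
counts, ratio = imbalance_ratio(toy_labels, num_classes=3)
print(counts, ratio)
```

A ratio close to 1 means the classes are balanced. The helper assumes every class appears at least once, otherwise the division is by zero.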

In [None]:
def visualize_helper(index):
    plt.imshow(np.array(x_train.iloc[index, :]).reshape((28,28)), cmap = 'binary_r')
    plt.axis('off')
    plt.show()

visualize_helper(10)

Here I simply visualize one image. Since the images are stored as CSV rows, I reshape each row of 784 values into the image format (28 x 28 = 784). I wrote a helper function for this purpose. You can modify and use this function according to your needs as well - just give reference credit to this notebook!

In [None]:
x_train = np.array(x_train).reshape(-1,28,28,1)

We are reshaping the data into images.
* -1 lets NumPy infer the number of images from the size of the dataset
* 28, 28 is the height and width of each image
* 1 is the number of channels, as the images are grayscale (Note - for RGB images, the channel value should be 3)
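The reshape can be checked on a small stand-in array. This is a minimal sketch; the array `flat` is made up for illustration and only mimics the layout of the CSV pixels.

```python
import numpy as np

# Hypothetical stand-in for the CSV pixels: 5 flattened 28x28 images
flat = np.arange(5 * 784).reshape(5, 784)

# -1 lets NumPy infer the sample count from the total number of values
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)

# Row-major order is preserved: pixel k of a CSV row lands at (k // 28, k % 28)
k = 100
same_pixel = images[2, k // 28, k % 28, 0] == flat[2, k]
print(same_pixel)
```

Because the order of the values is unchanged, the reshape only adds structure and never moves pixel data around.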

**DATA AUGMENTATION**

In [None]:
train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range = 0.1,
    height_shift_range = 0.1,
    validation_split=0.2
)

train_generator = train_datagen.flow(x_train, y_train, batch_size=64, subset='training')
validation_generator = train_datagen.flow(x_train, y_train, batch_size=64,subset='validation')

Data augmentation is performed in order to introduce variations that we may encounter in the test set. These variations are domain specific, so please pick and choose them according to your problem domain. Here I have used the following variations:
* Rescale - multiplies every pixel by 1/255, mapping the values from 0-255 to the range 0 to 1
* Rotation Range - rotates the image randomly by up to +/-20 degrees
* Zoom Range - zooms in or out of the image randomly by up to 20%
* Height Shift Range - shifts the image vertically by up to +/-10% of its height
* Width Shift Range - shifts the image horizontally by up to +/-10% of its width

Also, I have reserved 20% of the training set as the validation set. Note that both subsets come from the same generator, so the validation images are augmented as well.

The augmentations are applied via the generator function provided by TF. Check this out for more info - https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
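To get a feel for what two of these settings mean numerically, here is a simplified NumPy sketch. It is not how the Keras generator is implemented: the helpers `rescale` and `shift_horizontally` are hypothetical, the shift uses whole pixels and fills the gap with zeros, whereas the generator draws a random offset and by default fills with the nearest pixel values.

```python
import numpy as np

def rescale(image, factor=1.0 / 255):
    """Scale raw pixel values (0-255) into the range 0-1."""
    return image.astype("float32") * factor

def shift_horizontally(image, pixels):
    """Shift a 2-D image right (pixels > 0) or left (pixels < 0), filling with zeros."""
    shifted = np.zeros_like(image)
    if pixels > 0:
        shifted[:, pixels:] = image[:, :-pixels]
    elif pixels < 0:
        shifted[:, :pixels] = image[:, -pixels:]
    else:
        shifted[:] = image
    return shifted

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))  # made-up image, not MNIST data

scaled = rescale(image)

# A shift range of 0.1 on a 28-pixel-wide image allows offsets of up to 2.8 pixels
max_offset = 0.1 * image.shape[1]
moved = shift_horizontally(image, 2)
```

For 28x28 digits, a 10% shift is therefore under three pixels, which keeps the digit inside the frame in most cases.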

**MODEL BUILDING**

In [None]:
nn_model = Sequential([
    
    Input(shape=(28,28,1)),
    Conv2D(16, 8, activation='relu', padding='same'),
    MaxPooling2D(),
    BatchNormalization(),
    Dropout(0.4),
    Conv2D(32, 6, activation='relu', padding='same'),
    MaxPooling2D(),
    BatchNormalization(),
    Dropout(0.3),
    Flatten(),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.2),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.1),
    Dense(10, activation='softmax')
])

nn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

nn_model.summary()

* The Input layer defines the shape of the images that are fed into the CNN

* Convolution Layer - These are important when associations among nearby features matter. Here, pixels in opposite corners of the image are unlikely to contribute jointly to the classification, while neighbouring pixels form strokes and edges. Therefore, I used convolutional layers

* Max Pooling Layer - It selects the maximum element from the region of the feature map covered by the pooling window. So, the output is a smaller feature map containing the most prominent features of the previous feature map.

* Batch Normalization - It is a technique that standardizes the inputs to a layer for each mini-batch. This stabilizes the learning process and can substantially reduce the number of epochs required to train deep networks.

* Dropout - It reduces co-dependence among features by randomly setting a given fraction of a layer's outputs to zero during training.

* Flatten - It is used when we get a multidimensional output and want to turn it into a vector to pass on to a Dense layer. It is equivalent to applying numpy.ravel to each sample (look at the shape before and after Flatten in the model summary)

* Dense - These are used when an association can exist between any feature and any other feature in a data point
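Three of the layers described above can be imitated in a few lines of NumPy. This is a sketch for intuition only, not the Keras implementation: the function names are hypothetical, the batch normalization omits the learned scale and offset, and the dropout shows training-time behaviour only.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def batch_standardize(batch, eps=1e-3):
    """Standardize each feature (column) over the batch; learned scale/offset omitted."""
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + eps)

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the activations and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)   # [[4, 2], [2, 8]]
flattened = pooled.ravel()           # what Flatten does to a single sample

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))  # made-up activations
normalized = batch_standardize(batch)
dropped = dropout(np.ones(1000), rate=0.4, rng=rng)
```

After `batch_standardize`, every column has a mean of roughly zero. After `dropout`, about 40% of the values are zero and the rest are scaled up by 1 / 0.6, so the expected sum stays the same.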

**The output layer in a multi-class classification problem like this one should be a softmax layer with the number of output classes as its parameter**
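What the softmax output and the `sparse_categorical_crossentropy` loss compute can be sketched in NumPy. The function names and the `logits` values below are made up for illustration; this is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability of the true class, given integer labels."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.log(true_class_probs).mean()

# Hypothetical raw scores for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
probs = softmax(logits)            # each row sums to 1
predicted = probs.argmax(axis=1)   # class with the highest probability
loss = sparse_categorical_crossentropy(probs, np.array([0, 2]))
```

The "sparse" variant takes integer labels such as 0 to 9 directly, which is why the labels in this notebook do not need to be one-hot encoded.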

Note - The choice of parameters is not explained here, as it would take a good deal of explanation. I will make a separate notebook for this purpose. Please refer to the TensorFlow documentation for now - https://www.tensorflow.org/api_docs/python/tf/keras/layers
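Even without discussing why these values were chosen, it helps to see how they determine the shapes and weight counts reported by `nn_model.summary()`. The arithmetic below is a sketch under two assumptions: `MaxPooling2D()` uses its default 2x2 window, and the BatchNormalization parameters are left out. The helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """kernel_size x kernel_size x in_channels weights per filter, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units

side, channels = 28, 1

# Block 1: Conv2D(16, 8, padding='same') keeps 28x28; MaxPooling2D() halves it to 14x14
conv1 = conv2d_params(8, channels, 16)
side, channels = side // 2, 16

# Block 2: Conv2D(32, 6, padding='same') keeps 14x14; MaxPooling2D() halves it to 7x7
conv2 = conv2d_params(6, channels, 32)
side, channels = side // 2, 32

# Flatten turns the 7x7x32 feature maps into one vector per image
flat_size = side * side * channels

dense1 = dense_params(flat_size, 256)
dense2 = dense_params(256, 128)
output = dense_params(128, 10)

print(conv1, conv2, flat_size, dense1, dense2, output)
```

Most of the weights sit in the first Dense layer, because it connects all 1568 flattened values to 256 units.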

In [None]:
early_stopping = tf.keras.callbacks.EarlyStopping(patience=20)

Early stopping is a method that allows you to specify an arbitrarily large number of training epochs and stop training once the model performance stops improving on a hold-out validation dataset. It helps to prevent overfitting of the model. Here, training stops once the validation loss has not improved for 20 consecutive epochs.
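The idea behind `patience` can be written as a plain loop. This is a minimal sketch of the logic, not the Keras callback itself; the function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 3
losses = [0.9, 0.6, 0.5, 0.55, 0.52, 0.51, 0.6]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)
```

With these values the loop stops at epoch 6, three epochs after the best loss. Note that in Keras the model keeps the weights of the last epoch unless `restore_best_weights=True` is passed to `EarlyStopping`.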

In [None]:
history=nn_model.fit(train_generator, validation_data = validation_generator, epochs=200, callbacks=[early_stopping])

In [None]:
def acc_loss_plot():
    fig, ax = plt.subplots(1,2,figsize=(15,7))
    ax[0].plot(history.history['accuracy'])
    ax[0].plot(history.history['val_accuracy'])
    ax[0].set_title('Accuracy')
    ax[0].set_ylabel('Accuracy')
    ax[0].set_xlabel('Epoch')
    ax[0].legend(['train', 'validation'], loc='lower right')
    ax[0].grid()
    ax[1].plot(history.history['loss'])
    ax[1].plot(history.history['val_loss'])
    ax[1].set_title('Loss')
    ax[1].set_ylabel('Loss')
    ax[1].set_xlabel('Epoch')
    ax[1].legend(['train', 'validation'], loc='upper right')
    ax[1].grid()
    plt.show()

acc_loss_plot()

Another helper function. This one plots the accuracy and loss curves for training and validation to track the model's performance. Please cite the notebook before using the function.
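Besides reading the curves by eye, the same history can be queried for the best epoch. The sketch below assumes a dictionary in the same layout as `history.history`; the helper `best_epoch` and the numbers are hypothetical.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch, value) where the given metric is lowest."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Hypothetical history in the same layout as `history.history`
toy_history = {
    "loss": [0.80, 0.50, 0.40, 0.35],
    "val_loss": [0.70, 0.45, 0.48, 0.50],
}
epoch, value = best_epoch(toy_history)
print(epoch, value)
```

In this toy history the training loss keeps falling while the validation loss is lowest at epoch 2, which is the typical sign of overfitting that the plots are meant to reveal.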

**PREDICTION AND SUBMISSION**

In [None]:
test = np.array(test).reshape(-1, 28, 28 , 1) / 255

In [None]:
preds = nn_model.predict(test)

In [None]:
labels = [np.argmax(x) for x in preds]
ids = [x+1 for x in range(len(preds))]

sub = pd.DataFrame()

In [None]:
sub['ImageId'] = ids
sub['Label'] = labels

sub.to_csv('submission.csv', index=False)