#  MNIST Classification

MNIST is a multiclass classification project involving image recognition: we have to classify each handwritten digit as one of the ten classes 0 to 9.


Further reading:
Liked this? Try getting the computer to generate MNIST data (deep fake numbers in handwritten format).

Link: https://www.kaggle.com/kmldas/mnist-generative-adverserial-networks-in-pytorch/
Note that this is more advanced, as it covers GANs (generative adversarial networks), but it is good to know!


# Import Libraries

In [None]:
import numpy as np
import pandas as pd

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import layers

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

keras.backend.set_image_data_format('channels_last')

# Read Directories & Folders

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Load the Data



In [None]:
mnist_train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
mnist_test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')

display("train data", mnist_train)
display("test data", mnist_test)


# What is 784?

Each image is 28 pixels wide and 28 pixels tall, so each image is stored as 28 x 28 = 784 pixel values.

In [None]:
image_size=28*28
image_size
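A small sketch of how a flat pixel index maps onto the 28 x 28 grid, assuming the row-major layout described on the competition's data page (index = row * 28 + column). The helper name `pixel_position` is made up for illustration.

```python
# Sketch: map a flat pixel index (0..783) to its (row, column) in the image.
# Assumes row-major order, i.e. index = row * 28 + column.
IMAGE_SIDE = 28

def pixel_position(index, side=IMAGE_SIDE):
    """Return the (row, column) of a flat pixel index."""
    return divmod(index, side)

print(IMAGE_SIDE * IMAGE_SIDE)   # number of pixel values per image
print(pixel_position(0))         # top-left pixel
print(pixel_position(783))       # bottom-right pixel
```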

# Get train and test data

In [None]:
# Convert to train and test data; preserve the original dataset
X_train = mnist_train.drop('label', axis=1).copy()
X_test = mnist_test.copy()
Y_train = mnist_train['label'].copy()

In [None]:
X_train.describe()

In [None]:
# Normalize values
X_train = X_train / 255.0
X_test = X_test / 255.0

Why 255?

Each pixel is stored in 1 byte = 8 bits. Each bit has 2 values, 0 or 1, so the intensity can take 2^8 = 256 possible values, i.e. it goes from 0 to 255.

By dividing by 255, we make the maximum value 1. With the 'Greys' colormap used below, 1 is drawn as black and 0 as white, with various shades of grey in between.
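The arithmetic can be checked on a handful of made-up pixel values. This is only a sketch with sample numbers, not data from the competition files.

```python
import numpy as np

# Sketch: one byte holds 2**8 = 256 intensity levels, 0 through 255.
levels = 2 ** 8
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)  # made-up sample values

# Dividing by 255.0 rescales every value into the range [0, 1].
scaled = pixels / 255.0
print(levels, scaled.min(), scaled.max())
```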

In [None]:
# Reshape to 28 x 28 x 1 so that we can view each image (a handwritten digit) and feed it to the CNN
X_train = X_train.values.reshape(-1, 28, 28, 1)
X_test = X_test.values.reshape(-1, 28, 28, 1)
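A short sketch of what the `reshape(-1, 28, 28, 1)` call does, using five fake flattened images in place of the real CSV rows. The `-1` asks NumPy to infer the number of images.

```python
import numpy as np

# Sketch: five fake flattened images stand in for rows of the CSV.
flat = np.arange(5 * 784, dtype=np.float32).reshape(5, 784)

# -1 lets NumPy infer the number of images; the trailing 1 is the single
# greyscale channel that Conv2D expects with 'channels_last'.
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)

# Row-major layout: flat pixel 28 becomes row 1, column 0 of the image.
print(images[0, 1, 0, 0] == flat[0, 28])
```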

##  Display images

To check whether everything worked as expected, let's take a look at a few images from the training set.

In [None]:

import random
no_images=len(X_train)

# Display random Image
fig, ax = plt.subplots(figsize=(10, 10))

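# random.randrange(no_images) returns an index from 0 to no_images - 1,
# so it can never point past the last image
plt.imshow(X_train[random.randrange(no_images), :, :, 0], cmap='Greys', interpolation='nearest')

# Replace random.randrange(no_images) in the code above with a number if you want to see a specific image.
# This displays a random image each time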

plt.title("Sample Image")
plt.show()

Here 1 is black and 0 is white, after normalization.

Does a smaller figure size help us visualise the digit better?

In [None]:

# Display random Image
fig, ax = plt.subplots(figsize=(2, 2))  # figsize is now 2 x 2

plt.imshow(X_train[random.randrange(no_images), :, :, 0], cmap='Greys', interpolation='nearest')
plt.title("Sample Image")
plt.show()

### Data Conversion
We hold out 20% of the training data as a validation set.

In [None]:
# Split between train and validation set
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=0.2)

In [None]:
# Get one hot encoding
Y_train = keras.utils.to_categorical(Y_train, num_classes=10)
Y_val = keras.utils.to_categorical(Y_val, num_classes=10)
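To see what one hot encoding produces, here is a minimal NumPy sketch. The `one_hot` helper is hypothetical and only mimics the idea; the notebook itself uses `keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot rows (illustrative helper)."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample = one_hot([3, 0, 9])  # made-up labels
print(sample)
# np.argmax along axis 1 recovers the original labels.
print(np.argmax(sample, axis=1))
```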

## Model Architecture


Defining the Model (Convolutional Neural Network)

The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel. [Source: read more here](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)

![A standard convolution](https://miro.medium.com/max/535/1*Zx-ZMLKab7VOCQTxdZ1OAw.gif)
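The sliding-kernel operation described above can be sketched in a few lines of NumPy. This is a toy version for a single channel with stride 1 and no padding, using made-up inputs; `conv2d_valid` is an illustrative helper, not how Keras implements `Conv2D`.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # multiply elementwise, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input
kernel = np.ones((3, 3)) / 9.0                    # toy 3x3 averaging kernel
result = conv2d_valid(image, kernel)
print(result.shape)
```

A 5x5 input and a 3x3 kernel give a 3x3 output, which is why 'valid' padding shrinks each side by 2 pixels.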


Below is a function defining a CNN model with 3 main blocks: two convolutional blocks followed by a classification block. The two convolutional blocks follow the same structure:
1. Apply a Conv2D layer with a 3x3 kernel and valid padding, then another Conv2D layer with a 3x3 kernel but with same padding to keep the same dimensions
2. Apply a BatchNormalization layer, which normalizes the activations (towards 0 mean) so that each layer depends less on the scale of the layers before it
3. Apply a ReLU activation and a MaxPooling2D layer with 2x2 pool size and stride=2
4. Apply an element-wise Dropout layer to the MaxPooling2D output, keeping 80% of the activation units

The final block flattens the feature maps and passes them through a fully connected layer with 100 ReLU units, followed by a 10-unit fully connected layer that uses a Softmax activation for classification.
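The spatial sizes implied by this description can be traced with a little arithmetic. The sketch assumes 28 x 28 inputs and the 32 and 64 filter counts used in the code cell that follows; `conv_out` and `pool_out` are illustrative helpers, not Keras functions.

```python
def conv_out(size, kernel=3, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

def pool_out(size, pool=2, stride=2):
    """Spatial output size of 'valid' max pooling along one dimension."""
    return (size - pool) // stride + 1

size = 28
trace = []
for filters in (32, 64):
    size = conv_out(size, padding="valid")  # first Conv2D of the block
    size = conv_out(size, padding="same")   # second Conv2D keeps the size
    size = pool_out(size)                   # 2x2 max pooling, stride 2
    trace.append((size, size, filters))

flattened = trace[-1][0] * trace[-1][1] * trace[-1][2]
print(trace, flattened)
```

BatchNormalization, ReLU and Dropout leave the shape unchanged, so they are omitted from the trace. Under these assumptions the Flatten layer receives 5 x 5 x 64 = 1600 values.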

In [None]:
# Build CNN Model
def CNN():
    model = keras.Sequential()
    # CONV > CONV > BN > RELU > MAXPOOLING > DROPOUT
    model.add(layers.Conv2D(32, (3, 3), (1, 1), padding='valid', input_shape=(28, 28, 1), name='conv2d_1_1'))
    model.add(layers.Conv2D(32, (3, 3), (1, 1), padding='same', name='conv2d_1_2'))
    model.add(layers.BatchNormalization(name='bn_1'))
    model.add(layers.Activation('relu', name='relu_1'))
    model.add(layers.MaxPooling2D((2, 2), (2, 2), padding='valid', name='mp2d_1'))
    model.add(layers.Dropout(0.2, name='drop_1'))
    # CONV > CONV > BN > RELU > MAXPOOLING > DROPOUT
    model.add(layers.Conv2D(64, (3, 3), (1, 1), padding='valid', name='conv2d_2_1'))
    model.add(layers.Conv2D(64, (3, 3), (1, 1), padding='same', name='conv2d_2_2'))
    model.add(layers.BatchNormalization(name='bn_2'))
    model.add(layers.Activation('relu', name='relu_2'))
    model.add(layers.MaxPooling2D((2, 2), (2, 2), padding='valid', name='mp2d_2'))
    model.add(layers.Dropout(0.2, name='drop_2'))
    # FLATTEN > DENSE > CLASSIFICATION
    model.add(layers.Flatten())
    model.add(layers.Dense(100, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    
    return model

In [None]:
model = CNN()

You can try different models, such as ResNet or EfficientNet, and see how they work!

Here is a link to my model for multiclass classification of human proteins, which was in the top 4% of the in-class competition.
It uses CNN, Resnet34, Resnet50 and Resnet101: [Human Protein Classification (top 4%) : PyTorch](https://www.kaggle.com/kmldas/human-protein-classification-top-4-pytorch)


### Model Compilation

Here we will use an Adam optimizer with a Cross entropy loss function.
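A minimal sketch of what a softmax output and the categorical cross entropy loss compute, written in NumPy with made-up numbers. It illustrates the formula only and is not Keras's implementation; the helper names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -sum(y_true * log(y_prob)) over the batch."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1]])   # made-up scores for 3 classes
y_true = np.array([[1.0, 0.0, 0.0]])   # one-hot label: class 0
probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
print(probs, loss)
```

With a one-hot label, the loss reduces to the negative log of the probability assigned to the correct class, so confident correct predictions give a loss close to 0.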

In [None]:
model.compile(optimizer='adam', loss='CategoricalCrossentropy', metrics=['accuracy'])

In [None]:
model.summary()

## Training and Prediction

We will train the model for 50 epochs, with a batch size of 64.

In [None]:
history = model.fit(X_train, Y_train, validation_data=(X_val, Y_val), batch_size=64, epochs=50, verbose=1)

You can set verbose=0 in the code above if you do not want to see each step in the process.

### Graphing Accuracy

Let us check how our model performed by plotting training accuracy against validation accuracy, and training loss against validation loss.

In [None]:
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend(loc='lower right')

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(loc='upper right')  # loss curves fall over time, so keep the legend out of their way

plt.tight_layout()
plt.show()

As you can see, 20 epochs would have been enough: the scores do not improve much after 20 epochs.

What can you change to get better validation scores? Try out different parameter changes and see!
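One way to avoid training for more epochs than needed is to stop once the validation loss stops improving. Keras provides an `EarlyStopping` callback that applies a rule of this kind; the sketch below shows only the patience logic in plain Python. The function name and the loss values are made up for illustration and are not results from this notebook.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss seen so far; if that never happens, it runs to the end.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses, not results from this notebook.
fake_val_loss = [0.30, 0.20, 0.15, 0.16, 0.17, 0.18, 0.19]
stop_epoch, best_epoch = early_stop_epoch(fake_val_loss, patience=3)
print(stop_epoch, best_epoch)
```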

### Predictions

Below is a function to help check whether we have trained the model properly. The "imgs" parameter sets how many images from the start of the test dataset are shown; it should be a perfect square (e.g. 25), since the images are drawn on a square grid.

In [None]:
def predict(model, X, imgs):
    s = int(np.sqrt(imgs))
    fig, ax = plt.subplots(s, s, sharex=True, sharey=True, figsize=(15, 15))
    ax = ax.flatten()
    preds = model.predict(X[:imgs])
    for i in range(imgs):
        y_pred = np.argmax(preds[i])
        img = X[i].reshape(28, 28)
        ax[i].imshow(img, cmap='Greys', interpolation='nearest')
        ax[i].set_title(f'p: {y_pred}')

In [None]:
predict(model, X_test, 25)

## Submission

We create the full prediction and place the predictions into the requested format.

In [None]:
y_pred = model.predict(X_test)
y_pred = np.argmax(y_pred, axis=1)
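A tiny sketch of how `np.argmax` with `axis=1` turns per-class probabilities into digit labels. The probabilities are made up rather than real model outputs.

```python
import numpy as np

# Made-up softmax outputs for 3 images and 10 digit classes.
fake_probs = np.full((3, 10), 0.01)
fake_probs[0, 7] = 0.91
fake_probs[1, 2] = 0.91
fake_probs[2, 0] = 0.91

# axis=1 picks, for each row (image), the column (digit) with the highest score.
labels = np.argmax(fake_probs, axis=1)
print(labels)
```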

In [None]:
name="Jairam_Mohan" #Add your name here

file_name=name+"_mnist_submission.csv"

In [None]:
y_pred = pd.Series(y_pred, name='Label')
# ImageId runs from 1 to the number of test images (28000 for this competition)
sub = pd.concat([pd.Series(range(1, len(y_pred) + 1), name="ImageId"), y_pred], axis=1)
sub.to_csv(file_name, index=False)
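For reference, the layout of the resulting file can be sketched with the standard library alone. The `submission_csv` helper and the three predictions are made up; the notebook itself writes the file with pandas.

```python
import csv
import io

def submission_csv(labels):
    """Render predictions as CSV text with ImageId starting at 1."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, int(label)])
    return buffer.getvalue()

text = submission_csv([2, 0, 9])  # made-up predictions
print(text)
```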

Download the predictions from the "output" folder.

Refresh the page if the file is not visible.

SUBMIT the predictions here: https://www.kaggle.com/c/digit-recognizer

This code gets you into the top 35% of the competition. 

What more can you do? Try and experiment. Happy learning!


Look below for other attempts, and read more kernels on this competition.

# Acknowledgement, Sources and Suggestions




kernel by Chris: https://www.kaggle.com/christianwallenwein/beginners-guide-to-mnist-with-fast-ai

kernel by Timothy: https://www.kaggle.com/susantotm/digit-recognizer

kernel by Yassine: https://www.kaggle.com/yassineghouzam/introduction-to-cnn-keras-0-997-top-6

Can you use FASTAI to do this faster?
https://www.fast.ai

Look at my Colab notebook on using 5 lines of code in FASTAI to get similar results!
https://colab.research.google.com/drive/1tuKzXuWgYuJVa83k6NiL0GVKy_hyJJni?usp=sharing (link updated for viewing only. Please copy it to your own Colab or download it to run; it does not have edit access)
 
 

