# Digit Recognizer CNN

This is my first attempt at a CNN and at using Keras. I am using a popular notebook by Poonam Ligade as a guide: https://www.kaggle.com/poonaml/deep-neural-network-keras-way. I am also using the Keras documentation, which can be found here: https://keras.io/about/. This notebook is heavily annotated because I want it to be a clear notebook that I (and hopefully others) can learn from. I got a score of 0.98575 from this CNN. I didn't put a lot of time into fine-tuning the parameters, so this score can probably be improved. Feel free to use this as a template and tweak the parameters to get a better result.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

# Here are all the packages that I will be using
import numpy as np # Linear algebra
import pandas as pd # Data processing 
import matplotlib.pyplot as plt # Data visualisation
from sklearn.model_selection import train_test_split # Splitting the data into training and validation sets
from keras.models import Sequential # A linear stack of layers
from keras import backend as K # Low-level tensor operations used by Keras
from keras.layers import Dense, Dropout, Lambda, Flatten # Layers of the neural network
from keras.optimizers import Adam, RMSprop # Optimizing the network
from keras.preprocessing.image import ImageDataGenerator # Preprocessing the images

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Inspecting the Data

First I want to look at how the images are represented, so I know how to reshape them into a form that my CNN can use.

In [None]:
# Let's look at the training data
train = pd.read_csv("/kaggle/input/digit-recognizer/train.csv")
print(train.shape)
train.head()

In [None]:
# Let's get an overview of the training data
train.info()

In [None]:
# Let's now look at the test data
test = pd.read_csv("/kaggle/input/digit-recognizer/test.csv")
print(test.shape) 
test.head()

In [None]:
# Let's get an overview of the test data
test.info()

As expected, the tables for the train and test data are very similar; the test data simply lacks the label column. The labels are values from 0-9 corresponding to the digit that the image represents. Each pixel has a value from 0 to 255 depending on how dark or light it is (0 for completely black, 255 for completely white).
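
To make the layout concrete, here is a small sketch of how one flattened row of pixel values maps onto a 28x28 grid. The row below is synthetic (hypothetical values, not taken from train.csv), and it assumes the pixel columns are stored row by row, which is how the reshape later in this notebook treats them.

```python
import numpy as np

# Hypothetical flattened image: 784 pixel values stored row by row (not real data)
flat_row = np.arange(784) % 256

# Row-major reshape: the first 28 values become the top row of the image
image = flat_row.reshape(28, 28)

# Pixel (r, c) in the grid comes from position r * 28 + c of the flattened row
r, c = 3, 5
print(image.shape)
print(image[r, c] == flat_row[r * 28 + c])
```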

In [None]:
# Now I want to split up the train data into the images (X) and the labels (y).
X_train = (train.iloc[:,1:].values).astype('float32') # Images (represented as pixel values)
y_train = train.iloc[:,0].values.astype('int32') # Labels (numbers represented by the images)
X_test = test.values.astype('float32')

In [None]:
# View X_train
X_train

In [None]:
# View y_train
y_train

The labels contain multiple values (integers from 0-9). Problems of this sort (with more than two possible classes) are called multiclass classification problems. This will be important later because the Sequential model (a plain stack of layers that we will use to build the neural network) only works for problems with one input tensor and one output tensor. For more information, read the documentation for Sequential models in Keras, which can be found here: https://keras.io/guides/sequential_model/.

# Visualising the Images

Before making the neural network, visualising the data can be a useful way for us to get an intuition for what our neural network should look like.

In [None]:
# Reshape the data into a 28x28 grid so that the pixels form the corresponding original image
X_train = X_train.reshape(X_train.shape[0], 28, 28)
X_test = X_test.reshape(X_test.shape[0], 28, 28)

# Generate the images
plt.figure(figsize=(14,5)) # One wide figure (to make the images easier to see)
for i in range(20,23):
    plt.subplot(1, 3, i-19) # Creates the subplot: 1 row, 3 columns, positions 1 to 3
    plt.imshow(X_train[i], cmap=plt.get_cmap('gray')) # Displays the image using a greyscale colour map
    plt.title(y_train[i]) # Title for our image

In [None]:
# Reshape the data to add a greyscale channel dimension
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train.shape

# Preprocessing the Data

The purpose of this step is to get the data into a form that can be fed into the neural network. Remember from earlier that Sequential models take exactly one input tensor and produce one output tensor. The Keras documentation on preprocessing data can be found here: https://keras.io/guides/preprocessing_layers/.

In [None]:
# Standardising the training data
# Standardising formula: Z = (X-mean)/std, this rescales the data to have mean 0 and variance 1
mean_px = X_train.mean().astype(np.float32)
std_px = X_train.std().astype(np.float32)

def standardize(x): 
    return (x-mean_px)/std_px
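
As a quick sanity check of the formula above, the following sketch applies the same standardisation to a synthetic array and confirms that the result has mean close to 0 and standard deviation close to 1. The array and the names demo_pixels, demo_mean and demo_std are hypothetical stand-ins, not the notebook's real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the pixel array: values in [0, 255] stored as float32
demo_pixels = rng.integers(0, 256, size=(100, 28, 28)).astype(np.float32)

demo_mean = demo_pixels.mean()
demo_std = demo_pixels.std()

# Same formula as standardize(): subtract the mean, divide by the standard deviation
demo_standardized = (demo_pixels - demo_mean) / demo_std

print(demo_standardized.mean())  # close to 0
print(demo_standardized.std())   # close to 1
```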

Now I would like to change the labels from being a digit from 0-9 to a one-hot vector. This is a vector of length 10 that is zero in all the entries apart from the entry corresponding to the digit, where it is 1. For example, 0 is [1,0,0,0,0,0,0,0,0,0], 1 is [0,1,0,0,0,0,0,0,0,0], etc. The reason for doing this is that the neural network outputs one value per class, which can be compared directly with a one-hot vector but not with the labels in their current form.
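
The idea can be sketched in plain NumPy, independently of Keras. This is only an illustration of the encoding; the example labels are made up.

```python
import numpy as np

labels = np.array([0, 1, 9])  # made-up digit labels

# Row k of the 10x10 identity matrix is exactly the one-hot vector for digit k
one_hot = np.eye(10, dtype=np.float32)[labels]
print(one_hot)

# Taking the position of the 1 in each row recovers the original labels
decoded = one_hot.argmax(axis=1)
print(decoded)
```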

In [None]:
from keras.utils import to_categorical # Converts integer labels to one-hot vectors (the form we want)
y_train = to_categorical(y_train)
num_classes = y_train.shape[1] # Number of columns in our new y array
num_classes

# Linear Model

In [None]:
# Fix the random seed for reproducibility
seed = 37
np.random.seed(seed)

In [None]:
from keras.models import Sequential # A linear stack of layers
from keras.layers import Lambda, Dense, Flatten, Dropout # Layers used to build the neural network
from keras.callbacks import EarlyStopping # Stops training when a monitored metric stops improving
from keras.layers import BatchNormalization, Convolution2D, MaxPooling2D # More layers, used in the CNN later on

* **Lambda** wraps an arbitrary function as a layer (here, the standardize function defined above)
* **Flatten** transforms the data into a 1D array
* **Dense** connects all the neurons in one layer to all the neurons in the next layer
* **Dropout** randomly sets input units to zero during training to help prevent overfitting

In [None]:
model= Sequential()
model.add(Lambda(standardize,input_shape=(28,28,1)))
model.add(Flatten())
model.add(Dense(10, activation='softmax')) # softmax function converts the inputs to a vector with elements in (0,1) with the sum of the elements = 1
print("input shape ",model.input_shape)
print("output shape ",model.output_shape)
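
The softmax activation used in the output layer can be written in a few lines of NumPy. This is a minimal sketch of the function itself, not Keras's implementation, and the input scores are hypothetical.

```python
import numpy as np

def softmax(z):
    """Map a vector of scores to positive values that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs of a 3-unit layer
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score always receives the largest probability, which is why the predicted digit is read off as the position of the largest output.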

There are a few more things to add before we can train the neural network.
1. A loss function to measure how good the network's guesses are, https://keras.io/api/losses/
2. An optimiser to update the network every time it sees a new batch of data, https://keras.io/api/optimizers/
3. Metrics to monitor the performance of the neural network, https://keras.io/api/metrics/
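
To give a feel for the first and third items, here is a rough NumPy sketch of a categorical cross-entropy loss and an accuracy metric. The labels and predictions are made up, and the clipping constant eps is an assumption added to avoid taking log(0).

```python
import numpy as np

# Made-up one-hot labels and predicted probabilities for 3 samples and 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3]])

# Categorical cross-entropy: minus the log of the probability given to the true class,
# averaged over the samples
eps = 1e-7  # assumed small constant to keep the log finite
loss = -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1))

# Accuracy: fraction of samples where the most probable class matches the label
accuracy = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(loss, accuracy)
```

The third sample is misclassified, so the accuracy is 2/3, and it also contributes the largest share of the loss because the true class only received a probability of 0.3.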

In [None]:
from keras.optimizers import RMSprop # RMSprop divides each gradient by a moving average of its recent magnitudes (root mean square)
model.compile(optimizer=RMSprop(learning_rate=0.001), # learning_rate is the step size (default is 0.001)
 loss='categorical_crossentropy', # categorical_crossentropy is used for labels that are one-hot vectors
 metrics=['accuracy']) # Calculates how often the neural networks prediction matches the label
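
For intuition, the core RMSprop update can be sketched in NumPy on a toy problem. This is a simplified illustration under assumed hyperparameters (rho, eps and the toy function are my choices), not Keras's actual implementation, which has more options.

```python
import numpy as np

def rmsprop_minimise(grad_fn, w, lr=0.01, rho=0.9, eps=1e-7, steps=500):
    """Simplified RMSprop loop: scale each gradient by a running RMS of past gradients."""
    avg_sq = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq = rho * avg_sq + (1 - rho) * g ** 2  # moving average of squared gradients
        w = w - lr * g / (np.sqrt(avg_sq) + eps)    # step size adapts to gradient magnitude
    return w

# Toy problem: minimise f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0
w_final = rmsprop_minimise(lambda w: 2 * w, np.array([1.0]))
print(w_final)
```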

In [None]:
from keras.preprocessing import image
gen = image.ImageDataGenerator() # Generates batches of tensor image data 

In [None]:
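from sklearn.model_selection import train_test_split # Holds out part of the training data as a validation set, to help detect overfitting
X = X_train # Keep a copy of the full training images for the final training run
y = y_train # Keep a copy of the full training labels for the final training run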
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=37)
batches = gen.flow(X_train, y_train, batch_size=64)
val_batches=gen.flow(X_val, y_val, batch_size=64)

In [None]:
# Train the model
history=model.fit_generator(generator=batches, steps_per_epoch=batches.n, epochs=1, 
                    validation_data=val_batches, validation_steps=val_batches.n)
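
A note on the step counts used above: batches.n is the number of samples, while steps_per_epoch counts batches. The arithmetic below is a sketch of the difference, assuming the Kaggle training file has 42,000 rows (so 37,800 remain after the 10% validation split); check train.shape to confirm the numbers for your data.

```python
import math

n_samples = 37800  # assumption: 90% of 42,000 training rows
batch_size = 64

# Number of batches needed to see every training sample once
steps_for_one_pass = math.ceil(n_samples / batch_size)

# With steps_per_epoch = n_samples, one "epoch" draws n_samples batches,
# which is roughly this many passes over the training data
passes_per_epoch = n_samples * batch_size / n_samples

print(steps_for_one_pass)
print(passes_per_epoch)
```

If a single pass per epoch is wanted, len(batches) gives the number of batches directly; training will then be much faster per epoch, so more epochs may be needed to reach the same score.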

# Fully Connected Model

Fully connected models are models where all the neurons in one layer are fully connected to all the neurons in the next layer.
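
As a rough picture of what one fully connected layer computes, the sketch below runs a single forward pass in NumPy: every output neuron is a weighted sum of all inputs, followed by relu. The weights and input are random placeholders, and the sizes 784 and 512 are chosen to match a flattened 28x28 image and the hidden layer used in this section.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(784,))                  # placeholder for one flattened 28x28 image
W = rng.normal(scale=0.01, size=(784, 512))  # one weight per (input, neuron) pair
b = np.zeros(512)                            # one bias per neuron

# Dense layer followed by relu: max(x @ W + b, 0)
hidden = np.maximum(x @ W + b, 0.0)

n_params = W.size + b.size
print(hidden.shape, n_params)
```

Because every input is connected to every neuron, this single layer already has 401,920 trainable parameters.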

In [None]:
# Creates a Fully connected model
def get_fc_model():
    model = Sequential([
        Lambda(standardize, input_shape=(28,28,1)),
        Flatten(),
        Dense(512, activation='relu'), # relu (rectified linear unit) is the max(x,0), https://keras.io/api/layers/activations/#relu-function
        Dense(10, activation='softmax')
        ])
    model.compile(optimizer='Adam', # Adam is a variant of stochastic gradient descent that adapts the step size for each parameter, https://keras.io/api/optimizers/adam/
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

In [None]:
# Train the model
fc = get_fc_model()
fc.optimizer.lr=0.01
history=fc.fit_generator(generator=batches, steps_per_epoch=batches.n, epochs=1, 
                    validation_data=val_batches, validation_steps=val_batches.n)

# Convolution Neural Networks (CNN)

CNNs differ from regular (fully connected) neural networks in that they slide the same small filters across the whole input, so they can detect a feature regardless of where it appears. They are especially good at classifying images for this reason.

Documentation:
* **Convolution2D**- https://keras.io/api/layers/convolution_layers/convolution2d/
* **MaxPooling2D**- https://keras.io/api/layers/pooling_layers/max_pooling2d/
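
The two layer types can be illustrated with a small NumPy sketch. It is a simplified, single-channel version written for clarity rather than speed, and the test image and edge-detecting kernel are made up; Keras layers additionally handle batches, multiple channels and learned kernels.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Made-up image: left half dark, right half bright, giving one vertical edge
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# Made-up kernel that responds to dark-to-bright transitions from left to right
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

fmap = conv2d_valid(image, kernel)
pooled = max_pool_2x2(fmap)
print(fmap.shape, pooled.shape, fmap.max())
```

The same kernel produces the same response in every row where the edge appears, which is the position independence described above.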

In [None]:
# Creating the CNN
from keras.layers import Convolution2D, MaxPooling2D

def get_cnn_model():
    model = Sequential([
        Lambda(standardize, input_shape=(28,28,1)),
        Convolution2D(32,(3,3), activation='relu'), # Convolution2D(filters (the number of feature maps the layer learns), kernel_size (the height and width of the window each filter slides over the input), activation)
        Convolution2D(32,(3,3), activation='relu'),
        MaxPooling2D(), # Compacts the layer by taking the max value in a given window (default is 2x2)
        Convolution2D(64,(3,3), activation='relu'),
        Convolution2D(64,(3,3), activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(10, activation='softmax')
        ])
    model.compile(Adam(), 
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
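
The spatial sizes inside the CNN above can be worked out by hand. The sketch below does the arithmetic, assuming the Keras defaults of no padding and stride 1 for the 3x3 convolutions and a 2x2 window for max pooling; the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling."""
    return size // pool

size = 28
sizes = [size]
for layer in ["conv", "conv", "pool", "conv", "conv", "pool"]:
    size = conv_out(size) if layer == "conv" else pool_out(size)
    sizes.append(size)

# The last convolution has 64 filters, so Flatten produces size * size * 64 values
flattened = size * size * 64

print(sizes)
print(flattened)
```

So the image shrinks from 28x28 to 4x4 before flattening, and the first Dense layer receives 1024 inputs.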

In [None]:
# Train the CNN
model= get_cnn_model()
model.optimizer.lr=0.01
history=model.fit_generator(generator=batches, steps_per_epoch=batches.n, epochs=1, 
                    validation_data=val_batches, validation_steps=val_batches.n)

# Finetuning

Data augmentation means creating new training data for the model by modifying existing data. This is done to reduce overfitting. For images this may include rotating, cropping, translating, changing the brightness, etc.
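
As a toy illustration of one such modification, the sketch below translates an image by a small random offset using NumPy. It is a hand-written stand-in for the idea, not what ImageDataGenerator does internally, and the function name, max_shift value and test image are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shift(image, max_shift=2):
    """Translate the image by a random offset, filling the vacated pixels with 0."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Part of the original image that stays inside the frame after the shift
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

# Made-up image: a bright square in the middle of a dark background
image = np.zeros((28, 28))
image[10:18, 10:18] = 1.0

augmented = random_shift(image)
print(augmented.shape, augmented.sum())
```

The shifted image still shows the same shape, so it keeps the same label while giving the model a slightly different input to learn from.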

In [None]:
# Augment the data
gen = ImageDataGenerator(rotation_range=8, width_shift_range=0.09, shear_range=0.3,
                         height_shift_range=0.09, zoom_range=0.04)
batches = gen.flow(X_train, y_train, batch_size=64) # Create the augmented training batches
val_batches = gen.flow(X_val, y_val, batch_size=64) # Create the validation batches (these are augmented too, since they use the same generator)

In [None]:
# Train the model with the augmented data
model.optimizer.lr=0.001
history=model.fit_generator(generator=batches, steps_per_epoch=batches.n, epochs=1, 
                    validation_data=val_batches, validation_steps=val_batches.n)

Batch Normalisation (BN) rescales the outputs of a layer using the mean and variance of the current batch, which helps to stabilise and speed up the training of the neural network.
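
The training-time computation can be sketched in NumPy as follows. This is a simplified version for a 2D batch of activations; the activations are random placeholders and the eps value is an assumption. The real Keras layer also learns gamma and beta and keeps moving averages of the statistics for use at prediction time.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Normalise each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)   # per-feature mean over the batch
    var = x.var(axis=0)     # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)

# Placeholder activations: a batch of 64 samples with 4 features each
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

normalised = batch_norm(activations, gamma=np.ones(4), beta=np.zeros(4))
print(normalised.mean(axis=0))
print(normalised.std(axis=0))
```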

In [None]:
# Create the BN model
from keras.layers import BatchNormalization # Normalises the activations of the previous layer

def get_bn_model():
    model = Sequential([
        Lambda(standardize, input_shape=(28,28,1)),
        Convolution2D(32,(3,3), activation='relu'),
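        BatchNormalization(axis=1), # Note: with channels-last input such as (28,28,1), axis=1 is the image height; the default axis=-1 normalises per channel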
        Convolution2D(64,(3,3), activation='relu'),
        MaxPooling2D(),
        BatchNormalization(axis=1),
        Convolution2D(32,(3,3), activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(64,(3,3), activation='relu'),
        MaxPooling2D(),
        Flatten(),
        BatchNormalization(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
        ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
# Update the Model with BN
model= get_bn_model()
model.optimizer.lr=0.01
history=model.fit_generator(generator=batches, steps_per_epoch=batches.n, epochs=1, 
                    validation_data=val_batches, validation_steps=val_batches.n)

In [None]:
# Train the model on the full dataset
model.optimizer.lr=0.01
gen = image.ImageDataGenerator()
batches = gen.flow(X, y, batch_size=64)
history=model.fit_generator(generator=batches, steps_per_epoch=batches.n, epochs=1)

In [None]:
# Kaggle Predictions
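predictions = np.argmax(model.predict(X_test, verbose=0), axis=1) # Picks the most probable digit for each image (predict_classes is not available in newer Keras versions)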

submissions=pd.DataFrame({"ImageId": list(range(1,len(predictions)+1)),
                         "Label": predictions})
submissions.to_csv("DRC.csv", index=False, header=True)