# Convolutional Neural Network for MNIST
_Gabriella Mansur_

Sources: 
* https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
* https://www.sitepoint.com/keras-digit-recognition-tutorial/
* https://www.kaggle.com/yassineghouzam/introduction-to-cnn-keras-0-997-top-6/notebook#3.-CNN



### 1. Import classes and functions

In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
from sklearn.model_selection import train_test_split

from keras.utils import to_categorical
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.layers import BatchNormalization

### 2. Load the given dataset

In [19]:
# Load the data
train = pd.read_csv("train.csv") #42000 rows, 784 pixel columns + 1 label column
test = pd.read_csv("test.csv")   #28000 rows, 784 pixel columns

In [20]:
# Split train into X and Y
Y = train['label']
X = train.drop(columns=['label'])

In [21]:
print(X.shape)
print(test.shape)
print(Y.shape)

(42000, 784)
(28000, 784)
(42000,)


In [22]:
X.iloc[100]  # row 100 by position; X[100] looks for a column labelled 100 and raises KeyError
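
Indexing a DataFrame with square brackets selects a column by label, not a row by position, which is why `X[100]` fails when the columns are named after pixels. The sketch below reproduces this on a small made-up table; the column names and values are illustrative assumptions, not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pixel table: 3 rows, 4 named pixel columns.
# (Column names and values are assumptions made for this example.)
toy = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["pixel0", "pixel1", "pixel2", "pixel3"],
)

# toy[1] searches for a column labelled 1, which does not exist.
try:
    toy[1]
    raised = False
except KeyError:
    raised = True

# .iloc selects by integer position, so this returns the second row.
row = toy.iloc[1]

print(raised)
print(row.tolist())
```

Using `.iloc` for positional access avoids the error without changing the DataFrame.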

### 3. Pre-process data

#### 3.1. Reshape
We need to reshape the dataset so that it is suitable for training a CNN. In Keras, two-dimensional convolution layers expect input with the dimensions [samples][height][width][channels] (the default `channels_last` layout). Here height = 28, width = 28, and channels = 1, because the images are grayscale (RGB would have 3 channels).

In [None]:
X = X.values.reshape(-1, 28, 28, 1).astype('float32')
test = test.values.reshape(-1, 28, 28, 1).astype('float32')
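
To make the reshape concrete, here is a minimal NumPy sketch on made-up data. It assumes each flat row stores the image row by row, which is how NumPy's default (row-major) reshape interprets it.

```python
import numpy as np

# Two fake "images" stored as flat rows of 784 values each (illustrative data).
flat = np.arange(2 * 784, dtype="float32").reshape(2, 784)

# -1 lets NumPy infer the number of samples from the remaining dimensions.
images = flat.reshape(-1, 28, 28, 1)

print(images.shape)

# Row-major layout: pixel (row r, column c) of image i is flat[i, r * 28 + c].
r, c = 3, 5
print(images[0, r, c, 0] == flat[0, r * 28 + c])
```

The trailing dimension of size 1 is the single grayscale channel.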

#### 3.2. Normalize
We perform a grayscale normalization to reduce the effect of illumination differences. Moreover, the CNN converges faster on data in [0..1] than on data in [0..255].

In [None]:
# normalize inputs from 0-255 to 0-1
X = X / 255
test = test / 255

In [None]:
X.shape

#### 3.3. Encode labels to one hot vectors

In [None]:
num_classes = 10
Y = to_categorical(Y, num_classes)

In [None]:
Y.shape
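
For intuition, one-hot encoding can be sketched in plain NumPy. The helper `one_hot` below is a hypothetical stand-in written for this example; it is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set column `labels[i]` of row i to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded.shape)
print(encoded[0])
print(encoded.argmax(axis=1))  # argmax recovers the original labels
```

Each digit label becomes a vector of length 10, so the 42,000 labels turn into an array of shape (42000, 10).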

#### 3.4. Split the data into training and validation sets for fitting

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, random_state = 2)
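
Note that `X_test` and `Y_test` here are a held-out validation set, separate from the unlabeled `test` data loaded from `test.csv`. With 42,000 rows and `test_size = 0.1`, the split leaves 37,800 rows for training and 4,200 for validation. The idea can be sketched in NumPy as follows; `holdout_split` is a hypothetical helper, not scikit-learn's implementation, and its shuffling will not match `train_test_split` for the same seed.

```python
import numpy as np

def holdout_split(X, Y, test_size, seed):
    """Shuffle the sample indices, then hold out a fraction for validation."""
    n = len(X)
    n_val = int(np.ceil(test_size * n))
    order = np.random.default_rng(seed).permutation(n)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], Y[train_idx], Y[val_idx]

# Toy data: 20 samples whose label equals their single feature.
X_toy = np.arange(20).reshape(20, 1)
Y_toy = np.arange(20)

X_tr, X_val, Y_tr, Y_val = holdout_split(X_toy, Y_toy, test_size=0.1, seed=2)
print(X_tr.shape, X_val.shape)
```

Indexing `X` and `Y` with the same shuffled indices keeps every image paired with its label.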

### 4. CNN model

   * Layer 1: Convolutional layer - 32 filters, size 5x5, relu activation, same padding
   * Layer 2: Convolutional layer - 32 filters, size 5x5, relu activation, same padding
   * Layer 3: Max pooling - 2x2
   * Layer 4: Dropout 25%


   * Layer 5: Convolutional layer - 64 filters, size 3x3, relu activation, same padding
   * Layer 6: Convolutional layer - 64 filters, size 3x3, relu activation, same padding
   * Layer 7: Max pooling - 2x2, stride 2x2
   * Layer 8: Dropout 25% 
   
   
   * Layer 9: Flatten layer
   * Layer 10: Dense layer - 256 units, relu activation
   * Layer 11: Dropout 50%
   * Layer 12: Dense layer - 10 units, softmax activation
    
P.S.: “relu” stands for “Rectified Linear Unit”, an activation that outputs max(0, x): the input itself when it is positive, and zero otherwise.
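
The shapes flowing through this stack can be traced with a little arithmetic. The sketch below assumes stride-1 convolutions with same padding (which preserve height and width) and 2x2 pooling with stride 2; dropout layers are skipped because they do not change the shape. The helper names are hypothetical.

```python
def conv_same(shape, filters):
    """Same padding with stride 1 keeps height and width; depth becomes `filters`."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, size=2, stride=2):
    """Unpadded pooling: floor((dim - size) / stride) + 1 per spatial dimension."""
    h, w, c = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(x, 0.0)

shape = (28, 28, 1)
shape = conv_same(shape, 32)   # layer 1
shape = conv_same(shape, 32)   # layer 2
shape = max_pool(shape)        # layer 3 -> (14, 14, 32)
shape = conv_same(shape, 64)   # layer 5
shape = conv_same(shape, 64)   # layer 6
shape = max_pool(shape)        # layer 7 -> (7, 7, 64)
flat = shape[0] * shape[1] * shape[2]  # layer 9 flattens to a vector

print(shape, flat)             # (7, 7, 64) 3136
print(relu(-2.0), relu(3.5))   # 0.0 3.5
```

So the flatten layer hands a vector of 3,136 values to the first dense layer.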

#### 4.1. Compile the model

The model definition only describes the layers; it does not yet have an objective function. We need to compile the model and specify:

* a loss function,
* an optimizer function, and
* a metric to assess model performance.
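
As an illustration of the loss and the metric, here is a NumPy sketch of categorical cross-entropy and accuracy on two made-up predictions. It follows the standard definitions; the clipping constant `eps` is an assumption, and this is not the Keras source.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Two samples, three classes (illustrative values).
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]], dtype="float32")

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss)
print(acc)
```

The loss rewards confident correct predictions, while accuracy only checks whether the top class is right, so the two can move differently during training.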

#### 4.2 Train the model

The model is trained using categorical cross-entropy (logarithmic loss) and the Adam optimizer, a variant of stochastic gradient descent, for 30 epochs with a batch size of 86.
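
For readers curious about what Adam does on each step, the following is a minimal sketch of the update rule applied to a toy one-dimensional objective. The objective, the learning rate of 0.05, and the value of `eps` are assumptions chosen for the example; Keras handles all of this internally when `optimizer='adam'` is passed.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective (hypothetical): f(w) = (w - 3)^2, minimised at w = 3.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)
```

Dividing by the root of the squared-gradient average gives each parameter its own effective step size, which is the main difference from plain gradient descent.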

In [None]:
def baseline_model():
    # create model

    model = Sequential()

    # Block 1: two 5x5 convolutions, then pooling and dropout
    model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu', input_shape=(28, 28, 1)))
    model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # Block 2: two 3x3 convolutions, then pooling and dropout
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(0.25))

    # Classifier: flatten, one hidden dense layer, softmax over the 10 digits
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [None]:
# Build the model
model = baseline_model()

# Fit the model
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=30, batch_size=86)

# Final evaluation of the model
scores = model.evaluate(X_test, Y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

### 5. Predictions

In [20]:
results = model.predict(test)
results = np.argmax(results,axis = 1)
results = pd.Series(results,name="Label")

In [22]:
submission = pd.concat([pd.Series(range(1, len(results) + 1), name="ImageId"), results], axis=1)
submission.to_csv("results.csv",index=False)
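
The prediction and submission steps can be checked on a tiny example. The probabilities below are made up to stand in for the output of `model.predict`; the sketch derives the image IDs from the number of predictions rather than a fixed count.

```python
import numpy as np
import pandas as pd

# Illustrative softmax outputs for two images over 10 classes.
probs = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.60, 0.03, 0.02],
])

# argmax along axis 1 picks the most probable digit for each image.
labels = pd.Series(np.argmax(probs, axis=1), name="Label")

# Image IDs start at 1 and run to the number of predictions.
image_ids = pd.Series(range(1, len(labels) + 1), name="ImageId")

submission = pd.concat([image_ids, labels], axis=1)
print(submission.to_csv(index=False))
```

The resulting CSV has two columns, `ImageId` and `Label`, with one row per test image.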