# Deep Convolutional Neural Network in Keras

In this notebook, we build a deep convolutional network that classifies MNIST handwritten digits, inspired by [LeNet-5](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf).

Convolutional neural networks (CNNs) have been among the most influential innovations in computer vision. When a computer takes an image as input, it sees an array of pixel values. Each MNIST image is 28x28 pixels with a single grayscale channel (a color image would be 28x28x3, the 3 standing for the RGB channels). Each pixel value lies between 0 and 255 and describes the intensity at that point.
In a CNN, each convolutional filter learns to detect one small local image pattern, here of size 3x3.


### Set random seed for reproducibility

In [1]:
import numpy as np
np.random.seed(42)

### Load dependencies
[Keras](https://keras.io/) is a high-level neural network API that runs on top of TensorFlow.

In [2]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Flatten, Conv2D, MaxPooling2D # new!
from keras.optimizers import SGD

Using TensorFlow backend.


### Load data from the MNIST data set of handwritten digits

In [3]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


### Preprocess data

In [9]:
X_train = X_train.reshape(60000, 28, 28, 1).astype('float32')
X_test = X_test.reshape(10000, 28, 28, 1).astype('float32')

#### Apply scalar division to bring the values between 0 and 1
Then convert the label vector (integers from 0 to num_classes - 1) into a binary class matrix using one-hot encoding: with n classes, each label becomes a row of n features, one for each unique value of the nominal label, with a 1 in the position of its class and 0 elsewhere.

In [10]:
X_train /= 255
X_test /= 255

In [11]:
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
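
As an illustration of the two preprocessing steps above, here is a minimal NumPy sketch of the scaling and of one-hot encoding. The helper `one_hot` and the sample values are hypothetical, written only to mimic the effect of `keras.utils.to_categorical` on integer labels; it is not the Keras implementation.

```python
import numpy as np

# Scaling: dividing by 255 maps raw intensities from [0, 255] to [0, 1].
pixels = np.array([0, 128, 255], dtype=np.float32)  # made-up sample intensities
pixels /= 255

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0  # set the column of each label
    return encoded

# Three made-up labels: the digits 5, 0 and 9.
example = one_hot([5, 0, 9], num_classes=10)
print(example[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Each row has exactly one non-zero entry, so taking `argmax` along the row recovers the original digit.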

### Design neural network architecture
We start with the simplest form: one input layer, one hidden layer, and one output layer. The layers are dense, meaning every node of the hidden layer is connected to every node of the input layer and of the output layer. The input is each 28x28 image flattened into a 784-element array. The output layer is a 10-dimensional array in which the entry for the matching digit should be close to 1 and the others close to 0. As a first choice, the hidden layer uses 64 nodes.

Recall that the activation function determines how strongly a neuron passes its signal on to the next layer. The sigmoid function is:
$$ S(x) = \frac{1}{1 + e^{-x}} $$
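
For reference, the sigmoid can be sketched in a few lines of NumPy. The input values below are made up for illustration; the point is only that large negative inputs map close to 0, large positive inputs close to 1, and 0 maps to exactly 0.5.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

values = np.array([-5.0, 0.0, 5.0])  # made-up pre-activation values
outputs = sigmoid(values)
print(outputs)  # small value, 0.5, value close to 1
```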

Since there are 10 output classes, random guessing yields an accuracy of 10%. The number of passes over the training data is set with the epochs parameter; more epochs give the network more iterations in which to improve.
SGD is stochastic gradient descent, used here to minimize the cost function, which is the mean squared error. The learning rate is set to 0.01.
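
To show what one gradient descent step on a mean squared error cost looks like, here is a minimal sketch for a single linear unit. The data, the weights `true_w`, and the batch are synthetic assumptions; Keras applies the same idea to every layer, with gradients obtained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(128, 4))             # one synthetic mini-batch: 128 samples, 4 features
true_w = np.array([1.0, -2.0, 0.5, 3.0])  # made-up weights used to generate targets
y = X @ true_w

w = np.zeros(4)        # initial weights
learning_rate = 0.01   # same value as in the text

def mse(weights):
    """Mean squared error of the linear predictions X @ weights."""
    return np.mean((X @ weights - y) ** 2)

loss_before = mse(w)
grad = 2.0 * X.T @ (X @ w - y) / len(X)   # gradient of the MSE with respect to w
w = w - learning_rate * grad              # one gradient descent update
loss_after = mse(w)
print(loss_before, loss_after)            # the loss shrinks slightly after one step
```

With such a small learning rate each step only reduces the loss a little, which is consistent with training needing many epochs.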

In [12]:
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(64,activation='sigmoid',input_shape=(784,)))
model.add(Dense(10,activation='softmax'))
model.summary()
# 50240 parameters come from 784 * 64 weights + 64 biases, while 650 = 64 * 10 + 10
# compile and train using stochastic gradient descent
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01),metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=128, epochs=50, verbose=1, validation_data=(X_test, y_test))
print("-> We can observe low accuracy and progressing slowly... continuing to 200 epochs will reach")
print("loss: 0.0282 - acc: 0.8587 - val_loss: 0.0272 - val_acc: 0.865\n\n")

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 64)                50240     
_________________________________________________________________
dense_6 (Dense)              (None, 10)                650       
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________
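
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 64)   # input (784) -> hidden layer (64)
output = dense_params(64, 10)    # hidden layer (64) -> output layer (10)
print(hidden, output, hidden + output)  # 50240 650 50890
```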


ValueError: Error when checking input: expected dense_5_input to have 2 dimensions, but got array with shape (60000, 28, 28, 1)
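
The error above arises because the data was reshaped earlier to `(60000, 28, 28, 1)` for the convolutional layers, while the dense model declares `input_shape=(784,)`. A minimal sketch of the flattening that the dense model would need, using a small stand-in array rather than the real data:

```python
import numpy as np

# Stand-in for a small batch shaped like X_train: (samples, 28, 28, 1).
batch = np.zeros((5, 28, 28, 1), dtype=np.float32)

# Flatten each image to a 784-element row so it matches input_shape=(784,).
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (5, 784)
```

With the real data, this would presumably mean passing `X_train.reshape(60000, 784)` and `X_test.reshape(10000, 784)` to `model.fit` for the dense model, while keeping the 4-dimensional arrays for the convolutional model that follows.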


The first layer is a two-dimensional convolution with 3x3 kernels. The input is a 28x28 grayscale image (a single channel, so the depth is 1).
A kernel (or filter) is the ‘flashlight’ that moves over the image. The kernel needs to have the same depth as the image.
As the kernel slides, or convolves, across the input image, its values are multiplied element-wise with the pixel values underneath, and the products are summed to give a single number for that position.
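
The sliding multiply-and-sum can be sketched directly in NumPy. This is an illustrative single-channel version without bias or activation, assuming stride 1 and no padding; the function `convolve2d_valid`, the test image, and the edge-detecting kernel are made up for the example (in a trained network the kernel values are learned).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; at each position multiply and sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the region under the 'flashlight'
            out[i, j] = np.sum(patch * kernel)  # one number per position
    return out

# Made-up 28x28 image: left half dark (0), right half bright (1).
image = np.zeros((28, 28), dtype=np.float32)
image[:, 14:] = 1.0

# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
print(feature_map.max())  # 3.0, reached only where the kernel straddles the edge
```

The output is 26x26 because a 3x3 kernel fits in 28 - 3 + 1 = 26 positions along each axis.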



In [7]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))


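The second convolutional layer has 64 filters, still with a 3x3 kernel.
Max pooling with a 2x2 pool size then keeps only the largest of every 4 values (each 2x2 block), halving the width and height.
Dropout randomly disables 1/4 of the activations during training to control overfitting.
Flatten reshapes the result to one dimension so that it can feed the dense layers.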
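
The pooling and flattening steps described above can be illustrated on a tiny array. This is a sketch assuming a single channel and even height and width; `max_pool_2x2` is a hypothetical helper, not the Keras layer.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block (even sizes assumed)."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16, dtype=np.float32).reshape(4, 4)  # made-up 4x4 feature map
pooled = max_pool_2x2(fm)
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]

flattened = pooled.reshape(-1)  # flatten to one dimension
print(flattened.shape)          # (4,)
```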

In [8]:
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
__________
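
The shapes and parameter counts listed in the summary above follow from simple arithmetic, sketched here assuming stride 1 and no padding (the Keras defaults for `Conv2D`). The helper names are hypothetical.

```python
def conv_output_size(input_size, kernel_size):
    """Spatial size after a stride-1 convolution without padding."""
    return input_size - kernel_size + 1

def conv_params(kernel_size, in_channels, filters):
    """Weights (kernel_size^2 * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

size1 = conv_output_size(28, 3)      # 26: first convolution
size2 = conv_output_size(size1, 3)   # 24: second convolution
pooled = size2 // 2                  # 12: after 2x2 max pooling
flat = pooled * pooled * 64          # 9216: after flattening 12x12x64

print(conv_params(3, 1, 32))    # 320
print(conv_params(3, 32, 64))   # 18496
print(flat * 128 + 128)         # 1179776: the dense layer with 128 units
```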

#### Configure model

In [9]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#### Train!

In [10]:
model.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_data=(X_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20

KeyboardInterrupt: 